Data Security: AI Without Data Leaks

Self-hosted AI vs Cloud: Where Does Your Data Actually Stay

Views: 395 Published: 14.04.2026

You are choosing an AI service for working with corporate documents and see two camps: convenient cloud solutions like ChatGPT or Notion AI, and self-hosted options where everything is deployed on your own server. The difference in convenience is obvious. Where your documents physically end up in the process is a question most businesses never ask until their first GDPR audit. Short answer: the cloud services covered here store your data on servers in the US by default, while self-hosted keeps it only on your server. For businesses in the EU, this can be the difference between compliance and violation.

⚡ TL;DR

  • ☁️ OpenAI FileSearch: files are stored on OpenAI servers (US, Microsoft Azure) — no EU region by default
  • 📓 Notion AI: data is processed through sub-processors (Anthropic, OpenAI) — servers outside your control
  • 🏠 Self-hosted: all components on your server — no external access
  • ⚖️ GDPR status: cloud requires DPA + SCCs + DPIA; self-hosted is compliant by default with an EU server
  • 🏥 For healthcare and legal: cloud AI is legally unacceptable without special measures
  • 👇 Below — a detailed breakdown of each option with real facts from provider documentation

How cloud AI handles your documents

When you upload a document to a cloud AI service, it is physically copied to the provider's servers. There it is split into chunks, indexed and stored to answer your queries. Your document is no longer only yours.

Cloud AI services are convenient. You sign up, upload a PDF, and get answers in seconds. But behind that convenience is a technical process most users never see.

Here is what happens to your document after uploading to a cloud service:

For personal use, this is fine. But for corporate documents containing client personal data, medical records or material covered by attorney-client privilege, each of these steps is legally significant under GDPR. For a detailed comparison of popular AI document services, see our overview "5 AI Services for Document Work: Business Comparison".

It is important to understand: the provider is not necessarily misusing your data. But the mere fact that your documents physically reside on their servers means they are a data processor under GDPR — and the entire chain of requirements (DPA, transfer risk assessment, DPIA) becomes mandatory.

Summary: cloud AI always means transferring your documents to a third party. The question is whether that third party is in the right jurisdiction and whether you have the required documentation.

Where OpenAI FileSearch physically stores your data

OpenAI FileSearch stores uploaded files and vector indexes on OpenAI servers in the US (Microsoft Azure infrastructure). EU region selection is not available for standard API customers. EU data residency is only possible for ChatGPT Enterprise customers — a separate product at a separate price.

OpenAI FileSearch is a built-in tool for searching uploaded documents within the Assistants API and Responses API. Technically it works like this: you upload a file, it is automatically chunked, vectorized and stored in a vector store on OpenAI's servers.
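
The chunk, vectorize and store steps can be illustrated with a small local sketch. This is not OpenAI's implementation: the function names, the character-window chunking and the hashed bag-of-words `toy_embed` are assumptions chosen only to show what an index built from your document contains. A real service would use a learned embedding model and a proper vector database.

```python
import hashlib
import math


def chunk_text(text: str, size: int = 200, overlap: int = 50) -> list[str]:
    """Split text into overlapping character windows (illustrative chunking)."""
    if overlap >= size:
        raise ValueError("overlap must be smaller than size")
    if not text:
        return []
    step = size - overlap
    return [text[i:i + size] for i in range(0, max(len(text) - overlap, 1), step)]


def toy_embed(text: str, dim: int = 64) -> list[float]:
    """Hash each word into one of `dim` buckets and L2-normalize the counts.

    A stand-in for a real embedding model, used here only for illustration.
    """
    vec = [0.0] * dim
    for word in text.lower().split():
        digest = hashlib.sha256(word.encode("utf-8")).digest()
        vec[int.from_bytes(digest[:4], "big") % dim] += 1.0
    norm = math.sqrt(sum(v * v for v in vec)) or 1.0
    return [v / norm for v in vec]


def build_index(text: str, size: int = 200, overlap: int = 50) -> list[tuple[str, list[float]]]:
    """Store every chunk together with its vector: this is the 'vector store'."""
    return [(chunk, toy_embed(chunk)) for chunk in chunk_text(text, size, overlap)]


def search(index: list[tuple[str, list[float]]], query: str, top_k: int = 1) -> list[str]:
    """Return the top_k chunks ranked by cosine similarity to the query."""
    q = toy_embed(query)
    ranked = sorted(index, key=lambda item: -sum(a * b for a, b in zip(q, item[1])))
    return [chunk for chunk, _ in ranked[:top_k]]
```

Note that each index entry keeps the original chunk text next to its vector. Whoever hosts the index therefore holds readable copies of the document, not just numbers.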

Key facts about data storage according to official OpenAI documentation:

One nuance: ChatGPT Enterprise offers data residency — the ability to store data in an EU region. But this is a separate enterprise product priced at several thousand dollars per year, not standard API access. Most small and medium businesses use the standard API or ChatGPT Plus — with no region selection option.

Bottom line: if you use OpenAI FileSearch via the standard API for documents containing EU personal data, your data is stored in the US with no region choice. This requires a legal basis for cross-border transfer under GDPR Articles 44–49.

Summary: OpenAI FileSearch is a powerful tool, but GDPR-compliant enterprise use requires either an Enterprise plan or additional legal measures that most businesses simply do not implement.

Where Notion AI physically stores your data

Notion AI transfers your workspace content to sub-processors — Anthropic and OpenAI — to generate responses. Notion's servers are in the US (AWS). The Enterprise plan offers zero data retention at sub-processors, but not at Notion itself.

Notion is a popular corporate knowledge base platform. With the addition of Notion AI, businesses gained the ability to ask questions about their documents directly in the interface. But behind that convenience lies a more complex data processing chain.

What happens to your data in Notion AI according to official Notion documentation:

The core GDPR problem: even if Notion has a DPA and SCCs, your data still physically passes through multiple US companies (Notion → Anthropic or OpenAI). Each link in this chain is a potential liability point.

For businesses handling sensitive data this means that before using Notion AI you must sign a DPA with Notion, confirm your plan includes zero retention at sub-processors (i.e. Enterprise), conduct a DPIA and have a legal basis for transfer to the US. In practice, this is weeks of legal work.
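
The four prerequisites named above can be tracked as a simple checklist. The sketch below is only an illustration: the class and field names are hypothetical, and passing the check is a bookkeeping aid, not a legal assessment.

```python
from dataclasses import dataclass, fields


@dataclass
class CloudAiReadiness:
    """Hypothetical checklist mirroring the four prerequisites in the text."""
    dpa_signed: bool
    zero_retention_at_subprocessors: bool
    dpia_completed: bool
    legal_basis_for_us_transfer: bool


def missing_prerequisites(readiness: CloudAiReadiness) -> list[str]:
    """Return the names of all prerequisites that are not yet satisfied."""
    return [f.name for f in fields(readiness) if not getattr(readiness, f.name)]


status = CloudAiReadiness(
    dpa_signed=True,
    zero_retention_at_subprocessors=False,
    dpia_completed=True,
    legal_basis_for_us_transfer=False,
)
print(missing_prerequisites(status))
```

An empty result means every item on this list is covered. Any non-empty result names the open items before the service should see personal data.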

Summary: Notion AI is convenient, but the sub-processor chain and US servers create a GDPR compliance burden that most small and medium businesses do not anticipate when signing up.

What self-hosted means and how it differs architecturally

Self-hosted AI means all system components (database, vector index, documents and optionally the AI model itself) are deployed on your server. Data goes nowhere — it always stays with you.

Imagine the difference between two scenarios. In the first, you hand your documents to a third-party storage facility. Convenient, but they are no longer with you. In the second, you build your own archive room in your office. More responsibility, but full control.

A self-hosted AI document assistant works exactly on the second principle. Here is what the architecture consists of:

From a GDPR perspective this architecture is fundamentally different: no external data processor, no cross-border transfer (with an EU server), no DPA required with an AI provider. Your company is both controller and de-facto processor — the entire chain of responsibility stays with you.
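
One way to make the "no external data processor" property checkable is to audit which configured endpoints point outside your own network. The sketch below is an assumption-laden illustration: the component names (`database`, `vector_store`, `llm`) and the URLs are hypothetical, and a real deployment would also need firewall rules, since a config check alone does not block traffic.

```python
import ipaddress
from urllib.parse import urlparse


def is_internal(url: str, allowed_hosts: frozenset[str] = frozenset()) -> bool:
    """True if the URL points to localhost, a private IP, or an allow-listed host."""
    host = urlparse(url).hostname
    if host is None:
        return False
    if host == "localhost" or host in allowed_hosts:
        return True
    try:
        ip = ipaddress.ip_address(host)
    except ValueError:
        # A DNS name that is not allow-listed is treated as external.
        return False
    return ip.is_loopback or ip.is_private


def external_endpoints(config: dict[str, str],
                       allowed_hosts: frozenset[str] = frozenset()) -> list[str]:
    """Return the names of components whose endpoint leaves the perimeter."""
    return sorted(name for name, url in config.items()
                  if not is_internal(url, allowed_hosts))


# Hypothetical closed-loop configuration: every component is local.
closed_loop = {
    "database": "postgresql://127.0.0.1:5432/docs",
    "vector_store": "http://10.0.0.5:6333",
    "llm": "http://localhost:11434",
}
# Hypothetical hybrid configuration: only the LLM is external.
hybrid = dict(closed_loop, llm="https://api.openai.com/v1")

print(external_endpoints(closed_loop))
print(external_endpoints(hybrid))
```

In this sketch the closed-loop configuration reports no external components, while the hybrid one reports only `llm`, which is exactly the component that would bring an external processor back into the picture.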

Important: self-hosted does not mean "build it yourself". AskYourDocs is deployed turnkey in 5–7 business days, from server setup to document upload and chat widget configuration. After project handover we have no technical access to your database or documents: you receive full control along with administrator credentials. More about the implementation process is on our services page.

Summary: self-hosted AI is neither complex nor expensive. It is a different architecture where your data never leaves your perimeter.

Comparison table: OpenAI vs Notion vs self-hosted

The main difference is not in answer quality — it is in where your data physically lives and who has access to it.

| Parameter | OpenAI FileSearch | Notion AI | AskYourDocs (self-hosted) |
| --- | --- | --- | --- |
| Where documents are stored | OpenAI servers (US) | Notion servers (US, AWS) | Your server (EU or anywhere) |
| Third parties with data access | OpenAI, Microsoft | Notion, Anthropic, OpenAI | None |
| Data transfer outside EU | Yes (US) | Yes (US) | No (with EU server) |
| DPA required | Yes | Yes | No |
| Model training on your data | No (API/Enterprise) | No (officially) | No (technically impossible) |
| Closed loop (no internet) | Not possible | Not possible | Yes (with Ollama) |
| GDPR without additional measures | No | No | Yes (with EU server) |
| LLM provider choice | OpenAI only | Notion/OpenAI/Anthropic only | Any |
| Implementation cost | Pay-per-use API | From $16/month/user | From $500 one-time |
| Vendor lock-in | Full | Full | None |

A few important notes on the table:

Summary: for businesses that take GDPR seriously — the table speaks for itself. Self-hosted solves the problem that cloud services only try to soften with contracts.


Which businesses cannot legally use cloud AI

There are industries where using cloud AI for corporate documents is not a matter of preference — it is a legal prohibition or critical risk. Healthcare, legal and government sectors are at the top of the list.

Let us look at specific cases:

Medical centers and clinics

Medical data is a special category of personal data under GDPR Article 9. Its processing is only permitted with the explicit consent of the patient or in cases strictly defined by law. Transferring medical records to US servers of ChatGPT or Notion AI without the explicit consent of each patient is a direct violation of Article 9. Additionally, many EU countries have national medical secrecy laws that are even stricter than GDPR. More details — in the article AI in healthcare: how to process medical data without breaking the law.

Legal and law firms

Attorney-client privilege is a fundamental principle of the legal system. In most EU jurisdictions, transferring client case materials to a third party without the client's explicit consent is a violation of professional ethics and potentially the law. If your client contracts and case files are being processed by OpenAI or Notion — your clients have every right to question confidentiality. More details — in the article AI for legal firms: protecting client data.

Financial institutions and insurance companies

Client financial data — accounts, credit files, insurance contracts — falls under GDPR and additionally under financial regulators (BaFin in Germany, FMA in Austria). Most financial regulators have clear requirements for data storage within the EU. Using cloud AI services with US servers without regulatory approval is a licensing risk.

Government and municipal bodies

For government entities the question does not even arise: processing state data and citizens' personal data on the servers of US companies is de facto prohibited in most EU countries. The data sovereignty requirement means a closed loop with no external transfers whatsoever.

Companies processing HR data

Employment contracts, salary data, medical examinations and employee reviews are all personal data with heightened protection requirements. If your HR department uses cloud AI to work with these documents, every uploaded file is a potential GDPR violation.

Summary: if your business falls into even one of these categories — cloud AI for document work requires either very serious legal preparation, or replacement with a self-hosted solution.

Conclusion: when self-hosted is the only option

Self-hosted is the only option when you need a technical guarantee that data does not leave your perimeter. Provider contracts and statements are legal protection. But only self-hosted makes a leak through external services physically impossible.

There is a fundamental difference between two levels of protection. The first — legal: the provider signed a DPA, stated they do not train models on your data, holds a SOC 2 certificate. This matters, but it is paper. If tomorrow the provider suffers a breach, leadership changes or a regulator comes for an audit — your data is already on someone else's servers and you do not control what happens to it.

The second level — technical: data physically never leaves your server. No transfer — no risk of leakage through external services. No DPA provides that guarantee, because a DPA regulates human behavior, not the physical movement of data.

When self-hosted is the only option

Self-hosted is the only option if at least one of the following conditions applies:

When cloud AI is an acceptable option

Cloud is justified only if all three conditions are met simultaneously:

If even one of these conditions is not met, cloud AI creates a GDPR risk you may simply not have noticed yet.

The real math of the choice

For most small and medium businesses in Europe the comparison looks like this:

Not sure which option suits your business? AskYourDocs helps determine the optimal isolation level on the first call — without technical jargon. Send 2–3 documents and we will show how it works on your real data.

Summary: the question is not "cloud or self-hosted". The question is whether you are prepared for the legal and reputational consequences if a cloud provider faces a breach, a regulatory audit, or simply changes its privacy policy next quarter.

Frequently Asked Questions

Can OpenAI be made GDPR-compliant for enterprise use?

Yes, but it requires: a signed DPA with OpenAI, an Enterprise plan with EU data residency, a completed DPIA and a legal basis for processing. For most small and medium businesses this is either out of budget or too complex legally.

Notion says it does not train models on my data — is that enough?

No. The absence of model training is only one aspect of GDPR compliance. The mere fact of transferring personal data to US servers without a legal basis for cross-border transfer is already a violation, regardless of whether the model is trained on your data.

How much does self-hosted cost compared to cloud services?

AskYourDocs implementation — from $1500 one-time. Infrastructure (EU VPS) — $20–50/month. For comparison: Notion Plus for a team of 10 — around $160/month, and that does not resolve any GDPR issues. Detailed cost breakdown — in the article How much does an AI document assistant cost.
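
The comparison in this answer can be turned into a break-even calculation. The sketch below uses only the figures quoted here ($1500 one-time, $20–50/month for the VPS, about $160/month for a 10-person Notion team). The function name is hypothetical, and the calculation deliberately ignores maintenance time and legal costs on either side.

```python
import math
from typing import Optional


def break_even_months(one_time: float, self_hosted_monthly: float,
                      cloud_monthly: float) -> Optional[int]:
    """Months until the one-time cost is recovered by the lower monthly bill.

    Returns None when self-hosting is not cheaper per month.
    """
    saving = cloud_monthly - self_hosted_monthly
    if saving <= 0:
        return None
    return math.ceil(one_time / saving)


best_case = break_even_months(1500, 20, 160)   # cheapest VPS in the quoted range
worst_case = break_even_months(1500, 50, 160)  # most expensive VPS in the quoted range
print(best_case, worst_case)  # prints: 11 14
```

With these numbers the one-time implementation cost is recovered after roughly 11 to 14 months.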

What if I use hybrid mode — is it safe?

Hybrid mode (documents stored locally + external LLM) is a good balance: only anonymized text fragments without file names and metadata are sent to the external LLM. This is the option we recommend to most clients as the optimal combination of answer quality and data protection — without the extra cost of full isolation. For healthcare and legal — only a fully closed loop.
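
What "only anonymized text fragments without file names and metadata" could look like in code is sketched below. This is an illustration under stated assumptions, not the AskYourDocs implementation: the `Chunk` fields and the two regular expressions are hypothetical, and pattern-based masking catches only obvious identifiers. Names, addresses and case details would need a dedicated anonymization step.

```python
import re
from dataclasses import dataclass


@dataclass
class Chunk:
    """A locally stored fragment with metadata that must never leave the server."""
    text: str
    file_name: str
    author: str


# Deliberately simple patterns for illustration only.
EMAIL = re.compile(r"[\w.+-]+@[\w-]+\.[\w.-]+")
PHONE = re.compile(r"\+?\d[\d\s().-]{7,}\d")


def to_external_payload(chunk: Chunk) -> str:
    """Build the text sent to the external LLM.

    Only the masked text is returned; file name and author are dropped.
    """
    text = EMAIL.sub("[EMAIL]", chunk.text)
    text = PHONE.sub("[PHONE]", text)
    return text


chunk = Chunk(
    text="Contact anna@example.com or +49 30 1234567 about the contract.",
    file_name="client_mueller.pdf",
    author="HR",
)
print(to_external_payload(chunk))
# prints: Contact [EMAIL] or [PHONE] about the contract.
```

The design choice here is that the payload is built from an explicit allow-list (the masked text only), so adding a new metadata field to `Chunk` later does not silently leak it.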

How do I check where my current AI service stores data?

Check the Terms of Service, the privacy policy, the list of sub-processors and whether a DPA is available. If the website does not give a clear answer about server geolocation, that is already a warning sign. Use our 10-question checklist to evaluate any AI service.

Key Takeaways

Key insight: cloud AI services solve the convenience problem, but not the data ownership problem. Self-hosted solves both.

Want to check your option?

Send 2–3 of your real documents — and in 30 minutes we will show a live demo: how AI answers questions from your knowledge base, and where your data physically resides in the process. Free. No registration. No obligation.

Want to see the solution in action on the homepage? askyourdocs.org/en/#try-demo

Sources: OpenAI Europe Privacy Policy · OpenAI File Retention Policies · Notion AI Security Practices · GDPR at Notion · GDPR Article 83