You have implemented or are planning to implement an AI assistant to work with corporate documents — contracts, medical records, client files. And somewhere in the background there is a nagging feeling: is this actually legal? Are we violating something? Short answer: AI on documents and GDPR are compatible. But only if you understand three things: where data is stored, who is the data processor, and what level of isolation to choose.
⚡ In Brief
- 🏢 Who should read this: business owners, lawyers, heads of medical centres in the EU and Ukraine
- ⚠️ Main risk: most cloud AI services store your documents on servers in the US. For businesses in the EU this is a cross-border data transfer, which breaches GDPR unless specific safeguards are in place
- ✅ Solution: self-hosted AI on your own server in Europe — data never leaves your infrastructure
- 💰 Cost of violation: up to €20M or 4% of worldwide annual turnover, whichever is higher
- ⏱ Time to deploy a GDPR-compliant solution: 1–3 business days
- 👇 Below — a detailed breakdown: what GDPR is, what the risks are, what to check, and a 10-question checklist
📚 Contents
- What is GDPR and why it applies to AI
- What counts as personal data in the context of documents
- Where your data is physically stored when using AI services
- What "data processor" means and who it is in your case
- Self-hosted vs cloud AI: the difference from a GDPR perspective
- What happens if you violate GDPR when using AI
- Checklist: 10 questions before deploying AI in your business
- Frequently asked questions
- Conclusions
- Want a GDPR-compliant solution for your business?
What is GDPR and why it applies to AI
GDPR is the EU General Data Protection Regulation, which has applied since 25 May 2018. It governs the processing of personal data of people in the EU, regardless of where your company is located. If you process data of clients or employees in the EU through AI, GDPR applies to you directly.
Before 2018, data protection in Europe was handled in a patchwork manner: each country had its own law, requirements varied, and large companies could manoeuvre between jurisdictions. GDPR changed this radically — unified rules for the entire EU, a unified system of fines, and a unified approach to what constitutes a violation.
A key point for businesses: GDPR does not only apply to companies based in the EU. If you offer goods or services to people in the EU, or monitor their behaviour, you fall under the regulation even if your office is in Kyiv or New York. What matters is where the person is located, not their citizenship. This is particularly relevant for Ukrainian businesses working with clients in Germany, Austria or Poland.
Now, AI. When you upload corporate documents to ChatGPT, Notion AI or any other cloud service, you are effectively transferring data to a third party for processing. If those documents contain personal data — and they almost always do — that processing falls under GDPR. The question is no longer "do we use AI?" — it is where exactly that AI processes your data.
In December 2024, the European Data Protection Board (EDPB) published a dedicated opinion on AI models and GDPR (Opinion 28/2024). It states that a model trained on personal data cannot automatically be treated as anonymous and therefore outside the scope of GDPR: each use case requires a separate assessment.
Summary: GDPR is not about large corporations or abstract "data". It is about a specific document containing a client's name that you uploaded to an AI service this morning.
What counts as personal data in the context of documents
Personal data is any information that allows a natural person to be identified, directly or indirectly. In corporate documents, such data appears far more often than it might seem.
Most business owners think of personal data as passports and medical records. In reality the scope is much broader. Here is what typically appears in corporate documents and falls under GDPR:
- ✔️ Contracts — client name, address, tax ID, signature, contact details
- ✔️ Invoices and delivery notes — name of a natural person or sole trader, bank details
- ✔️ HR documents — CVs, employment contracts, salary data, medical examination records
- ✔️ Medical protocols — diagnoses, prescriptions, medical history (special category data)
- ✔️ Client correspondence — email, phone number, address, content of communications
- ✔️ Legal case files — details of the parties, case circumstances, court decisions
A separate topic: special categories of data under Article 9 GDPR. These include data concerning health, racial or ethnic origin, political opinions, religious or philosophical beliefs, trade union membership, sex life or sexual orientation, and genetic and biometric data. Heightened requirements apply to these: processing is prohibited unless one of the exceptions in Article 9(2) applies, such as explicit consent or another case strictly defined by law.
For medical centres this means that virtually every document carries heightened protection. For law firms, most client files contain data that falls under GDPR.
An important nuance: data does not cease to be personal simply because it is in a PDF or Word file. Format is irrelevant — content is what matters. If an AI service reads your PDF containing a client's name, it is processing personal data, and GDPR applies in full.
Summary: if your document workflow contains names, contact details or medical data of natural persons — and it does in almost every business — your documents fall under GDPR when transferred to AI services.
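The point that content, not format, decides whether a file holds personal data can be sketched as a quick pre-upload scan of the text extracted from a document. This is a minimal illustration, not a compliance tool: the two patterns and the name `find_personal_data` are assumptions made for the example, and regular expressions will not catch names, addresses or diagnoses.

```python
import re

# Illustrative patterns only; real documents need broader, locale-aware rules.
PATTERNS = {
    "email": re.compile(r"[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,}"),
    "phone": re.compile(r"\+?\d[\d\s().-]{7,}\d"),
}


def find_personal_data(text: str) -> dict[str, list[str]]:
    """Return pattern matches grouped by category; empty categories are omitted."""
    hits = {}
    for label, pattern in PATTERNS.items():
        matches = pattern.findall(text)
        if matches:
            hits[label] = matches
    return hits


sample = "Contact Anna Schmidt at anna.schmidt@example.com or +49 30 1234567."
print(find_personal_data(sample))
```

In this sample the scan flags the email address and the phone number but misses the name "Anna Schmidt". That is why an empty result should never be read as "no personal data": a clean scan only means that these two patterns did not match.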
Where your data is physically stored when using AI services
Most popular AI services store uploaded documents and queries on servers in the United States. For businesses in the EU this automatically creates a problem: transferring data outside the EU is governed by Articles 44–49 GDPR and requires specific safeguards.
When you upload a document to a cloud AI service, the following happens: the file is copied to the provider's servers where it is stored for processing. Depending on the terms of use, it may remain there for anywhere from a few hours to several months. Here is where the servers of the most popular services are located:
- ✔️ OpenAI (ChatGPT, FileSearch) — primarily the US, Microsoft Azure data centres
- ✔️ Notion AI — the US, AWS infrastructure
- ✔️ Google (Gemini, NotebookLM) — the US, proprietary data centres
- ✔️ Microsoft Copilot — depends on the plan; business plans may include EU servers, but not by default
Why is this a problem? GDPR strictly regulates the transfer of personal data outside the EU. Following the landmark Court of Justice of the EU ruling in the Schrems II case in July 2020, Standard Contractual Clauses (SCCs) alone are no longer sufficient: a risk assessment for the specific destination country is required. The United States is traditionally considered a problematic jurisdiction due to government data access laws (CLOUD Act, FISA). Since July 2023 the EU-US Data Privacy Framework has provided an adequacy basis for transfers, but only to US providers certified under it, so certification has to be checked for each service.
A striking example: on 22 May 2023, the Irish DPC fined Meta €1.2 billion, a record sum in GDPR history, precisely for the systematic transfer of European users' data to servers in the US without adequate safeguards (see the official EDPB decision).
The practical conclusion for businesses: if you are in the EU and upload documents containing personal data to ChatGPT or Notion AI, you are very likely transferring data to the US, and you must be able to show a valid legal basis for that transfer. Even if the provider has signed SCCs, this may be insufficient without a documented transfer risk assessment, and high-risk processing such as health data additionally calls for a DPIA (Data Protection Impact Assessment).
How AskYourDocs solves this
AskYourDocs is deployed entirely on your own server — in any EU region of your choice: Germany, Austria, the Netherlands, Poland. Documents, the knowledge base and all queries remain exclusively on your infrastructure. No file and no query is ever transferred outside the European zone.
After the project is handed over, I as the developer have no technical access to your database or documents — you receive full control along with administrator credentials. This is a fundamental difference from any SaaS solution where the provider always retains technical access to your data.
In fully closed-circuit mode (using a local Ollama model), even queries to the LLM never leave your server — the system operates in complete isolation from the internet.
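To make the closed-circuit idea concrete, here is a minimal sketch of a query that goes only to an Ollama server on the same machine. It is not AskYourDocs' actual code. It assumes Ollama's default local port (11434) and its `/api/generate` endpoint, and the model name `llama3` stands in for whichever model has been pulled locally. The helper names `build_request` and `ask_local_model` are hypothetical.

```python
import json
import urllib.request
from urllib.parse import urlparse

# Assumption: Ollama runs on this machine on its default port.
OLLAMA_URL = "http://localhost:11434/api/generate"
LOOPBACK_HOSTS = {"localhost", "127.0.0.1", "::1"}


def build_request(prompt: str, model: str = "llama3",
                  url: str = OLLAMA_URL) -> urllib.request.Request:
    """Build a non-streaming generate request, refusing any non-local endpoint."""
    host = urlparse(url).hostname
    if host not in LOOPBACK_HOSTS:
        raise ValueError(f"refusing to send document text to non-local host: {host}")
    body = json.dumps({"model": model, "prompt": prompt, "stream": False}).encode("utf-8")
    return urllib.request.Request(
        url, data=body, headers={"Content-Type": "application/json"}, method="POST"
    )


def ask_local_model(prompt: str, model: str = "llama3") -> str:
    """Send the prompt to the local model and return the generated text."""
    with urllib.request.urlopen(build_request(prompt, model), timeout=120) as resp:
        return json.loads(resp.read())["response"]
```

The loopback check only guards against a misconfigured URL inside the application. Real isolation still depends on the server itself, for example firewall rules that block outbound traffic.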
Summary: "where is data stored?" is the first and most important question before choosing any AI service for document work. For businesses in the EU, the only answer that resolves this question completely is a server in Europe under your control.
What "data processor" means and who it is in your case
GDPR separates roles: the data controller (your company) decides why and what data to process. The data processor (the AI service) does so on your behalf. But responsibility remains with you — as the controller.
This is one of the most important and least understood aspects of GDPR for businesses. Let us work through it with an example.
Imagine a law firm that uploads client contracts to an AI assistant for search. In this setup:
- ✔️ The law firm = data controller — it decided to upload these documents and is responsible to clients for their data
- ✔️ The AI service = data processor — it processes data on behalf of the law firm
Under Article 28 GDPR, a Data Processing Agreement (DPA) must be signed between the controller and the processor. Without it the entire arrangement is unlawful, even if everything is technically configured correctly.
What is critical to understand: if the processor (the AI service) violates GDPR, the controller (your company) also bears responsibility. You cannot "shift" responsibility to the provider by signing a DPA. A DPA sets out the conditions of processing but does not relieve you of the obligation to select reliable processors and oversee them.
In practice: OpenAI, Notion and most SaaS providers offer standard DPA documents that can be signed. But the mere existence of a DPA does not resolve the questions of server geolocation and cross-border data transfers — those are separate requirements.
With a self-hosted solution where AI is deployed on your own server, the setup is fundamentally different: there is no external AI processor. Your company remains the controller and processes the data itself, so the question of a DPA with an AI provider does not arise. One caveat: if the server is rented from a hosting company, that company is still a processor and needs its own DPA.
Summary: before deploying any AI service, check: is there a signed DPA, what does it say about data geolocation, and does this meet your obligations to your clients.
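The checks in this section can be written down as a small pre-deployment record for each provider. This is a simplified sketch and not legal advice. The class `ProviderCheck`, its fields and the function `open_issues` are hypothetical names, and the logic covers only the two points discussed here: a signed DPA and the location of the data.

```python
from dataclasses import dataclass


@dataclass
class ProviderCheck:
    name: str
    dpa_signed: bool
    storage_region: str        # simplified to "EU" or a country code such as "US"
    transfer_safeguards: bool  # documented basis for a transfer outside the EU


def open_issues(check: ProviderCheck) -> list[str]:
    """List the unresolved points from this section for one provider."""
    issues = []
    if not check.dpa_signed:
        issues.append("no signed DPA (Article 28)")
    if check.storage_region != "EU" and not check.transfer_safeguards:
        issues.append(
            "data leaves the EU without documented transfer safeguards (Articles 44-49)"
        )
    return issues


cloud = ProviderCheck("cloud-ai", dpa_signed=True, storage_region="US",
                      transfer_safeguards=False)
print(open_issues(cloud))
```

A signed DPA on its own leaves the transfer issue open, which matches the point above: the DPA and data geolocation are separate requirements.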
Self-hosted vs cloud AI: the difference from a GDPR perspective
Self-hosted AI is a solution where all components (database, documents, AI model) are deployed on your server. Cloud AI means transferring documents to a provider. From a GDPR perspective the difference is fundamental: in the first case data never leaves your perimeter; in the second, a full chain of requirements applies.
Let us compare the two approaches across key parameters:
| Parameter | Cloud AI (ChatGPT, Notion AI) | Self-hosted AI (AskYourDocs) |
|---|---|---|
| Where documents are stored | Provider's servers (US) | Your server (anywhere) |
| Data transfer outside the EU | Yes, automatically | No, if server is in the EU |
| DPA with AI provider required | Yes, mandatory | No (no external AI processor) |
| Provider access to data | Technically possible | None |
| Closed circuit (offline) | Not possible | Yes (with Ollama) |
| GDPR compliance | Requires additional measures | No transfer safeguards needed with an EU server; other GDPR duties still apply |
There is another important dimension: isolation level. A self-hosted solution can be configured in different ways:
- ✔️ Hybrid mode: documents and knowledge base on your server, but an external LLM (OpenAI, Mistral) is used to generate responses. Only text fragments, without file names or metadata, are sent to the LLM provider. Those fragments can still contain personal data, so the DPA and data-transfer questions still apply to that provider. A good balance between response quality and security.
- ✔️ Fully closed circuit: all components on your server, including the AI model (e.g. Ollama with Llama or Mistral). No query ever reaches the internet. The option to choose for healthcare, law firms and public sector organisations.
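The hybrid mode's rule of sending text fragments only can be sketched as follows. The `Chunk` structure and `build_llm_prompt` are hypothetical names, and this is an illustration of the principle rather than the product's implementation: metadata stays in the local object and only the fragment text is placed into the string that would go to an external LLM.

```python
from dataclasses import dataclass


@dataclass
class Chunk:
    text: str       # fragment retrieved from the local knowledge base
    file_name: str  # stays on your server
    page: int       # stays on your server
    author: str     # stays on your server


def build_llm_prompt(question: str, chunks: list[Chunk]) -> str:
    """Compose the outgoing prompt from fragment text only; metadata is never included."""
    context = "\n\n".join(f"[{i}] {c.text}" for i, c in enumerate(chunks, start=1))
    return (
        "Answer using only the context below.\n\n"
        f"Context:\n{context}\n\nQuestion: {question}"
    )


chunks = [Chunk("The notice period is three months.", "contract_mueller.pdf", 4, "J. Weber")]
print(build_llm_prompt("What is the notice period?", chunks))
```

The numbered labels let the local system map an answer back to its source file without the file name ever leaving the server. The fragment text itself can still contain personal data, which is the reason the fully closed circuit exists as a separate option.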
Which isolation level suits your business depends on your industry, regulatory requirements and the type of documents involved. AskYourDocs helps determine the optimal configuration on the first call: together we analyse what data is being processed, what legal obligations apply, and what level of protection closes all GDPR questions in your specific case — without unnecessary costs for over-engineering.
For businesses in Germany and Austria it is worth factoring in an additional layer: federal data protection legislation (BDSG in Germany, DSG in Austria) may set stricter requirements than baseline GDPR. More on this in the article AI and GDPR in Germany and Austria: requirements for corporate systems.
Summary: self-hosted AI is not simply "more secure" — it eliminates an entire class of GDPR risks associated with transferring data to third parties.