The director of a law firm in Vienna asks: "We want self-hosted AI - what should we buy and where should we host it?" Most articles on this topic are written for developers and get lost in technical detail. This guide is for decision-makers who don't want to dive into GPU specs. It covers what you need to know: which providers to choose, how much to pay, and what to ask a contractor before signing a contract.
⚡ TL;DR
- 🏠 Self-hosted servers are necessary when data cannot leave your infrastructure — healthcare, legal, HR, finance
- 🚫 AWS and Azure in Germany don't solve GDPR issues — they are US companies subject to the CLOUD Act regardless of server location
- ✅ Secure providers for the EU: Hetzner (Nuremberg/Finland), OVH (Strasbourg), Contabo (Munich)
- 💻 CPU vs. GPU: CPU is for internal tools where 30-90 seconds response time is acceptable. GPU is for public chat or when <10 seconds is required
- 💰 Real costs: from €4-8/month (CPU, small business) to €184/month (GPU for production AI)
- 🤝 Post-launch: a contractor sets it up; you or your administrator manage it. Or the contractor handles ongoing support
Why You Need a Self-Hosted Server in the First Place
A self-hosted server isn't an end in itself. It's a consequence of a specific requirement: your data cannot be processed on third-party infrastructure. If this requirement doesn't exist, cloud AI might be a cheaper and simpler option.
Most of our clients at AskYourDocs come with a specific question — not "I want a self-hosted server" but "Can we legally use cloud AI?" The answer determines whether a server is needed at all.
Three Reasons to Choose a Self-Hosted Server
Reason 1: Legal Requirements. Medical data, attorney-client privilege, client financial records — all are regulated in such a way that transferring them to third-party AI providers is either impossible or requires extensive, costly legal work. A self-hosted server solves this technically: data physically never leaves your infrastructure.
Reason 2: Predictable Costs. Cloud AI is priced "pay-per-use" — you don't know your bill for the next month in advance. A self-hosted server offers a fixed bill regardless of usage volume. At 500+ requests per day, a self-hosted server becomes cheaper than a cloud API.
Reason 3: Independence. OpenAI can change prices, terms, or even revoke access. A model on your own server won't change without your knowledge, doesn't depend on decisions made by a US company, and keeps working even if an AI provider has an outage.
When a Self-Hosted Server is NOT Needed: If you're just starting, your documents don't contain personal data, and your usage is less than 200 requests per day — start with a cloud API and migrate to a self-hosted server once you confirm its value. We always recommend starting with the option that allows you to test your hypothesis quickly.
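The rule of thumb from this section can be written down as a small decision function. This is a minimal sketch, not legal advice: the function and parameter names are illustrative, and the thresholds (200 and 500 requests per day) are simply the figures quoted above.

```python
def recommend_hosting(has_regulated_data: bool, requests_per_day: int) -> str:
    """Apply the article's rules of thumb for choosing between cloud and self-hosted AI."""
    # Regulated or personal data must not leave your infrastructure.
    if has_regulated_data:
        return "self-hosted"
    # At 500+ requests per day a fixed server bill tends to beat pay-per-use.
    if requests_per_day >= 500:
        return "self-hosted"
    # Below 200 requests per day, test the idea with a cloud API first.
    if requests_per_day < 200:
        return "cloud API"
    # In between, neither rule applies clearly, so compare the actual bills.
    return "compare costs"


print(recommend_hosting(has_regulated_data=True, requests_per_day=30))    # self-hosted
print(recommend_hosting(has_regulated_data=False, requests_per_day=100))  # cloud API
```

The range between 200 and 500 requests per day is not covered by either rule in the text, so the sketch returns "compare costs" there instead of guessing.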
Why AWS and Azure Germany Don't Solve GDPR Issues
The most common mistake we see: a company chooses an "EU region" in AWS or Azure and believes the GDPR problem is solved. This is not the case. The physical location of a server and the legal jurisdiction over data are different things.
This is the section we consider most important in this article — and one that most AI server guides ignore. So, let's explain it in detail and simply.
What is the CLOUD Act and Why It Affects Your Business
In 2018, the US enacted the Clarifying Lawful Overseas Use of Data Act (CLOUD Act). This law allows US law enforcement agencies, with a warrant or court order, to compel US companies to hand over data in their possession or control, including their clients' data, regardless of where the servers are physically located.
In simple terms: imagine you rent a safe deposit box at a bank in Frankfurt. But this bank is American. With a US court order, American federal agents can approach the bank in the US and demand that it open your safe deposit box in Frankfurt. The bank can be obligated to comply, and can be prohibited from telling you about it. This is how the CLOUD Act works for AWS, Azure, and Google Cloud.
AWS EU-Central-1 (Frankfurt), Azure Germany West Central, Google Cloud Europe-West — all these options are physically located in the EU but are operated by US companies. The CLOUD Act fully applies to them.
Why This Is Critical for Businesses in Austria and Germany
The Austrian Data Protection Authority (DSB) in the Google Analytics case (2022) established the strictest standard in the EU: it's insufficient to claim that "the probability of US intelligence accessing your data is low." A technical impossibility of such access is required. No US cloud provider can offer such a guarantee — by definition.
For medical centers and law firms in Austria and Germany, this means: AWS and Azure Germany are not acceptable solutions, regardless of price or product quality. A provider under EU jurisdiction is necessary.
| Provider | Physical Location | Jurisdiction | CLOUD Act | Suitable for EU/AT/DE? |
|---|---|---|---|---|
| AWS EU-Central-1 | Frankfurt, DE | 🇺🇸 USA | ✅ Applies | ❌ No |
| Azure Germany West Central | Frankfurt, DE | 🇺🇸 USA | ✅ Applies | ❌ No |
| Google Cloud Europe-West | Belgium/Netherlands | 🇺🇸 USA | ✅ Applies | ❌ No |
| Hetzner | Nuremberg/Falkenstein DE, Helsinki FI | 🇩🇪 Germany | ❌ Does not apply | ✅ Yes |
| OVHcloud | Strasbourg FR, Warsaw PL | 🇫🇷 France | ❌ Does not apply | ✅ Yes |
| Contabo | Munich DE, Nuremberg DE | 🇩🇪 Germany | ❌ Does not apply | ✅ Yes |
Choosing a Region and Provider: Hetzner, OVH, Contabo
For most businesses in AT/DE, we recommend Hetzner as the first choice — offering the best price/quality/GDPR compliance ratio among EU providers. OVH and Contabo are worthy alternatives depending on the task.
Important Update: Hetzner increased prices by 30-37% as of April 1, 2026, due to rising costs for server memory (HBM for GPUs). Even after the increase, Hetzner remains 2.5-3.3 times cheaper than AWS/GCP for equivalent configurations.
Hetzner Online — Our Default Choice
Hetzner Online GmbH is a private company headquartered in Gunzenhausen, Bavaria. Data centers are located in Nuremberg, Falkenstein (Saxony), and Helsinki. They are ISO 27001 certified. Their flat-rate pricing includes traffic (up to 20 TB in EU regions) with no hidden fees. Technical support is primarily via ticket system, without 24/7 phone support.
Ideal for: Most SMEs wanting maximum performance at minimum cost with guaranteed EU jurisdiction. Our choice for 90% of clients.
OVHcloud — The French Alternative
OVH SAS is a French company and one of the largest hosting providers in the EU. Data centers are in Strasbourg, Roubaix, and Warsaw. They offer a broader range of managed services than Hetzner. Prices are slightly higher, but they have managed tiers for those who don't want to manage the server themselves.
Ideal for: Companies needing more support or additional managed services (databases, load balancers). A good alternative if Hetzner isn't suitable for technical reasons.
Contabo — The Cheapest CPU Option
Contabo GmbH is a Munich-based company operating since 2003, offering the most CPU resources for the lowest price on the market. Their €4.50/month starting price for 4 vCPUs / 8 GB RAM is a record for the EU market. They are ISO 27001 certified. Support is via tickets.
Important Note on Contabo GPUs: Their GPU servers are geared towards the enterprise segment (NVIDIA H100, L40S) and start at $790/month — unsuitable for SME AI tasks. For GPU servers, choose Hetzner or Scaleway.
Ideal for: CPU-only deployments for small businesses where cost is critical and GPUs are not needed.
| Provider | CPU VPS from | GPU Server from | Location | Support | Best For |
|---|---|---|---|---|---|
| Hetzner ⭐ | €3.49/month | €184/month (RTX 4000 Ada 20 GB) | DE, FI | Ticket | Most projects, CPU and GPU |
| OVHcloud | €3.99/month | from €100/month | FR, PL | Ticket + Phone | Managed services, enhanced support |
| Contabo | €4.50/month | from $790/month (H100) | DE | Ticket | CPU-only, maximum affordability |
| Scaleway | €3.99/month | from €150/month | FR | Ticket | GPU alternative in France |
CPU vs. GPU: What Really Matters and When a GPU is Necessary
A GPU isn't about being "better" or "more powerful." GPU means "faster." CPU means "slower but cheaper." The question is whether a 30-90 second response is acceptable for your use case, versus 5-10 seconds. For most internal tools, it is.
This is a question we explain at every initial client meeting. Most executives assume a GPU is mandatory. In reality, it depends on who is waiting for the answer and how long they're willing to wait.
A Simple Analogy
A CPU is like an experienced senior lawyer: it works methodically and gives an accurate answer, but takes more time. A GPU is like a whole team of assistants working at once: massive parallelism lets it finish far sooner. The same model produces an answer of the same quality on either, so the choice is purely a matter of waiting time.
When a CPU is Sufficient
- Internal employee tool: A manager asks a question and works on something else while waiting for a response. 30-60 seconds is acceptable. Compare this to 20 minutes of manual document searching.
- Nightly or background document processing: Reports, analysis, summaries without real-time requirements — a CPU is ideal.
- Models up to 14B parameters: Llama 3.2 8B or Qwen3 14B on a CPU deliver 5-15 tokens/sec — a response in 30-90 seconds.
- Budget is limited and you want to test: Start with a CPU, migrate to a GPU once the value is proven.
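The 30-90 second figure follows directly from generation speed. As a rough sketch: waiting time is about the answer length in tokens divided by tokens per second. The answer length of 450 tokens is an assumption chosen for illustration, not a figure from this article, and the estimate ignores the time the model needs to read the question and the retrieved documents.

```python
def response_time_seconds(answer_tokens: int, tokens_per_second: float) -> float:
    """Estimate how long the model needs to generate an answer of a given length."""
    if tokens_per_second <= 0:
        raise ValueError("tokens_per_second must be positive")
    return answer_tokens / tokens_per_second


ANSWER_TOKENS = 450  # assumed length of a typical answer (a few paragraphs)

for speed in (5, 15):  # CPU speeds quoted above, in tokens per second
    seconds = response_time_seconds(ANSWER_TOKENS, speed)
    print(f"{speed} tokens/sec -> {seconds:.0f} s")
# 5 tokens/sec -> 90 s
# 15 tokens/sec -> 30 s
```

Shorter answers come back faster, which is why asking the assistant for concise replies is a cheap way to make a CPU server feel quicker.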
When a GPU is Mandatory
- Public website chat for clients or patients: A person expects a real-time response. A 30-second wait means an abandoned chat. A sub-10-second response is needed, so a GPU is required.
- Telegram or WhatsApp bot with an external audience: Similarly, the wait time must be comfortable.
- Models with 22B+ parameters: Mistral Small 3 (24B) or Gemma 4 26B without a GPU take 60-120 seconds. With a 16 GB GPU, it's 5-10 seconds.
- More than 10 simultaneous users: On a CPU, requests are effectively handled one after another, so waiting times add up, while a GPU can serve several requests in parallel.
| Scenario | Is CPU Enough? | Recommendation |
|---|---|---|
| Internal FAQ for 5-10 employees | ✅ Yes | CPU server, Llama 3.2 8B or Qwen3 14B |
| Public clinic website chat | ❌ No | GPU 16 GB, Gemma 4 26B or Mistral Small 3 |
| Telegram bot for an internal team | ✅ Yes (if 60 sec response is okay) | CPU or GPU depending on waiting tolerance |
| Law firm, contract search | ✅ Yes for internal use | CPU to start, GPU if you want Llama 3.3 70B |
| Medical center, patient responses | ❌ No | GPU is mandatory: patients expect real-time responses |
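The criteria in this section reduce to three questions. Here is a minimal sketch of that check; the function and parameter names are illustrative, and the thresholds (22B parameters, 10 simultaneous users) are the ones listed above.

```python
def needs_gpu(public_facing: bool, model_params_billion: float, concurrent_users: int) -> bool:
    """Return True if any of the article's GPU criteria applies."""
    # External users (website chat, public bots) expect answers in under 10 seconds.
    if public_facing:
        return True
    # Models with 22B+ parameters are too slow on a CPU.
    if model_params_billion >= 22:
        return True
    # More than 10 simultaneous users overload a CPU-only server.
    return concurrent_users > 10


# Internal FAQ for a small team on a 14B model: a CPU is enough.
print(needs_gpu(public_facing=False, model_params_billion=14, concurrent_users=5))   # False
# Public clinic chat: a GPU is required regardless of model size.
print(needs_gpu(public_facing=True, model_params_billion=8, concurrent_users=1))     # True
```

Borderline cases, such as an internal bot whose users dislike a 60-second wait, still need a human decision about waiting tolerance.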
What Server Configuration Suits Your Scale
Three parameters determine the required configuration: the number of documents in the system, the number of daily requests, and the model needed for response quality. Everything else is a consequence of these three.
We don't recommend "minimum requirements" without context — it's pointless. Instead, here are four typical scenarios we see with clients.
| Scenario | Documents | Requests/Day | Configuration | Model | Provider |
|---|---|---|---|---|---|
| Startup / Test (small office, internal FAQ) | up to 200 | up to 50 | CPU-only, 4 vCPU / 16 GB RAM / 100 GB SSD | Llama 3.2 8B or Qwen3 14B | Contabo or Hetzner CX |
| Production without GPU (internal company tool) | 200–1000 | 50–200 | CPU-only, 8 vCPU / 32 GB RAM / 200 GB SSD | Qwen3 14B or Llama 3.3 70B (slow) | Hetzner CPX or Contabo VPS XL |
| Production with GPU (public chat, clients/patients) | 500–5000 | 200–500 | GPU 16–20 GB, 32–64 GB RAM / 500 GB SSD | Gemma 4 26B or Mistral Small 3 | Hetzner GEX44 (€184/month) |
| High Quality (law firm, medical center, maximum accuracy) | 1000+ | 200–500 | GPU 48 GB or 2xGPU, 128 GB RAM / 1 TB SSD | Llama 3.3 70B Q4 | Hetzner Dedicated or Own Server |
Our advice for getting started: Begin with a CPU-only configuration under real-world load for 2-4 weeks. If the speed isn't satisfactory, migrating to a GPU takes 1 day, and your documents will already be in the system. Overpaying for a GPU upfront without a confirmed need is unjustified.
RAM and Disk, Separately
RAM: The model is fully loaded into memory. At 4-bit quantization, Llama 3.2 8B requires ~6 GB, Gemma 4 26B requires ~15 GB, and Llama 3.3 70B requires ~43 GB. Always allocate RAM with a ~30% buffer for the operating system and database. If RAM is insufficient, part of the model ends up on disk and responses become very slow.
Disk: The models themselves occupy 5 to 43 GB. Your documents typically take up 1-10 GB, even for large archives (text is very compact). The vector database (pgvector) adds a few more GB. A 200 GB SSD is sufficient for most SMEs.
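The RAM rule can be checked with simple arithmetic. The sketch below reads the "~30% buffer" as model size multiplied by 1.3, which is an interpretation of the rule above; the model sizes are the approximate figures quoted in this section.

```python
# Approximate memory footprint of each model in GB, as quoted above.
MODEL_SIZE_GB = {
    "Llama 3.2 8B": 6,
    "Gemma 4 26B": 15,
    "Llama 3.3 70B": 43,
}


def recommended_ram_gb(model_size_gb: float, buffer: float = 0.30) -> float:
    """Model size plus a buffer for the operating system and database."""
    return model_size_gb * (1 + buffer)


for name, size in MODEL_SIZE_GB.items():
    print(f"{name}: at least {recommended_ram_gb(size):.1f} GB RAM")
```

This gives about 7.8 GB, 19.5 GB, and 55.9 GB. In practice you round up to the next available server size, for example 64 GB for the 70B model.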
Actual Monthly Server Costs
"How much does a server cost?" is an unanswerable question without context. The right question is: "How much does a server for my tasks cost compared to cloud AI?" Here's an honest comparison.
Prices are current as of June 2026. Hetzner increased prices by 30-37% on April 1, 2026, but remains the cheapest GDPR-compliant EU provider for AI tasks.
Current Hetzner Prices (after April 2026 increase)
| Configuration | Specs | Price/Month | Suitable For |
|---|---|---|---|
| CX23 (CPU) | 2 vCPU / 4 GB RAM / 40 GB SSD | €3.49 | Testing only, minimal load |
| CX33 (CPU) | 4 vCPU / 8 GB RAM / 80 GB SSD | €7.99 | Small model, up to 20 requests/day |
| CX43 (CPU) | 8 vCPU / 16 GB RAM / 160 GB SSD | ~€18 | Qwen3 14B, up to 50 requests/day |
| CPX51 (CPU) | 16 vCPU / 32 GB RAM / 360 GB SSD | ~€45 | Qwen3 14B fast or Llama 70B slow |
| GEX44 (GPU) ⭐ | Intel Core i5 / 64 GB RAM / NVIDIA RTX 4000 Ada 20 GB | €184 | Gemma 4 26B or Mistral Small 3, up to 500 requests/day |
Comparison with Cloud Alternatives
| Option | Monthly Cost | GDPR | Notes |
|---|---|---|---|
| OpenAI GPT-4o mini API (500 requests/day) | ~$12–24 | ⚠️ Risk | Cheap but data is in the US |
| OpenAI GPT-4o API (500 requests/day) | ~$100–200 | ⚠️ Risk | Expensive and data is in the US |
| Hetzner CPU + Llama 3.2 8B | €7–18 | ✅ Full Compliance | Slow (~60 sec), but secure and cheap |
| Hetzner GPU GEX44 + Gemma 4 26B | €184 | ✅ Full Compliance | 5-8 sec response, unlimited requests |
| AWS/Azure GPU equivalent | $400–600 | ❌ CLOUD Act | 2.5–3x more expensive than Hetzner |
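The table also lets you estimate where a fixed server bill overtakes a pay-per-use API. This is a rough sketch under two assumptions that are not from the article: a month has 30 days, and euros and dollars are treated as equal for simplicity. The prices are the GPT-4o and GEX44 figures from the tables above.

```python
DAYS_PER_MONTH = 30  # assumption: an average month


def cost_per_request(monthly_cost: float, requests_per_day: int) -> float:
    """Average price of one request, derived from a monthly bill."""
    return monthly_cost / (requests_per_day * DAYS_PER_MONTH)


def break_even_requests_per_day(fixed_monthly_cost: float, price_per_request: float) -> float:
    """Daily request volume at which a pay-per-use bill equals the fixed bill."""
    return fixed_monthly_cost / (price_per_request * DAYS_PER_MONTH)


GEX44_MONTHLY = 184.0                   # Hetzner GEX44, from the price table
GPT4O_MONTHLY_AT_500 = (100.0, 200.0)   # GPT-4o API bill at 500 requests/day

for api_bill in GPT4O_MONTHLY_AT_500:
    price = cost_per_request(api_bill, 500)
    threshold = break_even_requests_per_day(GEX44_MONTHLY, price)
    print(f"API bill {api_bill:.0f}/month -> break-even at {threshold:.0f} requests/day")
```

With these inputs the break-even point lies between roughly 460 and 920 requests per day, depending on how expensive your typical API request is. Against the cheaper GPT-4o mini, the GPU server pays off only at much higher volumes, so in that case the argument for self-hosting is compliance rather than price.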
Hidden Costs Not Visible in Advertising
- IPv4 Address: Hetzner charges an additional €0.50/month. Required for Telegram bots or public web chats.
- Backups: +20% of the server price at Hetzner (e.g., +€37/month for GEX44). We always recommend enabling this.
- Inbound Traffic: Free with all EU providers. Outbound traffic is free up to 20 TB at Hetzner (practically unlimited for AI chat).
- Deployment by Contractor: One-time setup fee (typically €300–800 depending on complexity).
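Adding the recurring extras to the advertised price gives the bill you will actually see. A minimal sketch using the Hetzner figures from this list (IPv4 at €0.50/month, backups at 20% of the server price); the function name and its parameters are illustrative, and the one-time deployment fee is left out because it is not a monthly cost.

```python
def monthly_total(server_price: float, public_ip: bool = True, backups: bool = True,
                  ipv4_fee: float = 0.50, backup_rate: float = 0.20) -> float:
    """Monthly server bill in euros, including IPv4 and backup surcharges."""
    total = server_price
    if public_ip:
        total += ipv4_fee                    # flat fee for a public IPv4 address
    if backups:
        total += server_price * backup_rate  # backups are priced as a share of the server
    return round(total, 2)


print(monthly_total(184.00))  # GEX44 with IPv4 and backups: 221.3
print(monthly_total(7.99))    # CX33 with IPv4 and backups: 10.09
```

For the GEX44 the extras add about €37 per month, which matches the backup example above plus the IPv4 fee.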
Who is Responsible for the Server After Launch — And What It Costs You
The most common question after a demo is: "So, who maintains all of this afterwards?" The answer is simple: either your administrator after training, or a contractor for a monthly fee. There is no third option.
The word "server" might sound intimidating to a non-technical manager. In practice, once properly configured, an AI assistant on a server requires far less attention than most people assume.
What Actually Needs "Maintenance" After Launch
There are four things that require attention:
- Document Updates: Someone from your team uploads new or updated documents via the admin panel. It's a drag-and-drop process that takes a minute. No IT knowledge is needed — an administrator or secretary can handle it.
- Restarting After a Failure: Hetzner automatically restarts the server in case of hardware failure. Docker containers with AI launch automatically upon reboot. In practice, this means 2–3 minutes of downtime every few months.
- Software Updates and Security: Updating Ubuntu, Docker, and dependencies. This needs to be done about once a month. Either your IT department or a contractor handles this.
- Monitoring: Checking if the system is responding to requests. Basic monitoring is included with Hetzner, while advanced monitoring requires additional tools.
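The monitoring item can be as simple as a script that calls the assistant on a schedule and raises an alert when it stops answering. The sketch below uses only the Python standard library. The health-check URL in the usage comment is hypothetical, since the actual address depends on how your contractor set up the system.

```python
import urllib.request


def is_responding(url: str, timeout: float = 10.0, opener=urllib.request.urlopen) -> bool:
    """Return True if the endpoint answers with HTTP 200 within the timeout."""
    try:
        with opener(url, timeout=timeout) as response:
            return response.status == 200
    except OSError:
        # Covers connection errors, timeouts, and HTTP error responses,
        # all of which urllib reports as subclasses of OSError.
        return False


# Example usage (hypothetical address, run every few minutes from cron):
# if not is_responding("http://localhost:8080/health"):
#     print("ALERT: the AI assistant is not responding")
```

The `opener` parameter exists only so the check can be tested without a network connection. In normal use you leave it at its default.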
Two Options After System Handover
Option A: Your administrator manages it independently. After project handover, we train one person from your team – usually for 2–3 hours. They can: upload documents, restart the service if needed, and answer team questions. For more complex tasks (updates, configuring a new interface), you can consult us on a one-time basis.
Option B: A contractor provides monthly support. We or another contractor take full technical responsibility: monitoring, updates, incident response, and consultations. This costs between $50 and $200 per month, depending on the scope. This is suitable if your company lacks an IT specialist.
| | Option A: Self-Managed | Option B: Contractor |
|---|---|---|
| Cost | $0/mo (administrator's time only) | $50–200/mo |
| Who's Needed | 1 person with basic IT understanding | No one, the contractor handles it |
| Response Time for Failure | Depends on administrator's availability | SLA, typically 2–4 hours during business hours |
| Suitable For | Companies with an IT specialist or active administrator | Companies without IT staff or with critical uptime requirements |
Our recommendation: For most SMEs, Option A after brief training is entirely sufficient. Option B is justified for public-facing services (like a website chat for a clinic) where downtime directly impacts customer experience.
What to Ask a Contractor Before Signing a Contract
Most managers don't know what to ask contractors, and end up signing contracts without understanding key details. These eight questions will protect you from unpleasant surprises after launch.
This section is for those who are about to sign a contract for implementing an AI assistant. Whether it's us at AskYourDocs or another contractor, ask these questions before signing.
1. Which server provider is used, and where are the data physically located?
Correct Answer: Specific EU provider name (Hetzner, OVH, Contabo) and a specific data center. "Servers in the EU" without details is insufficient. "AWS Frankfurt" is the wrong answer for GDPR-sensitive data.
2. Which model will be installed, and why this specific one?
Correct Answer: The specific model name (e.g., "Gemma 4 26B via Ollama") with an explanation of why this model is suitable for your task. If the contractor cannot explain their model choice, they don't understand the architecture.
3. Who has access to the server after handover?
Correct Answer: After handover, only your administrators should have access. The contractor should not have persistent access without your request. A contractor who keeps a "backdoor" for support without your knowledge poses a legal risk.
4. What happens to the data if you cease cooperation?
Correct Answer: Since the server is yours or rented in your name, you simply continue paying the hosting provider, and the system keeps running. If the server is rented under the contractor's name, demand a transfer before signing the contract.
5. What is the guaranteed response quality, and how is it verified?
Correct Answer: The contractor should describe the acceptance testing process — specific questions, quality criteria for responses, and what happens if the quality is not met. Guarantees of "it will respond well" without metrics are empty words.
6. How much does it cost to update documents after launch?
Correct Answer: Updating documents via the admin panel should be simple and free for you. If the contractor charges a fee for each new document upload, it's either poor architecture or a manipulative practice.
7. What is included in the deployment price, and what costs extra?
Correct Answer: A clear list: server, software installation, document upload, interface configuration, administrator training — what's included and what's billed separately. It's important to clarify: does the price cover setting up a Telegram bot, WhatsApp, or just a web chat?
8. Does the contractor have specific experience with local LLMs and GDPR?
Correct Answer: Specific case studies or references to real clients (even anonymized). A contractor who is implementing Ollama for the "first time" while promising GDPR compliance is a risk to your business. This is not an area to learn on your project.
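Question 5 asks for measurable acceptance testing. One simple way to make it measurable is a list of test questions, each with facts the answer must contain, and an agreed pass rate. This is a hedged sketch of that idea, not a description of any contractor's actual process: the keyword check, the sample questions, and the `ask` callable that stands in for the real assistant are all illustrative assumptions.

```python
from typing import Callable


def acceptance_pass_rate(ask: Callable[[str], str],
                         cases: list[tuple[str, list[str]]]) -> float:
    """Share of test questions whose answer contains every expected keyword."""
    if not cases:
        raise ValueError("at least one test case is required")
    passed = 0
    for question, expected_keywords in cases:
        answer = ask(question).lower()
        # A case passes only if all expected facts appear in the answer.
        if all(keyword.lower() in answer for keyword in expected_keywords):
            passed += 1
    return passed / len(cases)


def stub_assistant(question: str) -> str:
    """Stand-in for the real assistant, used here only to show the mechanics."""
    if "notice period" in question:
        return "The notice period is 3 months."
    return "I could not find this in the documents."


CASES = [
    ("What is the notice period in the lease?", ["3 months"]),
    ("Who signs purchase orders?", ["managing director"]),
]

print(acceptance_pass_rate(stub_assistant, CASES))  # 0.5
```

In a real acceptance test, you would write the questions from your own documents and agree with the contractor on the required pass rate before signing.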
Conclusions
- 🏠 A self-hosted server is necessary when data is regulated (medical, legal, HR, finance) or when cloud AI becomes more expensive for your workload.
- 🚫 AWS and Azure Germany are not GDPR solutions. The CLOUD Act allows US authorities to request access to data regardless of server location.
- ✅ Hetzner is our default choice: EU jurisdiction, ISO 27001, best price/quality ratio. Even after the April 2026 price increase, it remains 2.5–3x cheaper than AWS/GCP.
- 💻 CPU vs. GPU: CPU is suitable if 30–90 second wait times are acceptable for an internal tool. GPU is needed for public chats or larger models with response times under 10 seconds.
- 💰 Real prices: from €4–8/mo (CPU, Contabo/Hetzner) to €184/mo (Hetzner GEX44 GPU). Plus $0–200/mo for support, depending on the chosen option.
- 🤝 After launch: managed by your administrator or a contractor on monthly support. The system operates autonomously — documents are updated via drag-and-drop.
- ❓ 8 questions for your contractor will safeguard you from unpleasant surprises — especially regarding access, server ownership, and acceptance testing.
Want to discuss a configuration for your specific needs? In 30 minutes, we can determine the right server, model, and cost for your particular scenario — without unnecessary technical jargon.
Write on Telegram →
⸻
Sources: Hetzner Cloud for AI Projects 2026 · Hetzner Cloud Review 2026 · EDPB — European Data Protection Board · DSB — Datenschutzbehörde Austria · GDPR Local — EU AI Act Summary