RAG & AI Technologies

Larger Embedding Models Aren't Always Better: Our Experience with Corporate Documents

Views: 84 Published: 10.06.2026
🇺🇦 UK 🇺🇸 EN 🇩🇪 DE 🇪🇸 ES
Larger Embedding Models Aren't Always Better: Our Experience with Corporate Documents
TL;DR: A larger embedding dimension doesn't always mean better search for corporate documents. At AskYourDocs, we chose 1536 over 3072 and achieved half the infrastructure costs without compromising search quality for legal, HR, and business documents.

What are embeddings and why their dimension impacts your business

When an AI system searches for answers within your documents, it doesn't read every file every time. Instead, it pre-processes each text chunk into a numerical "fingerprint" – a vector. This is an embedding.

Imagine each paragraph of your contract or instruction manual is transformed into a coordinate in a multi-dimensional space. When an employee asks a question, the system searches based on the similarity of these coordinates, not just keywords. This is why AI search can find the right contract clause even if the query is phrased differently than the text itself.

Dimension refers to the number of values in such a vector. OpenAI's text-embedding-3-small generates vectors with 1536 values, while text-embedding-3-large uses 3072. It seems intuitive: more numbers mean a more precise search. However, in practice, it's more complex.

Simply put: embedding dimension is like photo resolution. A 4K photo takes up four times more space than Full HD, but you might not notice the difference on a smartphone screen. For most corporate tasks, 1536 is your Full HD: clear enough without unnecessary expenses.

Why is this important for businesses? Because dimension directly affects three things: search speed, server memory usage, and document processing costs. If you choose a dimension "just in case," you pay for it every day.

Why companies overpay for dimension they don't need

When we start working with a new client, one of the first questions we discuss is which vector dimension will be suitable for their archive. Almost every time, we hear the same sentiment: "Let's go with 3072 – bigger is better, right?"

This logic is perfectly understandable. More dimensions seem to imply more details and a more accurate search. But in practice, this isn't always the case. We explain why – with numbers.

According to a study on optimizing vector storage in RAG systems (arXiv, 2025), one million documents with 1536-dimensional vectors occupy about 6.1 GB of RAM. With 3072-dimensional vectors, it's twice as much, over 12 GB. In cloud infrastructure, RAM is a direct driver of monthly costs.

Meanwhile, the increase in search quality is minimal. A comparative analysis of embedding models in 2026 shows that moving from 1536 to 3072 dimensions yields only a 2-4 point increase in nDCG on retrieval benchmarks. "The quality curve flattens very quickly after 768 dimensions for most tasks," yet storage costs are six times higher.

Here's how the comparison looks in numbers:

Parameter 1536 (text-embedding-3-small) 3072 (text-embedding-3-large)
API Cost $0.02 / million tokens $0.13 / million tokens
RAM for 1 million documents ~6.1 GB ~12.2 GB
Search Quality Improvement +2–4 nDCG points
Storage Costs baseline ~6x higher
Similarity Search Speed faster slower
Suitable for SMB Archives ✅ yes ⚠️ overkill

For an SMB company with an archive of 50,000–200,000 documents – a typical scale for a law firm, medical center, or distributor – the difference in annual costs can amount to several thousand dollars. Without any noticeable improvement in response quality.

The conclusion is simple: "more" in the case of embeddings is not always "better." It's "more expensive" and "slower." And for most corporate documents, the search quality with 1536 dimensions is entirely sufficient.


What we achieved with 1536 in a real product

At AskYourDocs, we went through this decision not in theory, but with a real product and real clients. We'll be honest: how we arrived at 1536, what it gave us, and where the challenges were.

Our stack: Spring Boot, Java 21, PostgreSQL with the pgvector extension, OpenRouter as a gateway to LLMs. Infrastructure – Railway EU West (Amsterdam), file storage – Cloudflare R2 in the EU jurisdiction. For embeddings, we chose text-embedding-3-small with a dimension of 1536.

Why 1536, not 3072

Initially, we also considered: perhaps we should use a larger model – just in case? But when we calculated the actual costs at scale and looked at benchmarks for our type of documents (legal contracts, internal regulations, HR documents, instructions), it became clear: the difference in search quality between 1536 and 3072 for homogeneous corporate texts in a single language is minimal. But the difference in costs is significant.

What changed after choosing 1536

Metric Result Comment
Service RAM on Railway ~470 MB (was ~1.2 GB) Optimization of Alpine Docker + 1536-dimensional vectors together led to a significant reduction in footprint
Similarity Search Speed faster with the same hardware A smaller dimension simplifies the operation of IVF_FLAT and HNSW indexes in pgvector
Search Accuracy on Scanned PDFs from ~17% to ~50% However, the main reason was the implementation of Vision OCR, not the embedding dimension
Embedding API Cost $0.02 / million tokens Compared to $0.13 for text-embedding-3-large – the difference is noticeable even with 100k+ documents
Model Hallucinations eliminated The reason was not the dimension, but limiting history to 6 messages and tightening the system prompt

An Important Lesson: Dimension is Not the Main Variable

When search accuracy was low (17%), the problem wasn't that we chose 1536 instead of 3072. The problem was the quality of OCR recognition – scanned PDFs provided "garbage" as input, and no embedding dimension could save that.

After implementing Vision OCR (GPT-4o-mini for recognition with automatic retry for 90°/180°/270° for inverted scans), search accuracy increased to an acceptable level – with the same 1536 dimensions.

This confirms a simple idea: the quality of input data is more important than the vector dimension. Garbage in, garbage out, regardless of whether it's 1536 or 3072.

Note: The figures provided are the result of our specific deployment and the types of documents our clients have. Your results will depend on archive volume, document language, and input file quality.

When 3072 is truly justified – and when it's an unnecessary expense

We're deliberately not saying that 3072 is always bad. That would be dishonest. There are scenarios where a larger dimension is indeed justified – and we ourselves recommend it to clients when we see the right conditions. However, such cases are significantly less common among SMB companies than one might think.

When 3072 Makes Sense

Multilingual documents within the same chunk. If your archive contains documents where multiple languages are mixed in a single paragraph – for example, a technical specification with English terms and a Ukrainian description, or a contract with quotes from foreign legislation – a larger dimension better captures semantic links between languages. 1536 can handle it, but with noticeable losses in cross-lingual search.

Complex domain terminology. Medical protocols, scientific articles, patent documentation – texts where phrasing is critically important and terms that sound similar have different meanings. Here, a higher dimension provides better "resolution" between closely related concepts.

Archives of millions of documents with business-critical search. If you have 2-5 million documents, and even one missed relevant result costs money or reputation – a 2-4 point nDCG increase on benchmarks becomes tangible in real scenarios. At this scale, it's worth testing both options.

There is an infrastructure budget. If RAM and storage are not a limitation, and the team is prepared for higher operational costs – 3072 provides a certain margin of accuracy "just in case.".

When 1536 is Entirely Sufficient

Homogeneous documents in a single language. Internal regulations, employment contracts, HR documents, employee instructions, price lists – this is the most common type of archive among our clients. Here, 1536 provides search quality indistinguishable from 3072 in practice.

Archives up to 500,000 documents. The typical scale for a law firm, medical center, or distributor. At this volume, the difference between models in real queries is statistical noise, not a business effect.

Deployments on limited resources. If you are deploying the system on your own servers or in an isolated environment (Hetzner, on-premise) – the twice-smaller RAM footprint from 1536 could mean the difference between one and two servers in a rack.

Response speed is important. Smaller dimension = faster similarity search with the same hardware. For products where users expect real-time responses, this is noticeable.

An Interesting Fact That Changes Intuition

As noted in the official OpenAI documentation, text-embedding-3-large, when reduced to 256 dimensions, still outperforms the full-size text-embedding-ada-002 (1536) on the MTEB benchmark. This means modern models have learned to pack information more efficiently – and the maximum dimension is no longer synonymous with maximum quality.

It's important to understand this: we are no longer in an era where "more dimensions = better." We are in an era of efficient encoding, where a well-trained model with 1536 dimensions wins against an outdated model with 3072.

In Short: The Choice Matrix

Situation Recommendation
Homogeneous documents, single language, up to 500k ✅ 1536
Limited server resources / on-premise ✅ 1536
High importance of speed and low latency ✅ 1536
Multilingual documents in a single chunk ⚠️ test 3072
Complex domain terminology (medical, legal) ⚠️ test 3072
Archive over 1 million documents ⚠️ test both
Business-critical search, budget available ⚠️ 3072 as an option

How to choose the dimension for your document archive – a checklist

This is the most practical section of the article. If you are currently choosing an embedding model for your AI system, answer the six questions below. After each question, there are specific examples from our experience so you can identify with your scenario.

Question 1: What is the language of your documents?

This is the first and most important question. The embedding dimension significantly impacts search quality, especially in a linguistic context.

Example from practice: a client – a Ukrainian law firm, with an archive of 40,000 contracts in Ukrainian. They chose 1536. Searching by the content of contract clauses works correctly, the client is satisfied with the result.

Another example: a distributor with documentation from foreign suppliers – some files in English, some in Ukrainian, some with mixed languages in specification tables. Here, we recommended testing both options before making the final choice.

Question 2: What is the volume of your archive?

The archive volume affects two factors simultaneously: indexing cost (API calls) and the cost of storing vectors in the database.

Archive Volume Typical Client Recommendation Estimated Indexing Cost
up to 50,000 documents law firm, HR department, medical center ✅ 1536 ~$1–2 one-time
50,000 – 200,000 distributor, franchise network ✅ 1536 ~$2–8 one-time
200,000 – 1 million large corporate archive ✅ 1536, test 3072 $8–40 one-time
over 1 million enterprise, public sector ⚠️ calculate TCO for both $40+ one-time

Important: indexing cost is a one-time expense. However, the cost of storing vectors in RAM is a monthly expense. With 3072 dimensions, it's twice as high. Also read: how to correctly prepare documents for AI search – file structure affects chunk size and, consequently, the number of vectors in the database.

Question 3: What document formats are in your archive?

This is a question often overlooked when choosing dimensions – and it shouldn't be. The quality of the input text is more important than the vector size.

Example from practice: we had a case where search accuracy on a scanned archive was 17%. We tried changing parameters – it didn't help. The problem was that the OCR provided garbled text with substitute characters. After implementing Vision OCR, accuracy increased to ~50% – with the same 1536 dimensions. The dimension was not a factor at all here.

Question 4: What is your infrastructure budget?

This question is not just about API costs, but about the total cost of ownership (TCO): server RAM, storage in the vector database, hosting costs.

Deployment Scenario RAM for 100k documents (1536) RAM for 100k documents (3072) Recommendation
Cloud (Railway, Render, Fly.io) ~0.6 GB ~1.2 GB 1536 – lower pricing tier
Own Server / Hetzner ~0.6 GB ~1.2 GB 1536 – more space for other services
On-premise, isolated environment ~0.6 GB ~1.2 GB 1536 – critical with limited hardware
Enterprise cloud with unlimited budget not critical not critical test both options

Question 5: How critical is search accuracy for your business?

Let's be honest: for most corporate tasks, the difference between 1536 and 3072 is practically imperceptible. But there are exceptions.

Question 6: Do you have resources for testing?

If you have doubts – the most honest answer is: conduct an A/B test on your actual archive. Take 50-100 typical queries from future users, index a sample of documents with both models, and compare the results. This will take a few hours and provide a more accurate outcome than any benchmark.

If you don't have resources for testing – focus on the table below:

Your Profile Recommendation Why
Law firm, contracts in one language ✅ 1536 Homogeneous content, SMB scale
HR department, internal documents ✅ 1536 Simple content, speed is more important
Medical center, protocols and records ⚠️ test Complex terminology, critical accuracy
Distributor, price lists and specifications ✅ 1536 Structured content, numerical data
Franchise network, standards ✅ 1536 Homogeneous documents, typical queries
Multilingual corporate archive ⚠️ test 3072 Cross-lingual semantics require higher dimension
On-premise, limited hardware ✅ 1536 Twice lower RAM footprint
Our recommendation: if your archive consists of internal regulations, contracts, HR documents, or instructions in a single language and up to 500,000 files – go with 1536 immediately, without overthinking. This is what we use ourselves at AskYourDocs and recommend to most of our clients. If you have a multilingual archive, complex domain terminology (medical, legal, technical specifications), or a volume exceeding a million documents – contact us, and we'll analyze your specific case together to select the optimal configuration.
Summary: embedding dimension is not the primary driver of search quality. For most SMB archives (law firms, medical centers, HR, distributors), 1536 provides sufficient accuracy at half the infrastructure costs. The main factors that truly impact results are the quality of input documents, proper chunking, and OCR settings for scanned files. 3072 should only be considered for multilingual archives, complex domain terminology, or scales exceeding a million documents.

If you are planning to implement AI search for corporate documents – we at AskYourDocs are ready to analyze your specific case: document types, archive volume, infrastructure. We will select a configuration that truly suits you, rather than the largest "just in case.".

Telegram: @name_lucky_lucky  |  WhatsApp

Read also