The default RAG architecture decision for 2024 and 2025 has been: pick a vector database, embed your documents, store the vectors, retrieve by cosine similarity. Pinecone, Weaviate, Qdrant, Chroma, Milvus: the vector database ecosystem exploded to meet demand, and that demand outpaced any critical evaluation of whether a dedicated vector store was the right tool for each use case.
For many production deployments, it is not. A dedicated vector database solves a specific problem — high-throughput, low-latency approximate nearest-neighbour search at billion-vector scale. Most enterprise RAG applications operate at thousands to millions of vectors, not billions, with query patterns that do not require the specialised indexing algorithms that dedicated vector databases are optimised for.
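To put those orders of magnitude in concrete terms, the raw size of an embedding set can be estimated with simple arithmetic. The sketch below assumes 4-byte floats and a hypothetical 1536-dimensional embedding model; both are assumptions for illustration, and the figures exclude index and row overhead.

```python
def raw_embedding_bytes(num_vectors: int, dimensions: int, bytes_per_float: int = 4) -> int:
    """Size of the float payload only; index and row overhead are not included."""
    return num_vectors * dimensions * bytes_per_float


def to_gib(num_bytes: int) -> float:
    """Convert a byte count to gibibytes."""
    return num_bytes / (1024 ** 3)


DIMENSIONS = 1536  # assumed embedding width, not taken from any specific model

for count in (10_000, 1_000_000, 1_000_000_000):
    size = raw_embedding_bytes(count, DIMENSIONS)
    print(f"{count:>13,} vectors: {to_gib(size):,.2f} GiB")
```

Under these assumptions a million vectors comes to roughly 5.7 GiB of raw floats, which a single ordinary database server can hold, while a billion runs to several terabytes. That gap is the difference between the two classes of workload described above.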
The Hidden Cost of the Dedicated Vector DB
Every dedicated vector database you add is a new operational dependency: a new infrastructure component to maintain, a new failure mode to handle, a new cost line to manage, and — critically — a new data synchronisation problem. Your product content lives in PostgreSQL. Your vector embeddings live in Pinecone. When product content changes, you need a sync pipeline that keeps the two stores consistent. Sync pipelines fail, lag, and create subtle consistency bugs that are maddening to debug.
The synchronisation problem compounds with scale. In a monolithic RAG architecture where all knowledge base content flows into a single vector index, a document update requires delete-and-re-embed in the vector store, coordinated with the corresponding update in the source system. Without exactly-once semantics and careful transaction design, you will serve retrieval results that point to stale or deleted content.
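The contrast is easier to see when the text and its embedding sit in one transactional store. The sketch below uses Python's built-in sqlite3 purely as a stand-in for a relational database (it is not pgvector), and `toy_embed` is a hypothetical placeholder for a real embedding model call. It shows one property only: if re-embedding fails, the text update is rolled back with it, so the two never drift apart.

```python
import json
import sqlite3


def toy_embed(text: str) -> list[float]:
    """Hypothetical stand-in for an embedding model; fails on empty input."""
    if not text.strip():
        raise ValueError("cannot embed empty text")
    return [float(len(text)), float(text.count(" ") + 1)]


def update_document(conn: sqlite3.Connection, doc_id: int, new_body: str) -> None:
    """Update the text and its embedding together, or not at all."""
    with conn:  # commits on success, rolls back if anything inside raises
        conn.execute("UPDATE docs SET body = ? WHERE id = ?", (new_body, doc_id))
        embedding = toy_embed(new_body)  # a failure here undoes the UPDATE above
        conn.execute(
            "UPDATE docs SET embedding = ? WHERE id = ?",
            (json.dumps(embedding), doc_id),
        )


conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE docs (id INTEGER PRIMARY KEY, body TEXT, embedding TEXT)")
with conn:
    conn.execute(
        "INSERT INTO docs VALUES (1, ?, ?)",
        ("old price: 10", json.dumps(toy_embed("old price: 10"))),
    )

update_document(conn, 1, "new price: 12")  # succeeds: both columns change

try:
    update_document(conn, 1, "   ")  # embedding step fails, text change is rolled back
except ValueError:
    pass

body, embedding = conn.execute(
    "SELECT body, embedding FROM docs WHERE id = 1"
).fetchone()
print(body, embedding)
```

With two separate stores there is no shared transaction to lean on, so the same guarantee has to be rebuilt in the sync pipeline with retries, idempotent writes, and reconciliation jobs.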
What Vectors Actually Are — And What You Actually Need
A vector embedding is a dense numerical representation of semantic meaning: a list of floats that encodes a text chunk in a form that supports similarity search. The vector itself is not the data you serve; it is an index entry that helps you find the right data to serve. The distinction matters because it clarifies which storage system the vector belongs in.
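A minimal sketch of that idea: the vectors are compared, but the text is what gets returned. The three-dimensional embeddings and the query vector below are made up for illustration; real models emit hundreds or thousands of dimensions.

```python
import math


def cosine_similarity(a: list[float], b: list[float]) -> float:
    """Cosine of the angle between two non-zero vectors of equal length."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


# Hypothetical chunks with hand-written toy embeddings.
chunks = {
    "Refunds are issued within 14 days.": [0.9, 0.1, 0.0],
    "Our office is closed on Sundays.": [0.1, 0.9, 0.1],
    "Shipping takes 3 to 5 business days.": [0.2, 0.1, 0.9],
}


def top_match(query_vector: list[float]) -> str:
    """Return the chunk text whose embedding is most similar to the query."""
    return max(chunks, key=lambda text: cosine_similarity(query_vector, chunks[text]))


query = [0.8, 0.2, 0.1]  # pretend embedding of "how do I get my money back?"
best = top_match(query)
print(best)
```

The floats never reach the user. They only decide which row of text does, which is why it is reasonable to treat them as an index over the source records instead of a dataset in their own right.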
If your documents live in a relational database, the semantic index for those documents belongs alongside them — ideally in the same system, using the same transaction guarantees, with native support for combined semantic + structured filtering. This is exactly what pgvector provides for PostgreSQL.
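As a rough illustration of combined filtering, the helper below builds a parameterised pgvector query that applies ordinary SQL predicates and orders by cosine distance in one statement. The table `product_chunks` and its columns are hypothetical, and the code only constructs the SQL and parameters; executing it would need a PostgreSQL instance with pgvector installed and a driver such as psycopg, which is assumed here and not shown.

```python
def to_vector_literal(embedding: list[float]) -> str:
    """Format floats in pgvector's text input form, e.g. '[0.1,0.2,0.3]'."""
    return "[" + ",".join(repr(float(x)) for x in embedding) + "]"


def build_filtered_search(
    embedding: list[float], category: str, limit: int = 5
) -> tuple[str, tuple]:
    """Build a query mixing relational filters with pgvector's <=> operator.

    <=> is pgvector's cosine distance operator (smaller means more similar).
    Table and column names are illustrative assumptions.
    """
    sql = (
        "SELECT id, body, embedding <=> %s::vector AS cosine_distance "
        "FROM product_chunks "
        "WHERE category = %s AND in_stock "
        "ORDER BY embedding <=> %s::vector "
        "LIMIT %s"
    )
    literal = to_vector_literal(embedding)
    return sql, (literal, category, literal, limit)


sql, params = build_filtered_search([0.1, 0.2, 0.3], "footwear", limit=3)
print(sql)
print(params)
```

The point of the sketch is that the filter and the similarity ranking run against the same rows in the same statement, so there is no second store whose metadata has to be kept in step.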
The Alternatives That Cover 80% of Cases
- pgvector (PostgreSQL extension): Adds a vector column type and distance operators (cosine distance, inner product, L2 distance) directly to PostgreSQL. You get ACID transactions, no separate sync pipeline, full SQL filtering combined with semantic search, and operational familiarity. With an approximate index (HNSW or IVFFlat) it handles tens of millions of vectors efficiently, which covers the vast majority of enterprise RAG use cases.
- Elasticsearch / OpenSearch with vector fields: If you already run Elasticsearch for full-text search, a dense_vector field (knn_vector in OpenSearch) with kNN search brings semantic retrieval to your existing infrastructure without a new dependency. Particularly powerful for hybrid retrieval (BM25 keyword + semantic vector search in a single query).
- Snowflake Cortex / BigQuery vector search: For RAG applications whose source knowledge lives primarily in a data warehouse, vector search built into the warehouse removes the need for a separate sync pipeline. Both are maturing rapidly and are suitable for analytics-adjacent RAG applications.
- Redis with vector search: For latency-critical applications where sub-millisecond similarity search matters, Redis's vector search module provides in-memory performance. Appropriate when the vector index fits in memory and query speed is the dominant constraint.
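For the hybrid retrieval case mentioned under Elasticsearch / OpenSearch, one common way to merge a keyword ranking with a vector ranking is reciprocal rank fusion (RRF). The sketch below is a generic, engine-independent version of that technique, not a description of how any particular product implements hybrid search; the document IDs and the constant k = 60 are illustrative assumptions.

```python
from collections import defaultdict


def reciprocal_rank_fusion(rankings: list[list[str]], k: int = 60) -> list[str]:
    """Fuse several ranked ID lists; each hit contributes 1 / (k + rank)."""
    scores: dict[str, float] = defaultdict(float)
    for ranking in rankings:
        for rank, doc_id in enumerate(ranking, start=1):
            scores[doc_id] += 1.0 / (k + rank)
    # Highest fused score first; ties broken by ID for a stable order.
    return sorted(scores, key=lambda doc_id: (-scores[doc_id], doc_id))


keyword_hits = ["doc_a", "doc_b", "doc_c"]   # hypothetical BM25 order
semantic_hits = ["doc_b", "doc_d", "doc_a"]  # hypothetical vector-search order

fused = reciprocal_rank_fusion([keyword_hits, semantic_hits])
print(fused)
```

Documents that appear in both lists rise to the top, which is the behaviour hybrid retrieval is after: exact keyword matches and semantically close passages reinforce each other without the two scores having to be put on a common scale.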
When a Dedicated Vector Database Is Actually the Right Answer
Dedicated vector databases earn their operational overhead in three scenarios. The first is billion-vector scale with strict latency requirements, where the indexing algorithms in Pinecone and Weaviate genuinely outperform pgvector. The second is multi-modal retrieval, where you need to index and search across text, image, and audio vectors simultaneously. The third is a workload that is pure semantic search with no relational filtering, where a purpose-built tool can be more economical than a general-purpose database with vector capabilities.
The Unified Data Layer Architecture
The principle underlying all the alternatives above is the same: keep your semantic index in the same system as your source data wherever possible. A unified data layer, where vector embeddings live alongside the records they represent, eliminates synchronisation complexity, reduces operational surface area, simplifies debugging, and often reduces cost. The trade-off is a somewhat lower performance ceiling at extreme scale, which most production enterprise applications never approach.
Before you add a vector database to your architecture, ask one question: does my use case genuinely require billion-vector scale or sub-millisecond ANN search? If the answer is no, pgvector or your existing search stack will serve you better with significantly lower operational complexity.
GYSP's AI/ML Development practice designs RAG architectures that minimise operational complexity while meeting performance requirements — selecting vector storage strategies based on actual data scale and query patterns, not technology trends.
“The best architecture for a RAG application is usually the one with the fewest moving parts that meets the performance requirements. Adding a dedicated vector database to a workload that pgvector handles fine is adding complexity without adding value.”
— Rahul, AI/ML Delivery Head — GYSP.tech
