Vector Database

Infrastructure

A specialised database that stores high-dimensional embedding vectors and enables fast approximate nearest-neighbour (ANN) search — the backbone of RAG pipelines.

A vector database stores numerical embedding vectors that represent text, images, or audio, and retrieves the entries most semantically similar to a query vector using ANN algorithms. This enables "meaning-based" search rather than keyword matching.

Core Operations

  1. Embed: Convert text → vector (via embedding model)
  2. Index: Build an ANN index (HNSW or IVF; FAISS is a library that implements both) for fast similarity search
  3. Query: Embed the query → retrieve top-K vectors by cosine or dot-product similarity
  4. Filter: Apply metadata filters (date range, author, category) alongside vector search
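
The four operations above can be sketched in a few lines of plain Python. This is only an illustration under loud assumptions: `embed` is a toy hashed bag-of-words function standing in for a real embedding model, `TinyVectorStore` is a hypothetical class (not any product's API), and the search is an exact brute-force scan rather than an ANN index.

```python
import math
import re
from collections import Counter

DIM = 64  # toy dimensionality; real embedding models use hundreds or thousands


def embed(text):
    """Toy stand-in for an embedding model: hashed bag-of-words, L2-normalised."""
    counts = Counter(re.findall(r"[a-z0-9]+", text.lower()))
    vec = [0.0] * DIM
    for token, n in counts.items():
        # Hand-rolled stable hash so results do not change between runs.
        bucket = sum(ord(ch) * (pos + 1) for pos, ch in enumerate(token)) % DIM
        vec[bucket] += n
    norm = math.sqrt(sum(x * x for x in vec)) or 1.0
    return [x / norm for x in vec]


def cosine(a, b):
    # Both vectors are unit length, so the dot product equals the cosine.
    return sum(x * y for x, y in zip(a, b))


class TinyVectorStore:
    """Hypothetical in-memory store: exact search plus metadata filtering."""

    def __init__(self):
        self.rows = []  # (vector, text, metadata)

    def add(self, text, metadata):
        self.rows.append((embed(text), text, metadata))

    def query(self, text, top_k=3, where=None):
        q = embed(text)
        scored = [
            (cosine(q, vec), doc, meta)
            for vec, doc, meta in self.rows
            # Metadata filter: every key in `where` must match exactly.
            if where is None or all(meta.get(k) == v for k, v in where.items())
        ]
        scored.sort(key=lambda row: row[0], reverse=True)
        return scored[:top_k]


store = TinyVectorStore()
store.add("Qdrant persists vectors to disk", {"category": "database"})
store.add("HNSW is a graph-based index", {"category": "index"})
store.add("IVF clusters vectors into cells", {"category": "index"})

hits = store.query("graph-based index", top_k=2, where={"category": "index"})
for score, doc, meta in hits:
    print(round(score, 3), doc, meta)
```

A real vector database replaces the linear scan in `query` with an ANN index, which is what makes retrieval fast at scale.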

Popular Vector Databases for On-Premise

Tool     | Self-hosted          | Indexing           | Best for
Chroma   | Yes (Python, SQLite) | HNSW               | Development, small datasets (<1M docs)
Qdrant   | Yes (Rust, Docker)   | HNSW               | Production, filtering, large scale
Milvus   | Yes (distributed)    | HNSW, IVF, DiskANN | Enterprise, billion-scale
pgvector | PostgreSQL extension | IVF, HNSW          | If already on PostgreSQL
Weaviate | Yes (Docker)         | HNSW               | Built-in modules (QA, classification)

ANN Index Algorithms

HNSW (Hierarchical Navigable Small World)

Graph-based. Typically the best query speed at high recall. High memory usage, since every vector also stores its graph links. Ideal for <50M vectors. Default index in Chroma and Qdrant.
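
To make the graph idea concrete, here is a deliberately simplified sketch: a single-layer navigable graph with best-first search. It is not full HNSW, which adds a hierarchy of layers, incremental insertion, and neighbour-selection heuristics. The function names and the parameters `m` (links per node) and `ef` (candidate list size) are illustrative choices, and the graph is built by brute force for clarity.

```python
import heapq


def dist(a, b):
    return sum((x - y) ** 2 for x, y in zip(a, b))  # squared Euclidean


def build_graph(points, m=2):
    """Link each point to its m nearest neighbours (brute force), both ways."""
    graph = {i: set() for i in range(len(points))}
    for i, p in enumerate(points):
        others = (j for j in range(len(points)) if j != i)
        for j in sorted(others, key=lambda j: dist(p, points[j]))[:m]:
            graph[i].add(j)
            graph[j].add(i)
    return graph


def search(points, graph, query, k=1, ef=4, entry=0):
    """Best-first graph walk that keeps the ef closest nodes seen so far."""
    visited = {entry}
    d0 = dist(query, points[entry])
    candidates = [(d0, entry)]  # min-heap: closest unexplored node first
    best = [(-d0, entry)]       # max-heap via negated distance: current top ef
    while candidates:
        d, node = heapq.heappop(candidates)
        if len(best) >= ef and d > -best[0][0]:
            break  # nothing left to explore can improve the kept results
        for nb in graph[node]:
            if nb in visited:
                continue
            visited.add(nb)
            dn = dist(query, points[nb])
            if len(best) < ef or dn < -best[0][0]:
                heapq.heappush(candidates, (dn, nb))
                heapq.heappush(best, (-dn, nb))
                if len(best) > ef:
                    heapq.heappop(best)  # drop the current worst result
    ranked = sorted((-neg_d, i) for neg_d, i in best)
    return [i for _, i in ranked][:k]


# Twenty points on a line; the walk starts at node 0 and hops towards the query.
points = [(float(i),) for i in range(20)]
graph = build_graph(points, m=2)
result = search(points, graph, (13.2,), k=3, ef=4)
print(result)
```

The search only computes distances for nodes it actually reaches, which is where the speed comes from; the stored links are what make the memory footprint larger than that of IVF.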

IVF (Inverted File)

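Clusters vectors into Voronoi cells. Lower memory than HNSW. A query searches only the nearest clusters, so recall depends on how many cells are probed. IVFFlat was pgvector's original index type; without an index, pgvector performs exact search.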
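
A minimal sketch of the IVF idea, assuming plain k-means for the cell centroids and exact distances inside each probed cell (roughly what an "IVF-Flat" index does). All function names and the `nprobe` parameter name are illustrative, not taken from a specific library.

```python
import random


def sq_dist(a, b):
    return sum((x - y) ** 2 for x, y in zip(a, b))


def mean(vectors):
    return tuple(sum(col) / len(vectors) for col in zip(*vectors))


def train_centroids(vectors, n_cells, iters=10, seed=0):
    """Plain k-means (Lloyd's algorithm): one centroid per Voronoi cell."""
    rng = random.Random(seed)
    centroids = rng.sample(vectors, n_cells)
    for _ in range(iters):
        cells = [[] for _ in range(n_cells)]
        for v in vectors:
            nearest = min(range(n_cells), key=lambda c: sq_dist(v, centroids[c]))
            cells[nearest].append(v)
        # Keep the previous centroid if a cell ends up empty.
        centroids = [mean(cell) if cell else centroids[c]
                     for c, cell in enumerate(cells)]
    return centroids


def build_ivf(vectors, centroids):
    """Inverted file: cell id -> ids of the vectors assigned to that cell."""
    lists = {c: [] for c in range(len(centroids))}
    for i, v in enumerate(vectors):
        cell = min(lists, key=lambda c: sq_dist(v, centroids[c]))
        lists[cell].append(i)
    return lists


def ivf_search(query, vectors, centroids, lists, k=3, nprobe=2):
    """Scan only the nprobe cells whose centroids are closest to the query."""
    probe = sorted(lists, key=lambda c: sq_dist(query, centroids[c]))[:nprobe]
    ids = [i for c in probe for i in lists[c]]
    return sorted(ids, key=lambda i: sq_dist(query, vectors[i]))[:k]


rng = random.Random(42)
vectors = [(rng.random(), rng.random()) for _ in range(200)]
centroids = train_centroids(vectors, n_cells=8)
lists = build_ivf(vectors, centroids)

query = (0.5, 0.5)
approx = ivf_search(query, vectors, centroids, lists, k=3, nprobe=2)
exact = ivf_search(query, vectors, centroids, lists, k=3, nprobe=8)
print(approx, exact)
```

Raising `nprobe` trades speed for recall: probing every cell degenerates into an exact scan, while probing one cell is fastest but can miss neighbours that sit just across a cell boundary.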

Why It Matters for On-Premise

For a fully air-gapped RAG pipeline: embed locally with BGE-M3 → store in local Qdrant (Docker) → query locally. No external API calls. Qdrant persists to disk and handles millions of documents on a single server. For development, Chroma requires no separate infrastructure: it runs in-process as a Python library with a persistent SQLite backend.