A vector database stores text (or image/audio) as numerical embedding vectors and retrieves the most semantically similar entries to a query vector using fast ANN (Approximate Nearest Neighbour) algorithms — enabling "meaning-based" search rather than keyword matching.
## Core Operations
- Embed: Convert text → vector (via embedding model)
- Index: Build an ANN index (e.g., HNSW or IVF; libraries such as FAISS implement these) for fast similarity search
- Query: Embed the query → retrieve top-K vectors by cosine or dot-product similarity
- Filter: Apply metadata filters (date range, author, category) alongside vector search
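The embed → index → query loop above can be sketched end-to-end in a few lines. This is a toy, stdlib-only illustration: `toy_embed` is a hashed bag-of-words stand-in for a real embedding model, and `top_k` does brute-force exact search where a real database would consult an ANN index.

```python
import hashlib
import math

def toy_embed(text, dim=8):
    # Stand-in for a real embedding model: hash each word into one of
    # `dim` buckets, then L2-normalise so dot product == cosine similarity.
    v = [0.0] * dim
    for word in text.lower().split():
        h = int(hashlib.md5(word.encode()).hexdigest(), 16)
        v[h % dim] += 1.0
    norm = math.sqrt(sum(x * x for x in v)) or 1.0
    return [x / norm for x in v]

def top_k(index, query_vec, k=2):
    # Brute-force nearest neighbours by cosine similarity; a production
    # database replaces this linear scan with an ANN index (HNSW, IVF).
    scored = [
        (sum(a * b for a, b in zip(vec, query_vec)), doc_id)
        for doc_id, vec in index.items()
    ]
    return [doc_id for _, doc_id in sorted(scored, reverse=True)[:k]]

docs = {
    "d1": "restart the nginx web server",
    "d2": "bake sourdough bread at home",
    "d3": "nginx reverse proxy configuration",
}
index = {doc_id: toy_embed(text) for doc_id, text in docs.items()}
results = top_k(index, toy_embed("nginx server restart"), k=2)
print(results)
```

Swapping `toy_embed` for a real model (e.g., a locally hosted sentence-transformer) and the scan for an indexed store is exactly what the databases below provide.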
## Popular Vector Databases for On-Premise
| Tool | Self-hosted | Indexing | Best for |
|---|---|---|---|
| Chroma | Yes (Python, SQLite) | HNSW | Development, small datasets (<1M docs) |
| Qdrant | Yes (Rust, Docker) | HNSW | Production, filtering, large scale |
| Milvus | Yes (distributed) | HNSW, IVF, DiskANN | Enterprise, billion-scale |
| pgvector | Yes (PostgreSQL extension) | IVFFlat, HNSW | If already on PostgreSQL |
| Weaviate | Yes (Docker) | HNSW | Built-in modules (QA, classification) |
## ANN Index Algorithms
### HNSW (Hierarchical Navigable Small World)
Graph-based. Fastest queries among common ANN indexes, at the cost of high memory usage (the graph is held in RAM). Well suited to datasets under roughly 50M vectors per node. The default index in Chroma and Qdrant.
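The core routing idea behind HNSW can be shown on a single layer. This sketch hand-builds a tiny proximity graph and greedily hops to whichever neighbour is closest to the query, stopping at a local minimum; real HNSW repeats this over a hierarchy of progressively denser layers and finishes with a beam search (the `ef` parameter) on the bottom layer, none of which is shown here.

```python
import math

def greedy_search(graph, vectors, entry, query):
    # Greedy routing on one layer: from the entry point, repeatedly move to
    # the neighbour closest to the query; stop when no neighbour improves.
    current = entry
    while True:
        best = min(
            graph[current],
            key=lambda n: math.dist(vectors[n], query),
            default=current,  # guard against nodes with no neighbours
        )
        if math.dist(vectors[best], query) >= math.dist(vectors[current], query):
            return current
        current = best

# A tiny hand-built graph: a chain of four points along the x-axis.
vectors = {"a": (0.0, 0.0), "b": (1.0, 0.0), "c": (2.0, 0.5), "d": (3.0, 0.0)}
graph = {"a": ["b"], "b": ["a", "c"], "c": ["b", "d"], "d": ["c"]}

nearest = greedy_search(graph, vectors, "a", (2.9, 0.1))
print(nearest)
```

Because each hop only inspects a node's neighbour list, query cost scales with graph degree and path length rather than dataset size — the source of HNSW's speed.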
### IVF (Inverted File)
Clusters vectors into Voronoi cells (typically via k-means). Lower memory than HNSW; a query searches only the nearest clusters, trading some recall for speed. The basis of pgvector's IVFFlat index.
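A minimal IVF sketch, using fixed centroids for brevity (in practice they are learned with k-means): each vector is assigned to the inverted list of its nearest centroid, and a query exhaustively scans only the lists of the `nprobe` nearest cells.

```python
import math

def build_ivf(centroids, vectors):
    # Assign every vector to its nearest centroid's inverted list.
    lists = {i: [] for i in range(len(centroids))}
    for doc_id, v in vectors.items():
        cell = min(range(len(centroids)), key=lambda i: math.dist(centroids[i], v))
        lists[cell].append(doc_id)
    return lists

def ivf_search(centroids, lists, vectors, query, nprobe=1):
    # Rank cells by centroid distance, then scan only the top `nprobe` lists.
    order = sorted(range(len(centroids)), key=lambda i: math.dist(centroids[i], query))
    candidates = [doc_id for cell in order[:nprobe] for doc_id in lists[cell]]
    return min(candidates, key=lambda d: math.dist(vectors[d], query))

centroids = [(0.0, 0.0), (10.0, 10.0)]  # fixed for the sketch; normally k-means
vectors = {"d1": (0.5, 0.2), "d2": (9.0, 9.5), "d3": (10.5, 10.0)}
lists = build_ivf(centroids, vectors)

hit = ivf_search(centroids, lists, vectors, (9.8, 9.9), nprobe=1)
print(hit)
```

Raising `nprobe` widens the search to more cells, recovering recall at the cost of speed — the central tuning knob for IVF-style indexes.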
## Why It Matters for On-Premise
For a fully air-gapped RAG pipeline: embed locally (e.g., with BGE-M3) → store in a local Qdrant instance (Docker) → query locally, with no external API calls. Qdrant persists to disk and handles millions of documents on a single server. For development, Chroma requires zero infrastructure — it runs in-process as a Python library with a persistent SQLite backend.