A vector database stores text (or image/audio) as numerical embedding vectors and retrieves the most semantically similar entries to a query vector using fast ANN (Approximate Nearest Neighbour) algorithms — enabling "meaning-based" search rather than keyword matching.
## Core Operations
- Embed: Convert text → vector (via embedding model)
- Index: Build an ANN index (e.g., HNSW or IVF; libraries such as FAISS implement these) for fast similarity search
- Query: Embed the query → retrieve top-K vectors by cosine or dot-product similarity
- Filter: Apply metadata filters (date range, author, category) alongside vector search
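The embed → index → query loop above can be sketched end-to-end in a few lines. This is a toy, stdlib-only illustration: `toy_embed` is a hashed bag-of-words stand-in for a real embedding model, and `top_k` does brute-force exact search where a real database would consult an ANN index.

```python
import hashlib
import math

def toy_embed(text, dim=8):
    # Stand-in for a real embedding model: hash each word into one of
    # `dim` buckets, then L2-normalise so dot product == cosine similarity.
    v = [0.0] * dim
    for word in text.lower().split():
        h = int(hashlib.md5(word.encode()).hexdigest(), 16)
        v[h % dim] += 1.0
    norm = math.sqrt(sum(x * x for x in v)) or 1.0
    return [x / norm for x in v]

def top_k(index, query_vec, k=2):
    # Brute-force nearest neighbours by cosine similarity; a production
    # database replaces this linear scan with an ANN index (HNSW, IVF).
    scored = [
        (sum(a * b for a, b in zip(vec, query_vec)), doc_id)
        for doc_id, vec in index.items()
    ]
    return [doc_id for _, doc_id in sorted(scored, reverse=True)[:k]]

docs = {
    "d1": "restart the nginx web server",
    "d2": "bake sourdough bread at home",
    "d3": "nginx reverse proxy configuration",
}
index = {doc_id: toy_embed(text) for doc_id, text in docs.items()}
results = top_k(index, toy_embed("nginx server restart"), k=2)
print(results)
```

Swapping `toy_embed` for a real model (e.g., a locally hosted sentence-transformer) and the scan for an indexed store is exactly what the databases below provide.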
## Popular Vector Databases for On-Premise
| Tool | Self-hosted | Indexing | Best for |
|---|---|---|---|
| Chroma | Yes (Python, SQLite) | HNSW | Development, small datasets (<1M docs) |
| Qdrant | Yes (Rust, Docker) | HNSW | Production, filtering, large scale |
| Milvus | Yes (distributed) | HNSW, IVF, DiskANN | Enterprise, billion-scale |
| pgvector | Yes (PostgreSQL extension) | IVFFlat, HNSW | If already on PostgreSQL |
| Weaviate | Yes (Docker) | HNSW | Built-in modules (QA, classification) |
## ANN Index Algorithms
### HNSW (Hierarchical Navigable Small World)
Graph-based. Fastest queries among common ANN indexes, at the cost of high memory usage (the graph is held in RAM). Well suited to datasets under roughly 50M vectors per node. The default index in Chroma and Qdrant.
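The core routing idea behind HNSW can be shown on a single layer. This sketch hand-builds a tiny proximity graph and greedily hops to whichever neighbour is closest to the query, stopping at a local minimum; real HNSW repeats this over a hierarchy of progressively denser layers and finishes with a beam search (the `ef` parameter) on the bottom layer, none of which is shown here.

```python
import math

def greedy_search(graph, vectors, entry, query):
    # Greedy routing on one layer: from the entry point, repeatedly move to
    # the neighbour closest to the query; stop when no neighbour improves.
    current = entry
    while True:
        best = min(
            graph[current],
            key=lambda n: math.dist(vectors[n], query),
            default=current,  # guard against nodes with no neighbours
        )
        if math.dist(vectors[best], query) >= math.dist(vectors[current], query):
            return current
        current = best

# A tiny hand-built graph: a chain of four points along the x-axis.
vectors = {"a": (0.0, 0.0), "b": (1.0, 0.0), "c": (2.0, 0.5), "d": (3.0, 0.0)}
graph = {"a": ["b"], "b": ["a", "c"], "c": ["b", "d"], "d": ["c"]}

nearest = greedy_search(graph, vectors, "a", (2.9, 0.1))
print(nearest)
```

Because each hop only inspects a node's neighbour list, query cost scales with graph degree and path length rather than dataset size — the source of HNSW's speed.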
### IVF (Inverted File)
Clusters vectors into Voronoi cells (typically via k-means). Lower memory than HNSW; a query searches only the nearest clusters, trading some recall for speed. The basis of pgvector's IVFFlat index.
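A minimal IVF sketch, using fixed centroids for brevity (in practice they are learned with k-means): each vector is assigned to the inverted list of its nearest centroid, and a query exhaustively scans only the lists of the `nprobe` nearest cells.

```python
import math

def build_ivf(centroids, vectors):
    # Assign every vector to its nearest centroid's inverted list.
    lists = {i: [] for i in range(len(centroids))}
    for doc_id, v in vectors.items():
        cell = min(range(len(centroids)), key=lambda i: math.dist(centroids[i], v))
        lists[cell].append(doc_id)
    return lists

def ivf_search(centroids, lists, vectors, query, nprobe=1):
    # Rank cells by centroid distance, then scan only the top `nprobe` lists.
    order = sorted(range(len(centroids)), key=lambda i: math.dist(centroids[i], query))
    candidates = [doc_id for cell in order[:nprobe] for doc_id in lists[cell]]
    return min(candidates, key=lambda d: math.dist(vectors[d], query))

centroids = [(0.0, 0.0), (10.0, 10.0)]  # fixed for the sketch; normally k-means
vectors = {"d1": (0.5, 0.2), "d2": (9.0, 9.5), "d3": (10.5, 10.0)}
lists = build_ivf(centroids, vectors)

hit = ivf_search(centroids, lists, vectors, (9.8, 9.9), nprobe=1)
print(hit)
```

Raising `nprobe` widens the search to more cells, recovering recall at the cost of speed — the central tuning knob for IVF-style indexes.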
## Why It Matters for On-Premise
For a fully air-gapped RAG pipeline: embed locally (e.g., with BGE-M3) → store in a local Qdrant instance (Docker) → query locally, with no external API calls. Qdrant persists to disk and handles millions of documents on a single server. For development, Chroma requires zero infrastructure — it runs in-process as a Python library with a persistent SQLite backend.