RAG and search engines: an unexpected parallelism

A practitioner with a background in data engineering, now working in the field of Large Language Models (LLMs), has pointed out a conceptual similarity between search engines such as Google or Brave and the vector stores used in Retrieval Augmented Generation (RAG): both retrieve the documents most relevant to a query. The main difference lies in the scale.

Elasticsearch and OpenSearch: valid allies for RAG?

Elasticsearch and OpenSearch, both built on Lucene, are powerful tools for retrieval tasks. Small BERT-style models (around 100 MB in FP32) can be integrated directly into Elasticsearch or OpenSearch and run on CPU to provide vector embedding functionality.
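As a concrete illustration of the vector side of this setup, the sketch below builds an index mapping with a dense_vector field and an approximate kNN request body, following Elasticsearch 8.x conventions. The index layout, field names, and the 384-dimension size are assumptions for illustration (384 matches common small BERT-style encoders); in a real deployment these dicts would be passed to the Elasticsearch client's index-creation and search calls.

```python
# Sketch of a dense_vector mapping and kNN query body (Elasticsearch 8.x style).
# Field names and dimensions are illustrative assumptions, not a fixed schema.

EMBEDDING_DIM = 384  # typical output size of small BERT-style encoders

# Mapping that stores both raw text (for lexical BM25 search) and a vector.
mapping = {
    "mappings": {
        "properties": {
            "text": {"type": "text"},
            "embedding": {
                "type": "dense_vector",
                "dims": EMBEDDING_DIM,
                "index": True,
                "similarity": "cosine",
            },
        }
    }
}

def knn_query(query_vector, k=5):
    """Build the request body for an approximate kNN search."""
    return {
        "knn": {
            "field": "embedding",
            "query_vector": query_vector,
            "k": k,
            "num_candidates": 10 * k,  # candidates examined per shard
        }
    }

body = knn_query([0.0] * EMBEDDING_DIM)
```

Keeping a plain text field alongside the vector field leaves lexical retrieval available, which matters for the small-dataset cases discussed below.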

BERT and small datasets

For relatively small datasets (fewer than 10,000 documents) with good variance between documents, a small BERT model may be sufficient. In some cases, embeddings can be avoided altogether and lexical search alone suffices. For deeper semantic similarity, or for collections of closely related documents, a more powerful embedding model is preferable. For those evaluating on-premise deployments, each architecture carries its own trade-offs; AI-RADAR offers analytical frameworks at /llm-onpremise for evaluating these options.
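To make the "embeddings can be avoided" point concrete, the following is a minimal pure-Python sketch of classic BM25 scoring, the lexical ranking function that Lucene-based engines like Elasticsearch and OpenSearch use by default. The documents and query are invented examples; on a small corpus with distinct vocabulary, exact term matching like this often retrieves the right document without any embedding model.

```python
import math
from collections import Counter

def bm25_scores(query, docs, k1=1.5, b=0.75):
    """Score each document against the query with classic BM25 (Okapi)."""
    tokenized = [d.lower().split() for d in docs]
    n = len(tokenized)
    avgdl = sum(len(t) for t in tokenized) / n
    # Document frequency: how many documents contain each term.
    df = Counter()
    for toks in tokenized:
        for term in set(toks):
            df[term] += 1
    scores = []
    for toks in tokenized:
        tf = Counter(toks)
        score = 0.0
        for term in query.lower().split():
            if term not in tf:
                continue
            idf = math.log(1 + (n - df[term] + 0.5) / (df[term] + 0.5))
            norm = tf[term] + k1 * (1 - b + b * len(toks) / avgdl)
            score += idf * tf[term] * (k1 + 1) / norm
        scores.append(score)
    return scores

# Hypothetical mini-corpus with good variance between documents.
docs = [
    "shipping policy for international orders",
    "how to reset your account password",
    "warranty terms for hardware products",
]
scores = bm25_scores("reset password", docs)
best = max(range(len(docs)), key=scores.__getitem__)  # index of top document
```

When documents are this lexically distinct, the query terms only occur in the relevant document, so BM25 ranks it first; embeddings start to pay off when different wordings must map to the same meaning.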