Navigating Code with AI: Semantic Graphs with LLMs Outperform Embeddings
AI-assisted coding tools promise to revolutionize software development, but their effectiveness is often limited by how well they can understand and retrieve context from complex codebases. The core challenge is building "persistent structured memory" for code, moving beyond re-reading the entire repository in every session, an approach that wastes both compute and tokens. A team of developers has shared findings from a year of experimentation, showing how traditional retrieval methods proved insufficient, while an approach based on knowledge graphs enriched by Large Language Models (LLMs) delivered markedly better results.
Current industry discourse tends to oversimplify the problem, often suggesting "just embeddings" or "just Tree-sitter" as the solution. Practical experience, however, shows that these techniques, useful as they are in other contexts, have significant limitations when applied to deep code understanding. For companies evaluating AI solutions for development, understanding these trade-offs is crucial for optimizing Total Cost of Ownership (TCO) and ensuring data sovereignty, both central concerns for AI-RADAR.
The Challenges of Semantic Code Retrieval
Experiments have highlighted the shortcomings of two widely discussed approaches. The first, using vector embeddings on code chunks, proved ineffective. Functions with similar names, such as a process() in a payments service and a process() in an image pipeline, embed to similar vectors due to token similarity. However, these functions have no semantic or contextual relationship. Vectors tend to "flatten" fundamental structural relationships in code, such as call graphs, inheritance, and imports, making retrieval precision too low to be useful. This approach was entirely abandoned.
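The token-overlap failure mode is easy to reproduce. The following TypeScript sketch uses a deliberately naive bag-of-tokens "embedder" (not the team's actual model, purely an illustration of the mechanism: real embedding models are denser, but shared boilerplate tokens still dominate for near-identical code chunks):

```typescript
// Toy bag-of-tokens "embedder": hashes each token into a bucket,
// so shared tokens push vectors together regardless of domain.
function toyEmbed(code: string, dims = 64): number[] {
  const vec = new Array(dims).fill(0);
  for (const token of code.split(/\W+/).filter(Boolean)) {
    let h = 0;
    for (const ch of token) h = (h * 31 + ch.charCodeAt(0)) % dims;
    vec[h] += 1;
  }
  return vec;
}

function cosine(a: number[], b: number[]): number {
  const dot = a.reduce((s, x, i) => s + x * b[i], 0);
  const norm = (v: number[]) => Math.sqrt(v.reduce((s, x) => s + x * x, 0));
  return dot / (norm(a) * norm(b));
}

// Two process() functions from unrelated domains.
const payments = `async function process(item) { await charge(item); retry(item); }`;
const images   = `async function process(item) { await resize(item); retry(item); }`;

// High similarity despite zero semantic relationship between the two domains:
// async, function, process, item, await, retry all collide.
console.log(cosine(toyEmbed(payments), toyEmbed(images)).toFixed(2));
```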
The second method tested was Abstract Syntax Tree (AST) parsing via Tree-sitter. The tool is fast and precise at exposing the syntactic structure of code: it can tell you that a function exists and what it calls. What it cannot extract is meaning or business context. It cannot, for example, indicate that "this function handles webhook retries for failed Stripe payments." For answering questions phrased in business language, pure AST parsing falls short, lacking the ability to bridge the gap between syntax and semantics. A sketch of what Tree-sitter does provide follows.
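This sketch uses the Tree-sitter Node bindings (the article does not show the team's actual parsing code) to extract function declarations and call sites from a snippet. Everything it yields is syntax; nothing mentions Stripe, webhooks, or retries as business concepts:

```typescript
import Parser from "tree-sitter";
import JavaScript from "tree-sitter-javascript";

const parser = new Parser();
parser.setLanguage(JavaScript);

const source = `
function retryWebhook(event) {
  if (event.attempts < 3) scheduleRetry(event);
}
`;

const tree = parser.parse(source);

// Walk the AST, printing each function declaration and the calls inside it.
function walk(node: Parser.SyntaxNode) {
  if (node.type === "function_declaration") {
    console.log("function:", node.childForFieldName("name")?.text);
  }
  if (node.type === "call_expression") {
    console.log("  calls:", node.childForFieldName("function")?.text);
  }
  for (const child of node.children) walk(child);
}
walk(tree.rootNode);
// Output: the function's name and its call graph, but no hint of
// *why* the function exists or what business process it serves.
```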
The Solution: LLM-Enriched Knowledge Graphs
The solution that proved effective is based on a per-file analysis performed by an LLM. This process generates a purpose, a summary, and a business context for each file, which are then stored as nodes in a Neo4j graph. The edges of this graph connect the nodes to classes, functions, keywords, and imports. Retrieval then occurs via full-text search across these semantic fields, rather than via vector similarity.
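A minimal sketch of the storage and retrieval side of such a pipeline, using the official neo4j-driver package. Node labels, property names, and the full-text index name here are assumptions for illustration, not Bytebell's documented schema:

```typescript
import neo4j from "neo4j-driver";

const driver = neo4j.driver(
  "bolt://127.0.0.1:7687",
  neo4j.auth.basic("neo4j", "password")
);

// Store one File node per file, carrying the LLM-generated semantic fields,
// plus edges to the functions it defines.
async function indexFile(analysis: {
  path: string; purpose: string; summary: string;
  businessContext: string; functions: string[];
}) {
  const session = driver.session();
  try {
    await session.run(
      `MERGE (f:File {path: $path})
       SET f.purpose = $purpose, f.summary = $summary,
           f.businessContext = $businessContext`,
      {
        path: analysis.path,
        purpose: analysis.purpose,
        summary: analysis.summary,
        businessContext: analysis.businessContext,
      }
    );
    // Edges preserve the structural relationships flat embeddings lose.
    for (const fn of analysis.functions) {
      await session.run(
        `MATCH (f:File {path: $path})
         MERGE (x:Function {name: $fn})
         MERGE (f)-[:DEFINES]->(x)`,
        { path: analysis.path, fn }
      );
    }
  } finally {
    await session.close();
  }
}

// Retrieval via full-text search over the semantic fields, not vectors.
// Assumes an index created once with:
//   CREATE FULLTEXT INDEX fileSemantics FOR (f:File)
//   ON EACH [f.purpose, f.summary, f.businessContext]
async function search(query: string) {
  const session = driver.session();
  try {
    const res = await session.run(
      `CALL db.index.fulltext.queryNodes("fileSemantics", $query)
       YIELD node, score
       RETURN node.path AS path, node.purpose AS purpose, score
       ORDER BY score DESC LIMIT 5`,
      { query }
    );
    return res.records.map((r) => r.toObject());
  } finally {
    await session.close();
  }
}
```

Because the query runs against LLM-written prose ("handles webhook retries for failed Stripe payments"), business-language questions can match files whose identifiers share no tokens with the query at all.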
This approach aligns with recent academic findings. Studies such as RepoGraph (ICLR 2025) have shown a +32.8% improvement on SWE-bench with graph-based methodologies, while Code-Craft reported an 82% increase in top-1 retrieval precision using bottom-up LLM summaries from code graphs. The main trade-off of this methodology is the initial indexing cost, as each file requires an LLM call. However, a SHA-256-based diffing system allows reindexing only the files that have changed, keeping the ongoing cost manageable.
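The hash-based diffing idea is simple to sketch. This version assumes a plain path-to-hash map as the index (the article does not specify how Bytebell persists its hashes):

```typescript
import { createHash } from "node:crypto";
import { readFile } from "node:fs/promises";

// path -> SHA-256 of the content that was last indexed
type HashIndex = Record<string, string>;

async function sha256(path: string): Promise<string> {
  const content = await readFile(path);
  return createHash("sha256").update(content).digest("hex");
}

// Return only the files whose content changed since the last run;
// unchanged files are skipped entirely, so no LLM call is spent on them.
async function filesNeedingReindex(
  paths: string[],
  index: HashIndex
): Promise<string[]> {
  const stale: string[] = [];
  for (const path of paths) {
    const digest = await sha256(path);
    if (index[path] !== digest) {
      stale.push(path);     // new or modified: one LLM call needed
      index[path] = digest; // record the hash for the next run
    }
  }
  return stale;
}
```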
Implications for On-Premise Deployments and Data Sovereignty
The solution, named Bytebell, has been released as open source and is notable for its focus on self-hosted deployment. The architecture consists of a local Bun daemon running on the customer's own infrastructure (BYO infra), with storage in Neo4j and MongoDB. External calls for per-file analysis go to OpenRouter by default, but the daemon binds only to the 127.0.0.1 interface, and the requests can be routed to a local model instead, keeping data fully under the operator's control.
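The loopback-binding pattern is straightforward to sketch with Bun's built-in server. Endpoint paths, the environment variable names, and the local-model URL below are illustrative assumptions, not Bytebell's actual configuration:

```typescript
// A daemon that only listens on the loopback interface and reads its LLM
// endpoint from config, so OpenRouter can be swapped for any local
// OpenAI-compatible server (e.g. http://127.0.0.1:11434/v1) in an
// air-gapped setup.
const LLM_BASE_URL =
  process.env.LLM_BASE_URL ?? "https://openrouter.ai/api/v1";

Bun.serve({
  hostname: "127.0.0.1", // never exposed beyond the local machine
  port: 8787,
  async fetch(req) {
    const url = new URL(req.url);
    if (url.pathname === "/analyze" && req.method === "POST") {
      // Forward the per-file analysis request to whichever backend
      // is configured, without the source code leaving the host
      // unless the operator explicitly points at a remote endpoint.
      const payload = await req.text();
      const upstream = await fetch(`${LLM_BASE_URL}/chat/completions`, {
        method: "POST",
        headers: {
          Authorization: `Bearer ${process.env.LLM_API_KEY ?? ""}`,
          "Content-Type": "application/json",
        },
        body: payload,
      });
      return new Response(upstream.body, { status: upstream.status });
    }
    return new Response("not found", { status: 404 });
  },
});
```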
This deployment model is particularly relevant for CTOs, DevOps leads, and infrastructure architects who prioritize data sovereignty, compliance, and TCO management. Keeping code and its analyses within the local environment, potentially even in air-gapped configurations, eliminates the risks of sending source code to external cloud services. While Bytebell is neither a multi-tenant product nor a chat UI, its local-first architecture and granular semantic context management make it a promising option for organizations seeking self-hosted alternatives for AI/LLM workloads. For those evaluating on-premise deployments, AI-RADAR offers analytical frameworks at /llm-onpremise to weigh the trade-offs between control, cost, and performance.