Code Intelligence Architecture
How TuringMind maps your repository, models trust boundaries, and prepares your codebase for autonomous reasoning.
The fundamental challenge of using AI for codebase security is context size. You cannot fit a 5-million-line enterprise monorepo into an LLM's context window. TuringMind solves this by strictly separating Ingestion from Retrieval: a custom pipeline first maps your repository's logic into an explicit graph, and agents then query only the parts of that graph they need.
Phase 1: Ingestion (The Gobbler Pipeline)
Before an autonomous agent ever sees your code, the repository is processed by our high-throughput pipeline known as the Gobbler. This is a multi-stage deterministic compiler that breaks your codebase down in three stages:
- AST Parser: The Gobbler does not just read plain text. It parses your codebase into an Abstract Syntax Tree (AST) to extract its actual structure, identifying function definitions and class structures while stripping out comments and noise.
- Logic Chunker: Standard RAG pipelines chunk text arbitrarily by size (e.g., every 500 tokens), which can cut a function in half. The Gobbler chunks code along logical boundaries, ensuring that an entire function body or class implementation stays together in the database.
- Edge Generator: The most critical step. The Gobbler resolves module imports, class inheritance, and function calls across different files, generating explicit Edges between the code chunks.
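The three stages above can be sketched in miniature with Python's standard `ast` module. This is an illustration only, not the Gobbler's implementation: the sample source and the helper names `chunk_functions` and `call_edges` are hypothetical, it handles a single file, and it resolves only direct calls by simple name (no imports, inheritance, or method calls).

```python
import ast

# Hypothetical sample file used only for this illustration.
SOURCE = '''
def sanitize(value):
    # strip whitespace
    return value.strip()

def handler(request):
    return sanitize(request)
'''

def chunk_functions(source):
    """Return {name: source_text}, one chunk per top-level function or class."""
    tree = ast.parse(source)
    chunks = {}
    for node in tree.body:
        if isinstance(node, (ast.FunctionDef, ast.AsyncFunctionDef, ast.ClassDef)):
            # ast.unparse regenerates code from the tree, so comments are dropped.
            chunks[node.name] = ast.unparse(node)
    return chunks

def call_edges(source):
    """Return (caller, callee) pairs for direct calls to functions defined in this file."""
    tree = ast.parse(source)
    functions = [n for n in tree.body
                 if isinstance(n, (ast.FunctionDef, ast.AsyncFunctionDef))]
    defined = {f.name for f in functions}
    edges = set()
    for func in functions:
        for node in ast.walk(func):
            # Only calls written as a bare name, e.g. sanitize(...), are resolved here.
            if isinstance(node, ast.Call) and isinstance(node.func, ast.Name):
                if node.func.id in defined:
                    edges.add((func.name, node.func.id))
    return edges

chunks = chunk_functions(SOURCE)
edges = call_edges(SOURCE)
```

Here `chunks` holds one complete function per entry, with the comment removed, and `edges` contains the single pair `("handler", "sanitize")`. A production pipeline would additionally need cross-file import resolution, which this sketch leaves out.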
Why not standard Vector RAG?
Standard Vector RAG (Retrieval-Augmented Generation) is notoriously weak at coding tasks. Vector DBs rank results by "cosine similarity" between embeddings, which in practice means they find chunks of text that use similar vocabulary or cover a similar topic.
Security analysis doesn't care about vocabulary; it cares about Control Flow. If `Function A` calls `Function B` which calls vulnerable `Function C`, a Vector DB has no way to connect them: the relationship lives in the call chain, not in shared vocabulary.
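A toy example makes the limitation concrete. The sketch below uses a bag-of-words cosine similarity as a simplified stand-in for an embedding model; real embedding models capture more than token overlap, so treat this as an illustration of the failure mode rather than a benchmark. The query string and both function snippets are hypothetical.

```python
import math
import re
from collections import Counter

def tokens(text):
    """Lowercased identifier-like tokens with their counts."""
    return Counter(re.findall(r"[a-z_]+", text.lower()))

def cosine(a, b):
    """Cosine similarity between the bag-of-words vectors of two strings."""
    va, vb = tokens(a), tokens(b)
    dot = sum(va[t] * vb[t] for t in va)
    norm = (math.sqrt(sum(v * v for v in va.values()))
            * math.sqrt(sum(v * v for v in vb.values())))
    return dot / norm if norm else 0.0

QUERY = "sql injection in execute"
# Hypothetical entry point: it reaches the vulnerable code only through calls.
FUNCTION_A = "def export_report(rows): return build_rows(rows)"
# Hypothetical vulnerable sink: it shares words with the query.
FUNCTION_C = "def run_query(sql): return cursor.execute(sql)"

score_a = cosine(QUERY, FUNCTION_A)
score_c = cosine(QUERY, FUNCTION_C)
```

`score_c` is positive because the sink shares tokens with the query, while `score_a` is exactly 0.0 even though `export_report` may be the route by which an attacker reaches the sink. Similarity search surfaces the sink but not the path to it.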
Our Semantic Graph Index solves this. By explicitly mapping Nodes (Functions) to Edges (Calls), it lets the LangGraph agent deterministically traverse the call graph up or down, showing exactly which call paths connect one function to another and therefore how data can flow through your application.
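As a rough sketch of what traversing the call graph "up or down" means, the example below stores call edges in a plain dictionary and walks them in both directions. The graph contents and the function names `find_call_path` and `callers_of` are assumptions made for illustration; they are not the Semantic Graph Index's actual schema or API.

```python
from collections import deque

# Hypothetical call graph: caller -> list of callees.
CALLS = {
    "function_a": ["function_b"],
    "function_b": ["function_c"],
    "function_c": [],
    "function_d": [],
}

def find_call_path(calls, start, target):
    """Walk downward (breadth-first); return the shortest call chain or None."""
    queue = deque([[start]])
    seen = {start}
    while queue:
        path = queue.popleft()
        if path[-1] == target:
            return path
        for callee in calls.get(path[-1], []):
            if callee not in seen:
                seen.add(callee)
                queue.append(path + [callee])
    return None

def callers_of(calls, target):
    """Walk upward; return every function that can transitively reach target."""
    reverse = {}
    for caller, callees in calls.items():
        for callee in callees:
            reverse.setdefault(callee, []).append(caller)
    found, stack = set(), [target]
    while stack:
        for caller in reverse.get(stack.pop(), []):
            if caller not in found:
                found.add(caller)
                stack.append(caller)
    return found

path = find_call_path(CALLS, "function_a", "function_c")
upstream = callers_of(CALLS, "function_c")
```

`path` is the chain `function_a`, `function_b`, `function_c`, and `upstream` is the set of the first two. `function_d` has no edge to the vulnerable function, so `find_call_path(CALLS, "function_d", "function_c")` returns `None`, which is the kind of evidence a reachability argument relies on.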
Phase 2: Retrieval (RepoChatIndex)
The RepoChatIndex is TuringMind's retrieval interface: the live query layer that sits between the LangGraph Orchestrator and the Semantic Graph Index. With the codebase fully indexed as a graph, our Security Orchestrators can iteratively traverse it without pulling the entire repository into context.
The Iterative Investigation Loop
Instead of reading the whole repo, the Orchestrator uses the `turingmind_qna_tool` to iteratively traverse the Semantic Graph Index.
- It asks: "Find all usages of `left-pad`" and receives the specific file nodes.
- It follows up: "Get the implementation of the parent function calling it."
- It pivots: "Trace the inputs of this function back to an API endpoint."
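The three questions above can be sketched as a small loop over an in-memory graph. Everything here is hypothetical: the real `turingmind_qna_tool` interface is not shown in this document, so the node data, file paths, and the helpers `find_usages`, `parents_of`, `trace_to_endpoint`, and `investigate` are stand-ins chosen only to show the shape of the investigation.

```python
# Hypothetical stand-in for the graph index; all names and paths are illustrative.
NODES = {
    "pad_invoice_id": {"file": "billing/format.js", "uses": ["left-pad"], "endpoint": None},
    "render_invoice": {"file": "billing/render.js", "uses": [], "endpoint": None},
    "get_invoice": {"file": "api/invoices.js", "uses": [], "endpoint": "GET /invoices/{id}"},
}
CALLERS = {  # callee -> functions that call it
    "pad_invoice_id": ["render_invoice"],
    "render_invoice": ["get_invoice"],
}

def find_usages(package):
    """Step 1: which function nodes use the package?"""
    return sorted(name for name, node in NODES.items() if package in node["uses"])

def parents_of(function):
    """Step 2: which functions call this one?"""
    return CALLERS.get(function, [])

def trace_to_endpoint(function):
    """Step 3: follow caller edges upward until a node exposes an API endpoint."""
    frontier, seen = [[function]], {function}
    while frontier:
        path = frontier.pop(0)
        if NODES[path[-1]]["endpoint"]:
            return path
        for caller in parents_of(path[-1]):
            if caller not in seen:
                seen.add(caller)
                frontier.append(path + [caller])
    return None  # no endpoint reaches this function

def investigate(package):
    """Run the three steps for every usage and collect the evidence."""
    findings = []
    for usage in find_usages(package):
        path = trace_to_endpoint(usage)
        findings.append({
            "usage": usage,
            "file": NODES[usage]["file"],
            "reachable": path is not None,
            "path": path,
        })
    return findings

findings = investigate("left-pad")
```

Each step returns a handful of nodes rather than file contents, so the context handed to the LLM stays small no matter how large the repository is. A finding with `reachable` set to `False` would be a candidate false positive.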
With this architecture, our agents' conclusions rest on deterministic, highly scalable graph retrieval rather than on an LLM's guesses about code it has not seen. That lets them show, with a traceable call path as evidence, when a flagged CVE is a false positive, even across massive monorepos.