
sutraDB Theory

Interactive visualizations of database theory — from how graph and vector databases work individually, to the innovations that make sutraDB different.

Background: How Databases Work

How Graph Databases Work
SPARQL traverses RDF triples to answer multi-hop relationship queries. Watch a query find great-grandfathers step by step.
RDF · SPARQL
How Vector Databases Work
HNSW graph traversal finds nearest neighbors by hopping through hierarchical layers. See greedy search in action.
HNSW · Embeddings
Traditional Hybrid Databases
The clunky handoff when a vector database and graph database are bolted together through JSON documents.
System Boundaries · JSON Handoff
sutraDB: Unified Vector-Graph
Interleaved traversal — graph hops happen during the vector search, not after it. One graph, zero context switches.
sutraDB · Interleaved Traversal
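
The multi-hop great-grandfather query mentioned under How Graph Databases Work can be sketched in a few lines of Python. This is a toy illustration, not sutraDB code: the triples, the `hasFather` predicate, and the helper names `objects` and `follow` are all made up for the example, and only the paternal line is followed. In SPARQL 1.1 the same idea would be written as a property path sequence such as `:hasFather/:hasFather/:hasFather`.

```python
# Toy RDF-style triple store: (subject, predicate, object) tuples.
# All names and data here are invented for illustration.
TRIPLES = {
    ("alice", "hasFather", "bob"),
    ("bob", "hasFather", "carl"),
    ("carl", "hasFather", "dan"),
    ("alice", "hasMother", "eve"),
    ("eve", "hasFather", "frank"),
}


def objects(subject, predicate):
    """Return every o such that (subject, predicate, o) is in the store."""
    return {o for (s, p, o) in TRIPLES if s == subject and p == predicate}


def follow(start, predicate, hops):
    """Follow `predicate` exactly `hops` times from `start`, one hop per step."""
    frontier = {start}
    for _ in range(hops):
        frontier = {o for node in frontier for o in objects(node, predicate)}
    return frontier


# Three hasFather hops: father -> grandfather -> great-grandfather.
great_grandfathers = follow("alice", "hasFather", 3)
print(great_grandfathers)
```

Each loop iteration corresponds to one hop of the query: the frontier of matched nodes is replaced by everything reachable from it over the chosen predicate.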

Innovations

Ontochronology: Time as a Structural Axis
Time is not metadata on triples — it is a first-class index dimension. Query the complete world state at any moment with a single range scan.
Novel · TSPO Index · World State
Implementing HNSW in RDF
Vectors as typed RDF literals, HNSW neighbors as virtual triples. The vector index becomes the fourth index alongside SPO/POS/OSP, queryable by SPARQL.
Novel · sutra:f32vec · sutra:hnswNeighbor
Subgraph Indexing for SIMD
Auto-discover repeated subgraph patterns and flatten them into columnar pseudo-tables. Turns multi-hop joins into SIMD-accelerated column scans.
Novel · Pseudo-Tables · AVX2
PageRank Entry Points & Traversal Counters
Use PageRank to pick structurally important starting nodes, and runtime traversal counters to materialize adjacency lists for hot areas.
Novel · Adaptive Indexing
SPARQL Exit Conditions (UNTIL)
Extend SPARQL property paths with per-step exit predicates. Terminate traversal early instead of exhaustively scanning the graph.
Novel · SPARQL+
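
The early-termination idea behind SPARQL Exit Conditions (UNTIL) can be illustrated with a small breadth-first walk that checks an exit predicate at every step. This is only a sketch of the general technique under stated assumptions: the graph, the function name `traverse_until`, and the `should_exit` callback are hypothetical, and the actual UNTIL syntax and semantics in sutraDB are not shown here.

```python
from collections import deque

# Hypothetical adjacency list: node -> nodes reachable in one step of the path.
EDGES = {
    "a": ["b", "c"],
    "b": ["d"],
    "c": ["d"],
    "d": ["e"],
    "e": ["f"],
    "f": [],
}


def traverse_until(start, should_exit):
    """Breadth-first walk from `start`, testing `should_exit` on each node as
    it is dequeued. Returns (matching node or None, number of nodes visited)."""
    seen = {start}
    queue = deque([start])
    visited = 0
    while queue:
        node = queue.popleft()
        visited += 1
        if should_exit(node):
            # Stop early: nodes beyond this point are never scanned.
            return node, visited
        for nxt in EDGES.get(node, []):
            if nxt not in seen:
                seen.add(nxt)
                queue.append(nxt)
    return None, visited


hit, cost_with_exit = traverse_until("a", lambda n: n == "d")
_, cost_exhaustive = traverse_until("a", lambda n: False)
print(hit, cost_with_exit, cost_exhaustive)
```

On this toy graph the walk with an exit predicate touches four nodes, while the exhaustive walk touches all six; the gap grows with the size of the graph beyond the exit point.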

Architecture Deep Dives

Four-Index Architecture
HNSW as a first-class index type alongside SPO/POS/OSP. The query planner sees all four and picks the best access path.
SPO · POS · OSP · HNSW
IRI Interning & Content-Addressed RDF-star
Every IRI is interned to an 8-byte ID. Quoted triples get deterministic hash IDs. The result: single-instruction comparisons and SIMD-friendly index keys.
u64 TermIds · RDF-star
SIMD Acceleration
AVX2 and SSE intrinsics for vector distance functions (8 f32 lanes per 256-bit instruction) and columnar TermId scanning (4 u64 lanes per 256-bit instruction). And where SIMD doesn't help.
AVX2 · SSE · Distance Functions
HNSW Implementation Details
Tombstone deletion, multiple entry points, ephemeral rebuild with optional snapshots, and concurrent search design.
Tombstones · Concurrency
Cost-Based Query Planning
Structural heuristic weights, cardinality estimation, predicate pushdown, and how VECTOR_SIMILAR integrates with pattern reordering.
Query Planner · Cardinality
Pseudo-Table Discovery
Statistical significance testing for characteristic sets, segment-level zonemaps, and cliff steepness as a data health metric.
Zonemaps · Data Quality
VECTOR_SIMILAR & VECTOR_SCORE
Vector similarity search as SPARQL operators that participate in query planning, not a separate API with JSON handoffs.
SPARQL+ · HNSW
RDF-star Edge Annotation
Metadata on edges without reification. Vector embeddings on relationships. Content-addressed quoted triple IDs.
RDF-star · Edge Embeddings
Client-Side OWL Validation
Store first, reason second. The database accepts all triples. SDKs validate OWL constraints before insertion.
OWL · SDK
Health Diagnostics
AI-readable health reports with HNSW metrics, pseudo-table coverage, and actionable maintenance recommendations.
Agent-First · Observability
Rhizomatic Description Framework
RDF as a rhizome: why the Resource Description Framework is accidentally the most Deleuzian data structure ever built. No roots, no hierarchy, just connections.
Philosophy · RDF · Deleuze
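
As a rough illustration of the IRI Interning & Content-Addressed RDF-star entry above, the sketch below maps IRIs to integer IDs and derives a deterministic 64-bit ID for a quoted triple. Everything here is an assumption made for the example: the `Interner` class, the `quoted_triple_id` function, the sequential ID scheme, and the choice of truncated SHA-256 are not taken from sutraDB, which may use a different hash and ID layout.

```python
import hashlib
import struct


class Interner:
    """Maps each distinct IRI string to a small integer ID (fits in a u64)."""

    def __init__(self):
        self._ids = {}    # IRI string -> integer ID
        self._iris = []   # integer ID -> IRI string

    def intern(self, iri):
        term_id = self._ids.get(iri)
        if term_id is None:
            term_id = len(self._iris)
            self._ids[iri] = term_id
            self._iris.append(iri)
        return term_id

    def lookup(self, term_id):
        return self._iris[term_id]


def quoted_triple_id(s, p, o):
    """Deterministic 64-bit ID for a quoted triple, derived from its term IDs.
    Assumption: first 8 bytes of SHA-256 over the three packed u64 values."""
    packed = struct.pack("<QQQ", s, p, o)
    return int.from_bytes(hashlib.sha256(packed).digest()[:8], "little")


interner = Interner()
s = interner.intern("http://example.org/alice")
p = interner.intern("http://example.org/knows")
o = interner.intern("http://example.org/bob")

# The same triple always hashes to the same ID, so it never needs a lookup table.
edge_id = quoted_triple_id(s, p, o)
print(s, p, o, edge_id == quoted_triple_id(s, p, o))
```

Once terms are integers, equality checks become single integer comparisons, and index keys become fixed-width values that pack neatly into SIMD lanes.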