
sutraDB Theory

Interactive visualizations of database theory — from how graph and vector databases work individually, to the innovations that make sutraDB different.

Background: How Databases Work

How Graph Databases Work

SPARQL traverses RDF triples to answer multi-hop relationship queries. Watch a query find great-grandfathers step by step.
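As a rough illustration of the multi-hop idea, the sketch below walks a `fatherOf` edge three times over an in-memory set of triples. The predicate name, the sample individuals, and the helper functions are hypothetical and are not taken from sutraDB.

```python
# Hypothetical toy data: (subject, predicate, object) triples.
TRIPLES = {
    ("arthur", "fatherOf", "bram"),
    ("bram", "fatherOf", "caleb"),
    ("caleb", "fatherOf", "dana"),
    ("erik", "fatherOf", "finn"),
}


def subjects_of(triples, predicate, obj):
    """Return every subject s such that (s, predicate, obj) is a triple."""
    return {s for (s, p, o) in triples if p == predicate and o == obj}


def great_grandfathers(triples, person):
    """Follow fatherOf backwards three hops, one hop per loop iteration."""
    frontier = {person}
    for _ in range(3):
        frontier = {
            s
            for node in frontier
            for s in subjects_of(triples, "fatherOf", node)
        }
    return frontier
```

In SPARQL 1.1 the same three hops can be written as a sequence property path such as `?ggf :fatherOf/:fatherOf/:fatherOf ?person`, which the engine evaluates hop by hop in a similar way.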

How Vector Databases Work

HNSW graph traversal finds nearest neighbors by hopping through hierarchical layers. See greedy search in action.
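The greedy descent can be sketched in a few lines. This is a simplified version that assumes a candidate list of size one (real HNSW search keeps a larger candidate list, usually called `ef`), and the data layout (`vectors` as a dict, `layers` as a list of adjacency dicts ordered from top to base) is a hypothetical choice made for illustration.

```python
import math


def greedy_layer(vectors, neighbors, entry, query):
    """Within one layer, keep hopping to a closer neighbor until none exists."""
    current = entry
    best = math.dist(vectors[current], query)
    improved = True
    while improved:
        improved = False
        # The neighbor list of the node we started this pass from.
        for n in neighbors.get(current, ()):
            d = math.dist(vectors[n], query)
            if d < best:
                current, best, improved = n, d, True
    return current


def hnsw_search(vectors, layers, entry, query):
    """layers[0] is the sparse top layer; layers[-1] is the dense base layer."""
    current = entry
    for neighbors in layers:
        # The result of each layer becomes the entry point of the next one.
        current = greedy_layer(vectors, neighbors, current, query)
    return current
```

The sparse upper layers let the search cover long distances in a few hops, and the base layer refines the result locally.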

Traditional Hybrid Databases

The clunky handoff when a vector database and graph database are bolted together through JSON documents.

System Boundaries · JSON Handoff

sutraDB: Unified Vector-Graph

Interleaved traversal — graph hops happen during the vector search, not after it. One graph, zero context switches.

sutraDB · Interleaved Traversal

Innovations

Ontochronology: Time as a Structural Axis

Time is not metadata on triples — it is a first-class index dimension. Query the complete world state at any moment with a single range scan.

Novel · TSPO Index · World State
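To make the "time as the leading key" idea concrete, here is a minimal sketch of a time-first key order, assuming integer timestamps and keys of the form `(t, s, p, o)`. The class name `TimeLeadingIndex` and its `scan` method are hypothetical. How sutraDB actually encodes validity intervals in its TSPO index is not described here, so this only shows why a time-leading sort order turns a time query into one contiguous range.

```python
from bisect import bisect_left


class TimeLeadingIndex:
    """Toy index over (t, s, p, o) keys, sorted with time as the first field."""

    def __init__(self, quads):
        # Tuple ordering sorts by t first, then s, p, o.
        self.keys = sorted(quads)

    def scan(self, t_start, t_end):
        """Return all keys with t_start <= t < t_end as one contiguous slice."""
        # A 1-tuple (t,) sorts before every 4-tuple that starts with t.
        lo = bisect_left(self.keys, (t_start,))
        hi = bisect_left(self.keys, (t_end,))
        return self.keys[lo:hi]
```

Because all keys for a given time are adjacent, reading them requires two binary searches and one sequential read rather than a filter over the whole store.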

Implementing HNSW in RDF

Vectors as typed RDF literals, HNSW neighbors as virtual triples. The vector index becomes the fourth index alongside SPO/POS/OSP, queryable by SPARQL.

Novel · sutra:f32vec · sutra:hnswNeighbor

Subgraph Indexing for SIMD

Auto-discover repeated subgraph patterns and flatten them into columnar pseudo-tables. Turns multi-hop joins into SIMD-accelerated column scans.

Novel · Pseudo-Tables · AVX2
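The flattening step can be sketched for the simplest case: subjects that share exactly the same set of predicates are grouped and laid out column by column. The function name `build_pseudo_tables`, the `min_rows` threshold, and the assumption that each predicate is single-valued per subject are all hypothetical simplifications. Multi-hop patterns and the SIMD scan itself are not shown.

```python
from collections import defaultdict


def build_pseudo_tables(triples, min_rows=2):
    """Group subjects by predicate set and flatten each group into columns."""
    by_subject = defaultdict(dict)
    for s, p, o in triples:
        by_subject[s][p] = o  # assumption: one object per (subject, predicate)

    # Subjects with an identical predicate set share one "shape".
    groups = defaultdict(list)
    for s, props in by_subject.items():
        groups[frozenset(props)].append(s)

    tables = {}
    for shape, subjects in groups.items():
        if len(subjects) < min_rows:
            continue  # too rare to be worth a table
        columns = {"subject": sorted(subjects)}
        for p in sorted(shape):
            # One contiguous list per predicate, aligned with the subject column.
            columns[p] = [by_subject[s][p] for s in columns["subject"]]
        tables[shape] = columns
    return tables
```

Once the values of one predicate sit in a contiguous column, a filter over that predicate becomes a linear scan over an array, which is the access pattern that vector instructions handle well.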

PageRank Entry Points & Traversal Counters

Use PageRank to pick structurally important starting nodes, and runtime traversal counters to materialize adjacency lists for hot areas.

Novel · Adaptive Indexing
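The PageRank half of this idea can be sketched with plain power iteration. The damping factor of 0.85, the fixed iteration count, and the helper `entry_points` are assumptions chosen for illustration. The runtime traversal counters are not modeled here.

```python
def pagerank(edges, damping=0.85, iterations=50):
    """Power-iteration PageRank over a list of (source, target) edges."""
    nodes = sorted({n for edge in edges for n in edge})
    out = {n: [] for n in nodes}
    for src, dst in edges:
        out[src].append(dst)

    rank = {n: 1.0 / len(nodes) for n in nodes}
    for _ in range(iterations):
        # Rank held by nodes without out-edges is spread evenly over all nodes.
        dangling = sum(rank[n] for n in nodes if not out[n])
        base = (1.0 - damping + damping * dangling) / len(nodes)
        nxt = {n: base for n in nodes}
        for src in nodes:
            for dst in out[src]:
                nxt[dst] += damping * rank[src] / len(out[src])
        rank = nxt
    return rank


def entry_points(edges, k):
    """Pick the k highest-ranked nodes, breaking ties by node name."""
    rank = pagerank(edges)
    return sorted(rank, key=lambda n: (-rank[n], n))[:k]
```

Nodes that many other nodes point to accumulate rank, so starting a search from them tends to reach well-connected regions of the graph quickly.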

SPARQL Exit Conditions (UNTIL)

Extend SPARQL property paths with per-step exit predicates. Terminate traversal early instead of exhaustively scanning the graph.
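The effect of an exit predicate can be illustrated with a breadth-first walk that checks a condition at every step. The UNTIL syntax and its exact semantics are not shown here, so the function `traverse_until` and its "stop at the first matching node" behavior are assumptions made for this sketch.

```python
from collections import deque


def traverse_until(edges, start, until):
    """Breadth-first walk that stops at the first node satisfying `until`.

    Returns (matching node or None, nodes visited in order).
    """
    seen = {start}
    queue = deque([start])
    visited = []
    while queue:
        node = queue.popleft()
        visited.append(node)
        if until(node):
            # Exit predicate fired: the rest of the graph is never touched.
            return node, visited
        for nxt in edges.get(node, ()):
            if nxt not in seen:
                seen.add(nxt)
                queue.append(nxt)
    return None, visited
```

Without the predicate the walk visits every reachable node. With it, the number of visited nodes depends on how early the condition is met.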

Architecture Deep Dives

Four-Index Architecture

HNSW as a first-class index type alongside SPO/POS/OSP. The query planner sees all four and picks the best access path.

IRI Interning & Content-Addressed RDF-star

Every IRI becomes 8 bytes. Quoted triples get deterministic hash IDs. Single-instruction comparison, SIMD-friendly index keys.

u64 TermIds · RDF-star
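A minimal sketch of both ideas follows. The `Interner` class, the sequential ID assignment, and the use of BLAKE2b as the hash are assumptions for illustration. The hash function sutraDB uses, and how it keeps hashed IDs from colliding with interned ones, are not stated here.

```python
import hashlib
import struct


class Interner:
    """Maps each distinct IRI string to a small integer ID and back."""

    def __init__(self):
        self._ids = {}
        self._iris = []

    def intern(self, iri):
        term_id = self._ids.get(iri)
        if term_id is None:
            term_id = len(self._iris)  # assumption: IDs assigned sequentially
            self._ids[iri] = term_id
            self._iris.append(iri)
        return term_id

    def lookup(self, term_id):
        return self._iris[term_id]


def quoted_triple_id(s, p, o):
    """Derive a 64-bit ID for a quoted triple from its three term IDs."""
    payload = struct.pack("<QQQ", s, p, o)  # three little-endian u64 values
    digest = hashlib.blake2b(payload, digest_size=8).digest()
    return int.from_bytes(digest, "little")
```

Because the quoted-triple ID depends only on its three component IDs, the same quoted triple always maps to the same 8-byte key without a lookup table.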

SIMD Acceleration

AVX2 and SSE intrinsics for vector distance functions (8 f32 lanes per 256-bit AVX2 register) and columnar TermId scanning (4 u64 lanes per register), plus the cases where SIMD doesn't help.

AVX2 · SSE · Distance Functions

HNSW Implementation Details

Tombstone deletion, multiple entry points, ephemeral rebuild with optional snapshots, and concurrent search design.

Tombstones · Concurrency

Cost-Based Query Planning

Structural heuristic weights, cardinality estimation, predicate pushdown, and how VECTOR_SIMILAR integrates with pattern reordering.

Query Planner · Cardinality

Pseudo-Table Discovery

Statistical significance testing for characteristic sets, segment-level zonemaps, cliff steepness as a data health metric.

Zonemaps · Data Quality

VECTOR_SIMILAR & VECTOR_SCORE

Vector similarity search as SPARQL operators that participate in query planning, not a separate API with JSON handoffs.

RDF-star Edge Annotation

Metadata on edges without reification. Vector embeddings on relationships. Content-addressed quoted triple IDs.

RDF-star · Edge Embeddings

Client-Side OWL Validation

Store first, reason second. The database accepts all triples. SDKs validate OWL constraints before insertion.

Health Diagnostics

AI-readable health reports with HNSW metrics, pseudo-table coverage, and actionable maintenance recommendations.

Agent-First · Observability

Rhizomatic Description Framework

RDF as a rhizome: why the Resource Description Framework is accidentally the most Deleuzian data structure ever built. No roots, no hierarchy, just connections.

Philosophy · RDF · Deleuze