The .sdb File Format

Why a New Format?

Existing RDF serialization formats (Turtle, N-Triples, JSON-LD, RDF/XML) are text-based exchange formats. They can represent triples, but they cannot represent:

Pre-built indexes (SPO, POS, OSP) for fast query execution
HNSW vector indexes with their graph structure and parameters
IRI lookup table (maps full IRI strings like http://example.org/Tokyo to compact 8-byte IDs for fast indexing)
Vector predicate declarations and metadata

The .sdb format stores all of this in a single file. It is not a serialization format; it is a database file, like SQLite's .db or .sqlite.

Import/export still uses standard formats: you import data into a .sdb file using standard formats (N-Triples, Turtle), and you export using standard formats. The .sdb file is the live database, not an interchange format.

What's Inside a .sdb File

Component	Purpose	Structure
SPO Index	Subject → Predicate → Object lookups	Sorted B-tree of 24-byte keys (3 × u64, big-endian)
POS Index	Predicate → Object → Subject (type lookups, vector reverse traversal)	Same structure, different key order
OSP Index	Object → Subject → Predicate (reverse traversal)	Same structure, different key order
IRI Lookup Table	Maps full IRI strings to compact 8-byte IDs and back	Forward map (string → ID) + reverse map (ID → string)
HNSW Index(es)	One per declared vector predicate	Multi-layer proximity graph with node vectors, neighbor lists, and metadata
Metadata	Vector predicate declarations, index parameters (M, ef_construction, dimensions)	Key-value store
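
To make the index layout in the table above concrete, here is a minimal Python sketch. It is not SutraDB's actual implementation. It assumes IDs are assigned sequentially from 0 and uses a sorted list in place of the B-tree. The names IriTable, encode_key, index_keys, and prefix_scan are hypothetical, as is the example.org vocabulary. The sketch packs three u64 IDs into a 24-byte big-endian key and shows why big-endian matters: byte-wise order then matches numeric order, so all triples that share a leading ID sit next to each other and can be found with a prefix scan.

```python
import bisect
import struct


class IriTable:
    """Hypothetical in-memory stand-in for the IRI lookup table."""

    def __init__(self):
        self.forward = {}  # forward map: IRI string -> ID
        self.reverse = []  # reverse map: ID -> IRI string (list index is the ID)

    def intern(self, iri):
        """Return the ID for an IRI, assigning the next free ID on first sight."""
        ident = self.forward.get(iri)
        if ident is None:
            ident = len(self.reverse)
            self.forward[iri] = ident
            self.reverse.append(iri)
        return ident

    def resolve(self, ident):
        return self.reverse[ident]


def encode_key(a, b, c):
    """Pack three u64 IDs into one 24-byte key (3 x u64, big-endian)."""
    return struct.pack(">QQQ", a, b, c)


def index_keys(s, p, o):
    """The same triple, keyed once per index in that index's component order."""
    return {
        "SPO": encode_key(s, p, o),
        "POS": encode_key(p, o, s),
        "OSP": encode_key(o, s, p),
    }


def prefix_scan(sorted_keys, *prefix_ids):
    """Return every decoded key that starts with the given leading IDs."""
    prefix = struct.pack(">" + "Q" * len(prefix_ids), *prefix_ids)
    start = bisect.bisect_left(sorted_keys, prefix)
    matches = []
    for key in sorted_keys[start:]:
        if not key.startswith(prefix):
            break  # keys are sorted, so the matching run has ended
        matches.append(struct.unpack(">QQQ", key))
    return matches


EX = "http://example.org/"
table = IriTable()
triples = [
    (EX + "Tokyo", EX + "locatedIn", EX + "Japan"),
    (EX + "Kyoto", EX + "locatedIn", EX + "Japan"),
    (EX + "Tokyo", EX + "capitalOf", EX + "Japan"),
]

# Build two of the three indexes as sorted lists of 24-byte keys.
spo, pos = [], []
for triple in triples:
    s, p, o = (table.intern(term) for term in triple)
    keys = index_keys(s, p, o)
    spo.append(keys["SPO"])
    pos.append(keys["POS"])
spo.sort()
pos.sort()

tokyo = table.intern(EX + "Tokyo")
located_in = table.intern(EX + "locatedIn")
japan = table.intern(EX + "Japan")

# SPO answers "everything about Tokyo" with a scan on the subject prefix.
for s, p, o in prefix_scan(spo, tokyo):
    print(table.resolve(s), table.resolve(p), table.resolve(o))

# POS answers "which subjects are locatedIn Japan" with a two-ID prefix.
for p, o, s in prefix_scan(pos, located_in, japan):
    print(table.resolve(s))
```

With little-endian packing, the key for ID 256 would sort before the key for ID 1, and the prefix scan would no longer return a contiguous run. That is one plausible reason for the big-endian choice listed in the table.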

Comparison with Other Formats

vs. RDF Serialization Formats

Format	Type	Indexes	Vectors	Queryable	Use Case
.sdb	Database file	Yes (3)	Yes (HNSW)	Directly	Live database, serverless access
.ttl (Turtle)	Text exchange	No	No	Must parse first	Human-readable data exchange
.nt (N-Triples)	Text exchange	No	No	Must parse first	Streaming import, line-by-line
.jsonld (JSON-LD)	Text exchange	No	No	Must parse first	Web APIs, JavaScript
.rdf (RDF/XML)	Text exchange	No	No	Must parse first	Legacy XML systems

vs. Other Database Files

Format	Data Model	Vectors	Query Language
.sdb (SutraDB)	RDF-star triples	Native HNSW	SPARQL + vector extensions
.db / .sqlite (SQLite)	Relational tables	No	SQL
TDB2 (Jena)	RDF quads	No	SPARQL (Jena API or command-line tools; Fuseki server for HTTP access)
.oxigraph (Oxigraph)	RDF 1.2	No	SPARQL

The .sdb Name

The extension .sdb stands for Sutra Database. We chose a distinct extension rather than reusing .db or .rdf because:

It contains more than standard RDF: pre-built triple indexes, HNSW vector indexes, and an IRI lookup table
It should not be confused with SQLite (.db) or RDF serialization (.rdf)
File managers and tools can associate .sdb specifically with SutraDB
It signals to users that this is a superset format — standard RDF data plus native vector infrastructure

Working with .sdb Files

# Create a new database
sutra import data.nt --data my_graph.sdb

# Query it directly (serverless)
sutra query "SELECT * WHERE { ?s ?p ?o } LIMIT 10" --data my_graph.sdb

# Start a server for HTTP access
sutra serve --port 3030 --data my_graph.sdb

# Copy/backup: it's just a file
cp my_graph.sdb backup_2024.sdb

# Export back to standard RDF
sutra export --format turtle --data my_graph.sdb > export.ttl
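
Because a backup is just a file copy, it is easy to script. The following Python sketch makes a timestamped copy next to the original. The helper name backup_sdb and the naming scheme are hypothetical. The sketch also assumes that no process is writing to the file during the copy, which this page does not address.

```python
import shutil
import time
from pathlib import Path


def backup_sdb(db_path, backup_dir=None):
    """Copy a .sdb file to <stem>_<YYYYmmdd_HHMMSS>.sdb and return the new path.

    Assumption: nothing is writing to the database while the copy runs.
    """
    src = Path(db_path)
    if not src.is_file():
        raise FileNotFoundError(src)
    # Default to the directory that holds the original file.
    dest_dir = Path(backup_dir) if backup_dir is not None else src.parent
    dest_dir.mkdir(parents=True, exist_ok=True)
    stamp = time.strftime("%Y%m%d_%H%M%S")
    dest = dest_dir / f"{src.stem}_{stamp}{src.suffix}"
    shutil.copy2(src, dest)  # copy2 also preserves file timestamps
    return dest
```

Calling backup_sdb("my_graph.sdb") would produce a file such as my_graph_20240101_120000.sdb in the same directory.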