Benchmarks

Criterion benchmarks run daily and on each release. Charts show daily averages with release milestones marked. All numbers are from GitHub Actions runners (ubuntu-latest).


sutra-core — Triple Store — Lower is better

The storage engine: how fast can we insert, look up, and scan triples? These benchmarks measure the SPO/POS/OSP index performance, IRI interning, pseudo-table discovery (auto-detected relational structure), and SIMD column scans. All values are nanoseconds per operation.
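The IRI interning mentioned above can be pictured as a dictionary that maps each string to a dense u64 id, so the indexes compare integers instead of strings. The sketch below is a minimal conceptual model, not SutraDB's actual implementation; all names are illustrative.

```rust
use std::collections::HashMap;

/// Minimal string -> u64 interner (conceptual sketch, not SutraDB's API).
struct TermDictionary {
    ids: HashMap<String, u64>,
    terms: Vec<String>, // id -> term, for reverse lookup
}

impl TermDictionary {
    fn new() -> Self {
        TermDictionary { ids: HashMap::new(), terms: Vec::new() }
    }

    /// Return the existing id for `iri`, or assign the next dense id.
    fn intern(&mut self, iri: &str) -> u64 {
        if let Some(&id) = self.ids.get(iri) {
            return id;
        }
        let id = self.terms.len() as u64;
        self.ids.insert(iri.to_string(), id);
        self.terms.push(iri.to_string());
        id
    }

    /// Reverse mapping, needed when query results are serialized back to IRIs.
    fn resolve(&self, id: u64) -> Option<&str> {
        self.terms.get(id as usize).map(|s| s.as_str())
    }
}

fn main() {
    let mut dict = TermDictionary::new();
    let a = dict.intern("http://example.org/alice");
    let b = dict.intern("http://example.org/bob");
    assert_eq!(dict.intern("http://example.org/alice"), a); // interning is idempotent
    assert_ne!(a, b);
    assert_eq!(dict.resolve(b), Some("http://example.org/bob"));
}
```

This is why term_dictionary_intern_10k matters: every insert and lookup pays the interning cost once, and everything downstream works on u64s.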

What each benchmark measures
triple_insert_single
Insert one triple into an empty store. Measures raw index write speed.
triple_bulk_insert/N
Insert N triples (100/1K/10K) into a fresh store. Measures throughput scaling.
triple_lookup_subject/graph_size/N
Find all triples for a specific subject in a graph of N nodes. Like SELECT * FROM table WHERE id = ? — this is the star-query pattern, the most common graph lookup.
triple_lookup_predicate_5k
Find all triples with a given predicate across 5K triples. Like scanning a column in SQL.
triple_contains_10k
Check if a specific triple exists in a 10K-triple store. Point lookup — should be O(1).
triple_remove_single
Remove one triple from a 1K-triple store. Measures deletion overhead.
adjacency_star_500
Get the adjacency list for a node connected to 500 neighbors. The core graph traversal primitive.
term_dictionary_intern_10k
Intern 10K unique IRIs into the dictionary. Measures the string→u64 mapping that makes everything else fast.
pseudotable_discover/N
Auto-discover relational (table-like) structure from N nodes that share the same predicate pattern. Like schema inference on a schemaless store.
pseudotable_scan_eq_5k
Scan a pseudo-table column for a specific value across 5K rows. Equivalent to WHERE category = ? in SQL. Uses SIMD when available.
reverse_lookup_10k
Find all subjects that reference a specific object across 10K triples. Like a reverse foreign key lookup in SQL — uses the OSP index.
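The star-query pattern behind triple_lookup_subject can be sketched with a sorted subject-first index: because triples are ordered subject-first, all triples for one subject are contiguous and a single range scan answers the query. This is a conceptual sketch under assumed names, not SutraDB's actual index code; the OSP index used by reverse_lookup_10k is the same trick with object-first ordering.

```rust
use std::collections::BTreeSet;

// Triples as (subject, predicate, object) ids after interning.
type Triple = (u64, u64, u64);

/// Conceptual SPO index: a sorted set keyed subject-first.
struct SpoIndex {
    spo: BTreeSet<Triple>,
}

impl SpoIndex {
    fn new() -> Self {
        SpoIndex { spo: BTreeSet::new() }
    }

    fn insert(&mut self, t: Triple) {
        self.spo.insert(t);
    }

    /// Star query: all (predicate, object) pairs for one subject,
    /// answered with a single contiguous range scan.
    fn lookup_subject(&self, s: u64) -> Vec<(u64, u64)> {
        self.spo
            .range((s, 0, 0)..=(s, u64::MAX, u64::MAX))
            .map(|&(_, p, o)| (p, o))
            .collect()
    }
}

fn main() {
    let mut idx = SpoIndex::new();
    idx.insert((1, 10, 100)); // subject 1, two properties
    idx.insert((1, 11, 200));
    idx.insert((2, 10, 100)); // a different subject
    assert_eq!(idx.lookup_subject(1), vec![(10, 100), (11, 200)]);
    assert!(idx.lookup_subject(99).is_empty());
}
```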
Benchmark | ns/iter | ±variance

sutra-hnsw — Vector Index — Lower is better

The HNSW approximate nearest neighbor index: how fast can we insert vectors and search for similar ones, and how does the index behave under deletions and at high dimensions? These are the numbers that matter if you're storing embeddings.

What each benchmark measures
hnsw_insert/Dd/N
Insert N vectors of D dimensions. Measures index build time. 128d is typical for small models, 384d for sentence-transformers.
hnsw_search/nN_Dd_efE/k10
Search for 10 nearest neighbors in an index of N vectors at D dimensions, with search beam width E. Higher ef = more accurate but slower. This is the primary vector search operation.
hnsw_search_k/5k_128d/K
Vary k (1 to 100) on a fixed 5K-vector index. Shows how returning more results affects latency.
hnsw_delete_then_search/P%
Search after deleting P% of vectors (tombstoned, not removed). Measures search degradation under deletions — this is why the health report tracks tombstone ratio.
hnsw_bulk_insert/128d/N
Bulk insert N vectors at once (vs one-by-one). Measures batch optimization.
hnsw_metrics/Cosine|Euclidean|DotProduct
Search with each distance metric on the same data. Shows the cost difference between similarity measures.
hnsw_high_dim/search_500n_Dd
Search at 768d and 1536d — real embedding sizes from OpenAI, Cohere, etc. Measures how high dimensions affect search latency.
hnsw_high_dim/insert_100n_Dd
Insert at 768d and 1536d. High-dimensional insertion is significantly more expensive.
hnsw_recall_at_10/1k_128d
Quality benchmark, not speed: what fraction of the true top-10 nearest neighbors does HNSW actually find? Computed by brute-force comparison. Measures search accuracy.
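The three metrics compared by hnsw_metrics can be written out directly; the extra work in cosine (two norms plus a division, unless vectors are pre-normalized) is where the cost difference comes from. Function names here are illustrative, not sutra-hnsw's API.

```rust
/// Dot product similarity: one multiply-add per dimension.
fn dot(a: &[f32], b: &[f32]) -> f32 {
    a.iter().zip(b).map(|(x, y)| x * y).sum()
}

/// Euclidean distance: one subtract, one multiply-add per dimension, plus a sqrt.
fn euclidean(a: &[f32], b: &[f32]) -> f32 {
    a.iter().zip(b).map(|(x, y)| (x - y) * (x - y)).sum::<f32>().sqrt()
}

/// Cosine distance: a dot product plus two norms and a division,
/// which is the extra cost relative to plain dot product.
fn cosine_distance(a: &[f32], b: &[f32]) -> f32 {
    let na = dot(a, a).sqrt();
    let nb = dot(b, b).sqrt();
    1.0 - dot(a, b) / (na * nb)
}

fn main() {
    let a = [1.0, 0.0, 0.0];
    let b = [0.0, 1.0, 0.0];
    assert_eq!(dot(&a, &b), 0.0);
    assert!((euclidean(&a, &b) - 2f32.sqrt()).abs() < 1e-6);
    assert!((cosine_distance(&a, &b) - 1.0).abs() < 1e-6); // orthogonal vectors
}
```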
Benchmark | ns/iter | ±variance

Search Quality — Higher is better

Recall measures how many of the true nearest neighbors HNSW actually finds, compared to brute-force. 1.0 = perfect, meaning all 10 true nearest neighbors were returned.
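The recall computation described above reduces to a set intersection against the brute-force ground truth. A minimal sketch (function name illustrative, not the benchmark's actual code):

```rust
use std::collections::HashSet;

/// Recall@k: fraction of the true top-k ids that the ANN search returned.
fn recall_at_k(true_top: &[u64], ann_top: &[u64], k: usize) -> f64 {
    let truth: HashSet<_> = true_top.iter().take(k).collect();
    let hits = ann_top.iter().take(k).filter(|id| truth.contains(id)).count();
    hits as f64 / k as f64
}

fn main() {
    let exact = [1, 2, 3, 4, 5, 6, 7, 8, 9, 10]; // brute-force ground truth
    let approx = [1, 2, 3, 4, 5, 6, 7, 8, 11, 12]; // HNSW missed two of the true top-10
    assert_eq!(recall_at_k(&exact, &approx, 10), 0.8);
}
```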

Benchmark | Recall@10

sutra-sparql — Query Engine — Lower is better

End-to-end SPARQL query performance: parsing, planning, and execution. Includes pure graph traversal, vector similarity search, and the combined graph+vector queries that are SutraDB's core differentiator. All values are nanoseconds per query.

What each benchmark measures
sparql_parse/simple_select
Parse a basic SELECT ?s ?p ?o WHERE { ?s ?p ?o } LIMIT 10. Measures parser overhead.
sparql_parse/complex_with_filter
Parse a query with multiple patterns, FILTER, ORDER BY, and LIMIT. Worst-case parse time.
sparql_parse/vector_similar
Parse a VECTOR_SIMILAR query (SutraDB extension). Measures the cost of parsing vector literals.
sparql_chain_traversal/N_nodes/H_hops
Traverse a chain graph (node→node→node...) for H hops across N nodes. The fundamental graph walk. 4 hops on 200 nodes = following 4 edges.
sparql_star_join/N_leaves/C_cats
Star join: from a center node, fan out to N leaves, then join on type and category. Like a SQL SELECT ... JOIN ... WHERE type = 'Leaf'.
sparql_vector_search/docs/N
VECTOR_SIMILAR search across N documents with type filtering. The basic "find similar documents" query.
sparql_graph_vector/cite_chain/N
Combined graph+vector: find papers similar to a query vector that also cite other papers. This is the query pattern that justifies putting vectors and graphs in the same database.
sparql_optional_1k
OPTIONAL (left outer join) across 1K nodes where half have the optional property. Like LEFT JOIN in SQL.
sparql_filter_gt_2k
FILTER with integer comparison across 2K items. Like WHERE score > 80 in SQL.
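The chain traversal measured by sparql_chain_traversal is, at its core, repeated adjacency lookups on interned node ids. A conceptual sketch, assuming one outgoing edge per node as in the benchmark's chain graph (names are illustrative, not the engine's internals):

```rust
use std::collections::HashMap;

/// Follow one predicate's edges for `hops` steps from `start`.
/// Returns None if the chain ends before all hops are taken.
fn traverse(edges: &HashMap<u64, u64>, start: u64, hops: usize) -> Option<u64> {
    let mut node = start;
    for _ in 0..hops {
        node = *edges.get(&node)?;
    }
    Some(node)
}

fn main() {
    // Chain graph: 0 -> 1 -> 2 -> 3 -> 4
    let edges: HashMap<u64, u64> = (0u64..4).map(|i| (i, i + 1)).collect();
    assert_eq!(traverse(&edges, 0, 4), Some(4)); // 4 hops = following 4 edges
    assert_eq!(traverse(&edges, 3, 4), None); // runs off the end of the chain
}
```

A real query plan also carries variable bindings at each hop, so the benchmark's per-hop cost includes join bookkeeping, not just the pointer chase shown here.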
Benchmark | ns/iter | ±variance
View full benchmark history on GitHub
View latest raw results