Benchmarks
Criterion benchmarks run daily and on each release. Charts show daily averages with release milestones marked. All numbers are from GitHub Actions runners (ubuntu-latest).
sutra-core — Triple Store — Lower is better
The storage engine: how fast can we insert, look up, and scan triples? These benchmarks measure the SPO/POS/OSP index performance, IRI interning, pseudo-table discovery (auto-detected relational structure), and SIMD column scans. All values are nanoseconds per operation.
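To make the terminology concrete, here is a minimal sketch of IRI interning, the string→u64 mapping mentioned above (illustrative Python only, not SutraDB's actual implementation):

```python
# Illustrative sketch of IRI interning: each unique IRI string is mapped
# to a small integer ID exactly once, so the triple indexes compare
# integers instead of strings on every lookup.
class TermDictionary:
    def __init__(self):
        self._ids = {}      # IRI string -> integer ID
        self._terms = []    # integer ID -> IRI string

    def intern(self, iri: str) -> int:
        """Return the existing ID for `iri`, or assign the next free one."""
        if iri not in self._ids:
            self._ids[iri] = len(self._terms)
            self._terms.append(iri)
        return self._ids[iri]

    def resolve(self, term_id: int) -> str:
        return self._terms[term_id]

d = TermDictionary()
a = d.intern("http://example.org/alice")
b = d.intern("http://example.org/alice")  # same string, same ID
assert a == b and d.resolve(a) == "http://example.org/alice"
```

The term_dictionary_intern_10k benchmark below measures this mapping under load; everything downstream (index writes, joins, scans) operates on the integer IDs.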
What each benchmark measures
- triple_insert_single
- Insert one triple into an empty store. Measures raw index write speed.
- triple_bulk_insert/N
- Insert N triples (100/1K/10K) into a fresh store. Measures throughput scaling.
- triple_lookup_subject/graph_size/N
- Find all triples for a specific subject in a graph of N nodes. Like `SELECT * FROM table WHERE id = ?` — this is the star-query pattern, the most common graph lookup.
- triple_lookup_predicate_5k
- Find all triples with a given predicate across 5K triples. Like scanning a column in SQL.
- triple_contains_10k
- Check if a specific triple exists in a 10K-triple store. Point lookup — should be O(1).
- triple_remove_single
- Remove one triple from a 1K-triple store. Measures deletion overhead.
- adjacency_star_500
- Get the adjacency list for a node connected to 500 neighbors. The core graph traversal primitive.
- term_dictionary_intern_10k
- Intern 10K unique IRIs into the dictionary. Measures the string→u64 mapping that makes everything else fast.
- pseudotable_discover/N
- Auto-discover relational (table-like) structure from N nodes that share the same predicate pattern. Like schema inference on a schemaless store.
- pseudotable_scan_eq_5k
- Scan a pseudo-table column for a specific value across 5K rows. Equivalent to `WHERE category = ?` in SQL. Uses SIMD when available.
- reverse_lookup_10k
- Find all subjects that reference a specific object across 10K triples. Like a reverse foreign key lookup in SQL — uses the OSP index.
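The lookup benchmarks above all come down to choosing the right index ordering. A minimal sketch of the SPO/POS/OSP pattern (illustrative Python, not SutraDB's Rust implementation):

```python
from collections import defaultdict

# Illustrative three-index triple store: the same triples are kept in
# three orderings so every lookup shape has a direct access path.
# SPO serves subject lookups, POS serves predicate scans, and OSP
# serves reverse (object -> subject) lookups.
class TripleStore:
    def __init__(self):
        self.spo = defaultdict(set)  # subject   -> {(predicate, object)}
        self.pos = defaultdict(set)  # predicate -> {(object, subject)}
        self.osp = defaultdict(set)  # object    -> {(subject, predicate)}

    def insert(self, s, p, o):
        self.spo[s].add((p, o))
        self.pos[p].add((o, s))
        self.osp[o].add((s, p))

    def lookup_subject(self, s):      # cf. triple_lookup_subject
        return [(s, p, o) for (p, o) in self.spo[s]]

    def reverse_lookup(self, o):      # cf. reverse_lookup_10k (OSP index)
        return [(s, p, o) for (s, p) in self.osp[o]]

    def contains(self, s, p, o):      # cf. triple_contains_10k: point lookup
        return (p, o) in self.spo[s]

store = TripleStore()
store.insert("alice", "knows", "bob")
store.insert("carol", "knows", "bob")
assert store.contains("alice", "knows", "bob")
assert {s for (s, _, _) in store.reverse_lookup("bob")} == {"alice", "carol"}
```

Each query shape hits the index whose key order matches it, which is why subject lookups, predicate scans, and reverse lookups can all stay near-constant per result.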
| Benchmark | ns/iter | ±variance |
|---|---|---|
sutra-hnsw — Vector Index — Lower is better
The HNSW approximate nearest neighbor index: how fast can we insert vectors and search for similar ones, and how well does the index hold up under deletions and high dimensionality? These are the numbers that matter if you're storing embeddings.
What each benchmark measures
- hnsw_insert/Dd/N
- Insert N vectors of D dimensions. Measures index build time. 128d is typical for small models, 384d for sentence-transformers.
- hnsw_search/nN_Dd_efE/k10
- Search for 10 nearest neighbors in an index of N vectors at D dimensions, with search beam width E. Higher ef = more accurate but slower. This is the primary vector search operation.
- hnsw_search_k/5k_128d/K
- Vary k (1 to 100) on a fixed 5K-vector index. Shows how returning more results affects latency.
- hnsw_delete_then_search/P%
- Search after deleting P% of vectors (tombstoned, not removed). Measures search degradation under deletions — this is why the health report tracks tombstone ratio.
- hnsw_bulk_insert/128d/N
- Bulk insert N vectors at once (vs one-by-one). Measures batch optimization.
- hnsw_metrics/Cosine|Euclidean|DotProduct
- Search with each distance metric on the same data. Shows the cost difference between similarity measures.
- hnsw_high_dim/search_500n_Dd
- Search at 768d and 1536d — real embedding sizes from OpenAI, Cohere, etc. Measures how high dimensions affect search latency.
- hnsw_high_dim/insert_100n_Dd
- Insert at 768d and 1536d. High-dimensional insertion is significantly more expensive.
- hnsw_recall_at_10/1k_128d
- Quality benchmark, not speed: what fraction of the true top-10 nearest neighbors does HNSW actually find? Computed by brute-force comparison. Measures search accuracy.
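For context on hnsw_metrics, the three distance functions can be sketched as follows (plain Python for clarity; the real implementation vectorizes these loops with SIMD):

```python
import math

# The three metrics compared by hnsw_metrics, on the same pair of vectors.
# Dot product is the cheapest: one pass, no sqrt. Euclidean adds a sqrt.
# Cosine adds two norm computations unless vectors are pre-normalized,
# which is where the cost difference between metrics comes from.
def dot(a, b):
    return sum(x * y for x, y in zip(a, b))

def euclidean(a, b):
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

def cosine_distance(a, b):
    return 1.0 - dot(a, b) / (math.sqrt(dot(a, a)) * math.sqrt(dot(b, b)))

a, b = [1.0, 0.0, 1.0], [1.0, 1.0, 0.0]
print(dot(a, b))                          # 1.0
print(round(euclidean(a, b), 4))          # 1.4142
print(round(cosine_distance(a, b), 4))    # 0.5
```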
| Benchmark | ns/iter | ±variance |
|---|---|---|
Search Quality — Higher is better
Recall measures how many of the true nearest neighbors HNSW actually finds, compared to brute-force. 1.0 = perfect, meaning all 10 true nearest neighbors were returned.
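The recall computation itself is simple: compare the approximate result set against an exact brute-force scan. A sketch with hypothetical random vectors (not the actual Criterion harness):

```python
import math, random

# Recall@k as measured in hnsw_recall_at_10: the fraction of the true
# top-k neighbors (found by exact brute force) that the approximate
# index also returned. 1.0 means the ANN result matches exact search.
def brute_force_top_k(query, vectors, k):
    order = sorted(range(len(vectors)),
                   key=lambda i: math.dist(query, vectors[i]))
    return set(order[:k])

def recall_at_k(approx_ids, true_ids, k):
    return len(set(approx_ids) & true_ids) / k

random.seed(0)
vectors = [[random.random() for _ in range(8)] for _ in range(100)]
query = [random.random() for _ in range(8)]
truth = brute_force_top_k(query, vectors, 10)

# A perfect index returns exactly the true top 10 -> recall 1.0;
# one wrong neighbor out of ten -> 0.9.
assert recall_at_k(list(truth), truth, 10) == 1.0
nine = list(truth)[:9] + [-1]          # simulate one miss
assert recall_at_k(nine, truth, 10) == 0.9
```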
sutra-sparql — Query Engine — Lower is better
End-to-end SPARQL query performance: parsing, planning, and execution. Includes pure graph traversal, vector similarity search, and the combined graph+vector queries that are SutraDB's core differentiator. All values are nanoseconds per query.
What each benchmark measures
- sparql_parse/simple_select
- Parse a basic `SELECT ?s ?p ?o WHERE { ?s ?p ?o } LIMIT 10`. Measures parser overhead.
- sparql_parse/complex_with_filter
- Parse a query with multiple patterns, FILTER, ORDER BY, and LIMIT. Worst-case parse time.
- sparql_parse/vector_similar
- Parse a VECTOR_SIMILAR query (SutraDB extension). Measures the cost of parsing vector literals.
- sparql_chain_traversal/N_nodes/H_hops
- Traverse a chain graph (node→node→node...) for H hops across N nodes. The fundamental graph walk. 4 hops on 200 nodes = following 4 edges.
- sparql_star_join/N_leaves/C_cats
- Star join: from a center node, fan out to N leaves, then join on type and category. Like a SQL `SELECT ... JOIN ... WHERE type = 'Leaf'`.
- sparql_vector_search/docs/N
- VECTOR_SIMILAR search across N documents with type filtering. The basic "find similar documents" query.
- sparql_graph_vector/cite_chain/N
- Combined graph+vector: find papers similar to a query vector that also cite other papers. This is the query pattern that justifies putting vectors and graphs in the same database.
- sparql_optional_1k
- OPTIONAL (left outer join) across 1K nodes where half have the optional property. Like `LEFT JOIN` in SQL.
- sparql_filter_gt_2k
- FILTER with integer comparison across 2K items. Like `WHERE score > 80` in SQL.
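The chain-traversal pattern behind sparql_chain_traversal can be sketched as repeated index lookups, one per hop (illustrative Python; the node names and adjacency representation are hypothetical):

```python
# Each hop expands the current frontier through the outgoing-edge index.
# On a chain graph (node0 -> node1 -> node2 -> ...) the frontier stays
# a single node, so H hops cost H index lookups.
def traverse(edges, start, hops):
    frontier = {start}
    for _ in range(hops):
        frontier = {dst for src in frontier for dst in edges.get(src, ())}
    return frontier

# A 200-node chain, as in the 200-node benchmark configuration.
chain = {f"node{i}": [f"node{i+1}"] for i in range(200)}
assert traverse(chain, "node0", 4) == {"node4"}  # 4 hops = following 4 edges
```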
| Benchmark | ns/iter | ±variance |
|---|---|---|
View full benchmark history on GitHub
View latest raw results