What they are, why they matter, and how SutraDB combines them into one system.
Most developers know SQL databases (PostgreSQL, MySQL, SQLite). Data lives in tables with rows and columns. This works great when your data fits neatly into a spreadsheet shape. But some data doesn't.
| person_id | name | knows_id |
|-----------|-------|----------|
| 1 | Alice | 2 |
| 2 | Bob | 3 |
| 3 | Carol | 1 |
Relationships require JOIN operations. Following a chain of connections (who does Alice know, and who do they know?) needs one extra self-JOIN per hop, and the queries quickly become slow and unwieldy as the chain grows.
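To make the per-hop cost concrete, here is a minimal sketch using Python's built-in `sqlite3` with the sample table above. The two-hop question already needs two self-JOINs; each further hop adds another:

```python
import sqlite3

# Build the sample table above in an in-memory SQLite database.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE people (person_id INTEGER, name TEXT, knows_id INTEGER)")
conn.executemany(
    "INSERT INTO people VALUES (?, ?, ?)",
    [(1, "Alice", 2), (2, "Bob", 3), (3, "Carol", 1)],
)

# "Who does Alice know, and who do they know?" — one self-JOIN per hop.
row = conn.execute("""
    SELECT p3.name
    FROM people p1
    JOIN people p2 ON p2.person_id = p1.knows_id
    JOIN people p3 ON p3.person_id = p2.knows_id
    WHERE p1.name = 'Alice'
""").fetchone()
print(row[0])  # Carol
```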
```
Alice knows Bob .
Bob knows Carol .
Carol knows Alice .
Alice type Person .
Bob type Person .
```
Relationships are the data. Following a chain of any length is a single query. No JOINs, no schema to define upfront. Add new relationship types anytime.
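The "chain of any length" claim can be sketched with a few lines of Python over an in-memory triple list. This is only an illustration of the traversal idea, not SutraDB's actual engine:

```python
from collections import deque

# Triples as (subject, predicate, object) — a minimal in-memory graph.
triples = [
    ("Alice", "knows", "Bob"),
    ("Bob", "knows", "Carol"),
    ("Carol", "knows", "Alice"),
]

def reachable(start, predicate):
    """Everyone reachable from `start` by following `predicate` edges."""
    seen, queue = set(), deque([start])
    while queue:
        node = queue.popleft()
        for s, p, o in triples:
            if s == node and p == predicate and o not in seen:
                seen.add(o)
                queue.append(o)
    return seen

print(sorted(reachable("Alice", "knows")))  # ['Alice', 'Bob', 'Carol']
```

The breadth-first walk follows the `knows` chain to any depth in one pass; no per-hop JOIN has to be written by hand.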
| Your data looks like... | Use |
|---|---|
| Spreadsheets, invoices, accounting | SQL (PostgreSQL, SQLite) |
| Documents, nested JSON, catalogs | Document DB (MongoDB) |
| Relationships, networks, knowledge, ontologies | Graph DB (SutraDB) |
| Embeddings, similarity search, recommendations | Vector DB (SutraDB) |
MongoDB stores documents — nested JSON objects. It's schema-flexible and fast for key-based lookups. But it goes in almost the opposite direction from a graph database:
```json
{
  "name": "Alice",
  "knows": ["Bob", "Carol"],
  "interests": ["ML", "RDF"]
}
```
Data is denormalized — everything about Alice is in one document. Traversing relationships means looking up other documents by reference. No native relationship traversal.
```
Alice knows Bob .
Alice knows Carol .
Alice interest ML .
Alice interest RDF .
Bob knows Carol .
Bob interest Databases .
```
Data is normalized into relationships. "Alice knows Bob and Bob knows Carol" is a two-hop traversal in one query. The shape of the data IS the query path.
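The two-hop pattern ("Alice knows ?x, ?x knows ?y") can be sketched as a join over the triples above; Python stands in for the query language here purely for illustration:

```python
# The triples from the example above, as (subject, predicate, object).
triples = {
    ("Alice", "knows", "Bob"), ("Alice", "knows", "Carol"),
    ("Alice", "interest", "ML"), ("Alice", "interest", "RDF"),
    ("Bob", "knows", "Carol"), ("Bob", "interest", "Databases"),
}

# Two-hop traversal: who do Alice's contacts know?
hops = {o2 for s, p, o in triples if s == "Alice" and p == "knows"
        for s2, p2, o2 in triples if s2 == o and p2 == "knows"}
print(hops)  # {'Carol'}
```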
MongoDB optimizes for "give me everything about this one document." Graph databases optimize for "follow connections between things." These are fundamentally different access patterns.
A vector database stores embeddings — arrays of numbers that represent the "meaning" of text, images, or other data in a high-dimensional space. Things with similar meanings have similar vectors.
```
-- These vectors are close together (similar meaning):
"king"    → [0.82, 0.11, -0.03, 0.45, ...]  (1024 numbers)
"monarch" → [0.79, 0.13, -0.01, 0.42, ...]
"queen"   → [0.75, 0.15,  0.08, 0.41, ...]

-- This vector is far away (different meaning):
"bicycle" → [-0.12, 0.67, 0.33, -0.21, ...]
```
Vector databases let you ask: "Find me things similar to this." This is how semantic search, recommendation engines, and RAG (retrieval-augmented generation) work.
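"Similar" is usually measured with cosine similarity. A small sketch using the first four dimensions of the vectors above as toy stand-ins (real embeddings have hundreds of dimensions):

```python
import math

def cosine(a, b):
    """Cosine similarity: dot product of the vectors, normalized by length."""
    dot = sum(x * y for x, y in zip(a, b))
    return dot / (math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(x * x for x in b)))

king    = [0.82, 0.11, -0.03, 0.45]
monarch = [0.79, 0.13, -0.01, 0.42]
bicycle = [-0.12, 0.67, 0.33, -0.21]

print(cosine(king, monarch) > cosine(king, bicycle))  # True
```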
Standalone vector databases (Qdrant, Pinecone, Weaviate) store vectors and let you search them. But they don't understand relationships between the things those vectors represent.
Today, if you want both graph traversal and vector similarity, you need two separate databases: a graph database for the relationships and a vector database for the embeddings.

Then your application has to:

- keep the two stores in sync on every write
- translate identifiers between the two systems
- merge results from both databases in application code
In SutraDB, a vector embedding is just another triple — a relationship between an entity and its embedding:
```
-- Regular graph data:
:paper42 rdf:type :Paper .
:paper42 :title "Attention Is All You Need" .
:paper42 :author :Vaswani .
:paper42 :cites :paper17 .

-- The embedding is also a triple:
:paper42 :hasEmbedding "0.23 -0.11 0.87 ..."^^sutra:f32vec .
```
The vector is stored in the same database, indexed by the same system, and queryable in the same language. No sync, no translation, no second database.
The unique capability this unlocks is vector hopping — traversing between graph space and vector space within a single query. We call these "wormhole queries" because they let you jump through a shortcut in the data space.
Imagine you have a knowledge graph of academic papers with their relationships (citations, authors, topics) and their embeddings (semantic meaning of the paper's content).
```sparql
SELECT ?paper ?title WHERE {
  VECTOR_SIMILAR(?paper :hasEmbedding "..."^^sutra:f32vec, 0.8)
  ?paper :cites ?cited .
  ?cited :author :Vaswani .
  ?paper :title ?title .
} ORDER BY DESC(VECTOR_SCORE(?paper :hasEmbedding "..."^^sutra:f32vec))
LIMIT 10
```
One query. Graph traversal and vector search interleaved. The query planner decides the optimal execution order.
| Pattern | Description | Example |
|---|---|---|
| Vector → Graph | Start with similarity search, then traverse graph | "Find entities similar to X, then get their properties" |
| Graph → Vector | Traverse graph first, then filter by vector similarity | "Get all Papers by this author, rank by similarity to my query" |
| Graph → Vector → Graph | Traverse, hop through vector space, traverse again | "Find this author's papers, find similar papers, get their authors" |
| Vector → Graph → Vector | Search, traverse, search again | "Find similar entities, follow links, find similar neighbors" |
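The first pattern (Vector → Graph) can be sketched in plain Python, with a cosine threshold standing in for `VECTOR_SIMILAR` and toy data throughout — all entity names and vectors here are illustrative:

```python
import math

def cosine(a, b):
    dot = sum(x * y for x, y in zip(a, b))
    return dot / (math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(x * x for x in b)))

embeddings = {                       # entity -> embedding (toy 2-D vectors)
    "paper42": [0.9, 0.1],
    "paper17": [0.8, 0.2],
    "paper99": [-0.7, 0.6],
}
triples = {("paper42", "author", "Vaswani"),
           ("paper17", "author", "Vaswani"),
           ("paper99", "author", "Other")}

query = [0.85, 0.15]

# Step 1: vector similarity search (threshold plays the role of the 0.8 above).
candidates = [e for e, v in embeddings.items() if cosine(query, v) >= 0.8]

# Step 2: graph pattern over the candidates.
result = sorted(e for e in candidates for s, p, o in triples
                if s == e and p == "author" and o == "Vaswani")
print(result)  # ['paper17', 'paper42']
```

In SutraDB the two steps live in one query and the planner chooses their order; this sketch just makes the data flow of the pattern visible.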
An important property: in SutraDB, multiple entities can share the same vector. This enables automatic entity resolution through vector search.
```sparql
-- Two different entities, same word, same embedding:
:bank_financial :hasEmbedding "0.5 0.3 ..."^^sutra:f32vec .
:bank_river     :hasEmbedding "0.5 0.3 ..."^^sutra:f32vec .

-- A vector search for "bank" returns BOTH entities.
-- Then graph patterns disambiguate:
SELECT ?entity ?type WHERE {
  VECTOR_SIMILAR(?entity :hasEmbedding "..."^^sutra:f32vec, 0.9)
  ?entity rdf:type ?type .
}
-- Returns: bank_financial (type: FinancialInstitution)
--          bank_river     (type: GeographicFeature)
```
The vector search finds candidates; the graph structure resolves ambiguity. This is how you combine the fuzzy power of embeddings with the precise structure of a knowledge graph.
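A toy sketch of the candidates-then-disambiguation flow, with an exact-match lookup standing in for similarity search and all data invented for illustration:

```python
# Two entities share one embedding; the vector search returns both,
# and the graph's type information tells them apart.
embeddings = {
    "bank_financial": (0.5, 0.3),
    "bank_river":     (0.5, 0.3),   # identical vector, different entity
    "bicycle":        (-0.2, 0.9),
}
types = {
    "bank_financial": "FinancialInstitution",
    "bank_river": "GeographicFeature",
    "bicycle": "Vehicle",
}

query = (0.5, 0.3)
# Exact match stands in for a similarity threshold here.
candidates = [e for e, v in embeddings.items() if v == query]
print([(e, types[e]) for e in sorted(candidates)])
# [('bank_financial', 'FinancialInstitution'), ('bank_river', 'GeographicFeature')]
```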
SutraDB uses HNSW (Hierarchical Navigable Small World) for vector search. Here's the intuition:

- Vectors are linked into a layered proximity graph: a few long-range links in the sparse upper layers, many short-range links at the bottom.
- A search enters at the top layer and greedily moves to whichever neighbor is closest to the query.
- When no neighbor improves on the current node, the search drops down a layer and repeats, refining the answer at finer and finer granularity.
This gives O(log n) search time instead of O(n) brute force. For 1 million vectors, that's ~20 comparisons instead of 1 million.
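The core move — greedy descent through a proximity graph, touching only a handful of vectors instead of all of them — can be sketched on a single toy layer (real HNSW stacks several such layers):

```python
import math

# Toy single-layer proximity graph: points on a line, each linked to its neighbors.
points = {"a": (0, 0), "b": (2, 0), "c": (4, 0), "d": (6, 0), "e": (8, 0)}
graph = {"a": ["b"], "b": ["a", "c"], "c": ["b", "d"], "d": ["c", "e"], "e": ["d"]}

def greedy_search(entry, query):
    """Walk to whichever neighbor is closest to the query; stop when none improves."""
    current = entry
    while True:
        best = min(graph[current], key=lambda n: math.dist(points[n], query))
        if math.dist(points[best], query) >= math.dist(points[current], query):
            return current
        current = best

print(greedy_search("a", (7.5, 0)))  # 'e'
```

The search reaches the nearest point after comparing only each node's few neighbors, which is where the logarithmic scaling comes from once the layers are stacked.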
Key parameters you can tune per predicate:
| Parameter | Effect | Default |
|---|---|---|
| `M` | Connections per node. Higher = better recall, more memory | 16 |
| `ef_construction` | Build-time beam width. Higher = better index quality, slower builds | 200 |
| `ef_search` | Query-time beam width. Tunable per query via `ef:=N` hint | 200 |