What they are, why they matter, and how SutraDB combines them into one system.
Most developers know SQL databases (PostgreSQL, MySQL, SQLite). Data lives in tables with rows and columns. This works great when your data fits neatly into a spreadsheet shape. But some data doesn't.
| person_id | name | knows_id |
|-----------|-------|----------|
| 1 | Alice | 2 |
| 2 | Bob | 3 |
| 3 | Carol | 1 |
Relationships require JOIN operations. Following a chain of connections (who does Alice know, and who do they know?) needs one extra self-JOIN per hop, and the queries quickly become slow and unwieldy as the chain grows.
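To make the per-hop cost concrete, here is a minimal sketch using Python's built-in `sqlite3` with the sample table above. The two-hop question already needs two self-JOINs; each further hop adds another:

```python
import sqlite3

# Build the sample table above in an in-memory SQLite database.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE people (person_id INTEGER, name TEXT, knows_id INTEGER)")
conn.executemany(
    "INSERT INTO people VALUES (?, ?, ?)",
    [(1, "Alice", 2), (2, "Bob", 3), (3, "Carol", 1)],
)

# "Who does Alice know, and who do they know?" — one self-JOIN per hop.
row = conn.execute("""
    SELECT p3.name
    FROM people p1
    JOIN people p2 ON p2.person_id = p1.knows_id
    JOIN people p3 ON p3.person_id = p2.knows_id
    WHERE p1.name = 'Alice'
""").fetchone()
print(row[0])  # Carol
```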
```
Alice knows Bob .
Bob knows Carol .
Carol knows Alice .
Alice type Person .
Bob type Person .
```
Relationships are the data. Following a chain of any length is a single query. No JOINs, no schema to define upfront. Add new relationship types anytime.
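The "chain of any length" claim can be sketched with a few lines of Python over an in-memory triple list. This is only an illustration of the traversal idea, not SutraDB's actual engine:

```python
from collections import deque

# Triples as (subject, predicate, object) — a minimal in-memory graph.
triples = [
    ("Alice", "knows", "Bob"),
    ("Bob", "knows", "Carol"),
    ("Carol", "knows", "Alice"),
]

def reachable(start, predicate):
    """Everyone reachable from `start` by following `predicate` edges."""
    seen, queue = set(), deque([start])
    while queue:
        node = queue.popleft()
        for s, p, o in triples:
            if s == node and p == predicate and o not in seen:
                seen.add(o)
                queue.append(o)
    return seen

print(sorted(reachable("Alice", "knows")))  # ['Alice', 'Bob', 'Carol']
```

The breadth-first walk follows the `knows` chain to any depth in one pass; no per-hop JOIN has to be written by hand.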
| Your data looks like... | Use |
|---|---|
| Spreadsheets, invoices, accounting | SQL (PostgreSQL, SQLite) |
| Documents, nested JSON, catalogs | Document DB (MongoDB) |
| Relationships, networks, knowledge, ontologies | Graph DB (SutraDB) |
| Embeddings, similarity search, recommendations | Vector DB (SutraDB) |
MongoDB stores documents — nested JSON objects. It's schema-flexible and fast for key-based lookups. But it goes in almost the opposite direction from a graph database:
```json
{
  "name": "Alice",
  "knows": ["Bob", "Carol"],
  "interests": ["ML", "RDF"]
}
```
Data is denormalized — everything about Alice is in one document. Traversing relationships means looking up other documents by reference. No native relationship traversal.
```
Alice knows Bob .
Alice knows Carol .
Alice interest ML .
Alice interest RDF .
Bob knows Carol .
Bob interest Databases .
```
Data is normalized into relationships. "Alice knows Bob and Bob knows Carol" is a two-hop traversal in one query. The shape of the data IS the query path.
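The two-hop pattern ("Alice knows ?x, ?x knows ?y") can be sketched as a join over the triples above; Python stands in for the query language here purely for illustration:

```python
# The triples from the example above, as (subject, predicate, object).
triples = {
    ("Alice", "knows", "Bob"), ("Alice", "knows", "Carol"),
    ("Alice", "interest", "ML"), ("Alice", "interest", "RDF"),
    ("Bob", "knows", "Carol"), ("Bob", "interest", "Databases"),
}

# Two-hop traversal: who do Alice's contacts know?
hops = {o2 for s, p, o in triples if s == "Alice" and p == "knows"
        for s2, p2, o2 in triples if s2 == o and p2 == "knows"}
print(hops)  # {'Carol'}
```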
MongoDB optimizes for "give me everything about this one document." Graph databases optimize for "follow connections between things." These are fundamentally different access patterns.
A vector database stores embeddings — arrays of numbers that represent the "meaning" of text, images, or other data in a high-dimensional space. Things with similar meanings have similar vectors.
```
-- These vectors are close together (similar meaning):
"king"    → [0.82, 0.11, -0.03, 0.45, ...]  (1024 numbers)
"monarch" → [0.79, 0.13, -0.01, 0.42, ...]
"queen"   → [0.75, 0.15,  0.08, 0.41, ...]

-- This vector is far away (different meaning):
"bicycle" → [-0.12, 0.67, 0.33, -0.21, ...]
```
Vector databases let you ask: "Find me things similar to this." This is how semantic search, recommendation engines, and RAG (retrieval-augmented generation) work.
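"Similar" is usually measured with cosine similarity. A small sketch using the first four dimensions of the vectors above as toy stand-ins (real embeddings have hundreds of dimensions):

```python
import math

def cosine(a, b):
    """Cosine similarity: dot product of the vectors, normalized by length."""
    dot = sum(x * y for x, y in zip(a, b))
    return dot / (math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(x * x for x in b)))

king    = [0.82, 0.11, -0.03, 0.45]
monarch = [0.79, 0.13, -0.01, 0.42]
bicycle = [-0.12, 0.67, 0.33, -0.21]

print(cosine(king, monarch) > cosine(king, bicycle))  # True
```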
Standalone vector databases (Qdrant, Pinecone, Weaviate) store vectors and let you search them. But they don't understand relationships between the things those vectors represent.
Today, if you want both graph traversal and vector similarity, you need two separate databases: a graph database for the relationships and a vector database for the embeddings.

Then your application has to:

- keep the two stores in sync on every write
- translate identifiers between the two systems
- merge results from both databases in application code
In SutraDB, a vector embedding is just another triple — a relationship between an entity and its embedding:
```
-- Regular graph data:
:paper42 rdf:type :Paper .
:paper42 :title "Attention Is All You Need" .
:paper42 :author :Vaswani .
:paper42 :cites :paper17 .

-- The embedding is also a triple:
:paper42 :hasEmbedding "0.23 -0.11 0.87 ..."^^sutra:f32vec .
```
The vector is stored in the same database, indexed by the same system, and queryable in the same language. No sync, no translation, no second database.
The unique capability this unlocks is vector hopping — traversing between graph space and vector space within a single query. We call these "wormhole queries" because they let you jump through a shortcut in the data space.
Imagine you have a knowledge graph of academic papers with their relationships (citations, authors, topics) and their embeddings (semantic meaning of the paper's content).
```sparql
SELECT ?paper ?title WHERE {
  VECTOR_SIMILAR(?paper :hasEmbedding "..."^^sutra:f32vec, 0.8)
  ?paper :cites ?cited .
  ?cited :author :Vaswani .
  ?paper :title ?title .
} ORDER BY DESC(VECTOR_SCORE(?paper :hasEmbedding "..."^^sutra:f32vec))
LIMIT 10
```
One query. Graph traversal and vector search interleaved. The query planner decides the optimal execution order.
| Pattern | Description | Example |
|---|---|---|
| Vector → Graph | Start with similarity search, then traverse graph | "Find entities similar to X, then get their properties" |
| Graph → Vector | Traverse graph first, then filter by vector similarity | "Get all Papers by this author, rank by similarity to my query" |
| Graph → Vector → Graph | Traverse, hop through vector space, traverse again | "Find this author's papers, find similar papers, get their authors" |
| Vector → Graph → Vector | Search, traverse, search again | "Find similar entities, follow links, find similar neighbors" |
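The first pattern (Vector → Graph) can be sketched in plain Python, with a cosine threshold standing in for `VECTOR_SIMILAR` and toy data throughout — all entity names and vectors here are illustrative:

```python
import math

def cosine(a, b):
    dot = sum(x * y for x, y in zip(a, b))
    return dot / (math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(x * x for x in b)))

embeddings = {                       # entity -> embedding (toy 2-D vectors)
    "paper42": [0.9, 0.1],
    "paper17": [0.8, 0.2],
    "paper99": [-0.7, 0.6],
}
triples = {("paper42", "author", "Vaswani"),
           ("paper17", "author", "Vaswani"),
           ("paper99", "author", "Other")}

query = [0.85, 0.15]

# Step 1: vector similarity search (threshold plays the role of the 0.8 above).
candidates = [e for e, v in embeddings.items() if cosine(query, v) >= 0.8]

# Step 2: graph pattern over the candidates.
result = sorted(e for e in candidates for s, p, o in triples
                if s == e and p == "author" and o == "Vaswani")
print(result)  # ['paper17', 'paper42']
```

In SutraDB the two steps live in one query and the planner chooses their order; this sketch just makes the data flow of the pattern visible.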
An important property: in SutraDB, multiple entities can share the same vector. This enables automatic entity resolution through vector search.
```sparql
-- Two different entities, same word, same embedding:
:bank_financial :hasEmbedding "0.5 0.3 ..."^^sutra:f32vec .
:bank_river     :hasEmbedding "0.5 0.3 ..."^^sutra:f32vec .

-- A vector search for "bank" returns BOTH entities.
-- Then graph patterns disambiguate:
SELECT ?entity ?type WHERE {
  VECTOR_SIMILAR(?entity :hasEmbedding "..."^^sutra:f32vec, 0.9)
  ?entity rdf:type ?type .
}
-- Returns: bank_financial (type: FinancialInstitution)
--          bank_river     (type: GeographicFeature)
```
The vector search finds candidates; the graph structure resolves ambiguity. This is how you combine the fuzzy power of embeddings with the precise structure of a knowledge graph.
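A toy sketch of the candidates-then-disambiguation flow, with an exact-match lookup standing in for similarity search and all data invented for illustration:

```python
# Two entities share one embedding; the vector search returns both,
# and the graph's type information tells them apart.
embeddings = {
    "bank_financial": (0.5, 0.3),
    "bank_river":     (0.5, 0.3),   # identical vector, different entity
    "bicycle":        (-0.2, 0.9),
}
types = {
    "bank_financial": "FinancialInstitution",
    "bank_river": "GeographicFeature",
    "bicycle": "Vehicle",
}

query = (0.5, 0.3)
# Exact match stands in for a similarity threshold here.
candidates = [e for e, v in embeddings.items() if v == query]
print([(e, types[e]) for e in sorted(candidates)])
# [('bank_financial', 'FinancialInstitution'), ('bank_river', 'GeographicFeature')]
```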
SutraDB uses HNSW (Hierarchical Navigable Small World) for vector search. Here's the intuition:

- Vectors are linked into a layered proximity graph: a few long-range links in the sparse upper layers, many short-range links at the bottom.
- A search enters at the top layer and greedily moves to whichever neighbor is closest to the query.
- When no neighbor improves on the current node, the search drops down a layer and repeats, refining the answer at finer and finer granularity.
This gives O(log n) search time instead of O(n) brute force. For 1 million vectors, that's ~20 comparisons instead of 1 million.
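The core move — greedy descent through a proximity graph, touching only a handful of vectors instead of all of them — can be sketched on a single toy layer (real HNSW stacks several such layers):

```python
import math

# Toy single-layer proximity graph: points on a line, each linked to its neighbors.
points = {"a": (0, 0), "b": (2, 0), "c": (4, 0), "d": (6, 0), "e": (8, 0)}
graph = {"a": ["b"], "b": ["a", "c"], "c": ["b", "d"], "d": ["c", "e"], "e": ["d"]}

def greedy_search(entry, query):
    """Walk to whichever neighbor is closest to the query; stop when none improves."""
    current = entry
    while True:
        best = min(graph[current], key=lambda n: math.dist(points[n], query))
        if math.dist(points[best], query) >= math.dist(points[current], query):
            return current
        current = best

print(greedy_search("a", (7.5, 0)))  # 'e'
```

The search reaches the nearest point after comparing only each node's few neighbors, which is where the logarithmic scaling comes from once the layers are stacked.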
Key parameters you can tune per predicate:
| Parameter | Effect | Default |
|---|---|---|
| `M` | Connections per node. Higher = better recall, more memory | 16 |
| `ef_construction` | Build-time beam width. Higher = better index quality, slower builds | 200 |
| `ef_search` | Query-time beam width. Tunable per query via `ef:=N` hint | 200 |