lightweight, embeddable hybrid graph/vector database
built on top of LMDB, queryable over capnproto RPC or an inbuilt HTTP server
(install script / docker image coming soon!!!)
- CMake >= 3.30
- a C++20 compiler
- Cap'n Proto
- LMDB
install on macos:
brew install cmake pkg-config capnp lmdbdocker compose up dev- to pass server args:
STARDUST_ARGS="-v -b unix:/tmp/stardust.sock" docker compose up dev.
args:
-v: verbose logging-b: bind address (default:unix:/tmp/stardust.sock)-d: data directory (default:data)-H, --http: optional HTTP bind, e.g.http://0.0.0.0:8080(disabled by default)
run tests:
docker compose run --rm testto query the database, you can either communicate over capnproto RPC or the HTTP server. this repo comes with
two example client libraries consuming both of these channels, the python sdk operating over capnproto and the
typescript sdk wrapping the http endpoints. for example apps using these two libraries you can check out the demo, mcp, or explorer directories
check schemas/graph.capnp for the RPC spec if you want to write your own client, and clients/typescript/src/client.ts for the HTTP spec (openapi schema coming soon !!!)
a basic mcp server implementation is in mcp, it can be configured to use either http or stdio transport. an example run command:
STARDUST_URL="tcp://127.0.0.1:4000" STARDUST_VECTOR_TAG="plot" uv run stardust-mcp --port 7777 assuming your stardust db is configured with -b 0.0.0.0:4000
to then use this mcp server in e.g. claude code you can point it to localhost:7777 and use the stardust:answer_with_stardust prompt to actually do graphrag
- actual knn
- where is vecTagMeta written
- partition vectors by tagId
- way more tests, make sure things are working properly
- implement interning of various strings properly (label names, property key names, types, etc.)
- intern text props
- expand out the API, more CRUD + query filters etc.
- add in http server alternative to capnproto
- initial python sdk wrapper around capnproto
- tidy up http server
- add the 4 new http queries to rpc server
- initial typescript sdk wrapper around http
- add colors to explorer
- add embeddings + actual lmdb data for the demo
- add a couple graph algos, betweenness centrality, pathfinding
- expose python sdk as (read only) mcp server
- make custom extension to cypher, make a grammar, register queries to precompile them (?)
- hnsw index for vector search
- configurable similarity metric
- actual error handling
longer term:
- own storage engine, optimise locality w/ graph location and vector latent space
- additional indexes on node/edge properties
- simd stuff
- think deeper about concurrency model, LMDB single-writer currently but could be multi-writer
- isolation guarantees, transactions, crash consistency, etc.
- auth
- cli installer, docker image, etc.
- inbuilt opinionated concept of versioning (?), layer on top of primitives