Skip to content

olifog/stardust

Repository files navigation

stardust

lightweight, embeddable hybrid graph/vector database

built on top of LMDB, queryable over capnproto RPC or an inbuilt HTTP server

build from source

(install script / docker image coming soon!!!)

requirements

  • CMake >= 3.30
  • a C++20 compiler
  • Cap'n Proto
  • LMDB

install on macos:

brew install cmake pkg-config capnp lmdb

dev with docker compose

docker compose up dev
  • to pass server args: STARDUST_ARGS="-v -b unix:/tmp/stardust.sock" docker compose up dev.

args:

  • -v: verbose logging
  • -b: bind address (default: unix:/tmp/stardust.sock)
  • -d: data directory (default: data)
  • -H, --http: optional HTTP bind, e.g. http://0.0.0.0:8080 (disabled by default)

run tests:

docker compose run --rm test

to query the database, you can either communicate over capnproto RPC or the HTTP server. this repo comes with two example client libraries consuming both of these channels, the python sdk operating over capnproto and the typescript sdk wrapping the http endpoints. for example apps using these two libraries you can check out the demo, mcp, or explorer directories

check schemas/graph.capnp for the RPC spec if you want to write your own client, and clients/typescript/src/client.ts for the HTTP spec (openapi schema coming soon !!!)

MCP for graphRAG

a basic mcp server implementation is in mcp, it can be configured to use either http or stdio transport. an example run command:

STARDUST_URL="tcp://127.0.0.1:4000" STARDUST_VECTOR_TAG="plot" uv run stardust-mcp --port 7777    

assuming your stardust db is configured with -b 0.0.0.0:4000

to then use this mcp server in e.g. claude code you can point it to localhost:7777 and use the stardust:answer_with_stardust prompt to actually do graphrag

TODO

  • actual knn
  • where is vecTagMeta written
  • partition vectors by tagId
  • way more tests, make sure things are working properly
  • implement interning of various strings properly (label names, property key names, types, etc.)
  • intern text props
  • expand out the API, more CRUD + query filters etc.
  • add in http server alternative to capnproto
  • initial python sdk wrapper around capnproto
  • tidy up http server
  • add the 4 new http queries to rpc server
  • initial typescript sdk wrapper around http
  • add colors to explorer
  • add embeddings + actual lmdb data for the demo
  • add a couple graph algos, betweenness centrality, pathfinding
  • expose python sdk as (read only) mcp server
  • make custom extension to cypher, make a grammar, register queries to precompile them (?)
  • hnsw index for vector search
  • configurable similarity metric
  • actual error handling

longer term:

  • own storage engine, optimise locality w/ graph location and vector latent space
  • additional indexes on node/edge properties
  • simd stuff
  • think deeper about concurrency model, LMDB single-writer currently but could be multi-writer
  • isolation guarantees, transactions, crash consistency, etc.
  • auth
  • cli installer, docker image, etc.
  • inbuilt opinionated concept of versioning (?), layer on top of primitives

About

embeddable hybrid graph/vector database

Resources

Stars

Watchers

Forks

Releases

No releases published

Packages

 
 
 

Contributors