AI inference & memory systems on AMD ROCm | Custom Triton kernels | Hierarchical retrieval | MoE architectures
- Washington State
- https://linkedin.com/in/garyjduncan
Popular repositories
- pensive (Public, Python): Hierarchical context retrieval for LLM applications. Multi-tier caching with FAISS + BM25 hybrid search, L1/L2/L3 tiered storage, sub-30 ms latency at 10M+ tokens.
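The FAISS + BM25 hybrid search that pensive describes can be sketched with a dependency-free stand-in: BM25 scoring for the sparse pass, a placeholder ranking standing in for the dense (FAISS) pass, and reciprocal-rank fusion to merge the two. All names below (`docs`, `bm25_scores`, `rrf`) are illustrative, not pensive's actual API.

```python
import math
from collections import Counter

# Toy corpus; in a system like pensive these would be context chunks
# stored across L1/L2/L3 tiers.
docs = [
    "hierarchical context retrieval for llm applications",
    "multi tier caching with tiered storage",
    "hybrid search combines sparse and dense scores",
]
tokenized = [d.split() for d in docs]

def bm25_scores(query, corpus, k1=1.5, b=0.75):
    """Score each document against the query with classic BM25."""
    n = len(corpus)
    avgdl = sum(len(d) for d in corpus) / n
    df = Counter(t for d in corpus for t in set(d))
    scores = []
    for doc in corpus:
        tf = Counter(doc)
        s = 0.0
        for term in query.split():
            if term not in tf:
                continue
            idf = math.log(1 + (n - df[term] + 0.5) / (df[term] + 0.5))
            s += idf * tf[term] * (k1 + 1) / (
                tf[term] + k1 * (1 - b + b * len(doc) / avgdl))
        scores.append(s)
    return scores

def rrf(rankings, k=60):
    """Reciprocal-rank fusion of several ranked lists of doc indices."""
    fused = Counter()
    for ranking in rankings:
        for rank, idx in enumerate(ranking):
            fused[idx] += 1.0 / (k + rank + 1)
    return [idx for idx, _ in fused.most_common()]

query = "hybrid search"
sparse = bm25_scores(query, tokenized)
sparse_rank = sorted(range(len(docs)), key=lambda i: -sparse[i])
# Stand-in for a FAISS dense pass: a hypothetical embedding-similarity
# ranking. A real system would query a FAISS index here.
dense_rank = [2, 0, 1]
best = rrf([sparse_rank, dense_rank])[0]
print(best)  # -> 2, the doc mentioning "hybrid search"
```

Reciprocal-rank fusion is one common way to merge sparse and dense rankings without normalizing their incompatible score scales; the real project may well use a different fusion scheme.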