Show HN: GibRAM, an in-memory ephemeral GraphRAG runtime for retrieval (github.com/gibram-io)
60 points by ktyptorio 1 day ago | 10 comments
Hi HN,

I have been working with regulation-heavy documents lately, and one thing kept bothering me. Flat RAG pipelines often fail to retrieve related articles together, even when they are clearly connected through references, definitions, or clauses.

After trying several RAG setups, I subjectively felt that GraphRAG was a better mental model for this kind of data. The Microsoft GraphRAG paper and reference implementation were helpful starting points. However, in practice, I found one recurring friction point: graph storage and vector indexing are usually handled by separate systems, which felt unnecessarily heavy for short-lived analysis tasks.

To explore this tradeoff, I built GibRAM (Graph in-buffer Retrieval and Associative Memory). It is an experimental, in-memory GraphRAG runtime where entities, relationships, text units, and embeddings live side by side in a single process.
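To make "entities, relationships, text units, and embeddings side by side in a single process" concrete, here is a minimal sketch of that kind of unified in-memory layout. All names here are illustrative, not GibRAM's actual types:

```python
from dataclasses import dataclass, field

# Illustrative sketch only; GibRAM's real data structures will differ.
@dataclass
class Entity:
    name: str
    embedding: list[float]                       # vector lives next to the graph node
    text_unit_ids: list[str] = field(default_factory=list)

@dataclass
class Session:
    """Everything for one analysis session, held in one process's memory."""
    entities: dict[str, Entity] = field(default_factory=dict)
    relationships: dict[str, set[str]] = field(default_factory=dict)  # adjacency list
    text_units: dict[str, str] = field(default_factory=dict)

    def add_relationship(self, a: str, b: str) -> None:
        # Undirected edge: store both directions in the adjacency list.
        self.relationships.setdefault(a, set()).add(b)
        self.relationships.setdefault(b, set()).add(a)
```

The point is that there is no cross-system boundary: a graph hop and a vector lookup both touch the same process-local structures.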

GibRAM is intentionally ephemeral. It is designed for exploratory tasks like summarization or conversational querying over a bounded document set. Data lives in memory, scoped by session, and is automatically cleaned up via TTL. There are no durability guarantees, and recomputation is considered cheaper than persistence for the intended use cases.
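The TTL-scoped session model described above might look roughly like this (a hypothetical sketch, not GibRAM's actual session API):

```python
import time

# Hypothetical sketch of TTL-scoped, non-durable sessions.
class SessionStore:
    def __init__(self, ttl_seconds: float):
        self.ttl = ttl_seconds
        self._sessions = {}  # session_id -> (created_at, data)

    def put(self, session_id: str, data) -> None:
        self._sessions[session_id] = (time.monotonic(), data)

    def get(self, session_id: str):
        entry = self._sessions.get(session_id)
        if entry is None:
            return None
        created, data = entry
        if time.monotonic() - created > self.ttl:
            # Expired: drop it; the caller recomputes rather than restoring
            # from disk, since recomputation is cheaper than persistence here.
            del self._sessions[session_id]
            return None
        return data
```

Nothing survives an expiry or a process restart, which matches the "no durability guarantees" framing.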

This is not a database and not a production-ready system. It is a casual project, largely vibe-coded, meant to explore what GraphRAG looks like when memory is the primary constraint instead of storage. Technical debt exists, and many tradeoffs are explicit.

The project is open source, and I would really appreciate feedback, especially from people working on RAG, search infrastructure, or graph-based retrieval.

GitHub: https://github.com/gibram-io/gibram

Happy to answer questions or hear why this approach might be flawed.





The separate graph and vector storage can indeed add overhead for short-lived tasks. I've found that using a dual-memory architecture, where episodic and semantic memories coexist, can streamline this process and reduce complexity. If you're interested in seeing how this could work, I put together some tutorials on similar setups: https://github.com/NirDiamant/agents-towards-production

Out of curiosity, did you settle on that name before or after the RAM availability/price issues?

Actually, the name definitely came after noticing RAM prices. The idea of keeping the graph in memory only for ephemeral RAG sessions came first, though; we won't pretend the naming wasn't influenced by RAM being in the spotlight.

GrrHDD

Very cool, kudos

Where might one see more about what type of indexing you do to get the graph?



Exactly, thank you. It's still LLM-based extraction.

how do you search the graph network?

There are two steps:

Vector search (HNSW): Find top-k similar entities/text units from the query embedding

Graph traversal (BFS): From those seed entities, traverse relationships (up to 2 hops by default) to find connected entities

This catches both semantically similar entities AND structurally related ones that might not match the query text.

Implementation: https://github.com/gibram-io/gibram/blob/main/pkg/engine/eng...
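The two steps above can be sketched roughly like this. Brute-force cosine similarity stands in for the HNSW index, and the function and parameter names (`retrieve`, `adjacency`) are illustrative, not GibRAM's API:

```python
from collections import deque
from math import sqrt

def cosine(a, b):
    dot = sum(x * y for x, y in zip(a, b))
    na = sqrt(sum(x * x for x in a))
    nb = sqrt(sum(x * x for x in b))
    return dot / (na * nb) if na and nb else 0.0

def retrieve(query_vec, embeddings, adjacency, k=3, hops=2):
    # Step 1: vector search. Brute-force top-k here; an HNSW index
    # gives approximately the same seeds without the linear scan.
    seeds = sorted(embeddings,
                   key=lambda e: cosine(query_vec, embeddings[e]),
                   reverse=True)[:k]

    # Step 2: BFS from the seed entities, up to `hops` hops out,
    # pulling in structurally connected nodes.
    found = set(seeds)
    frontier = deque((s, 0) for s in seeds)
    while frontier:
        node, depth = frontier.popleft()
        if depth == hops:
            continue
        for nbr in adjacency.get(node, ()):
            if nbr not in found:
                found.add(nbr)
                frontier.append((nbr, depth + 1))
    return found
```

A query that embeds near one entity will also surface its 2-hop neighbors, which is how structurally related nodes that never match the query text get retrieved.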


This is how I did it a few years back while working for a set store company. It works well.


