I don't think context size is really the limit for larger codebases - it's more ...

CuriouslyC · 2026-01-12T03:02:40 1768186960

The approaches used by Claude Code and Cursor are inefficient. It's possible to calculate a covering set for a piece of code and provide that to an agent directly via a tool, and it turns out that this can reduce context usage in SWE-bench style tasks by >90% over RAG and grep/read.

If you're interested in learning more, https://github.com/sibyllinesoft/scribe
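
For readers unfamiliar with the term, here is a minimal sketch of one plausible reading of "covering set": a target symbol plus everything it transitively depends on. This is an illustration only, not scribe's actual algorithm, which the thread does not describe. The `DEPS` graph and the `covering_set` function are hypothetical names made up for the example.

```python
from collections import deque

# Hypothetical dependency graph: symbol -> symbols it references directly.
# A real tool would derive this from parsing the repository.
DEPS = {
    "handle_request": ["parse_body", "save_user"],
    "parse_body": ["decode_json"],
    "save_user": ["db_insert", "validate_user"],
    "validate_user": [],
    "decode_json": [],
    "db_insert": [],
    "render_report": ["db_query"],
    "db_query": [],
}


def covering_set(target, deps):
    """Return the target plus every symbol reachable from it, in BFS order."""
    ordered = [target]
    visited = {target}
    queue = deque([target])
    while queue:
        current = queue.popleft()
        for dep in deps.get(current, []):
            if dep not in visited:
                visited.add(dep)
                ordered.append(dep)
                queue.append(dep)
    return ordered


if __name__ == "__main__":
    # Only the code needed to understand save_user is selected;
    # unrelated symbols such as render_report are left out.
    print(covering_set("save_user", DEPS))
```

The point of the sketch is that the agent receives only the symbols reachable from the code it is editing, rather than discovering them one grep or file read at a time.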

trueno · 2026-01-12T05:50:01 1768197001

Like most LLM-made READMEs and the six bajillion AI/agentic/LLM tools now on GitHub, I can barely get a grasp on what I'm looking at here, or how to use it practically.

> Smart code bundler that turns repositories into optimized code bundles meeting a token budget in milliseconds

OK. So it's a tool. Do I use it on my repo once? Then what? Do I use it as I go? Does it sit somewhere accessible to something like Claude Code, with the onus on me to direct Claude to use it to search files instead of his out-of-the-box workflow? I can see some CLI examples, but what should I do with them, and where do they fit into what people are using with Cursor / Claude / Gemini etc.?

This is the part I've been trying to hammer home about LLM-created stuff. It leaves us with vague, poorly understood outcomes that might do something. People are now shipping things they don't even understand, and oftentimes they can't speak to what their thing does with an acceptable level of authority. I'm not against creating tools with LLMs, but I'm actually pretty against people creating the basic README with LLMs. Wanna make a tool in an LLM? More power to you. But make sure you understand what was made, because we need humans in here telling other humans how to use it. LLMs flat out lose the plot over the course of a large project, and I think a big issue is that LLMs can sometimes write more eloquently than a lot of people can, so those people opt for the LLM-generated README.

But as someone who would maybe consider using something like this, I see that README and it just looks like every Claude Code thing I've put together to date. Which is to say, I've done some seemingly impossible things with Claude, only to find that his attempt to recap the entirety of it ended up as a whole lot of seemingly meaningful words, phrases and sentences that actually paint a super disjointed picture of what exactly a repo is about.

HarHarVeryFunny · 2026-01-12T15:38:48 1768232328

This scribe tool seems to offer somewhat similar functionality to a Language Server and/or Cursor's chunked vector index.

The idea would seem to be to give instructions to your agent (Claude Code, etc) to use this tool to discover the chunks of code (not entire source files) it needs to look at to modify a particular function. You could put these instructions on how/when to use scribe someplace like .claude/rules/scribe.md

I assume this is meant to work as an override to Claude Code's normal operation, where it reads entire source files into context (not sure on the details of how CC decides which files are relevant if the developer hasn't explicitly told it). So if you asked CC to do something that matches the instructions you'd put in scribe.md, it would run scribe and send the output (code chunks and file locations) to Claude AI, which would then base its edit requests on that.

It's not obvious if this --covering-set command is the only one scribe currently supports, or if it has other ones to output code chunks relevant for other use cases.
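
To make the whole-file versus chunk distinction concrete, here is a rough sketch that compares the context cost of the two approaches. Everything in it is an assumption for illustration: the 4-characters-per-token estimate is a crude rule of thumb, the file contents are synthetic, and the chunk ranges are made up rather than produced by scribe.

```python
def estimate_tokens(text):
    """Crude token estimate, assuming roughly 4 characters per token."""
    return len(text) // 4


def whole_file_cost(files):
    """Token cost of putting every file into context in full."""
    return sum(estimate_tokens(source) for source in files.values())


def chunk_cost(files, chunks):
    """Token cost of sending only the given (path, first_line, last_line) chunks.

    Line numbers are 1-indexed and inclusive.
    """
    total = 0
    for path, first_line, last_line in chunks:
        lines = files[path].splitlines()[first_line - 1:last_line]
        total += estimate_tokens("\n".join(lines))
    return total


# Synthetic stand-in for a repository: two files of 200 lines each.
FILES = {
    "app.py": "\n".join(f"value_{i} = {i}" for i in range(1, 201)),
    "db.py": "\n".join(f"row_{i} = {i}" for i in range(1, 201)),
}

# Hypothetical chunks a retrieval tool might return for one edit.
CHUNKS = [("app.py", 10, 19), ("db.py", 50, 54)]

if __name__ == "__main__":
    print("whole files:", whole_file_cost(FILES), "tokens (estimated)")
    print("chunks only:", chunk_cost(FILES, CHUNKS), "tokens (estimated)")
```

How large the saving is in practice depends entirely on how well the selected chunks match what the edit needs, which this sketch does not measure.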

CuriouslyC · 2026-01-12T16:03:19 1768233799

Scribe grew out of fixing all the problems with code bundlers like Repomix. The covering set feature is the thing that clearly sets it apart, and the performance difference is extreme: up to 98% token use reduction on SWE-bench tasks. I lead with it because it's the place where I'm far ahead of other tools. People won't adopt something because it's slightly better; scribe is a step change.

HarHarVeryFunny · 2026-01-12T16:25:34 1768235134

It would be useful if you had some documentation (or maybe you do?) on how you are integrating scribe with Claude Code etc. (same for Gemini CLI, or different?), and on what your workflow looks like. Do you have something like scribe.md so that Claude Code automatically invokes scribe when appropriate, or are you invoking scribe manually?

Has anyone tried scribe for larger-scale projects, and for greenfield development?

CuriouslyC · 2026-01-12T12:15:58 1768220158

The main box on the readme should make it pretty clear. One tool call to get a covering set of a piece of code, versus wasteful grep/read/lsp/etc.

I'm not sure if you're being intentionally obtuse or you just don't have much of an attention span, but I'm not making any money off this so if you want to use 10x more tokens to get stuff done, by all means brother.