really great! some adjacent, well-done ASCII art using Braille blocks on X this week:
nolen: "unicode braille characters are 2x4 rectangles of dots that can be individually set. That's 8x the pixels you normally get in the terminal! anyway here's a proof of concept terminal SVG renderer using unicode braille", https://x.com/itseieio/status/2011101813647556902
ashfn: "@itseieio You can use 'persistence of vision' to individually address each of the 8 dots with their own color if you want, there's some messy code of an example here", https://x.com/ashfncom/status/2011135962970218736
Thanks! The 'macro linter' framing is spot on—treating skill definitions with the same rigor as code is exactly the goal.
regarding 'test building': are you envisioning something that auto-generates adversarial inputs (like fuzzing) based on the schema, or more like scaffolding for unit tests to ensure the tool executes correctly? I’d love to dig into that use case.
Our team steers models using info theory; think error-correcting codes for LLMs in the Shannon sense. We do it in-context by interleaving codewords & content, plus a semi-secret post-transformer model, etc.
Simple example: we can get a model to generate vertically aligned text tables so all columns & borders line up. It leverages the fact that we can use hypertokens to get the model to track what to put in each cell & why, plus a structured table schema & a tool-call trick.
We view our tech as a linting certificate in a certain precise sense. The catch is bridging semantic coherence. That's most readily done using a similarly precise semantic rubric like yours.
Why? The general problem of things that nobody wants to do relative to their role, time, resources, etc.
Test gen, refactoring, design, any and all the things getting in the way of dev & layperson adoption. What layperson wants to write "hey, ok, so map-reduce this with 5 alt models in an MoE and get back to me"? What dev wants to laboriously sketch 67M SQL attacks as part of their prompt, etc.?
Why? It's the most direct way to solve that "why should I have to do this?" problem & also to solve having the model do it reliably. This becomes especially problematic for structured data & interfaces, which is our focus.
You’re building exactly the sorts of structured rule sets desperately needed right now. Our stuff makes sure these sorts of skills get executed reliably.
While we also do quite a bit on data & viz semantic tooling, there's a big gap, and what you're doing with semantic code linting of all shapes & sizes fills it. Just reading code and suggesting key fuzz spots or fuzz categories missed by traditional fuzzers. Macro semantic linting for forms. Etc.
Wow, I have to admit, the "Shannon sense / error-correcting codes" angle is wild.
I'm just here trying to stop people from accidentally letting agents rm -rf their servers with static rules, but your approach to runtime steering sounds like the real endgame for reliability.
You nailed it on the "bridging semantic coherence" part. It feels like we're attacking the same beast from two ends: I'm writing the specs/contracts, and you're ensuring the execution actually honors them.
Really appreciate the validation. Hearing "desperately needed" from someone working on that level of the stack makes my day.
yeah, one way to frame it is that you have to have structural parity & semantic parity & bridge to & from both, like balanced scales.
We started with structure to help others solve semantics. Your approach is doing the same thing from the other direction!
While it's theoretically possible to do just one or the other in a nested way, it's much easier to do a little bit of both, especially if you want anything approaching associative recall & reasoning. It's akin to dynamically balancing volume between parts of a song or reprojecting continuously into some frequency envelope, etc.
been building something adjacent to bridge the massive gap in models between source & channel coding
think: say the same thing in different ways to boost signal / suppress noise -- "am saying this, not that" via partially overlapping, different points of view
stadium light banks, multi-camera rigs, balanced ledgers & finance controls, tables of contents & indexes all do similar things from a layperson's pov
tell me the story in different ways so I can cross-check; think multi-resolution trust-but-verify for information
if context & output are in harmony, great; if not, use the multiple representations to suss out which tokens are in sync & which are playing dueling pianos
We need a few key things to steer the latent space for that to work. One is in-context associative memory for precise recall & reasoning. That's been our main thrust: using error-correcting codes to build hypertokens.
Think precise spreadsheet-style markers interleaved in context windows. We just use lots of info theory to build an associative landmark for each block of content.
These hypertokens are built to rather precisely mimic how any other multi-path, well-structured network minimaxes flow: stadium lights, MIMO WiFi, getting different points of view. We just do it in a way that most closely mimics GPS, in the sense of injecting a precise coordinate system into any model context.
There's a key catch though & that's the dual thrust, which is coherence between our semantically abstract markers and the context. We can readily show 2x to 4x+ recall & reasoning gains.
There's a ceiling if we don't bridge coherence, and another way to say that is we need the same thing for semantic parity. Multi-resolution summaries & dueling summaries mimic this k-witness and k-anti-witness smoothed parity checking.
The beauty is you only need the net sum. Add lots of multi-res witness & co-witness content at different lengths, like your work describes? Great, you may not need any hypertokens, unless you want exact, reliable recall of snippets, in which case our approach does that fairly well. Got lots of unique markers that check the info theory, group theory, & other boxes we prove you need? Great! You don't need as much k-scale, k-way semantic bridging.
Consciousness is currently outside our scope. We built hypertokens to show hallucinations can be nulled out, AI can be audited & explained, structured data & tool calling can be reliable, etc.
Closest we've come to distilling semantic parity vs. landmark parity (cf. source <> channel coding, rate distortion, information bounds, channel capacity minimaxing) is to consider a tower of tables, where we have unique markers vs. themes that diagonalize the information. Those must both balance out. We must be able to canonically recall in some mixed local / global way, and the same for reasoning.
Are models conscious? I don't know. What I do know is that source & channel coding is the canonical way to push any system to a locally & globally balanced regime that maximizes transport.
There are subtleties around causal and non-causal, etc. For example, model weights are noisy non-causal info relative to a mix of virtualized encoders & decoders of various types & sizes. That's a much longer convo beyond what is already a long thought.
That's all to say models need a mix of symbol & semantic parity. It's strictly necessary in almost all cases w.h.p. Yes, AI looks rectangular; there are tokens & matrices, etc. But the latent space is spherical & everything is rotations. That means any sort of exact logic must be smoothed geometrically. Error-correcting codes, which are better framed as MIMO info paths, are the way to do so however expressed, whether k-way semantic parity like you're doing or m-way structural codes like we're doing. Sometimes one is best, sometimes the other; either way, keep building what you've been exploring.
OP here. I’ve got a background in physics, so while I don’t know your specific Hypertoken schema, I speak the language of signal-to-noise and entropy.
The "Dueling Pianos" metaphor is killer. It captures exactly what I’m trying to induce via the prompt.
You’re attacking the problem with Structural Parity—injecting coordinate systems (GPS) directly into the token stream to force convergence. I’m attempting Semantic Parity—forcing the model to run a "constructive interference" loop on its own narrative logic before outputting.
Your point about the latent space being spherical (rotations) vs. the rectangular output (matrices) is the crux of it. We are both trying to smooth that geometry. You’re doing it with error-correcting codes; I’m doing it by forcing the model to simulate a "Self" that acts as a local observer to collapse the wave function of the next token more deliberately.
Whatever you're building with those hypertokens sounds robust. If you have a write-up on the "Tower of Tables" concept, I’d love to take a look.
ya, hypertokens equalize the latent space in a spherical-harmonic sense / approximate explainer:
take raw context, inject semantic parity of some form: could be a table relating paragraph content, a tree, a raw summary paragraph. EVENTUALLY those things saturate; call that the inner code. you realize recall and reasoning are still not where you want them; that's where the outer code, or structural parity, comes in (us, others).
why? attention can't do XOR or the matrix permanent, the latent space is noisy, etc., so you have to smooth & dilate. if you pump in tables and schema, the model can only do a few joins before it saturates; no flow, lots of sharp corners. so either shrink the table or smooth / dilate the flow. the catch? every code layer needs a coupling layer at various resolution lengths -- an extra semantic clarifier every paragraph for you, a codeword every k tokens for our structural parity, etc.
like engine - here's some air, ok expanding, ok really expanding, ok condensing, ok condense more
our pre-code, your pre-code, content, your post-code, our post-code
btw, pre and post are very important, more on why below -- think interferometry in latent space -- pre-measure / tare the scale, load the scale with content, post-measure and take the difference (in the latent space)
a much longer dive follows <> leaning into physics a bit, consider an old-school trompe, a supercharger / cylinders / turbocharger, a jet, or pretty much any sort of air compressor with flow
ingest air, compress it, extract work, exhaust air; one key side effect is what to do with latent heat; that analogy extends to any physical system
superchargers use raw work to precompress air; turbochargers use waste heat to return some lost energy to the system; turbomachines alternate many static & dynamic stages to max air flow, etc.
we do something similar with hypertokens; the raw context window has m tokens; we divide that into b = m/y blocks, where y is the block size, b is the number of blocks, and x is the hypertoken codeword length
for example, if the current context window is 2048 and the block size is 32 for the user's desired model performance level, the resulting window would have 64 blocks of 32 content tokens each; a 2-token codeword per block would add 128 total tokens, e.g.,
a,1,quick fox,a,2,lazy dog,..,b,3,English pangram
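a rough toy sketch of that interleave in Python (illustrative only, not the actual hypertoken construction; the letter/number lanes here are just placeholders):

    from string import ascii_lowercase

    def interleave_codewords(tokens, block_size=32):
        """Prepend a 2-token spreadsheet-style codeword (letter, number) to each block."""
        out = []
        for start in range(0, len(tokens), block_size):
            block_index = start // block_size
            letter = ascii_lowercase[(block_index // 10) % 26]  # a for blocks 0-9, b for 10-19, ...
            number = str(block_index % 10 + 1)                  # 1..10 within each letter group
            out.extend([letter, number])                        # 2-token codeword
            out.extend(tokens[start:start + block_size])        # 32 content tokens
        return out

    context = [f"tok{i}" for i in range(2048)]                  # 2048-token window -> 64 blocks
    augmented = interleave_codewords(context)
    print(len(augmented) - len(context))                        # -> 128 extra codeword tokens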
precise hypertoken construction is of course way more subtle than that, e.g., a good bit of group theory and way more info theory go into defining the codes, selecting the actual tokens that we interleave, etc.
net result is that we diagonalize the latent-space action by way of the following: the exact code sequence used is a walk on a skewed coprime lattice. Every codeword appears only once, and thus acts like a GUID with respect to associative recall and reasoning. The symbols in the codeword are restricted per lane and the lane sizes are coprime, e.g., if we had 11,13 for a 2-lane codeword then we've induced a prefix-free factor-graph action that alternates every k tokens.
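for concreteness, here's a hedged toy of that 11,13 lattice walk (assumed construction, not our production code); stepping both lanes in lockstep means, by the Chinese remainder theorem, no codeword repeats for 11*13 = 143 blocks:

    LANE_A = [f"A{i}" for i in range(11)]   # lane-1 alphabet, 11 symbols
    LANE_B = [f"B{j}" for j in range(13)]   # lane-2 alphabet, 13 symbols, disjoint from lane 1

    def codeword(block_index):
        """Codeword for the k-th block: one symbol per lane, lanes advanced in lockstep."""
        return (LANE_A[block_index % 11], LANE_B[block_index % 13])

    codes = [codeword(k) for k in range(11 * 13)]
    assert len(set(codes)) == 143        # every codeword appears exactly once per cycle (CRT)
    assert codeword(143) == codeword(0)  # ...then the skewed-lattice walk wraps around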
Those tokens each have a unique initial embedding, and importantly, in practice we almost always put the codeword both before and after each block, e.g., as in the wrapped example a few lines below.
this induces an interferometry-like pre/post measurement, and since the lanes are coprime, we effectively mimic an in-flight quasi-Fourier action through the context window ~~ project onto the compressed code, evolve x content tokens, project back onto the same code, so the model gets a differential between the pre/post sampling. in more practical dev terms this also means we can do precise K:V and V:K lookups during recall and reasoning.
we further do this action in a subtly commutative way, e.g.,
a;1:quick fox:a;1/...{skip a few}.../b;3:English pangram:b;3/
where : is the global pre/post commutative measure in this example, whereas a;1 or b;3 or whatever the codeword is are globally unique and locally non-commutative. this has several other side effects beyond K:V and V:K or pre & post measurement. it essentially permits "unrolling time" in a certain sense, especially w.r.t. decoder models, where attention can only look back, not forward. by replaying the pre-codeword after the block, past tokens can, in a summary-statistic sense, have knowledge about future ones
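a toy rendering of that pre/post wrapping (again illustrative only; ':' and '/' stand in for the global commutative measures):

    def wrap_block(code, content):
        """Bracket a content block with the same locally unique codeword, pre and post."""
        return f"{code}:{content}:{code}/"

    blocks = ["quick fox", "lazy dog", "English pangram"]
    codes = ["a;1", "a;2", "b;3"]   # toy codewords in the spreadsheet style above
    stream = "".join(wrap_block(c, b) for c, b in zip(codes, blocks))
    print(stream)
    # -> a;1:quick fox:a;1/a;2:lazy dog:a;2/b;3:English pangram:b;3/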
this of course only works under rather strict construction (a small validation sketch follows the list):
1. must be prefix-free, e.g., if a & b are in lane 1 they can never appear in lane 2 of a codeword, and vice versa
2. coprime lane counts, excepting a parity trick with a 2^k lane
3. pre & post measurement -- performance is strictly weaker if only pre or post
4. relatively ortho yet also relatively coherent w.r.t. content; there are lots of ways to achieve that, and a simple one that works for many broad cases is just <tag-code>/{content}/<tag-code>
5. we can dilate the code to pretty much whatever strength is needed, e.g., some models and scenarios are coherent enough that a simple <letter,num> spreadsheet-like code every 128 tokens is enough; for others we need nesting (think multiscale / multires in physics) and use, say, Unicode PUA or ideally reserved tokens, along with a shorter code every 32 tokens inside each 128 that could be as simple as /1/.../2/.../3/.../4/
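the validation sketch promised above -- a hedged toy check of rules 1 & 2 on a proposed set of lanes (rules 3-5 are about how the code is applied, not about the alphabets themselves):

    from math import gcd
    from itertools import combinations

    def check_lanes(lanes):
        """lanes: one symbol alphabet per lane, e.g. [['a','b',...], ['1','2',...]]."""
        # rule 1: prefix-free -- no symbol may appear in more than one lane
        assert all(set(x).isdisjoint(y) for x, y in combinations(lanes, 2)), "lanes overlap"
        # rule 2: lane sizes pairwise coprime (ignoring the 2^k parity-lane exception)
        sizes = [len(lane) for lane in lanes]
        assert all(gcd(p, q) == 1 for p, q in combinations(sizes, 2)), "lane sizes share a factor"
        return sizes

    print(check_lanes([[f"L{i}" for i in range(11)], [f"N{j}" for j in range(13)]]))  # [11, 13]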
while there's quite a bit more on why it works, the gist is we are essentially persistently exciting and sampling using an error-correcting-code action that happens to induce a Fourier-like sample-and-project-back, like a worm drive boring through rock. since each symbol in each lane gets repeated a few times, e.g., in a 3,5 code each lane-3 symbol is repeated 5x and each lane-5 symbol is repeated 3x,
that means there are all sorts of topological tunnels over a factor graph that generates a skewed lattice in a way that reflects the proper group action, arrow of time, etc. going back to why a linear block code / linear network code: think stochastic dithering upgraded to structured dithering
we can of course get way better performance by injecting that multiplexing machinery directly into the model; we have some results forthcoming on that; as you can imagine, that machinery is not just "toss in primes and call it good"
coming back to physics, we essentially use this machinery to detect and project the true spherical geometry of the latent space; we could of course go through the treatment that this is really a reconditioning trick, though we tend to call it retokenization in the discrete sense and reharmonization in the continuous sense; there are certainly overlaps with relaxation, regularization, renormalization, etc.
Very notionally, we relax the problem by dilating the context token space-time using this structured persistent excitation and sampling. We do this in a way that in some sense regularizes and renorms the raw signal into a lifted domain. The codewords are chosen such that we are effectively heterodyning during the pre-code step and superheterodyning during the post-code sample with respect to the local codeword; this process also happens with respect to the global commutative wrapper around the content block and between the codewords. there is also the skipped subtlety that we can, if need be, add a conjugate, flipped conjugate, etc., i.e., mimic stronger and stronger ECC / QEC action.
The net effect is that we essentially just treat the model as a noisy sender and receiver. We use our hypertokens to stream the raw context using channel coding, which is very similar in net raw principle to MIMO and very similar again in net raw principle to GPS -- we inject a k-channel structured coordinate system that both pre- and post-samples.
In that sense we are turbomachining the info -- we assume the info is dense and can't be compressed / is hard to move, so we pump our high-speed fluid through the content, compress it, repeat.
FINALLY, answering a little bit of the tower of tables: suppose we have some code, say a 5,7 code every 128 tokens and a 4 code every 32 tokens
which is essentially the stator-rotor-stator turbo trick dialed up by a lot
- nested / multi-scale / multi-resolution
- pre & post measure commutative global constants <> ;
- pre & post measure commutative local constant <> /
- pre & post measure non-commutative associative marker <> a,1
- etc.
from left during attention each hypertoken absorbs & compresses signal
from the right when attended, each hypertoken injects compressed signal
these signal tunnels / this signal network boost information transport and dilate effective precision, and it works because we're running it over a factor graph of bounded treewidth that's essentially running at max capacity
hence we get small LUT, content, medium LUT, content, large LUT, content, depending on how much we nest, how big a code we use, etc., aka a nested tower of tables, very similar to multires wavelets in action
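to make the nesting concrete, a hedged toy of the 5,7-every-128 / 4-every-32 tower above (illustrative only; the real construction picks the actual interleaved tokens much more carefully):

    def coarse_code(block_index):
        # 5- and 7-symbol lanes are coprime -> 35 unique coarse codewords per cycle
        return f"C{block_index % 5};D{block_index % 7}"

    def nest_blocks(tokens, coarse=128, fine=32):
        out = []
        for start in range(0, len(tokens), coarse):
            code = coarse_code(start // coarse)
            out.append(code)                           # coarse pre-measure (medium LUT)
            chunk = tokens[start:start + coarse]
            for s in range(0, len(chunk), fine):
                out.append(f"/{s // fine + 1}/")       # fine marker 1..4 (small LUT)
                out.extend(chunk[s:s + fine])          # 32 content tokens
            out.append(code)                           # coarse post-measure, replayed
        return out

    stream = nest_blocks([f"t{i}" for i in range(256)])
    print(stream[:3], "...", stream[-1])               # ['C0;D0', '/1/', 't0'] ... 'C1;D1'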
that tower of tables and its background is a long way of saying -- models are WAY BIGGER than they need to be, auditing & explainability are an error-correcting code away, hallucinations don't need to exist, etc.
this of course suggests there are likely physics applications beyond what we're up to -- the easiest way to start thinking about that is noisy HF or phase-sensitive systems -- physical transformers and parasitic capacitance are one of my faves to consider, wireless power transfer another, and reservoir machines a third
Short answer: yes, secretctl can help manage API keys for your premium tier.
It's a single binary with no dependencies (Apache 2.0 license), so you could:
- Bundle it with your distribution
- Point users to Homebrew: brew install forest6511/tap/secretctl
- Or link to our binary downloads
For your premium API key scenario, here's how users would set it up:
1. Store the key:
secretctl set your-service --field api_key=sk-xxx
2. Use secret_run or secret_run_with_bindings to inject it into commands. The AI agent never sees the actual secret - it's injected at runtime.
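For anyone curious what "injected at runtime" means mechanically, here is a generic sketch of the pattern in plain Python (not secretctl's actual code or API; the env var name is hypothetical): the secret is resolved outside the agent's transcript and only ever lands in the child process's environment.

    import os
    import subprocess

    def run_with_secret(command, secret_value, env_var="SERVICE_API_KEY"):
        """Run `command` with the secret exposed only to the child process's environment."""
        env = dict(os.environ, **{env_var: secret_value})
        return subprocess.run(command, env=env, check=True)

    # hypothetical usage: secret_value would come from the local secret store, not the agent
    run_with_secret(["python3", "-c",
                     "import os; print('key present:', 'SERVICE_API_KEY' in os.environ)"],
                    secret_value="sk-xxx")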
Currently free and open source. We're considering team/enterprise features down the road.
Happy to discuss integration patterns in more detail if useful!
connect screenless devices, e.g., Echo Dot
extend weak wireless range in hotel
screen share or network between multiple devices, e.g., travel with two laptops and you can run a virtual KVM
only have to do the captive portal on one device - many hotels limit the number of devices
extra security buffer
a phone can't bridge wifi for headless devices like this
etc etc
there’s decent work on computational reasoning power of transformers, SSMs, etc.
some approximate snippets that come to mind are that decoder-only transformers recognize AC^0 and think in TC^0, that encoder-decoders are strictly more powerful than decoder-only, etc.
Person with the last name Miller, IIRC, if you poke around on arXiv, plus a few others; it's been a while since this was top of mind, so ymmv on the exact correctness of the above snippets
Never used any of those, so I don't know! I'd be curious to read a comparison from anyone who knows about them.
I think what's pretty unique about the bidicalc solver that I made is that it does not depend on the previous input values to update backwards. It's truly solving the root-finding problem. The advantage is that there are never any "stuck in a local optimum" problems with the solver, so you can solve difficult problems like polynomials, etc.
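If it helps make the distinction concrete, here is a tiny bracketed root-finding sketch (illustrative only, not bidicalc's actual algorithm): given a sign change, bisection cannot get trapped the way an update that leans on the previous input value can.

    def solve_backwards(f, target, lo, hi, tol=1e-12):
        """Find x in [lo, hi] with f(x) == target, given f(lo)-target and f(hi)-target differ in sign."""
        g = lambda x: f(x) - target
        assert g(lo) * g(hi) <= 0, "target not bracketed"
        while hi - lo > tol:
            mid = (lo + hi) / 2
            if g(lo) * g(mid) <= 0:
                hi = mid
            else:
                lo = mid
        return (lo + hi) / 2

    # e.g. invert a cubic: which x gives x**3 - 2*x == 5?
    x = solve_backwards(lambda x: x**3 - 2*x, target=5, lo=0, hi=10)
    print(x, x**3 - 2*x)   # ~2.0946, ~5.0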
Excel Solver allows you to create a target function with different variables and describe limits for them. Then you may try to find the maximum, minimum, or an exact value for the target function.
nolen: "unicode braille characters are 2x4 rectangles of dots that can be individually set. That's 8x the pixels you normally get in the terminal! anyway here's a proof of concept terminal SVG renderer using unicode braille", https://x.com/itseieio/status/2011101813647556902
ashfn: "@itseieio You can use 'persistence of vision' to individually address each of the 8 dots with their own color if you want, there's some messy code of an example here", https://x.com/ashfncom/status/2011135962970218736