velcrovan · 2026-02-19T16:07:20 1771517240

> It's difficult for us to maintain documentation of exactly the kind you'd want there

Suggestion: let an LLM maintain it for you.

Alternate suggestion for OP: let an LLM generate the explanations you want from the code (when available).

raggi · 2026-02-19T23:42:29 1771544549

This problem space is not small enough to stay within the attention span of current LLMs. A sufficiently good agent setup might be able to help maintain docs somewhat through changes, but organizing them in an approachable way, covering all the heuristics spread across so many places and external systems, with a huge number of time- and version-dependent factors, is hugely troublesome for current LLM capabilities. They're better at simpler problems, like typing the code.

_se · 2026-02-20T14:08:16 1771596496

LLM docs suck.

For technically complex things, they EXTRA suck.

This is a bad idea.

velcrovan · 2026-02-16T15:06:32 1771254392

Wow, you use Word, Google Docs and Scrivener to author the content on your website? Tell me more.

cxr · 2026-02-17T15:23:34 1771341814

The comment you're responding to is obnoxious, but authoring in Google Docs and exporting to HTML+CSS would be viable and is 10x more accessible than the simultaneously over- and underengineered toolchains and work practices that the professional web developer class has turned out, and doesn't produce output substantially worse than the now-widespread practice of sending mangled/minified payloads to UAs, which in the worst cases involves turning content into opaque blobs that you have to squirt through a JS runtime to get anything meaningful on screen.

The state of the art in Web publishing is such a mess that paid practitioners have, seemingly without realizing it, quietly eliminated the main reason why anyone should even have an expert handle the "lowering" from concept to HTML+CSS instead of using a quasi-WYSIWYG tool or some other crummy sitebuilder and working with whatever shoddy markup they give you.

berkshirehathaway.com (<https://berkshirehathaway.com/>) makes the rounds every now and then, and people ooh and aah over it in the comments, but you can tell they never really get it because then they just turn around and dump their next Vercel-hosted monstrosity on the world.

velcrovan · 2026-02-16T15:03:27 1771254207

https://scott.mn/2014/02/21/semantic_linewrapping/

velcrovan · 2026-02-16T15:01:48 1771254108

If I had to guess, I would say that this method might be applicable to other books besides the one featured in the post.

velcrovan · 2026-02-15T03:49:46 1771127386

I would like to know what models people are running locally that get the same results as a $20/month ChatGPT plan

ineedasername · 2026-02-15T05:24:48 1771133088

Same? Not quite as good as that. But Google’s Gemma 3 27B is highly similar to their last Flash model. The latest Qwen3 variants are very good; for my needs, at least, they are the best open coders. But really, here’s the thing:

There are so many varieties, specialized to different tasks or simply different in performance.

Maybe we’ll get to a one-size-fits-all model at some point, but for now trying out a few can pay off. It also starts to build a better sense of the ecosystem as a whole.

For running them: if you have an Nvidia GPU with 8GB of VRAM, you’re probably able to run a bunch of them, quantized. It gets a bit esoteric when you start getting into quantization varieties, but generally speaking you should find out which integer and float math your GPU has optimized support for, then choose the largest quantized model that matches that support and still fits in VRAM. Most often that’s what will perform the best in both speed and quality, unless you need to run more than one model at a time.

To give you a reference point on model choice, performance, GPU, etc.: one of my systems runs an Nvidia 4080 with 16GB of VRAM. Using Qwen 3 Coder 30B, heavily quantized, I can get about 60 tokens per second.
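
A rough sketch of the "largest quantization that still fits in VRAM" rule of thumb, for anyone who wants to play with the numbers. Everything here is an assumption for illustration: the function names, the 2 GiB reserve for KV cache and runtime overhead, and the list of candidate bit widths are all hypothetical, and the estimate counts weights only (real quantization formats add some per-block overhead). It also ignores the question of which integer or float formats a given GPU accelerates.

```python
GIB = 1024 ** 3


def estimate_weights_gib(params_billion: float, bits_per_weight: float) -> float:
    """Approximate size of the model weights alone, in GiB."""
    total_bytes = params_billion * 1e9 * bits_per_weight / 8
    return total_bytes / GIB


def largest_fitting_bits(
    params_billion: float,
    vram_gib: float,
    reserve_gib: float = 2.0,  # assumed headroom for KV cache and runtime
    candidate_bits=(16, 8, 6, 5, 4, 3, 2),  # hypothetical bit widths, highest first
):
    """Return the highest bit width whose weights fit in the VRAM budget, or None."""
    budget = vram_gib - reserve_gib
    for bits in candidate_bits:
        if estimate_weights_gib(params_billion, bits) <= budget:
            return bits
    return None


if __name__ == "__main__":
    for label, params, vram in [("30B on 16 GiB", 30, 16), ("7B on 8 GiB", 7, 8)]:
        print(label, "->", largest_fitting_bits(params, vram), "bits per weight")
```

Under these assumptions a 30B model on a 16 GiB card lands at roughly 4 bits per weight, which is at least consistent with the "heavily quantized" description above. Treat it as a first filter, not a guarantee that a particular model file will load.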

Twirrim · 2026-02-15T06:42:15 1771137735

I get tolerable performance out of a quantized gpt-oss 20b on an old RTX 3050 I have kicking around (I want to say 20-30 tokens/s, or faster when cache is effective). It's appreciably faster on the 4060. It's not quite ideal for more interactive agentic coding on the 3050, but it's approaching it, and it fits nicely into "coding in the background while I fiddle on something else" territory.

Twirrim · 2026-02-15T16:17:48 1771172268

Just in case anyone hasn't seen this yet:

https://github.com/ggml-org/llama.cpp/discussions/15396 a guide for running gpt-oss on llama-server, with settings for various amounts of GPU memory, from 8GB on up

ineedasername · 2026-02-15T13:37:25 1771162645

Yeah, tokens per second can very much influence the work style, and therefore the mindset, a person should bring to usage. You can also build on the results of a faster but less-than-SOTA-class model in different ways. I can let a coding-tuned 7-12B model “sketch” some things at higher speed, or even a variety of things, review them in real time, and then pass them off to a slower, more capable model to say “this is structurally sound, or at least the right framing, tighten it all up in the following ways…” and let that run in the background.

saratogacx · 2026-02-15T05:12:28 1771132348

The run-at-home suggestion was in the context of $2k/mo. At that price you can get your money back on self-hosted hardware at a much more reasonable pace compared to $20/mo (or even $200/mo).
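
The payback arithmetic can be sketched as below. The hardware price is a made-up placeholder (no figure is given in this comment), the function name is hypothetical, and the calculation ignores electricity, resale value, and any quality gap between local and hosted models.

```python
def breakeven_months(hardware_cost: float, monthly_spend: float) -> float:
    """Months of avoided subscription spend needed to cover the hardware cost."""
    if monthly_spend <= 0:
        raise ValueError("monthly_spend must be positive")
    return hardware_cost / monthly_spend


if __name__ == "__main__":
    HYPOTHETICAL_RIG_COST = 10_000  # assumed price in dollars, for illustration only
    for spend in (2_000, 200, 20):
        months = breakeven_months(HYPOTHETICAL_RIG_COST, spend)
        print(f"${spend}/mo -> {months:.0f} months to break even")
```

With that placeholder price, the break-even point is 5 months at $2k/mo, 50 months at $200/mo, and 500 months at $20/mo, which is the gap the comment is pointing at.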

giancarlostoro · 2026-02-15T16:26:21 1771172781

Well, there's an open source GPT model you can run locally. I don't think running models locally is all that cheap, considering top-of-the-line GPUs used to be $300 and now you are lucky if you get the best GPU for under $2000. The better models require a lot more VRAM. Macs can run them pretty decently, but now you are spending $5000, plus you could have just bought a rig with a 5090 and mediocre desktop RAM, because Sam Altman has ruined the RAM pricing market.

Our_Benefactors · 2026-02-15T20:25:17 1771187117

Macs can run larger models due to the unified memory architecture. Try building a machine with 512GB of Nvidia VRAM. You basically can’t.

giancarlostoro · 2026-02-15T23:12:04 1771197124

Fully aware, but who the heck wants to spend nearly 10 grand, and that's with just a 1TB hard drive (which needs to be able to fit your massive models, mind you)? Fair warning: not ALL the RAM is fully unified. On my 24GB RAM MacBook Pro I can only use 16GB as VRAM, but it's still better than using my 3080 with only 10GB of VRAM, though I also didn't spend more than 2 grand on that.

everforward · 2026-02-15T16:38:40 1771173520

I got some decent mileage out of aider and Gemma 27B. The one shot output was a little less good, but I don’t have to worry about paying per token or hitting plan limits so I felt more free to let it devise a plan, run it in a loop, etc.

Not having to worry about token limits is surprisingly cognitively freeing. I don’t have to worry about having a perfect prompt.

joquarky · 2026-02-15T04:55:22 1771131322

And what hardware they needed to run the model, because that's the real pinch in local inference.

colonCapitalDee · 2026-02-15T05:01:58 1771131718

There are no models that you can run locally that'll match a frontier LLM.

velcrovan · 2026-02-12T16:03:57 1770912237

https://joeldueck.com/manually-type-punctuation.html

https://joeldueck.com/ai-is-right-about-em-dashes.html

velcrovan · 2026-02-05T02:13:13 1770257593

Who's "afraid" of green bubbles? It's like saying a Toyota Corolla driver is afraid of the Ford Pinto.

antinomicus · 2026-02-05T02:42:24 1770259344

No, it’s like someone owning a Ferrari and looking down on someone who drives a Corolla. Or that’s how they see it, anyway. Plus there’s the annoyance with interoperability: it’s not just about status, it’s about all your iMessage group chats that don’t play nice with Android.

elcritch · 2026-02-05T03:00:39 1770260439

Apple chose the colors well. For whatever reason the shade of green they chose just gives a bit of ick.

mlrtime · 2026-02-05T02:43:09 1770259389

It's a real thing; you're just too old and/or not dating young people. Some do care a lot.

velcrovan · 2026-02-06T14:30:31 1770388231

I'm confused, I thought we were talking about people who are installing and running openclaw. You're right, if this is now a thread about teenage dating habits, I'm out.

wolvoleo · 2026-02-05T05:17:55 1770268675

IMO it is pretty shallow to pick dating partners based on their mobile OS but yeah it does happen.

albedoa · 2026-02-05T20:34:15 1770323655

"Nissan" might have fit better than Ford Pinto here.

sneak · 2026-02-05T03:12:05 1770261125

iMessage lock-in is a huge thing. When it was new and was still e2ee I ended up buying iPhones for everyone I regularly messaged.

These days it is insecure however because they backdoored the e2ee and kept it backdoored for the FBI, so now Signal is the only messenger I am reachable on.

Blue bubble snobbery is presently a mark of ignorance more than anything else.

xp84 · 2026-02-05T08:32:42 1770280362

I agree that it’s stupid to judge people for it, but you do have to admit that especially with not all people having RCS, the feature set of SMS and MMS that you have to deal with when not using iMessage is pretty barbaric. From the potato-quality videos (ironically, I recall QuickTime was heavily involved in that spec, lol) to the asinine way Apple lets you apply a reaction and then sends it as a verbose text… From an iPhone user’s point of view, a “green bubble” means “this conversation will work like it’s 2003.”

Yes, I know 99.999% of Android users are on WhatsApp (or WeChat, Line, or Telegram depending on cultural background) but at least half of iPhone users aren’t on those, so we still have to keep using Messages for a lot of people.

velcrovan · 2026-02-04T15:02:22 1770217342

Human beings are also liable for the results of their actions.

velcrovan · 2026-02-02T04:22:44 1770006164

Tell me about your auditing workflow and procedures.

velcrovan · 2026-01-30T15:20:46 1769786446

Different from other religions how? /s