I love doing personal side project code reviews with Claude Code, because it doesn't beat around the bush with criticism.
I recently compared how a few models reviewed a data processor class I wrote for a side project, one with quite horrible temporal coupling:
Gemini - rates it a 7/10, gives some small bits of feedback, etc.
Claude - brutal dismemberment of how awful the naming conventions, structure, coupling, etc. are; provides examples of how this will mess me up in the future and gives a few citations to Python documentation I should re-read.
ChatGPT - you're a beautiful developer who can never do anything wrong, the best developer that's ever existed, and this class is the most perfect class I've ever seen.
This is exactly what got me to actually pay. I had a side project with an architecture I thought was good. Fed it into Claude and ChatGPT. ChatGPT made small suggestions but overall thought it was good. Claude shit all over it, and after validating its suggestions, I realized Claude was what I needed.
I haven't looked back. I just use Claude at home and ChatGPT at work (no Claude). ChatGPT at work is much worse than Claude in my experience.
I feel like this anecdote represents the differing incentives / philosophies of each group rather well.
I've noticed ChatGPT is rather high in its praise regardless of how valuable the input is, Gemini is less placating but still largely influenced by the perspective of the prompter, and Claude feels the most "honest", but humans are rather poor at judging this sort of thing.
Does anyone know if "sycophancy" has documented benchmarks the models are compared against? Maybe it's subjective and hard to measure, but given the issues with GPT-4o, this seems like a good thing to measure model to model, both to track an individual company's changes over time and to compare across companies.
This felt like a sane and useful case until you mentioned the bank account access.
I just don't see a reason to allow OpenClaw to make purchases for you; it doesn't feel like something an LLM should have access to. What happens if you accidentally add a compromised skill?
Or it buys you running shoes, but a prompt injection routes the purchase through a fake website?
Everything else can be limited, but the buying process is already quite streamlined; it doesn't take me more than 2 minutes to go through a Shopify checkout.
Are you really buying things so frequently that the risk of having a bot purchase them for you is worth it?
I think that's what turns this post from a sane bullish case into an incredibly risky one.
I'd probably use OpenClaw for some of what you're doing (safe read-only tasks, message writing, compiling notes, looking into grocery shopping), but I'd personally add stricter limits if I were you.
What if... that whole post was written by AI, with the express intent of sanding down our natural instincts for security, making it easier for malskill devs to take advantage?
Then it's done a horrible job! All I could think was: surely you're not making the efficiency gain you claim.
It's similar to back when Notion second-brain templates became popular; there was a point at which you went, surely it's just going to be a full-time job to manage this one complicated template?
You could give it access to a limited budget and review its spending periodically. Then it can make annoying mistakes but it's not going to drain your bank account or anything.
I almost feel like the effort required to set up a separate limited-budget bank account and review its spending periodically is equivalent to just clicking through the checkout myself.
But I may be a lazy engineer; I definitely go by the "do it once, don't automate; do it twice, automate" approach.
But don't you want the agents to book vacations and do the shopping for you!!?!
Though it would be nice if "deep research" could do the hard work of separating signal from noise when it comes to finding good-quality products. Unfortunately that requires being extremely skeptical of everything written on the web and actively trying to suss out the ownership and supply chains involved, which isn't something agents can do unguided at the moment.
Yeah - and honestly, I'll probably use an agent like OpenClaw to do stuff like finding flights and villas, finding good timing between dates X and Y, formatting it all as a table, and validating the websites.
Then, I can commit to the checkout process, which isn't that much labour.
For a text-based business simulator, I'd make the text far easier to read. I'm finding it a little too fast, with a lot of eye strain. There are a couple of techniques, including making sure that your text isn't completely black.
I'd look a little more into design strategies: smoother scrolling for text, better typography, colors that are easier to read, and more focus on the content itself.
Especially if you expect someone to spend 20 minutes reading an article. Just take a bit of a refresher on techniques for web readability!
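To make that concrete, here's a rough sketch of the kind of tweaks I mean, assuming the game renders its prose into a container element (the "story" id and the specific values are just illustrative, not taken from your code):

    // Rough sketch: soften the reading experience for long passages of game text.
    // Assumes the prose lives in an element with id "story" (purely illustrative).
    const story = document.getElementById("story");
    if (story instanceof HTMLElement) {
      story.style.color = "#333";              // dark grey instead of pure black
      story.style.backgroundColor = "#fdfdf8"; // slightly off-white background
      story.style.lineHeight = "1.6";          // more space between lines
      story.style.fontSize = "1.125rem";       // slightly larger body text
      story.style.maxWidth = "65ch";           // keep line lengths readable
      story.style.margin = "0 auto";           // center the text column
      story.style.setProperty("scroll-behavior", "smooth"); // smoother scrolling
    }

Even just the softer text color and a sane line length go a long way for 20-minute reads.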
Not to mention a (potentially illegal?) 100% overlay for cookies that only has an “accept” button.
EDIT: there is at least a way to reject them by clicking the link to manage cookies. Still debatable whether this is legal, but at the very least, a dark pattern.