Hacker News | past | comments | ask | show | jobs | submit | vessenes's comments

Cool!

From the prompt it looks like you don’t give the LLMs a harness to step through games or simulate - is that correct? If so, I’d suggest it’s not a level playing field vs. human-written bots - if the humans are allowed to watch some games, that is.


That’s true, I’m trying to figure out a better testing environment with a feedback loop.

I did try letting the models iterate on the bot code based on a summary of an end-of-game ‘report’, but that showed only marginal improvements vs. zero-shot.


One really nice thing about nuclear is that the fuel is highly portable. Small reactors next to datacenters take away a lot of complexity; transport, grid connectivity, etc. Plus they're already being built in industrial-ish areas.

So, allow me to say a few things:

Nuclear is about replacing baseload - currently coal basically

Small nuclear agrees with you about "monsters"

Storage at this scale is also not easy

SMRs definitely pencil out in today's energy regime.


Storage at that scale already exists - in California, for example.

EDF in France is now crying that renewables are cratering the earning potential of their nuclear fleet, and increasing maintenance costs due to having to adapt.

In e.g. Australia coal plants are forced to become peakers, or be decommissioned.

We need firming for when the 10-year winter hits. Not an inflexible "baseload" plant producing enormously subsidized electricity when renewables and storage already flood the grid - which is far above 90% of the time.


This is cool. It makes me want an unsloth quant though! A 7b local model with tool calling would be genuinely useful, although I understand this is not that.

UPDATE: I'd skip this for now - it does not allow any kind of interactive conversation, as I learned after downloading 5 GB of models - it's a proof of concept that takes a WAV file in.


I forked and added tool calling by running another LLM in parallel to infer when to call tools. It works well for me to toggle lights on and off.

Code updates here https://github.com/taf2/personaplex


Cool approach. So basically the part that needs to be realtime - the voice that speaks back to you - can be a bit dumb so long as the slower-moving genius behind the curtain is making the right things happen.
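The pattern described above - a realtime voice model up front, with a slower model watching the transcript and deciding when to fire tools - can be sketched roughly like this. Note this is an assumption about the fork's architecture, not its actual code; `infer_tool_call` stands in for the second LLM call with a trivial keyword rule so the sketch stays runnable.

```python
import queue
import threading

# Hypothetical stand-in for the slower "genius" model. In the real fork this
# would be a second LLM inferring tool calls from the transcript; a keyword
# rule keeps the sketch self-contained.
def infer_tool_call(utterance: str):
    text = utterance.lower()
    if "light" in text and ("on" in text or "off" in text):
        return {"tool": "toggle_lights", "on": "on" in text and "off" not in text}
    return None  # plain chit-chat, no tool needed

def tool_worker(transcripts: queue.Queue, actions: list):
    # Runs in parallel with the realtime voice loop, so it can afford to be slow.
    while True:
        utterance = transcripts.get()
        if utterance is None:  # shutdown sentinel
            break
        call = infer_tool_call(utterance)
        if call is not None:
            actions.append(call)  # in a real system: execute the tool here

transcripts = queue.Queue()
actions = []
worker = threading.Thread(target=tool_worker, args=(transcripts, actions))
worker.start()

# The realtime model keeps talking while transcript lines stream in.
for line in ["turn the kitchen light on", "how was your day?", "lights off please"]:
    transcripts.put(line)
transcripts.put(None)
worker.join()

print(actions)  # two toggle_lights calls; the chit-chat line is ignored
```

The queue is the whole trick: the voice loop never blocks on tool inference, it just drops transcript lines in and moves on.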

Yes, exactly - one part I did not like is that we also have to transcribe separately, because it does not provide what the person said, only what the AI said.

What do you mean by "infer"? How does the LLM get anything out of this as input?

It provides a voice assistant demo in /Examples/PersonaPlexDemo, which lets you try turn-based conversations. Real-time conversation is not implemented, though.

> I'd skip this for now - it does not allow any kind of interactive conversation - as I learned after downloading 5G of models - it's a proof of concept that takes a wav file in.

I haven't looked into it that much, but to my understanding a) you just need an audio buffer, and b) they seem to support streaming (or at least it's planned).

> Looking at the library’s trajectory — ASR, streaming TTS, multilingual synthesis, and now speech-to-speech — the clear direction was always streaming voice processing. With this release, PersonaPlex supports it.


> You just need an audio buffer

That alone is an exercise in pain to do right on macOS using Swift - one that even coding bots aren't able to solve right the first time :)


I beg to differ. My agent just one-shotted a MicrophoneBufferManager in Swift when asked.

Complete with AVFoundation and a tap for the audio buffer.

It really is trivial.
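For the snippet collectors: a minimal sketch of what such a class typically looks like, built on AVAudioEngine's input tap. `MicrophoneBufferManager` is just the name from the comment above; the body is my assumption about the shape of the generated code, not the actual output.

```swift
import AVFoundation

// Minimal microphone capture via an AVAudioEngine tap on the input node.
final class MicrophoneBufferManager {
    private let engine = AVAudioEngine()
    var onBuffer: ((AVAudioPCMBuffer) -> Void)?

    func start() throws {
        let input = engine.inputNode
        // Use the input node's native format to avoid format-mismatch crashes.
        let format = input.outputFormat(forBus: 0)
        // Each callback delivers roughly 4096 frames of PCM audio.
        input.installTap(onBus: 0, bufferSize: 4096, format: format) { [weak self] buffer, _ in
            self?.onBuffer?(buffer)
        }
        engine.prepare()
        try engine.start()
    }

    func stop() {
        engine.inputNode.removeTap(onBus: 0)
        engine.stop()
    }
}
```

The tap itself really is this short; the "exercise in pain" part is mostly the surrounding plumbing - the microphone usage description in Info.plist, sandbox entitlements, and permission prompts.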


Any chance of pushing it to GitHub? My Swift knowledge could currently be written out on an oversized beer coaster, so I'm still collecting useful snippets.


I've also had great results with using LLMs to pry into Apple's private and undocumented APIs. I've been impressed with the lack of hallucinations for C/C++ and Obj-C functions.

I can attest that the quality in this domain has greatly improved over the years, too. I am not always a fan of the quality of the Swift code that my LLM produces, but I am impressed that what it produces often works in one shot. The quality also is not that important to me, because I can just refactor the logic myself, and I often prefer to do that anyway. I cannot hold an LLM to idiosyncrasies that I do not share with it.


Exactly. Even if it’s a skeleton, as long as it does “The Thing”, I’m happy. I can always refactor into something useful.

Bummer. Ideally you'd have a PWA on your phone that creates a WebRTC connection to your PC/Mac running this model. Who wants to vibe code it? With Livekit, you get most of the tricky parts served on a silver platter.

This is the way. This is something I’m working on but for other applications. WebRTC voice and data over LiveKit or Pion to have conversations.


Almost all reporting is terrible. But yes, this is terrible reporting.

Agreed. I will add NV has product dominance - they don’t need to buy strategic MFN supplier status - why not deploy capital elsewhere?

Not sure I understand your complaint - 8GB is a goodish amount of RAM for a Chromebook, the de facto leader for educational stuff. I would take this over any Chromebook, ever, in a heartbeat.

Well ChromeOS is basically a monolithic browser based OS. These will likely have apps deployed which contain one copy of Chrome each. By the time you get three vendors' worth of stuff on it then you're running three isolated browser stacks and eating up RAM. I'm sitting here on a Mac with Teams, Outlook and Slack open and there's 18 gig of RAM gone for example.

As for Chromebooks, they are fucking awful for education. The abject disaster that is Google Classroom needs to just go away. NOTHING works properly, has any inkling of reasonable design or engineering, or is intuitive. I've seen so many students struggling with them.


The RAM usage you are describing is likely not actual resident memory use. Check RPRVT via top on macOS for a more generally useful metric of actual impact per process.

I look at memory pressure. I am running close to the yellow line on a 24 GB machine. If I close the apps, it craters. If I put more workload on it (I have a couple of things that will eat 4-5 GB of RAM) it'll start crawling.

They should all be native apps.


You can use all of Teams, Outlook and Slack from an actual browser if you want to.

I thought I wanted this. In fact, I turned on the 'window' option for my iPad Pro when it came out. I did not, in fact, want this. The iPad just has totally different ergonomics, even with keyboard / trackpad / etc. For years I'd been convinced it would be the perfect combo - but in my own tests: meh. At the very least, new hardware would need to be built that gives laptop-grade typing and a trackpad. With that, it would be an upside - but I'm not sure it would be that much cheaper than just having two devices.

ARM64(!?!) I know you were joking, but still.

Agreed. AND some are universal -- right now, agentic workflows benefit A LOT from independent source-of-truth check-ins.

A lot of Simon's tools are making harnesses for this so it can get integrated:

showboat - create a demo, and validate that the code generates the demo. This makes the documentation a source of truth.

rodney - validate, visually and via navigation, that things work like you expect.

Red-green tests are conceptually the same - once we have these tests, the agent can loop more successfully.

So, I think there are some "universals" or at least "universals for now" that do transcend team/deployment specificity

