MITSardine's comments | Hacker News

In an organization, the number of sequential steps doesn't really scale with the number of participants, does it? Rather, it scales with the dependent steps of the process being tackled: say, devise the building permit request, await approval, purchase materials, move materials to site, hire a workforce, etc.

Theoretically, each of those steps is parallelizable to some extent. The equivalent of Amdahl's law here would be that some delays are outside an organization's reach to improve. For instance, a building permit will take however long it takes to be examined by an external public administration.
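The Amdahl's-law analogy can be made concrete in a few lines. This is a toy sketch of my own, not a model of any real organization: `serial_fraction` stands for the delays no amount of extra staff can shrink (the permit wait), and `workers` for the parallel headcount.

```python
def amdahl_speedup(serial_fraction: float, workers: int) -> float:
    """Best-case speedup when only (1 - serial_fraction) of the
    process parallelizes across `workers` participants."""
    return 1.0 / (serial_fraction + (1.0 - serial_fraction) / workers)

# If 20% of the timeline is waiting on the permit office, the speedup
# is capped at 1 / 0.2 = 5x, no matter how many people you hire.
```

With serial_fraction = 0.2, a thousand workers gets you about 4.98x, and even a billion gets you no closer than 5x.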


> You could in principle create a simulation with the same mathematical properties as the physical world but no one has ever done that. I'm not sure if we even know how.

What do you mean by that? Simulating physics is a rich field, which incidentally was one of the main drivers of parallel/super computing before AI came along.


The mapping of the physical world onto a computer representation introduces idiosyncratic measurement issues for every data point. The idiosyncratic bias, errors, and non-repeatability change dynamically at every point in space and time, so they can be modeled neither globally nor statically. Some idiosyncratic bias exhibits coupling across space and time.

Reconstructing ground truth from these measurements, which is what you really want to train on, is a difficult open inference problem. The idiosyncratic effects induce large changes in the relationships learnable from the data. Many measurements map to things that aren't real; how badly that non-reality can break your inference is context-dependent. Because the samples are sparse and irregular, you have to constantly model the noise floor to make sure there is actually some signal in the synthesized "ground truth".

In simulated physics, there are no idiosyncratic measurement issues. Every data point is deterministic, repeatable, and well-behaved. There is also much less algorithmic information, so learning is simpler. It is a trivial problem by comparison. Using simulations to train physical world models is skipping over all the hard parts.

I've worked in HPC, including physics models. Taking a standard physics simulation and introducing representative idiosyncratic measurement error seems difficult. I don't think we've ever built a physics simulation with remotely the quantity and complexity of fine structure this would require.
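To make the contrast more tangible, here is a small sketch (entirely my own construction, stdlib only) of what "idiosyncratic" measurement could look like: a clean simulated signal corrupted by a bias that drifts smoothly across space (coupled, not i.i.d.) and by noise whose variance itself varies from point to point, so no single global noise model fits.

```python
import math
import random

random.seed(0)

def clean_signal(x: float) -> float:
    return math.sin(x)  # the simulator's deterministic ground truth

def measure(x: float) -> float:
    # bias drifts slowly, so it is coupled across nearby locations
    bias = 0.3 * math.sin(0.1 * x + 1.7)
    # the noise floor itself changes from point to point (heteroscedastic)
    sigma = 0.05 + 0.1 * abs(math.cos(x))
    return clean_signal(x) + bias + random.gauss(0.0, sigma)

samples = [(0.1 * i, measure(0.1 * i)) for i in range(200)]
```

Recovering `clean_signal` from `samples` alone is the hard inference problem; a simulation hands you `clean_signal` directly and skips it.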


Is this like some scale-independent version of Heisenberg's Uncertainty Principle?

I'm probably missing most of your point, but wouldn't the fact that inverse problems are applied in real-world situations somewhat contradict your qualms? In those cases too, we have to deal with noisy real-world information.

I'll admit I'm not very familiar with that type of work - I'm in the forward-solve business - but if assumptions are made about the sensor noise distribution, couldn't those be inferred by more generic models? I realize I'm talking about adding a loop on top of an inverse-problem loop, which is two steps away from current practice (just stuffing a forward solve in a loop is already not very common, due to cost and engineering difficulty).

Or better yet, one could probably "primal-adjoint" this and just solve at once for physical parameters and noise model, too. They're but two differentiable things in the way of a loss function.
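To illustrate the "solve at once" idea, here is a toy joint fit (my own example, stdlib only): a physical parameter `a` in the model y = a·x and a noise scale `sigma` are recovered together, by gradient descent on a single Gaussian negative-log-likelihood loss. Finite differences stand in for the adjoint/autodiff pass a real solver would use.

```python
import math
import random

random.seed(1)
TRUE_A, TRUE_SIGMA = 2.0, 0.3
xs = [i / 50.0 for i in range(100)]
ys = [TRUE_A * x + random.gauss(0.0, TRUE_SIGMA) for x in xs]

def nll(a: float, log_sigma: float) -> float:
    """Gaussian negative log-likelihood with unknown scale exp(log_sigma)."""
    s2 = math.exp(2.0 * log_sigma)
    return sum(log_sigma + (y - a * x) ** 2 / (2.0 * s2) for x, y in zip(xs, ys))

a, log_sigma = 0.0, 0.0
lr, eps = 5e-4, 1e-6
for _ in range(5000):
    # finite-difference gradients stand in for adjoint/autodiff
    ga = (nll(a + eps, log_sigma) - nll(a - eps, log_sigma)) / (2.0 * eps)
    gs = (nll(a, log_sigma + eps) - nll(a, log_sigma - eps)) / (2.0 * eps)
    a, log_sigma = a - lr * ga, log_sigma - lr * gs

sigma = math.exp(log_sigma)  # recovered jointly with the physical parameter a
```

Both unknowns are just differentiable quantities between the data and one loss, which is the "primal-adjoint" spirit of the comment above.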


As I recall, a lot goes into DF's world generation, including erosion simulation and the like to achieve a realistic result. That game is the embodiment of over-engineering, after all.

That's pretty impressive! However, this workflow could have trouble with the types of meshes coming out of those 3D generative algorithms. The geometries are arbitrary (not simple geometric shapes), so you'll have to fit them with NURBS, and the meshes are noisy, so that fit will struggle or be somewhat arbitrary (what do you consider a feature, and what do you consider noise?).

However, you highlight what I think is the way forward: using scriptable CAD that can leverage LLMs or, maybe in the future, specialized generative algorithms that output to a sane geometry specification.


Some of the defects are attributable to this critical point:

> AI models generate meshes using "isosurface extraction" or similar volume-to-mesh techniques

This creates the "lumpiness", the inability to capture sharp or flat features, and the over-refinement. A noisy surface is also harder to clean up: how do you define what's a feature and what's noise when there's no ground truth beyond the mesh itself?

Implicit surface methods are expensive (versus the if-everything-goes-right cost of the parametric alternative), but they have the advantage of being robust and simple to implement, with far fewer moving parts. So it's a pragmatic choice, why not.

3D generative algorithms might become much better once they can rely on parametric surfaces. Then you can express things like symmetry, flatness, and sensible curvature much more naturally. And the mesh generation on top will produce very clean meshes, if it succeeds. That is a crucial missing piece: CAD-to-mesh is hardly robust with human-generated CAD, so I can't imagine what it'd be with AI-generated CAD. An interesting challenge, to be sure.
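As a toy illustration of the parametric-versus-implicit gap (my own 2D stand-in, not any particular tool's pipeline): reconstructing the unit circle from sign changes of an implicit function on a grid leaves a resolution-dependent error, while the parametric form is exact by construction.

```python
import math

def f(x: float, y: float) -> float:
    return x * x + y * y - 1.0  # implicit unit circle: f = 0 on the curve

def implicit_circle_points(n: int):
    """Linearly interpolated zero crossings of f along grid rows on
    [-2, 2]^2: a 2D stand-in for isosurface extraction."""
    pts, h = [], 4.0 / n
    for j in range(n + 1):
        y = -2.0 + j * h
        for i in range(n):
            x0 = -2.0 + i * h
            f0, f1 = f(x0, y), f(x0 + h, y)
            if f0 * f1 < 0.0:
                pts.append((x0 + h * f0 / (f0 - f1), y))
    return pts

def radial_error(pts) -> float:
    return max(abs(math.hypot(x, y) - 1.0) for x, y in pts)

# Parametric points (cos t, sin t) lie on the circle exactly; the
# extracted points only approach it as the grid is refined.
```

The same gap is what makes sharp edges and exact flatness easy to state parametrically and hard to recover from an extracted surface.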


Yes, what's preventing the LLM from running myCommand > /tmp/out_someHash.txt ; tail /tmp/out_someHash.txt and then grepping or tailing around /tmp/out_someHash.txt on failure?


There isn't really anything other than training, but they generally don't. You can probably get them to do that with some extra instructions, but part of the problem - at least with Claude - is that it's really trigger-happy about re-running commands if it doesn't get the results it likes, assuming the output is stale. Even with very expensive (in time) scripts, I often see it start a run, pipe it to a file, put it in the background, then loop on sleep statements, occasionally get "frustrated" and check, only to throw the results away 30 seconds after they are done because it's made an unrelated change.

A lot of the time this behaviour is probably right. But it's annoyingly hard to steer it to handle this correctly. I've had it do this even with make targets where the Makefile's own dependency declarations make clear that it could trust the (file-)cached results if it just ran make <target>. Instead, I regularly find it reading the Makefile and running the commands manually, working around the dependency management.


I'm only discovering literate programming today, but you seem very familiar with it, so I might as well ask: what is the fundamental difference from abundant comments? Is it the linearity of it? I mean documentation-type comments at the top of routines or at "checkpoints".

I'm particularly intrigued by your mention of keeping old code around. This is something I haven't found a solution for with git yet; I don't want to pollute the monorepo with "routine_old()"s but, at the same time, I'd like to keep track of why things changed (it could be a benchmark).


An article and previous discussion: Literate programming is much more than just commenting code - https://news.ycombinator.com/item?id=30760835

Wikipedia has a very nice explanation - https://en.wikipedia.org/wiki/Literate_programming

A good way to think about it is {Program} = {set of functional graphs} X {set of dataflow pipelines}. Think Cartesian product of DAGs/fan-in/fan-out/DFDs/etc. Usually we write the code and explain the local pieces using comments; the intention of the system-as-a-whole is lost. LP reverses that by saying: don't think code; think essay, explaining all the interactions in the system-as-a-whole, with code embedded as necessary to implement the intention. That is why it uses terms like "tangle" and "weave" - to drive home the point that the program is a "meshed network".
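For a feel of the mechanics, here is a toy noweb-style fragment (my own, not from any particular LP source): prose carries the argument, named chunks carry the code, and chunks reference each other by name rather than by position in the file.

```
We read the whole input before parsing, so error messages can
quote surrounding context.

<<read the input>>=
with open(path) as f:
    text = f.read()
@

Parsing then walks the lines; the chunk above supplies `text`.

<<parse the input>>=
<<read the input>>
for line in text.splitlines():
    handle(line)
@
```

Roughly, noweb's `notangle` stitches the chunks into a compilable program, while `noweave` typesets the prose-plus-code essay.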

To study actual examples of LP see the book C Interfaces and Implementations: Techniques for Creating Reusable Software by David Hanson - https://drh.github.io/cii/


With LLMs this fast, you could imagine using them as any old function in programs.
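As a sketch of that idea (hypothetical: `complete` below is a canned stand-in for whatever API or local model you would actually call), an LLM wrapped behind an ordinary typed function:

```python
def complete(prompt: str) -> str:
    # Stand-in for a real completion call; swap in your actual client.
    canned = {"Sentiment of 'great product', one word:": "positive"}
    return canned.get(prompt, "unknown")

def sentiment(text: str) -> str:
    # "LLM as any old function": callers see a plain str -> str function.
    return complete(f"Sentiment of '{text}', one word:")
```

Callers never see the model; they just call `sentiment()` like any other library function.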


You could always have. Assuming you have an API or a local model.

Which was always the killer assumption, and this changes little.


The Portuguese sometimes describe themselves as the "povo de brandos costumes" (people of mild customs).


To be fair, a lot of science doesn't follow the scientific method. I've yet to see an applied mathematician (to speak only of what I know) come up with a hypothesis; it's usually rather: here's how people solve this problem currently, it has this and that drawback, and our paper introduces a new method that solves bigger problems faster, or new classes of problems.

The same could be said of theoretical work: here, we tightened up an inequality.

This is also research, not all of it is experimental!


Yeah, I get it: when giving projects to kids, it's easier to say "Here are the 5 sections you have to do" and then grade them on how well they did the 5 sections... but that really limits the spirit of the thing, if the idea was to let the kids off the leash and see where they can take their minds.


I concur - research can include both scientific and engineering research.

I note MIT (like many universities) has a department of Electrical Engineering and Computer "Science".


It's interesting seeing EECS and CS+CompEng programs currently splitting into separate CompE and AI programs. This is happening in my department, where we are standing up an AI major, and we're all asking: "Is the CS department the AI department now, or what? Where do all the systems people go?"

