
Here is the reasoning: https://www.nobelprize.org/uploads/2024/09/advanced-physicsp...

I'm surprised Terry Sejnowski isn't included, considering the prize seems to be for Hopfield nets and Boltzmann machines, and Terry played a large role in the latter.



I guess it makes sense to use that link above since it goes into much more detail. Changed from https://www.nobelprize.org/prizes/physics/2024/summary/. Thanks!


I know it's one day late but I personally don't agree with this change.

Using the main entry webpage makes it much easier for people to find relevant info, including but not limited to: various articles about this specific award, other info about this year's Nobel, and other info about the Nobel prizes in general.

This is exactly where the Internet (or the WWW) is better than traditional print media. Linking directly to a PDF (which itself can easily be found from the entry point) defeats that.


I wish they were, but I don't think most readers are as diligent as you in looking for other sources.


He was probably considered, since he is mentioned in the reasoning paper. Still, it could be one of those unfortunate omissions in Nobel history, since those deciding the prize might have a hard time measuring impact.


Then they shouldn't be trusted to give awards in an area they are not experts in.


Trusted? The will of Alfred Nobel states that the Royal Swedish Academy of Sciences is the body that selects the winner; you can't change that.

Also, I think the process looks fairly decent:

https://www.nobelprize.org/nomination/physics/

Gather nominations, make a shortlist, research the shortlist with actual field experts, present candidates, discuss, and vote.

And in 50 years you'll be able to find out who the other candidates were!


Impact is hard to quantify. There have been several occasions where someone who very well deserved a Nobel prize didn't get one. There are all kinds of reasons. Given that he is mentioned in the reasoning, he was probably considered. We can't know the reason he did not get the prize.

I recently watched this video on the subject: https://youtu.be/zS7sJJB7BUI?feature=shared and found it quite enjoyable.


A funny one is that there was so much objection to general relativity that they compromised by giving Einstein a Nobel for the photoelectric effect (his early quantum work) instead.

His acceptance speech was about general relativity.


Shouldn't be trusted? They are a random Swedish foundation. How would one go about changing that, or disallowing it, or whatever you want?


That would probably leave the prizes confined to a very narrow field. Also, the prize is supposed to be given to the thing that has "conferred the greatest benefit to humankind".

So in this case they picked something that might be viewed as having only a tangential connection to the field, but the impact has been so immense that they probably went outside their regular comfort zone (and how many prizes can we give for LHC work that really doesn't touch regular human lives in the foreseeable future anyhow?).


This is where passive voice highlights a weak position.

Trusted… by who?


Don't indulge in meritocracy. It's essentially a "prize".


Is this a widely accepted version of neural network history? I recognize Rosenblatt, the Perceptron, etc., but I have never heard that Hopfield nets or Boltzmann machines were given any major weight in the history.

The descriptions I have read were all mathematical, focusing on the computational graph with the magical backpropagation (which frankly is just memoizing intermediate computations). The descriptions also went out of their way to discourage terms like "synapses" and use "units" instead.
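
For anyone wondering what "memoizing intermediate computations" means in practice, here is a minimal numpy sketch (my own illustration; shapes and values are made up): the forward pass caches its intermediates, and every gradient in the backward pass reuses one of those cached values instead of recomputing the forward pass.

  import numpy as np

  rng = np.random.default_rng(0)
  x = rng.normal(size=(4, 3))            # batch of 4 samples, 3 features
  y = rng.normal(size=(4, 1))
  W1, W2 = rng.normal(size=(3, 5)), rng.normal(size=(5, 1))

  def sigmoid(z):
      return 1.0 / (1.0 + np.exp(-z))

  # Forward pass: keep (memoize) the intermediate values.
  a1 = sigmoid(x @ W1)
  out = a1 @ W2
  loss = 0.5 * np.mean((out - y) ** 2)

  # Backward pass: every step below reuses a cached intermediate
  # rather than recomputing the forward pass.
  d_out = (out - y) / len(x)
  dW2 = a1.T @ d_out                     # reuses a1
  d_a1 = d_out @ W2.T
  d_z1 = d_a1 * a1 * (1.0 - a1)          # sigmoid derivative, reuses a1
  dW1 = x.T @ d_z1                       # reuses x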


Restricted Boltzmann Machines were a huge revolution in the field, warranting a publication in Science in 2006. If you want to know what the field looked like back then, here it is: https://www.cs.toronto.edu/~hinton/absps/science.pdf

I remember in 2012 for my MS thesis on Deep Neural Networks spending several pages on Boltzmann Machines and the physics-inspired theories of Geoffrey Hinton.

My undergraduate degree was in physics.

So, yes, I think this is an absolutely stunning award. The connections to statistical entropy (inspired by thermodynamics) and, of course, to the biophysics of human neural networks should not be lost here.

Anyways, congratulations to Geoffrey Hinton. And also, since physics is the language of physical systems, why not expand the definition of the field to include the "physics of intelligence"?

From their official explanation page (https://www.nobelprize.org/uploads/2024/09/advanced-physicsp...): "With ANNs the boundaries of physics are extended to host phenomena of life as well as computation."


Yeah, I agree about the 2006 Hinton paper. I read it and reread it and didn't get it. I didn't have the math background at the time, and it inspired me to get it. And here I am almost 20 years later working on it.


  > I have never heard that Hopfield nets or Boltzmann machines were given any major weight in the history.

This is mostly because people don't realize what these are at more abstract levels (it's okay, ironically ML people frequently don't abstract). But Hopfield networks and Boltzmann machines have been pretty influential in the history of ML. I think you can draw a pretty good connection from Hopfield to LSTM to transformers. You can also think of a typical artificial neural network (easiest if you look at linear layers) as a special case of a Boltzmann machine (compare Linear Layers/Feed Forward Networks to Restricted Boltzmann Machines and I think it'll click).
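
To make the comparison concrete, here is a tiny numpy sketch (mine, not the parent's; sizes and weights are made up). The conditional p(h=1|v) in a Restricted Boltzmann Machine has exactly the same form as a sigmoid feed-forward layer; the RBM just samples from it, and it also defines the reverse direction p(v|h) using the transposed weights.

  import numpy as np

  rng = np.random.default_rng(0)
  n_visible, n_hidden = 6, 4
  W = rng.normal(scale=0.1, size=(n_visible, n_hidden))
  b_h, b_v = np.zeros(n_hidden), np.zeros(n_visible)
  v = rng.integers(0, 2, size=n_visible).astype(float)   # one binary "input"

  def sigmoid(z):
      return 1.0 / (1.0 + np.exp(-z))

  # RBM: probability of each hidden unit being on, given the visibles...
  p_h = sigmoid(v @ W + b_h)
  h = (rng.random(n_hidden) < p_h).astype(float)   # ...sampled stochastically

  # ...which is exactly the deterministic output of a sigmoid linear layer.
  feedforward_out = sigmoid(v @ W + b_h)

  # The RBM additionally runs the other way with the same (transposed)
  # weights, which a plain feed-forward layer does not.
  p_v = sigmoid(h @ W.T + b_v)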

Either way, these had a lot of influence on the early work, which does permeate into the modern stuff. There's this belief that all the old stuff is useless and I just think that's wrong. There's a lot of hand engineered stuff that we don't need anymore, but a lot of the theory and underlying principles are still important.


Boltzmann machines were there in the very early days of deep learning. They were a clever hack to train deep nets layer-wise and work with limited resources.

Each layer was trained similarly to the encoder part of an autoencoder. This way the layer-wise transformations were not random, but roughly kept some of the original data's properties. Up to this point, training was done without the use of labelled data. After this training stage was done, you had a very nice initialization for your network and could train it end to end according to your task and target labels.
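
A rough numpy sketch of that recipe (my own toy code, not the historical implementation; in the original work each layer was an RBM trained with contrastive divergence, here a tiny autoencoder stands in for it):

  import numpy as np

  rng = np.random.default_rng(0)

  def sigmoid(z):
      return 1.0 / (1.0 + np.exp(-z))

  def pretrain_layer(data, n_hidden, lr=0.5, steps=500):
      # Unsupervised training of one layer as a small autoencoder.
      # (In the original recipe an RBM plays this role.)
      n_visible = data.shape[1]
      W = rng.normal(scale=0.1, size=(n_visible, n_hidden))   # encoder
      V = rng.normal(scale=0.1, size=(n_hidden, n_visible))   # decoder
      for _ in range(steps):
          h = sigmoid(data @ W)                # encode
          recon = sigmoid(h @ V)               # decode
          err = recon - data                   # reconstruction error
          d_recon = err * recon * (1.0 - recon)
          dV = h.T @ d_recon
          dh = (d_recon @ V.T) * h * (1.0 - h)
          dW = data.T @ dh
          W -= lr * dW / len(data)
          V -= lr * dV / len(data)
      return W                                 # keep only the encoder weights

  # Greedy stacking: each new layer is trained on the codes of the previous one.
  X = (rng.random((64, 20)) > 0.5).astype(float)   # toy unlabelled binary data
  W1 = pretrain_layer(X, 12)
  H1 = sigmoid(X @ W1)
  W2 = pretrain_layer(H1, 6)
  # W1 and W2 would now serve as the initialization for supervised
  # end-to-end fine-tuning with backprop on the labelled task.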

If I recall correctly, the neural layers' output was probabilistic. Because of that, you couldn't simply use backpropagation to learn the weights. Maybe this is the connection to John Hopfield's work. But here my memory is a bit fuzzy.


Boltzmann machines were there in the 1980s, and they were created on the basis of Hopfield nets (augmenting them with statistical physics techniques, among other things to better navigate the energy landscape without getting stuck in local optima so much).
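
A small numpy sketch of that augmentation (my own illustration, weights and sizes made up): the Hopfield update deterministically moves downhill in the energy, while the Boltzmann update flips units with a temperature-dependent probability, so it can climb out of local minima.

  import numpy as np

  rng = np.random.default_rng(0)
  n = 8
  # Symmetric weights with zero diagonal, as in a Hopfield net / Boltzmann machine.
  W = rng.normal(size=(n, n))
  W = (W + W.T) / 2.0
  np.fill_diagonal(W, 0.0)
  s = rng.choice([-1.0, 1.0], size=n)            # +/-1 "spin" states

  def energy(s):
      return -0.5 * s @ W @ s

  def hopfield_step(s, i):
      # Deterministic update of unit i: never increases the energy.
      s = s.copy()
      s[i] = 1.0 if W[i] @ s >= 0 else -1.0
      return s

  def boltzmann_step(s, i, T=1.0):
      # Stochastic update of unit i: uphill moves are allowed with a
      # Boltzmann probability, which helps escape local energy minima.
      s = s.copy()
      field = W[i] @ s                           # diagonal is zero, so no self term
      p_up = 1.0 / (1.0 + np.exp(-2.0 * field / T))
      s[i] = 1.0 if rng.random() < p_up else -1.0
      return s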

From the people dissing the award here, it seems like even a particularly benign internet community like HN has little notion of ML with ANNs before Silicon Valley bought in for big money circa 2012. And media reporting from then on hasn't exactly helped.

ANNs go back a good deal further still (as the updated post does point out), but the works cited for this award really are foundational for the modern form in a lot of ways.

As for DL and backpropagation: maybe things could have been otherwise, but in the reality we actually got, optimizing deep networks with backpropagation alone never got off the ground on its own. Around 2006 Hinton started getting it to work by building the network up layer-wise, optimizing Restricted Boltzmann Machines (the lateral connections within a layer are eliminated from the full Boltzmann Machine), resulting in what was termed a Deep Belief Net, which basically did its job already but could then be fine-tuned with backprop for performance, once it had been initialized with the stack of RBMs. An alternative approach with layer-wise autoencoders (also a technique essentially created by Hinton) soon followed.

Once these approaches had shown that deep ANNs could work, though, analysis showed pretty soon that the random weight initializations used back then (especially when combined with the historically popular sigmoid activation function) resulted in very poor scaling of the gradients for deep nets, which all but eliminated the flow of feedback. It might have generally optimized eventually, but after a far longer wait than was feasible on the computers of the time.

Once the problem was understood, people made tweaks to the weight initialization, the activation function and otherwise the optimization, and then in many cases it did work going directly to optimizing with supervised backprop. I'm sure those tweaks are usually taken for granted to the point of being forgotten today, when one's favourite highly-optimized dedicated Deep Learning library will silently apply the basic ones without so much as being requested to, but take away the normalizations and the Glorot or whatever initialization and it could easily mean a trip back to rough times getting your train-from-scratch deep ANN to start showing results.
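
To illustrate what those tweaks buy you, here is a toy numpy experiment (mine; widths, depths and thresholds are arbitrary): with a naive unit-variance initialization, the pre-activations of a deep sigmoid stack quickly land in the saturated region where the derivative is near zero, while a Glorot-style scale keeps them in the responsive range.

  import numpy as np

  rng = np.random.default_rng(0)

  def sigmoid(z):
      return 1.0 / (1.0 + np.exp(-z))

  def saturated_fraction(weight_scale, depth=20, width=256):
      # Push a batch through `depth` sigmoid layers and report how many
      # pre-activations end up far in the flat tails of the sigmoid.
      x = rng.normal(size=(128, width))
      for _ in range(depth):
          W = rng.normal(scale=weight_scale, size=(width, width))
          z = x @ W
          x = sigmoid(z)
      return float(np.mean(np.abs(z) > 4.0))    # |z| > 4: derivative ~ 0

  naive = 1.0                              # "just draw weights from N(0, 1)"
  glorot = np.sqrt(2.0 / (256 + 256))      # Glorot/Xavier-style scale for width 256
  print("saturated fraction, naive init :", saturated_fraction(naive))
  print("saturated fraction, Glorot init:", saturated_fraction(glorot))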

I didn't expect this award, but I think it's great to see Hinton recognized again, and precisely because almost all modern coverage is too lazy to track down history earlier than the 2010s, not least Hopfield's foundational contribution, I think it is all the more important that the Nobel foundation did.

So going back to the original question above: there are so many bad, confused versions of neural network history going around that whether or not this one is widely accepted isn't a good measure of quality. For what it's worth, to me it seems a good deal more complete and veridical than most encountered today.


The deep learning variety of neural networks consists of heavily simplified, mostly linear versions of biological neurons. They don't resemble anything between your ears. Real-life neurons are generally modeled by differential equations (in layman's terms, they have many levels of feedback loops tied to time), not by the simplified functions used in dense-layer activations.

Here are some examples

https://snntorch.readthedocs.io/en/latest/tutorials/tutorial...
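
For a concrete (if crude) example of what "modeled by differential equations" looks like, here is a leaky integrate-and-fire neuron in plain Python (my own sketch, not taken from snntorch; all constants are illustrative): the membrane potential integrates the input current with a leak, and a spike is emitted whenever it crosses a threshold.

  # Leaky integrate-and-fire neuron: the membrane potential V follows
  #   tau * dV/dt = -(V - V_rest) + R * I(t)
  # and a spike is emitted (and V reset) when V crosses a threshold.
  # All parameter values here are illustrative, not fitted to a real neuron.

  dt, T_total = 1e-4, 0.5                 # 0.1 ms steps, 0.5 s of simulation
  tau, R = 20e-3, 1e7                     # membrane time constant (s), resistance (ohm)
  V_rest, V_thresh, V_reset = -70e-3, -54e-3, -75e-3   # volts
  I = 2e-9                                # constant 2 nA input current

  V = V_rest
  spike_times = []
  for step in range(int(T_total / dt)):
      V += (-(V - V_rest) + R * I) * dt / tau   # Euler step of the ODE
      if V >= V_thresh:                         # threshold crossing -> spike
          spike_times.append(step * dt)
          V = V_reset                           # reset after the spike

  print(f"{len(spike_times)} spikes in {T_total} s")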


Would that be equivalent to weight matrix parameters?


Ish. Take a look at the curves of the spiking neural network functions; they are very different from those of deep learning nets. When we "model" biological neural nets in code, we are essentially coming up with a mathematical transfer function that can replicate the chemical gradient changes and electrical impulses of a real neuron. Imagine playing a 3D computer game like Minecraft: the physics is not perfect, but it is "close enough" to the real world.


Thanks, this is very insightful.

Would the multi-head attention value matrices (W_V) not be equivalent to the chemical gradient changes?

(There are multiple matrices in multi-head attention, one for each attention head, which I imagine would be the equivalent of different gradients.

This allows each attention head to learn different representations and focus on different aspects of the input sequence.)

And then would the output produced after applying the concatenated projection (W_O, the output projection) be equivalent to the different electrical outputs, such as spikes, passed to the next neuron equivalent or attention head?
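
I can't speak to the biological analogy, but for reference, here is a minimal numpy sketch (my own, with arbitrary sizes) of what those matrices actually compute: each head has its own W_Q, W_K and W_V projection, mixes its own values with its own attention weights, and the concatenated heads then go through the output projection W_O.

  import numpy as np

  rng = np.random.default_rng(0)
  seq_len, d_model, n_heads = 5, 16, 4
  d_head = d_model // n_heads
  x = rng.normal(size=(seq_len, d_model))        # toy input sequence

  # One projection per head for queries, keys and values, plus the output projection.
  W_q = rng.normal(scale=0.1, size=(n_heads, d_model, d_head))
  W_k = rng.normal(scale=0.1, size=(n_heads, d_model, d_head))
  W_v = rng.normal(scale=0.1, size=(n_heads, d_model, d_head))
  W_o = rng.normal(scale=0.1, size=(n_heads * d_head, d_model))

  def softmax(z, axis=-1):
      z = z - z.max(axis=axis, keepdims=True)
      e = np.exp(z)
      return e / e.sum(axis=axis, keepdims=True)

  heads = []
  for h in range(n_heads):
      q, k, v = x @ W_q[h], x @ W_k[h], x @ W_v[h]
      attn = softmax(q @ k.T / np.sqrt(d_head))  # (seq_len, seq_len) attention weights
      heads.append(attn @ v)                     # each head mixes its own values
  out = np.concatenate(heads, axis=-1) @ W_o     # concatenate heads, apply W_O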


¯\_(ツ)_/¯


Of the people who are still alive, Hopfield and Hinton make sense.

Hopfield networks led to Boltzmann machines. Deep learning started with Hinton's 2006 Science paper showing that deep neural networks were viable: by pre-training with stacked Restricted Boltzmann Machines (essentially a stacked self-supervised auto-encoder) as a form of weight initialization, it was possible to effectively train neural networks with more than 2 layers. Prior to that finding, people found it very hard to get backprop to work with more than 2 layers, due to the activation functions people were using and problematic weight-initialization procedures.

So, long story short, while neither of them is in widespread use today, they demonstrated that neural networks were a viable technology and provided the FIRST strategy for successfully training deep neural networks. A few years later, people figured out ways to do this without the self-supervised pre-training phase by using activation functions with better gradient-flow properties (ReLUs), better weight-initialization procedures, and training on large datasets using GPUs. So without the proof of concept enabled by Restricted Boltzmann Machines, deep learning may not have become a thing, since prior to that almost all of the AI community (which was quite small) was opposed to neural networks, except for a handful of evangelists (Geoff Hinton, Yoshua Bengio, Yann LeCun, Terry Sejnowski, Gary Cottrell, and a few other folks).
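
A toy numpy illustration of the "better gradient flow" point (my own, with arbitrary depth and width, not anyone's actual experiment): the sigmoid derivative is at most 0.25, so the gradient shrinks geometrically with depth, whereas a ReLU passes the gradient through unchanged wherever a unit is active.

  import numpy as np

  rng = np.random.default_rng(0)
  depth, width = 30, 128
  x = rng.normal(size=(1, width))

  def surviving_gradient(activation, derivative):
      # Forward through `depth` layers, then measure how much gradient
      # makes it back to the first layer. `derivative` takes the
      # activation output (not the pre-activation).
      weights, outputs = [], []
      h = x
      for _ in range(depth):
          W = rng.normal(scale=np.sqrt(1.0 / width), size=(width, width))
          h = activation(h @ W)
          weights.append(W)
          outputs.append(h)
      grad = np.ones_like(h)                       # pretend dLoss/d(top layer) = 1
      for W, a in zip(reversed(weights), reversed(outputs)):
          grad = (grad * derivative(a)) @ W.T
      return float(np.linalg.norm(grad))

  sigmoid = lambda z: 1.0 / (1.0 + np.exp(-z))
  print("sigmoid:", surviving_gradient(sigmoid, lambda a: a * (1.0 - a)))
  print("relu   :", surviving_gradient(lambda z: np.maximum(z, 0.0),
                                        lambda a: (a > 0).astype(float)))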


Second time he gets overlooked (after the Turing Award).


Whether or not these fields are meaningfully distinct is a matter of taste, despite the fashion being to imagine a plurality of interconnected but autonomous domains.


Are you surprised every year that he's not included?


Yeah, Terry is a rockstar and is pumping out tons of papers. Maybe they Google Scholar'd it and listed by citations?



