If your product has code in it that can only be understood and worked on by the person who wrote it, then your code is too complex, underdocumented, and/or doesn't have enough test coverage.
In a permanent code base, your time is better spent getting that LLM to understand something than trying to understand the thing yourself. It might be the case that you need to understand it more thoroughly yourself so you can explain it to the LLM, and it might be the case that you need to write some code so that you can understand and explain it, but eventually the LLM needs to get it from the code, comments, examples, and tests.
For example, authors may have given an LLM a partial description of a citation and asked the LLM to produce the BibTeX.
This is equivalent to a typo. I’d like to know which “hallucinations” are completely made up, and which have a corresponding paper but contain some error in how it’s cited. The latter I don’t think matters.
If you click on the article you can see a full list of the hallucinations they found. They did put in the effort to look for plausible partial matches, but most of them are some variation of "No author or title match. Doesn't exist in publication."
Reference: Asma Issa, George Mohler, and John Johnson. Paraphrase identification using deep contextualized representations. In Proceedings of the 2018 Conference on Empirical Methods in Natural Language Processing (EMNLP), pp. 517–526, 2018.
Asma Issa and John Johnson don't appear to exist. George Mohler does, but it doesn't look like he works in this area (https://www.georgemohler.com/). No paper with that title exists. There are some with sort of similar titles (https://arxiv.org/html/2212.06933v2 for example), but none that really make sense as a citation in this context. EMNLP 2018 exists (https://aclanthology.org/D18-1.pdf), but that page range is not a single paper. There are papers in there that contain the phrases "paraphrase identification" and "deep contextualized representations", so you can see how an LLM might have come up with this title.
It's not the equivalent of a typo. A typo would be immediately apparent to the reader. This is a semantic error that is much less likely to be caught by the reader.
Going through a retraction and blacklisting process is also a lot of work -- collecting evidence, giving authors a chance to respond and mediate discussion, etc.
Labor is the bottleneck. There aren't enough academics who volunteer to help organize conferences.
(If a reader of this comment is qualified to review papers and wants to step up to the plate and help do some work in this area, please email the program chairs of your favorite conference and let them know. They'll eagerly put you to work.)
That's exactly why the inclusion of a hallucinated reference is actually a blessing. Instead of going back and forth with the fraudster, just tell them to find the paper. If they can't, case closed. A massive amount of time and money saved.
Isn't telling them to find the paper just "going back and forth with a fraudster"?
One "simple" way of doing this would be to automate it. Have authors step through a lint step when their camera-ready paper is uploaded. Authors would be asked to confirm each reference and link it to a google scholar citation. Maybe the easy references could be auto-populated. Non-public references could be resolved by uploading a signed statement or something.
There's no current way of using this metadata, but it could be nice for future systems.
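A minimal sketch of what such a lint step might look like, assuming the Crossref REST API as the lookup backend (the comment mentions Google Scholar, which has no public API, so Crossref stands in here purely for illustration; the function name and threshold are made up):

```python
# Illustrative reference-lint sketch, not any conference's actual tooling.
# For each reference string, ask Crossref for its best bibliographic match and
# flag entries whose closest hit looks nothing like the claimed title.
import requests
from difflib import SequenceMatcher

CROSSREF = "https://api.crossref.org/works"

def lint_reference(ref_text: str, claimed_title: str, threshold: float = 0.6) -> dict:
    """Return the closest Crossref match and whether the reference plausibly exists."""
    resp = requests.get(
        CROSSREF,
        params={"query.bibliographic": ref_text, "rows": 1},
        timeout=10,
    )
    resp.raise_for_status()
    items = resp.json()["message"]["items"]
    if not items:
        return {"ok": False, "match": None}
    found_title = (items[0].get("title") or [""])[0]
    similarity = SequenceMatcher(None, claimed_title.lower(), found_title.lower()).ratio()
    return {"ok": similarity >= threshold, "match": found_title, "similarity": similarity}

# Authors would confirm or correct each flagged entry before the camera-ready upload.
print(lint_reference(
    "Issa, Mohler, Johnson. Paraphrase identification using deep "
    "contextualized representations. EMNLP 2018.",
    "Paraphrase identification using deep contextualized representations",
))
```

Anything the lookup can't resolve (non-public or very new references) would fall back to the signed-statement path described above.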
Even the Scholar team within Google is woefully understaffed.
My gut tells me that it's probably more efficient to just drag authors who do this into some public execution or twitter mob after-the-fact. CVPR does this every so often for authors who submit the same paper to multiple venues. You don't need a lot of samples for deterrence to take effect. That's kind of what this article is doing, in a sense.
I dunno about banning them; humans without LLMs make mistakes all the time. But I would definitely place them under much harder scrutiny in the future.
Hallucinations aren't mistakes, they're fabrications. The two are probably referred to by the same word in some languages.
Institutions can choose an arbitrary approach to mistakes; maybe they don't mind a lot of them because they want to take risks and be on the bleeding edge. But any flexible attitude towards fabrications is simply corruption. The connected in-crowd will get mercy and the outgroup will get the hammer. Anybody criticizing the differential treatment will be accused of supporting the outgroup fraudsters.
Fabrications carry intent to deceive. I don't think hallucinations necessarily do. If anything, they're a matter of negligence, not deception.
Think of it this way: if I wanted to commit pure academic fraud maliciously, I wouldn't make up a fake reference. Instead, I'd find an existing related paper and merely misrepresent it to support my own claims. That way, the deception is much harder to discover and I'd have plausible deniability -- "oh I just misunderstood what they were saying."
I think most academic fraud happens in the figures, not the citations. Researchers are more likely to succeed at making up data points than at making up references, because fabricated data points are impossible to detect without the underlying data files.
Generating a paper with an LLM is already academic fraud. You, the fraudster, are trying to optimize your fraud-to-effort ratio which is why you don't bother to look for existing papers to mis-cite.
> AI is definitely more knowledgeable about most things than most university teachers
I think this is hugely under-appreciated. Yes, every university professor is going to know more than ChatGPT about quite a lot of things, especially in their specialty, but there is no university professor on earth who knows as much about as many things as ChatGPT, nor do they have the patience or time to spend explaining what they know to people at scale, in an interactive way.
I was randomly watching a video about calculus on youtube this morning and didn't understand something (Feynman's integration trick) and then spent 90 minutes talking to ChatGPT getting some clarity on the topic, and finding related work and more reading to do about it, along with help working through more examples. I don't go to college. I don't have a college math teacher on call. Wikipedia is useless for learning anything in math that you don't already know. ChatGPT has endless patience to drill down on individual topics and explaining things at different levels of expertise.
This is just a capability for individual learning that _didn't exist before ai_ and we have barely begun to unlock it for people.
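For readers who haven't seen it, the trick mentioned above is differentiation under the integral sign; a standard textbook example (my illustration, not from the video or the chat) is:

```latex
% Classic worked example of differentiation under the integral sign:
% differentiating I(b) with respect to the parameter b turns a hard integral
% into an easy one, and integrating back recovers I(b).
I(b) = \int_0^1 \frac{x^b - 1}{\ln x}\,dx
\quad\Longrightarrow\quad
I'(b) = \int_0^1 x^b\,dx = \frac{1}{b+1}
\quad\Longrightarrow\quad
I(b) = \ln(b+1) \;\text{ (since } I(0) = 0\text{).}
```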
I think the real gap in computer languages wrt LLMs is a replacement for python as a "notebook" language that the LLM uses to solve ad hoc problems during a chat.
What you want is something that is safe, performant, uses minimal tokens and takes careful note of effects and capabilities. Tests aren't really even important for that use case.
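As a purely illustrative sketch of the use case being described (the numbers are made up), this is the kind of throwaway snippet an LLM emits mid-chat today. Python makes it easy to write, but gives no way to declare or enforce that the snippet needs no file, network, or clock access, which is the "effects and capabilities" gap being pointed at:

```python
# Hypothetical ad hoc "notebook" snippet of the kind an LLM runs during a chat.
# Nothing here touches files or the network, but Python has no way to say so,
# let alone enforce it -- the capability tracking the parent comment asks for.
import statistics

samples = [12.4, 11.9, 13.1, 12.7, 12.2]  # made-up numbers pasted into the chat

print({
    "n": len(samples),
    "mean": round(statistics.mean(samples), 2),
    "stdev": round(statistics.stdev(samples), 2),
})
```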
> I think the real gap in computer languages wrt LLMs is a replacement for python as a "notebook" language that the LLM uses to solve ad hoc problems during a chat.
Hey, I found this project in December '23, and you just commented "amazing one shot that" on another thing I posted. I'll give you an invite if you want (because it also does that); check my bio, I'll add contact details now...

It was posted to this site about 20 days ago and hit the front page, and hilariously about half the comments were shooting it down; the top-voted comment was literally "this is the worst website ever" lol xD. It has since gone invite-only to manage abuse (it's a very capable service and currently free).

It's capable of what you just mentioned, and it made the other site you called an amazing one-shot (I literally cut and pasted your comment into the prompt; the second prompt was "Good, now do it better").
How could that possibly not have been the case? A tariff is no different from the cost of any other input into the price of a finished good. There is some sense in which price increases are limited by supply and demand, but if the market won't pay the production cost of the good, then the market will cease to provide that good. There are only two possible outcomes, long term: either the price goes up, or the product becomes unavailable.
There's an argument that domestically produced goods would substitute for imported goods leaving the market, but markets are so global and intertwined now that even domestic goods have imported inputs that are also affected by tariffs. Often there are no domestic goods, or not enough of them, to act as a price-competitive substitute, and companies are not going to invest a ton of money into expanding domestic capacity when tariffs are imposed on the whim of a lunatic and will probably be tossed out eventually by the Supreme Court or Congress.