When learning is sufficiently atomized and recombined, creations cease to be "derived from" in a legal sense.
A Lego sculpture is copyrighted. Lego blocks are not. The threshold between blocks and sculpture is not well-defined, but if an AI isn't prompted specifically to attempt to mimic an existing work, its output will be safely on the non-copyrighted side of things.
A derivative work is separately copyrightable, but redistribution needs permission from the original author too. Since that usually won't be granted or would be uneconomical, the derivative work can't usually be redistributed.
AI-produced material is inherently not copyrightable, but not because it's a derivative work.
Token prediction is a form of "learning" that is reinforced by the goal of reproducing the correct next token of the work, rather than acquiring ideas and concepts. For instance, given the prefix "Four score and seven years", the weights are adjusted until "ago" is correctly predicted, which is a fancy way of saying that the work was stored in the model in a lossy way. The model "learned" that "ago" follows "four score and seven years" exactly the way your hard drive "learns" the audio and video frames of a movie when you download a .mp4 file.
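To make the analogy concrete, here's a toy sketch of next-token "learning" by pure memorization. It's nothing like a real transformer (no weights, no generalization), just a frequency table; the training text and the 4-token context size are arbitrary choices for illustration:

```python
# Toy next-token "model": count which token follows each 4-token
# prefix in the training text. Pure memorization, stored as statistics.
from collections import Counter, defaultdict

def train(tokens, context=4):
    model = defaultdict(Counter)
    for i in range(len(tokens) - context):
        model[tuple(tokens[i:i + context])][tokens[i + context]] += 1
    return model

def predict(model, prefix):
    # Return the most frequent continuation, or None if never seen.
    counts = model.get(tuple(prefix))
    return counts.most_common(1)[0][0] if counts else None

text = "four score and seven years ago our fathers brought forth"
model = train(text.split())
print(predict(model, "score and seven years".split()))  # → "ago"
```

A real LLM compresses billions of such statistics into shared weights, so the storage is lossy rather than verbatim, but the training objective is exactly this: predict the next token of the work.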
I dispute the idea that token sequences reproduced from the model are not derived works.
I predict, no pun intended, that a time is coming when the idea that it's not a derived work will be challenged in mainstream law.
The slop merchants are getting a free ride for the time being.
As you said, it's lossy. Try it with any other distinctive but non-famous passage, and you won't get a correct prediction for the immediately following clause, much less for multiple sentences or paragraphs.
That's the case even when an LLM correctly identifies which book the prompted text is from. It still won't accurately continue on from some arbitrary passage. By the time you ask it to reproduce hundreds of words, you're into brand new book territory. Even when it's slop content, it's distinct slop.
The exceptions are cases where a significant number of humans would also know a particular quote from memory. Then, chances are, a frontier LLM will too.
You know how else you can reproduce a quote? Search for it on Google, and skim the resulting top hits; if it's a significant quote, multiple people have probably quoted it -- legally. You can also search a pirate library for the actual book, and search the book for the quote; while illegal, it's very simple to do. So unless you propose to make the free and open internet illegal, I'd suggest that banning LLMs for being "derivative work" creation engines is not so different from destroying the internet.
> I predict, no pun intended, that a time is coming when the idea that it's not a derived work will be challenged in mainstream law.
If judges have any sense whatsoever, LLM generations (without specific prompt crafting to mimic existing works) will be judged to not be derived works and therefore not be violating copyright, in the same sense that you can live and breathe Taylor Swift's music, create new music in the same style, and still not be violating copyright.
The Stability AI case, and how Judge Orrick deals with it, will be interesting and uninteresting at the same time. It deals primarily with the fact that after specific prompting, an image-generation AI can generate something fairly close to existing copyrighted images. That doesn't say anything more about whether LLMs are inherently producers of [only or primarily] derivative works, just as the fact that a human can violate copyright doesn't say anything about whether humans primarily or exclusively output derivative works.
More likely, perhaps, is that everything will be so infused with LLM output that copyright ceases to be relevant, or forces copyright law to be rewritten from the ground up.
Copyright requirements, even prior to LLMs, weren't well-specified. There's no objective threshold for how close something has to be to a previous work before the new one violates copyright. It's whatever a judge thinks, referring to the four-factor test but ultimately making subjective judgements about each of those prongs. It's all a house of cards, and LLMs may just be what topples it.
Machine translations do not contain a literal quote of the source material, yet are covered by copyright. For instance, binary executables do not quote the source code.
I predict that the LLM will be regarded as a binary-like machine translation of the source materials.
Lossiness is a red herring. You can't claim that a JPEG photograph doesn't violate copyright because JPEG is lossy.
Journals are not about providing access to science, much less public access.
Journals are an academic-career-advancement service. It therefore makes sense that they do not pay academics. You don't pay your customers.
That means they need to generate a secondary customer base elsewhere, who will pay. Those secondary customers happen to be the employers of the academics who are the primary customers. That socializes the cost of providing the service, since academics individually wouldn't be willing and able to pay.
Once journals have established a reputation, their policies and paywalls and fees are the result of trying to signal exclusivity and set an optimum market price.
Until the supply side of the research market largely agrees on a way to use open-access repositories like arXiv as a primary career-advancement signal, complaining about closed-access journals is tilting at windmills.
Changing the law to prevent journals from being able to copyright anything could potentially force the research industry to rapidly develop a new solution, but at the cost of short-term chaos and career instability for new academics.
How well would anything like that work in practice?
First of all, would we restrict all internet access, or just access to certain known sites and VPNs, letting everything else through because it's too insignificant even if it technically might merit being blocked for kids? I don't think a global internet block for minors is a good idea.
On wired internet, restricting access for devices that aren't clearly tied to individual users is problematic. Imposing age verification overhead on anyone who runs a network is unacceptable and unworkable. Locking non-mobile devices to individual users, in order to have mandatory software that blocks or sends age signals to the ISP, is also unacceptable and unworkable.
For mobile devices, maybe. There's a privacy problem if SIM cards are required to be paid for with credit cards, but if we do that, or if that's already effectively the case, I think it's fair that anyone who has an active credit card should be permitted on the "adult" internet. For multi-line accounts, we could make it a crime for the account holder to misrepresent the age of the user of a line, i.e. to claim they're an adult when they're really a minor. Not very different from minors and cigarettes: it's not universally illegal for a parent to supply them, but it is in some places, and it should be.
I posted my plan upthread, but essentially kids get a whitelist at best. For example, a kid-friendly access device allows a network connection only to a VPN server certified safe for kids, which takes it from there with whitelisted destinations. Blacklists are just whack-a-mole.
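A whitelist is also mechanically simple to enforce. As a hypothetical sketch (the domain names are made up), a kid-mode resolver or proxy only needs a suffix match against the approved list, denying everything else by default:

```python
# Hypothetical whitelist check: allow a host only if it, or a parent
# domain, is on the approved list; everything else is denied by default.
APPROVED = {"kids.example.org", "school.example.edu"}

def allow(host: str) -> bool:
    parts = host.lower().rstrip(".").split(".")
    # Try the host itself, then each successive parent domain.
    return any(".".join(parts[i:]) in APPROVED for i in range(len(parts)))

print(allow("videos.kids.example.org"))  # True
print(allow("random-site.example.com"))  # False
```

The default-deny direction is what avoids the whack-a-mole: new bad sites require no action, only newly approved good sites do.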
You can easily play or spectate a low-unit count game of BAR on any decent 2010+ quad core.
Such a computer won't allow you to play 8v8 that goes into the late-game stage. Sometimes not even 4v4 or 2v2 with players scaling to high unit counts. Some players try anyway. Ignoring player disconnections, half the drama of large-scale games is the one player who's lagging because they're on a potato computer. If the sim doesn't lag, the game will at least be down to single-digit fps.
That means you can't really play multiplayer comfortably, at least not beyond 2-4 players.
For that, you need a recent Ryzen or Intel. I'd estimate "recent" as post-COVID.
I don't know what combination of things is important: there are larger CPU caches, faster sustained CPU frequencies (TDP and cooling matter there), hardware mitigations for speculative-execution bugs, faster RAM, Resizable BAR support... but in my experience, going from a 6-core Skylake-era CPU to a Ryzen 9xxx, with the same GPU, made a massive difference. I saw no massive improvement going from a 4-core 2010-era CPU to a 6-core Skylake-era CPU; I'd classify both as potatoes for BAR purposes.
Electrical engineering talks about parallel and series (including the old parallel and serial ports on computers, before almost everything became serial).
Programming talks about parallelism or concurrency or threading. (single-threading, multi-threading)
Or synchronous and asynchronous.
The legal system talks about concurrent and consecutive.
Process descriptions might use "sequential" rather than consecutive or series.
"Linear" is another possibility, but it's overloaded since it's often used in reference to mathematics.
Someone who works out every day will obviously have different metabolic and microRNA profiles; assuming that line of research holds up and those biomolecular profiles make it into the zygote, survive many replication cycles, and act as developmental signalling molecules affecting gene expression during embryonic and fetal development, there could be life-long effects.
What can't happen is inter-generational transmission of particular subjective experiences that aren't paired with specific, unique metabolic, hormonal, and gene-expression signatures. Only biomolecular-mediated phenotypes, the most general and obvious of which would be things like stress or exercise or diet, make sense to be transmitted that way.
For instance, someone who's chronically afraid might transmit some kind of stress/fear-modulating signals to offspring. Someone who's afraid of a specific thing, however, cannot transmit fear of that specific thing unless there's some incredible cognition-to-biomolecular signalling mechanism that's entirely unexplored and undescribed. Therefore, I don't know why the article uses the term "lived experience", which is too broad a term to describe what the research suggests might be occurring.
> Someone who's afraid of a specific thing, however, cannot transmit fear of that specific thing unless there's some incredible cognition-to-biomolecular signalling mechanism that's entirely unexplored and undescribed.
While there is absolutely no conclusive evidence, there are a few studies that indicate this is a possibility.
But if, evolutionarily, there are only 20 common recurring threats that you need to fear (but each comes at some kind of cost, like you won't hunt in an area that would otherwise provide food), it would make sense to pass on those fears in a generational way. So the possible things come from a preset list that has evolved over millions of years, that recur over and over but only in specific times and places.
We know that severe stress (such as trauma) leaves chemical marks on the genes, potentially passed down to the offspring. For example, this paper writes about an “accumulating amount of evidence of an enduring effect of trauma exposure to be passed to offspring transgenerationally”: https://pmc.ncbi.nlm.nih.gov/articles/PMC5977074/
Though “lived experience” can encompass a lot of things, it definitely encompasses severe stress.
For example, constantly worrying about money because you’re poor can definitely put you under severe stress. Also, growing up without secure attachment to your caretakers, being asked to do role reversal (having to take care of your parents as a child), things like that will generate complex PTSD.
The comment you’re replying to suggests “lived experience” is too broad, not too narrow. The issue isn’t that it fails to include your example. It fails to exclude other things. Part of my lived experience today was seeing a manatee. It is unlikely this will be passed on.
And the comment you’re replying to suggests that since many lived experiences are plausibly heritable, the term is appropriate. In any case, the context in which it is actually used in the article seems beyond all but the most pedantic reproach:
>The first is how a father’s body physically encodes lived experience, such as stress, diet, exercise or nicotine use
And that’s a single sentence partway through the article. From the beginning, the refrain is the list of the sorts of things that seem to have heritable effect, not the phrase “lived experiences”.
>Research into how a father’s choices — such as diet, exercise, stress, nicotine use — may transfer traits to his children
>Within a sperm’s minuscule head are stowaway molecules, which enter the egg and convey information about the father’s fitness, such as diet, exercise habits and stress levels, to his offspring
Etc.
The article is clearly not attempting to suggest that all experiences are heritable.
It feels so wonderfully weird reading about someone else seeing a manatee today. I too saw a manatee while walking with my kids today. The interesting part was our navigational strategies complementing each other (me misremembering the details of a road closure, and them getting curious about what a bunch of people at a marina were looking at) to find a group of manatees in a place we didn't know they could be found.
A lot of this is transmitted via language. The stories we form as a result of events in our lives have the power to set our values in all areas. These myths of the self are essentially a value manifesto for someone, and they can be so strongly held that they influence the person's and family's moods, actions, and habits.
What is important to note is that there are many formulas for consciousness. Some are truly bonkers, some are just fundamental truth. And some… have yet to be discovered.
Permutations and combinatorics create a hyperspace of all ridiculous things!
> The authors pointed out “there are significant drawbacks in the existing human literature” including “lack of longitudinal studies, methodological heterogeneity, selection of tissue type, and the influence of developmental stage and trauma type on methylation outcomes”
The literature in this area is a mess and has become highly politicized. I'd give it another 10 or so years before making any strong statements about these effects in humans. Famously, the study of Holocaust survivors' descendants didn't show transgenerational effects.
One might also argue that The Little Prince is "far more complex" and deeper than anything written at a typical adult reading level. That lower linguistic surface complexity allows more space for the reader to explore ideas and themes.
I'm skeptical. Is there no more value to series like Gormenghast, Book of the New Sun, and The Second Apocalypse, beyond mere literary masochism, compared to LotR? Like them or not, LotR, as elaborate as its mythology is (if you include Silmarillion and some or all of the History of Middle Earth), is not at the same level.
One would like to point out that the set {Gormenghast, Book of the New Sun, The Second Apocalypse} is not a subset of {fantasy books I have come across}. I would not dare to claim that LotR is the be-all and end-all of fantasy writing. Perhaps the word "complex" was a bad choice here, since I'm sure there are books with more complex structure (which is not necessarily a good thing…)
I think what I tried to say is that the language Tolkien uses is as much or more a part of Middle-earth as are the characters, maps and whatnot. The obvious point is that he created whole new languages and writing systems for the book, basing the two Elvish languages on Finnish and Welsh, etc. Another is that he changes his vocabulary depending on what he is describing. I am not a linguistics scholar, but I'm fairly certain that at least in The Two Towers the parts describing nature, forests and whatnot use solely words that are Celtic in origin, i.e. no Latin influence and very old. There are also structural techniques of interwoven plots that I can't even start to unwind.
Point being, you can very much read the book on a surface level as Frodo and the Ring and Swords and lah-di-dah, and that is all fine. That's how I read it when I was 12-13. But there is so much more: mastery of the English language comparable only to Cormac McCarthy and Joyce… Here Tolkien is very much a singular writer, escaping the limits of the genre he was essential in creating.
So no wonder it is perhaps the most influential book of the 20th century.
Any reverse linkages, by the Zodiac killer referencing the Black Dahlia killings, are potentially explainable, even the "deathbed" Elizabeth painting, as an interest in a historical murder[er].
What's interesting and not easily explainable if true, however, is the suspicion that the Black Dahlia murderer used a motel that at the time was called the Zodiac Motel. That forward-connection would've taken someone obsessed with solving the Black Dahlia murder, not just interested in the nature of the crime; assuming the theory about the Dahlia murder location is correct, the Zodiac killer would have had to solve the location of the Dahlia murder by himself, and then use it as an in-joke for a later series of unconnected murders.