I suspect they train the network on a sample size of one and then ask it to generate something, or at least that's the idea. The project seems like a fairly obvious rag on Copilot.
I've seen a lot of people ragging on Copilot for "copy+pasting" code - does anyone have links to cases where it has done this without the user intentionally trying to generate a specific (extremely famous) code snippet?
I've seen tons of comments here and on Reddit that talk about multiple instances of entire functions being copied verbatim, but the only thing even remotely close to that I've seen is the fast inverse square root, so I must have missed a few tweets or something.
> Is it clear how many people have access? If access is fairly limited, I'm not surprised at the low number of examples.
I wouldn't be too surprised either; what surprised me was the large number of people who explicitly said words like "many examples" and then couldn't link more than one.
> Here's an example from their own docs:
Thank you! This is definitely a much stronger example than the Quake one. Looking at the git repository it seems like this is generic startup code for a university assignment, so it probably shows up a significant number of times in the training data.
While it definitely makes sense to interpret a piece of code showing up several hundred times in the exact same format as being okay to straight-up copy+paste (e.g., imports, some boilerplate in more verbose languages, keyboard input switch statements), this does seem to highlight that Copilot can't distinguish between code snippets that are always identical because that's just the correct way to do it, and code snippets that are always identical because only one person did that thing, and it just happens to be in hundreds of repositories for one reason or another.
I wonder how many of these verbatim examples are because the training data code itself occurs multiple times in GitHub because it in turn was copied from an upvoted answer in StackOverflow, LOL.
It only seems to if you give it no or very little "source" input, like an empty file with a comment that says "// X algorithm".
There's been a lot of bikeshedding on this, but GitHub decidedly hasn't given enough information on how it works and what the training dataset is, and the fair use question definitely needs to be answered, maybe even in court - it's just a matter of time.
What I think is interesting and not talked about in the Copilot cases is that there are actually two different copyright actions that come from using it.
The first is whether GitHub had a license to distribute the code it was trained on. This is assuming that the code it produces is a derivative, which I don't think would be much of a stretch. For this I think most of the licenses that are used (GPL, MIT, etc.) allow for someone to distribute in such a way.
The second is the user of copilot. They would be getting and distributing code where they wouldn't know the original license, but that wouldn't be a viable defense of infringement. To actually comply with most licenses they would have to follow the requirements.
In both cases I don't know if a fair use would really apply. Maybe GitHub could stretch the research aspect, but if you just use the code in your product there is no fair use.
> This is assuming that the code it produces is a derivative, which I don't think would be much of a stretch
I don't think that's easily settled. Is the code I write a derivative work of all prior code that I've learned from? The difference (IMO) is that Copilot has a much more precise memory than I do; I couldn't recall the exact text of a code example that I've encountered repeatedly even if I wanted to.
Not at all, you'd still be violating all the non-GPL licenses. Just because GPL is more restrictive doesn't mean it somehow supersedes all other licenses!
Not quite; there are far more aggressive licenses on repositories on GitHub, such as the SSPL. In theory, because it serves up SSPL-licensed code, Co-Pilot itself should be open sourced under the same license to comply.
Is it though? I'd assume that too, but we don't really know, do we? I mean to ask, what have Microsoft stated that leads you to believe this? (Maybe there's a press release I missed?)
I have access and hints for entire functions have only appeared from code already in the same file. It uses the file you are in for context.
It also says on their website that the AI may generate api keys that look real but it’s actually just a “fake” key that the AI generated as placeholder.
I have had access for one day and spent my entire Saturday playing with it while working on an addon for a game. I find it useful, and most of the hints come from other code I have in the same files, which saves me time or lets me know when I’m too repetitive :)
"It also says on their website that the AI may generate api keys that look real but it’s actually just a “fake” key that the AI generated as placeholder."
So the article mentions an issue reported by dtjm, and the maintainer answers the issue explaining that the keys are fictional. I don't see any statement by the GitHub CEO about this being an issue, as the article says.
While that's obviously a UX flaw that definitely shouldn't have made it to release, I find it hard to envision it ever actually being a problem. If someone's accepting auto-generated copyright notices and licenses that don't actually apply, they can't really point the finger at Github - it's called "Copilot", not "Pilot".
The problem with full snippets of arbitrary obscure repositories being copy+pasted is that there's no realistic way for a user to know if that's happening without putting in more effort than just writing the code themselves, somewhat defeating the point. That's not really the case when the first line of the file contains "(C) Someone Else 2003".
That's much more than just a UX flaw; it clearly shows that Copilot just regurgitates code snippets from other people's open source. Which draws back to the issue of "did Copilot's AI actually learn from other people's code or is it just copy+pasting other people's code?". I don't think people are concerned about Copilot users' liability/productivity.
That's irrelevant, if copilot is copy+pasting random licenses then it's definitely copy+pasting code as well, I don't see how this could be hard to believe.
You said "copilot just regurgitates code snippets", and the example given wasn't a code snippet. Of course it's relevant.
> if copilot is copy+pasting random licenses then it's definitely copy+pasting code as well
Not at all. Copy+pasting licenses is intended behaviour. That's the entire point of having standard licenses. No-one would want Copilot to try to make its own entirely new licenses, therefore it's very trivial to imagine a piece of software designed to allow for licenses to be copied verbatim, but not allow that for any significant chunk of code.
Copilot does not appear to be designed like that, but that doesn't make the logic correct. We only know that it isn't designed like that because we looked for an example of it actually copying arbitrary code.
> I don't see how this could be hard to believe
I don't care about believing it, I care about whether or not it's true. If someone claims to have seen Copilot copy+pasting code, and when asked for evidence their only example is it copy+pasting some license boilerplate, they're just straight-up lying - and worse, potentially drowning out any real examples that exist.
Other people here and elsewhere have demonstrated direct code copying. It's not hard to find.
The license/copyright notice (and note that the copyright notice with the names of individuals easily invalidates your assertion regarding boilerplate) issue is that it's clear that no due diligence was done on this product. No one (at least no one with the authority to stop it) spent any time reviewing what this product could produce, because _producing a copyright notice with a random developer's name on it_ is so beyond the pale that even the most cursory review by someone who knew literally anything about the law or copyright would have stopped it in its tracks.
> I've seen a lot of people ragging on Copilot for "copy+pasting" code - does anyone have links to cases where it has done this without the user intentionally trying to generate a specific (extremely famous) code snippet?
It's not important that the examples are really specific; the issue is that it does happen. This tool has the potential to infringe copyright (sure, it does seem legal in some places, but that doesn't make it more right).
Look at how EA reverse engineered the Genesis in the past [1]. They had 2 teams, one that did the reversing, and another that did the implementation. That made it safe to say that no infringing code was going through. Plenty of emulator developers try to avoid source code leaks for similar reasons. Co-Pilot can't do that.
The fact that the copyright status of the code is hard to determine doesn't mean it's not copyrighted code nonetheless.
Personally I've got nothing against this kind of technology, but I do have a pretty big issue with how it learns. When people published their code on Github, they didn't know it could be used for machine learning, and it's bad that it is. If Github asked for the copyright over the code to do Co-Pilot, I wouldn't mind.
Github did an analysis and found that it does do it, though very rarely, and usually when it has little context (e.g. at the start of a file). They're working on detecting those cases though so it doesn't happen accidentally, so it is unlikely to be a realistic problem.
I remember that the guys behind The Pirate Bay actually made a service like this back in the days. You would submit a song and get a mashup back of cuts from other songs where each cut would be short enough to fall under fair use. I can't find any references to it online anymore though. Maybe someone else remembers what the service was called.
I recommend anybody fascinated by this rabbit hole check out the album "The Grey Album," the documentary "Good Copy, Bad Copy," and the album "1987 (What the Fuck Is Going On?)". Or just the Wikipedia articles for them to start. Amazing world.
"each cut would be short enough to fall under fair use" huh? That isn't how fair use works. There is no such thing as "short enough"; fair use has to do with how you use it, not how much of it you use.
As weird as it may seem, you should not forget that free software licenses are built upon the fabric of copyright. Without copyright, free software could not exist in its current form.
For GPL-like "copyleft" licenses, there would be no way to enforce that binary distributions of derived works are accompanied by their source code. Similarly, in the context of permissive BSD/MIT-style licenses, there would be no way to enforce attribution.
So, given that FOSS---which a large portion of the HN crowd depends on---cannot work without copyright (at least not in its current form), the recent discussions may be less of a surprise.
Maybe... although I personally think that the GPL and other 'copy left' licenses aren't the reason open source has prospered, nor do I think enforcing attribution really helps the FOSS world that much.
People write and share code because it is useful to do that, not because licenses require them to.
I think FOSS would do fine with no copyright, and in fact more software might end up open source if we had ZERO copyright... why not make your code open source and get back contributions when your code would end up being shared anyway?
IMO Linux won because its license forced everyone who used and extended it to open their changes - this changes the calculus for firms building products on Linux to make it more worthwhile to upstream their changes to reduce their local maintenance burden.
Compare this to something like the PlayStation 4, which used FreeBSD as the base of its OS, and contributed nothing back to the project at all.
Linux "won" because SGI very loudly shit the bed by betting on Itanium, taking IRIX with it, DEC was in the process of making the future of Tru64 very confusing with the Alpha, a computer that didn't support their flagship OS, and HP was EOLing HP/UX because they bet everything on IBM's OS/2 Warp.
The only big UNIX vendor left was Sun Microsystems, and Solaris indeed dominated the 90s dotcom era. Everybody was running SPARC and SunOS servers.
It wouldn't be until the mid 2000s when Linux started picking up the pieces left behind after Red Hat started their server product and certification program.
For a long time, Linux was strictly a hobbyist OS. It later dominated by simply being the last one standing after everyone else fell.
The true competitive threat to the Unix vendors wasn't each other, it was Microsoft.
And "for a long time" was actually a fairly short time. Linux began to approach feature-comparability fast, and ran on PCs, not $10,000 workstations (that were getting beat power-wise by PCs).
This seems a bit of a chicken and egg problem... why would the early adopters have chosen Linux, before others had been forced to contribute back their changes? The first company to adopt it wouldn't have received any benefits, only an obligation. Why pick it over BSD?
I think there were likely other factors that made it win out.
They would also get the "promise" that the system they were betting on would also get contributions from other companies, making a safer long term bet.
No. AT&T created Unix but was unable to market it due to a previous antitrust action. So they gave it away. (They required a license signature (I've signed that! :-)), but did not charge and were very lenient.)
The UC Berkeley Computer Systems Research Group (if I've got my acronyms right) was one of the recipients and went on to add many, many features and release the result under the BSD license.
Many people built companies like Sun around selling BSD Unix, including many alumni of UCB.
Then AT&T got out from under the consent agreement and began selling its own Unix, System V. By this time, Unix was a major player in the workstation market (a market that has largely disappeared as PCs got more powerful).
By, say, the mid 1980s, there were many, many companies selling many, many varieties of Unix, all descended from the original Unix via BSD or via BSD+System V. Most of them had some unique, valuable features (Irix's graphics, AIX's LVM and journaling file system, etc.) and all of them had modifications to lock customers into their version. This is where the POSIX standards come from (second only to ecmascript market manipulation goofiness), and why things like Autoconf/automake and the much-loved imake (not really) exist. There was much in-fighting; Sun vs everybody else, everybody else vs. IBM, etc.
Then two things happened: PCs got more powerful and began eating into the bottom of the workstation market (PCs ran DOS+Windows, which was and arguably still is, technically inferior to multi-user-by-design systems like Unix[1].) And PCs got more powerful and began to be able to run more advanced OSs (think "memory management").
At this point, the Unix world began to conflict with the Windows world. Unix was technically superior, Windows had more public and developer mind-share. But the Unix world was still more interested in fighting each other and stapled all of their arms and legs to that particular tree.
The end result was that Windows became and remains the most-used operating system[2]. All (almost) of the commercial Unixes died (almost; there are still some animated corpses around)[3]. The two counter-examples are MacOS, which is completely locked to Apple hardware, and Linux.
Linux is the interesting case. Windows and commercial Unix all had a 15- to 20-year head start. But Linux achieved (mostly) feature parity quickly and did not break down into multiple, competing streams. Both of those are due to the GPL; you can fork GPL software all you want, but you cannot add a feature to a fork and expect it not to be back-ported into the original if it's useful. You also have a very hard time locking users into your fork.
The bottom line is that Microsoft won the Unix wars, because the Unix licenses allowed companies to take Unix proprietary.
[1] Modern Windows is kinda-sorta based on VMS, another workstation OS, but not really and then they walked that back, and so on....
[2] I don't really consider Android or iOS to be general-purpose OSs. And they're both rather their own little islands, no matter how much the underlying tech shares with the rest of the universe.
[3] The Free/Net/Open/DragonFly BSDs are, I'm sorry to say, noise. And did you notice that I had to mention four of them?
There were other open source licenses at the party before the GPL dropped its controversial "viral" turd in the punchbowl - and many of them still exist nearly unmodified. (e.g. BSD with attribution removed, etc.)
That just isn't how things work. I believe you are making the wrong assumption that the same amount of open source work would exist. If that were the case, then yes, it wouldn't matter that much. But a lot of contributions to open source wouldn't have existed if it weren't for the licenses. Those are the examples you've been given.
There might be some contributions that wouldn't have happened, but there also might have been others that did happen. My hypothetical was in a world with no copyright or license at all.... so all proprietary code would be copyable, too
Proprietary code would by definition not be copyable, because it would be... proprietary. It's the opposite of open source. The thing that incentivises open sourcing proprietary source code is exactly things like licenses... Your imagined scenario is just "no license for open source, and free rein for closed source". It makes no sense to me.
If no code was copyrightable nor licensable, people could reverse engineer any code they get access to. It would be hard to keep code completely proprietary. You would not be able to distribute your code at all.
I am not saying this is the way we want to go, I am just curious about the thought experiment.
Ok, if it's for a thought experiment then I'll play along. Reverse engineering source code is not as easy as you make it out to be. Not only that, but reverse engineering in the sense of black box is not a violation of most existing licenses. And, if we're talking about decompiling code into something useful, that too is a tall order.
> When did the HN crowd become so defensive of copyright?
Copyright is good in limited quantities. The current multi-decade time horizon is probably what a lot of people are against, and not the concept in general.
And limited time period seems to be consistent through history. From the paper "Copyrights and Creativity: Evidence from Italian Opera in the Napoleonic Age":
> Comparing changes in the creation of new operas across Italian states with and without copyrights, we show that the adoption of basic copyrights encouraged the creation of new work. Moreover, we find that copyrights changed the quality of creative output by encouraging composers to produce more popular and durable works. These results generalize to a broader set of musical compositions and to librettos, as the literary component to the score of operas. Based on these findings, we conclude that the adoption of basic levels of copyright protection – not exceeding the lifetime of the composer – can help to raise both the quantity and the quality of new creative works.
> Importantly, we find that extensions in the length of copyright beyond the composer’s life did not encourage creativity. Performance data reveal that few operas were played after the first 20 years, which suggests that only the most durable creative goods stand to gain from copyright extensions. […]
Because it's my (and many of our) code that they have "learnt" from, stripped the license from, and are intending to sell on. When we listed code under MIT or GPL we meant those licenses; they weren't random. Microsoft just seems to be completely ignoring the reality of reproducing works which are covered by those licenses; they are taking code that is open source and making it private and paid for. Not OK.
"The heathen are sunk down in the pit that they made: in the net which they hid is their own foot taken"
Copyright is a horrible system. Microsoft has been one of the biggest proponents of that system. But now they've clearly violated it. They should either join in abolishing it, or face its consequences.
Consider people's reaction to people selling bootleg DVDs vs torrenting a movie. Although people may consider both morally incorrect, the corrupting profit motive results in the former being seen far more negatively. In the current situation there is also the matter that Microsoft is still perceived (rightly, I think) very negatively and open source authors very positively. Also, in a David v Goliath situation nobody wants to be seen rooting for the giant.
Personally I would be concerned about insert corp here accidentally stealing code from an open source project then years later going after the open source project for copyright infringement regarding the code they in fact stole from the open source project.
Probably because when poor people give something for free to other people to lift them out of poverty it is called empowerment, but when billionaires take the free work of poor people for their own personal selfish gain it is called exploitation.
Say your AGPL code is Copiloted into someone's new program and they decide to release that under a non-free license; that's the issue. We're defensive of copyleft.
It's hypocrisy. People will defend entire books and research papers being shared on libgen/scihub, which is unarguably actual copyright infringement on a massive scale, but training an AI on open source code is somehow the worst thing ever even if there's no case law to say that this constitutes infringement at all.
It's not really that hypocritical when you understand people's perspective on it. In general people (here on HN/OS-community) care about sharing experiences and knowledge. Copyright on open source content does not inhibit people's ability to learn from it. There's also the whole "big corp vs little guy" mentality at play here. If Copilot were open source then I don't think that anyone would have an issue with it; I actually think people would respond well to it if that were the case.
It's self-serving hypocrisy when programmers argue for a very strict reading of copyright for their own works (source code) while at the same time arguing for unbounded sharing of other people's works (books and papers).
If you agree with libgen and scihub then you should definitely not see anything wrong with copilot, because what copilot is doing is considerably milder, they are not reproducing anybody's works.
Only if you disagree with libgen and scihub because they are illegal can the argument that copilot is infringing be made.
> It's self-serving hypocrisy when programmers argue for a very strict reading of copyright for their own works (source code) while at the same time arguing for unbounded sharing of other people's works (books and papers).
Yes, if those were the only two things being discussed, it would be.
But almost all of these people think a certain amount of copyright for songs and movies is good too. And I bet a vast majority of them support copyright for books too, or at least most books.
The vast majority of research authors want their work to be freely available. The industry is just organized in a way that prevents that from happening without harming your research career.
I'm all for copilot if microsoft gets treated the same way libgen/scihub are for creating it. or if we abolish copyright, but the fact that they can just decide to do this and it's fine, but scihub gets DNS-blocked reveals the asymmetry at hand here.
Strangely, a lookalike of a music hit is nothing like the original, and it’s worth analyzing!
- Music is a vehicle for a common experience. Everyone knows the next notes of some Lady Gaga song. We feel like learning the lyrics will make us able to sing together if we were in a club, and share something with other clubbers. Any AI that would reproduce the voice and instruments would still not make you feel like you are sharing a common moment with the rest of the audience.
- Hits are hits because we hear them a thousand times. It’s been proven that people don’t necessarily like it the first time. It’s the familiarity with the song which makes us like it (or hate it when we’ve heard it too much).
- Even worse: We like some songs even more because we love the author. Be it because they are politically involved, have a cute face, have a nice life story, or seem to hide answers to life in the lyrics of their work - But an AI producing the same exact notes wouldn’t trigger similar affection from us. It’s like hearing our kid singing: Very cute, but we wouldn’t like the same song by another kid. Audiences have a genuine emotional attachment to the authors. It’s especially visible since the MCM revolution: Before MCM, music mattered; Now the image matters way more, bands have a face, a graphic style, a story to tell - and music could be as crap as possible, if we like the band it can still have success. MCM changed music forever, proving that AI can’t replace that feeling.
I hear a lot of stated assumptions on how certain things trigger emotional investment and others don’t.
If you knew how manufactured the music industry was, and how nothing of what you see of celebrities is true, it might as well be AI plucking our heart strings, because it isn’t “real” in the sense that I think you mean, authentic human connection over shared experience.
So when are neural nets trained on images or text going to be confronted with the same copyright concerns? At the point that GitHub has forced the issue into the spotlight with Copilot I feel that it's only a matter of time before this reaches the courts. Nobody seemed to care about copyright at the time people were having fun creating AI dream collages or nonexistent anime girls from a model trained on the Danbooru imageset. In the latter case it's not clear that 100% of the original Pixiv and Twitter creators gave their consent to have their work rehosted on a different site in the first place, much less be involved in ML experiments. That data was from 2018.
I'm almost tempted to believe that the people at GitHub knew this was going to blow up as much as it did as some kind of a challenge to the status quo of copyright and licensing, if only so that everyone would start talking about the issue. Why did the GitHub representative plainly state that Copilot was trained on all of GitHub's codebase without seeming to care about the pushback on Twitter and HN that was bound to happen as a result?
I remember Web Font Player before dynamic fonts became available. You could upload a copyrighted font, say Microsoft's or Apple's. It would trace that font and generate your "Web Font", and then you could use it without any copyright issue, as it was not the original font but rather a machine-learnt one.
In the US, fonts (a computer program) are copyrightable but typefaces (design of letters, numbers and other symbols) are not. So you can basically just redraw your own Helvetica with no copyright issues.
Theoretically, isn't a "functional" version of this program possible? Machine-learning aside, I have to wonder how precise platforms like ContentID are, and how easily they could be fooled by an algorithm designed to fool the system with near-imperceptible changes (e.g. cutting out certain high-end frequencies, dilating the speed of the track by very small amounts, etc.)
I don't understand why anyone has an issue with that. You know what else can copy and paste code all the time? Humans. But we have various ways of stopping employees who copy and paste code from stackoverflow and github without checking the license, so it's the same thing if you use one of these tools. There's nothing new I can see here to be upset about.
This would be a lot more interesting if it showed the various GPT-3 experiments at generating music and used that as a point of comparison.
> But we have various ways of stopping employees who copy and paste code from stackoverflow and github without checking the license
What would those be? I’ve worked at a number of organizations that were (rightfully) paranoid about accidentally incorporating GPL code, but even there I wasn’t aware of automated tooling to prevent it, it was only enforced through developer vigilance.
If you actually want a paid service, there are plagiarism detectors like Fossa and Codequiry. Although in my opinion, code review should be enough to catch any "accidental" incidents of plagiarism, the differences in writing style should make it very obvious when the employee has copied something. That of course probably won't apply if you suspect the employee is intentionally changing it around to obfuscate the origin of the code, but it seems that wouldn't be the case if they were just committing the output straight from a neural net. But automated scanners probably won't be able to catch those well either -- the way to catch that would be to make them do pair programming a lot.
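For what it's worth, the crudest form of automated check is easy to sketch. The following is a toy illustration only, not how Fossa, Codequiry, or any other real product works: it splits code into tokens, builds overlapping runs of `n` tokens ("shingles"), and reports what fraction of a candidate's shingles also appear in a known source. The function names and the default `n = 8` are made up for this example.

```python
import re


def token_shingles(source: str, n: int = 8) -> set[tuple[str, ...]]:
    """Return the set of all runs of n consecutive tokens in source."""
    # Identifiers, integer literals, or any single non-space character.
    tokens = re.findall(r"[A-Za-z_]\w*|\d+|\S", source)
    return {tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1)}


def overlap_ratio(candidate: str, known: str, n: int = 8) -> float:
    """Fraction of the candidate's n-token shingles that also occur in known."""
    candidate_shingles = token_shingles(candidate, n)
    if not candidate_shingles:
        # Candidate is shorter than n tokens, so there is nothing to compare.
        return 0.0
    shared = candidate_shingles & token_shingles(known, n)
    return len(shared) / len(candidate_shingles)
```

A ratio near 1.0 means the candidate is almost entirely made of token runs found in the known source. Because whitespace is ignored but identifiers are compared literally, reformatting won't hide a copy but renaming variables will, which is the obfuscation case mentioned above.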
>Although in my opinion, code review should be enough to catch any "accidental" incidents of plagiarism, the differences in writing style should make it very obvious when the employee has copied something.
You must do some CSI level code reviews. Best I'm able to do is figure out if code will work and if something can be done obviously better. Stylistic calls (beyond lint enforceable) are up to authors as far as I'm concerned.
And even then it's trivial to fix up naming schemes and such to match the codebase - doubt that gets you out of copyright issues.
I mean in cases where someone just copies and pastes something without making any effort to match the style, or in cases where they can't explain what a piece of code does or how they came up with it. You should be able to spot those very easily in code review. If somebody is trying to fix up the naming schemes to avoid being detected and for whatever reason is able to explain the code perfectly, then I'd imagine that person would probably be doing the same bad things regardless of using copilot -- it's not like it's hard to search stackoverflow and github for code snippets.
This fast inverse square root function is very well known, with even a Wikipedia page, and it is more than 20 years old. My country doesn't have software patents but it seems that the standard duration of a software patent is 20 years, so even if this function was patented, the patent would have expired by now.
There is no real reason for copyright terms to exceed patent terms.
(And FWIW, patent terms should be inversely proportional to the number of patents issued in that category the previous year. This would automatically reduce terms in categories where innovation is rapid, promoting competition and drive to get to market, but preserve maximum protection for inventions in mature categories with a slower pace of innovation.)
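To make the proposal concrete, here is a rough sketch of one way such a rule could be computed. Everything specific in it is an assumption for illustration: the 20-year cap, the reference count of 1000 patents per year below which a category counts as "mature", and the function name are not from the comment above.

```python
def patent_term_years(
    issued_last_year: int,
    reference_count: int = 1000,   # hypothetical threshold for a "mature" category
    max_term_years: float = 20.0,  # hypothetical cap on the term
) -> float:
    """Term inversely proportional to last year's patent count, capped at a maximum."""
    if issued_last_year <= reference_count:
        # Slow-moving category: keep the maximum protection.
        return max_term_years
    # Fast-moving category: the term shrinks as the count grows.
    return max_term_years * reference_count / issued_last_year
```

Under these made-up numbers, a category with 500 patents issued last year keeps the full 20 years, while one with 4000 would get 5.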
Copyright and patent are different. Also, while you can’t copyright an algorithm, your specific source code that implements it is copyrighted (assuming it’s sufficiently original).
In this case it’s not implementing the algorithm, it’s copying a particular famous implementation, down to the comments.
This truly is the engineer's disease. Hundreds of incredibly strong opinions about the legal system derived almost entirely from a few tweets and zero experience outside of software engineering.
Copilot is neat. If you are concerned about it, talk to a lawyer and get their opinion.
I sense a slight bitterness in the programming community after Copilot ;) It’s the rumbling sound of the imagination of a thousand people throwing in the towel, saying “What now”.
Yeah, why would anyone be bitter about a giant corporation creating a commercial code laundering machine that digests a massive amount of copyrighted code and spits out "clean" code free of all the burdens of its inputs?
The same as the millions of people who involuntarily give their real-time GPS coordinates to Google so it can tell everyone where the jams are? The same as credit card owners (everybody) whose data gets constantly harvested to produce models for Walmart? We benefit from wages for our programming and AI skills, which companies often use unethically, so it’s quite fair that we feel the heat.
I don’t know, GitHub Copilot was an expected result (why would anyone propose to host most of civilization’s open source projects for free? SourceForge did worse with shipping malware into people’s binaries.)
It is a bit disgusting, but we programmers have done it to the rest of the world.
Hey, speak for yourself. I've been speaking out against such practices for years in the same way that I'm speaking out against Copilot.
If you use "but there are other horrible things!" as a justification to trivialize one horrible thing, what do you think the outcome is when that strategy is applied to the world at large? Bingo, "nothing matters" and no consequences ever follow malicious behaviour, because there's always something else that's also bad.
I am pretty sure that neither my GPS coordinates nor my credit card details are copyrightable works.
And I'm pretty sure I've done none of those things, so I don't really know where this "programmers are the scum of the earth" generalisation is coming from.
It's a joke ragging on GitHub Copilot, which suggests code from GitHub to its users regardless of its copyright. The claim is that any code written with Copilot does not infringe since it's 'machine-generated code'. GitHub Copilot takes GitHub code, learns it, and then feeds it to users based on prompts, but you can end up essentially copying and pasting an entire copyrighted snippet. This satire site takes your uploaded mp3, 'learns it' and hands you back the same mp3.
Nice try, but ... Copilot can be sued for copyright infringement in specific cases. Therefore it doesn't mean you can get away with copyright infringement if you copy Copilot's model.
I have some public code with non-free licenses on GitHub and I'm really hoping Copilot can reproduce some of my functions without changes. If it does, I'll see what they do about DMCA requests for those. But... still waiting for access.
> If they do I'll see what they do about DMCA requests for those.
DMCA takedown requests apply to hosts of user-generated content, which MS/GitHub doesn't pretend to be with Copilot (instead, Copilot is a first-party work that MS has stated a copyright theory regarding), so DMCA requests are meaningless. What you need, if you feel MS is violating your copyright via GitHub, is a copyright demand letter backed by a willingness to pursue a lawsuit against Microsoft, who have already stated that their position is that Copilot is completely within the Fair Use exception to copyright.
> Who owns the code GitHub Copilot helps me write?
> GitHub Copilot is a tool, like a compiler or a pen. The suggestions GitHub Copilot generates, and the code you write with its help, belong to you, and you are responsible for it. We recommend that you carefully test, review, and vet the code, as you would with any code you write yourself.
> Do I need to credit GitHub Copilot for helping me write code?
> No, the code you create with GitHub Copilot’s help belongs to you. While every friendly robot likes the occasional word of thanks, you are in no way obligated to credit GitHub Copilot. Just like with a compiler, the output of your use of GitHub Copilot belongs to you.
While you may be right, that contradicts what Microsoft has to say.
So having tried it out the song sounds...exactly the same. So does this just make it that when it's played these detection systems can't pick it up since it's somewhat different? Or if I make a commercial product, include this version of the song, I can somehow afford lawyers to defend myself when the music industry sues me for using what sounds like the same song, just with the 1's and 0's ordered a little differently?
I think that's the joke. It literally takes the exact same song unaltered but it says it's "using machine learning", "fair use" etc. to give the pretense of it being legitimate.
This is most likely a commentary on GitHub Copilot and how the authors of this joke think that GitHub Copilot is violating copyright and does not fall under fair use.
I just confirmed the "processed" file has the same SHA256 sum as the original.
EDIT: I incorrectly labelled it as Google Copilot instead of GitHub Copilot. Fixed.
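For anyone who wants to repeat the SHA256 check mentioned above, here is a minimal sketch in Python. The function names and file paths are made up for illustration; the only real API used is the standard library's `hashlib`.

```python
import hashlib


def sha256_of_file(path, chunk_size=65536):
    """Return the hex SHA-256 digest of the file at `path`, read in chunks."""
    digest = hashlib.sha256()
    with open(path, "rb") as handle:
        # iter() with a sentinel keeps reading until read() returns b"" (EOF).
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def same_file_contents(path_a, path_b):
    """True when both files hash to the same SHA-256 digest."""
    return sha256_of_file(path_a) == sha256_of_file(path_b)
```

Calling `same_file_contents("original.mp3", "processed.mp3")` (hypothetical file names) returning `True` would mean the two files are byte-for-byte identical for all practical purposes.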
He should not have faked the machine learning, but I like the idea. (And I hope that copyright ceases to exist, or at least gets adjusted to what it should be, i.e. its term should be the same as that of patents: not forever and not easily extended.)
How do you go to this much effort to make a point without even reading about how copyright and fair use work? There have been multiple comments on HN and Reddit explaining how it doesn't work like this.
Probably because it's not really all that much effort (nor money) to toss a simple satire page like this one up on the web. Easier now than it's ever been, honestly.
It's just a bunch of sleep(random()) calls and visual changes in the viewport, and you download the exact file you uploaded.
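Roughly, the behaviour described above could be sketched like this in Python. This is not the site's actual code (which presumably runs in the browser); the function name, stage labels, and delay parameter are all assumptions for illustration.

```python
import random
import time


def fake_machine_learning(upload: bytes, max_delay: float = 0.01) -> bytes:
    """Pretend to 'learn' the upload, then hand back the same bytes.

    Hypothetical stand-in for the satire site: the random sleeps only
    simulate work, and the input is never modified.
    """
    for _stage in ("analyzing", "training", "generating"):
        # Random pause so the fake progress display looks busy.
        time.sleep(random.uniform(0, max_delay))
    return upload
```

Since the function returns its argument untouched, the "generated" output hashes identically to the upload, which matches the SHA256 observation earlier in the thread.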