Tool to convert copyrighted music into fair use

steelbrain · on July 10, 2021

In case you don't get it, view the source :)

It's just a bunch of sleep(random()) calls and visual changes on the viewport, and you download the exact file you uploaded.
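To make the joke concrete, here is a minimal sketch of the behaviour described above. It is written in Python purely for illustration and is not the page's actual source. The function name `fake_convert`, the `max_delay` parameter and the stage labels are all hypothetical stand-ins for the on-screen "progress" effects.

```python
import random
import time


def fake_convert(uploaded: bytes, max_delay: float = 2.0) -> bytes:
    """Hypothetical stand-in for the joke "converter": it only pretends to work."""
    # Hypothetical progress stages; on the page these would be visual changes on the viewport.
    for stage in ("Analyzing", "Transforming", "Applying fair use"):
        print(f"{stage}...")
        # The only "processing" is a random pause, i.e. sleep(random()).
        time.sleep(random.random() * max_delay / 3)
    # Nothing was transformed: the "converted" download is byte-for-byte the upload.
    return uploaded


if __name__ == "__main__":
    original = b"some mp3 bytes"
    result = fake_convert(original, max_delay=0.3)
    print(result == original)  # True
```

The random delay is the whole trick. It makes the output look like the result of real work, while the returned bytes are identical to the input.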

stavros · on July 10, 2021

I suspect they train the network on a sample size of one and then ask it to generate something, or at least that's the idea. The project seems like a fairly obvious rag on Copilot.

tyingq · on July 11, 2021

Click the "Careers" link in the top right.

stavros · on July 11, 2021

Ahh, yep!

cblconfederate · on July 10, 2021

The page code was written by co-pilot

londons_explore · on July 10, 2021

I tried to play the original and downloaded music a bunch of times to try to figure out any differences...

sellyme · on July 10, 2021

I've seen a lot of people ragging on Copilot for "copy+pasting" code - does anyone have links to cases where it has done this without the user intentionally trying to generate a specific (extremely famous) code snippet?

I've seen tons of comments here and on Reddit that talk about multiple instances of entire functions being copied verbatim, but the only thing even remotely close to that I've seen is the fast inverse square root, so I must have missed a few tweets or something.

tyingq · on July 10, 2021

Is it clear how many people have access? If access is fairly limited, I'm not surprised at the low number of examples.

Here's an example from their own docs:

Compare:

https://docs.github.com/assets/images/help/copilot/example_r...

To: https://github.com/nilavghosh/OMSCS/blob/master/AI4Robotics-...

sellyme · on July 10, 2021

> Is it clear how many people have access? If access is fairly limited, I'm not surprised at the low number of examples.

I wouldn't be too surprised either, what surprised me was the large number of people who explicitly said words like "many examples" and then couldn't link more than one.

> Here's an example from their own docs:

Thank you! This is definitely a much stronger example than the Quake one. Looking at the git repository it seems like this is generic startup code for a university assignment, so it probably shows up a significant number of times in the training data.

While it definitely makes sense to interpret a piece of code showing up several hundred times in the exact same format as being okay to straight-up copy+paste (e.g., imports, some boilerplate in more verbose languages, keyboard input switch statements), this does seem to highlight that Copilot can't distinguish between code snippets that are always identical because that's just the correct way to do it, and code snippets that are always identical because only one person did that thing, and it just happens to be in hundreds of repositories for one reason or another.

tyingq · on July 11, 2021

There's also the example of it spitting out valid SendGrid API keys, which wouldn't be a case of the code appearing in many places. https://news.ycombinator.com/item?id=27736460

geekraver · on July 11, 2021

I wonder how many of these verbatim examples are because the training data code itself occurs multiple times in GitHub because it in turn was copied from an upvoted answer in StackOverflow, LOL.

meibo · on July 10, 2021

It only seems to if you give it no or very little "source" input, like an empty file with a comment that says "// X algorithm".

There's been a lot of bikeshedding on this, but GitHub decidedly hasn't given enough information on how it works and what the training dataset is, and the fair use question definitely needs to be answered, maybe even in court - it's just a matter of time.

MadVikingGod · on July 10, 2021

What I think is interesting, and not talked about in the Copilot cases, is that there are actually two different copyright issues that come from using it.

The first is whether GitHub had a license to distribute the code it learned from. This assumes that the code it produces is a derivative work, which I don't think would be much of a stretch. For this, I think most of the licenses in use (GPL, MIT, etc.) allow someone to distribute in such a way.

The second is the user of Copilot. They would be receiving and distributing code without knowing the original license, but that wouldn't be a viable defense against infringement. To actually comply with most licenses, they would have to follow their requirements.

In both cases I don't know if fair use would really apply. Maybe GitHub could stretch the research aspect, but if you just use the code in your product, there is no fair use.

wlesieutre · on July 10, 2021

> This is assuming that the code it produces is a derivative, which I don't think would be much of a stretch

I don't think that's easily settled. Is the code I write a derivative work of all prior code that I've learned from? The difference (IMO) is that Copilot has a much more precise memory than I do; I couldn't recall the exact text of a code example that I've encountered repeatedly even if I wanted to.

ric2b · on July 17, 2021

> I couldn't recall the exact text of a code example that I've encountered repeatedly even if I wanted to.

If you could it would be copyright infringement anyway.

orwin · on July 10, 2021

easy solution: license all the code you write with copilot under AGPL or GPLv3, and you're good.

martindevans · on July 11, 2021

Not at all, you'd still be violating all the non-GPL licenses. Just because the GPL is more restrictive doesn't mean it somehow supersedes all other licenses!

cassonmars · on July 11, 2021

Not quite. There are far more aggressive licenses on GitHub repositories, such as the SSPL. In theory, by serving SSPL-licensed code, Co-Pilot itself would have to be open sourced under the same license to comply.

Hamuko · on July 11, 2021

Except you're not allowed to relicense proprietary code you found on GitHub under GPL.

Hamuko · on July 10, 2021

>what the training dataset is

All non-private repositories on GitHub.

brutal_chaos_ · on July 10, 2021

Is it, though? I'd assume that too, but we don't really know, do we? I mean to ask: what has Microsoft stated that leads you to believe this? (Maybe there's a press release I missed?)

teraflop · on July 10, 2021

https://twitter.com/NoraDotCodes/status/1412741339771461635

ben0x539 · on July 10, 2021

(this doesn't say they didn't use other sources too, though)

tyingq · on July 11, 2021

"source code from publicly available sources, including code in public repositories on GitHub." (from the copilot home page)

Pretty much says they did also use other sources.

uberswe · on July 11, 2021

I have access and hints for entire functions have only appeared from code already in the same file. It uses the file you are in for context.

It also says on their website that the AI may generate api keys that look real but it’s actually just a “fake” key that the AI generated as placeholder.

I have had access for one day and spent my entire Saturday playing with it while working on an addon for a game. I find it useful, and most of the hints come from other code I have in the same files, which saves me time or lets me know when I’m too repetitive :)

tyingq · on July 11, 2021

"It also says on their website that the AI may generate api keys that look real but it’s actually just a “fake” key that the AI generated as placeholder."

Maybe this is what you meant, but that's already been shown to be untrue. https://fossbytes.com/github-copilot-generating-functional-a...

uberswe · on July 11, 2021

So the article mentions an issue reported by dtjm, and the maintainer answers the issue explaining that the keys are fictional. I don't see any statement by the GitHub CEO about this being an issue, as the article claims.

If you have access, here's a link to the maintainer's answer: https://github.com/github/copilot-preview/discussions/45#dis...

The answer references the FAQ which is what I referred to: http://copilot.github.com/#faq-does-github-copilot-ever-outp...

dogecoinbase · on July 10, 2021

It's happily spitting out licenses and copyright notices with other people's names on them, it's pretty clearly half-baked.

sellyme · on July 10, 2021

While that's obviously a UX flaw that definitely shouldn't have made it to release, I find it hard to envision it ever actually being a problem. If someone's accepting auto-generated copyright notices and licenses that don't actually apply, they can't really point the finger at Github - it's called "Copilot", not "Pilot".

The problem with full snippets of arbitrary obscure repositories being copy+pasted is that there's no realistic way for a user to know if that's happening without putting in more effort than just writing the code themselves, somewhat defeating the point. That's not really the case when the first line of the file contains "(C) Someone Else 2003".

abrokenpipe · on July 11, 2021

That's much more than just a UX flaw, it's clearly showing that copilot just regurgitates code snippets from other people's open source projects. Which brings us back to the issue of "did the copilot AI actually learn from other people's code, or is it just copy+pasting other people's code?". I don't think people are concerned about copilot users' liability/productivity.

sellyme · on July 11, 2021

> it's clearly showing that copilot just regurgitates code snippets from other peoples opensource

You consider a license file to be code?

abrokenpipe · on July 11, 2021

That's irrelevant, if copilot is copy+pasting random licenses then it's definitely copy+pasting code as well, I don't see how this could be hard to believe.

sellyme · on July 12, 2021

> That's irrelevant

You said "copilot just regurgitates code snippets", and the example given wasn't a code snippet. Of course it's relevant.

> if copilot is copy+pasting random licenses then it's definitely copy+pasting code as well

Not at all. Copy+pasting licenses is intended behaviour. That's the entire point of having standard licenses. No-one would want Copilot to try to make its own entirely new licenses, therefore it's very trivial to imagine a piece of software designed to allow for licenses to be copied verbatim, but not allow that for any significant chunk of code.

Copilot does not appear to be designed like that, but that doesn't make the logic correct. We only know that it isn't designed like that because we looked for an example of it actually copying arbitrary code.

> I don't see how this could be hard to believe

I don't care about believing it, I care about whether or not it's true. If someone claims to have seen Copilot copy+pasting code, and when asked for evidence their only example is it copy+pasting some license boilerplate, they're just straight-up lying - and worse, potentially drowning out any real examples that exist.

dogecoinbase · on July 12, 2021

Other people here and elsewhere have demonstrated direct code copying. It's not hard to find.

The license/copyright notice (and note that the copyright notice with the names of individuals easily invalidates your assertion regarding boilerplate) issue is that it's clear that no due diligence was done on this product. No one (at least no one with the authority to stop it) spent any time reviewing what this product could produce, because _producing a copyright notice with a random developer's name on it_ is so beyond the pale that even the most cursory review by someone who knew literally anything about the law or copyright would have stopped it in its tracks.

dwild · on July 11, 2021

> I've seen a lot of people ragging on Copilot for "copy+pasting" code - does anyone have links to cases where it has done this without the user intentionally trying to generate a specific (extremely famous) code snippet?

It's not important that the examples are really specific; the issue is that it does happen. This tool has the potential to infringe copyright (sure, it may be legal in some places, but that doesn't make it any more right, though).

Look at how EA reverse engineered the Genesis in the past [1]. They had two teams: one did the reversing and another did the implementation. That made it safe to say that no infringing code was getting through. Plenty of emulator developers try to avoid source code leaks for similar reasons. Co-Pilot can't do that.

The fact that the copyright status of the code is hard to determine doesn't mean it's not copyrighted code nonetheless.

Personally, I have nothing against this kind of technology, but I do have a pretty big issue with how it learns. When people published their code on GitHub, they didn't know it could be used for machine learning, and it's bad that it was. If GitHub had asked for the copyright over the code to build Co-Pilot, I wouldn't mind.

[1] https://www.youtube.com/watch?v=x0qe1FNqtCo&t=280s

IshKebab · on July 10, 2021

Github did an analysis and found that it does do it, though very rarely, and usually when it has little context (e.g. at the start of a file). They're working on detecting those cases though so it doesn't happen accidentally, so it is unlikely to be a realistic problem.

lyxell · on July 10, 2021

I remember that the guys behind The Pirate Bay actually made a service like this back in the days. You would submit a song and get a mashup back of cuts from other songs where each cut would be short enough to fall under fair use. I can't find any references to it online anymore though. Maybe someone else remembers what the service was called.

rikkipitt · on July 10, 2021

I don't remember the site, but it reminds me of the Girl Talk album called "All Day" – https://en.wikipedia.org/wiki/All_Day_(Girl_Talk_album). It was originally released as a free digital download.

> Greg Gillis composed the album using overlapping samples of 372 songs by other artists.

This article goes into it a bit more: "Girl Talk, Fair Use, and Three Hundred Twenty-Two Reasons for Copyright Reform" – https://jipel.law.nyu.edu/ledger-vol-1-no-1-4-pearl/

gardnr · on July 10, 2021

Night Ripper is my favourite album by Girl Talk.

rkuykendall-com · on July 12, 2021

I recommend anybody fascinated by this rabbit hole check out the album "The Grey Album," the documentary "Good Copy, Bad Copy," and the album "1987 (What the Fuck Is Going On?)". Or just the Wikipedia articles for them to start. Amazing world.

stuntkite · on July 11, 2021

I remember reading an interview with GT where he said he was NOT A DJ because he was a guy with a laptop and a bunch of lawyers.

codetrotter · on July 10, 2021

Reminds me of this other thing from years ago called “sCrAmBlEd?HaCkZ!”

https://youtu.be/eRlhKaxcKpA

It splits music videos into small portions ahead of time and then later it reassembles them on the fly according to audio input.

brutal_chaos_ · on July 11, 2021

I remember when that came out, it blew my mind! Did anything public ever materialize from the project?

chrismcb · on July 11, 2021

"each cut would be short enough to fall under fair yet" huh? That isn't good fair use works. There is no such thing as "short enough" fair use had to do with how you use it, not how much of it you use.

voakbasda · on July 11, 2021

The whole experiment is a commentary on the need for copyright reform. That makes the usage a clear case of fair use.

lyxell · on July 11, 2021

This was supposed to be according to Swedish copyright law but you may very well be right.

jjcon · on July 10, 2021

When did the HN crowd become so defensive of copyright? I understand the concerns on copilot but it’s kinda weirding me out.

aurelian15 · on July 10, 2021

As weird as it may seem, you should not forget that free software licenses are built upon the fabric of copyright. Without copyright, free software could not exist in its current form. For GPL-like "copyleft" licenses, there would be no way to enforce that binary distributions of derived works are accompanied by their source code. Similarly, in the context of permissive BSD/MIT-style licenses, there would be no way to enforce attribution.

So, given that FOSS (which a large portion of the HN crowd depends on) cannot work without copyright, at least not in its current form, the recent discussions may be less of a surprise.

cortesoft · on July 10, 2021

Maybe... although I personally think that the GPL and other 'copy left' licenses aren't the reason open source has prospered, nor do I think enforcing attribution really helps the FOSS world that much.

People write and share code because it is useful to do that, not because licenses require them to.

I think FOSS would do fine with no copyright, and in fact more software might end up open source if we had ZERO copyright... why not make your code open source and get back contributions when your code would end up being shared anyway?

mcguire · on July 10, 2021

You may think that, but history argues otherwise. See the Unix Wars (https://en.wikipedia.org/wiki/Unix_wars).

cortesoft · on July 11, 2021

I am not sure I follow. Are you saying licenses are what stopped the UNIX wars?

ripdog · on July 11, 2021

IMO Linux won because its license forced everyone who used and extended it to open their changes - this changes the calculus for firms building products on Linux to make it more worthwhile to upstream their changes to reduce their local maintenance burden.

Compare this to something like the PlayStation 4, which used FreeBSD as the base of its OS and contributed nothing back to the project at all.

Jasper_ · on July 11, 2021

Linux "won" because SGI very loudly shit the bed by betting on Itanium, taking IRIX with it; DEC was in the process of making the future of Tru64 very confusing with the Alpha, a computer that didn't support their flagship OS; and HP was EOLing HP/UX because they bet everything on IBM's OS/2 Warp.

The only big UNIX vendor left was Sun Microsystems, and Solaris indeed dominated the 90s dotcom era. Everybody was running SPARC and SunOS servers.

It wasn't until the mid-2000s, after Red Hat started its server product and certification program, that Linux began picking up the pieces left behind.

For a long time, Linux was strictly a hobbyist OS. It later dominated by simply being the last one standing after everyone else fell.

mcguire · on July 11, 2021

The true competitive threat to the Unix vendors wasn't each other, it was Microsoft.

And "for a long time" was actually a fairly short time. Linux began to approach feature-comparability fast, and ran on PCs, not $10,000 workstations (that were getting beat power-wise by PCs).

cortesoft · on July 11, 2021

This seems a bit of a chicken and egg problem... why would the early adopters have chosen Linux, before others had been forced to contribute back their changes? The first company to adopt it wouldn't have received any benefits, only an obligation. Why pick it over BSD?

I think there were likely other factors that made it win out.

ripdog · on July 11, 2021

Oh, for sure. There are always other factors in the real world.

But I think that's the main one that allowed Linux to snowball ahead of the other Unix(-likes).

ric2b · on July 17, 2021

They would also get the "promise" that the system they were betting on would receive contributions from other companies, making it a safer long-term bet.

mcguire · on July 11, 2021

And because the GPL makes it difficult to fork and make the branch a proprietary product.

mcguire · on July 11, 2021

No. AT&T created Unix but was unable to market it due to a previous antitrust action. So they gave it away. (They required a license signature (I've signed that! :-)), but did not charge and were very lenient.)

The UC Berkeley Computer Systems Research Group (if I've got my acronyms right) was one of the recipients and went on to add many, many features, releasing the result under the BSD license.

Many people built companies like Sun around selling BSD Unix, including many alumni of UCB.

Then AT&T got out from under the consent agreement and began selling its own Unix, System V. By this time, Unix was a major player in the workstation market (a market that has largely disappeared as PCs got more powerful).

By, say, the mid-1980s, there were many, many companies selling many, many varieties of Unix, all descended from the original Unix via BSD or via BSD+System V. Most of them had some unique, valuable features (Irix's graphics, AIX's LVM and journaling file system, etc.) and all of them had modifications to lock customers into their version. This is where the POSIX standards come from (second only to ECMAScript market manipulation goofiness), and why things like Autoconf/automake and the much-loved imake (not really) exist. There was much in-fighting; Sun vs. everybody else, everybody else vs. IBM, etc.

Then two things happened: PCs got more powerful and began eating into the bottom of the workstation market (PCs ran DOS+Windows, which was, and arguably still is, technically inferior to multi-user-by-design systems like Unix[1]). And PCs got more powerful and began to be able to run more advanced OSs (think "memory management").

At this point, the Unix world began to conflict with the Windows world. Unix was technically superior, Windows had more public and developer mind-share. But the Unix world was still more interested in fighting each other and stapled all of their arms and legs to that particular tree.

The end result was that Windows became and remains the most-used operating system[2]. All (almost) of the commercial Unixes died (almost; there are still some animated corpses around)[3]. The two counter-examples are MacOS, which is completely locked to Apple hardware, and Linux.

Linux is the interesting case. Windows and commercial Unix all had a 15- to 20-year head start. But Linux achieved (mostly) feature parity quickly and did not break down into multiple, competing streams. Both of those are due to the GPL; you can fork GPL software all you want, but you cannot add a feature to a fork and expect it not to be back-ported into the original if it's useful. You also have a very hard time locking users into your fork.

The bottom line is that Microsoft won the Unix wars, because the Unix licenses allowed companies to take Unix proprietary.

[1] Modern Windows is kinda-sorta based on VMS, another workstation OS, but not really and then they walked that back, and so on....

[2] I don't really consider Android or iOS to be general-purpose OSs. And they're both rather their own little islands, no matter how much the underlying tech shares with the rest of the universe.

[3] The Free/Net/Open/DragonFly BSDs are, I'm sorry to say, noise. And did you notice that I had to mention four of them?

em-bee · on July 11, 2021

people do, but companies don't.

projects like openwrt would not exist if it weren't for copyright enforcement.

dublin · on July 10, 2021

There were other open source licenses at the party before the GPL dropped its controversial "viral" turd in the punchbowl - and many of them still exist nearly unmodified. (e.g. BSD with attribution removed, etc.)

cortesoft · on July 10, 2021

I know, but I am just questioning whether any license is needed for open source to prosper.

I am positing that if licenses didn't exist, and anyone could do anything they want with any bit of code they see, open source would still prosper.

okamiueru · on July 11, 2021

That just isn't how things work. I believe you are making the wrong assumption that the same amount of open source work would exist. If that were the case, then yes, it wouldn't matter that much. But a lot of contributions to open source wouldn't have existed if it weren't for the licenses. Those are the examples you've been given.

cortesoft · on July 11, 2021

There might be some contributions that wouldn't have happened, but there also might have been others that did happen. My hypothetical was in a world with no copyright or license at all.... so all proprietary code would be copyable, too

okamiueru · on July 11, 2021

Proprietary code would by definition not be copyable, because it would be... proprietary. It's the opposite of open source. The thing that incentivises open sourcing proprietary source code is exactly things like licenses... Your imagined scenario is just "no license for open source, and free rein for closed source". It makes no sense to me.

cortesoft · on July 11, 2021

If no code was copyrightable nor licensable, people could reverse engineer any code they get access to. It would be hard to keep code completely proprietary. You would not be able to distribute your code at all.

I am not saying this is the way we want to go, I am just curious about the thought experiment.

okamiueru · on July 11, 2021

Ok, if it's for a thought experiment then I'll play along. Reverse engineering source code is not as easy as you make it out to be. Not only that, but reverse engineering in the sense of black box is not a violation of most existing licenses. And, if we're talking about decompiling code into something useful, that too is a tall order.

throw0101a · on July 10, 2021

> When did the HN crowd become so defensive of copyright?

Copyright is good in limited quantities. The current multi-decade time horizon is probably what a lot of people are against, and not the concept in general.

And limited time period seems to be consistent through history. From the paper "Copyrights and Creativity: Evidence from Italian Opera in the Napoleonic Age":

> Comparing changes in the creation of new operas across Italian states with and without copyrights, we show that the adoption of basic copyrights encouraged the creation of new work. Moreover, we find that copyrights changed the quality of creative output by encouraging composers to produce more popular and durable works. These results generalize to a broader set of musical compositions and to librettos, as the literary component to the score of operas. Based on these findings, we conclude that the adoption of basic levels of copyright protection – not exceeding the lifetime of the composer – can help to raise both the quantity and the quality of new creative works.

> Importantly, we find that extensions in the length of copyright beyond the composer’s life did not encourage creativity. Performance data reveal that few operas were played after the first 20 years, which suggests that only the most durable creative goods stand to gain from copyright extensions. […]

* https://papers.ssrn.com/sol3/papers.cfm?abstract_id=2505776

ReactiveJelly · on July 10, 2021

Both the permissive and copyleft licenses are only enforceable through copyright law.

I don't mind that copyright exists, I just wish it was better.

Also there's a power difference between individuals violating the rights of a big company, and a big company violating the rights of many individuals.

If Copilot isn't reined in, it feels like yet another case of "The laws only apply to poor people".

lupire · on July 11, 2021

What does it mean to "enforce" a "permissive" license?

jhugo · on July 11, 2021

Attribution, mostly.

PaulKeeble · on July 10, 2021

Because it's my (and many others') code they have "learnt" from, stripped the license from, and intend to sell on. When we licensed code under MIT or GPL we meant those licenses; they weren't random. Microsoft just seems to be completely ignoring the reality of reproducing works covered by those licenses: they are making open source code private and paid-for. Not OK.

breck · on July 10, 2021

"The heathen are sunk down in the pit that they made: in the net which they hid is their own foot taken"

Copyright is a horrible system. Microsoft has been one of the biggest proponents of that system. But now they've clearly violated it. They should either join in abolishing it, or face its consequences.

michaelmrose · on July 10, 2021

Consider people's reaction to someone selling bootleg DVDs vs. torrenting a movie. Although people may consider both morally wrong, the corrupting profit motive results in the former being seen far more negatively. In the current situation there is also the fact that Microsoft is still perceived, rightly I think, very negatively, and open source authors very positively. Also, in a David vs. Goliath situation nobody wants to be seen rooting for the giant.

Personally I would be concerned about [insert corp here] accidentally stealing code from an open source project, then years later going after the open source project for copyright infringement over the code they in fact stole from it.

carom · on July 10, 2021

I guarantee this is not Microsoft's announcement that they are forfeiting their copyrights. This is just them abusing the spirit of ours.

NiceWayToDoIT · on July 10, 2021

Probably because when poor people give something for free to other people to lift them out of poverty it is called empowerment, but when billionaires take free work of poor people for their own personal selfish gain it is called - exploit.

hjek · on July 10, 2021

Say your AGPL code is Copiloted into someone's new program and they decide to release that under a non-free license; that's the issue. We're defensive of copyleft.

clusterfish · on July 11, 2021

This submission aside, it seems that most people are just concerned about getting sued on copyright grounds for using copilot.

zarzavat · on July 11, 2021

It's hypocrisy. People will defend entire books and research papers being shared on libgen/scihub, which is unarguably actual copyright infringement on a massive scale, but training an AI on open source code is somehow the worst thing ever even if there's no case law to say that this constitutes infringement at all.

abrokenpipe · on July 11, 2021

It's not really that hypocritical when you understand people's perspective on it. In general, people (here on HN and in the OS community) care about sharing experiences and knowledge. Copyright on open source content does not inhibit people's ability to learn from it. There's also the whole "big corp vs. little guy" mentality at play here. If copilot were open source, I don't think anyone would have an issue with it; I actually think people would respond well to it if that were the case.

Dylan16807 · on July 11, 2021

It's not hypocrisy to say that different types of work should have different copyright lengths, and that for some types the answer is zero.

Especially if you remember the phrase "to promote the progress of science and useful arts".

zarzavat · on July 11, 2021

It's self-serving hypocrisy when programmers argue for a very strict reading of copyright for their own works (source code) while at the same time arguing for unbounded sharing of other people's works (books and papers).

If you agree with libgen and scihub then you should definitely not see anything wrong with copilot, because what copilot is doing is considerably milder, they are not reproducing anybody's works.

Only if you disagree with libgen and scihub because they are illegal can the argument that copilot is infringing be made.

Dylan16807 · on July 11, 2021

> It's self-serving hypocrisy when programmers argue for a very strict reading of copyright for their own works (source code) while at the same time arguing for unbounded sharing of other people's works (books and papers).

Yes, if those were the only two things being discussed, it would be.

But almost all of these people think a certain amount of copyright for songs and movies is good too. And I bet a vast majority of them support copyright for books too, or at least most books.

ric2b · on July 17, 2021

The vast majority of research authors want their work to be freely available. The industry is just organized in a way that prevents that from happening without harming your research career.

asddubs · on July 11, 2021

I'm all for copilot if microsoft gets treated the same way libgen/scihub are for creating it. or if we abolish copyright, but the fact that they can just decide to do this and it's fine, but scihub gets DNS-blocked reveals the asymmetry at hand here.

wizzwizz4 · on July 10, 2021

Wow. And it's entirely client-side, too! Impressive.

er4hn · on July 10, 2021

Finally, Copilot for Music!

laurent92 · on July 10, 2021

Strangely, a lookalike of a music hit is nothing like the original, and it’s worth analyzing!

- Music is a vehicle for a common experience. Everyone knows the next notes of some Lady Gaga song. We feel that learning the lyrics will let us sing along in a club and share something with the other clubbers. An AI that reproduced the voice and instruments would still not make you feel like you were sharing a common moment with the rest of the listeners.

- Hits are hits because we hear them a thousand times. It’s been shown that people don’t necessarily like a song the first time. It’s familiarity with the song that makes us like it (or hate it when we’ve heard it too much).

- Even worse: we like some songs even more because we love the artist, be it because they are politically involved, have a cute face, have a nice life story, or seem to hide answers to life in the lyrics of their work. But an AI producing the exact same notes wouldn’t trigger similar affection from us. It’s like hearing our kid sing: very cute, but we wouldn’t like the same song sung by another kid. Audiences have a genuine emotional attachment to the authors. It’s especially visible since the MCM revolution: before MCM, the music mattered; now the image matters way more. Bands have a face, a graphic style, a story to tell, and the music can be as crappy as it likes: if we like the band, it can still be a success. MCM changed music forever, proving that AI can’t replace that feeling.

Can it?

imwillofficial · on July 10, 2021

I hear a lot of stated assumptions on how certain things trigger emotional investment and other don’t.

If you knew how manufactured the music industry was, and how nothing of what you see of celebrities is true, it might as well be AI plucking our heart strings, because it isn’t “real” in the sense that I think you mean, authentic human connection over shared experience.

nonbirithm · on July 10, 2021

So when are neural nets trained on images or text going to face the same copyright concerns? Now that GitHub has forced the issue into the spotlight with Copilot, I feel it's only a matter of time before this reaches the courts. Nobody seemed to care about copyright at the time people were having fun creating AI dream collages or nonexistent anime girls from a model trained on the Danbooru imageset. In the latter case, it's not clear that 100% of the original Pixiv and Twitter creators gave their consent to have their work rehosted on a different site in the first place, much less to have it used in ML experiments. That data was from 2018.

I'm almost tempted to believe that the people at GitHub knew this was going to blow up as much as it did as some kind of a challenge to the status quo of copyright and licensing, if only so that everyone would start talking about the issue. Why did the GitHub representative plainly state that Copilot was trained on all of GitHub's codebase without seeming to care about the pushback on Twitter and HN that was bound to happen as a result?

jiminymcmoogley · on July 10, 2021

by the time the dinosaurs that dictate our laws begin to care about it, copyright will no longer exist

ChristianGeek · on July 10, 2021

Great way for the owner of the site to build up a library of free music!

habibur · on July 10, 2021

I remember Web Font Player from before dynamic fonts became available. You could upload a copyrighted font, say a Microsoft or Apple font, and it would trace that font and generate your "Web Font". Then you could use it without any copyright issue, since it's not the original font but a machine-learned one.

Guess fonts are still like this.

Hamuko · on July 10, 2021

In the US, fonts (a computer program) are copyrightable but typefaces (design of letters, numbers and other symbols) are not. So you can basically just redraw your own Helvetica with no copyright issues.

not2b · on July 10, 2021

I was kinda hoping it would change the song enough to get it past Youtube's copyright filters, but apparently not.

IceHegel · on July 10, 2021

audiophiles gotta try this! it makes the music soo much better

smoldesu · on July 10, 2021

Theoretically, isn't a "functional" version of this program possible? Machine learning aside, I have to wonder how precise platforms like ContentID are, and how easily they could be fooled by an algorithm that makes near-imperceptible changes (e.g. cutting out certain high-end frequencies, changing the speed of the track by very small amounts, etc.).

_dh54 · on July 11, 2021

This is such an innovative product! Game changer. The music industry will never be the same

terrycody · on July 11, 2021

I input a non-English song and it turns out nothing changed, right?! Did I miss something?

tgv · on July 10, 2021

Congratulations. That's got to be a 100% accurate algorithm.

speedgoose · on July 10, 2021

Have you tried GitHub Copilot? It's not going to copy-paste the Linux source code, just like AI Dungeon is not going to copy-paste a Tolkien book.

andersource · on July 10, 2021

Most of the time, but it can, and I think that's the issue a lot of people have with it.

https://twitter.com/mitsuhiko/status/1410886329924194309

https://news.ycombinator.com/item?id=27710287

_d7dt · on July 10, 2021

I don't understand why anyone has an issue with that. You know what else can copy and paste code all the time? Humans. But we have various ways of stopping employees who copy and paste code from stackoverflow and github without checking the license, so it's the same thing if you use one of these tools. There's nothing new I can see here to be upset about.

This would be a lot more interesting if it showed the various GPT-3 experiments at generating music and used that as a point of comparison.

paulgb · on July 10, 2021

> But we have various ways of stopping employees who copy and paste code from stackoverflow and github without checking the license

What would those be? I’ve worked at a number of organizations that were (rightfully) paranoid about accidentally incorporating GPL code, but even there I wasn’t aware of automated tooling to prevent it, it was only enforced through developer vigilance.

_d7dt · on July 10, 2021

If you actually want a paid service, there are plagiarism detectors like Fossa and Codequiry. Although in my opinion, code review should be enough to catch any "accidental" incidents of plagiarism, the differences in writing style should make it very obvious when the employee has copied something. That of course probably won't apply if you suspect the employee is intentionally changing it around to obfuscate the origin of the code, but it seems that wouldn't be the case if they were just committing the output straight from a neural net. But automated scanners probably won't be able to catch those well either -- the way to catch that would be to make them do pair programming a lot.

reader_mode · on July 10, 2021

>Although in my opinion, code review should be enough to catch any "accidental" incidents of plagiarism, the differences in writing style should make it very obvious when the employee has copied something.

You must do some CSI-level code reviews. The best I'm able to do is figure out whether code will work and whether something can obviously be done better. Stylistic calls (beyond what a linter can enforce) are up to the authors as far as I'm concerned.

And even then, it's trivial to fix up naming schemes and such to match the codebase, and I doubt that gets you out of copyright issues.
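A toy illustration of why renaming alone may not hide copied code from an automated check: the Python sketch below tokenizes source with the standard `tokenize` module, replaces every non-keyword identifier with a placeholder, drops comments, and compares the resulting token streams with `difflib`. This is an assumption-laden sketch, not how Fossa, Codequiry, or any other product works; the function names are made up for illustration, and real tools must also handle reordered code, other languages, and large corpora.

```python
import difflib
import io
import keyword
import tokenize

# Token types that carry layout or comments rather than program structure.
SKIPPED = {tokenize.COMMENT, tokenize.NL, tokenize.NEWLINE,
           tokenize.INDENT, tokenize.DEDENT, tokenize.ENDMARKER}


def normalized_tokens(source: str) -> list[str]:
    """Tokenize Python source, mapping every non-keyword name to 'ID'."""
    tokens = []
    for tok in tokenize.generate_tokens(io.StringIO(source).readline):
        if tok.type in SKIPPED:
            continue
        if tok.type == tokenize.NAME and not keyword.iskeyword(tok.string):
            tokens.append("ID")  # variable and function names no longer matter
        else:
            tokens.append(tok.string)
    return tokens


def similarity(a: str, b: str) -> float:
    """Ratio in [0, 1]; 1.0 means identical after normalization."""
    return difflib.SequenceMatcher(None, normalized_tokens(a), normalized_tokens(b)).ratio()


original = "def add(a, b):\n    # sum two numbers\n    return a + b\n"
renamed = "def total(x, y):\n    return x + y\n"
print(similarity(original, renamed))  # 1.0: renaming and dropping the comment changed nothing
```

The same normalization is also why a check like this is easy to defeat by restructuring code rather than renaming it; token-level matching only catches near-verbatim copies.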

_d7dt · on July 11, 2021

I mean in cases where someone just copy and pastes something without making any effort to match the style, or in cases where they can't explain what a piece of code does or how they came up with it. You should be able to spot those very easily in code review. If somebody is trying to fix up the naming schemes to avoid being detected and for whatever reason is able to explain the code perfectly, then I'd imagine that person would probably be doing the same bad things regardless of using copilot -- it's not like it's hard to search stackoverflow and github for code snippets.

reader_mode · on July 11, 2021

But I think copilot matches code style, no?

speedgoose · on July 10, 2021

This fast inverse square root function is very well known, with even a Wikipedia page, and it is more than 20 years old. My country doesn't have software patents but it seems that the standard duration of a software patent is 20 years, so even if this function was patented, the patent would have expired by now.

dublin · on July 10, 2021

There is no real reason for copyright terms to exceed patent terms.

(And FWIW, patent terms should be inversely proportional to the number of patents issued in that category the previous year. This would automatically reduce terms in categories where innovation is rapid, promoting competition and the drive to get to market, but preserve maximum protection for inventions in mature categories with a slower pace of innovation.)
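Read literally, the proposal might be written as the formula below. The scaling constant k and the cap T_max (standing in for "maximum protection") are hypothetical parameters that the comment does not specify:

```latex
T_c(y) = \min\!\left(T_{\max},\ \frac{k}{N_c(y-1)}\right)
```

Here N_c(y-1) is the number of patents issued in category c in the previous year. For instance, with a hypothetical k = 20,000 patent-years and T_max = 20 years, a category that saw 4,000 patents last year would get 20,000 / 4,000 = 5-year terms, while one that saw 500 would hit the 20-year cap (20,000 / 500 = 40 > 20).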

lilyball · on July 10, 2021

Copyright and patent are different. Also, while you can’t copyright an algorithm, your specific source code that implements it is copyrighted (assuming it’s sufficiently original).

In this case it’s not implementing the algorithm, it’s copying a particular famous implementation, down to the comments.
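To illustrate the algorithm-versus-implementation distinction, here is an independently written Python sketch of the same fast inverse square root technique: reinterpret the float's bits as an integer, apply the widely published magic constant 0x5F3759DF, and refine with one Newton-Raphson step. It is not the Quake source and shares none of its comments; it only approximates 1/sqrt(x) for positive values representable as 32-bit floats.

```python
import math
import struct


def fast_inv_sqrt(x: float) -> float:
    """Approximate 1/sqrt(x) for a positive float using 32-bit bit tricks."""
    # Reinterpret the IEEE 754 single-precision bits of x as an unsigned int.
    bits = struct.unpack("<I", struct.pack("<f", x))[0]
    # A float's bit pattern is roughly a scaled log2 of its value, so shifting
    # right by one approximates log2(x) / 2; subtracting from the magic constant
    # negates that (with a bias correction), giving a rough log2 of 1/sqrt(x).
    bits = 0x5F3759DF - (bits >> 1)
    # Reinterpret the adjusted bits back as a float: a rough first guess.
    y = struct.unpack("<f", struct.pack("<I", bits))[0]
    # One Newton-Raphson iteration sharpens the guess.
    return y * (1.5 - 0.5 * x * y * y)


for value in (0.25, 1.0, 4.0, 100.0):
    approx = fast_inv_sqrt(value)
    exact = 1.0 / math.sqrt(value)
    print(f"{value:>6}: approx={approx:.6f} exact={exact:.6f}")
```

After the single Newton step the relative error stays within a fraction of a percent, which is why the trick was attractive for real-time graphics.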

dwild · on July 11, 2021

> This fast inverse square root function is very well known, with even a Wikipedia page, and it is more than 20 years old.

The Copilot algorithm didn't take this into account, though... If it did, we probably couldn't complain about copyright infringement.

NautilusWave · on July 10, 2021

Copyright is different from patent. Copyright is (basically) forever.

speedgoose · on July 11, 2021

Can a few lines of technical code hold a copyright for that long?

UncleMeat · on July 10, 2021

This truly is the engineer's disease. Hundreds of incredibly strong opinions about the legal system derived almost entirely from a few tweets and zero experience outside of software engineering.

Copilot is neat. If you are concerned about it, talk to a lawyer and get their opinion.

hedora · on July 10, 2021

Ooh. They have a DARPA grant! Applying now.

cjohansson · on July 10, 2021

Hilarious stuff

konstruction · on July 10, 2021

Hilarious :-)

sycren · on July 10, 2021

By uploading the licensed music in the first place, are we not breaching copyright law?

pornel · on July 10, 2021

It's not uploading, it's making available for scraping.

Hamuko · on July 10, 2021

Obviously machine learning is fair use, so it supersedes copyright.

laurent92 · on July 10, 2021

I sense a hint of bitterness in the programming community after Copilot ;) It’s the rumbling sound of a thousand people’s imaginations throwing in the towel and saying “What now?”

Hamuko · on July 10, 2021

Yeah, why would anyone be bitter about a giant corporation creating a commercial code laundering machine that digests a massive amount of copyrighted code and spits out "clean" code free of all the burdens of its inputs?

laurent92 · on July 11, 2021

The same as the millions of people who involuntarily give their real-time GPS coordinates to Google so it can tell everyone where the traffic jams are? The same as credit card owners (everybody) whose data gets constantly harvested to produce models for Walmart? We benefit from wages for our programming and AI skills, which companies often use unethically, so it’s quite fair that we feel the heat.

I don’t know, GitHub Copilot was an expected result (why would anyone offer to host most of civilization’s open source projects for free? SourceForge did worse by shipping malware in people’s binaries.)

It is a bit disgusting, but we programmers have done it to the rest of the world.

joepie91_ · on July 11, 2021

Hey, speak for yourself. I've been speaking out against such practices for years in the same way that I'm speaking out against Copilot.

If you use "but there are other horrible things!" as a justification to trivialize one horrible thing, what do you think the outcome is when that strategy is applied to the world at large? Bingo, "nothing matters" and no consequences ever follow malicious behaviour, because there's always something else that's also bad.

Don't do this.

Hamuko · on July 11, 2021

I am pretty sure that neither my GPS coordinate information or credit card details are copyrightable works.

And I'm pretty sure I've done none of those things, so I don't really know where this "programmers are the scum of the earth" generalisation is coming from.

anderber · on July 10, 2021

What are the criteria that this tool uses to determine something to be fair use?

sumnole · on July 10, 2021

It's a joke ragging on GitHub Copilot, which suggests code from GitHub to its users regardless of that code's copyright. The claim is that any code written with Copilot does not infringe since it's 'machine-generated code'. GitHub Copilot takes GitHub code, learns it, and then feeds it to users based on prompts, but you can end up essentially copy-pasting an entire copyrighted snippet. This satire site takes your uploaded mp3, 'learns it', and hands you back the same mp3.

anderber · on July 10, 2021

Ah, thank you for the explanation!

justshowpost · on July 11, 2021

It shows a genuinely black screen after the «downloading» progress bar, so I can confirm that it's indeed fair use.

IndySun · on July 11, 2021

Contemptible humour, they know they could deliver ML soundalikes if they tried! Harumph!

amelius · on July 10, 2021

Nice try, but ... Co-pilot can be sued for copyright infringement in specific cases. Therefore it doesn't mean you can get away with copyright infringement if you copy Co-pilot's model.

dragonwriter · on July 11, 2021

> Co-pilot can be sued for copyright infringement in specific cases

No, it can't; Copilot isn't a legal entity that can be sued.

Also, “can be sued” proves very little. Any actual or potential legal person can be sued for anything.

“Is actually legally liable” is what you need here, and it is at best speculative that Copilot ever creates actual legal liability.

viraptor · on July 11, 2021

I have some public code with non-free licenses on GitHub, and I'm really hoping copilot can reproduce some of my functions without changes. If it does, I'll see what they do about DMCA requests for those. But... still waiting for access.

dragonwriter · on July 12, 2021

> If they do I'll see what they do about DMCA requests for those.

DMCA takedown requests apply to hosts of user-generated content, which MS/GitHub doesn't pretend to be with Copilot (instead, Copilot is a first-party work that MS has stated a copyright theory regarding), so DMCA requests are meaningless. What you need, if you feel MS is violating your copyright via GitHub, is a copyright demand letter backed by a willingness to pursue a lawsuit against Microsoft, who have already stated that their position is that Copilot is completely within the Fair Use exception to copyright.

Zambyte · on July 11, 2021

From the Copilot FAQ:

> Who owns the code GitHub Copilot helps me write?

> GitHub Copilot is a tool, like a compiler or a pen. The suggestions GitHub Copilot generates, and the code you write with its help, belong to you, and you are responsible for it. We recommend that you carefully test, review, and vet the code, as you would with any code you write yourself.

> Do I need to credit GitHub Copilot for helping me write code?

> No, the code you create with GitHub Copilot’s help belongs to you. While every friendly robot likes the occasional word of thanks, you are in no way obligated to credit GitHub Copilot. Just like with a compiler, the output of your use of GitHub Copilot belongs to you.

While you may be right, that contradicts what Microsoft has to say.

mrlonglong · on July 11, 2021

Gnarly website

gavinray · on July 10, 2021

Damn, I read this as "Tool, the band, is converting all of their copyrighted music into fair-use music." and got excited.

But this is funny too I guess

MeinBlutIstBlau · on July 10, 2021

So having tried it out the song sounds...exactly the same. So does this just make it that when it's played these detection systems can't pick it up since it's somewhat different? Or if I make a commercial product, include this version of the song, I can somehow afford lawyers to defend myself when the music industry sues me for using what sounds like the same song, just with the 1's and 0's ordered a little differently?

Edit: was out of the loop on the joke...

abetusk · on July 10, 2021

I think that's the joke. It literally takes the exact same song unaltered but it says it's "using machine learning", "fair use" etc. to give the pretense of it being legitimate.

This is most likely a commentary on GitHub co-pilot and how the authors of this joke think that GitHub co-pilot is violating copyright and does not fall under fair-use.

I just confirmed the "processed" file has the same SHA256 sum as the original.

EDIT: I incorrectly labelled it as Google co-pilot instead of GitHub co-pilot. Fixed.
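The comment doesn't say which tool was used for the SHA256 check; on Linux, `sha256sum original.mp3 processed.mp3` prints both digests. A minimal Python sketch of the same comparison follows; the file names and function names are hypothetical. Matching digests mean the "processed" file is, for all practical purposes, byte-for-byte identical to the upload.

```python
import hashlib
import io
import sys
from pathlib import Path
from typing import BinaryIO


def digest_stream(stream: BinaryIO, chunk_size: int = 1 << 20) -> str:
    """Return the hex SHA-256 digest of a binary stream, read in chunks."""
    digest = hashlib.sha256()
    for chunk in iter(lambda: stream.read(chunk_size), b""):
        digest.update(chunk)
    return digest.hexdigest()


def sha256_of(path: Path) -> str:
    with path.open("rb") as f:
        return digest_stream(f)


if __name__ == "__main__":
    if len(sys.argv) == 3:
        # Usage: python compare.py original.mp3 processed.mp3  (hypothetical names)
        a, b = Path(sys.argv[1]), Path(sys.argv[2])
        print("identical" if sha256_of(a) == sha256_of(b) else "different")
    else:
        # Demo with in-memory data: identical bytes give identical digests.
        print(digest_stream(io.BytesIO(b"fake mp3")) == digest_stream(io.BytesIO(b"fake mp3")))
```

Reading in chunks keeps memory use flat for large audio files; hashing the whole file with `hashlib.sha256(path.read_bytes())` would give the same digest.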

barbecue_sauce · on July 10, 2021

Github Copilot.

detaro · on July 10, 2021

It's a joke about GitHub Copilot.

laurent92 · on July 10, 2021

The source code is setTimeout(…, random()). I’d say that even if it takes a long time to build the neural network, it is very CPU-efficient.
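For anyone who hasn't viewed the page source, here is a minimal Python sketch of the trick as described in the thread: random delays dressed up with progress messages, then the upload returned unchanged. The site itself reportedly uses JavaScript `setTimeout` with `random()`; the stage labels and function name below are made up for illustration.

```python
import random
import time

# Hypothetical stage labels; purely cosmetic, like the site's progress text.
FAKE_STAGES = ["Uploading", "Training neural network", "Applying fair use", "Rendering"]


def convert_to_fair_use(audio: bytes, max_delay: float = 0.5) -> bytes:
    """Pretend to process the audio, then return it byte-for-byte unchanged."""
    for stage in FAKE_STAGES:
        print(f"{stage}...")
        time.sleep(random.random() * max_delay)  # the only "computation"
    return audio


if __name__ == "__main__":
    upload = b"\xff\xfb\x90\x00 pretend mp3 data"
    result = convert_to_fair_use(upload, max_delay=0.05)
    print("unchanged:", result == upload)  # prints "unchanged: True"
```

Since the output is the input, any checksum comparison between the two files will match, which is the whole joke.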

Black101 · on July 10, 2021

He should not have faked the machine learning, but I like the idea. (And I hope that copyright ceases to exist, or at least gets adjusted to what it should be, i.e. its term should work like that of patents: not forever and not easily extended.)

bobthebuilders · on July 10, 2021

Using ddos-guard, does this sell my info to Russia?

imwillofficial · on July 10, 2021

Isn’t that service run out of a bunker in Norway or something? I remember they were in the news for something recently.

bni · on July 11, 2021

It's Machine Learning, so anything is OK really.

ev1 · on July 11, 2021

this is on a vm in canada, what?

IshKebab · on July 10, 2021

How do you go to this much effort to make a point without even reading about how copyright and fair use work? There have been multiple comments on HN and Reddit explaining how it doesn't work like this.

blooalien · on July 11, 2021

Probably because it's not really all that much effort (nor money) to toss a simple satire page like this one up on the web. Easier now than it's ever been, honestly.