
The bike example is a very poor analogy - the data isn't removed from LinkedIn, it's merely copied. It doesn't matter how many times the site is scraped, the data is unchanged and still available. A better analogy would be me taking a photo of the bike while walking past. It shouldn't matter how many times you tell me to stop, if it's on public property you can't really stop me from taking the photo.

If you have an issue with that, you should be moving your bike somewhere people can't take a photo from the public street, not having someone creatively interpret a law to say that where I am is suddenly not public property because you asked me to stop using my camera.



Not sure why you're being down-voted; the Web was designed to be public. If you want to prevent me from taking pictures of the exterior of your cafeteria from the public street, you'd have to build a wall or put it behind a login. But then don't complain that you are losing customers because they can't see it or cannot find information about your site through search engines.


OK, suppose I left the prototype for a new bike on the pavement, and you came along and 3D printed an exact replica. Sure, the original still exists, but you just violated trademark/registered design laws because it was there. It's not the same as a photo because a photo of a bike doesn't give you the same value as the actual bike, whereas scraping the content of a webpage does give you the same value.

Perhaps my analogy wasn't great, but the grey area is around going to LinkedIn's server (whether or not this is "public" or their property they allow you access to is another philosophical question, though in the eyes of the law it appears it's the latter), deliberately extracting value from it, and then getting annoyed when you're asked not to.

Inherently it seems to be the old question of whether a server is public, or private but accessible (like those POPS [0] there was a thread on recently).

[0] https://en.wikipedia.org/wiki/Privately_owned_public_space


> you just violated trademark/registered design laws

If it has those. The data on LI (other people's employment histories) is not its own IP.


What if I paint the bike artfully, put a copyright notice on it, and you sell the photos you took?


Standard IANAL.

That would be copyright infringement because photographing a copyrighted work is considered reproducing the work under the law.


> A better analogy would be me taking a photo of the bike while walking past.

Taking a photo of the bike is also a poor analogy. Scrapers don't take one photo, they take photos of all the bikes. And scrapers don't keep the photos for themselves, they sell them for a profit. Also, the original bike isn't parked, it's placed in a gallery (probably with an admission fee? I don't know the business model of linkedin).


The number of photos is irrelevant to the analogy, though, as is what people do with the photos afterwards. If the bikes are visible from the public street, people can take as many pictures of every bike as they want, and then make money from them if they want. It doesn't affect the owners' usage of the bike (unlike the original analogy, where the owner loses access, which was what I was trying to correct).

Physical analogies for this kind of thing are always flawed, it's just dishonest/misleading to pretend that copying data is ever analogous to taking a physical object (the owner of the original is never deprived of the original when data is copied).

"I don't know the business model of linkedin"

Most of it is selling premium features to recruiters and other businesses. I'm not sure if hiQ's service interferes with that or not, but LinkedIn should not be trying to have their cake and eat it by leaving things in public then complaining when the public accesses it in a way they don't like.


> The number of photos is irrelevant to the analogy

Actually, the size of the data and the number of requests are very relevant. More data means more information, which means more money. It also means more bandwidth and processing power required to serve the requests. You're not taking a photo of the bike, you're asking the bike to give you a photo of it.

> it's just dishonest/misleading to pretend that copying data is ever analogous to taking a physical object (the owner of the original is never deprived of the original when data is copied).

Leaving LinkedIn aside, possession of the original data is never the issue with digital piracy. It's a straw man. The hurt occurs when people benefit from the work the original author put into creating that data without proper compensation. Just because you can clone my gizmo (which I spent years working on) without taking the original one doesn't mean you're not hurting me. That gizmo could give me an advantage you wouldn't otherwise have. I place hours of working into something that doesn't put food on the table because you can clone my work, but I can't clone my food.

There's a reason an empty CD costs 50c but a music album costs $10. You're not paying for the physical medium. You're paying for the IP. And yes, digital distributions are cheaper because of this, but that doesn't make them free.

> Most of it is selling premium features to recruiters and other businesses.

I'd say it's pretty obviously interfering with their business model.

> LinkedIn should not be trying to have their cake and eat it by leaving things in public then complaining when the public accesses it in a way they don't like.

LinkedIn could ban IPs that make unreasonable number of requests in a short amount of time.
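
For what it's worth, that kind of per-IP throttling is not hard to sketch. Below is a minimal sliding-window limiter, purely as an illustration: the class name, the limits, and the IP addresses are all made up, and nothing here is claimed to be what LinkedIn actually runs.

```python
import time
from collections import defaultdict, deque


class SlidingWindowLimiter:
    """Allow at most `max_requests` per `window_seconds` for each client IP.

    Hypothetical helper for illustration only.
    """

    def __init__(self, max_requests, window_seconds, clock=time.monotonic):
        self.max_requests = max_requests
        self.window_seconds = window_seconds
        self.clock = clock  # injectable so the limiter can be tested
        self._hits = defaultdict(deque)  # ip -> timestamps of recent requests

    def allow(self, ip):
        now = self.clock()
        hits = self._hits[ip]
        # Drop timestamps that have fallen out of the window.
        while hits and now - hits[0] >= self.window_seconds:
            hits.popleft()
        if len(hits) >= self.max_requests:
            return False  # over the limit: reject (or ban) this request
        hits.append(now)
        return True


if __name__ == "__main__":
    # Assumed policy: 100 requests per minute per IP.
    limiter = SlidingWindowLimiter(max_requests=100, window_seconds=60)
    print(limiter.allow("203.0.113.5"))
```

A real deployment would also have to cope with scrapers rotating IPs, which is part of why this only raises the cost of scraping rather than preventing it.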


If LinkedIn are being that negatively affected by a single scraper, they should deal with it - block it, only allow a specific number of requests from an IP per day, anything that doesn't involve lawsuits. The problem is them trying to pretend that publicly visible content is really private if they say so, without them trying to protect it in any real way.

"The hurt occurs when people benefit from the work the original author put into creating that data without proper compensation"

Not necessarily. If I'm paying for a print of some imaginative artwork that was created using the picture of the bike, that doesn't mean the bike owner lost anything, even if he spent time building the bike with his own hands. Similarly, if the only reason why people paid hiQ was for the extra work that they put in, LinkedIn didn't lose money because people would not have bought their product without that extra work.

There is certainly an argument that hiQ should have licensed the content first, but it's public data. If they want to make licence deals, they shouldn't put it in view of the public street and then whine when people document what's in public.

"It's a straw man."

No, the straw man is pretending that a copy is the same as theft. Theft is theft because someone is depriving you of the original, not because you imagine you might have had more sales if the copy didn't exist. There's a reason why there are different words for different things, and pretending that a copy is the same as taking a physical object is a lie. Period.

"I place hours of working into something that doesn't put food on the table because you can clone my work, but I can't clone my food."

But, you put the price up too high, so I opted not to buy it. Maybe borrow the CD from a friend, or listen to something else. Or, you decided I couldn't buy it in the format or region I wanted. There are real issues, but pretending that a copy = a lost sale is utter bull that's been debunked time and time again, yet is regularly repeated by people trying to inject emotional arguments instead of facts.

"I'd say it's pretty obviously interfering with their business model"

Then perhaps they should address the business model or not put their content out there in public unprotected if it's that valuable to their income.

"LinkedIn could ban IPs that make unreasonable number of requests in a short amount of time."

Yes they could. Which would not have to involve the courts in any way. Or, they could protect the content in some other way that (for example) requires a log in and adherence to T&Cs, with which they could easily kick violators off their site for non-compliance.

The issue is that LinkedIn are trying to have it both ways - gathering the benefits of public content while blocking others who use the now-public content in ways that are usually acceptable for public content to be used. Sorry, not acceptable, you pick one - take the content away from the public street or accept that some people will use what has been shown to the public.


> If LinkedIn are being that negatively affected by a single scraper, they should deal with it - block it, only allow a specific number of requests from an IP per day, anything that doesn't involve lawsuits. The problem is them trying to pretend that publicly visible content is really private if they say so, without them trying to protect it in any real way.

With this, I agree 100%.

> No, the straw man is pretending that a copy is the same as theft. Theft is theft because someone is depriving you of the original, not because you imagine you might have had more sales if the copy didn't exist. There's a reason why there are different words for different things, and pretending that a copy is the same as taking a physical object is a lie. Period.

That's just pedantry. The debate isn't between "copy" and "theft", it's between "theft" and "copyright infringement".

> But, you put the price up too high, so I opted not to buy it. Maybe borrow the CD from a friend, or listen to something else. Or, you decided I couldn't buy it in the format or region I wanted. There are real issues, but pretending that a copy = a lost sale is utter bull that's been debunked time and time again, yet is regularly repeated by people trying to inject emotional arguments instead of facts.

This is wrong on so many levels, I'm not sure there's any point in continuing this debate. Are you accusing me of using emotional blackmail instead of facts because I point out that "you can clone my work, but I can't clone my food"?

I'm not using myself as an example because I want pity. I'm doing it because it's easier in writing, and because I'm a software developer.

My work takes hours and hours of time and effort (not counting the hours I spent in school). If it's OK for everyone to clone my work, I won't make any money from it. We still live in a society where goods and services are exchanged for money. I exchanged my hours of work for no money, but I can't exchange no money for basic living necessities such as food. There are no feelings involved here. In the current economy, work going in and no food coming out is not a viable business model. And if nobody paid for digital content, there would be a lot less digital content.

> pretending that a copy = a lost sale is utter bull that's been debunked time and time again

This is another straw man. Whether or not an illegal copy is or isn't a lost sale is irrelevant. You don't have the right to make that copy in the first place. If everyone made illegal copies, there would be no sales. So then why should only some be entitled to illegal copies? There isn't a distinction between people who can make copies and people who must pay for copies, so either everyone must pay for copies or no one must pay for copies. That's how law and economy work. You can't make exceptions by yourself. Either everyone is allowed, or no one is allowed. And for digital content that is for sale, no one is allowed illegal copies. If laws are made that allow poor people to receive goods for free, these laws must address both digital and physical goods.


So? How does this impact the bike owner in any way?


The owner makes money from the admission fee clients pay to see the bike. Taking photos and selling albums defeats that purpose.


But the owner is offering the images up for free to the public. hiQ is taking those publicly-available photos and annotating them with, "red bike", "pink bike", "broken bike", "professional bike", etc.

It's clearly a value-add and not theft.


It depends.

You have to keep in mind that an entire generation was brainwashed into believing that personal data isn't that "personal", so that Google, FB and the rest can have amazing profits.

Most of these discussions are tainted by a general unawareness of privacy and copyright law.

Of course, because of the value at stake for the data supplier (usually a single person), these cases never reach the courts, which just reinforces the ongoing misconceptions.

If you really want to test this, try copying the content from Google, Facebook, hiQ, or whoever else is big enough to go after you.

But people somehow believe that it's okay for businesses to do what regular people aren't allowed to.



