True, but in making the request, you will provide information on who is making that request. If you say, "I am a bot!", and they grant you permission, your request is legal.
But if you are a bot and you say 'I am NOT a bot', for example by spoofing a browser's user agent string, then you are requesting access under false pretenses in order to circumvent their terms of service. Kinda feels morally wrong, and illegal.
That argument works, insofar as it does, only for more recognizable bots and browsers. If I write a client of some sort that identifies itself as:
`Mozilla/5.0 (Windows NT 6.1) AppleWebKit/537.36 (KHTML, like Gecko) Snackmaster Pro/666.0.666`
What do you do?
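To make that concrete, here's a minimal sketch (Python stdlib, hypothetical URL) of how any client, bot or browser, sends whatever User-Agent it likes; the server has no way to verify the claim:

```python
# A client can claim to be anything in the User-Agent header; the
# server only sees the string, not what actually made the request.
import urllib.request

req = urllib.request.Request(
    "https://example.com/",  # hypothetical target
    headers={
        "User-Agent": (
            "Mozilla/5.0 (Windows NT 6.1) AppleWebKit/537.36 "
            "(KHTML, like Gecko) Snackmaster Pro/666.0.666"
        )
    },
)
with urllib.request.urlopen(req) as resp:
    body = resp.read()
```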
I also sometimes tell my browser to lie about what it is, because of sites that are malfunctioning but whose owners, instead of fixing the errors, just document them with "Use Chrome" (or IE, or whatever) checks.
Is that 'kinda' illegal or morally wrong (two very different things)?
If so, that seems like a belief that changing all sorts of browser defaults is 'kinda' wrong and/or illegal. Disabling JavaScript? Lying about installed fonts/screen dimensions/whatever? Refusing to keep non-session cookies between sessions? That slope would seem to get pretty slippery...
In your first case, if you are running on Windows NT 6.1 using a new WebKit-based browser for humans called 'Snackmaster Pro', then you aren't doing anything wrong.
If by client you mean a robot, then you are pretending to be a browser and you are accessing the service without permission.
Let me ask you a question: say your client was hitting my service with that user agent, 100 times a second, crawling through URLs sequentially. Let's say I added it to my robots.txt deny list and started blocking that user agent. Would you change the user agent and continue?
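For what it's worth, an honest crawler in that scenario would check the deny list and stop instead of rotating user agents. A rough sketch with Python's stdlib robots.txt parser (bot name and URLs are made up):

```python
# A bot that identifies itself and honors robots.txt, rather than
# swapping user agents to dodge a block. Names and URLs are made up.
from urllib.robotparser import RobotFileParser

BOT_UA = "SnackmasterBot/1.0"  # honest, bot-style user agent

rp = RobotFileParser("https://example.com/robots.txt")
rp.read()  # fetch and parse the site's rules

url = "https://example.com/profiles/123"
if rp.can_fetch(BOT_UA, url):
    pass  # allowed: fetch the page, at a polite rate
else:
    pass  # denied: stop; changing the UA here is the dishonest move
```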
If someone creates a site that says, 'Access to this site is for 640x480 browsers only; any other use is forbidden', then I think it's pretty clear that it's a stupid site, but also that faking your screen resolution is accessing the site without consent. There is no slope; someone (LinkedIn) putting explicit terms on their website is pretty clear.
Have you ever heard of "headless browsers" (like [Chrome](https://github.com/dhamaniasad/HeadlessBrowsers/issues/37))? What are some defining characteristics of browsers that are absent in scraping clients? If I open a browser window while doing the scraping, is that acceptable?
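For anyone who hasn't used one: a headless browser is the real browser engine driven by a script, with or without a window. Something like this sketch (needs the selenium package and a local Chrome; the exact headless flag depends on your Chrome version):

```python
# A "headless" scrape is still real Chrome rendering the page; drop
# the headless flag and the same script opens a visible window.
from selenium import webdriver
from selenium.webdriver.chrome.options import Options

opts = Options()
opts.add_argument("--headless=new")  # comment out to watch it browse
driver = webdriver.Chrome(options=opts)
driver.get("https://example.com/")   # hypothetical target
html = driver.page_source            # fully rendered HTML, JS and all
driver.quit()
```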
I very rarely use robots, and think I've only been "abusive" (not really abusive, in my book) once.
What if I send a null UA? Or use it as an opportunity to share my favorite quote?
What if my software doesn't hammer the site like a robot, keeps the request volume reasonable (use whatever threshold you think is reasonable here), but also doesn't do what you might expect a human clicking around to do?
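Mechanically, the null-UA and favorite-quote versions are equally trivial, which is part of the point; a sketch (Python stdlib, made-up URL):

```python
# The User-Agent header can be blank or a favorite quote just as
# easily as a browser string; the server only sees what you send.
import urllib.request

for ua in ("", "Be excellent to each other."):
    req = urllib.request.Request(
        "https://example.com/",
        headers={"User-Agent": ua},
    )
    with urllib.request.urlopen(req) as resp:
        print(repr(ua), resp.status)
```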
There isn't a universal 'I'm a bot' setting. There are user agent conventions, but they are hardly standard. Your point works in theory, but it's not something a site can just implement and be reasonably confident it won't be scraped.
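To illustrate the 'conventions, not a standard' point: well-known crawlers volunteer strings like the ones below, and the best a site can do is look for hints; a heuristic sketch (easily evaded, which is the problem):

```python
# The closest thing to an "I'm a bot" flag is convention, e.g. the
# self-identifying UAs major crawlers send. Nothing enforces it, so
# server-side detection is only a guess.
DECLARED_BOT_UAS = [
    "Mozilla/5.0 (compatible; Googlebot/2.1; +http://www.google.com/bot.html)",
    "Mozilla/5.0 (compatible; bingbot/2.0; +http://www.bing.com/bingbot.htm)",
]

def looks_like_declared_bot(user_agent: str) -> bool:
    ua = user_agent.lower()
    return any(token in ua for token in ("bot", "crawler", "spider"))
```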
No, it won't prevent being scraped. That's not the point I was making.
The point is, the scraper would have to hide their intentions and identity, which removes any claim that they are being 'honest' and not trying to circumvent the service provider's efforts to prevent scraping.