Off topic, but for years I've been using a one-off proxy to strip JavaScript and ...

1ark · on May 25, 2023

They are probably just checking headers such as the user agent and cookies. I would copy whatever your normal browser sends and put it in the urllib.request call. If that doesn't work, then the check is likely more sophisticated.
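A minimal sketch of that suggestion, assuming you paste in the header values your own browser sends. The header values below are placeholders modeled on a desktop Firefox, and `build_request`/`fetch` are hypothetical helper names, not part of urllib:

```python
import urllib.request

# Placeholder values modeled on desktop Firefox; replace them with what your
# own browser actually sends (visible in its dev tools network tab).
BROWSER_HEADERS = {
    "User-Agent": (
        "Mozilla/5.0 (Macintosh; Intel Mac OS X 10.15; rv:109.0) "
        "Gecko/20100101 Firefox/113.0"
    ),
    "Accept": "text/html,application/xhtml+xml,application/xml;q=0.9,*/*;q=0.8",
    "Accept-Language": "en-US,en;q=0.5",
}


def build_request(url, extra_headers=None):
    """Return a urllib Request carrying browser-like headers."""
    headers = dict(BROWSER_HEADERS)
    if extra_headers:
        # e.g. {"Cookie": "..."} copied from a logged-in browser session
        headers.update(extra_headers)
    return urllib.request.Request(url, headers=headers)


def fetch(url, extra_headers=None, timeout=10):
    """Fetch url and return (status, body bytes).

    urlopen raises urllib.error.HTTPError on a 403, so a still-blocked
    request shows up as an exception rather than a status code.
    """
    req = build_request(url, extra_headers)
    with urllib.request.urlopen(req, timeout=timeout) as resp:
        return resp.status, resp.read()
```

Usage would be something like `status, body = fetch("https://www.sfgate.com/")`. If that still fails with every header copied over, the site is probably doing more than header checks.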

throwaway81523 · on May 25, 2023

I will try that, but a quick look at the error page makes me think it tries to run a JavaScript blob.

ksala_ · on May 25, 2023

They're just checking the user agent:

    $ curl -s -I 'https://www.sfgate.com/' -H 'User-Agent: curl/7.54.1' | head -1
    HTTP/2 403

    $ curl -s -I 'https://www.sfgate.com/' -H 'User-Agent: Mozilla/5.0 (Macintosh; Intel Mac OS X 10.15; rv:109.0) Gecko/20100101 Firefox/113.0' | head -1
    HTTP/2 200

One "trick" is that Firefox (and, I assume, Chrome) lets you copy a request as a curl command. Then you can check whether that works in the terminal, and if it does, binary search for the headers that are actually required.
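The "binary search for the required headers" step can be automated. This is a sketch of a simplified delta-debugging reduction (a swapped-in technique, not something the commenter described): drop chunks of headers, halve the chunk size each round, and finish with one-at-a-time removal. It assumes you have already turned the copied curl command's `-H` flags into a dict by hand, and that the check is deterministic. `minimize_headers` and `status_200` are hypothetical helper names:

```python
import urllib.error
import urllib.request


def minimize_headers(headers, works):
    """Shrink `headers` to a subset for which works(subset) is still True.

    Tries removing chunks of headers, halving the chunk size each round.
    The final round removes headers one at a time, so every header left is
    individually required (assuming works() is deterministic and adding a
    header never turns a success into a failure).
    """
    keep = list(headers.items())
    if not works(dict(keep)):
        raise ValueError("the full header set does not work to begin with")
    chunk = max(1, len(keep) // 2)
    while True:
        i = 0
        while i < len(keep):
            candidate = keep[:i] + keep[i + chunk:]
            if works(dict(candidate)):
                keep = candidate  # nothing in this chunk was needed
            else:
                i += chunk  # something in this chunk is needed; skip past it
        if chunk == 1:
            return dict(keep)
        chunk //= 2


def status_200(url, timeout=10):
    """Build a works() predicate that performs a real request."""
    def works(headers):
        # When no User-Agent is supplied, urllib sends its default
        # "Python-urllib/3.x" one, which may be exactly what the site rejects.
        req = urllib.request.Request(url, headers=headers)
        try:
            with urllib.request.urlopen(req, timeout=timeout) as resp:
                return resp.status == 200
        except urllib.error.HTTPError:
            return False
    return works
```

Something like `minimize_headers(copied_headers, status_200("https://www.sfgate.com/"))` would then, per the curl test above, be expected to shrink down to just the `User-Agent` header, using far fewer requests than toggling headers by hand.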

chrisco255 · on May 25, 2023

It probably does. If so, there are better modern tools, like headless Chrome or Puppeteer, that can fully render a page, scripts included.
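Puppeteer itself is a Node library, but from Python the simplest route is probably to shell out to headless Chrome and have it print the DOM after scripts run, using Chrome's `--headless` and `--dump-dom` switches. A sketch, assuming the binary is on PATH as `google-chrome` (on your system it may be `chromium`, `google-chrome-stable`, or a full path on macOS); `chrome_command` and `rendered_html` are hypothetical helpers:

```python
import shutil
import subprocess


def chrome_command(url, binary="google-chrome"):
    """Command line asking headless Chrome to print the DOM after scripts run."""
    return [binary, "--headless", "--dump-dom", url]


def rendered_html(url, binary="google-chrome", timeout=60):
    """Return the page's serialized DOM as rendered by headless Chrome."""
    if shutil.which(binary) is None:
        raise FileNotFoundError(f"{binary!r} not found on PATH")
    result = subprocess.run(
        chrome_command(url, binary),
        capture_output=True,
        text=True,
        timeout=timeout,
        check=True,  # raise CalledProcessError if Chrome exits non-zero
    )
    return result.stdout
```

The output is the post-JavaScript HTML, which a stripping proxy could then process the same way it handles a plain urllib response.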

withinboredom · on May 25, 2023

Sounds like an ADA lawsuit waiting to happen. I'd email the editor explaining how they've reduced the site's usability, especially if you're a paying customer.