
Web API: flaky 403 responses between python requests and curl
Closed, Migrated

Description

We have some weird caching effects in the Web API deployment that make the Python requests library fail with a 403 the first time an object is requested. curl, on the other hand, succeeds on the same URL, and once it has succeeded, subsequent GETs made with Python requests succeed too (for a while).

Example:

$ python3 -c "import requests ; print(requests.get('https://archive.softwareheritage.org/api/1/directory/977fc4b98c0e85816348cebd3b12026407c368b6/').status_code)"
403
$ curl -i https://archive.softwareheritage.org/api/1/directory/977fc4b98c0e85816348cebd3b12026407c368b6/ | head -n 1
HTTP/1.1 200 OK
$ python3 -c "import requests ; print(requests.get('https://archive.softwareheritage.org/api/1/directory/977fc4b98c0e85816348cebd3b12026407c368b6/').status_code)"
200

I can reliably reproduce this with a number of different API endpoints and from at least 2 different networks.

Event Timeline

zack triaged this task as Normal priority. Dec 17 2019, 2:39 PM
zack created this task.
zack claimed this task.

Turns out this was due to a left-over username/password in my ~/.netrc, which apparently requests uses by default and curl doesn't.
(But why the heck the caching effect? I've no idea...)
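
For anyone hitting the same symptom, here is a minimal sketch for checking whether a netrc file holds credentials for the host of a given URL. It uses only the standard library netrc module. The helper name netrc_login_for and its netrc_path parameter are made up for illustration, and the lookup only approximates what requests does internally (requests also honours the NETRC environment variable, which this sketch ignores).

```python
import netrc
from pathlib import Path
from typing import Optional
from urllib.parse import urlparse


def netrc_login_for(url: str, netrc_path: Optional[Path] = None) -> Optional[str]:
    """Return the login stored in a netrc file for the host of `url`, or None.

    Hypothetical helper: `netrc_path` defaults to ~/.netrc.
    """
    path = netrc_path if netrc_path is not None else Path.home() / ".netrc"
    host = urlparse(url).hostname
    if host is None:
        return None
    try:
        # authenticators() returns (login, account, password) or None;
        # it falls back to a "default" entry if the file has one.
        entry = netrc.netrc(str(path)).authenticators(host)
    except (OSError, netrc.NetrcParseError):
        # Missing, unreadable, or malformed file: treat as "no credentials".
        return None
    return entry[0] if entry else None


if __name__ == "__main__":
    url = "https://archive.softwareheritage.org/api/1/"
    login = netrc_login_for(url)
    if login is None:
        print("no netrc entry for this host")
    else:
        print(f"netrc entry found for this host (login: {login})")
```

If an entry is found, requests will send it as HTTP basic auth unless explicit auth is passed or the session has trust_env set to False (which also disables proxy settings taken from the environment). curl only reads ~/.netrc when asked to, for example with -n/--netrc.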