Page MenuHomeSoftware Heritage

bitbucket lister: Deal with connection errors
Started, Work in Progress, NormalPublic

Description

Bitbucket lister did more work than before, currently listed 510k repositories (againt 212 IMSMW).
It seems to be stuck there though.

It's either a rate limit, a blacklist or a bug happening somewhere.

From kibana's dashboard, we can see the following [3] (the date is the index).

[2019-06-28 08:46:30,942: WARNING/ForkPoolWorker-5] giving up on 2015-05-02 12:37:07.608575+00:00: max retries exceeded
[2019-06-28 08:46:20,931: WARNING/ForkPoolWorker-5] connection error on 2015-05-02 12:37:07.608575+00:00: sleep for 10 seconds
[2019-06-28 08:46:09,366: WARNING/ForkPoolWorker-5] connection error on 2015-05-02 12:37:07.608575+00:00: sleep for 10 seconds

...
Jul 04 07:57:20 worker09 python3[15600]: [2019-07-04 07:57:20,742: INFO/MainProcess] Received task: swh.lister.bitbucket.tasks.IncrementalBitBucketLister[5f4b1d51-c3e0-4615-9a55-59676ef13969]
Jul 04 07:57:22 worker09 python3[15609]: [2019-07-04 07:57:22,565: WARNING/ForkPoolWorker-2] connection error on 2015-05-02 12:37:07.608575+00:00: sleep for 10 seconds
Jul 04 07:57:33 worker09 python3[15609]: [2019-07-04 07:57:33,918: WARNING/ForkPoolWorker-2] connection error on 2015-05-02 12:37:07.608575+00:00: sleep for 10 seconds
Jul 04 07:57:45 worker09 python3[15609]: [2019-07-04 07:57:45,114: WARNING/ForkPoolWorker-2] connection error on 2015-05-02 12:37:07.608575+00:00: sleep for 10 seconds
Jul 04 07:57:56 worker09 python3[15609]: [2019-07-04 07:57:56,625: WARNING/ForkPoolWorker-2] connection error on 2015-05-02 12:37:07.608575+00:00: sleep for 10 seconds
Jul 04 07:58:08 worker09 python3[15609]: [2019-07-04 07:58:08,369: WARNING/ForkPoolWorker-2] connection error on 2015-05-02 12:37:07.608575+00:00: sleep for 10 seconds
Jul 04 07:58:19 worker09 python3[15609]: [2019-07-04 07:58:19,629: WARNING/ForkPoolWorker-2] connection error on 2015-05-02 12:37:07.608575+00:00: sleep for 10 seconds
Jul 04 07:58:30 worker09 python3[15609]: [2019-07-04 07:58:30,819: WARNING/ForkPoolWorker-2] connection error on 2015-05-02 12:37:07.608575+00:00: sleep for 10 seconds
Jul 04 07:58:40 worker09 python3[15609]: [2019-07-04 07:58:40,830: WARNING/ForkPoolWorker-2] giving up on 2015-05-02 12:37:07.608575+00:00: max retries exceeded
Jul 04 07:58:40 worker09 python3[15609]: [2019-07-04 07:58:40,896: INFO/ForkPoolWorker-2] Task swh.lister.bitbucket.tasks.IncrementalBitBucketLister[5f4b1d51-c3e0-4615-9a55-59676ef13969] succeeded in 80.05816953699104s: None

As there are rate limit policies in bitbucket [2], that could be the reason.

[1] http://kibana0.internal.softwareheritage.org:5601/app/kibana#/dashboard/ce225b20-998c-11e9-ad62-c3b20d6a84bb

[2] https://confluence.atlassian.com/bitbucket/rate-limits-668173227.html?_ga=2.125982285.1517600277.1561721528-61276570.1561721528

[3] exception's log message needs to be improved btw

Event Timeline

ardumont created this task.Jun 28 2019, 1:41 PM
ardumont triaged this task as Normal priority.
ardumont updated the task description. (Show Details)Jun 28 2019, 1:44 PM
ardumont renamed this task from bitbucket lister: Deal with rate limit to bitbucket lister: Deal with connection errors.Jun 28 2019, 1:50 PM
ardumont updated the task description. (Show Details)

Trying to determine if it's a blacklist per ip, i rescheduled this with another worker (worker02).
It got stuck with the same error.

It still could be a range ips blacklist though.

ardumont updated the task description. (Show Details)Jul 4 2019, 10:01 AM
ardumont updated the task description. (Show Details)
ardumont added a comment.EditedJul 4 2019, 10:49 AM

So possibly a blacklist per range of ips.

Maybe not.

Adding some log to the current situation, we got this:

Jul 04 08:42:54 worker09 python3[20042]: [2019-07-04 08:42:54,974: DEBUG/ForkPoolWorker-5] method: get
Jul 04 08:42:54 worker09 python3[20042]: [2019-07-04 08:42:54,975: DEBUG/ForkPoolWorker-5] path: https://api.bitbucket.org/2.0/repositories?after=2015-05-02T12%3A37%3A07.608575%2B00%3A00&pagelen=100
Jul 04 08:42:54 worker09 python3[20042]: [2019-07-04 08:42:54,976: DEBUG/ForkPoolWorker-5] params: {'headers': {'User-Agent': 'Software Heritage lister (0.0.32)'}}
Jul 04 08:42:56 worker09 python3[20042]: [2019-07-04 08:42:56,630: DEBUG/ForkPoolWorker-5] response: <Response [500]>
Jul 04 08:42:56 worker09 python3[20042]: [2019-07-04 08:42:56,631: DEBUG/ForkPoolWorker-5] response headers: {'Content-Length': '105', 'X-Version': 'b0b59be904d8', 'X-Served-By': 'app-162', 'Strict-Transport-Security': 'max-age=31536000; $ncludeSubDomains; preload', 'Date': 'Thu, 04 Jul 2019 08:42:56 GMT', 'X-Frame-Options': 'SAMEORIGIN', 'X-Render-Time': '1.16090798378', 'Vary': 'Authorization', 'Connection': 'close', 'Cache-Control': 'max-age=900', 'Content-Type': 'appli$ation/json; charset=utf-8', 'X-Request-Count': '583', 'X-Cache-Info': 'not cacheable; response code not cacheable', 'ETag': '"3dace379989468726567f75283443f34"', 'X-Static-Version': 'b0b59be904d8', 'Server': 'nginx', 'X-Accepted-OAuth-Sco$es': 'repository', 'X-Sentry-ID': '479821ff1daa4c8cb6fb295d7e59cdc6'}

Trying to get something from the browser on the same api endpoint:

{"type": "error", "error": {"message": "Something went wrong", "id": "de9a14cdc5ec4b6685964676c4ea87f1"}}

So definitely something is wrong on the other side.

ardumont added a comment.EditedJul 4 2019, 11:30 AM

I have reported the issue to the atlassian team [1] with mostly the following analysis:

issue

Bisecting calls from repository listing api, it starts to fail from the following api call:
https://api.bitbucket.org/2.0/repositories?pagelen=1&after=2015-05-02T15%3A37%3A26.103001%2B00%3A00

The response has status 500 with the following content:

{"type": "error", "error": {"message": "Something went wrong", "id": "d8bf59f4321e46d2999e5a635f95f02c"}}

Adding 1ms in the future, the listing starts to be ok again:
https://api.bitbucket.org/2.0/repositories?pagelen=1&after=2015-05-02T15%3A37%3A26.103002%2B00%3A00

when

Error triggered from following the next link of the response of:
https://api.bitbucket.org/2.0/repositories?after=2015-05-02T12%3A37%3A07.608575%2B00%3A00&pagelen=50


Note:
When creating the atlassian ticket, I have cc-ed swh-devel@inria.fr, no idea if that will work or not though

[1] https://getsupport.atlassian.com/servicedesk/customer/portal/11/BBS-110670

ardumont changed the task status from Open to Work in Progress.Jul 4 2019, 11:30 AM

I have cc-ed swh-devel@inria.fr, no idea if that will work or not though

It does not (they expected a bitbucket team or some such).
In any case, i've referenced the content of the ticket so, it should be ok.

ardumont updated the task description. (Show Details)Jul 4 2019, 4:36 PM
ardumont updated the task description. (Show Details)Mon, Aug 26, 10:31 AM

Ticket [1] is closed because i did not reply fast enough.

They asked for authentication information we do not use (there is no api authentication required for the listing api).

I replied as much (they apparently allows the user to re-open but when actually trying to, it's not working...).

(they apparently allows the user to re-open but when actually trying to, it's not working...).

In the end, the reply i made automatically reopened the ticket.


There is currently a discussion ongoing.
So far they actually involved the development team (of the api i guess).
So it's in progress.

My latest understanding of the issue is that the users/repositories listed at this particular erratic page would not exist, thus raising an unexpected error on the listing api...
But it's not confirmed.

They are still working on it...
(last update from 04/09/2019 mentioning it's still wip on their dev team side).