Page MenuHomeSoftware Heritage

bitbucket lister: Deal with connection errors
Started, Work in Progress, NormalPublic

Description

bitbucket lister did more work than before, currently listed 510k repositories.

But it seems to be stuck there.

It's either a rate limit, a blacklist or a bug happening somewhere.

From kibana's dashboard, we can see the following (log message needs to be improved), the date is the index.

[2019-06-28 08:46:30,942: WARNING/ForkPoolWorker-5] giving up on 2015-05-02 12:37:07.608575+00:00: max retries exceeded
[2019-06-28 08:46:20,931: WARNING/ForkPoolWorker-5] connection error on 2015-05-02 12:37:07.608575+00:00: sleep for 10 seconds
[2019-06-28 08:46:09,366: WARNING/ForkPoolWorker-5] connection error on 2015-05-02 12:37:07.608575+00:00: sleep for 10 seconds

...
Jul 04 07:57:20 worker09 python3[15600]: [2019-07-04 07:57:20,742: INFO/MainProcess] Received task: swh.lister.bitbucket.tasks.IncrementalBitBucketLister[5f4b1d51-c3e0-4615-9a55-59676ef13969]
Jul 04 07:57:22 worker09 python3[15609]: [2019-07-04 07:57:22,565: WARNING/ForkPoolWorker-2] connection error on 2015-05-02 12:37:07.608575+00:00: sleep for 10 seconds
Jul 04 07:57:33 worker09 python3[15609]: [2019-07-04 07:57:33,918: WARNING/ForkPoolWorker-2] connection error on 2015-05-02 12:37:07.608575+00:00: sleep for 10 seconds
Jul 04 07:57:45 worker09 python3[15609]: [2019-07-04 07:57:45,114: WARNING/ForkPoolWorker-2] connection error on 2015-05-02 12:37:07.608575+00:00: sleep for 10 seconds
Jul 04 07:57:56 worker09 python3[15609]: [2019-07-04 07:57:56,625: WARNING/ForkPoolWorker-2] connection error on 2015-05-02 12:37:07.608575+00:00: sleep for 10 seconds
Jul 04 07:58:08 worker09 python3[15609]: [2019-07-04 07:58:08,369: WARNING/ForkPoolWorker-2] connection error on 2015-05-02 12:37:07.608575+00:00: sleep for 10 seconds
Jul 04 07:58:19 worker09 python3[15609]: [2019-07-04 07:58:19,629: WARNING/ForkPoolWorker-2] connection error on 2015-05-02 12:37:07.608575+00:00: sleep for 10 seconds
Jul 04 07:58:30 worker09 python3[15609]: [2019-07-04 07:58:30,819: WARNING/ForkPoolWorker-2] connection error on 2015-05-02 12:37:07.608575+00:00: sleep for 10 seconds
Jul 04 07:58:40 worker09 python3[15609]: [2019-07-04 07:58:40,830: WARNING/ForkPoolWorker-2] giving up on 2015-05-02 12:37:07.608575+00:00: max retries exceeded
Jul 04 07:58:40 worker09 python3[15609]: [2019-07-04 07:58:40,896: INFO/ForkPoolWorker-2] Task swh.lister.bitbucket.tasks.IncrementalBitBucketLister[5f4b1d51-c3e0-4615-9a55-59676ef13969] succeeded in 80.05816953699104s: None

There exists rate limit policies in bitbucket [2], so that might be the reason.

[1] http://kibana0.internal.softwareheritage.org:5601/app/kibana#/dashboard/ce225b20-998c-11e9-ad62-c3b20d6a84bb

[2] https://confluence.atlassian.com/bitbucket/rate-limits-668173227.html?_ga=2.125982285.1517600277.1561721528-61276570.1561721528

Event Timeline

ardumont created this task.Fri, Jun 28, 1:41 PM
ardumont triaged this task as Normal priority.
ardumont updated the task description. (Show Details)Fri, Jun 28, 1:44 PM
ardumont renamed this task from bitbucket lister: Deal with rate limit to bitbucket lister: Deal with connection errors.Fri, Jun 28, 1:50 PM
ardumont updated the task description. (Show Details)

Trying to determine if it's a blacklist per ip, i rescheduled this with another worker (worker02).
It got stuck with the same error.

So possibly a blacklist per range of ips.

ardumont updated the task description. (Show Details)Thu, Jul 4, 10:01 AM
ardumont updated the task description. (Show Details)
ardumont added a comment.EditedThu, Jul 4, 10:49 AM

So possibly a blacklist per range of ips.

Maybe not.

Adding some log to the current situation, we got this:

Jul 04 08:42:54 worker09 python3[20042]: [2019-07-04 08:42:54,974: DEBUG/ForkPoolWorker-5] method: get
Jul 04 08:42:54 worker09 python3[20042]: [2019-07-04 08:42:54,975: DEBUG/ForkPoolWorker-5] path: https://api.bitbucket.org/2.0/repositories?after=2015-05-02T12%3A37%3A07.608575%2B00%3A00&pagelen=100
Jul 04 08:42:54 worker09 python3[20042]: [2019-07-04 08:42:54,976: DEBUG/ForkPoolWorker-5] params: {'headers': {'User-Agent': 'Software Heritage lister (0.0.32)'}}
Jul 04 08:42:56 worker09 python3[20042]: [2019-07-04 08:42:56,630: DEBUG/ForkPoolWorker-5] response: <Response [500]>
Jul 04 08:42:56 worker09 python3[20042]: [2019-07-04 08:42:56,631: DEBUG/ForkPoolWorker-5] response headers: {'Content-Length': '105', 'X-Version': 'b0b59be904d8', 'X-Served-By': 'app-162', 'Strict-Transport-Security': 'max-age=31536000; $ncludeSubDomains; preload', 'Date': 'Thu, 04 Jul 2019 08:42:56 GMT', 'X-Frame-Options': 'SAMEORIGIN', 'X-Render-Time': '1.16090798378', 'Vary': 'Authorization', 'Connection': 'close', 'Cache-Control': 'max-age=900', 'Content-Type': 'appli$ation/json; charset=utf-8', 'X-Request-Count': '583', 'X-Cache-Info': 'not cacheable; response code not cacheable', 'ETag': '"3dace379989468726567f75283443f34"', 'X-Static-Version': 'b0b59be904d8', 'Server': 'nginx', 'X-Accepted-OAuth-Sco$es': 'repository', 'X-Sentry-ID': '479821ff1daa4c8cb6fb295d7e59cdc6'}

Trying to get something from the browser on the same api endpoint:

{"type": "error", "error": {"message": "Something went wrong", "id": "de9a14cdc5ec4b6685964676c4ea87f1"}}

So definitely something is wrong on the other side.

ardumont added a comment.EditedThu, Jul 4, 11:30 AM

I have reported the issue to the atlassian team [1] with mostly the following analysis:

issue

Bisecting calls from repository listing api, it starts to fail from the following api call:
https://api.bitbucket.org/2.0/repositories?pagelen=1&after=2015-05-02T15%3A37%3A26.103001%2B00%3A00

The response has status 500 with the following content:

{"type": "error", "error": {"message": "Something went wrong", "id": "d8bf59f4321e46d2999e5a635f95f02c"}}

Adding 1ms in the future, the listing starts to be ok again:
https://api.bitbucket.org/2.0/repositories?pagelen=1&after=2015-05-02T15%3A37%3A26.103002%2B00%3A00

when

Error triggered from following the next link of the response of:
https://api.bitbucket.org/2.0/repositories?after=2015-05-02T12%3A37%3A07.608575%2B00%3A00&pagelen=50


Note:
When creating the atlassian ticket, I have cc-ed swh-devel@inria.fr, no idea if that will work or not though

[1] https://getsupport.atlassian.com/servicedesk/customer/portal/11/BBS-110670

ardumont changed the task status from Open to Work in Progress.Thu, Jul 4, 11:30 AM

I have cc-ed swh-devel@inria.fr, no idea if that will work or not though

It does not (they expected a bitbucket team or some such).
In any case, i've referenced the content of the ticket so, it should be ok.

ardumont updated the task description. (Show Details)Thu, Jul 4, 4:36 PM