Revisions and Commits
rDLS Listers
D5785 | rDLS9ca5295a408b sourceforge: retry for all retryable exceptions
D5715 | rDLS2ff549e12593 sourceforge/tasks: Allow incremental listing
D5714 | rDLS7282647bb2d3 sourceforge/lister: Add credentials parameter
D5712 | rDLS3167a6dcb736 sourceforge/tests: Ensure correct sleep function gets mocked
D5711 | rDLS1284eb158703 sourceforge/tests: Fix failing test with tenacity < 5.1
rSPSITE puppet-swh-site
D5713 | rSPSITE06bd2cdec0d5 Deploy new sourceforge lister task
Status | Assigned | Task
---|---|---
Migrated | gitlab-migration | T3315 archive SourceForge
Migrated | gitlab-migration | T735 SourceForge lister
Migrated | gitlab-migration | T3310 Deploy sourceforge lister on staging
Event Timeline
Not that it has helped yet, but I reproduced the Jenkins issue *locally*.
Prior to that, other issues with our moving cogs (swh.core, etc.) prevented it
(other, unrelated failures arose).
Installing the latest package on a worker:
swh lister run --help | grep sourceforge
  -l, --lister [bitbucket|cgit|cran|debian|gitea|github|gitlab|gnu|launchpad|npm|packagist|phabricator|pypi|sourceforge]
As expected, it's here ;)
Update the scheduler backend with the new task type:
# apt update; apt install -y python3-swh.lister
...
$ swhscheduler@scheduler0:~$ swh scheduler --config-file /etc/softwareheritage/scheduler/backend.yml task-type register
INFO:swh.scheduler.cli.task_type:Loading entrypoint for plugin loader.archive
INFO:swh.scheduler.cli.task_type:Loading entrypoint for plugin loader.cran
INFO:swh.scheduler.cli.task_type:Loading entrypoint for plugin loader.debian
INFO:swh.scheduler.cli.task_type:Loading entrypoint for plugin loader.deposit
INFO:swh.scheduler.cli.task_type:Loading entrypoint for plugin loader.nixguix
INFO:swh.scheduler.cli.task_type:Loading entrypoint for plugin loader.npm
INFO:swh.scheduler.cli.task_type:Loading entrypoint for plugin loader.pypi
INFO:swh.scheduler.cli.task_type:Loading entrypoint for plugin lister.bitbucket
INFO:swh.scheduler.cli.task_type:Loading entrypoint for plugin lister.cgit
INFO:swh.scheduler.cli.task_type:Loading entrypoint for plugin lister.cran
INFO:swh.scheduler.cli.task_type:Loading entrypoint for plugin lister.debian
INFO:swh.scheduler.cli.task_type:Loading entrypoint for plugin lister.gitea
INFO:swh.scheduler.cli.task_type:Loading entrypoint for plugin lister.github
INFO:swh.scheduler.cli.task_type:Loading entrypoint for plugin lister.gitlab
INFO:swh.scheduler.cli.task_type:Loading entrypoint for plugin lister.gnu
INFO:swh.scheduler.cli.task_type:Loading entrypoint for plugin lister.launchpad
INFO:swh.scheduler.cli.task_type:Loading entrypoint for plugin lister.npm
INFO:swh.scheduler.cli.task_type:Loading entrypoint for plugin lister.packagist
INFO:swh.scheduler.cli.task_type:Loading entrypoint for plugin lister.phabricator
INFO:swh.scheduler.cli.task_type:Loading entrypoint for plugin lister.pypi
INFO:swh.scheduler.cli.task_type:Loading entrypoint for plugin lister.sourceforge
INFO:swh.scheduler.cli.task_type:Create task type list-sourceforge-full in scheduler
(I did saatchi/prod as well)
Check everything is fine (it is):
psql service=admin-staging-swh-scheduler
psql (12.6)
SSL connection (protocol: TLSv1.3, cipher: TLS_AES_256_GCM_SHA384, bits: 256, compression: off)
Type "help" for help.

swh-scheduler=> \conninfo
You are connected to database "swh-scheduler" as user "swh-scheduler" on host "db1.internal.staging.swh.network" (address "192.168.130.11") at port "5432".
SSL connection (protocol: TLSv1.3, cipher: TLS_AES_256_GCM_SHA384, bits: 256, compression: off)
swh-scheduler=> \x
Expanded display is on.
swh-scheduler=> select * from task_type where type like 'list-source%';
-[ RECORD 1 ]----+---------------------------------------------------
type             | list-sourceforge-full
description      | Full update of a SourceForge instance
backend_name     | swh.lister.sourceforge.tasks.FullSourceForgeLister
default_interval | 90 days
min_interval     | 90 days
max_interval     | 90 days
backoff_factor   | 1
max_queue_length |
num_retries      |
retry_delay      |

-- make some more gentle default
swh-scheduler=> update task_type set max_queue_length=10, min_interval='30 days', max_interval='30 days', num_retries=3 where type='list-sourceforge-full';
UPDATE 1
swh-scheduler=> select * from task_type where type like 'list-source%';
-[ RECORD 1 ]----+---------------------------------------------------
type             | list-sourceforge-full
description      | Full update of a SourceForge instance
backend_name     | swh.lister.sourceforge.tasks.FullSourceForgeLister
default_interval | 90 days
min_interval     | 30 days
max_interval     | 30 days
backoff_factor   | 1
max_queue_length | 10
num_retries      | 3
retry_delay      |
(Note: we may want to set those defaults in the lister repository, in the register function.)
Schedule the new listing task:
probably want
swhscheduler@scheduler0:~$ swh scheduler --config-file /etc/softwareheritage/scheduler/backend.yml task add list-sourceforge-full
Created 1 tasks

Task 22782916
  Next run: today (2021-05-07T14:01:29.354886+00:00)
  Interval: 90 days, 0:00:00
  Type: list-sourceforge-full
  Policy: recurring
  Args:
  Keyword args:
Scheduler runner picked it up:
May 07 14:01:31 scheduler0 swh[824184]: INFO:swh.scheduler.celery_backend.runner:Grabbed 1 tasks list-sourceforge-full
A worker picked it up and it failed:
May 07 14:01:32 worker2 python3[218671]: [2021-05-07 14:01:32,495: INFO/MainProcess] Received task: swh.lister.sourceforge.tasks.FullSourceForgeLister[1eb27c36-2f58-4a33-8c9d-10b15b98a294]
May 07 14:01:32 worker2 python3[218680]: [2021-05-07 14:01:32,541: ERROR/ForkPoolWorker-4] Task swh.lister.sourceforge.tasks.FullSourceForgeLister[1eb27c36-2f58-4a33-8c9d-10b15b98a294] raised unexpected: TypeError("__init__() got an unexpected keyword argument 'credentials'")
Traceback (most recent call last):
  File "/usr/lib/python3/dist-packages/celery/app/trace.py", line 385, in trace_task
    R = retval = fun(*args, **kwargs)
  File "/usr/lib/python3/dist-packages/swh/scheduler/task.py", line 55, in __call__
    result = super().__call__(*args, **kwargs)
  File "/usr/lib/python3/dist-packages/celery/app/trace.py", line 650, in __protected_call__
    return self.run(*args, **kwargs)
  File "/usr/lib/python3/dist-packages/sentry_sdk/integrations/celery.py", line 161, in _inner
    reraise(*exc_info)
  File "/usr/lib/python3/dist-packages/sentry_sdk/_compat.py", line 57, in reraise
    raise value
  File "/usr/lib/python3/dist-packages/sentry_sdk/integrations/celery.py", line 156, in _inner
    return f(*args, **kwargs)
  File "/usr/lib/python3/dist-packages/swh/lister/sourceforge/tasks.py", line 15, in list_sourceforge_full
    return SourceForgeLister.from_configfile().run().dict()
  File "/usr/lib/python3/dist-packages/swh/lister/pattern.py", line 268, in from_configfile
    return cls.from_config(**config)
  File "/usr/lib/python3/dist-packages/swh/lister/pattern.py", line 255, in from_config
    return cls(scheduler=scheduler_instance, **config)
TypeError: __init__() got an unexpected keyword argument 'credentials'
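For readers following along, the failure mode in the traceback can be reproduced in isolation. This is only a minimal sketch: `BaseLister`, `BrokenLister` and `FixedLister` are hypothetical stand-ins, not the actual swh.lister classes. It assumes, as the traceback suggests, that `from_config` forwards every configuration key as a keyword argument to the constructor.

```python
from typing import Any, Dict, List, Optional


class BaseLister:
    """Hypothetical stand-in for a lister base class (not the swh.lister one)."""

    def __init__(
        self,
        scheduler: Any,
        credentials: Optional[Dict[str, List[dict]]] = None,
    ) -> None:
        self.scheduler = scheduler
        self.credentials = credentials or {}

    @classmethod
    def from_config(cls, scheduler: Any, **config: Any) -> "BaseLister":
        # Every remaining configuration key is forwarded as a keyword
        # argument, so the subclass constructor must accept each of them.
        return cls(scheduler=scheduler, **config)


class BrokenLister(BaseLister):
    # Does not accept `credentials`: a config file carrying that key
    # makes from_config() raise TypeError.
    def __init__(self, scheduler: Any) -> None:
        super().__init__(scheduler=scheduler)


class FixedLister(BaseLister):
    # Accepts `credentials` and passes it on to the base class.
    def __init__(
        self,
        scheduler: Any,
        credentials: Optional[Dict[str, List[dict]]] = None,
    ) -> None:
        super().__init__(scheduler=scheduler, credentials=credentials)


config = {"credentials": {}}

error = None
try:
    BrokenLister.from_config(scheduler=None, **config)
except TypeError as exc:
    error = str(exc)

fixed = FixedLister.from_config(scheduler=None, **config)
```

The deployed configuration presumably carries a `credentials` key, which is why only a constructor that accepts it survives `from_config`.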
I'll adapt (but afk).
There is something else I need to update there anyway, the incremental task.
> I'll adapt (but afk).
Fixed.
> There is something else I need to update there anyway, the incremental task.
Done as well.
Deployment in progress.
> Deployment in progress.
Package built, deployment done.
Added the incremental sourceforge task as well (staging).
INFO:swh.scheduler.cli.task_type:Create task type list-sourceforge-incremental in scheduler
Rescheduled the full listing task, and the scheduler runner picked it up:
May 07 15:35:02 scheduler0 swh[824184]: INFO:swh.scheduler.celery_backend.runner:Grabbed 1 tasks list-sourceforge-full
It's now running:
May 07 15:31:58 worker0 python3[230921]: [2021-05-07 15:31:58,779: INFO/MainProcess] lister@worker0.internal.staging.swh.network ready.
May 07 15:35:02 worker0 python3[230921]: [2021-05-07 15:35:02,091: INFO/MainProcess] Received task: swh.lister.sourceforge.tasks.FullSourceForgeLister[ec00c8cd-ff5b-47df-adfd-a8c1884b9831]
May 07 15:35:06 worker0 python3[230930]: [2021-05-07 15:35:06,698: WARNING/ForkPoolWorker-4] Project 'https://sourceforge.net/rest/adobe/wiki' does not have any tools
May 07 15:35:07 worker0 python3[230930]: [2021-05-07 15:35:07,402: WARNING/ForkPoolWorker-4] Project 'https://sourceforge.net/rest/adobe/blog' does not have any tools
And we can see the new lister appear in the scheduler backend:
swh-scheduler=> \conninfo
You are connected to database "swh-scheduler" as user "swh-scheduler" on host "db1.internal.staging.swh.network" (address "192.168.130.11") at port "5432".
SSL connection (protocol: TLSv1.3, cipher: TLS_AES_256_GCM_SHA384, bits: 256, compression: off)
swh-scheduler=> select * from listers where name ='sourceforge';
                  id                  |    name     | instance_name |            created            | current_state |            updated
--------------------------------------+-------------+---------------+-------------------------------+---------------+-------------------------------
 4b19e941-5e25-4cb0-b55d-ae421d983e2f | sourceforge | main          | 2021-05-07 15:35:02.157958+00 | {}            | 2021-05-07 15:35:02.157958+00
(1 row)
It broke with the following error; Sentry should have more details [1]:
May 07 15:57:03 worker0 python3[230930]: [2021-05-07 15:57:03,547: ERROR/ForkPoolWorker-4] Task swh.lister.sourceforge.tasks.FullSourceForgeLister[ec00c8cd-ff5b-47df-adfd-a8c1884b9831] raised unexpected: HTTPError('404 Client Error: Not Found for url: https://sourceforge.net/rest/p/fci-cu-library2/b396')
[1] https://sentry.softwareheritage.org/share/issue/06c779e53f7a47c582d8e551662fb65f/
1.3.1 [1] packaged and deployed on staging worker
Rescheduled the task there:
May 19 09:41:15 worker2 python3[1285423]: [2021-05-19 09:41:15,280: INFO/MainProcess] Received task: swh.lister.sourceforge.tasks.FullSourceForgeLister[0d30a736-4f1d-491b-b703-13118a33b7fb]
[1] With the fixes from Alphare
It no longer stops on unexpected 404 ;)
May 19 09:58:54 worker2 python3[1285433]: [2021-05-19 09:58:54,339: WARNING/ForkPoolWorker-4] Unexpected HTTP status code 404 for URL https://sourceforge.net/rest/p/fci-cu-library2/b396
May 19 09:59:52 worker2 python3[1285433]: [2021-05-19 09:59:52,095: WARNING/ForkPoolWorker-4] Unexpected HTTP status code 404 for URL https://sourceforge.net/rest/p/manijshrestha/salesathi
May 19 09:59:58 worker2 python3[1285433]: [2021-05-19 09:59:58,657: WARNING/ForkPoolWorker-4] Unexpected HTTP status code 404 for URL https://sourceforge.net/rest/p/mp-wrapper
May 19 10:00:11 worker2 python3[1285433]: [2021-05-19 10:00:11,044: WARNING/ForkPoolWorker-4] Unexpected HTTP status code 404 for URL https://sourceforge.net/rest/p/pasoiu
May 19 10:00:15 worker2 python3[1285433]: [2021-05-19 10:00:15,673: WARNING/ForkPoolWorker-4] Unexpected HTTP status code 404 for URL https://sourceforge.net/rest/p/sga-ds
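The behaviour visible in these warnings (log the unexpected status, skip the project, keep listing) can be sketched as below. This is an illustration only, not the code shipped in 1.3.1: `list_projects`, `fake_fetch` and the `example-project` URL are hypothetical, and the fetch function is faked so the snippet runs without network access.

```python
import logging
from typing import Callable, Dict, Iterable, Iterator, Tuple

logger = logging.getLogger(__name__)

# (HTTP status code, decoded payload); a simplification for this sketch.
Response = Tuple[int, Dict[str, str]]


def list_projects(
    urls: Iterable[str], fetch: Callable[[str], Response]
) -> Iterator[Dict[str, str]]:
    """Yield project payloads, skipping URLs that answer with an error status."""
    for url in urls:
        status, payload = fetch(url)
        if status != 200:
            # Warn and move on instead of aborting the whole listing.
            logger.warning("Unexpected HTTP status code %s for URL %s", status, url)
            continue
        yield payload


def fake_fetch(url: str) -> Response:
    """Hypothetical fetcher: pretends one known-bad project returns 404."""
    if url.endswith("/b396"):
        return 404, {}
    return 200, {"url": url}


urls = [
    "https://sourceforge.net/rest/p/fci-cu-library2/b396",
    "https://sourceforge.net/rest/p/example-project",  # made-up project name
]
listed = list(list_projects(urls, fake_fetch))
```

With this shape, one missing project costs a warning line rather than the whole multi-hour run.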
Still running:
May 19 09:41:15 worker2 python3[1285423]: [2021-05-19 09:41:15,280: INFO/MainProcess] Received task: swh.lister.sourceforge.tasks.FullSourceForgeLister[0d30a736-4f1d-491b-b703-13118a33b7fb]
May 19 09:41:20 worker2 python3[1285433]: [2021-05-19 09:41:20,376: WARNING/ForkPoolWorker-4] Project 'https://sourceforge.net/rest/adobe/wiki' does not have any tools
May 19 09:41:20 worker2 python3[1285433]: [2021-05-19 09:41:20,921: WARNING/ForkPoolWorker-4] Project 'https://sourceforge.net/rest/adobe/blog' does not have any tools
May 19 09:58:54 worker2 python3[1285433]: [2021-05-19 09:58:54,339: WARNING/ForkPoolWorker-4] Unexpected HTTP status code 404 for URL https://sourceforge.net/rest/p/fci-cu-library2/b396
May 19 09:59:52 worker2 python3[1285433]: [2021-05-19 09:59:52,095: WARNING/ForkPoolWorker-4] Unexpected HTTP status code 404 for URL https://sourceforge.net/rest/p/manijshrestha/salesathi
May 19 09:59:58 worker2 python3[1285433]: [2021-05-19 09:59:58,657: WARNING/ForkPoolWorker-4] Unexpected HTTP status code 404 for URL https://sourceforge.net/rest/p/mp-wrapper
May 19 10:00:11 worker2 python3[1285433]: [2021-05-19 10:00:11,044: WARNING/ForkPoolWorker-4] Unexpected HTTP status code 404 for URL https://sourceforge.net/rest/p/pasoiu
May 19 10:00:15 worker2 python3[1285433]: [2021-05-19 10:00:15,673: WARNING/ForkPoolWorker-4] Unexpected HTTP status code 404 for URL https://sourceforge.net/rest/p/sga-ds
May 19 10:09:20 worker2 python3[1285433]: [2021-05-19 10:09:20,460: WARNING/ForkPoolWorker-4] Project 'https://sourceforge.net/rest/chaoticmoon/home' does not have any tools
May 19 10:14:29 worker2 python3[1285433]: [2021-05-19 10:14:29,447: WARNING/ForkPoolWorker-4] Project URL 'https://sourceforge.net/motorola/' does not match expected pattern
May 19 10:14:29 worker2 python3[1285433]: [2021-05-19 10:14:29,649: WARNING/ForkPoolWorker-4] Project 'https://sourceforge.net/rest/motorola/wiki' does not have any tools
May 19 10:14:30 worker2 python3[1285433]: [2021-05-19 10:14:30,136: WARNING/ForkPoolWorker-4] Project 'https://sourceforge.net/rest/motorola/discussion' does not have any tools
May 19 10:14:30 worker2 python3[1285433]: [2021-05-19 10:14:30,542: WARNING/ForkPoolWorker-4] Project 'https://sourceforge.net/rest/motorola/news' does not have any tools
May 19 10:50:00 worker2 python3[1285423]: [2021-05-19 10:50:00,600: INFO/MainProcess] Received task: swh.lister.gitlab.tasks.IncrementalGitLabLister[0225ac44-3b3b-4653-af3f-110f547123d6]
May 19 10:50:00 worker2 python3[1285430]: [2021-05-19 10:50:00,767: INFO/ForkPoolWorker-1] Task swh.lister.gitlab.tasks.IncrementalGitLabLister[0225ac44-3b3b-4653-af3f-110f547123d6] succeeded in 0.14286251738667488s: {'pages': 1, 'origins': 0}
May 19 11:32:49 worker2 python3[1285433]: [2021-05-19 11:32:49,647: WARNING/ForkPoolWorker-4] Unexpected HTTP status code 404 for URL https://sourceforge.net/rest/p/intel-sas
May 19 13:10:23 worker2 python3[1285433]: [2021-05-19 13:10:23,221: WARNING/ForkPoolWorker-4] Project URL 'https://sourceforge.net/mirror/' does not match expected pattern
May 19 15:27:45 worker2 python3[1285423]: [2021-05-19 15:27:45,729: INFO/MainProcess] Received task: swh.lister.debian.tasks.DebianListerTask[edd4782b-22e0-4db7-9fd1-c3c1b65c8930]
May 19 15:28:35 worker2 python3[1285430]: [2021-05-19 15:28:35,076: INFO/ForkPoolWorker-1] Task swh.lister.debian.tasks.DebianListerTask[edd4782b-22e0-4db7-9fd1-c3c1b65c8930] succeeded in 49.340988324955106s: {'pages': 9, 'origins': 11}
May 19 17:54:29 worker2 python3[1285423]: [2021-05-19 17:54:29,454: INFO/MainProcess] Received task: swh.lister.pypi.tasks.PyPIListerTask[beb8cebc-955c-47d6-9c6a-daa34cebe759]
May 19 18:04:04 worker2 python3[1285430]: [2021-05-19 18:04:04,660: INFO/ForkPoolWorker-1] Task swh.lister.pypi.tasks.PyPIListerTask[beb8cebc-955c-47d6-9c6a-daa34cebe759] succeeded in 575.1989205237478s: {'pages': 1, 'origins': 305318}
May 19 22:32:26 worker2 python3[1285433]: [2021-05-19 22:32:26,578: WARNING/ForkPoolWorker-4] Project URL 'https://sourceforge.net/arris/' does not match expected pattern
May 19 22:32:26 worker2 python3[1285433]: [2021-05-19 22:32:26,769: WARNING/ForkPoolWorker-4] Project 'https://sourceforge.net/rest/arris/wiki' does not have any tools
May 19 22:32:27 worker2 python3[1285433]: [2021-05-19 22:32:27,044: WARNING/ForkPoolWorker-4] Project 'https://sourceforge.net/rest/arris/discussion' does not have any tools
May 19 22:32:27 worker2 python3[1285433]: [2021-05-19 22:32:27,896: WARNING/ForkPoolWorker-4] Project 'https://sourceforge.net/rest/arris/news' does not have any tools
May 20 01:45:47 worker2 python3[1285433]: [2021-05-20 01:45:47,985: WARNING/ForkPoolWorker-4] Unexpected HTTP status code 500 for URL https://sourceforge.net/rest/p/sightexaminer
Note: other listings are running alongside, hence the "noise" in the output.
It finally stopped, albeit badly: the remote end closed the connection [1].
requests.exceptions.ConnectionError: ('Connection aborted.', RemoteDisconnected('Remote end closed connection without response'))
[1] https://sentry.softwareheritage.org/share/issue/881876c208e24e89a0eb753410c41f72/
Sorry for the delayed response. I'm assuming we'd like it better if the lister continued anyway in case of a "fatal" connection error, with maybe some sort of retry?
> Sorry for the delayed response.
Don't worry about it, it's fine.
> I'm assuming we'd like it better if the lister continued anyway in case of a "fatal"
> connection error, with maybe some sort of retry?
Yes, that'd be neat. Also, if we implemented it as a decorator like the other
retry we have [1], we could share it with other listers as well (I don't recall us having
this already, but I guess the same failure could happen with other listers).
[1] https://forge.softwareheritage.org/source/swh-lister/browse/master/swh/lister/utils.py
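To make the decorator idea concrete, here is a minimal standard-library sketch. It is not the helper in swh/lister/utils.py (which, judging by the revision titles above, relies on tenacity); `retry_on` and its parameters are hypothetical names, and the built-in `ConnectionError` stands in for the exception a real lister would catch, presumably `requests.exceptions.ConnectionError`.

```python
import functools
import time
from typing import Any, Callable, List, Tuple, Type


def retry_on(
    exceptions: Tuple[Type[BaseException], ...],
    attempts: int = 3,
    delay: float = 1.0,
    sleep: Callable[[float], Any] = time.sleep,
) -> Callable[[Callable[..., Any]], Callable[..., Any]]:
    """Retry the wrapped call on the given exceptions, with exponential backoff."""

    def decorator(func: Callable[..., Any]) -> Callable[..., Any]:
        @functools.wraps(func)
        def wrapper(*args: Any, **kwargs: Any) -> Any:
            for attempt in range(1, attempts + 1):
                try:
                    return func(*args, **kwargs)
                except exceptions:
                    if attempt == attempts:
                        # Out of attempts: let the last error propagate.
                        raise
                    # Wait delay, 2*delay, 4*delay, ... between attempts.
                    sleep(delay * 2 ** (attempt - 1))

        return wrapper

    return decorator


# Usage: a page fetch that fails twice before succeeding. The sleep
# function is replaced by a recorder so the example runs instantly.
waits: List[float] = []
calls = {"count": 0}


@retry_on((ConnectionError,), attempts=3, delay=1.0, sleep=waits.append)
def fetch_page() -> str:
    calls["count"] += 1
    if calls["count"] < 3:
        raise ConnectionError("Remote end closed connection without response")
    return "ok"


result = fetch_page()
```

Keeping the exception tuple as a parameter is what would let other listers reuse the same decorator with their own notion of a retryable error.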
New lister deployed:
ii python3-swh.lister 1.3.2-1~swh1~bpo10+1 all Software Heritage Listers (bitbucket, git(lab|hub), pypi, etc...)
Task scheduled, let's see:
May 26 10:56:42 worker2 python3[1850675]: [2021-05-26 10:56:42,737: INFO/MainProcess] Received task: swh.lister.sourceforge.tasks.FullSourceForgeLister[38d47353-1721-4fd6-adac-b96d494efca0]
It went through \o/:
INFO/ForkPoolWorker-4] Task swh.lister.sourceforge.tasks.FullSourceForgeLister[38d47353-1721-4fd6-adac-b96d494efca0] succeeded in 89751.71981433034s: {'pages': 258764, 'origins': 338175}
Installed and triggered a run for the incremental task on staging:
May 27 15:10:48 worker0 python3[2047265]: [2021-05-27 15:10:48,563: INFO/MainProcess] Received task: swh.lister.sourceforge.tasks.IncrementalSourceForgeLister[15a585c3-73ab-4522-b799-f7f768c430a6]
Everything went fine as well:
May 27 19:25:29 worker0 python3[2047275]: [2021-05-27 19:25:29,379: INFO/ForkPoolWorker-4] Task swh.lister.sourceforge.tasks.IncrementalSourceForgeLister[15a585c3-73ab-4522-b799-f7f768c430a6] succeeded in 15280.69617000036s: {'pages': 818, 'origins': 1408}
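To put the two `succeeded in ...s` figures in perspective, the raw seconds can be converted into wall-clock durations and a rough page rate. The numbers below are copied from the two log lines above; `summarize` is just a hypothetical helper for this calculation.

```python
from datetime import timedelta
from typing import Dict, Tuple

# Values taken from the "succeeded in ...s" log lines above.
runs: Dict[str, Dict[str, float]] = {
    "full": {"seconds": 89751.71981433034, "pages": 258764, "origins": 338175},
    "incremental": {"seconds": 15280.69617000036, "pages": 818, "origins": 1408},
}


def summarize(run: Dict[str, float]) -> Tuple[str, float]:
    """Return (human-readable duration, pages fetched per second)."""
    duration = timedelta(seconds=round(run["seconds"]))
    pages_per_second = run["pages"] / run["seconds"]
    return str(duration), round(pages_per_second, 2)


for name, run in runs.items():
    duration, rate = summarize(run)
    print(f"{name}: {duration}, {rate} pages/s")
```

So the full listing took roughly 25 hours and the incremental one a bit over 4 hours, which matches the gap between the 15:10:48 and 19:25:29 timestamps.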
Roh, you know what I meant, forge: not claim the task, resolve it... (anyway, closing)