Related to T3945
Details
- Reviewers
  - anlambert
  - Alphare
- Group Reviewers
  - Reviewers
- Maniphest Tasks
  - T3945: staging: List and ingest bzr origins, analyse and fix paper cuts
- Commits
  - rDLS262f9369c837: launchpad: Allow bzr origins listing
tox
and docker is happy too:
Lister log (with a pprint on the collections it lists):
swh-lister_1 | [2022-02-16 16:33:50,268: INFO/MainProcess] Task swh.lister.launchpad.tasks.FullLaunchpadLister[f3e3f3aa-8f4a-4e2c-8821-facd4952e53e] received
swh-lister_1 | ('git', <lazr.restfulclient.resource.Collection object at 0x7ff52ebef290>)
swh-lister_1 | ('bzr', <lazr.restfulclient.resource.Collection object at 0x7ff52d1a9450>)
...
[1] Scheduler db in docker: it is listing new bzr origins (there were no bzr origins prior to the run):
17:58:57 swh-scheduler@localhost:5433=# select now(), count(*) from listed_origins where visit_type='bzr';
+-------------------------------+-------+
| now | count |
+-------------------------------+-------+
| 2022-02-16 16:59:07.267496+00 | 21000 |
+-------------------------------+-------+
(1 row)
Time: 5.236 ms
17:59:07 swh-scheduler@localhost:5433=# select now(), count(*) from listed_origins where visit_type='bzr';
+-------------------------------+-------+
| now | count |
+-------------------------------+-------+
| 2022-02-16 16:59:46.291344+00 | 22000 |
+-------------------------------+-------+
(1 row)
Time: 5.584 ms
18:00:23 swh-scheduler@localhost:5433=# select now(), * from listed_origins where visit_type='bzr' order by last_update desc limit 1;
+-[ RECORD 1 ]-----------+-------------------------------------------------------------------------------+
| now | 2022-02-16 17:00:34.779024+00 |
| lister_id | 9290a3f8-6896-47ea-81b3-e3adc9df21be |
| url | https://code.launchpad.net/~ubuntu-branches/ubuntu/karmic/libvncserver/karmic |
| visit_type | bzr |
| extra_loader_arguments | {} |
| enabled | t |
| first_seen | 2022-02-16 16:59:53.309055+00 |
| last_seen | 2022-02-16 16:59:53.309055+00 |
| last_update | 2009-06-27 00:56:06.928908+00 |
+------------------------+-------------------------------------------------------------------------------+
Time: 11.362 ms
After an incremental run:
19:59:26 swh-scheduler@localhost:5433=# select now(), count(*) from listed_origins where visit_type='bzr';
+-------------------------------+--------+
| now | count |
+-------------------------------+--------+
| 2022-02-17 08:18:45.201575+00 | 168000 |
+-------------------------------+--------+
(1 row)
Time: 20.536 ms
09:18:45 swh-scheduler@localhost:5433=# select * from listers where name='launchpad';
+-[ RECORD 1 ]--+-----------------------------------------------------------------------------------------------------------------------+
| id | 9290a3f8-6896-47ea-81b3-e3adc9df21be |
| name | launchpad |
| instance_name | launchpad |
| created | 2022-02-16 16:24:45.466527+00 |
| current_state | {"bzr_date_last_modified": "2009-09-10T10:21:25+00:00", "git_date_last_modified": "2022-02-16T19:07:16.970183+00:00"} |
| updated | 2022-02-16 21:25:33.628123+00 |
+---------------+-----------------------------------------------------------------------------------------------------------------------+
Time: 0.414 ms
Diff Detail
- Repository
  - rDLS Listers
- Lint
  - Automatic diff as part of commit; lint not applicable.
- Unit
  - Automatic diff as part of commit; unit tests not applicable.
Event Timeline
Build is green
Patch application report for D7193 (id=26070)
Rebasing onto 31b4429ced...
Current branch diff-target is up to date.
Changes applied before test
commit 262f9369c837e293f8389dd9f7a6a965c09f621e
Author: Antoine R. Dumont (@ardumont) <ardumont@softwareheritage.org>
Date: Wed Feb 16 17:56:13 2022 +0100
launchpad: Allow bzr origins listing
Related to T3945
See https://jenkins.softwareheritage.org/job/DLS/job/tests-on-diff/454/ for more details.
| swh/lister/launchpad/lister.py | ||
|---|---|---|
| 30 | That means I'll either reset the state in the scheduling db or alter the data when deploying this. | |
Looks good to me, I added some nitpick comments.
| swh/lister/launchpad/lister.py | ||
|---|---|---|
| 30 | I think altering the JSON data in the scheduler db is a good move, as we have already listed plenty of git repos. The current state is shown in the psql session after this table. | |
| 73–78 | Use a tuple instead of a list. | |
| 82 | Same here. | |
| 121–123 | We can now remove the previous origin check, as @vsellier fixed the duplicated origin insertion in the scheduler db in rDSCH0a6aac583adff2c55069c9da676ad95670e35708. | |

21:54 $ psql service=swh-scheduler
psql (12.10 (Debian 12.10-1.pgdg110+1), server 12.9 (Debian 12.9-1.pgdg110+1))
SSL connection (protocol: TLSv1.3, cipher: TLS_AES_256_GCM_SHA384, bits: 256, compression: off)
Type "help" for help.
softwareheritage-scheduler=> select current_state from listers where name = 'launchpad';
current_state
------------------------------------------------------------
{"date_last_modified": "2022-02-16T19:32:09.400561+00:00"}
(1 row)
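For the "alter the data" option discussed on line 30, here is a minimal sketch of what the state rewrite could look like. It is not part of the diff. `migrate_state` is a hypothetical helper, and the key names are simply the ones visible in the `current_state` values pasted in this review. It assumes the old single date only tracked git repositories, and that the lister copes with a missing bzr date by listing bzr from scratch.

```python
import json
from typing import Any, Dict

# Key names copied from the `current_state` values shown in this review.
OLD_KEY = "date_last_modified"
GIT_KEY = "git_date_last_modified"


def migrate_state(state: Dict[str, Any]) -> Dict[str, Any]:
    """Hypothetical helper: return a copy of `state` with the old single
    date moved to the git-specific key."""
    migrated = dict(state)
    if OLD_KEY in migrated:
        old_value = migrated.pop(OLD_KEY)
        # Assumption: the old date only ever covered git repositories.
        # Keep an existing git date if one is already present.
        migrated.setdefault(GIT_KEY, old_value)
    # Assumption: no bzr date is written, so bzr starts with a full listing.
    return migrated


old = json.loads('{"date_last_modified": "2022-02-16T19:32:09.400561+00:00"}')
new = migrate_state(old)
print(json.dumps(new))
```

Running the helper on an already migrated state returns it unchanged, so it should be safe to apply more than once. The resulting JSON would still have to be written back to the `listers` row by hand at deployment time.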
Thanks for the reviews
Looks good to me, I added some nitpick comments.
Good points, I'll adapt in another commit to avoid rebasing a gazillion diffs.
| swh/lister/launchpad/lister.py | ||
|---|---|---|
| 30 | Yes, I think so as well. | |