Related to T3945
Details
- Reviewers
anlambert, Alphare
- Group Reviewers
Reviewers
- Maniphest Tasks
- T3945: staging: List and ingest bzr origins, analyse and fix paper cuts
- Commits
- rDLS262f9369c837: launchpad: Allow bzr origins listing
tox
and docker is happy too:
Lister log (with a pprint on the collection it lists)
swh-lister_1 | [2022-02-16 16:33:50,268: INFO/MainProcess] Task swh.lister.launchpad.tasks.FullLaunchpadLister[f3e3f3aa-8f4a-4e2c-8821-facd4952e53e] received
swh-lister_1 | ('git', <lazr.restfulclient.resource.Collection object at 0x7ff52ebef290>)
swh-lister_1 | ('bzr', <lazr.restfulclient.resource.Collection object at 0x7ff52d1a9450>)
...
[1] Scheduler db in docker: it's listing new bzr origins (there were no bzr origins prior to the run):
17:58:57 swh-scheduler@localhost:5433=# select now(), count(*) from listed_origins where visit_type='bzr';
+-------------------------------+-------+
| now                           | count |
+-------------------------------+-------+
| 2022-02-16 16:59:07.267496+00 | 21000 |
+-------------------------------+-------+
(1 row)

Time: 5.236 ms
17:59:07 swh-scheduler@localhost:5433=# select now(), count(*) from listed_origins where visit_type='bzr';
+-------------------------------+-------+
| now                           | count |
+-------------------------------+-------+
| 2022-02-16 16:59:46.291344+00 | 22000 |
+-------------------------------+-------+
(1 row)

Time: 5.584 ms
18:00:23 swh-scheduler@localhost:5433=# select now(), * from listed_origins where visit_type='bzr' order by last_update desc limit 1;
+-[ RECORD 1 ]-----------+-------------------------------------------------------------------------------+
| now                    | 2022-02-16 17:00:34.779024+00                                                 |
| lister_id              | 9290a3f8-6896-47ea-81b3-e3adc9df21be                                          |
| url                    | https://code.launchpad.net/~ubuntu-branches/ubuntu/karmic/libvncserver/karmic |
| visit_type             | bzr                                                                           |
| extra_loader_arguments | {}                                                                            |
| enabled                | t                                                                             |
| first_seen             | 2022-02-16 16:59:53.309055+00                                                 |
| last_seen              | 2022-02-16 16:59:53.309055+00                                                 |
| last_update            | 2009-06-27 00:56:06.928908+00                                                 |
+------------------------+-------------------------------------------------------------------------------+

Time: 11.362 ms
After an incremental run:
19:59:26 swh-scheduler@localhost:5433=# select now(), count(*) from listed_origins where visit_type='bzr';
+-------------------------------+--------+
| now                           | count  |
+-------------------------------+--------+
| 2022-02-17 08:18:45.201575+00 | 168000 |
+-------------------------------+--------+
(1 row)

Time: 20.536 ms
09:18:45 swh-scheduler@localhost:5433=# select * from listers where name='launchpad';
+-[ RECORD 1 ]--+-----------------------------------------------------------------------------------------------------------------------+
| id            | 9290a3f8-6896-47ea-81b3-e3adc9df21be                                                                                  |
| name          | launchpad                                                                                                             |
| instance_name | launchpad                                                                                                             |
| created       | 2022-02-16 16:24:45.466527+00                                                                                         |
| current_state | {"bzr_date_last_modified": "2009-09-10T10:21:25+00:00", "git_date_last_modified": "2022-02-16T19:07:16.970183+00:00"} |
| updated       | 2022-02-16 21:25:33.628123+00                                                                                         |
+---------------+-----------------------------------------------------------------------------------------------------------------------+

Time: 0.414 ms
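
The `current_state` value above holds one last-modified date per visit type. As a rough illustration of how such a state could be round-tripped between the stored JSON and Python objects, here is a minimal sketch. The names `ListerState`, `state_from_dict` and `state_to_dict` are hypothetical and are not taken from the lister code; only the two JSON keys come from the query output.

```python
from dataclasses import dataclass
from datetime import datetime
from typing import Any, Dict, Optional

# JSON keys as they appear in the current_state column above.
STATE_KEYS = ("bzr_date_last_modified", "git_date_last_modified")


@dataclass
class ListerState:  # hypothetical name, for illustration only
    bzr_date_last_modified: Optional[datetime] = None
    git_date_last_modified: Optional[datetime] = None


def state_from_dict(d: Dict[str, Any]) -> ListerState:
    """Parse the ISO 8601 strings of the JSON state into datetimes."""
    parsed = {
        key: datetime.fromisoformat(d[key]) if d.get(key) else None
        for key in STATE_KEYS
    }
    return ListerState(**parsed)


def state_to_dict(state: ListerState) -> Dict[str, str]:
    """Serialize the state back to JSON-friendly strings, skipping unset dates."""
    result = {}
    for key in STATE_KEYS:
        value = getattr(state, key)
        if value is not None:
            result[key] = value.isoformat()
    return result


raw = {
    "bzr_date_last_modified": "2009-09-10T10:21:25+00:00",
    "git_date_last_modified": "2022-02-16T19:07:16.970183+00:00",
}
state = state_from_dict(raw)
```

Keeping one date per visit type lets an incremental run resume git and bzr listing independently, which matches the two distinct dates seen in the state above.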
Diff Detail
- Repository
- rDLS Listers
- Lint
Automatic diff as part of commit; lint not applicable.
- Unit
Automatic diff as part of commit; unit tests not applicable.
Event Timeline
Build is green
Patch application report for D7193 (id=26070)
Rebasing onto 31b4429ced...
Current branch diff-target is up to date.
Changes applied before test
commit 262f9369c837e293f8389dd9f7a6a965c09f621e
Author: Antoine R. Dumont (@ardumont) <ardumont@softwareheritage.org>
Date:   Wed Feb 16 17:56:13 2022 +0100

    launchpad: Allow bzr origins listing

    Related to T3945
See https://jenkins.softwareheritage.org/job/DLS/job/tests-on-diff/454/ for more details.
swh/lister/launchpad/lister.py | ||
---|---|---|
30 | That means I'll either reset the state in the scheduler db or alter the data when deploying this. |
Looks good to me, I added some nitpick comments.
swh/lister/launchpad/lister.py | ||
---|---|---|
30 | I think altering the JSON data in the scheduler db should be a good move as we already listed plenty of git repos. 21:54 $ psql service=swh-scheduler psql (12.10 (Debian 12.10-1.pgdg110+1), server 12.9 (Debian 12.9-1.pgdg110+1)) SSL connection (protocol: TLSv1.3, cipher: TLS_AES_256_GCM_SHA384, bits: 256, compression: off) Type "help" for help. softwareheritage-scheduler=> select current_state from listers where name = 'launchpad'; current_state ------------------------------------------------------------ {"date_last_modified": "2022-02-16T19:32:09.400561+00:00"} (1 row) | |
73–78 | Use a tuple instead of a list. | |
82 | same here | |
121–123 | We can now remove the previous origin check, as @vsellier fixed the duplicated origin insertion in the scheduler db in rDSCH0a6aac583adff2c55069c9da676ad95670e35708. |
Thanks for the reviews
Looks good to me, I added some nitpick comments.
Good points, I'll adapt in another commit to avoid rebasing a gazillion diffs.
swh/lister/launchpad/lister.py | ||
---|---|---|
30 | Yes, I think so as well. |
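
The data alteration discussed on line 30 could look roughly like the sketch below. It assumes the legacy `date_last_modified` key only ever tracked git origins, so its value is carried over to `git_date_last_modified`, and the bzr date is left unset so that bzr listing starts from scratch. The helper name `migrate_state` is hypothetical, and how the result is written back to the `listers` table is left out.

```python
from typing import Any, Dict


def migrate_state(old: Dict[str, Any]) -> Dict[str, Any]:
    """Rename the legacy single-date key to the git-specific key.

    Assumption: the legacy date only covered git origins. States that are
    already in the new format are returned unchanged.
    """
    new = dict(old)  # do not mutate the caller's dict
    if "date_last_modified" in new:
        legacy = new.pop("date_last_modified")
        # Keep an existing git date if one is already present.
        new.setdefault("git_date_last_modified", legacy)
    return new


# State as currently stored in the scheduler db, per the query output above.
legacy_state = {"date_last_modified": "2022-02-16T19:32:09.400561+00:00"}
migrated = migrate_state(legacy_state)
```

Altering the state this way keeps the git listing incremental, whereas resetting the state would trigger a full re-listing of the git repositories already seen.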