Software Heritage

launchpad: Allow bzr origins listing
Closed, Public

Authored by ardumont on Feb 16 2022, 5:57 PM.

Details

Summary

Related to T3945

Test Plan

tox

And docker is happy too.

Lister log (with a pprint of each collection it lists):

swh-lister_1                        | [2022-02-16 16:33:50,268: INFO/MainProcess] Task swh.lister.launchpad.tasks.FullLaunchpadLister[f3e3f3aa-8f4a-4e2c-8821-facd4952e53e] received
swh-lister_1                        | ('git', <lazr.restfulclient.resource.Collection object at 0x7ff52ebef290>)
swh-lister_1                        | ('bzr', <lazr.restfulclient.resource.Collection object at 0x7ff52d1a9450>)
...
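The log shows the lister producing one (vcs_type, collection) pair per VCS. A minimal sketch of that iteration shape, assuming hypothetical fetcher callables that return plain lists in place of lazr.restfulclient Collection objects (no launchpadlib call is made or implied):

```python
from typing import Callable, Dict, Iterator, List, Tuple

# Hypothetical stand-in for a lazr.restfulclient Collection.
Collection = List[dict]


def iter_collections(
    fetchers: Dict[str, Callable[[], Collection]],
) -> Iterator[Tuple[str, Collection]]:
    """Yield one (vcs_type, collection) pair per VCS, in insertion order."""
    for vcs_type, fetch in fetchers.items():
        yield vcs_type, fetch()


# Hypothetical fetchers and URLs; the real lister queries the Launchpad API.
fetchers: Dict[str, Callable[[], Collection]] = {
    "git": lambda: [{"url": "https://git.launchpad.net/example-project"}],
    "bzr": lambda: [{"url": "https://code.launchpad.net/~example/example-project/trunk"}],
}

for vcs_type, collection in iter_collections(fetchers):
    print((vcs_type, len(collection)))
```

Keeping the VCS type next to its collection lets downstream code set the origin's visit_type without re-inspecting each item.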

Scheduler db in docker: it's listing new bzr origins (there were no bzr origins prior to the run):

17:58:57 swh-scheduler@localhost:5433=# select now(), count(*) from listed_origins where visit_type='bzr';
+-------------------------------+-------+
|              now              | count |
+-------------------------------+-------+
| 2022-02-16 16:59:07.267496+00 | 21000 |
+-------------------------------+-------+
(1 row)

Time: 5.236 ms
17:59:07 swh-scheduler@localhost:5433=# select now(), count(*) from listed_origins where visit_type='bzr';
+-------------------------------+-------+
|              now              | count |
+-------------------------------+-------+
| 2022-02-16 16:59:46.291344+00 | 22000 |
+-------------------------------+-------+
(1 row)

Time: 5.584 ms
18:00:23 swh-scheduler@localhost:5433=# select now(), * from listed_origins where visit_type='bzr' order by last_update desc limit 1;
+-[ RECORD 1 ]-----------+-------------------------------------------------------------------------------+
| now                    | 2022-02-16 17:00:34.779024+00                                                 |
| lister_id              | 9290a3f8-6896-47ea-81b3-e3adc9df21be                                          |
| url                    | https://code.launchpad.net/~ubuntu-branches/ubuntu/karmic/libvncserver/karmic |
| visit_type             | bzr                                                                           |
| extra_loader_arguments | {}                                                                            |
| enabled                | t                                                                             |
| first_seen             | 2022-02-16 16:59:53.309055+00                                                 |
| last_seen              | 2022-02-16 16:59:53.309055+00                                                 |
| last_update            | 2009-06-27 00:56:06.928908+00                                                 |
+------------------------+-------------------------------------------------------------------------------+

Time: 11.362 ms
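The record above pairs a branch URL with visit_type 'bzr' and a last_update taken from the branch's own modification date (2009), not from the listing time. A minimal sketch of that mapping, assuming a branch is a plain dict with hypothetical "web_link" and "date_last_modified" keys; ListedOriginSketch is a simplified stand-in, not the scheduler's model:

```python
from dataclasses import dataclass
from datetime import datetime, timezone


@dataclass(frozen=True)
class ListedOriginSketch:
    """Hypothetical, simplified mirror of a listed_origins row."""
    url: str
    visit_type: str
    last_update: datetime


def branch_to_origin(branch: dict) -> ListedOriginSketch:
    # Assumed input keys; the real lister reads Launchpad API objects.
    return ListedOriginSketch(
        url=branch["web_link"],
        visit_type="bzr",
        last_update=branch["date_last_modified"],
    )


branch = {
    "web_link": "https://code.launchpad.net/~ubuntu-branches/ubuntu/karmic/libvncserver/karmic",
    "date_last_modified": datetime(2009, 6, 27, 0, 56, 6, 928908, tzinfo=timezone.utc),
}
origin = branch_to_origin(branch)
print(origin.visit_type, origin.last_update.isoformat())
```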

After an incremental run:

19:59:26 swh-scheduler@localhost:5433=# select now(), count(*) from listed_origins where visit_type='bzr';
+-------------------------------+--------+
|              now              | count  |
+-------------------------------+--------+
| 2022-02-17 08:18:45.201575+00 | 168000 |
+-------------------------------+--------+
(1 row)

Time: 20.536 ms
09:18:45 swh-scheduler@localhost:5433=# select * from listers where name='launchpad';
+-[ RECORD 1 ]--+-----------------------------------------------------------------------------------------------------------------------+
| id            | 9290a3f8-6896-47ea-81b3-e3adc9df21be                                                                                  |
| name          | launchpad                                                                                                             |
| instance_name | launchpad                                                                                                             |
| created       | 2022-02-16 16:24:45.466527+00                                                                                         |
| current_state | {"bzr_date_last_modified": "2009-09-10T10:21:25+00:00", "git_date_last_modified": "2022-02-16T19:07:16.970183+00:00"} |
| updated       | 2022-02-16 21:25:33.628123+00                                                                                         |
+---------------+-----------------------------------------------------------------------------------------------------------------------+

Time: 0.414 ms
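The current_state above keeps one date per VCS type (git_date_last_modified, bzr_date_last_modified), so each VCS can resume from its own point on the next incremental run. A minimal sketch of such a state with a JSON round-trip through ISO 8601 strings; StateSketch, state_from_json, state_to_json and advance are hypothetical names, not the lister's actual code:

```python
import json
from dataclasses import dataclass
from datetime import datetime
from typing import Iterable, Optional

VCS_TYPES = ("git", "bzr")


@dataclass
class StateSketch:
    """Hypothetical per-VCS incremental state, mirroring current_state keys."""
    git_date_last_modified: Optional[datetime] = None
    bzr_date_last_modified: Optional[datetime] = None


def state_from_json(raw: str) -> StateSketch:
    data = json.loads(raw)
    kwargs = {}
    for vcs_type in VCS_TYPES:
        key = f"{vcs_type}_date_last_modified"
        value = data.get(key)
        # A missing key (e.g. no bzr listing yet) means "list from scratch".
        kwargs[key] = datetime.fromisoformat(value) if value else None
    return StateSketch(**kwargs)


def state_to_json(state: StateSketch) -> str:
    out = {}
    for vcs_type in VCS_TYPES:
        value = getattr(state, f"{vcs_type}_date_last_modified")
        if value is not None:
            out[f"{vcs_type}_date_last_modified"] = value.isoformat()
    return json.dumps(out)


def advance(state: StateSketch, vcs_type: str, seen: Iterable[datetime]) -> None:
    """Keep the most recent modification date seen for one VCS type."""
    key = f"{vcs_type}_date_last_modified"
    latest = getattr(state, key)
    for date in seen:
        if latest is None or date > latest:
            latest = date
    setattr(state, key, latest)


raw = ('{"bzr_date_last_modified": "2009-09-10T10:21:25+00:00", '
       '"git_date_last_modified": "2022-02-16T19:07:16.970183+00:00"}')
state = state_from_json(raw)
advance(state, "bzr", [datetime.fromisoformat("2009-10-01T00:00:00+00:00")])
print(state_to_json(state))
```

Advancing bzr leaves the git date untouched, which is the point of splitting the state per VCS.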

Diff Detail

Repository: rDLS Listers
Branch: master
Lint: No Linters Available
Unit: No Unit Test Coverage
Build Status: Buildable 26939
Build 42124: Phabricator diff pipeline on jenkins
Build 42123: arc lint + arc unit

Event Timeline

ardumont edited the test plan for this revision.

Build is green

Patch application report for D7193 (id=26070)

Rebasing onto 31b4429ced...

Current branch diff-target is up to date.
Changes applied before test
commit 262f9369c837e293f8389dd9f7a6a965c09f621e
Author: Antoine R. Dumont (@ardumont) <ardumont@softwareheritage.org>
Date:   Wed Feb 16 17:56:13 2022 +0100

    launchpad: Allow bzr origins listing
    
    Related to T3945

See https://jenkins.softwareheritage.org/job/DLS/job/tests-on-diff/454/ for more details.

swh/lister/launchpad/lister.py
Line 31

That means I'll either reset the lister state in the scheduler db or alter its data when deploying this, since the state now holds one date per VCS type instead of a single date_last_modified.

This looks good, considering the... interesting launchpad API.

This revision is now accepted and ready to land. (Feb 17 2022, 10:48 AM)
anlambert added a subscriber: vsellier.

Looks good to me, I added some nitpick comments.

swh/lister/launchpad/lister.py
Line 31

I think altering the JSON data in the scheduler db is the better move, as we have already listed plenty of git repos.

21:54 $ psql service=swh-scheduler
psql (12.10 (Debian 12.10-1.pgdg110+1), server 12.9 (Debian 12.9-1.pgdg110+1))
SSL connection (protocol: TLSv1.3, cipher: TLS_AES_256_GCM_SHA384, bits: 256, compression: off)
Type "help" for help.

softwareheritage-scheduler=> select current_state from listers where name = 'launchpad';
                       current_state                        
------------------------------------------------------------
 {"date_last_modified": "2022-02-16T19:32:09.400561+00:00"}
(1 row)
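Altering the data amounts to renaming the single existing key to its git-specific name, since only git origins were listed before this change. A minimal sketch of that transformation on the decoded JSON, using a hypothetical migrate_state helper; the actual deployment would apply the equivalent change to the current_state column of the listers table:

```python
import json


def migrate_state(old_state: dict) -> dict:
    """Rename the pre-bzr key to its git-specific equivalent.

    Assumes every date recorded before this change came from git listing,
    so bzr gets no entry and is listed from scratch on the next run.
    Already-migrated states are returned unchanged.
    """
    new_state = dict(old_state)
    if "date_last_modified" in new_state:
        new_state["git_date_last_modified"] = new_state.pop("date_last_modified")
    return new_state


old = json.loads('{"date_last_modified": "2022-02-16T19:32:09.400561+00:00"}')
print(json.dumps(migrate_state(old)))
```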

Line 73

Use a tuple instead of a list.
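The review doesn't show the code at line 73. Assuming it is a constant sequence, such as the supported VCS types (VCS_TYPES here is a hypothetical name), a tuple makes the constant immutable while membership tests and iteration behave exactly as with a list:

```python
# Hypothetical module-level constant; line 73 itself is not shown here.
VCS_TYPES = ("git", "bzr")


def is_supported(vcs_type: str) -> bool:
    """Membership works the same on a tuple as on a list."""
    return vcs_type in VCS_TYPES


try:
    VCS_TYPES.append("hg")  # type: ignore[attr-defined]
except AttributeError:
    print("tuples cannot be mutated")
```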

Line 83

Same here.

Line 121

We can now remove the previous origin check, as @vsellier fixed the duplicated origin insertion in the scheduler db in rDSCH0a6aac583adff2c55069c9da676ad95670e35708.
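Dropping the lister-side check means duplicate origins within one listing batch become the scheduler's concern. The scheduler fix itself is not reproduced here; as an illustration only, a sketch of batch deduplication before an upsert, assuming rows are keyed on (lister_id, url, visit_type). This kind of step matters with PostgreSQL because a single INSERT ... ON CONFLICT DO UPDATE fails if it would update the same row twice:

```python
from typing import Dict, Iterable, List, Tuple

# Assumed uniqueness key for a listed origin.
Key = Tuple[str, str, str]


def dedup_batch(rows: Iterable[dict]) -> List[dict]:
    """Keep one row per key: the last occurrence wins, first-seen order is kept."""
    unique: Dict[Key, dict] = {}
    for row in rows:
        key = (row["lister_id"], row["url"], row["visit_type"])
        unique[key] = row  # replacing a value keeps the key's original position
    return list(unique.values())


rows = [
    {"lister_id": "l1", "url": "https://example.org/a", "visit_type": "bzr", "n": 1},
    {"lister_id": "l1", "url": "https://example.org/b", "visit_type": "bzr", "n": 2},
    {"lister_id": "l1", "url": "https://example.org/a", "visit_type": "bzr", "n": 3},
]
print([row["n"] for row in dedup_batch(rows)])
```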

Thanks for the reviews.

> Looks good to me, I added some nitpick comments.

Good points, I'll adapt them in another commit to avoid rebasing a gazillion diffs.

swh/lister/launchpad/lister.py
Line 31

Yes, I think so as well.

ardumont added inline comments.
swh/lister/launchpad/lister.py
Line 73
ardumont added inline comments.
swh/lister/launchpad/lister.py
Line 121

I've amended D7196 with another commit which drops this as well.

This revision was automatically updated to reflect the committed changes.