Add a non-incremental sourceforge lister
Closed, Public

Authored by Alphare on Mar 19 2021, 6:08 PM.

Details

Summary

Following @zack's work on T735, this change introduces an actual SWH lister for
SourceForge.

SourceForge provides a main sitemap that lists sharded sitemaps, which
in turn list pages. Each page belongs to a project (or, rarely, a
sub-project). Querying a REST API for that project returns the list of
every VCS it uses. Both sitemaps and pages carry a "last modified"
timestamp, which a future patch will use to implement incremental listing.

More precise information can be found as inline comments or docstrings.
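
As a rough illustration of the traversal described in this summary, here is a minimal offline sketch; it is not the code in this diff. It assumes the sitemaps follow the standard sitemaps.org XML schema, that project pages have a `/projects/<name>/` path, that `lastmod` values share one ISO-like format (so they compare as strings), and that the REST response carries a `tools` list whose entries have a `name`. The helper names (`iter_entries`, `latest_lastmod_by_project`, `vcs_tools`) and the `VCS_NAMES` set are hypothetical.

```python
import re
import xml.etree.ElementTree as ET
from typing import Dict, Iterable, Iterator, List, Optional, Tuple

# Namespace defined by the sitemaps.org protocol.
SITEMAP_NS = {"sm": "http://www.sitemaps.org/schemas/sitemap/0.9"}
# Assumption: project pages look like https://<host>/projects/<name>/...
PROJECT_RE = re.compile(r"/projects/(?P<name>[^/]+)/")
# Assumption: tool names that denote a VCS in the REST API response.
VCS_NAMES = {"git", "svn", "hg", "cvs", "bzr"}

Entry = Tuple[str, Optional[str]]


def iter_entries(xml_text: str, tag: str) -> Iterator[Entry]:
    """Yield (loc, lastmod) for each <tag> child of a sitemap document.

    Use tag="sitemap" for the main index and tag="url" for a sub-sitemap.
    """
    root = ET.fromstring(xml_text)
    for entry in root.findall(f"sm:{tag}", SITEMAP_NS):
        loc = entry.findtext("sm:loc", namespaces=SITEMAP_NS)
        lastmod = entry.findtext("sm:lastmod", namespaces=SITEMAP_NS)
        if loc:
            yield loc.strip(), lastmod


def latest_lastmod_by_project(pages: Iterable[Entry]) -> Dict[str, Optional[str]]:
    """Group pages by project name, keeping the most recent lastmod seen."""
    latest: Dict[str, Optional[str]] = {}
    for url, lastmod in pages:
        match = PROJECT_RE.search(url)
        if match is None:
            continue  # not a project page
        name = match.group("name")
        # String comparison is only valid under the same-format assumption.
        if name not in latest or (lastmod or "") > (latest[name] or ""):
            latest[name] = lastmod
    return latest


def vcs_tools(project_info: dict) -> List[str]:
    """Return the VCS tool names found in an (assumed) REST API response."""
    return [
        tool["name"]
        for tool in project_info.get("tools", [])
        if tool.get("name") in VCS_NAMES
    ]
```

In a real lister, the XML and JSON would be fetched over HTTP; keeping the parsing separate from the fetching, as here, makes it easy to test against recorded responses.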

Diff Detail

Repository
rDLS Listers
Branch
sourceforge-lister
Lint
No Linters Available
Unit
No Unit Test Coverage
Build Status
Buildable 20025
Build 31086: Phabricator diff pipeline on jenkins
Build 31085: arc lint + arc unit

Event Timeline

Build has FAILED

Patch application report for D5293 (id=18945)

Rebasing onto df73073a67...

First, rewinding head to replay your work on top of it...
Applying: Add a non-incremental sourceforge lister
Changes applied before test
commit 5c77c477538af6e1a7b67c620a578618d08a5774
Author: Raphaël Gomès <rgomes@octobus.net>
Date:   Wed Mar 17 17:39:41 2021 +0100

    Add a non-incremental sourceforge lister
    
    Following zack's work on T735, this change introduces an actual SWH lister for
    SourceForge.
    
    SourceForge provides a main sitemap that lists sharded sitemaps, which
    themselves list pages. Each page belongs to a project (or sub-project,
    though those are rare), information about which can be found by querying
    a REST API, which gives us the list of any and all VCS used for said
    project. Both sitemaps and pages have a "last modified" timestamp that
    will be used in a future patch to implement incremental listing.
    
    More precise information can be found as inline comments or docstrings.

Link to build: https://jenkins.softwareheritage.org/job/DLS/job/tests-on-diff/258/
See console output for more information: https://jenkins.softwareheritage.org/job/DLS/job/tests-on-diff/258/console

Harbormaster returned this revision to the author for changes because remote builds failed. (Mar 19 2021, 6:10 PM)
Harbormaster failed remote builds in B20025: Diff 18945!
zack added a subscriber: zack.

Build is green

Patch application report for D5293 (id=18953)

Rebasing onto df73073a67...

First, rewinding head to replay your work on top of it...
Applying: Add a non-incremental sourceforge lister
Changes applied before test
commit a085f108debd45ac7c7dc8b2777869bfe3aca822
Author: Raphaël Gomès <rgomes@octobus.net>
Date:   Wed Mar 17 17:39:41 2021 +0100

See https://jenkins.softwareheritage.org/job/DLS/job/tests-on-diff/259/ for more details.

Very nice! My only suggestion would be to add some tests for error conditions (e.g., to make sure it retries).
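
To make the suggestion concrete, a retry test could look roughly like the sketch below. This is not the lister's actual retry mechanism: `TransientError`, `fetch_with_retry` and `make_flaky_fetch` are hypothetical stand-ins, used only to show a test that checks both the "recovers after a few failures" path and the "gives up" path without sleeping for real.

```python
import time
from typing import Callable, List, Tuple


class TransientError(Exception):
    """Stand-in for a retryable failure (e.g. a rate-limited request)."""


def fetch_with_retry(
    fetch: Callable[[str], str],
    url: str,
    attempts: int = 3,
    delay: float = 1.0,
    sleep: Callable[[float], None] = time.sleep,
) -> str:
    """Call fetch(url), retrying on TransientError up to `attempts` times."""
    if attempts < 1:
        raise ValueError("attempts must be at least 1")
    for attempt in range(1, attempts + 1):
        try:
            return fetch(url)
        except TransientError:
            if attempt == attempts:
                raise  # out of attempts: surface the error to the caller
            sleep(delay)
    raise AssertionError("unreachable")


def make_flaky_fetch(failures: int) -> Tuple[Callable[[str], str], List[str]]:
    """Build a fake fetch that fails `failures` times, then succeeds."""
    calls: List[str] = []

    def fetch(url: str) -> str:
        calls.append(url)
        if len(calls) <= failures:
            raise TransientError(url)
        return "ok"

    return fetch, calls


def test_retries_then_succeeds() -> None:
    fetch, calls = make_flaky_fetch(failures=2)
    slept: List[float] = []
    # Inject a fake sleep so the test records delays instead of waiting.
    assert fetch_with_retry(fetch, "https://example.org", sleep=slept.append) == "ok"
    assert len(calls) == 3
    assert slept == [1.0, 1.0]


def test_gives_up_after_max_attempts() -> None:
    fetch, calls = make_flaky_fetch(failures=5)
    raised = False
    try:
        fetch_with_retry(fetch, "https://example.org", sleep=lambda _: None)
    except TransientError:
        raised = True
    assert raised
    assert len(calls) == 3
```

Injecting `sleep` as a parameter is one way to keep such tests fast; patching the sleep function of whatever retry library is actually in use would achieve the same thing.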

swh/lister/sourceforge/lister.py, line 149:

-> None

Update typing and add retry tests + check other fatal errors

Build is green

Patch application report for D5293 (id=18994)

Rebasing onto 879170a57d...

First, rewinding head to replay your work on top of it...
Applying: Add a non-incremental sourceforge lister
Changes applied before test
commit b6bc9bd0d0b00ec731773617871ff6d3c1a53a1d
Author: Raphaël Gomès <rgomes@octobus.net>
Date:   Wed Mar 17 17:39:41 2021 +0100

See https://jenkins.softwareheritage.org/job/DLS/job/tests-on-diff/260/ for more details.

Alphare marked an inline comment as not done.

Record lastmod for origins

Build is green

Patch application report for D5293 (id=19015)

Rebasing onto 879170a57d...

First, rewinding head to replay your work on top of it...
Applying: Add a non-incremental sourceforge lister
Changes applied before test
commit 247a8a25fd7e44b37766e2d14094227ecf56d32b
Author: Raphaël Gomès <rgomes@octobus.net>
Date:   Wed Mar 17 17:39:41 2021 +0100

See https://jenkins.softwareheritage.org/job/DLS/job/tests-on-diff/261/ for more details.

Simplify date tests (sorry for the update spam)

Build is green

Patch application report for D5293 (id=19016)

Rebasing onto 879170a57d...

First, rewinding head to replay your work on top of it...
Applying: Add a non-incremental sourceforge lister
Changes applied before test
commit 5903a523aaa8108a007261d6923004cb6d400934
Author: Raphaël Gomès <rgomes@octobus.net>
Date:   Wed Mar 17 17:39:41 2021 +0100

See https://jenkins.softwareheritage.org/job/DLS/job/tests-on-diff/262/ for more details.

This revision is now accepted and ready to land. (Mar 23 2021, 12:28 PM)
This revision was landed with ongoing or failed builds. (Mar 23 2021, 6:41 PM)
This revision was automatically updated to reflect the committed changes.

Build is green

Patch application report for D5293 (id=19067)

Rebasing onto 879170a57d...

Current branch diff-target is up to date.
Changes applied before test
commit f7b27c6930220b4ed3e1141167d61e67416852c9
Author: Raphaël Gomès <rgomes@octobus.net>
Date:   Wed Mar 17 17:39:41 2021 +0100

See https://jenkins.softwareheritage.org/job/DLS/job/tests-on-diff/263/ for more details.