sourceforge: Fix listing of bzr projects
Closed, Public

Authored by anlambert on Apr 21 2022, 3:13 PM.

Details

Summary

Fix the sourceforge origin URL for bzr projects, as
http://project.bzr.sourceforge.net/bzrroot/project
redirects to http://project.bzr.sourceforge.net/bzr/project.
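A minimal sketch of the URL fix, assuming the origin URL only needs its `/bzrroot/` path segment swapped for the `/bzr/` redirect target. The helper names `bzr_origin_url` and `normalize_bzr_url` are hypothetical and not taken from the lister.

```python
from urllib.parse import urlsplit, urlunsplit


def bzr_origin_url(project: str) -> str:
    """Hypothetical helper: build the /bzr/ URL that /bzrroot/ redirects to."""
    return f"http://{project}.bzr.sourceforge.net/bzr/{project}"


def normalize_bzr_url(url: str) -> str:
    """Hypothetical helper: rewrite a legacy /bzrroot/<project> path to /bzr/<project>."""
    parts = urlsplit(url)
    if parts.path.startswith("/bzrroot/"):
        # Keep scheme, host, query and fragment; only the path prefix changes.
        parts = parts._replace(path="/bzr/" + parts.path[len("/bzrroot/"):])
    return urlunsplit(parts)


print(normalize_bzr_url("http://project.bzr.sourceforge.net/bzrroot/project"))
# http://project.bzr.sourceforge.net/bzr/project
```

Listing the redirect target directly avoids one extra HTTP round trip per origin and keeps a single canonical URL per project.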

Handle bzr projects with multiple branches: one listed origin
must be created per branch.
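The one-origin-per-branch rule can be sketched as follows. Two assumptions here are not confirmed by the diff: each branch is served under `<project_url>/<branch>`, and a project page without a branch list is itself a single branch. `BzrOrigin` is a hypothetical stand-in, not the scheduler's listed-origin model.

```python
from dataclasses import dataclass
from typing import Iterable, List


@dataclass(frozen=True)
class BzrOrigin:
    """Hypothetical stand-in for a listed origin (URL plus visit type)."""

    url: str
    visit_type: str = "bzr"


def origins_for_bzr_project(project_url: str, branches: Iterable[str]) -> List[BzrOrigin]:
    """Create one listed origin per branch of a bzr project.

    Assumption: branch URLs are <project_url>/<branch>; with no branches,
    the project URL itself is listed as the only origin.
    """
    base = project_url.rstrip("/")
    origins = [BzrOrigin(url=f"{base}/{branch}") for branch in branches]
    return origins or [BzrOrigin(url=base)]


# "trunk" and "dev" are made-up branch names for illustration.
for origin in origins_for_bzr_project("http://project.bzr.sourceforge.net/bzr/project", ["trunk", "dev"]):
    print(origin.url)
```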

Discard bzr projects that no longer exist from the listing.

See http://t12eksandbox.bzr.sourceforge.net/bzr/t12eksandbox and
http://ocaml-lpd.bzr.sourceforge.net/bzr/ocaml-lpd/ as examples.
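Discarding dead projects could look like the sketch below. It assumes a 404 on the project URL means the project is gone; the helper names are hypothetical, and the status lookup is injectable so the filter can be exercised without network access.

```python
import urllib.error
import urllib.request
from typing import Callable, Iterable, Iterator


def http_status(url: str, timeout: float = 30.0) -> int:
    """Return the final HTTP status of a GET on url (urlopen follows redirects)."""
    try:
        with urllib.request.urlopen(url, timeout=timeout) as response:
            return response.status
    except urllib.error.HTTPError as exc:
        return exc.code


def existing_projects(
    urls: Iterable[str], status_of: Callable[[str], int] = http_status
) -> Iterator[str]:
    """Yield only project URLs that do not answer 404.

    Assumption: any other status is kept, so a transient server error
    does not silently drop an origin from the listing.
    """
    for url in urls:
        if status_of(url) == 404:
            continue  # project no longer exists: leave it out of the listing
        yield url
```

Restricting the check to 404 is a deliberate choice in this sketch; treating every non-200 answer as "gone" would be simpler but could drop live projects during an outage.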

Diff Detail

Repository
rDLS Listers
Lint
Automatic diff as part of commit; lint not applicable.
Unit
Automatic diff as part of commit; unit tests not applicable.

Event Timeline

Ensure link exists before trying to extract its text.
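The guard described in this update can be sketched with the standard library's ElementTree, used here as a stand-in for the parser the lister actually uses. The pattern carries over because `find` returns `None` when no child matches, in ElementTree, lxml and bs4 alike. `link_text` is a hypothetical helper.

```python
import xml.etree.ElementTree as ET
from typing import Optional


def link_text(cell: ET.Element) -> Optional[str]:
    """Return the text of the first <a> child of cell, or None if there is none."""
    link = cell.find("a")
    # Compare with None explicitly: an Element without children is falsy,
    # so `if link:` would also skip a perfectly valid <a>text</a>.
    if link is None:
        return None
    return link.text


print(link_text(ET.fromstring("<td><a href='trunk/'>trunk</a></td>")))  # trunk
print(link_text(ET.fromstring("<td>no link</td>")))  # None
```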

Build is green

Patch application report for D7623 (id=27608)

Rebasing onto 20c1351aa0...

Current branch diff-target is up to date.
Changes applied before test
commit f661ead2c5f2b3a39eca0d0bf9e100ad1a65a3c1
Author: Antoine Lambert <anlambert@softwareheritage.org>
Date:   Thu Apr 21 15:08:33 2022 +0200

    sourceforge: Fix listing of bzr projects

See https://jenkins.softwareheritage.org/job/DLS/job/tests-on-diff/488/ for more details.

Build is green

Patch application report for D7623 (id=27609)

Rebasing onto 20c1351aa0...

Current branch diff-target is up to date.
Changes applied before test
commit 7d4ab6199e37d91a9b9e41ea118051d0b8f2a549
Author: Antoine Lambert <anlambert@softwareheritage.org>
Date:   Thu Apr 21 15:08:33 2022 +0200

    sourceforge: Fix listing of bzr projects

See https://jenkins.softwareheritage.org/job/DLS/job/tests-on-diff/489/ for more details.

vlorentz added inline comments.
swh/lister/sourceforge/lister.py
421–439

Can be "simplified" with XPath magic.

(we already transitively depend on lxml via bs4)
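As an illustration of the XPath approach suggested here, the sketch below uses the standard library's ElementTree, which understands only a subset of XPath, on a hypothetical well-formed page; the real SourceForge markup and the lister's actual expression are not reproduced. With lxml, something like `lxml.etree.HTML(page).xpath("//td[@class='name']/a/text()")` would accept full XPath 1.0 and tolerate malformed HTML.

```python
import xml.etree.ElementTree as ET
from typing import List

# Hypothetical, well-formed stand-in for a bzr project landing page listing
# branches; the class name "name" is an assumption for illustration.
SAMPLE_PAGE = """
<html><body><table>
  <tr><td class="name"><a href="trunk/">trunk</a></td></tr>
  <tr><td class="name"><a href="experimental/">experimental</a></td></tr>
  <tr><td class="name"></td></tr>
</table></body></html>
"""


def branch_names(page: str) -> List[str]:
    """Extract branch names with a single XPath-style query."""
    root = ET.fromstring(page.strip())
    # Descendant search with an attribute predicate, then a child step to the
    # link. Cells without a link produce no match, so no None check is needed.
    return [a.text.strip() for a in root.findall(".//td[@class='name']/a") if a.text]


print(branch_names(SAMPLE_PAGE))  # ['trunk', 'experimental']
```

Compared with walking cells and links by hand, one path expression states the whole selection at once, which is the simplification the comment refers to.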

swh/lister/sourceforge/lister.py
421–439

Awesome, thanks! I was so disappointed by bs4 not supporting XPath syntax.

Rebase and improve the code extracting bzr branch names (thanks to @vlorentz)

This revision is now accepted and ready to land. (Apr 21 2022, 6:14 PM)

Build has FAILED

Patch application report for D7623 (id=27621)

Rebasing onto 63a744559f...

Current branch diff-target is up to date.
Changes applied before test
commit aafab054f9aff4b73f4d761b5437aae5ef9054f5
Author: Antoine Lambert <anlambert@softwareheritage.org>
Date:   Thu Apr 21 15:08:33 2022 +0200

    sourceforge: Fix listing of bzr projects

Link to build: https://jenkins.softwareheritage.org/job/DLS/job/tests-on-diff/491/
See console output for more information: https://jenkins.softwareheritage.org/job/DLS/job/tests-on-diff/491/console

Add missing lxml dependency to requirements.txt

Build is green

Patch application report for D7623 (id=27622)

Rebasing onto 63a744559f...

Current branch diff-target is up to date.
Changes applied before test
commit 2fa9f0abd2b09b900ad2216ba6091f134c2b280b
Author: Antoine Lambert <anlambert@softwareheritage.org>
Date:   Thu Apr 21 15:08:33 2022 +0200

    sourceforge: Fix listing of bzr projects

See https://jenkins.softwareheritage.org/job/DLS/job/tests-on-diff/492/ for more details.

This revision was automatically updated to reflect the committed changes.