Page MenuHomeSoftware Heritage

feat: Make the Gogs lister incremental
ClosedPublic

Authored by KShivendu on Aug 9 2022, 7:45 AM.

Details

Summary

Make the Gogs lister incremental

Related to T1721

Diff Detail

Repository
rDLS Listers
Branch
feat/gogs-incremental-lister
Lint
No Linters Available
Unit
No Unit Test Coverage
Build Status
Buildable 30701
Build 48003: Phabricator diff pipeline on jenkinsJenkins console · Jenkins
Build 48002: arc lint + arc unit

Event Timeline

Build is green

Patch application report for D8218 (id=29633)

Rebasing onto d51bce0a1c...

Current branch diff-target is up to date.
Changes applied before test
commit bb14166804858e0a0f1afcaa25d33c66403f8815
Author: KShivendu <shivendu@iitbhilai.ac.in>
Date:   Tue Aug 9 11:13:32 2022 +0530

    feat: Make the Gogs lister incremental

See https://jenkins.softwareheritage.org/job/DLS/job/tests-on-diff/576/ for more details.

  • test: Improve variable names and comments in gogs lister tests

Build is green

Patch application report for D8218 (id=29634)

Rebasing onto d51bce0a1c...

Current branch diff-target is up to date.
Changes applied before test
commit 642286955d9183eae1858ee4f4d9ad206d633918
Author: KShivendu <shivendu@iitbhilai.ac.in>
Date:   Tue Aug 9 11:36:53 2022 +0530

    test: Improve variable names and comments in gogs lister tests

commit bb14166804858e0a0f1afcaa25d33c66403f8815
Author: KShivendu <shivendu@iitbhilai.ac.in>
Date:   Tue Aug 9 11:13:32 2022 +0530

    feat: Make the Gogs lister incremental

See https://jenkins.softwareheritage.org/job/DLS/job/tests-on-diff/577/ for more details.

vlorentz added inline comments.
swh/lister/gogs/lister.py
141

not covered?

test: Add test for incremental gogs lister and improve existing tests

  • Add test for incremental listing in gogs
  • Use 3 mocked pages instead of 2 in existing tests
  • In test_gogs_list_http_error, successfully parse 1 page before throwing HTTPError
  • Minor improvement in variables names (P1, P2, P3)

Build is green

Patch application report for D8218 (id=29703)

Rebasing onto cee6bcb514...

First, rewinding head to replay your work on top of it...
Applying: feat: Make the Gogs lister incremental
Applying: test: Improve variable names and comments in gogs lister tests
Applying: test: Add test for incremental gogs lister and improve existing tests
Changes applied before test
commit b2557d51dce0981cdbc405d466324e99076cfd73
Author: KShivendu <shivendu@iitbhilai.ac.in>
Date:   Thu Aug 11 09:51:24 2022 +0530

    test: Add test for incremental gogs lister and improve existing tests

commit 1e4556b621f0b35b692f168995c38558430a31b7
Author: KShivendu <shivendu@iitbhilai.ac.in>
Date:   Tue Aug 9 11:36:53 2022 +0530

    test: Improve variable names and comments in gogs lister tests

commit 3a7288a149195d3be35f42dc2163cbf4b75c1680
Author: KShivendu <shivendu@iitbhilai.ac.in>
Date:   Tue Aug 9 11:13:32 2022 +0530

    feat: Make the Gogs lister incremental

See https://jenkins.softwareheritage.org/job/DLS/job/tests-on-diff/580/ for more details.

fix: Improvements for edge cases and coverage

  • state.last_seen_next_link should be None if last page is encountered (no next link)
  • assert page.repos is not None in get_origins_from_page
  • Improve comments and formatting

Build has FAILED

Patch application report for D8218 (id=29704)

Rebasing onto cee6bcb514...

First, rewinding head to replay your work on top of it...
Applying: feat: Make the Gogs lister incremental
Applying: test: Improve variable names and comments in gogs lister tests
Applying: test: Add test for incremental gogs lister and improve existing tests
Applying: fix: Improvements for edge cases and coverage
Changes applied before test
commit 549190d9035b556954478f145849d05f31924cb5
Author: KShivendu <shivendu@iitbhilai.ac.in>
Date:   Thu Aug 11 10:17:13 2022 +0530

    fix: Improvements for edge cases and coverage
    
    - state.last_seen_next_link should be None if last page is encountered
      (no next link)
    - `assert page.repos is not None` in get_origins_from_page
    - Improve comments and formatting

commit 590dd1a36165b8eed65d035897a88f3496c783a9
Author: KShivendu <shivendu@iitbhilai.ac.in>
Date:   Thu Aug 11 09:51:24 2022 +0530

    test: Add test for incremental gogs lister and improve existing tests

commit 5d83c3f66f8fc03edb30b99db86682e87e561787
Author: KShivendu <shivendu@iitbhilai.ac.in>
Date:   Tue Aug 9 11:36:53 2022 +0530

    test: Improve variable names and comments in gogs lister tests

commit e43e17f9712a943a9d7457f00fdef90172ada4ea
Author: KShivendu <shivendu@iitbhilai.ac.in>
Date:   Tue Aug 9 11:13:32 2022 +0530

    feat: Make the Gogs lister incremental

Link to build: https://jenkins.softwareheritage.org/job/DLS/job/tests-on-diff/581/
See console output for more information: https://jenkins.softwareheritage.org/job/DLS/job/tests-on-diff/581/console

test: fix test failures

  • make last_seen_next_link = None if the last page is encountered
  • in test_gogs_incremental_lister, throw 400 in the 1st listing attempt and succeed in the 2nd one.

Build is green

Patch application report for D8218 (id=29705)

Rebasing onto cee6bcb514...

First, rewinding head to replay your work on top of it...
Applying: feat: Make the Gogs lister incremental
Changes applied before test
commit 6a49b9dfddc78e13bc3e3629005db93dd6dd4a40
Author: KShivendu <shivendu@iitbhilai.ac.in>
Date:   Tue Aug 9 11:13:32 2022 +0530

    feat: Make the Gogs lister incremental

See https://jenkins.softwareheritage.org/job/DLS/job/tests-on-diff/582/ for more details.

swh/lister/gogs/tests/test_lister.py
193–196

Does it really mock the first listing attempt? I would expect the second request (with P2) not to have a Link header, so the lister wouldn't send the third request (with P3).

swh/lister/gogs/tests/test_lister.py
193–196

I would expect the second request (with P2) not to have a Link header

If P2 doesn't provide a "next" link header then the lister will throw an error because of assert len(response.links) > 0, "API changed: no Link header found" (same happens for Gitea lister)

What is the difference between a normal lister and an incremental one?
It is just that it restarts itself from the last state stored in the scheduler on running lister.run()

swh/lister/gogs/tests/test_lister.py
193–196

*If P2 doesn't provide a "next" link header

But, if it does provide a link header without a "next" link, then the first attempt will stop at page 2 and last_seen_next_link will be None. Now when lister.run() is used for the second time, it will recrawl both the pages again.

In the current test, the first lister fails and last_seen_next_link points to page 3. Hence in the second attempt, the lister only crawls that. Isn't this the expected behavior? Am I missing something?

swh/lister/gogs/tests/test_lister.py
193–196

It doesn't make sense for the last page to link to another page that doesn't exist.

I didn't try with Gogs as their API requires authentication, but with Gitea:

$ curl -i https://try.gitea.io/api/v1/repos/search/\?limit\=3 | grep "^link"
link: <https://try.gitea.io/api/v1/repos/search/?limit=3&page=2>; rel="next",<https://try.gitea.io/api/v1/repos/search/?limit=3&page=1753>; rel="last"

$ curl -i https://try.gitea.io/api/v1/repos/search/\?limit\=3\&page\=1753 | grep "^link"
link: <https://try.gitea.io/api/v1/repos/search/?limit=3&page=1>; rel="first",<https://try.gitea.io/api/v1/repos/search/?limit=3&page=1752>; rel="prev"

So there is indeed a link header, but no next link.

KShivendu added inline comments.
swh/lister/gogs/tests/test_lister.py
193–196

It doesn't make sense for the last page to link to another page that doesn't exist.

Actually, there are instances like T4423 where certain pages cannot be crawled because of some fatal repos. I've created an issue in the upstream but we haven't received any update from the Gogs maintainers.

I proposed T1721#88903 to fix this. You can read @ardumont's opinion on this in T1721#88915

swh/lister/gogs/tests/test_lister.py
193–196

but that's not what this test does. This test is written as a "normal" listing run and ends with a 400. T4423 is a bug, which is not normal, and ends with a 500.

  • Introduce last_seen_repo_id in GogsListerState to properly set the value of self.updated in`finalize()`
  • Update incremental listing test to test happy flow (instead of the buggy T4423)

Build is green

Patch application report for D8218 (id=29729)

Rebasing onto cee6bcb514...

First, rewinding head to replay your work on top of it...
Applying: feat: Make the Gogs lister incremental
Applying: feat: Introduce last_seen_repo_id in GogsListerState and update incremental listing test
Changes applied before test
commit 577d6104d6d67eec60d9661ca33dcc63609a9b9a
Author: KShivendu <shivendu@iitbhilai.ac.in>
Date:   Mon Aug 15 16:24:22 2022 +0530

    feat: Introduce last_seen_repo_id in GogsListerState and update incremental listing test

commit 5464574bba36bfa0e5f317b2bbd5bb3f75c9bc4f
Author: KShivendu <shivendu@iitbhilai.ac.in>
Date:   Tue Aug 9 11:13:32 2022 +0530

    feat: Make the Gogs lister incremental

See https://jenkins.softwareheritage.org/job/DLS/job/tests-on-diff/585/ for more details.

Please add a test for when a new page (P4) is added after P3

  • Introduce 4th page in incremental listing test
  • Improve comments

Build is green

Patch application report for D8218 (id=29772)

Rebasing onto cee6bcb514...

First, rewinding head to replay your work on top of it...
Applying: feat: Make the Gogs lister incremental
Applying: feat: Introduce last_seen_repo_id in GogsListerState and update incremental listing test
Applying: test: Introduce 4th page and update the incremental listing test accordingly
Changes applied before test
commit a4dc4f9ec4348d379efca1e65bd8a522d44e1b21
Author: KShivendu <shivendu@iitbhilai.ac.in>
Date:   Wed Aug 17 13:10:19 2022 +0530

    test: Introduce 4th page and update the incremental listing test accordingly

commit 64fb84ed6716325c55c1fcb1bae57ce9a0b24c9f
Author: KShivendu <shivendu@iitbhilai.ac.in>
Date:   Mon Aug 15 16:24:22 2022 +0530

    feat: Introduce last_seen_repo_id in GogsListerState and update incremental listing test

commit a93300c88d7b0fa7b5a26f0df05d7c09c884a421
Author: KShivendu <shivendu@iitbhilai.ac.in>
Date:   Tue Aug 9 11:13:32 2022 +0530

    feat: Make the Gogs lister incremental

See https://jenkins.softwareheritage.org/job/DLS/job/tests-on-diff/592/ for more details.

This revision is now accepted and ready to land.Aug 17 2022, 11:30 AM

Build is green

Patch application report for D8218 (id=29778)

Rebasing onto cee6bcb514...

Current branch diff-target is up to date.
Changes applied before test
commit 6a53a6ad06d5677d246732a8525fc931f2a2c7b1
Author: KShivendu <shivendu@iitbhilai.ac.in>
Date:   Tue Aug 9 11:13:32 2022 +0530

    feat: Make the Gogs lister incremental

See https://jenkins.softwareheritage.org/job/DLS/job/tests-on-diff/593/ for more details.

This revision was automatically updated to reflect the committed changes.