
gitlab: Implement keyset-based pagination listing
Closed, Public

Authored by ardumont on Jan 26 2021, 3:43 PM.

Details

Summary

The previous (offset-based) pagination implementation runs into a hard-coded server-side limit [1]

[1]

{"error":"Offset pagination has a maximum allowed offset of 50000 for requests that return objects of type Project. Remaining records can be retrieved using keyset pagination."}

Related to T2994
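
To illustrate the difference the error message points at, here is a minimal sketch of keyset pagination: instead of asking for "page N" by offset, each request asks for the records whose id is greater than the last id already seen. The names `make_fake_server` and `iter_project_ids` are hypothetical and the "server" is an in-memory stand-in, not the lister's code nor the real GitLab API (which, as far as I know, advertises the next page through the `Link` response header).

```python
from typing import Callable, Iterator, List, Optional

FetchPage = Callable[[Optional[int], int], List[int]]


def make_fake_server(ids: List[int]) -> FetchPage:
    """Hypothetical in-memory stand-in for a keyset-paginated endpoint."""
    ordered = sorted(ids)

    def fetch_page(id_after: Optional[int], per_page: int) -> List[int]:
        # Return at most per_page ids strictly greater than id_after.
        matching = [i for i in ordered if id_after is None or i > id_after]
        return matching[:per_page]

    return fetch_page


def iter_project_ids(fetch_page: FetchPage, per_page: int = 100) -> Iterator[int]:
    """Walk every page by passing the last seen id as the next cursor."""
    id_after: Optional[int] = None
    while True:
        page = fetch_page(id_after, per_page)
        if not page:
            return  # An empty page means there is nothing left to list.
        yield from page
        id_after = page[-1]  # The cursor is the highest id seen so far.


if __name__ == "__main__":
    fetch = make_fake_server([8, 1, 3, 2, 5, 13, 21])
    print(list(iter_project_ids(fetch, per_page=3)))
```

Because the cursor is an id rather than an offset, the cost of a request does not grow with how far into the listing it is, which is presumably why the server caps offsets but not keyset cursors.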

Diff Detail

Repository
rDLS Listers
Lint
Automatic diff as part of commit; lint not applicable.
Unit
Automatic diff as part of commit; unit tests not applicable.

Event Timeline

Build is green

Patch application report for D4948 (id=17627)

Rebasing onto 22eeb0956e...

First, rewinding head to replay your work on top of it...
Applying: gitlab: Implement keyset-based pagination listing
Changes applied before test
commit 4419742639c73f1c0feb16f5a0388c6cab3f2170
Author: Antoine R. Dumont (@ardumont) <ardumont@softwareheritage.org>
Date:   Tue Jan 26 15:39:46 2021 +0100

    gitlab: Implement keyset-based pagination listing
    
    The previous pagination implementation has a hard-coded limit server side [1]
    
    [1]
    {"error":"Offset pagination has a maximum allowed offset of 50000 for requests that return objects of type Project. Remaining records can be retrieved using keyset pagination."}

Related to T2994
See https://jenkins.softwareheritage.org/job/DLS/job/tests-on-diff/181/ for more details.
anlambert added a subscriber: anlambert.

Looks good to me! I added a comment about further improvements that could be made to the lister but are not related to this diff.

swh/lister/gitlab/lister.py
156

Not really related to this diff, but I think you could add the simple=true query parameter to get shorter responses when authenticated, and raise the per_page value to its maximum (100 according to the doc) to reduce the number of requests sent.

Also, you could make the code easier to read by using urllib.parse.urlencode.

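from typing import Optional
from urllib.parse import urlencode

def page_url(self, id_after: Optional[int] = None) -> str:
    # Keyset pagination, ordered by ascending project id.
    parameters = {
        "pagination": "keyset",
        "order_by": "id",
        "sort": "asc",
        "simple": "true",  # shorter responses
        "per_page": 100,  # maximum page size according to the doc
    }
    # Only the first page is requested without a cursor.
    if id_after:
        parameters["id_after"] = id_after
    return f"{self.url}projects?{urlencode(parameters)}"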
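
As a quick check of what the suggestion above produces, here is a standalone sketch of the same URL construction. `build_page_url` and the base URL `https://gitlab.example.org/api/v4/` are hypothetical, introduced only so the snippet runs outside the lister class.

```python
from typing import Dict, Optional, Union
from urllib.parse import urlencode


def build_page_url(base_url: str, id_after: Optional[int] = None) -> str:
    """Standalone variant of the suggested page_url (hypothetical helper)."""
    parameters: Dict[str, Union[str, int]] = {
        "pagination": "keyset",
        "order_by": "id",
        "sort": "asc",
        "simple": "true",
        "per_page": 100,
    }
    if id_after:
        parameters["id_after"] = id_after
    # urlencode keeps the dict's insertion order and converts values to strings.
    return f"{base_url}projects?{urlencode(parameters)}"


if __name__ == "__main__":
    base = "https://gitlab.example.org/api/v4/"  # hypothetical instance URL
    print(build_page_url(base))
    print(build_page_url(base, id_after=42))
```

The first call yields a URL without a cursor; the second appends `id_after=42` as the last query parameter.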
This revision is now accepted and ready to land. Jan 26 2021, 4:19 PM
swh/lister/gitlab/lister.py
156

Oh, yes, thanks!

I like the proposal, and since I'm modifying it anyway...
Tell you what, I'll do the adaptations related to this diff here,

and the simplification (simple=true, page size increased to 100) in another commit.

Adapt according to review to improve readability

Build is green

Patch application report for D4948 (id=17629)

Rebasing onto 22eeb0956e...

First, rewinding head to replay your work on top of it...
Applying: gitlab: Implement keyset-based pagination listing
Changes applied before test
commit 71d00d867d783a47d8b74daa92e57aeb7e57599e
Author: Antoine R. Dumont (@ardumont) <ardumont@softwareheritage.org>
Date:   Tue Jan 26 15:39:46 2021 +0100

    gitlab: Implement keyset-based pagination listing
    
    The previous pagination implementation has a hard-coded limit server side [1]
    
    [1]
    {"error":"Offset pagination has a maximum allowed offset of 50000 for requests that return objects of type Project. Remaining records can be retrieved using keyset pagination."}

Related to T2994
See https://jenkins.softwareheritage.org/job/DLS/job/tests-on-diff/182/ for more details.

Build is green

Patch application report for D4948 (id=17632)

Rebasing onto 22eeb0956e...

Current branch diff-target is up to date.
Changes applied before test
commit 97254a19f22654661383240330294b3908f332b2
Author: Antoine R. Dumont (@ardumont) <ardumont@softwareheritage.org>
Date:   Tue Jan 26 15:39:46 2021 +0100

    gitlab: Implement keyset-based pagination listing
    
    The previous pagination implementation has a hard-coded limit server side [1]
    
    [1]
    {"error":"Offset pagination has a maximum allowed offset of 50000 for requests that return objects of type Project. Remaining records can be retrieved using keyset pagination."}

Related to T2994
See https://jenkins.softwareheritage.org/job/DLS/job/tests-on-diff/184/ for more details.