
Make origin_visit_stats_get return results from all pages
Closed, Public

Authored by olasd on Feb 11 2021, 7:40 PM.

Details

Summary

psycopg2.extras.execute_values executes queries in batches of 100 by
default (its page_size parameter). Once execute_values returns, only the
last batch of results is available in the cursor; to fetch all results,
one needs to pass fetch=True and use the returned list instead of reading
from the cursor.
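
To illustrate the failure mode, here is a toy model of the batching behavior described above. It does not use psycopg2 or a database: `FakeCursor` and `execute_values_model` are hypothetical stand-ins written for this sketch, and the "query" simply echoes its input values back as rows.

```python
from typing import List, Optional, Sequence, Tuple

Row = Tuple[int]


class FakeCursor:
    """Toy stand-in for a DB cursor: each execute() replaces the previous result set."""

    def __init__(self) -> None:
        self._rows: List[Row] = []

    def execute(self, batch: Sequence[int]) -> None:
        # Pretend the query returns one row per input value.
        self._rows = [(value,) for value in batch]

    def fetchall(self) -> List[Row]:
        return list(self._rows)


def execute_values_model(
    cur: FakeCursor,
    values: Sequence[int],
    page_size: int = 100,
    fetch: bool = False,
) -> Optional[List[Row]]:
    """Simplified model of batched execution (an assumption, not psycopg2's code)."""
    collected: Optional[List[Row]] = [] if fetch else None
    for start in range(0, len(values), page_size):
        cur.execute(values[start:start + page_size])
        if collected is not None:
            # With fetch enabled, rows are gathered after every batch.
            collected.extend(cur.fetchall())
    return collected


values = list(range(250))  # 250 values -> batches of 100, 100 and 50

# Reading from the cursor afterwards only sees the last batch.
cur = FakeCursor()
execute_values_model(cur, values)
rows_from_cursor = cur.fetchall()

# Collecting per batch returns every row.
cur = FakeCursor()
rows_fetched = execute_values_model(cur, values, fetch=True)
```

With the real library, the equivalent call would be `rows = psycopg2.extras.execute_values(cur, query, argslist, fetch=True)`, where `query` and `argslist` are whatever the caller already passes.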

Related to T3000 (and probably resolves it altogether).

Test Plan

Added a unit test checking that origin_visit_stats_get returns results from all pages.

Diff Detail

Repository
rDSCH Scheduling utilities
Lint
Automatic diff as part of commit; lint not applicable.
Unit
Automatic diff as part of commit; unit tests not applicable.

Event Timeline

Build is green

Patch application report for D5068 (id=18081)

Could not rebase; Attempt merge onto cf32e37649...

Updating cf32e37..ecab745
Fast-forward
 swh/scheduler/backend.py                   |  6 ++++--
 swh/scheduler/journal_client.py            | 20 ++++++++----------
 swh/scheduler/tests/test_journal_client.py | 34 ++++++++++++++++++++++++++----
 swh/scheduler/tests/test_scheduler.py      | 26 +++++++++++++++++++++++
 4 files changed, 69 insertions(+), 17 deletions(-)
Changes applied before test
commit ecab745a5f2089c7da1546885bc52557fdc904d7
Author: Nicolas Dandrimont <nicolas@dandrimont.eu>
Date:   Thu Feb 11 19:16:19 2021 +0100

    Make origin_visit_stats_get return results from all pages
    
    psycopg2.extras.execute_values executes queries in batches of 100 by
    default. At the end of execute_values, only the last batch of results is
    available in the cursor; To fetch all results, one needs to set
    fetch=True instead of using the cursor.

commit 86ada443d2900e820356727149a1f7b0fa6e6bff
Author: Nicolas Dandrimont <nicolas@dandrimont.eu>
Date:   Tue Feb 9 10:36:23 2021 +0100

    journal client: Filter out status messages without type
    
    This allows us to support reading the journal from the beginning,
    ignoring messages with the old schema.

commit cdb1775f247401ec24f8bb5dac7f9f2b40005bca
Author: Nicolas Dandrimont <nicolas@dandrimont.eu>
Date:   Tue Feb 9 10:33:19 2021 +0100

    Simplify max_date()
    
    The built-in `max` function can take an iterable directly, no need to
    reimplement it.
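
A minimal sketch of what such a simplification could look like. The signature, the handling of None values, and the error message are assumptions made for illustration, not the code from this commit.

```python
from datetime import datetime, timezone
from typing import Optional


def max_date(*dates: Optional[datetime]) -> datetime:
    """Return the latest of the given dates, ignoring None values."""
    filtered = [d for d in dates if d is not None]
    if not filtered:
        raise ValueError("At least one date should be a valid datetime")
    # max() accepts an iterable directly, so no manual comparison loop is needed.
    return max(filtered)


earlier = datetime(2021, 2, 9, tzinfo=timezone.utc)
later = datetime(2021, 2, 11, tzinfo=timezone.utc)
latest = max_date(earlier, None, later)
```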

See https://jenkins.softwareheritage.org/job/DSCH/job/tests-on-diff/303/ for more details.

olasd requested review of this revision. Feb 11 2021, 7:44 PM
This revision is now accepted and ready to land. Feb 12 2021, 11:07 AM