Page MenuHomeSoftware Heritage

Refactor ProvenanceWithPathDB.insert_location() and add a test for revision_add()
ClosedPublic

Authored by douardda on Mar 12 2021, 12:23 PM.

Details

Summary

Also:

  • enforce tz-aware datetime value for RevisionEntry.date
  • fix build_isochrone_graph() to use a tz-aware datetime as min value,
  • ensure CSVRevisionIterator produces tz-aware datetime values.
  • simplify the code,
  • extract the insertion in the 'location' table in a dedicated query to prevent from updating existing rows unnecessarily.

Event Timeline

Build was aborted

Patch application report for D5234 (id=18768)

Rebasing onto f653eb8609...

Current branch diff-target is up to date.
Changes applied before test
commit d9d68dd1ff2f090019eb75cd96251042b36c6842
Author: David Douard <david.douard@sdfa3.org>
Date:   Fri Mar 12 12:18:55 2021 +0100

    Add a simple test for swh.provenance.provenance.revision_add()

commit 98634b770aa86f6a4313079837ba40c5e69d27af
Author: David Douard <david.douard@sdfa3.org>
Date:   Fri Mar 12 12:14:04 2021 +0100

    Refactor ProvenanceWithPathDB.insert_location()
    
    - simplify the code,
    - extract the insertion in the 'location' table in a dedicated query to
      prevent from updating existing rows unnecessarily.

commit e962039315c206c5c25d9c613a1aab9b4a293de4
Author: David Douard <david.douard@sdfa3.org>
Date:   Fri Mar 12 11:57:21 2021 +0100

    Enforce tz-aware datetime value for RevisionEntry.date
    
    Also:
    - fix build_isochrone_graph() to use a tz-aware datetime as min value,
    - ensure CSVRevisionIterator produces tz-aware datetime values.

Link to build: https://jenkins.softwareheritage.org/job/DPROV/job/tests-on-diff/2/
See console output for more information: https://jenkins.softwareheritage.org/job/DPROV/job/tests-on-diff/2/console

Harbormaster returned this revision to the author for changes because remote builds failed.Mar 12 2021, 12:43 PM
Harbormaster failed remote builds in B19866: Diff 18768!

Fix the hanging test (unclosed db transaction)

also improve a bit directory_ls_internal() to use a with statement for the db cursor.

Build is green

Patch application report for D5234 (id=18807)

Rebasing onto f653eb8609...

Current branch diff-target is up to date.
Changes applied before test
commit 057e6f4d3c5b649dfa5ab935fe27c5d133b4c075
Author: David Douard <david.douard@sdfa3.org>
Date:   Fri Mar 12 12:18:55 2021 +0100

    Add a simple test for swh.provenance.provenance.revision_add()

commit f6a8fcde6975e03f1ef493fb36619308e4d986da
Author: David Douard <david.douard@sdfa3.org>
Date:   Mon Mar 15 12:16:11 2021 +0100

    Make ArchivePostgreSQL.directory_ls_internal close the db cursor
    
    using a with statement.

commit 98634b770aa86f6a4313079837ba40c5e69d27af
Author: David Douard <david.douard@sdfa3.org>
Date:   Fri Mar 12 12:14:04 2021 +0100

    Refactor ProvenanceWithPathDB.insert_location()
    
    - simplify the code,
    - extract the insertion in the 'location' table in a dedicated query to
      prevent from updating existing rows unnecessarily.

commit e962039315c206c5c25d9c613a1aab9b4a293de4
Author: David Douard <david.douard@sdfa3.org>
Date:   Fri Mar 12 11:57:21 2021 +0100

    Enforce tz-aware datetime value for RevisionEntry.date
    
    Also:
    - fix build_isochrone_graph() to use a tz-aware datetime as min value,
    - ensure CSVRevisionIterator produces tz-aware datetime values.

See https://jenkins.softwareheritage.org/job/DPROV/job/tests-on-diff/3/ for more details.

aeviso added a subscriber: aeviso.
aeviso added inline comments.
swh/provenance/provenance.py
291

This was fixed in a different way already. Not sure if one is better than the other anyway.

This revision is now accepted and ready to land.Mar 18 2021, 3:00 PM
swh/provenance/provenance.py
291

where is the fix?

swh/provenance/provenance.py
291

oh you just pushed...
So we basically do the same but:

  1. there is no need for the dependency on pytz for this,
  2. it's best to compute this tz-aware min date only once.

So I'll keep "my" version, if you don't mind

rebase and remove the dependency on pytz

Build has FAILED

Patch application report for D5234 (id=18922)

Rebasing onto f8854b8a66...

Current branch diff-target is up to date.
Changes applied before test
commit 8ff0505454ff8b643227791ea12e6a30c816e159
Author: David Douard <david.douard@sdfa3.org>
Date:   Fri Mar 12 12:18:55 2021 +0100

    Add a simple test for swh.provenance.provenance.revision_add()

commit a09ec94641b870ecc9d5863ab1893ca8ae7e612f
Author: David Douard <david.douard@sdfa3.org>
Date:   Mon Mar 15 12:16:11 2021 +0100

    Make ArchivePostgreSQL.directory_ls_internal close the db cursor
    
    using a with statement.

commit 516709d5e30e79d58ee4f76d2313aec151c2015b
Author: David Douard <david.douard@sdfa3.org>
Date:   Fri Mar 12 12:14:04 2021 +0100

    Refactor ProvenanceWithPathDB.insert_location()
    
    - simplify the code,
    - extract the insertion in the 'location' table in a dedicated query to
      prevent from updating existing rows unnecessarily.

commit 80b47dc94b8453cf98e48ff1a9164d7609cbc01b
Author: David Douard <david.douard@sdfa3.org>
Date:   Fri Mar 12 11:57:21 2021 +0100

    Enforce tz-aware datetime value for RevisionEntry.date
    
    Also:
    - fix build_isochrone_graph() to use a tz-aware datetime as min value,
    - ensure CSVRevisionIterator produces tz-aware datetime values.

Link to build: https://jenkins.softwareheritage.org/job/DPROV/job/tests-on-diff/4/
See console output for more information: https://jenkins.softwareheritage.org/job/DPROV/job/tests-on-diff/4/console

Rebase and use test dataset from the CMDBTS git repo

Build is green

Patch application report for D5234 (id=19037)

Rebasing onto 7ff72e3925...

Current branch diff-target is up to date.
Changes applied before test
commit 51edbb3494270e072ddb0a12495c02166330b67c
Author: David Douard <david.douard@sdfa3.org>
Date:   Tue Mar 23 14:24:26 2021 +0100

    Fix the example config in the README file

commit 61b0add9b4a42002efb1694d538c656a452f2aba
Author: David Douard <david.douard@sdfa3.org>
Date:   Fri Mar 12 12:18:55 2021 +0100

    Add tests for revision_add() and content_find_first()
    
    These tests uses the git repository
    
      https://github.com/grouss/CMDBTS
    
    as source to build the dataset (in tests/data/CMDBTS.msgpack).
    
    See tests/generate_CMDBTS_dataset.py to generate the dataset file.

commit 92ac977b449a61090b2e0aaf36d5cc4c471480b8
Author: David Douard <david.douard@sdfa3.org>
Date:   Mon Mar 15 12:16:11 2021 +0100

    Make ArchivePostgreSQL.directory_ls_internal close the db cursor
    
    using a with statement.

commit 090ad1350f65bf36c6ca656ae243174943812fc5
Author: David Douard <david.douard@sdfa3.org>
Date:   Fri Mar 12 12:14:04 2021 +0100

    Refactor ProvenanceWithPathDB.insert_location()
    
    - simplify the code,
    - extract the insertion in the 'location' table in a dedicated query to
      prevent from updating existing rows unnecessarily.

commit c5985c3085c09442b109dc478f2719eaaf433558
Author: David Douard <david.douard@sdfa3.org>
Date:   Fri Mar 12 11:57:21 2021 +0100

    Enforce tz-aware datetime value for RevisionEntry.date
    
    Also:
    - fix build_isochrone_graph() to use a tz-aware datetime as min value,
    - ensure CSVRevisionIterator produces tz-aware datetime values.

See https://jenkins.softwareheritage.org/job/DPROV/job/tests-on-diff/5/ for more details.

aeviso added inline comments.
swh/provenance/provenance.py
330

I think 'black' is missing up with this. On my side it complains about this formatting and changes for the other one.

swh/provenance/revision.py
86

Same as the other comment. Does this depend on the version of 'black'?

swh/provenance/tests/conftest.py
90

This first commit shouldn't be considered for the test. I believe it is actually in a different branch in the repository.

182

Silly comment but I would have expected the values to be returned in the opposite order (just as the function name and parameters)

swh/provenance/tests/test_provenance_db.py
54

I guess this test is incomplete, right?

swh/provenance/tests/conftest.py
90

not it's not, the first 2 revisions are part of the branch (even they should not be there). It does not hurt for this test, but yes, it's a bit sad to have it (because later tests will have to add extra logic to ignore them).

182

not silly at all, it's obviously inconsistent as is and there is no need for this inconsistency! I'll fix it.

swh/provenance/tests/test_provenance_db.py
54

the test is fine, the comment on the other side has been left in an unfinished state...

ardumont added inline comments.
swh/provenance/provenance.py
330

maybe check that you are using the right black version.

We pin such things in the /.pre-commit-config.yaml which is called when you commit (/ the root of your git repository).

If you are using another black version, say for example, the one in your venv, it might do things slightly differently.

swh/provenance/provenance.py
330
swh/provenance/revision.py
86

it's likely possible yes, see my previous detailed comment above ;)

apply aeviso comments and reorder (and fix) revisions

so new tests are added before the small refactoring of insert_location() (and do pass after AND before the refactoring).

Build has FAILED

Patch application report for D5234 (id=19075)

Rebasing onto 7ff72e3925...

Current branch diff-target is up to date.
Changes applied before test
commit b502ab1fbe26f316518a9abbabb10ef2f54b7248
Author: David Douard <david.douard@sdfa3.org>
Date:   Fri Mar 12 12:14:04 2021 +0100

    Refactor ProvenanceWithPathDB.insert_location()
    
    - simplify the code,
    - extract the insertion in the 'location' table in a dedicated query to
      prevent from updating existing rows unnecessarily.

commit f360ac61e8936ea688dd751605db985b6f48ce89
Author: David Douard <david.douard@sdfa3.org>
Date:   Wed Mar 24 09:59:01 2021 +0100

    Add tests for revision_add() and content_find_first()
    
    These tests uses the git repository
    
      https://github.com/grouss/CMDBTS
    
    as source to build the dataset (in tests/data/CMDBTS.msgpack).
    
    See tests/generate_CMDBTS_dataset.py to generate the dataset file.

commit 2ec4a0ab9da6cc81c864e2ec3645fb15b155461c
Author: David Douard <david.douard@sdfa3.org>
Date:   Mon Mar 15 12:16:11 2021 +0100

    Make ArchivePostgreSQL.directory_ls_internal close the db cursor
    
    using a with statement.

commit e66d9492545928bcf81f23dadec749e2b6380f20
Author: David Douard <david.douard@sdfa3.org>
Date:   Tue Mar 23 14:24:26 2021 +0100

    Fix the example config in the README file

commit c5985c3085c09442b109dc478f2719eaaf433558
Author: David Douard <david.douard@sdfa3.org>
Date:   Fri Mar 12 11:57:21 2021 +0100

    Enforce tz-aware datetime value for RevisionEntry.date
    
    Also:
    - fix build_isochrone_graph() to use a tz-aware datetime as min value,
    - ensure CSVRevisionIterator produces tz-aware datetime values.

Link to build: https://jenkins.softwareheritage.org/job/DPROV/job/tests-on-diff/6/
See console output for more information: https://jenkins.softwareheritage.org/job/DPROV/job/tests-on-diff/6/console

Add forgotten test data file CMDBTS.msgpack

Build is green

Patch application report for D5234 (id=19076)

Rebasing onto 7ff72e3925...

Current branch diff-target is up to date.
Changes applied before test
commit eb524713c3d6b553a52cf411e425b55e2d6d10d5
Author: David Douard <david.douard@sdfa3.org>
Date:   Fri Mar 12 12:14:04 2021 +0100

    Refactor ProvenanceWithPathDB.insert_location()
    
    - simplify the code,
    - extract the insertion in the 'location' table in a dedicated query to
      prevent from updating existing rows unnecessarily.

commit 15e390fb66c990aba8b9454f675f716f1e1e579b
Author: David Douard <david.douard@sdfa3.org>
Date:   Wed Mar 24 09:59:01 2021 +0100

    Add tests for revision_add() and content_find_first()
    
    These tests uses the git repository
    
      https://github.com/grouss/CMDBTS
    
    as source to build the dataset (in tests/data/CMDBTS.msgpack).
    
    See tests/generate_CMDBTS_dataset.py to generate the dataset file.

commit 2ec4a0ab9da6cc81c864e2ec3645fb15b155461c
Author: David Douard <david.douard@sdfa3.org>
Date:   Mon Mar 15 12:16:11 2021 +0100

    Make ArchivePostgreSQL.directory_ls_internal close the db cursor
    
    using a with statement.

commit e66d9492545928bcf81f23dadec749e2b6380f20
Author: David Douard <david.douard@sdfa3.org>
Date:   Tue Mar 23 14:24:26 2021 +0100

    Fix the example config in the README file

commit c5985c3085c09442b109dc478f2719eaaf433558
Author: David Douard <david.douard@sdfa3.org>
Date:   Fri Mar 12 11:57:21 2021 +0100

    Enforce tz-aware datetime value for RevisionEntry.date
    
    Also:
    - fix build_isochrone_graph() to use a tz-aware datetime as min value,
    - ensure CSVRevisionIterator produces tz-aware datetime values.

See https://jenkins.softwareheritage.org/job/DPROV/job/tests-on-diff/7/ for more details.