Page MenuHomeSoftware Heritage

Refactor ProvenanceWithPathDB.insert_location() and add a test for revision_add()
ClosedPublic

Authored by douardda on Mar 12 2021, 12:23 PM.

Details

Summary

Also:

  • enforce tz-aware datetime value for RevisionEntry.date
  • fix build_isochrone_graph() to use a tz-aware datetime as min value,
  • ensure CSVRevisionIterator produces tz-aware datetime values.
  • simplify the code,
  • extract the insertion in the 'location' table in a dedicated query to prevent from updating existing rows unnecessarily.

Diff Detail

Repository
rDPROV Provenance database
Lint
Automatic diff as part of commit; lint not applicable.
Unit
Automatic diff as part of commit; unit tests not applicable.

Event Timeline

Build was aborted

Patch application report for D5234 (id=18768)

Rebasing onto f653eb8609...

Current branch diff-target is up to date.
Changes applied before test
commit d9d68dd1ff2f090019eb75cd96251042b36c6842
Author: David Douard <david.douard@sdfa3.org>
Date:   Fri Mar 12 12:18:55 2021 +0100

    Add a simple test for swh.provenance.provenance.revision_add()

commit 98634b770aa86f6a4313079837ba40c5e69d27af
Author: David Douard <david.douard@sdfa3.org>
Date:   Fri Mar 12 12:14:04 2021 +0100

    Refactor ProvenanceWithPathDB.insert_location()
    
    - simplify the code,
    - extract the insertion in the 'location' table in a dedicated query to
      prevent from updating existing rows unnecessarily.

commit e962039315c206c5c25d9c613a1aab9b4a293de4
Author: David Douard <david.douard@sdfa3.org>
Date:   Fri Mar 12 11:57:21 2021 +0100

    Enforce tz-aware datetime value for RevisionEntry.date
    
    Also:
    - fix build_isochrone_graph() to use a tz-aware datetime as min value,
    - ensure CSVRevisionIterator produces tz-aware datetime values.

Link to build: https://jenkins.softwareheritage.org/job/DPROV/job/tests-on-diff/2/
See console output for more information: https://jenkins.softwareheritage.org/job/DPROV/job/tests-on-diff/2/console

Harbormaster returned this revision to the author for changes because remote builds failed.Mar 12 2021, 12:43 PM
Harbormaster failed remote builds in B19866: Diff 18768!

Fix the hanging test (unclosed db transaction)

also improve a bit directory_ls_internal() to use a with statement for the db cursor.

Build is green

Patch application report for D5234 (id=18807)

Rebasing onto f653eb8609...

Current branch diff-target is up to date.
Changes applied before test
commit 057e6f4d3c5b649dfa5ab935fe27c5d133b4c075
Author: David Douard <david.douard@sdfa3.org>
Date:   Fri Mar 12 12:18:55 2021 +0100

    Add a simple test for swh.provenance.provenance.revision_add()

commit f6a8fcde6975e03f1ef493fb36619308e4d986da
Author: David Douard <david.douard@sdfa3.org>
Date:   Mon Mar 15 12:16:11 2021 +0100

    Make ArchivePostgreSQL.directory_ls_internal close the db cursor
    
    using a with statement.

commit 98634b770aa86f6a4313079837ba40c5e69d27af
Author: David Douard <david.douard@sdfa3.org>
Date:   Fri Mar 12 12:14:04 2021 +0100

    Refactor ProvenanceWithPathDB.insert_location()
    
    - simplify the code,
    - extract the insertion in the 'location' table in a dedicated query to
      prevent from updating existing rows unnecessarily.

commit e962039315c206c5c25d9c613a1aab9b4a293de4
Author: David Douard <david.douard@sdfa3.org>
Date:   Fri Mar 12 11:57:21 2021 +0100

    Enforce tz-aware datetime value for RevisionEntry.date
    
    Also:
    - fix build_isochrone_graph() to use a tz-aware datetime as min value,
    - ensure CSVRevisionIterator produces tz-aware datetime values.

See https://jenkins.softwareheritage.org/job/DPROV/job/tests-on-diff/3/ for more details.

aeviso added a subscriber: aeviso.
aeviso added inline comments.
swh/provenance/provenance.py
291

This was fixed in a different way already. Not sure if one is better than the other anyway.

This revision is now accepted and ready to land.Mar 18 2021, 3:00 PM
swh/provenance/provenance.py
291

where is the fix?

swh/provenance/provenance.py
291

oh you just pushed...
So we basically do the same but:

  1. there is no need for the dependency on pytz for this,
  2. it's best to compute this tz-aware min date only once.

So I'll keep "my" version, if you don't mind

rebase and remove the dependency on pytz

Build has FAILED

Patch application report for D5234 (id=18922)

Rebasing onto f8854b8a66...

Current branch diff-target is up to date.
Changes applied before test
commit 8ff0505454ff8b643227791ea12e6a30c816e159
Author: David Douard <david.douard@sdfa3.org>
Date:   Fri Mar 12 12:18:55 2021 +0100

    Add a simple test for swh.provenance.provenance.revision_add()

commit a09ec94641b870ecc9d5863ab1893ca8ae7e612f
Author: David Douard <david.douard@sdfa3.org>
Date:   Mon Mar 15 12:16:11 2021 +0100

    Make ArchivePostgreSQL.directory_ls_internal close the db cursor
    
    using a with statement.

commit 516709d5e30e79d58ee4f76d2313aec151c2015b
Author: David Douard <david.douard@sdfa3.org>
Date:   Fri Mar 12 12:14:04 2021 +0100

    Refactor ProvenanceWithPathDB.insert_location()
    
    - simplify the code,
    - extract the insertion in the 'location' table in a dedicated query to
      prevent from updating existing rows unnecessarily.

commit 80b47dc94b8453cf98e48ff1a9164d7609cbc01b
Author: David Douard <david.douard@sdfa3.org>
Date:   Fri Mar 12 11:57:21 2021 +0100

    Enforce tz-aware datetime value for RevisionEntry.date
    
    Also:
    - fix build_isochrone_graph() to use a tz-aware datetime as min value,
    - ensure CSVRevisionIterator produces tz-aware datetime values.

Link to build: https://jenkins.softwareheritage.org/job/DPROV/job/tests-on-diff/4/
See console output for more information: https://jenkins.softwareheritage.org/job/DPROV/job/tests-on-diff/4/console

Rebase and use test dataset from the CMDBTS git repo

Build is green

Patch application report for D5234 (id=19037)

Rebasing onto 7ff72e3925...

Current branch diff-target is up to date.
Changes applied before test
commit 51edbb3494270e072ddb0a12495c02166330b67c
Author: David Douard <david.douard@sdfa3.org>
Date:   Tue Mar 23 14:24:26 2021 +0100

    Fix the example config in the README file

commit 61b0add9b4a42002efb1694d538c656a452f2aba
Author: David Douard <david.douard@sdfa3.org>
Date:   Fri Mar 12 12:18:55 2021 +0100

    Add tests for revision_add() and content_find_first()
    
    These tests uses the git repository
    
      https://github.com/grouss/CMDBTS
    
    as source to build the dataset (in tests/data/CMDBTS.msgpack).
    
    See tests/generate_CMDBTS_dataset.py to generate the dataset file.

commit 92ac977b449a61090b2e0aaf36d5cc4c471480b8
Author: David Douard <david.douard@sdfa3.org>
Date:   Mon Mar 15 12:16:11 2021 +0100

    Make ArchivePostgreSQL.directory_ls_internal close the db cursor
    
    using a with statement.

commit 090ad1350f65bf36c6ca656ae243174943812fc5
Author: David Douard <david.douard@sdfa3.org>
Date:   Fri Mar 12 12:14:04 2021 +0100

    Refactor ProvenanceWithPathDB.insert_location()
    
    - simplify the code,
    - extract the insertion in the 'location' table in a dedicated query to
      prevent from updating existing rows unnecessarily.

commit c5985c3085c09442b109dc478f2719eaaf433558
Author: David Douard <david.douard@sdfa3.org>
Date:   Fri Mar 12 11:57:21 2021 +0100

    Enforce tz-aware datetime value for RevisionEntry.date
    
    Also:
    - fix build_isochrone_graph() to use a tz-aware datetime as min value,
    - ensure CSVRevisionIterator produces tz-aware datetime values.

See https://jenkins.softwareheritage.org/job/DPROV/job/tests-on-diff/5/ for more details.

aeviso added inline comments.
swh/provenance/provenance.py
330

I think 'black' is missing up with this. On my side it complains about this formatting and changes for the other one.

swh/provenance/revision.py
86

Same as the other comment. Does this depend on the version of 'black'?

swh/provenance/tests/conftest.py
89 ↗(On Diff #19037)

This first commit shouldn't be considered for the test. I believe it is actually in a different branch in the repository.

181 ↗(On Diff #19037)

Silly comment but I would have expected the values to be returned in the opposite order (just as the function name and parameters)

swh/provenance/tests/test_provenance_db.py
54 ↗(On Diff #19037)

I guess this test is incomplete, right?

swh/provenance/tests/conftest.py
89 ↗(On Diff #19037)

not it's not, the first 2 revisions are part of the branch (even they should not be there). It does not hurt for this test, but yes, it's a bit sad to have it (because later tests will have to add extra logic to ignore them).

181 ↗(On Diff #19037)

not silly at all, it's obviously inconsistent as is and there is no need for this inconsistency! I'll fix it.

swh/provenance/tests/test_provenance_db.py
54 ↗(On Diff #19037)

the test is fine, the comment on the other side has been left in an unfinished state...

ardumont added inline comments.
swh/provenance/provenance.py
330

maybe check that you are using the right black version.

We pin such things in the /.pre-commit-config.yaml which is called when you commit (/ the root of your git repository).

If you are using another black version, say for example, the one in your venv, it might do things slightly differently.

swh/provenance/provenance.py
330
swh/provenance/revision.py
86

it's likely possible yes, see my previous detailed comment above ;)

apply aeviso comments and reorder (and fix) revisions

so new tests are added before the small refactoring of insert_location() (and do pass after AND before the refactoring).

Build has FAILED

Patch application report for D5234 (id=19075)

Rebasing onto 7ff72e3925...

Current branch diff-target is up to date.
Changes applied before test
commit b502ab1fbe26f316518a9abbabb10ef2f54b7248
Author: David Douard <david.douard@sdfa3.org>
Date:   Fri Mar 12 12:14:04 2021 +0100

    Refactor ProvenanceWithPathDB.insert_location()
    
    - simplify the code,
    - extract the insertion in the 'location' table in a dedicated query to
      prevent from updating existing rows unnecessarily.

commit f360ac61e8936ea688dd751605db985b6f48ce89
Author: David Douard <david.douard@sdfa3.org>
Date:   Wed Mar 24 09:59:01 2021 +0100

    Add tests for revision_add() and content_find_first()
    
    These tests uses the git repository
    
      https://github.com/grouss/CMDBTS
    
    as source to build the dataset (in tests/data/CMDBTS.msgpack).
    
    See tests/generate_CMDBTS_dataset.py to generate the dataset file.

commit 2ec4a0ab9da6cc81c864e2ec3645fb15b155461c
Author: David Douard <david.douard@sdfa3.org>
Date:   Mon Mar 15 12:16:11 2021 +0100

    Make ArchivePostgreSQL.directory_ls_internal close the db cursor
    
    using a with statement.

commit e66d9492545928bcf81f23dadec749e2b6380f20
Author: David Douard <david.douard@sdfa3.org>
Date:   Tue Mar 23 14:24:26 2021 +0100

    Fix the example config in the README file

commit c5985c3085c09442b109dc478f2719eaaf433558
Author: David Douard <david.douard@sdfa3.org>
Date:   Fri Mar 12 11:57:21 2021 +0100

    Enforce tz-aware datetime value for RevisionEntry.date
    
    Also:
    - fix build_isochrone_graph() to use a tz-aware datetime as min value,
    - ensure CSVRevisionIterator produces tz-aware datetime values.

Link to build: https://jenkins.softwareheritage.org/job/DPROV/job/tests-on-diff/6/
See console output for more information: https://jenkins.softwareheritage.org/job/DPROV/job/tests-on-diff/6/console

Add forgotten test data file CMDBTS.msgpack

Build is green

Patch application report for D5234 (id=19076)

Rebasing onto 7ff72e3925...

Current branch diff-target is up to date.
Changes applied before test
commit eb524713c3d6b553a52cf411e425b55e2d6d10d5
Author: David Douard <david.douard@sdfa3.org>
Date:   Fri Mar 12 12:14:04 2021 +0100

    Refactor ProvenanceWithPathDB.insert_location()
    
    - simplify the code,
    - extract the insertion in the 'location' table in a dedicated query to
      prevent from updating existing rows unnecessarily.

commit 15e390fb66c990aba8b9454f675f716f1e1e579b
Author: David Douard <david.douard@sdfa3.org>
Date:   Wed Mar 24 09:59:01 2021 +0100

    Add tests for revision_add() and content_find_first()
    
    These tests uses the git repository
    
      https://github.com/grouss/CMDBTS
    
    as source to build the dataset (in tests/data/CMDBTS.msgpack).
    
    See tests/generate_CMDBTS_dataset.py to generate the dataset file.

commit 2ec4a0ab9da6cc81c864e2ec3645fb15b155461c
Author: David Douard <david.douard@sdfa3.org>
Date:   Mon Mar 15 12:16:11 2021 +0100

    Make ArchivePostgreSQL.directory_ls_internal close the db cursor
    
    using a with statement.

commit e66d9492545928bcf81f23dadec749e2b6380f20
Author: David Douard <david.douard@sdfa3.org>
Date:   Tue Mar 23 14:24:26 2021 +0100

    Fix the example config in the README file

commit c5985c3085c09442b109dc478f2719eaaf433558
Author: David Douard <david.douard@sdfa3.org>
Date:   Fri Mar 12 11:57:21 2021 +0100

    Enforce tz-aware datetime value for RevisionEntry.date
    
    Also:
    - fix build_isochrone_graph() to use a tz-aware datetime as min value,
    - ensure CSVRevisionIterator produces tz-aware datetime values.

See https://jenkins.softwareheritage.org/job/DPROV/job/tests-on-diff/7/ for more details.