Remove the CMDBTS dataset
Closed, Public

Authored by douardda on Jun 7 2021, 10:13 AM.

Details

Summary

This diff contains three commits that together remove CMDBTS entirely:

  • Rewrite test_provenance_content_find_first() using the new datasets instead of depending on CMDBTS.
  • Rewrite test_revision_iterator using the cmdbts2 and out-of-order datasets.
  • Remove the (now unused) CMDBTS dataset.

Diff Detail

Repository
rDPROV Provenance database
Lint
Automatic diff as part of commit; lint not applicable.
Unit
Automatic diff as part of commit; unit tests not applicable.

Event Timeline

Add 2 more revisions in this diff aiming at removing CMDBTS entirely

  • Rewrite test_revision_iterator using the cmdbts2 and out-of-order datasets
  • Remove the (now unused) CMDBTS dataset

Build is green

Patch application report for D5823 (id=20832)

Rebasing onto 100384fb8f...

Current branch diff-target is up to date.
Changes applied before test
commit 473cf60ec8f121ffca095c7122ed4020501a74e5
Author: David Douard <david.douard@sdfa3.org>
Date:   Fri Jun 4 12:15:53 2021 +0200

    Rewrite test_provenance_content_find_first() using the new datasets
    
    instead of depending on CMDBTS.

See https://jenkins.softwareheritage.org/job/DPROV/job/tests-on-diff/90/ for more details.

douardda retitled this revision from "Rewrite test_provenance_content_find_first() using the new datasets" to "Remove the CMDBTS dataset". Jun 7 2021, 10:17 AM
douardda edited the summary of this revision.

Build is green

Patch application report for D5823 (id=20833)

Rebasing onto 100384fb8f...

Current branch diff-target is up to date.
Changes applied before test
commit f7e64682a0fe3296988484e0e80c9b8585111c73
Author: David Douard <david.douard@sdfa3.org>
Date:   Fri Jun 4 13:20:22 2021 +0200

    Remove the (now unused) CMDBTS dataset

commit c69856cc99f495db1cebb3274daf8b053fb897eb
Author: David Douard <david.douard@sdfa3.org>
Date:   Fri Jun 4 13:18:15 2021 +0200

    Rewrite test_revision_iterator using the cmdbts2 and out-of-order datasets
    
    instead of the deprecated CMDBTS one.

commit 473cf60ec8f121ffca095c7122ed4020501a74e5
Author: David Douard <david.douard@sdfa3.org>
Date:   Fri Jun 4 12:15:53 2021 +0200

    Rewrite test_provenance_content_find_first() using the new datasets
    
    instead of depending on CMDBTS.

See https://jenkins.softwareheritage.org/job/DPROV/job/tests-on-diff/91/ for more details.

douardda edited the summary of this revision.

rebase

Build is green

Patch application report for D5823 (id=20835)

Rebasing onto 826b3b1041...

Current branch diff-target is up to date.
Changes applied before test
commit 35534075fb343928f92fcad89d33bba4dcd9f00f
Author: David Douard <david.douard@sdfa3.org>
Date:   Fri Jun 4 13:20:22 2021 +0200

    Remove the (now unused) CMDBTS dataset

commit dbfe08ec7a81917cc75146ea65742ec8465545b2
Author: David Douard <david.douard@sdfa3.org>
Date:   Fri Jun 4 13:18:15 2021 +0200

    Rewrite test_revision_iterator using the cmdbts2 and out-of-order datasets
    
    instead of the deprecated CMDBTS one.

commit cdb50aefeabc11ca317acc37be77b91653d55eab
Author: David Douard <david.douard@sdfa3.org>
Date:   Fri Jun 4 12:15:53 2021 +0200

    Rewrite test_provenance_content_find_first() using the new datasets
    
    instead of depending on CMDBTS.

See https://jenkins.softwareheritage.org/job/DPROV/job/tests-on-diff/93/ for more details.

Build is green

Patch application report for D5823 (id=20841)

Rebasing onto 4cd50e66bb...

Current branch diff-target is up to date.
Changes applied before test
commit 5d8979b0aa26e89757eaa051dffe913b418dd902
Author: David Douard <david.douard@sdfa3.org>
Date:   Fri Jun 4 13:20:22 2021 +0200

    Remove the (now unused) CMDBTS dataset

commit a000c927c5f35dda197438409cb758920a2defb9
Author: David Douard <david.douard@sdfa3.org>
Date:   Fri Jun 4 13:18:15 2021 +0200

    Rewrite test_revision_iterator using the cmdbts2 and out-of-order datasets
    
    instead of the deprecated CMDBTS one.

commit a6a7e816c4606f9314215c7cb64c608c7620488f
Author: David Douard <david.douard@sdfa3.org>
Date:   Fri Jun 4 12:15:53 2021 +0200

    Rewrite test_provenance_content_find_first() using the new datasets
    
    instead of depending on CMDBTS.

See https://jenkins.softwareheritage.org/job/DPROV/job/tests-on-diff/95/ for more details.

Build is green

Patch application report for D5823 (id=20865)

Rebasing onto 6cdd424eba...

Current branch diff-target is up to date.
Changes applied before test
commit 075b0d6cd6b97ee7fa86d40e34a34b11bf2784c8
Author: David Douard <david.douard@sdfa3.org>
Date:   Fri Jun 4 13:20:22 2021 +0200

    Remove the (now unused) CMDBTS dataset

commit ea73aca90613f32e144d90cfe2b837df5d4e3514
Author: David Douard <david.douard@sdfa3.org>
Date:   Fri Jun 4 13:18:15 2021 +0200

    Rewrite test_revision_iterator using the cmdbts2 and out-of-order datasets
    
    instead of the deprecated CMDBTS one.

commit be417e53f0d52c579a6889d12e2647f0f97fb8b2
Author: David Douard <david.douard@sdfa3.org>
Date:   Fri Jun 4 12:15:53 2021 +0200

    Rewrite test_provenance_content_find_first() using the new datasets
    
    instead of depending on CMDBTS.

See https://jenkins.softwareheritage.org/job/DPROV/job/tests-on-diff/99/ for more details.

aeviso added inline comments.
swh/provenance/tests/test_provenance_heuristics.py
291

I've noticed that you use this hex method instead of the hash_to_hex function from the swh.model.hashutil module. I guess they are equivalent (at least in this scope) and that you do so for simplicity, but should we prefer one over the other? I'm only asking so that I can be consistent when I do this kind of conversion too.

This revision is now accepted and ready to land. Jun 10 2021, 2:24 PM
swh/provenance/tests/test_provenance_heuristics.py
291

That's a good question. I don't see a reason for using hash_to_hex, actually. I don't like that it's (currently) not properly type annotated, and I don't like its "versatility" (it accepts either a str, in which case it's a no-op, or bytes, in which case it hexlifies them).

So yes, I tend not to use it any more. I'm not sure there is a real downside to using bytes.hex() directly.
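
To make the comparison concrete, here is a minimal sketch. It does not import swh.model; hash_to_hex_like is a hypothetical local stand-in that only mimics the behavior described in the comment above (str is returned unchanged, bytes are hexlified), so treat it as an illustration under that assumption rather than the real implementation. The sample hash is one of the commit ids from this thread, used purely as example input.

```python
from binascii import hexlify
from typing import Union


def hash_to_hex_like(value: Union[str, bytes]) -> str:
    """Hypothetical stand-in for the "versatile" helper discussed above.

    Assumption: a str is returned unchanged, bytes are hex-encoded.
    """
    if isinstance(value, str):
        # No-op branch: a str is assumed to already be hex.
        return value
    return hexlify(value).decode("ascii")


# Example input: a 20-byte sha1 built from a commit id quoted in this thread.
sha1 = bytes.fromhex("473cf60ec8f121ffca095c7122ed4020501a74e5")

# For bytes input, both spellings give the same lowercase hex string.
assert sha1.hex() == hash_to_hex_like(sha1)

# The stand-in also accepts a str and returns it unchanged, which is the
# "versatility" mentioned above; bytes.hex() only exists on bytes.
assert hash_to_hex_like(sha1.hex()) == sha1.hex()
```

Under these assumptions the two only differ when the value is not bytes: bytes.hex() makes the expected input type explicit, while the stand-in silently passes a str through.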