Cleanup `archive.lookup_missing_hashes` and `api_swhid_known`
Changes Planned · Public · Draft

Authored by Ericson2314 on May 5 2022, 6:29 PM.

Details

Reviewers
vlorentz
Group Reviewers
Reviewers
Summary

Each commit is self-contained and has its own description, but the basic idea is to:

  • Work with structured information rather than strings for as long as possible
  • Avoid mixing hashes of different types of objects together
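
The two goals above can be illustrated with a minimal Python sketch. This is not the code from this diff: `ParsedSwhid`, `parse_swhid`, and `group_by_type` are hypothetical names introduced only for illustration, and the sketch assumes nothing beyond the core SWHID string layout `swh:1:<type>:<40 hex digits>`.

```python
from dataclasses import dataclass
from typing import Dict, Iterable, Set


# Hypothetical stand-in for a structured SWHID; the real project has its own
# classes, which this sketch does not try to reproduce.
@dataclass(frozen=True)
class ParsedSwhid:
    object_type: str  # e.g. "cnt", "dir", "rev"
    object_id: bytes  # raw 20-byte hash, not a hex string

    def __str__(self) -> str:
        # Convert back to text only at the output boundary.
        return f"swh:1:{self.object_type}:{self.object_id.hex()}"


def parse_swhid(text: str) -> ParsedSwhid:
    """Parse a core SWHID string once, at the input boundary."""
    scheme, version, object_type, hex_id = text.split(":")
    if scheme != "swh" or version != "1" or len(hex_id) != 40:
        raise ValueError(f"not a valid core SWHID: {text!r}")
    return ParsedSwhid(object_type, bytes.fromhex(hex_id))


def group_by_type(swhids: Iterable[ParsedSwhid]) -> Dict[str, Set[bytes]]:
    """Bucket object ids by type so hashes of different kinds never mix."""
    grouped: Dict[str, Set[bytes]] = {}
    for swhid in swhids:
        grouped.setdefault(swhid.object_type, set()).add(swhid.object_id)
    return grouped
```

With this shape, hex strings appear only when `str()` is called on a parsed value, and an id that happens to occur under two object types ends up in two separate buckets rather than in one shared set.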

Diff Detail

Repository
rDWAPPS Web applications
Branch
lookup_missing_hashes-bytes
Lint
No Linters Available
Unit
No Unit Test Coverage
Build Status
Buildable 29160
Build 45592: Phabricator diff pipeline on jenkins
Build 45591: arc lint + arc unit

Unit Tests: Failed

Time  Test
319 ms  Jenkins > .tox.py3.lib.python3.7.site-packages.swh.web.tests.api.test_apiresponse::Tests / Python tests / test_api_endpoints_have_cors_headers
client = <django.test.client.Client object at 0x7f869114ef28> content = {'blake2s256': '574bff42d27168b2ae7d6d62ce7b36a79b2c8514ca4d82271eb470c7b962ebe1', 'data': '/*\n** SGI FREE SOFTWARE L...Elements( TESStesselator *tess )\n{\n\treturn tess->elements;\n}\n', 'encoding': 'us-ascii', 'hljs_language': 'c', ...} directory = '476daad0ea1406747bce46ad1649f6a7438724a5'
185 ms  Jenkins > .tox.py3.lib.python3.7.site-packages.swh.web.tests.api.views.test_identifiers::Tests / Python tests / test_api_known_swhid_all_present
api_client = <rest_framework.test.APIClient object at 0x7f8690b630f0> content = {'blake2s256': 'c6518d8e2ceb0ec1c45e4ebfc626a1ff8216e6e1a9d0b2fdbaeea8fa209c275f', 'data': '/*\n** SGI FREE SOFTWARE L...lse\nvoid tessMeshCheckMesh( TESSmesh *mesh );\n#endif\n\n#endif\n', 'encoding': 'us-ascii', 'hljs_language': 'c', ...} directory = 'e34b4b770ac7e0976a6d4a37c62e2ecce5029ec2'
180 ms  Jenkins > .tox.py3.lib.python3.7.site-packages.swh.web.tests.api.views.test_identifiers::Tests / Python tests / test_api_known_swhid_same_hash
api_client = <rest_framework.test.APIClient object at 0x7f86911e4fd0> content = {'blake2s256': '20eb4518a27c398e33454ecb12587fc011ece0e56cd7073f82ef110e32cf80c7', 'data': '/*\r\n** SGI FREE SOFTWARE...r\n\t\tv->t = Interpolate( z1, o2->t, z2, d2->t );\r\n\t}\r\n}\r\n', 'encoding': 'us-ascii', 'hljs_language': 'c', ...}
312 ms  Jenkins > .tox.py3.lib.python3.7.site-packages.swh.web.tests.api.views.test_identifiers::Tests / Python tests / test_api_known_swhid_some_present
api_client = <rest_framework.test.APIClient object at 0x7f8690c00b38> content = {'blake2s256': 'c6518d8e2ceb0ec1c45e4ebfc626a1ff8216e6e1a9d0b2fdbaeea8fa209c275f', 'data': '/*\n** SGI FREE SOFTWARE L...lse\nvoid tessMeshCheckMesh( TESSmesh *mesh );\n#endif\n\n#endif\n', 'encoding': 'us-ascii', 'hljs_language': 'c', ...} directory = 'e34b4b770ac7e0976a6d4a37c62e2ecce5029ec2'
181 ms  Jenkins > .tox.py3.lib.python3.7.site-packages.swh.web.tests.common.test_archive::Tests / Python tests / test_lookup_missing_hashes_some_present
content = {'blake2s256': '390cd78a6461faf060de75f92c60f2f061f5bde15fce5c3b5c66d9f5e0fab203', 'data': '.hljs{display:block;overfl...avascript,.xml .vbscript,.xml .css,.xml .hljs-cdata{opacity:0.5}', 'encoding': 'us-ascii', 'hljs_language': 'css', ...} directory = '0c538a4e9309aedaa83c6f4e3a19017edb758e73'
Full test results: 5 Failed · 970 Passed · 5 Skipped

Event Timeline

This conflicts with D7748, but only superficially. I am hedging my bets on which will get past CI first by basing them both on master :). Whichever lands first, I will then rebase the other on top of it.

Build has FAILED

Patch application report for D7749 (id=28019)

Rebasing onto 468dda170e...

Current branch diff-target is up to date.
Changes applied before test
commit a8cee44ed55ac7f4627a58c6a11391d4590f6795
Author: John Ericson <John.Ericson@Obsidian.Systems>
Date:   Thu May 5 12:20:51 2022 -0400

    Make archive.lookup_missing_hashes output bytes
    
    All things equal, I think the bytes representation is better, and in
    this case it works well for existing callers too.

Link to build: https://jenkins.softwareheritage.org/job/DWAPPS/job/tests-on-diff/1783/
See console output for more information: https://jenkins.softwareheritage.org/job/DWAPPS/job/tests-on-diff/1783/console

Harbormaster returned this revision to the author for changes because remote builds failed. (May 5 2022, 6:50 PM)
Harbormaster failed remote builds in B29095: Diff 28019!

Changed `directory` to `directory.id` in the test, hopefully fixing the failure.

Build has FAILED

Patch application report for D7749 (id=28020)

Rebasing onto 468dda170e...

Current branch diff-target is up to date.
Changes applied before test
commit f3b80574ad3dbf8452c5e799052c9b2a80a6d1f7
Author: John Ericson <John.Ericson@Obsidian.Systems>
Date:   Thu May 5 12:20:51 2022 -0400

    Make archive.lookup_missing_hashes output bytes
    
    All things equal, I think the bytes representation is better, and in
    this case it works well for existing callers too.

Link to build: https://jenkins.softwareheritage.org/job/DWAPPS/job/tests-on-diff/1784/
See console output for more information: https://jenkins.softwareheritage.org/job/DWAPPS/job/tests-on-diff/1784/console

Harbormaster returned this revision to the author for changes because remote builds failed. (May 5 2022, 7:40 PM)
Harbormaster failed remote builds in B29096: Diff 28020!

Build has FAILED

Patch application report for D7749 (id=28077)

Rebasing onto e6a8303eef...

Current branch diff-target is up to date.
Changes applied before test
commit 68f1a4376d482f7efd432b6dc87030755c33acfc
Author: John Ericson <John.Ericson@Obsidian.Systems>
Date:   Thu May 5 14:48:07 2022 -0400

    Overhaul `lookup_missing_hashes`
    
    Keep hashes separated by type to make bugs less likely.

commit 9825ad425f2960e8449bdd85378125ddb585eb3e
Author: John Ericson <John.Ericson@Obsidian.Systems>
Date:   Thu May 5 14:59:12 2022 -0400

    Tweak `api_swhid_known` for perf and avoiding strings
    
    By shuffling around the algorithm, we avoid a `hash_to_bytes` and work
    more with the structured data.

commit b3c82465a688e3f6bdea7a1568e3993344c9a229
Author: John Ericson <John.Ericson@Obsidian.Systems>
Date:   Thu May 5 12:20:51 2022 -0400

    Make archive.lookup_missing_hashes output bytes
    
    All things equal, I think the bytes representation is better, and in
    this case it works well for existing callers too.

Link to build: https://jenkins.softwareheritage.org/job/DWAPPS/job/tests-on-diff/1796/
See console output for more information: https://jenkins.softwareheritage.org/job/DWAPPS/job/tests-on-diff/1796/console

Harbormaster returned this revision to the author for changes because remote builds failed. (May 6 2022, 4:31 PM)
Harbormaster failed remote builds in B29149: Diff 28077!
Ericson2314 retitled this revision from "Make archive.lookup_missing_hashes output bytes" to "Cleanup `archive.lookup_missing_hashes` and `api_swhid_known`". (May 6 2022, 4:32 PM)
Ericson2314 edited the summary of this revision.
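
The "keep hashes separated by type" idea from the commit messages above could look roughly like the following sketch. It is an illustration under stated assumptions, not the implementation in this diff: `lookup_missing_by_type`, `MissingFn`, and `make_in_memory_backend` are hypothetical names, and the in-memory backend merely stands in for whatever storage call the real code uses.

```python
from typing import Callable, Dict, Iterable, Set

# Hypothetical signature: given ids of ONE object type, yield those not found.
MissingFn = Callable[[Iterable[bytes]], Iterable[bytes]]


def lookup_missing_by_type(
    grouped: Dict[str, Set[bytes]],
    missing_fns: Dict[str, MissingFn],
) -> Dict[str, Set[bytes]]:
    """Return, per object type, the ids the backend reports as missing."""
    missing: Dict[str, Set[bytes]] = {}
    for object_type, ids in grouped.items():
        # The backend may hand back a lazy iterable, so collect it into a set.
        missing[object_type] = set(missing_fns[object_type](ids))
    return missing


def make_in_memory_backend(known: Set[bytes]) -> MissingFn:
    """Toy backend for this sketch: anything not in `known` is missing."""
    return lambda ids: (i for i in ids if i not in known)


# Usage: the same raw hash is present as a directory but missing as a content.
h1 = bytes.fromhex("11" * 20)
h2 = bytes.fromhex("22" * 20)
backends = {
    "dir": make_in_memory_backend({h1}),
    "cnt": make_in_memory_backend(set()),
}
result = lookup_missing_by_type({"dir": {h1, h2}, "cnt": {h1}}, backends)
```

Because the result is keyed by object type and holds raw bytes, a caller can tell "missing as a content" apart from "missing as a directory" even when the two hashes are byte-for-byte identical.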

Build has FAILED

Patch application report for D7749 (id=28078)

Rebasing onto e6a8303eef...

Current branch diff-target is up to date.
Changes applied before test
commit fec74e69e6fd54e03d5e66ed4056b599824fe5da
Author: John Ericson <John.Ericson@Obsidian.Systems>
Date:   Thu May 5 14:48:07 2022 -0400

    Overhaul `lookup_missing_hashes`
    
    Keep hashes separated by type to make bugs less likely.

commit 9825ad425f2960e8449bdd85378125ddb585eb3e
Author: John Ericson <John.Ericson@Obsidian.Systems>
Date:   Thu May 5 14:59:12 2022 -0400

    Tweak `api_swhid_known` for perf and avoiding strings
    
    By shuffling around the algorithm, we avoid a `hash_to_bytes` and work
    more with the structured data.

commit b3c82465a688e3f6bdea7a1568e3993344c9a229
Author: John Ericson <John.Ericson@Obsidian.Systems>
Date:   Thu May 5 12:20:51 2022 -0400

    Make archive.lookup_missing_hashes output bytes
    
    All things equal, I think the bytes representation is better, and in
    this case it works well for existing callers too.

Link to build: https://jenkins.softwareheritage.org/job/DWAPPS/job/tests-on-diff/1797/
See console output for more information: https://jenkins.softwareheritage.org/job/DWAPPS/job/tests-on-diff/1797/console

Harbormaster returned this revision to the author for changes because remote builds failed. (May 6 2022, 4:39 PM)
Harbormaster failed remote builds in B29150: Diff 28078!

See if factoring out a function makes mypy happy

Build has FAILED

Patch application report for D7749 (id=28080)

Rebasing onto e6a8303eef...

Current branch diff-target is up to date.
Changes applied before test
commit 8f0b3d92456954ce01d6e39178f90d0b3200a4b6
Author: John Ericson <John.Ericson@Obsidian.Systems>
Date:   Thu May 5 14:48:07 2022 -0400

    Overhaul `lookup_missing_hashes`
    
    Keep hashes separated by type to make bugs less likely.

commit 9825ad425f2960e8449bdd85378125ddb585eb3e
Author: John Ericson <John.Ericson@Obsidian.Systems>
Date:   Thu May 5 14:59:12 2022 -0400

    Tweak `api_swhid_known` for perf and avoiding strings
    
    By shuffling around the algorithm, we avoid a `hash_to_bytes` and work
    more with the structured data.

commit b3c82465a688e3f6bdea7a1568e3993344c9a229
Author: John Ericson <John.Ericson@Obsidian.Systems>
Date:   Thu May 5 12:20:51 2022 -0400

    Make archive.lookup_missing_hashes output bytes
    
    All things equal, I think the bytes representation is better, and in
    this case it works well for existing callers too.

Link to build: https://jenkins.softwareheritage.org/job/DWAPPS/job/tests-on-diff/1799/
See console output for more information: https://jenkins.softwareheritage.org/job/DWAPPS/job/tests-on-diff/1799/console

Harbormaster returned this revision to the author for changes because remote builds failed. (May 6 2022, 5:01 PM)
Harbormaster failed remote builds in B29152: Diff 28080!

  • Fix error (had set comprehension not map!)
  • Add TODO about leveraging D7751 once it lands

Build has FAILED

Patch application report for D7749 (id=28084)

Rebasing onto e6a8303eef...

Current branch diff-target is up to date.
Changes applied before test
commit 96e128796b3c5f93e8d75878b8391136bdf2f00a
Author: John Ericson <John.Ericson@Obsidian.Systems>
Date:   Thu May 5 14:48:07 2022 -0400

    Overhaul `lookup_missing_hashes`
    
    Keep hashes separated by type to make bugs less likely.

commit 9825ad425f2960e8449bdd85378125ddb585eb3e
Author: John Ericson <John.Ericson@Obsidian.Systems>
Date:   Thu May 5 14:59:12 2022 -0400

    Tweak `api_swhid_known` for perf and avoiding strings
    
    By shuffling around the algorithm, we avoid a `hash_to_bytes` and work
    more with the structured data.

commit b3c82465a688e3f6bdea7a1568e3993344c9a229
Author: John Ericson <John.Ericson@Obsidian.Systems>
Date:   Thu May 5 12:20:51 2022 -0400

    Make archive.lookup_missing_hashes output bytes
    
    All things equal, I think the bytes representation is better, and in
    this case it works well for existing callers too.

Link to build: https://jenkins.softwareheritage.org/job/DWAPPS/job/tests-on-diff/1800/
See console output for more information: https://jenkins.softwareheritage.org/job/DWAPPS/job/tests-on-diff/1800/console

Harbormaster returned this revision to the author for changes because remote builds failed. (May 6 2022, 5:36 PM)
Harbormaster failed remote builds in B29156: Diff 28084!

Remember to collect the iterable into a set
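
The pitfall this note refers to is probably the usual one with Python's built-in `map`, which returns a one-shot lazy iterator rather than a collection. The snippet below is a generic illustration with made-up hex ids, not code taken from the diff.

```python
hex_ids = ["aa" * 20, "bb" * 20, "aa" * 20]

# map() returns a lazy, single-use iterator: no membership tests that can be
# repeated, no len(), and it is empty after one pass.
lazy_ids = map(bytes.fromhex, hex_ids)

# Collect it exactly once into a real set before using it as one.
id_set = set(lazy_ids)

print(len(id_set))                        # duplicates collapse in the set
print(bytes.fromhex("aa" * 20) in id_set)
print(list(lazy_ids))                     # the iterator is already exhausted
```

Annotating the collected value as `Set[bytes]` rather than `Iterable[bytes]` also lets a type checker such as mypy flag call sites that still expect set operations on the uncollected iterator.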

Build has FAILED

Patch application report for D7749 (id=28087)

Rebasing onto e6a8303eef...

Current branch diff-target is up to date.
Changes applied before test
commit cf84ac6e03d3785264243e8b166a96541d36647c
Author: John Ericson <John.Ericson@Obsidian.Systems>
Date:   Thu May 5 14:48:07 2022 -0400

    Overhaul `lookup_missing_hashes`
    
    Keep hashes separated by type to make bugs less likely.

commit 9825ad425f2960e8449bdd85378125ddb585eb3e
Author: John Ericson <John.Ericson@Obsidian.Systems>
Date:   Thu May 5 14:59:12 2022 -0400

    Tweak `api_swhid_known` for perf and avoiding strings
    
    By shuffling around the algorithm, we avoid a `hash_to_bytes` and work
    more with the structured data.

commit b3c82465a688e3f6bdea7a1568e3993344c9a229
Author: John Ericson <John.Ericson@Obsidian.Systems>
Date:   Thu May 5 12:20:51 2022 -0400

    Make archive.lookup_missing_hashes output bytes
    
    All things equal, I think the bytes representation is better, and in
    this case it works well for existing callers too.

Link to build: https://jenkins.softwareheritage.org/job/DWAPPS/job/tests-on-diff/1801/
See console output for more information: https://jenkins.softwareheritage.org/job/DWAPPS/job/tests-on-diff/1801/console

Harbormaster returned this revision to the author for changes because remote builds failed. (May 6 2022, 7:21 PM)
Harbormaster failed remote builds in B29159: Diff 28087!

Build has FAILED

Patch application report for D7749 (id=28088)

Rebasing onto e6a8303eef...

Current branch diff-target is up to date.
Changes applied before test
commit cf84ac6e03d3785264243e8b166a96541d36647c
Author: John Ericson <John.Ericson@Obsidian.Systems>
Date:   Thu May 5 14:48:07 2022 -0400

    Overhaul `lookup_missing_hashes`
    
    Keep hashes separated by type to make bugs less likely.

commit 9825ad425f2960e8449bdd85378125ddb585eb3e
Author: John Ericson <John.Ericson@Obsidian.Systems>
Date:   Thu May 5 14:59:12 2022 -0400

    Tweak `api_swhid_known` for perf and avoiding strings
    
    By shuffling around the algorithm, we avoid a `hash_to_bytes` and work
    more with the structured data.

commit b3c82465a688e3f6bdea7a1568e3993344c9a229
Author: John Ericson <John.Ericson@Obsidian.Systems>
Date:   Thu May 5 12:20:51 2022 -0400

    Make archive.lookup_missing_hashes output bytes
    
    All things equal, I think the bytes representation is better, and in
    this case it works well for existing callers too.

Link to build: https://jenkins.softwareheritage.org/job/DWAPPS/job/tests-on-diff/1802/
See console output for more information: https://jenkins.softwareheritage.org/job/DWAPPS/job/tests-on-diff/1802/console

Harbormaster returned this revision to the author for changes because remote builds failed. (May 7 2022, 7:43 PM)
Harbormaster failed remote builds in B29160: Diff 28088!