
Cleanup `archive.lookup_missing_hashes` and `api_swhid_known`
Changes Planned · Public · Draft

Authored by Ericson2314 on May 5 2022, 6:29 PM.

Details

Reviewers
vlorentz
Group Reviewers
Reviewers
Summary

Each commit is self-contained and has its own description, but the basic idea is to:

  • Work with structured information rather than strings for as long as possible
  • Avoid mixing hashes of different types of objects together
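
To illustrate the two goals above, here is a minimal sketch, not the code from this diff. The names `ObjectType`, `group_by_type`, and `lookup_missing` are hypothetical, and the set of "present" hashes is hard-coded rather than queried from the archive. Hashes stay as `bytes` and are grouped per object type, so a content and a directory that happen to share the same hash value cannot be confused.

```python
from enum import Enum
from typing import Dict, Iterable, Set, Tuple


class ObjectType(Enum):
    # Hypothetical stand-in for the archive's object kinds.
    CONTENT = "cnt"
    DIRECTORY = "dir"


def group_by_type(
    pairs: Iterable[Tuple[ObjectType, bytes]]
) -> Dict[ObjectType, Set[bytes]]:
    """Group raw hash bytes by object type instead of pooling them together."""
    grouped: Dict[ObjectType, Set[bytes]] = {}
    for object_type, object_id in pairs:
        grouped.setdefault(object_type, set()).add(object_id)
    return grouped


def lookup_missing(
    requested: Dict[ObjectType, Set[bytes]],
    present: Dict[ObjectType, Set[bytes]],
) -> Dict[ObjectType, Set[bytes]]:
    """Return, per type, the requested hashes that are absent from `present`."""
    return {
        object_type: hashes - present.get(object_type, set())
        for object_type, hashes in requested.items()
    }


# The same hash value is requested both as a content and as a directory.
same = bytes.fromhex("5b61d50ef35ca9a4618a3572bde947b8cccf71ad")
requested = group_by_type(
    [(ObjectType.CONTENT, same), (ObjectType.DIRECTORY, same)]
)
present = {ObjectType.CONTENT: {same}}  # assumed archive state for the demo

missing = lookup_missing(requested, present)
# Convert to hex strings only at the boundary, e.g. when building a response.
missing_hex = {t.value: sorted(h.hex() for h in hs) for t, hs in missing.items()}
```

With a single pooled set of hashes, the directory lookup here would wrongly appear satisfied by the content of the same hash; keeping one set per type reports it as missing.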

Diff Detail

Repository
rDWAPPS Web applications
Branch
lookup_missing_hashes-bytes
Lint
No Linters Available
Unit
No Unit Test Coverage
Build Status
Buildable 29156
Build 45586: Phabricator diff pipeline on jenkins (Jenkins console · Jenkins)
Build 45585: arc lint + arc unit

Unit Tests: Failed

Time  Test
319 ms  Jenkins > .tox.py3.lib.python3.7.site-packages.swh.web.tests.api.test_apiresponse::Tests / Python tests / test_api_endpoints_have_cors_headers
client = <django.test.client.Client object at 0x7fe21bcd9748> content = {'blake2s256': '833ec71111b41a49f0b96ec129b41dfe7ee3c38a50d54c20f5969d89128c885c', 'data': "// jshint multistr:true\n\...gs[n] : m;\n });\n }\n\n}(window, document));\n", 'encoding': 'us-ascii', 'hljs_language': 'javascript', ...} directory = '139f4a175f1af016a151dc4f1e4fd0886a512f44'
187 ms  Jenkins > .tox.py3.lib.python3.7.site-packages.swh.web.tests.api.views.test_identifiers::Tests / Python tests / test_api_known_swhid_all_present
api_client = <rest_framework.test.APIClient object at 0x7fe21a145518> content = {'blake2s256': '819b04890457029a003db7c4a66c0184ddf0719dfa1ebb6e8bc89761b5bc4a29', 'data': '// !$*UTF8*$!\n{\n\tarchiv...ject = 29B97313FDCFA39411CA2CEA /* Project object */;\n}\n', 'encoding': 'us-ascii', 'hljs_language': 'plaintext', ...} directory = '5b61d50ef35ca9a4618a3572bde947b8cccf71ad'
338 ms  Jenkins > .tox.py3.lib.python3.7.site-packages.swh.web.tests.api.views.test_identifiers::Tests / Python tests / test_api_known_swhid_same_hash
api_client = <rest_framework.test.APIClient object at 0x7fe21ab830f0> content = {'blake2s256': '819b04890457029a003db7c4a66c0184ddf0719dfa1ebb6e8bc89761b5bc4a29', 'data': '// !$*UTF8*$!\n{\n\tarchiv...ject = 29B97313FDCFA39411CA2CEA /* Project object */;\n}\n', 'encoding': 'us-ascii', 'hljs_language': 'plaintext', ...}
185 ms  Jenkins > .tox.py3.lib.python3.7.site-packages.swh.web.tests.api.views.test_identifiers::Tests / Python tests / test_api_known_swhid_some_present
api_client = <rest_framework.test.APIClient object at 0x7fe219fff048> content = {'blake2s256': '819b04890457029a003db7c4a66c0184ddf0719dfa1ebb6e8bc89761b5bc4a29', 'data': '// !$*UTF8*$!\n{\n\tarchiv...ject = 29B97313FDCFA39411CA2CEA /* Project object */;\n}\n', 'encoding': 'us-ascii', 'hljs_language': 'plaintext', ...} directory = '5b61d50ef35ca9a4618a3572bde947b8cccf71ad'
186 ms  Jenkins > .tox.py3.lib.python3.7.site-packages.swh.web.tests.common.test_archive::Tests / Python tests / test_lookup_missing_hashes_non_present
def test_lookup_missing_hashes_non_present(): missing_cnt = random_sha1_bytes() missing_dir = random_sha1_bytes()
View Full Test Results (6 Failed · 969 Passed · 5 Skipped)

Event Timeline

This conflicts with D7748, but only superficially. I am hedging my bets on which will get past CI first by basing them both on master :). Whichever lands first, I will then rebase the other on top.

Build has FAILED

Patch application report for D7749 (id=28019)

Rebasing onto 468dda170e...

Current branch diff-target is up to date.
Changes applied before test
commit a8cee44ed55ac7f4627a58c6a11391d4590f6795
Author: John Ericson <John.Ericson@Obsidian.Systems>
Date:   Thu May 5 12:20:51 2022 -0400

    Make archive.lookup_missing_hashes output bytes
    
    All things equal, I think the bytes representation is better, and in
    this case it works well for existing callers too.

Link to build: https://jenkins.softwareheritage.org/job/DWAPPS/job/tests-on-diff/1783/
See console output for more information: https://jenkins.softwareheritage.org/job/DWAPPS/job/tests-on-diff/1783/console

Harbormaster returned this revision to the author for changes because remote builds failed. (May 5 2022, 6:50 PM)
Harbormaster failed remote builds in B29095: Diff 28019!

Changed `directory` to `directory.id` in the test, hopefully fixing the failure.

Build has FAILED

Patch application report for D7749 (id=28020)

Rebasing onto 468dda170e...

Current branch diff-target is up to date.
Changes applied before test
commit f3b80574ad3dbf8452c5e799052c9b2a80a6d1f7
Author: John Ericson <John.Ericson@Obsidian.Systems>
Date:   Thu May 5 12:20:51 2022 -0400

    Make archive.lookup_missing_hashes output bytes
    
    All things equal, I think the bytes representation is better, and in
    this case it works well for existing callers too.

Link to build: https://jenkins.softwareheritage.org/job/DWAPPS/job/tests-on-diff/1784/
See console output for more information: https://jenkins.softwareheritage.org/job/DWAPPS/job/tests-on-diff/1784/console

Harbormaster returned this revision to the author for changes because remote builds failed. (May 5 2022, 7:40 PM)
Harbormaster failed remote builds in B29096: Diff 28020!

Build has FAILED

Patch application report for D7749 (id=28077)

Rebasing onto e6a8303eef...

Current branch diff-target is up to date.
Changes applied before test
commit 68f1a4376d482f7efd432b6dc87030755c33acfc
Author: John Ericson <John.Ericson@Obsidian.Systems>
Date:   Thu May 5 14:48:07 2022 -0400

    Overhaul `lookup_missing_hashes`
    
    Keep hashes separated by type to make bugs less likely.

commit 9825ad425f2960e8449bdd85378125ddb585eb3e
Author: John Ericson <John.Ericson@Obsidian.Systems>
Date:   Thu May 5 14:59:12 2022 -0400

    Tweak `api_swhid_known` for perf and avoiding strings
    
    By shuffling around the algorithm, we avoid a `hash_to_bytes` and work
    more with the structured data.

commit b3c82465a688e3f6bdea7a1568e3993344c9a229
Author: John Ericson <John.Ericson@Obsidian.Systems>
Date:   Thu May 5 12:20:51 2022 -0400

    Make archive.lookup_missing_hashes output bytes
    
    All things equal, I think the bytes representation is better, and in
    this case it works well for existing callers too.

Link to build: https://jenkins.softwareheritage.org/job/DWAPPS/job/tests-on-diff/1796/
See console output for more information: https://jenkins.softwareheritage.org/job/DWAPPS/job/tests-on-diff/1796/console

Harbormaster returned this revision to the author for changes because remote builds failed. (May 6 2022, 4:31 PM)
Harbormaster failed remote builds in B29149: Diff 28077!
Ericson2314 retitled this revision from "Make archive.lookup_missing_hashes output bytes" to "Cleanup `archive.lookup_missing_hashes` and `api_swhid_known`". (May 6 2022, 4:32 PM)
Ericson2314 edited the summary of this revision.

Build has FAILED

Patch application report for D7749 (id=28078)

Rebasing onto e6a8303eef...

Current branch diff-target is up to date.
Changes applied before test
commit fec74e69e6fd54e03d5e66ed4056b599824fe5da
Author: John Ericson <John.Ericson@Obsidian.Systems>
Date:   Thu May 5 14:48:07 2022 -0400

    Overhaul `lookup_missing_hashes`
    
    Keep hashes separated by type to make bugs less likely.

commit 9825ad425f2960e8449bdd85378125ddb585eb3e
Author: John Ericson <John.Ericson@Obsidian.Systems>
Date:   Thu May 5 14:59:12 2022 -0400

    Tweak `api_swhid_known` for perf and avoiding strings
    
    By shuffling around the algorithm, we avoid a `hash_to_bytes` and work
    more with the structured data.

commit b3c82465a688e3f6bdea7a1568e3993344c9a229
Author: John Ericson <John.Ericson@Obsidian.Systems>
Date:   Thu May 5 12:20:51 2022 -0400

    Make archive.lookup_missing_hashes output bytes
    
    All things equal, I think the bytes representation is better, and in
    this case it works well for existing callers too.

Link to build: https://jenkins.softwareheritage.org/job/DWAPPS/job/tests-on-diff/1797/
See console output for more information: https://jenkins.softwareheritage.org/job/DWAPPS/job/tests-on-diff/1797/console

Harbormaster returned this revision to the author for changes because remote builds failed. (May 6 2022, 4:39 PM)
Harbormaster failed remote builds in B29150: Diff 28078!

See if factoring out function makes mypy happy

Build has FAILED

Patch application report for D7749 (id=28080)

Rebasing onto e6a8303eef...

Current branch diff-target is up to date.
Changes applied before test
commit 8f0b3d92456954ce01d6e39178f90d0b3200a4b6
Author: John Ericson <John.Ericson@Obsidian.Systems>
Date:   Thu May 5 14:48:07 2022 -0400

    Overhaul `lookup_missing_hashes`
    
    Keep hashes separated by type to make bugs less likely.

commit 9825ad425f2960e8449bdd85378125ddb585eb3e
Author: John Ericson <John.Ericson@Obsidian.Systems>
Date:   Thu May 5 14:59:12 2022 -0400

    Tweak `api_swhid_known` for perf and avoiding strings
    
    By shuffling around the algorithm, we avoid a `hash_to_bytes` and work
    more with the structured data.

commit b3c82465a688e3f6bdea7a1568e3993344c9a229
Author: John Ericson <John.Ericson@Obsidian.Systems>
Date:   Thu May 5 12:20:51 2022 -0400

    Make archive.lookup_missing_hashes output bytes
    
    All things equal, I think the bytes representation is better, and in
    this case it works well for existing callers too.

Link to build: https://jenkins.softwareheritage.org/job/DWAPPS/job/tests-on-diff/1799/
See console output for more information: https://jenkins.softwareheritage.org/job/DWAPPS/job/tests-on-diff/1799/console

Harbormaster returned this revision to the author for changes because remote builds failed. (May 6 2022, 5:01 PM)
Harbormaster failed remote builds in B29152: Diff 28080!

  • Fix error (had set comprehension not map!)
  • Add TODO about leveraging D7751 once it lands

Build has FAILED

Patch application report for D7749 (id=28084)

Rebasing onto e6a8303eef...

Current branch diff-target is up to date.
Changes applied before test
commit 96e128796b3c5f93e8d75878b8391136bdf2f00a
Author: John Ericson <John.Ericson@Obsidian.Systems>
Date:   Thu May 5 14:48:07 2022 -0400

    Overhaul `lookup_missing_hashes`
    
    Keep hashes separated by type to make bugs less likely.

commit 9825ad425f2960e8449bdd85378125ddb585eb3e
Author: John Ericson <John.Ericson@Obsidian.Systems>
Date:   Thu May 5 14:59:12 2022 -0400

    Tweak `api_swhid_known` for perf and avoiding strings
    
    By shuffling around the algorithm, we avoid a `hash_to_bytes` and work
    more with the structured data.

commit b3c82465a688e3f6bdea7a1568e3993344c9a229
Author: John Ericson <John.Ericson@Obsidian.Systems>
Date:   Thu May 5 12:20:51 2022 -0400

    Make archive.lookup_missing_hashes output bytes
    
    All things equal, I think the bytes representation is better, and in
    this case it works well for existing callers too.

Link to build: https://jenkins.softwareheritage.org/job/DWAPPS/job/tests-on-diff/1800/
See console output for more information: https://jenkins.softwareheritage.org/job/DWAPPS/job/tests-on-diff/1800/console

Harbormaster returned this revision to the author for changes because remote builds failed. (May 6 2022, 5:36 PM)
Harbormaster failed remote builds in B29156: Diff 28084!

Remember to collect iterable into set
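
As a general illustration of why this matters (a sketch with made-up values, not the code from this diff): in Python 3, `map` returns a lazy, single-use iterator, so it has to be collected into a `set` before it can be used for repeated membership tests or set operations.

```python
# Made-up hex identifiers, including one duplicate.
hex_ids = ["00ff", "00ff", "abcd"]

# `map` is lazy and can only be consumed once.
lazy = map(bytes.fromhex, hex_ids)
first_pass = list(lazy)   # consumes the iterator
second_pass = list(lazy)  # the iterator is now exhausted

# Collecting into a set gives a reusable, de-duplicated container.
as_set = set(map(bytes.fromhex, hex_ids))
known = {bytes.fromhex("00ff")}
unknown = as_set - known  # set difference needs a real set, not an iterator
```

Passing the bare `map` object where a set is expected typically fails type checking, or silently yields nothing on a second traversal.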

Build has FAILED

Patch application report for D7749 (id=28087)

Rebasing onto e6a8303eef...

Current branch diff-target is up to date.
Changes applied before test
commit cf84ac6e03d3785264243e8b166a96541d36647c
Author: John Ericson <John.Ericson@Obsidian.Systems>
Date:   Thu May 5 14:48:07 2022 -0400

    Overhaul `lookup_missing_hashes`
    
    Keep hashes separated by type to make bugs less likely.

commit 9825ad425f2960e8449bdd85378125ddb585eb3e
Author: John Ericson <John.Ericson@Obsidian.Systems>
Date:   Thu May 5 14:59:12 2022 -0400

    Tweak `api_swhid_known` for perf and avoiding strings
    
    By shuffling around the algorithm, we avoid a `hash_to_bytes` and work
    more with the structured data.

commit b3c82465a688e3f6bdea7a1568e3993344c9a229
Author: John Ericson <John.Ericson@Obsidian.Systems>
Date:   Thu May 5 12:20:51 2022 -0400

    Make archive.lookup_missing_hashes output bytes
    
    All things equal, I think the bytes representation is better, and in
    this case it works well for existing callers too.

Link to build: https://jenkins.softwareheritage.org/job/DWAPPS/job/tests-on-diff/1801/
See console output for more information: https://jenkins.softwareheritage.org/job/DWAPPS/job/tests-on-diff/1801/console

Harbormaster returned this revision to the author for changes because remote builds failed. (May 6 2022, 7:21 PM)
Harbormaster failed remote builds in B29159: Diff 28087!

Build has FAILED

Patch application report for D7749 (id=28088)

Rebasing onto e6a8303eef...

Current branch diff-target is up to date.
Changes applied before test
commit cf84ac6e03d3785264243e8b166a96541d36647c
Author: John Ericson <John.Ericson@Obsidian.Systems>
Date:   Thu May 5 14:48:07 2022 -0400

    Overhaul `lookup_missing_hashes`
    
    Keep hashes separated by type to make bugs less likely.

commit 9825ad425f2960e8449bdd85378125ddb585eb3e
Author: John Ericson <John.Ericson@Obsidian.Systems>
Date:   Thu May 5 14:59:12 2022 -0400

    Tweak `api_swhid_known` for perf and avoiding strings
    
    By shuffling around the algorithm, we avoid a `hash_to_bytes` and work
    more with the structured data.

commit b3c82465a688e3f6bdea7a1568e3993344c9a229
Author: John Ericson <John.Ericson@Obsidian.Systems>
Date:   Thu May 5 12:20:51 2022 -0400

    Make archive.lookup_missing_hashes output bytes
    
    All things equal, I think the bytes representation is better, and in
    this case it works well for existing callers too.

Link to build: https://jenkins.softwareheritage.org/job/DWAPPS/job/tests-on-diff/1802/
See console output for more information: https://jenkins.softwareheritage.org/job/DWAPPS/job/tests-on-diff/1802/console

Harbormaster returned this revision to the author for changes because remote builds failed. (May 7 2022, 7:43 PM)
Harbormaster failed remote builds in B29160: Diff 28088!