
Cleanup `archive.lookup_missing_hashes` and `api_swhid_known`
Changes Planned · Public · Draft

Authored by Ericson2314 on May 5 2022, 6:29 PM.

Details

Reviewers
vlorentz
Group Reviewers
Reviewers
Summary

Each commit is self-contained and has its own description. The basic idea is to:

  • Work with structured information rather than strings for as long as possible
  • Avoid mixing hashes of different types of objects together
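
To illustrate the two goals above, here is a minimal sketch in plain Python. It is not the code from this diff: `ObjectType`, `ParsedId`, `parse_swhid`, `group_by_type`, and `lookup_missing` are hypothetical stand-ins, the parser handles only the simple `swh:1:<type>:<hex>` form without qualifiers, and the "archive" is just an in-memory dict.

```python
from dataclasses import dataclass
from enum import Enum
from typing import Dict, Iterable, Set


class ObjectType(Enum):
    """Hypothetical stand-in for the archive's object types."""

    CONTENT = "cnt"
    DIRECTORY = "dir"
    REVISION = "rev"


@dataclass(frozen=True)
class ParsedId:
    """Structured form of an identifier: a type tag plus the raw hash bytes."""

    object_type: ObjectType
    object_id: bytes


def parse_swhid(text: str) -> ParsedId:
    """Parse 'swh:1:<type>:<hex>' once, at the boundary (simplified, no qualifiers)."""
    scheme, version, type_tag, hex_id = text.split(":")
    if scheme != "swh" or version != "1":
        raise ValueError(f"unsupported identifier: {text!r}")
    return ParsedId(ObjectType(type_tag), bytes.fromhex(hex_id))


def group_by_type(ids: Iterable[ParsedId]) -> Dict[ObjectType, Set[bytes]]:
    """Keep hashes of different object types in separate sets."""
    grouped: Dict[ObjectType, Set[bytes]] = {}
    for parsed in ids:
        grouped.setdefault(parsed.object_type, set()).add(parsed.object_id)
    return grouped


def lookup_missing(
    grouped: Dict[ObjectType, Set[bytes]],
    known: Dict[ObjectType, Set[bytes]],
) -> Dict[ObjectType, Set[bytes]]:
    """Return, per type, the ids absent from `known` (a fake in-memory archive)."""
    return {
        object_type: ids - known.get(object_type, set())
        for object_type, ids in grouped.items()
    }
```

With this shape, the same 20-byte hash queried once as a content and once as a directory stays in two separate sets, so a hit for one type cannot mask a miss for the other, and the values remain `bytes` until something actually needs a string.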

Diff Detail

Repository
rDWAPPS Web applications
Branch
lookup_missing_hashes-bytes
Lint
No Linters Available
Unit
No Unit Test Coverage
Build Status
Buildable 29150
Build 45574: Phabricator diff pipeline on jenkins (Jenkins console · Jenkins)
Build 45573: arc lint + arc unit

Event Timeline

This conflicts with D7748, but only superficially. I am hedging my bet on which will get past CI first by basing them both on master :). Whatever lands first, I will then rebase the other on top.

Build has FAILED

Patch application report for D7749 (id=28019)

Rebasing onto 468dda170e...

Current branch diff-target is up to date.
Changes applied before test
commit a8cee44ed55ac7f4627a58c6a11391d4590f6795
Author: John Ericson <John.Ericson@Obsidian.Systems>
Date:   Thu May 5 12:20:51 2022 -0400

    Make archive.lookup_missing_hashes output bytes
    
    All things equal, I think the bytes representation is better, and in
    this case it works well for existing callers too.

Link to build: https://jenkins.softwareheritage.org/job/DWAPPS/job/tests-on-diff/1783/
See console output for more information: https://jenkins.softwareheritage.org/job/DWAPPS/job/tests-on-diff/1783/console

Harbormaster returned this revision to the author for changes because remote builds failed. May 5 2022, 6:50 PM
Harbormaster failed remote builds in B29095: Diff 28019!

Changed `directory` to `directory.id` in the test, which hopefully fixes the failure.

Build has FAILED

Patch application report for D7749 (id=28020)

Rebasing onto 468dda170e...

Current branch diff-target is up to date.
Changes applied before test
commit f3b80574ad3dbf8452c5e799052c9b2a80a6d1f7
Author: John Ericson <John.Ericson@Obsidian.Systems>
Date:   Thu May 5 12:20:51 2022 -0400

    Make archive.lookup_missing_hashes output bytes
    
    All things equal, I think the bytes representation is better, and in
    this case it works well for existing callers too.

Link to build: https://jenkins.softwareheritage.org/job/DWAPPS/job/tests-on-diff/1784/
See console output for more information: https://jenkins.softwareheritage.org/job/DWAPPS/job/tests-on-diff/1784/console

Harbormaster returned this revision to the author for changes because remote builds failed. May 5 2022, 7:40 PM
Harbormaster failed remote builds in B29096: Diff 28020!

Build has FAILED

Patch application report for D7749 (id=28077)

Rebasing onto e6a8303eef...

Current branch diff-target is up to date.
Changes applied before test
commit 68f1a4376d482f7efd432b6dc87030755c33acfc
Author: John Ericson <John.Ericson@Obsidian.Systems>
Date:   Thu May 5 14:48:07 2022 -0400

    Overhaul `lookup_missing_hashes`
    
    Keep hashes separated by type to make bugs less likely.

commit 9825ad425f2960e8449bdd85378125ddb585eb3e
Author: John Ericson <John.Ericson@Obsidian.Systems>
Date:   Thu May 5 14:59:12 2022 -0400

    Tweak `api_swhid_known` for perf and avoiding strings
    
    By shuffling around the algorithm, we avoid a `hash_to_bytes` and work
    more with the structured data.

commit b3c82465a688e3f6bdea7a1568e3993344c9a229
Author: John Ericson <John.Ericson@Obsidian.Systems>
Date:   Thu May 5 12:20:51 2022 -0400

    Make archive.lookup_missing_hashes output bytes
    
    All things equal, I think the bytes representation is better, and in
    this case it works well for existing callers too.

Link to build: https://jenkins.softwareheritage.org/job/DWAPPS/job/tests-on-diff/1796/
See console output for more information: https://jenkins.softwareheritage.org/job/DWAPPS/job/tests-on-diff/1796/console

Harbormaster returned this revision to the author for changes because remote builds failed. May 6 2022, 4:31 PM
Harbormaster failed remote builds in B29149: Diff 28077!
Ericson2314 retitled this revision from "Make archive.lookup_missing_hashes output bytes" to "Cleanup `archive.lookup_missing_hashes` and `api_swhid_known`". May 6 2022, 4:32 PM
Ericson2314 edited the summary of this revision.

Build has FAILED

Patch application report for D7749 (id=28078)

Rebasing onto e6a8303eef...

Current branch diff-target is up to date.
Changes applied before test
commit fec74e69e6fd54e03d5e66ed4056b599824fe5da
Author: John Ericson <John.Ericson@Obsidian.Systems>
Date:   Thu May 5 14:48:07 2022 -0400

    Overhaul `lookup_missing_hashes`
    
    Keep hashes separated by type to make bugs less likely.

commit 9825ad425f2960e8449bdd85378125ddb585eb3e
Author: John Ericson <John.Ericson@Obsidian.Systems>
Date:   Thu May 5 14:59:12 2022 -0400

    Tweak `api_swhid_known` for perf and avoiding strings
    
    By shuffling around the algorithm, we avoid a `hash_to_bytes` and work
    more with the structured data.

commit b3c82465a688e3f6bdea7a1568e3993344c9a229
Author: John Ericson <John.Ericson@Obsidian.Systems>
Date:   Thu May 5 12:20:51 2022 -0400

    Make archive.lookup_missing_hashes output bytes
    
    All things equal, I think the bytes representation is better, and in
    this case it works well for existing callers too.

Link to build: https://jenkins.softwareheritage.org/job/DWAPPS/job/tests-on-diff/1797/
See console output for more information: https://jenkins.softwareheritage.org/job/DWAPPS/job/tests-on-diff/1797/console

Harbormaster returned this revision to the author for changes because remote builds failed. May 6 2022, 4:39 PM
Harbormaster failed remote builds in B29150: Diff 28078!

See if factoring out a function makes mypy happy.

Build has FAILED

Patch application report for D7749 (id=28080)

Rebasing onto e6a8303eef...

Current branch diff-target is up to date.
Changes applied before test
commit 8f0b3d92456954ce01d6e39178f90d0b3200a4b6
Author: John Ericson <John.Ericson@Obsidian.Systems>
Date:   Thu May 5 14:48:07 2022 -0400

    Overhaul `lookup_missing_hashes`
    
    Keep hashes separated by type to make bugs less likely.

commit 9825ad425f2960e8449bdd85378125ddb585eb3e
Author: John Ericson <John.Ericson@Obsidian.Systems>
Date:   Thu May 5 14:59:12 2022 -0400

    Tweak `api_swhid_known` for perf and avoiding strings
    
    By shuffling around the algorithm, we avoid a `hash_to_bytes` and work
    more with the structured data.

commit b3c82465a688e3f6bdea7a1568e3993344c9a229
Author: John Ericson <John.Ericson@Obsidian.Systems>
Date:   Thu May 5 12:20:51 2022 -0400

    Make archive.lookup_missing_hashes output bytes
    
    All things equal, I think the bytes representation is better, and in
    this case it works well for existing callers too.

Link to build: https://jenkins.softwareheritage.org/job/DWAPPS/job/tests-on-diff/1799/
See console output for more information: https://jenkins.softwareheritage.org/job/DWAPPS/job/tests-on-diff/1799/console

Harbormaster returned this revision to the author for changes because remote builds failed. May 6 2022, 5:01 PM
Harbormaster failed remote builds in B29152: Diff 28080!

  • Fix error (had a set comprehension, not a map!)
  • Add TODO about leveraging D7751 once it lands

Build has FAILED

Patch application report for D7749 (id=28084)

Rebasing onto e6a8303eef...

Current branch diff-target is up to date.
Changes applied before test
commit 96e128796b3c5f93e8d75878b8391136bdf2f00a
Author: John Ericson <John.Ericson@Obsidian.Systems>
Date:   Thu May 5 14:48:07 2022 -0400

    Overhaul `lookup_missing_hashes`
    
    Keep hashes separated by type to make bugs less likely.

commit 9825ad425f2960e8449bdd85378125ddb585eb3e
Author: John Ericson <John.Ericson@Obsidian.Systems>
Date:   Thu May 5 14:59:12 2022 -0400

    Tweak `api_swhid_known` for perf and avoiding strings
    
    By shuffling around the algorithm, we avoid a `hash_to_bytes` and work
    more with the structured data.

commit b3c82465a688e3f6bdea7a1568e3993344c9a229
Author: John Ericson <John.Ericson@Obsidian.Systems>
Date:   Thu May 5 12:20:51 2022 -0400

    Make archive.lookup_missing_hashes output bytes
    
    All things equal, I think the bytes representation is better, and in
    this case it works well for existing callers too.

Link to build: https://jenkins.softwareheritage.org/job/DWAPPS/job/tests-on-diff/1800/
See console output for more information: https://jenkins.softwareheritage.org/job/DWAPPS/job/tests-on-diff/1800/console

Harbormaster returned this revision to the author for changes because remote builds failed. May 6 2022, 5:36 PM
Harbormaster failed remote builds in B29156: Diff 28084!

Remember to collect the iterable into a set.
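
For context on why collecting matters, here is a small, hypothetical illustration (the helper names `to_hex` and `missing_as_set` are made up for this sketch and are not taken from the diff): in Python 3, `map` returns a one-shot lazy iterator, so membership tests consume it, whereas a `set` can be queried repeatedly.

```python
from typing import Iterable, Set


def to_hex(ids: Iterable[bytes]) -> Iterable[str]:
    """Hypothetical helper: lazily convert raw ids to hex strings."""
    return map(bytes.hex, ids)


def missing_as_set(ids: Iterable[bytes]) -> Set[str]:
    """Collect the lazy iterable so it supports repeated membership tests."""
    return set(to_hex(ids))


lazy = to_hex([b"\x01", b"\x02"])
first_check = "02" in lazy   # True: scans the iterator until it finds "02"
second_check = "02" in lazy  # False: the iterator is already exhausted

collected = missing_as_set([b"\x01", b"\x02"])
repeatable = "02" in collected and "02" in collected  # True both times
```

A type checker such as mypy would also tend to flag this kind of mismatch if the function is annotated as returning a `Set` but actually returns the result of `map`.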

Build has FAILED

Patch application report for D7749 (id=28087)

Rebasing onto e6a8303eef...

Current branch diff-target is up to date.
Changes applied before test
commit cf84ac6e03d3785264243e8b166a96541d36647c
Author: John Ericson <John.Ericson@Obsidian.Systems>
Date:   Thu May 5 14:48:07 2022 -0400

    Overhaul `lookup_missing_hashes`
    
    Keep hashes separated by type to make bugs less likely.

commit 9825ad425f2960e8449bdd85378125ddb585eb3e
Author: John Ericson <John.Ericson@Obsidian.Systems>
Date:   Thu May 5 14:59:12 2022 -0400

    Tweak `api_swhid_known` for perf and avoiding strings
    
    By shuffling around the algorithm, we avoid a `hash_to_bytes` and work
    more with the structured data.

commit b3c82465a688e3f6bdea7a1568e3993344c9a229
Author: John Ericson <John.Ericson@Obsidian.Systems>
Date:   Thu May 5 12:20:51 2022 -0400

    Make archive.lookup_missing_hashes output bytes
    
    All things equal, I think the bytes representation is better, and in
    this case it works well for existing callers too.

Link to build: https://jenkins.softwareheritage.org/job/DWAPPS/job/tests-on-diff/1801/
See console output for more information: https://jenkins.softwareheritage.org/job/DWAPPS/job/tests-on-diff/1801/console

Harbormaster returned this revision to the author for changes because remote builds failed. May 6 2022, 7:21 PM
Harbormaster failed remote builds in B29159: Diff 28087!

Build has FAILED

Patch application report for D7749 (id=28088)

Rebasing onto e6a8303eef...

Current branch diff-target is up to date.
Changes applied before test
commit cf84ac6e03d3785264243e8b166a96541d36647c
Author: John Ericson <John.Ericson@Obsidian.Systems>
Date:   Thu May 5 14:48:07 2022 -0400

    Overhaul `lookup_missing_hashes`
    
    Keep hashes separated by type to make bugs less likely.

commit 9825ad425f2960e8449bdd85378125ddb585eb3e
Author: John Ericson <John.Ericson@Obsidian.Systems>
Date:   Thu May 5 14:59:12 2022 -0400

    Tweak `api_swhid_known` for perf and avoiding strings
    
    By shuffling around the algorithm, we avoid a `hash_to_bytes` and work
    more with the structured data.

commit b3c82465a688e3f6bdea7a1568e3993344c9a229
Author: John Ericson <John.Ericson@Obsidian.Systems>
Date:   Thu May 5 12:20:51 2022 -0400

    Make archive.lookup_missing_hashes output bytes
    
    All things equal, I think the bytes representation is better, and in
    this case it works well for existing callers too.

Link to build: https://jenkins.softwareheritage.org/job/DWAPPS/job/tests-on-diff/1802/
See console output for more information: https://jenkins.softwareheritage.org/job/DWAPPS/job/tests-on-diff/1802/console

Harbormaster returned this revision to the author for changes because remote builds failed. May 7 2022, 7:43 PM
Harbormaster failed remote builds in B29160: Diff 28088!