Page MenuHomeSoftware Heritage

Add a wrapper to manage a sorted list.
ClosedPublic

Authored by vlorentz on Apr 9 2020, 12:36 PM.

Details

Summary

For now this is only used by sorted_sha1, but we'll need it for
origin_metadata soon.

Diff Detail

Repository
rDSTO Storage manager
Branch
bisect-wrapper
Lint
No Linters Available
Unit
No Unit Test Coverage
Build Status
Buildable 11745
Build 17804: Phabricator diff pipeline on jenkinsJenkins console · Jenkins
Build 17803: arc lint + arc unit

Event Timeline

Build is green

Patch application report for D2987 (id=10625)

Rebasing onto b0b0313c96...

Current branch diff-target is up to date.
Changes applied before test
commit 5da27ab73280c29ab6355e04cd4d67fc44ff365c
Author: Valentin Lorentz <vlorentz@softwareheritage.org>
Date:   Tue Mar 24 17:22:09 2020 +0100

    Add a wrapper to manage a sorted list.
    
    For now this is only used by sorted_sha1, but we'll need it for
    origin_metadata soon.

See https://jenkins.softwareheritage.org/job/DSTO/job/tests-on-diff/78/ for more details.

I am not sure if explicitly setting the key type when instantiating a SortedList instance is a real necessity.
How about simplifying a little bit in order to simply write sha1s = SortedList[bytes] ?

SortedListItem = TypeVar("SortedListItem")
SortedListKey = Any


class SortedList(collections.UserList, Generic[SortedListItem]):
    data: List[Tuple[SortedListKey, SortedListItem]]

    # https://github.com/python/mypy/issues/708
    # key: Callable[[SortedListItem], SortedListKey]

    def __init__(
        self,
        data: List[SortedListItem] = None,
        key: Optional[Callable[[SortedListItem], SortedListKey]] = None,
    ):
        super().__init__(sorted(data or [], key=key))
        self.key: Optional[Callable[[SortedListItem], SortedListKey]] = key

    def add(self, item: SortedListItem):
        k = self.key(item) if self.key is not None else item
        bisect.insort(self.data, (k, item))

    def __iter__(self) -> Iterator[SortedListItem]:
        for (k, item) in self.data:
            yield item

    def iter_from(self, start_key: SortedListKey) -> Iterator[SortedListItem]:
        from_index = bisect.bisect_left(self.data, (start_key,))
        for (k, item) in itertools.islice(self.data, from_index, None):
            yield item

Also some simple tests are missing.

swh/storage/in_memory.py
123

You can also write:

self._shorted_sha1s = SortedList[bytes, bytes]()
swh/storage/in_memory.py
123

TIL, thanks

  • rebase
  • add tests
  • shorter initialization of _sorted_sha1s

I am not sure if explicitly setting the key type when instantiating a SortedList instance is a real necessity.

Of course it's not necessary, but this way mypy can check the type of key and argument of iter_from match.

Build is green

Patch application report for D2987 (id=10772)

Rebasing onto 32b3e93695...

Current branch diff-target is up to date.
Changes applied before test
commit 4526c528ed9d5d5788fac4c4f25ab3e468521a49
Author: Valentin Lorentz <vlorentz@softwareheritage.org>
Date:   Tue Mar 24 17:22:09 2020 +0100

    Add a wrapper to manage a sorted list.
    
    For now this is only used by sorted_sha1, but we'll need it for
    origin_metadata soon.

See https://jenkins.softwareheritage.org/job/DSTO/job/tests-on-diff/100/ for more details.

ardumont added inline comments.
swh/storage/in_memory.py
64

why not Tuple[SortedListKey, ...]?

swh/storage/in_memory.py
64

i'll be explicit to avoid confusion, i meant why not List[Tuple[SortedListKey, SortedListItem]]?

swh/storage/tests/test_in_memory.py
62 ↗(On Diff #10772)

Arf...
It took me a while to understand that , split is the string printed in case the assertion fails...

Could you please make that more apparent?
(same for the next test)

For example:

assert list(list_.iter_from(split)) == expected, f"split: {split}"

or something better if you have ;)

This revision now requires changes to proceed.Apr 21 2020, 2:55 PM

Thanks.

I'm fine with the implementation, just a nitpick on the assertions which i misunderstood initially.

swh/storage/in_memory.py
64

oh yes, indeed

This revision is now accepted and ready to land.Apr 23 2020, 11:05 AM

Build is green

Patch application report for D2987 (id=10834)

Rebasing onto bca643acab...

First, rewinding head to replay your work on top of it...
Applying: Add a wrapper to manage a sorted list.
Changes applied before test
commit 0a31410ed5e22c77113c35c4f2fd24b30d9c2d87
Author: Valentin Lorentz <vlorentz@softwareheritage.org>
Date:   Tue Mar 24 17:22:09 2020 +0100

    Add a wrapper to manage a sorted list.
    
    For now this is only used by sorted_sha1, but we'll need it for
    origin_metadata soon.

See https://jenkins.softwareheritage.org/job/DSTO/job/tests-on-diff/112/ for more details.

This revision was landed with ongoing or failed builds.Apr 23 2020, 1:03 PM
This revision was automatically updated to reflect the committed changes.

Build is green

Patch application report for D2987 (id=10844)

Rebasing onto fe56005555...

First, rewinding head to replay your work on top of it...
Fast-forwarded diff-target to base-revision-114-D2987.
Changes applied before test

See https://jenkins.softwareheritage.org/job/DSTO/job/tests-on-diff/114/ for more details.