Page MenuHomeSoftware Heritage

Initial version of the swh.objstorage.replayer package
ClosedPublic

Authored by douardda on Apr 27 2020, 5:22 PM.

Details

Summary

this consists in the copy and adaptation of content replayer related files
from swh.journal v0.0.31.

Notable modifications:

  • test_write_replay.py has been merged into test_replay.py,
  • any reference to swh.journal.replay has been replaced,
  • adapted cli module to make added replayer commands subcommands of the objstorage's main cli group,
  • drop usage of vcversionner in favor of setuptools_scm.

Diff Detail

Repository
rDOBJSRPL Content replayer
Lint
Automatic diff as part of commit; lint not applicable.
Unit
Automatic diff as part of commit; unit tests not applicable.

Event Timeline

anlambert requested changes to this revision.EditedApr 27 2020, 6:57 PM
anlambert added a subscriber: anlambert.

I added several comments to address regarding a couple of issues I found. I am not really familiar with replayers so another review is needed as I may have missed something.

Also there is still references to swh.foo in tox.ini and you should add the CLI documentation as in rDGRPH7a5920220ee273003b54e6f0c6a44aa2bda310b3.

When I try to execute the tests using pytest in my venv, they are hanging. They ran fine using tox though.
swh-journal tests have the same issue and make test simply runs tox in that case, we should do the same here I think.

requirements-swh.txt
2–4

This should be swh-core[http] here.

tox fails to run tests otherwise:

File "/home/antoine/swh/swh-environment/swh-objstorage-replayer/.tox/py3/lib/python3.7/site-packages/_pytest/assertion/rewrite.py", line 152, in exec_module
    exec(co, module.__dict__)
  File "/home/antoine/swh/swh-environment/swh-objstorage-replayer/.tox/py3/lib/python3.7/site-packages/swh/journal/pytest_plugin.py", line 16, in <module>
    from swh.journal.serializers import object_key, kafka_to_key, kafka_to_value
  File "/home/antoine/swh/swh-environment/swh-objstorage-replayer/.tox/py3/lib/python3.7/site-packages/_pytest/assertion/rewrite.py", line 152, in exec_module
    exec(co, module.__dict__)
  File "/home/antoine/swh/swh-environment/swh-objstorage-replayer/.tox/py3/lib/python3.7/site-packages/swh/journal/serializers.py", line 10, in <module>
    from swh.core.api.serializers import msgpack_dumps, msgpack_loads
  File "/home/antoine/swh/swh-environment/swh-objstorage-replayer/.tox/py3/lib/python3.7/site-packages/_pytest/assertion/rewrite.py", line 152, in exec_module
    exec(co, module.__dict__)
  File "/home/antoine/swh/swh-environment/swh-objstorage-replayer/.tox/py3/lib/python3.7/site-packages/swh/core/api/__init__.py", line 11, in <module>
    import requests
ModuleNotFoundError: No module named 'requests'
swh/objstorage/replayer/cli.py
32

black processing side effect here

43

same here

60

you should use verbatim text using double backquotes to display command options in the docstring

67

same here

swh/objstorage/replayer/replay.py
72–87

Why not using the bisect module here ?

swh/objstorage/replayer/tests/test_cli.py
40–45

That fixture is not used in the tests, you can remove it

129

You can remove storage fixture as it is not used, same thing in other tests

swh/objstorage/replayer/tests/test_replay.py
20

There is already sha1 and sha1_git strategies defines in swh.model.hypothesis_strategies

This revision now requires changes to proceed.Apr 27 2020, 6:57 PM
olasd added inline comments.
swh/objstorage/replayer/replay.py
72–87

I don't think the bisect module can be used to search a substring in an mmapped byte array, can it?

swh/objstorage/replayer/replay.py
72–87

FTR, I've "moved" the code from swh.journal (but your remark remains very valid)

swh/objstorage/replayer/tests/test_cli.py
40–45

good catch

swh/objstorage/replayer/tests/test_replay.py
20

same as above: copy code from journal. For these remarks, I'll provide fixes in a following commit.

fixes according to anlambert's comments

and remaining references to the storage fixture in test_cli...

swh/objstorage/replayer/replay.py
72–87

I don't think the bisect module can be used to search a substring in an mmapped byte array, can it?

Oh right, I got mistaken by the example in the docstring.
Nevertheless, you should be able to wrap your mmapped byte array in a sequence like this:

import collections

class HashArray(collections.Sequence):

    def __init__(self, byte_array, hash_size):
        self.byte_array = byte_array
        self.hash_size = hash_size

    def __len__(self):
        return self.byte_array.size() / self.hash_size

    def __getitem__(self, i):
        return self.byte_array[i * self.hash_size : (i+1) * self.hash_size]

And then use bisect functions on it.

vlorentz added inline comments.
swh/objstorage/replayer/replay.py
72–87

That's out of scope for this diff, that only moves code between packages.

Looks good, there is still the doc index.rst to update with component name though.

This revision is now accepted and ready to land.Apr 28 2020, 12:28 PM

Build is green

Patch application report for D3071 (id=10954)

Rebasing onto 88da644b49...

Current branch diff-target is up to date.
Changes applied before test
commit e9c46618fd2522df46c43df407b4514de0700af6
Author: David Douard <david.douard@sdfa3.org>
Date:   Mon Apr 27 16:51:41 2020 +0200

    Initial version of the swh.objstorage.replayer package
    
    this consists in the copy and adaptation of content replayer related files
    from swh.journal v0.0.31.
    
    Notable modifications:
    - test_write_replay.py has been merged into test_replay.py,
    - any reference to swh.journal.replay has been replaced,
    - adapted cli module to make added replayer commands subcommands of the
      objstorage's main cli group,
    - drop usage of vcversionner in favor of setuptools_scm.

See https://jenkins.softwareheritage.org/job/DOBJSRPL/job/tests-on-diff/1/ for more details.