Page MenuHomeSoftware Heritage

scheduler: Make origin_visit_stats_get read multiple entries
ClosedPublic

Authored by ardumont on Jan 19 2021, 6:32 PM.

Diff Detail

Event Timeline

swh/scheduler/backend.py
883

overall, improvment suggestions welcome ;)

Build is green

Patch application report for D4888 (id=17361)

Rebasing onto 0a32a31195...

First, rewinding head to replay your work on top of it...
Applying: scheduler: Make origin_visit_stats_get read multiple entries
Changes applied before test
commit 08e5af9335bff3a77d4df0c49ee11f04c7ff7686
Author: Antoine R. Dumont (@ardumont) <ardumont@softwareheritage.org>
Date:   Tue Jan 19 18:12:33 2021 +0100

    scheduler: Make origin_visit_stats_get read multiple entries
    
    Related to T2967

See https://jenkins.softwareheritage.org/job/DSCH/job/tests-on-diff/170/ for more details.

You should consider using psycopg2.extras.execute_values instead of rolling your own implementation (see the implementation of record_listed_origins).

Please document that the return value can have fewer entries than the input. Maybe it'd make sense to return a dict (url, type) -> OriginVisitStats instead?

olasd requested changes to this revision.Jan 20 2021, 11:31 AM
This revision now requires changes to proceed.Jan 20 2021, 11:31 AM
vlorentz added inline comments.
swh/scheduler/backend.py
881

btw

swh/scheduler/backend.py
881

"of course", yesterday evening, that would not come to me ¯\_(ツ)_/¯

Thanks ;)

You should consider using psycopg2.extras.execute_values instead of rolling your own implementation (see the implementation of record_listed_origins).

Agreed.

Please document that the return value can have fewer entries than the input.

Right.

Maybe it'd make sense to return a dict (url, type) -> OriginVisitStats instead?

Thanks, that sounds better for the client of this api.

Maybe it'd make sense to return a dict (url, type) -> OriginVisitStats instead?

Thanks, that sounds better for the client of this api.

Except now, we are hitting serialization problem.
test_remote_api client (which passes through serialization) are not happy with tuple as keys...

.tox/py3/lib/python3.7/site-packages/swh/scheduler/tests/test_scheduler.py:954:
_ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _
.tox/py3/lib/python3.7/site-packages/swh/core/api/__init__.py:181: in meth_
    return self.post(meth._endpoint_path, post_data)
.tox/py3/lib/python3.7/site-packages/swh/core/api/__init__.py:278: in post
    return self._decode_response(response)
.tox/py3/lib/python3.7/site-packages/swh/core/api/__init__.py:355: in _decode_response
    return decode_response(response, extra_decoders=self.extra_type_decoders)
.tox/py3/lib/python3.7/site-packages/swh/core/api/serializers.py:120: in decode_response
    r = msgpack_loads(response.content, extra_decoders=extra_decoders)
.tox/py3/lib/python3.7/site-packages/swh/core/api/serializers.py:296: in msgpack_loads
    data, raw=False, object_hook=decode_types, ext_hook=ext_hook
_ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _

>   ???
E   ValueError: list is not allowed for map key

(food)

Adapt according to review (most are ok except for the returned type dict change)

Remains a problem with using the Dict[Tuple[str, str], OriginVisitStats] as returned
type.

Because, iirc, tuples are converted into list by our serialization layer... and list are
not hashable so it's not usable as dict...

(updated the diff so the error is demonstrated here)

Build has FAILED

Patch application report for D4888 (id=17387)

Rebasing onto b03d978241...

First, rewinding head to replay your work on top of it...
Applying: scheduler: Make origin_visit_stats_get read multiple entries
Using index info to reconstruct a base tree...
M	swh/scheduler/backend.py
M	swh/scheduler/interface.py
M	swh/scheduler/tests/test_cli_journal.py
M	swh/scheduler/tests/test_scheduler.py
Falling back to patching base and 3-way merge...
Auto-merging swh/scheduler/tests/test_scheduler.py
Auto-merging swh/scheduler/tests/test_cli_journal.py
Auto-merging swh/scheduler/interface.py
CONFLICT (content): Merge conflict in swh/scheduler/interface.py
Auto-merging swh/scheduler/backend.py
CONFLICT (content): Merge conflict in swh/scheduler/backend.py
Patch failed at 0001 scheduler: Make origin_visit_stats_get read multiple entries

Resolve all conflicts manually, mark them as resolved with
"git add/rm <conflicted_files>", then run "git rebase --continue".
You can instead skip this commit: run "git rebase --skip".
To abort and get back to the state before "git rebase", run "git rebase --abort".

Rebase failed (ret=1)!

Could not rebase; Attempt merge onto b03d978241...

Already up to date.
Changes applied before test
commit 23a688a2d4b0cabbaf4bd24c3958dd4ccf975037
Author: Antoine R. Dumont (@ardumont) <ardumont@softwareheritage.org>
Date:   Tue Jan 19 18:12:33 2021 +0100

    scheduler: Make origin_visit_stats_get read multiple entries
    
    This changes the signature of the api to
    (ids: Iterable[Tuple[str, str]]) -> Dict[Tuple[str, str], OriginVisitStats]
    
    Related to T2967

Link to build: https://jenkins.softwareheritage.org/job/DSCH/job/tests-on-diff/185/
See console output for more information: https://jenkins.softwareheritage.org/job/DSCH/job/tests-on-diff/185/console

Adapt back to the first implementation, this returns a list of OriginVisitStats

Update interface docstring to mention the length of the output list could be less than
the one from the input.

Build is green

Patch application report for D4888 (id=17390)

Rebasing onto b03d978241...

First, rewinding head to replay your work on top of it...
Applying: scheduler: Make origin_visit_stats_get read multiple entries
Using index info to reconstruct a base tree...
M	swh/scheduler/backend.py
M	swh/scheduler/interface.py
M	swh/scheduler/tests/test_cli_journal.py
M	swh/scheduler/tests/test_scheduler.py
Falling back to patching base and 3-way merge...
Auto-merging swh/scheduler/tests/test_scheduler.py
CONFLICT (content): Merge conflict in swh/scheduler/tests/test_scheduler.py
Auto-merging swh/scheduler/tests/test_cli_journal.py
Auto-merging swh/scheduler/interface.py
CONFLICT (content): Merge conflict in swh/scheduler/interface.py
Auto-merging swh/scheduler/backend.py
CONFLICT (content): Merge conflict in swh/scheduler/backend.py
Patch failed at 0001 scheduler: Make origin_visit_stats_get read multiple entries

Resolve all conflicts manually, mark them as resolved with
"git add/rm <conflicted_files>", then run "git rebase --continue".
You can instead skip this commit: run "git rebase --skip".
To abort and get back to the state before "git rebase", run "git rebase --abort".

Rebase failed (ret=1)!

Could not rebase; Attempt merge onto b03d978241...

Already up to date.
Changes applied before test
commit a8a6a9dcbcbde5e0f2510746e488a32975cdf9d0
Author: Antoine R. Dumont (@ardumont) <ardumont@softwareheritage.org>
Date:   Tue Jan 19 18:12:33 2021 +0100

    scheduler: Make origin_visit_stats_get read multiple entries
    
    Related to T2967

See https://jenkins.softwareheritage.org/job/DSCH/job/tests-on-diff/186/ for more details.

Build is green

Patch application report for D4888 (id=17391)

Rebasing onto b03d978241...

First, rewinding head to replay your work on top of it...
Applying: scheduler: Make origin_visit_stats_get read multiple entries
Using index info to reconstruct a base tree...
M	swh/scheduler/backend.py
M	swh/scheduler/interface.py
M	swh/scheduler/tests/test_cli_journal.py
M	swh/scheduler/tests/test_scheduler.py
Falling back to patching base and 3-way merge...
Auto-merging swh/scheduler/tests/test_scheduler.py
CONFLICT (content): Merge conflict in swh/scheduler/tests/test_scheduler.py
Auto-merging swh/scheduler/tests/test_cli_journal.py
Auto-merging swh/scheduler/interface.py
CONFLICT (content): Merge conflict in swh/scheduler/interface.py
Auto-merging swh/scheduler/backend.py
CONFLICT (content): Merge conflict in swh/scheduler/backend.py
Patch failed at 0001 scheduler: Make origin_visit_stats_get read multiple entries

Resolve all conflicts manually, mark them as resolved with
"git add/rm <conflicted_files>", then run "git rebase --continue".
You can instead skip this commit: run "git rebase --skip".
To abort and get back to the state before "git rebase", run "git rebase --abort".

Rebase failed (ret=1)!

Could not rebase; Attempt merge onto b03d978241...

Already up to date.
Changes applied before test
commit 3d9c3ab655067d28ba2fc18aeb482b384f41cf31
Author: Antoine R. Dumont (@ardumont) <ardumont@softwareheritage.org>
Date:   Tue Jan 19 18:12:33 2021 +0100

    scheduler: Make origin_visit_stats_get read multiple entries
    
    Related to T2967

See https://jenkins.softwareheritage.org/job/DSCH/job/tests-on-diff/187/ for more details.

Build is green

Patch application report for D4888 (id=17394)

Rebasing onto b03d978241...

Current branch diff-target is up to date.
Changes applied before test
commit ffa2b43153137b4a0b51690f4179309688c867b7
Author: Antoine R. Dumont (@ardumont) <ardumont@softwareheritage.org>
Date:   Tue Jan 19 18:12:33 2021 +0100

    scheduler: Make origin_visit_stats_get read multiple entries
    
    Related to T2967

See https://jenkins.softwareheritage.org/job/DSCH/job/tests-on-diff/191/ for more details.

Build has FAILED

Patch application report for D4888 (id=17435)

Rebasing onto 898820fac5...

Current branch diff-target is up to date.
Changes applied before test
commit 1fd92d0b378da82ec605de36f21c428df6236515
Author: Antoine R. Dumont (@ardumont) <ardumont@softwareheritage.org>
Date:   Tue Jan 19 18:12:33 2021 +0100

    scheduler: Make origin_visit_stats_get read multiple entries
    
    Related to T2967

Link to build: https://jenkins.softwareheritage.org/job/DSCH/job/tests-on-diff/199/
See console output for more information: https://jenkins.softwareheritage.org/job/DSCH/job/tests-on-diff/199/console

This needs a rebase to fix the build (i'm on it)

Rebase with fix so tests are happy (i missed the modification last time i rebased and
updated this ¯\_(ツ)_/¯)

Build is green

Patch application report for D4888 (id=17440)

Rebasing onto c7b740cafa...

First, rewinding head to replay your work on top of it...
Applying: scheduler: Make origin_visit_stats_get read multiple entries
Changes applied before test
commit e339cef62ce8997abff8e3368387331338e7ac76
Author: Antoine R. Dumont (@ardumont) <ardumont@softwareheritage.org>
Date:   Tue Jan 19 18:12:33 2021 +0100

    scheduler: Make origin_visit_stats_get read multiple entries
    
    Related to T2967

See https://jenkins.softwareheritage.org/job/DSCH/job/tests-on-diff/203/ for more details.

Build is green

Patch application report for D4888 (id=17443)

Rebasing onto c7b740cafa...

Current branch diff-target is up to date.
Changes applied before test
commit 1cbc3f6f39c6dd12d162f8be7bdbc4689f86188d
Author: Antoine R. Dumont (@ardumont) <ardumont@softwareheritage.org>
Date:   Tue Jan 19 18:12:33 2021 +0100

    scheduler: Make origin_visit_stats_get read multiple entries
    
    Related to T2967

See https://jenkins.softwareheritage.org/job/DSCH/job/tests-on-diff/206/ for more details.

Build is green

Patch application report for D4888 (id=17449)

Rebasing onto ffe2aed2fa...

Current branch diff-target is up to date.
Changes applied before test
commit d464b4cc1f9ae6a5c5c94a534826eff5cc27f12f
Author: Antoine R. Dumont (@ardumont) <ardumont@softwareheritage.org>
Date:   Tue Jan 19 18:12:33 2021 +0100

    scheduler: Make origin_visit_stats_get read multiple entries
    
    Related to T2967

See https://jenkins.softwareheritage.org/job/DSCH/job/tests-on-diff/207/ for more details.

This revision is now accepted and ready to land.Jan 21 2021, 11:01 AM