
Add a --margin option to the `swh dataset graph export` command
Closed, Public

Authored by douardda on Apr 15 2022, 12:29 PM.

Details

Summary

This option allows restarting Kafka consumption slightly earlier than the last
committed offsets.
This can be useful for testing and debugging.

Depends on D7588.
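
To illustrate the idea (this is a sketch, not the code in this diff): assuming the margin is a fraction between 0 and 1 that scales each last committed offset, rewinding could look like the following. The helper name `apply_margin` and the shape of its arguments (a mapping from partition id to committed offset) are hypothetical.

```python
from typing import Dict


def apply_margin(committed: Dict[int, int], margin: float) -> Dict[int, int]:
    """Return per-partition start offsets rewound by ``margin``.

    Hypothetical helper: ``committed`` maps a partition id to its last
    committed offset, and ``margin`` is assumed to be a fraction in [0, 1].
    """
    if not 0.0 <= margin <= 1.0:
        raise ValueError("margin must be within [0, 1]")
    # Scale each committed offset down; min() guarantees the result never
    # ends up past the committed offset itself.
    return {
        partition: min(offset, int(offset * margin))
        for partition, offset in committed.items()
    }
```

Under these assumptions, a margin of 1.0 leaves the offsets untouched, while smaller values restart consumption further back.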

Diff Detail

Repository
rDDATASET Datasets
Lint
Automatic diff as part of commit; lint not applicable.
Unit
Automatic diff as part of commit; unit tests not applicable.

Event Timeline

Build is green

Patch application report for D7589 (id=27476)

Could not rebase; Attempt merge onto 9f342d9994...

Updating 9f342d9..6d07ab9
Fast-forward
 swh/dataset/cli.py              | 27 +++++++++++++++++++++++++--
 swh/dataset/journalprocessor.py | 21 ++++++++++++++++++---
 2 files changed, 43 insertions(+), 5 deletions(-)
Changes applied before test
commit 6d07ab9fc745bd8375f4ece73459fdadc87f7f28
Author: David Douard <david.douard@sdfa3.org>
Date:   Fri Apr 15 12:25:47 2022 +0200

    Add a --margin option to the `swh dataset graph export` command
    
    this option allows to restart consuming kafka a bit earlier than last
    committed offsets.
    This can be useful to test and debug.

commit 6f6b1bd42760203427d2b8f72a6fd22d4686b106
Author: David Douard <david.douard@sdfa3.org>
Date:   Fri Apr 15 12:19:17 2022 +0200

    Add a --type option to the `swh dataset graph export` command
    
    to easily set the list of exported object types. If not set, export all
    supported object types.
    
    Note that ``--exclude`` is also applied.

See https://jenkins.softwareheritage.org/job/DDATASET/job/tests-on-diff/131/ for more details.

Improve and simplify the code a bit so that progress bars are not messed up.

Build is green

Patch application report for D7589 (id=27483)

Could not rebase; Attempt merge onto 9f342d9994...

Updating 9f342d9..08eac9d
Fast-forward
 swh/dataset/cli.py              | 26 ++++++++++++++++++++++++--
 swh/dataset/journalprocessor.py | 27 ++++++++++++++++++++++++---
 2 files changed, 48 insertions(+), 5 deletions(-)
Changes applied before test
commit 08eac9da7cabae2c018edb00ac44b10351e12ef2
Author: David Douard <david.douard@sdfa3.org>
Date:   Fri Apr 15 12:25:47 2022 +0200

    Add a --margin option to the `swh dataset graph export` command
    
    this option allows to restart consuming kafka a bit earlier than last
    committed offsets.
    This can be useful to test and debug.

commit 6f6b1bd42760203427d2b8f72a6fd22d4686b106
Author: David Douard <david.douard@sdfa3.org>
Date:   Fri Apr 15 12:19:17 2022 +0200

    Add a --type option to the `swh dataset graph export` command
    
    to easily set the list of exported object types. If not set, export all
    supported object types.
    
    Note that ``--exclude`` is also applied.

See https://jenkins.softwareheritage.org/job/DDATASET/job/tests-on-diff/132/ for more details.

Sounds like you're having fun

swh/dataset/cli.py
87
swh/dataset/journalprocessor.py
257

I don't think lo would ever be large enough to overflow a double mantissa, but better safe than sorry.
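
For context, a small sketch of the concern, assuming `lo` is an integer offset multiplied by a float margin (the function names below are hypothetical, not taken from the diff). Integers above 2**53 are not all exactly representable as doubles, so the product can round upwards; clamping with `min()` guards against that.

```python
def scaled_unclamped(lo: int, margin: float) -> int:
    # lo is converted to a double for the multiplication and may be rounded.
    return int(margin * lo)


def scaled_clamped(lo: int, margin: float) -> int:
    # Same computation, but never returns a value greater than lo.
    return min(lo, int(margin * lo))


# 2**53 + 3 has no exact double representation; it rounds up to 2**53 + 4.
huge = 2**53 + 3
overshoots = scaled_unclamped(huge, 1.0) > huge
stays_bounded = scaled_clamped(huge, 1.0) <= huge
```

Offsets of that magnitude are unlikely in practice, which matches the "better safe than sorry" framing.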

This revision is now accepted and ready to land. Apr 20 2022, 11:29 AM
douardda added inline comments.
swh/dataset/cli.py
87

whuu nice!

douardda marked an inline comment as done.

thx vlorentz

Build is green

Patch application report for D7589 (id=27563)

Could not rebase; Attempt merge onto 9f342d9994...

Updating 9f342d9..628e561
Fast-forward
 swh/dataset/cli.py              | 26 ++++++++++++++++++++++++--
 swh/dataset/journalprocessor.py | 29 ++++++++++++++++++++++++++---
 2 files changed, 50 insertions(+), 5 deletions(-)
Changes applied before test
commit 628e561e778f9b328b5b1afc5c070f25f55e6d98
Author: David Douard <david.douard@sdfa3.org>
Date:   Fri Apr 15 12:25:47 2022 +0200

    Add a --margin option to the `swh dataset graph export` command
    
    this option allows to restart consuming kafka a bit earlier than last
    committed offsets.
    This can be useful to test and debug.

commit 6f6b1bd42760203427d2b8f72a6fd22d4686b106
Author: David Douard <david.douard@sdfa3.org>
Date:   Fri Apr 15 12:19:17 2022 +0200

    Add a --type option to the `swh dataset graph export` command
    
    to easily set the list of exported object types. If not set, export all
    supported object types.
    
    Note that ``--exclude`` is also applied.

See https://jenkins.softwareheritage.org/job/DDATASET/job/tests-on-diff/133/ for more details.

Build is green

Patch application report for D7589 (id=27569)

Could not rebase; Attempt merge onto 9f342d9994...

Updating 9f342d9..07bcf16
Fast-forward
 swh/dataset/cli.py              | 32 ++++++++++++++++++++++++++++++--
 swh/dataset/journalprocessor.py | 29 ++++++++++++++++++++++++++---
 2 files changed, 56 insertions(+), 5 deletions(-)
Changes applied before test
commit 07bcf1674e0a7873f44c6b71a0f7047c58f9d402
Author: David Douard <david.douard@sdfa3.org>
Date:   Fri Apr 15 12:25:47 2022 +0200

    Add a --margin option to the `swh dataset graph export` command
    
    this option allows to restart consuming kafka a bit earlier than last
    committed offsets.
    This can be useful to test and debug.

commit 4154d43a4c55c20df7f37466f84adba3c1505c64
Author: David Douard <david.douard@sdfa3.org>
Date:   Fri Apr 15 12:19:17 2022 +0200

    Add a --types option to the `swh dataset graph export` command
    
    to easily set the list of exported object types. If not set, export all
    supported object types.
    
    Note that ``--exclude`` is also applied.

See https://jenkins.softwareheritage.org/job/DDATASET/job/tests-on-diff/135/ for more details.