
Make sure the progress bar for the export reaches 100%
Closed, Public

Authored by douardda on Sep 9 2021, 5:54 PM.

Details

Summary
  • ensure the last offset is sent to the queue,
  • fix the computation of the progress value (off-by-one).
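
A minimal sketch of the kind of off-by-one the summary refers to, not the actual code from `swh/dataset/journalprocessor.py`. It assumes each partition is exported over a range `(lo, hi)` where `hi` is exclusive (as a Kafka high watermark is), so the last message has offset `hi - 1`. The function name `export_progress` and its arguments are hypothetical.

```python
from typing import Dict, Optional, Tuple


def export_progress(
    offset_ranges: Dict[int, Tuple[int, int]],
    last_offsets: Dict[int, Optional[int]],
) -> Tuple[int, int]:
    """Return (done, total) message counts summed over all partitions.

    offset_ranges: partition id -> (lo, hi), with hi exclusive (assumption).
    last_offsets: partition id -> offset of the last processed message.
    """
    total = sum(hi - lo for lo, hi in offset_ranges.values())
    done = 0
    for partition, (lo, hi) in offset_ranges.items():
        offset = last_offsets.get(partition)
        if offset is None:
            continue  # nothing processed yet for this partition
        # Offsets are 0-based positions: having processed `offset` means
        # `offset + 1 - lo` messages are done. Using `offset - lo` instead
        # would stop one message short of the total on every partition.
        done += min(max(offset + 1 - lo, 0), hi - lo)
    return done, total
```

With ranges `{0: (0, 10), 1: (5, 8)}` and last offsets `{0: 9, 1: 7}`, this yields `(13, 13)`, so a bar driven by it reaches 100%.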

Depends on D6232

Diff Detail

Repository
rDDATASET Datasets
Lint
Automatic diff as part of commit; lint not applicable.
Unit
Automatic diff as part of commit; unit tests not applicable.

Event Timeline

Build is green

Patch application report for D6233 (id=22552)

Could not rebase; Attempt merge onto 002ee70b99...

Updating 002ee70..48d246f
Fast-forward
 swh/dataset/journalprocessor.py | 40 ++++++++++++++++++++++++----------------
 1 file changed, 24 insertions(+), 16 deletions(-)
Changes applied before test
commit 48d246f178851dfe06e47b6c12555fcd095f5641
Author: David Douard <david.douard@sdfa3.org>
Date:   Thu Sep 9 11:54:15 2021 +0200

    Make sure the progress bar for the export reaches 100%
    
    - ensure the last offset is sent to the queue,
    - fix the computation of the progress value (off-by-one).

commit 3a2f5076dcbf791d1ef43982b70551f048ee7c3e
Author: David Douard <david.douard@sdfa3.org>
Date:   Thu Sep 9 11:47:57 2021 +0200

    Explicitly close the temporary kafka consumer in `get_offsets`
    
    used to retrieve partitions and lo/hi offsets.
    
    Leaving it open could sometimes cause a deadlock or a long timeout
    (especially in the developer docker environment).

commit 45126fd621e8b75c592d7c6cd3d8d1337f95c97e
Author: David Douard <david.douard@sdfa3.org>
Date:   Thu Sep 9 11:39:44 2021 +0200

    Simplify the lo/high partition offset computation
    
    The computation of lo and high offsets used to be done in 2 steps:
    - first get the watermark offsets (thus the absolute min and max offsets
      of the whole partition)
    - then, as a "hook" in `process()`, retrieve the last committed offset
      for the partition and "push" these current offsets in the progress
      queue.
    
    Instead, this simplifies the process a bit by querying the committed
    offsets while computing the hi/low offsets.

commit e47a3db1287b3f6ada32c3afb3270ef0947a7659
Author: David Douard <david.douard@sdfa3.org>
Date:   Thu Sep 9 14:22:37 2021 +0200

    Use proper signature for JournalClientOffsetRanges.process()

See https://jenkins.softwareheritage.org/job/DDATASET/job/tests-on-diff/4/ for more details.
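
The commits in this report describe computing the lo/hi offsets in a single step (querying the committed offsets alongside the watermarks) and explicitly closing the temporary consumer. The following is a hedged sketch of that idea, not the project's `get_offsets`. The method names `committed`, `get_watermark_offsets` and `close` mirror those of `confluent_kafka.Consumer`, but the example is duck-typed and does not import that library; `get_offset_ranges` and `make_consumer` are hypothetical names.

```python
from contextlib import closing
from typing import Any, Callable, Dict, Sequence, Tuple


def get_offset_ranges(
    make_consumer: Callable[[], Any],
    topic_partitions: Sequence[Any],
) -> Dict[int, Tuple[int, int]]:
    """Return partition id -> (lo, hi) using a short-lived consumer.

    Each item of topic_partitions is assumed to expose a `.partition`
    attribute; the consumer is assumed to provide committed(),
    get_watermark_offsets() and close().
    """
    ranges: Dict[int, Tuple[int, int]] = {}
    # closing() guarantees consumer.close() runs even if a query raises,
    # so the temporary consumer never lingers.
    with closing(make_consumer()) as consumer:
        committed = {
            tp.partition: tp.offset
            for tp in consumer.committed(list(topic_partitions))
        }
        for tp in topic_partitions:
            lo, hi = consumer.get_watermark_offsets(tp)
            # A negative committed offset means "nothing committed yet",
            # in which case the low watermark is used as the start.
            start = max(lo, committed.get(tp.partition, -1))
            ranges[tp.partition] = (start, hi)
    return ranges
```

The design choice illustrated here is only that the committed offset and the watermarks are read in the same place, and that the consumer's lifetime is bounded by the `with` block.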

add forgotten revision: Reduce the size of the progress bar

Build is green

Patch application report for D6233 (id=22555)

Could not rebase; Attempt merge onto 002ee70b99...

Updating 002ee70..3f331e1
Fast-forward
 swh/dataset/journalprocessor.py | 46 +++++++++++++++++++++++------------------
 1 file changed, 26 insertions(+), 20 deletions(-)
Changes applied before test
commit 3f331e1823e3329085f01f073fe8a6bd6f43473a
Author: David Douard <david.douard@sdfa3.org>
Date:   Thu Sep 9 14:30:25 2021 +0200

    Reduce the size of the progress bar
    
    so we get a chance to actually have a visible progress bar:
    
    - reduce the label size (shorter desc),
    - use a single 'workers' postfix (like "workers=n/m").

commit 48d246f178851dfe06e47b6c12555fcd095f5641
Author: David Douard <david.douard@sdfa3.org>
Date:   Thu Sep 9 11:54:15 2021 +0200

    Make sure the progress bar for the export reaches 100%
    
    - ensure the last offset is sent to the queue,
    - fix the computation of the progress value (off-by-one).

commit 3a2f5076dcbf791d1ef43982b70551f048ee7c3e
Author: David Douard <david.douard@sdfa3.org>
Date:   Thu Sep 9 11:47:57 2021 +0200

    Explicitly close the temporary kafka consumer in `get_offsets`
    
    used to retrieve partitions and lo/hi offsets.
    
    Leaving it open could sometimes cause a deadlock or a long timeout
    (especially in the developer docker environment).

commit 45126fd621e8b75c592d7c6cd3d8d1337f95c97e
Author: David Douard <david.douard@sdfa3.org>
Date:   Thu Sep 9 11:39:44 2021 +0200

    Simplify the lo/high partition offset computation
    
    The computation of lo and high offsets used to be done in 2 steps:
    - first get the watermark offsets (thus the absolute min and max offsets
      of the whole partition)
    - then, as a "hook" in `process()`, retrieve the last committed offset
      for the partition and "push" these current offsets in the progress
      queue.
    
    Instead, this simplifies the process a bit by querying the committed
    offsets while computing the hi/low offsets.

commit e47a3db1287b3f6ada32c3afb3270ef0947a7659
Author: David Douard <david.douard@sdfa3.org>
Date:   Thu Sep 9 14:22:37 2021 +0200

    Use proper signature for JournalClientOffsetRanges.process()

See https://jenkins.softwareheritage.org/job/DDATASET/job/tests-on-diff/5/ for more details.
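
The "ensure the last offset is sent to the queue" item can be illustrated with a small sketch. It assumes, purely for illustration, that a worker reports offsets to a progress queue only periodically; the source does not say how reporting is throttled. The names `report_offsets` and `every` are hypothetical.

```python
import queue
from typing import Iterable, Optional


def report_offsets(
    offsets: Iterable[int],
    progress_queue: "queue.Queue[int]",
    every: int = 100,
) -> None:
    """Push every `every`-th processed offset, and always the final one."""
    last: Optional[int] = None
    last_sent: Optional[int] = None
    for n, offset in enumerate(offsets):
        last = offset
        if n % every == 0:
            progress_queue.put(offset)
            last_sent = offset
    # Without this final push, the consumer of the queue would only ever
    # see the last periodic report, and the bar would stall below 100%.
    if last is not None and last != last_sent:
        progress_queue.put(last)
```

Calling `report_offsets(range(250), q, every=100)` puts `0`, `100`, `200` and finally `249` on the queue.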

This revision is now accepted and ready to land. Sep 10 2021, 1:13 PM

Build is green

Patch application report for D6233 (id=22581)

Could not rebase; Attempt merge onto 002ee70b99...

Updating 002ee70..5881ae0
Fast-forward
 swh/dataset/journalprocessor.py | 54 +++++++++++++++++++++++++----------------
 1 file changed, 33 insertions(+), 21 deletions(-)
Changes applied before test
commit 5881ae06f636a74e7fb0addca04127bfe18b687d
Author: David Douard <david.douard@sdfa3.org>
Date:   Thu Sep 9 14:30:25 2021 +0200

    Reduce the size of the progress bar
    
    so we get a chance to actually have a visible progress bar:
    
    - reduce the label size (shorter desc),
    - use a single 'workers' postfix (like "workers=n/m").

commit 47713ee38c9498a0548535e5b8361d8158ee3e09
Author: David Douard <david.douard@sdfa3.org>
Date:   Thu Sep 9 11:54:15 2021 +0200

    Make sure the progress bar for the export reaches 100%
    
    - ensure the last offset is sent to the queue,
    - fix the computation of the progress value (off-by-one).

commit d07b2a632256da4e7778bf7b1f4a02acd03f9ca0
Author: David Douard <david.douard@sdfa3.org>
Date:   Thu Sep 9 11:47:57 2021 +0200

    Explicitly close the temporary kafka consumer in `get_offsets`
    
    used to retrieve partitions and lo/hi offsets.
    
    Leaving it open could sometimes cause a deadlock or a long timeout
    (especially in the developer docker environment).

commit 2760e322af7c5862e0329198671b49d2755491ef
Author: David Douard <david.douard@sdfa3.org>
Date:   Thu Sep 9 11:39:44 2021 +0200

    Simplify the lo/high partition offset computation
    
    The computation of lo and high offsets used to be done in 2 steps:
    - first get the watermark offsets (thus the absolute min and max offsets
      of the whole partition)
    - then, as a "hook" in `process()`, retrieve the last committed offset
      for the partition and "push" these current offsets in the progress
      queue.
    
    Instead, this simplifies the process a bit by querying the committed
    offsets while computing the hi/low offsets.

commit e47a3db1287b3f6ada32c3afb3270ef0947a7659
Author: David Douard <david.douard@sdfa3.org>
Date:   Thu Sep 9 14:22:37 2021 +0200

    Use proper signature for JournalClientOffsetRanges.process()

See https://jenkins.softwareheritage.org/job/DDATASET/job/tests-on-diff/9/ for more details.

attempt to trick phab/arcanist

Build is green

Patch application report for D6233 (id=22610)

Could not rebase; Attempt merge onto 002ee70b99...

Updating 002ee70..358d849
Fast-forward
 swh/dataset/journalprocessor.py | 54 +++++++++++++++++++++++++----------------
 1 file changed, 33 insertions(+), 21 deletions(-)
Changes applied before test
commit 358d84938d01ee25706619e533213c6e62f4c828
Author: David Douard <david.douard@sdfa3.org>
Date:   Thu Sep 9 14:30:25 2021 +0200

    Reduce the size of the progress bar
    
    so we get a chance to actually have a visible progress bar:
    
    - reduce the label size (shorter desc),
    - use a single 'workers' postfix (like "workers=n/m").

commit 47713ee38c9498a0548535e5b8361d8158ee3e09
Author: David Douard <david.douard@sdfa3.org>
Date:   Thu Sep 9 11:54:15 2021 +0200

    Make sure the progress bar for the export reaches 100%
    
    - ensure the last offset is sent to the queue,
    - fix the computation of the progress value (off-by-one).

commit d07b2a632256da4e7778bf7b1f4a02acd03f9ca0
Author: David Douard <david.douard@sdfa3.org>
Date:   Thu Sep 9 11:47:57 2021 +0200

    Explicitly close the temporary kafka consumer in `get_offsets`
    
    used to retrieve partitions and lo/hi offsets.
    
    Leaving it open could sometimes cause a deadlock or a long timeout
    (especially in the developer docker environment).

commit 2760e322af7c5862e0329198671b49d2755491ef
Author: David Douard <david.douard@sdfa3.org>
Date:   Thu Sep 9 11:39:44 2021 +0200

    Simplify the lo/high partition offset computation
    
    The computation of lo and high offsets used to be done in 2 steps:
    - first get the watermark offsets (thus the absolute min and max offsets
      of the whole partition)
    - then, as a "hook" in `process()`, retrieve the last committed offset
      for the partition and "push" these current offsets in the progress
      queue.
    
    Instead, this simplifies the process a bit by querying the committed
    offsets while computing the hi/low offsets.

commit e47a3db1287b3f6ada32c3afb3270ef0947a7659
Author: David Douard <david.douard@sdfa3.org>
Date:   Thu Sep 9 14:22:37 2021 +0200

    Use proper signature for JournalClientOffsetRanges.process()

See https://jenkins.softwareheritage.org/job/DDATASET/job/tests-on-diff/14/ for more details.