
Use a named logger for journalprocessor.py
Closed, Public

Authored by douardda on Mar 18 2022, 2:10 PM.

Details

Summary

Use a named logger in journalprocessor.py and add a few more debug logging statements.

Depends on D7380.

Diff Detail

Repository
rDDATASET Datasets
Lint
Automatic diff as part of commit; lint not applicable.
Unit
Automatic diff as part of commit; unit tests not applicable.

Event Timeline

Build is green

Patch application report for D7381 (id=26683)

Could not rebase; Attempt merge onto 68f9bd2028...

Updating 68f9bd2..048f273
Fast-forward
 requirements-swh.txt            |  6 +++---
 swh/dataset/exporters/orc.py    | 31 +++++++++++--------------------
 swh/dataset/journalprocessor.py | 14 +++++++++++---
 swh/dataset/relational.py       | 15 +++++++++------
 swh/dataset/test/test_orc.py    | 14 ++++++--------
 5 files changed, 40 insertions(+), 40 deletions(-)
Changes applied before test
commit 048f273838cdba3c8dfbebfb9768fc675f205d74
Author: David Douard <david.douard@sdfa3.org>
Date:   Wed Dec 15 16:41:28 2021 +0100

    Use a named logger for journalprocessor.py
    
    and add a few more debug logging statements.

commit 8cae6adb5c63af22bc798ebe7072a33978f3037e
Author: David Douard <david.douard@sdfa3.org>
Date:   Wed Dec 15 16:44:49 2021 +0100

    Update JournalClientOffsetRanges for swh.journal 0.9
    
    deserialize_message() now takes an optional 'object_type' argument.

commit 8c2b5e951c1a1195c9ec3e700cb9da60711a96ab
Author: David Douard <david.douard@sdfa3.org>
Date:   Fri Mar 18 11:46:37 2022 +0100

    Encode TimestampWithTimezone as (sec, usec, offset) in ORC file
    
    instead of using the ORC Timestamp format, since we cannot always encode
    them in this format.
    
    The offset is encoded as binary (byte string), following recent evolutions
    of swh-model.
    
    This makes swh-dataset compatible with swh-model 5.

See https://jenkins.softwareheritage.org/job/DDATASET/job/tests-on-diff/31/ for more details.

ardumont added inline comments.
swh/dataset/journalprocessor.py
202

Let the logger do the formatting; same for the other new logging instructions below.

ah yes sure, I always forget about this
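
For readers following the inline comment, here is a minimal sketch of what "a named logger" and "let the logger do the formatting" usually mean with Python's standard `logging` module. The function `handle_offsets` and its parameters are hypothetical and only illustrate the pattern; this is not the code from `journalprocessor.py`.

```python
import logging

# Named, module-level logger: records carry this module's dotted name,
# so verbosity can be tuned per module instead of on the root logger.
logger = logging.getLogger(__name__)


def handle_offsets(partition_id: int, low: int, high: int) -> int:
    """Hypothetical helper, used only to illustrate the logging call."""
    # Pass the template and the arguments separately: the logging module
    # interpolates them only if a handler actually emits the record.
    logger.debug("Partition %d: offsets %d -> %d", partition_id, low, high)
    return high - low
```

Compared with `logger.debug(f"...")` or `logger.debug("..." % args)`, this form skips the string formatting when the debug level is disabled.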

rebase and fixes suggested by ardumont

This revision is now accepted and ready to land. Mar 18 2022, 3:50 PM

Build is green

Patch application report for D7381 (id=26704)

Could not rebase; Attempt merge onto 68f9bd2028...

Updating 68f9bd2..4f14a95
Fast-forward
 requirements-swh.txt            |  6 +++---
 swh/dataset/exporters/orc.py    | 36 ++++++++++++++++-------------------
 swh/dataset/journalprocessor.py | 16 +++++++++++++---
 swh/dataset/relational.py       | 15 +++++++++------
 swh/dataset/test/test_orc.py    | 42 +++++++++++++++++++++++++++++++++--------
 5 files changed, 75 insertions(+), 40 deletions(-)
Changes applied before test
commit 4f14a95aaadabc1d2036a9c31c18e6a78befb44d
Author: David Douard <david.douard@sdfa3.org>
Date:   Wed Dec 15 16:41:28 2021 +0100

    Use a named logger for journalprocessor.py
    
    and add a few more debug logging statements.

commit 316d51b6da36719bca767c78ad04402c609d5abe
Author: David Douard <david.douard@sdfa3.org>
Date:   Wed Dec 15 16:44:49 2021 +0100

    Update JournalClientOffsetRanges for swh.journal 0.9
    
    deserialize_message() now takes an optional 'object_type' argument.

commit ae440431049470ecac6aca0e8cbed4a51cde0c09
Author: David Douard <david.douard@sdfa3.org>
Date:   Fri Mar 18 11:46:37 2022 +0100

    Encode TimestampWithTimezone as (sec, usec, offset) in ORC file
    
    instead of using the ORC Timestamp format, since we cannot always encode
    them in this format.
    
    The offset is encoded as binary (byte string), following recent evolutions
    of swh-model.
    
    This makes swh-dataset compatible with swh-model 5.

See https://jenkins.softwareheritage.org/job/DDATASET/job/tests-on-diff/43/ for more details.

Build has FAILED

Patch application report for D7381 (id=26774)

Could not rebase; Attempt merge onto 68f9bd2028...

Updating 68f9bd2..c508c67
Fast-forward
 requirements-swh.txt            |  6 ++--
 swh/dataset/exporters/orc.py    | 79 +++++++++++++++++++++++++++++++----------
 swh/dataset/journalprocessor.py | 16 +++++++--
 swh/dataset/relational.py       | 15 ++++----
 swh/dataset/test/test_orc.py    | 57 ++++++++++++++++++++++-------
 5 files changed, 131 insertions(+), 42 deletions(-)
Changes applied before test
commit c508c673043458a419f2eeb0d5a2fb60b12aa007
Author: David Douard <david.douard@sdfa3.org>
Date:   Wed Dec 15 16:41:28 2021 +0100

    Use a named logger for journalprocessor.py
    
    and add a few more debug logging statements.

commit d49db10f0bf7174ea4f2742d5d5ac8c8e25b707a
Author: David Douard <david.douard@sdfa3.org>
Date:   Wed Dec 15 16:44:49 2021 +0100

    Update JournalClientOffsetRanges for swh.journal 0.9
    
    deserialize_message() now takes an optional 'object_type' argument.

commit 69e806698bbb6df42bfa3520681e0203f91d8a65
Author: David Douard <david.douard@sdfa3.org>
Date:   Fri Mar 18 11:46:37 2022 +0100

    Encode TimestampWithTimezone as (timestamp, offset, raw_offset_bytes) in ORC file
    
    ie. use the standard ORC Timestamp format (aka a couple
    (seconds, nanoseconds)) with 2 extra fields for the offset.
    
    The offset is stored as an integer (in minutes), but the raw offset
    value is also present as a binary string representation, following
    recent evolutions of swh-model.
    
    This makes swh-dataset compatible with swh-model 5.

Link to build: https://jenkins.softwareheritage.org/job/DDATASET/job/tests-on-diff/54/
See console output for more information: https://jenkins.softwareheritage.org/job/DDATASET/job/tests-on-diff/54/console
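
As an illustration of the (timestamp, offset, raw_offset_bytes) layout described in the commit message above, here is a rough sketch. The helper names, the `+HHMM` shape of the raw offset, and the use of plain tuples are all assumptions made for this example; it is not the swh-dataset exporter code.

```python
from typing import Tuple


def offset_bytes_to_minutes(offset_bytes: bytes) -> int:
    """Convert a raw offset such as b"+0100" to signed minutes.

    Hypothetical helper: assumes a well-formed sign + HHMM value.
    """
    sign = -1 if offset_bytes[:1] == b"-" else 1
    hours = int(offset_bytes[1:3].decode("ascii"))
    minutes = int(offset_bytes[3:5].decode("ascii"))
    return sign * (hours * 60 + minutes)


def encode_date(
    seconds: int, microseconds: int, offset_bytes: bytes
) -> Tuple[Tuple[int, int], int, bytes]:
    """Build a (timestamp, offset, raw_offset_bytes) triple.

    The timestamp is a (seconds, nanoseconds) pair, the offset is an
    integer number of minutes, and the raw bytes are kept unchanged.
    """
    timestamp = (seconds, microseconds * 1000)
    return (timestamp, offset_bytes_to_minutes(offset_bytes), offset_bytes)
```

For instance, `encode_date(1000, 500, b"+0100")` gives `((1000, 500000), 60, b"+0100")` under these assumptions.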

Build is green

Patch application report for D7381 (id=26796)

Could not rebase; Attempt merge onto 68f9bd2028...

Updating 68f9bd2..55cf5ac
Fast-forward
 requirements-swh.txt            |  6 +--
 swh/dataset/exporters/orc.py    | 81 ++++++++++++++++++++++++++++++++---------
 swh/dataset/journalprocessor.py | 16 ++++++--
 swh/dataset/relational.py       | 15 +++++---
 swh/dataset/test/test_orc.py    | 59 ++++++++++++++++++++++++------
 5 files changed, 136 insertions(+), 41 deletions(-)
Changes applied before test
commit 55cf5ac5cc0cb818ae23b5df4af416e57794469c
Author: David Douard <david.douard@sdfa3.org>
Date:   Wed Dec 15 16:41:28 2021 +0100

    Use a named logger for journalprocessor.py
    
    and add a few more debug logging statements.

commit 70d9d3182de1420ba545f2f507ade8f59b2c2f33
Author: David Douard <david.douard@sdfa3.org>
Date:   Wed Dec 15 16:44:49 2021 +0100

    Update JournalClientOffsetRanges for swh.journal 0.9
    
    deserialize_message() now takes an optional 'object_type' argument.

commit 09d2840dbd4db6e1a3dd976c44b3c628b9174741
Author: David Douard <david.douard@sdfa3.org>
Date:   Fri Mar 18 11:46:37 2022 +0100

    Encode TimestampWithTimezone as (timestamp, offset, raw_offset_bytes) in ORC file
    
    ie. use the standard ORC Timestamp format (aka a couple
    (seconds, nanoseconds)) with 2 extra fields for the offset.
    
    The offset is stored as an integer (in minutes), but the raw offset
    value is also present as a binary string representation, following
    recent evolutions of swh-model.
    
    This makes swh-dataset compatible with swh-model 5.

See https://jenkins.softwareheritage.org/job/DDATASET/job/tests-on-diff/66/ for more details.

Build is green

Patch application report for D7381 (id=26992)

Could not rebase; Attempt merge onto 68f9bd2028...

Updating 68f9bd2..de114c2
Fast-forward
 requirements-swh.txt            |  6 +--
 swh/dataset/exporters/orc.py    | 81 ++++++++++++++++++++++++++++++++---------
 swh/dataset/journalprocessor.py | 16 ++++++--
 swh/dataset/relational.py       |  3 ++
 swh/dataset/test/test_orc.py    | 59 ++++++++++++++++++++++++------
 5 files changed, 130 insertions(+), 35 deletions(-)
Changes applied before test
commit de114c20f105c0b888eb92625f4e073f97a94ae8
Author: David Douard <david.douard@sdfa3.org>
Date:   Wed Dec 15 16:41:28 2021 +0100

    Use a named logger for journalprocessor.py
    
    and add a few more debug logging statements.

commit a8442bcf7c4311a28bea0898a01dc9475889efc7
Author: David Douard <david.douard@sdfa3.org>
Date:   Wed Dec 15 16:44:49 2021 +0100

    Update JournalClientOffsetRanges for swh.journal 0.9
    
    deserialize_message() now takes an optional 'object_type' argument.

commit f588e20a41af4b1b8042f9b5f0e88a1f1dc91e59
Author: David Douard <david.douard@sdfa3.org>
Date:   Fri Mar 18 11:46:37 2022 +0100

    Encode TimestampWithTimezone as (timestamp, offset, raw_offset_bytes) in ORC file
    
    ie. use the standard ORC Timestamp format (aka a couple
    (seconds, nanoseconds)) with 2 extra fields for the offset.
    
    The offset is stored as an integer (in minutes), but the raw offset
    value is also present as a binary string representation, following
    recent evolutions of swh-model.
    
    This makes swh-dataset compatible with swh-model 5.

See https://jenkins.softwareheritage.org/job/DDATASET/job/tests-on-diff/78/ for more details.

Build is green

Patch application report for D7381 (id=27029)

Could not rebase; Attempt merge onto 31081e4121...

Updating 31081e4..769b6a7
Fast-forward
 requirements-swh.txt            |  2 +-
 swh/dataset/journalprocessor.py | 16 +++++++++++++---
 2 files changed, 14 insertions(+), 4 deletions(-)
Changes applied before test
commit 769b6a77d250123ee25d8576bc1fe4a9340616f4
Author: David Douard <david.douard@sdfa3.org>
Date:   Wed Dec 15 16:41:28 2021 +0100

    Use a named logger for journalprocessor.py
    
    and add a few more debug logging statements.

commit d7c332e4e7e1d5ee531a914b302f98c11503663e
Author: David Douard <david.douard@sdfa3.org>
Date:   Wed Dec 15 16:44:49 2021 +0100

    Update JournalClientOffsetRanges for swh.journal 0.9
    
    deserialize_message() now takes an optional 'object_type' argument.

See https://jenkins.softwareheritage.org/job/DDATASET/job/tests-on-diff/89/ for more details.