
Move exporter config entries in dedicated sections
Closed, Public

Authored by douardda on Mar 29 2022, 5:44 PM.

Details

Summary

E.g., config entries specific to the ORC exporter are now under the
'orc' section, like:

journal:
  brokers: [...]

orc:
  remove_pull_requests: true
  max_rows:
    revision: 100000
    directory: 10000

Depends on D7461.
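
To illustrate the layout described in the summary, here is a minimal Python sketch of how an exporter could look up its own dedicated section once the configuration has been parsed. The helper name `exporter_section` and the in-memory `config` dict are assumptions made for this example; they are not taken from the swh.dataset code, and the broker list is left empty because it is elided in the summary.

```python
from typing import Any, Dict

# Parsed form of the YAML configuration shown in the summary.
# The broker list is elided ("[...]") in the source, so it is left empty here.
config: Dict[str, Any] = {
    "journal": {"brokers": []},
    "orc": {
        "remove_pull_requests": True,
        "max_rows": {"revision": 100000, "directory": 10000},
    },
}


def exporter_section(config: Dict[str, Any], name: str) -> Dict[str, Any]:
    """Return the section dedicated to exporter `name`, or {} if it is absent.

    Hypothetical helper: it only illustrates the "one section per exporter"
    layout, not the actual lookup done by swh.dataset.
    """
    section = config.get(name) or {}
    if not isinstance(section, dict):
        raise ValueError(f"config section {name!r} must be a mapping")
    return section


# The ORC exporter only reads entries found under its own 'orc' section.
orc_config = exporter_section(config, "orc")
max_rows = orc_config.get("max_rows", {})
```

With this layout, an exporter that has no dedicated section simply gets an empty mapping and can fall back to its defaults.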

Diff Detail

Repository
rDDATASET Datasets
Lint
Automatic diff as part of commit; lint not applicable.
Unit
Automatic diff as part of commit; unit tests not applicable.

Event Timeline

Build is green

Patch application report for D7462 (id=27047)

Could not rebase; Attempt merge onto 5a8a8a7847...

Updating 5a8a8a7..ebb5a89
Fast-forward
 swh/dataset/exporters/orc.py |  98 +++++++++++++++++++++++++++++++++++-------
 swh/dataset/relational.py    |  10 +++++
 swh/dataset/test/test_orc.py | 100 +++++++++++++++++++++++++++++++++++++++----
 3 files changed, 185 insertions(+), 23 deletions(-)
Changes applied before test
commit ebb5a89f95d73f52e87c456b872ec6c529d80fe3
Author: David Douard <david.douard@sdfa3.org>
Date:   Fri Mar 25 15:43:18 2022 +0100

    Move exporter config entries in dedicated sections
    
    eg. orc exporter specific exporter config entries are now under the
    'orc' section, like:
    
      journal:
        brokers: [...]
    
      orc:
        remove_pull_requests: true
        max_rows:
          revision: 100000
          directory: 10000

commit e8ccb166a6aaa82f5917388f9b995c830499170a
Author: David Douard <david.douard@sdfa3.org>
Date:   Wed Mar 23 16:35:52 2022 +0100

    Add support for limited row numbers in ORC files
    
    Make it possible to specify a maximum number of rows a table can store
    in a single ORC file. The limit can only be set on main tables for now
    (i.e. cannot be specified for tables like revision_history or
    directory_entry).
    
    This can be set by configuration only (no extra cli options).

commit fd3f9aa61de374655fd4bc4920d5047eb7d0c4ca
Author: David Douard <david.douard@sdfa3.org>
Date:   Fri Mar 18 12:24:31 2022 +0100

    Add the raw_manifest column for revision, release and directory ORC files

commit 5c652bb058e2c1b59bafefd6817f392fdc171a20
Author: David Douard <david.douard@sdfa3.org>
Date:   Fri Mar 18 12:22:47 2022 +0100

    Export revision extra headers in a dedicated ORC file

commit 45c8124b7a310963a868eb6602ea24e240d761e4
Author: David Douard <david.douard@sdfa3.org>
Date:   Fri Mar 18 12:20:20 2022 +0100

    Add the type fields for revision and origin_visit_status ORC table

See https://jenkins.softwareheritage.org/job/DDATASET/job/tests-on-diff/99/ for more details.
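
The per-table row limit described in the "Add support for limited row numbers in ORC files" commit above can be pictured with a small Python sketch. It is a simplified stand-in, assuming a hypothetical `RowLimitedWriter` class that keeps rows in in-memory lists instead of writing real ORC files; the actual exporter in `swh/dataset/exporters/orc.py` may be organised quite differently.

```python
from typing import List, Optional, Tuple

Row = Tuple[object, ...]


class RowLimitedWriter:
    """Hypothetical writer that starts a new "file" once max_rows is reached.

    Each inner list stands in for one ORC file of a given table.
    """

    def __init__(self, table: str, max_rows: Optional[int] = None) -> None:
        if max_rows is not None and max_rows <= 0:
            raise ValueError("max_rows must be a positive integer")
        self.table = table
        self.max_rows = max_rows
        self.files: List[List[Row]] = []

    def write(self, row: Row) -> None:
        # Open a new file for the first row, or when the current one is full.
        current_is_full = (
            self.max_rows is not None
            and bool(self.files)
            and len(self.files[-1]) >= self.max_rows
        )
        if not self.files or current_is_full:
            self.files.append([])
        self.files[-1].append(row)


# Example: a limit of 2 rows per file, 5 rows written.
writer = RowLimitedWriter("revision", max_rows=2)
for i in range(5):
    writer.write((i,))
file_sizes = [len(f) for f in writer.files]
```

Leaving `max_rows` unset keeps all rows of a table in a single file, which matches the idea that the limit is opt-in and set by configuration only.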

This revision is now accepted and ready to land. (Mar 29 2022, 6:00 PM)

Build has FAILED

Patch application report for D7462 (id=27066)

Could not rebase; Attempt merge onto fd3f9aa61d...

Updating fd3f9aa..c728c05
Fast-forward
 swh/dataset/exporters/orc.py | 99 +++++++++++++++++++++++++++++++++++---------
 swh/dataset/relational.py    | 58 +++++++++++++++-----------
 swh/dataset/test/test_orc.py | 96 ++++++++++++++++++++++++++++++++++++++----
 3 files changed, 202 insertions(+), 51 deletions(-)
Changes applied before test
commit c728c05e630cf02f35081e810279a5e5b24ebf98
Author: David Douard <david.douard@sdfa3.org>
Date:   Fri Mar 25 15:43:18 2022 +0100

    Move exporter config entries in dedicated sections
    
    eg. orc exporter specific exporter config entries are now under the
    'orc' section, like:
    
      journal:
        brokers: [...]
    
      orc:
        remove_pull_requests: true
        max_rows:
          revision: 100000
          directory: 10000

commit 850ee3be47cf3b1e0ab53f8820cc5e4c86b94f38
Author: David Douard <david.douard@sdfa3.org>
Date:   Wed Mar 23 16:35:52 2022 +0100

    Add support for limited row numbers in ORC files
    
    Make it possible to specify a maximum number of rows a table can store
    in a single ORC file. The limit can only be set on main tables for now
    (i.e. cannot be specified for tables like revision_history or
    directory_entry).
    
    This can be set by configuration only (no extra cli options).

Link to build: https://jenkins.softwareheritage.org/job/DDATASET/job/tests-on-diff/105/
See console output for more information: https://jenkins.softwareheritage.org/job/DDATASET/job/tests-on-diff/105/console

Build is green

Patch application report for D7462 (id=27075)

Could not rebase; Attempt merge onto fd3f9aa61d...

Updating fd3f9aa..e01daba
Fast-forward
 swh/dataset/exporters/orc.py | 99 +++++++++++++++++++++++++++++++++++---------
 swh/dataset/relational.py    | 58 +++++++++++++++-----------
 swh/dataset/test/test_orc.py | 96 ++++++++++++++++++++++++++++++++++++++----
 3 files changed, 202 insertions(+), 51 deletions(-)
Changes applied before test
commit e01daba4d601733a86ce7401fe54247908d03e5c
Author: David Douard <david.douard@sdfa3.org>
Date:   Fri Mar 25 15:43:18 2022 +0100

    Move exporter config entries in dedicated sections
    
    eg. orc exporter specific exporter config entries are now under the
    'orc' section, like:
    
      journal:
        brokers: [...]
    
      orc:
        remove_pull_requests: true
        max_rows:
          revision: 100000
          directory: 10000

commit 3df08fd71759487e963e6569c8dfd0c502b060de
Author: David Douard <david.douard@sdfa3.org>
Date:   Wed Mar 23 16:35:52 2022 +0100

    Add support for limited row numbers in ORC files
    
    Make it possible to specify a maximum number of rows a table can store
    in a single ORC file. The limit can only be set on main tables for now
    (i.e. cannot be specified for tables like revision_history or
    directory_entry).
    
    This can be set by configuration only (no extra cli options).

See https://jenkins.softwareheritage.org/job/DDATASET/job/tests-on-diff/110/ for more details.