
scanner: json output format
Closed, Public

Authored by DanSeraf on Apr 27 2020, 11:25 AM.

Details

Summary

The JSON output is now flat, and I've added the NDJSON output format as @olasd suggested (related: T2363).
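To show the difference between the two formats, here is a minimal sketch. It assumes the flat output maps each scanned path to a record with `known` and `swhid` fields, which is what the commit message "flat json output with known and swhid values" suggests. The exact layout, the `to_json`/`to_ndjson` function names and the sample values are hypothetical, not the scanner's actual code.

```python
import json
from typing import Any, Dict

# Hypothetical flat result: path -> {"known": bool, "swhid": str}
Result = Dict[str, Dict[str, Any]]


def to_json(result: Result) -> str:
    """Serialize the whole flat result as one JSON object."""
    return json.dumps(result, indent=2, sort_keys=True)


def to_ndjson(result: Result) -> str:
    """Serialize as newline-delimited JSON: one object per path, one per line."""
    # Every line is a self-contained JSON object and ends with "\n".
    return "".join(
        json.dumps({path: result[path]}) + "\n" for path in sorted(result)
    )
```

A flat JSON document has to be read in full before it can be parsed. NDJSON lets a consumer process results line by line, for example by piping them through `jq` or reading them incrementally.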

Diff Detail

Repository
rDTSCN Code scanner
Lint
Automatic diff as part of commit; lint not applicable.
Unit
Automatic diff as part of commit; unit tests not applicable.

Event Timeline

Build is green

Patch application report for D3069 (id=10908)

Rebasing onto 3bca054d47...

Current branch diff-target is up to date.
Changes applied before test
commit e5f7cb3e9833e82725a0370481669adb8bd83f9b
Author: Daniele Serafini <me@danieleserafini.eu>
Date:   Mon Apr 27 11:13:36 2020 +0200

    new output format: ndjson

commit 015e87f66c25d19b83ea802488e838aac103fbcc
Author: Daniele Serafini <me@danieleserafini.eu>
Date:   Mon Apr 27 11:09:12 2020 +0200

    model: known attribute in Tree structure
    
    Now the scanner will save the "known" status while scanning
    a source code project. The swhid is saved also for unknown files.
    
    The view of the json format now is flat

See https://jenkins.softwareheritage.org/job/DTSCN/job/tests-on-diff/10/ for more details.
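The commit "model: known attribute in Tree structure" records a `known` status on each node while scanning and keeps the SWHID for unknown files too. The sketch below illustrates that idea with a simple dataclass-based tree. The `TreeNode` class, its fields and its `flatten` method are hypothetical and do not reproduce the real `swh/scanner/model.py` API.

```python
from dataclasses import dataclass, field
from typing import Dict


@dataclass
class TreeNode:
    """Hypothetical tree node: a file or directory seen by the scanner."""

    path: str
    swhid: str  # stored for every node, known to the archive or not
    known: bool = False  # set from the archive lookup while scanning
    children: Dict[str, "TreeNode"] = field(default_factory=dict)

    def add_child(self, child: "TreeNode") -> None:
        self.children[child.path] = child

    def flatten(self) -> Dict[str, Dict[str, object]]:
        """Return a flat mapping path -> {"known", "swhid"} for this subtree."""
        result: Dict[str, Dict[str, object]] = {
            self.path: {"known": self.known, "swhid": self.swhid}
        }
        for child in self.children.values():
            result.update(child.flatten())
        return result
```

Because the SWHID is kept on unknown nodes as well, a flattened view can list every scanned path with its identifier, not only the paths the archive already has.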

Could you split this into two diffs?


I've split this into separate diffs, so I'll wait for the other one (D3070) before updating this one.

Build is green

Patch application report for D3069 (id=10925)

Could not rebase; Attempt merge onto 3bca054d47...

Updating 3bca054..4f7fd3e
Fast-forward
 swh/scanner/model.py                               | 60 ++++++++++++++++++----
 swh/scanner/scanner.py                             | 10 ++--
 swh/scanner/tests/conftest.py                      |  6 +--
 swh/scanner/tests/data.py                          |  5 +-
 .../data/sample-folder-result-no-toexclude.json    | 12 -----
 swh/scanner/tests/data/sample-folder-result.json   | 13 -----
 swh/scanner/tests/flask_api.py                     | 14 ++---
 swh/scanner/tests/test_model.py                    | 40 ++++++++-------
 swh/scanner/tests/test_scanner.py                  | 30 +++++------
 9 files changed, 100 insertions(+), 90 deletions(-)
 delete mode 100644 swh/scanner/tests/data/sample-folder-result-no-toexclude.json
 delete mode 100644 swh/scanner/tests/data/sample-folder-result.json
Changes applied before test
commit 4f7fd3e0599c60e8d0adeb9e8f66e3c3f2bb60d7
Author: Daniele Serafini <me@danieleserafini.eu>
Date:   Mon Apr 27 16:00:10 2020 +0200

    scanner tests adapted to the new attribute on the model

commit 66d6394ba20e452771c47b99deea1b8f6f3e7352
Author: Daniele Serafini <me@danieleserafini.eu>
Date:   Mon Apr 27 15:55:01 2020 +0200

    model: known attribute in Tree structure
    
    The scanner now has the "known" attribute and the
    Software Heritage persistent identifier is saved
    also for unknown files.
    
    model tests adapted to the new attribute

See https://jenkins.softwareheritage.org/job/DTSCN/job/tests-on-diff/17/ for more details.

You pushed the wrong commit to this diff.

Build is green

Patch application report for D3069 (id=10926)

Could not rebase; Attempt merge onto 3bca054d47...

Updating 3bca054..ca45546
Fast-forward
 swh/scanner/model.py                               | 67 ++++++++++++++--------
 swh/scanner/scanner.py                             | 10 ++--
 swh/scanner/tests/conftest.py                      |  6 +-
 swh/scanner/tests/data.py                          |  5 +-
 .../data/sample-folder-result-no-toexclude.json    | 12 ----
 swh/scanner/tests/data/sample-folder-result.json   | 13 -----
 swh/scanner/tests/flask_api.py                     | 14 ++---
 swh/scanner/tests/test_model.py                    | 40 +++++++------
 swh/scanner/tests/test_scanner.py                  | 30 ++++------
 9 files changed, 93 insertions(+), 104 deletions(-)
 delete mode 100644 swh/scanner/tests/data/sample-folder-result-no-toexclude.json
 delete mode 100644 swh/scanner/tests/data/sample-folder-result.json
Changes applied before test
commit ca455463e8bc81705637d1690b1efa5a460efae6
Author: Daniele Serafini <me@danieleserafini.eu>
Date:   Mon Apr 27 17:01:40 2020 +0200

    flat json output with known and swhid values

commit 80812af59952830df0462de14a0e90d86bde66d7
Author: Daniele Serafini <me@danieleserafini.eu>
Date:   Mon Apr 27 15:55:01 2020 +0200

    model: known attribute in Tree structure
    
    The scanner now has the "known" attribute and the
    Software Heritage persistent identifier is saved
    also for unknown files.
    
    tests adapted to the new attribute on the model

See https://jenkins.softwareheritage.org/job/DTSCN/job/tests-on-diff/18/ for more details.

This revision is now accepted and ready to land. (Apr 28 2020, 4:55 PM)
This revision was landed with ongoing or failed builds. (Apr 29 2020, 12:57 PM)
This revision was automatically updated to reflect the committed changes.

Build is green

Patch application report for D3069 (id=10974)

Rebasing onto bbf296f752...

Current branch diff-target is up to date.
Changes applied before test
commit 3f00bb004b4db54659a8b60c7a14851fae2e1a18
Author: Daniele Serafini <me@danieleserafini.eu>
Date:   Wed Apr 29 12:53:26 2020 +0200

    flat json output with known and swhid values

See https://jenkins.softwareheritage.org/job/DTSCN/job/tests-on-diff/20/ for more details.