swh scanner reports double results in ndjson format
Closed, Migrated

Description

Given a directory with a subdirectory (and a couple of files in each level):

$ find mydir -printf '%y %p\n'
d mydir
f mydir/file1
f mydir/file2
d mydir/subdir
f mydir/subdir/file3
f mydir/subdir/file4

Running swh scanner scan -f ndjson on it results in:

$ swh scanner scan -f ndjson  mydir
{"mydir/subdir": {"swhid": "swh:1:dir:c8813fd1fd68cc3e5a49cc2f81c786116a7b5314", "known": false}}
{"mydir/subdir/file3": {"swhid": "swh:1:cnt:f250008831e45d6c73f01b9d4f5085c2d1abb400", "known": true}}
{"mydir/subdir/file4": {"swhid": "swh:1:cnt:2c09459acd4d0946c02b5d61bf2d279af395a15b", "known": true}}
{"mydir/subdir/file3": {"swhid": "swh:1:cnt:f250008831e45d6c73f01b9d4f5085c2d1abb400", "known": true}}
{"mydir/subdir/file4": {"swhid": "swh:1:cnt:2c09459acd4d0946c02b5d61bf2d279af395a15b", "known": true}}
{"mydir/file1": {"swhid": "swh:1:cnt:f3d94808194a3f1c2c2e6ab4009c7cf2471a095b", "known": false}}
{"mydir/file2": {"swhid": "swh:1:cnt:2834e5822e25b78ca17b20524a64febb26237c7a", "known": true}}

(note the duplicate lines for file3 and file4)

Reporting in colored output ("txt") or multi-line JSON (-f json) does not show this behavior.
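
To check for the duplication on larger trees without reading the output by eye, a small script can count how often each path appears in the ndjson stream. The following is only a sketch: the helper name find_duplicate_paths is hypothetical and not part of swh scanner, and it assumes each output line is a JSON object keyed by path, as in the output shown above.

```python
import json
from collections import Counter


def find_duplicate_paths(ndjson_lines):
    """Return {path: count} for every path reported more than once.

    Hypothetical helper, not part of swh scanner. Assumes each non-empty
    line is a JSON object whose keys are the scanned paths.
    """
    counts = Counter()
    for line in ndjson_lines:
        line = line.strip()
        if not line:
            continue  # ignore blank lines
        record = json.loads(line)
        counts.update(record.keys())
    return {path: n for path, n in counts.items() if n > 1}
```

The function accepts any iterable of lines, so the scanner output can be saved to a file and passed in as an open file object. For the run above it would be expected to report mydir/subdir/file3 and mydir/subdir/file4, each with a count of 2.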