The JSON output of the scanner currently looks like this:
    $ swh scanner scan -f json .
    {
      ".HEADER": "swh:1:cnt:fd8430bc864cfcd5f10e5590f8a447e01b942bfe",
      ".editorconfig": "swh:1:cnt:34c5e9234ec18c69a16828dbc9633a95f0253fe9",
      ".gitattributes": "swh:1:cnt:176a458f94e0ea5272ce67c36bf30b6be9caf623",
      ".github": "swh:1:dir:e8bfe5af39579a7e4898bb23f3a76a72c368cee6",
      ".gitignore": "swh:1:cnt:dec3dca06c8fdc1dd7d426bb148b7f99355eaaed",
      ...
      "src": "swh:1:dir:f3c5e67df5a3b3e812e6331008b7e179865a30fc",
      "tests": "swh:1:dir:506e33bae73858bdf4b90a8f89dee8a32dae9c93"
    }
The semantics appear to be that only known files/dirs are returned, while unknown ones are silently omitted.
That is hard to use programmatically: from the JSON output alone, one cannot tell which files/dirs are missing.
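To illustrate the problem, here is a minimal sketch of what a consumer has to do today to find the unknown entries. It assumes the keys of the JSON output are paths relative to the scanned root, and that the consumer builds its own listing of those paths; `unknown_paths`, `local_listing` and `notes.txt` are hypothetical names used only for this example.

```python
from typing import Dict, Iterable, List


def unknown_paths(scan_output: Dict[str, str], all_paths: Iterable[str]) -> List[str]:
    """Return the paths that exist locally but are absent from the scanner output.

    The scanner output gives no hint about these, so the caller must supply
    its own listing of the scanned tree (assumption: same relative paths).
    """
    return sorted(set(all_paths) - set(scan_output))


# Two entries taken from the output above.
scan_output = {
    ".HEADER": "swh:1:cnt:fd8430bc864cfcd5f10e5590f8a447e01b942bfe",
    "src": "swh:1:dir:f3c5e67df5a3b3e812e6331008b7e179865a30fc",
}

# Hypothetical listing produced by the consumer; "notes.txt" is made up.
local_listing = [".HEADER", "src", "notes.txt"]

missing = unknown_paths(scan_output, local_listing)
print(missing)
```

The consumer ends up walking the tree a second time just to recover information the scanner already had.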
The output format should be changed to always output all encountered files/dirs, with an associated known: boolean flag.
Also remember that other fields will need to be associated with each encountered file/dir in the future, so we need to leave room (e.g., other keys at the same level as known) to attach more information later.
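One possible shape for the new output, as a sketch rather than a settled schema: each path maps to an object, so that known sits next to any future keys. Only the known flag comes from the proposal above; keeping the identifier under a "swhid" key, the `build_report` helper, and the `notes.txt` entry with its placeholder identifier are assumptions made for illustration.

```python
import json
from typing import Dict, Iterable, Tuple


def build_report(entries: Iterable[Tuple[str, str, bool]]) -> Dict[str, dict]:
    """Map every encountered path to an object holding its attributes.

    Each entry is (path, swhid, known). Using an object per path leaves
    room for more keys next to "known" later on.
    """
    report = {}
    for path, swhid, known in entries:
        report[path] = {
            "swhid": swhid,   # assumed key name, not part of the proposal
            "known": known,   # the boolean flag requested above
        }
    return report


entries = [
    (".HEADER", "swh:1:cnt:fd8430bc864cfcd5f10e5590f8a447e01b942bfe", True),
    ("src", "swh:1:dir:f3c5e67df5a3b3e812e6331008b7e179865a30fc", True),
    # Hypothetical unknown file; the identifier is a placeholder, not a real hash.
    ("notes.txt", "swh:1:cnt:" + "0" * 40, False),
]

report = build_report(entries)
print(json.dumps(report, indent=2))
```

With this layout, listing the unknown entries becomes a simple filter on the known flag, and adding a new per-entry field does not break existing consumers that only read the keys they know about.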