Software Heritage

git2graph: add back node output support, with simpler/saner semantics

Description

Rationale: generating the nodes file from the edges file is not reasonable in
terms of processing time. For linux.git alone, tr + sort -u can take up to 1
hour, depending on the sort setup. On the other hand, outputting (unsorted, but
unique) nodes via git2graph adds near-zero overhead compared to outputting edges.
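A minimal sketch of why unsorted-but-unique node output can stay cheap. Each node identifier is looked up in an in-memory hash set as it is encountered and printed only the first time it is seen. This illustrates the general technique under stated assumptions and is not git2graph's actual code: the `node_set` type and the `node_set_*` and `emit_node` helpers are hypothetical, and FNV-1a hashing with linear probing is just one simple choice of table.

```c
#include <stdint.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical open-addressing hash set of node identifiers (strings). */
typedef struct {
    char **slots;  /* NULL marks an empty slot */
    size_t cap;    /* number of slots; always a power of two */
    size_t len;    /* number of stored identifiers */
} node_set;

/* 64-bit FNV-1a string hash. */
static uint64_t fnv1a(const char *s) {
    uint64_t h = 14695981039346656037ULL;
    for (; *s; s++) {
        h ^= (unsigned char)*s;
        h *= 1099511628211ULL;
    }
    return h;
}

/* cap must be a non-zero power of two. */
static void node_set_init(node_set *set, size_t cap) {
    set->slots = calloc(cap, sizeof *set->slots);
    if (!set->slots) { perror("calloc"); exit(EXIT_FAILURE); }
    set->cap = cap;
    set->len = 0;
}

/* Double the table and re-place every stored pointer (no string copies). */
static void node_set_grow(node_set *set) {
    node_set bigger;
    node_set_init(&bigger, set->cap * 2);
    for (size_t i = 0; i < set->cap; i++) {
        if (!set->slots[i]) continue;
        size_t j = fnv1a(set->slots[i]) & (bigger.cap - 1);
        while (bigger.slots[j]) j = (j + 1) & (bigger.cap - 1);
        bigger.slots[j] = set->slots[i];
        bigger.len++;
    }
    free(set->slots);
    *set = bigger;
}

/* Insert a copy of id; returns 1 if it was new, 0 if already present. */
static int node_set_add(node_set *set, const char *id) {
    if ((set->len + 1) * 2 > set->cap) node_set_grow(set);  /* keep load <= 1/2 */
    size_t i = fnv1a(id) & (set->cap - 1);
    while (set->slots[i]) {
        if (strcmp(set->slots[i], id) == 0) return 0;
        i = (i + 1) & (set->cap - 1);
    }
    size_t n = strlen(id) + 1;
    char *copy = malloc(n);
    if (!copy) { perror("malloc"); exit(EXIT_FAILURE); }
    memcpy(copy, id, n);
    set->slots[i] = copy;
    set->len++;
    return 1;
}

/* Print id on its own line the first time it is seen; repeats print nothing. */
static void emit_node(node_set *seen, const char *id, FILE *out) {
    if (node_set_add(seen, id)) fprintf(out, "%s\n", id);
}

static void node_set_free(node_set *set) {
    for (size_t i = 0; i < set->cap; i++) free(set->slots[i]);
    free(set->slots);
    set->slots = NULL;
    set->cap = set->len = 0;
}
```

Each node costs a single hash lookup at the moment it is visited, whereas the tr + sort -u route has to re-read and sort the complete edges file afterwards. The trade-off is memory: the set holds every distinct node identifier until the traversal ends.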

The sane semantics for node/edge selection is to keep the two filters
completely separate. The user is expected to filter nodes *and* edges on the
command line (if desired), and neither filter affects the other. So it is
possible to, say, emit "rev:rev" edges and "dir,cnt" nodes; it is up to the
user to choose a meaningful combination.
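The separation between the two filters can be sketched as follows. The filter syntax is assumed from the examples above (comma-separated node types for nodes, `src:dst` type pairs for edges), and the type names are the Software Heritage node types; `graph_filter`, `parse_node_filter`, `parse_edge_filter`, `want_node` and `want_edge` are hypothetical names, not git2graph's actual interface. The point of the sketch is that `want_edge` never consults the node filter and `want_node` never consults the edge filter, and that an empty spec yields an all-false filter.

```c
#include <stdbool.h>
#include <string.h>

/* Software Heritage node types; the filter syntax parsed below is assumed. */
enum node_type { CNT, DIR, REV, REL, SNP, ORI, N_TYPES };
static const char *type_names[N_TYPES] = {"cnt", "dir", "rev", "rel", "snp", "ori"};

typedef struct {
    bool nodes[N_TYPES];           /* node filter, e.g. "dir,cnt" */
    bool edges[N_TYPES][N_TYPES];  /* edge filter, e.g. "rev:rev" */
} graph_filter;

/* Map the n-character string s to a node type, or -1 if unknown. */
static int parse_type(const char *s, size_t n) {
    for (int t = 0; t < N_TYPES; t++)
        if (strlen(type_names[t]) == n && strncmp(type_names[t], s, n) == 0)
            return t;
    return -1;
}

/* "dir,cnt" -> allowed[DIR] = allowed[CNT] = true. "" allows nothing.
   Returns 0 on success, -1 on a malformed spec. */
static int parse_node_filter(const char *spec, bool allowed[N_TYPES]) {
    memset(allowed, 0, N_TYPES * sizeof(bool));
    for (const char *p = spec; *p;) {
        size_t n = strcspn(p, ",");
        int t = parse_type(p, n);
        if (t < 0) return -1;
        allowed[t] = true;
        p += n;
        if (*p == ',') p++;
    }
    return 0;
}

/* "rev:rev,rev:dir" -> allowed[REV][REV] = allowed[REV][DIR] = true.
   "" allows nothing. Returns 0 on success, -1 on a malformed spec. */
static int parse_edge_filter(const char *spec, bool allowed[N_TYPES][N_TYPES]) {
    memset(allowed, 0, N_TYPES * N_TYPES * sizeof(bool));
    for (const char *p = spec; *p;) {
        size_t n = strcspn(p, ",");
        const char *colon = memchr(p, ':', n);
        if (!colon) return -1;
        size_t src_len = (size_t)(colon - p);
        int src = parse_type(p, src_len);
        int dst = parse_type(colon + 1, n - src_len - 1);
        if (src < 0 || dst < 0) return -1;
        allowed[src][dst] = true;
        p += n;
        if (*p == ',') p++;
    }
    return 0;
}

/* The two checks are deliberately independent of each other. */
static bool want_node(const graph_filter *f, enum node_type t) {
    return f->nodes[t];
}

static bool want_edge(const graph_filter *f, enum node_type src, enum node_type dst) {
    return f->edges[src][dst];
}
```

With this layout, "rev:rev" edges combined with "dir,cnt" nodes is accepted as is: revision-to-revision edges are emitted while revision nodes are not, exactly because neither filter trickles into the other.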

It is also now possible to filter *out* all nodes/edges by passing empty
strings as filters. That might be needed when one really wants all and only the
nodes corresponding to the selected edges; in that case node output should be
suppressed, and tr + sort used separately on the edges file. Note that doing so
is not always desirable, as it excludes singleton nodes that are not connected
to anything at all (and these do exist!).
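To illustrate how deriving nodes from edges drops singletons, the sketch below mimics in memory what a tr + sort -u pipeline does over an edges file (collect both endpoints of every edge, sort them, drop duplicates) and then counts the nodes of a complete node list that never appear as an endpoint. The `edge` type and both helpers are hypothetical, and the identifiers in the test are placeholders rather than real SWHIDs.

```c
#include <stdlib.h>
#include <string.h>

typedef struct { const char *src, *dst; } edge;

static int cmp_str(const void *a, const void *b) {
    return strcmp(*(const char *const *)a, *(const char *const *)b);
}

/* In-memory analogue of tr + sort -u over an edges file: gathers both
   endpoints of every edge, sorts them and removes duplicates.
   out must have room for 2 * n_edges entries; returns the unique count. */
static size_t nodes_from_edges(const edge *edges, size_t n_edges, const char **out) {
    size_t n = 0;
    for (size_t i = 0; i < n_edges; i++) {
        out[n++] = edges[i].src;
        out[n++] = edges[i].dst;
    }
    qsort(out, n, sizeof *out, cmp_str);
    size_t uniq = 0;
    for (size_t i = 0; i < n; i++)
        if (uniq == 0 || strcmp(out[uniq - 1], out[i]) != 0)
            out[uniq++] = out[i];
    return uniq;
}

/* Number of nodes that never occur as an edge endpoint, i.e. the singletons
   an edge-derived node list silently loses. endpoints must be sorted. */
static size_t count_singletons(const char **nodes, size_t n_nodes,
                               const char **endpoints, size_t n_endpoints) {
    size_t missing = 0;
    for (size_t i = 0; i < n_nodes; i++)
        if (!bsearch(&nodes[i], endpoints, n_endpoints, sizeof *endpoints, cmp_str))
            missing++;
    return missing;
}
```

For instance, with edges r1 to d1 and d1 to c1 plus an isolated content node c2, the edge-derived list contains three nodes and c2 is the one singleton that is missed.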

This commit partially reverts d2ff3227a240d0a4de043661874c1959cc0b462c.
