
Provide a collaboration graph / dataset
Closed, Migrated

Description

Rough idea: build the bipartite graph of who contributed to which project

Naive implementation: start from each origin (or its last visit) and collect every author it references. But this does a lot of duplicate work, because history shared between origins is traversed once per origin.
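A minimal sketch of the naive approach, on a hypothetical in-memory toy graph (the node names, `EDGES`, `AUTHORS` and `naive_contributors` are made up for illustration and are not swh-graph APIs). It counts how often each node is visited, to make the duplicate work visible:

```python
from typing import Dict, List, Set

# Hypothetical toy graph: each node maps to the nodes it references.
# origin:B is a fork that shares rev:1 and rev:2 with origin:A.
EDGES: Dict[str, List[str]] = {
    "origin:A": ["snp:1"],
    "origin:B": ["snp:2"],
    "snp:1": ["rev:3"],
    "snp:2": ["rev:4"],
    "rev:3": ["rev:2"],
    "rev:4": ["rev:2"],
    "rev:2": ["rev:1"],
    "rev:1": [],
}
# Hypothetical author of each revision.
AUTHORS: Dict[str, str] = {
    "rev:1": "alice",
    "rev:2": "bob",
    "rev:3": "carol",
    "rev:4": "dave",
}


def naive_contributors(origin: str, visits: Dict[str, int]) -> Set[str]:
    """Walk everything reachable from one origin and collect its authors."""
    seen: Set[str] = set()
    authors: Set[str] = set()
    stack = [origin]
    while stack:
        node = stack.pop()
        if node in seen:
            continue
        seen.add(node)
        visits[node] = visits.get(node, 0) + 1  # bookkeeping across origins
        if node in AUTHORS:
            authors.add(AUTHORS[node])
        stack.extend(EDGES[node])
    return authors


visits: Dict[str, int] = {}
contributors = {
    origin: naive_contributors(origin, visits)
    for origin in EDGES
    if origin.startswith("origin:")
}
```

In this toy example the shared revisions `rev:1` and `rev:2` are each visited twice, once per origin; on a real archive with many forks that repetition is the duplicate work mentioned above.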

Slightly less naive implementation: a reverse traversal in swh-graph, starting from the oldest revisions and tagging every revision/release/snapshot with the set of authors who contributed to its parents.
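A rough sketch of that propagation on a hypothetical toy graph, using Python's standard `graphlib` (Python 3.9+) rather than swh-graph itself; `PARENTS`, `AUTHORS` and `propagate_authors` are illustrative names, not existing APIs. Each node is processed after the nodes it references, so every node is visited exactly once:

```python
from graphlib import TopologicalSorter
from typing import Dict, List, Set

# Hypothetical toy graph: node -> nodes it references (parents / targets).
PARENTS: Dict[str, List[str]] = {
    "origin:A": ["snp:1"],
    "origin:B": ["snp:2"],
    "snp:1": ["rev:3"],
    "snp:2": ["rev:4"],
    "rev:3": ["rev:2"],
    "rev:4": ["rev:2"],
    "rev:2": ["rev:1"],
    "rev:1": [],
}
# Hypothetical author of each revision.
AUTHORS: Dict[str, str] = {
    "rev:1": "alice",
    "rev:2": "bob",
    "rev:3": "carol",
    "rev:4": "dave",
}


def propagate_authors(
    parents: Dict[str, List[str]], authors: Dict[str, str]
) -> Dict[str, Set[str]]:
    """Tag every node with its own author plus the authors of its parents."""
    tagged: Dict[str, Set[str]] = {}
    # TopologicalSorter treats the dict values as predecessors, so
    # static_order() yields the oldest revisions first.
    for node in TopologicalSorter(parents).static_order():
        acc: Set[str] = set()
        if node in authors:
            acc.add(authors[node])
        for parent in parents.get(node, ()):
            acc |= tagged[parent]  # parents were already tagged
        tagged[node] = acc
    return tagged


tagged = propagate_authors(PARENTS, AUTHORS)
contributors = {n: s for n, s in tagged.items() if n.startswith("origin:")}
```

Note that this sketch keeps one author set per node alive for the whole run, which is exactly where the memory cost comes from at archive scale.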

This might overflow memory, though. If so, could we get away with a database?

Event Timeline

vlorentz triaged this task as Normal priority. Nov 21 2022, 12:13 PM
vlorentz created this task.
zack changed the task status from Open to Work in Progress. Dec 15 2022, 10:27 AM

TODO: the deanonymized dataset should just be a <contributor_id,contributor_base64,contributor_escaped> table, rather than repeating the origin<->contributor mapping.
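A possible shape for that split, as a sketch only. The column names come from the TODO, but the escaping scheme is an assumption (UTF-8 decoding with `backslashreplace`), as are the helper `contributor_row` and the sample data:

```python
import base64
from typing import List, Tuple


def contributor_row(contributor_id: int, fullname: bytes) -> Tuple[int, str, str]:
    """Build one <contributor_id,contributor_base64,contributor_escaped> row."""
    contributor_base64 = base64.b64encode(fullname).decode("ascii")
    # Assumed escaping: invalid UTF-8 bytes become \xNN sequences.
    contributor_escaped = fullname.decode("utf-8", errors="backslashreplace")
    return (contributor_id, contributor_base64, contributor_escaped)


# Hypothetical raw contributor names, keyed by their position (the id).
raw_names: List[bytes] = [b"alice <alice@example.org>", b"Jos\xe9"]
contributor_table = [contributor_row(i, name) for i, name in enumerate(raw_names)]

# The origin<->contributor mapping stays in its own table and only
# refers to contributors by id, so names are never repeated there.
origin_contributors: List[Tuple[str, int]] = [
    ("origin:A", 0),
    ("origin:A", 1),
    ("origin:B", 0),
]
```

The base64 column round-trips the exact bytes, while the escaped column is only meant to be readable.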