Page MenuHomeSoftware Heritage

Add node id -> node types bitmap
ClosedPublic

Authored by haltode on Jul 11 2019, 4:31 PM.

Details

Summary

Add node id -> node type bitmap to effectively run edge restriction
during graph traversals.

This is still WIP because I am waiting on full scale test results. Also, this
diff is only about *loading/storing* the bitmap, not using it, because it needs
to be merged with recent D1700 work (in a separate diff).

Closes T1902.

Diff Detail

Repository
rDGRPH Compressed graph representation
Lint
Automatic diff as part of commit; lint not applicable.
Unit
Automatic diff as part of commit; unit tests not applicable.

Event Timeline

zack requested changes to this revision.Jul 11 2019, 5:09 PM
zack added inline comments.
api/server/src/main/java/org/softwareheritage/graph/Node.java
8–31

we also need ORI here now

9–13

this deserves a comment pointing to some external reference about SWH PIDs, to explain where these weird magic labels come from

api/server/src/main/java/org/softwareheritage/graph/backend/NodeTypesMap.java
15–17

DRY violation: ".nodesTypesMap" is used twice, hence it could use a constant somewhere :-)

while we are at it, minor naming bikeshed, how about .node2type instead?

api/server/src/main/java/org/softwareheritage/graph/backend/Setup.java
75

Can we make this the result of a log2 calculation on the size of the Node.Type enum? (unless it's too annoying to code, that is)

It would be nicer, and less prone to future bugs if we ever add/remove node types in the future.

94

this is another place where the constant proposed above should be used

This revision now requires changes to proceed.Jul 11 2019, 5:09 PM
api/server/src/main/java/org/softwareheritage/graph/Node.java
8–31

I do prefer to add ORI support in another diff as well, there are multiple places to update (here of course, but also the example test graph, the unit tests themselves, and so on).

9–13

Dev documentation is almost entirely absent for the moment in the code, I was thinking about adding all references/external links in a separate diff (this doc task is already tracked in T1904).

api/server/src/main/java/org/softwareheritage/graph/backend/NodeTypesMap.java
15–17

To be more consistent with other files, should we go with .node2swh and .swh2node as well?

api/server/src/main/java/org/softwareheritage/graph/backend/Setup.java
75

Sure, it's a bit sad Java only has log, log10, log1p but no log2 :(

api/server/src/main/java/org/softwareheritage/graph/Node.java
8–31

ok, fair enough. Please file a dedicated task, for better tracking of this need

9–13

*nod*

api/server/src/main/java/org/softwareheritage/graph/backend/NodeTypesMap.java
15–17

yeah, good idea. So, let's streamline this completely and use:

  • .node2type.map
  • .node2pid.{csv,map}
  • .pid2node.{csv,map}

note:

  • the trailing .map for files that are in custom binary formats
  • the use of "pid" instead of "swh", because swh is contextual-implicit and PIDs are what those things really are
  • I'm guessing you'll want to keep CSV format for now (hence .csv), but eventually they'll probably all disappear in favor of more efficient binary formats; we'll switch to .map then
api/server/src/main/java/org/softwareheritage/graph/backend/Setup.java
75

something like Math.ceil(Math.log(x) / Math.log(2)) then?

  • Rename graph mapping files
  • Use log2 instead of hard-coded result
haltode added inline comments.
api/server/src/main/java/org/softwareheritage/graph/Node.java
8–31

See T1915.

This revision is now accepted and ready to land.Jul 15 2019, 10:01 AM
This revision was automatically updated to reflect the committed changes.
haltode marked an inline comment as done.