Page MenuHomeSoftware Heritage

Add node id -> node types bitmap
ClosedPublic

Authored by haltode on Jul 11 2019, 4:31 PM.

Details

Summary

Add node id -> node type bitmap to effectively run edge restriction
during graph traversals.

This is still WIP because I am waiting on full scale test results. Also, this
diff is only about *loading/storing* the bitmap, not using it, because it needs
to be merged with recent D1700 work (in a separate diff).

Closes T1902.

Diff Detail

Repository
rDGRPH Graph service
Lint
Automatic diff as part of commit; lint not applicable.
Unit
Automatic diff as part of commit; unit tests not applicable.

Event Timeline

haltode created this revision.Jul 11 2019, 4:31 PM
zack requested changes to this revision.Jul 11 2019, 5:09 PM
zack added inline comments.
api/server/src/main/java/org/softwareheritage/graph/Node.java
8–31

we also need ORI here now

9–13

this deserves a comment pointing to some external reference about SWH PIDs, to explain where these weird magic labels come from

api/server/src/main/java/org/softwareheritage/graph/backend/NodeTypesMap.java
16–18

DRY violation: ".nodesTypesMap" is used twice, hence it could use a constant somewhere :-)

while we are at it, minor naming bikeshed, how about .node2type instead?

api/server/src/main/java/org/softwareheritage/graph/backend/Setup.java
76

Can we make this the result of a log2 calculation on the size of the Node.Type enum? (unless it's too annoying to code, that is)

It would be nicer, and less prone to future bugs if we ever add/remove node types in the future.

95

this is another place where the constant proposed above should be used

This revision now requires changes to proceed.Jul 11 2019, 5:09 PM
haltode added inline comments.Jul 11 2019, 5:43 PM
api/server/src/main/java/org/softwareheritage/graph/Node.java
8–31

I do prefer to add ORI support in another diff as well, there are multiple places to update (here of course, but also the example test graph, the unit tests themselves, and so on).

9–13

Dev documentation is almost entirely absent for the moment in the code, I was thinking about adding all references/external links in a separate diff (this doc task is already tracked in T1904).

api/server/src/main/java/org/softwareheritage/graph/backend/NodeTypesMap.java
16–18

To be more consistent with other files, should we go with .node2swh and .swh2node as well?

api/server/src/main/java/org/softwareheritage/graph/backend/Setup.java
76

Sure, it's a bit sad Java only has log, log10, log1p but no log2 :(

zack added inline comments.Jul 12 2019, 9:43 AM
api/server/src/main/java/org/softwareheritage/graph/Node.java
8–31

ok, fair enough. Please file a dedicated task, for better tracking of this need

9–13

*nod*

api/server/src/main/java/org/softwareheritage/graph/backend/NodeTypesMap.java
16–18

yeah, good idea. So, let's streamline this completely and use:

  • .node2type.map
  • .node2pid.{csv,map}
  • .pid2node.{csv,map}

note:

  • the trailing .map for files that are in custom binary formats
  • the use of "pid" instead of "swh", because swh is contextual-implicit and PIDs are what those things really are
  • I'm guessing you'll want to keep CSV format for now (hence .csv), but eventually they'll probably all disappear in favor of more efficient binary formats; we'll switch to .map then
api/server/src/main/java/org/softwareheritage/graph/backend/Setup.java
76

something like Math.ceil(Math.log(x) / Math.log(2)) then?

haltode updated this revision to Diff 5821.EditedJul 14 2019, 8:21 PM
  • Rename graph mapping files
  • Use log2 instead of hard-coded result
haltode marked 9 inline comments as done.Jul 14 2019, 8:21 PM
haltode added inline comments.
api/server/src/main/java/org/softwareheritage/graph/Node.java
8–31

See T1915.

zack accepted this revision.Jul 15 2019, 10:01 AM
This revision is now accepted and ready to land.Jul 15 2019, 10:01 AM
This revision was automatically updated to reflect the committed changes.
haltode marked an inline comment as done.