Python part only for now, the generation on the Java side will follow
Details
- Reviewers
seirl - Group Reviewers
Reviewers - Commits
- rDGRPH1cda22bd9422: binary (de)serialiazer for more compact PID<->int maps
rDGRPH7c16703ac537: requirements.txt: add new dep on swh.model
rDGRPH596dc78573f2: fix typo in function name cross-ref
rDGRPH51acefe11543: binary (de)serialiazer for more compact PID<->int maps
pytest
Diff Detail
- Repository
- rDGRPH Compressed graph representation
- Branch
- feature/binary-maps
- Lint
No Linters Available - Unit
No Unit Test Coverage - Build Status
Buildable 7627 Build 10932: tox-on-jenkins Jenkins Build 10931: arc lint + arc unit
Event Timeline
Build has FAILED
Link to build: https://jenkins.softwareheritage.org/job/DGRPH/job/tox/26/
See console output for more information: https://jenkins.softwareheritage.org/job/DGRPH/job/tox/26/console
swh/graph/pid.py | ||
---|---|---|
54 | str_to_bytes probably |
Build has FAILED
Link to build: https://jenkins.softwareheritage.org/job/DGRPH/job/tox/27/
See console output for more information: https://jenkins.softwareheritage.org/job/DGRPH/job/tox/27/console
Build is green
See https://jenkins.softwareheritage.org/job/DGRPH/job/tox/28/ for more details.
- binary (de)serialiazer for more compact PID<->int maps
- fix typo in function name cross-ref
- requirements.txt: add new dep on swh.model
Build is green
See https://jenkins.softwareheritage.org/job/DGRPH/job/tox/29/ for more details.
swh/graph/pid.py | ||
---|---|---|
23–28 | Anticipating the Java part, it would be best to follow the ordering already defined in https://forge.softwareheritage.org/source/swh-graph/browse/master/java/server/src/main/java/org/softwareheritage/graph/Node.java (or at least make sure in the future Java part diff that the order is the same in both Java/Python code). |
Build has FAILED
Link to build: https://jenkins.softwareheritage.org/job/DGRPH/job/tox/30/
See console output for more information: https://jenkins.softwareheritage.org/job/DGRPH/job/tox/30/console
swh/graph/pid.py | ||
---|---|---|
23–28 | Ah, good point. Not much the Java compatibility (we're gonna change the serialization on the Java side anyway), but the fact the previous one I used was not alphabetic on textual PIDs. Having that is a good property, because we already have alpha-sorted PID lists, e.g., in the nodes file. Being able to easily translate those files to binary serializations is a feature that we want to keep. Diff updated to fix this. |
- fix binary serialization tests after PID ordering change
- pid2int: add type checking on the key
Build is green
See https://jenkins.softwareheritage.org/job/DGRPH/job/tox/31/ for more details.
Build is green
See https://jenkins.softwareheritage.org/job/DGRPH/job/tox/32/ for more details.
Please see how other SWH tools implement their CLI. There should be a file named swh/graph/cli.py that extends the cli from swh.core, and an entrypoint declared in setup.py
swh/graph/dump_map.py | ||
---|---|---|
25 | these should be two different commands instead. | |
swh/graph/pid.py | ||
42 | Is it useful to include namespace version? (ie. will we ever mix two different namespace versions in the same file?) | |
54 | There's a function in swh.model.identifiers to parse pids. | |
72–73 | There's also a function for this. | |
91–92 | (self.length, remainder) = divmod(self.size, record_size) | |
289 | The error should probably show the original pos, not len(self) + <original pos>. It might give surprising errors otherwise. |
- update to fix (most of) @vlorentz review remarks
swh/graph/dump_map.py | ||
---|---|---|
25 | I'm not sure I agree. We're going to have a bunch of maps (we already have a 3rd one, although it's not supported here yet). One command per map type would not be convenient. Also, while currently the type has to be specified, it's possible in the future it will be auto-detected, which would make a single dump command more convenient. | |
swh/graph/pid.py | ||
42 | Unlikely in these maps, but the serialization is meant to be general purpose. As such, it should contain the same information that exist in a PID, which includes the version number. I can imagine in the future using this serialization format elsewhere, where you cannot necessarily expect uniformity. |
Build is green
See https://jenkins.softwareheritage.org/job/DGRPH/job/tox/34/ for more details.
swh/graph/pid.py | ||
---|---|---|
42 | If it's general purpose, shouldn't it be in swh.model.identifiers? |
swh/graph/pid.py | ||
---|---|---|
42 | I've raised this on #swh-devel the other day (although only @olasd participated). The consensus was that it could go there eventually, but until it's needed there is no point in doing so. I'm personally fine with waiting for a second user to promote this as something general purposes. But if there is the desire from other to move it there right now, I wouldn't mind doing so either. |
- integrate cli with swh.core.cli
- binary (de)serialiazer for more compact PID<->int maps
- fix typo in function name cross-ref
- requirements.txt: add new dep on swh.model
- change PID order to be alphabetic and match current Java implementation
- fix binary serialization tests after PID ordering change
- pid2int: add type checking on the key
- add restore script from textual to binary maps
- update to fix (most of) @vlorentz review remarks
- serialization: integrate map dump/restore commands into CLI
Build is green
See https://jenkins.softwareheritage.org/job/DGRPH/job/tox/37/ for more details.
Done in this version.
The new commands are: swh graph map {dump,restore}, replacing the previous standalone top-level executables.
The former cli.py main is now available as swh graph api-client (cc: @haltode).
- binary (de)serialiazer for more compact PID<->int maps
- fix typo in function name cross-ref
- requirements.txt: add new dep on swh.model
- change PID order to be alphabetic and match current Java implementation
- fix binary serialization tests after PID ordering change
- pid2int: add type checking on the key
- add restore script from textual to binary maps
- update to fix (most of) @vlorentz review remarks
- serialization: integrate map dump/restore commands into CLI
Build is green
See https://jenkins.softwareheritage.org/job/DGRPH/job/tox/38/ for more details.
Build is green
See https://jenkins.softwareheritage.org/job/DGRPH/job/tox/39/ for more details.