Diff Detail
- Repository
- rDDATASET Datasets
- Branch
- wip_graph_export
- Lint
No Linters Available - Unit
No Unit Test Coverage - Build Status
Buildable 12255 Build 18587: arc lint + arc unit
Event Timeline
Could you add some docstrings and tests?
swh/dataset/exporter.py | ||
---|---|---|
43 | when is it raised? | |
122 | remove this print? | |
swh/dataset/graph.py | ||
97–121 | isn't it needlessly expansive to take the list of edges as input and deduplicate it? why not output nodes directly from the journal client? | |
99–104 | why not just env={}? |
swh/dataset/graph.py | ||
---|---|---|
97–121 | I will add a docstring to explain this step, but the idea is that we also need nodes that are referenced as edge targets. For instance a rev_entry in a directory that points to a revision that we don't have should be a node in the graph. | |
99–104 | The manual of sort specifically suggests to set this locale to C to do a byte-wise sorting. Also we probably still need PATH etc. |
it's not a blocker, and can be added in a subsequent commit, but can you please also add documentation of the CLI, in the style of https://docs.softwareheritage.org/devel/swh-model/cli.html ?
swh/dataset/graph.py | ||
---|---|---|
99–104 | It says "The locale specified by the environment affects sort order. Set LC_ALL=C to get the traditional sort order that uses native byte values." if you clear the environment, then it's not needed. |
looks fine to me, but the few minor comments. I'll let @vlorentz gives his green stamp on this.
swh/dataset/exporter.py | ||
---|---|---|
64 | remove this pass I guess | |
145–146 | is this black-legit? | |
157 | the get_offset() usage here looks a bit weird. IMHO it would be clearer to use the returned value from get_offset() each time, or make it a property. | |
swh/dataset/graph.py | ||
38–39 | weird, a few chunks above you are using (lo, hi) = xxx but no parenthesis here :-) |
swh/dataset/exporter.py | ||
---|---|---|
145–146 | No, this branch is pre-black. | |
157 | The problem is that get_offsets is really something I just want to do once and keep consistent afterwards (because it takes time to fetch and can change over time). I'm changing this so it looks less weird in this instance, but the underlying logic will stay the same. | |
swh/dataset/graph.py | ||
99–104 | Right, but I still need PATH, so setting LOCALE=C is the easiest way anyway. |
swh/dataset/graph.py | ||
---|---|---|
38–39 | Just a matter of taste, I consider (lo, hi) to be a single "watermark offset" object, but src/dst are just different stuff crammed into one argument. I can change one or the other if it bugs you :-P |