Details
Diff Detail
- Repository: rDDATASET Datasets
- Lint: Automatic diff as part of commit; lint not applicable.
- Unit: Automatic diff as part of commit; unit tests not applicable.
Event Timeline
Some things you could try to improve performance after you land this diff:
- WITHOUT ROWID: https://sqlite.org/withoutrowid.html
- Use a cursor, add IF NOT EXISTS ... to the query, and check cursor.total_changes.
- Alternatively, just use IF NOT EXISTS ... without checking the changes, remove the creation of nodes.csv from this process, and create it from the SQLite DB in another process.
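For reference, the suggestions above could look roughly like the sketch below. It is not the code from this diff: the `nodes` table, its `id` column, and the helper names are hypothetical. It also swaps in `INSERT OR IGNORE` as a stand-in for the elided `IF NOT EXISTS ...` query, and checks the cursor's `rowcount`, since in Python's `sqlite3` module `total_changes` is an attribute of the connection rather than of the cursor.

```python
import csv
import io
import sqlite3


def make_node_store(path=":memory:"):
    """Open a database with a hypothetical `nodes` table stored WITHOUT ROWID."""
    conn = sqlite3.connect(path)
    # WITHOUT ROWID requires an explicit PRIMARY KEY; rows are then stored
    # directly in the primary-key index instead of a separate rowid table.
    conn.execute(
        "CREATE TABLE IF NOT EXISTS nodes (id TEXT PRIMARY KEY) WITHOUT ROWID"
    )
    return conn


def add_node(conn, node_id):
    """Insert node_id; return True only if it was not already present."""
    # INSERT OR IGNORE silently skips rows that would violate the primary key.
    cur = conn.execute("INSERT OR IGNORE INTO nodes (id) VALUES (?)", (node_id,))
    # rowcount is 1 for a new row and 0 when the insert was ignored.
    return cur.rowcount == 1


def export_nodes(conn, fileobj):
    """Write every node id to fileobj as CSV, e.g. from a separate process."""
    writer = csv.writer(fileobj)
    for (node_id,) in conn.execute("SELECT id FROM nodes ORDER BY id"):
        writer.writerow([node_id])


conn = make_node_store()
first = add_node(conn, "node-a")   # new node
second = add_node(conn, "node-a")  # duplicate, ignored
add_node(conn, "node-b")

out = io.StringIO()
export_nodes(conn, out)
```

With the third suggestion, the ingestion process would only call something like `add_node` and ignore its return value, and `export_nodes` would run later against the same database file.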
swh/dataset/utils.py | Comment
---|---
46 | A short docstring, please.
58 | Here too, for the return type.
swh/dataset/graph.py | Comment
---|---
49–53 | I think you need origin and the visit id here, or you'll only get one visit per origin
49–53 | And you probably need to filter visits out to only keep the ones whose states are "final"
49–53 | Good catch for the visit ID, thanks!
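A minimal sketch of what the two inline comments on lines 49–53 ask for is shown below. It assumes visits arrive as dictionaries with `origin`, `visit`, and `status` keys, and that "final" means the statuses listed in `FINAL_STATUSES`; the key names, the status values, and the function name are all assumptions for illustration, not taken from the diff.

```python
# Assumption: which statuses count as "final" is a guess for illustration.
FINAL_STATUSES = {"full", "partial"}


def visit_node_keys(visits):
    """Yield one (origin, visit id) key per visit that reached a final status.

    Keying on the origin alone would collapse every visit of an origin into
    a single node, so the visit id is part of the key.
    """
    seen = set()
    for visit in visits:
        if visit["status"] not in FINAL_STATUSES:
            continue  # skip visits that have not reached a final state
        key = (visit["origin"], visit["visit"])
        if key not in seen:
            seen.add(key)
            yield key


# Hypothetical sample data: two visits of one origin, one still ongoing.
sample_visits = [
    {"origin": "https://example.org/repo", "visit": 1, "status": "full"},
    {"origin": "https://example.org/repo", "visit": 2, "status": "ongoing"},
    {"origin": "https://example.org/repo", "visit": 3, "status": "partial"},
]
keys = list(visit_node_keys(sample_visits))
```

Here the two visits with a final status yield distinct keys, while the ongoing visit is dropped.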