- visit/edges: fix incorrect handling of diamond pattern
Jul 23 2020
So it turns out that using multiprocessing to spawn a web app that we want to test isn't exactly great for coverage... I'll check what I can do about that.
seirl committed rDGRPH39430074227c: pid: use PROT_READ/MAP_SHARED for readonly maps (authored by seirl).
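As a rough illustration of the read-only mapping named in this commit title, here is a minimal Python sketch that maps a file with PROT_READ and MAP_SHARED and reads one value from it. The helper name `read_u64_at` and the record layout (big-endian 64-bit integers) are assumptions made for the example, not details taken from the commit.

```python
import mmap
import struct


def read_u64_at(path, index):
    """Read the index-th big-endian u64 from a file through a read-only map.

    Hypothetical helper: the 8-byte record layout is an assumption.
    The flags/prot arguments are only available on Unix.
    """
    with open(path, "rb") as f:
        # PROT_READ: pages can be read but not written.
        # MAP_SHARED: the mapping is backed by the file itself rather than
        # by a private copy-on-write view.
        with mmap.mmap(
            f.fileno(), 0, flags=mmap.MAP_SHARED, prot=mmap.PROT_READ
        ) as mm:
            (value,) = struct.unpack_from(">Q", mm, index * 8)
            return value
```

Because the file is opened read-only, asking for a writable shared mapping would be refused by the operating system, so PROT_READ is the protection that matches a read-only map.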
Jul 6 2020
seirl committed rDGRPH481fcc9fc026: generate_graph.sh: fix mix up between nodes and edges (authored by seirl).
Jun 9 2020
Jun 8 2020
seirl committed rDDATASET8d79b8475d50: graph export: compute node/edge type stats (authored by seirl).
Jun 3 2020
We no longer export edges from Spark
seirl closed T1741: graph dataset: update to use persistent identifiers everywhere, a subtask of T1848: refresh graph dataset export, as Resolved.
We no longer export edges per file type.
May 29 2020
seirl committed rDGRPHc5b0a152b78a: java: GenDistribution: shutdown service pool when all the threads are done (authored by seirl).
May 28 2020
May 15 2020
seirl committed rDDATASETd26a4247094d: exporter: increase message.max.bytes value to allow large messages (authored by seirl).
May 14 2020
seirl committed rDDATASET312474e905b6: graph exporter: fix process arguments, run process on content (authored by seirl).
seirl committed rDDATASET518973396296: exporter: send EOF token to progress queue when process() finishes normally (authored by seirl).
seirl committed rDDATASETb608f41411ee: graph exporter: more exhaustive heuristic to remove pull requests (authored by seirl).
May 5 2020
seirl committed rDDATASET56fee87710ce: graph: do not deduplicate different visits from the same origin (authored by seirl).
seirl committed rDDATASET7a34fb115d38: graph: use an sqlite3 on-disk set to avoid processing nodes twice (authored by seirl).
seirl updated the diff for D3121: graph: use an sqlite3 on-disk set to avoid processing nodes twice.
- rebase
- graph: do not deduplicate different visits from the same origin
seirl added inline comments to D3121: graph: use an sqlite3 on-disk set to avoid processing nodes twice.
Don't know why this hasn't been autoclosed, but it's merged in master.
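For context on D3121, the idea of an sqlite3 on-disk set can be sketched as follows. This is a minimal illustration, not the code from the diff: the class name `SqliteSet`, the table name `seen`, and the helper `process_once` are all hypothetical.

```python
import sqlite3


class SqliteSet:
    """Hypothetical disk-backed set of string keys (not the D3121 code)."""

    def __init__(self, path):
        self.db = sqlite3.connect(path)
        # The primary key gives uniqueness plus an index for fast lookups.
        self.db.execute("CREATE TABLE IF NOT EXISTS seen (key TEXT PRIMARY KEY)")

    def add(self, key):
        """Insert key; return True if it was new, False if already present."""
        cur = self.db.execute(
            "INSERT OR IGNORE INTO seen (key) VALUES (?)", (key,)
        )
        # rowcount is 0 when the insert was ignored because of a conflict.
        return cur.rowcount == 1

    def close(self):
        self.db.commit()
        self.db.close()


def process_once(nodes, seen):
    """Return the nodes not seen before, recording each one in the set."""
    return [node for node in nodes if seen.add(node)]
```

Keeping the set on disk means memory use stays bounded even when the number of node identifiers is too large for an in-memory Python set, at the cost of one SQLite statement per node.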
May 4 2020
Add Black pre-commit
Rebase and update against swh-storage/master
- tests/graph: check the presence of duplicate nodes
- exporter: minor style changes
Apr 21 2020
- Rework graph export pipeline
- Graph export: add unit tests
seirl committed rDGRPH387ce5e2ff0c: cachemount: only cache .graph, not .obl/.offset files (authored by seirl).
Apr 17 2020
Apr 16 2020
Apr 15 2020
seirl added a comment to T2361: WARNING:swh.core.cli:Could not load subcommand dataset: No module named 'swh.dataset.cli'.
Temporary fix here until the branch that implements this entrypoint is merged: https://forge.softwareheritage.org/rDDATASETbe9e71ba1f858bbb8f44649306b919a1fa965ea2
seirl committed rDDATASETbe9e71ba1f85: setup.py: temporarily remove entrypoint until diff is merged (authored by seirl).
Apr 14 2020
seirl committed rDJNLb53d7d7c37ce: client: move subscription to a separate function (authored by seirl).
rebase
cli: add a memcache command
Apr 10 2020
Rename command to cachemount, mention default cache path
- Add docstrings and types
Add default cache path
java: Graph: load the graph with mmap()
Apr 5 2020
Apr 4 2020
Mar 23 2020
identifiers: encode origin URLs in utf-8
replayer: factor out legacy objects fixers
use fallback for revision
Fix tests
Mar 17 2020
Mar 10 2020
Feb 26 2020
seirl committed rDJNL6ca43d5c59e6: JournalClient: add a stop_at_eof boolean to read the log only once (authored by seirl).
seirl updated the diff for D2718: JournalClient: add a stop_at_eof boolean to read the log only once.
fix reviews
seirl added inline comments to D2718: JournalClient: add a stop_at_eof boolean to read the log only once.
seirl committed rDJNLeea69820792f: JournalClient: split main loop in three functions (authored by seirl).
Feb 25 2020
Fix reviews
Feb 24 2020
gsoc-epita: add 2020 slides
Feb 14 2020
Feb 11 2020
Feb 10 2020
Feb 5 2020
Remove useless while true
Remove commented out part
Bootstrap Python project files