We need an in-production swh-graph service, including fully automated periodic exports from storage.
This is a meta-task to track all activities related to achieving this goal.
Description
Revisions and Commits
rDGRPH Compressed graph representation

Revision | Commit | Summary
---|---|---
D8919 | rDGRPHb76801259953 | Add CLI script to generate Luigi config and call it
D8891 | rDGRPH2b634bf4192d | luigi: Simplify compression task
D8891 | rDGRPH0a4b706400d4 | luigi: Remove option to configure graph name
D8891 | rDGRPH60e4707feca6 | Add luigi task to compress the graph
D8881 | rDGRPHdc2cb79bc670 | Document the need for higher vm.max_map_count
D8880 | rDGRPH486312e00eaa | Fix crash on null Release message
Status | Assigned | Task
---|---|---
Migrated | gitlab-migration | T2201 Indexing / mining
Migrated | gitlab-migration | T2204 Full-text search on source code (prototype)
Migrated | gitlab-migration | T2217 Plumbings
Migrated | gitlab-migration | T3096 Efficient and reliable download via the Vault
Migrated | gitlab-migration | T3550 Compute and show ETA for vault tasks
Migrated | gitlab-migration | T887 Vault: "snapshot" cooker
Migrated | gitlab-migration | T2220 swh-graph in production
Migrated | gitlab-migration | T3161 graph service: add anti-DoS limit on the number of edges traversed
Migrated | gitlab-migration | T4676 Add Luigi workflow in swh-dataset
Event Timeline
Graph status meeting
- Querying the graph
- Rocquencourt: Granet, 700 GB of RAM (maximum reached)
- Graph compression: 1.7 TB minimum
- Telecom: 4 TB machine
Compression
- Dataset: https://docs.softwareheritage.org/devel/swh-dataset/graph/dataset.html?highlight=dataset
- compression pipeline: Python (see the sketch after this list)
- implementation of the steps: org.softwareheritage.graph.compress
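For illustration, here is a minimal sketch of what a Luigi task wrapping the compression step could look like. It assumes the `swh graph compress` CLI entry point with `--input-dataset` and `--output-directory` flags; the task and parameter names are hypothetical, not the actual definitions from swh-graph:

```python
# Hypothetical sketch of a Luigi task wrapping the graph compression step.
# Assumes the `swh graph compress` CLI and its flags; all names are illustrative.
import subprocess

import luigi


class CompressGraph(luigi.Task):
    """Compress an exported graph dataset with the swh-graph pipeline."""

    input_dataset = luigi.Parameter()     # directory holding the graph export
    output_directory = luigi.Parameter()  # where the compressed graph lands

    def output(self):
        # A marker file signals that compression finished successfully.
        return luigi.LocalTarget(f"{self.output_directory}/compressed.done")

    def run(self):
        subprocess.run(
            [
                "swh", "graph", "compress",
                "--input-dataset", str(self.input_dataset),
                "--output-directory", str(self.output_directory),
            ],
            check=True,
        )
        with self.output().open("w") as marker:
            marker.write("done\n")
```

Using a marker file as the Luigi target keeps the task idempotent: re-running the scheduler skips the compression once the marker exists.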
TODO
- Finish gRPC migration (seirl)
- Forge issues to clean up once gRPC is merged (seirl)
- Automate deployment (sysadm): prepare the command
- Native Hadoop libraries (?)
- T4250
- Benchmark performance with and without the Hadoop libraries
- Luigi ETL[1] pipeline for compression / deployment (see the run sketch after this list)
- Integration of the generated Javadoc in the swh docs (vlorentz)
- Integration of Java code coverage in the forge
- Unit test the compression pipeline
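Until deployment is automated, a one-shot local run of the hypothetical `CompressGraph` task sketched above could look like this; the module name and paths are illustrative placeholders, not the production layout:

```python
# Hedged sketch: run the hypothetical CompressGraph task once, locally,
# e.g. from cron, until the automated deployment lands.
import luigi

from graph_compression import CompressGraph  # the sketch above, saved as a module

if __name__ == "__main__":
    luigi.build(
        [
            CompressGraph(
                input_dataset="/srv/softwareheritage/dataset/orc",          # placeholder
                output_directory="/srv/softwareheritage/graph/compressed",  # placeholder
            )
        ],
        local_scheduler=True,  # single run; no central luigid scheduler needed
    )
```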