After discussing with the upstream maintainers of the scanoss tool, Roberto compiled a list of (GitHub)
repositories (large [1] and normal [2]) that we are currently missing. Let's try to ingest
them using the same approach we used for the chromium repository [3].
Plan:
- [ ] Clean up the large worker17 and worker18 setup and keep them out of the standard consumption loop [4]
- [ ] Schedule large repositories on a dedicated queue
oneshot:swh.loader.git.tasks.UpdateGitRepository
- [ ] Schedule normal repositories on a dedicated queue
oneshot2:swh.loader.git.tasks.UpdateGitRepository
- [ ] Also configure parallelism so that it does not get too high
- [ ] Babysit the processes
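A minimal sketch of how the two repository lists could be mapped onto the dedicated queues, with a cap on how many tasks go out at once. It assumes the queue name is the prefix (`oneshot` or `oneshot2`) joined to the task name exactly as written above, and that the loader task accepts a `url` keyword argument (an assumption, not checked against swh.loader.git). `plan_tasks` and `batches` are hypothetical helpers; actually submitting the tasks (through the scheduler or Celery) is not shown.

```python
from typing import Dict, Iterable, Iterator, List, Tuple

# Task name and queue prefixes as written in this task.
TASK_NAME = "swh.loader.git.tasks.UpdateGitRepository"
QUEUES = {"large": "oneshot", "normal": "oneshot2"}

Task = Tuple[str, str, dict]  # (queue, task name, task kwargs)


def plan_tasks(repos: Dict[str, Iterable[str]]) -> List[Task]:
    """Hypothetical helper: one (queue, task, kwargs) triple per repository URL."""
    plan: List[Task] = []
    for size, urls in repos.items():
        # e.g. "oneshot:swh.loader.git.tasks.UpdateGitRepository" for large repos
        queue = f"{QUEUES[size]}:{TASK_NAME}"
        for url in urls:
            # Assumption: the loader task takes the origin URL as "url".
            plan.append((queue, TASK_NAME, {"url": url}))
    return plan


def batches(plan: List[Task], max_parallel: int) -> Iterator[List[Task]]:
    """Hypothetical helper: yield slices of at most max_parallel tasks,
    so only a bounded number of repositories is submitted at a time."""
    if max_parallel < 1:
        raise ValueError("max_parallel must be >= 1")
    for i in range(0, len(plan), max_parallel):
        yield plan[i:i + max_parallel]
```

For example, `plan_tasks({"large": [...], "normal": [...]})` with the URLs from [1] and [2] yields the large repositories on the `oneshot` queue and the normal ones on `oneshot2`; iterating `batches(plan, 2)` would then submit them two at a time.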
[1] big: {F5800895}
[2] normal: {F5800897}
[3] T4283
[4] Recent attempts on the chromium and liferay-portal repositories failed, possibly
due to standard consumption running in parallel. If a large repository is consumed
at the same time, the machine may be unable to finish either repository.