As described in the parent task (T847), there exist homonymous dumps coming from different sources.
The original urls were not among the data transmitted by googlecode when we retrieved the information.
Thus we had to rebuild them. The original origin_url computation was too naive.
This results in clashes in the origin url fields.
Homonymous dumps end up with the same origin (googlecode hosted the googlecode, eclipselabs, and apache-extras dumps in different directory trees).
This needs to be fixed.
That means:
- [ ] identify such dumps
- [ ] reschedule the eclipselabs and apache-extras dumps as their own distinct origins
- [ ] reschedule the original googlecode origins that clashed (to make sure the last occurrence targets the right one).
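
The first checkbox could be approached with a sketch like the one below. It assumes each dump is described by a hypothetical `(source, name)` pair, where `source` is one of googlecode, eclipselabs, or apache-extras, and that the naive computation derived the origin url from the project name alone. The url pattern and the function names are illustrative assumptions, not the actual code.

```python
from collections import defaultdict

# Hypothetical url pattern, used for illustration only.
NAIVE_URL_PATTERN = "https://example.org/svn/{name}"


def naive_origin_url(source, name):
    """Mimic a naive computation that ignores the dump's source."""
    return NAIVE_URL_PATTERN.format(name=name)


def find_clashes(dumps):
    """Group dumps by naive origin url; keep urls shared by several sources."""
    by_url = defaultdict(set)
    for source, name in dumps:
        by_url[naive_origin_url(source, name)].add(source)
    return {
        url: sorted(sources)
        for url, sources in by_url.items()
        if len(sources) > 1
    }


# Made-up sample data: "foo" exists both on googlecode and eclipselabs.
dumps = [
    ("googlecode", "foo"),
    ("eclipselabs", "foo"),
    ("apache-extras", "bar"),
    ("googlecode", "baz"),
]
clashes = find_clashes(dumps)
```

Each key of `clashes` is an origin url that needs rescheduling, and its value lists the sources that collided on it.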
One gotcha for the last part: the current loader-svn starts by checking previous visits so that it can load incrementally.
We need to make sure that, when scheduling the last bullet point, it starts again from scratch.
Otherwise, it could start from a wrong revision (one from another clashing origin) and effectively fail.
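
To illustrate the gotcha, here is a minimal sketch of how an incremental loader might choose its starting revision. The names `pick_start_revision` and `start_from_scratch`, and the shape of the visit records, are assumptions for illustration and are not taken from loader-svn.

```python
def pick_start_revision(previous_visits, start_from_scratch=False):
    """Return the svn revision number to start loading from.

    previous_visits: list of dicts with a hypothetical "last_revision" key,
    ordered from the oldest to the most recent visit of the origin url.
    """
    if start_from_scratch or not previous_visits:
        # Ignore history: the first commit of an svn repository is revision 1.
        return 1
    # Incremental mode: resume right after the last revision already loaded.
    return previous_visits[-1]["last_revision"] + 1


# Made-up history: the previous visit on this url came from a homonymous dump.
visits = [{"last_revision": 120}]
incremental_start = pick_start_revision(visits)
scratch_start = pick_start_revision(visits, start_from_scratch=True)
```

In incremental mode the start revision is derived from a visit that belongs to another repository, while starting from scratch ignores that history altogether.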
Note:
This would also explain the so far unexplained 'No revision found' error on the latest svn loader rescheduling.