As per the parent task's description (T847), there exist homonymous dumps coming from different sources.
The original URLs were not among the data googlecode transmitted when we retrieved the information.
Thus we had to rebuild them, and the original origin_url computation was too naive.
This results in clashes in the origin url fields.
Homonymous dumps end up with the same origin (googlecode hosted the googlecode, eclipselabs, and apache-extras dumps in different directory trees).
This needs to be fixed.
- Reschedule the eclipselabs and apache-extras dumps as their own distinct origins
- Make loader-svn able to start loading from scratch
- Identify the clashing dumps for googlecode only (the first bullet point takes care of the other ones) -> 1865 origins
- Reschedule the original googlecode origins that clashed for svn loading 'from scratch' (to make sure the last occurrence targets the right one) -> 257 origins
- Make sure nothing is amiss
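
To illustrate the clash and how clashing dumps could be identified, here is a minimal sketch. It is not the actual scheduling code: the dump records, the URL templates (including the `.example` hostnames) and the helper names `naive_origin_url`, `source_aware_origin_url` and `find_clashes` are all assumptions made up for the example.

```python
from collections import defaultdict

# Hypothetical dump records as (source, project name); illustrative data only.
DUMPS = [
    ("googlecode", "foo"),
    ("eclipselabs", "foo"),
    ("apache-extras", "foo"),
    ("googlecode", "bar"),
]

# Hypothetical URL templates per source; the real hostnames may differ.
TEMPLATES = {
    "googlecode": "http://{name}.googlecode.example/svn/",
    "eclipselabs": "http://{name}.eclipselabs.example/svn/",
    "apache-extras": "http://{name}.apache-extras.example/svn/",
}


def naive_origin_url(source, name):
    """Ignore the source, so homonymous dumps collapse onto one URL."""
    return TEMPLATES["googlecode"].format(name=name)


def source_aware_origin_url(source, name):
    """Take the source into account, so each dump gets a distinct origin."""
    return TEMPLATES[source].format(name=name)


def find_clashes(dumps, compute_url):
    """Group dumps by computed origin URL and keep URLs shared by several dumps."""
    by_url = defaultdict(list)
    for source, name in dumps:
        by_url[compute_url(source, name)].append((source, name))
    return {url: group for url, group in by_url.items() if len(group) > 1}


naive_clashes = find_clashes(DUMPS, naive_origin_url)
fixed_clashes = find_clashes(DUMPS, source_aware_origin_url)
```

With the naive computation, the three `foo` dumps share a single origin URL. With the source-aware one, no clash remains.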
A gotcha for bullet point 4: the current loader-svn starts by checking the previous visit's revision so that it can load incrementally from that one.
For those origins we need to load from scratch, so loader-svn must be improved to accept an option for that behavior.
Otherwise, it could start from a wrong or unknown revision (one from another clashed origin) and thus fail to load properly.
This would also explain the so far unexplained 'No revision found' error on the latest svn loader rescheduling.
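
The 'from scratch' option could look roughly like the sketch below. It is a toy model, not loader-svn's actual code: the function `pick_start_revision`, its parameters, the `start_from_scratch` flag name and the error message are assumptions for illustration, as is the choice of revision 1 as the first revision to load.

```python
from typing import Optional


def pick_start_revision(
    previous_visit_revision: Optional[int],
    repo_head_revision: int,
    start_from_scratch: bool = False,
) -> int:
    """Return the SVN revision a (hypothetical) loader should start from."""
    # Forced full reload, or no previous visit: start at the first revision.
    if start_from_scratch or previous_visit_revision is None:
        return 1
    # A previous revision beyond the repository head can only come from
    # another (clashed) origin: incremental loading cannot work here.
    if previous_visit_revision > repo_head_revision:
        raise ValueError("previous visit's revision is unknown to this repository")
    # Normal incremental case: resume from the previous visit's revision.
    return previous_visit_revision


# A clashed origin whose previous visit recorded revision 42, while the
# repository really being loaded only has 10 revisions.
start = pick_start_revision(42, 10, start_from_scratch=True)
```

Without the flag, the same call would raise, because revision 42 belongs to another repository. With the flag, loading restarts at the first revision regardless of the previous visit.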