loader-svn: Fix origin clashes for homonym but distinct svn dumps
Closed, Migrated (Edits Locked)

Assigned To

Authored By

	ardumont
	Nov 29 2017, 11:22 AM

Description

As per the parent task's description (T847), there exist homonymous dumps coming from different sources.

The original URLs were not among the data transmitted by googlecode when we retrieved the information.
Thus we had to rebuild them. The original origin_url computation was too naive.
This results in clashes in the origin url field.
Homonymous dumps end up with the same origin, even though googlecode hosted the googlecode, eclipselabs, and apache-extras dumps in different directory trees.
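
To illustrate the clash, here is a minimal sketch. It is not the actual origin computation: the dump layout (`<source>/<project>/repo.svndump.gz`), the `svn://example.org/...` URLs, and both function names are assumptions made up for the example. Only the three source names come from the description above.

```python
from pathlib import PurePosixPath


def naive_origin_url(dump_path: str) -> str:
    """Hypothetical naive computation: only the project name is used."""
    project = PurePosixPath(dump_path).parent.name
    return f"svn://example.org/{project}"


def source_aware_origin_url(dump_path: str) -> str:
    """Hypothetical fix: keep the source tree in the origin URL."""
    parts = PurePosixPath(dump_path).parts
    source, project = parts[0], parts[1]
    return f"svn://example.org/{source}/{project}"


# Three distinct dumps that share the project name "foo" (made-up paths).
dumps = [
    "googlecode/foo/repo.svndump.gz",
    "eclipselabs/foo/repo.svndump.gz",
    "apache-extras/foo/repo.svndump.gz",
]

naive_origins = {naive_origin_url(d) for d in dumps}
aware_origins = {source_aware_origin_url(d) for d in dumps}
```

With the naive function the three dumps collapse into a single origin, while the source-aware variant keeps one origin per dump.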

This needs to be fixed.

That means:

1. Reschedule the eclipselabs and apache-extras dumps as their own distinct origins
2. Make loader-svn able to start a load from scratch
3. Identify such clashing dumps for googlecode only (the first item takes care of the other ones) -> 1865 origins
4. Reschedule the original googlecode origins that clashed for svn loading 'from scratch' (to make sure the last occurrence targets the right one) -> 257 origins
5. Make sure nothing is amiss
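
Identifying the clashing dumps can be sketched as grouping dumps by their computed origin and keeping the groups with more than one member. This is only an illustration of the idea, not the actual check_for_dump_clash code; the function name `find_clashes` and the sample data are hypothetical.

```python
from collections import defaultdict
from typing import Dict, List


def find_clashes(dump_to_origin: Dict[str, str]) -> Dict[str, List[str]]:
    """Return the origins shared by more than one dump, with their dumps."""
    dumps_by_origin: Dict[str, List[str]] = defaultdict(list)
    for dump, origin in dump_to_origin.items():
        dumps_by_origin[origin].append(dump)
    # Keep only the origins that several distinct dumps map to.
    return {
        origin: sorted(dumps)
        for origin, dumps in dumps_by_origin.items()
        if len(dumps) > 1
    }


# Made-up sample: two dumps clash on "origin-a", one dump is alone.
sample = {
    "tree1/a.svndump.gz": "origin-a",
    "tree2/a.svndump.gz": "origin-a",
    "tree1/b.svndump.gz": "origin-b",
}
clashes = find_clashes(sample)
```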

A gotcha for the fourth item: the current loader-svn starts by checking the previous visit's revision so that it can load incrementally from that revision.
For those origins we need to load from scratch, so we must improve the current loader-svn to accept an option for that behavior.
Otherwise, it could start from a wrong or unknown revision (one from another clashed origin), and the load would effectively not work.
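
The intended behavior can be sketched as follows. This is a simplified illustration, not the loader-svn implementation: the function name `starting_revision` and its `last_visit_revision` parameter are hypothetical, and the sketch assumes that a load from scratch begins at revision 1.

```python
from typing import Optional


def starting_revision(
    last_visit_revision: Optional[int],
    start_from_scratch: bool = False,
) -> int:
    """Pick the revision a load should start from.

    Assumption: a load from scratch starts at revision 1; an incremental
    load resumes from the revision recorded by the previous visit.
    """
    if start_from_scratch or last_visit_revision is None:
        # Ignore any previous visit, which may belong to a clashed origin.
        return 1
    return last_visit_revision


# A clashed origin whose previous visit recorded an unrelated revision.
incremental_start = starting_revision(1234)
scratch_start = starting_revision(1234, start_from_scratch=True)
```

Making the option explicit keeps the incremental behavior as the default and limits the full reload to the origins that were rescheduled for it.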

Note:
This would also explain the so far unexplained 'No revision found' error seen during the latest svn loader rescheduling.

Related Objects

Status	Assigned	Task
		Unknown Object (Maniphest Task)
Migrated	gitlab-migration	T367 ingest Google Code repositories
Migrated	gitlab-migration	T617 ingest Google Code Subversion repositories
Migrated	gitlab-migration	T847 loader-svn: Some SVN origins have occurrences that point to non-existent objects
Migrated	gitlab-migration	T863 loader-svn: Fix origin clashes for homonym but distinct svn dumps

Event Timeline

ardumont renamed this task from "loader-svn: Investigate potential origin clash for homonyms but distinct svn dump loading" to "loader-svn: Investigate potential origin clash for homonym but distinct svn dumps". Nov 29 2017, 11:22 AM

ardumont created this task.

ardumont updated the task description.

ardumont updated the task description. Nov 29 2017, 11:29 AM

ardumont updated the task description. Dec 1 2017, 11:45 AM

ardumont added a project: Origin-GoogleCode.

ardumont mentioned this in T847: loader-svn: Some SVN origins have occurrences that point to non-existent objects. Dec 8 2017, 1:39 PM

ardumont changed the task status from Open to Work in Progress. Dec 8 2017, 1:42 PM

ardumont claimed this task.

ardumont renamed this task from "loader-svn: Investigate potential origin clash for homonym but distinct svn dumps" to "loader-svn: Investigate origin clash for homonym but distinct svn dumps". Dec 8 2017, 1:59 PM

ardumont renamed this task from "loader-svn: Investigate origin clash for homonym but distinct svn dumps" to "loader-svn: Fix origin clashes for homonym but distinct svn dumps".

ardumont updated the task description.

ardumont updated the task description. Dec 8 2017, 3:38 PM

ardumont updated the task description. Dec 8 2017, 4:11 PM

ardumont updated the task description. Dec 8 2017, 4:18 PM

ardumont updated the task description. Dec 8 2017, 5:12 PM

ardumont mentioned this in rDSNIP557abd06bf06: list-svndumps-urls.py: Improve origin computations. Dec 8 2017, 5:25 PM

ardumont mentioned this in rDSNIPaf7016cf43ff: check_for_dump_clash: Add check for dump name clashes.

ardumont mentioned this in rDLDSVNdad5dbc330d3: swh.loader.svn: Add option to load a repository from scratch.

ardumont updated the task description. Dec 8 2017, 5:26 PM

ardumont mentioned this in rDLDSVN099c14d41b29: svn.producer: Produce svn load repo with option start-from-scratch. Dec 8 2017, 5:38 PM

ardumont updated the task description. Dec 8 2017, 6:16 PM

ardumont mentioned this in T570: svn loader: CRLF/LF mess in svn history results in hash computations divergence. Dec 9 2017, 11:08 AM

ardumont updated the task description. Dec 9 2017, 11:14 AM

ardumont updated the task description. Dec 9 2017, 11:16 AM

ardumont updated the task description. Dec 9 2017, 11:22 AM

ardumont updated the task description. Dec 9 2017, 11:25 AM

Make sure nothing is amiss

Well, there are errors.

"ConnectionResetError(104, 'Connection reset by pee": 13,  # ok, will reschedule
"Eventful partial visit. Detail: Property 'svn:exte": 1
"TypeError(\"run_task() got an unexpected keyword ar": 257

The TypeErrors occurred because the loader-svn worker had been updated to the new version but not restarted, so it did not recognize the new start_from_scratch parameter.

Those origins were rescheduled and are now done.
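
For illustration, the failure mode can be reproduced with plain Python: a function that predates the new option rejects it as an unexpected keyword argument. The two functions below are hypothetical stand-ins for the old and new task signatures, and `svn_url` is an assumed parameter name; only `start_from_scratch` comes from this task.

```python
def run_task_old(svn_url):
    """Stand-in for the code still loaded by the worker that was not restarted."""
    return {"svn_url": svn_url, "start_from_scratch": False}


def run_task_new(svn_url, start_from_scratch=False):
    """Stand-in for the updated code that accepts the new option."""
    return {"svn_url": svn_url, "start_from_scratch": start_from_scratch}


# Arguments as the updated producer would send them (made-up URL).
task_kwargs = {"svn_url": "svn://example.org/foo", "start_from_scratch": True}

try:
    run_task_old(**task_kwargs)
    old_error = None
except TypeError as exc:
    # The stale signature does not know the new keyword.
    old_error = str(exc)

new_result = run_task_new(**task_kwargs)
```

Restarting the worker makes it load the new signature, after which the same task arguments are accepted.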

ardumont closed this task as Resolved. Dec 11 2017, 10:20 AM

ardumont updated the task description.

This task has been migrated to GitLab.
