Software Heritage

Reschedule googlecode svn origins from scratch
Closed, Resolved · Public

Description

As T847/T876 revealed, the bug fixed in loader-core's misbehaving flushing step could have resulted in missing data.
Those tasks only revealed missing occurrence targets, though.

It is unfortunately possible that other objects are missing as well (contents, directories, etc.).

Since we have fixed quite a few bugs in loader-svn anyway, and even though the affected origins should already have been rescheduled, it seems more reasonable to reschedule all origins to make sure.

At worst, it won't do anything.
At best, it will:

  • fill in the missing data;
  • give a proper listing of origins with external ids;
  • let us ascertain that no bugged origins remain.

Note:
Only loader-svn should be impacted by this missing data: loader-svn was historically the first loader and the one using the flushing mechanism, and it is only recently that all loaders derive from it.

Event Timeline

ardumont updated the task description. Dec 11 2017, 11:01 AM
ardumont changed the task status from Open to Work in Progress.
ardumont raised the priority of this task from Normal to High. Dec 11 2017, 11:03 AM

Scheduled back from saatchi (as I needed the producer credentials to access the queue properties):

$ cat /srv/storage/space/mirrors/code.google.com/sources/INDEX-svn-dumps.reverse-sorted-by-size.txt | tail -n +2 | ./schedule_with_queue_length_check.py --queue-name svndump --threshold 1000 --waiting-time 60 | tee scheduling-svn

So this will schedule up to 1000 tasks in the loader-svn queue every 60 seconds.
The state of what has been scheduled is in the scheduling-svn file.
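The `schedule_with_queue_length_check.py` script itself is not shown in the task, but its throttling logic presumably amounts to something like the following sketch (the function and parameter names here are illustrative assumptions, not the actual implementation):

```python
import time


def pending_room(queue_length, threshold):
    """How many more tasks can be scheduled without exceeding the
    queue threshold (0 when the queue is already full)."""
    return max(0, threshold - queue_length)


def schedule_all(lines, get_queue_length, schedule,
                 threshold=1000, waiting_time=60):
    """Drip-feed `lines` into the scheduler: top the queue up to
    `threshold` tasks, then wait `waiting_time` seconds before
    checking the queue length again. Returns the number scheduled."""
    i = 0
    while i < len(lines):
        room = pending_room(get_queue_length(), threshold)
        for line in lines[i:i + room]:
            schedule(line)
        i += min(room, len(lines) - i)
        if i < len(lines):
            time.sleep(waiting_time)
    return i
```

With `--threshold 1000 --waiting-time 60`, this corresponds to keeping at most 1000 pending tasks in the `svndump` queue and re-checking every 60 seconds.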

After discussion with the team, it was decided to exclude from the rescheduling the svn dumps whose compressed size exceeds 2 GiB.
This mirrors the decision taken for git repositories.

The list of compressed dumps whose size exceeds the 2 GiB threshold is stored on uffizi:
/srv/storage/space/mirrors/code.google.com/sources/INDEX-svn-dumps-with-size-superior-to-2gib.txt
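Assuming the index files carry the compressed dump size in their first column (as the sorting note in this task suggests; the exact file layout is not documented here), the split into "inferior" and "superior to 2 GiB" lists could be sketched as:

```python
GIB = 1024 ** 3  # 2 GiB is the threshold agreed on for the rescheduling


def split_index_by_size(index_lines, threshold=2 * GIB):
    """Split '<size> ...' index lines into (small, huge) lists,
    keyed on the compressed dump size in the first column."""
    small, huge = [], []
    for line in index_lines:
        size = int(line.split()[0])
        (small if size <= threshold else huge).append(line)
    return small, huge
```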

Recreated the scheduling input list (filtering out those huge dumps) and rescheduled using it:

tail -n +187239 /srv/storage/space/mirrors/code.google.com/sources/INDEX-svn-dumps-with-size-inferior-to-2gib.txt | awk '{print $3" "$2}' | ./schedule_with_queue_length_check.py --queue-name svndump --threshold 1000 --waiting-time 120 | gzip -c - >> scheduling-svn.txt.gz

Note:
Both the old and the new input files are sorted in the same ascending order on their first column, the compressed dump size:

  • old: /srv/storage/space/mirrors/code.google.com/sources/INDEX-svn-dumps.reverse-sorted-by-size.txt
  • new: /srv/storage/space/mirrors/code.google.com/sources/INDEX-svn-dumps-with-size-inferior-to-2gib.txt
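For reference, the `awk '{print $3" "$2}'` step in the command above reorders each index line into a two-column pair for the scheduling script (what columns 2 and 3 hold is not documented in this task); a Python equivalent:

```python
def to_scheduler_input(index_line):
    """Equivalent of awk '{print $3" "$2}': emit whitespace-separated
    columns 3 and 2, in that order."""
    cols = index_line.split()
    return f"{cols[2]} {cols[1]}"
```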
ardumont changed the status of subtask T896: Clean up wrong origins from Open to Work in Progress. Dec 14 2017, 12:06 PM
ardumont added a comment. Edited Feb 2 2018, 1:44 PM

This is in stand-by during the snapshot migration.

ardumont changed the status of subtask T947: googlecode import: Some dumps are just empty repository from Open to Work in Progress. Feb 5 2018, 1:45 PM
ardumont closed this task as Resolved. Sep 19 2018, 1:56 PM

That's been done for a while now.