
Update existing contents with new hash blake2s256
Closed, Migrated (Edits Locked)

Description

Leveraging azure infrastructure, trigger the blake2s256 update on the existing contents.

This means:

  • Provisioning azure VMs (sizing: DS2_V2, 7GB RAM, 14GB SSD disk, 2 cores; €85.33/month); 2 VMs for now
  • code: make the configuration composable for storage reads/writes and adapt the objstorage reads
  • puppet: puppetize swh_indexer_rehash
  • Deploying the swh.indexer.rehash module (+ fixing bits and pieces along the way)
  • Computing the list of sha1s to rehash from the swh content table (IN PROGRESS, in uffizi:/srv/storage/space/lists/contents-sha1-to-rehash.txt.gz); see the listing sketch after this list
  • Sending all contents to the swh_indexer_rehash queue
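
For reference, producing that list amounts to something along these lines (a sketch only; the DSN and output path are placeholders, not the exact command run on uffizi):

import gzip
import psycopg2

# Placeholder DSN; the real listing was produced against the softwareheritage db.
db = psycopg2.connect("dbname=softwareheritage")
# Server-side (named) cursor so rows are streamed instead of loaded in RAM.
with db, db.cursor(name="to_rehash") as cur, \
        gzip.open("contents-sha1-to-rehash.txt.gz", "wt") as out:
    cur.execute("SELECT encode(sha1, 'hex') FROM content WHERE blake2s256 IS NULL")
    for (sha1,) in cur:
        out.write(sha1 + "\n")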

Note:
Regarding the storage stack to use, we can:

  • either use azure's objstorage (the copy is 'complete', as in the snapshot copy); this will be the starting point,
  • or, if the cost projection is too high, use uffizi's (or banco's) objstorage, since inbound transfers to azure cost nothing,
  • or use a multiplexer objstorage with azure as the initial objstorage, falling back to banco if the object is not found there, and then to uffizi (the solution used; sketched below).
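
In configuration terms, the multiplexer option boils down to something like the sketch below (key names, ports and URLs are illustrative, not the exact swh.objstorage configuration schema of the time; only the fallback order matters):

# Illustrative objstorage configuration for the rehash workers:
# read from azure first, then banco, then uffizi.
objstorage_config = {
    "cls": "multiplexer",
    "args": {
        "objstorages": [
            {"cls": "azure", "args": {}},                               # primary: the azure copy (credentials omitted)
            {"cls": "remote", "args": {"url": "http://banco:5003/"}},   # fallback if the object is not found
            {"cls": "remote", "args": {"url": "http://uffizi:5003/"}},  # last resort
        ]
    },
}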


Event Timeline

ardumont updated the task description.

3262961641 contents sent in batches of 1000, i.e. ~3.26 million messages in the swh_indexer_content_rehash queue.

2 workers: worker0[1-2].euwest.azure.internal.softwareheritage.org

gzip -dc /srv/storage/space/lists/contents-sha1-to-rehash.txt.gz | SWH_WORKER_INSTANCE=swh_indexer_rehash python3 -m swh.indexer.producer --batch 1000 --task-name rehash --dict-with-key sha1

Starting date: Thu Apr 27 18:26:55 CEST 2017

So, it turns out that sending all contents to rehash in one shot was dumb...
It cluttered the rabbitmq machine's disk (saatchi).

So, after cleaning everything up and giving it some more thought, we:

  • Reworked swh.indexer.rehash to read the raw content from the objstorage only when needed (it can be pricey depending on the configuration), that is, either when an option flag explicitly forces it (not right now) or when the fields to compute are not filled in. This makes the job idempotent, which is good since some contents have already been processed (prior to the incident) and we do not want to read them again; the logic is sketched below.
  • Send only batches of hashes, split on the first character of the hash. Currently, only the 0-prefixed hashes are sent.
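
The idempotence logic boils down to something like the following sketch (hypothetical names, not the actual swh.indexer.rehash code):

import hashlib

# Hash columns every content row should eventually carry.
WANTED = ("sha1", "sha1_git", "sha256", "blake2s256")

def rehash(content_row, objstorage, force=False):
    """Fill in the missing hashes of one content row, reading the raw data
    from the objstorage only when something actually needs computing."""
    missing = [h for h in WANTED if not content_row.get(h)]
    if not missing and not force:
        # Already fully hashed (e.g. processed before the incident):
        # skip without touching the objstorage, making the job idempotent.
        return content_row
    data = objstorage.get(content_row["sha1"])  # the expensive read
    if force or not content_row.get("blake2s256"):
        content_row["blake2s256"] = hashlib.blake2s(data, digest_size=32).digest()
    # ... same pattern for the other missing hashes ...
    return content_row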

Command used:

gzip -dc /srv/storage/space/lists/azure-rehash/0.gz | SWH_WORKER_INSTANCE=swh_indexer_rehash python3 -m swh.indexer.producer --batch 100 --task-name rehash --dict-with-key sha1

This sent ~2M jobs distributed among 4 azure machines, each working on 8 tasks in parallel.

Note:
All listing files are stored in uffizi:/srv/storage/space/lists/azure-rehash/.

ardumont changed the task status from Open to Work in Progress. May 5 2017, 2:37 PM

Current status: overall, the contents (3.6B) are mostly rehashed.
But a known issue (T760) caused some contents to be missed (around 5M).

Yesterday (14/09/2017) I rescheduled around 5M contents (4867588 to be precise), which is now done as well.

Presumably some holes remain (I saw the same error occurring during those rehash computations).
This is currently being addressed (i.e. listing + scheduling the missed contents).


There you go: 1920000 contents not rehashed.
It's currently being dealt with.

We have reached a point where the remaining contents to rehash are only stored on uffizi (not on the other mirrors, azure and banco, according to the logs).

This puts high pressure on uffizi's objstorage, to the point where the objstorage stops responding.
Uffizi itself starts hanging.

For now, I have put the remaining 500k on hold, until we find the right solution to this problem.

The possible solutions I foresee without any development are:

  • As the main purpose of the main storage/objstorage is to support writes from the loaders, the objstorage is set up with fewer workers than the storage (16 vs 96).

One possibility would be to slightly increase the number of objstorage workers and decrease the number of storage workers.

  • Another solution would be to pause the rehash computations and let the archiver fill the gap (the archiver is running). Cranking up the archiver to make it go faster might also be possible.

Another solution, requiring some development, would be:

  • Schedule the rehash once the archiver has done its copy to banco/azure. IIRC, this orchestration is already possible through the director's setup.

But it's possible that some small amount of code is needed, since I believe there is a slight discrepancy between the data coming out of the director and the data expected by the rehash job.

Good call on pausing this to avoid the uffizi hangs (assuming this was the cause).
We want the different object storage copies to converge, so I think waiting for the archiver to close the gap (possibly increasing resources to it if that helps) before restarting this is the right solution here.

I'm not clear on whether, at steady state, the archiver is currently capable of keeping the various copies aligned. But even if it is not, this specific issue "only" needs the gap to be closed on the 500k contents that are still waiting for blake2 hashing.

So, no matter what, it looks like there is a way forward here.

Schedule the rehash once the archiver has done its copy to banco/azure. IIRC, this orchestration is already possible through the director's setup.
But it's possible that some small amount of code is needed, since I believe there is a slight discrepancy between the data coming out of the director and the data expected by the rehash job.

To correct my last assertion: I checked, and such orchestration is indeed possible.
It goes through the worker (swh.archiver.worker.ArchiverToBackendWorker), not the director as I previously hinted.
Also, some small amount of work would indeed be needed on the data structures flowing between the two (the worker outputs an id, while the rehash computation expects a dict), roughly as sketched below.
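
Concretely, the adapter would be as small as this (illustrative names):

def archiver_output_to_rehash_input(content_ids):
    """Adapt the archiver worker's output (a list of content ids) to the
    one-dict-per-content shape the rehash task expects (--dict-with-key sha1)."""
    return [{"sha1": content_id} for content_id in content_ids]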

... (assuming this was the cause).

Well, starting/stopping the rehash computations leads to uffizi starting to hang (RAM full, swapping, objstorage icinga check down, etc.) / becoming well again (after some time passes).
So I think it's the cause.

We want the different object storage copies to converge, so I think waiting for the archiver to close the gap (possibly increasing resources to it if that helps) before restarting this is the right solution here.

agreed

So...

I've done several things today to try to wrap this up, and we're ever so close (3000 or so objects left).

  1. I've manually scheduled the archiver to run on the blake2-missing contents
  2. I've started looking at deploying nginx in front of the backend API servers
  3. I've queued rehashing for the objects whose archival succeeded

The (manual) nginx deployment really helps with the pipelining-related errors (T760), so I think it's worthwhile, but it doesn't help with flaws intrinsic to the archiver: the archiver is really bad at archiving big objects.

For the last stragglers, I'm using a workaround: using a local objstorage instead of the API server, and pushing the objects for archival one by one... Yes, it's as horrible as it sounds.
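
In spirit, the workaround looks like this (a sketch with illustrative objstorage objects, not the real swh.objstorage API calls):

def archive_one_by_one(obj_ids, local_objstorage, remote_objstorages):
    """Read each straggler from a local objstorage on uffizi and push it to
    the remote copies one object at a time, skipping copies already present."""
    for obj_id in obj_ids:
        data = local_objstorage.get(obj_id)
        for remote in remote_objstorages:
            if obj_id not in remote:
                remote.add(data, obj_id=obj_id)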

.....

softwareheritage=> select count(*) from content where blake2s256 is null;
 count 
-------
  2800
(1 row)

After some more manual poking, we're now in the following status:

softwareheritage=> select count(*) from content where blake2s256 is null;
 count 
-------
   873
(1 row)

softwareheritage=> select min(length) from content where blake2s256 is null;
    min    
-----------
 350253114
(1 row)

At this point, the rehash workers on azure also fail to handle the size of the object and get nuked by the OOM killer...

I'll process the final few entries with an ad-hoc script running on uffizi, so we can finally close this issue and make the blake2s256 column NOT NULL.
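
For the record, that ad-hoc computation amounts to something like the sketch below (placeholder DSN and object path layout; the point is streaming the file so the >350MB objects never sit fully in memory):

import hashlib
import psycopg2

def blake2s256_of_file(path, chunk_size=1024 * 1024):
    """Stream the file through blake2s (32-byte digest) to keep memory flat."""
    h = hashlib.blake2s(digest_size=32)
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            h.update(chunk)
    return h.digest()

# Placeholder DSN and path layout; the real script ran directly on uffizi.
db = psycopg2.connect("dbname=softwareheritage")
with db, db.cursor() as cur:
    cur.execute("SELECT encode(sha1, 'hex') FROM content WHERE blake2s256 IS NULL")
    remaining = cur.fetchall()
    for (sha1_hex,) in remaining:
        digest = blake2s256_of_file("/srv/objects/" + sha1_hex)  # hypothetical layout
        cur.execute(
            "UPDATE content SET blake2s256 = %s WHERE sha1 = decode(%s, 'hex')",
            (digest, sha1_hex),
        )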

After "manual" computation of the remaining hashes:

softwareheritage=> select count(*) from content where blake2s256 is null;
 count 
-------
     0
(1 row)