diff --git a/docs/archiver-blueprint.md b/docs/archiver-blueprint.md
--- a/docs/archiver-blueprint.md
+++ b/docs/archiver-blueprint.md
@@ -12,24 +12,26 @@
 Requirements
 ------------
 
-* **Master/slave architecture**
+* **Peer-to-peer architecture**
 
-  There is 1 master copy and 1 or more slave copies of each object. A retention
-  policy specifies the minimum number of copies that are required to be "safe".
+  Every copy involved in the archival process can be used as a source or a
+  destination for the archival, depending on the blobs it contains. A
+  retention policy specifies the minimum number of copies that are required
+  to be "safe".
 
 * **Append-only archival**
 
-  The archiver treats master as read-only storage and slaves as append-only
-  storages. The archiver never deletes any object. If removals are needed, in
-  either master or slaves, they will be dealt with by other means.
+  The archiver treats all involved storages as append-only storages. The
+  archiver never deletes any object. If removals are needed, they will be
+  dealt with by other means.
 
 * **Asynchronous archival.**
 
   Periodically (e.g., via cron), the archiver runs, produces a list of objects
-  that need to be copied from master to slaves, and starts copying them over.
-  Very likely, during any given archival run other new objects will be added to
-  master; it will be the responsibility of *future* archiver runs, and not the
-  current one, to copy new objects over.
+  that need to have more copies, and starts copying them over. Very likely,
+  during any given archival run other new objects will be added to the
+  storages; it will be the responsibility of *future* archiver runs, and not
+  the current one, to copy new objects over if needed.
 
 * **Integrity at archival time.**
 
@@ -40,7 +42,7 @@
   reporting about the corruption will be emitted.
 
   Note that archival-time integrity checks are *not meant to replace periodic
-  integrity checks* on both master and slave copies.
+  integrity checks*.
 
 * **Parallel archival**
 
@@ -86,21 +88,17 @@
 At each execution the director:
 
 1. for each object: retrieve its archival status
-2. for each object that is in the master storage but has fewer copies than
-   those requested by the retention policy:
-   1. if status=ongoing and mtime is not older than archival max age
-      then continue to next object
-   2. check object integrity (e.g., with swh.storage.ObjStorage.check(obj_id))
-   3. mark object as needing archival
+2. for each object that has fewer copies than those requested by the
+   retention policy:
+   1. mark object as needing archival
 3. group objects in need of archival in batches of archival batch size
 4. for each batch:
-   1. set status=ongoing and mtime=now() for each object in the batch
-   2. spawn an archive worker on the whole batch (e.g., submitting the relevant
+   1. spawn an archive worker on the whole batch (e.g., submitting the relevant
      celery task)
 
-Note that if an archiver worker task takes a long time (t > archival max age)
-to complete, it is possible for another task to be scheduled on the same batch,
-or an overlapping one.
+Note that if an archiver worker task takes a long time to complete, it is
+possible for another task to be scheduled on the same batch, or an
+overlapping one.
 
 ### Archiver worker
 
@@ -111,47 +109,49 @@
 Runtime parameters:
 
 * objects to archive
+* archival policies (retention & archival max age)
 
 At each execution a worker:
 
-1. create empty map { destinations -> objects that need to be copied there }
+1. check that the given objects still need to be archived:
+   1. if an object has status=ongoing but its mtime is older than the
+      archival max age, it is rescheduled
 2. for each object to archive:
    1. retrieve current archive status for all destinations
-   2. update the map noting where the object needs to be copied
-3. for each destination:
-   1. look up in the map objects that need to be copied there
-   2. copy all objects to destination using the copier
-   3. set status=present and mtime=now() for each copied object
+   2. create a map noting where the object is present and where it can be
+      copied
+   3. randomly choose (source, destination) pairs, with all destinations
+      distinct, until enough copies are planned
+3. group the planned transfers by (source, destination) to obtain a map
+   {(source, destination) -> [contents]}
+4. for each (source, destination) pair:
+   1. use a copier to copy the corresponding contents
+   2. set status=present and mtime=now() for each copied object
 
 Note that:
 
-* In case multiple jobs where tasked to archive the same of overlapping
-  objects, step (2.2) might decide that some/all objects of this batch no
-  longer need to be archived to some/all destinations.
+* In case multiple jobs were tasked to archive the same or overlapping
+  objects, step (1) might decide that some/all objects of this batch no
+  longer need to be archived.
 
-* Due to parallelism, it is also possible that the same objects will be copied
-  over at the same time by multiple workers.
+* Due to parallelism, it is possible that the same objects will be copied
+  over at the same time by multiple workers. Also, the same object could end
+  up having more copies than the required number.
 
 ### Archiver copier
 
 The copier is run on demand by archiver workers, to transfer file batches from
-master to a given destination.
+a given source to a given destination.
 
-The copier transfers all files together with a single network connection. The
-copying process is atomic at the file granularity (i.e., individual files might
-be visible on the destination before *all* files have been transferred) and
-ensures that *concurrent transfer of the same files by multiple copier
-instances do not result in corrupted files*. Note that, due to this and the
-fact that timestamps are updated by the director, all files copied in the same
-batch will have the same mtime even though the actual file creation times on a
-given destination might differ.
+The copier transfers files one by one. The copying process is atomic at the
+file granularity (i.e., individual files might be visible on the destination
+before *all* files have been transferred) and ensures that *concurrent
+transfer of the same files by multiple copier instances does not result in
+corrupted files*. Note that, due to this and the fact that timestamps are
+updated by the worker, all files copied in the same batch will have the same
+mtime even though the actual file creation times on a given destination might
+differ.
 
-As a first approximation, the copier can be implemented using rsync, but a
-dedicated protocol can be devised later. In the case of rsync, one should use
---files-from to list the file to be copied. Rsync atomically renames files
-one-by-one during transfer; so as long as --inplace is *not* used, concurrent
-rsync of the same files should not be a problem.
+The copier is implemented using the ObjStorage API for both the sources and
+the destinations.
 
 DB structure
 ------------
 
@@ -159,9 +159,12 @@
 Postgres SQL definitions for the archival status:
 
-    CREATE DOMAIN archive_id AS TEXT;
+    CREATE TYPE archive_id AS ENUM (
+      'uffizi',
+      'banco'
+    );
 
-    CREATE TABLE archives (
+    CREATE TABLE archive (
       id  archive_id PRIMARY KEY,
       url TEXT
     );
@@ -173,9 +176,6 @@
     );
 
     CREATE TABLE content_archive (
-      content_id sha1 REFERENCES content(sha1),
-      archive_id archive_id REFERENCES archives(id),
-      status archive_status,
-      mtime timestamptz,
-      PRIMARY KEY (content_id, archive_id)
+      content_id sha1 UNIQUE,
+      copies jsonb
     );