D88.id289.diff
diff --git a/docs/archiver-blueprint.md b/docs/archiver-blueprint.md
--- a/docs/archiver-blueprint.md
+++ b/docs/archiver-blueprint.md
@@ -12,24 +12,32 @@
Requirements
------------
-* **Master/slave architecture**
+* **Peer-to-peer topology**
- There is 1 master copy and 1 or more slave copies of each object. A retention
- policy specifies the minimum number of copies that are required to be "safe".
+ Every storage involved in the archival process can be used as a source or a
+ destination for the archival, depending on the blobs it contains. A
+ retention policy specifies the minimum number of copies that are required
+ to be "safe".
+
+ Although the servers are all equal, the coordination of which content
+ should be copied, and from where to where, is centralized.
* **Append-only archival**
- The archiver treats master as read-only storage and slaves as append-only
- storages. The archiver never deletes any object. If removals are needed, in
- either master or slaves, they will be dealt with by other means.
+ The archiver treats involved storages as append-only storages. The archiver
+ never deletes any object. If removals are needed, they will be dealt with
+ by other means.
* **Asynchronous archival.**
Periodically (e.g., via cron), the archiver runs, produces a list of objects
- that need to be copied from master to slaves, and starts copying them over.
- Very likely, during any given archival run other new objects will be added to
- master; it will be the responsibility of *future* archiver runs, and not the
- current one, to copy new objects over.
+ that need to have more copies, and starts copying them over. The decision of
+ which storages are chosen to be sources and destinations is not up to the
+ storages themselves.
+
+ Very likely, during any given archival run, other new objects will be added
+ to storages; it will be the responsibility of *future* archiver runs, and
+ not the current one, to copy new objects over if needed.
* **Integrity at archival time.**
@@ -40,7 +48,7 @@
reporting about the corruption will be emitted.
Note that archival-time integrity checks are *not meant to replace periodic
- integrity checks* on both master and slave copies.
+ integrity checks*.
* **Parallel archival**
@@ -56,7 +64,8 @@
information:
* **status**: 4-state: *missing* (copy not present at destination), *ongoing*
- (copy to destination ongoing), *present* (copy present at destination)
+ (copy to destination ongoing), *present* (copy present at destination),
+ *corrupted* (content detected as corrupted during an archival).
* **mtime**: timestamp of last status change. This is either the destination
archival time (when status=present), or the timestamp of the last archival
request (status=ongoing); the timestamp is undefined when status=missing.
@@ -86,22 +95,14 @@
At each execution the director:
1. for each object: retrieve its archival status
-2. for each object that is in the master storage but has fewer copies than
- those requested by the retention policy:
- 1. if status=ongoing and mtime is not older than archival max age
- then continue to next object
- 2. check object integrity (e.g., with swh.storage.ObjStorage.check(obj_id))
- 3. mark object as needing archival
-3. group objects in need of archival in batches of archival batch size
+2. for each object that has fewer copies than those requested by the
+ retention policy:
+ 1. mark object as needing archival
+3. group objects in need of archival in batches of `archival batch size`
4. for each batch:
- 1. set status=ongoing and mtime=now() for each object in the batch
- 2. spawn an archive worker on the whole batch (e.g., submitting the relevant
+ 1. spawn an archive worker on the whole batch (e.g., submitting the relevant
celery task)
-Note that if an archiver worker task takes a long time (t > archival max age)
-to complete, it is possible for another task to be scheduled on the same batch,
-or an overlapping one.
-
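The director's selection-and-dispatch loop could be sketched as follows. This is a minimal illustration, not the actual swh implementation: `get_archival_status`, `spawn_worker`, and both configuration values are hypothetical placeholders.

```python
# Sketch of one director execution (hypothetical helper names).

ARCHIVAL_BATCH_SIZE = 100  # assumed configuration value
RETENTION_POLICY = 2       # assumed minimum number of "safe" copies


def run_director(objects, get_archival_status, spawn_worker):
    """Find objects with fewer copies than the retention policy
    requires, then dispatch them to workers in fixed-size batches."""
    needing_archival = []
    for obj_id in objects:
        copies = get_archival_status(obj_id)  # {archive_id: status}
        present = sum(1 for s in copies.values() if s == 'present')
        if present < RETENTION_POLICY:
            needing_archival.append(obj_id)
    # group objects in need of archival into batches, one worker each
    for i in range(0, len(needing_archival), ARCHIVAL_BATCH_SIZE):
        spawn_worker(needing_archival[i:i + ARCHIVAL_BATCH_SIZE])
```

In a real deployment `spawn_worker` would submit the relevant celery task rather than call a function directly.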
### Archiver worker
@@ -111,47 +112,66 @@
Runtime parameters:
* objects to archive
+* archival policies (retention & archival max age)
At each execution a worker:
-1. create empty map { destinations -> objects that need to be copied there }
+1. for each object in the batch
+ 1. check that the object still needs to be archived
+ (#present copies < retention policy)
+ 2. if an object has status=ongoing but the elapsed time since task submission
+ is less than the *archival max age*, it counts as present, as we assume
+ that it will be copied in the future; otherwise, it counts as a missing
+ copy.
2. for each object to archive:
1. retrieve current archive status for all destinations
- 2. update the map noting where the object needs to be copied
-3. for each destination:
- 1. look up in the map objects that need to be copied there
- 2. copy all objects to destination using the copier
- 3. set status=present and mtime=now() for each copied object
+ 2. create a map noting where the object is present and where it can be copied
+ 3. randomly choose (source, destination) pairs, where the destinations are
+ all different, to make enough copies
+3. group the transfers:
+ 1. join the contents by key (source, destination) to obtain a map
+ {(source, destination) -> [contents]}
+ 2. for each such transfer pair, use a copier to make the copies.
Note that:
* In case multiple jobs were tasked to archive the same or overlapping
- objects, step (2.2) might decide that some/all objects of this batch no
- longer need to be archived to some/all destinations.
+ objects, step (1) might decide that some/all objects of this batch no
+ longer need to be archived.
-* Due to parallelism, it is also possible that the same objects will be copied
- over at the same time by multiple workers.
+* Due to parallelism, it is possible that the same objects will be copied
+ over at the same time by multiple workers. Also, the same object could end
+ up having more copies than the minimum number required.
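The worker's copy-counting and transfer-planning steps might look like the following sketch. It assumes a hypothetical `copies` map of `archive_id -> (status, mtime)` and illustrative configuration values; none of the names come from the actual codebase.

```python
import random
import time

# Illustrative configuration values, not the real swh settings.
ARCHIVAL_MAX_AGE = 3600   # seconds
RETENTION_POLICY = 2      # minimum number of "safe" copies


def effective_copies(copies, now=None):
    """Count copies considered present: status=present, or
    status=ongoing with a recent enough mtime (assumed in flight)."""
    now = time.time() if now is None else now
    count = 0
    for status, mtime in copies.values():
        if status == 'present':
            count += 1
        elif status == 'ongoing' and now - mtime < ARCHIVAL_MAX_AGE:
            count += 1
    return count


def plan_transfers(obj_id, copies, archives, now=None):
    """Randomly pick (source, destination) pairs, with all
    destinations distinct, until the retention policy is satisfied."""
    missing = RETENTION_POLICY - effective_copies(copies, now)
    if missing <= 0:
        return []
    sources = [a for a, (status, _) in copies.items() if status == 'present']
    destinations = [a for a in archives
                    if a not in copies or copies[a][0] == 'missing']
    chosen = random.sample(destinations, min(missing, len(destinations)))
    return [(obj_id, random.choice(sources), dest) for dest in chosen]
```

The random choice of sources spreads the read load across storages; distinct destinations guarantee that the new copies actually raise the replication count.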
### Archiver copier
The copier is run on demand by archiver workers, to transfer file batches from
-master to a given destination.
+a given source to a given destination.
+
+The copier transfers files one by one. The copying process is atomic at the file
+granularity (i.e., individual files might be visible on the destination before
+*all* files have been transferred) and ensures that *concurrent transfer of the
+same files by multiple copier instances do not result in corrupted files*. Note
+that, due to this and the fact that timestamps are updated by the worker, all
+files copied in the same batch will have the same mtime even though the actual
+file creation times on a given destination might differ.
-The copier transfers all files together with a single network connection. The
-copying process is atomic at the file granularity (i.e., individual files might
-be visible on the destination before *all* files have been transferred) and
-ensures that *concurrent transfer of the same files by multiple copier
-instances do not result in corrupted files*. Note that, due to this and the
-fact that timestamps are updated by the director, all files copied in the same
-batch will have the same mtime even though the actual file creation times on a
-given destination might differ.
+The copier is implemented using the ObjStorage API for the sources and
+destinations.
-As a first approximation, the copier can be implemented using rsync, but a
-dedicated protocol can be devised later. In the case of rsync, one should use
---files-from to list the file to be copied. Rsync atomically renames files
-one-by-one during transfer; so as long as --inplace is *not* used, concurrent
-rsync of the same files should not be a problem.
+At each execution of the copier:
+
+1. for each object in the batch:
+ 1. check the object on the source storages
+ * if the object is corrupted or missing
+ * update its status in the database
+ * remove it from the current batch
+ 2. start the copy of the content from the source storage to the
+ destination storage
+ * if an error occurred on a content that should have been valid,
+ consider the whole batch as a failure.
+ 3. Set status=present and mtime=now for each successfully copied object
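The copier execution above can be sketched as follows, assuming `source` and `destination` expose an ObjStorage-like interface (`check`, `get`, `add`) and `update_status` writes back to the archival database; all names are illustrative placeholders.

```python
# Sketch of one copier execution over a batch (hypothetical API).

def copy_batch(batch, source, destination, update_status):
    """Copy a batch of object ids from source to destination,
    filtering out corrupted or missing objects first."""
    valid = []
    for obj_id in batch:
        try:
            source.check(obj_id)  # integrity check on the source copy
        except Exception:
            # corrupted or missing: record it, drop it from the batch
            update_status(obj_id, source.name, 'corrupted')
            continue
        valid.append(obj_id)
    for obj_id in valid:
        # an error on an object that passed the check propagates,
        # failing the whole batch as required
        destination.add(source.get(obj_id), obj_id)
    for obj_id in valid:
        update_status(obj_id, destination.name, 'present')
```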
DB structure
@@ -159,9 +179,13 @@
Postgres SQL definitions for the archival status:
- CREATE DOMAIN archive_id AS TEXT;
+ -- These names are samples of archive server names
+ CREATE TYPE archive_id AS ENUM (
+ 'uffizi',
+ 'banco'
+ );
- CREATE TABLE archives (
+ CREATE TABLE archive (
id archive_id PRIMARY KEY,
url TEXT
);
@@ -169,13 +193,39 @@
CREATE TYPE archive_status AS ENUM (
'missing',
'ongoing',
- 'present'
+ 'present',
+ 'corrupted'
);
CREATE TABLE content_archive (
- content_id sha1 REFERENCES content(sha1),
- archive_id archive_id REFERENCES archives(id),
- status archive_status,
- mtime timestamptz,
- PRIMARY KEY (content_id, archive_id)
+ content_id sha1 unique,
+ copies jsonb
);
+
+
+Where the content_archive.copies field is of type jsonb and contains data
+about the storages that contain (or not) the content represented by the sha1:
+
+ {
+ "$schema": "http://json-schema.org/schema#",
+ "title": "Copies data",
+ "description": "Data about the presence of a content in the storages",
+ "type": "object",
+ "patternProperties": {
+ "^[a-zA-Z1-9]+$": {
+ "description": "archival status for the server named by key",
+ "type": "object",
+ "properties": {
+ "status": {
+ "description": "Archival status on this copy, matching one of the archive_status enum values",
+ "type": "string"
+ },
+ "mtime": {
+ "description": "Last time of the status update",
+ "type": "number"
+ }
+ }
+ }
+ },
+ "additionalProperties": false
+ }
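For illustration, a hypothetical `copies` document conforming to the schema above, together with the kind of copy count the director needs, might look like this (server names follow the sample `archive_id` enum; the timestamps are made up):

```python
import json

# Sample content_archive.copies document (illustrative values only).
copies = {
    "uffizi": {"status": "present", "mtime": 1466776033.0},
    "banco": {"status": "ongoing", "mtime": 1466776570.0},
}


def present_copies(copies):
    """Count servers where the content is effectively present."""
    return sum(1 for c in copies.values() if c["status"] == "present")


row = json.dumps(copies)  # what would be stored in the jsonb column
```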
