D88.id289.diff
diff --git a/docs/archiver-blueprint.md b/docs/archiver-blueprint.md
--- a/docs/archiver-blueprint.md
+++ b/docs/archiver-blueprint.md
@@ -12,24 +12,32 @@
Requirements
------------
-* **Master/slave architecture**
+* **Peer-to-peer topology**
- There is 1 master copy and 1 or more slave copies of each object. A retention
- policy specifies the minimum number of copies that are required to be "safe".
+ Every storage involved in the archival process can be used as a source or a
+ destination for the archival, depending on the blobs it contains. A
+ retention policy specifies the minimum number of copies that are required
+ to be "safe".
+
+ Although the servers are all equal, the coordination of which content
+ should be copied, and from where to where, is centralized.
* **Append-only archival**
- The archiver treats master as read-only storage and slaves as append-only
- storages. The archiver never deletes any object. If removals are needed, in
- either master or slaves, they will be dealt with by other means.
+ The archiver treats involved storages as append-only storages. The archiver
+ never deletes any object. If removals are needed, they will be dealt with
+ by other means.
* **Asynchronous archival.**
Periodically (e.g., via cron), the archiver runs, produces a list of objects
- that need to be copied from master to slaves, and starts copying them over.
- Very likely, during any given archival run other new objects will be added to
- master; it will be the responsibility of *future* archiver runs, and not the
- current one, to copy new objects over.
+ that need to have more copies, and starts copying them over. The decision of
+ which storages are chosen to be sources and destinations is not up to the
+ storages themselves.
+
+ Very likely, during any given archival run, other new objects will be added
+ to storages; it will be the responsibility of *future* archiver runs, and
+ not the current one, to copy new objects over if needed.
* **Integrity at archival time.**
@@ -40,7 +48,7 @@
reporting about the corruption will be emitted.
Note that archival-time integrity checks are *not meant to replace periodic
- integrity checks* on both master and slave copies.
+ integrity checks*.
* **Parallel archival**
@@ -56,7 +64,8 @@
information:
* **status**: 4-state: *missing* (copy not present at destination), *ongoing*
- (copy to destination ongoing), *present* (copy present at destination)
+ (copy to destination ongoing), *present* (copy present at destination),
+ *corrupted* (content detected as corrupted during an archival).
* **mtime**: timestamp of last status change. This is either the destination
archival time (when status=present), or the timestamp of the last archival
request (status=ongoing); the timestamp is undefined when status=missing.
@@ -86,22 +95,14 @@
At each execution the director:
1. for each object: retrieve its archival status
-2. for each object that is in the master storage but has fewer copies than
- those requested by the retention policy:
- 1. if status=ongoing and mtime is not older than archival max age
- then continue to next object
- 2. check object integrity (e.g., with swh.storage.ObjStorage.check(obj_id))
- 3. mark object as needing archival
-3. group objects in need of archival in batches of archival batch size
+2. for each object that has fewer copies than those requested by the
+ retention policy:
+ 1. mark object as needing archival
+3. group objects in need of archival in batches of `archival batch size`
4. for each batch:
- 1. set status=ongoing and mtime=now() for each object in the batch
- 2. spawn an archive worker on the whole batch (e.g., submitting the relevant
+ 1. spawn an archive worker on the whole batch (e.g., submitting the relevant
celery task)
-Note that if an archiver worker task takes a long time (t > archival max age)
-to complete, it is possible for another task to be scheduled on the same batch,
-or an overlapping one.
-
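The director's selection-and-dispatch loop could be sketched as follows. This is a minimal illustration, not the actual swh implementation: `get_archival_status`, `spawn_worker`, and both configuration values are hypothetical placeholders.

```python
# Sketch of one director execution (hypothetical helper names).

ARCHIVAL_BATCH_SIZE = 100  # assumed configuration value
RETENTION_POLICY = 2       # assumed minimum number of "safe" copies


def run_director(objects, get_archival_status, spawn_worker):
    """Find objects with fewer copies than the retention policy
    requires, then dispatch them to workers in fixed-size batches."""
    needing_archival = []
    for obj_id in objects:
        copies = get_archival_status(obj_id)  # {archive_id: status}
        present = sum(1 for s in copies.values() if s == 'present')
        if present < RETENTION_POLICY:
            needing_archival.append(obj_id)
    # group objects in need of archival into batches, one worker each
    for i in range(0, len(needing_archival), ARCHIVAL_BATCH_SIZE):
        spawn_worker(needing_archival[i:i + ARCHIVAL_BATCH_SIZE])
```

In a real deployment `spawn_worker` would submit the relevant celery task rather than call a function directly.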
### Archiver worker
@@ -111,47 +112,66 @@
Runtime parameters:
* objects to archive
+* archival policies (retention & archival max age)
At each execution a worker:
-1. create empty map { destinations -> objects that need to be copied there }
+1. for each object in the batch
+ 1. check that the object still needs to be archived
+ (#present copies < retention policy)
+ 2. if an object has status=ongoing but the elapsed time since task submission
+ is less than the *archival max age*, it counts as present, as we assume
+ that it will be copied in the future; otherwise, it counts as a missing
+ copy.
2. for each object to archive:
1. retrieve current archive status for all destinations
- 2. update the map noting where the object needs to be copied
-3. for each destination:
- 1. look up in the map objects that need to be copied there
- 2. copy all objects to destination using the copier
- 3. set status=present and mtime=now() for each copied object
+ 2. create a map noting where the object is present and where it can be copied
+ 3. randomly choose (source, destination) pairs, where the destinations are
+ all different, to make enough copies
+3. group the transfers:
+ 1. join the contents by key (source, destination) to obtain a map
+ {(source, destination) -> [contents]}
+ 2. for each such transfer pair, use a copier to make the copies.
Note that:
* In case multiple jobs were tasked to archive the same or overlapping
- objects, step (2.2) might decide that some/all objects of this batch no
- longer need to be archived to some/all destinations.
+ objects, step (1) might decide that some/all objects of this batch no
+ longer need to be archived.
-* Due to parallelism, it is also possible that the same objects will be copied
- over at the same time by multiple workers.
+* Due to parallelism, it is possible that the same objects will be copied
+ over at the same time by multiple workers. Also, the same object could end
+ up having more copies than the minimum number required.
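The worker's copy-counting and transfer-planning steps might look like the following sketch. It assumes a hypothetical `copies` map of `archive_id -> (status, mtime)` and illustrative configuration values; none of the names come from the actual codebase.

```python
import random
import time

# Illustrative configuration values, not the real swh settings.
ARCHIVAL_MAX_AGE = 3600   # seconds
RETENTION_POLICY = 2      # minimum number of "safe" copies


def effective_copies(copies, now=None):
    """Count copies considered present: status=present, or
    status=ongoing with a recent enough mtime (assumed in flight)."""
    now = time.time() if now is None else now
    count = 0
    for status, mtime in copies.values():
        if status == 'present':
            count += 1
        elif status == 'ongoing' and now - mtime < ARCHIVAL_MAX_AGE:
            count += 1
    return count


def plan_transfers(obj_id, copies, archives, now=None):
    """Randomly pick (source, destination) pairs, with all
    destinations distinct, until the retention policy is satisfied."""
    missing = RETENTION_POLICY - effective_copies(copies, now)
    if missing <= 0:
        return []
    sources = [a for a, (status, _) in copies.items() if status == 'present']
    destinations = [a for a in archives
                    if a not in copies or copies[a][0] == 'missing']
    chosen = random.sample(destinations, min(missing, len(destinations)))
    return [(obj_id, random.choice(sources), dest) for dest in chosen]
```

The random choice of sources spreads the read load across storages; distinct destinations guarantee that the new copies actually raise the replication count.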
### Archiver copier
The copier is run on demand by archiver workers, to transfer file batches from
-master to a given destination.
+a given source to a given destination.
+
+The copier transfers files one by one. The copying process is atomic at the file
+granularity (i.e., individual files might be visible on the destination before
+*all* files have been transferred) and ensures that *concurrent transfer of the
+same files by multiple copier instances do not result in corrupted files*. Note
+that, due to this and the fact that timestamps are updated by the worker, all
+files copied in the same batch will have the same mtime even though the actual
+file creation times on a given destination might differ.
-The copier transfers all files together with a single network connection. The
-copying process is atomic at the file granularity (i.e., individual files might
-be visible on the destination before *all* files have been transferred) and
-ensures that *concurrent transfer of the same files by multiple copier
-instances do not result in corrupted files*. Note that, due to this and the
-fact that timestamps are updated by the director, all files copied in the same
-batch will have the same mtime even though the actual file creation times on a
-given destination might differ.
+The copier is implemented using the ObjStorage API for the sources and
+destinations.
-As a first approximation, the copier can be implemented using rsync, but a
-dedicated protocol can be devised later. In the case of rsync, one should use
---files-from to list the file to be copied. Rsync atomically renames files
-one-by-one during transfer; so as long as --inplace is *not* used, concurrent
-rsync of the same files should not be a problem.
+At each execution of the copier:
+
+1. for each object in the batch:
+ 1. check the object on the source storages
+ * if the object is corrupted or missing
+ * update its status in the database
+ * remove it from the current batch
+ 2. start the copy of the content from the source storage to the
+ destination storage
+ * if an error occurred on a content that should have been valid,
+ consider the whole batch as a failure.
+ 3. Set status=present and mtime=now for each successfully copied object
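The copier execution above can be sketched as follows, assuming `source` and `destination` expose an ObjStorage-like interface (`check`, `get`, `add`) and `update_status` writes back to the archival database; all names are illustrative placeholders.

```python
# Sketch of one copier execution over a batch (hypothetical API).

def copy_batch(batch, source, destination, update_status):
    """Copy a batch of object ids from source to destination,
    filtering out corrupted or missing objects first."""
    valid = []
    for obj_id in batch:
        try:
            source.check(obj_id)  # integrity check on the source copy
        except Exception:
            # corrupted or missing: record it, drop it from the batch
            update_status(obj_id, source.name, 'corrupted')
            continue
        valid.append(obj_id)
    for obj_id in valid:
        # an error on an object that passed the check propagates,
        # failing the whole batch as required
        destination.add(source.get(obj_id), obj_id)
    for obj_id in valid:
        update_status(obj_id, destination.name, 'present')
```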
DB structure
@@ -159,9 +179,13 @@
Postgres SQL definitions for the archival status:
- CREATE DOMAIN archive_id AS TEXT;
+ -- These names are samples of archive server names
+ CREATE TYPE archive_id AS ENUM (
+ 'uffizi',
+ 'banco'
+ );
- CREATE TABLE archives (
+ CREATE TABLE archive (
id archive_id PRIMARY KEY,
url TEXT
);
@@ -169,13 +193,39 @@
CREATE TYPE archive_status AS ENUM (
'missing',
'ongoing',
- 'present'
+ 'present',
+ 'corrupted'
);
CREATE TABLE content_archive (
- content_id sha1 REFERENCES content(sha1),
- archive_id archive_id REFERENCES archives(id),
- status archive_status,
- mtime timestamptz,
- PRIMARY KEY (content_id, archive_id)
+ content_id sha1 unique,
+ copies jsonb
);
+
+
+Where the content_archive.copies field is of type jsonb and contains data
+about the storages that contain (or not) the content represented by the sha1:
+
+ {
+ "$schema": "http://json-schema.org/schema#",
+ "title": "Copies data",
+ "description": "Data about the presence of a content in the storages",
+ "type": "object",
+ "patternProperties": {
+ "^[a-zA-Z1-9]+$": {
+ "description": "archival status for the server named by key",
+ "type": "object",
+ "properties": {
+ "status": {
+ "description": "Archival status on this copy, matching one of the archive_status enum values",
+ "type": "string"
+ },
+ "mtime": {
+ "description": "Last time of the status update",
+ "type": "number"
+ }
+ }
+ }
+ },
+ "additionalProperties": false
+ }
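For illustration, a hypothetical `copies` document conforming to the schema above, together with the kind of copy count the director needs, might look like this (server names follow the sample `archive_id` enum; the timestamps are made up):

```python
import json

# Sample content_archive.copies document (illustrative values only).
copies = {
    "uffizi": {"status": "present", "mtime": 1466776033.0},
    "banco": {"status": "ongoing", "mtime": 1466776570.0},
}


def present_copies(copies):
    """Count servers where the content is effectively present."""
    return sum(1 for c in copies.values() if c["status"] == "present")


row = json.dumps(copies)  # what would be stored in the jsonb column
```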
