diff --git a/docs/index.rst b/docs/index.rst
--- a/docs/index.rst
+++ b/docs/index.rst
@@ -141,7 +141,6 @@
    architecture
    getting-started
    developer-setup
-   Infrastructure
    API documentation
    swh.core
    swh.dataset
diff --git a/docs/infrastructure/elasticsearch.rst b/docs/infrastructure/elasticsearch.rst
deleted file mode 100644
--- a/docs/infrastructure/elasticsearch.rst
+++ /dev/null
@@ -1,38 +0,0 @@
-.. _elasticsearch:
-
-==============
-Elasticsearch
-==============
-
-Software Heritage uses an Elasticsearch cluster for long-term log storage.
-
-Hardware implementation
-=======================
-
-- 3x Xeon E3v6 (Skylake) servers with 32GB of RAM and 3x 2TB hard drives each
-- 2x gigabit switches
-
-List of nodes
--------------
-
-* esnode1.internal.softwareheritage.org.
-* esnode2.internal.softwareheritage.org.
-* esnode3.internal.softwareheritage.org.
-
-Architecture diagram
-====================
-
-.. graphviz:: ../images/elasticsearch.dot
-
-Per-node storage
-================
-
-- one root hard drive with a small filesystem
-- 3x 2TB hard drives in RAID0
-- an xfs filesystem on this volume, mounted on */srv/elasticsearch*
-
-Remark
-======
-
-The root hard drive of the Elasticsearch nodes is also used to store a
-dedicated ext4 filesystem for `Kafka`, mounted on */srv/kafka*.
diff --git a/docs/infrastructure/hypervisors.rst b/docs/infrastructure/hypervisors.rst
deleted file mode 100644
--- a/docs/infrastructure/hypervisors.rst
+++ /dev/null
@@ -1,29 +0,0 @@
-===========
-Hypervisors
-===========
-
-Software Heritage uses a few hypervisors configured in a Proxmox cluster.
-
-List of Proxmox nodes
-=====================
-
-- beaubourg: Xeon E7-4809 server, 16 cores/512 GB RAM, bought in 2015
-- hypervisor3: EPYC 7301 server, 32 cores/256 GB RAM, bought in 2018
-- orsay: Opteron 6172, 48 cores/128 GB RAM, refurbished (2010 vintage)
-
-Orsay is not a production machine; its purpose is to run throw-away development/staging VMs.
-
-Per-node storage
-================
-
-Each server has physically installed 2.5" SSDs (SAS or SATA), configured
-in mdadm RAID10 pools.
-A device mapper layer on top of these pools allows Proxmox to easily manage VM
-disk images.
-
-Network storage
-===============
-
-A :ref:`ceph_cluster` is set up as a shared storage resource.
-It can be used to temporarily transfer VM disk images from one hypervisor
-node to another, or to directly store virtual machine disk images.
diff --git a/docs/infrastructure/index.rst b/docs/infrastructure/index.rst
deleted file mode 100644
--- a/docs/infrastructure/index.rst
+++ /dev/null
@@ -1,51 +0,0 @@
-===============================
-Software Heritage storage sites
-===============================
-
-.. toctree::
-   :maxdepth: 2
-   :hidden:
-
-   storage_site_rocquencourt_physical
-   storage_site_rocquencourt_virtual
-   storage_site_azure_euwest
-   storage_site_amazon
-   storage_site_others
-   elasticsearch
-   hypervisors
-   object_storage
-
-Physical machines at Rocquencourt
-=================================
-
-INRIA Rocquencourt is the main Software Heritage datacenter.
-It is the only one to contain
-:doc:`directly-managed physical machines <storage_site_rocquencourt_physical>`.
-
-Virtual machines at Rocquencourt
-================================
-
-The :doc:`virtual machines at Rocquencourt <storage_site_rocquencourt_virtual>`
-are directly managed by Software Heritage staff as well and run on
-:doc:`Software Heritage hypervisors <hypervisors>`.
-
-Azure Euwest
-============
-
-Various virtual machines and other services are hosted at
-:doc:`Azure Euwest <storage_site_azure_euwest>`.
-
-Amazon S3
-=========
-
-Object storage
-==============
-
-Even though there are different object storage implementations in different
-locations, it has been deemed useful to regroup all object storage-related
-information in a :doc:`single document <object_storage>`.
-
-Other locations
-===============
-
-:doc:`Other locations <storage_site_others>`.
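As a quick aside on the RAID layouts in the pages above: the Elasticsearch nodes stripe their data drives in RAID0 (all raw capacity, no redundancy), while the hypervisors mirror theirs in mdadm RAID10 (half the raw capacity). A small illustrative helper, not part of the Software Heritage codebase:

```python
# Illustrative helper (not from the Software Heritage codebase): usable
# capacity of the mdadm arrays described above, ignoring filesystem overhead.

def usable_capacity_tb(level: str, drives: int, drive_tb: float) -> float:
    """Usable capacity in TB of an mdadm RAID array."""
    if level == "raid0":
        # Striping only: every byte of every drive is usable.
        return drives * drive_tb
    if level == "raid10":
        # Striped mirrors: half the raw capacity survives as usable space.
        if drives % 2:
            raise ValueError("RAID10 needs an even number of drives")
        return drives * drive_tb / 2
    raise ValueError(f"unsupported RAID level: {level}")

# Elasticsearch data volume: 3x 2 TB in RAID0 -> 6 TB usable per node.
# A hypothetical hypervisor pool: 8x 1 TB SSDs in RAID10 -> 4 TB usable.
```

The drive counts in the second comment line are hypothetical; the document does not state how many SSDs each hypervisor holds.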
diff --git a/docs/infrastructure/object_storage.rst b/docs/infrastructure/object_storage.rst
deleted file mode 100644
--- a/docs/infrastructure/object_storage.rst
+++ /dev/null
@@ -1,76 +0,0 @@
-==============
-Object storage
-==============
-
-There is not one but at least four different object stores directly managed
-by the Software Heritage group:
-
-- Main archive
-- Rocquencourt replica archive
-- Azure archive
-- AWS archive
-
-The Main archive
-================
-
-The main archive is hosted on *Uffizi*,
-located in Rocquencourt.
-
-Replica archive
-===============
-
-The replica archive is hosted on *Banco*,
-located in Rocquencourt, in a different building from the main one.
-
-Azure archive
-=============
-
-The Azure archive uses an Azure Blob Storage backend, implemented in the
-*swh.objstorage.backends.azure.AzureCloudObjStorage* Python class.
-
-Internally, that class uses the *block_blob_service* Azure API.
-
-AWS archive
-===========
-
-The AWS archive is stored in the *softwareheritage* Amazon S3 bucket, in the
-US-East (N. Virginia) region. That bucket is public.
-
-It is being continuously populated by the :ref:`content_replayer` program.
-
-Software Heritage Python programs access it using a libcloud backend.
-
-URL
----
-
-``s3://softwareheritage/content``
-
-.. _content_replayer:
-
-content_replayer
-----------------
-
-A Python program which reads new object identifiers from Kafka and then copies the
-corresponding objects from the object storages on Banco and Uffizi to the AWS archive.
-
-
-Implementation details
-----------------------
-
-* Uses *swh.objstorage.backends.libcloud*
-
-* Uses *libcloud.storage.drivers.s3*
-
-
-Architecture diagram
-====================
-
-.. graph:: swh_archives
-
-   "Main archive" -- "Replica archive";
-   "Azure archive";
-   "AWS archive";
-   "Main archive" [shape=rectangle];
-   "Replica archive" [shape=rectangle];
-   "Azure archive" [shape=rectangle];
-   "AWS archive" [shape=rectangle];
diff --git a/docs/infrastructure/storage_site_amazon.rst b/docs/infrastructure/storage_site_amazon.rst
deleted file mode 100644
--- a/docs/infrastructure/storage_site_amazon.rst
+++ /dev/null
@@ -1,9 +0,0 @@
-.. _storage_amazon:
-
-Amazon storage
-==============
-
-A *softwareheritage* object storage S3 bucket is publicly hosted in the
-US-East AWS region.
-
-Data is reachable at the *s3://softwareheritage/content* URL.
diff --git a/docs/infrastructure/storage_site_azure_euwest.rst b/docs/infrastructure/storage_site_azure_euwest.rst
deleted file mode 100644
--- a/docs/infrastructure/storage_site_azure_euwest.rst
+++ /dev/null
@@ -1,32 +0,0 @@
-Azure Euwest
-============
-
-Virtual machines
-----------------
-
-- dbreplica0: contains a read-only instance of the *softwareheritage* database
-- dbreplica1: contains a read-only instance of the *softwareheritage-indexer* database
-- kafka01 to 06: journal nodes
-- mirror-node-1 to 3
-- storage0: storage and object storage services used by the Azure workers
-- vangogh: vault service and r/w database for the vault workers
-- webapp0: webapp mirror using storage0 services to expose results
-- worker01 to 10 and worker13: indexer workers
-- worker11 to 12: vault workers (cooking)
-
-The PostgreSQL databases are populated using WAL streaming from *somerset*.
-
-Storage accounts
-----------------
-
-16 Azure storage accounts (*0euwestswh* to *feuwestswh*) are dedicated to blob
-containers for object storage.
-The first hexadecimal digit of an account name is also the first digit of
-the content hashes it stores.
-Blobs are stored in locations of the form *6euwestswh/contents*.
-
-Other storage accounts:
-
-- archiveeuwestswh: mirrors of dead software forges like *code.google.com*
-- swhvaultstorage: cooked archives for the *vault* server running in Azure
-- swhcontent: object storage content (individual blobs)
diff --git a/docs/infrastructure/storage_site_others.rst b/docs/infrastructure/storage_site_others.rst
deleted file mode 100644
--- a/docs/infrastructure/storage_site_others.rst
+++ /dev/null
@@ -1,24 +0,0 @@
-=========================================
-Other Software Heritage storage locations
-=========================================
-
-INRIA-provided storage at Rocquencourt
-======================================
-
-The *filer-backup:/swh1* NFS filesystem is used to store DAR backups.
-It is mounted on *uffizi:/srv/remote-backups*.
-
-The *uffizi:/srv/remote-backups* filesystem is regularly snapshotted; the snapshots
-are visible in *uffizi:/srv/remote-backups/.snapshot/*.
-
-Workstations
-============
-
-Staff workstations are located at INRIA Paris. The most important one from a storage
-point of view is *giverny.paris.inria.fr*, which has more than 10 TB of directly-attached
-storage, mostly used for research databases.
-
-Public website
-==============
-
-Hosted by Gandi, its storage (including WordPress) is located in one or more Gandi datacenters.
diff --git a/docs/infrastructure/storage_site_rocquencourt_physical.rst b/docs/infrastructure/storage_site_rocquencourt_physical.rst
deleted file mode 100644
--- a/docs/infrastructure/storage_site_rocquencourt_physical.rst
+++ /dev/null
@@ -1,64 +0,0 @@
-Physical machines at Rocquencourt
-=================================
-
-Hypervisors
------------
-
-The :doc:`hypervisors <hypervisors>` mostly use local storage in the form of internal
-SSDs, but also have access to a :ref:`ceph_cluster`.
-
-NFS server
-----------
-
-There is only one NFS server managed by Software Heritage, *uffizi.internal.softwareheritage.org*.
-That machine is located at Rocquencourt and is directly attached to two SAS storage bays.
-
-NFS-exported data is present under these local filesystem paths::
-
-   /srv/storage/space
-   /srv/softwareheritage/objects
-
-belvedere
----------
-
-This server is used for at least two separate PostgreSQL instances:
-
-- *softwareheritage* database (port 5433)
-- *swh-lister* and *softwareheritage-scheduler* databases (port 5434)
-
-Data is stored on local SSDs. The operating system lies on an LSI hardware RAID 1 volume and
-each PostgreSQL instance uses a dedicated set of drives in mdadm RAID10 volume(s).
-
-It also uses a single NFS volume::
-
-   uffizi:/srv/storage/space/postgres-backups/prado
-
-banco
------
-
-This machine is located in its own building in Rocquencourt, along
-with a SAS storage bay.
-It is intended to serve as a backup for the main site in building 30.
-
-Elasticsearch cluster
----------------------
-
-The :doc:`Elasticsearch cluster <elasticsearch>` only uses local storage on
-its nodes.
-
-Test / staging server
----------------------
-
-There is also *orsay*, a refurbished machine only used for testing / staging
-new software versions.
-
-.. _ceph_cluster:
-
-Ceph cluster
-------------
-
-The Software Heritage Ceph cluster contains three nodes:
-
-- ceph-mon1
-- ceph-osd1
-- ceph-osd2
diff --git a/docs/infrastructure/storage_site_rocquencourt_virtual.rst b/docs/infrastructure/storage_site_rocquencourt_virtual.rst
deleted file mode 100644
--- a/docs/infrastructure/storage_site_rocquencourt_virtual.rst
+++ /dev/null
@@ -1,43 +0,0 @@
-Virtual machines at Rocquencourt
-================================
-
-The following virtual machines are hosted on Proxmox hypervisors located at Rocquencourt.
-All of them use local storage on their virtual hard drives.
-
-VMs without NFS mount points
-----------------------------
-
-- munin0
-- tate, used for public and private (intranet) wikis
-- getty
-- thyssen
-- jenkins-debian1.internal.softwareheritage.org
-- logstash0
-- kibana0
-- saatchi
-- louvre
-
-Containers and VMs with NFS storage
------------------------------------
-
-- somerset.internal.softwareheritage.org is an LXC container running on *beaubourg*.
-  It serves as a host for the *softwareheritage* and *softwareheritage-indexer*
-  databases.
-
-- worker01 to worker16.internal.softwareheritage.org: loader and lister workers
-- pergamon: internal system administration services (Puppet master, Grafana, DNS resolver, etc.)
-- moma: webapp and deposit services exposed publicly
-
-These VMs access one or more of these NFS volumes located on uffizi::
-
-   uffizi:/srv/softwareheritage/objects
-   uffizi:/srv/storage/space
-   uffizi:/srv/storage/space/annex
-   uffizi:/srv/storage/space/annex/public
-   uffizi:/srv/storage/space/antelink
-   uffizi:/srv/storage/space/oversize-objects
-   uffizi:/srv/storage/space/personal
-   uffizi:/srv/storage/space/postgres-backups/somerset
-   uffizi:/srv/storage/space/provenance-index
-   uffizi:/srv/storage/space/swh-deposit
-
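One mechanical convention from the deleted pages is worth spelling out: the Azure object storage section describes sixteen storage accounts, *0euwestswh* through *feuwestswh*, where the leading hexadecimal digit of an account name matches the first digit of the content hashes it stores, with blobs in a *contents* container. A minimal sketch of that mapping (function names are hypothetical, not actual swh.objstorage code):

```python
# Hypothetical helpers sketching the Azure account-sharding convention
# described above; names are illustrative, not real swh.objstorage APIs.

def azure_account_for(hex_hash: str) -> str:
    """Map a hexadecimal content hash to its euwestswh storage account."""
    first = hex_hash[0].lower()
    if first not in "0123456789abcdef":
        raise ValueError(f"not a hexadecimal hash: {hex_hash!r}")
    return f"{first}euwestswh"

def blob_location_for(hex_hash: str) -> str:
    # Blobs live in a "contents" container inside each account,
    # e.g. 6euwestswh/contents for hashes starting with "6".
    return f"{azure_account_for(hex_hash)}/contents"
```

For example, a content hash starting with ``6`` maps to *6euwestswh/contents*, matching the example given in the storage accounts section.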