diff --git a/docker/README.md b/docker/README.md
index 8a303c8..0b59f8d 100644
--- a/docker/README.md
+++ b/docker/README.md
@@ -1,662 +1,662 @@

# Docker environment

This directory contains Dockerfiles to run a small Software Heritage instance on development machines. The end goal is to smooth the contributor/developer workflow. Focus on coding, not configuring!

WARNING: Running a Software Heritage instance on your machine can consume quite a bit of resources: if you play a bit too hard (e.g., if you try to list all GitHub repositories with the corresponding lister), you may fill your hard drive, and consume a lot of CPU, memory and network bandwidth.

## Dependencies

This environment uses docker with docker-compose, so ensure you have a working docker environment and that docker-compose is installed. We recommend using the latest version of docker, so please read https://docs.docker.com/install/linux/docker-ce/debian/ for more details on how to install docker on your machine. On a Debian system, docker-compose can be installed from the Debian repositories:

```
~$ sudo apt install docker-compose
```

## Quick start

First, change to the docker dir if you aren't there yet:

```
~$ cd swh-environment/docker
```

Then, start the containers:

```
~/swh-environment/docker$ docker-compose up -d
[...]
Creating docker_amqp_1             ... done
Creating docker_zookeeper_1        ... done
Creating docker_kafka_1            ... done
Creating docker_flower_1           ... done
Creating docker_swh-scheduler-db_1 ... done
[...]
```

This will build docker images and run them. Check that everything is running fine with:

```
~/swh-environment/docker$ docker-compose ps
            Name                      Command           State                                   Ports
-----------------------------------------------------------------------------------------------------------------------------------------------------------------------------
docker_amqp_1             docker-entrypoint.sh rabbi ...   Up    15671/tcp, 0.0.0.0:5018->15672/tcp, 25672/tcp, 4369/tcp, 5671/tcp, 5672/tcp
docker_flower_1           flower --broker=amqp://gue ...   Up    0.0.0.0:5555->5555/tcp
docker_kafka_1            start-kafka.sh                   Up    0.0.0.0:9092->9092/tcp
docker_swh-deposit-db_1   docker-entrypoint.sh postgres    Up    5432/tcp
docker_swh-deposit_1      /entrypoint.sh                   Up    0.0.0.0:5006->5006/tcp
[...]
```

The startup of some containers may fail the first time due to dependency-related problems. If some containers failed to start, just run the `docker-compose up -d` command again. If a container really refuses to start properly, you can check why using the `docker-compose logs` command. For example:

```
~/swh-environment/docker$ docker-compose logs swh-lister
Attaching to docker_swh-lister_1
[...]
swh-lister_1 | Processing /src/swh-scheduler
swh-lister_1 | Could not install packages due to an EnvironmentError: [('/src/swh-scheduler/.hypothesis/unicodedata/8.0.0/charmap.json.gz', '/tmp/pip-req-build-pm7nsax3/.hypothesis/unicodedata/8.0.0/charmap.json.gz', "[Errno 13] Permission denied: '/src/swh-scheduler/.hypothesis/unicodedata/8.0.0/charmap.json.gz'")]
swh-lister_1 |
```

Once all containers are running, you can use the web interface by opening http://localhost:5080/ in your web browser. At this point, the archive is empty and needs to be filled with some content. To do so, you can create tasks that will scrape a forge.
For example, to inject the code from the https://0xacab.org gitlab forge:

```
-~/swh-environment/docker$ docker-compose exec swh-scheduler-api \
+~/swh-environment/docker$ docker-compose exec swh-scheduler \
    swh scheduler task add list-gitlab-full \
      -p oneshot url=https://0xacab.org/api/v4

Created 1 tasks

Task 1
  Next run: just now (2018-12-19 14:58:49+00:00)
  Interval: 90 days, 0:00:00
  Type: list-gitlab-full
  Policy: oneshot
  Args:
  Keyword args:
    url=https://0xacab.org/api/v4
```

This task will scrape the forge's project list and create subtasks to inject each git repository found there. This will take a bit of time to complete.

To increase the speed at which git repositories are imported, you can spawn more `swh-loader-git` workers:

```
-~/swh-environment/docker$ docker-compose exec swh-scheduler-api \
+~/swh-environment/docker$ docker-compose exec swh-scheduler \
    celery status
listers@50ac2185c6c9: OK
loader@b164f9055637: OK
indexer@33bc6067a5b8: OK
vault@c9fef1bbfdc1: OK

4 nodes online.

-~/swh-environment/docker$ docker-compose exec swh-scheduler-api \
+~/swh-environment/docker$ docker-compose exec swh-scheduler \
    celery control pool_grow 3 -d loader@b164f9055637
-> loader@b164f9055637: OK
        pool will grow

-~/swh-environment/docker$ docker-compose exec swh-scheduler-api \
+~/swh-environment/docker$ docker-compose exec swh-scheduler \
    celery inspect -d loader@b164f9055637 stats | grep prefetch_count
   "prefetch_count": 4
```

Now there are 4 workers ingesting git repositories. You can also increase the number of `swh-loader-git` containers:

```
~/swh-environment/docker$ docker-compose up -d --scale swh-loader=4
[...]
Creating docker_swh-loader_2 ... done
Creating docker_swh-loader_3 ... done
Creating docker_swh-loader_4 ... done
```

## Updating the docker image

All containers started by `docker-compose` are bound to a docker image named `swh/stack` that includes all the software components of Software Heritage. When new versions of these components are released, the docker image is not automatically updated. In order to update all Software Heritage components to their latest version, the docker image needs to be explicitly rebuilt by issuing the following command from within the `docker` directory:

```
~/swh-environment/docker$ docker build --no-cache -t swh/stack .
```

## Details

This runs the following services on their respective standard ports; all of them are configured to communicate with each other:

- swh-storage-db: a `softwareheritage` instance db that stores the Merkle DAG,
- swh-objstorage: content-addressable object storage,
- swh-storage: abstraction layer over the archive, allowing access to all stored source code artifacts as well as their metadata,
- swh-web: the Software Heritage web user interface,
- swh-scheduler: the API service as well as two utilities, the runner and the listener,
- swh-lister: celery workers dedicated to running lister tasks,
- swh-loaders: celery workers dedicated to importing/updating source code content (VCS repos, source packages, etc.),
- swh-journal: persistent logger of changes to the archive, with publish-subscribe support.
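As a quick smoke test that these services are wired together, you can query the web API through nginx from the host. This is a minimal sketch: it assumes the stack is up and that swh-web exposes its usual `/api/1/stat/counters/` endpoint, which reports how many objects of each kind are in the archive:

```
# query the web API through the nginx frontend on the host
~/swh-environment/docker$ curl http://localhost:5080/api/1/stat/counters/
```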
With these services running, you can start ingesting content using the same setup described in the getting-started guide, starting directly at https://docs.softwareheritage.org/devel/getting-started.html#step-4-ingest-repositories

### Exposed Ports

Several services have their listening ports exposed on the host:

- amqp: 5072
- kafka: 5092
- nginx: 5080

And for SWH services:

- scheduler API: 5008
- storage API: 5002
- object storage API: 5003
- indexer API: 5007
- web app: 5004
- deposit app: 5006

Beware that these ports are not the same as the ports used from within the docker network. This means that the same command executed from the host or from a docker container will not use the same URLs to access services. For example, to use the `celery` utility from the host, you may type:

```
~/swh-environment/docker$ CELERY_BROKER_URL=amqp://:5072// celery status
loader@61704103668c: OK
[...]
```

To run the same command from within a container:

```
-~/swh-environment/docker$ docker-compose exec swh-scheduler-api celery status
+~/swh-environment/docker$ docker-compose exec swh-scheduler celery status
loader@61704103668c: OK
[...]
```

## Managing tasks

One of the main components of the Software Heritage platform is the task system. Tasks are used to manage everything related to background processes, like discovering new git repositories to import, ingesting them, checking that a known repository is up to date, etc.

The task system is based on Celery but uses a custom database-based scheduler. So when we use the term 'task', it may designate either a Celery task or a SWH one (i.e. the entity in the database). When we simply say "task" in this documentation, it designates the SWH task. When a SWH task is ready to be executed, a Celery task is created to handle the actual SWH task's job. Note that not all Celery tasks are directly linked to a SWH task (some SWH tasks are implemented using a Celery task that spawns Celery subtasks).

A (SWH) task can be `recurring` or `oneshot`. `oneshot` tasks are only executed once, whereas `recurring` tasks are executed regularly. The scheduling configuration of these recurring tasks can be set via the fields `current_interval` and `priority` (which can be 'high', 'normal' or 'low') of the task database entity.

### Inserting a new lister task

To list the content of a source code provider like GitHub or a Debian distribution, you may add a new task for this. This task will (generally) scrape a web page or use a public API to identify the list of published software artefacts (git repos, Debian source packages, etc.). Then, for each repository, a new task will be created to ingest this repository and keep it up to date.

For example, to add a (one shot) task that will list git repos on the 0xacab.org gitlab instance, one can do (from the `docker` directory):

```
-~/swh-environment/docker$ docker-compose exec swh-scheduler-api \
+~/swh-environment/docker$ docker-compose exec swh-scheduler \
    swh scheduler task add list-gitlab-full \
      -p oneshot url=https://0xacab.org/api/v4

Created 1 tasks

Task 12
  Next run: just now (2018-12-19 14:58:49+00:00)
  Interval: 90 days, 0:00:00
  Type: list-gitlab-full
  Policy: oneshot
  Args:
  Keyword args:
    url=https://0xacab.org/api/v4
```

This will insert a new task in the scheduler.
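If you want to watch the lister pick up and execute the task, one way (a plain docker-compose command, nothing SWH-specific) is to follow the lister worker's logs:

```
# follow the lister worker logs, starting from the last 20 lines
~/swh-environment/docker$ docker-compose logs -f --tail=20 swh-lister
```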
To list existing tasks for a given task type:

```
-~/swh-environment/docker$ docker-compose exec swh-scheduler-api \
+~/swh-environment/docker$ docker-compose exec swh-scheduler \
    swh scheduler task list-pending list-gitlab-full

Found 1 list-gitlab-full tasks

Task 12
  Next run: 2 minutes ago (2018-12-19 14:58:49+00:00)
  Interval: 90 days, 0:00:00
  Type: list-gitlab-full
  Policy: oneshot
  Args:
  Keyword args:
    url=https://0xacab.org/api/v4
```

To list all existing task types:

```
-~/swh-environment/docker$ docker-compose exec swh-scheduler-api \
+~/swh-environment/docker$ docker-compose exec swh-scheduler \
    swh scheduler task-type list

Known task types:
load-svn-from-archive: Loading svn repositories from svn dump
load-svn: Create dump of a remote svn repository, mount it and load it
load-deposit: Loading deposit archive into swh through swh-loader-tar
check-deposit: Pre-checking deposit step before loading into swh archive
cook-vault-bundle: Cook a Vault bundle
load-hg: Loading mercurial repository swh-loader-mercurial
load-hg-from-archive: Loading archive mercurial repository swh-loader-mercurial
load-git: Update an origin of type git
list-github-incremental: Incrementally list GitHub
list-github-full: Full update of GitHub repos list
list-debian-distribution: List a Debian distribution
list-gitlab-incremental: Incrementally list a Gitlab instance
list-gitlab-full: Full update of a Gitlab instance's repos list
list-pypi: Full pypi lister
load-pypi: Load Pypi origin
index-mimetype: Mimetype indexer task
index-mimetype-for-range: Mimetype Range indexer task
index-fossology-license: Fossology license indexer task
index-fossology-license-for-range: Fossology license range indexer task
index-origin-head: Origin Head indexer task
index-revision-metadata: Revision Metadata indexer task
index-origin-metadata: Origin Metadata indexer task
```

### Monitoring activity

You can monitor the workers' activity by connecting to the RabbitMQ console on `http://localhost:5080/rabbitmq` or the Grafana dashboard on `http://localhost:5080/grafana`.
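Besides the web consoles, you can ask Celery directly what the workers are doing; `celery inspect active`, the standard subcommand for listing currently executing tasks, can be run from within the scheduler container like the other celery commands shown above:

```
# list the Celery tasks currently being executed by the workers
~/swh-environment/docker$ docker-compose exec swh-scheduler celery inspect active
```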
If you cannot see any task being executed, check the logs of the `swh-scheduler-runner` service (here is a failure example due to the Debian lister task not being properly registered on the swh-scheduler-runner service):

```
~/swh-environment/docker$ docker-compose logs --tail=10 swh-scheduler-runner
Attaching to docker_swh-scheduler-runner_1
swh-scheduler-runner_1 |     "__main__", mod_spec)
swh-scheduler-runner_1 |   File "/usr/local/lib/python3.7/runpy.py", line 85, in _run_code
swh-scheduler-runner_1 |     exec(code, run_globals)
swh-scheduler-runner_1 |   File "/usr/local/lib/python3.7/site-packages/swh/scheduler/celery_backend/runner.py", line 107, in
swh-scheduler-runner_1 |     run_ready_tasks(main_backend, main_app)
swh-scheduler-runner_1 |   File "/usr/local/lib/python3.7/site-packages/swh/scheduler/celery_backend/runner.py", line 81, in run_ready_tasks
swh-scheduler-runner_1 |     task_types[task['type']]['backend_name']
swh-scheduler-runner_1 |   File "/usr/local/lib/python3.7/site-packages/celery/app/registry.py", line 21, in __missing__
swh-scheduler-runner_1 |     raise self.NotRegistered(key)
swh-scheduler-runner_1 | celery.exceptions.NotRegistered: 'swh.lister.debian.tasks.DebianListerTask'
```

## Using the docker setup for development and integration testing

If you hack the code of one or more archive components with a virtual env based setup as described in the [[https://docs.softwareheritage.org/devel/developer-setup.html|developer setup guide]], you may want to test your modifications in a working Software Heritage instance. The simplest way to achieve this is to use this docker-based environment.

If you haven't followed the [[https://docs.softwareheritage.org/devel/developer-setup.html|developer setup guide]], you must clone the [swh-environment] repo in your `swh-environment` directory:

```
~/swh-environment$ git clone https://forge.softwareheritage.org/source/swh-environment.git .
```

Note the `.` at the end of this command: we want the git repository to be cloned directly in the `~/swh-environment` directory, not in a subdirectory. Also note that if you haven't done it yet and you want to hack the source code of one or more Software Heritage packages, you really should read the [[https://docs.softwareheritage.org/devel/developer-setup.html|developer setup guide]].

From there, we will check out or update all the swh packages:

```
~/swh-environment$ ./bin/update
```

### Install a swh package from sources in a container

It is possible to run a docker container with some swh packages installed from sources instead of using the latest published packages from pypi. To do this you must write a docker-compose override file (`docker-compose.override.yml`). An example is given in the `docker-compose.override.yml.example` file:

``` yaml
version: '2'

services:
  swh-objstorage:
    volumes:
      - "$HOME/swh-environment/swh-objstorage:/src/swh-objstorage"
```

The file named `docker-compose.override.yml` will automatically be loaded by `docker-compose`. This example shows the simplest case of the `swh-objstorage` package: you just have to mount it in the container in `/src` and the entrypoint will ensure every swh-* package found in `/src/` is installed (using `pip install -e`, so you can easily hack your code). If the application you play with has autoreload support, there is no need to restart the impacted container.

Note: if the docker fails to start when using local sources for one or more swh packages, it's most probably due to permission problems on cache files.
For example, if you have executed tests locally (using pytest or tox), you have cache files (`__pycache__`, etc.) that will prevent `pip install` from working within the docker. The solution is to clean these files and directories before trying to spawn the docker.

```
~/swh-environment$ find . -type d -name __pycache__ -exec rm -rf {} \;
~/swh-environment$ find . -type d -name .tox -exec rm -rf {} \;
~/swh-environment$ find . -type d -name .hypothesis -exec rm -rf {} \;
```

### Using locally installed swh tools with docker

In all examples above, we have executed swh commands from within a running container. Now that we also have these swh commands available locally in our virtual env, we can use them to interact with swh services running in docker containers. For this, we just need to configure a few environment variables. First, ensure your Software Heritage virtualenv is activated (here, using virtualenvwrapper):

```
~$ workon swh
(swh) ~/swh-environment$ export SWH_SCHEDULER_URL=http://127.0.0.1:5008/
(swh) ~/swh-environment$ export CELERY_BROKER_URL=amqp://127.0.0.1:5072/
```

Now we can use the `celery` command directly to control the celery system running in the docker environment:

```
(swh) ~/swh-environment$ celery status
vault@c9fef1bbfdc1: OK
listers@ba66f18e7d02: OK
indexer@cb14c33cbbfb: OK
loader@61704103668c: OK

4 nodes online.

(swh) ~/swh-environment$ celery control -d loader@61704103668c pool_grow 3
```

And we can use the `swh scheduler` command all the same:

```
(swh) ~/swh-environment$ swh scheduler task-type list
Known task types:
index-fossology-license: Fossology license indexer task
index-mimetype: Mimetype indexer task
[...]
```

### Make your life a bit easier

When you use virtualenvwrapper, you can add postactivation commands:

```
(swh) ~/swh-environment$ cat >>$VIRTUAL_ENV/bin/postactivate <<'EOF'
# unfortunately, the interface cmd for the click autocompletion
# depends on the shell
# https://click.palletsprojects.com/en/7.x/bashcomplete/#activation

shell=$(basename $SHELL)
case "$shell" in
    "zsh")
        autocomplete_cmd=source_zsh
        ;;
    *)
        autocomplete_cmd=source
        ;;
esac

eval "$(_SWH_COMPLETE=$autocomplete_cmd swh)"
export SWH_SCHEDULER_URL=http://127.0.0.1:5008/
export CELERY_BROKER_URL=amqp://127.0.0.1:5072/
export COMPOSE_FILE=~/swh-environment/docker/docker-compose.yml:~/swh-environment/docker/docker-compose.override.yml
alias doco=docker-compose

function swhclean {
    find ~/swh-environment -type d -name __pycache__ -exec rm -rf {} \;
    find ~/swh-environment -type d -name .tox -exec rm -rf {} \;
    find ~/swh-environment -type d -name .hypothesis -exec rm -rf {} \;
}
EOF
```

This postactivate script:

- installs a shell completion handler for the swh-scheduler command,
- presets a bunch of environment variables:

  - `SWH_SCHEDULER_URL` so that you can just run `swh scheduler` against the scheduler API instance running in docker, without having to specify the endpoint URL,
  - `CELERY_BROKER_URL` so you can execute the `celery` tool (without cli options) against the rabbitmq server running in the docker environment,
  - `COMPOSE_FILE` so you can run `docker-compose` from everywhere,

- creates an alias `doco` for `docker-compose`, because this is way too long to type,
- adds a `swhclean` shell function to clean your source directories so that there is no conflict with docker containers using local swh repositories (see below). This will delete any `.tox`, `__pycache__` and `.hypothesis` directory found in your swh-environment directory.
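Since postactivate hooks only run on activation, deactivate and reactivate the virtualenv once to pick up these settings. As a quick sanity check, print one of the exported variables (the exact paths shown will depend on your home directory):

```
(swh) ~/swh-environment$ deactivate
~/swh-environment$ workon swh
(swh) ~/swh-environment$ echo $COMPOSE_FILE   # should print the two compose file paths
```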
So now you can easily:

* Start the SWH platform:

```
(swh) ~/swh-environment$ doco up -d
[...]
```

* Check celery:

```
(swh) ~/swh-environment$ celery status
listers@50ac2185c6c9: OK
loader@b164f9055637: OK
indexer@33bc6067a5b8: OK
```

* List task-types:

```
(swh) ~/swh-environment$ swh scheduler task-type list
[...]
```

* Get more info on a task type:

```
(swh) ~/swh-environment$ swh scheduler task-type list -v -t load-hg
Known task types:
load-hg: swh.loader.mercurial.tasks.LoadMercurial
  Loading mercurial repository swh-loader-mercurial
  interval: 1 day, 0:00:00 [1 day, 0:00:00, 1 day, 0:00:00]
  backoff_factor: 1.0
  max_queue_length: 1000
  num_retries: None
  retry_delay: None
```

* Add a new task:

```
(swh) ~/swh-environment$ swh scheduler task add load-hg \
    origin_url=https://hg.logilab.org/master/cubicweb

Created 1 tasks

Task 1
  Next run: just now (2019-02-06 12:36:58+00:00)
  Interval: 1 day, 0:00:00
  Type: load-hg
  Policy: recurring
  Args:
  Keyword args:
    origin_url: https://hg.logilab.org/master/cubicweb
```

* Respawn a task:

```
(swh) ~/swh-environment$ swh scheduler task respawn 1
```

## Starting a kafka-powered replica of the storage

This repo comes with an optional `docker-compose.storage-replica.yml` docker compose file that can be used to test the kafka-powered replication mechanism for the main storage. This can be used like:

```
~/swh-environment/docker$ docker-compose -f docker-compose.yml -f docker-compose.storage-replica.yml up -d
[...]
```

Compared to the original compose file, this will:

- override the swh-storage service to activate the kafka direct writer on swh.journal.objects prefixed topics using the swh.storage.master ID,
- override the swh-web service to make it use the replica instead of the master storage,
- start a db for the replica,
- start a storage service based on this db,
- start a replayer service that runs the process that listens to kafka to keep the replica in sync.

When using it, you will have a setup in which the master storage is used by workers and most other services, whereas the storage replica will be used by the web application and should be kept in sync with the master storage by kafka. Note that the object storage is not replicated here, only the graph storage.

## Starting the backfiller

The backfiller reads objects within the range [start-object, end-object] from the storage and sends them to the kafka topics.
``` (swh)$ docker-compose \ -f docker-compose.yml \ -f docker-compose.storage-replica.yml \ -f docker-compose.storage-replica.override.yml \ run \ swh-journal-backfiller \ snapshot \ --start-object 000000 \ --end-object 000001 \ --dry-run ``` diff --git a/docker/conf/deposit.yml b/docker/conf/deposit.yml index d9c4ec7..3b788aa 100644 --- a/docker/conf/deposit.yml +++ b/docker/conf/deposit.yml @@ -1,19 +1,19 @@ scheduler: cls: remote args: - url: http://swh-scheduler-api:5008 + url: http://swh-scheduler:5008 allowed_hosts: - swh-deposit loader-version: 2 private: secret_key: prod-in-docker db: host: swh-deposit-db port: 5432 name: swh-deposit user: postgres password: testpassword media_root: /tmp/swh-deposit/uploads diff --git a/docker/conf/indexer.yml b/docker/conf/indexer.yml index dd2133b..29684fd 100644 --- a/docker/conf/indexer.yml +++ b/docker/conf/indexer.yml @@ -1,31 +1,31 @@ storage: cls: remote args: url: http://swh-storage:5002/ objstorage: cls: remote args: url: http://swh-objstorage:5003/ indexer_storage: cls: remote args: url: http://swh-idx-storage:5007/ scheduler: cls: remote args: - url: http://swh-scheduler-api:5008/ + url: http://swh-scheduler:5008/ celery: task_broker: amqp://guest:guest@amqp// task_modules: - swh.indexer.tasks task_queues: - swh.indexer.tasks.ContentFossologyLicense - swh.indexer.tasks.ContentLanguage - swh.indexer.tasks.ContentMimetype - swh.indexer.tasks.ContentRangeFossologyLicense - swh.indexer.tasks.ContentRangeMimetype - swh.indexer.tasks.Ctags - swh.indexer.tasks.OriginHead - swh.indexer.tasks.OriginMetadata - swh.indexer.tasks.RecomputeChecksums - swh.indexer.tasks.RevisionMetadata diff --git a/docker/conf/indexer_journal_client.yml b/docker/conf/indexer_journal_client.yml index 91877c9..e8c030d 100644 --- a/docker/conf/indexer_journal_client.yml +++ b/docker/conf/indexer_journal_client.yml @@ -1,11 +1,11 @@ journal: brokers: - kafka group_id: swh.indexer.journal_client max_messages: 50 scheduler: cls: remote args: - url: http://swh-scheduler-api:5008/ + url: http://swh-scheduler:5008/ diff --git a/docker/conf/lister.yml b/docker/conf/lister.yml index 0c326f1..af2f1b1 100644 --- a/docker/conf/lister.yml +++ b/docker/conf/lister.yml @@ -1,60 +1,60 @@ storage: cls: remote args: url: http://swh-storage:5002/ scheduler: cls: remote args: - url: http://swh-scheduler-api:5008/ + url: http://swh-scheduler:5008/ lister: cls: local args: db: postgresql://postgres@swh-listers-db/swh-listers celery: task_broker: amqp://guest:guest@amqp// task_modules: - swh.lister.bitbucket.tasks - swh.lister.cgit.tasks - swh.lister.cran.tasks - swh.lister.debian.tasks - swh.lister.github.tasks - swh.lister.gitlab.tasks - swh.lister.gnu.tasks - swh.lister.npm.tasks - swh.lister.packagist.tasks - swh.lister.phabricator.tasks - swh.lister.pypi.tasks task_queues: - swh.lister.bitbucket.tasks.FullBitBucketRelister - swh.lister.bitbucket.tasks.IncrementalBitBucketLister - swh.lister.bitbucket.tasks.RangeBitBucketLister - swh.lister.bitbucket.tasks.ping - swh.lister.cgit.tasks.CGitListerTask - swh.lister.cgit.tasks.ping - swh.lister.cran.tasks.CRANListerTask - swh.lister.cran.tasks.ping - swh.lister.debian.tasks.DebianListerTask - swh.lister.debian.tasks.ping - swh.lister.github.tasks.FullGitHubRelister - swh.lister.github.tasks.IncrementalGitHubLister - swh.lister.github.tasks.RangeGitHubLister - swh.lister.github.tasks.ping - swh.lister.gitlab.tasks.FullGitLabRelister - swh.lister.gitlab.tasks.IncrementalGitLabLister - swh.lister.gitlab.tasks.RangeGitLabLister - 
swh.lister.gitlab.tasks.ping - swh.lister.gnu.tasks.GNUListerTask - swh.lister.gnu.tasks.ping - swh.lister.npm.tasks.NpmIncrementalListerTask - swh.lister.npm.tasks.NpmListerTask - swh.lister.npm.tasks.ping - swh.lister.packagist.tasks.PackagistListerTask - swh.lister.packagist.tasks.ping - swh.lister.phabricator.tasks.FullPhabricatorLister - swh.lister.phabricator.tasks.IncrementalPhabricatorLister - swh.lister.phabricator.tasks.ping - swh.lister.pypi.tasks.PyPIListerTask - swh.lister.pypi.tasks.ping diff --git a/docker/conf/loader.yml b/docker/conf/loader.yml index f580933..1517c95 100644 --- a/docker/conf/loader.yml +++ b/docker/conf/loader.yml @@ -1,54 +1,54 @@ storage: cls: filter args: storage: cls: buffer args: storage: cls: remote args: url: http://swh-storage:5002/ min_batch_size: content: 10000 content_bytes: 104857600 directory: 1000 revision: 1000 scheduler: cls: remote args: - url: http://swh-scheduler-api:5008/ + url: http://swh-scheduler:5008/ celery: task_broker: amqp://guest:guest@amqp// task_modules: - swh.loader.git.tasks - swh.loader.mercurial.tasks - swh.loader.svn.tasks - swh.deposit.loader.tasks - swh.loader.package.archive.tasks - swh.loader.package.debian.tasks - swh.loader.package.deposit.tasks - swh.loader.package.npm.tasks - swh.loader.package.pypi.tasks task_queues: - swh.loader.dir.tasks.LoadDirRepository - swh.loader.git.tasks.LoadDiskGitRepository - swh.loader.git.tasks.UncompressAndLoadDiskGitRepository - swh.loader.git.tasks.UpdateGitRepository - swh.loader.mercurial.tasks.LoadArchiveMercurial - swh.loader.mercurial.tasks.LoadMercurial - swh.loader.package.tasks.archive.LoadArchive - swh.loader.package.tasks.debian.LoadDebian - swh.loader.package.tasks.deposit.LoadDeposit - swh.loader.package.tasks.npm.LoadNpm - swh.loader.package.tasks.pypi.LoadPyPI - swh.loader.svn.tasks.DumpMountAndLoadSvnRepository - swh.loader.svn.tasks.LoadSvnRepository - swh.loader.svn.tasks.MountAndLoadSvnRepository - swh.deposit.loader.tasks.ChecksDepositTsk lister_db_url: postgresql://postgres@swh-listers-db/swh-listers url: 'http://swh-deposit:5006' diff --git a/docker/conf/nginx.conf b/docker/conf/nginx.conf index a0774ad..98b6a29 100644 --- a/docker/conf/nginx.conf +++ b/docker/conf/nginx.conf @@ -1,107 +1,107 @@ worker_processes 1; # Show startup logs on stderr; switch to debug to print, well, debug logs when # running nginx-debug error_log /dev/stderr info; events { worker_connections 1024; } http { include mime.types; default_type application/octet-stream; sendfile on; keepalive_timeout 65; # Built-in Docker resolver. Needed to allow on-demand resolution of proxy # upstreams. resolver 127.0.0.11 valid=30s; server { listen 5080 default_server; # Add a trailing slash to top level requests (e.g. http://localhost:5080/flower) rewrite ^/([^/]+)$ /$1/ permanent; # In this pile of proxies, all upstreams are set using a variable. This # makes nginx DNS-resolve the name of the upstream when clients request # them, rather than on start. This avoids an unstarted container preventing # nginx from starting. # # Variables need to be set as early as possible, as they're statements from # the rewrite module and `rewrite [...] break;` will prevent these # statements from being executed. 
location /flower/ { set $upstream "http://flower:5555"; rewrite ^/flower/(.*)$ /$1 break; proxy_pass $upstream; proxy_set_header X-Real-IP $remote_addr; proxy_set_header Host $host; proxy_redirect off; proxy_http_version 1.1; proxy_set_header Upgrade $http_upgrade; proxy_set_header Connection "upgrade"; } location /rabbitmq/ { set $upstream "http://amqp:15672"; rewrite ^ $request_uri; rewrite ^/rabbitmq(/.*)$ $1 break; proxy_pass $upstream$uri; } location /scheduler { - set $upstream "http://swh-scheduler-api:5008"; + set $upstream "http://swh-scheduler:5008"; rewrite ^/scheduler/(.*)$ /$1 break; proxy_pass $upstream; } location /storage { set $upstream "http://swh-storage:5002"; rewrite ^/storage/(.*)$ /$1 break; proxy_pass $upstream; } location /indexer-storage { set $upstream "http://swh-idx-storage:5007"; rewrite ^/indexer-storage/(.*)$ /$1 break; proxy_pass $upstream; } location /deposit { set $upstream "http://swh-deposit:5006"; rewrite ^/deposit/(.*)$ /$1 break; proxy_pass $upstream; proxy_set_header X-Real-IP $remote_addr; proxy_set_header Host $host; proxy_redirect off; } location /objstorage { set $upstream "http://swh-objstorage:5003"; rewrite ^/objstorage/(.*)$ /$1 break; proxy_pass $upstream; } location /prometheus { set $upstream "http://prometheus:9090"; proxy_pass $upstream; } location /grafana { set $upstream "http://grafana:3000"; rewrite ^/grafana/(.*)$ /$1 break; proxy_pass $upstream; } location / { set $upstream "http://swh-web:5004"; proxy_pass $upstream; } } } diff --git a/docker/conf/vault-worker.yml b/docker/conf/vault-worker.yml index 8a195ac..b9b463e 100644 --- a/docker/conf/vault-worker.yml +++ b/docker/conf/vault-worker.yml @@ -1,17 +1,17 @@ storage: cls: remote args: url: http://swh-storage:5002/ vault: cls: remote args: - url: http://swh-vault-api:5005/ + url: http://swh-vault:5005/ celery: task_broker: amqp://guest:guest@amqp// task_modules: - swh.vault.cooking_tasks task_queues: - swh.vault.cooking_tasks.SWHBatchCookingTask - swh.vault.cooking_tasks.SWHCookingTask max_bundle_size: 536870912 diff --git a/docker/conf/vault-api.yml b/docker/conf/vault.yml similarity index 86% rename from docker/conf/vault-api.yml rename to docker/conf/vault.yml index b3ec6a3..5d00ae1 100644 --- a/docker/conf/vault-api.yml +++ b/docker/conf/vault.yml @@ -1,17 +1,17 @@ storage: cls: remote args: url: http://swh-storage:5002/ scheduler: cls: remote args: - url: http://swh-scheduler-api:5008/ + url: http://swh-scheduler:5008/ vault: cls: local args: db: postgresql:///?service=swh-vault cache: cls: pathslicing args: root: /srv/softwareheritage/vault slicing: 0:5 diff --git a/docker/conf/web-replica.yml b/docker/conf/web-replica.yml index 06ba565..a4d0df7 100644 --- a/docker/conf/web-replica.yml +++ b/docker/conf/web-replica.yml @@ -1,37 +1,37 @@ storage: cls: remote args: url: http://swh-storage-replica:5002/ timeout: 1 objstorage: cls: remote args: url: http://swh-objstorage:5003/ indexer_storage: cls: remote args: url: http://swh-idx-storage:5007/ scheduler: cls: remote args: - url: http://swh-scheduler-api:5008/ + url: http://swh-scheduler:5008/ vault: cls: remote args: - url: http://swh-vault-api:5005/ + url: http://swh-vault:5005/ deposit: private_api_url: https://swh-deposit:5006/1/private/ private_api_user: swhworker private_api_password: '' allowed_hosts: - "*" debug: yes serve_assets: yes diff --git a/docker/conf/web.yml b/docker/conf/web.yml index 733c545..a6046d3 100644 --- a/docker/conf/web.yml +++ b/docker/conf/web.yml @@ -1,67 +1,67 @@ storage: cls: remote args: url: 
http://swh-storage:5002/ timeout: 1 objstorage: cls: remote args: url: http://swh-objstorage:5003/ indexer_storage: cls: remote args: url: http://swh-idx-storage:5007/ scheduler: cls: remote args: - url: http://swh-scheduler-api:5008/ + url: http://swh-scheduler:5008/ vault: cls: remote args: - url: http://swh-vault-api:5005/ + url: http://swh-vault:5005/ deposit: private_api_url: https://swh-deposit:5006/1/private/ private_api_user: swhworker private_api_password: '' allowed_hosts: - "*" debug: yes serve_assets: yes development_db: /tmp/db.sqlite3 throttling: scopes: swh_api: limiter_rate: default: 120/h exempted_networks: - 0.0.0.0/0 swh_api_origin_search: limiter_rate: default: 70/m exempted_networks: - 0.0.0.0/0 swh_api_origin_visit_latest: limiter_rate: default: 700/m exempted_networks: - 0.0.0.0/0 swh_vault_cooking: limiter_rate: default: 120/h exempted_networks: - 0.0.0.0/0 swh_save_origin: limiter_rate: default: 120/h exempted_networks: - 0.0.0.0/0 diff --git a/docker/docker-compose.yml b/docker/docker-compose.yml index 44da7d2..8d33ebc 100644 --- a/docker/docker-compose.yml +++ b/docker/docker-compose.yml @@ -1,390 +1,390 @@ version: '2' services: amqp: image: rabbitmq:3.6-management ports: - 5072:5672 # flower: # image: mher/flower # command: --broker=amqp://guest:guest@amqp:5672// --url_prefix=flower # ports: # - 5055:5555 # depends_on: # - amqp zookeeper: image: wurstmeister/zookeeper restart: always kafka: image: wurstmeister/kafka ports: - "5092:9092" env_file: ./env/kafka.env depends_on: - zookeeper kafka-manager: image: hlebalbau/kafka-manager:stable ports: - "5093:9000" environment: ZK_HOSTS: zookeeper:2181 APPLICATION_SECRET: random-secret command: -Dpidfile.path=/dev/null prometheus: image: prom/prometheus depends_on: - prometheus-statsd-exporter command: # Needed for the reverse-proxy - "--web.external-url=/prometheus" - "--config.file=/etc/prometheus/prometheus.yml" volumes: - "./conf/prometheus.yml:/etc/prometheus/prometheus.yml:ro" restart: unless-stopped prometheus-statsd-exporter: image: prom/statsd-exporter command: - "--statsd.mapping-config=/etc/prometheus/statsd-mapping.yml" volumes: - "./conf/prometheus-statsd-mapping.yml:/etc/prometheus/statsd-mapping.yml:ro" restart: unless-stopped grafana: image: grafana/grafana restart: unless-stopped depends_on: - prometheus environment: GF_SERVER_ROOT_URL: http://localhost:5080/grafana volumes: - "./conf/grafana/provisioning:/etc/grafana/provisioning:ro" - "./conf/grafana/dashboards:/var/lib/grafana/dashboards" nginx: image: nginx volumes: - "./conf/nginx.conf:/etc/nginx/nginx.conf:ro" ports: - 5080:5080 # Scheduler swh-scheduler-db: image: postgres:11 env_file: - ./env/scheduler-db.env environment: # unset PGHOST as db service crashes otherwise PGHOST: - swh-scheduler-api: + swh-scheduler: image: swh/stack build: ./ env_file: - ./env/scheduler-db.env - ./env/scheduler.env environment: SWH_CONFIG_FILENAME: /scheduler.yml SWH_SCHEDULER_CONFIG_FILE: /scheduler.yml entrypoint: /entrypoint.sh depends_on: - swh-scheduler-db ports: - 5008:5008 volumes: - "./conf/scheduler.yml:/scheduler.yml:ro" - - "./services/swh-scheduler-api/entrypoint.sh:/entrypoint.sh:ro" + - "./services/swh-scheduler/entrypoint.sh:/entrypoint.sh:ro" swh-scheduler-listener: image: swh/stack build: ./ env_file: - ./env/scheduler-db.env - ./env/scheduler.env environment: SWH_CONFIG_FILENAME: /scheduler.yml SWH_SCHEDULER_CONFIG_FILE: /scheduler.yml entrypoint: /entrypoint.sh command: start-listener depends_on: - - swh-scheduler-api + - swh-scheduler - amqp 
volumes: - "./conf/scheduler.yml:/scheduler.yml:ro" - "./services/swh-scheduler-worker/entrypoint.sh:/entrypoint.sh:ro" swh-scheduler-runner: image: swh/stack build: ./ env_file: - ./env/scheduler-db.env - ./env/scheduler.env environment: SWH_CONFIG_FILENAME: /scheduler.yml SWH_SCHEDULER_CONFIG_FILE: /scheduler.yml entrypoint: /entrypoint.sh command: start-runner -p 10 depends_on: - - swh-scheduler-api + - swh-scheduler - amqp volumes: - "./conf/scheduler.yml:/scheduler.yml:ro" - "./services/swh-scheduler-worker/entrypoint.sh:/entrypoint.sh:ro" # Graph storage swh-storage-db: image: postgres:11 env_file: - ./env/storage-db.env environment: # unset PGHOST as db service crashes otherwise PGHOST: swh-storage: image: swh/stack build: ./ ports: - 5002:5002 depends_on: - swh-storage-db - swh-objstorage - kafka env_file: - ./env/storage-db.env environment: SWH_CONFIG_FILENAME: /storage.yml STORAGE_BACKEND: postgresql entrypoint: /entrypoint.sh volumes: - "./conf/storage.yml:/storage.yml:ro" - "./services/swh-storage/entrypoint.sh:/entrypoint.sh:ro" # Object storage swh-objstorage: build: ./ image: swh/stack ports: - 5003:5003 environment: SWH_CONFIG_FILENAME: /objstorage.yml entrypoint: /entrypoint.sh volumes: - "./conf/objstorage.yml:/objstorage.yml:ro" - "./services/swh-objstorage/entrypoint.sh:/entrypoint.sh:ro" # Indexer storage swh-idx-storage-db: image: postgres:11 env_file: - ./env/indexers-db.env environment: # unset PGHOST as db service crashes otherwise PGHOST: swh-idx-storage: image: swh/stack build: ./ ports: - 5007:5007 depends_on: - swh-idx-storage-db env_file: - ./env/indexers-db.env environment: SWH_CONFIG_FILENAME: /indexer_storage.yml entrypoint: /entrypoint.sh volumes: - "./conf/indexer_storage.yml:/indexer_storage.yml:ro" - "./services/swh-indexer-storage/entrypoint.sh:/entrypoint.sh:ro" # Web interface swh-web: build: ./ image: swh/stack ports: - 5004:5004 depends_on: - swh-objstorage - swh-storage - swh-idx-storage environment: VERBOSITY: 3 DJANGO_SETTINGS_MODULE: swh.web.settings.development SWH_CONFIG_FILENAME: /web.yml entrypoint: /entrypoint.sh volumes: - "./conf/web.yml:/web.yml:ro" - "./services/swh-web/entrypoint.sh:/entrypoint.sh:ro" swh-deposit-db: image: postgres:11 env_file: - ./env/deposit-db.env environment: # unset PGHOST as db service crashes otherwise PGHOST: swh-deposit: image: swh/stack build: ./ ports: - 5006:5006 depends_on: - swh-deposit-db - - swh-scheduler-api + - swh-scheduler environment: VERBOSITY: 3 SWH_CONFIG_FILENAME: /deposit.yml DJANGO_SETTINGS_MODULE: swh.deposit.settings.production env_file: - ./env/deposit-db.env entrypoint: /entrypoint.sh volumes: - "./conf/deposit.yml:/deposit.yml:ro" - "./services/swh-deposit/entrypoint.sh:/entrypoint.sh:ro" swh-vault-db: image: postgres:11 env_file: - ./env/vault-db.env environment: # unset PGHOST as db service crashes otherwise PGHOST: - swh-vault-api: + swh-vault: image: swh/stack build: ./ env_file: - ./env/vault-db.env environment: - SWH_CONFIG_FILENAME: /vault-api.yml + SWH_CONFIG_FILENAME: /vault.yml command: server ports: - 5005:5005 depends_on: - swh-vault-db - swh-objstorage - swh-storage - - swh-scheduler-api + - swh-scheduler entrypoint: /entrypoint.sh volumes: - - "./conf/vault-api.yml:/vault-api.yml:ro" + - "./conf/vault.yml:/vault.yml:ro" - "./services/swh-vault/entrypoint.sh:/entrypoint.sh:ro" swh-vault-worker: image: swh/stack build: ./ command: worker environment: SWH_CONFIG_FILENAME: /cooker.yml depends_on: - - swh-vault-api + - swh-vault - swh-storage entrypoint: /entrypoint.sh 
volumes: - "./conf/vault-worker.yml:/cooker.yml:ro" - "./services/swh-vault/entrypoint.sh:/entrypoint.sh:ro" # Lister Celery workers swh-listers-db: image: postgres:11 env_file: - ./env/listers-db.env environment: # unset PGHOST as db service crashes otherwise PGHOST: swh-lister: image: swh/stack build: ./ env_file: - ./env/listers-db.env - ./env/listers.env user: swh environment: STATSD_HOST: prometheus-statsd-exporter STATSD_PORT: 9125 SWH_WORKER_INSTANCE: listers SWH_CONFIG_FILENAME: /lister.yml depends_on: - swh-listers-db - - swh-scheduler-api + - swh-scheduler - swh-scheduler-runner - swh-storage - amqp entrypoint: /entrypoint.sh volumes: - "./conf/lister.yml:/lister.yml:ro" - "./services/swh-listers-worker/entrypoint.sh:/entrypoint.sh:ro" # Loader Celery workers swh-loader: image: swh/stack build: ./ env_file: - ./env/listers.env user: swh environment: STATSD_HOST: prometheus-statsd-exporter STATSD_PORT: 9125 SWH_WORKER_INSTANCE: loader SWH_CONFIG_FILENAME: /loader.yml entrypoint: /entrypoint.sh depends_on: - swh-storage - - swh-scheduler-api + - swh-scheduler - amqp volumes: - "./conf/loader.yml:/loader.yml:ro" - "./services/swh-loaders-worker/entrypoint.sh:/entrypoint.sh:ro" # Indexer Celery workers swh-indexer: image: swh/stack build: ./ user: swh env_file: - ./env/indexers-db.env - ./env/indexers.env environment: STATSD_HOST: prometheus-statsd-exporter STATSD_PORT: 9125 entrypoint: /entrypoint.sh depends_on: - swh-scheduler-runner - swh-idx-storage - swh-storage - swh-objstorage - amqp volumes: - "./conf/indexer.yml:/indexer.yml:ro" - "./services/swh-indexer-worker/entrypoint.sh:/entrypoint.sh:ro" # Journal related swh-indexer-journal-client: image: swh/stack build: ./ entrypoint: /entrypoint.sh depends_on: - kafka - swh-storage - - swh-scheduler-api + - swh-scheduler volumes: - "./conf/indexer_journal_client.yml:/etc/softwareheritage/indexer/journal_client.yml:ro" - "./services/swh-indexer-journal-client/entrypoint.sh:/entrypoint.sh:ro" diff --git a/docker/services/swh-scheduler-api/entrypoint.sh b/docker/services/swh-scheduler/entrypoint.sh similarity index 100% rename from docker/services/swh-scheduler-api/entrypoint.sh rename to docker/services/swh-scheduler/entrypoint.sh diff --git a/docker/tests/run_tests.sh b/docker/tests/run_tests.sh index effa51c..06e774c 100755 --- a/docker/tests/run_tests.sh +++ b/docker/tests/run_tests.sh @@ -1,182 +1,182 @@ #!/bin/bash # Main script to run high level tests on the Software Heritage stack # Use a temporary directory as working directory WORKDIR=/tmp/swh-docker-dev_tests # Create it if it does not exist mkdir $WORKDIR 2>/dev/null # Ensure it is empty before running the tests rm -rf $WORKDIR/* # We want the script to exit at the first encountered error set -e # Get test scripts directory TEST_SCRIPTS_DIR=$(cd $(dirname "${BASH_SOURCE[0]}") && pwd) # Set the docker-compose.yml file to use export COMPOSE_FILE=$TEST_SCRIPTS_DIR/../docker-compose.yml # Useful global variables SWH_WEB_API_BASEURL="http://localhost:5004/api/1" CURRENT_TEST_SCRIPT="" # Colored output related variables and functions (only if stdout is a terminal) if test -t 1; then GREEN='\033[0;32m' RED='\033[0;31m' NC='\033[0m' else DOCO_OPTIONS='--no-ansi' fi # Remove previously dumped service logs file if any rm -f $TEST_SCRIPTS_DIR/swh-docker-compose.logs function colored_output { local msg="$2" if [ "$CURRENT_TEST_SCRIPT" != "" ]; then msg="[$CURRENT_TEST_SCRIPT] $msg" fi echo -e "${1}${msg}${NC}" } function status_message { colored_output ${GREEN} "$1" } function 
error_message {
    colored_output ${RED} "$1"
}

function dump_docker_logs {
    error_message "Dumping logs for all services in file $TEST_SCRIPTS_DIR/swh-docker-compose.logs"
    docker-compose logs > $TEST_SCRIPTS_DIR/swh-docker-compose.logs
}

# Exit handler that will get called when this script terminates
function finish {
    if [ $? -ne 0 ] && [ "$CURRENT_TEST_SCRIPT" != "" ]; then
        local SCRIPT_NAME=$CURRENT_TEST_SCRIPT
        CURRENT_TEST_SCRIPT=""
        error_message "An error occurred when running test script ${SCRIPT_NAME}"
        dump_docker_logs
    fi
    docker-compose $DOCO_OPTIONS down
    rm -rf $WORKDIR
}
trap finish EXIT

# Docker-compose events listener that will be executed in the background
# Parameters:
#   $1: PID of parent process
function listen_docker_events {
    docker-compose $DOCO_OPTIONS events | while read event
    do
        service=$(echo $event | cut -d " " -f7 | sed 's/^name=swh-docker-dev_\(.*\)_1)/\1/')
        event_type=$(echo $event | cut -d ' ' -f4)
        # "docker-compose down" has been called, exiting this child process
        if [ "$event_type" = "kill" ] ; then
            exit
        # a swh service crashed, sending signal to parent process to exit with error
        elif [ "$event_type" = "die" ]; then
            if [[ "$service" =~ ^swh.* ]]; then
                exit_code=$(docker-compose ps | grep $service | awk '{print $4}')
                if [ "$exit_code" != "0" ]; then
                    error_message "Service $service died unexpectedly, exiting"
                    dump_docker_logs
                    kill -s SIGUSR1 $1; exit
                fi
            fi
        fi
    done
}
trap "exit 1" SIGUSR1

declare -A SERVICE_LOGS_NB_LINES_READ

# Function to wait for a specific string to be output in a given
# docker-compose service's logs.
# When called multiple times on the same service, only the logs newly
# output since the last call will be processed.
# Parameters:
#   $1: a timeout value in seconds to stop waiting and exit with error
#   $2: docker-compose service name
#   $3: the string to look for in the produced logs
function wait_for_service_output {
    local nb_lines_to_skip=0
    if [[ -v "SERVICE_LOGS_NB_LINES_READ[$2]" ]]; then
        let nb_lines_to_skip=${SERVICE_LOGS_NB_LINES_READ[$2]}+1
    fi
    SECONDS=0
    local service_logs=$(docker-compose $DOCO_OPTIONS logs $2 | tail -n +$nb_lines_to_skip)
    until echo -ne "$service_logs" | grep -m 1 "$3" >/dev/null ; do
        sleep 1;
        if (( $SECONDS > $1 )); then
            error_message "Could not find pattern \"$3\" in $2 service logs after $1 seconds"
            exit 1
        fi
        let nb_lines_to_skip+=$(echo -ne "$service_logs" | wc -l)
        service_logs=$(docker-compose $DOCO_OPTIONS logs $2 | tail -n +$nb_lines_to_skip)
    done
    let nb_lines_to_skip+=$(echo -ne "$service_logs" | wc -l)
    SERVICE_LOGS_NB_LINES_READ[$2]=$nb_lines_to_skip
}

# Function to make an HTTP request and get its response.
# It should be used the following way:
#   response=$(http_request <method> <url>)
# Parameters:
#   $1: http method name (GET, POST, ...)
#   $2: request url
function http_request {
    local response=$(curl -sS -X $1 $2)
    echo $response
}

# Function to check that an HTTP request ends up with no errors.
# If the HTTP response code is different from 200, an error will
# be raised and the main script will terminate
# Parameters:
#   $1: http method name (GET, POST, ...)
#   $2: request url
function http_request_check {
    curl -sSf -X $1 $2 > /dev/null
}

# Function to run the content of a script dedicated to test a specific
# part of the Software Heritage stack.
function run_test_script { local SCRIPT_NAME=$(basename $1) status_message "Executing test script $SCRIPT_NAME" CURRENT_TEST_SCRIPT=$SCRIPT_NAME source $1 } # Move to work directory cd $WORKDIR # Start the docker-compose event handler as a background process status_message "Starting docker-compose events listener" listen_docker_events $$ & # Start the docker-compose environment including the full Software Heritage stack status_message "Starting swh docker-compose environment" docker-compose $DOCO_OPTIONS up -d # Ensure all swh services are up before running tests status_message "Waiting for swh services to be up" docker-compose $DOCO_OPTIONS exec -T swh-storage wait-for-it localhost:5002 -s --timeout=0 docker-compose $DOCO_OPTIONS exec -T swh-objstorage wait-for-it localhost:5003 -s --timeout=0 docker-compose $DOCO_OPTIONS exec -T swh-web wait-for-it localhost:5004 -s --timeout=0 -docker-compose $DOCO_OPTIONS exec -T swh-vault-api wait-for-it localhost:5005 -s --timeout=0 +docker-compose $DOCO_OPTIONS exec -T swh-vault wait-for-it localhost:5005 -s --timeout=0 docker-compose $DOCO_OPTIONS exec -T swh-deposit wait-for-it localhost:5006 -s --timeout=0 docker-compose $DOCO_OPTIONS exec -T swh-idx-storage wait-for-it localhost:5007 -s --timeout=0 -docker-compose $DOCO_OPTIONS exec -T swh-scheduler-api wait-for-it localhost:5008 -s --timeout=0 +docker-compose $DOCO_OPTIONS exec -T swh-scheduler wait-for-it localhost:5008 -s --timeout=0 # Execute test scripts for test_script in $TEST_SCRIPTS_DIR/test_*; do run_test_script ${test_script} CURRENT_TEST_SCRIPT="" done diff --git a/docker/tests/test_01_loader_git.sh b/docker/tests/test_01_loader_git.sh old mode 100644 new mode 100755 index e907d0f..f5ba60a --- a/docker/tests/test_01_loader_git.sh +++ b/docker/tests/test_01_loader_git.sh @@ -1,70 +1,70 @@ #!/bin/bash shopt -s nullglob extglob TEST_GIT_REPO_NAME="swh-loader-core" TEST_GIT_REPO_URL="https://forge.softwareheritage.org/source/${TEST_GIT_REPO_NAME}.git" status_message "Scheduling the loading of the git repository located at ${TEST_GIT_REPO_URL}" -docker-compose $DOCO_OPTIONS exec -T swh-scheduler-api swh scheduler task add load-git repo_url=$TEST_GIT_REPO_URL +docker-compose $DOCO_OPTIONS exec -T swh-scheduler swh scheduler task add load-git repo_url=$TEST_GIT_REPO_URL status_message "Waiting for the git loading task to complete" wait_for_service_output 300 swh-loader "swh.loader.git.tasks.UpdateGitRepository.*succeeded" status_message "The loading task has been successfully executed" status_message "Getting all git objects contained in the repository" git clone $TEST_GIT_REPO_URL cd $TEST_GIT_REPO_NAME cd "$(git rev-parse --git-path objects)" for p in pack/pack-*([0-9a-f]).idx ; do git show-index < $p | cut -f 2 -d ' ' > $WORKDIR/git_objects done for o in [0-9a-f][0-9a-f]/*([0-9a-f]) ; do echo ${o/\/} >> $WORKDIR/git_objects done declare -ga CONTENTS declare -ga DIRECTORIES declare -ga REVISIONS declare -ga RELEASES while IFS='' read -r object || [[ -n "$object" ]]; do object_type=$(git cat-file -t $object) if [ "$object_type" = "blob" ]; then CONTENTS+=($object) elif [ "$object_type" = "tree" ]; then DIRECTORIES+=($object) elif [ "$object_type" = "commit" ]; then REVISIONS+=($object) elif [ "$object_type" = "tag" ]; then RELEASES+=($object) fi done < $WORKDIR/git_objects status_message "Checking all git objects have been successfully loaded into the archive" status_message "Checking contents" for content in "${CONTENTS[@]}"; do http_request_check GET 
${SWH_WEB_API_BASEURL}/content/sha1_git:$content/ done status_message "All contents have been successfully loaded into the archive" status_message "Checking directories" for directory in "${DIRECTORIES[@]}"; do http_request_check GET ${SWH_WEB_API_BASEURL}/directory/$directory/ done status_message "All directories have been successfully loaded into the archive" status_message "Checking revisions" for revision in "${REVISIONS[@]}"; do http_request_check GET ${SWH_WEB_API_BASEURL}/revision/$revision/ done status_message "All revisions have been successfully loaded into the archive" status_message "Checking releases" for release in "${RELEASES[@]}"; do http_request_check GET ${SWH_WEB_API_BASEURL}/release/$release/ done status_message "All releases have been successfully loaded into the archive"