diff --git a/sysadm/deployment/howto-process-add-forge-now-requests.rst b/sysadm/deployment/howto-process-add-forge-now-requests.rst index d6e1b64..d9a64d8 100644 --- a/sysadm/deployment/howto-process-add-forge-now-requests.rst +++ b/sysadm/deployment/howto-process-add-forge-now-requests.rst @@ -1,300 +1,302 @@ .. _how-to-process-add-forge-now-requests: How to process add-forge-now requests ===================================== .. admonition:: Intended audience :class: important sysadm staff members The processing is semi-automatic for the moment. Referencing the steps is a kickstarter for automation. Introduction ------------ A forge ticket (`see for example the git.afpy.org ticket `_) should have been opened by a moderator. Meaning the `moderation process is ongoing `_ and the upstream forge (to be ingested) has been notified we will start the ingestion soon. .. _add-forge-now-testing-on-staging: Testing on staging ------------------ To ensure we can ingest that forge, we start by testing out a subset of that forge listing on staging. It's a pre-check flight to determine we have the right amount of information. On a staging node (usually the scheduling node of the domain), run: .. code:: swh scheduler --url http://scheduler0.internal.staging.swh.network:5008/ \ add-forge-now --preset staging \ register-lister gitea \ url= For example, forge `git.afpy.org `_ which is a `gitea `_ instance, we'd run: .. code:: swh scheduler --url http://scheduler0.internal.staging.swh.network:5008/ \ add-forge-now --preset staging \ register-lister gitea \ url=https://git.afpy.org/api/v1/ INFO:swh.lister.pattern:Max origins per page set, truncated 36 page results down to 30 INFO:swh.lister.pattern:Disabling origins before sending them to the scheduler INFO:swh.lister.pattern:Reached page limit of 3, terminating Ensure the :ref:`lister got registered` in the staging scheduler db. After a bit of time, you can :ref:`check origins from that forge got listed ` in the scheduler db: Still on a staging node, we trigger the first ingestion for those origins: .. code:: - swh scheduler --preset staging add-forge-now \ + swh scheduler --url http://scheduler0.internal.staging.swh.network:5008/ \ + add-forge-now --preset staging \ schedule-first-visits \ --visit-type \ --visit-type \ --lister-name \ --lister-instance-name For our particular instance: .. code:: - swh scheduler --preset staging add-forge-now \ + swh scheduler --url http://scheduler0.internal.staging.swh.network:5008/ \ + add-forge-now --preset staging \ schedule-first-visits \ --visit-type git \ --lister-name gitea \ --lister-instance-name git.afpy.org 100 slots available in celery queue 15 visits to send to celery After some time, :ref:`check those origins got ingested at least in part `. If everything is fine, let's :ref:`schedule that forge in production `. .. _add-forge-now-deploying-on-production: Deploying on production ----------------------- After :ref:`testing with success the forge ingestion in staging `, it's time to deploy the full and recurrent listing for that forge. Let's start by registering the lister for that forge as usual: .. code:: swh scheduler --url http://saatchi.internal.softwareheritage.org:5008/ \ add-forge-now ( --preset production ) \ register-lister \ url= For example: .. code:: swh scheduler --url http://saatchi.internal.softwareheritage.org:5008/ \ add-forge-now ( --preset production ) \ register-lister gitea \ url=https://git.afpy.org/api/v1/ Ensure the :ref:`lister got registered` in the production scheduler db. After a bit of time, you can :ref:`check origins from that forge got listed ` in the scheduler db: Once the listing is through, we trigger the add-forge-now scheduling to make a first pass on that forge. .. code:: swh scheduler --url http://saatchi.internal.softwareheritage.org:5008/ \ add-forge-now ( --preset production ) \ schedule-first-visits \ --visit-type \ --lister-name \ --lister-instance-name For example: .. code:: swh scheduler --url http://saatchi.internal.softwareheritage.org:5008/ \ - add-forge-now \ + add-forge-now ( --preset production ) \ schedule-first-visits \ --visit-type git \ --lister-name gitea \ --lister-instance-name git.afpy.org 10000 slots available in celery queue 37 visits to send to celery After a while, :ref:`you can check those origins should have been ingested in part `. You can now notify the moderator in the ticket that the first ingestion got done. .. _add-forge-now-checks: Usual checks ------------ In the following, we will demonstrate the usual checks happening in the scheduler db. The format will be the generic query to execute followed by an actual execution (with a sampled output). .. _check-lister-is-registered: Check the lister is registered ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ .. code:: select * from listers where name='' and instance_name=''; Example: .. code:: 2022-12-06 11:50:17 swh-scheduler@db1:5432 λ \ select * from listers where name='gitea' and instance_name='git.afpy.org'; +--------------------------------------+-------+---------------+-------------------------------+ | id | name | instance_name | created | ... +--------------------------------------+-------+---------------+-------------------------------+ | d07d1c90-5016-4ab6-91ac-3300f8eb4fc6 | gitea | git.afpy.org | 2022-12-06 10:47:46.975571+00 | +--------------------------------------+-------+---------------+-------------------------------+ (1 row) Time: 4.109 ms .. _check-origins-got-listed: Check origins got listed ^^^^^^^^^^^^^^^^^^^^^^^^ .. code:: select lister_id, url, visit_type from listed_origins where lister_id = (select id from listers where name='' and instance_name=''); Example: .. code:: 2022-12-06 11:50:24 swh-scheduler@db1:5432 λ \ select lister_id, url, visit_type from listed_origins where lister_id = (select id from listers where name='gitea' and instance_name='git.afpy.org'); +--------------------------------------+-----------------------------------------------------------+------------+ | lister_id | url | visit_type | +--------------------------------------+-----------------------------------------------------------+------------+ | d07d1c90-5016-4ab6-91ac-3300f8eb4fc6 | https://git.afpy.org/AFPy/afpy.org.git | git | | d07d1c90-5016-4ab6-91ac-3300f8eb4fc6 | https://git.afpy.org/foxmask/baeuda.git | git | | d07d1c90-5016-4ab6-91ac-3300f8eb4fc6 | https://git.afpy.org/fcode/boilerplate-python.git | git | ... +--------------------------------------+-----------------------------------------------------------+------------+ (15 rows) Time: 1225.399 ms (00:01.225) .. _check-origins-got-ingested: Check origins got ingested ^^^^^^^^^^^^^^^^^^^^^^^^^^ Either one of the query is fine: .. code:: select visit_type, url, last_visit_status from origin_visit_stats where visit_type='' and url like 'https://%'; Example: .. code:: 2022-12-12 12:08:58 softwareheritage-scheduler@belvedere:5432 λ \ select visit_type, url, last_visit_status from origin_visit_stats where visit_type='git' and url like 'https://git.afpy.org%'; +------------+-----------------------------------------------------------+-------------------+ | visit_type | url | last_visit_status | +------------+-----------------------------------------------------------+-------------------+ | git | https://git.afpy.org/mdk/infra.git | successful | | git | https://git.afpy.org/ChristopheNan/python-docs-fr.git | successful | | git | https://git.afpy.org/fcode/delarte.git | successful | ... +------------+-----------------------------------------------------------+-------------------+ (37 rows) Time: 95171.399 ms (01:35.171) or this one, though this will take longer to execute: .. code:: select last_visit_status, count(ovs.url) from origin_visit_stats ovs join listed_origins lo USING(url, visit_type) where lister_id = (select id from listers where name='' and instance_name='') Example: .. code:: 2022-12-12 11:56:57 softwareheritage-scheduler@belvedere:5432 λ \ select last_visit_status, count(ovs.url) from origin_visit_stats ovs join listed_origins lo USING(url, visit_type) where lister_id = (select id from listers where name='gitea' and instance_name='git.afpy.org') and visit_type='git' group by last_visit_status; +-------------------+-------+ | last_visit_status | count | +-------------------+-------+ | successful | 37 | +-------------------+-------+ (1 row) Time: 149774.756 ms (02:29.775)