diff --git a/sysadm/deployment/howto-process-add-forge-now-requests.rst b/sysadm/deployment/howto-process-add-forge-now-requests.rst
new file mode 100644
--- /dev/null
+++ b/sysadm/deployment/howto-process-add-forge-now-requests.rst
@@ -0,0 +1,300 @@
+.. _how-to-process-add-forge-now-requests:
+
+How to process add-forge-now requests
+=====================================
+
+.. admonition:: Intended audience
+   :class: important
+
+   sysadm staff members
+
+Processing is semi-automatic for now. Documenting the steps here is a starting point
+for automation.
+
+
+Introduction
+------------
+
+A forge ticket (`see for example the git.afpy.org ticket
+`_) should
+have been opened by a moderator.
+
+This means the `moderation process is ongoing
+`_ and the upstream
+forge (to be ingested) has been notified that we will start the ingestion soon.
+
+
+.. _add-forge-now-testing-on-staging:
+
+Testing on staging
+------------------
+
+To ensure we can ingest that forge, we start by testing a subset of that forge
+listing on staging. It's a pre-flight check to determine whether we have the right amount of
+information.
+
+On a staging node (usually the scheduling node of the domain), run:
+
+.. code::
+
+   swh scheduler --url http://scheduler0.internal.staging.swh.network:5008/ \
+     add-forge-now --preset staging \
+     register-lister gitea \
+     url=
+
+
+For example, for the forge `git.afpy.org `_, which is a `gitea
+`_ instance, we'd run:
+
+.. code::
+
+   swh scheduler --url http://scheduler0.internal.staging.swh.network:5008/ \
+     add-forge-now --preset staging \
+     register-lister gitea \
+     url=https://git.afpy.org/api/v1/
+
+   INFO:swh.lister.pattern:Max origins per page set, truncated 36 page results down to 30
+   INFO:swh.lister.pattern:Disabling origins before sending them to the scheduler
+   INFO:swh.lister.pattern:Reached page limit of 3, terminating
+
+
+Ensure the :ref:`lister got registered` in the staging
+scheduler db.
+
+After a short while, you can :ref:`check that origins from that forge got listed
+` in the scheduler db.
+
+
+Still on a staging node, we trigger the first ingestion for those origins:
+
+.. code::
+
+   swh scheduler --preset staging add-forge-now \
+     schedule-first-visits \
+     --visit-type \
+     --visit-type \
+     --lister-name \
+     --lister-instance-name
+
+For our particular instance:
+
+.. code::
+
+   swh scheduler --preset staging add-forge-now \
+     schedule-first-visits \
+     --visit-type git \
+     --lister-name gitea \
+     --lister-instance-name git.afpy.org
+
+   100 slots available in celery queue
+   15 visits to send to celery
+
+After some time, :ref:`check that those origins got ingested, at least in part
+`.
+
+If everything is fine, let's :ref:`schedule that forge in production
+`.
+
+
+.. _add-forge-now-deploying-on-production:
+
+Deploying on production
+-----------------------
+
+After :ref:`successfully testing the forge ingestion in staging
+`, it's time to deploy the full, recurrent listing
+for that forge.
+
+Let's start by registering the lister for that forge as usual:
+
+.. code::
+
+   swh scheduler --url http://saatchi.internal.softwareheritage.org:5008/ \
+     add-forge-now ( --preset production ) \
+     register-lister \
+     url=
+
+For example:
+
+.. code::
+
+   swh scheduler --url http://saatchi.internal.softwareheritage.org:5008/ \
+     add-forge-now ( --preset production ) \
+     register-lister gitea \
+     url=https://git.afpy.org/api/v1/
+
+Ensure the :ref:`lister got registered` in the production
+scheduler db.
+
+After a short while, you can :ref:`check that origins from that forge got listed
+` in the scheduler db.
+
+Once the listing is done, we trigger the add-forge-now scheduling to make a first
+pass on that forge:
+
+.. code::
+
+   swh scheduler --url http://saatchi.internal.softwareheritage.org:5008/ \
+     add-forge-now ( --preset production ) \
+     schedule-first-visits \
+     --visit-type \
+     --lister-name \
+     --lister-instance-name
+
+For example:
+
+.. code::
+
+   swh scheduler --url http://saatchi.internal.softwareheritage.org:5008/ \
+     add-forge-now \
+     schedule-first-visits \
+     --visit-type git \
+     --lister-name gitea \
+     --lister-instance-name git.afpy.org
+
+   10000 slots available in celery queue
+   37 visits to send to celery
+
+After a while, :ref:`check that those origins have been ingested, at least in part
+`. You can then notify the moderator in the ticket that the
+first ingestion is done.
+
+.. _add-forge-now-checks:
+
+Usual checks
+------------
+
+The following sections show the usual checks run against the scheduler db.
+Each one gives the generic query to execute, followed by an actual execution (with
+sample output).
+
+.. _check-lister-is-registered:
+
+Check the lister is registered
+^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
+
+.. code::
+
+   select * from listers
+   where name='' and
+         instance_name='';
+
+Example:
+
+.. code::
+
+   2022-12-06 11:50:17 swh-scheduler@db1:5432 λ \
+   select * from listers
+   where name='gitea' and
+         instance_name='git.afpy.org';
+
+   +--------------------------------------+-------+---------------+-------------------------------+
+   | id                                   | name  | instance_name | created                       | ...
+   +--------------------------------------+-------+---------------+-------------------------------+
+   | d07d1c90-5016-4ab6-91ac-3300f8eb4fc6 | gitea | git.afpy.org  | 2022-12-06 10:47:46.975571+00 |
+   +--------------------------------------+-------+---------------+-------------------------------+
+   (1 row)
+
+   Time: 4.109 ms
+
+.. _check-origins-got-listed:
+
+Check origins got listed
+^^^^^^^^^^^^^^^^^^^^^^^^
+
+.. code::
+
+   select lister_id, url, visit_type from listed_origins
+   where lister_id = (select id from listers
+                      where name=''
+                      and instance_name='');
+
+Example:
+
+.. code::
+
+   2022-12-06 11:50:24 swh-scheduler@db1:5432 λ \
+   select lister_id, url, visit_type from listed_origins
+   where lister_id = (select id from listers
+                      where name='gitea' and
+                      instance_name='git.afpy.org');
+
+   +--------------------------------------+-----------------------------------------------------------+------------+
+   | lister_id                            | url                                                       | visit_type |
+   +--------------------------------------+-----------------------------------------------------------+------------+
+   | d07d1c90-5016-4ab6-91ac-3300f8eb4fc6 | https://git.afpy.org/AFPy/afpy.org.git                    | git        |
+   | d07d1c90-5016-4ab6-91ac-3300f8eb4fc6 | https://git.afpy.org/foxmask/baeuda.git                   | git        |
+   | d07d1c90-5016-4ab6-91ac-3300f8eb4fc6 | https://git.afpy.org/fcode/boilerplate-python.git         | git        |
+   ...
+   +--------------------------------------+-----------------------------------------------------------+------------+
+   (15 rows)
+
+   Time: 1225.399 ms (00:01.225)
+
+
+.. _check-origins-got-ingested:
+
+Check origins got ingested
+^^^^^^^^^^^^^^^^^^^^^^^^^^
+
+Either of the following queries works:
+
+.. code::
+
+   select visit_type, url, last_visit_status from origin_visit_stats
+   where visit_type=''
+   and url like 'https://%';
+
+Example:
+
+.. code::
+
+   2022-12-12 12:08:58 softwareheritage-scheduler@belvedere:5432 λ \
+   select visit_type, url, last_visit_status from origin_visit_stats
+   where visit_type='git' and
+         url like 'https://git.afpy.org%';
+
+   +------------+-----------------------------------------------------------+-------------------+
+   | visit_type | url                                                       | last_visit_status |
+   +------------+-----------------------------------------------------------+-------------------+
+   | git        | https://git.afpy.org/mdk/infra.git                        | successful        |
+   | git        | https://git.afpy.org/ChristopheNan/python-docs-fr.git     | successful        |
+   | git        | https://git.afpy.org/fcode/delarte.git                    | successful        |
+   ...
+   +------------+-----------------------------------------------------------+-------------------+
+   (37 rows)
+
+   Time: 95171.399 ms (01:35.171)
+
+Or this one, though it takes longer to execute:
+
+.. code::
+
+   select last_visit_status, count(ovs.url)
+   from origin_visit_stats ovs
+   join listed_origins lo USING(url, visit_type)
+   where lister_id = (select id from listers where name='' and instance_name='')
+   group by last_visit_status;
+
+Example:
+
+.. code::
+
+   2022-12-12 11:56:57 softwareheritage-scheduler@belvedere:5432 λ \
+   select last_visit_status, count(ovs.url)
+   from origin_visit_stats ovs
+   join listed_origins lo USING(url, visit_type)
+   where lister_id = (select id from listers
+                      where name='gitea' and
+                      instance_name='git.afpy.org')
+   and visit_type='git'
+   group by last_visit_status;
+
+   +-------------------+-------+
+   | last_visit_status | count |
+   +-------------------+-------+
+   | successful        | 37    |
+   +-------------------+-------+
+   (1 row)
+
+   Time: 149774.756 ms (02:29.775)
diff --git a/sysadm/deployment/index.rst b/sysadm/deployment/index.rst
--- a/sysadm/deployment/index.rst
+++ b/sysadm/deployment/index.rst
@@ -11,3 +11,4 @@
    howto-debian-packaging
    jenkins
    argocd
+   howto-process-add-forge-now-requests