D8929.diff
diff --git a/sysadm/deployment/howto-process-add-forge-now-requests.rst b/sysadm/deployment/howto-process-add-forge-now-requests.rst
new file mode 100644
--- /dev/null
+++ b/sysadm/deployment/howto-process-add-forge-now-requests.rst
@@ -0,0 +1,300 @@
+.. _how-to-process-add-forge-now-requests:
+
+How to process add-forge-now requests
+=====================================
+
+.. admonition:: Intended audience
+ :class: important
+
+ sysadm staff members
+
+For now, the processing is semi-automatic. Documenting the steps here is a first step
+toward automating it.
+
+
+Introduction
+------------
+
+A moderator should already have opened a forge ticket (`see for example the git.afpy.org
+ticket <https://gitlab.softwareheritage.org/infra/sysadm-environment/-/issues/4674>`_).
+
+This means the `moderation process is ongoing
+<https://archive.softwareheritage.org/admin/add-forge/request/18/>`_ and that the
+upstream forge (the one to be ingested) has been notified that ingestion will start soon.
+
+
+.. _add-forge-now-testing-on-staging:
+
+Testing on staging
+------------------
+
+To ensure we can ingest that forge, we start by listing a subset of its origins on
+staging. This is a pre-flight check to confirm that the listing gives us the information
+needed to ingest the forge.
+
+On a staging node (usually the scheduling node of the domain), run:
+
+.. code::
+
+ swh scheduler --url http://scheduler0.internal.staging.swh.network:5008/ \
+ add-forge-now --preset staging \
+ register-lister <lister-name> \
+ url=<url>
+
+
+For example, for the forge `git.afpy.org <https://git.afpy.org>`_, which is a `gitea
+<https://gitea.io/en-us/>`_ instance, we would run:
+
+.. code::
+
+ swh scheduler --url http://scheduler0.internal.staging.swh.network:5008/ \
+ add-forge-now --preset staging \
+ register-lister gitea \
+ url=https://git.afpy.org/api/v1/
+
+ INFO:swh.lister.pattern:Max origins per page set, truncated 36 page results down to 30
+ INFO:swh.lister.pattern:Disabling origins before sending them to the scheduler
+ INFO:swh.lister.pattern:Reached page limit of 3, terminating
+
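The staging and production ``register-lister`` commands in this how-to differ only in the scheduler URL and the preset. As a minimal sketch, assuming the two scheduler URLs shown in this document, a pair of hypothetical bash helpers (``scheduler_url``, ``register_lister``) can derive the URL from the preset so the command is not retyped by hand:

```shell
#!/usr/bin/env bash
# Hypothetical helpers wrapping the documented add-forge-now register-lister step.

# Map a preset to the scheduler URL used for it in this how-to (assumption:
# these URLs are still current).
scheduler_url() {
  case "$1" in
    staging)    echo "http://scheduler0.internal.staging.swh.network:5008/" ;;
    production) echo "http://saatchi.internal.softwareheritage.org:5008/" ;;
    *)          echo "unknown preset: $1" >&2; return 1 ;;
  esac
}

# Usage: register_lister <preset> <lister-name> <url>
register_lister() {
  if (( $# != 3 )); then
    echo "usage: register_lister <preset> <lister-name> <url>" >&2
    return 2
  fi
  local preset=$1 lister=$2 url=$3 sched
  # Assign separately from `local` so an unknown preset aborts the call.
  sched=$(scheduler_url "$preset") || return 1
  swh scheduler --url "$sched" \
    add-forge-now --preset "$preset" \
    register-lister "$lister" "url=$url"
}
```

For instance, ``register_lister staging gitea https://git.afpy.org/api/v1/`` runs the same command as the example above. The helpers only wrap the documented invocation and add no option of their own.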
+
+Ensure the :ref:`lister got registered<check-lister-is-registered>` in the staging
+scheduler db.
+
+After a while, :ref:`check that origins from that forge got listed
+<check-origins-got-listed>` in the scheduler db.
+
+
+Still on a staging node, trigger the first ingestion for those origins (``--visit-type``
+may be repeated):
+
+.. code::
+
+ swh scheduler add-forge-now --preset staging \
+ schedule-first-visits \
+ --visit-type <visit-type> \
+ --visit-type <another-visit-type> \
+ --lister-name <lister-name> \
+ --lister-instance-name <lister-instance-name>
+
+For our particular instance:
+
+.. code::
+
+ swh scheduler add-forge-now --preset staging \
+ schedule-first-visits \
+ --visit-type git \
+ --lister-name gitea \
+ --lister-instance-name git.afpy.org
+
+ 100 slots available in celery queue
+ 15 visits to send to celery
+
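Because ``--visit-type`` may be given several times, the option list can be built from a list of visit types. A minimal bash sketch; the ``schedule_first_visits`` function and the optional ``SCHEDULER_URL`` variable (which adds ``--url`` as the production commands do) are hypothetical conventions, not part of the swh CLI:

```shell
#!/usr/bin/env bash
# Hypothetical wrapper around `add-forge-now schedule-first-visits`.
# Usage: schedule_first_visits <preset> <lister-name> <instance-name> <visit-type>...
schedule_first_visits() {
  if (( $# < 4 )); then
    echo "usage: schedule_first_visits <preset> <lister-name> <instance-name> <visit-type>..." >&2
    return 2
  fi
  local preset=$1 lister=$2 instance=$3
  shift 3
  local cmd=(swh scheduler)
  # Only pass --url when the caller set SCHEDULER_URL.
  if [[ -n "${SCHEDULER_URL:-}" ]]; then
    cmd+=(--url "$SCHEDULER_URL")
  fi
  cmd+=(add-forge-now --preset "$preset" schedule-first-visits)
  local vt
  for vt in "$@"; do
    cmd+=(--visit-type "$vt")   # one flag per requested visit type
  done
  cmd+=(--lister-name "$lister" --lister-instance-name "$instance")
  "${cmd[@]}"
}
```

For example, ``schedule_first_visits staging gitea git.afpy.org git`` reproduces the staging command above.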
+After some time, :ref:`check that those origins got ingested, at least in part
+<check-origins-got-ingested>`.
+
+If everything is fine, let's :ref:`schedule that forge in production
+<add-forge-now-deploying-on-production>`.
+
+
+.. _add-forge-now-deploying-on-production:
+
+Deploying on production
+-----------------------
+
+After :ref:`successfully testing the forge ingestion on staging
+<add-forge-now-testing-on-staging>`, it's time to deploy the full, recurring listing
+for that forge.
+
+Start by registering the lister for that forge as usual (the ``--preset production``
+option is optional):
+
+.. code::
+
+ swh scheduler --url http://saatchi.internal.softwareheritage.org:5008/ \
+ add-forge-now --preset production \
+ register-lister <lister-name> \
+ url=<url>
+
+For example:
+
+.. code::
+
+ swh scheduler --url http://saatchi.internal.softwareheritage.org:5008/ \
+ add-forge-now --preset production \
+ register-lister gitea \
+ url=https://git.afpy.org/api/v1/
+
+Ensure the :ref:`lister got registered<check-lister-is-registered>` in the production
+scheduler db.
+
+After a while, :ref:`check that origins from that forge got listed
+<check-origins-got-listed>` in the scheduler db.
+
+Once the listing is complete, trigger the add-forge-now scheduling to make a first pass
+on that forge:
+
+.. code::
+
+ swh scheduler --url http://saatchi.internal.softwareheritage.org:5008/ \
+ add-forge-now --preset production \
+ schedule-first-visits \
+ --visit-type <visit-type> \
+ --lister-name <lister-name> \
+ --lister-instance-name <lister-instance-name>
+
+For example:
+
+.. code::
+
+ swh scheduler --url http://saatchi.internal.softwareheritage.org:5008/ \
+ add-forge-now \
+ schedule-first-visits \
+ --visit-type git \
+ --lister-name gitea \
+ --lister-instance-name git.afpy.org
+
+ 10000 slots available in celery queue
+ 37 visits to send to celery
+
+After a while, :ref:`check that those origins got ingested, at least in part
+<check-origins-got-ingested>`. You can then notify the moderator in the ticket that the
+first ingestion is done.
+
+.. _add-forge-now-checks:
+
+Usual checks
+------------
+
+The following sections show the usual checks run against the scheduler db. Each one
+gives the generic query to execute, followed by an actual execution (with sampled
+output).
+
+.. _check-lister-is-registered:
+
+Check the lister is registered
+^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
+
+.. code::
+
+ select * from listers
+ where name='<lister-name>' and
+ instance_name='<lister-instance-name>';
+
+Example:
+
+.. code::
+
+ 2022-12-06 11:50:17 swh-scheduler@db1:5432 λ \
+ select * from listers
+ where name='gitea' and
+ instance_name='git.afpy.org';
+
+ +--------------------------------------+-------+---------------+-------------------------------+
+ | id | name | instance_name | created | ...
+ +--------------------------------------+-------+---------------+-------------------------------+
+ | d07d1c90-5016-4ab6-91ac-3300f8eb4fc6 | gitea | git.afpy.org | 2022-12-06 10:47:46.975571+00 |
+ +--------------------------------------+-------+---------------+-------------------------------+
+ (1 row)
+
+ Time: 4.109 ms
+
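Rather than editing the placeholders by hand, the query can be sent to ``psql`` on standard input, with the lister name and instance passed as psql variables; ``:'var'`` interpolates a variable as a quoted SQL literal. A minimal bash sketch, assuming ``psql`` already connects to the scheduler db (for example through the ``PGHOST``/``PGDATABASE`` environment variables); the ``lister_is_registered`` function is hypothetical and counts rows instead of selecting them:

```shell
#!/usr/bin/env bash
# Hypothetical helper: succeed if the lister is registered in the scheduler db.
# Assumes psql's connection settings (PGHOST, PGDATABASE, ...) are already set.
# Usage: lister_is_registered <lister-name> <lister-instance-name>
# Exit status: 0 registered, 1 not registered, 2 psql failed.
lister_is_registered() {
  local count
  # -X: skip ~/.psqlrc; -A -t: unaligned, tuples only (prints just the number).
  count=$(psql -X -A -t -v ON_ERROR_STOP=1 \
            -v lister_name="$1" -v instance_name="$2" <<'SQL'
select count(*) from listers
 where name = :'lister_name'
   and instance_name = :'instance_name';
SQL
  ) || return 2
  [[ "$count" -ge 1 ]]
}
```

Usage: ``lister_is_registered gitea git.afpy.org && echo registered``. Reading the SQL from standard input (rather than ``-c``) is what lets psql interpolate the variables.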
+.. _check-origins-got-listed:
+
+Check origins got listed
+^^^^^^^^^^^^^^^^^^^^^^^^
+
+.. code::
+
+ select lister_id, url, visit_type from listed_origins
+ where lister_id = (select id from listers
+ where name='<lister-name>'
+ and instance_name='<lister-instance-name>');
+
+Example:
+
+.. code::
+
+ 2022-12-06 11:50:24 swh-scheduler@db1:5432 λ \
+ select lister_id, url, visit_type from listed_origins
+ where lister_id = (select id from listers
+ where name='gitea' and
+ instance_name='git.afpy.org');
+
+ +--------------------------------------+-----------------------------------------------------------+------------+
+ | lister_id | url | visit_type |
+ +--------------------------------------+-----------------------------------------------------------+------------+
+ | d07d1c90-5016-4ab6-91ac-3300f8eb4fc6 | https://git.afpy.org/AFPy/afpy.org.git | git |
+ | d07d1c90-5016-4ab6-91ac-3300f8eb4fc6 | https://git.afpy.org/foxmask/baeuda.git | git |
+ | d07d1c90-5016-4ab6-91ac-3300f8eb4fc6 | https://git.afpy.org/fcode/boilerplate-python.git | git |
+ ...
+ +--------------------------------------+-----------------------------------------------------------+------------+
+ (15 rows)
+
+ Time: 1225.399 ms (00:01.225)
+
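When only the number of listed origins matters, for example as a rough comparison with the ``visits to send to celery`` line printed by ``schedule-first-visits``, a count is quicker to read than the full listing. A minimal bash sketch, assuming ``psql`` already connects to the scheduler db; the ``count_listed_origins`` function is hypothetical:

```shell
#!/usr/bin/env bash
# Hypothetical helper: print how many origins a lister instance has listed.
# Assumes psql already connects to the scheduler db (PGHOST, PGDATABASE, ...).
# Usage: count_listed_origins <lister-name> <lister-instance-name>
count_listed_origins() {
  # -A -t: unaligned, tuples only, so only the count is printed.
  psql -X -A -t -v ON_ERROR_STOP=1 \
       -v lister_name="$1" -v instance_name="$2" <<'SQL'
select count(*) from listed_origins
 where lister_id = (select id from listers
                     where name = :'lister_name'
                       and instance_name = :'instance_name');
SQL
}
```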
+
+.. _check-origins-got-ingested:
+
+Check origins got ingested
+^^^^^^^^^^^^^^^^^^^^^^^^^^
+
+Either of the following queries works:
+
+.. code::
+
+ select visit_type, url, last_visit_status from origin_visit_stats
+ where visit_type='<visit-type>'
+ and url like 'https://<lister-instance-name>%';
+
+Example:
+
+.. code::
+
+ 2022-12-12 12:08:58 softwareheritage-scheduler@belvedere:5432 λ \
+ select visit_type, url, last_visit_status from origin_visit_stats
+ where visit_type='git' and
+ url like 'https://git.afpy.org%';
+
+ +------------+-----------------------------------------------------------+-------------------+
+ | visit_type | url | last_visit_status |
+ +------------+-----------------------------------------------------------+-------------------+
+ | git | https://git.afpy.org/mdk/infra.git | successful |
+ | git | https://git.afpy.org/ChristopheNan/python-docs-fr.git | successful |
+ | git | https://git.afpy.org/fcode/delarte.git | successful |
+ ...
+ +------------+-----------------------------------------------------------+-------------------+
+ (37 rows)
+
+ Time: 95171.399 ms (01:35.171)
+
+or the following one, which counts origins per visit status and takes longer to execute:
+
+.. code::
+
+ select last_visit_status, count(ovs.url)
+ from origin_visit_stats ovs
+ join listed_origins lo USING(url, visit_type)
+ where lister_id = (select id from listers where name='<lister-name>'
+ and instance_name='<lister-instance-name>')
+ and visit_type='<visit-type>'
+ group by last_visit_status;
+
+Example:
+
+.. code::
+
+ 2022-12-12 11:56:57 softwareheritage-scheduler@belvedere:5432 λ \
+ select last_visit_status, count(ovs.url)
+ from origin_visit_stats ovs
+ join listed_origins lo USING(url, visit_type)
+ where lister_id = (select id from listers
+ where name='gitea' and
+ instance_name='git.afpy.org')
+ and visit_type='git'
+ group by last_visit_status;
+
+ +-------------------+-------+
+ | last_visit_status | count |
+ +-------------------+-------+
+ | successful | 37 |
+ +-------------------+-------+
+ (1 row)
+
+ Time: 149774.756 ms (02:29.775)
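
The grouped query returns one row per ``last_visit_status``. Run with ``psql -A -t`` (unaligned, tuples only), each row is printed as ``status|count``, which a few lines of bash can reduce to a one-line progress summary. A minimal sketch; the ``summarize_visit_stats`` function and its output format are hypothetical:

```shell
#!/usr/bin/env bash
# Hypothetical helper: read "status|count" lines (psql -A -t output of the
# grouped query) on stdin and print "<successful>/<total> successful".
summarize_visit_stats() {
  local status count ok=0 total=0
  # The `|| [[ -n ... ]]` keeps a final line that lacks a trailing newline.
  while IFS='|' read -r status count || [[ -n "$status" ]]; do
    [[ -z "$status" && -z "$count" ]] && continue   # skip blank lines
    total=$(( total + count ))
    if [[ "$status" == successful ]]; then
      ok=$(( ok + count ))
    fi
  done
  echo "$ok/$total successful"
}
```

For instance, piping the output of the grouped query (run with ``psql -X -A -t``) into ``summarize_visit_stats`` would print ``37/37 successful`` for the sampled output above.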
diff --git a/sysadm/deployment/index.rst b/sysadm/deployment/index.rst
--- a/sysadm/deployment/index.rst
+++ b/sysadm/deployment/index.rst
@@ -11,3 +11,4 @@
howto-debian-packaging
jenkins
argocd
+ howto-process-add-forge-now-requests
D8929: sysadm: Add a "how to process add-forge-now requests"