Software Heritage

Rationalize task type names
Closed, Migrated

Description

As of today, there is no naming convention for task types, so we have:

  • indexer_fossology_license & friends,
  • origin-update-git and friends,
  • swh-deposit-archive-checks (...)
  • swh-lister-debian (...)
  • swh-vault-cooking

It would be nice to adopt a common naming scheme.

Actions to take:

  • rename old task names to new ones in the swh codebase
  • add the new task types to the production database
  • tag and deploy impacted components:
    • swh-lister
    • swh-scheduler
    • swh-deposit
    • swh-web
    • swh-vault
    • swh-indexers
  • rename the task types stored in the type column of the production task table
  • restart the scheduler runner (stopped to ease the data migration); this needed a fix (hot-patched for now), a diff is on its way
  • remove the SQL code that keeps backward-compatible task names in swh-scheduler
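The backward-compatibility phase (old names still accepted until the SQL shim is removed) can be pictured with a small Python sketch. This is not the swh-scheduler code, which handled it in SQL; `LEGACY_ALIASES` and `resolve_task_type` are hypothetical names, and the alias table holds only three entries taken from the migration table in this task.

```python
# Hypothetical alias table: old task type name -> new task type name.
# Entries taken from the migration table in this task; not exhaustive.
LEGACY_ALIASES = {
    "origin-update-git": "load-git",
    "swh-lister-debian": "list-debian-distribution",
    "swh-vault-cooking": "cook-vault-bundle",
}


def resolve_task_type(name: str, keep_compat: bool = True) -> str:
    """Return the canonical task type name.

    While keep_compat is True, old names are silently mapped to the new
    ones; once compatibility is dropped, old names are rejected.
    """
    if name in LEGACY_ALIASES:
        if keep_compat:
            return LEGACY_ALIASES[name]
        raise ValueError(f"obsolete task type: {name!r}")
    # Already a new-style name (or unknown): pass it through unchanged.
    return name


print(resolve_task_type("origin-update-git"))  # load-git
print(resolve_task_type("load-git"))           # load-git
```

Dropping the compatibility layer then amounts to flipping `keep_compat` off, so callers still using old names fail loudly instead of scheduling tasks under a stale type.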

Related Objects

Mentioned In
rDENV490495d31ced: Update scheduler task names to new ones
rDSCHaa70df4feb3e: Drop backward compatible names
rDSCH63eeabf55d88: data: Add missing swh-vault-batch-cooking/cook-vault-bundle-type
rDSCH766dae940e45: 50-swh-data: Fix inverted select/insert in backward sql function
D1482: GNU Lister
D1497: Maven Lister
D1492: CRAN Lister
rCDFD490495d31ced: Update scheduler task names to new ones
rDSCHd6fce0d5005c: swh-scheduler: Use new task names
rDDEPc1e6ffa164ae: signals: Update scheduler task names to new ones
rDLS701d833cdf0e: Update scheduler task names to new ones
rDCIDXde4744f95fc2: journal_client/cli: Update scheduler task to new one
rDVAU8644c35b85b8: backend: Update scheduler task name to new one
D1493: Update scheduler task name to new one
D1491: Use scheduler task names to new ones
D1490: Update scheduler task names to new ones
D1489: Update scheduler task to new one
D1488: Update scheduler task names to new ones
D1487: Update scheduler task names to new ones
rDWAPPSfba9d8cef9b5: tests: Update scheduler task names to new ones
rDSCHd338c769615a: sql/swh-data: Update scheduler task names but keep backward compatibility
D1440: Update scheduler task names but keep backward compatibility
rDWAPPSbd58777442e2: common.origin_save: Update scheduler loading task names
D1438: Activate save code now for hg and svn origin types
rDLS0b8d1d464db1: npm.lister: Update loading task name
rDSCH4b0e9527cb38: sql/data: Add npm related task types
D1401: Align config filename with production and update npm loading task name
D1400: Add npm related task types
Mentioned Here
T1419: hg/svn support in save code now
T1157: Generic scheduler task creation according to task type

Event Timeline

douardda created this task.

That'd be great.

Yes, I'd say first:

  • drop the swh prefix, as it's redundant
  • settle on a single separator (either - or _; I prefer -, as it's simpler to type, at least on QWERTY).
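A minimal sketch of these two rules in Python; `normalize_task_type` is a hypothetical helper, not part of any swh module. The two rules alone do not produce the names eventually adopted, which also changed the first segment.

```python
def normalize_task_type(name: str) -> str:
    """Apply the two rules above to an existing task type name (illustrative only)."""
    # Rule 2: use a single separator, '-', instead of mixing '-' and '_'.
    name = name.replace("_", "-")
    # Rule 1: drop the redundant 'swh-' prefix.
    if name.startswith("swh-"):
        name = name[len("swh-"):]
    return name


print(normalize_task_type("swh-lister-debian"))          # lister-debian
print(normalize_task_type("indexer_fossology_license"))  # indexer-fossology-license
```

The separator is normalized first so that a name written with an underscore after the prefix (swh_...) also loses its prefix.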

Here is my proposal, which keeps the task type as a single name (rather than something fancier, like splitting it into task-type, tool-type, etc.).

<tool>-<origin-type-or-forge>-<extra>:

  • <tool>: lister, loader, checker, deposit, updater, vault...
  • <origin-type-or-forge>: debian, svn, git, deposit, etc...
  • <extra>: an optional qualifier. For listers it is often about frequency (full vs incremental); for loaders it is often a property that sets the task apart from the mainstream loader (e.g. loader-git vs loader-git-archive).
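The pattern can be expressed as a small compose/parse pair. This is a hypothetical sketch (`TaskType` and `parse_task_type` are illustrative names, not swh APIs); it assumes the origin segment never contains a '-', so a name like load-deb-package would be split into origin `deb` and extra `package`.

```python
from typing import NamedTuple, Optional


class TaskType(NamedTuple):
    tool: str                    # e.g. "loader", "lister", "indexer"
    origin: str                  # origin type or forge, e.g. "git", "gitlab"
    extra: Optional[str] = None  # optional qualifier, e.g. "full", "archive"

    def name(self) -> str:
        """Build the '<tool>-<origin-type-or-forge>[-<extra>]' string."""
        parts = [self.tool, self.origin]
        if self.extra:
            parts.append(self.extra)
        return "-".join(parts)


def parse_task_type(name: str) -> TaskType:
    # Split on the first two separators only, so that a multi-word extra
    # such as "intrinsic-metadata" stays in one piece.
    parts = name.split("-", 2)
    if len(parts) < 2:
        raise ValueError(f"expected <tool>-<origin-type-or-forge>[-<extra>]: {name!r}")
    return TaskType(*parts)


print(TaskType("lister", "gitlab", "incremental").name())  # lister-gitlab-incremental
print(parse_task_type("loader-git-archive"))
```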

For example, that would give (not exhaustive):

  • loader-git
  • loader-git-archive
  • loader-hg
  • loader-hg-archive
  • lister-gitlab-incremental
  • lister-gitlab-full
  • loader-deposit-check
  • loader-deposit
  • indexer-mimetype
  • indexer-mimetype-range
  • indexer-origin-head
  • indexer-origin-intrinsic-metadata
  • etc...
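A hypothetical format check for names like the ones above, assuming the only rules are: lowercase alphanumeric words, '-' as the single separator, at least two words, and no swh prefix. It checks form only; an old name such as origin-update-git passes too, so it cannot replace an explicit rename mapping.

```python
import re

# Hypothetical convention check: words of [a-z0-9] joined by single '-',
# at least two words, and no leading 'swh-'.
TASK_TYPE_RE = re.compile(r"(?!swh-)[a-z0-9]+(?:-[a-z0-9]+)+")


def is_conventional(name: str) -> bool:
    return TASK_TYPE_RE.fullmatch(name) is not None


for candidate in ["loader-git-archive", "indexer_origin_head", "swh-lister-debian"]:
    print(candidate, is_conventional(candidate))
```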

Related: T1157 (generic task creation, with an explanation of how I see the task type)

What do you think?

+1 for this need, and +1 also to the initial draft by @ardumont.

As a minor improvement, I suggest switching from nouns to verbs, so: load-git, load-hg-archive, list-gitlab-full, etc. Rationale: "do this" is the semantics associated with a message.
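Switching from nouns to verbs only touches the first segment of a name. A sketch with a hypothetical `NOUN_TO_VERB` table covering the tools named in the first draft; anything not in the table is left unchanged.

```python
# Hypothetical mapping from the first draft's tool nouns to verbs.
NOUN_TO_VERB = {
    "loader": "load",
    "lister": "list",
    "indexer": "index",
    "checker": "check",
}


def to_verb_form(name: str) -> str:
    # Only the first '-'-separated segment is the tool; the rest is kept as-is.
    tool, sep, rest = name.partition("-")
    return NOUN_TO_VERB.get(tool, tool) + sep + rest


print(to_verb_form("loader-hg-archive"))   # load-hg-archive
print(to_verb_form("lister-gitlab-full"))  # list-gitlab-full
```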

Sounds great to me.

We started that with the npm loader/lister today ;)

In support of T1419, I've now added the following to the production database:

softwareheritage-scheduler=> select type, description, backend_name from task_type where type like 'load-%';
       type       |                         description                          |                    backend_name
------------------+--------------------------------------------------------------+----------------------------------------------------
 load-git         | Update an origin of type git                                 | swh.loader.git.tasks.UpdateGitRepository
 load-hg          | Update an origin of type mercurial                           | swh.loader.mercurial.tasks.LoadMercurial
 load-svn         | Create dump of a remote svn repository, mount it and load it | swh.loader.svn.tasks.DumpMountAndLoadSvnRepository
 load-deb-package | Load a Debian package                                        | swh.loader.debian.tasks.LoadDebianPackage
 load-npm         | Load npm origin                                              | swh.loader.npm.tasks.LoadNpm
(5 rows)

(well, the debian and npm ones already existed)
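With a verb prefix, a whole family of task types can be selected with a single LIKE pattern, as the query above does. A self-contained reproduction using Python's built-in sqlite3 module as a stand-in for the production PostgreSQL database; only the three queried columns are modelled, the load-* rows are copied from the output above, and the list-pypi row uses placeholder values.

```python
import sqlite3

# In-memory stand-in for the scheduler's task_type table (queried columns only).
conn = sqlite3.connect(":memory:")
conn.execute("create table task_type (type text primary key, description text, backend_name text)")
conn.executemany(
    "insert into task_type values (?, ?, ?)",
    [
        ("load-git", "Update an origin of type git", "swh.loader.git.tasks.UpdateGitRepository"),
        ("load-hg", "Update an origin of type mercurial", "swh.loader.mercurial.tasks.LoadMercurial"),
        ("load-svn", "Create dump of a remote svn repository, mount it and load it",
         "swh.loader.svn.tasks.DumpMountAndLoadSvnRepository"),
        ("list-pypi", "placeholder description", "placeholder.backend.name"),  # placeholder row
    ],
)

# '%' is the LIKE wildcard; '-' is matched literally.
load_types = [row[0] for row in conn.execute(
    "select type from task_type where type like 'load-%' order by type")]
print(load_types)  # ['load-git', 'load-hg', 'load-svn']
```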

anlambert raised the priority of this task from Low to Normal. May 14 2019, 2:20 PM
ardumont updated the task description.

Task types to migrate:

|---------------+-----------------------------------+---------------------------------+----------|
| status update | type                              | old-type                        |    count |
|---------------+-----------------------------------+---------------------------------+----------|
| x             | check-deposit                     | swh-deposit-archive-checks      |       48 |
| x             | load-deposit                      | swh-deposit-archive-loading     |       52 |
| x             | list-debian-distribution          | swh-lister-debian               |        2 |
| x             | list-github-full                  | swh-lister-github-full          |        1 |
| x             | list-github-incremental           | swh-lister-github-incremental   |        1 |
| x             | list-gitlab-full                  | swh-lister-gitlab-full          |        4 |
| x             | list-gitlab-incremental           | swh-lister-gitlab-incremental   |        4 |
| x             | list-pypi                         | swh-lister-pypi                 |        1 |
| x             | cook-vault-bundle-batch           | swh-vault-batch-cooking         |       58 |
| x             | cook-vault-bundle                 | swh-vault-cooking               |      185 |
| x             | load-hg-from-archive              | origin-load-archive-hg          |        1 |
| x             | index-revision-metadata           | indexer_revision_metadata       |     1447 |
| x             | index-origin-head                 | indexer_origin_head             |   317259 |
| x             | index-fossology-license-for-range | indexer_range_fossology_license |   100000 |
| x             | index-mimetype-for-range          | indexer_range_mimetype          |   100000 |
| x             | load-pypi                         | origin-update-pypi              |   174543 |
| x             | index-origin-metadata             | indexer_origin_metadata         | 52728643 |
| x             | load-git                          | origin-update-git               | 82671870 |
|---------------+-----------------------------------+---------------------------------+----------|

Legend:

  • x: done
  • -: running

Task counts per type after renaming the types in the production task table:
|-----------------------------------+----------|
|               type                |  total   |
|-----------------------------------+----------|
| list-github-incremental           |        1 |
| load-hg-from-archive              |        1 |
| list-pypi                         |        1 |
| list-npm-full                     |        1 |
| list-github-full                  |        1 |
| list-debian-distribution          |        2 |
| list-gitlab-full                  |        4 |
| list-gitlab-incremental           |        4 |
| load-svn                          |        9 |
| load-hg                           |       15 |
| check-deposit                     |       48 |
| load-deposit                      |       52 |
| cook-vault-bundle-batch           |       58 |
| cook-vault-bundle                 |      185 |
| index-revision-metadata           |     1447 |
| load-deb-package                  |    29322 |
| index-fossology-license-for-range |   100000 |
| index-mimetype-for-range          |   100000 |
| load-pypi                         |   174543 |
| index-origin-head                 |   317259 |
| load-npm                          |   983063 |
| index-origin-metadata             | 52728643 |
| load-git                          | 82671894 |
|-----------------------------------+----------|
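The rename itself boils down to one UPDATE per old name, after which a GROUP BY yields per-type totals like the table above. A toy sketch using Python's sqlite3 module in place of the production PostgreSQL database; the `task` table here has only `id` and `type` columns (an assumption made for brevity), `RENAMES` is a three-entry subset of the migration table, and the rows are made up for the example, not production figures.

```python
import sqlite3

# Old -> new names: a subset of the migration table above.
RENAMES = {
    "origin-update-git": "load-git",
    "swh-lister-debian": "list-debian-distribution",
    "indexer_origin_head": "index-origin-head",
}

conn = sqlite3.connect(":memory:")
conn.execute("create table task (id integer primary key, type text not null)")
# Toy rows only; the real table holds tens of millions of tasks.
conn.executemany(
    "insert into task (type) values (?)",
    [("origin-update-git",), ("origin-update-git",), ("swh-lister-debian",),
     ("indexer_origin_head",), ("load-hg",)],
)

with conn:  # commit all renames in a single transaction
    for old, new in RENAMES.items():
        conn.execute("update task set type = ? where type = ?", (new, old))

counts = conn.execute(
    "select type, count(*) from task group by type order by count(*), type"
).fetchall()
print(counts)
# [('index-origin-head', 1), ('list-debian-distribution', 1), ('load-hg', 1), ('load-git', 2)]
```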
ardumont claimed this task.
ardumont updated the task description.