
Align swh backends for migration tools to work
Closed, Resolved (Public)

Description

A recent scrubber migration did not fully comply with the documentation [1] on
migrations (through the swh db upgrade cli), which started some work to fix it [1].
The scrubber migration was fixed, but the fix broke some other modules as a side effect.

After further discussion with @douardda regarding the multiple proposed and somewhat
wrong implementations, we've come to an understanding and a proper plan to fix
everything.

  • Keep the current core code [2] (swh.core v2.10 with code and doc ok and aligned)
  • D7953: Fix swh.storage's datastore to add the attribute current_version
  • D7954: Fix swh.indexer's storage datastore to add the attribute current_version
  • Drop the unnecessary reverting diff [3] (again, the core code is actually ok, provided the fixes above land)
  • Drop diff [4], which is irrelevant as it does not fix the problem (I misunderstood the code)
  • D7958: Clean up unneeded code in scheduler
  • T4305#86415: Checks

[1] https://docs.softwareheritage.org/devel/swh-core/db.html#implementation-of-a-swh-core-db-datastore

[2] D7914

[3] D7943

[4] D7949
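
The plan above relies on each datastore class declaring a current_version attribute that the migration tooling can read. A minimal sketch of that idea follows; the class names and the code_version helper are hypothetical stand-ins for illustration, not the actual swh.storage or swh.core code.

```python
from typing import Optional


class ExampleDatastore:
    """Hypothetical datastore class (not the real swh.storage one)."""

    # Schema version this code expects; declared on the class so it can be
    # read without opening a database connection.
    current_version: int = 183

    def __init__(self, db: str) -> None:
        self.db = db


class LegacyDatastore:
    """Hypothetical datastore class that does not declare the attribute."""


def code_version(datastore_cls: type) -> Optional[int]:
    """Return the declared schema version, or None when it is missing."""
    return getattr(datastore_cls, "current_version", None)


print(code_version(ExampleDatastore))
print(code_version(LegacyDatastore))
```

Reading the attribute from the class rather than from an instance is a design choice of this sketch: it avoids needing a complete configuration just to learn the code version.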

Event Timeline

ardumont renamed this task from Unstuck migration tools for all swh backend module to Align swh backends for migration tools to work. Jun 3 2022, 11:33 AM
ardumont changed the task status from Open to Work in Progress.
ardumont triaged this task as High priority.
ardumont created this task.
  • storage [1]
  • indexer [2]
  • scheduler [3]
  • scrubber [4]
  • vault: status quo, still not working ¯\_(ツ)_/¯ [5]

[1]

swhstorage@storage1:~$ dpkg -l python3-swh.storage | grep ii
ii  python3-swh.storage 1.4.1-1~swh1~bpo10+1 all          Software Heritage storage utilities
swhstorage@storage1:~$ head -5 config.yml
---
storage:
  cls: postgresql
  args:
    db: host=db1.internal.staging.swh.network port=5432 user=swh dbname=swh password=<redacted>
swhstorage@storage1:~$ swh db --config-file config.yml version storage
WARNING the database does not have a dbmodule table.
module: storage
flavor: default
current code version: 183
version: 183

[2]

swhstorage@storage1:~$ head -5 indexer.yml
---
indexer_storage:
  cls: postgresql
  db: host=db1.internal.staging.swh.network port=5432 user=swh-indexer dbname=swh-indexer
    password=<redacted>
swhstorage@storage1:~$ dpkg -l python3-swh.indexer.storage | grep ii
ii  python3-swh.indexer.storage 1.0.0-1~swh1~bpo10+1 all          Software Heritage Content Indexer Storage
swhstorage@storage1:~$ swh db --config-file indexer.yml version indexer --module-config-key=indexer_storage
WARNING the database does not have a dbmodule table.
module: indexer
current code version: 134
version: 133

Note: this was done by manually patching the class to avoid the version bump (blocked migration):

root@storage1:~# grep 'current_version =' /usr/lib/python3/dist-packages/swh/indexer/storage/__init__.py
    current_version = 134
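
The indexer output above shows a code version (134) one ahead of the database version (133). As a rough illustration of how such output can be compared, here is a small sketch that parses the "key: value" lines printed by the version command and reports whether an upgrade is pending. The helper names are hypothetical and this is not how swh.core itself implements the check.

```python
from typing import Dict


def parse_version_output(text: str) -> Dict[str, str]:
    """Collect the 'key: value' lines; other lines (e.g. warnings) are skipped."""
    fields: Dict[str, str] = {}
    for line in text.splitlines():
        key, sep, value = line.partition(": ")
        if sep:
            fields[key.strip()] = value.strip()
    return fields


def migration_status(fields: Dict[str, str]) -> str:
    """Compare the code version with the database version."""
    code = int(fields["current code version"])
    db = int(fields["version"])
    if db == code:
        return "up to date"
    if db < code:
        return f"upgrade needed: {db} -> {code}"
    return f"database ahead of code: {db} > {code}"


# Output copied from the indexer check above.
INDEXER_OUTPUT = """\
WARNING the database does not have a dbmodule table.
module: indexer
current code version: 134
version: 133
"""

print(migration_status(parse_version_output(INDEXER_OUTPUT)))
```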

[3]

swhscheduler@scheduler0:~$ dpkg -l python3-swh.scheduler | grep ii
ii  python3-swh.scheduler 1.2.0-1~swh1~bpo10+1 all          Software Heritage Scheduler
swhscheduler@scheduler0:~$ head -4 scheduler.yml
---
scheduler:
  cls: postgresql
  db: host=db1.internal.staging.swh.network port=5432 dbname=swh-scheduler user=swh-scheduler
    password=<redacted>
swhscheduler@scheduler0:~$ swh db --config-file scheduler.yml version scheduler
WARNING the database does not have a dbmodule table.
module: scheduler
current code version: 33
version: 33

[4]

swhworker@scrubber0:~$ dpkg -l python3-swh.scrubber | grep ii
ii  python3-swh.scrubber 0.0.6-1~swh1~bpo10+1 all          Software Heritage Datastore Scrubber
swhworker@scrubber0:~$ head -4 config.yml
---
scrubber:
  cls: postgresql
  db: host=db1.internal.staging.swh.network port=5432 dbname=swh-scrubber user=swh-scrubber ...
swhworker@scrubber0:~$ swh db --config-file config.yml version scrubber
module: scrubber
current code version: 2
version: 2

[5]

swhvault@vault:~$ dpkg -l python3-swh.vault | grep ii
ii  python3-swh.vault 1.6.1-1~swh1~bpo10+1 all          Software Heritage Vault
swhvault@vault:~$ head -5 config.yml
---
vault:
  cls: postgresql
  db: host=db1.internal.staging.swh.network port=5432 user=swh-vault dbname=swh-vault
    password=<redacted>
swhvault@vault:~$ swh db --config-file config.yml version vault
WARNING the database does not have a dbmodule table.
module: vault
Traceback (most recent call last):
  File "/usr/bin/swh", line 33, in <module>
    sys.exit(load_entry_point('swh.core==2.10', 'console_scripts', 'swh')())
  File "/usr/lib/python3/dist-packages/swh/core/cli/__init__.py", line 184, in main
    return swh(auto_envvar_prefix="SWH")
  File "/usr/lib/python3/dist-packages/click/core.py", line 764, in __call__
    return self.main(*args, **kwargs)
  File "/usr/lib/python3/dist-packages/click/core.py", line 717, in main
    rv = self.invoke(ctx)
  File "/usr/lib/python3/dist-packages/click/core.py", line 1137, in invoke
    return _process_result(sub_ctx.command.invoke(sub_ctx))
  File "/usr/lib/python3/dist-packages/click/core.py", line 1137, in invoke
    return _process_result(sub_ctx.command.invoke(sub_ctx))
  File "/usr/lib/python3/dist-packages/click/core.py", line 956, in invoke
    return ctx.invoke(self.callback, **ctx.params)
  File "/usr/lib/python3/dist-packages/click/core.py", line 555, in invoke
    return callback(*args, **kwargs)
  File "/usr/lib/python3/dist-packages/click/decorators.py", line 17, in new_func
    return f(get_current_context(), *args, **kwargs)
  File "/usr/lib/python3/dist-packages/swh/core/cli/db.py", line 303, in db_version
    datastore = datastore_factory(**cfg)
  File "/usr/lib/python3/dist-packages/swh/vault/__init__.py", line 57, in get_vault
    return Vault(**kwargs)
  File "/usr/lib/python3/dist-packages/swh/vault/backend.py", line 75, in __init__
    self.cache = VaultCache(**config["cache"])
KeyError: 'cache'
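
The traceback above ends in the backend constructor indexing config["cache"] on a configuration that has no such entry. The following sketch reproduces that failure pattern with hypothetical stand-in classes (not the real swh.vault code) to show why instantiating the full backend just to read its version fails here.

```python
from typing import Any, Dict


class StandInCache:
    """Hypothetical replacement for the vault cache class."""

    def __init__(self, **kwargs: Any) -> None:
        self.kwargs = kwargs


class StandInVault:
    """Hypothetical backend whose constructor requires a 'cache' entry."""

    def __init__(self, **config: Any) -> None:
        # Same pattern as the last frame of the traceback: a missing
        # "cache" key raises KeyError before anything else happens.
        self.cache = StandInCache(**config["cache"])


def try_build(config: Dict[str, Any]) -> str:
    """Instantiate the backend and report a missing configuration key."""
    try:
        StandInVault(**config)
    except KeyError as exc:
        return f"missing config key: {exc}"
    return "ok"


print(try_build({"db": "dbname=swh-vault"}))
print(try_build({"db": "dbname=swh-vault", "cache": {}}))
```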

Conclusion: it's mostly [1] ok now. The modules that were not usable with the cli tool
now are.

[1] Except for the vault (T4312), which needs dedicated work.

ardumont claimed this task.
ardumont updated the task description.