
Align swh backends for migration tools to work
Closed, ResolvedPublic


A recent scrubber migration did not fully comply with the documentation [1] on
migrations (through the swh db upgrade cli), which prompted some work to fix it [1].
The scrubber migration was fixed, but the fix broke some other modules as a side effect.

After further discussion with @douardda about the multiple proposed and somewhat
wrong implementations, we have agreed on a proper plan to fix this:

  • Keep the current core code [2] (swh.core v2.10 with code and doc ok and aligned)
  • D7953: Fix's datastore to add the attribute current_version
  • D7954: Fix swh.indexer's storage datastore to add the attribute current_version
  • Drop the unnecessary reverting diff [3] (again, the core code is actually ok provided the fixes above are applied)
  • Drop diff [4], which is irrelevant as it does not fix the problem (I misunderstood the code)
  • D7958: Clean up unneeded code in scheduler
  • T4305#86415: Checks


[2] D7914

[3] D7943

[4] D7949
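
The plan above relies on each datastore class declaring a current_version attribute. Below is a minimal sketch of that idea, not the actual swh code: ExampleDatastore, LegacyDatastore and code_version are hypothetical names, and how the core cli really reads the attribute is an assumption here.

```python
from typing import Optional


class ExampleDatastore:
    """Hypothetical datastore backend; not an actual swh class."""

    # Schema version this code expects (assumed to be bumped with each migration).
    current_version: int = 134

    def __init__(self, db: str) -> None:
        self.db = db


class LegacyDatastore:
    """Hypothetical backend that does not declare the attribute."""


def code_version(datastore_cls: type) -> Optional[int]:
    """Return the version declared on the backend class, or None if absent."""
    return getattr(datastore_cls, "current_version", None)


print(code_version(ExampleDatastore))  # 134
print(code_version(LegacyDatastore))   # None
```

A backend without the attribute yields None in this sketch, which is the kind of gap D7953 and D7954 are meant to close.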

Event Timeline

ardumont renamed this task from Unstuck migration tools for all swh backend module to Align swh backends for migration tools to work. Jun 3 2022, 11:33 AM
ardumont changed the task status from Open to Work in Progress.
ardumont triaged this task as High priority.
ardumont created this task.
  • storage [1]
  • indexer [2]
  • scheduler [3]
  • scrubber [4]
  • vault: status quo, still not working ¯\_(ツ)_/¯ [5]


swhstorage@storage1:~$ dpkg -l | grep ii
ii 1.4.1-1~swh1~bpo10+1 all          Software Heritage storage utilities
swhstorage@storage1:~$ head -5 config.yml
  cls: postgresql
    db: port=5432 user=swh dbname=swh password=<redacted>
swhstorage@storage1:~$ swh db --config-file config.yml version storage
WARNING the database does not have a dbmodule table.
module: storage
flavor: default
current code version: 183
version: 183


swhstorage@storage1:~$ head -5 indexer.yml
  cls: postgresql
  db: port=5432 user=swh-indexer dbname=swh-indexer
swhstorage@storage1:~$ dpkg -l | grep ii
ii 1.0.0-1~swh1~bpo10+1 all          Software Heritage Content Indexer Storage
swhstorage@storage1:~$ swh db --config-file indexer.yml version indexer --module-config-key=indexer_storage
WARNING the database does not have a dbmodule table.
module: indexer
current code version: 134
version: 133

Note: this was obtained by manually patching the class to avoid the version bump (blocked migration):

root@storage1:~# grep 'current_version =' /usr/lib/python3/dist-packages/swh/indexer/storage/
    current_version = 134
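
The indexer output above shows a code version (134) ahead of the database version (133). As a rough sketch, such output can be checked programmatically. The helper names below are hypothetical, the line format is taken from the transcripts in this comment only, and versions are assumed to be consecutive integers.

```python
import re
from typing import Dict


def parse_db_version_output(output: str) -> Dict[str, int]:
    """Extract the code and database versions from `swh db version` output."""
    versions: Dict[str, int] = {}
    for line in output.splitlines():
        m = re.fullmatch(r"\s*(current code version|version):\s*(\d+)\s*", line)
        if m:
            key = "code" if m.group(1) == "current code version" else "db"
            versions[key] = int(m.group(2))
    return versions


def pending_migrations(output: str) -> int:
    """Number of version steps the database is behind the code."""
    versions = parse_db_version_output(output)
    return versions["code"] - versions["db"]


sample = """WARNING the database does not have a dbmodule table.
module: indexer
current code version: 134
version: 133
"""
print(pending_migrations(sample))  # 1
```

A result of 0 corresponds to the storage, scheduler and scrubber outputs, where both versions match.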


swhscheduler@scheduler0:~$ dpkg -l python3-swh.scheduler | grep ii
ii  python3-swh.scheduler 1.2.0-1~swh1~bpo10+1 all          Software Heritage Scheduler
swhscheduler@scheduler0:~$ head -4 scheduler.yml
  cls: postgresql
  db: port=5432 dbname=swh-scheduler user=swh-scheduler
swhscheduler@scheduler0:~$ swh db --config-file scheduler.yml version scheduler
WARNING the database does not have a dbmodule table.
module: scheduler
current code version: 33
version: 33


swhworker@scrubber0:~$ dpkg -l python3-swh.scrubber | grep ii
ii  python3-swh.scrubber 0.0.6-1~swh1~bpo10+1 all          Software Heritage Datastore Scrubber
swhworker@scrubber0:~$ head -4 config.yml
  cls: postgresql
  db: port=5432 dbname=swh-scrubber user=swh-scrubber ...
swhworker@scrubber0:~$ swh db --config-file config.yml version scrubber
module: scrubber
current code version: 2
version: 2


swhvault@vault:~$ dpkg -l python3-swh.vault | grep ii
ii  python3-swh.vault 1.6.1-1~swh1~bpo10+1 all          Software Heritage Vault
swhvault@vault:~$ head -5 config.yml
  cls: postgresql
  db: port=5432 user=swh-vault dbname=swh-vault
swhvault@vault:~$ swh db --config-file config.yml version vault
WARNING the database does not have a dbmodule table.
module: vault
Traceback (most recent call last):
  File "/usr/bin/swh", line 33, in <module>
    sys.exit(load_entry_point('swh.core==2.10', 'console_scripts', 'swh')())
  File "/usr/lib/python3/dist-packages/swh/core/cli/", line 184, in main
    return swh(auto_envvar_prefix="SWH")
  File "/usr/lib/python3/dist-packages/click/", line 764, in __call__
    return self.main(*args, **kwargs)
  File "/usr/lib/python3/dist-packages/click/", line 717, in main
    rv = self.invoke(ctx)
  File "/usr/lib/python3/dist-packages/click/", line 1137, in invoke
    return _process_result(sub_ctx.command.invoke(sub_ctx))
  File "/usr/lib/python3/dist-packages/click/", line 1137, in invoke
    return _process_result(sub_ctx.command.invoke(sub_ctx))
  File "/usr/lib/python3/dist-packages/click/", line 956, in invoke
    return ctx.invoke(self.callback, **ctx.params)
  File "/usr/lib/python3/dist-packages/click/", line 555, in invoke
    return callback(*args, **kwargs)
  File "/usr/lib/python3/dist-packages/click/", line 17, in new_func
    return f(get_current_context(), *args, **kwargs)
  File "/usr/lib/python3/dist-packages/swh/core/cli/", line 303, in db_version
    datastore = datastore_factory(**cfg)
  File "/usr/lib/python3/dist-packages/swh/vault/", line 57, in get_vault
    return Vault(**kwargs)
  File "/usr/lib/python3/dist-packages/swh/vault/", line 75, in __init__
    self.cache = VaultCache(**config["cache"])
KeyError: 'cache'
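
The traceback ends in a KeyError because the configuration handed to the vault has no cache entry. The sketch below shows one way to spot such a gap before instantiating a backend. missing_keys is a hypothetical helper, not part of swh, and the list of required keys is an assumption drawn from this traceback alone.

```python
from typing import Any, Dict, Iterable, List


def missing_keys(config: Dict[str, Any], required: Iterable[str]) -> List[str]:
    """Return the required keys that are absent from the configuration."""
    return [key for key in required if key not in config]


# Mirrors the two keys visible in the `head -5 config.yml` output above.
config = {"cls": "postgresql", "db": "port=5432 user=swh-vault dbname=swh-vault"}

# "cache" is the key named in the KeyError; the tuple itself is illustrative.
print(missing_keys(config, ("db", "cache")))  # ['cache']
```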

Conclusion: it's mostly [1] ok now. The modules that were not usable with the cli tool now are.

[1] Except for the vault (T4312), which needs dedicated work.

ardumont claimed this task.
ardumont updated the task description.