swh-web: Migrate from sqlite to postgresql
ClosedPublic

Authored by ardumont on Mar 31 2021, 3:31 PM.

Details

Summary

This adapts the production settings to allow the use of the postgresql backend.

This does not touch the development or testing settings module though.

Related to T2945

Test Plan

docker tryouts.

Checked that the migration part runs using the django-admin migrate and dumpdata/loaddata stanza.

From the retrieved staging webapp JSON dump:

$ swh-doco exec swh-web /bin/bash
swh@3f80177ef10a:/$ django-admin loaddata /webapp-schema.staging.json
Installed 132 object(s) from 1 fixture(s)

$ psql service=swh-web-dev
swh-web=# \conninfo
You are connected to database "swh-web" as user "postgres" on host "localhost" (address "::1") at port "5437".
swh-web=# select * from save_origin_request;
 id |        request_date        | visit_type |                             origin_url                              |  status  | loading_task_id |         visit_date         | loading_task_status
----+----------------------------+------------+---------------------------------------------------------------------+----------+-----------------+----------------------------+---------------------
  1 | 2020-10-30 18:44:05.807+00 | git        | https://github.com/kubernetes/kubernetes.git                        | accepted |               1 | 2020-10-30 22:49:34.565+00 | succeeded
  2 | 2020-10-30 18:55:31.97+00  | git        | https://gitlab.com/ardumont/home                                    | accepted |               2 | 2020-10-30 19:44:19.999+00 | succeeded
  3 | 2020-10-30 18:55:42.809+00 | git        | https://github.com/NixOS/nixpkgs                                    | accepted |               3 | 2020-10-30 19:45:27.076+00 | failed
  4 | 2020-10-30 18:59:10.378+00 | hg         | https://hg.sr.ht/~zimoun/hello-example                              | accepted |              10 | 2020-11-02 14:34:22.766+00 | failed
  5 | 2020-10-31 10:10:12.777+00 | git        | https://github.com/torvalds/linux                                   | accepted |               4 | 2020-11-01 03:02:47.823+00 | failed
  6 | 2020-11-02 08:33:41.303+00 | git        | https://github.com/videolan/vlc                                     | accepted |               5 | 2020-11-02 09:44:37.583+00 | succeeded
  7 | 2020-11-02 09:54:45.282+00 | git        | https://github.com/torvalds/linux                                   | accepted |               6 |                            | failed
  8 | 2020-11-02 13:53:20.516+00 | git        | https://github.com/torvalds/linux                                   | accepted |               7 |                            | failed
  9 | 2020-11-02 14:01:46.448+00 | git        | https://github.com/torvalds/linux                                   | accepted |               8 | 2020-11-02 16:37:06.489+00 | succeeded
 10 | 2020-11-02 14:03:34.946+00 | git        | https://github.com/NixOS/nixpkgs                                    | accepted |               9 |                            | failed
 11 | 2020-11-02 15:53:18.317+00 | git        | https://github.com/NixOS/nixpkgs                                    | accepted |              12 | 2020-11-03 00:14:12.155+00 | succeeded
 12 | 2020-11-02 17:12:48.265+00 | git        | https://gitorious.org/parmap/parmap.git                             | accepted |              13 | 2020-11-02 17:13:34.731+00 | failed
 13 | 2020-11-03 09:40:50.622+00 | git        | https://github.com/vsellier/easy-cozy                               | accepted |           34059 | 2020-11-03 09:40:56.756+00 | succeeded
 14 | 2020-11-03 22:38:27.531+00 | git        | https://github.com/kubernetes/kubernetes.git                        | accepted |           34069 | 2020-11-03 22:49:50.343+00 | succeeded
 15 | 2020-11-03 22:40:01.333+00 | git        | https://github.com/torvalds/linux                                   | accepted |           34070 | 2020-11-04 00:57:04.078+00 | succeeded
 16 | 2020-11-04 11:08:01.602+00 | git        | https://forge.softwareheritage.org/source/swh-core.git              | pending  |              -1 |                            | not created
 17 | 2020-11-10 08:15:04.852+00 | git        | https://github.com/videolan/vlc                                     | accepted |           34073 | 2020-11-10 08:28:04.666+00 | succeeded

The production-like load succeeded as well, though it was a bit more tedious:

swh@c32204adf7fa:/$ django-admin loaddata --settings=swh.web.settings.production /swh-web.production.json
Installed 71460 object(s) from 1 fixture(s)
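
As a possible cross-check of the "Installed N object(s)" counts above, the objects in a JSON fixture can be counted before loading it. This is only a sketch: it relies on the JSON fixture layout produced by dumpdata (a list of objects, each carrying a "model" label), and the helper name, the model label and the sample rows are hypothetical, not taken from swh-web.

```python
import json
from collections import Counter


def count_fixture_objects(fixture_text):
    """Return (total, per-model counts) for a Django JSON fixture.

    A JSON fixture is a list of objects, each with a "model" label
    of the form "app_label.modelname".
    """
    objects = json.loads(fixture_text)
    per_model = Counter(obj["model"] for obj in objects)
    return len(objects), per_model


# Hypothetical two-row fixture, for illustration only.
sample = json.dumps(
    [
        {"model": "example_app.saveoriginrequest", "pk": 1,
         "fields": {"visit_type": "git"}},
        {"model": "example_app.saveoriginrequest", "pk": 2,
         "fields": {"visit_type": "hg"}},
    ]
)

total, per_model = count_fixture_objects(sample)
print(total, dict(per_model))
```

If the total differs from what loaddata reports, the dump or the load is worth a second look.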

In docker-compose.override.yaml

...
  swh-web:
    volumes:
      - "$SWH_ENVIRONMENT_HOME/swh-web:/src/swh-web"
      - "/tmp/webapp-schema.staging.json:/webapp-schema.staging.json:ro"
      - "/tmp/swh-web.production.json:/swh-web.production.json:ro"

Diff Detail

Repository
rDWAPPS Web applications
Branch
master
Lint
No Linters Available
Unit
No Unit Test Coverage
Build Status
Buildable 20373
Build 31627: Phabricator diff pipeline on jenkins
Build 31626: arc lint + arc unit

Event Timeline

Build is green

Patch application report for D5392 (id=19305)

Rebasing onto a42327ff4f...

Current branch diff-target is up to date.
Changes applied before test
commit 0c652d99b0827c350ae083e031bd5bff92f121f7
Author: Antoine R. Dumont (@ardumont) <ardumont@softwareheritage.org>
Date:   Wed Mar 31 15:20:09 2021 +0200

    swh-web: Migrate from sqlite to postgresql
    
    This adapts the production settings to allow the use of the postgresql backend.
    
    This does not touch the development or testing settings module though.
    
    Related to T2945

See https://jenkins.softwareheritage.org/job/DWAPPS/job/tests-on-diff/643/ for more details.

I added inline comments for some improvements.

Also, you need to add psycopg2 to requirements.txt. We currently get the dependency transitively through swh.storage,
but the Django documentation clearly indicates that it should be explicitly added as a dependency.

swh/web/settings/common.py
99

There is already a default value for development_db in config.py, not sure that change is needed.

swh/web/settings/production.py
34

The config key should keep the name production_db for consistency.

Also, you should rather set default values in config.py, as it keeps the
swh-web configuration separate from the Django settings.

36–60

For better readability, this should be rewritten to:

DATABASES = {
    "default": {
        "ENGINE": "django.db.backends.postgresql",
        "NAME": db_conf.get("name"),
        "HOST": db_conf.get("host"),
        "PORT": db_conf.get("port"),
        "USER": db_conf.get("user"),
        "PASSWORD": db_conf.get("password"),
    }
}
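
For context, a minimal sketch of where db_conf might come from in the snippet above. It assumes db_conf is the production_db section of the loaded swh-web configuration; the swh_web_config dict and all its values here are hypothetical stand-ins, not the real configuration.

```python
# Stand-in for the loaded swh-web configuration (hypothetical values).
swh_web_config = {
    "production_db": {
        "name": "swh-web",
        "host": "db.example.org",
        "port": 5432,
        "user": "swh-web",
        "password": "change-me",
    }
}

# Assumption: db_conf is the "production_db" section of that configuration.
db_conf = swh_web_config["production_db"]

DATABASES = {
    "default": {
        "ENGINE": "django.db.backends.postgresql",
        # dict.get returns None for any key absent from db_conf.
        "NAME": db_conf.get("name"),
        "HOST": db_conf.get("host"),
        "PORT": db_conf.get("port"),
        "USER": db_conf.get("user"),
        "PASSWORD": db_conf.get("password"),
    }
}

print(DATABASES["default"]["NAME"])
```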
This revision now requires changes to proceed. Mar 31 2021, 4:03 PM
ardumont edited the test plan for this revision.

Thanks for the hints, I'll check.

I'm mostly trying to figure out how to try and check the migration from docker. Since I
had started modifying this, I thought I would open this diff.

swh/web/settings/common.py
99

Not sure either.
I had issues and I tried that, I'll double check.

swh/web/settings/common.py
99

Well, we get a KeyError if we do not repeat the development_db key...

swh-web_1                       |     "NAME": swh_web_config["development"]
swh-web_1                       | KeyError: 'development'

This is with the current state of D5391 (which drops that key, as it has nothing to do with
production).
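
A minimal sketch of the failure mode discussed here, using a plain dict as a stand-in for the loaded configuration (the dict and its values are hypothetical). A subscript lookup on a missing key raises KeyError, whereas dict.get returns None.

```python
# Hypothetical configuration that only defines the production database.
config = {"production_db": {"name": "swh-web"}}

# A subscript lookup fails as soon as the key is absent.
try:
    config["development_db"]
except KeyError as exc:
    missing = exc.args[0]
    print(f"KeyError: {missing!r}")

# A tolerant lookup returns None instead of raising.
dev_db = config.get("development_db")
print(dev_db)
```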

swh/web/settings/common.py
99

By "repeat the development key", I mean that if we do not set the development_db key
in config.yml, we get the error above.

So for me that change is more of a cleanup...
production should not require that key.

swh/web/settings/production.py
34

The config key should keep the name production_db for consistency.

right, adapted.

Also, you should rather set default values in config.py, as it keeps the
swh-web configuration separate from the Django settings.

I did not get that part.

36–60

Indeed, I shall adapt the deposit with this as well, thanks.

swh/web/settings/common.py
99

Yes, got it.

Locally, I do not get that error when I remove development_db from my config file,
hence my surprise.

99

I did not test in docker though.

swh/web/settings/production.py
34

config.py can be seen as a specification for swh-web configuration file.

Django settings use some of these configuration values but not all, hence the distinction.

swh/web/settings/common.py
99

The current docker setup, without the other diff, passes because it sets both the dev and prod keys.
Drop the dev key and bim, that would crash.

Adapt according to review.
Docker still happy

swh/web/settings/common.py
99

Nope, just tested in docker (without your diff applied) and it works as expected if you remove the development_db from the swh-web config file.

I have the impression D5391 misses some env files, might be related.

Build is green

Patch application report for D5392 (id=19306)

Rebasing onto a42327ff4f...

Current branch diff-target is up to date.
Changes applied before test
commit 07f06cba53b12d9f0889c335a52649a84ee93550
Author: Antoine R. Dumont (@ardumont) <ardumont@softwareheritage.org>
Date:   Wed Mar 31 15:20:09 2021 +0200

    swh-web: Migrate from sqlite to postgresql
    
    This adapts the production settings to allow the use of the postgresql backend.
    
    This does not touch the development or testing settings module though.
    
    Related to T2945

See https://jenkins.softwareheritage.org/job/DWAPPS/job/tests-on-diff/644/ for more details.

swh/web/config.py
97 ↗(On Diff #19306)

I saw it and then forgot to update it.
I'm not sure what to put as default, though.

Looks good to me!

swh/web/config.py
97 ↗(On Diff #19306)

This is enough from my point of view.

This revision is now accepted and ready to land. Mar 31 2021, 5:05 PM
swh/web/settings/common.py
99

Nope, just tested in docker (without your diff applied) and it works as expected if
you remove the development_db from the swh-web config file.

Something does not compute for me here. I must have misunderstood something.

If the dev key is not present, is the config spec you are referring to below supposed to
complete said information?

I have the impression D5391 misses some env files, might be related.

If env variables are missing [1], I do not think that relates to the KeyError
I am getting... ¯\_(ツ)_/¯ ;)

[1] Which I doubt: the swh-web container refused to start until all of them were properly set.

swh/web/settings/common.py
99

Yes, there is a default value hardcoded in config.py that will be picked up if it is not provided in the config file.
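
The fallback described above can be sketched roughly as follows. This is not the actual swh-web code: DEFAULT_CONFIG, load_config and every value below are hypothetical, and the real config.py may declare and merge its defaults differently.

```python
# Hypothetical defaults, standing in for what config.py hardcodes.
DEFAULT_CONFIG = {
    "development_db": {"name": "/tmp/db.sqlite3"},
    "production_db": {"name": "swh-web"},
}


def load_config(file_config):
    """Merge values read from the config file over the defaults."""
    # Copy each default section so the defaults are never mutated.
    merged = {key: dict(value) for key, value in DEFAULT_CONFIG.items()}
    for key, value in file_config.items():
        if isinstance(value, dict) and isinstance(merged.get(key), dict):
            # Known section: file values override the defaults key by key.
            merged[key].update(value)
        else:
            merged[key] = value
    return merged


# The config file only sets production_db; development_db falls back
# to its default instead of triggering a KeyError later on.
config = load_config({"production_db": {"host": "db.example.org"}})
print(config["development_db"])
print(config["production_db"])
```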

Great, thanks for the review.

I won't land it just yet, so you are not blocked from releasing whatever you wish to.

In the meantime, I'll prepare the dbs first (staging, prod).