Page MenuHomeSoftware Heritage

Introduce a new displayname field for persons in the PostgreSQL storage
ClosedPublic

Authored by olasd on Jan 31 2022, 6:50 PM.

Details

Summary

Extend the APIs for Revisions and Releases to honor the field by default,
unless the new ignore_displayname argument is set.

Test Plan
  • tested the 182.sql migration script manually
  • all tests run without changes (except for one revision_log kwarg)
  • still need to add new tests to exercise the new field

Diff Detail

Repository
rDSTO Storage manager
Branch
master
Lint
No Linters Available
Unit
No Unit Test Coverage
Build Status
Buildable 26412
Build 41291: Phabricator diff pipeline on jenkinsJenkins console · Jenkins
Build 41290: arc lint + arc unit

Event Timeline

Build is green

Patch application report for D7051 (id=25577)

Rebasing onto 6f0252465c...

Current branch diff-target is up to date.
Changes applied before test
commit 281e004ea3c028fa934a48de4e1527e38c6f77a7
Author: Nicolas Dandrimont <nicolas@dandrimont.eu>
Date:   Mon Jan 31 18:46:12 2022 +0100

    Introduce a new displayname field for persons in the PostgreSQL storage
    
    Extend the APIs for Revisions and Releases to honor the field by default,
    unless the new `ignore_displayname` argument is set.

See https://jenkins.softwareheritage.org/job/DSTO/job/tests-on-diff/1543/ for more details.

olasd requested review of this revision.Jan 31 2022, 7:01 PM

Build is green

Patch application report for D7051 (id=25580)

Could not rebase; Attempt merge onto 6f0252465c...

Updating 6f025246..ba98fa36
Fast-forward
 sql/upgrades/182.sql                            | 48 ++++++++++++++
 swh/storage/cassandra/storage.py                | 13 +++-
 swh/storage/in_memory.py                        |  8 ++-
 swh/storage/interface.py                        | 19 +++++-
 swh/storage/postgresql/converters.py            |  5 ++
 swh/storage/postgresql/db.py                    | 52 ++++++++++-----
 swh/storage/postgresql/storage.py               | 30 +++++++--
 swh/storage/sql/30-schema.sql                   | 25 ++++----
 swh/storage/sql/40-funcs.sql                    | 17 +++--
 swh/storage/tests/storage_tests.py              |  6 +-
 swh/storage/tests/test_postgresql.py            | 85 +++++++++++++++++++++++++
 swh/storage/tests/test_postgresql_converters.py | 21 +++---
 12 files changed, 273 insertions(+), 56 deletions(-)
 create mode 100644 sql/upgrades/182.sql
Changes applied before test
commit ba98fa36cc868d2575b283d336b905573c520adc
Author: Nicolas Dandrimont <nicolas@dandrimont.eu>
Date:   Mon Jan 31 18:46:12 2022 +0100

    Introduce a new displayname field for persons in the PostgreSQL storage
    
    Extend the APIs for Revisions and Releases to honor the field by default,
    unless the new `ignore_displayname` argument is set.

commit f868f3c8ad46d4e9bd53ee800ba26bb03443b705
Author: Nicolas Dandrimont <nicolas@dandrimont.eu>
Date:   Mon Jan 31 20:09:43 2022 +0100

    Mostly use normalized Person objects in tests
    
    This opens up the possibility of eventually ignoring the `name` and
    `email` fields stored in database in favor of parsing them again from
    the fullname field (and therefore to update our parsing logic without
    having to affect stored data).

commit d4ddd41535d0ce1cd50d51d297e154bf0ab6e649
Author: Nicolas Dandrimont <nicolas@dandrimont.eu>
Date:   Mon Jan 31 20:08:04 2022 +0100

    postgresql: Use Person.from_fullname if name and email are None
    
    This allows us to populate sensible name and email values out of the new
    displayname field, without having to store them.

See https://jenkins.softwareheritage.org/job/DSTO/job/tests-on-diff/1545/ for more details.

ardumont added 1 blocking reviewer(s): Reviewers.
ardumont added a subscriber: ardumont.

lgtm so far.

Am I right in understanding that there will be a process in which users ask for some
changes (about their display name) and then we are modifying (in db) the new displayname
field in the Person table [1]? And then the reading part will simply use that
displayname (if present) over the fields we used to use for display.

[1] That new displayname field does not enter into the realm of hash computations so no
impact there \o/.

vlorentz added a subscriber: vlorentz.

Looks great, thanks!

Could you add revision_log to the tests (with and without the kwarg)?

swh/storage/tests/test_postgresql.py
291

oh, I didn't realize when reading the code that revision_get (via db_to_author) parses the displayname. Nice.

This may be worth a small comment in the SQL code to explain the two NULLs are replaced by the Python code when possible.

This revision now requires changes to proceed.Feb 1 2022, 10:44 AM
  • (fallout of D7053) Make test_release_add_get_arbitrary non-flaky
  • add suggested comments for NULL name/email
  • add tests for revision_log

Build was aborted

Patch application report for D7051 (id=25589)

Rebasing onto f868f3c8ad...

Current branch diff-target is up to date.
Changes applied before test
commit 4544d7ca8e9eddc823308420d715713e5c253120
Author: Nicolas Dandrimont <nicolas@dandrimont.eu>
Date:   Mon Jan 31 18:46:12 2022 +0100

    Introduce a new displayname field for persons in the PostgreSQL storage
    
    Extend the APIs for Revisions and Releases to honor the field by default,
    unless the new `ignore_displayname` argument is set.

commit 97caa933c84820eada8a6c27393f9dbe67111a7d
Author: Nicolas Dandrimont <nicolas@dandrimont.eu>
Date:   Tue Feb 1 13:01:15 2022 +0100

    Make test_release_add_get_arbitrary non-flaky
    
    It was made flaky by d4ddd41535d0ce1cd50d51d297e154bf0ab6e649.

Link to build: https://jenkins.softwareheritage.org/job/DSTO/job/tests-on-diff/1546/
See console output for more information: https://jenkins.softwareheritage.org/job/DSTO/job/tests-on-diff/1546/console

This revision is now accepted and ready to land.Feb 1 2022, 1:39 PM

Am I right in understanding that there will be a process in which users ask for some
changes (about their display name) and then we are modifying (in db) the new displayname
field in the Person table [1]? And then the reading part will simply use that
displayname (if present) over the fields we used to use for display.

That is correct.

This revision was landed with ongoing or failed builds.Feb 1 2022, 2:05 PM
This revision was automatically updated to reflect the committed changes.