The command refreshes all the active mailmaps, and disables every
mailmap that has recently been updated and marked as disabled, once.
Details
new tests added for all expected behaviors of the management command
Diff Detail
- Repository
- rDWAPPS Web applications
- Lint
Automatic diff as part of commit; lint not applicable. - Unit
Automatic diff as part of commit; unit tests not applicable.
Event Timeline
Build was aborted
Patch application report for D7090 (id=25712)
Rebasing onto 4ed4512e7b...
First, rewinding head to replay your work on top of it... Applying: Pin pytest to < 7.0.0 Using index info to reconstruct a base tree... M requirements-test.txt Falling back to patching base and 3-way merge... No changes -- Patch already applied. Applying: Add a sync_mailmaps management command
Changes applied before test
commit e7538629501dba9c4e6e0783a111ac0510027d36 Author: Nicolas Dandrimont <nicolas@dandrimont.eu> Date: Fri Feb 4 15:19:27 2022 +0100 Add a sync_mailmaps management command The command refreshes all the active mailmaps, and disables every mailmap that has recently been updated and marked as disabled, once.
Link to build: https://jenkins.softwareheritage.org/job/DWAPPS/job/tests-on-diff/1332/
See console output for more information: https://jenkins.softwareheritage.org/job/DWAPPS/job/tests-on-diff/1332/console
Build was aborted
Patch application report for D7090 (id=25710)
Rebasing onto 3f465b3025...
First, rewinding head to replay your work on top of it... Applying: Add a sync_mailmaps management command Using index info to reconstruct a base tree... M requirements-test.txt Falling back to patching base and 3-way merge... Auto-merging requirements-test.txt
Changes applied before test
commit a629b99aa6d469ccc715f87f6f3877f389855415 Author: Nicolas Dandrimont <nicolas@dandrimont.eu> Date: Fri Feb 4 15:19:27 2022 +0100 Add a sync_mailmaps management command The command refreshes all the active mailmaps, and disables every mailmap that has recently been updated and marked as disabled, once.
Link to build: https://jenkins.softwareheritage.org/job/DWAPPS/job/tests-on-diff/1331/
See console output for more information: https://jenkins.softwareheritage.org/job/DWAPPS/job/tests-on-diff/1331/console
Build is green
Patch application report for D7090 (id=25713)
Rebasing onto 4ed4512e7b...
Current branch diff-target is up to date.
Changes applied before test
commit dd2934197e12cc2d6795fb6509c309d4da4fab98 Author: Nicolas Dandrimont <nicolas@dandrimont.eu> Date: Fri Feb 4 15:19:27 2022 +0100 Add a sync_mailmaps management command The command refreshes all the active mailmaps, and disables every mailmap that has recently been updated and marked as disabled, once.
See https://jenkins.softwareheritage.org/job/DWAPPS/job/tests-on-diff/1333/ for more details.
Shouldn't storage_db.cursor() be used as a context manager to make sure transactions committed/rolledback appropriately? (forget this, I wrote it before noticing @transaction.atomic)
Why transaction.atomic() in tests? if I understand pytest-postgresql correctly, this is a noop
Why the quotes around "QuerySet[UserMailmap]" type annotations? Both names are in scope
swh/web/auth/management/commands/sync_mailmaps.py | ||
---|---|---|
40 | What about --dry-run instead, as it is more idiomatic? | |
110 | Looks racey: If someone uses the API to update one of the record between the moment we ran the SELECT and the moment we run this line, then this run uses the old data, and the next run of the script will ignore it. it should the timestamp at the beginning of the function. Plus, this uses the time of the host running the script, while updates from the API use the time of the DB host. I don't know if this is an issue or what to do about it. | |
swh/web/tests/auth/test_mailmap.py | ||
85–99 | why fixtures instead of constants? | |
170–174 | this feels like taking DRY a little too far, and makes it harder to understand what the test actually tests. Why not this? expected_displaynames = { "Original Name V A <from_emailVA@example.com>": "Display Name V A", "Original Name V A 2 <from_emailVA@example.com>": "Display Name V A", "Original Name V A 3 <from_emailVA@example.com>": "Display Name V A", } (same below) |
The management command runs within a *single* swh.storage db transaction (managed manually) and within a single swh.web db transaction (using django's transaction.atomic decorator). Using the cursor as a context manager would commit multiple transactions.
Why transaction.atomic() in tests? if I understand pytest-postgresql correctly, this is a noop
Yeah, that would have been me misunderstanding what transaction=False does in pytest.mark.django_db. I want to make sure that every manual operation on the django db is committed before I'm running the management command, so I think the transaction.atomic() are needed.
Why the quotes around "QuerySet[UserMailmap]" type annotations? Both names are in scope
Because the parametric type annotation isn't part of Django itself, but of types-django, so QuerySet[UserMailmap] is actually undefined even if both names are in scope. I concede I could use from __future__ import annotations, probably.
swh/web/auth/management/commands/sync_mailmaps.py | ||
---|---|---|
40 | I want dry-run to be the default. | |
110 | The whole management command is run within a single django database transaction (see @transaction.atomic decorator), so no, it shouldn't be racy. | |
swh/web/tests/auth/test_mailmap.py | ||
85–99 | meh. | |
170–174 | The "same below" overflows our line length limit, but point taken :-) |
Build is green
Patch application report for D7090 (id=25722)
Rebasing onto 4ed4512e7b...
Current branch diff-target is up to date.
Changes applied before test
commit 9b3c1e97006641c1db96550600b2e904b4444d7d Author: Nicolas Dandrimont <nicolas@dandrimont.eu> Date: Fri Feb 4 15:19:27 2022 +0100 Add a sync_mailmaps management command The command refreshes all the active mailmaps, and disables every mailmap that has recently been updated and marked as disabled, once.
See https://jenkins.softwareheritage.org/job/DWAPPS/job/tests-on-diff/1335/ for more details.
I just checked, and the update is racy. (on postgresql only; it's not on sqlite since sqlite does not allow this kind of concurrency, at least not by default)
Here is how to replicate:
First, change the code like so:
diff --git a/swh/web/auth/management/commands/sync_mailmaps.py b/swh/web/auth/management/commands/sync_mailmaps.py index fdfe9914..4a4a4933 100644 --- a/swh/web/auth/management/commands/sync_mailmaps.py +++ b/swh/web/auth/management/commands/sync_mailmaps.py @@ -98,16 +98,26 @@ def handle(self, *args, **options): ) with psycopg2.connect(options["storage_dbconn"]) as db: - self.disable_mailmaps(db, mailmaps_to_disable) - self.refresh_mailmaps(db, mailmaps_to_refresh) + from django.forms.models import model_to_dict + mailmaps_to_disable = list(mailmaps_to_disable) + mailmaps_to_refresh = list(mailmaps_to_refresh) + print("disabling", [model_to_dict(x) for x in mailmaps_to_disable]) + print("refreshing", [model_to_dict(x) for x in mailmaps_to_refresh]) + print("refreshing", [x.last_update_date for x in mailmaps_to_refresh]) + #self.disable_mailmaps(db, mailmaps_to_disable) + #self.refresh_mailmaps(db, mailmaps_to_refresh) if not options["perform"]: db.rollback() else: db.commit() + print("waiting...") + input("done?") + if options["perform"]: updated = verified_mailmaps.update( mailmap_last_processing_date=timezone.now() ) else: updated = verified_mailmaps.count()
Then, create a user, and prepare an update (in shell 1):
$ django-admin shell --settings=swh.web.settings.tests In [1]: from swh.web.auth.models import UserMailmap In [9]: for x in UserMailmap.objects.all(): ...: x.delete() In [10]: o = UserMailmap.objects.create(from_email="origemail@example.org", from_email_verified=True, display_name="DisplayName 1", display_name_activated=True) In [11]: o.save() In [12]: o2 = UserMailmap.objects.get() In [13]: o2.display_name_activated = False
run the script, don't reply to done? yet (in shell 2):
$ django-admin sync_mailmaps --settings=swh.web.settings.tests service=swh-replica --perform 0 mailmaps to disable, 1 mailmaps to refresh disabling [] refreshing [{'id': 2, 'user_id': None, 'from_email': 'origemail@example.org', 'from_email_verified': True, 'from_email_verification_request_date': None, 'display_name': 'DisplayName 1', 'display_name_activated': True, 'to_email': None, 'to_email_verified': False, 'to_email_verification_request_date': None, 'mailmap_last_processing_date': datetime.datetime(2022, 2, 4, 17, 16, 13, 182014, tzinfo=<UTC>)}] refreshing [datetime.datetime(2022, 2, 4, 17, 16, 0, 346677, tzinfo=<UTC>)] waiting... done?
then, update the user (in shell 1):
In [14]: o2.save()
answer done? in shell 2, and run it again:
y Synced 1 mailmaps to swh.storage database $ django-admin sync_mailmaps --settings=swh.web.settings.tests service=swh-replica --perform 0 mailmaps to disable, 0 mailmaps to refresh disabling [] refreshing [] refreshing [] waiting... done?
the display name should be removed, but it's not.
By the way, I just realized the time filter is only on the filter_to_disable query, not on filter_to_update. How come?
swh/web/auth/management/commands/sync_mailmaps.py | ||
---|---|---|
40 | ok then, could you change the help to "Perform actions (instead of being a dry run)"? I had to check the code to understand what "perform actions" means | |
110 | (continuing this in the main thread) |
Not exactly.
Thanks for the replication steps, they've been useful.
In [14]: o2.save()
This resets all fields in the UserMailmap object, including the ones you haven't touched. In particular, this will reset mailmap_last_processing_date to the value received when the object was get()-ten.
I guess the simplest workaround is to tell django not to reset mailmap_last_processing_date on UserMailmap.save.
Build was aborted
Patch application report for D7090 (id=25731)
Rebasing onto 4ed4512e7b...
Current branch diff-target is up to date.
Changes applied before test
commit 467905ae2d05133bf11f785806263a3c61f3426b Author: Nicolas Dandrimont <nicolas@dandrimont.eu> Date: Fri Feb 4 15:19:27 2022 +0100 Add a sync_mailmaps management command The command refreshes all the active mailmaps, and disables every mailmap that has recently been updated and marked as disabled, once. To do these actions as atomically as possible, we make sure to select all rows for update, and to defer all _date fields in the default queryset manager for UserMailmap objects to avoid resetting them to earlier values by mistake.
Link to build: https://jenkins.softwareheritage.org/job/DWAPPS/job/tests-on-diff/1336/
See console output for more information: https://jenkins.softwareheritage.org/job/DWAPPS/job/tests-on-diff/1336/console
Build has FAILED
Patch application report for D7090 (id=25732)
Rebasing onto 4ed4512e7b...
Current branch diff-target is up to date.
Changes applied before test
commit 9b7b1d1d39260245af5ed98fe06306e1d45cefa6 Author: Nicolas Dandrimont <nicolas@dandrimont.eu> Date: Fri Feb 4 15:19:27 2022 +0100 Add a sync_mailmaps management command The command refreshes all the active mailmaps, and disables every mailmap that has recently been updated and marked as disabled, once. To do these actions as atomically as possible, we make sure to select all rows for update, and to defer all _date fields in the default queryset manager for UserMailmap objects to avoid resetting them to earlier values by mistake.
Link to build: https://jenkins.softwareheritage.org/job/DWAPPS/job/tests-on-diff/1337/
See console output for more information: https://jenkins.softwareheritage.org/job/DWAPPS/job/tests-on-diff/1337/console
Build was aborted
Patch application report for D7090 (id=25734)
Rebasing onto 4ed4512e7b...
Current branch diff-target is up to date.
Changes applied before test
commit 9ec9e9777065eedc653118daff0267fdd85bedd6 Author: Nicolas Dandrimont <nicolas@dandrimont.eu> Date: Fri Feb 4 15:19:27 2022 +0100 Add a sync_mailmaps management command The command refreshes all the active mailmaps, and disables every mailmap that has recently been updated and marked as disabled, once. To do these actions as atomically as possible, we make sure to select all rows for update, and to defer all _date fields in the default queryset manager for UserMailmap objects to avoid resetting them to earlier values by mistake.
Link to build: https://jenkins.softwareheritage.org/job/DWAPPS/job/tests-on-diff/1338/
See console output for more information: https://jenkins.softwareheritage.org/job/DWAPPS/job/tests-on-diff/1338/console
Build is green
Patch application report for D7090 (id=25735)
Rebasing onto 4ed4512e7b...
Current branch diff-target is up to date.
Changes applied before test
commit 39a32f958e1cdfba90aa6ed6b918d30ab3c6d1cd Author: Nicolas Dandrimont <nicolas@dandrimont.eu> Date: Fri Feb 4 15:19:27 2022 +0100 Add a sync_mailmaps management command The command refreshes all the active mailmaps, and disables every mailmap that has recently been updated and marked as disabled, once. To do these actions as atomically as possible, we make sure to select all rows for update, and to defer all _date fields in the default queryset manager for UserMailmap objects to avoid resetting them to earlier values by mistake.
See https://jenkins.softwareheritage.org/job/DWAPPS/job/tests-on-diff/1339/ for more details.