Page MenuHomeSoftware Heritage

Phabricator: Prevent erroneous scheduler tasks disabling
ClosedPublic

Authored by anlambert on Sep 16 2019, 8:27 PM.

Details

Summary

This diff contains the set of fixes for the issues noticed while working on D1982.

Previously, the Phabricator lister was disabling some loading tasks while it was not
supposed to. More precisely, due to an invalid index provided to a database query,
the latest created scheduler task was disabled each time a new page of results was
provided to the lister by the Phabricator API. Moreover, database queries were not
filtered according to the Phabricator instance resulting in possible disabling of
scheduler tasks from other instances.

Closes T2000

Depends on D1984

Test Plan

New tests have been added for both task disabling cases.

I have the feeling that we could encounter such issues in other indexing listers.
Maybe the new tests could be moved in base testing class to check all listers behave
correctly regarding scheduler tasks creation.

Diff Detail

Repository
rDLS Listers
Lint
Automatic diff as part of commit; lint not applicable.
Unit
Automatic diff as part of commit; unit tests not applicable.

Event Timeline

Thanks, sounds good.

I'll need another pass a tad later though ;)

swh/lister/phabricator/lister.py
98–118

I forgot we had that behavior...

152

Now i'm wondering if we don't need to fix that for the other "instance" aware lister (e.g gitlab, cgit, ...)

swh/lister/core/tests/test_lister.py
116

Thanks btw, i did not that one (since 3.5) [1]
I would have probably tried to use "side_effect" instead.

[1] https://docs.python.org/3/library/unittest.mock.html#unittest.mock.Mock

swh/lister/phabricator/lister.py
98–118

I wonder if we need those cleanup actions...
That complexifies and probably slows down the scheduler.

My take is that we could simplify the indexing_lister and removes that method.
Let the scheduler fails enough time (because the repo no longer exists) so that it disables the tasks.

Thoughts?

152

Ok, i remember something changed in the data "model" translation for phabricator.
We used to have the uid which includes the forge_url so that was unique.

-            'uid': self.forge_url + str(repo['id']),
+            'uid': url,

We no longer have that since 87cec2f5c39e86e179189c8d0bbeabafe5b64537.
On second though, i'm just being noisy, that should not change anything, shouldn't it?
I expect url to be unique anyways.

swh/lister/core/tests/test_lister.py
116

I did not know* that one

swh/lister/phabricator/lister.py
98–118

(Not necessarily for this diff though, let's discuss it ;)

ardumont added a subscriber: olasd.
ardumont added inline comments.
swh/lister/phabricator/lister.py
98–118

Ok, so discussing with @olasd, i thought we had that behavior for the scheduler but in the end, it's only for the oneshot tasks and not the recurring ones.
So in the end, right now, we cannot do as i proposed.

Cheers,

This revision is now accepted and ready to land.Sep 17 2019, 10:56 AM
This revision was automatically updated to reflect the committed changes.