Jan 8 2023
Jan 7 2023
Jan 6 2023
Jan 5 2023
Dec 21 2022
Dec 20 2022
I like the idea (I'm a bit worried about the list becoming stale really fast, but...)
This looks sensible. When this is deployed, we'll need to monitor how the label set grows over time, of course!
Apply @vlorentz's comments
Sysadmin projects have been migrated on October 18
We've dropped the pristine-tar branch, which was causing a bunch of issues (and wasn't being used anyway). The GitHub mirroring from Phabricator should now work automatically.
Dec 16 2022
Dec 14 2022
Dec 12 2022
Thanks for looking at this (fwiw, this warning is pretty innocuous for now, and is not what prevents the Debian build; the issue seems to be in pytest).
Dec 9 2022
I like it, thanks!
Dec 8 2022
For --staging, we just want to create a single oneshot full listing task.
Why bother with a full listing?
Since we now have the means to limit the listing, I'd use it. That gives a faster feedback
loop without stressing the upstream forge twice (one round for staging and another
for production within a relatively short interval).
Maybe I need to set the default to something more sensible, though, like 3 pages
of 10 results (so that we can see that the pagination works too). Currently it's a tad
small (1 page of 15 or something).
Dec 7 2022
So, I don't think we want to have to remember whether there are full or incremental versions of the lister.
A couple of suggestions inline, but lgtm otherwise, thanks!
Why not just touch all the files?
Dec 6 2022
Either way, the uuid argument in the cli endpoint should go away!
Does it seem like we're going to use these arguments in another caller of grab_next_visits?
That's a good question. I think not.
So, instead of adding more stuff to the grab_next_visits signature, I would have suggested just calling lister_get in swh/scheduler/cli/origin.py to get the lister uuid, and replacing the --lister-uuid CLI argument with --lister-name and --lister-instance-name (so, only changing the CLI function).
As I mentioned on IRC, I think we should do that "join" directly in swh scheduler origin send-to-celery (and have it error out if the lister name/instance provided don't match an existing lister); using the uuid was an "easy hack" to extend the API of grab_next_visits, but using lister name/instance in the CLI interface makes more sense.
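A rough sketch of the lookup suggested above: resolve the lister from its name and instance name in the CLI layer, and error out when nothing matches. Everything here is a stand-in for illustration: `FakeScheduler`, `Lister`, and `resolve_lister_uuid` are hypothetical, and the `lister_get(name, instance_name)` signature is an assumption rather than the scheduler's confirmed API.

```python
from dataclasses import dataclass
from typing import Dict, Optional, Tuple
from uuid import UUID, uuid4


@dataclass(frozen=True)
class Lister:
    """Hypothetical minimal lister record."""

    id: UUID
    name: str
    instance_name: str


class FakeScheduler:
    """Hypothetical in-memory stand-in for the scheduler backend."""

    def __init__(self) -> None:
        self._listers: Dict[Tuple[str, str], Lister] = {}

    def register(self, name: str, instance_name: str) -> Lister:
        lister = Lister(id=uuid4(), name=name, instance_name=instance_name)
        self._listers[(name, instance_name)] = lister
        return lister

    def lister_get(self, name: str, instance_name: str) -> Optional[Lister]:
        # Assumed signature: returns None when no such lister is registered.
        return self._listers.get((name, instance_name))


def resolve_lister_uuid(
    scheduler: FakeScheduler, lister_name: str, lister_instance_name: str
) -> UUID:
    """Turn --lister-name / --lister-instance-name into the lister uuid."""
    lister = scheduler.lister_get(lister_name, lister_instance_name)
    if lister is None:
        raise ValueError(
            f"No lister named {lister_name!r} with instance {lister_instance_name!r}"
        )
    return lister.id
```

With this shape, the CLI function would call `resolve_lister_uuid` once and pass the resulting uuid on, so the `grab_next_visits` signature would not need to change.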
Dec 5 2022
So, overall, what I'm uncomfortable with is decoding incoming cursors without validating that we generated and issued them ourselves. They're currently untrusted user input that we're decoding with a very simple algorithm, and it'd be easy enough for anyone to fiddle with them (even with the prefix/suffix addition).
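One possible way to address this (an illustration, not necessarily what was adopted) is to sign each cursor with a server-side secret and reject any cursor whose signature does not verify. The names `SECRET_KEY`, `encode_cursor`, and `decode_cursor` are hypothetical, and the cursor layout (a 32-byte HMAC-SHA256 signature followed by the payload, then base64) is an assumption made for this sketch.

```python
import base64
import hashlib
import hmac

# Assumption: a secret known only to the server, loaded from configuration.
SECRET_KEY = b"replace-with-a-real-secret"


def encode_cursor(payload: str, key: bytes = SECRET_KEY) -> str:
    """Prefix the payload with its HMAC-SHA256 signature, then base64-encode."""
    raw = payload.encode("utf-8")
    signature = hmac.new(key, raw, hashlib.sha256).digest()
    return base64.urlsafe_b64encode(signature + raw).decode("ascii")


def decode_cursor(cursor: str, key: bytes = SECRET_KEY) -> str:
    """Decode a cursor, refusing anything we did not sign ourselves."""
    try:
        blob = base64.urlsafe_b64decode(cursor.encode("ascii"))
    except ValueError as exc:  # covers bad base64 and non-ASCII input
        raise ValueError("Malformed cursor") from exc
    # SHA-256 digests are 32 bytes long; the rest is the payload.
    signature, raw = blob[:32], blob[32:]
    expected = hmac.new(key, raw, hashlib.sha256).digest()
    # Constant-time comparison avoids leaking how many bytes matched.
    if not hmac.compare_digest(signature, expected):
        raise ValueError("Cursor was not issued by this server")
    return raw.decode("utf-8")
```

A cursor produced by `encode_cursor` round-trips through `decode_cursor`, while a cursor whose bytes were modified by a client raises `ValueError` before its payload is interpreted.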
Could you add this check?
for prefix in ignored_project_prefixes:
    if not prefix.endswith("/"):
        raise ValueError("Prefixes must end with /")
Otherwise we are going to forget to add slashes at the end, and it will match too many namespaces.
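To illustrate why the trailing slash matters, here is a minimal sketch. Only `ignored_project_prefixes` and the check itself come from the comment above; the `validate_prefixes` and `is_ignored` helpers and the sample project paths are hypothetical.

```python
from typing import Iterable, List


def validate_prefixes(ignored_project_prefixes: Iterable[str]) -> List[str]:
    """Reject any prefix that does not end with a slash."""
    prefixes = list(ignored_project_prefixes)
    for prefix in prefixes:
        if not prefix.endswith("/"):
            raise ValueError("Prefixes must end with /")
    return prefixes


def is_ignored(project_path: str, prefixes: Iterable[str]) -> bool:
    """Hypothetical helper: True if the path starts with an ignored prefix."""
    return any(project_path.startswith(prefix) for prefix in prefixes)


# Without the trailing slash, "foo" also matches the unrelated namespace "foobar".
assert is_ignored("foobar/project", ["foo"])
# With the trailing slash, only projects under the "foo/" namespace match.
assert not is_ignored("foobar/project", ["foo/"])
assert is_ignored("foo/project", ["foo/"])
```

Running the validation once, when the configuration is loaded, would surface a missing slash immediately rather than as silently skipped namespaces later.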