- Group Reviewers
- rDCIDXce84a5e016e7: Add the origin intrinsic metadata indexer
I'm not done yet ;)
Can you please drop the SWH prefix (and the Task suffix as well), that's redundant (we are in a swh module, as per the fqdn shows already).
I'm trying to progressively remove that when i stumble upon it, same for the other task.
It's redundant, isn't it?
You cannot pass here since the method will already have returned if one of them is None.
For information, you may want to update this with the latest swh.storage.algo.snapshot_get_all_branches api olasd opened 
Something is not clear here, what's the indexer? Is that an indexer or a chain?
At first i thought you called the indexer twice...
I have the feeling that's the indexer's next_step call which returns a celery chain (or something, well, a celery task or group of tasks...). But i'm 100% sure ;)
Can you please rename indexer to something more explicit?
I have the following error: P321
Don't know if this is due to local configuration or about the other unlanded diff.
Anyway, looks good.
I'm kinda curious about the argument annotation *, could you point me to documentation explaining its usage?
I was unclear, i was just speaking about the other new one introduced here OriginMetadataTask.
Since everything is touched now... and that impacts the production anyway.
Could you please also remove the suffix Task from all those tasks?
Thanks in advance.
What i failed to explain is that currently, the snapshot_get_latest is a paginated result so you won't have all branches if you keep the old api call.
I think that's relevant here.
It prevents the next arguments from being called as positional arguments, ie. index('foo', 'bar', 'baz') is invalid; you have to use this: index('foo', revisions_metadata='bar', origin_head_map='baz').
That's intended as a way to prevent mistakes that would cause hard-to-find bugs in future versions with a different API.
I too wanted to indent, but flakes8 doesn't agree with us
Will do in a later diff.
doopsy, cf. patch below.
diff --git a/swh/indexer/origin_head.py b/swh/indexer/origin_head.py index 7551fde..7ec69ce 100644 --- a/swh/indexer/origin_head.py +++ b/swh/indexer/origin_head.py @@ -10,7 +10,7 @@ import logging from celery import chain from swh.indexer.indexer import OriginIndexer -from swh.indexer.tasks import OriginMetadataTask, RevisionMetadataTask +from swh.indexer.tasks import OriginMetadata, RevisionMetadata class OriginHeadIndexer(OriginIndexer): @@ -31,8 +31,8 @@ class OriginHeadIndexer(OriginIndexer): CONFIG_BASE_FILENAME = 'indexer/origin_head' - revision_metadata_task = RevisionMetadataTask() - origin_intrinsic_metadata_task = OriginMetadataTask() + revision_metadata_task = RevisionMetadata() + origin_intrinsic_metadata_task = OriginMetadata() def filter(self, ids): yield from ids
sure, the explanation was needed though ;)
@vlorentz And that's wrong. I did not see that ;)
Oct 25 17:32:54 worker01 python3: [2018-10-25 17:32:54,017: ERROR/MainProcess] Task swh.indexer.tasks.OrchestratorAllContents[b39b8a5b-a046-4e0e-98b0-88aeab65af5a] raised unexpected: NotImplementedError('Tasks must define the run method.',) Traceback (most recent call last): File "/usr/lib/python3/dist-packages/celery/app/trace.py", line 240, in trace_task R = retval = fun(*args, **kwargs) File "/usr/lib/python3/dist-packages/celery/app/trace.py", line 438, in __protected_call__ return self.run(*args, **kwargs) File "/usr/lib/python3/dist-packages/celery/app/task.py", line 437, in run raise NotImplementedError('Tasks must define the run method.') NotImplementedError: Tasks must define the run method.
We must keep the swh.scheduler.task here.