Details
- Reviewers
ardumont moranegg - Group Reviewers
Reviewers - Commits
- rDCIDXce84a5e016e7: Add the origin intrinsic metadata indexer
Diff Detail
- Repository
- rDCIDX Metadata indexer
- Branch
- origin-metadata-indexer
- Lint
No Linters Available - Unit
No Unit Test Coverage - Build Status
Buildable 1718 Build 2064: arc lint + arc unit
Event Timeline
I'm not done yet ;)
swh/indexer/__init__.py | ||
---|---|---|
17 | Can you please drop the SWH prefix (and the Task suffix as well), that's redundant (we are in a swh module, as per the fqdn shows already). I'm trying to progressively remove that when i stumble upon it, same for the other task. | |
swh/indexer/origin_head.py | ||
41 | persistent | |
58 | It's redundant, isn't it? You cannot pass here since the method will already have returned if one of them is None. | |
73 | For information, you may want to update this with the latest swh.storage.algo.snapshot_get_all_branches api olasd opened [1] | |
swh/indexer/tasks.py | ||
24 | Something is not clear here, what's the indexer? Is that an indexer or a chain? At first i thought you called the indexer twice... I have the feeling that's the indexer's next_step call which returns a celery chain (or something, well, a celery task or group of tasks...). But i'm 100% sure ;) Can you please rename indexer to something more explicit? |
swh/indexer/tests/__init__.py | ||
---|---|---|
29 | To avoid repeating myself, D518#inline-2716. Also, maybe stack this diff onto D518? |
I have the following error: P321
Don't know if this is due to local configuration or about the other unlanded diff.
Anyway, looks good.
swh/indexer/metadata.py | ||
---|---|---|
298 | I'm kinda curious about the argument annotation *, could you point me to documentation explaining its usage? | |
310 | indentation? |
swh/indexer/__init__.py | ||
---|---|---|
17 |
I was unclear, i was just speaking about the other new one introduced here OriginMetadataTask. Since everything is touched now... and that impacts the production anyway. Could you please also remove the suffix Task from all those tasks? Thanks in advance. |
swh/indexer/origin_head.py | ||
---|---|---|
73 | What i failed to explain is that currently, the snapshot_get_latest is a paginated result so you won't have all branches if you keep the old api call. I think that's relevant here. |
swh/indexer/metadata.py | ||
---|---|---|
298 | It prevents the next arguments from being called as positional arguments, ie. index('foo', 'bar', 'baz') is invalid; you have to use this: index('foo', revisions_metadata='bar', origin_head_map='baz'). That's intended as a way to prevent mistakes that would cause hard-to-find bugs in future versions with a different API. | |
310 | I too wanted to indent, but flakes8 doesn't agree with us | |
swh/indexer/origin_head.py | ||
73 | Will do in a later diff. |
swh/indexer/origin_head.py | ||
---|---|---|
12 | oopsy ;) | |
34 | doopsy, cf. patch below. diff --git a/swh/indexer/origin_head.py b/swh/indexer/origin_head.py index 7551fde..7ec69ce 100644 --- a/swh/indexer/origin_head.py +++ b/swh/indexer/origin_head.py @@ -10,7 +10,7 @@ import logging from celery import chain from swh.indexer.indexer import OriginIndexer -from swh.indexer.tasks import OriginMetadataTask, RevisionMetadataTask +from swh.indexer.tasks import OriginMetadata, RevisionMetadata class OriginHeadIndexer(OriginIndexer): @@ -31,8 +31,8 @@ class OriginHeadIndexer(OriginIndexer): CONFIG_BASE_FILENAME = 'indexer/origin_head' - revision_metadata_task = RevisionMetadataTask() - origin_intrinsic_metadata_task = OriginMetadataTask() + revision_metadata_task = RevisionMetadata() + origin_intrinsic_metadata_task = OriginMetadata() def filter(self, ids): yield from ids | |
73 | sure, the explanation was needed though ;) |
swh/indexer/__init__.py | ||
---|---|---|
17 | Production change: 017daf371998539e1ee12be821a0cf4e009d1f42 |
swh/indexer/tasks.py | ||
---|---|---|
22–28 | @vlorentz And that's wrong. I did not see that ;) In production: Oct 25 17:32:54 worker01 python3[79159]: [2018-10-25 17:32:54,017: ERROR/MainProcess] Task swh.indexer.tasks.OrchestratorAllContents[b39b8a5b-a046-4e0e-98b0-88aeab65af5a] raised unexpected: NotImplementedError('Tasks must define the run method.',) Traceback (most recent call last): File "/usr/lib/python3/dist-packages/celery/app/trace.py", line 240, in trace_task R = retval = fun(*args, **kwargs) File "/usr/lib/python3/dist-packages/celery/app/trace.py", line 438, in __protected_call__ return self.run(*args, **kwargs) File "/usr/lib/python3/dist-packages/celery/app/task.py", line 437, in run raise NotImplementedError('Tasks must define the run method.') NotImplementedError: Tasks must define the run method. We must keep the swh.scheduler.task here. |
swh/indexer/tasks.py | ||
---|---|---|
22–28 | Let's just rename run_task to run |