swh/indexer/indexer.py
 class OriginIndexer(BaseIndexer):
     [...]
     Note: the :class:`OriginIndexer` is not an instantiable object.
     To use it in another context one should inherit from this class
     and override the methods mentioned in the :class:`BaseIndexer`
     class.
     """
     def run(self, origin_urls, policy_update='update-dups',
             next_step=None, **kwargs):
-        """Given a list of origin ids:
+        """Given a list of origin urls:
         - retrieve origins from storage
         - execute the indexing computations
         - store the results (according to policy_update)
         Args:
-            ids ([Union[int, Tuple[str, bytes]]]): list of origin ids or
-                (type, url) tuples.
+            origin_urls ([str]): list of origin urls.
             policy_update (str): either 'update-dups' or 'ignore-dups' to
                 respectively update duplicates (default) or ignore them
             next_step (dict): a dict in the form expected by
                 `scheduler.backend.SchedulerBackend.create_tasks` without
                 `next_run`, plus an optional `result_name` key.
             parse_ids (bool): Do we need to parse id or not (default)
             **kwargs: passed to the `index` method
     [...]
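
The three steps listed in the `run` docstring (retrieve origins, index them, store the results according to `policy_update`) can be illustrated with a minimal, self-contained sketch. This is not the swh.indexer implementation: `FakeStorage`, `SketchOriginIndexer`, `get_origins`, `persist`, and the `tag` keyword are hypothetical names introduced only for illustration, and the handling of `next_step` is left out because the diff does not show it.

```python
# Hypothetical stand-ins for illustration only; none of these classes or
# methods are part of the real swh.indexer or swh.storage APIs.


class FakeStorage:
    """In-memory stand-in for an origin storage, keyed by origin URL."""

    def __init__(self, origins):
        self._origins = {origin["url"]: origin for origin in origins}

    def get_origins(self, urls):
        # Return the known origins in request order, skipping unknown URLs.
        return [self._origins[url] for url in urls if url in self._origins]


class SketchOriginIndexer:
    """Toy indexer mirroring the three steps of the `run` docstring."""

    POLICIES = ("update-dups", "ignore-dups")

    def __init__(self, storage):
        self.storage = storage
        self.results = {}  # url -> last stored result

    def index(self, origin, **kwargs):
        # Toy computation: attach whatever `tag` the caller passed via kwargs.
        return {"url": origin["url"], "tag": kwargs.get("tag")}

    def persist(self, results, policy_update):
        for result in results:
            already_stored = result["url"] in self.results
            if policy_update == "ignore-dups" and already_stored:
                continue  # keep the existing entry untouched
            self.results[result["url"]] = result  # insert or overwrite

    def run(self, origin_urls, policy_update="update-dups", **kwargs):
        if policy_update not in self.POLICIES:
            raise ValueError(f"unknown policy_update: {policy_update!r}")
        origins = self.storage.get_origins(origin_urls)         # 1. retrieve
        results = [self.index(o, **kwargs) for o in origins]    # 2. index
        self.persist(results, policy_update)                    # 3. store
        return results


storage = FakeStorage([{"url": "https://example.org/a"}])
indexer = SketchOriginIndexer(storage)
indexer.run(["https://example.org/a", "https://example.org/missing"], tag="first")
indexer.run(["https://example.org/a"], policy_update="ignore-dups", tag="second")
```

In this sketch the second call leaves the stored entry unchanged because `ignore-dups` skips URLs that already have a result, whereas the default `update-dups` would overwrite it. Callers pass plain URL strings, matching the `origin_urls ([str])` argument introduced by the change.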