Details
- Reviewers
douardda ardumont - Group Reviewers
Reviewers - Maniphest Tasks
- T1328: Add Ruby/Gem metadata indexer
- Commits
- rDCIDX8147565ac8d8: Add gemspec mapping.
Diff Detail
- Repository
- rDCIDX Metadata indexer
- Lint
Automatic diff as part of commit; lint not applicable. - Unit
Automatic diff as part of commit; unit tests not applicable.
Event Timeline
Build has FAILED
Link to build: https://jenkins.softwareheritage.org/job/DCIDX/job/tox/221/
See console output for more information: https://jenkins.softwareheritage.org/job/DCIDX/job/tox/221/console
swh/indexer/metadata_dictionary.py | ||
---|---|---|
502 | For all logging statements, shouldn't it make sens to have the id of the object being indexed ? I've not checked how indexers work so i'm not sure if this id makes sense and is available at this level, but IMHO having warnings and error messages in the logs which we cannot easily link to a known entity (an origin, a revision, etc.) is pretty frustrating. | |
508–510 | unless i'm mistaken, the match var is not used, so why not for line in lines: if self._re_spec_new.match*line): break Another solution is using the dropwhile iterator (itertools), something like: lines = dropwhile(lambda x: not self._re_spec_new.match(x), lines.split('\n')) and move the 'emptyness' detection at the end of the function. | |
538–539 | I don't like too much this kind of if isinstance(str)... What are the expected types for s? Can it be something else than str or None? | |
542 | why forbid tuples or iterators? | |
swh/indexer/tests/test_metadata.py | ||
737 | When writing tests for this kind of feature, please also add limit cases. How about invalid spec files? what if the description (or any other field) is a (literal) multi line string/list/dict? One test for the most basic and "friendly" data is not enough. |
Also the build is not happy for import reason (swh.scheduler.conftest...), maybe a missing bump version somewhere?
swh/indexer/metadata_dictionary.py | ||
---|---|---|
502 | ||
529 | There's a lot of stuff that's not supported (most of which would need a real Ruby interpreter), so it would get very spammy. | |
538–539 | Any basic data type (str, list, dict, bytes, int, float, tuple), we're parsing arbitrary and untrusted input. | |
542 | Ruby does not have tuple literals, and ast.literal_eval never returns iterators. | |
swh/indexer/tests/test_metadata.py | ||
737 |
Build has FAILED
Link to build: https://jenkins.softwareheritage.org/job/DCIDX/job/tox/225/
See console output for more information: https://jenkins.softwareheritage.org/job/DCIDX/job/tox/225/console
Build is green
See https://jenkins.softwareheritage.org/job/DCIDX/job/tox/229/ for more details.
- Re-implement a safe subset of ast.literal_eval instead of checking the type of its output afterward.
Build is green
See https://jenkins.softwareheritage.org/job/DCIDX/job/tox/230/ for more details.
swh/indexer/metadata_dictionary.py | ||
---|---|---|
502 | All logged events from tasks some metadata items attached in the systemd journal (see swh.core.logger.get_extra_data and JournalHandler), notably the (celery) task id. |
Build has FAILED
Link to build: https://jenkins.softwareheritage.org/job/DCIDX/job/tox/249/
See console output for more information: https://jenkins.softwareheritage.org/job/DCIDX/job/tox/249/console
Build is green
See https://jenkins.softwareheritage.org/job/DCIDX/job/tox/264/ for more details.
Build has FAILED
Link to build: https://jenkins.softwareheritage.org/job/DCIDX/job/tox/282/
See console output for more information: https://jenkins.softwareheritage.org/job/DCIDX/job/tox/282/console