Page MenuHomeSoftware Heritage

Add gemspec mapping.
ClosedPublic

Authored by vlorentz on Jan 15 2019, 8:19 PM.

Diff Detail

Repository
rDCIDX Metadata indexer
Lint
Automatic diff as part of commit; lint not applicable.
Unit
Automatic diff as part of commit; unit tests not applicable.

Event Timeline

douardda added a subscriber: douardda.
douardda added inline comments.
swh/indexer/metadata_dictionary.py
502

For all logging statements, shouldn't it make sens to have the id of the object being indexed ? I've not checked how indexers work so i'm not sure if this id makes sense and is available at this level, but IMHO having warnings and error messages in the logs which we cannot easily link to a known entity (an origin, a revision, etc.) is pretty frustrating.

508–510

unless i'm mistaken, the match var is not used, so why not

for line in lines:
  if self._re_spec_new.match*line):
    break

Another solution is using the dropwhile iterator (itertools), something like:

lines = dropwhile(lambda x: not self._re_spec_new.match(x), lines.split('\n'))

and move the 'emptyness' detection at the end of the function.

538–539

I don't like too much this kind of if isinstance(str)... What are the expected types for s? Can it be something else than str or None?
In which case I prefer if s is not None:

542

why forbid tuples or iterators?

swh/indexer/tests/test_metadata.py
737

When writing tests for this kind of feature, please also add limit cases. How about invalid spec files? what if the description (or any other field) is a (literal) multi line string/list/dict?

One test for the most basic and "friendly" data is not enough.

This revision now requires changes to proceed.Jan 16 2019, 10:21 AM
ardumont added inline comments.
swh/indexer/metadata_dictionary.py
527

Maybe put this in a parse_ruby_expression method.

529

It'd be interesting to know what's not supported so a log here would be nice (content's id)

Also the build is not happy for import reason (swh.scheduler.conftest...), maybe a missing bump version somewhere?

vlorentz added inline comments.
swh/indexer/metadata_dictionary.py
502
529

There's a lot of stuff that's not supported (most of which would need a real Ruby interpreter), so it would get very spammy.

538–539

Any basic data type (str, list, dict, bytes, int, float, tuple), we're parsing arbitrary and untrusted input.

542

Ruby does not have tuple literals, and ast.literal_eval never returns iterators.

swh/indexer/tests/test_metadata.py
737
  • Use itertools instead of a for loop.
  • Move Ruby evaluation to its own function.
  • Sanitize the output of eval_ruby_expression.
  • Re-implement a safe subset of ast.literal_eval instead of checking the type of its output afterward.
olasd added inline comments.
swh/indexer/metadata_dictionary.py
502

All logged events from tasks some metadata items attached in the systemd journal (see swh.core.logger.get_extra_data and JournalHandler), notably the (celery) task id.

This revision is now accepted and ready to land.Jan 29 2019, 9:55 AM
This revision was automatically updated to reflect the committed changes.