Page MenuHomeSoftware Heritage

Add gemspec mapping.
ClosedPublic

Authored by vlorentz on Jan 15 2019, 8:19 PM.

Diff Detail

Repository
rDCIDX Object indexer
Lint
Automatic diff as part of commit; lint not applicable.
Unit
Automatic diff as part of commit; unit tests not applicable.

Event Timeline

vlorentz created this revision.Jan 15 2019, 8:19 PM
douardda requested changes to this revision.Jan 16 2019, 10:21 AM
douardda added a subscriber: douardda.
douardda added inline comments.
swh/indexer/metadata_dictionary.py
503

For all logging statements, shouldn't it make sens to have the id of the object being indexed ? I've not checked how indexers work so i'm not sure if this id makes sense and is available at this level, but IMHO having warnings and error messages in the logs which we cannot easily link to a known entity (an origin, a revision, etc.) is pretty frustrating.

509–511

unless i'm mistaken, the match var is not used, so why not

for line in lines:
  if self._re_spec_new.match*line):
    break

Another solution is using the dropwhile iterator (itertools), something like:

lines = dropwhile(lambda x: not self._re_spec_new.match(x), lines.split('\n'))

and move the 'emptyness' detection at the end of the function.

539–540

I don't like too much this kind of if isinstance(str)... What are the expected types for s? Can it be something else than str or None?
In which case I prefer if s is not None:

543

why forbid tuples or iterators?

swh/indexer/tests/test_metadata.py
738

When writing tests for this kind of feature, please also add limit cases. How about invalid spec files? what if the description (or any other field) is a (literal) multi line string/list/dict?

One test for the most basic and "friendly" data is not enough.

This revision now requires changes to proceed.Jan 16 2019, 10:21 AM
ardumont added inline comments.
swh/indexer/metadata_dictionary.py
528

Maybe put this in a parse_ruby_expression method.

530

It'd be interesting to know what's not supported so a log here would be nice (content's id)

Also the build is not happy for import reason (swh.scheduler.conftest...), maybe a missing bump version somewhere?

vlorentz marked 7 inline comments as done.Jan 16 2019, 11:42 AM
vlorentz added inline comments.
swh/indexer/metadata_dictionary.py
503
530

There's a lot of stuff that's not supported (most of which would need a real Ruby interpreter), so it would get very spammy.

539–540

Any basic data type (str, list, dict, bytes, int, float, tuple), we're parsing arbitrary and untrusted input.

543

Ruby does not have tuple literals, and ast.literal_eval never returns iterators.

swh/indexer/tests/test_metadata.py
738
vlorentz updated this revision to Diff 3052.Jan 16 2019, 11:42 AM
  • Use itertools instead of a for loop.
  • Move Ruby evaluation to its own function.
vlorentz updated this revision to Diff 3062.Jan 16 2019, 6:42 PM
  • Sanitize the output of eval_ruby_expression.
vlorentz updated this revision to Diff 3067.Jan 17 2019, 11:38 AM
  • Re-implement a safe subset of ast.literal_eval instead of checking the type of its output afterward.
olasd added a subscriber: olasd.Jan 17 2019, 4:03 PM
olasd added inline comments.
swh/indexer/metadata_dictionary.py
503

All logged events from tasks some metadata items attached in the systemd journal (see swh.core.logger.get_extra_data and JournalHandler), notably the (celery) task id.

vlorentz updated this revision to Diff 3133.Jan 23 2019, 2:40 PM
  • rebase on top of D989.
vlorentz updated this revision to Diff 3159.Jan 24 2019, 3:06 PM
  • Rebase
ardumont accepted this revision.Jan 28 2019, 3:21 PM
douardda accepted this revision.Jan 29 2019, 9:55 AM
This revision is now accepted and ready to land.Jan 29 2019, 9:55 AM
vlorentz updated this revision to Diff 3226.Jan 29 2019, 10:13 AM
  • rebase + squash
This revision was automatically updated to reflect the committed changes.