Page MenuHomeSoftware Heritage

Add gemspec mapping.
ClosedPublic

Authored by vlorentz on Jan 15 2019, 8:19 PM.

Diff Detail

Repository
rDCIDX Metadata indexer
Branch
gemspec
Lint
No Linters Available
Unit
No Unit Test Coverage
Build Status
Buildable 3658
Build 4763: tox-on-jenkinsJenkins
Build 4762: arc lint + arc unit

Event Timeline

douardda added a subscriber: douardda.
douardda added inline comments.
swh/indexer/metadata_dictionary.py
426

For all logging statements, shouldn't it make sens to have the id of the object being indexed ? I've not checked how indexers work so i'm not sure if this id makes sense and is available at this level, but IMHO having warnings and error messages in the logs which we cannot easily link to a known entity (an origin, a revision, etc.) is pretty frustrating.

432–434

unless i'm mistaken, the match var is not used, so why not

for line in lines:
  if self._re_spec_new.match*line):
    break

Another solution is using the dropwhile iterator (itertools), something like:

lines = dropwhile(lambda x: not self._re_spec_new.match(x), lines.split('\n'))

and move the 'emptyness' detection at the end of the function.

462–463

I don't like too much this kind of if isinstance(str)... What are the expected types for s? Can it be something else than str or None?
In which case I prefer if s is not None:

466

why forbid tuples or iterators?

swh/indexer/tests/test_metadata.py
698

When writing tests for this kind of feature, please also add limit cases. How about invalid spec files? what if the description (or any other field) is a (literal) multi line string/list/dict?

One test for the most basic and "friendly" data is not enough.

This revision now requires changes to proceed.Jan 16 2019, 10:21 AM
ardumont added inline comments.
swh/indexer/metadata_dictionary.py
451

Maybe put this in a parse_ruby_expression method.

453

It'd be interesting to know what's not supported so a log here would be nice (content's id)

Also the build is not happy for import reason (swh.scheduler.conftest...), maybe a missing bump version somewhere?

vlorentz added inline comments.
swh/indexer/metadata_dictionary.py
426
453

There's a lot of stuff that's not supported (most of which would need a real Ruby interpreter), so it would get very spammy.

462–463

Any basic data type (str, list, dict, bytes, int, float, tuple), we're parsing arbitrary and untrusted input.

466

Ruby does not have tuple literals, and ast.literal_eval never returns iterators.

swh/indexer/tests/test_metadata.py
698
  • Use itertools instead of a for loop.
  • Move Ruby evaluation to its own function.
  • Sanitize the output of eval_ruby_expression.
  • Re-implement a safe subset of ast.literal_eval instead of checking the type of its output afterward.
olasd added inline comments.
swh/indexer/metadata_dictionary.py
426

All logged events from tasks some metadata items attached in the systemd journal (see swh.core.logger.get_extra_data and JournalHandler), notably the (celery) task id.

This revision is now accepted and ready to land.Jan 29 2019, 9:55 AM
This revision was automatically updated to reflect the committed changes.