
Add support for skipping large contents in from_disk.
Closed, Public

Authored by vlorentz on Feb 20 2020, 4:42 PM.

Details

Reviewers
anlambert
Group Reviewers
Reviewers
Summary

This will be useful to loaders, which currently load the entire
content into memory before deciding whether to skip it.
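As a rough illustration of the idea in the summary, the sketch below checks the file size before reading any data. It is not the code from this diff: `load_content`, the returned dictionary layout, and the `"Content too large"` string are assumptions made for the example; only the `max_content_length` parameter and the `too_large` condition come from the review.

```python
import os
from typing import Any, Dict, Optional


def load_content(path: str,
                 max_content_length: Optional[int] = None) -> Dict[str, Any]:
    """Hypothetical helper: read a file unless it exceeds max_content_length."""
    # The size comes from file metadata, so no data is read at this point.
    length = os.stat(path).st_size
    too_large = (max_content_length is not None
                 and length > max_content_length)
    if too_large:
        # Skip the read entirely and record why the data is missing.
        # The status and reason values are illustrative assumptions.
        return {"length": length, "status": "absent",
                "reason": "Content too large"}
    with open(path, "rb") as f:
        return {"length": length, "status": "visible", "data": f.read()}
```

With `max_content_length=None` the helper always reads the file, so callers that do not pass a limit would see no change in behaviour.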

Diff Detail

Repository
rDMOD Data model
Branch
max-content-length
Lint
No Linters Available
Unit
No Unit Test Coverage
Build Status
Buildable 10715
Build 16075: tox-on-jenkins (Jenkins)
Build 16074: arc lint + arc unit

Event Timeline

anlambert added a subscriber: anlambert.

Some docstrings need to be updated; my other comments are just nitpicks.

swh/model/from_disk.py
113

Documentation for the max_content_length parameter is missing.

117

An alternative way to make flake8 happy without using a backslash:

too_large = (max_content_length is not None
             and length > max_content_length)
265–267

Same here: parentheses can replace the backslash.
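To make the nitpick concrete, here is a small sketch showing that the backslash continuation and the parenthesized form evaluate to the same result. The two function names are invented for this comparison and do not appear in the diff.

```python
from typing import Optional


def too_large_backslash(length: int,
                        max_content_length: Optional[int]) -> bool:
    # Explicit line continuation with a backslash.
    too_large = max_content_length is not None \
        and length > max_content_length
    return too_large


def too_large_parens(length: int,
                     max_content_length: Optional[int]) -> bool:
    # Implicit continuation inside parentheses, as suggested in the review.
    too_large = (max_content_length is not None
                 and length > max_content_length)
    return too_large
```

Both forms short-circuit on `None`, so the comparison with `length` is never attempted when no limit is set.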

swh/model/tests/test_from_disk.py
555

You could write:

assert 'too large' in limited_content.data['reason']

to make flake8 happy and avoid the backslash.

This revision now requires changes to proceed. Feb 20 2020, 5:19 PM
swh/model/tests/test_from_disk.py
555

I prefer an exact match unless there's a reason not to.
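The trade-off between the two assertion styles can be sketched as follows. The exact reason string `"Content too large"` is an assumption used for illustration, and both helper names are hypothetical.

```python
# Assumed message, for illustration only.
EXPECTED_REASON = "Content too large"


def matches_substring(reason: str) -> bool:
    # Looser check: passes for any message that contains "too large".
    return "too large" in reason


def matches_exact(reason: str) -> bool:
    # Stricter check: passes only for the exact expected message.
    return reason == EXPECTED_REASON
```

The substring check also accepts an unintended message such as "Content not too large", which the exact match rejects.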

add doc for max_content_length.

This revision is now accepted and ready to land. Feb 20 2020, 5:33 PM

60c3aa16cf8778c7413606e2a532cfc47966d63c