Page MenuHomeSoftware Heritage

Use python-magic instead of file_magic.
Changes PlannedPublic

Authored by vlorentz on Jan 8 2019, 2:00 PM.

Details

Summary

Depends on D895
Depends on T1504
Resolves T861

Diff Detail

Repository
rDCIDX Object indexer
Branch
python-magic
Lint
No Linters Available
Unit
No Unit Test Coverage
Build Status
Buildable 3350
Build 4318: tox-on-jenkinsJenkins
Build 4317: arc lint + arc unit

Event Timeline

vlorentz created this revision.Jan 8 2019, 2:00 PM
vlorentz updated this revision to Diff 2832.Jan 8 2019, 2:17 PM
  • rebase
ardumont added inline comments.
swh/indexer/mimetype.py
35

With this, the tool registration mechanism must change.
Right now, the tool used is declared in the configuration, thus static.

With this, that becomes dynamic.
To stay consistent, that must be recorded alongside the index result.

Right now, the configuration is the one by default referencing python3-magic [1]

[1] https://forge.softwareheritage.org/source/swh-indexer/browse/master/swh/indexer/mimetype.py$38-44

vlorentz planned changes to this revision.Jan 8 2019, 2:27 PM
vlorentz marked an inline comment as done.
vlorentz added inline comments.
swh/indexer/mimetype.py
35

Agreed. Let's put this diff on hold while we migrate to introspecting tools.

anlambert added inline comments.
requirements.txt
5

I would keep file_magic by default here as python-magic is not packaged in Debian stretch (python3-magic deb package corresponds to file_magic).
Otherwise, a backport package for python-magic must be created for successfull deployment to production environment

vlorentz marked an inline comment as done.Jan 8 2019, 2:29 PM
vlorentz added inline comments.
requirements.txt
5

The problem is that the python-magic in Stretch has a bug that affects us (T861).

anlambert added inline comments.Jan 8 2019, 2:34 PM
requirements.txt
5

Might be related to a bug I also observed in the webapp, I used a dirty workaround that seems to do the job (see https://forge.softwareheritage.org/source/swh-web/browse/master/swh/web/browse/utils.py$71).

vlorentz marked an inline comment as done.Jan 8 2019, 2:36 PM
vlorentz added inline comments.
requirements.txt
5

I don't thing so, the bug T861 always happens for a given content.

anlambert added inline comments.Jan 8 2019, 2:41 PM
requirements.txt
5

Effectively, so backporting the python3-magic deb package from testing to stretch is required before production deployment. I could also benefit from it in the webapp and remove my dirty hack.

vlorentz updated this revision to Diff 2842.Jan 9 2019, 3:17 PM
  • Always use python-magic, drop file_magic.
vlorentz retitled this revision from Add support for python-magic in addition to file_magic, and make it the default. to Use python-magic instead of file_magic..Jan 9 2019, 3:18 PM
vlorentz edited the summary of this revision. (Show Details)
vlorentz updated this revision to Diff 2843.Jan 9 2019, 3:26 PM
  • Fix version.
olasd added a subscriber: olasd.Jan 9 2019, 5:31 PM
olasd added inline comments.
swh/indexer/mimetype.py
16–21

If they're both available with import magic, they're incompatible and shouldn't really be installed at the same time.

Could we just check that the magic module has an attribute that's only available in the version we want, and fail importing if it doesn't?

vlorentz updated this revision to Diff 2864.Jan 9 2019, 5:43 PM
vlorentz marked an inline comment as done.
  • 'import magic' then check it's the right one, instead of using pkg_resources.

(the build failure is caused by D897)

vlorentz updated this revision to Diff 2877.Jan 10 2019, 11:50 AM
  • rebuild
ardumont added a comment.EditedJan 11 2019, 12:13 PM

I'm ok with the idea.

This needs to be in pending state or something.
This diff's code depends on a not-yet existing backport package python-magic to be built IIUC.

ardumont edited the summary of this revision. (Show Details)Feb 11 2019, 4:32 PM
ardumont edited the summary of this revision. (Show Details)
vlorentz planned changes to this revision.Jun 26 2019, 12:05 PM