Page MenuHomeSoftware Heritage

Use python-magic instead of file_magic.
ClosedPublic

Authored by vlorentz on Jan 8 2019, 2:00 PM.

Diff Detail

Repository
rDCIDX Metadata indexer
Branch
python-magic
Lint
No Linters Available
Unit
No Unit Test Coverage
Build Status
Buildable 3283
Build 4231: tox-on-jenkinsJenkins
Build 4230: arc lint + arc unit

Event Timeline

ardumont added inline comments.
swh/indexer/mimetype.py
29

With this, the tool registration mechanism must change.
Right now, the tool used is declared in the configuration, thus static.

With this, that becomes dynamic.
To stay consistent, that must be recorded alongside the index result.

Right now, the configuration is the one by default referencing python3-magic [1]

[1] https://forge.softwareheritage.org/source/swh-indexer/browse/master/swh/indexer/mimetype.py$38-44

vlorentz marked an inline comment as done.
vlorentz added inline comments.
swh/indexer/mimetype.py
29

Agreed. Let's put this diff on hold while we migrate to introspecting tools.

anlambert added inline comments.
requirements.txt
5

I would keep file_magic by default here as python-magic is not packaged in Debian stretch (python3-magic deb package corresponds to file_magic).
Otherwise, a backport package for python-magic must be created for successfull deployment to production environment

vlorentz added inline comments.
requirements.txt
5

The problem is that the python-magic in Stretch has a bug that affects us (T861).

requirements.txt
5

Might be related to a bug I also observed in the webapp, I used a dirty workaround that seems to do the job (see https://forge.softwareheritage.org/source/swh-web/browse/master/swh/web/browse/utils.py$71).

vlorentz added inline comments.
requirements.txt
5

I don't thing so, the bug T861 always happens for a given content.

requirements.txt
5

Effectively, so backporting the python3-magic deb package from testing to stretch is required before production deployment. I could also benefit from it in the webapp and remove my dirty hack.

  • Always use python-magic, drop file_magic.
vlorentz retitled this revision from Add support for python-magic in addition to file_magic, and make it the default. to Use python-magic instead of file_magic..Jan 9 2019, 3:18 PM
vlorentz edited the summary of this revision. (Show Details)
olasd added inline comments.
swh/indexer/mimetype.py
10–15

If they're both available with import magic, they're incompatible and shouldn't really be installed at the same time.

Could we just check that the magic module has an attribute that's only available in the version we want, and fail importing if it doesn't?

vlorentz marked an inline comment as done.
  • 'import magic' then check it's the right one, instead of using pkg_resources.

(the build failure is caused by D897)

I'm ok with the idea.

This needs to be in pending state or something.
This diff's code depends on a not-yet existing backport package python-magic to be built IIUC.

ardumont edited the summary of this revision. (Show Details)
  • rebase
  • squash some commits