Reading tool versions from the config is error-prone: for example, the version declared in the config may not match the one actually installed on the system.
Can you elaborate on how this would be implemented?
In particular, I'd be against trusting any version self-declared by the tool in use (e.g., by invoking --version), precisely because that information might be wrong (e.g., the self-declared version might be stale in the source code). It should be on us, as the project running the tools, to declare which version of each tool we are running. Since this is important reproducibility information, we shouldn't trust the tool's own output.
Maybe this is not what you plan here, but I felt it was useful to anticipate the concern :-)
> any version self-declared by the tool in use

That was my first thought, but since not all tools declare their own version (e.g. file_magic and python-magic don't), that's not possible anyway.
I am currently investigating pkg_resources.
Proposal for Python packages:
    import pkgutil  # stdlib
    import pkg_resources  # part of setuptools

    # Get a "package" by its unique name
    # (avoids name clashes, like between `python-magic` and `file_magic`)
    dist = pkg_resources.get_distribution(dist_spec)

    # This is usually FileFinder for ~/.local/lib/python3.X/site-packages
    # or /usr/lib/python3.X/site-packages
    importer = pkgutil.get_importer(dist.module_path)

    # Actually import the module
    module = importer.find_module(module_name).load_module()
for instance, for python-magic:
    >>> dist_spec = 'python-magic'
    >>> module_name = 'magic'
    >>> dist = pkg_resources.get_distribution(dist_spec)
    >>> importer = pkgutil.get_importer(dist.module_path)
    >>> magic = importer.find_module(module_name).load_module()
    >>> print(dist.version)
    0.4.15
    >>> magic.Magic.from_buffer
    <function Magic.from_buffer at 0x7fb4b08c8c80>
    >>> magic.detect_from_content
    Traceback (most recent call last):
      File "<stdin>", line 1, in <module>
    AttributeError: module 'magic' has no attribute 'detect_from_content'
and for file_magic:
    >>> dist_spec = 'file_magic'
    >>> module_name = 'magic'
    >>> dist = pkg_resources.get_distribution(dist_spec)
    >>> importer = pkgutil.get_importer(dist.module_path)
    >>> magic = importer.find_module(module_name).load_module()
    >>> print(dist.version)
    0.3.0
    >>> magic.Magic.from_buffer
    Traceback (most recent call last):
      File "<stdin>", line 1, in <module>
    AttributeError: type object 'Magic' has no attribute 'from_buffer'
    >>> magic.detect_from_content
    <function detect_from_content at 0x7fb4b08f3c80>
These version numbers are extracted from site-packages/*.dist-info/, which is written by package managers.
That's as close to a unique identifier as we can get using only a name and a version.
To get a really unique identifier, we would also need to know which package manager was used, and the repository used by the package manager. Is it worth it?
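As a partial answer to the "which package manager" question, here is a minimal sketch of what can be read back from a distribution's metadata directory. It assumes Python 3.8+ and uses the stdlib `importlib.metadata` rather than `pkg_resources`. The `DistInfo`, `describe` and `describe_installed` names are made up for illustration. The `INSTALLER` file, when present, names the tool that installed the distribution (typically `pip`); it does not say which repository the files came from, so that part would remain unknown.

```python
import importlib.metadata
from typing import NamedTuple, Optional


class DistInfo(NamedTuple):
    """Hypothetical record of what the metadata directory tells us."""
    name: str
    version: str
    installer: Optional[str]  # e.g. "pip"; None when no INSTALLER file exists


def describe(dist: importlib.metadata.Distribution) -> DistInfo:
    """Summarize name, version and installer of one distribution."""
    # read_text() returns None when the file is absent from the metadata dir.
    installer = dist.read_text("INSTALLER")
    return DistInfo(
        name=dist.metadata["Name"],
        version=dist.version,
        installer=installer.strip() if installer else None,
    )


def describe_installed(dist_name: str) -> DistInfo:
    """Look up an installed distribution by name, e.g. 'python-magic'.

    Raises importlib.metadata.PackageNotFoundError if it is not installed.
    """
    return describe(importlib.metadata.distribution(dist_name))
```

Calling `describe_installed('python-magic')` would then return the same version string as `dist.version` above, plus the installer name if the package manager recorded one.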
Another concern is non-Python packages, like libmagic itself. Other than querying the dpkg database and hoping for the best (i.e. that no other version is installed in ~/.local or /usr/local), I don't see how to do it.
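For what it's worth, querying the dpkg database could look roughly like the sketch below. It assumes a Debian-like system where `dpkg-query` is on the PATH; the `dpkg_version` helper and its `run` parameter are hypothetical names introduced here, and the package name to pass (for libmagic on Debian this is probably `libmagic1`) is an assumption that would have to be checked per distribution. It shares the limitation mentioned above: it only reports what dpkg knows, not whether another copy shadows it.

```python
import subprocess
from typing import Callable, Optional


def dpkg_version(package: str, run: Callable = subprocess.run) -> Optional[str]:
    """Return the version dpkg has recorded for `package`, or None.

    `run` is injectable so the function can be exercised without dpkg.
    """
    try:
        result = run(
            ["dpkg-query", "--show", "--showformat=${Version}", package],
            capture_output=True,
            text=True,
            check=False,
        )
    except FileNotFoundError:
        # dpkg-query is not available: not a dpkg-based system.
        return None
    if result.returncode != 0:
        # dpkg does not know this package.
        return None
    # An empty version means the package is known but not installed.
    return result.stdout.strip() or None
```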
Though if we decide to stick with hardcoding tool versions in the config, we could add some runtime checks for Python packages:
    >>> dist_spec = 'file_magic==0.4.0'
    >>> dist = pkg_resources.get_distribution(dist_spec)
    [...]
    pkg_resources.VersionConflict: (file-magic 0.3.0 (/usr/lib/python3/dist-packages), Requirement.parse('file_magic==0.4.0'))
    >>> dist_spec = 'file_magic==0.3.0'
    >>> dist = pkg_resources.get_distribution(dist_spec)
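If we go that way, the check could be run once at startup over every version declared in the config. The following is only a sketch under a few assumptions: the declared versions are available as a plain dict, Python 3.8+ is used, and the stdlib `importlib.metadata.version` stands in for `pkg_resources`. The `find_version_mismatches` name and its `lookup` parameter are hypothetical. Versions are compared as exact strings, so `0.3` and `0.3.0` would count as a mismatch.

```python
import importlib.metadata
from typing import Callable, Dict, Optional, Tuple


def find_version_mismatches(
    declared: Dict[str, str],
    lookup: Callable[[str], str] = importlib.metadata.version,
) -> Dict[str, Tuple[str, Optional[str]]]:
    """Compare versions declared in the config with the installed ones.

    Returns {dist_name: (declared_version, installed_version_or_None)}
    for every distribution whose installed version differs or is missing.
    """
    mismatches = {}
    for dist_name, expected in declared.items():
        try:
            actual = lookup(dist_name)
        except importlib.metadata.PackageNotFoundError:
            actual = None  # declared in the config but not installed
        if actual != expected:
            mismatches[dist_name] = (expected, actual)
    return mismatches
```

An empty result means the config and the environment agree; anything else can be turned into a startup error or a warning, depending on how strict we want to be.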