This only adds the content mimetype indexer for now.
Testing the fossology license indexer needs some extra mocking work, so it will go in another diff.
Note that it also refactors the test dataset to stop hard-coding wrong ids and use proper hashes from our model instead.
In a further diff after that, we'll drop the obsolete parts and refactor to simplify the existing base code.
Related to T4273