clean up bogus mimetype values in content_mimetype table
Closed, Migrated

Description

Because the indexer will switch to another tool for indexing, the existing bogus values will never be updated, so they need to be cleaned up explicitly.

Event Timeline

Bogus mimetype values are identified by the following query:

softwareheritage=> select count(*) from content_mimetype where mimetype LIKE '[%' or mimetype like '' and indexer_configuration_id=7;
 count
-------
 50733
(1 row)

As soon as the index is ready on prado (I did the analysis on somerset), the cleanup will be done with:

delete from content_mimetype where mimetype LIKE '[%' or mimetype like '' and indexer_configuration_id=7;
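One point worth checking before running the delete: in SQL, AND binds more tightly than OR, so a filter of the form `a OR b AND c` is evaluated as `a OR (b AND c)`. With this WHERE clause, the `indexer_configuration_id=7` restriction therefore applies only to the empty-string pattern, not to the '[%' pattern. Whether that is intended is not stated in this task. The sketch below only illustrates the difference. It uses an in-memory SQLite database rather than the real PostgreSQL instance, and the three-column table and sample rows are made up for the illustration (the actual content_mimetype schema is not shown here).

```python
import sqlite3

# Hypothetical, simplified stand-in for content_mimetype; the real schema
# and data are not shown in this task, so everything below is made up.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE content_mimetype ("
    "id TEXT, mimetype TEXT, indexer_configuration_id INTEGER)"
)
conn.executemany(
    "INSERT INTO content_mimetype VALUES (?, ?, ?)",
    [
        ("a", "[made-up bogus value", 7),  # bogus, configuration 7
        ("b", "", 7),                      # bogus (empty), configuration 7
        ("c", "text/plain", 7),            # valid, configuration 7
        ("d", "[made-up bogus value", 3),  # bogus, another configuration
        ("e", "", 3),                      # bogus (empty), another configuration
    ],
)

# Same filter as in the task, written without parentheses.
UNPARENTHESIZED = (
    "mimetype LIKE '[%' OR mimetype LIKE '' AND indexer_configuration_id = 7"
)
# Variant that restricts both patterns to configuration 7.
PARENTHESIZED = (
    "(mimetype LIKE '[%' OR mimetype LIKE '') AND indexer_configuration_id = 7"
)


def matching_ids(where_clause):
    """Return the sorted ids of the rows selected by the given WHERE clause."""
    query = f"SELECT id FROM content_mimetype WHERE {where_clause}"
    return sorted(row[0] for row in conn.execute(query))


print(matching_ids(UNPARENTHESIZED))  # ['a', 'b', 'd']
print(matching_ids(PARENTHESIZED))    # ['a', 'b']
```

In this toy data, row "d" (a '[%' value from another configuration) is matched only by the unparenthesized form. If the cleanup is meant to touch configuration 7 alone, the parenthesized variant is the one to use; if '[%' values should be removed for every configuration, the statement as written already does that.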
ardumont claimed this task.