As explained in the parent task, bogus values exist for the indexer mimetype.
We need to list them and schedule them for indexing again.
This is to be done after the fix has been deployed (T849).
Depends on T761
One worker (worker08.euwest.azure) has been migrated, so it is working alone for now.
The old tool has id 7; the new one has id 9:
  softwareheritage=> select * from indexer_configuration where id in (7, 9);
   id | tool_name |  tool_version   |                   tool_configuration
  ----+-----------+-----------------+--------------------------------------------------------
    7 | file      | 5.22            | {"command_line": "file --mime <filepath>"}
    9 | file      | 1:5.30-1+deb9u1 | {"type": "library", "debian-package": "python3-magic"}
  (2 rows)
Bogus values produced by the old tool:
  softwareheritage=> select count(*) from content_mimetype where mimetype LIKE '[%' or mimetype like '' and indexer_configuration_id=7;
   count
  -------
   50733
  (1 row)
The list of those ids has been scheduled again, and those contents have been re-indexed.
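Scheduling the listed ids again can be sketched as plain batching. This is a minimal sketch under stated assumptions: `schedule_batch` is a hypothetical callback standing in for whatever actually submits the work (it is not the Software Heritage scheduler API), and the batch size of 1000 is an arbitrary choice.

```python
from typing import Callable, Iterable, Iterator, List


def chunked(ids: Iterable[bytes], size: int) -> Iterator[List[bytes]]:
    """Yield successive batches of at most `size` ids, preserving order."""
    batch: List[bytes] = []
    for content_id in ids:
        batch.append(content_id)
        if len(batch) == size:
            yield batch
            batch = []
    if batch:  # flush the last, possibly shorter, batch
        yield batch


def reschedule(
    ids: Iterable[bytes],
    schedule_batch: Callable[[List[bytes]], None],  # hypothetical submission hook
    batch_size: int = 1000,  # assumed value, not taken from the task
) -> int:
    """Hand ids to `schedule_batch` in batches; return the number of batches sent."""
    sent = 0
    for batch in chunked(ids, batch_size):
        schedule_batch(batch)
        sent += 1
    return sent


# Usage with fake 2-byte ids and a list as the "scheduler".
scheduled: List[List[bytes]] = []
n = reschedule((i.to_bytes(2, "big") for i in range(2500)), scheduled.append)
print(n, [len(b) for b in scheduled])  # 3 [1000, 1000, 500]
```

Batching keeps each scheduling request bounded in size regardless of how many bogus ids the listing query returns.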
Checking the newly indexed values (new tool, id 9), no bogus values are returned:
  softwareheritage=> select count(*) from content_mimetype where (mimetype LIKE '[%' or mimetype like '') and indexer_configuration_id=9;
   count
  -------
       0
  (1 row)
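The two count queries differ in their parentheses. In SQL, AND binds tighter than OR, so the old-tool query without parentheses reads as `mimetype LIKE '[%' OR (mimetype LIKE '' AND indexer_configuration_id=7)` and also counts '['-prefixed rows from any tool. The sketch below shows the difference on an in-memory SQLite table (SQLite uses the same precedence); the rows are made up for illustration and are not taken from the archive.

```python
import sqlite3

# In-memory stand-in for content_mimetype; all rows are hypothetical sample data.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE content_mimetype (id TEXT, mimetype TEXT, indexer_configuration_id INTEGER)"
)
conn.executemany(
    "INSERT INTO content_mimetype VALUES (?, ?, ?)",
    [
        ("a", "[ [application/octet-stream", 7),  # bogus, old tool
        ("b", "", 7),                            # bogus (empty), old tool
        ("c", "[application/x-foo", 9),          # made-up bogus row from the new tool
        ("d", "", 9),                            # made-up empty row from the new tool
        ("e", "text/plain", 7),                  # valid value
    ],
)

# Parsed as: LIKE '[%' OR (LIKE '' AND tool = 7), so row "c" (tool 9) is counted too.
unparenthesized = conn.execute(
    "SELECT count(*) FROM content_mimetype "
    "WHERE mimetype LIKE '[%' OR mimetype LIKE '' AND indexer_configuration_id = 7"
).fetchone()[0]

# Restricts both pattern checks to tool 7.
parenthesized = conn.execute(
    "SELECT count(*) FROM content_mimetype "
    "WHERE (mimetype LIKE '[%' OR mimetype LIKE '') AND indexer_configuration_id = 7"
).fetchone()[0]

print(unparenthesized, parenthesized)  # 3 2
```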
Checking a few ids that had bogus values, there are indeed 2 rows for each (one from the old tool, which is bogus, and one from the new tool, which is not):
  softwareheritage=> select convert_from(mimetype, 'utf-8'), convert_from(encoding, 'utf-8'), indexer_configuration_id from content_mimetype where id='\x8feab4fd3881e396012724e166801bb3a4b41419';
          convert_from         | convert_from | indexer_configuration_id
  -----------------------------+--------------+--------------------------
   [ [application/octet-stream | binary       |                        7
   application/x-mach-binary   | binary       |                        9
  (2 rows)

  softwareheritage=> select convert_from(mimetype, 'utf-8'), convert_from(encoding, 'utf-8'), indexer_configuration_id from content_mimetype where id='\xcd0187768974258b2e959320f52137389b020bce';
           convert_from          | convert_from | indexer_configuration_id
  -------------------------------+--------------+--------------------------
   [ [ [application/octet-stream | binary       |                        7
   application/x-mach-binary     | binary       |                        9
  (2 rows)
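The per-id spot check can be extended to every affected id. This is a minimal sketch, assuming rows of (id, mimetype, tool id) already fetched from content_mimetype with mimetypes decoded to text, and the same bogus criteria as the SQL queries (empty, or starting with '['). `unfixed_ids` is a hypothetical helper; the first four rows mirror the output above, and the last row is made up to show a content that was never re-indexed.

```python
from collections import defaultdict
from typing import Dict, Iterable, List, Tuple

OLD_TOOL, NEW_TOOL = 7, 9  # indexer_configuration ids from the task


def is_bogus(mimetype: str) -> bool:
    """Same criteria as the SQL checks: empty, or starting with '['."""
    return mimetype == "" or mimetype.startswith("[")


def unfixed_ids(rows: Iterable[Tuple[str, str, int]]) -> List[str]:
    """Return ids whose old-tool value is bogus and that lack a clean new-tool value."""
    by_id: Dict[str, Dict[int, str]] = defaultdict(dict)
    for content_id, mimetype, tool_id in rows:
        by_id[content_id][tool_id] = mimetype
    problems = []
    for content_id, values in by_id.items():
        old = values.get(OLD_TOOL)
        new = values.get(NEW_TOOL)
        if old is not None and is_bogus(old) and (new is None or is_bogus(new)):
            problems.append(content_id)
    return sorted(problems)


rows = [
    ("8feab4fd3881e396012724e166801bb3a4b41419", "[ [application/octet-stream", 7),
    ("8feab4fd3881e396012724e166801bb3a4b41419", "application/x-mach-binary", 9),
    ("cd0187768974258b2e959320f52137389b020bce", "[ [ [application/octet-stream", 7),
    ("cd0187768974258b2e959320f52137389b020bce", "application/x-mach-binary", 9),
    ("hypothetical-id", "[application/octet-stream", 7),  # made up: no new-tool row
]
print(unfixed_ids(rows))  # ['hypothetical-id']
```

An empty result would mean every content with a bogus old-tool value now has a clean value from the new tool.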