normalize encoding values across mimetype and language indexers
Closed, Migrated

Authored By

	ardumont
	Jun 6 2017, 1:29 PM

Description

In the language indexer, we need to know the content's encoding so that we can decode the text and compute its language.

As we already process the content in a prior step to detect its mimetype and encoding, we should reuse that encoding.
But an implementation detail prevents this.

The encoding names reported by the 'file' cli (used in the mimetype indexer) do not match the codec names expected by the native decoding of our environment (python).
We should normalize these values.
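
One possible shape for that normalization, as a minimal sketch rather than the indexers' actual code: map the name reported by 'file' to Python's canonical codec name through the standard `codecs.lookup` registry. The function name `normalize_encoding` and the `NON_TEXT_ENCODINGS` set are hypothetical, and treating "binary" and "unknown-8bit" as "no usable encoding" is an assumption.

```python
import codecs
from typing import Optional

# Values that 'file' reports when it cannot identify a text encoding.
# Assumption: the language indexer should skip such content.
NON_TEXT_ENCODINGS = {"binary", "unknown-8bit"}


def normalize_encoding(name: str) -> Optional[str]:
    """Return Python's canonical codec name for an encoding name
    reported by 'file', or None if Python cannot decode with it."""
    key = name.strip().lower()
    if not key or key in NON_TEXT_ENCODINGS:
        return None
    try:
        # codecs.lookup resolves aliases (e.g. "us-ascii") to the
        # codec registered in Python and exposes its canonical name.
        return codecs.lookup(key).name
    except LookupError:
        # Name known to 'file' but not to Python's codec registry.
        return None
```

With this, both indexers could store and compare the same value, and the language indexer could call `raw_bytes.decode(normalized)` directly whenever the result is not None.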

Related Objects

Mentioned In: T722: Improve language indexer performance

Event Timeline

ardumont created this task. Jun 6 2017, 1:29 PM

ardumont mentioned this in T722: Improve language indexer performance.

zack renamed this task from "Reuse encoding detected in mimetype indexer for language indexer" to "normalize encoding values across mimetype and language indexers". Jun 6 2017, 1:53 PM

This task has been migrated to GitLab.
