The language indexer is slow due to the tool used underneath (pygments) and possibly the size of the contents.
To give some details, pygments is used to detect the language because it is the tool that recognizes the most languages.
The problem is that its API works only on text, not on bytes (and we deal with bytes). So we need to detect the content's encoding first and then decode it appropriately.
This has already been improved recently to detect the encoding incrementally, but it is not enough.
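Incremental detection means feeding the raw bytes chunk by chunk and stopping as soon as the detector is confident, instead of scanning the whole content. A minimal sketch, assuming the `chardet` library is used for detection (`detect_encoding` and `chunk_size` are hypothetical names, not the indexer's actual API):

```python
from typing import Optional

from chardet.universaldetector import UniversalDetector


def detect_encoding(raw: bytes, chunk_size: int = 4096) -> Optional[str]:
    """Detect the encoding of raw bytes incrementally.

    Feeds the detector chunk by chunk and stops early once it is
    confident, so large contents are not scanned in full.
    """
    detector = UniversalDetector()
    for i in range(0, len(raw), chunk_size):
        detector.feed(raw[i:i + chunk_size])
        if detector.done:  # confident enough, stop reading
            break
    detector.close()
    return detector.result.get("encoding")
```

This avoids the worst case of decoding-related work scaling with content size, but pygments' own analysis still runs on the full decoded text, hence the hints below.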
Hints:
- use the detected encoding from the mimetype indexer and pass that optional information along.
- take only the first 10kB of the raw contents (as a configuration option)
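Combined, the two hints could look like the following sketch. `detect_language`, `MAX_BYTES`, and the `encoding` parameter are hypothetical names: `encoding` stands in for the value handed over by the mimetype indexer, and `MAX_BYTES` for the proposed configuration option.

```python
from typing import Optional

from pygments.lexers import guess_lexer
from pygments.util import ClassNotFound

# Hypothetical configuration option: only inspect the first 10kB.
MAX_BYTES = 10_000


def detect_language(raw: bytes, encoding: str = "utf-8") -> Optional[str]:
    """Guess the language of raw content, reusing a known encoding.

    Truncates before decoding so pygments only ever sees a small
    snippet, regardless of the content's actual size.
    """
    snippet = raw[:MAX_BYTES].decode(encoding, errors="replace")
    try:
        return guess_lexer(snippet).name
    except ClassNotFound:
        return None
```

Decoding with `errors="replace"` keeps the call from failing when the truncation cuts a multi-byte sequence in half or the detected encoding is slightly off; for a heuristic guess, a few replacement characters are harmless.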