We want metadata in the DB that associates each blob with its intrinsic filetype.
As a first approximation the filetype might be encoded as a MIME type and computed using file --mime-type.
It would also be nice to store the detected encoding (as reported by file --mime-encoding), which would help the webapp considerably.
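A minimal sketch of how this could work, assuming file(1) is installed and each blob is available as raw bytes. It uses file --brief --mime, which reports both the MIME type and the encoding in a single call, and stores the result in a table. The table name blob_filetype and the helper names are hypothetical, chosen only for illustration, and are not part of any existing schema.

```python
import sqlite3
import subprocess
from typing import NamedTuple


class FileType(NamedTuple):
    """MIME type and encoding as reported by file(1)."""
    mimetype: str
    encoding: str


def parse_file_mime(output: str) -> FileType:
    """Parse output of `file --brief --mime`, e.g. 'text/plain; charset=us-ascii'."""
    mimetype, _, params = output.strip().partition(";")
    encoding = ""
    for param in params.split(";"):
        key, _, value = param.strip().partition("=")
        if key == "charset":
            encoding = value
    return FileType(mimetype.strip(), encoding)


def detect_filetype(content: bytes) -> FileType:
    """Run file(1) on a blob's raw bytes, fed through stdin ('-')."""
    result = subprocess.run(
        ["file", "--brief", "--mime", "-"],
        input=content,
        capture_output=True,
        check=True,
    )
    return parse_file_mime(result.stdout.decode("utf-8", errors="replace"))


def store_filetype(conn: sqlite3.Connection, blob_id: str, ft: FileType) -> None:
    """Record the filetype of a blob in a hypothetical blob_filetype table."""
    conn.execute(
        "CREATE TABLE IF NOT EXISTS blob_filetype ("
        "blob_id TEXT PRIMARY KEY, mimetype TEXT NOT NULL, encoding TEXT)"
    )
    conn.execute(
        "INSERT OR REPLACE INTO blob_filetype VALUES (?, ?, ?)",
        (blob_id, ft.mimetype, ft.encoding),
    )
```

Keeping the parsing separate from the subprocess call makes it easy to test without file(1) and to swap in a different detector later; SQLite stands in here for whatever DB actually holds the blob metadata.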
More advanced and structured information could be detected using other tools, some of which are summarized in LWN.net's article "File-format analysis tools for archivists".