Add support for redis-based reporting of invalid mirrored objects

Description

Add support for redis-based reporting of invalid mirrored objects

The idea is that we check the BaseModel validity at journal
deserialization time so that we still have access to the raw object from
kafka for complete reporting (object id plus raw message from kafka).

This uses a new ModelObjectDeserializer class that is responsible for
deserializing the kafka message (still using kafka_to_value) and then
immediately creating the BaseModel object from that dict. Its convert
method is then passed as the value_deserializer argument of the
JournalClient.
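
As a rough illustration of the wiring described above, here is a minimal,
stdlib-only sketch. Every name in it (ToyRelease, toy_kafka_to_value,
ToyModelObjectDeserializer, ToyJournalClient) is a hypothetical stand-in
and not the actual swh code; in particular, JSON replaces the real kafka
value encoding only so that the example runs without extra dependencies,
and the convert(object_type, msg) signature is an assumption.

```python
import json
from dataclasses import dataclass
from typing import Any, Callable, Dict, List


@dataclass(frozen=True)
class ToyRelease:
    """Hypothetical stand-in for a BaseModel subclass."""

    id: bytes
    name: str

    @classmethod
    def from_dict(cls, d: Dict[str, Any]) -> "ToyRelease":
        return cls(id=bytes.fromhex(d["id"]), name=d["name"])


def toy_kafka_to_value(raw: bytes) -> Dict[str, Any]:
    """Stand-in for kafka_to_value (JSON is an assumption of this sketch)."""
    return json.loads(raw.decode("utf-8"))


class ToyModelObjectDeserializer:
    """Decode the raw message, then build the model object right away."""

    def __init__(self, converters: Dict[str, Callable[[Dict[str, Any]], Any]]):
        self.converters = converters

    def convert(self, object_type: str, msg: bytes) -> Any:
        d = toy_kafka_to_value(msg)
        return self.converters[object_type](d)


class ToyJournalClient:
    """Stand-in client that only knows about a value_deserializer callable."""

    def __init__(self, value_deserializer: Callable[[str, bytes], Any]):
        self.value_deserializer = value_deserializer

    def process(self, object_type: str, raw_messages: List[bytes]) -> List[Any]:
        return [self.value_deserializer(object_type, m) for m in raw_messages]


# The deserializer's bound convert method is handed to the client.
deserializer = ToyModelObjectDeserializer({"release": ToyRelease.from_dict})
client = ToyJournalClient(value_deserializer=deserializer.convert)

raw = json.dumps({"id": "ab" * 20, "name": "v1.0"}).encode("utf-8")
objects = client.process("release", [raw])
```

Because the conversion happens inside convert, the raw message bytes are
still in scope at the point where the model object is created.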

Then, for each deserialized object from kafka, if it's a HashableObject,
check its validity by comparing the computed hash with its id.

If it's invalid, report the error in the logs and, if configured,
register the invalid object via the reporter callback.
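
The check-and-report step could look roughly like the sketch below. It is
not the actual implementation: ToyHashable and check_and_report are
hypothetical names, and using sha1 over a payload field is an assumption
made only so that there is a hash to compare against the id.

```python
import hashlib
import logging
from dataclasses import dataclass
from typing import Callable, Dict, Optional

logger = logging.getLogger(__name__)


@dataclass(frozen=True)
class ToyHashable:
    """Hypothetical stand-in for a HashableObject."""

    id: bytes
    payload: bytes

    def compute_hash(self) -> bytes:
        # sha1 of the payload: an illustrative choice for this sketch only
        return hashlib.sha1(self.payload).digest()


def check_and_report(
    obj: ToyHashable,
    raw_msg: bytes,
    reporter: Optional[Callable[[str, bytes], object]] = None,
) -> bool:
    """Return True if the object is valid; log and report it otherwise."""
    if obj.compute_hash() == obj.id:
        return True
    logger.error("Invalid object %s: computed hash differs from its id",
                 obj.id.hex())
    if reporter is not None:
        # key: object id, value: raw message as received
        reporter(obj.id.hex(), raw_msg)
    return False


reported: Dict[str, bytes] = {}
good = ToyHashable(id=hashlib.sha1(b"hello").digest(), payload=b"hello")
bad = ToyHashable(id=b"\x00" * 20, payload=b"hello")

ok_good = check_and_report(good, b"raw-good", reported.__setitem__)
ok_bad = check_and_report(bad, b"raw-bad", reported.__setitem__)
```

The reporter is optional here, so the error is always logged even when no
reporting backend is configured.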

In the cli code, Redis.set() is used as such a callback (if configured).
So it simply stores invalid objects using the object id as key (typically
its swhid), and the raw kafka message value as value.
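
To illustrate the callback shape, here is a small sketch that uses an
in-memory stand-in instead of a real Redis connection. InMemoryRedis,
make_reporter and the "reporter" configuration key are all hypothetical;
the only thing borrowed from redis-py is the set(name, value) calling
convention. The swhid used as key is a made-up example value.

```python
from typing import Any, Callable, Dict, Optional


class InMemoryRedis:
    """Stand-in exposing only a set(name, value) method, like Redis.set."""

    def __init__(self) -> None:
        self.store: Dict[str, bytes] = {}

    def set(self, name: str, value: bytes) -> bool:
        self.store[name] = value
        return True


def make_reporter(config: Dict[str, Any]) -> Optional[Callable[[str, bytes], Any]]:
    """Return the backend's set method as reporter, or None if unconfigured."""
    backend = config.get("reporter")  # hypothetical configuration key
    if backend is None:
        return None
    # With redis-py this would be the bound set method of a Redis client
    # (an assumption of this sketch, not taken from the cli code).
    return backend.set


fake_redis = InMemoryRedis()
reporter = make_reporter({"reporter": fake_redis})

# key: the object id (here a made-up swhid), value: the raw message bytes
key = "swh:1:rel:" + "00" * 20
reporter(key, b"raw kafka message value")
```

Since the callback is just a callable taking a key and a value, any other
key-value store with a compatible method could be plugged in the same way.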

Related to T3693.