
Better Validation in swh.model
Closed, Resolved · Public


The data model defined in swh.model.model declares the model entities and, in particular, the expected type of each entity attribute.

However, these types are currently neither checked nor validated.

mypy could be used to do more (static) type checking, but we are not there yet.

Also, since we are gradually delegating validation of any ingested content (be it from swh.loader or swh.journal.replayer) to the model, we need to enforce its strictness:

  • enforce the checking/validation capabilities of this model (and static validation using mypy might not be enough here) and
  • ensure we have a correctly specified, documented and tested API for this model stack (including the from_dict()/to_dict() methods).

One point to decide is whether we want runtime validation or not (or maybe validation that can be switched off; see attr.set_run_validators() for example).
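To make the trade-off concrete, here is a minimal sketch of runtime validation that can be switched off, using attrs validators together with attr.set_run_validators(). The Timestamp class below is a simplified, hypothetical stand-in and not the actual swh.model entity; build_unchecked is likewise a made-up helper for illustration.

```python
import attr


@attr.s(frozen=True)
class Timestamp:
    # Hypothetical, simplified entity; not the real swh.model.model class.
    seconds = attr.ib(type=int, validator=attr.validators.instance_of(int))
    microseconds = attr.ib(type=int, validator=attr.validators.instance_of(int))


def build_unchecked(seconds, microseconds):
    """Build a Timestamp with validators globally disabled, then restore them.

    Hypothetical helper: note that the switch is process-wide, not per-call.
    """
    previous = attr.get_run_validators()
    attr.set_run_validators(False)
    try:
        return Timestamp(seconds=seconds, microseconds=microseconds)
    finally:
        attr.set_run_validators(previous)


# Validators run by default: a wrong type is rejected at construction time.
try:
    Timestamp(seconds="12", microseconds=0)
    rejected = False
except TypeError:
    rejected = True

# With validators disabled, the same bad value goes through unchecked.
unchecked = build_unchecked("12", 0)
```

Since the attrs switch is global, a trusted bulk path (for instance, replaying already validated objects) could turn validation off for speed, at the cost of letting malformed objects through silently while it is off.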

Some pointers related/useful for this subject:

Event Timeline

douardda triaged this task as Normal priority. Mar 11 2020, 4:06 PM
douardda created this task.
douardda updated the task description.

Also, BaseModel.from_dict is currently pretty inconsistent: sometimes it takes care of instantiating model entities for attributes (e.g. for TimestampWithTimezone.timestamp), sometimes not (e.g. SkippedContent.origin).
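One way to make this consistent would be a generic from_dict that always instantiates nested entities based on the declared attribute type. The sketch below illustrates the idea with standard-library dataclasses as a stand-in for the attrs-based classes; the class names mirror the model, but the fields are simplified and the implementation is an assumption, not the actual swh.model code.

```python
from dataclasses import asdict, dataclass, fields
from typing import Any, Dict


@dataclass(frozen=True)
class BaseModel:
    # Hypothetical base class sketching a uniform from_dict()/to_dict() pair.
    def to_dict(self) -> Dict[str, Any]:
        # asdict() recursively converts nested dataclass instances to dicts.
        return asdict(self)

    @classmethod
    def from_dict(cls, d: Dict[str, Any]):
        kwargs = {}
        for f in fields(cls):
            value = d[f.name]
            # Whenever the declared type is itself a model entity and a plain
            # dict was given, instantiate the entity; no per-class exceptions.
            if (
                isinstance(f.type, type)
                and issubclass(f.type, BaseModel)
                and isinstance(value, dict)
            ):
                value = f.type.from_dict(value)
            kwargs[f.name] = value
        return cls(**kwargs)


@dataclass(frozen=True)
class Timestamp(BaseModel):
    seconds: int
    microseconds: int


@dataclass(frozen=True)
class TimestampWithTimezone(BaseModel):
    # Simplified: only two fields are kept for the example.
    timestamp: Timestamp
    offset: int


d = {"timestamp": {"seconds": 1, "microseconds": 0}, "offset": 60}
obj = TimestampWithTimezone.from_dict(d)
```

With this rule, obj.timestamp is a Timestamp instance and obj.to_dict() gives back the original dictionary, so the round trip can be tested uniformly for every entity.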

Let's consider this done with the landing of D2819 which adds runtime type validation.