
Use hashes for all Kafka keys
Open, Normal, Public

Description

This includes origins, visits, visit statuses, and extrinsic metadata.

Currently we use a mix of hashes and dicts of bytes and str as keys, which is less than ideal.

It would also solve the problem we have with implementing T2668.
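
To illustrate why dict keys are awkward, here is a minimal sketch. It assumes keys end up as raw bytes on the Kafka side, and it uses JSON purely for illustration (not the journal's actual serialization); `dict_key_bytes` and `hashed_key` are hypothetical helpers, not existing functions. A dict serialized naively can produce different bytes for the same logical key, while hashing a canonical form always gives the same fixed-size key.

```python
import hashlib
import json


def dict_key_bytes(key: dict) -> bytes:
    """Naive serialization: the output depends on dict insertion order."""
    return json.dumps(key).encode("utf-8")


def hashed_key(key: dict) -> bytes:
    """Canonicalize the dict (sorted keys), then hash it to a 20-byte key."""
    canonical = json.dumps(key, sort_keys=True, separators=(",", ":"))
    return hashlib.sha1(canonical.encode("utf-8")).digest()


# Two dicts describing the same visit, built in a different order.
a = {"origin": "https://example.org/repo.git", "visit": 1}
b = {"visit": 1, "origin": "https://example.org/repo.git"}

# The naive byte keys differ, so the two would be treated as distinct keys.
print(dict_key_bytes(a) == dict_key_bytes(b))

# The hashed keys are identical and always 20 bytes long.
print(hashed_key(a) == hashed_key(b), len(hashed_key(a)))
```

A fixed-size hash also avoids having every consumer agree on how to decode and compare structured keys.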

Event Timeline

vlorentz updated the task description.

The following objects remain:

  • origin
  • origin_visit
  • origin_visit_status
  • metadata_fetcher
  • metadata_authority

At this point, metadata_fetcher and metadata_authority are basically churn that should get entirely folded into raw_extrinsic_metadata (same as person in revision/release). origin_visit is mostly redundant with its origin_visit_statuses and could probably go away as well.

That leaves origin and origin_visit_status. Now that we have an ExtendedSWHID for origin, we can probably use that as the key (one could argue that origin is redundant with origin_visit(_status), but we still have some origins with zero visits). origin_visit_status could get a raw_extrinsic_metadata-like manifest.
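
A rough sketch of what these two keys could look like, under two assumptions: that the origin identifier is the SHA-1 of the UTF-8 encoded origin URL (rendered as `swh:1:ori:<hex>`), and that an origin_visit_status manifest is a newline-separated list of fields. The manifest layout and the `visit_status_key` helper are hypothetical and only meant to make the idea concrete; nothing here is an agreed format.

```python
import hashlib


def origin_key(url: str) -> bytes:
    """Key for an origin: SHA-1 of its URL (assumed identifier scheme)."""
    return hashlib.sha1(url.encode("utf-8")).digest()


def origin_swhid(url: str) -> str:
    """Textual form of the origin identifier, as an extended SWHID."""
    return f"swh:1:ori:{origin_key(url).hex()}"


def visit_status_key(origin_url: str, visit: int, date_iso: str, status: str) -> bytes:
    """Key for a visit status: SHA-1 of a hypothetical text manifest."""
    manifest = "\n".join(
        [
            f"origin {origin_swhid(origin_url)}",
            f"visit {visit}",
            f"date {date_iso}",
            f"status {status}",
        ]
    ).encode("utf-8")
    return hashlib.sha1(manifest).digest()


url = "https://example.org/repo.git"
print(origin_swhid(url))
print(visit_status_key(url, 1, "2021-01-01T00:00:00+00:00", "full").hex())
```

With this shape, every topic would be keyed by 20 bytes, and two statuses of the same visit would get distinct keys as long as any manifest field differs.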

👍

(and we should keep the origin topic; we already have an ExtSWHID for origins anyway)