Fix CardinalityViolation in grab_next_visits on duplicate origins

Description

grab_next_visits selects rows from listed_origins, whose primary key is
(lister_id, url, visit_type), and uses them to upsert into origin_visit_stats,
whose primary key is (url, visit_type).
When the same (origin, type) pair is grabbed twice in one call (which the
extra lister_id column in the source key allows), the upsert fails with the
error `ON CONFLICT DO UPDATE command cannot affect row a second time`.

This commit deduplicates the (origin, type) pairs before upserting.
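
To illustrate the idea, here is a minimal Python sketch of deduplicating on the target table's key before an upsert. It is not the code from this commit, which may well deduplicate inside the SQL query itself; the `ListedOrigin` tuple and the `dedupe_for_upsert` helper are hypothetical names introduced only for this example.

```python
from typing import Iterable, List, NamedTuple, Tuple


class ListedOrigin(NamedTuple):
    """Hypothetical stand-in for a row of listed_origins."""

    lister_id: str
    url: str
    visit_type: str


def dedupe_for_upsert(rows: Iterable[ListedOrigin]) -> List[Tuple[str, str]]:
    """Return each (url, visit_type) pair once, in first-seen order.

    The pairs match the primary key of origin_visit_stats, so a single
    upsert statement built from them never touches the same row twice.
    """
    seen = set()
    unique = []
    for row in rows:
        key = (row.url, row.visit_type)
        if key not in seen:
            seen.add(key)
            unique.append(key)
    return unique
```

With two listers reporting the same git origin, the helper keeps a single (url, visit_type) pair for that origin, which is the property the upsert needs.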

Details

Provenance
vlorentz authored on Nov 22 2021, 1:32 PM
vlorentz pushed on Nov 22 2021, 3:51 PM
Differential Revision
D6664: Fix CardinalityViolation in grab_next_visits on duplicate origins
Parents
rDSCH00ff02eab9c9: recurrent visits: use policy weights instead of ratios
Branches
Unknown
Tags
Unknown
References
tag: v0.20.0
Build Status
Buildable 25101
Build 39217: test-and-build