
provenance: origin processed several times by different clients
Closed, Invalid (Public)

Description

The origin clients are processing the same origin several times:

./client.py 5

INFO:swh.provenance.origin:Processing origin https://anongit.kde.org/apper.git
INFO:swh.provenance.origin:Processing origin https://anongit.kde.org/apper.git
INFO:swh.provenance.origin:Processing origin https://anongit.kde.org/apper.git
INFO:swh.provenance.origin:Processed origin https://anongit.kde.org/apper.git in 0:00:11.722244
INFO:swh.provenance.origin:Processed origin https://anongit.kde.org/apper.git in 0:00:11.912540
INFO:swh.provenance.origin:Processed origin https://anongit.kde.org/apper.git in 0:00:11.996918
INFO:swh.provenance.origin:Processing origin https://anongit.kde.org/apper.git
INFO:swh.provenance.origin:Processing origin https://anongit.kde.org/apper.git
INFO:swh.provenance.origin:Processing origin https://anongit.kde.org/appimage-packaging.git
INFO:swh.provenance.origin:Processed origin https://anongit.kde.org/appimage-packaging.git in 0:00:02.261688
INFO:swh.provenance.origin:Processing origin https://anongit.kde.org/appimage-packaging.git
INFO:swh.provenance.origin:Processed origin https://anongit.kde.org/appimage-packaging.git in 0:00:00.413402
INFO:swh.provenance.origin:Processing origin https://anongit.kde.org/appmenu-runner.git
INFO:swh.provenance.origin:Processed origin https://anongit.kde.org/appmenu-runner.git in 0:00:00.742465
INFO:swh.provenance.origin:Processing origin https://anongit.kde.org/appmenu-runner.git
INFO:swh.provenance.origin:Processed origin https://anongit.kde.org/apper.git in 0:00:05.088323
INFO:swh.provenance.origin:Processed origin https://anongit.kde.org/appmenu-runner.git in 0:00:00.395179
INFO:swh.provenance.origin:Processing origin https://anongit.kde.org/appstream-runner.git
INFO:swh.provenance.origin:Processing origin https://anongit.kde.org/appstream-runner.git
INFO:swh.provenance.origin:Processed origin https://anongit.kde.org/apper.git in 0:00:05.518890
INFO:swh.provenance.origin:Processed origin https://anongit.kde.org/appstream-runner.git in 0:00:00.459686
INFO:swh.provenance.origin:Processed origin https://anongit.kde.org/appstream-runner.git in 0:00:00.478736
INFO:swh.provenance.origin:Processing origin https://anongit.kde.org/appstream-runner.git
INFO:swh.provenance.origin:Processing origin https://anongit.kde.org/ark.git
INFO:swh.provenance.origin:Processing origin https://anongit.kde.org/ark.git
INFO:swh.provenance.origin:Processed origin https://anongit.kde.org/appstream-runner.git in 0:00:00.504360
INFO:swh.provenance.origin:Processing origin https://anongit.kde.org/ark.git

The origin server is probably sending the same URL until an ack is received.

Event Timeline

vsellier triaged this task as Normal priority. Jun 2 2022, 3:40 PM
vsellier created this task.

It's not as bad as expected: it seems only 2 clients are processing the same origin at the same time:

(provenance) provenance-client01:~/provenance/provenance-tools/origins$ sort ~/origin-client.log | uniq -c  | sort -n | grep -v 50 | tail -n 10
      2 INFO:swh.provenance.origin:Processing origin https://anongit.kde.org/clones/nepomuk-web-extractor/vhanda/dms.git
      2 INFO:swh.provenance.origin:Processing origin https://anongit.kde.org/clones/networkmanagement/borzenkov/networkmanagement.git
      2 INFO:swh.provenance.origin:Processing origin https://anongit.kde.org/clones/networkmanagement/gokcen/networkmanagement.git
      2 INFO:swh.provenance.origin:Processing origin https://anongit.kde.org/clones/ocs-cdn/kvermette/ocs-cdn-privatekeys.git
      2 INFO:swh.provenance.origin:Processing origin https://anongit.kde.org/clones/okteta/kossebau/okteta.git
      2 INFO:swh.provenance.origin:Processing origin https://anongit.kde.org/clones/okular/azatkhuzhin/okular.git
      2 INFO:swh.provenance.origin:Processing origin https://anongit.kde.org/clones/okular/chinmoyr/okulardigitalsignature.git
      2 INFO:swh.provenance.origin:Processing origin https://anongit.kde.org/clones/okular/mamun/text_selection_and_highlighting.git
      2 INFO:swh.provenance.origin:Processing origin https://anongit.kde.org/clones/okular/thomasfischer/bestfit.git
      2 INFO:swh.provenance.origin:Processing origin https://anongit.kde.org/clones/phonon-gstreamer/garg/phonon-gstreamer.git

After some ingestion time, it seems the first analysis was wrong:

(provenance) provenance-client01:~$ sort ~/origin-client.log |grep Processing | uniq -c  | sort -n | tail -n 10
      6 INFO:swh.provenance.origin:Processing origin https://bitbucket.org/360factors/workflow-conditions.git
      6 INFO:swh.provenance.origin:Processing origin https://bitbucket.org/4s/cda-document-viewer.git
      6 INFO:swh.provenance.origin:Processing origin https://bitbucket.org/4tic/goc.git
      6 INFO:swh.provenance.origin:Processing origin https://bitbucket.org/ALEXks/sapfor_2017.git
      6 INFO:swh.provenance.origin:Processing origin https://bitbucket.org/ASIWeb/testcomplete.git
      7 INFO:swh.provenance.origin:Processing origin https://anonhg.netbsd.org/xsrc-draft/
      7 INFO:swh.provenance.origin:Processing origin https://anonhg.netbsd.org/xsrc-public/
      8 INFO:swh.provenance.origin:Processing origin https://anonhg.netbsd.org/xsrc/
     13 INFO:swh.provenance.origin:Processing origin https://bitbucket.org/9front/plan9front
     20 INFO:swh.provenance.origin:Processing origin https://anongit.kde.org/kdenlive.git
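
The `sort | grep | uniq -c` pipeline above can also be written in Python, which makes it easy to match only on the exact message prefix. This is a minimal sketch: the function name `count_processing` is hypothetical, and the only assumption is the log line format shown in this task.

```python
from collections import Counter

# Prefix of the log lines emitted when a client starts working on an origin,
# copied from the log excerpts in this task.
PREFIX = "INFO:swh.provenance.origin:Processing origin "


def count_processing(lines):
    """Count how often each origin URL appears in 'Processing origin' lines.

    `lines` is any iterable of log lines (for example an open file object).
    'Processed origin' lines and anything else are ignored.
    """
    counts = Counter()
    for line in lines:
        line = line.strip()
        if line.startswith(PREFIX):
            counts[line[len(PREFIX):]] += 1
    return counts
```

Calling `count_processing(open("origin-client.log")).most_common(10)` would list the ten most frequently processed origins, highest count first.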

This issue was a false lead: different snapshots are declared for the same origin, which led me to believe there were duplicates.
I completely missed it and fell head first into the trap.

At least it forced me to implement a Docker environment to test provenance locally.

aeviso@met:~/provenance/data$ zgrep -c  "https://anongit.kde.org/kdenlive.git" origins_2021-12-20.csv.gz 
20
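
The `zgrep -c` check above generalizes: an origin is only suspicious if it is processed more often than it is listed in the input file. A rough sketch of that comparison follows, assuming (as the matching count of 20 suggests) that each line of the CSV mentioning an origin corresponds to one expected processing run. The function names are hypothetical, and the match is a plain substring test rather than the regular-expression match that `zgrep` performs.

```python
import gzip
from collections import Counter


def count_listed(csv_gz_path, urls):
    """Count, for each URL, the lines of a gzipped text file that contain it.

    This mimics `zgrep -c URL file.csv.gz` with a substring match; no
    assumption is made about the column layout of the file.
    """
    listed = Counter()
    with gzip.open(csv_gz_path, "rt", encoding="utf-8", errors="replace") as f:
        for line in f:
            for url in urls:
                if url in line:
                    listed[url] += 1
    return listed


def unexpected_repeats(processed, listed):
    """Return {url: (times_processed, times_listed)} for origins that were
    processed more often than they are listed in the input file."""
    return {
        url: (n, listed.get(url, 0))
        for url, n in processed.items()
        if n > listed.get(url, 0)
    }
```

Here `processed` is a mapping from origin URL to the number of "Processing origin" lines seen in the client log. An empty result from `unexpected_repeats` would mean every repeat is accounted for by the input file, as turned out to be the case for kdenlive.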