
Visits for the GNU injection don't contain all the occurrences.
Closed, Migrated

Description

softwareheritage=> select * from occurrence_history where origin = 4511612;
┌─────────┬────────────────────┬─────────────────────────────────────────────────────────────────────────┬─────────────┬────────┬───────────┐
│ origin  │       branch       │                                 target                                  │ target_type │ visits │ object_id │
├─────────┼────────────────────┼─────────────────────────────────────────────────────────────────────────┼─────────────┼────────┼───────────┤
│ 4511612 │ gzip-1.2.4.tar     │ L\327P\314^\272\355\236\347\246#\312\267J\220o\234\215\346?             │ revision    │ {1}    │   3040537 │
│ 4511612 │ gzip-1.2.4.tar.gz  │ L\327P\314^\272\355\236\347\246#\312\267J\220o\234\215\346?             │ revision    │ {2}    │   3027132 │
│ 4511612 │ gzip-1.2.4a.tar    │ H\257\214\226b\000\314Y\267;zM\306\320\251T\212\250\204\263             │ revision    │ {3}    │   2992936 │
│ 4511612 │ gzip-1.2.4a.tar.gz │ \231C`rM\275\327&\221\224 \277Hk\311'-"\370\205                         │ revision    │ {4}    │   3023538 │
│ 4511612 │ gzip-1.3.12.tar    │ \034\023\274\240\036\334\342\000\260\260\016X\263e\233\312\206\336\305J │ revision    │ {5}    │   2994581 │
│ 4511612 │ gzip-1.3.12.tar.gz │ \000\370:\357\376%X\015\336\265\035\021\344\315>\370{\361\361,          │ revision    │ {6}    │   2999556 │
│ 4511612 │ gzip-1.3.13.tar.gz │ \012\206\240\206e\312\321p4\362\004\232@\267\243\033\370<\036\200       │ revision    │ {7}    │   3011838 │
│ 4511612 │ gzip-1.3.13.tar.xz │ 2\330\230KO\213\361~\272i\354\241-\236\025\200\313M\253\037             │ revision    │ {8}    │   3033738 │
│ 4511612 │ gzip-1.3.9.tar     │ \273\221\235\276\310d\266^\035\023\364\036}\002$\255\202\007\351\027    │ revision    │ {9}    │   3005745 │
│ 4511612 │ gzip-1.3.9.tar.gz  │ \261\374e\356\177m\260\034,\313j\227\334*5\001\202;\3475                │ revision    │ {10}   │   3013005 │
│ 4511612 │ gzip-1.4.tar.gz    │ \204\320\301t\317d\266\255\022\343^\036[\010rJ\271\006+\257             │ revision    │ {11}   │   3042681 │
│ 4511612 │ gzip-1.4.tar.xz    │ \210\246\374\331\015\005\334\333\360^o\204\370\011\023\253lG\024\356    │ revision    │ {11}   │   3032717 │
│ 4511612 │ gzip-1.5.tar.gz    │ :\020'sd9A\023\334mh\3048\012p\363\335\264@p                            │ revision    │ {12}   │   3038557 │
│ 4511612 │ gzip-1.5.tar.xz    │ \010G\320\375\315\236[\277\225\004\205J"\012\306D~\360\202\333          │ revision    │ {13}   │   3009745 │
│ 4511612 │ gzip-1.6.tar.gz    │ \206\243\277U\306J9\361\3306Y8\177u\232\252|v\363X                      │ revision    │ {14}   │   2998938 │
│ 4511612 │ gzip-1.6.tar.xz    │ \232\212\376\252\312\024>\366]\355{\322\343>\273\356\033\214\373)       │ revision    │ {15}   │   2992192 │
└─────────┴────────────────────┴─────────────────────────────────────────────────────────────────────────┴─────────────┴────────┴───────────┘
(16 rows)

Time: 7.989 ms
softwareheritage=> select * from origin_visit where origin = 4511612;
┌─────────┬───────┬────────────────────────┬────────┬──────────┐
│ origin  │ visit │          date          │ status │ metadata │
├─────────┼───────┼────────────────────────┼────────┼──────────┤
│ 4511612 │     1 │ 2015-08-28 11:13:26+00 │ full   │ ¤        │
└─────────┴───────┴────────────────────────┴────────┴──────────┘
(1 row)

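The tarball loader initially created one visit per occurrence. All those visits have since been merged into a single visit, but the occurrences were apparently not updated to match: they still reference visits 2 to 15, which do not exist in origin_visit.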
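The mismatch can be checked mechanically. An occurrence is dangling if its visits array names a visit that is absent from origin_visit. Below is a minimal Python sketch over the rows shown above. It assumes plain dicts as stand-ins for the two tables; it is not Software Heritage's own code.

```python
# Illustrative in-memory model of the occurrence_history rows above:
# branch -> visits array. Branch names and visit numbers come from the
# query output; the data structures themselves are assumptions.
occurrence_visits = {
    "gzip-1.2.4.tar": [1],
    "gzip-1.2.4.tar.gz": [2],
    "gzip-1.2.4a.tar": [3],
    "gzip-1.2.4a.tar.gz": [4],
    "gzip-1.3.12.tar": [5],
    "gzip-1.3.12.tar.gz": [6],
    "gzip-1.3.13.tar.gz": [7],
    "gzip-1.3.13.tar.xz": [8],
    "gzip-1.3.9.tar": [9],
    "gzip-1.3.9.tar.gz": [10],
    "gzip-1.4.tar.gz": [11],
    "gzip-1.4.tar.xz": [11],
    "gzip-1.5.tar.gz": [12],
    "gzip-1.5.tar.xz": [13],
    "gzip-1.6.tar.gz": [14],
    "gzip-1.6.tar.xz": [15],
}

# Visits that actually exist in origin_visit for origin 4511612.
existing_visits = {1}


def dangling_occurrences(occurrences, visits):
    """Return the branches whose visits array references a missing visit."""
    return sorted(
        branch
        for branch, refs in occurrences.items()
        if any(v not in visits for v in refs)
    )


dangling = dangling_occurrences(occurrence_visits, existing_visits)
print(len(dangling), "of", len(occurrence_visits), "occurrences are dangling")
# prints: 15 of 16 occurrences are dangling
```

Only gzip-1.2.4.tar points at the one visit that exists, which matches the diagnosis above.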

Event Timeline

So, can we access version 1.3.12 (object_id 2994581) directly through the API?

Not with the current public API. The referential integrity between the occurrences generated by the GNU injection and their visits is not verified. As a result, every occurrence other than gzip-1.2.4.tar (the only one attached to the existing visit 1) is unreachable.

olasd changed the task status from Open to Work in Progress. May 4 2017, 2:11 PM
olasd claimed this task.
softwareheritage=> select origin.id, count(distinct visit) from origin left join origin_visit ov on ov.origin = origin.id where type = 'ftp' group by origin.id having count(distinct visit) <> 1;
┌────┬───────┐
│ id │ count │
├────┼───────┤
└────┴───────┘
(0 rows)
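The LEFT JOIN in this check matters. An ftp origin with no visit at all still produces a group, with a count of 0, so an empty result really does mean that every ftp origin has exactly one visit. The sketch below replays the same query against an in-memory SQLite database using Python's stdlib sqlite3 module. Origins 100 and 200 are hypothetical and were added only to show which cases the query would flag.

```python
import sqlite3

# Minimal stand-in for the two tables involved; only the columns the query
# needs are modelled. Origins 100 and 200 are hypothetical examples.
conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    CREATE TABLE origin (id INTEGER PRIMARY KEY, type TEXT);
    CREATE TABLE origin_visit (origin INTEGER, visit INTEGER);

    INSERT INTO origin VALUES (4511612, 'ftp'), (100, 'ftp'), (200, 'ftp');
    INSERT INTO origin_visit VALUES (4511612, 1), (200, 1), (200, 2);
    """
)

# Same logic as the psql query: origins with zero visits survive the
# LEFT JOIN with a count of 0, and origins with several visits count > 1.
bad_origins = conn.execute(
    """
    SELECT origin.id, count(DISTINCT visit)
    FROM origin
    LEFT JOIN origin_visit ov ON ov.origin = origin.id
    WHERE type = 'ftp'
    GROUP BY origin.id
    HAVING count(DISTINCT visit) <> 1
    ORDER BY origin.id
    """
).fetchall()
print(bad_origins)  # [(100, 0), (200, 2)]
```

On the production data, the equivalent query returned no rows.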

It turns out that every origin of type ftp has exactly one visit, which makes the fix on occurrence_history really easy:

update occurrence_history set visits = array[1::bigint] where origin in (select id from origin where type = 'ftp') and visits <> array[1::bigint];
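The effect of the UPDATE can be modelled in a few lines of Python. This is a sketch under assumptions: the rows are a subset of the gzip occurrences above, and `origin_type` is a hypothetical stand-in for the origin table. The `visits <> array[1::bigint]` filter means rows that are already correct are left alone, so rerunning the fix changes nothing. The fix also relies on each ftp origin's single visit being numbered 1, as it is for origin 4511612. The count check above does not verify the number itself.

```python
# Illustrative subset of occurrence_history rows for origin 4511612.
origin_type = {4511612: "ftp"}  # hypothetical stand-in for the origin table

occurrence_history = [
    {"origin": 4511612, "branch": "gzip-1.2.4.tar", "visits": [1]},
    {"origin": 4511612, "branch": "gzip-1.2.4.tar.gz", "visits": [2]},
    {"origin": 4511612, "branch": "gzip-1.3.12.tar", "visits": [5]},
    {"origin": 4511612, "branch": "gzip-1.4.tar.xz", "visits": [11]},
]


def collapse_ftp_visits(rows, types):
    """Mirror of the UPDATE: point every ftp occurrence at visit 1.

    Only rows whose visits array differs from [1] are rewritten, like the
    SQL filter; returns the number of rows changed.
    """
    updated = 0
    for row in rows:
        if types.get(row["origin"]) == "ftp" and row["visits"] != [1]:
            row["visits"] = [1]
            updated += 1
    return updated


print(collapse_ftp_visits(occurrence_history, origin_type))  # 3
# Every occurrence now references the single existing visit.
print(all(row["visits"] == [1] for row in occurrence_history))  # True
```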

All updates performed.