
FUSE: cache: update cache with new origin visits
Closed, Public

Authored by haltode on Dec 15 2020, 2:54 PM.

Details

Summary

Closes T2841.

Force a cache update on origin visits when last fetched more than one
day ago.
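
As a rough illustration of the behaviour described above, here is a minimal sketch of a time-based freshness check on a SQLite-backed cache. The table name `visits_cache`, its columns, and the helpers `open_cache`, `set_visits` and `get_visits` are assumptions made up for this example; they are not taken from the actual diff.

```python
import sqlite3
from datetime import datetime, timedelta

# Hypothetical names and schema, for illustration only.
MAX_AGE = timedelta(days=1)


def open_cache(path=":memory:"):
    conn = sqlite3.connect(path)
    conn.execute(
        "create table if not exists visits_cache ("
        "url text primary key, itime text not null, visits text not null)"
    )
    return conn


def set_visits(conn, url, visits, now=None):
    # The primary key on url means "insert or replace" keeps one row per URL.
    now = now or datetime.now()
    conn.execute(
        "insert or replace into visits_cache values (?, ?, ?)",
        (url, now.isoformat(), visits),
    )


def get_visits(conn, url, now=None):
    """Return cached visits, or None if missing or older than MAX_AGE."""
    now = now or datetime.now()
    row = conn.execute(
        "select itime, visits from visits_cache where url = ?", (url,)
    ).fetchone()
    if row is None:
        return None
    itime, visits = row
    if now - datetime.fromisoformat(itime) > MAX_AGE:
        return None  # stale entry: the caller should refetch and call set_visits()
    return visits
```

In this sketch a stale entry is reported the same way as a missing one, so the caller only needs a single "refetch and store" path.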

Diff Detail

Repository
rDFUSE FUSE virtual file system
Branch
feature/update-cache-new-origin-visits
Lint
No Linters Available
Unit
No Unit Test Coverage
Build Status
Buildable 17982
Build 27775: Phabricator diff pipeline on jenkins
Build 27774: arc lint + arc unit

Event Timeline

Build is green

Patch application report for D4744 (id=16799)

Rebasing onto 7154e451ff...

Current branch diff-target is up to date.
Changes applied before test
commit a2ce5ab4fa491e4fd1865cc0b0deefef75d7eeaf
Author: Thibault Allançon <haltode@gmail.com>
Date:   Tue Dec 15 14:20:48 2020 +0100

    cache: update cache with new origin visits
    
    Closes T2841.
    
    Force a cache update on origin visits when last fetched more than one
    day ago.

commit dd96749820b159fc85d4a4608a7862d2458d7d58
Author: Thibault Allançon <haltode@gmail.com>
Date:   Tue Dec 15 14:47:51 2020 +0100

    cache: add primary key to db tables

See https://jenkins.softwareheritage.org/job/DFUSE/job/tests-on-diff/238/ for more details.

zack requested changes to this revision. Dec 16 2020, 12:12 PM
zack added a subscriber: zack.
zack added inline comments.
swh/fuse/cache.py
124–125

Is this still needed now that you're making the "swhid" column a primary key?
Please check, and remove it if it's not needed anymore.

128

Minor naming issue: this is more like "fetch" than "last_fetched", because the primary key guarantees you will never have more than one entry for the same URL. All in all, something with "time" in the name would be more idiomatic, like "itime" (for "insertion time [in cache]").

130–131

ditto

150

Not sure why you want to make timestamp checking optional (do we have a use case for that?), but if you really do, make the default True, because that's the sane cache semantics.

This revision now requires changes to proceed. Dec 16 2020, 12:12 PM
haltode marked 4 inline comments as done.
  • Rebase on master
  • Fix zack comments
swh/fuse/cache.py
124–125

Indeed, unique and primary keys already have an index associated (https://stackoverflow.com/questions/3379292/is-an-index-needed-for-a-primary-key-in-sqlite). Thanks for the catch!
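
The point about implicit indexes can be checked directly with Python's `sqlite3` module. The sketch below uses a made-up table `metadata_cache` and a hypothetical helper `list_indexes`; it only assumes a non-integer primary key on an ordinary (rowid) table, which is the case where SQLite creates an automatic index.

```python
import sqlite3


def list_indexes(conn, table):
    # PRAGMA index_list returns one row per index: (seq, name, unique, ...).
    # PRAGMA arguments cannot be bound as parameters, so the (trusted,
    # hypothetical) table name is interpolated directly.
    return [row[1] for row in conn.execute(f"pragma index_list({table})")]


conn = sqlite3.connect(":memory:")
# Hypothetical table, for illustration only.
conn.execute("create table metadata_cache (swhid text primary key, metadata blob)")

# No explicit "create index" was issued, yet an automatic index exists.
print(list_indexes(conn, "metadata_cache"))
```

An explicit `create index` on the same column would therefore only add a redundant second index.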

Build is green

Patch application report for D4744 (id=16826)

Rebasing onto 9546ba2f7b...

Current branch diff-target is up to date.
Changes applied before test
commit a6cef6bad56e709fff28a1287031a481e37c097b
Author: Thibault Allançon <haltode@gmail.com>
Date:   Tue Dec 15 14:20:48 2020 +0100

    cache: update cache with new origin visits
    
    Closes T2841.
    
    Force a cache update on origin visits when last fetched more than one
    day ago.

commit b6e8cf744f3e4c1e16fd1eca6d4ae98629d39613
Author: Thibault Allançon <haltode@gmail.com>
Date:   Tue Dec 15 14:47:51 2020 +0100

    cache: add primary key to db tables

See https://jenkins.softwareheritage.org/job/DFUSE/job/tests-on-diff/240/ for more details.

This revision is now accepted and ready to land. Dec 16 2020, 4:19 PM