
loaders: Move the proxy storage filter after the buffer proxy
Closed, Public

Authored by ardumont on Sep 18 2020, 2:05 PM.

Details

Summary

Move the filter proxy storage after the buffer proxy storage in the loaders' pipeline configuration.

Context: with D3976, some DVCS loaders now send one object at a time to the
storage.

Moving the filter after the buffer allows batching the calls to the *_missing
endpoints for DVCS loaders (e.g. the git loader).

This slightly impacts the package loaders, but the impact should be negligible.

Prior to this, objects were first filtered down to those unknown to the
storage, and only those unknown objects were buffered and flushed to the
storage once a threshold was hit.

Now, all objects are buffered first, and the filter then runs on each flushed
batch. As a consequence, calls to the *_missing endpoints may increase.
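The effect of the reordering on a DVCS loader that adds one object at a time can be illustrated with a toy model. This is a sketch under stated assumptions, not the swh.storage implementation: `CountingStorage`, `FilterProxy`, `BufferProxy` and `count_missing_calls` are hypothetical names, only contents are modeled, and the `retry` and `remote` steps are left out. It simply counts how many `*_missing` calls reach the backend for each step order.

```python
from typing import Any, List, Set


class CountingStorage:
    """Hypothetical backend: records how often *_missing is called."""

    def __init__(self) -> None:
        self.known: Set[str] = set()
        self.missing_calls = 0

    def content_missing(self, ids: List[str]) -> List[str]:
        self.missing_calls += 1
        return [i for i in ids if i not in self.known]

    def content_add(self, ids: List[str]) -> None:
        self.known.update(ids)


class FilterProxy:
    """Hypothetical filter step: forwards only objects reported missing."""

    def __init__(self, storage: Any) -> None:
        self.storage = storage

    def content_add(self, ids: List[str]) -> None:
        missing = self.storage.content_missing(ids)  # one *_missing call per add
        if missing:
            self.storage.content_add(missing)

    def flush(self) -> None:
        if hasattr(self.storage, "flush"):
            self.storage.flush()


class BufferProxy:
    """Hypothetical buffer step: forwards objects once min_batch_size is hit."""

    def __init__(self, storage: Any, min_batch_size: int) -> None:
        self.storage = storage
        self.min_batch_size = min_batch_size
        self.buffer: List[str] = []

    def content_missing(self, ids: List[str]) -> List[str]:
        return self.storage.content_missing(ids)  # pass-through

    def content_add(self, ids: List[str]) -> None:
        self.buffer.extend(ids)
        if len(self.buffer) >= self.min_batch_size:
            self.flush()

    def flush(self) -> None:
        if self.buffer:
            batch, self.buffer = self.buffer, []
            self.storage.content_add(batch)  # downstream sees one whole batch
        if hasattr(self.storage, "flush"):
            self.storage.flush()


def count_missing_calls(filter_first: bool, n_objects: int, batch: int) -> int:
    backend = CountingStorage()
    if filter_first:  # old step order: filter, then buffer
        top: Any = FilterProxy(BufferProxy(backend, batch))
    else:  # new step order: buffer, then filter
        top = BufferProxy(FilterProxy(backend), batch)
    for i in range(n_objects):
        top.content_add([f"obj{i}"])  # DVCS-style: one object per call
    top.flush()
    return backend.missing_calls


print(count_missing_calls(True, 2500, 1000))   # 2500
print(count_missing_calls(False, 2500, 1000))  # 3
```

With the filter first, every single-object add triggers its own `*_missing` call; with the buffer first, the filter only runs once per flushed batch (two threshold flushes plus the final flush here).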

Related to D3976
Related to T2373

Test Plan
  • Ran this on a staging node; everything runs fine (git, npm, pypi... see T2373#49135)
  • octocatalog-diff output:
bin/octocatalog-diff --octocatalog-diff-args --no-truncate-details --to staging worker01
Found host worker01.softwareheritage.org
Cloning into '/tmp/swh-ocd.lTqt9H4A/environments/production/data/private'...
done.
Cloning into '/tmp/swh-ocd.lTqt9H4A/environments/staging/data/private'...
done.
*** Running octocatalog-diff on host worker01.softwareheritage.org
I, [2020-09-18T14:06:13.917723 #6126]  INFO -- : Catalogs compiled for worker01.softwareheritage.org
I, [2020-09-18T14:06:14.899756 #6126]  INFO -- : Diffs computed for worker01.softwareheritage.org
diff origin/production/worker01.softwareheritage.org current/worker01.softwareheritage.org
*******************************************
  File[/etc/softwareheritage/loader_archive.yml] =>
   parameters =>
     content =>
      @@ -4,5 +4,4 @@
         steps:
         - cls: retry
      -  - cls: filter
         - cls: buffer
           min_batch_size:
      @@ -12,4 +11,5 @@
             revision: 1000
             release: 1000
      +  - cls: filter
         - cls: remote
           args:
*******************************************
  File[/etc/softwareheritage/loader_cran.yml] =>
   parameters =>
     content =>
      @@ -4,5 +4,4 @@
         steps:
         - cls: retry
      -  - cls: filter
         - cls: buffer
           min_batch_size:
      @@ -12,4 +11,5 @@
             revision: 1000
             release: 1000
      +  - cls: filter
         - cls: remote
           args:
*******************************************
  File[/etc/softwareheritage/loader_debian.yml] =>
   parameters =>
     content =>
      @@ -4,5 +4,4 @@
         steps:
         - cls: retry
      -  - cls: filter
         - cls: buffer
           min_batch_size:
      @@ -12,4 +11,5 @@
             revision: 1000
             release: 1000
      +  - cls: filter
         - cls: remote
           args:
*******************************************
  File[/etc/softwareheritage/loader_deposit.yml] =>
   parameters =>
     content =>
      @@ -4,5 +4,4 @@
         steps:
         - cls: retry
      -  - cls: filter
         - cls: buffer
           min_batch_size:
      @@ -12,4 +11,5 @@
             revision: 1000
             release: 1000
      +  - cls: filter
         - cls: remote
           args:
*******************************************
  File[/etc/softwareheritage/loader_git.yml] =>
   parameters =>
     content =>
      @@ -4,5 +4,4 @@
         steps:
         - cls: retry
      -  - cls: filter
         - cls: buffer
           min_batch_size:
      @@ -12,4 +11,5 @@
             revision: 1000
             release: 1000
      +  - cls: filter
         - cls: remote
           args:
*******************************************
  File[/etc/softwareheritage/loader_mercurial.yml] =>
   parameters =>
     content =>
      @@ -4,5 +4,4 @@
         steps:
         - cls: retry
      -  - cls: filter
         - cls: buffer
           min_batch_size:
      @@ -12,4 +11,5 @@
             revision: 1000
             release: 1000
      +  - cls: filter
         - cls: remote
           args:
*******************************************
  File[/etc/softwareheritage/loader_nixguix.yml] =>
   parameters =>
     content =>
      @@ -4,5 +4,4 @@
         steps:
         - cls: retry
      -  - cls: filter
         - cls: buffer
           min_batch_size:
      @@ -12,4 +11,5 @@
             revision: 1000
             release: 1000
      +  - cls: filter
         - cls: remote
           args:
*******************************************
  File[/etc/softwareheritage/loader_npm.yml] =>
   parameters =>
     content =>
      @@ -4,5 +4,4 @@
         steps:
         - cls: retry
      -  - cls: filter
         - cls: buffer
           min_batch_size:
      @@ -12,4 +11,5 @@
             revision: 1000
             release: 1000
      +  - cls: filter
         - cls: remote
           args:
*******************************************
  File[/etc/softwareheritage/loader_pypi.yml] =>
   parameters =>
     content =>
      @@ -4,5 +4,4 @@
         steps:
         - cls: retry
      -  - cls: filter
         - cls: buffer
           min_batch_size:
      @@ -12,4 +11,5 @@
             revision: 1000
             release: 1000
      +  - cls: filter
         - cls: remote
           args:
*******************************************
  File[/etc/softwareheritage/loader_svn.yml] =>
   parameters =>
     content =>
      @@ -4,5 +4,4 @@
         steps:
         - cls: retry
      -  - cls: filter
         - cls: buffer
           min_batch_size:
      @@ -12,4 +11,5 @@
             revision: 1000
             release: 1000
      +  - cls: filter
         - cls: remote
           args:
*******************************************
*** End octocatalog-diff on worker01.softwareheritage.org
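The ordering change shown in the diff can be expressed as a simple check on each loader's `steps` list. This is a hypothetical helper, not part of puppet-swh-site or swh.storage; it assumes the YAML has already been parsed into a list of dicts (parsing not shown), and the step options such as `min_batch_size` and `args` are omitted.

```python
from typing import Any, Dict, List


def step_index(steps: List[Dict[str, Any]], cls: str) -> int:
    """Return the position of the first step whose 'cls' matches."""
    for i, step in enumerate(steps):
        if step.get("cls") == cls:
            return i
    raise ValueError(f"no {cls!r} step in pipeline")


def filter_after_buffer(steps: List[Dict[str, Any]]) -> bool:
    """True when the filter step comes after the buffer step."""
    return step_index(steps, "buffer") < step_index(steps, "filter")


# Step order before and after this change, as in the hunks above.
old_steps = [{"cls": "retry"}, {"cls": "filter"}, {"cls": "buffer"}, {"cls": "remote"}]
new_steps = [{"cls": "retry"}, {"cls": "buffer"}, {"cls": "filter"}, {"cls": "remote"}]

print(filter_after_buffer(old_steps))  # False
print(filter_after_buffer(new_steps))  # True
```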

Diff Detail

Repository
rSPSITE puppet-swh-site
Branch
staging
Lint
No Linters Available
Unit
No Unit Test Coverage
Build Status
Buildable 15327
Build 23608: arc lint + arc unit

Event Timeline

olasd added a subscriber: olasd.

Yeah, this definitely needs to happen before the new swh.loader.core is deployed.

This revision is now accepted and ready to land. Sep 18 2020, 2:50 PM
ardumont edited the summary of this revision.

Rework commit message according to diff description