
Puppet: Lister implements incremental mode
Closed, Public

Authored by franckbret on Oct 26 2022, 10:14 AM.

Details

Summary

Use the with_release_since API argument to retrieve only the modules that
have been updated since the lister was last executed.
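
As a rough illustration of the incremental mode described in this summary, the sketch below builds the query parameters for one listing run. It is not the code from this diff: build_query_params and the last_listing_date argument are hypothetical names, and only with_release_since comes from the summary. Timezone handling, which the review discusses, is deliberately left out here.

```python
from datetime import datetime, timezone
from typing import Dict, Optional


def build_query_params(last_listing_date: Optional[datetime]) -> Dict[str, str]:
    """Return query parameters for one listing run (hypothetical helper)."""
    params: Dict[str, str] = {}
    if last_listing_date is not None:
        # Incremental run: only ask for modules released since the last run.
        params["with_release_since"] = last_listing_date.date().isoformat()
    # First run (no stored date): no filter, so everything is listed.
    return params


print(build_query_params(None))
print(build_query_params(datetime(2022, 10, 26, 10, 14, tzinfo=timezone.utc)))
```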

Related T4519

Diff Detail

Repository
rDLS Listers
Lint
Automatic diff as part of commit; lint not applicable.
Unit
Automatic diff as part of commit; unit tests not applicable.

Event Timeline

Build is green

Patch application report for D8777 (id=31637)

Rebasing onto 8355fee25f...

Current branch diff-target is up to date.
Changes applied before test
commit 8a026b816cc977d4bf1718b3057a9253a4eb3e25
Author: Franck Bret <franck.bret@octobus.net>
Date:   Wed Oct 26 10:09:03 2022 +0200

    Puppet: Lister implements incremental mode
    
    Use with_release_since api argument to retrieve modules that have been
    updated since the last date the lister has been executed.
    
    Related T4519

See https://jenkins.softwareheritage.org/job/DLS/job/tests-on-diff/834/ for more details.

vlorentz added inline comments.
swh/lister/puppet/lister.py
83–85

If I understand this correctly:

Constrain results to modules that have had at least one release since the given ISO 8601 date

then incremental loads would miss all packages updated later in the (local) day of the previous visit.

Did you check with_release_since is inclusive?

swh/lister/puppet/lister.py
83–85

Also, which timezone does with_release_since use?
The timestamps I see in the file below are in a US timezone, so it might not be UTC.

swh/lister/puppet/lister.py
83–85

Yes, it is inclusive, and it returns the same results given a date or a datetime:

Query with a datetime

############################################################               
Two days ago:  2022-11-01T11:12:14.744560                                                                 
############################################################
2022-11-01 03:13:02 -0700                           
2022-11-01 04:57:26 -0700                           
2022-11-01 05:57:58 -0700                           
2022-11-01 10:06:31 -0700                                                                                
2022-11-01 10:28:20 -0700                                                                                
2022-11-01 14:53:51 -0700                           
2022-11-02 02:38:13 -0700                           
2022-11-02 03:32:59 -0700                           
2022-11-02 04:53:04 -0700                           
2022-11-02 09:23:28 -0700                           
2022-11-02 09:25:05 -0700                                                                                
2022-11-02 14:56:51 -0700                                                                                
2022-11-02 17:57:44 -0700                                                                                
2022-11-02 19:20:00 -0700                                                                                
2022-11-02 20:13:51 -0700                                                                                
2022-11-03 02:24:39 -0700                           
2022-11-03 02:31:19 -0700                           
2022-11-03 02:47:49 -0700                           
2022-11-03 02:51:34 -0700                                                                                
############################################################
One day ago:  2022-11-02T11:12:14.744560                                                                       
############################################################       
2022-11-02 02:38:13 -0700                           
2022-11-02 03:32:59 -0700                           
2022-11-02 04:53:04 -0700                           
2022-11-02 09:23:28 -0700                           
2022-11-02 09:25:05 -0700                           
2022-11-02 14:56:51 -0700                           
2022-11-02 17:57:44 -0700                                                                                
2022-11-02 19:20:00 -0700                                                                                
2022-11-02 20:13:51 -0700                                                                                
2022-11-03 02:24:39 -0700                           
2022-11-03 02:31:19 -0700                           
2022-11-03 02:47:49 -0700                           
2022-11-03 02:51:34 -0700 

Query with a date 

############################################################
Two days ago:  2022-11-01                            
############################################################
2022-11-01 03:13:02 -0700                           
2022-11-01 04:57:26 -0700                           
2022-11-01 05:57:58 -0700                           
2022-11-01 10:06:31 -0700                           
2022-11-01 10:28:20 -0700                           
2022-11-01 14:53:51 -0700                           
2022-11-02 02:38:13 -0700                           
2022-11-02 03:32:59 -0700                           
2022-11-02 04:53:04 -0700                           
2022-11-02 09:23:28 -0700                           
2022-11-02 09:25:05 -0700                           
2022-11-02 14:56:51 -0700                           
2022-11-02 17:57:44 -0700                           
2022-11-02 19:20:00 -0700                           
2022-11-02 20:13:51 -0700                           
2022-11-03 02:24:39 -0700                           
2022-11-03 02:31:19 -0700                           
2022-11-03 02:47:49 -0700                           
2022-11-03 02:51:34 -0700                           
############################################################
One day ago:  2022-11-02                                  
############################################################
2022-11-02 02:38:13 -0700                           
2022-11-02 03:32:59 -0700                           
2022-11-02 04:53:04 -0700                           
2022-11-02 09:23:28 -0700                           
2022-11-02 09:25:05 -0700                           
2022-11-02 14:56:51 -0700                           
2022-11-02 17:57:44 -0700                           
2022-11-02 19:20:00 -0700                           
2022-11-02 20:13:51 -0700                           
2022-11-03 02:24:39 -0700                           
2022-11-03 02:31:19 -0700                           
2022-11-03 02:47:49 -0700                           
2022-11-03 02:51:34 -0700
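
One way to read the pasted output is to parse the listed timestamps and compare the earliest one with the datetime that was sent. The sketch below only inspects values copied from the output above and does not query the API. Treating the naive cutoff as either UTC or -0700 is an assumption, since the output does not say how the API interprets it.

```python
from datetime import datetime, timedelta, timezone

FORMAT = "%Y-%m-%d %H:%M:%S %z"

# First and last timestamps returned for the "two days ago" datetime query.
earliest = datetime.strptime("2022-11-01 03:13:02 -0700", FORMAT)
latest = datetime.strptime("2022-11-03 02:51:34 -0700", FORMAT)

# The cutoff sent to the API was naive; try both plausible interpretations.
naive_cutoff = datetime.fromisoformat("2022-11-01T11:12:14.744560")
cutoff_utc = naive_cutoff.replace(tzinfo=timezone.utc)
cutoff_pacific = naive_cutoff.replace(tzinfo=timezone(timedelta(hours=-7)))

for label, cutoff in (("UTC", cutoff_utc), ("-0700", cutoff_pacific)):
    print(label, "earliest result precedes cutoff:", earliest < cutoff)
```

Under both interpretations the earliest result is older than the cutoff, which is consistent with the time-of-day part being ignored and the date and datetime queries returning the same list.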
83–85

It looks like the query uses UTC: it returns the same results whether the datetime carries a -07:00 offset or a UTC offset.

With -7 offset

############################################################                                             
Two days ago:  2022-11-01T03:37:55.576098-07:00      
############################################################                                             
2022-11-01 03:13:02 -0700
2022-11-01 04:57:26 -0700                           
2022-11-01 05:57:58 -0700                           
2022-11-01 10:06:31 -0700                                                                                
2022-11-01 10:28:20 -0700 
2022-11-01 14:53:51 -0700                                                                                
2022-11-02 02:38:13 -0700                           
2022-11-02 03:32:59 -0700                           
2022-11-02 04:53:04 -0700                           
2022-11-02 09:23:28 -0700                                                                                
2022-11-02 09:25:05 -0700                                                                                
2022-11-02 14:56:51 -0700                           
2022-11-02 17:57:44 -0700                                                                                
2022-11-02 19:20:00 -0700                                                                                
2022-11-02 20:13:51 -0700                           
2022-11-03 02:24:39 -0700                                                                                
2022-11-03 02:31:19 -0700                                                                                
2022-11-03 02:47:49 -0700                           
2022-11-03 02:51:34 -0700 
############################################################                                             
One day ago:  2022-11-02T03:37:55.576098-07:00                                                                 
############################################################                                             
2022-11-02 02:38:13 -0700                                                                                
2022-11-02 03:32:59 -0700                                                                                
2022-11-02 04:53:04 -0700                           
2022-11-02 09:23:28 -0700
2022-11-02 09:25:05 -0700
2022-11-02 14:56:51 -0700                           
2022-11-02 17:57:44 -0700                           
2022-11-02 19:20:00 -0700                                                                                
2022-11-02 20:13:51 -0700 
2022-11-03 02:24:39 -0700                                                                                
2022-11-03 02:31:19 -0700                           
2022-11-03 02:47:49 -0700                           
2022-11-03 02:51:34 -0700 

With UTC

############################################################
Two days ago:  2022-11-01T10:50:35.671332+00:00
############################################################
2022-11-01 03:13:02 -0700
2022-11-01 04:57:26 -0700
2022-11-01 05:57:58 -0700
2022-11-01 10:06:31 -0700
2022-11-01 10:28:20 -0700
2022-11-01 14:53:51 -0700
2022-11-02 02:38:13 -0700
2022-11-02 03:32:59 -0700
2022-11-02 04:53:04 -0700
2022-11-02 09:23:28 -0700
2022-11-02 09:25:05 -0700
2022-11-02 14:56:51 -0700
2022-11-02 17:57:44 -0700
2022-11-02 19:20:00 -0700
2022-11-02 20:13:51 -0700
2022-11-03 02:24:39 -0700
2022-11-03 02:31:19 -0700
2022-11-03 02:47:49 -0700
2022-11-03 02:51:34 -0700
############################################################
One day ago:  2022-11-02T10:50:35.671332+00:00
############################################################
2022-11-02 02:38:13 -0700
2022-11-02 03:32:59 -0700
2022-11-02 04:53:04 -0700
2022-11-02 09:23:28 -0700
2022-11-02 09:25:05 -0700
2022-11-02 14:56:51 -0700
2022-11-02 17:57:44 -0700
2022-11-02 19:20:00 -0700
2022-11-02 20:13:51 -0700
2022-11-03 02:24:39 -0700
2022-11-03 02:31:19 -0700
2022-11-03 02:47:49 -0700
2022-11-03 02:51:34 -0700
swh/lister/puppet/lister.py
83–85

So, assume we run the lister at 2022-11-02 01:00:00 UTC, aka 2022-11-01 18:00:00 -0700. The next run of the lister will query with with_release_since=2022-11-02, which returns all results after 2022-11-02 00:00:00 -0700, according to your last two examples.

So the lister will miss origins updated between 2022-11-01 18:00:00 -0700 and 2022-11-01 24:00:00 -0700
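
The scenario above can be worked through with plain datetime arithmetic. This is a sketch under the assumption made in the comment: the API interprets a bare date as midnight at its own -0700 offset. API_TZ and the variable names are illustrative only.

```python
from datetime import datetime, timedelta, timezone

# Offset seen in the API output; assumed here, not guaranteed by the API.
API_TZ = timezone(timedelta(hours=-7))

previous_run = datetime(2022, 11, 2, 1, 0, tzinfo=timezone.utc)

# Naive approach: send only the UTC calendar date of the previous run.
with_release_since = previous_run.date().isoformat()

# Assumed server-side interpretation: midnight of that date at the API offset.
effective_cutoff = datetime.fromisoformat(with_release_since).replace(tzinfo=API_TZ)

missed_window = effective_cutoff - previous_run
print("previous run (API offset):", previous_run.astimezone(API_TZ))
print("effective cutoff:", effective_cutoff)
print("window not covered:", missed_window)
```

With these values the uncovered window is six hours, matching the 18:00 to 24:00 (-0700) range given above.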

Ensure we query the API with a date in the same US/Pacific timezone that the HTTP API uses for querying and expressing results

franckbret added inline comments.
swh/lister/puppet/lister.py
83–85

Right, I improved the code to ensure we query the HTTP API with a timezone-aware date.

Build is green

Patch application report for D8777 (id=31742)

Rebasing onto 60707a45dd...

First, rewinding head to replay your work on top of it...
Applying: Puppet: Lister implements incremental mode
Changes applied before test
commit df1790caa297bc409f147a2a58ac577874e99f63
Author: Franck Bret <franck.bret@octobus.net>
Date:   Wed Oct 26 10:09:03 2022 +0200

    Puppet: Lister implements incremental mode
    
    Use with_release_since api argument to retrieve modules that have been
    updated since the last date the lister has been executed.
    
    Related T4519

See https://jenkins.softwareheritage.org/job/DLS/job/tests-on-diff/840/ for more details.

State storage is not guaranteed to return the same timezone that was written (PostgreSQL returns the local timezone of the server, iirc). You should convert it just before querying the API.

Also, hardcoding -7 is not a solution. First, the API will probably switch from UTC-7 to UTC-8 next Sunday (as the US leaves daylight saving time). More generally, the API does not guarantee it won't change timezones in the future (especially as the API designers didn't seem to consider timezones at all).

So, just to be safe, you should subtract 15h, which is the lowest timezone offset recorded in the tzdb.
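
A minimal sketch of this suggestion, assuming the stored state is a timezone-aware datetime: normalise it to UTC just before querying, then subtract the 15h margin. release_since_param and SAFETY_MARGIN are hypothetical names, not taken from the diff.

```python
from datetime import datetime, timedelta, timezone

SAFETY_MARGIN = timedelta(hours=15)  # margin suggested in this review


def release_since_param(last_listing_date: datetime) -> str:
    """Return a conservative date for with_release_since (hypothetical helper)."""
    if last_listing_date.tzinfo is None:
        raise ValueError("expected a timezone-aware datetime")
    # Normalise first: storage may hand back a different timezone than was written.
    as_utc = last_listing_date.astimezone(timezone.utc)
    return (as_utc - SAFETY_MARGIN).date().isoformat()


print(release_since_param(datetime(2022, 11, 2, 1, 0, tzinfo=timezone.utc)))
```

The trade-off is that some modules already seen in the previous run are returned again, in exchange for not depending on whichever offset the API happens to use.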

franckbret marked an inline comment as done.

Use an offset of -15h when querying the API, which is the lowest timezone offset recorded in the tzdb

Build is green

Patch application report for D8777 (id=31746)

Rebasing onto 60707a45dd...

First, rewinding head to replay your work on top of it...
Applying: Puppet: Lister implements incremental mode
Changes applied before test
commit 5dd180652e4cda7c68b21b28bc1653dc6be0fa91
Author: Franck Bret <franck.bret@octobus.net>
Date:   Wed Oct 26 10:09:03 2022 +0200

    Puppet: Lister implements incremental mode
    
    Use with_release_since api argument to retrieve modules that have been
    updated since the last date the lister has been executed.
    
    Related T4519

See https://jenkins.softwareheritage.org/job/DLS/job/tests-on-diff/841/ for more details.

This revision is now accepted and ready to land. (Nov 8 2022, 10:55 AM)
This revision was landed with ongoing or failed builds. (Nov 8 2022, 2:34 PM)
This revision was automatically updated to reflect the committed changes.

Build is green

Patch application report for D8777 (id=31805)

Rebasing onto e8699422d7...

Current branch diff-target is up to date.
Changes applied before test
commit e1f3f87c73f4ea25e06bbe3591bd54a9675b1785
Author: Franck Bret <franck.bret@octobus.net>
Date:   Wed Oct 26 10:09:03 2022 +0200

    Puppet: Lister implements incremental mode
    
    Use with_release_since api argument to retrieve modules that have been
    updated since the last date the lister has been executed.
    
    Related T4519

See https://jenkins.softwareheritage.org/job/DLS/job/tests-on-diff/854/ for more details.