Dec 7 2022

franckbret added a comment to D8539: Add random directory sampling policy.
In D8539#222941, @olasd wrote:

Fine. I have one comment I'd like others to look at, though [1], regarding where
that new discovery (interface) code should go. It currently feels a bit off to me that this code
is in loader-core. Loaders are not the sole archive consumers (scanner, webapp, cli, indexer, cooker, ...).

[1] https://forge.softwareheritage.org/D8539?id=30916#inline-60774

@vlorentz @douardda @olasd @anlambert ^

If it's to be used generically, this discovery code should pretty definitely not be in swh.loader.core.

  • The generic discovery algorithm, and base abstract classes/protocols, should probably be in swh.model, as they're tied to that structure;
  • The swh.storage-based discovery mechanism could live in swh.storage.algorithms, and be used by swh.loader.core;
  • The REST API-based discovery mechanism could live in swh.web.client, or stay in swh.scanner.
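
A minimal sketch of the split proposed above, under stated assumptions: `ArchiveLookup`, `InMemoryLookup` and `discover` are hypothetical names used only for illustration and are not the actual swh.model, swh.storage or swh.web.client APIs. The generic algorithm depends only on a protocol, while each backend (storage-based, REST-based) would implement it in its own package.

```python
from typing import Iterable, Protocol, Set


class ArchiveLookup(Protocol):
    """Hypothetical interface: report which object ids the archive lacks."""

    def missing(self, ids: Iterable[str]) -> Set[str]:
        ...


class InMemoryLookup:
    """Toy backend standing in for a storage- or REST-based implementation."""

    def __init__(self, known: Iterable[str]) -> None:
        self._known = set(known)

    def missing(self, ids: Iterable[str]) -> Set[str]:
        # Anything not in the known set is considered absent from the archive.
        return set(ids) - self._known


def discover(ids: Iterable[str], archive: ArchiveLookup) -> Set[str]:
    """Generic step: depends only on the interface, never on a concrete backend."""
    return archive.missing(ids)


backend = InMemoryLookup(known=["a", "b"])
unknown = discover(["a", "b", "c"], backend)
```

Swapping `InMemoryLookup` for another class with the same `missing` method would leave `discover` unchanged, which is the point of keeping the generic part separate from the loader.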
Dec 7 2022, 4:28 PM

Dec 5 2022

franckbret added a comment to D8909: Login: Add an option to choose an authentication method (by username/password or token).

@anlambert Shouldn't this be replaced by swh auth generate-token?

Dec 5 2022, 10:20 AM

Dec 1 2022

franckbret added a comment to D8909: Login: Add an option to choose an authentication method (by username/password or token).

Some screen captures of the different login scenarios:

Dec 1 2022, 11:22 AM
franckbret requested review of D8909: Login: Add an option to choose an authentication method (by username/password or token).
Dec 1 2022, 11:08 AM

Nov 21 2022

franckbret closed D8379: Hackage: Loads Hackage Listed origins.
Nov 21 2022, 3:07 PM
franckbret committed rDLDBASEb9bd1287e8ad: Hackage: Loads Hackage Listed origins (authored by franckbret).
Hackage: Loads Hackage Listed origins
Nov 21 2022, 3:07 PM
franckbret updated the diff for D8379: Hackage: Loads Hackage Listed origins.

Simplify some code and rebase

Nov 21 2022, 3:06 PM
franckbret added a comment to D8379: Hackage: Loads Hackage Listed origins.

@anlambert hi, can we merge this one?

Nov 21 2022, 8:17 AM

Nov 18 2022

franckbret closed D8663: Hackage: Implement incremental mode.
Nov 18 2022, 2:16 PM
franckbret committed rDLS065b3f81a1e8: Hackage: Implement incremental mode (authored by franckbret).
Hackage: Implement incremental mode
Nov 18 2022, 2:16 PM
franckbret updated the diff for D8663: Hackage: Implement incremental mode.

Rebase

Nov 18 2022, 1:49 PM
franckbret updated the diff for D8663: Hackage: Implement incremental mode.

Check the number of http requests done in incremental tests

Nov 18 2022, 12:19 PM

Nov 16 2022

franckbret added a comment to T4687: Hex.pm lister (Erlang package manager).

@KShivendu Hi, we had a similar situation with rubygems loader, see its _load_directory method

Nov 16 2022, 9:23 AM · Lister

Nov 15 2022

franckbret updated the diff for D8663: Hackage: Implement incremental mode.

Improve the test for incremental listing; ensure the HTTP searchQuery/lastUpload value is a date

Nov 15 2022, 9:53 AM

Nov 14 2022

franckbret added a comment to D8663: Hackage: Implement incremental mode.

buuuut you are using a strict inequality, so you need to subtract one day, in order not to miss uploads submitted after the previous run of the lister but on the same day.

Also, you should apply .astimezone(tz=timezone.utc) before converting to date, because the database is not guaranteed to return timestamps in UTC even when they were written in UTC.

(Sorry for the back-and-forth; hopefully I'm done now.)
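
The two adjustments described in this comment can be sketched as follows. This is only an illustration, assuming the lister state holds a timezone-aware datetime; `incremental_cutoff` and `last_listing` are hypothetical names, not taken from the Hackage lister.

```python
from datetime import date, datetime, timedelta, timezone


def incremental_cutoff(last_listing: datetime) -> date:
    """Return the date to use with a strict 'lastUpload > date' filter."""
    # Normalise to UTC first: the stored timestamp may come back in another zone.
    as_utc = last_listing.astimezone(tz=timezone.utc)
    # Step back one day so same-day uploads made after the previous run still match.
    return as_utc.date() - timedelta(days=1)


# A timestamp at 01:30 in UTC+02:00 still falls on the previous day in UTC.
stored = datetime(2022, 11, 14, 1, 30, tzinfo=timezone(timedelta(hours=2)))
cutoff = incremental_cutoff(stored)
```

Converting to a date before normalising to UTC would give 2022-11-14 here and a cutoff one day too late.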

Nov 14 2022, 10:53 AM
franckbret updated the diff for D8663: Hackage: Implement incremental mode.

Use greater than or equal instead of strict comparison when building http api query params for incremental listing

Nov 14 2022, 10:48 AM
franckbret abandoned D8824: Cpan: Implement incremental mode.

Abandon revision because in this case we cannot really benefit from an incremental mode

Nov 14 2022, 10:06 AM
franckbret added a comment to D8824: Cpan: Implement incremental mode.

@franckbret, as explained in my inline comment we cannot use the date filtering on the release index of CPAN elasticsearch.

The only incremental mode we can implement here is to filter the ListedOrigin instances sent to the scheduler according to the
last_update value: if it is greater than the date from the lister state, we can yield it.

Nevertheless, I am not sure if it is worth it as a full listing takes around 10 minutes, which is pretty fast.
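
A rough sketch of the filtering described above, for illustration only. `Origin` is a hypothetical stand-in for ListedOrigin carrying just the fields used here, and `filter_updated` and `state_date` are assumed names. Yielding origins that have no last_update is also an assumption, chosen so that nothing is silently skipped.

```python
from dataclasses import dataclass
from datetime import datetime, timezone
from typing import Iterable, Iterator, Optional


@dataclass
class Origin:
    """Hypothetical stand-in for a ListedOrigin; only the fields used here."""

    url: str
    last_update: Optional[datetime]


def filter_updated(
    origins: Iterable[Origin], state_date: Optional[datetime]
) -> Iterator[Origin]:
    """Yield origins updated after state_date; yield everything on a first run."""
    for origin in origins:
        if state_date is None or origin.last_update is None:
            # No reference date, or unknown update time: keep the origin.
            yield origin
        elif origin.last_update > state_date:
            yield origin


state = datetime(2022, 11, 1, tzinfo=timezone.utc)
listed = [
    Origin("https://example.org/old", datetime(2022, 10, 1, tzinfo=timezone.utc)),
    Origin("https://example.org/new", datetime(2022, 11, 10, tzinfo=timezone.utc)),
    Origin("https://example.org/unknown", None),
]
kept = [origin.url for origin in filter_updated(listed, state)]
```

The full listing still has to be fetched; only the origins sent on to the scheduler are reduced.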

Nov 14 2022, 10:04 AM
franckbret closed D8748: Nuget: Implement incremental listing.
Nov 14 2022, 9:33 AM
franckbret committed rDLSea146ce297d5: Nuget: Implement incremental listing (authored by franckbret).
Nuget: Implement incremental listing
Nov 14 2022, 9:33 AM
franckbret updated the diff for D8748: Nuget: Implement incremental listing.

Rebase

Nov 14 2022, 9:33 AM

Nov 9 2022

franckbret requested review of D8824: Cpan: Implement incremental mode.
Nov 9 2022, 3:48 PM

Nov 8 2022

franckbret closed D8777: Puppet: Lister implements incremental mode.
Nov 8 2022, 2:35 PM
franckbret committed rDLSe1f3f87c73f4: Puppet: Lister implements incremental mode (authored by franckbret).
Puppet: Lister implements incremental mode
Nov 8 2022, 2:34 PM
franckbret updated the diff for D8777: Puppet: Lister implements incremental mode.

Rebase

Nov 8 2022, 2:33 PM

Nov 7 2022

franckbret added inline comments to D8748: Nuget: Implement incremental listing.
Nov 7 2022, 8:40 AM

Nov 4 2022

franckbret updated the diff for D8748: Nuget: Implement incremental listing.

Minor improvements

Nov 4 2022, 5:21 PM
franckbret updated the diff for D8777: Puppet: Lister implements incremental mode.

Use an offset of -15h when querying the API, which is the lowest timezone offset recorded in the tzdb

Nov 4 2022, 11:56 AM
franckbret added inline comments to D8777: Puppet: Lister implements incremental mode.
Nov 4 2022, 10:17 AM
franckbret updated the diff for D8777: Puppet: Lister implements incremental mode.

Ensure we query the API with dates in the same US/Pacific timezone that the HTTP API uses for querying and expressing results

Nov 4 2022, 10:15 AM

Nov 3 2022

franckbret added inline comments to D8777: Puppet: Lister implements incremental mode.
Nov 3 2022, 11:56 AM
franckbret committed rDLDBASE8e34a6d77996: Rubygems: Improve lister to make use of artifacts and rubygems_metadata (authored by franckbret).
Rubygems: Improve lister to make use of artifacts and rubygems_metadata
Nov 3 2022, 8:54 AM
franckbret closed D8569: Add rubygems loader.
Nov 3 2022, 8:54 AM
franckbret committed rDLDBASE0022bb50abd1: Add rubygems loader (authored by Alphare).
Add rubygems loader
Nov 3 2022, 8:54 AM

Nov 2 2022

franckbret updated the diff for D8569: Add rubygems loader.

Do not json.loads already deserialized json data

Nov 2 2022, 4:24 PM
franckbret added a comment to D8569: Add rubygems loader.

Rubygems: Improve loader to make use of artifacts and rubygems_metadata provided by the lister extra_loader_arguments

Use artifacts and rubygems_metadata to get list of versions, artifacts checksums and extrinsic metadata url
Add an EXTID manifest
Add metadata from extrinsic metadata

@anlambert Please note I used 'rubygems_metadata' instead of 'rubygem_metadata' as in the lister. Maybe I'm wrong but I think the lister should rename to rubygems_metadata?

@franckbret, I did not use plural because we are processing a single gem in the loader (with multiple versions but those are metadata for a single gem).
So I do not think we should modify the lister output.

Nov 2 2022, 9:06 AM

Oct 27 2022

franckbret added a comment to D8569: Add rubygems loader.

Rubygems: Improve loader to make use of artifacts and rubygems_metadata provided by the lister extra_loader_arguments

Use artifacts and rubygems_metadata to get list of versions, artifacts checksums and extrinsic metadata url
Add an EXTID manifest
Add metadata from extrinsic metadata

Oct 27 2022, 7:05 PM
franckbret updated the diff for D8569: Add rubygems loader.

Rubygems: Improve loader to make use of artifacts and rubygems_metadata provided by the lister extra_loader_arguments

Oct 27 2022, 7:02 PM
franckbret added a comment to T4665: [pudbev lister] use alternative packages list endpoint.

OK, thanks for the feedback. I will have a look at this next week; I will be out of the office until Wednesday.

Oct 27 2022, 4:10 PM · PubDev lister

Oct 26 2022

franckbret requested review of D8777: Puppet: Lister implements incremental mode.
Oct 26 2022, 10:20 AM

Oct 25 2022

franckbret closed D8766: Puppet: Artifacts as lists.
Oct 25 2022, 3:09 PM
franckbret committed rDLDBASEe6847f36162f: Puppet: Artifacts as lists (authored by franckbret).
Puppet: Artifacts as lists
Oct 25 2022, 3:09 PM
franckbret closed D8762: Puppet: Switch artifacts from dict to list.
Oct 25 2022, 2:50 PM
franckbret committed rDLS8355fee25f57: Puppet: Switch artifacts from dict to list (authored by franckbret).
Puppet: Switch artifacts from dict to list
Oct 25 2022, 2:50 PM
franckbret requested review of D8766: Puppet: Artifacts as lists.
Oct 25 2022, 2:50 PM
franckbret updated the diff for D8762: Puppet: Switch artifacts from dict to list.

Rebase

Oct 25 2022, 2:50 PM
franckbret added a comment to D8762: Puppet: Switch artifacts from dict to list.

Looks good to me, I guess you need to update the loader too.

Oct 25 2022, 11:05 AM
franckbret updated the summary of D8762: Puppet: Switch artifacts from dict to list.
Oct 25 2022, 10:42 AM
franckbret requested review of D8762: Puppet: Switch artifacts from dict to list.
Oct 25 2022, 10:16 AM
franckbret added inline comments to D8379: Hackage: Loads Hackage Listed origins.
Oct 25 2022, 8:25 AM

Oct 24 2022

franckbret updated the diff for D8748: Nuget: Implement incremental listing.

Improve documentation section related to incremental listing

Oct 24 2022, 11:48 AM
franckbret added inline comments to D8379: Hackage: Loads Hackage Listed origins.
Oct 24 2022, 9:44 AM
franckbret closed D8566: Conda: Anaconda packages archive loader.
Oct 24 2022, 9:43 AM
franckbret committed rDLDBASEe7ba6316315d: Conda: Anaconda packages archive loader (authored by franckbret).
Conda: Anaconda packages archive loader
Oct 24 2022, 9:43 AM
franckbret updated the diff for D8379: Hackage: Loads Hackage Listed origins.

Fix issue with the hexadecimal representation of the MD5 checksum hash

Oct 24 2022, 9:41 AM

Oct 21 2022

franckbret updated the summary of D8566: Conda: Anaconda packages archive loader.
Oct 21 2022, 10:25 AM
franckbret updated the diff for D8566: Conda: Anaconda packages archive loader.

Manage case where author or last_update is empty

Oct 21 2022, 10:22 AM

Oct 19 2022

franckbret requested review of D8748: Nuget: Implement incremental listing.
Oct 19 2022, 4:21 PM
franckbret updated the diff for D8566: Conda: Anaconda packages archive loader.

Remove description from message

Oct 19 2022, 11:07 AM
franckbret added a comment to D8566: Conda: Anaconda packages archive loader.

@anlambert Can we merge this one, do you have more feedback?

Oct 19 2022, 10:51 AM
franckbret added a comment to D8171: crates: Loader implements incremental mode.

@vlorentz Can we merge this one?

Oct 19 2022, 10:46 AM
franckbret added a comment to D8566: Conda: Anaconda packages archive loader.

@anlambert Can we merge this one, do you have more feedback?

Oct 19 2022, 10:44 AM
franckbret added inline comments to D8379: Hackage: Loads Hackage Listed origins.
Oct 19 2022, 10:36 AM
franckbret updated the diff for D8379: Hackage: Loads Hackage Listed origins.

Get md5 checksums from HEAD request on archive url

Oct 19 2022, 10:29 AM

Oct 13 2022

franckbret updated the diff for D8171: crates: Loader implements incremental mode.

Remove original-artifacts-json from raw extrinsic metadata, it should already be created by the base package loader

Oct 13 2022, 4:41 PM
franckbret abandoned D8665: Pubdev: Add raw_extrinsic_metadata.

Pub.dev has no other metadata that we do not already parse.

Oct 13 2022, 3:49 PM
franckbret added inline comments to D8665: Pubdev: Add raw_extrinsic_metadata.
Oct 13 2022, 12:23 PM
franckbret added a comment to D8379: Hackage: Loads Hackage Listed origins.

@vlorentz @anlambert We can get archive checksums by making a HEAD call to the archive download URL:

Oct 13 2022, 9:21 AM
franckbret added inline comments to D8379: Hackage: Loads Hackage Listed origins.
Oct 13 2022, 9:04 AM
franckbret updated the diff for D8379: Hackage: Loads Hackage Listed origins.

loader specification documentation, update Hackage section

Oct 13 2022, 9:02 AM
franckbret added inline comments to D8616: cpan: Align loader implementation with latest lister improvements.
Oct 13 2022, 8:43 AM

Oct 12 2022

franckbret added a revision to T4597: Create a Hackage Lister: D8663: Hackage: Implement incremental mode.
Oct 12 2022, 4:39 PM · Hackage lister
franckbret updated the summary of D8663: Hackage: Implement incremental mode.
Oct 12 2022, 4:39 PM
franckbret updated the summary of D8665: Pubdev: Add raw_extrinsic_metadata.
Oct 12 2022, 4:36 PM
franckbret updated the diff for D8665: Pubdev: Add raw_extrinsic_metadata.

Remove 'pubdev-pubspec-json' format from raw_extrinsic_metadata

Oct 12 2022, 4:31 PM
franckbret updated the diff for D8663: Hackage: Implement incremental mode.

Incremental operations are now related to the last_listing_date

Oct 12 2022, 4:11 PM
franckbret added inline comments to D8665: Pubdev: Add raw_extrinsic_metadata.
Oct 12 2022, 2:44 PM
franckbret added a comment to D8663: Hackage: Implement incremental mode.
for entry in page:
    last_update = iso8601.parse_date(entry["lastUpload"])
    if not self.earliest_update or last_update > self.earliest_update:
        self.earliest_update = last_update

This makes self.earliest_update the latest lastUpload, not the earliest.
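
To make the point concrete, here is a small self-contained sketch using only the standard library (the original snippet uses iso8601.parse_date; `datetime.fromisoformat` is swapped in here). `track_latest` and `track_earliest` are hypothetical helper names: the `>` comparison keeps the maximum, while `<` keeps the minimum.

```python
from datetime import datetime
from typing import Iterable, Optional


def track_latest(values: Iterable[str]) -> Optional[datetime]:
    """Same comparison as the quoted loop: ends up holding the newest date."""
    result: Optional[datetime] = None
    for value in values:
        parsed = datetime.fromisoformat(value)
        if result is None or parsed > result:
            result = parsed
    return result


def track_earliest(values: Iterable[str]) -> Optional[datetime]:
    """Flipping the comparison is what actually yields the oldest date."""
    result: Optional[datetime] = None
    for value in values:
        parsed = datetime.fromisoformat(value)
        if result is None or parsed < result:
            result = parsed
    return result


uploads = [
    "2022-10-03T08:00:00+00:00",
    "2022-10-11T17:30:00+00:00",
    "2022-10-07T12:15:00+00:00",
]
newest = track_latest(uploads)
oldest = track_earliest(uploads)
```

Either the comparison or the attribute name would need to change for the two to agree.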

Oct 12 2022, 12:56 PM
franckbret requested review of D8665: Pubdev: Add raw_extrinsic_metadata.
Oct 12 2022, 12:46 PM
franckbret requested review of D8663: Hackage: Implement incremental mode.
Oct 12 2022, 10:19 AM

Oct 11 2022

franckbret closed D8640: Pubdev: Do not rely on intrinsic metadata.
Oct 11 2022, 12:59 PM
franckbret committed rDLDBASE4cb85e153e2e: Pubdev: Do not rely on intrinsic metadata (authored by franckbret).
Pubdev: Do not rely on intrinsic metadata
Oct 11 2022, 12:59 PM
franckbret updated the diff for D8640: Pubdev: Do not rely on intrinsic metadata.

rebase

Oct 11 2022, 12:58 PM
franckbret updated the diff for D8640: Pubdev: Do not rely on intrinsic metadata.

rebase

Oct 11 2022, 12:20 PM
franckbret updated the diff for D8640: Pubdev: Do not rely on intrinsic metadata.

rebase

Oct 11 2022, 10:57 AM
franckbret added inline comments to D8640: Pubdev: Do not rely on intrinsic metadata.
Oct 11 2022, 10:56 AM
franckbret added a comment to D8640: Pubdev: Do not rely on intrinsic metadata.

Remove useless check condition when getting 'author' data

You checked there is no empty string in the input data, right?

Oct 11 2022, 9:58 AM
franckbret added inline comments to D8640: Pubdev: Do not rely on intrinsic metadata.
Oct 11 2022, 9:58 AM
franckbret updated the diff for D8640: Pubdev: Do not rely on intrinsic metadata.

Remove useless check condition when getting 'author' data

Oct 11 2022, 7:23 AM

Oct 10 2022

franckbret updated the diff for D8171: crates: Loader implements incremental mode.

Manage checksums

Oct 10 2022, 9:16 AM

Oct 7 2022

franckbret added inline comments to D8640: Pubdev: Do not rely on intrinsic metadata.
Oct 7 2022, 4:31 PM
franckbret updated the diff for D8640: Pubdev: Do not rely on intrinsic metadata.

Remove files from commit

Oct 7 2022, 12:29 PM
franckbret requested review of D8640: Pubdev: Do not rely on intrinsic metadata.
Oct 7 2022, 12:04 PM
franckbret updated the diff for D8171: crates: Loader implements incremental mode.

CI rebuild

Oct 7 2022, 8:49 AM
franckbret updated the diff for D8171: crates: Loader implements incremental mode.

Adapt raw extrinsic metadata tests for CI

Oct 7 2022, 8:29 AM
franckbret updated the diff for D8171: crates: Loader implements incremental mode.

Add pytest verbose option to tox (previous attempt failed)

Oct 7 2022, 7:48 AM
franckbret updated the diff for D8171: crates: Loader implements incremental mode.

Temporarily add verbose options to pytest to understand why CI fails

Oct 7 2022, 7:32 AM

Oct 6 2022

franckbret updated the diff for D8171: crates: Loader implements incremental mode.

Ensure last_update is utc

Oct 6 2022, 6:50 PM
franckbret updated the diff for D8171: crates: Loader implements incremental mode.

Add "crates-package-json" raw extrinsic metadata

Oct 6 2022, 4:12 PM
franckbret added a comment to D8171: crates: Loader implements incremental mode.

@vlorentz @anlambert The last commit introduces raw extrinsic metadata with format="original-artifacts-json". It is populated with data from the extra loader arguments "artifacts".

Oct 6 2022, 12:55 PM