In D8539#222941, @olasd wrote:
In D8539#222800, @ardumont wrote:
Fine, I have one comment I'd like others to have a look at though [1], regarding where that new discovery (interface) code should go. It currently feels a bit off to me that this code is in loader-core. Loaders are not the sole archive consumers (scanner, webapp, cli, indexer, cooker, ...).
[1] https://forge.softwareheritage.org/D8539?id=30916#inline-60774
If it's to be used generically, this discovery code should pretty definitely not be in swh.loader.core.
- The generic discovery algorithm, and base abstract classes/protocols, should probably be in swh.model, as they're tied to that structure;
- The swh.storage-based discovery mechanism could live in swh.storage.algorithms, and be used by swh.loader.core;
- The REST API-based discovery mechanism could live in swh.web.client, or stay in swh.scanner.
Dec 7 2022
Dec 5 2022
franckbret added a comment to D8909: Login: Add an option to choose an authentication method (by username/password or token).
In D8909#231631, @vlorentz wrote:
@anlambert Shouldn't this be replaced by swh auth generate-token?
Dec 1 2022
franckbret added a comment to D8909: Login: Add an option to choose an authentication method (by username/password or token).
Some screen captures of the different scenarios when logging in:
Nov 21 2022
franckbret committed rDLDBASEb9bd1287e8ad: Hackage: Loads Hackage Listed origins (authored by franckbret).
Hackage: Loads Hackage Listed origins
Simplify some code and rebase
@anlambert hi, can we merge this one?
Nov 18 2022
franckbret committed rDLS065b3f81a1e8: Hackage: Implement incremental mode (authored by franckbret).
Hackage: Implement incremental mode
Rebase
Check the number of http requests done in incremental tests
Nov 16 2022
@KShivendu Hi, we had a similar situation with rubygems loader, see its _load_directory method
Nov 15 2022
Improve test for incremental listing, ensure the http searchQuery/lastUpload value is a date
Nov 14 2022
In D8663#229574, @vlorentz wrote:
buuuut you are using a strict inequality, so you need to subtract one day, in order not to miss uploads submitted after the previous run of the lister but on the same day.
Also, you should apply .astimezone(tz=timezone.utc) before converting to date, because the database is not guaranteed to return timestamps in UTC even when they were written in UTC.
(Sorry for the back-and-forth; hopefully I'm done now.)
Use greater than or equal instead of strict comparison when building http api query params for incremental listing
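The two date adjustments discussed in this exchange can be sketched as follows. This is only an illustration, not the lister's actual code: the function name lower_bound_date and the strict flag are hypothetical, and the sketch assumes the stored timestamp is timezone-aware.

```python
from datetime import date, datetime, timedelta, timezone


def lower_bound_date(last_listing: datetime, strict: bool = False) -> date:
    """Return the date to send as the lower bound of an incremental query.

    Hypothetical helper: the timestamp is normalized to UTC before being
    truncated to a date, since the database may return it in another zone.
    """
    if last_listing.tzinfo is None:
        # Naive datetimes would be interpreted as local time by astimezone(),
        # so this sketch refuses them rather than guessing.
        raise ValueError("expected a timezone-aware datetime")
    utc_day = last_listing.astimezone(tz=timezone.utc).date()
    # With a strict comparison (>), step back one day so that uploads made
    # later on the same day as the previous run are not missed. With an
    # inclusive comparison (>=), the same day is already covered.
    return utc_day - timedelta(days=1) if strict else utc_day


if __name__ == "__main__":
    stored = datetime(2022, 11, 14, 1, 30, tzinfo=timezone(timedelta(hours=2)))
    print(lower_bound_date(stored))
    print(lower_bound_date(stored, strict=True))
```

Either option re-lists some origins that were already seen on the boundary day, which is harmless as long as scheduling the same origin twice is idempotent.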
Abandon revision because in this case we cannot really take advantage of an incremental mode
In D8824#229544, @anlambert wrote:
@franckbret, as explained in my inline comment we cannot use the date filtering on the release index of CPAN elasticsearch.
The only incremental mode we can implement here is to filter the ListedOrigin instances sent to the scheduler according to the last_update value, if it is greater than the date from the lister state, we can yield it.
Nevertheless, I am not sure if it is worth it as a full listing takes around 10 minutes, which is pretty fast.
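The client-side filtering described in that comment could look roughly like the sketch below. The Origin dataclass is a hypothetical stand-in for ListedOrigin, and filter_updated is an illustrative name; neither is taken from swh.lister.

```python
from dataclasses import dataclass
from datetime import datetime, timezone
from typing import Iterable, Iterator, Optional


@dataclass
class Origin:
    """Hypothetical stand-in for a listed origin with a last_update field."""

    url: str
    last_update: Optional[datetime]


def filter_updated(
    origins: Iterable[Origin], state_date: Optional[datetime]
) -> Iterator[Origin]:
    """Yield only origins updated after the date kept in the lister state."""
    for origin in origins:
        # Without a previous state, or without a known last_update, the
        # safe choice in this sketch is to keep the origin.
        if state_date is None or origin.last_update is None:
            yield origin
        elif origin.last_update > state_date:
            yield origin


if __name__ == "__main__":
    state = datetime(2022, 11, 1, tzinfo=timezone.utc)
    listed = [
        Origin("https://example.org/old", datetime(2022, 10, 1, tzinfo=timezone.utc)),
        Origin("https://example.org/new", datetime(2022, 11, 10, tzinfo=timezone.utc)),
    ]
    print([o.url for o in filter_updated(listed, state)])
```

Note that this only reduces what is sent to the scheduler; the full listing is still fetched, which is why the gain is limited when a full run is already fast.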
franckbret committed rDLSea146ce297d5: Nuget: Implement incremental listing (authored by franckbret).
Nuget: Implement incremental listing
Rebase
Nov 9 2022
Nov 8 2022
franckbret committed rDLSe1f3f87c73f4: Puppet: Lister implements incremental mode (authored by franckbret).
Puppet: Lister implements incremental mode
Rebase
Nov 7 2022
Nov 4 2022
Minor improvements
Use an offset of -15h when querying the API, which is the lowest timezone offset recorded in the tzdb
Ensure we query the API with a date in the same US/Pacific timezone that the HTTP API uses for querying and expressing results
Nov 3 2022
franckbret committed rDLDBASE8e34a6d77996: Rubygems: Improve lister to make use of artifacts and rubygems_metadata (authored by franckbret).
Rubygems: Improve lister to make use of artifacts and rubygems_metadata
Add rubygems loader
Nov 2 2022
Do not json.loads already deserialized json data
In D8569#228509, @anlambert wrote:
In D8569#228493, @franckbret wrote:
In D8569#228487, @franckbret wrote:
Rubygems: Improve loader to make use of artifacts and rubygems_metadata provided by the lister extra_loader_arguments
Use artifacts and rubygems_metadata to get list of versions, artifacts checksums and extrinsic metadata url
Add an EXTID manifest
Add metadata from extrinsic metadata
@anlambert Please note I used 'rubygems_metadata' instead of 'rubygem_metadata' as in the lister. Maybe I'm wrong but I think the lister should rename to rubygems_metadata?
@franckbret, I did not use plural because we are processing a single gem in the loader (with multiple versions but those are metadata for a single gem).
So I do not think we should modify the lister output.
Oct 27 2022
In D8569#228487, @franckbret wrote:
Rubygems: Improve loader to make use of artifacts and rubygems_metadata provided by the lister extra_loader_arguments
Use artifacts and rubygems_metadata to get list of versions, artifacts checksums and extrinsic metadata url
Add an EXTID manifest
Add metadata from extrinsic metadata
Rubygems: Improve loader to make use of artifacts and rubygems_metadata provided by the lister extra_loader_arguments
Ok, thanks for the feedback, I will have a look at this next week; I will be out of office until Wednesday.
Oct 26 2022
Oct 25 2022
Puppet: Artifacts as lists
franckbret committed rDLS8355fee25f57: Puppet: Switch artifacts from dict to list (authored by franckbret).
Puppet: Switch artifacts from dict to list
Rebase
In D8762#227816, @anlambert wrote:
Looks good to me, I guess you need to update the loader too.
Oct 24 2022
Improve documentation section related to incremental listing
franckbret committed rDLDBASEe7ba6316315d: Conda: Anaconda packages archive loader (authored by franckbret).
Conda: Anaconda packages archive loader
fix issue with hexadecimal representation of the checksum md5 hash
Oct 21 2022
Manage case where author or last_update is empty
Oct 19 2022
Remove description from message
In D8566#227169, @franckbret wrote:
@anlambert Can we merge this one, do you have more feedback?
@vlorentz Can we merge this one?
@anlambert Can we merge this one, do you have more feedback?
Get md5 checksums from HEAD request on archive url
Oct 13 2022
Remove original-artifacts-json from raw extrinsic metadata, it should already be created by the base package loader
Pub.dev has no other metadata that we do not already parse.
@vlorentz @anlambert We can get archive checksums by making a HEAD call to the archive download URL:
loader specification documentation, update Hackage section
franckbret added inline comments to D8616: cpan: Align loader implementation with latest lister improvements.
Oct 12 2022
franckbret added a revision to T4597: Create a Hackage Lister: D8663: Hackage: Implement incremental mode.
Remove 'pubdev-pubspec-json' format from raw_extrinsic_metadata
Incremental operations are now related to the last_listing_date
In D8663#225586, @vlorentz wrote:
    for entry in page:
        last_update = iso8601.parse_date(entry["lastUpload"])
        if not self.earliest_update or last_update > self.earliest_update:
            self.earliest_update = last_update
This makes self.earliest_update the latest lastUpload, not the earliest.
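To make the naming issue concrete, here is a small standalone sketch of the same loop with a name that matches what it computes. It is not the lister's code: latest_upload is a hypothetical function, and it swaps iso8601.parse_date for the standard library's datetime.fromisoformat, assuming timestamps carry an explicit +00:00 style offset.

```python
from datetime import datetime
from typing import Iterable, Mapping, Optional


def latest_upload(
    page: Iterable[Mapping[str, str]], current: Optional[datetime] = None
) -> Optional[datetime]:
    """Return the most recent lastUpload seen in page, starting from current."""
    latest = current
    for entry in page:
        last_update = datetime.fromisoformat(entry["lastUpload"])
        # The > comparison keeps the maximum, hence "latest" rather than
        # "earliest"; tracking the earliest would need < instead.
        if latest is None or last_update > latest:
            latest = last_update
    return latest


if __name__ == "__main__":
    page = [
        {"lastUpload": "2022-10-10T08:00:00+00:00"},
        {"lastUpload": "2022-10-12T09:30:00+00:00"},
        {"lastUpload": "2022-10-11T23:59:00+00:00"},
    ]
    print(latest_upload(page))
```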
Oct 11 2022
franckbret committed rDLDBASE4cb85e153e2e: Pubdev: Do not rely on intrinsic metadata (authored by franckbret).
Pubdev: Do not rely on intrinsic metadata
rebase
rebase
rebase
In D8640#225229, @vlorentz wrote:
In D8640#225200, @franckbret wrote:
Remove useless check condition when getting 'author' data
You checked there is no empty string in the input data, right?
Remove useless check condition when getting 'author' data
Oct 10 2022
Manage checksums
Oct 7 2022
Remove files from commit
CI rebuild
Adapt raw extrinsic metadata tests for CI
Add pytest verbose option to tox (previous failed)
Temporarily add verbose options to pytest to understand why CI fails
Oct 6 2022
Ensure last_update is utc
Add "crates-package-json" raw extrinsic metadata
@vlorentz @anlambert The last commit introduces raw extrinsic metadata with format="original-artifacts-json". It is populated with data from the extra loader arguments "artifacts".