All Stories
Nov 15 2022
not really a nice catch as it wasn't a very useful optimization before D8843, which I only noticed when the useless query caused issues ;)
In D8386#229890, @anlambert wrote:
In D8386#229882, @olasd wrote:
In D8386#229677, @KShivendu wrote:
I noticed that https://archive.softwareheritage.org/browse/origin/directory/?origin_url=deb://Ubuntu/packages/nginx has duplicate branch names, which is very confusing. In fact, even the default branch is repeated twice and I see two check marks. If we use branch names like 0.3.9-15.fc26, won't the same happen with Fedora listers? It doesn't seem to differentiate between the editions. (or does it?)
This seems like a misfeature in the webapp:
https://archive.softwareheritage.org/api/1/snapshot/158a3f36b0bd3da461fb7458de44cfa2c94e4270/
The snapshot has multiple branches, with the same version suffix, pointing at the same objects (because the exact same version of the package is present in multiple Ubuntu suites).
I'm not 100% sure how we should be fixing that, but that bug shouldn't prevent you from giving the fedora snapshots the "semantically correct" structure.
I also noticed that yesterday evening and was wondering what the best way to fix it would be. I see two possible options:
- We change the names of the keys in the snapshot branches dictionary to use the intrinsic version of a Debian package instead of its extrinsic one (meaning releases/bionic-security/main/1.14.0-0ubuntu1.10 would become releases/1.14.0-0ubuntu1.10)
- We update the webapp to filter out duplicated releases before display, since it displays the release name rather than the snapshot branch key associated with the release
I would rather go for the second option, as keeping the Debian/Ubuntu suites and components in the snapshot branch keys seems worthwhile.
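A minimal Python sketch of the second option, for illustration only. It assumes the branches come as the key -> {"target", "target_type"} mapping returned by the snapshot API, and that the displayed name is the last component of the branch key. The function name and the placeholder targets "aa"/"bb" are hypothetical; the branch keys themselves are left untouched.

```python
from typing import Dict, List, Tuple

# Assumed shape, modelled on /api/1/snapshot/: branch key -> {"target", "target_type"}
Branches = Dict[str, Dict[str, str]]


def dedupe_release_branches(branches: Branches) -> List[Tuple[str, str]]:
    """Return (display_name, target) pairs with duplicate releases removed.

    Assumption: the displayed name is the last path component of the key,
    e.g. releases/bionic-security/main/1.14.0-0ubuntu1.10 -> 1.14.0-0ubuntu1.10.
    """
    seen = set()
    releases: List[Tuple[str, str]] = []
    for key in sorted(branches):  # sorted for a deterministic result
        branch = branches[key]
        if branch.get("target_type") != "release":
            continue  # skip aliases such as HEAD
        entry = (key.rsplit("/", 1)[-1], branch["target"])
        if entry in seen:
            continue  # same name pointing at the same object: already listed
        seen.add(entry)
        releases.append(entry)
    return releases


branches = {
    "HEAD": {"target": "releases/focal/main/1.17.10-0ubuntu1", "target_type": "alias"},
    "releases/bionic-security/main/1.14.0-0ubuntu1.10": {"target": "aa", "target_type": "release"},
    "releases/bionic-updates/main/1.14.0-0ubuntu1.10": {"target": "aa", "target_type": "release"},
    "releases/focal/main/1.17.10-0ubuntu1": {"target": "bb", "target_type": "release"},
}
print(dedupe_release_branches(branches))
```

Keying on both the name and the target means two suites that happen to share a version string but point at different objects would still both be shown.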
We could do the same for the Fedora case since, based on my tests, it is quite common for several extrinsic versions of the form [0-9].[0-9].[0-9]-[0-9].fc[0-9]+ (differing only in their .fc suffix) to target the same intrinsic version [0-9].[0-9].[0-9]-[0-9].
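Applied to Fedora, the grouping could key on the version with its .fcNN dist tag stripped. This sketch assumes the dist tag is always a trailing ".fc" followed by digits, as in the 0.3.9-15.fc26 example; intrinsic_version and group_by_intrinsic are hypothetical helpers, not existing lister code.

```python
import re
from collections import defaultdict
from typing import Dict, Iterable, List

# Assumption: extrinsic version = intrinsic version + ".fcNN" dist tag.
FEDORA_DIST_TAG = re.compile(r"^(?P<intrinsic>.+)\.fc\d+$")


def intrinsic_version(extrinsic: str) -> str:
    """Strip a trailing .fcNN dist tag; return the input unchanged otherwise."""
    match = FEDORA_DIST_TAG.match(extrinsic)
    return match.group("intrinsic") if match else extrinsic


def group_by_intrinsic(versions: Iterable[str]) -> Dict[str, List[str]]:
    """Group extrinsic versions by the intrinsic version they share."""
    groups: Dict[str, List[str]] = defaultdict(list)
    for version in versions:
        groups[intrinsic_version(version)].append(version)
    return dict(groups)


print(group_by_intrinsic(["0.3.9-15.fc26", "0.3.9-15.fc27", "0.4.0-1.fc27"]))
```

Sharing an intrinsic version does not by itself prove the packages are identical, so a display-side filter would presumably still compare targets, as in the Debian case.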
Build is green
make the right parameter significant
Looks good to me, thanks!
Build is green
Improve the test for incremental listing; ensure the HTTP searchQuery/lastUpload value is a date
Build is green
Minor fixes in the loader docstrings
Build is green
- Add tests for handling of HTTP errors and sha1 checksum (increase test coverage)
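For the sha1 part, a check along these lines is what such a test would exercise. This is a hypothetical sketch using hashlib; check_sha1 and ChecksumMismatch are illustrative names, not the loader's actual API.

```python
import hashlib


class ChecksumMismatch(Exception):
    """Hypothetical error raised when downloaded content does not match."""


def check_sha1(content: bytes, expected_sha1: str) -> None:
    """Raise ChecksumMismatch if the sha1 of content differs from expected_sha1."""
    actual = hashlib.sha1(content).hexdigest()
    if actual != expected_sha1.lower():  # hexdigest() is lowercase
        raise ChecksumMismatch(f"expected sha1 {expected_sha1}, got {actual}")


# One valid and one corrupted payload.
good = b"hello"
check_sha1(good, hashlib.sha1(good).hexdigest())
try:
    check_sha1(b"corrupted", hashlib.sha1(good).hexdigest())
except ChecksumMismatch:
    pass
else:
    raise AssertionError("mismatch was not detected")
```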
Build is green
- Extract .tar.gz as a separate branch (and apply other suggestions made by @anlambert)
- Remove .tar.gz extraction logic from extract_rpm_package function. Previously, I was just replacing .tar.gz with its extracted folder, but now we are creating a separate branch as well.
- Update the relevant tests accordingly
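The actual extract_rpm_package function is not shown here, so the following is only a hypothetical sketch of the idea: each .tar.gz found in an extracted RPM tree is unpacked into its own directory instead of replacing the archive in place, so that it could back a separate branch. extract_tarballs and the directory layout are assumptions.

```python
import io
import tarfile
import tempfile
from pathlib import Path
from typing import Dict


def extract_tarballs(rpm_tree: Path, dest_root: Path) -> Dict[str, Path]:
    """Extract each .tar.gz under an already-extracted RPM tree into its own
    directory below dest_root, leaving the RPM tree itself untouched.

    Returns archive file name -> extraction directory; each entry could then
    back its own snapshot branch (branch naming is out of scope here).
    """
    extracted: Dict[str, Path] = {}
    for archive in sorted(rpm_tree.rglob("*.tar.gz")):
        target = dest_root / archive.name[: -len(".tar.gz")]
        target.mkdir(parents=True, exist_ok=True)
        with tarfile.open(str(archive), "r:gz") as tar:
            if hasattr(tarfile, "data_filter"):
                # Python versions with extraction filters: refuse unsafe members.
                tar.extractall(str(target), filter="data")
            else:
                tar.extractall(str(target))
        extracted[archive.name] = target
    return extracted


# Usage: build a throwaway RPM tree containing one tarball, then extract it.
with tempfile.TemporaryDirectory() as tmp:
    rpm_tree = Path(tmp) / "rpm"
    rpm_tree.mkdir()
    payload = b"print('hi')\n"
    with tarfile.open(str(rpm_tree / "foo-1.0.tar.gz"), "w:gz") as tar:
        info = tarfile.TarInfo("foo-1.0/main.py")
        info.size = len(payload)
        tar.addfile(info, io.BytesIO(payload))
    result = extract_tarballs(rpm_tree, Path(tmp) / "tarballs")
    content = (result["foo-1.0.tar.gz"] / "foo-1.0" / "main.py").read_bytes()
    rpm_tree_untouched = not (rpm_tree / "foo-1.0").exists()
```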
Nov 14 2022
Fix the issue by adding a level of indirection in the YAML (replacing the job
with an identical job-template and instantiating it through a project).
It seems jinja2 templates aren't actually supported in direct job definitions,
only in job templates. Thanks to olasd for finding this out and suggesting a fix.
Build is green
Fix mistyped signature
Build has FAILED
Add coverage (which is a bit convoluted, but we are in loader-core, so there is no real loader
to check the actual behavior beyond what I propose).
Build is green
However, more_data_to_fetch/create_snapshot is renamed to create_partial_visit,
as that makes more sense now.
- Rebase
- reword commit and diff description
- adapt parameter according to review suggestion from @vlorentz
The replication/05-earliest-revision.sh script in the replication package mentions the swh-graph version it uses and the fully qualified class name, so the class can be found in the swh-graph code.
merged in abbcf03b7bb2f1425db154dbe6e43e10c647354c
One last thing: could you make the tests check that the request body is as expected? See https://requests-mock.readthedocs.io/en/latest/history.html
Thanks!
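The linked requests-mock page documents the request history (request_history and last_request), so with requests-mock a test could compare m.last_request.json() to the expected body. To stay dependency-free, this sketch does the equivalent with the standard library's unittest.mock instead; submit_query, the URL, and the body are hypothetical.

```python
from unittest import mock


def submit_query(session, url: str, since: str):
    """Hypothetical client code under test: POSTs a JSON body through a
    requests-like session object."""
    return session.post(url, json={"since": since})


def test_submit_query_sends_expected_body():
    session = mock.Mock()  # stands in for requests.Session
    submit_query(session, "https://example.org/api/search", "2022-11-14")
    session.post.assert_called_once()
    args, kwargs = session.post.call_args
    assert args == ("https://example.org/api/search",)
    assert kwargs["json"] == {"since": "2022-11-14"}  # the request body


test_submit_query_sends_expected_body()
```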
swh-web uses swh-search as a glorified postgresql index: for every result returned by swh-search, it pulls the corresponding row from origin_intrinsic_metadata in the indexer database; which means it ignores extrinsic metadata.
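A rough model of the lookup described above, with plain Python containers standing in for swh-search and the indexer database. The table shape, build_results, and the choice to show None for hits without an intrinsic row are assumptions made for illustration, not swh-web's actual code.

```python
from typing import Any, Dict, List, Optional

# Hypothetical stand-in for origin_intrinsic_metadata: origin URL -> row.
IntrinsicTable = Dict[str, Dict[str, Any]]


def build_results(search_hits: List[str], table: IntrinsicTable) -> List[Dict[str, Any]]:
    """Resolve each swh-search hit (an origin URL) against the intrinsic table.

    Only the URL is kept from swh-search; the metadata shown comes from the
    intrinsic table, so a hit matched through extrinsic metadata alone ends
    up with no metadata here.
    """
    results = []
    for url in search_hits:
        row: Optional[Dict[str, Any]] = table.get(url)
        results.append({"url": url, "metadata": row["metadata"] if row else None})
    return results


table = {"https://example.org/a": {"metadata": {"name": "a"}}}
hits = ["https://example.org/a", "https://example.org/b"]  # b matched via extrinsic metadata only
print(build_results(hits, table))
```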
Build is green
In D8663#229574, @vlorentz wrote:
buuuut you are using a strict inequality, so you need to subtract one day, in order not to miss uploads submitted after the previous run of the lister but on the same day.
Also, you should apply .astimezone(tz=timezone.utc) before converting to date, because the database is not guaranteed to return timestamps in UTC even when they were written in UTC.
(Sorry for the back-and-forth; hopefully I'm done now.)
Use greater than or equal instead of a strict comparison when building the HTTP API query params for incremental listing
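A sketch of the two date rules discussed here, assuming the lister state stores a timezone-aware datetime and the API takes a YYYY-MM-DD value; last_upload_param is a hypothetical helper. It normalizes to UTC before taking the date and only subtracts a day when a strict comparison is used.

```python
from datetime import datetime, timedelta, timezone


def last_upload_param(last_seen: datetime, strict: bool = False) -> str:
    """Turn the lister state's timestamp into a YYYY-MM-DD query value.

    The database may return timestamps in any timezone, so the value is
    converted to UTC before taking the date. With a strict ">" filter, one
    day is subtracted so uploads later on the same day are not missed.
    """
    if last_seen.tzinfo is None:
        raise ValueError("last_seen must be timezone-aware")
    day = last_seen.astimezone(tz=timezone.utc).date()
    if strict:
        day -= timedelta(days=1)
    return day.isoformat()


# 00:30 at UTC+1 is still the previous day in UTC.
ts = datetime(2022, 11, 15, 0, 30, tzinfo=timezone(timedelta(hours=1)))
print(last_upload_param(ts))               # 2022-11-14
print(last_upload_param(ts, strict=True))  # 2022-11-13
```

With the >= comparison adopted in this update, the strict branch is not needed; it only shows the alternative from the review.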
Abandon revision because, in this case, we cannot really get any advantage from an incremental mode
In D8824#229544, @anlambert wrote:
@franckbret, as explained in my inline comment, we cannot use date filtering on the release index of CPAN elasticsearch.
The only incremental mode we can implement here is to filter the ListedOrigin instances sent to the scheduler according to their last_update value: if it is greater than the date from the lister state, we can yield it. Nevertheless, I am not sure it is worth it, as a full listing takes around 10 minutes, which is pretty fast.
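A minimal sketch of the client-side filtering described in the comment, using a stand-in dataclass rather than the real ListedOrigin model; updated_since, the example URLs, and yielding origins whose last_update is unknown are assumptions.

```python
from dataclasses import dataclass
from datetime import datetime, timezone
from typing import Iterable, Iterator, Optional


@dataclass
class Origin:
    """Minimal stand-in for the ListedOrigin instances mentioned above."""
    url: str
    last_update: Optional[datetime]


def updated_since(origins: Iterable[Origin], state_date: Optional[datetime]) -> Iterator[Origin]:
    """Yield origins whose last_update is strictly greater than state_date.

    With no previous state (first run) everything is yielded; origins with an
    unknown last_update are also yielded, as a conservative choice.
    """
    for origin in origins:
        if state_date is None or origin.last_update is None or origin.last_update > state_date:
            yield origin


state = datetime(2022, 11, 1, tzinfo=timezone.utc)
origins = [
    Origin("https://example.org/dist/A", datetime(2022, 11, 10, tzinfo=timezone.utc)),
    Origin("https://example.org/dist/B", datetime(2022, 10, 1, tzinfo=timezone.utc)),
    Origin("https://example.org/dist/C", None),
]
print([o.url for o in updated_since(origins, state)])
```

Since the full listing still has to be fetched, this only reduces what is sent to the scheduler, which is why the comment questions whether it is worth it.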
Build is green
Rebase