
Dec 5 2022

olasd committed rDLSb815737054ac: Add built-in page and origin count limit to listers (authored by olasd).
Add built-in page and origin count limit to listers
Dec 5 2022, 3:25 PM
olasd closed D8914: Ignore psqlrc when loading the rubygems database dump.
Dec 5 2022, 2:56 PM
olasd committed rDLSa66e24bfa2af: Ignore psqlrc when loading the rubygems database dump (authored by olasd).
Ignore psqlrc when loading the rubygems database dump
Dec 5 2022, 2:56 PM
olasd updated the diff for D8916: Add a flag to not enable origins listed by a lister.

Rebase

Dec 5 2022, 2:55 PM
olasd updated the diff for D8915: Add built-in page and origin count limit to listers.

Add explicit test for default behavior

Dec 5 2022, 2:55 PM
olasd added inline comments to D8915: Add built-in page and origin count limit to listers.
Dec 5 2022, 2:51 PM
olasd added a comment to D8914: Ignore psqlrc when loading the rubygems database dump.

Thanks, I will try to update the swh/lister/rubygems/data/small_rubygems_dump.sh script to generate a postgres dump that can be loaded when ON_ERROR_STOP is set.

Dec 5 2022, 2:35 PM
olasd requested review of D8916: Add a flag to not enable origins listed by a lister.
Dec 5 2022, 2:27 PM
olasd requested review of D8915: Add built-in page and origin count limit to listers.
Dec 5 2022, 2:27 PM
olasd requested review of D8914: Ignore psqlrc when loading the rubygems database dump.
Dec 5 2022, 1:58 PM

Nov 30 2022

olasd added a comment to P1534 Command-Line Input.
\copy (select name, instance_name, visit_type, origins_known, origins_enabled, origins_never_visited, origins_known - origins_never_visited as origins_visited, origins_with_pending_changes from scheduler_metrics inner join listers on id=lister_id order by 1,2) to stdout with (format csv, header);
Nov 30 2022, 3:06 PM
olasd edited P1534 Command-Line Input.
Nov 30 2022, 3:05 PM
olasd created P1534 Command-Line Input.
Nov 30 2022, 3:03 PM
olasd created P1533 Command-Line Input.
Nov 30 2022, 3:01 PM
olasd accepted D8904: metadata_dictionary: Fix 'Invalid IPv6 URL' crash.

...

Nov 30 2022, 10:39 AM
olasd accepted D8905: Fix crash when indexing two REMD objects from the same deposit.

Thanks!

Nov 30 2022, 10:38 AM

Nov 29 2022

olasd committed rDCIDXd4d4c59cc3bb: Fix ordering and idempotence in the 136 -> 137 upgrade script (authored by olasd).
Fix ordering and idempotence in the 136 -> 137 upgrade script
Nov 29 2022, 5:11 PM
olasd committed rDPROVc626cc21b3ea: Add an indexed `revision_date` column on the `content_in_revision` relation (authored by olasd).
Add an indexed `revision_date` column on the `content_in_revision` relation
Nov 29 2022, 4:28 PM
olasd added inline comments to D8888: Drop content_language and content_ctags tables and related SQL functions.
Nov 29 2022, 3:28 PM
olasd added inline comments to D8890: metrics: Add gauge counting save origin requests created from webhooks.
Nov 29 2022, 3:17 PM
olasd added inline comments to D8889: save_code_now: Add from_webhook field to SaveOriginRequest model.
Nov 29 2022, 3:11 PM
olasd committed rDPROVc46d896cd651: Drop denormalized db flavor (authored by olasd).
Drop denormalized db flavor
Nov 29 2022, 11:02 AM
olasd committed rDPROV9b0dca6f5a6f: Make direct archive DSN a string instead of a dict (authored by olasd).
Make direct archive DSN a string instead of a dict
Nov 29 2022, 11:02 AM
olasd committed rDPROV5cfe3d3cb67d: Make postgresql storage DSN a string instead of a dict (authored by olasd).
Make postgresql storage DSN a string instead of a dict
Nov 29 2022, 11:02 AM

Nov 28 2022

olasd triaged T4707: provenance: add (revision) dates to the relation tables as High priority.
Nov 28 2022, 4:56 PM · Provenance database
olasd created T4707: provenance: add (revision) dates to the relation tables.
Nov 28 2022, 4:55 PM · Provenance database
olasd triaged T4706: provenance: consider running the origin->revision provenance algorithm on only "releases" as Normal priority.
Nov 28 2022, 4:46 PM · Provenance database
olasd added a comment to T4701: GitHub mirror of swh-graph is not updated.

(I've at least pushed the main branch and relevant tags)

Nov 28 2022, 2:00 PM · System administration, Compressed graph service
olasd added a comment to T4701: GitHub mirror of swh-graph is not updated.

As I suspected, this is due to the pristine-tar branch having objects that exceed 100MB:

Nov 28 2022, 1:58 PM · System administration, Compressed graph service
olasd accepted D8873: storage: Insert from temporary tables in consistent order.

As mentioned by @ardumont, please bump the schema version and add a migration script.

Nov 28 2022, 10:45 AM

Nov 23 2022

olasd committed rDPROV27cc3d08a3e8: kafka: bundle relation messages into a single write_additions (authored by olasd).
kafka: bundle relation messages into a single write_additions
Nov 23 2022, 2:17 PM
olasd committed rDPROVdf9678a55a03: test_cli: fix synthetic kafka messages for replay test (authored by olasd).
test_cli: fix synthetic kafka messages for replay test
Nov 23 2022, 2:17 PM
olasd committed rDPROV1da2e324ba50: Storage config checking in the rabbitmq storage backend isn't needed (authored by olasd).
Storage config checking in the rabbitmq storage backend isn't needed
Nov 23 2022, 10:28 AM
olasd committed rDPROVf1608599b50d: Lower logging level in the rabbitmq server backend for messages (authored by olasd).
Lower logging level in the rabbitmq server backend for messages
Nov 23 2022, 10:28 AM

Nov 22 2022

olasd added a comment to T4644: replayer crashes on invalid directory entry name (b'gitter/gitter.xml').

To force kafka compaction to run I've done the following:

Nov 22 2022, 5:19 PM · Archive integrity, Object storage, Data Model

Nov 21 2022

olasd accepted D8856: deployment/upgrade-swh-services: Update with elastic deployment part.

LGTM too, thanks!

Nov 21 2022, 6:18 PM

Nov 15 2022

olasd accepted D8838: Use a volatile resource lock for host port 5080.

Thank you!

Nov 15 2022, 2:56 PM
olasd added a comment to D8386: feat(fedora): Introduce fedora lister.
In D8386#229882, @olasd wrote:

@anlambert

I noticed that https://archive.softwareheritage.org/browse/origin/directory/?origin_url=deb://Ubuntu/packages/nginx has duplicate branch names, which is very confusing. In fact, even the default branch is repeated twice and I see two check marks. If we use branch names like 0.3.9-15.fc26, won't the same happen with Fedora listers? It doesn't seem to differentiate between the editions. (or does it?)

This seems like a misfeature in the webapp:

https://archive.softwareheritage.org/api/1/snapshot/158a3f36b0bd3da461fb7458de44cfa2c94e4270/

The snapshot has multiple branches, with the same version suffix, pointing at the same objects (because the exact same version of the package is present in multiple Ubuntu suites).

I'm not 100% sure how we should be fixing that, but that bug shouldn't prevent you from giving the fedora snapshots the "semantically correct" structure.

I also noticed that yesterday evening, and I was wondering what the best way to fix it would be. I see two possible options:

  1. We change the key names in the snapshot branches dictionary to use the intrinsic version of a Debian package instead of its extrinsic one (meaning releases/bionic-security/main/1.14.0-0ubuntu1.10 would become releases/1.14.0-0ubuntu1.10)
  2. We update the webapp to filter duplicated releases before display, since the release name is shown instead of the snapshot branches key associated with the release

I would rather go for the second one, as keeping the Debian/Ubuntu suites and components in the keys of the snapshot branches dictionary seems of interest.
We could do the same for the Fedora case as, based on my tests, it is quite common for extrinsic versions of the form [0-9].[0-9].[0-9]-[0-9].fc[0-9]+ to target the same intrinsic version [0-9].[0-9].[0-9]-[0-9].
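
The second option could be sketched roughly as follows. This is only an illustration, not the webapp's actual code: the function name `dedupe_release_branches`, the `bionic-updates` and `bionic` branch keys, and the `rel-a`/`rel-b` targets are hypothetical, and the branch mapping is simplified to a plain name-to-target dictionary.

```python
from typing import Dict, List, Tuple


def dedupe_release_branches(branches: Dict[str, str]) -> List[Tuple[str, str]]:
    """Collapse branches sharing both a version suffix and a target.

    `branches` maps a snapshot branch key (hypothetical, simplified
    layout: releases/<suite>/<component>/<version>) to a target id.
    """
    seen = set()
    unique = []
    for name in sorted(branches):
        target = branches[name]
        # The version is assumed to be the last path component of the key.
        version = name.rsplit("/", 1)[-1]
        key = (version, target)
        if key in seen:
            continue  # same version, same object: display it only once
        seen.add(key)
        unique.append(key)
    return unique


# Made-up sample data; only the first key appears in the discussion above.
branches = {
    "releases/bionic-security/main/1.14.0-0ubuntu1.10": "rel-a",
    "releases/bionic-updates/main/1.14.0-0ubuntu1.10": "rel-a",
    "releases/bionic/main/1.14.0-0ubuntu1": "rel-b",
}
deduped = dedupe_release_branches(branches)
```

Keying on both the version and the target means that two suites shipping different objects under the same version string would still be shown separately.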

Nov 15 2022, 11:02 AM
olasd added a comment to D8386: feat(fedora): Introduce fedora lister.

@anlambert

I noticed that https://archive.softwareheritage.org/browse/origin/directory/?origin_url=deb://Ubuntu/packages/nginx has duplicate branch names, which is very confusing. In fact, even the default branch is repeated twice and I see two check marks. If we use branch names like 0.3.9-15.fc26, won't the same happen with Fedora listers? It doesn't seem to differentiate between the editions. (or does it?)

Nov 15 2022, 10:35 AM

Nov 14 2022

olasd triaged T4680: GitLab lister: allow ignoring origins contained in given namespace prefixes as Normal priority.
Nov 14 2022, 4:42 PM · Easy hack, Lister
olasd created P1528 Command-Line Input.
Nov 14 2022, 4:36 PM
olasd added a project to T4681: Add throttling/backoff to origin visit scheduler respawn logic: Easy hack.
Nov 14 2022, 1:49 PM · Easy hack, Scheduling utilities
olasd triaged T4681: Add throttling/backoff to origin visit scheduler respawn logic as Normal priority.
Nov 14 2022, 1:49 PM · Easy hack, Scheduling utilities
olasd placed T4681: Add throttling/backoff to origin visit scheduler respawn logic up for grabs.
Nov 14 2022, 1:49 PM · Easy hack, Scheduling utilities
olasd accepted D8835: changelog: document recent git loader speed improvements.

Thanks!

Nov 14 2022, 1:34 PM
olasd added a revision to T4657: Allow object removal from journal: D8833: Add base functionality to support object deletion.
Nov 14 2022, 11:12 AM · Journal
olasd renamed T4680: GitLab lister: allow ignoring origins contained in given namespace prefixes from GitLab lister: allow ignoring origins contained in a given namespace to GitLab lister: allow ignoring origins contained in given namespace prefixes.
Nov 14 2022, 11:04 AM · Easy hack, Lister
olasd created T4680: GitLab lister: allow ignoring origins contained in given namespace prefixes.
Nov 14 2022, 11:03 AM · Easy hack, Lister

Nov 10 2022

olasd accepted D8830: Stop scheduling mirror tests while building Docker images.
Nov 10 2022, 3:25 PM
olasd accepted D8825: Configure gitlab projects declaratively.

I'm not too sure about having a single "big" object as JSON output, as dumping that full JSON structure only works if the process doesn't crash midway (and when it does, you won't really know what had happened up to that point). Maybe output one JSON object per mutated project instead?
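
A minimal sketch of what that could look like, assuming a line-delimited JSON output. The `emit_mutations` helper, the field names and the project paths are all hypothetical and are not taken from D8825.

```python
import io
import json
from typing import Any, Dict, Iterable, TextIO


def emit_mutations(mutations: Iterable[Dict[str, Any]], out: TextIO) -> int:
    """Write one JSON object per line, flushing after each record.

    If the process dies midway, every line already written remains a
    complete, parseable record of what happened up to that point.
    """
    count = 0
    for mutation in mutations:
        out.write(json.dumps(mutation, sort_keys=True) + "\n")
        out.flush()
        count += 1
    return count


# Hypothetical sample records, written to an in-memory buffer.
buffer = io.StringIO()
written = emit_mutations(
    [
        {"project": "group/project-a", "changed": ["visibility"]},
        {"project": "group/project-b", "changed": []},
    ],
    buffer,
)
lines = buffer.getvalue().splitlines()
```

Each line can then be parsed independently with `json.loads`, so a partial run still leaves usable output.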

Nov 10 2022, 3:03 PM
olasd added a comment to D8825: Configure gitlab projects declaratively.

Looks great, thanks.

Nov 10 2022, 12:00 PM
olasd added inline comments to D8825: Configure gitlab projects declaratively.
Nov 10 2022, 11:03 AM
olasd added a comment to D8825: Configure gitlab projects declaratively.

Looks nice!

Nov 10 2022, 10:50 AM

Nov 7 2022

olasd removed a member for Developers: aeviso.
Nov 7 2022, 2:37 PM
olasd removed a member for Developers: amadouth6.
Nov 7 2022, 2:37 PM
olasd added a member for Developers: lunar.
Nov 7 2022, 2:37 PM
olasd accepted D8671: Add a job running swh-mirror tests.
Nov 7 2022, 1:41 PM

Nov 6 2022

olasd added a comment to T4671: Experiment with an inline storage for snapshot branches in postgresql.

Experiments in terms of space:

Nov 6 2022, 6:49 PM · Storage manager
olasd triaged T4671: Experiment with an inline storage for snapshot branches in postgresql as Normal priority.
Nov 6 2022, 6:47 PM · Storage manager
olasd added a comment to T4657: Allow object removal from journal.

https://gitlab.softwareheritage.org/infra/sysadm-environment/-/issues/4658#note_12824 is the process I've followed for a (lower stakes) removal of buggy raw_extrinsic_metadata messages from swh.journal (in staging).

Nov 6 2022, 3:37 PM · Journal

Nov 4 2022

olasd added a comment to T4219: Investigate why GitHub fork detection did not bring a speed-up.

swh.loader.git 2.1.0 has now been deployed on all workers.

Nov 4 2022, 9:25 PM · Origin-GitHub, Git loader
olasd committed rDLDGe7988153e2ae: dumb loader: also filter the symbolic refs (authored by olasd).
dumb loader: also filter the symbolic refs
Nov 4 2022, 8:38 PM
olasd committed rDLDG92d9ada9b739: Implement discovering branch targets from the archive (authored by olasd).
Implement discovering branch targets from the archive
Nov 4 2022, 8:38 PM
olasd closed D8817: Implement discovering branch targets from the archive.
Nov 4 2022, 8:38 PM
olasd committed rDLDGc2ed09e0e86c: Make utils.filter_refs accept {bytes: bytes} and {bytes: HexBytes} (authored by olasd).
Make utils.filter_refs accept {bytes: bytes} and {bytes: HexBytes}
Nov 4 2022, 8:38 PM
olasd added inline comments to D8817: Implement discovering branch targets from the archive.
Nov 4 2022, 8:07 PM
olasd added inline comments to D8817: Implement discovering branch targets from the archive.
Nov 4 2022, 8:04 PM
olasd added a comment to D8817: Implement discovering branch targets from the archive.

I've tested this on all the origins that triggered https://sentry.softwareheritage.org/share/issue/d04e5d7050cb49c6a080b0552fdee5ef/ (with D7838 applied to test on the real storage) and they load fine now.

Nov 4 2022, 7:44 PM
olasd updated the diff for D8817: Implement discovering branch targets from the archive.

More consistent commit message

Nov 4 2022, 7:39 PM
olasd requested review of D8817: Implement discovering branch targets from the archive.
Nov 4 2022, 7:36 PM
olasd updated subscribers of D8759: model: Add payload to ExtID class.

Maybe worth @douardda or @marmoute having a look at this, as I think there were more use cases than just disarchive for this?

Nov 4 2022, 3:56 PM
olasd accepted D8759: model: Add payload to ExtID class.
Nov 4 2022, 3:55 PM
olasd accepted D8810: cli: Increase open file descriptor limit to support 256 open LevelDBs.

Thanks; I have one comment inline that I think should be addressed!

Nov 4 2022, 3:52 PM
olasd added inline comments to D8760: Add payloads to ExtIDs.
Nov 4 2022, 3:30 PM
olasd added a comment to D8808: Eagerly populate the set of local heads in RepoRepresentation.__init__.

LGTM, I guess we should deploy this as soon as possible, right?

Nov 4 2022, 1:32 PM
olasd closed D8808: Eagerly populate the set of local heads in RepoRepresentation.__init__.
Nov 4 2022, 1:30 PM
olasd committed rDLDG3cf7582aa74d: Eagerly populate the set of local heads in RepoRepresentation.__init__ (authored by olasd).
Eagerly populate the set of local heads in RepoRepresentation.__init__
Nov 4 2022, 1:30 PM
olasd closed D8806: Dump all known and remote heads when debugging.
Nov 4 2022, 1:17 PM
olasd committed rDLDG35ecf1843696: Dump all known and remote heads when debugging (authored by olasd).
Dump all known and remote heads when debugging
Nov 4 2022, 1:17 PM
olasd updated the summary of D8808: Eagerly populate the set of local heads in RepoRepresentation.__init__.
Nov 4 2022, 1:05 PM
olasd retitled D8808: Eagerly populate the set of local heads in RepoRepresentation.__init__ from Implement the dulwich graph walker interface directly in RepoRepresentation to Eagerly populate the set of local heads in RepoRepresentation.__init__.
Nov 4 2022, 1:05 PM
olasd added a comment to D8808: Eagerly populate the set of local heads in RepoRepresentation.__init__.

LOL, thanks Jenkins

Nov 4 2022, 12:39 PM
olasd updated the diff for D8808: Eagerly populate the set of local heads in RepoRepresentation.__init__.

Apply @anlambert's suggested changes

Nov 4 2022, 12:35 PM
olasd added inline comments to D8806: Dump all known and remote heads when debugging.
Nov 4 2022, 12:25 PM
olasd accepted D8809: Document the gitlab installation and procedures.
Nov 4 2022, 11:53 AM

Nov 3 2022

olasd closed D8803: journal writer: only flush kafka once per batch.
Nov 3 2022, 7:16 PM
olasd committed rDCIDXb7f04dd9d40d: journal writer: only flush kafka once per batch (authored by olasd).
journal writer: only flush kafka once per batch
Nov 3 2022, 7:16 PM
olasd created P1521 Command-Line Input.
Nov 3 2022, 5:47 PM
olasd requested review of D8808: Eagerly populate the set of local heads in RepoRepresentation.__init__.
Nov 3 2022, 5:30 PM
olasd added a revision to T4219: Investigate why GitHub fork detection did not bring a speed-up: D8808: Eagerly populate the set of local heads in RepoRepresentation.__init__.
Nov 3 2022, 5:28 PM · Origin-GitHub, Git loader
olasd requested review of D8806: Dump all known and remote heads when debugging.
Nov 3 2022, 4:32 PM
olasd committed R263:bbdee56508d9: Search for gitlab namespaces even if they're not a mapped group (authored by olasd).
Search for gitlab namespaces even if they're not a mapped group
Nov 3 2022, 11:50 AM
olasd requested review of D8803: journal writer: only flush kafka once per batch.
Nov 3 2022, 9:55 AM

Nov 2 2022

olasd added a reverting change for D8670: Make relation_add sql function prefill entity tables if needed: rDPROV49eb0290d4e8: Partially revert "Make relation_add sql function prefill entity tables if….
Nov 2 2022, 6:32 PM
olasd added a reverting change for rDPROVe1da37d4375f: Make relation_add sql function prefill entity tables if needed: rDPROV49eb0290d4e8: Partially revert "Make relation_add sql function prefill entity tables if….
Nov 2 2022, 6:32 PM
olasd committed rDPROV49eb0290d4e8: Partially revert "Make relation_add sql function prefill entity tables if… (authored by olasd).
Partially revert "Make relation_add sql function prefill entity tables if…
Nov 2 2022, 6:32 PM
olasd committed rDPROV2f7075fc43e2: rabbitmq client: don't dump all responses as debug logs (authored by olasd).
rabbitmq client: don't dump all responses as debug logs
Nov 2 2022, 6:20 PM
olasd committed rDPROVab126368d6a6: Add migration script for the updated relation_add SQL function (authored by olasd).
Add migration script for the updated relation_add SQL function
Nov 2 2022, 6:06 PM
olasd added inline comments to D8760: Add payloads to ExtIDs.
Nov 2 2022, 2:39 PM
olasd accepted D8799: codemeta: Fix crash on SWORD documents that specify an id.

Thanks!

Nov 2 2022, 1:22 PM