Feed All Stories

Dec 6 2022

ardumont requested review of D8922: send-to-celery: Adapt to schedule from lister name & instance_name.
Dec 6 2022, 12:30 PM
ardumont created P1539 basic script to schedule stuff on staging.
Dec 6 2022, 12:28 PM
vlorentz added a comment to D8920: from_disk.Content: Add missing path info for symlink.

Does this mean we were silently dropping data until now? Which loaders use this?

Dec 6 2022, 12:08 PM
olasd closed D8921: Hook up recently introduced options to all listers.
Dec 6 2022, 10:47 AM
olasd committed rDLSe785e67315c6: Hook up recently introduced options to all listers (authored by olasd).
Hook up recently introduced options to all listers
Dec 6 2022, 10:47 AM
olasd closed D8918: gitlab: allow ignoring projects with certain path prefixes.
Dec 6 2022, 10:47 AM
olasd committed rDLS5ea79ee3e0de: gitlab: allow ignoring projects with certain path prefixes (authored by olasd).
gitlab: allow ignoring projects with certain path prefixes
Dec 6 2022, 10:47 AM

Dec 5 2022

olasd added a comment to D8906: [WIP]: Make numeric cursors a bit more opaque to avoid cursor hacks.

So, overall, what I'm uncomfortable with is decoding incoming cursors without validating that we generated and emitted them ourselves. They're currently untrusted user input that we decode with a very simple algorithm, and it would be easy enough for anyone to fiddle with them (even with the prefix/suffix addition).
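
One way to address this concern is to attach a keyed signature to each cursor so that the server can tell whether it issued the value. The following is a minimal sketch under stated assumptions, not what D8906 implements: `SECRET_KEY`, `encode_cursor` and `decode_cursor` are hypothetical names, and HMAC-SHA256 is chosen only for illustration.

```python
import base64
import hashlib
import hmac

# Hypothetical server-side secret; a real deployment would load it from configuration.
SECRET_KEY = b"example-secret-not-for-production"


def encode_cursor(position: int) -> str:
    """Return an opaque cursor carrying `position` together with an HMAC tag."""
    payload = str(position).encode("ascii")
    tag = hmac.new(SECRET_KEY, payload, hashlib.sha256).digest()
    return base64.urlsafe_b64encode(payload + b"." + tag).decode("ascii")


def decode_cursor(cursor: str) -> int:
    """Return the position stored in `cursor`, or raise ValueError if it was not signed by us."""
    try:
        raw = base64.urlsafe_b64decode(cursor.encode("ascii"))
    except ValueError as exc:  # covers binascii.Error and UnicodeEncodeError
        raise ValueError("Malformed cursor") from exc
    # The payload only contains digits, so the first "." separates it from the tag.
    payload, separator, tag = raw.partition(b".")
    expected = hmac.new(SECRET_KEY, payload, hashlib.sha256).digest()
    # compare_digest avoids leaking information through comparison timing.
    if not separator or not hmac.compare_digest(tag, expected):
        raise ValueError("Cursor was not issued by this server")
    return int(payload.decode("ascii"))
```

With this scheme, a client that edits the numeric part of a cursor cannot produce a matching tag without the secret, so `decode_cursor` rejects the modified value instead of trusting it.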

Dec 5 2022, 5:55 PM
ardumont accepted D8918: gitlab: allow ignoring projects with certain path prefixes.
Dec 5 2022, 5:31 PM
ardumont added inline comments to D8883: Add a script to generate a topological sort.
Dec 5 2022, 5:26 PM
anlambert closed T4715: Update publications page as Resolved.
Dec 5 2022, 5:16 PM · Website
anlambert accepted D8921: Hook up recently introduced options to all listers.
Dec 5 2022, 5:05 PM
olasd added a comment to D8918: gitlab: allow ignoring projects with certain path prefixes.

Could you add this check?

for prefix in ignored_project_prefixes:
    if not prefix.endswith("/"):
        raise ValueError("Prefixes must end with /")

Otherwise we are going to forget to add slashes at the end, and it will match too many namespaces.
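
To illustrate why the trailing slash matters, here is a small sketch assuming that ignored projects are detected with plain string prefix matching on the project path. The helpers `validate_prefixes` and `is_ignored` are hypothetical and are not taken from the lister code; only `ignored_project_prefixes` comes from the snippet above.

```python
from typing import Iterable, List


def validate_prefixes(ignored_project_prefixes: Iterable[str]) -> List[str]:
    """Reject prefixes that do not end with a slash (hypothetical helper)."""
    prefixes = list(ignored_project_prefixes)
    for prefix in prefixes:
        if not prefix.endswith("/"):
            raise ValueError("Prefixes must end with /")
    return prefixes


def is_ignored(project_path: str, ignored_project_prefixes: Iterable[str]) -> bool:
    """Return True if the project path starts with one of the ignored prefixes."""
    return any(project_path.startswith(prefix) for prefix in ignored_project_prefixes)


# Without the trailing slash, "foo" also matches the unrelated namespace "foobar".
print(is_ignored("foobar/project", ["foo"]))   # True: matches too much
print(is_ignored("foobar/project", ["foo/"]))  # False
print(is_ignored("foo/project", ["foo/"]))     # True
```

Running the validation once, when the lister is configured, turns a silent over-match into an immediate error.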

Dec 5 2022, 4:46 PM
olasd requested review of D8921: Hook up recently introduced options to all listers.
Dec 5 2022, 4:44 PM
anlambert updated the task description for T4715: Update publications page.
Dec 5 2022, 4:40 PM · Website
rdicosmo created T4715: Update publications page.
Dec 5 2022, 4:37 PM · Website
vlorentz added a comment to D8918: gitlab: allow ignoring projects with certain path prefixes.

Could you add this check?

Dec 5 2022, 4:24 PM
anlambert requested review of D8920: from_disk.Content: Add missing path info for symlink.
Dec 5 2022, 4:21 PM
vlorentz requested review of D8919: Add CLI script to generate Luigi config and call it.
Dec 5 2022, 3:53 PM
olasd requested review of D8918: gitlab: allow ignoring projects with certain path prefixes.
Dec 5 2022, 3:45 PM
ardumont accepted D8913: loader: Compute contents diff when detecting trees divergence.

ok then!

Dec 5 2022, 3:43 PM
olasd added a revision to T4680: GitLab lister: allow ignoring origins contained in given namespace prefixes: D8918: gitlab: allow ignoring projects with certain path prefixes.
Dec 5 2022, 3:40 PM · Easy hack, Lister
olasd closed D8916: Add a flag to not enable origins listed by a lister.
Dec 5 2022, 3:25 PM
olasd committed rDLS64267f8f50a8: Add a flag to not enable origins listed by a lister (authored by olasd).
Add a flag to not enable origins listed by a lister
Dec 5 2022, 3:25 PM
olasd closed D8915: Add built-in page and origin count limit to listers.
Dec 5 2022, 3:25 PM
olasd committed rDLSb815737054ac: Add built-in page and origin count limit to listers (authored by olasd).
Add built-in page and origin count limit to listers
Dec 5 2022, 3:25 PM
anlambert added a comment to D8914: Ignore psqlrc when loading the rubygems database dump.
In D8914#231759, @olasd wrote:

Thanks, I will try to update the swh/lister/rubygems/data/small_rubygems_dump.sh script to generate a postgres dump that can be loaded when ON_ERROR_STOP is set.

There's not much point in doing that: the actual dump is generated with ownership instructions as well. I've wasted a good chunk of time trying to work around that, but filtering their silly double-wrapped plain text SQL export is a PITA.
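
For context, the two psql settings discussed in this thread can be sketched as follows. This is an illustration only and not the command the rubygems lister runs: `build_psql_command` and `load_dump` are hypothetical helpers, and the dump path and database name are placeholders. `--no-psqlrc` and the `ON_ERROR_STOP` variable are standard psql options.

```python
import subprocess
from typing import List


def build_psql_command(dump_path: str, dbname: str, stop_on_error: bool = False) -> List[str]:
    """Build a psql invocation that loads a plain text SQL dump (hypothetical helper)."""
    command = ["psql", "--no-psqlrc"]  # do not read the user's ~/.psqlrc
    if stop_on_error:
        # Abort on the first failing statement instead of carrying on.
        command += ["--set", "ON_ERROR_STOP=1"]
    command += ["--dbname", dbname, "--file", dump_path]
    return command


def load_dump(dump_path: str, dbname: str, stop_on_error: bool = False) -> None:
    """Run psql on the dump; raises CalledProcessError if psql exits with an error."""
    subprocess.run(build_psql_command(dump_path, dbname, stop_on_error), check=True)
```

As the comment above notes, the real dump carries ownership instructions, so statements referring to roles that do not exist locally would stop the load when `stop_on_error` is enabled.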

Dec 5 2022, 3:23 PM
swh-public-ci added a comment to D8916: Add a flag to not enable origins listed by a lister.

Build is green

Dec 5 2022, 3:01 PM
swh-public-ci added a comment to D8915: Add built-in page and origin count limit to listers.

Build is green

Dec 5 2022, 3:00 PM
olasd closed D8914: Ignore psqlrc when loading the rubygems database dump.
Dec 5 2022, 2:56 PM
olasd committed rDLSa66e24bfa2af: Ignore psqlrc when loading the rubygems database dump (authored by olasd).
Ignore psqlrc when loading the rubygems database dump
Dec 5 2022, 2:56 PM
olasd updated the diff for D8916: Add a flag to not enable origins listed by a lister.

Rebase

Dec 5 2022, 2:55 PM
olasd updated the diff for D8915: Add built-in page and origin count limit to listers.

Add explicit test for default behavior

Dec 5 2022, 2:55 PM
vlorentz requested review of D8917: Split swh/graph/luigi.py into modules.
Dec 5 2022, 2:53 PM
olasd added inline comments to D8915: Add built-in page and origin count limit to listers.
Dec 5 2022, 2:51 PM
anlambert added inline comments to D8913: loader: Compute contents diff when detecting trees divergence.
Dec 5 2022, 2:42 PM
ardumont added inline comments to D8913: loader: Compute contents diff when detecting trees divergence.
Dec 5 2022, 2:39 PM
ardumont requested changes to D8913: loader: Compute contents diff when detecting trees divergence.

Good idea but you need to update that conditional.

Dec 5 2022, 2:37 PM
olasd added a comment to D8914: Ignore psqlrc when loading the rubygems database dump.

Thanks, I will try to update the swh/lister/rubygems/data/small_rubygems_dump.sh script to generate a postgres dump that can be loaded when ON_ERROR_STOP is set.

Dec 5 2022, 2:35 PM
ardumont accepted D8916: Add a flag to not enable origins listed by a lister.
Dec 5 2022, 2:32 PM
ardumont accepted D8915: Add built-in page and origin count limit to listers.

There's just one thing I don't get in the test; see my inline question.

Dec 5 2022, 2:31 PM
ardumont added inline comments to D8915: Add built-in page and origin count limit to listers.
Dec 5 2022, 2:30 PM
olasd requested review of D8916: Add a flag to not enable origins listed by a lister.
Dec 5 2022, 2:27 PM
olasd requested review of D8915: Add built-in page and origin count limit to listers.
Dec 5 2022, 2:27 PM
anlambert accepted D8914: Ignore psqlrc when loading the rubygems database dump.

Thanks, I will try to update the swh/lister/rubygems/data/small_rubygems_dump.sh script to generate a postgres dump that can be loaded when ON_ERROR_STOP is set.

Dec 5 2022, 2:08 PM
olasd requested review of D8914: Ignore psqlrc when loading the rubygems database dump.
Dec 5 2022, 1:58 PM
anlambert closed T4711: publications page: add 2022 scientific papers as Resolved.
Dec 5 2022, 1:55 PM · Website
vlorentz requested review of D8877: Fix incorrect error messages when failing to connect.
Dec 5 2022, 1:50 PM
anlambert requested review of D8913: loader: Compute contents diff when detecting trees divergence.
Dec 5 2022, 1:36 PM
anlambert closed D8886: replay: Add more debug logs.
Dec 5 2022, 1:26 PM
anlambert committed rDLDSVNe35f800a423a: replay: Add more debug logs (authored by anlambert).
replay: Add more debug logs
Dec 5 2022, 1:26 PM
ardumont accepted D8912: ListOriginContributors: Ignore null author/committer in revisions/releases.
Dec 5 2022, 11:01 AM
vlorentz triaged T4714: Write Luigi tasks to generate the citation dataset as Normal priority.
Dec 5 2022, 10:51 AM · Datasets
vlorentz triaged T4713: Generate the citation dataset as Normal priority.
Dec 5 2022, 10:51 AM · Datasets
vlorentz updated the task description for T4712: Write Luigi tasks to regenerate the license dataset.
Dec 5 2022, 10:50 AM · Datasets
vlorentz triaged T4712: Write Luigi tasks to regenerate the license dataset as Low priority.
Dec 5 2022, 10:50 AM · Datasets
anlambert created P1535 subversion loader based on incremental checkout operations.
Dec 5 2022, 10:47 AM
franckbret added a comment to D8909: Login: Add an option to choose an authentication method (by username/password or token).

@anlambert Shouldn't this be replaced by swh auth generate-token?

Dec 5 2022, 10:20 AM

Dec 2 2022

swh-public-ci added a comment to D8077: Add a static query cost calculator to reject malicious queries.

Build is green

Dec 2 2022, 3:47 PM
jayeshv updated the diff for D8077: Add a static query cost calculator to reject malicious queries.

rebase

Dec 2 2022, 3:44 PM
swh-public-ci added a comment to D8425: Return more information with revision and release dates.

Build is green

Dec 2 2022, 3:44 PM
jayeshv updated the diff for D8425: Return more information with revision and release dates.

bug fix

Dec 2 2022, 3:39 PM

Dec 1 2022

vlorentz closed T3273: Use "fork" relationships to speed-up initial load of large repositories as Resolved.
Dec 1 2022, 4:18 PM · Origin-GitHub, Origin-GitLab, Git loader, Extrinsic metadata, Core Loader
vlorentz closed T3273: Use "fork" relationships to speed-up initial load of large repositories, a subtask of T4283: Load https://github.com/chromium/chromium with a higher packfile size limit, as Resolved.
Dec 1 2022, 4:18 PM · System administration, Git loader
vlorentz closed T4219: Investigate why GitHub fork detection did not bring a speed-up, a subtask of T3273: Use "fork" relationships to speed-up initial load of large repositories, as Resolved.
Dec 1 2022, 4:18 PM · Origin-GitHub, Origin-GitLab, Git loader, Extrinsic metadata, Core Loader
vlorentz closed T4219: Investigate why GitHub fork detection did not bring a speed-up as Resolved.
Dec 1 2022, 4:18 PM · Origin-GitHub, Git loader
vlorentz closed T2064: Add metadata from deposits to metadata search, a subtask of T3097: Expose metadata in the WebApp and make it searchable, as Resolved.
Dec 1 2022, 4:17 PM · Intrinsic metadata, Extrinsic metadata, Roadmap 2021, meta-task
vlorentz closed T2064: Add metadata from deposits to metadata search as Resolved.
Dec 1 2022, 4:17 PM · Metadata workflow
vlorentz closed T4401: Index metadata from the deposit as Resolved.
Dec 1 2022, 4:17 PM · SWORD deposit, Indexer, Metadata workflow
vlorentz closed T4401: Index metadata from the deposit, a subtask of T2064: Add metadata from deposits to metadata search, as Resolved.
Dec 1 2022, 4:17 PM · Metadata workflow
vlorentz closed T4694: Use directory metadata in origin search as Resolved.
Dec 1 2022, 4:16 PM · Indexer, Metadata workflow
vlorentz closed T4694: Use directory metadata in origin search, a subtask of T4401: Index metadata from the deposit, as Resolved.
Dec 1 2022, 4:16 PM · SWORD deposit, Indexer, Metadata workflow
vlorentz closed T2073: Index extrinsic metadata from the journal in swh-search/Elasticsearch, a subtask of T2064: Add metadata from deposits to metadata search, as Resolved.
Dec 1 2022, 4:16 PM · Metadata workflow
vlorentz closed T2073: Index extrinsic metadata from the journal in swh-search/Elasticsearch, a subtask of T3097: Expose metadata in the WebApp and make it searchable, as Resolved.
Dec 1 2022, 4:16 PM · Intrinsic metadata, Extrinsic metadata, Roadmap 2021, meta-task
vlorentz closed T2073: Index extrinsic metadata from the journal in swh-search/Elasticsearch as Resolved.
Dec 1 2022, 4:16 PM · Archive search, Metadata workflow
vlorentz closed T4696: indexer storage: Deadlock in origin_extrinsic_metadata_add as Resolved.
Dec 1 2022, 4:16 PM · Indexer
vlorentz closed T4457: Index metadata from Gitea/Gogs as Resolved.
Dec 1 2022, 4:15 PM · Origin-Gitea/Gogs, Extrinsic metadata, Indexer
vlorentz added a task to D8908: Add ListOriginContributors: T4695: Provide a collaboration graph / dataset.
Dec 1 2022, 4:15 PM
vlorentz added a task to D8910: Regenerate the test dataset to include a release with no author: T4695: Provide a collaboration graph / dataset.
Dec 1 2022, 4:15 PM
vlorentz added a task to D8912: ListOriginContributors: Ignore null author/committer in revisions/releases: T4695: Provide a collaboration graph / dataset.
Dec 1 2022, 4:15 PM
vlorentz added revisions to T4695: Provide a collaboration graph / dataset: D8908: Add ListOriginContributors, D8910: Regenerate the test dataset to include a release with no author, D8912: ListOriginContributors: Ignore null author/committer in revisions/releases.
Dec 1 2022, 4:15 PM · Datasets
vlorentz closed T4710: DuplicateId errors when indexing from deposits as Resolved.
Dec 1 2022, 4:13 PM · Indexer
vlorentz added a comment to D8907: feat: Add Hex.pm lister.

Let's see how this goes: https://github.com/hexpm/hexpm/pull/1168

Dec 1 2022, 4:03 PM
vlorentz requested changes to D8907: feat: Add Hex.pm lister.
Dec 1 2022, 2:04 PM
vlorentz added a comment to D8907: feat: Add Hex.pm lister.

You need to pass complete_qs to requests_mock to actually test the query string: https://requests-mock.readthedocs.io/en/latest/matching.html#query-strings (and then you'll need to add the sort param, because it's missing from tests).

Dec 1 2022, 2:04 PM
vlorentz closed T4709: npm mapping: ValueError: Invalid IPv6 URL as Resolved.
Dec 1 2022, 1:37 PM · Indexer
swh-public-ci added a comment to D8910: Regenerate the test dataset to include a release with no author.

Build is green

Dec 1 2022, 1:32 PM
Harbormaster failed remote builds in B33069: Diff 32116 for D8910: Regenerate the test dataset to include a release with no author!
Dec 1 2022, 1:16 PM
swh-public-ci added a comment to D8910: Regenerate the test dataset to include a release with no author.

Build was aborted

Dec 1 2022, 1:16 PM
vlorentz requested review of D8912: ListOriginContributors: Ignore null author/committer in revisions/releases.
Dec 1 2022, 1:11 PM
vlorentz updated the diff for D8910: Regenerate the test dataset to include a release with no author.

remove logs

Dec 1 2022, 1:05 PM
vlorentz requested review of D8910: Regenerate the test dataset to include a release with no author.
Dec 1 2022, 11:38 AM
jayeshv requested review of D8911: Remove ambiguous item cursors.
Dec 1 2022, 11:36 AM
swh-public-ci added a comment to D8908: Add ListOriginContributors.

Build is green

Dec 1 2022, 11:34 AM
vlorentz updated the diff for D8908: Add ListOriginContributors.

fix copyright header

Dec 1 2022, 11:26 AM
franckbret added a comment to D8909: Login: Add an option to choose an authentication method (by username/password or token).

Some screen captures of the different login scenarios:

Dec 1 2022, 11:22 AM
vlorentz updated subscribers of D8909: Login: Add an option to choose an authentication method (by username/password or token).

@anlambert Shouldn't this be replaced by swh auth generate-token?

Dec 1 2022, 11:16 AM
vlorentz requested review of D8908: Add ListOriginContributors.
Dec 1 2022, 11:14 AM
franckbret requested review of D8909: Login: Add an option to choose an authentication method (by username/password or token).
Dec 1 2022, 11:08 AM
zack committed rMSLD1e5246da493e: rename IFRI slides to make them more easily discoverable (authored by zack).
rename IFRI slides to make them more easily discoverable
Dec 1 2022, 10:04 AM