Dec 7 2022

vlorentz updated the diff for D8903: luigi: Add tasks UploadGraphToS3 and DownloadGraphFromS3.

rebase

Dec 7 2022, 10:08 AM
vlorentz closed D8926: luigi.RunExportAll: Default to exporting all formats.
Dec 7 2022, 10:03 AM
vlorentz closed D8925: luigi.CreateAthena: Fix validation of DB name.
Dec 7 2022, 10:03 AM
vlorentz committed rDDATASETeceaf73f0fba: luigi.CreateAthena: Fix validation of DB name (authored by vlorentz).
luigi.CreateAthena: Fix validation of DB name
Dec 7 2022, 10:03 AM
vlorentz committed rDDATASETc717f60fe08e: luigi.RunExportAll: Default to exporting all formats (authored by vlorentz).
luigi.RunExportAll: Default to exporting all formats
Dec 7 2022, 10:03 AM
vlorentz closed D8924: exporters/orc: Fix crash on visit status with no type.
Dec 7 2022, 10:03 AM
vlorentz committed rDDATASET22f7ed11f688: exporters/orc: Fix crash on visit status with no type (authored by vlorentz).
exporters/orc: Fix crash on visit status with no type
Dec 7 2022, 10:02 AM
vlorentz added inline comments to D8908: Add ListOriginContributors.
Dec 7 2022, 9:45 AM
vlorentz closed T1345: Update metadata docs about using CodeMeta vocabulary as Resolved.
Dec 7 2022, 6:20 AM · Documentation
vlorentz closed T1345: Update metadata docs about using CodeMeta vocabulary, a subtask of T1649: Update documentation with compliance scenario changes, as Resolved.
Dec 7 2022, 6:20 AM · SWORD deposit
vlorentz added a comment to T1345: Update metadata docs about using CodeMeta vocabulary.

yes

Dec 7 2022, 6:20 AM · Documentation

Dec 6 2022

vlorentz added a comment to D8907: feat: Add Hex.pm lister.

order sounds best. Do you want to do it?

Dec 6 2022, 6:18 PM
vlorentz added a comment to T4394: Add support for running metadata fetchers without a VCS/package loaders.

We decided to add recurring fetches, so it will take care both of backfilling now, and visiting from time to time in the future. We're going to assume 3 months for now, as it seems reasonable to not exhaust rate limits.

Dec 6 2022, 3:54 PM · Extrinsic metadata
vlorentz added a revision to T2220: swh-graph in production: D8919: Add CLI script to generate Luigi config and call it.
Dec 6 2022, 2:37 PM · Roadmap 2022, meta-task, Roadmap 2021, Compressed graph service
vlorentz added a task to D8919: Add CLI script to generate Luigi config and call it: T2220: swh-graph in production.
Dec 6 2022, 2:37 PM
vlorentz added a task to D8919: Add CLI script to generate Luigi config and call it: T4676: Add Luigi workflow in swh-dataset.
Dec 6 2022, 2:37 PM
vlorentz added a task to D8924: exporters/orc: Fix crash on visit status with no type: T4676: Add Luigi workflow in swh-dataset.
Dec 6 2022, 2:37 PM
vlorentz added a task to D8925: luigi.CreateAthena: Fix validation of DB name: T4676: Add Luigi workflow in swh-dataset.
Dec 6 2022, 2:37 PM
vlorentz added a task to D8926: luigi.RunExportAll: Default to exporting all formats: T4676: Add Luigi workflow in swh-dataset.
Dec 6 2022, 2:37 PM
vlorentz added revisions to T4676: Add Luigi workflow in swh-dataset: D8919: Add CLI script to generate Luigi config and call it, D8924: exporters/orc: Fix crash on visit status with no type, D8925: luigi.CreateAthena: Fix validation of DB name, D8926: luigi.RunExportAll: Default to exporting all formats.
Dec 6 2022, 2:37 PM · Datasets, Compressed graph service
vlorentz requested review of D8926: luigi.RunExportAll: Default to exporting all formats.
Dec 6 2022, 2:07 PM
vlorentz requested review of D8925: luigi.CreateAthena: Fix validation of DB name.
Dec 6 2022, 2:05 PM
vlorentz requested review of D8924: exporters/orc: Fix crash on visit status with no type.
Dec 6 2022, 2:04 PM
vlorentz accepted D8923: archive_coverage: Add link to Archive Changelog in coverage widget.

nice

Dec 6 2022, 1:46 PM
vlorentz accepted D8920: from_disk.Content: Add missing path info for symlink.

ah, so it doesn't matter for other loaders. Phew!

Dec 6 2022, 1:36 PM
vlorentz added a comment to D8920: from_disk.Content: Add missing path info for symlink.

Does it mean we were silently dropping data until this? Which loaders use this?

Dec 6 2022, 12:08 PM

Dec 5 2022

vlorentz added a comment to D8918: gitlab: allow ignoring projects with certain path prefixes.

Could you add this check?

Dec 5 2022, 4:24 PM
vlorentz requested review of D8919: Add CLI script to generate Luigi config and call it.
Dec 5 2022, 3:53 PM
vlorentz requested review of D8917: Split swh/graph/luigi.py into modules.
Dec 5 2022, 2:53 PM
vlorentz requested review of D8877: Fix incorrect error messages when failing to connect.
Dec 5 2022, 1:50 PM
vlorentz triaged T4714: Write Luigi tasks to generate the citation dataset as Normal priority.
Dec 5 2022, 10:51 AM · Datasets
vlorentz triaged T4713: Generate the citation dataset as Normal priority.
Dec 5 2022, 10:51 AM · Datasets
vlorentz updated the task description for T4712: Write Luigi tasks to regenerate the license dataset.
Dec 5 2022, 10:50 AM · Datasets
vlorentz triaged T4712: Write Luigi tasks to regenerate the license dataset as Low priority.
Dec 5 2022, 10:50 AM · Datasets

Dec 1 2022

vlorentz closed T3273: Use "fork" relationships to speed-up initial load of large repositories as Resolved.
Dec 1 2022, 4:18 PM · Origin-GitHub, Origin-GitLab, Git loader, Extrinsic metadata, Core Loader
vlorentz closed T3273: Use "fork" relationships to speed-up initial load of large repositories, a subtask of T4283: Load https://github.com/chromium/chromium with a higher packfile size limit, as Resolved.
Dec 1 2022, 4:18 PM · System administration, Git loader
vlorentz closed T4219: Investigate why GitHub fork detection did not bring a speed-up, a subtask of T3273: Use "fork" relationships to speed-up initial load of large repositories, as Resolved.
Dec 1 2022, 4:18 PM · Origin-GitHub, Origin-GitLab, Git loader, Extrinsic metadata, Core Loader
vlorentz closed T4219: Investigate why GitHub fork detection did not bring a speed-up as Resolved.
Dec 1 2022, 4:18 PM · Origin-GitHub, Git loader
vlorentz closed T2064: Add metadata from deposits to metadata search, a subtask of T3097: Expose metadata in the WebApp and make it searchable, as Resolved.
Dec 1 2022, 4:17 PM · Intrinsic metadata, Extrinsic metadata, Roadmap 2021, meta-task
vlorentz closed T2064: Add metadata from deposits to metadata search as Resolved.
Dec 1 2022, 4:17 PM · Metadata workflow
vlorentz closed T4401: Index metadata from the deposit as Resolved.
Dec 1 2022, 4:17 PM · SWORD deposit, Indexer, Metadata workflow
vlorentz closed T4401: Index metadata from the deposit, a subtask of T2064: Add metadata from deposits to metadata search, as Resolved.
Dec 1 2022, 4:17 PM · Metadata workflow
vlorentz closed T4694: Use directory metadata in origin search as Resolved.
Dec 1 2022, 4:16 PM · Indexer, Metadata workflow
vlorentz closed T4694: Use directory metadata in origin search, a subtask of T4401: Index metadata from the deposit, as Resolved.
Dec 1 2022, 4:16 PM · SWORD deposit, Indexer, Metadata workflow
vlorentz closed T2073: Index extrinsic metadata from the journal in swh-search/Elasticsearch, a subtask of T2064: Add metadata from deposits to metadata search, as Resolved.
Dec 1 2022, 4:16 PM · Metadata workflow
vlorentz closed T2073: Index extrinsic metadata from the journal in swh-search/Elasticsearch, a subtask of T3097: Expose metadata in the WebApp and make it searchable, as Resolved.
Dec 1 2022, 4:16 PM · Intrinsic metadata, Extrinsic metadata, Roadmap 2021, meta-task
vlorentz closed T2073: Index extrinsic metadata from the journal in swh-search/Elasticsearch as Resolved.
Dec 1 2022, 4:16 PM · Archive search, Metadata workflow
vlorentz closed T4696: indexer storage: Deadlock in origin_extrinsic_metadata_add as Resolved.
Dec 1 2022, 4:16 PM · Indexer
vlorentz closed T4457: Index metadata from Gitea/Gogs as Resolved.
Dec 1 2022, 4:15 PM · Origin-Gitea/Gogs, Extrinsic metadata, Indexer
vlorentz added a task to D8908: Add ListOriginContributors: T4695: Provide a collaboration graph / dataset.
Dec 1 2022, 4:15 PM
vlorentz added a task to D8910: Regenerate the test dataset to include a release with no author: T4695: Provide a collaboration graph / dataset.
Dec 1 2022, 4:15 PM
vlorentz added a task to D8912: ListOriginContributors: Ignore null author/committer in revisions/releases: T4695: Provide a collaboration graph / dataset.
Dec 1 2022, 4:15 PM
vlorentz added revisions to T4695: Provide a collaboration graph / dataset: D8908: Add ListOriginContributors, D8910: Regenerate the test dataset to include a release with no author, D8912: ListOriginContributors: Ignore null author/committer in revisions/releases.
Dec 1 2022, 4:15 PM · Datasets
vlorentz closed T4710: DuplicateId errors when indexing from deposits as Resolved.
Dec 1 2022, 4:13 PM · Indexer
vlorentz added a comment to D8907: feat: Add Hex.pm lister.

Let's see how this goes: https://github.com/hexpm/hexpm/pull/1168

Dec 1 2022, 4:03 PM
vlorentz requested changes to D8907: feat: Add Hex.pm lister.
Dec 1 2022, 2:04 PM
vlorentz added a comment to D8907: feat: Add Hex.pm lister.

You need to pass complete_qs to requests_mock to actually test the query string: https://requests-mock.readthedocs.io/en/latest/matching.html#query-strings (and then you'll need to add the sort param, because it's missing from tests).

Dec 1 2022, 2:04 PM
vlorentz closed T4709: npm mapping: ValueError: Invalid IPv6 URL as Resolved.
Dec 1 2022, 1:37 PM · Indexer
vlorentz requested review of D8912: ListOriginContributors: Ignore null author/committer in revisions/releases.
Dec 1 2022, 1:11 PM
vlorentz updated the diff for D8910: Regenerate the test dataset to include a release with no author.

remove logs

Dec 1 2022, 1:05 PM
vlorentz requested review of D8910: Regenerate the test dataset to include a release with no author.
Dec 1 2022, 11:38 AM
vlorentz updated the diff for D8908: Add ListOriginContributors.

fix copyright header

Dec 1 2022, 11:26 AM
vlorentz updated subscribers of D8909: Login: Add an option to choose an authentication method (by username/password or token).

@anlambert Shouldn't this be replaced by swh auth generate-token?

Dec 1 2022, 11:16 AM
vlorentz requested review of D8908: Add ListOriginContributors.
Dec 1 2022, 11:14 AM

Nov 30 2022

vlorentz added a comment to D8906: [WIP] :Make numeric cursors a bit more opaque to avoid cursor hacks.

Thanks, just fixed it :)

Nov 30 2022, 2:53 PM
vlorentz requested changes to D8906: [WIP] :Make numeric cursors a bit more opaque to avoid cursor hacks.

.lstrip() and .rstrip() are not .removeprefix() and .removesuffix():
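An illustrative sketch of the difference (not the code from the diff; the sample string and affixes are made up). `str.lstrip()` and `str.rstrip()` treat their argument as a set of characters to strip, while `str.removeprefix()` and `str.removesuffix()` (Python 3.9+) remove one exact substring:

```python
def strip_with_charset(value: str, prefix: str, suffix: str) -> str:
    # Strips any leading characters found in `prefix` and any trailing
    # characters found in `suffix`, not the exact substrings.
    return value.lstrip(prefix).rstrip(suffix)


def strip_with_affixes(value: str, prefix: str, suffix: str) -> str:
    # Removes the exact prefix and suffix, at most once each.
    return value.removeprefix(prefix).removesuffix(suffix)


sample = "cursor_tests.zst"
print(strip_with_charset(sample, "cursor_", ".zst"))  # te
print(strip_with_affixes(sample, "cursor_", ".zst"))  # tests
```

The character-set version eats into the payload whenever it starts or ends with characters that also occur in the affix, so it can look correct on some inputs and corrupt others.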

Nov 30 2022, 2:27 PM
vlorentz created P1532 .csv.zst to .csv.zst.
Nov 30 2022, 1:39 PM
vlorentz closed D8904: metadata_dictionary: Fix 'Invalid IPv6 URL' crash.
Nov 30 2022, 11:46 AM
vlorentz closed D8905: Fix crash when indexing two REMD objects from the same deposit.
Nov 30 2022, 11:46 AM
vlorentz committed rDCIDXf74b47bcf45d: Fix crash when indexing two REMD objects from the same deposit (authored by vlorentz).
Fix crash when indexing two REMD objects from the same deposit
Nov 30 2022, 11:46 AM
vlorentz committed rDCIDXb2d8afff6fa5: metadata_dictionary: Fix 'Invalid IPv6 URL' crash (authored by vlorentz).
metadata_dictionary: Fix 'Invalid IPv6 URL' crash
Nov 30 2022, 11:46 AM
vlorentz requested review of D8905: Fix crash when indexing two REMD objects from the same deposit.
Nov 30 2022, 10:17 AM
vlorentz added a revision to T4710: DuplicateId errors when indexing from deposits: D8905: Fix crash when indexing two REMD objects from the same deposit.
Nov 30 2022, 10:13 AM · Indexer
vlorentz requested review of D8904: metadata_dictionary: Fix 'Invalid IPv6 URL' crash.
Nov 30 2022, 9:49 AM
vlorentz triaged T4710: DuplicateId errors when indexing from deposits as High priority.
Nov 30 2022, 9:46 AM · Indexer
vlorentz added a revision to T4709: npm mapping: ValueError: Invalid IPv6 URL: D8904: metadata_dictionary: Fix 'Invalid IPv6 URL' crash.
Nov 30 2022, 9:45 AM · Indexer
vlorentz triaged T4709: npm mapping: ValueError: Invalid IPv6 URL as Normal priority.
Nov 30 2022, 9:44 AM · Indexer

Nov 29 2022

vlorentz updated the test plan for D8883: Add a script to generate a topological sort.
Nov 29 2022, 6:25 PM
vlorentz updated the summary of D8883: Add a script to generate a topological sort.
Nov 29 2022, 6:24 PM
vlorentz updated the diff for D8883: Add a script to generate a topological sort.

fix rebase

Nov 29 2022, 6:23 PM
vlorentz published D8883: Add a script to generate a topological sort for review.
Nov 29 2022, 6:22 PM
vlorentz updated the diff for D8903: luigi: Add tasks UploadGraphToS3 and DownloadGraphFromS3.

fix incorrect decompressed path

Nov 29 2022, 6:21 PM
vlorentz closed D8899: README: Update list of indexers.
Nov 29 2022, 5:55 PM
vlorentz committed rDCIDXf44e14b11f10: README: Update list of indexers (authored by vlorentz).
README: Update list of indexers
Nov 29 2022, 5:55 PM
vlorentz updated the diff for D8899: README: Update list of indexers.

rebase

Nov 29 2022, 5:55 PM
vlorentz requested review of D8903: luigi: Add tasks UploadGraphToS3 and DownloadGraphFromS3.
Nov 29 2022, 5:15 PM
vlorentz closed D8901: luigi: Send progress reports to the scheduler.
Nov 29 2022, 4:56 PM
vlorentz committed rDDATASETa1cf9b871c5f: luigi: Send progress reports to the scheduler (authored by vlorentz).
luigi: Send progress reports to the scheduler
Nov 29 2022, 4:56 PM
vlorentz committed rDGRPHec7f568b1305: luigi: Clean up local files before compressing (authored by vlorentz).
luigi: Clean up local files before compressing
Nov 29 2022, 4:56 PM
vlorentz closed D8900: luigi: Send progress reports to the scheduler.
Nov 29 2022, 4:56 PM
vlorentz committed rDGRPHd5122a9b97b3: luigi: Send progress reports to the scheduler (authored by vlorentz).
luigi: Send progress reports to the scheduler
Nov 29 2022, 4:56 PM
vlorentz requested review of D8901: luigi: Send progress reports to the scheduler.
Nov 29 2022, 4:47 PM
vlorentz requested review of D8900: luigi: Send progress reports to the scheduler.
Nov 29 2022, 4:27 PM
vlorentz requested review of D8899: README: Update list of indexers.
Nov 29 2022, 3:56 PM
vlorentz closed D8898: Add Luigi to requirements of swh-dataset and swh-graph.
Nov 29 2022, 3:42 PM
vlorentz committed rDDOC144d5c2702b4: Add Luigi to requirements of swh-dataset and swh-graph (authored by vlorentz).
Add Luigi to requirements of swh-dataset and swh-graph
Nov 29 2022, 3:42 PM
vlorentz requested review of D8898: Add Luigi to requirements of swh-dataset and swh-graph.
Nov 29 2022, 3:38 PM
vlorentz committed rDGRPHb10f126bb47a: luigi: Fix typo in docstring (authored by vlorentz).
luigi: Fix typo in docstring
Nov 29 2022, 3:32 PM
vlorentz committed rDCIDX3faeac6cb667: docs: Remove remaining references to ctags and content_language (authored by vlorentz).
docs: Remove remaining references to ctags and content_language
Nov 29 2022, 3:31 PM