Jan 8 2023

gitlab-migration closed T4551: document the license dataset on docs.s.o as Migrated.

This task has been migrated to GitLab.

Jan 8 2023, 10:24 PM · Documentation, Datasets
gitlab-migration closed T4550: dataset: document the AWS S3 bucket for content objects as Migrated.

This task has been migrated to GitLab.

Jan 8 2023, 10:24 PM · Documentation, Datasets
gitlab-migration changed the status of T4676: Add Luigi workflow in swh-dataset from Resolved to Migrated.

This task has been migrated to GitLab.

Jan 8 2023, 10:04 PM · Datasets, Compressed graph service
gitlab-migration changed the status of T4676: Add Luigi workflow in swh-dataset, a subtask of T4677: Add support for generating subdatasets in swh.dataset.luigi, from Resolved to Migrated.
Jan 8 2023, 10:04 PM · Datasets
gitlab-migration changed the status of T3260: publish swh.dataset to pypi from Resolved to Migrated.

This task has been migrated to GitLab.

Jan 8 2023, 10:02 PM · Continuous Integration, Datasets
gitlab-migration changed the status of T3178: document how to export the graph dataset automatically, a subtask of T1847: fully automate export of the graph dataset, from Invalid to Migrated.
Jan 8 2023, 10:02 PM · Compressed graph service, Datasets
gitlab-migration changed the status of T3178: document how to export the graph dataset automatically from Invalid to Migrated.

This task has been migrated to GitLab.

Jan 8 2023, 10:02 PM · Documentation, Datasets
gitlab-migration changed the status of T3021: Investigate why reading the journal of the content table takes so long from Resolved to Migrated.

This task has been migrated to GitLab.

Jan 8 2023, 10:01 PM · Journal, Datasets
gitlab-migration changed the status of T2431: Document how to export the graph edge dataset from Resolved to Migrated.

This task has been migrated to GitLab.

Jan 8 2023, 10:00 PM · Documentation, Compressed graph service, Datasets
gitlab-migration changed the status of T2431: Document how to export the graph edge dataset, a subtask of T1847: fully automate export of the graph dataset, from Resolved to Migrated.
Jan 8 2023, 10:00 PM · Compressed graph service, Datasets
gitlab-migration changed the status of T1847: fully automate export of the graph dataset, a subtask of T1848: refresh graph dataset export, from Resolved to Migrated.
Jan 8 2023, 9:59 PM · Datasets
gitlab-migration changed the status of T1847: fully automate export of the graph dataset from Resolved to Migrated.

This task has been migrated to GitLab.

Jan 8 2023, 9:59 PM · Compressed graph service, Datasets
gitlab-migration closed T4747: Extract sample of .c files along with their most popular file name as Migrated.

This task has been migrated to GitLab.

Jan 8 2023, 5:06 PM · Datasets
gitlab-migration closed T4729: collaboration graph: drop pseudo-SWHIDs and add mapping ori<->url, a subtask of T4695: Provide a collaboration graph / dataset, as Migrated.
Jan 8 2023, 5:05 PM · Datasets
gitlab-migration closed T4729: collaboration graph: drop pseudo-SWHIDs and add mapping ori<->url as Migrated.

This task has been migrated to GitLab.

Jan 8 2023, 5:05 PM · Datasets
gitlab-migration closed T4714: Write Luigi tasks to generate the citation dataset, a subtask of T4713: Generate the citation dataset, as Migrated.
Jan 8 2023, 5:05 PM · Datasets
gitlab-migration closed T4714: Write Luigi tasks to generate the citation dataset as Migrated.

This task has been migrated to GitLab.

Jan 8 2023, 5:05 PM · Datasets
gitlab-migration closed T4713: Generate the citation dataset as Migrated.

This task has been migrated to GitLab.

Jan 8 2023, 5:05 PM · Datasets
gitlab-migration closed T4713: Generate the citation dataset, a subtask of T4712: Write Luigi tasks to regenerate the license dataset, as Migrated.
Jan 8 2023, 5:05 PM · Datasets
gitlab-migration closed T4712: Write Luigi tasks to regenerate the license dataset as Migrated.

This task has been migrated to GitLab.

Jan 8 2023, 5:05 PM · Datasets
gitlab-migration closed T4695: Provide a collaboration graph / dataset as Migrated.

This task has been migrated to GitLab.

Jan 8 2023, 5:05 PM · Datasets
gitlab-migration closed T4685: license dataset: add logic to convert/import dataset into a SQL database as Migrated.

This task has been migrated to GitLab.

Jan 8 2023, 5:05 PM · Datasets
gitlab-migration closed T4683: license dataset: use a consistent file format for CSV-like files, a subtask of T4685: license dataset: add logic to convert/import dataset into a SQL database, as Migrated.
Jan 8 2023, 5:05 PM · Datasets
gitlab-migration closed T4683: license dataset: use a consistent file format for CSV-like files as Migrated.

This task has been migrated to GitLab.

Jan 8 2023, 5:05 PM · Datasets
gitlab-migration closed T4677: Add support for generating subdatasets in swh.dataset.luigi as Migrated.

This task has been migrated to GitLab.

Jan 8 2023, 5:05 PM · Datasets
gitlab-migration closed T3885: Filter rows of size >32MB from dataset export as Migrated.

This task has been migrated to GitLab.

Jan 8 2023, 5:03 PM · Datasets
gitlab-migration changed the status of T4682: license dataset: missing java stuff from the replication package from Wontfix to Migrated.

This task has been migrated to GitLab.

Jan 8 2023, 4:38 PM · Datasets
gitlab-migration changed the status of T4586: max_matching_nodes is applied before filtering for node type, a subtask of T4469: update license blob dataset to match-ish latest compress graph, from Resolved to Migrated.
Jan 8 2023, 4:37 PM · Datasets
gitlab-migration changed the status of T4469: update license blob dataset to match-ish latest compress graph from Resolved to Migrated.

This task has been migrated to GitLab.

Jan 8 2023, 4:37 PM · Datasets
gitlab-migration changed the status of T3626: graph API: add ?limit parameter to /leaves endpoint, a subtask of T4469: update license blob dataset to match-ish latest compress graph, from Resolved to Migrated.
Jan 8 2023, 4:35 PM · Datasets
gitlab-migration changed the status of T3329: document ORC format dataset availability from Resolved to Migrated.

This task has been migrated to GitLab.

Jan 8 2023, 4:34 PM · Datasets
gitlab-migration changed the status of T2361: WARNING:swh.core.cli:Could not load subcommand dataset: No module named 'swh.dataset.cli' from Resolved to Migrated.

This task has been migrated to GitLab.

Jan 8 2023, 4:30 PM · Datasets
gitlab-migration changed the status of T2003: Content replayer may try to copy objects before they are available from an objstorage, a subtask of T1914: Keep mirror of contents on S3 up to date, from Resolved to Migrated.
Jan 8 2023, 4:28 PM · Mirror, Datasets
gitlab-migration changed the status of T1956: Integrate usage docs of the graph dataset in swh-docs from Resolved to Migrated.

This task has been migrated to GitLab.

Jan 8 2023, 4:28 PM · Datasets
gitlab-migration changed the status of T1914: Keep mirror of contents on S3 up to date, a subtask of T1899: complete object storage mirror on AWS, from Duplicate to Migrated.
Jan 8 2023, 4:28 PM · Mirror, Datasets
gitlab-migration changed the status of T1914: Keep mirror of contents on S3 up to date from Duplicate to Migrated.

This task has been migrated to GitLab.

Jan 8 2023, 4:28 PM · Mirror, Datasets
gitlab-migration changed the status of T1899: complete object storage mirror on AWS from Duplicate to Migrated.

This task has been migrated to GitLab.

Jan 8 2023, 4:27 PM · Mirror, Datasets
gitlab-migration changed the status of T1848: refresh graph dataset export from Resolved to Migrated.

This task has been migrated to GitLab.

Jan 8 2023, 4:27 PM · Datasets
gitlab-migration changed the status of T1796: Datasets exported from Spark are missing some rows from Wontfix to Migrated.

This task has been migrated to GitLab.

Jan 8 2023, 4:27 PM · Datasets
gitlab-migration changed the status of T1783: edge dataset: re-export rev→rev edges in the right order from Resolved to Migrated.

This task has been migrated to GitLab.

Jan 8 2023, 4:27 PM · Datasets
gitlab-migration changed the status of T1741: graph dataset: update to use persistent identifiers everywhere, a subtask of T1848: refresh graph dataset export, from Resolved to Migrated.
Jan 8 2023, 4:27 PM · Datasets
gitlab-migration changed the status of T1743: create a nice landing web page for exported dataset from Resolved to Migrated.

This task has been migrated to GitLab.

Jan 8 2023, 4:27 PM · Datasets
gitlab-migration changed the status of T1742: graph dataset: uniform file names from Resolved to Migrated.

This task has been migrated to GitLab.

Jan 8 2023, 4:27 PM · Datasets
gitlab-migration changed the status of T1741: graph dataset: update to use persistent identifiers everywhere from Resolved to Migrated.

This task has been migrated to GitLab.

Jan 8 2023, 4:27 PM · Datasets

Jan 6 2023

vlorentz added a project to T4747: Extract sample of .c files along with their most popular file name: Datasets.
Jan 6 2023, 12:14 PM · Datasets

Dec 22 2022

vlorentz closed T4682: license dataset: missing java stuff from the replication package as Wontfix.

Future versions will be generated using only code in swh-graph (bash glue code replaced by Python code, some of which shells out to bash for simplicity), so the replication package will simply be replaced by a swh-graph tag.

Dec 22 2022, 2:52 PM · Datasets
vlorentz added revisions to T4695: Provide a collaboration graph / dataset: D8970: origin_contributors: Use origin IDs instead of SWHIDs, D8971: origin_contributors: Write table mapping origin ID to origin URL (base64-encoded), D8972: origin_contributors: Rename 'person' to 'contributor' in outputs.
Dec 22 2022, 1:57 PM · Datasets
vlorentz added a comment to T4695: Provide a collaboration graph / dataset.

TODO: the deanonymized dataset should be just a <contributor_id,contributor_base64,contributor_escaped> table, rather than repeating the origin<->contributor mapping.

Dec 22 2022, 1:57 PM · Datasets

Dec 21 2022

vlorentz added a comment to T4683: license dataset: use a consistent file format for CSV-like files.

blobs-fileinfo.csv.zst: (no changes needed)

Dec 21 2022, 1:50 PM · Datasets

Dec 19 2022

vlorentz added a revision to T4729: collaboration graph: drop pseudo-SWHIDs and add mapping ori<->url: D8971: origin_contributors: Write table mapping origin ID to origin URL (base64-encoded).
Dec 19 2022, 5:55 PM · Datasets
vlorentz added a revision to T4729: collaboration graph: drop pseudo-SWHIDs and add mapping ori<->url: D8970: origin_contributors: Use origin IDs instead of SWHIDs.
Dec 19 2022, 5:45 PM · Datasets

Dec 15 2022

zack added a parent task for T4729: collaboration graph: drop pseudo-SWHIDs and add mapping ori<->url: T4695: Provide a collaboration graph / dataset.
Dec 15 2022, 1:37 PM · Datasets
zack added a subtask for T4695: Provide a collaboration graph / dataset: T4729: collaboration graph: drop pseudo-SWHIDs and add mapping ori<->url.
Dec 15 2022, 1:37 PM · Datasets
zack triaged T4729: collaboration graph: drop pseudo-SWHIDs and add mapping ori<->url as Normal priority.
Dec 15 2022, 1:33 PM · Datasets
vlorentz closed T4676: Add Luigi workflow in swh-dataset, a subtask of T4677: Add support for generating subdatasets in swh.dataset.luigi, as Resolved.
Dec 15 2022, 1:00 PM · Datasets
vlorentz closed T4676: Add Luigi workflow in swh-dataset as Resolved.
Dec 15 2022, 1:00 PM · Datasets, Compressed graph service
zack changed the status of T4695: Provide a collaboration graph / dataset from Open to Work in Progress.
Dec 15 2022, 10:27 AM · Datasets

Dec 6 2022

vlorentz added revisions to T4676: Add Luigi workflow in swh-dataset: D8919: Add CLI script to generate Luigi config and call it, D8924: exporters/orc: Fix crash on visit status with no type, D8925: luigi.CreateAthena: Fix validation of DB name, D8926: luigi.RunExportAll: Default to exporting all formats.
Dec 6 2022, 2:37 PM · Datasets, Compressed graph service

Dec 5 2022

vlorentz triaged T4714: Write Luigi tasks to generate the citation dataset as Normal priority.
Dec 5 2022, 10:51 AM · Datasets
vlorentz triaged T4713: Generate the citation dataset as Normal priority.
Dec 5 2022, 10:51 AM · Datasets
vlorentz updated the task description for T4712: Write Luigi tasks to regenerate the license dataset.
Dec 5 2022, 10:50 AM · Datasets
vlorentz triaged T4712: Write Luigi tasks to regenerate the license dataset as Low priority.
Dec 5 2022, 10:50 AM · Datasets

Dec 1 2022

vlorentz added revisions to T4695: Provide a collaboration graph / dataset: D8908: Add ListOriginContributors, D8910: Regenerate the test dataset to include a release with no author, D8912: ListOriginContributors: Ignore null author/committer in revisions/releases.
Dec 1 2022, 4:15 PM · Datasets

Nov 24 2022

vlorentz added a revision to T4695: Provide a collaboration graph / dataset: D8883: Add a script to generate a topological sort.
Nov 24 2022, 4:20 PM · Datasets

Nov 21 2022

vlorentz triaged T4695: Provide a collaboration graph / dataset as Normal priority.
Nov 21 2022, 12:13 PM · Datasets

Nov 14 2022

zack added a parent task for T4683: license dataset: use a consistent file format for CSV-like files: T4685: license dataset: add logic to convert/import dataset into a SQL database.
Nov 14 2022, 4:50 PM · Datasets
zack added a subtask for T4685: license dataset: add logic to convert/import dataset into a SQL database: T4683: license dataset: use a consistent file format for CSV-like files.
Nov 14 2022, 4:50 PM · Datasets
zack triaged T4685: license dataset: add logic to convert/import dataset into a SQL database as Low priority.
Nov 14 2022, 4:49 PM · Datasets
zack changed the edit policy for P1529 import the license dataset into sqlite.
Nov 14 2022, 4:47 PM · Datasets
zack created P1529 import the license dataset into sqlite.
Nov 14 2022, 4:47 PM · Datasets
zack added a project to T4683: license dataset: use a consistent file format for CSV-like files: Datasets.
Nov 14 2022, 3:09 PM · Datasets
vlorentz added a comment to T4682: license dataset: missing java stuff from the replication package.

The replication/05-earliest-revision.sh script in the replication package mentions the swh-graph version it uses and the fully qualified class name, so the class can be found in the swh-graph code.

Nov 14 2022, 3:08 PM · Datasets
zack triaged T4682: license dataset: missing java stuff from the replication package as Low priority.
Nov 14 2022, 2:45 PM · Datasets

Nov 10 2022

vlorentz added revisions to T4676: Add Luigi workflow in swh-dataset: D8827: athena: Fix create_table to work with restricted permissions, D8828: cli: Move the main code of export_graph to its own function, D8829: Add luigi tasks.
Nov 10 2022, 10:42 AM · Datasets, Compressed graph service
vlorentz added a parent task for T4676: Add Luigi workflow in swh-dataset: T4677: Add support for generating subdatasets in swh.dataset.luigi.
Nov 10 2022, 10:42 AM · Datasets, Compressed graph service
vlorentz added a subtask for T4677: Add support for generating subdatasets in swh.dataset.luigi: T4676: Add Luigi workflow in swh-dataset.
Nov 10 2022, 10:42 AM · Datasets
vlorentz triaged T4677: Add support for generating subdatasets in swh.dataset.luigi as Normal priority.
Nov 10 2022, 10:42 AM · Datasets
vlorentz triaged T4676: Add Luigi workflow in swh-dataset as High priority.
Nov 10 2022, 10:41 AM · Datasets, Compressed graph service

Nov 7 2022

vlorentz closed T4469: update license blob dataset to match-ish latest compress graph as Resolved.

It's now available at https://annex.softwareheritage.org/public/dataset/license-blobs/2022-04-25/

Nov 7 2022, 10:34 AM · Datasets

Oct 19 2022

gitlab-migration changed the status of T4507: Out of memory on granet, a subtask of T4469: update license blob dataset to match-ish latest compress graph, from Resolved to Migrated.
Oct 19 2022, 6:08 PM · Datasets

Oct 11 2022

vlorentz closed T4507: Out of memory on granet, a subtask of T4469: update license blob dataset to match-ish latest compress graph, as Resolved.
Oct 11 2022, 11:45 AM · Datasets

Oct 3 2022

vlorentz closed T4586: max_matching_nodes is applied before filtering for node type, a subtask of T4469: update license blob dataset to match-ish latest compress graph, as Resolved.
Oct 3 2022, 9:56 AM · Datasets

Sep 29 2022

vlorentz added a subtask for T4469: update license blob dataset to match-ish latest compress graph: T4507: Out of memory on granet.
Sep 29 2022, 3:08 PM · Datasets
vlorentz removed a subtask for T4469: update license blob dataset to match-ish latest compress graph: T4522: graph gRPC API: Add support for limiting traversals by number of results.
Sep 29 2022, 3:07 PM · Datasets
vlorentz added subtasks for T4469: update license blob dataset to match-ish latest compress graph: T4586: max_matching_nodes is applied before filtering for node type, T4522: graph gRPC API: Add support for limiting traversals by number of results, T3626: graph API: add ?limit parameter to /leaves endpoint.
Sep 29 2022, 3:06 PM · Datasets

Sep 23 2022

zack renamed T4551: document the license dataset on docs.s.o from document the license dataset to document the license dataset on docs.s.o.
Sep 23 2022, 4:38 PM · Documentation, Datasets
zack triaged T4551: document the license dataset on docs.s.o as Normal priority.
Sep 23 2022, 4:38 PM · Documentation, Datasets
zack triaged T4550: dataset: document the AWS S3 bucket for content objects as Normal priority.
Sep 23 2022, 4:27 PM · Documentation, Datasets

Aug 29 2022

zack triaged T4469: update license blob dataset to match-ish latest compress graph as Normal priority.
Aug 29 2022, 11:46 AM · Datasets

May 1 2022

seirl closed T1848: refresh graph dataset export as Resolved.

Now that there is both a columnar+compressed graph from 2021 and a columnar graph from 2022 that is pending compression, this task about "refreshing the export from January 2019" is resolved.

May 1 2022, 12:08 PM · Datasets

Apr 29 2022

seirl changed the status of T1848: refresh graph dataset export from Open to Work in Progress.
Apr 29 2022, 6:23 PM · Datasets
seirl moved T1847: fully automate export of the graph dataset from Backlog to Deployed on the Compressed graph service board.
Apr 29 2022, 6:22 PM · Compressed graph service, Datasets
seirl moved T2431: Document how to export the graph edge dataset from Backlog to Deployed on the Compressed graph service board.
Apr 29 2022, 6:22 PM · Documentation, Compressed graph service, Datasets
seirl closed T3021: Investigate why reading the journal of the content table takes so long as Resolved.

Fixed in D7718

Apr 29 2022, 6:20 PM · Journal, Datasets
seirl closed T2431: Document how to export the graph edge dataset, a subtask of T1847: fully automate export of the graph dataset, as Resolved.
Apr 29 2022, 6:15 PM · Compressed graph service, Datasets
seirl closed T2431: Document how to export the graph edge dataset as Resolved.

Done in D7693 and D7711.

Apr 29 2022, 6:15 PM · Documentation, Compressed graph service, Datasets
seirl closed T1743: create a nice landing web page for exported dataset as Resolved.
Apr 29 2022, 6:14 PM · Datasets
seirl added a comment to T1743: create a nice landing web page for exported dataset.

Done. This page https://annex.softwareheritage.org/public/dataset/graph/ now contains a link to the detailed list of datasets: https://forge.softwareheritage.org/D7487

Apr 29 2022, 6:14 PM · Datasets
seirl closed T1847: fully automate export of the graph dataset, a subtask of T1848: refresh graph dataset export, as Resolved.
Apr 29 2022, 5:57 PM · Datasets
seirl closed T1847: fully automate export of the graph dataset as Resolved.

Done!

Apr 29 2022, 5:57 PM · Compressed graph service, Datasets