Feed All Stories

Oct 26 2022

vlorentz added subtasks for T4659: Fix all crashes of the git loader caused by malformed git objects: T4658: ObjectFormatException: Unknown field b'>', T3880: Support Git commits with no angle brackets in author name.
Oct 26 2022, 10:31 AM · meta-task, Git loader
vlorentz triaged T4659: Fix all crashes of the git loader caused by malformed git objects as Normal priority.
Oct 26 2022, 10:31 AM · meta-task, Git loader
swh-sentry-integration assigned T4658: ObjectFormatException: Unknown field b'>' to vlorentz.
Oct 26 2022, 10:30 AM · Git loader
franckbret requested review of D8777: Puppet: Lister implements incremental mode.
Oct 26 2022, 10:20 AM
vlorentz created P1511 kafka topic config.
Oct 26 2022, 10:07 AM

Oct 25 2022

samplet added a comment to D8759: model: Add payload to ExtID class.

Hi, thanks for the diffs

Oct 25 2022, 10:48 PM
olasd added a comment to T4644: replayer crashes on invalid directory entry name (b'gitter/gitter.xml').

This is now done, the objects are fixed in the production DB and kafka.

Oct 25 2022, 8:10 PM · Archive integrity, Object storage, Data Model
olasd added a comment to T4644: replayer crashes on invalid directory entry name (b'gitter/gitter.xml').

@vlorentz I'm running the following adaptation to your script:

Oct 25 2022, 7:03 PM · Archive integrity, Object storage, Data Model
ardumont updated the task description for T3781: Replace the Nixguix loader with a lister.
Oct 25 2022, 6:57 PM · Data Model, Nixguix loader
ardumont updated subscribers of T3781: Replace the Nixguix loader with a lister.

Last analysis without [1]. That last diff should fix the key entries marked with the key 'only-version-should-be-tarball'.

Oct 25 2022, 6:53 PM · Data Model, Nixguix loader
ardumont committed rDSNIP4ff0739b5cae: nixguix/analyze-result: Improve extension grouping (authored by ardumont).
nixguix/analyze-result: Improve extension grouping
Oct 25 2022, 6:50 PM
ardumont committed rDSNIP5203c59a2bb3: nixguix/analyze-result: Improve command output (authored by ardumont).
nixguix/analyze-result: Improve command output
Oct 25 2022, 6:50 PM
ardumont added a comment to D8773: nixguix: Deal with edge case url with version instead of extension.

I think detecting versions is a lost cause. Here is what I had to do for PyPI:

https://forge.softwareheritage.org/source/swh-storage/browse/master/swh/storage/migrate_extrinsic_metadata.py$234-281
Oct 25 2022, 6:47 PM
swh-public-ci added a comment to D8774: nixguix: Use content-disposition from http head request if provided.

Build is green

Oct 25 2022, 6:22 PM
vlorentz added a comment to D8773: nixguix: Deal with edge case url with version instead of extension.

plus, Nix and Guix already have the package name and version in their metadata, can't we ask them to provide this data to us?

Oct 25 2022, 6:22 PM
swh-public-ci added a comment to D8773: nixguix: Deal with edge case url with version instead of extension.

Build is green

Oct 25 2022, 6:21 PM
vlorentz added a comment to D8773: nixguix: Deal with edge case url with version instead of extension.

plus, Nix and Guix already have the package name and version in their metadata, can't we ask them to provide this data to us?

Oct 25 2022, 6:20 PM
olasd added a comment to D8151: [RFC] Add 'evolve' method to BaseModel objects.

Yeah, I think that makes sense. Time for these tests to be written? :)

Oct 25 2022, 6:19 PM
vlorentz added a comment to D8773: nixguix: Deal with edge case url with version instead of extension.

I think detecting versions is a lost cause. Here is what I had to do for PyPI: https://forge.softwareheritage.org/source/swh-storage/browse/master/swh/storage/migrate_extrinsic_metadata.py$234-281

Oct 25 2022, 6:18 PM
ardumont updated the diff for D8774: nixguix: Use content-disposition from http head request if provided.

Rebase

Oct 25 2022, 6:16 PM
ardumont updated the diff for D8773: nixguix: Deal with edge case url with version instead of extension.

Improve the regexp version detection to be more restrictive.

Oct 25 2022, 6:16 PM
olasd accepted D8775: test_converters: Rename 'raw_manifest' to 'raw_string'.

must... resist... painting the bikeshed (git_manifest? git_object? raw_object?)

Oct 25 2022, 6:15 PM
seirl added a comment to T4644: replayer crashes on invalid directory entry name (b'gitter/gitter.xml').

Oh yeah, I was thinking of just removing the entire project, but your solution also works.

Oct 25 2022, 6:15 PM · Archive integrity, Object storage, Data Model
olasd accepted D8776: converters: Replace '/' with '_' in directory entries.

Great, thanks!

Oct 25 2022, 6:14 PM
vlorentz requested review of D8776: converters: Replace '/' with '_' in directory entries.
Oct 25 2022, 6:12 PM
vlorentz updated subscribers of T4644: replayer crashes on invalid directory entry name (b'gitter/gitter.xml').

Holes are bad. And I just opened a diff to make the git loader apply the same transformation, as @olasd made the same comment: D8776

Oct 25 2022, 6:10 PM · Archive integrity, Object storage, Data Model
vlorentz requested review of D8775: test_converters: Rename 'raw_manifest' to 'raw_string'.
Oct 25 2022, 6:09 PM
seirl added a comment to T4644: replayer crashes on invalid directory entry name (b'gitter/gitter.xml').

Do you actually want to keep these objects? This would be inconsistent with the fixed loader behavior that would just reject those objects, and not load the repository at all.

Oct 25 2022, 6:06 PM · Archive integrity, Object storage, Data Model
ardumont added inline comments to D8773: nixguix: Deal with edge case url with version instead of extension.
Oct 25 2022, 6:03 PM
ardumont added inline comments to D8773: nixguix: Deal with edge case url with version instead of extension.
Oct 25 2022, 6:03 PM
anlambert added a comment to T4548: Add a public API endpoint and documentation to trigger Save Code Now from webhook.

This is the POST request I received from SourceForge when pushing a commit to a sample hg repository.

Oct 25 2022, 6:02 PM · Web app
vlorentz changed the visibility for T4644: replayer crashes on invalid directory entry name (b'gitter/gitter.xml').
Oct 25 2022, 5:57 PM · Archive integrity, Object storage, Data Model
anlambert added a comment to T4548: Add a public API endpoint and documentation to trigger Save Code Now from webhook.

This is the POST request I received from SourceForge when adding a commit to a sample svn repository.

Oct 25 2022, 5:56 PM · Web app
ardumont requested review of D8774: nixguix: Use content-disposition from http head request if provided.
Oct 25 2022, 5:56 PM
ardumont added a revision to T3781: Replace the Nixguix loader with a lister: D8774: nixguix: Use content-disposition from http head request if provided.
Oct 25 2022, 5:50 PM · Data Model, Nixguix loader
anlambert added a comment to T4548: Add a public API endpoint and documentation to trigger Save Code Now from webhook.

This is the POST request I received from SourceForge when pushing a commit to a sample git repository.

Oct 25 2022, 5:49 PM · Web app
vlorentz updated subscribers of T4644: replayer crashes on invalid directory entry name (b'gitter/gitter.xml').

I tried to add a workaround in the backfiller, but it is incredibly hard to do properly, especially as entries are disordered, so raw_manifest needs to be fixed in two different ways.

Oct 25 2022, 5:41 PM · Archive integrity, Object storage, Data Model
anlambert added a comment to T4548: Add a public API endpoint and documentation to trigger Save Code Now from webhook.

This is the POST request I received from Gitea when pushing a commit to a sample repository.

Oct 25 2022, 5:36 PM · Web app
ardumont requested review of D8773: nixguix: Deal with edge case url with version instead of extension.
Oct 25 2022, 5:34 PM
ardumont added a revision to T3781: Replace the Nixguix loader with a lister: D8773: nixguix: Deal with edge case url with version instead of extension.
Oct 25 2022, 5:29 PM · Data Model, Nixguix loader
anlambert added a comment to T4548: Add a public API endpoint and documentation to trigger Save Code Now from webhook.

This is the POST request I received from GitLab when pushing a commit to a sample repository.

Oct 25 2022, 5:27 PM · Web app
anlambert added a comment to T4548: Add a public API endpoint and documentation to trigger Save Code Now from webhook.

I used ngrok to forward webhook requests to my local machine.

Oct 25 2022, 5:21 PM · Web app
olasd closed D8770: Ensure origins are not visited faster than twice a day.
Oct 25 2022, 4:58 PM
olasd closed D8769: Refresh task type data from the database every time recurrent tasks are run.
Oct 25 2022, 4:58 PM
olasd committed rDSCHff75e742ee45: Ensure origins are not visited faster than twice a day (authored by olasd).
Ensure origins are not visited faster than twice a day
Oct 25 2022, 4:58 PM
olasd committed rDSCH1f9109fa4d66: Refresh task type data from the database every time recurrent tasks are run (authored by olasd).
Refresh task type data from the database every time recurrent tasks are run
Oct 25 2022, 4:58 PM
olasd closed D8768: Use json instead of msgpack for serializers.
Oct 25 2022, 4:58 PM
olasd committed rDSCHbde27a9e4262: Use json instead of msgpack for serializers (authored by olasd).
Use json instead of msgpack for serializers
Oct 25 2022, 4:58 PM
swh-public-ci added a comment to D8770: Ensure origins are not visited faster than twice a day.

Build is green

Oct 25 2022, 4:54 PM
swh-public-ci added a comment to D8769: Refresh task type data from the database every time recurrent tasks are run.

Build is green

Oct 25 2022, 4:54 PM
anlambert closed D8767: replay: Ensure proper removal of external paths when deleting directory.
Oct 25 2022, 4:51 PM
anlambert committed rDLDSVN8c709079ce28: replay: Ensure proper removal of external paths when deleting directory (authored by anlambert).
replay: Ensure proper removal of external paths when deleting directory
Oct 25 2022, 4:51 PM
olasd updated the diff for D8770: Ensure origins are not visited faster than twice a day.

Rebase on updated D8769

Oct 25 2022, 4:50 PM
olasd updated the diff for D8769: Refresh task type data from the database every time recurrent tasks are run.

Variabilize task_name

Oct 25 2022, 4:50 PM
lunar claimed T3087: Implement support for takedown notices (infra, admin tools, workflow).
Oct 25 2022, 4:48 PM · Roadmap 2022, meta-task, Roadmap 2021, Web app
swh-public-ci added a comment to D8770: Ensure origins are not visited faster than twice a day.

Build is green

Oct 25 2022, 4:48 PM
lunar added a subtask for T3087: Implement support for takedown notices (infra, admin tools, workflow): T4657: Allow object removal from journal.
Oct 25 2022, 4:48 PM · Roadmap 2022, meta-task, Roadmap 2021, Web app
lunar added a parent task for T4657: Allow object removal from journal: T3087: Implement support for takedown notices (infra, admin tools, workflow).
Oct 25 2022, 4:48 PM · Journal
lunar created T4657: Allow object removal from journal.
Oct 25 2022, 4:47 PM · Journal
olasd closed D8771: client: redact sensitive consumer settings before logging them.
Oct 25 2022, 4:45 PM
olasd committed rDJNL1d879f1dd624: client: redact sensitive consumer settings before logging them (authored by olasd).
client: redact sensitive consumer settings before logging them
Oct 25 2022, 4:45 PM
olasd updated the diff for D8770: Ensure origins are not visited faster than twice a day.

Add tests for the new absolute_cooldown

Oct 25 2022, 4:43 PM
vlorentz requested review of D8772: metadata_dictionary: Systematically check input URLs before adding to graph.
Oct 25 2022, 4:27 PM
anlambert added a comment to T4548: Add a public API endpoint and documentation to trigger Save Code Now from webhook.

I started looking into how to implement that task.

Oct 25 2022, 4:22 PM · Web app
vlorentz accepted D8768: Use json instead of msgpack for serializers.

* sigh *

Oct 25 2022, 4:12 PM
vlorentz accepted D8769: Refresh task type data from the database every time recurrent tasks are run.

you might want to add a variable for the value of f"load-{visit_type}", it's used four times now

Oct 25 2022, 4:08 PM
vlorentz accepted D8771: client: redact sensitive consumer settings before logging them.
Oct 25 2022, 4:06 PM
vlorentz accepted D8770: Ensure origins are not visited faster than twice a day.
Oct 25 2022, 4:05 PM
vlorentz added a revision to T4656: AttributeError: 'NoneType' object has no attribute 'endswith': D8772: metadata_dictionary: Systematically check input URLs before adding to graph.
Oct 25 2022, 4:03 PM · Indexer
vlorentz triaged T4656: AttributeError: 'NoneType' object has no attribute 'endswith' as Normal priority.
Oct 25 2022, 4:03 PM · Indexer
swh-sentry-integration assigned T4656: AttributeError: 'NoneType' object has no attribute 'endswith' to vlorentz.
Oct 25 2022, 4:02 PM · Indexer
olasd requested review of D8771: client: redact sensitive consumer settings before logging them.
Oct 25 2022, 4:00 PM
olasd requested review of D8770: Ensure origins are not visited faster than twice a day.
Oct 25 2022, 3:56 PM
olasd requested review of D8769: Refresh task type data from the database every time recurrent tasks are run.
Oct 25 2022, 3:56 PM
olasd requested review of D8768: Use json instead of msgpack for serializers.
Oct 25 2022, 3:56 PM
ardumont accepted D8767: replay: Ensure proper removal of external paths when deleting directory.

\o/

Oct 25 2022, 3:42 PM
anlambert created P1510 (An Untitled Masterwork).
Oct 25 2022, 3:32 PM
anlambert requested review of D8767: replay: Ensure proper removal of external paths when deleting directory.
Oct 25 2022, 3:25 PM
vlorentz updated the test plan for D8756: azure: Add tests based on Azurite in addition to mocks.
Oct 25 2022, 3:13 PM
vlorentz closed D8756: azure: Add tests based on Azurite in addition to mocks.
Oct 25 2022, 3:12 PM
vlorentz committed rDOBJSdf4be2d87c30: azure: Add tests based on Azurite in addition to mocks (authored by vlorentz).
azure: Add tests based on Azurite in addition to mocks
Oct 25 2022, 3:12 PM
franckbret closed D8766: Puppet: Artifacts as lists.
Oct 25 2022, 3:09 PM
franckbret committed rDLDBASEe6847f36162f: Puppet: Artifacts as lists (authored by franckbret).
Puppet: Artifacts as lists
Oct 25 2022, 3:09 PM
swh-public-ci added a comment to D8756: azure: Add tests based on Azurite in addition to mocks.

Build is green

Oct 25 2022, 3:09 PM
vlorentz closed D8764: metadata: Make default tool configuration follow swh.indexer versions.
Oct 25 2022, 3:08 PM
vlorentz committed rDCIDXa51cbf396593: metadata: Make default tool configuration follow swh.indexer versions (authored by vlorentz).
metadata: Make default tool configuration follow swh.indexer versions
Oct 25 2022, 3:08 PM
vlorentz updated the diff for D8756: azure: Add tests based on Azurite in addition to mocks.

remove leftover marker

Oct 25 2022, 3:05 PM
ardumont committed rDSNIPca822ba76bf5: nixguix: Reference the snippet of code to check dataset result (authored by ardumont).
nixguix: Reference the snippet of code to check dataset result
Oct 25 2022, 3:03 PM
swh-public-ci added a comment to D8756: azure: Add tests based on Azurite in addition to mocks.

Build is green

Oct 25 2022, 3:01 PM
anlambert accepted D8766: Puppet: Artifacts as lists.

Looks good to me, thanks!

Oct 25 2022, 3:00 PM
swh-public-ci added a comment to D8762: Puppet: Switch artifacts from dict to list.

Build is green

Oct 25 2022, 2:55 PM
vlorentz closed D8765: Install Azurite in base docker image.
Oct 25 2022, 2:53 PM
vlorentz committed rCDFJ72c947ae5313: Install Azurite (authored by vlorentz).
Install Azurite
Oct 25 2022, 2:53 PM
swh-public-ci added a comment to D8756: azure: Add tests based on Azurite in addition to mocks.

Build is green

Oct 25 2022, 2:51 PM
franckbret closed D8762: Puppet: Switch artifacts from dict to list.
Oct 25 2022, 2:50 PM
franckbret committed rDLS8355fee25f57: Puppet: Switch artifacts from dict to list (authored by franckbret).
Puppet: Switch artifacts from dict to list
Oct 25 2022, 2:50 PM
franckbret requested review of D8766: Puppet: Artifacts as lists.
Oct 25 2022, 2:50 PM
ardumont updated the task description for T3781: Replace the Nixguix loader with a lister.
Oct 25 2022, 2:50 PM · Data Model, Nixguix loader
franckbret updated the diff for D8762: Puppet: Switch artifacts from dict to list.

Rebase

Oct 25 2022, 2:50 PM
Harbormaster failed remote builds in B32561: Diff 31607 for D8756: azure: Add tests based on Azurite in addition to mocks!
Oct 25 2022, 2:46 PM