Feed All Stories

Sep 14 2021

ardumont updated the diff for D6240: Use extids to filter out already seen revisions across hg origins.

Make explicit that the current behavior does not compute another snapshot when nothing
changes (because everything gets filtered out).

Sep 14 2021, 2:16 PM
anlambert added a comment to D6252: package/utils: Improve downloaded filename extraction.
In D6252#161761, @olasd wrote:

Looks like the format you're expecting for the content-disposition header isn't quite standards-compliant.

https://developer.mozilla.org/en-US/docs/Web/HTTP/Headers/Content-Disposition says the content-disposition filename entry is supposed to be a quoted string.
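
To illustrate the point under discussion, here is a minimal sketch of a lenient filename extraction that accepts both the quoted form described above and an unquoted value. It is not the code from D6252: the helper name `filename_from_content_disposition` is hypothetical, and it relies only on the standard library's `email.message.Message.get_filename()`.

```python
from email.message import Message
from typing import Optional


def filename_from_content_disposition(header_value: str) -> Optional[str]:
    """Return the filename parameter of a Content-Disposition value, if any.

    Hypothetical helper for illustration; it delegates header parameter
    parsing (including unquoting) to the standard library.
    """
    msg = Message()
    msg["Content-Disposition"] = header_value
    # get_filename() returns None when no filename parameter is present.
    return msg.get_filename()


# Quoted form, as described in the MDN page linked above.
quoted = filename_from_content_disposition('attachment; filename="foo.tar.gz"')
# Unquoted form, which some servers send anyway.
unquoted = filename_from_content_disposition("attachment; filename=foo.tar.gz")
# No filename parameter at all.
missing = filename_from_content_disposition("attachment")
```

Delegating to the standard library avoids a hand-written regular expression that would only match one of the two forms.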

Sep 14 2021, 2:14 PM
olasd requested changes to D6252: package/utils: Improve downloaded filename extraction.

Looks like the format you're expecting for the content-disposition header isn't quite standards-compliant.

Sep 14 2021, 2:07 PM
anlambert requested review of D6254: package/tests/test_utils: Remove code duplication.
Sep 14 2021, 2:03 PM
ardumont accepted D6252: package/utils: Improve downloaded filename extraction.

Awesome. Thanks a lot.

Sep 14 2021, 2:03 PM
ardumont added a revision to T3468: staging: current opam loading issues: D6252: package/utils: Improve downloaded filename extraction.
Sep 14 2021, 1:59 PM · System administration, Opam
ardumont updated the summary of D6252: package/utils: Improve downloaded filename extraction.
Sep 14 2021, 1:59 PM
anlambert requested review of D6252: package/utils: Improve downloaded filename extraction.
Sep 14 2021, 1:58 PM
stsp requested review of D6253: add CVS loader to the swh-loader.rst index.
Sep 14 2021, 1:56 PM
ardumont added inline comments to D6240: Use extids to filter out already seen revisions across hg origins.
Sep 14 2021, 1:46 PM
ardumont updated the summary of D6240: Use extids to filter out already seen revisions across hg origins.
Sep 14 2021, 1:44 PM
ardumont updated the summary of D6240: Use extids to filter out already seen revisions across hg origins.
Sep 14 2021, 1:42 PM
swh-public-ci added a comment to D6240: Use extids to filter out already seen revisions across hg origins.

Build is green

Sep 14 2021, 12:28 PM
ardumont updated the diff for D6240: Use extids to filter out already seen revisions across hg origins.

Adapt the test so that what gets loaded is actually a fork

Sep 14 2021, 12:24 PM
douardda committed rDENVb0f07795ddff: docker: Document how to consume kafka topics from the host (authored by douardda).
docker: Document how to consume kafka topics from the host
Sep 14 2021, 11:40 AM
douardda closed D6248: docker: allow kafka to be consumed from the host.
Sep 14 2021, 11:40 AM
douardda committed rDENVf612427f663d: docker: allow kafka to be consumed from the host (authored by douardda).
docker: allow kafka to be consumed from the host
Sep 14 2021, 11:40 AM
swh-public-ci added a comment to D6240: Use extids to filter out already seen revisions across hg origins.

Build is green

Sep 14 2021, 11:39 AM
ardumont updated the summary of D6240: Use extids to filter out already seen revisions across hg origins.
Sep 14 2021, 11:38 AM
douardda closed D6247: Commit kafka messages which offset has reach the high limit.

closed by 94be817f869409c64415b181824071d2998e33d5

Sep 14 2021, 11:38 AM
ardumont updated the diff for D6240: Use extids to filter out already seen revisions across hg origins.

Revert 'drop the flush instruction'

Sep 14 2021, 11:37 AM
douardda closed D6246: Add a JournalClientOffsetRanges.unsubscribe() method.

closed by a3c1f39013bae1a6982140d51d8bb443dc1b5c9c

Sep 14 2021, 11:37 AM
douardda updated the diff for D6248: docker: allow kafka to be consumed from the host.

Keep port 5092 exposed on host

Sep 14 2021, 11:35 AM
Harbormaster failed to build B23569: rDSTO1c8337fd4834: migrate_extrinsic_metadata: Fix crash on deposit hal-02355563 for rDSTO1c8337fd4834: migrate_extrinsic_metadata: Fix crash on deposit hal-02355563!
Sep 14 2021, 11:23 AM
douardda committed rDDATASET94be817f8694: Commit kafka messages which offset has reach the high limit (authored by douardda).
Commit kafka messages which offset has reach the high limit
Sep 14 2021, 11:23 AM
douardda committed rDDATASETa3c1f39013ba: Add a JournalClientOffsetRanges.unsubscribe() method (authored by douardda).
Add a JournalClientOffsetRanges.unsubscribe() method
Sep 14 2021, 11:22 AM
douardda added inline comments to D6248: docker: allow kafka to be consumed from the host.
Sep 14 2021, 11:21 AM
vlorentz committed rDSTO589d20ed64b7: migrate_extrinsic_metadata: Fix missing f-stringification (authored by vlorentz).
migrate_extrinsic_metadata: Fix missing f-stringification
Sep 14 2021, 11:15 AM
vlorentz closed D6242: migrate_extrinsic_metadata: Fix crash on deposit hal-02355563.
Sep 14 2021, 11:15 AM
vlorentz committed rDSTO1c8337fd4834: migrate_extrinsic_metadata: Fix crash on deposit hal-02355563 (authored by vlorentz).
migrate_extrinsic_metadata: Fix crash on deposit hal-02355563
Sep 14 2021, 11:15 AM
vlorentz closed D6241: migrate_extrinsic_metadata: Fix remaining pypi issues.
Sep 14 2021, 11:14 AM
vlorentz committed rDSTO3315738be9f6: migrate_extrinsic_metadata: Fix remaining pypi issues (authored by vlorentz).
migrate_extrinsic_metadata: Fix remaining pypi issues
Sep 14 2021, 11:14 AM
vlorentz added inline comments to D6248: docker: allow kafka to be consumed from the host.
Sep 14 2021, 11:13 AM
ardumont accepted D6241: migrate_extrinsic_metadata: Fix remaining pypi issues.
Sep 14 2021, 11:08 AM
vlorentz accepted D6246: Add a JournalClientOffsetRanges.unsubscribe() method.

thx

Sep 14 2021, 11:05 AM
ardumont accepted D6248: docker: allow kafka to be consumed from the host.
Sep 14 2021, 11:03 AM
ardumont closed T3538: Send scheduler metrics to prometheus, a subtask of T2345: Improve handling of recurrent loading tasks in scheduler, as Resolved.
Sep 14 2021, 11:00 AM · Sprint 2021 01, Archive coverage, Scheduling utilities
ardumont closed T3538: Send scheduler metrics to prometheus as Resolved.
Sep 14 2021, 11:00 AM · System administration, Monitoring, Scheduling utilities
ardumont moved T3538: Send scheduler metrics to prometheus from in-progress to deployed/landed/monitoring on the System administration board.
Sep 14 2021, 11:00 AM · System administration, Monitoring, Scheduling utilities
ardumont created P1160 CocoaPods/Specs runs.
Sep 14 2021, 10:53 AM
ardumont updated the task description for T3468: staging: current opam loading issues.
Sep 14 2021, 10:51 AM · System administration, Opam
anlambert closed D6239: package/utils: Add FTP protocol support to download function.
Sep 14 2021, 10:45 AM
anlambert committed rDLDBASEd5e54a5eea1e: package/utils: Add FTP protocol support to download function (authored by anlambert).
package/utils: Add FTP protocol support to download function
Sep 14 2021, 10:45 AM
swh-public-ci added a comment to D6239: package/utils: Add FTP protocol support to download function.

Build is green

Sep 14 2021, 10:43 AM
ardumont added a comment to T3468: staging: current opam loading issues.

P1158 and P1159 with some updated errors from the last run.

I'll update those tomorrow as it's still ongoing.

Sep 14 2021, 10:41 AM · System administration, Opam
anlambert updated the diff for D6239: package/utils: Add FTP protocol support to download function.

Rebase

Sep 14 2021, 10:41 AM
ardumont edited P1159 load-failing-on-unpacking.
Sep 14 2021, 10:36 AM
ardumont edited P1158 20210913-load-failing-on-404.txt.
Sep 14 2021, 10:34 AM
ardumont edited P1158 20210913-load-failing-on-404.txt.
Sep 14 2021, 10:31 AM
ardumont accepted D6239: package/utils: Add FTP protocol support to download function.
Sep 14 2021, 10:14 AM
rdicosmo moved T3536: Blog post Easter Eggs NLNet from Restricted Project Column to Restricted Project Column on the Unknown Object (Project) board.
Sep 14 2021, 9:29 AM · Unknown Object (Project)
rdicosmo moved T3535: Blog post Octobus NLNet from Restricted Project Column to Restricted Project Column on the Unknown Object (Project) board.
Sep 14 2021, 9:29 AM · Unknown Object (Project)
ardumont added inline comments to D6240: Use extids to filter out already seen revisions across hg origins.
Sep 14 2021, 9:12 AM
swh-public-ci added a comment to D6133: maven-lister: initialise lister..

Build is green

Sep 14 2021, 8:36 AM
borisbaldassari updated the diff for D6133: maven-lister: initialise lister..
  • maven-lister: Fix tests (review D6133)
Sep 14 2021, 8:33 AM
marla.dasilva moved T3535: Blog post Octobus NLNet from Restricted Project Column to Restricted Project Column on the Unknown Object (Project) board.
Sep 14 2021, 8:17 AM · Unknown Object (Project)

Sep 13 2021

ardumont added a comment to T3468: staging: current opam loading issues.

P1158 and P1159 with some updated errors from the last run.

Sep 13 2021, 6:44 PM · System administration, Opam
ardumont abandoned D6250: wip: Allow download to follow redirection to fetch more tarballs.
Sep 13 2021, 6:37 PM
ardumont added a comment to D6250: wip: Allow download to follow redirection to fetch more tarballs.

great, let's close this then.

Sep 13 2021, 6:36 PM
olasd added a comment to D6250: wip: Allow download to follow redirection to fetch more tarballs.
In D6250#161647, @olasd wrote:

requests.get already follows redirects by default. I believe that this boolean only applies to POST/PUT/DELETE requests.

Sep 13 2021, 6:22 PM
olasd added a comment to D6250: wip: Allow download to follow redirection to fetch more tarballs.

requests.get already follows redirects by default. I believe that this boolean only applies to POST/PUT/DELETE requests.

Sep 13 2021, 6:21 PM
ardumont requested review of D6250: wip: Allow download to follow redirection to fetch more tarballs.
Sep 13 2021, 6:20 PM
ardumont added a revision to T3468: staging: current opam loading issues: D6250: wip: Allow download to follow redirection to fetch more tarballs.
Sep 13 2021, 6:18 PM · System administration, Opam
ardumont added a comment to T3468: staging: current opam loading issues.

P1158 and P1159 with some updated errors from the last run.

Sep 13 2021, 6:17 PM · System administration, Opam
ardumont edited P1158 20210913-load-failing-on-404.txt.
Sep 13 2021, 6:15 PM
ardumont edited P1159 load-failing-on-unpacking.
Sep 13 2021, 6:14 PM
ardumont created P1159 load-failing-on-unpacking.
Sep 13 2021, 6:10 PM
ardumont edited P1158 20210913-load-failing-on-404.txt.
Sep 13 2021, 6:07 PM
ardumont created P1158 20210913-load-failing-on-404.txt.
Sep 13 2021, 5:59 PM
ardumont planned changes to D6249: Allow filtering extids per extid_version/extid_type when reading.

cassandra implementation for extid_get_from_target needs to be changed to actually allow filtering on both extid_type and extid_version.

Sep 13 2021, 5:46 PM
anlambert added a project to T3570: Upgrade python3-pkginfo debian package to latest upstream version: System administration.
Sep 13 2021, 5:27 PM · System administration, PyPI loader
ardumont requested review of D6249: Allow filtering extids per extid_version/extid_type when reading.
Sep 13 2021, 5:22 PM
douardda committed rDDATASET0425bdea0789: Fix a missing f-string prefix (authored by douardda).
Fix a missing f-string prefix
Sep 13 2021, 5:17 PM
ardumont added a revision to T3567: storage: Allow extid reading with filter on extid version: D6249: Allow filtering extids per extid_version/extid_type when reading.
Sep 13 2021, 5:15 PM · System administration, Mercurial loader
douardda updated the diff for D6248: docker: allow kafka to be consumed from the host.

Add a bit of documentation in the README file on how to consume kafka from the host

Sep 13 2021, 5:13 PM
douardda requested review of D6248: docker: allow kafka to be consumed from the host.
Sep 13 2021, 4:51 PM
anlambert closed D6245: package/pypi: Handle missing Version field in PKG-INFO file.
Sep 13 2021, 4:31 PM
anlambert committed rDLDBASE0efaf7a0ef96: package/pypi: Handle missing Version field in PKG-INFO file (authored by anlambert).
package/pypi: Handle missing Version field in PKG-INFO file
Sep 13 2021, 4:31 PM
anlambert triaged T3570: Upgrade python3-pkginfo debian package to latest upstream version as Normal priority.
Sep 13 2021, 4:29 PM · System administration, PyPI loader
douardda abandoned D6234: Add a --reset option to export_graph cli tool.

It's not worth the trouble, and there is a better solution (server-side)

Sep 13 2021, 4:23 PM
douardda added a comment to D6234: Add a --reset option to export_graph cli tool.

You could also add a command in swh-dataset's entrypoint.sh that calls whatever Kafka's script does

Sep 13 2021, 4:20 PM
vlorentz added a comment to D6234: Add a --reset option to export_graph cli tool.

You could also add a command in swh-dataset's entrypoint.sh that calls whatever Kafka's script does

Sep 13 2021, 4:18 PM
swh-public-ci added a comment to D6245: package/pypi: Handle missing Version field in PKG-INFO file.

Build is green

Sep 13 2021, 4:16 PM
moranegg accepted D6201: Add an overview of the metadata workflow.
Sep 13 2021, 4:15 PM
anlambert updated the diff for D6245: package/pypi: Handle missing Version field in PKG-INFO file.

Improve test assertion

Sep 13 2021, 4:13 PM
douardda added a comment to D6234: Add a --reset option to export_graph cli tool.

So either I kill this diff or it stays "intricate" with the setup of the consumer (so the whole journalprocessor.py)

Note: this feature is mainly useful for testing purposes IMHO, so I suppose it's not that critical to keep it; I just find it handy when "playing" with swh dataset export

Meh. How much easier does it make testing, compared to using Kafka's CLI (from the linked comment)?

Sep 13 2021, 4:11 PM
swh-public-ci added a comment to D6234: Add a --reset option to export_graph cli tool.

Build is green

Sep 13 2021, 4:07 PM
douardda updated the diff for D6234: Add a --reset option to export_graph cli tool.

rebase

Sep 13 2021, 4:05 PM
vlorentz accepted D6247: Commit kafka messages which offset has reach the high limit.
Sep 13 2021, 4:05 PM
douardda requested review of D6247: Commit kafka messages which offset has reach the high limit.
Sep 13 2021, 4:04 PM
douardda abandoned D6235: Commit kafka messages wich offset has reach the high limit.

in favor of D6247 because phab/arcanist won't let me update this later any more (sorry)

Sep 13 2021, 4:04 PM
douardda requested review of D6246: Add a JournalClientOffsetRanges.unsubscribe() method.
Sep 13 2021, 4:02 PM
vlorentz accepted D6245: package/pypi: Handle missing Version field in PKG-INFO file.
Sep 13 2021, 3:57 PM
douardda committed rDDATASET358d84938d01: Reduce the size of the progress bar (authored by douardda).
Reduce the size of the progress bar
Sep 13 2021, 3:33 PM
douardda closed D6233: Make sure the progress bar for the export reaches 100%.
Sep 13 2021, 3:33 PM
douardda committed rDDATASET47713ee38c94: Make sure the progress bar for the export reaches 100% (authored by douardda).
Make sure the progress bar for the export reaches 100%
Sep 13 2021, 3:33 PM
douardda committed rDDATASET2760e322af7c: Simplify the lo/high partition offset computation (authored by douardda).
Simplify the lo/high partition offset computation
Sep 13 2021, 3:33 PM
douardda committed rDDATASETd07b2a632256: Explicitly close the temporary kafka consumer in `get_offsets` (authored by douardda).
Explicitly close the temporary kafka consumer in `get_offsets`
Sep 13 2021, 3:33 PM
douardda closed D6232: Simplify the lo/high partition offset computation.
Sep 13 2021, 3:33 PM
douardda committed rDDATASETe47a3db1287b: Use proper signature for JournalClientOffsetRanges.process() (authored by douardda).
Use proper signature for JournalClientOffsetRanges.process()
Sep 13 2021, 3:33 PM