Mar 28 2019

zack removed a reviewer for D1295: prevent high memory usage: zack.
Mar 28 2019, 9:53 PM
zack requested changes to D1295: prevent high memory usage.

Oops, sorry, I didn't mean to accept this, only to remove myself from the reviewers.
I'll let @anlambert finish the actual review.

Mar 28 2019, 9:53 PM
D1295: prevent high memory usage is now accepted and ready to land.
Mar 28 2019, 9:52 PM

Mar 26 2019

zack updated subscribers of D1295: prevent high memory usage.
In D1295#27649, @zack wrote:

Or, actually, we could simply add a full-text index on URLs and be done with it: https://www.postgresql.org/docs/11/textsearch-intro.html#TEXTSEARCH-MATCHING

Mar 26 2019, 9:03 PM
zack added a comment to D1295: prevent high memory usage.
In D1295#27648, @zack wrote:

@anlambert given we have a trigram index on origin URLs, have you ever tried using the various similarity operators documented at https://www.postgresql.org/docs/11/pgtrgm.html instead of generating all possible permutations for regexes?
I'm assuming (probably too naively) that you can just do a big select on the URLs, sorting by similarity and possibly filtering on a threshold to return meaningful results. But it's not like I've actually tested it…

Mar 26 2019, 8:44 PM
zack added a comment to D1295: prevent high memory usage.

@anlambert given we have a trigram index on origin URLs, have you ever tried using the various similarity operators documented at https://www.postgresql.org/docs/11/pgtrgm.html instead of generating all possible permutations for regexes?
I'm assuming (probably too naively) that you can just do a big select on the URLs, sorting by similarity and possibly filtering on a threshold to return meaningful results. But it's not like I've actually tested it…
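The idea above (score every URL by trigram similarity, drop anything below a threshold, sort the rest) can be sketched in plain Python. This is only an illustration of the technique, not pg_trgm itself and not code from D1295: the padding rule imitates what pg_trgm documents, the 0.3 default mirrors pg_trgm's default similarity threshold, and `search_origins` is a hypothetical helper name.

```python
import re


def trigrams(text):
    """Return the set of trigrams in text, padding each word pg_trgm-style."""
    grams = set()
    # Lowercase, then treat every run of letters/digits as one word.
    for word in re.findall(r"[a-z0-9]+", text.lower()):
        # Two spaces in front and one behind, so word boundaries count too.
        padded = "  " + word + " "
        grams.update(padded[i:i + 3] for i in range(len(padded) - 2))
    return grams


def similarity(a, b):
    """Shared trigrams divided by all distinct trigrams (0.0 to 1.0)."""
    ta, tb = trigrams(a), trigrams(b)
    if not ta or not tb:
        return 0.0
    return len(ta & tb) / len(ta | tb)


def search_origins(urls, query, threshold=0.3):
    """Hypothetical helper: URLs scoring at least threshold, best match first."""
    scored = [(similarity(url, query), url) for url in urls]
    hits = [pair for pair in scored if pair[0] >= threshold]
    hits.sort(key=lambda pair: pair[0], reverse=True)
    return [url for _, url in hits]
```

One thing the sketch makes visible: a short query compared against a whole URL scores low, because the URL contributes many trigrams the query lacks, so the threshold would need tuning before results look meaningful.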

Mar 26 2019, 8:38 PM
zack requested changes to D1295: prevent high memory usage.
Mar 26 2019, 7:50 PM

Mar 25 2019

zack triaged T1602: Analyze kafka storage requirements as Normal priority.
Mar 25 2019, 3:07 PM · Sprint 2019 03
zack triaged T1599: Analyze objstorage's Azure updateness as Normal priority.
Mar 25 2019, 3:07 PM · Sprint 2019 03
zack triaged T1603: kafka storage backfiller as Normal priority.
Mar 25 2019, 3:07 PM · Journal, Sprint 2019 03
zack triaged T1601: Journal client of swh-storage mirrors as Normal priority.
Mar 25 2019, 3:07 PM · Sprint 2019 03
zack triaged T1600: Write a storage backend that writes to kafka as Normal priority.
Mar 25 2019, 3:07 PM · Sprint 2019 03
zack triaged T1604: Improve kafka deployment as Normal priority.
Mar 25 2019, 3:07 PM · System administration, Sprint 2019 03
zack added a reviewer for D1286: swh-monthly-report: helper script to draft monthly activity team reports: Reviewers.
Mar 25 2019, 10:29 AM
zack added a reviewer for D1283: swh-weekly-report: new helper to write weekly reports: douardda.
Mar 25 2019, 10:29 AM
zack updated the diff for D1283: swh-weekly-report: new helper to write weekly reports.
  • swh-monthly-report: helper script to draft monthly activity team reports
  • swh-monthly-report: filter on committer date
Mar 25 2019, 10:28 AM
zack updated the diff for D1283: swh-weekly-report: new helper to write weekly reports.
  • swh-weekly-report: filter on committer date
Mar 25 2019, 10:28 AM

Mar 24 2019

zack added a comment to T808: phabricator lister.

Sure, just go ahead: there is no need to "reserve" tasks as a prerequisite to work on them. Just submit a diff against the lister repo when you have something ready to review :-)

Mar 24 2019, 1:09 PM · Easy hack, Phabricator forge

Mar 22 2019

zack created D1286: swh-monthly-report: helper script to draft monthly activity team reports.
Mar 22 2019, 4:05 PM
zack updated the diff for D1283: swh-weekly-report: new helper to write weekly reports.
  • swhphab.py: do not crash when printing summary of repo-less diffs
  • swhphab.py: include status when printing task summaries
Mar 22 2019, 3:57 PM
zack updated the diff for D1283: swh-weekly-report: new helper to write weekly reports.
  • swh-weekly-report: further refactoring/clean-up against swhphab.py
Mar 22 2019, 10:33 AM
zack updated the diff for D1283: swh-weekly-report: new helper to write weekly reports.
  • swh-weekly-report: split generic code to swhphab.py
Mar 22 2019, 10:23 AM

Mar 20 2019

Herald added a reviewer for D1283: swh-weekly-report: new helper to write weekly reports: Reviewers.
Mar 20 2019, 11:15 PM
zack created P374 (An Untitled Masterwork).
Mar 20 2019, 11:07 PM
zack reassigned T1549: Clean up entries in {origin_intrinsic,revision}_metadata with no metadata from zack to vlorentz.
Mar 20 2019, 12:10 PM · Archive content, Indexer

Mar 18 2019

zack lowered the priority of T1590: lister: Update readme instructions to be able to run listers after a git clone from Unbreak Now! to Normal.
Mar 18 2019, 8:54 PM · GitHub lister

Mar 17 2019

zack added a comment to T1589: support RFC 7089 Memento headers.

here's an old mail of mine to -devel with additional context:

Mar 17 2019, 5:37 PM · Web app
zack triaged T1589: support RFC 7089 Memento headers as Wishlist priority.
Mar 17 2019, 5:35 PM · Web app

Mar 15 2019

zack added a comment to T1549: Clean up entries in {origin_intrinsic,revision}_metadata with no metadata.

@vlorentz: lather, rinse, repeat.

softwareheritage-indexer=# DELETE FROM revision_intrinsic_metadata
WHERE metadata = '{"@context": "https://doi.org/10.5063/schema/codemeta-2.0"}'::jsonb ;
ERROR:  deadlock detected
DETAIL:  Process 23900 waits for ShareLock on transaction 212164862; blocked by process 20175.
Process 20175 waits for ShareLock on transaction 212164381; blocked by process 23900.
HINT:  See server log for query details.
CONTEXT:  while deleting tuple (772424,55) in relation "revision_intrinsic_metadata"
Time: 33048,828 ms (00:33,049)

(just happened, after indexers have been restarted including D1218)

Mar 15 2019, 9:25 PM · Archive content, Indexer
nahimilega awarded T735: SourceForge lister a Like token.
Mar 15 2019, 7:34 PM · Origin-SourceForge
zack added a comment to D1248: Add support for keywords in PKG-INFO..

This function outputs JSON-LD arrays, which are unordered.

I don't think it's useful to deduplicate, as these keywords are written by a human, so duplicates would be intentional.

Mar 15 2019, 12:32 PM

Mar 12 2019

zack closed T565: embrace repository snapshot object in the data model (meta task) as Resolved.

Unless I'm missing something, this was completed a while ago (if not, please reopen, ideally adding the relevant open sub-task).

Mar 12 2019, 10:10 AM · General
zack closed T565: embrace repository snapshot object in the data model (meta task), a subtask of T887: Vault: "snapshot" cooker, as Resolved.
Mar 12 2019, 10:10 AM · Vault
zack closed T565: embrace repository snapshot object in the data model (meta task), a subtask of T531: Vault cookers, as Resolved.
Mar 12 2019, 10:10 AM · Vault

Mar 11 2019

zack renamed T1576: document the typical cost(s) of hosting an archive mirror from document the typical cost(s) of hosting a mirror to document the typical cost(s) of hosting an archive mirror.
Mar 11 2019, 6:12 PM · Documentation, Mirror
zack triaged T1576: document the typical cost(s) of hosting an archive mirror as Normal priority.
Mar 11 2019, 6:10 PM · Documentation, Mirror
zack renamed Mirror from Mirror tooling to Mirror.
Mar 11 2019, 6:07 PM
zack created Mirror.
Mar 11 2019, 6:06 PM
zack added a comment to T1349: Storage.content_find should return all matches, not just one..

Contact information is available on our GSoC wiki page (which is in turn linked from the GSoC portal).

Mar 11 2019, 5:57 PM · Easy hack, Storage manager

Mar 9 2019

zack added a comment to T1349: Storage.content_find should return all matches, not just one..
In T1349#29267, @Sowmya wrote:

can I get this task assigned by the administrator?

Mar 9 2019, 7:47 AM · Easy hack, Storage manager

Mar 8 2019

zack accepted D1221: readme: integrate the docker-based development setup from the main's doc.
Mar 8 2019, 12:13 PM
zack added inline comments to D1221: readme: integrate the docker-based development setup from the main's doc.
Mar 8 2019, 11:20 AM
zack accepted D1220: Refactor the getting started guides.
Mar 8 2019, 11:13 AM

Mar 6 2019

zack requested changes to D1221: readme: integrate the docker-based development setup from the main's doc.

Thanks for this doc refactoring too!

Mar 6 2019, 11:06 AM
zack requested changes to D1220: Refactor the getting started guides.

Great, thanks for this doc refactoring!

Mar 6 2019, 10:46 AM

Mar 4 2019

zack reopened T1564: drop out-of-date translations of the jobs page as "Open".

the job offer is now gone completely from the English page :-(

Mar 4 2019, 7:07 PM · Website
zack triaged T1564: drop out-of-date translations of the jobs page as High priority.
Mar 4 2019, 6:05 PM · Website
zack added a comment to T1549: Clean up entries in {origin_intrinsic,revision}_metadata with no metadata.

D1218

Once this is landed and deployed, ordering your DELETEs by revision_metadata.id will acquire locks in the same order as the idx_storage, solving the deadlock issue.
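Why a shared ordering removes the deadlock can be shown with a small Python analogy. It is a sketch under stated assumptions, not the D1218 change: `threading.Lock` objects stand in for PostgreSQL row locks, and `update_rows` and `run_concurrently` are hypothetical names. Both workers want the same rows but list them in different orders; since each sorts the ids before locking, neither can hold a lock the other needs while waiting for a lower-numbered one.

```python
import threading

# Hypothetical stand-in for row-level locks: one lock per row id.
row_locks = {row_id: threading.Lock() for row_id in range(5)}


def update_rows(row_ids, label, log):
    """Lock every requested row in ascending id order, work, then release."""
    ordered = sorted(row_ids)
    for row_id in ordered:
        row_locks[row_id].acquire()
    try:
        # The "work": record which rows this worker held at the same time.
        log.append((label, ordered))
    finally:
        for row_id in reversed(ordered):
            row_locks[row_id].release()


def run_concurrently():
    """Run two workers that request the same rows in conflicting orders."""
    log = []
    indexer = threading.Thread(target=update_rows, args=([3, 1, 4], "indexer", log))
    cleanup = threading.Thread(target=update_rows, args=([4, 3, 1], "cleanup", log))
    indexer.start()
    cleanup.start()
    indexer.join()
    cleanup.join()
    return sorted(label for label, _ in log)
```

Without the `sorted` call, one worker could take row 3 and wait for row 4 while the other holds row 4 and waits for row 3, which is the cycle the `deadlock detected` errors in this task describe.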

Mar 4 2019, 3:11 PM · Archive content, Indexer
zack accepted D1218: Prevent deadlocks by always updating items in the same order..
Mar 4 2019, 3:10 PM
zack committed rCDFDf7f358414f6d: objstorage conf: increase max payload size (authored by zack).
objstorage conf: increase max payload size
Mar 4 2019, 2:13 PM

Mar 2 2019

zack triaged T1562: ingest Caml Light/Heavy as Low priority.
Mar 2 2019, 6:01 PM · Archive coverage
zack added a comment to T1549: Clean up entries in {origin_intrinsic,revision}_metadata with no metadata.

The update completed, but a first attempt at the second DELETE failed with a deadlock (?!):

softwareheritage-indexer=# DELETE FROM revision_metadata
WHERE translated_metadata = '{"@context": "https://doi.org/10.5063/schema/codemeta-2.0"}'::jsonb ;
ERROR:  deadlock detected
DETAIL:  Process 10966 waits for ShareLock on transaction 197265813; blocked by process 11754.
Process 11754 waits for ShareLock on transaction 197264487; blocked by process 10966.
HINT:  See server log for query details.
CONTEXT:  while deleting tuple (1380733,15) in relation "revision_metadata"
Time: 170864,091 ms (02:50,864)
Mar 2 2019, 1:18 PM · Archive content, Indexer
zack added a comment to T1549: Clean up entries in {origin_intrinsic,revision}_metadata with no metadata.

The following fix for the above (suggested by @vlorentz ) is now running:

UPDATE revision_metadata
SET translated_metadata = origin_intrinsic_metadata.metadata
FROM origin_intrinsic_metadata
WHERE revision_metadata.id = origin_intrinsic_metadata.from_revision
  AND revision_metadata.translated_metadata = '{"@context": "https://doi.org/10.5063/schema/codemeta-2.0"}'
  AND origin_intrinsic_metadata.metadata != '{"@context": "https://doi.org/10.5063/schema/codemeta-2.0"}';
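For readers less used to UPDATE ... FROM, here is roughly what the query above does, sketched in Python over in-memory data. The shapes are assumptions made for the illustration (a dict keyed by revision id, and a list of origin rows), and `backfill` is a hypothetical name; only the column names and the bare-context value come from the query.

```python
# The "no metadata" value used in the query: a bare CodeMeta context.
EMPTY = {"@context": "https://doi.org/10.5063/schema/codemeta-2.0"}


def backfill(revision_metadata, origin_intrinsic_metadata):
    """Copy origin metadata onto revisions whose own metadata is empty.

    revision_metadata: dict mapping revision id -> translated_metadata
    origin_intrinsic_metadata: list of dicts with "from_revision" and "metadata"
    Returns the number of revisions that were updated.
    """
    updated = 0
    for origin in origin_intrinsic_metadata:
        rev_id = origin["from_revision"]
        # Same conditions as the WHERE clause: the revision entry is empty
        # and the origin entry is not.
        if revision_metadata.get(rev_id) == EMPTY and origin["metadata"] != EMPTY:
            revision_metadata[rev_id] = origin["metadata"]
            updated += 1
    return updated
```

If several origin rows point at the same revision, this sketch keeps the first non-empty one; PostgreSQL makes no promise about which matching row an UPDATE ... FROM uses.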
Mar 2 2019, 9:53 AM · Archive content, Indexer
zack created P364 (An Untitled Masterwork).
Mar 2 2019, 9:50 AM

Mar 1 2019

zack added a comment to T1549: Clean up entries in {origin_intrinsic,revision}_metadata with no metadata.

As discussed on IRC, even after cleaning up origin_intrinsic_metadata, the DELETE on revision_metadata fails with:

softwareheritage-indexer=# DELETE FROM revision_metadata
WHERE translated_metadata = '{"@context": "https://doi.org/10.5063/schema/codemeta-2.0"}'::jsonb ;
Mar 1 2019, 4:49 PM · Archive content, Indexer
zack claimed T1549: Clean up entries in {origin_intrinsic,revision}_metadata with no metadata.
Mar 1 2019, 3:12 PM · Archive content, Indexer
zack changed the status of T1549: Clean up entries in {origin_intrinsic,revision}_metadata with no metadata from Open to Work in Progress.

I've started the first of the following queries on somerset (in a screen session under my user):

DELETE FROM origin_intrinsic_metadata
WHERE metadata = '{"@context": "https://doi.org/10.5063/schema/codemeta-2.0"}'::jsonb ;
Mar 1 2019, 2:51 PM · Archive content, Indexer
zack changed the visibility for P363 Masterwork From Distant Lands.
Mar 1 2019, 2:32 PM
zack updated the title for P363 Masterwork From Distant Lands from untitled to Masterwork From Distant Lands.
Mar 1 2019, 2:32 PM
zack accepted D1211: Fix warnings raised by sphinx on swh-core..
Mar 1 2019, 10:54 AM
zack accepted D1212: Fix syntax errors in the documentation..
Mar 1 2019, 10:53 AM
zack accepted D1206: Prevent origin metadata indexer from writing empty records.

LGTM, please just add a comment on the test case (as discussed in the review) before landing

Mar 1 2019, 10:51 AM
zack accepted D1205: Add API to delete metadata entries..

LGTM

Mar 1 2019, 10:46 AM
zack lowered the priority of T1544: archive graphs stopped being updated a while ago from Unbreak Now! to High.
Mar 1 2019, 8:45 AM · Website, Web app

Feb 28 2019

zack added a comment to T1512: Add Web Labels to state the license of the JavaScript we distribute.

great, thanks!

Feb 28 2019, 6:40 PM · Web app
zack added a comment to T1512: Add Web Labels to state the license of the JavaScript we distribute.

@anlambert: did you follow up on the list and/or with @singpolyma about the solution you've adopted?

Feb 28 2019, 6:24 PM · Web app

Feb 26 2019

zack added a comment to T1549: Clean up entries in {origin_intrinsic,revision}_metadata with no metadata.

I don't understand your comment. What are the remaining arguments for using NULL instead of just deleting rows?

Feb 26 2019, 6:42 PM · Archive content, Indexer
zack added a comment to T1549: Clean up entries in {origin_intrinsic,revision}_metadata with no metadata.

Yes :-)
so, do we agree that the right fix for this task is just to get rid of empty-ish rows? or are there other arguments that we haven't considered yet?

Feb 26 2019, 5:41 PM · Archive content, Indexer
zack added a comment to T1549: Clean up entries in {origin_intrinsic,revision}_metadata with no metadata.

What is the provenance map?

Feb 26 2019, 3:12 PM · Archive content, Indexer
zack added a project to T1549: Clean up entries in {origin_intrinsic,revision}_metadata with no metadata: Archive content.

My tentative proposal is to delete all table entries for which no metadata has been found.
The invariant will be: if an origin/revision has metadata, there will be an entry in the table(s); if not, the origin/revision will not appear.
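The proposed invariant can be stated as a small predicate. The Python below is a sketch, assuming that "no metadata" means either nothing at all or a record holding only the bare CodeMeta context (the value targeted by the DELETE statements elsewhere in this task); `has_metadata` and `rows_to_keep` are hypothetical names.

```python
# A record holding only the CodeMeta context carries no actual metadata.
EMPTY_METADATA = {"@context": "https://doi.org/10.5063/schema/codemeta-2.0"}


def has_metadata(metadata):
    """True if the record holds anything beyond the bare context."""
    return bool(metadata) and metadata != EMPTY_METADATA


def rows_to_keep(rows):
    """Keep only the origins/revisions for which metadata was found.

    rows: dict mapping an origin or revision id to its metadata dict.
    """
    return {key: md for key, md in rows.items() if has_metadata(md)}
```

After such a cleanup, membership in the table answers "does this origin/revision have metadata?" without inspecting the stored value.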

Feb 26 2019, 12:53 PM · Archive content, Indexer
zack added a comment to T1545: Add a CLI tool to list all fields that may be outputted by metadata_dictionary..

(but of course if you want to have a CLI tool to generate the info, sure; I just wanted to highlight here that the end goal is the doc)

Feb 26 2019, 11:02 AM · Metadata workflow, Indexer
zack added a comment to T1545: Add a CLI tool to list all fields that may be outputted by metadata_dictionary..

More than a CLI tool, I'd like to have documentation about how to use the CodeMeta metadata that we extract, sort of "typing information" for the content of the various intrinsic metadata tables.
It might be something as simple as:

Feb 26 2019, 11:01 AM · Metadata workflow, Indexer
zack updated subscribers of D1193: Add an helper function to list all origins in the storage..

@anlambert recently added a list origins method to the Web API. I'm pinging him here to make sure there is no overlap and/or that there is code to be reused/refactored related to this proposed change.

Feb 26 2019, 10:58 AM

Feb 25 2019

zack renamed T1544: archive graphs stopped being updated a while ago from archive counters stopped being updated a while ago to archive graphs stopped being updated a while ago.
Feb 25 2019, 7:45 PM · Website, Web app
zack triaged T1544: archive graphs stopped being updated a while ago as Unbreak Now! priority.
Feb 25 2019, 5:35 PM · Website, Web app
zack added a member for Staff: marla.dasilva.
Feb 25 2019, 1:39 PM
zack removed a member for Staff: fiendish.
Feb 25 2019, 1:39 PM
zack removed a member for Staff: mollydb.
Feb 25 2019, 1:38 PM
zack added a comment to T1523: Search tools on metadata.

That's the impression I got from testing. Either way, the current UI & semantics are bad, the proposed ones would be much better.

Feb 25 2019, 9:57 AM · meta-task, Restricted Project, Metadata workflow

Feb 23 2019

zack added a comment to T1523: Search tools on metadata.

I've added an item to the above list (metadata-only search); I think the ideal UI would be a single form with two checkboxes under it, one enabling URL-based search (enabled by default), one enabling metadata-based search (disabled by default).

Feb 23 2019, 2:15 PM · meta-task, Restricted Project, Metadata workflow
zack updated the task description for T1523: Search tools on metadata.
Feb 23 2019, 2:13 PM · meta-task, Restricted Project, Metadata workflow
zack committed R65:2bed324bed30: simplify user-message for revision ordering choice (authored by zack).
simplify user-message for revision ordering choice
Feb 23 2019, 1:57 AM
zack committed R65:52afd71c010a: spelling fixes in comments (authored by zack).
spelling fixes in comments
Feb 23 2019, 1:57 AM
zack committed R65:23e71a657a06: docs: add title and brief module description (authored by zack).
docs: add title and brief module description
Feb 23 2019, 1:57 AM
zack committed R65:b5e0537aced4: dev doc: fix dangling ref to webapp/webapp.yml (authored by zack).
dev doc: fix dangling ref to webapp/webapp.yml
Feb 23 2019, 1:57 AM
zack committed R65:f738b47528b3: minor improvements to the titles of main web views (authored by zack).
minor improvements to the titles of main web views
Feb 23 2019, 1:57 AM
zack committed R65:2c151c5ad56b: fix a bunch of typos (authored by zack).
fix a bunch of typos
Feb 23 2019, 1:57 AM
zack committed R65:60af24b871cc: git ignore npm cruft (authored by zack).
git ignore npm cruft
Feb 23 2019, 1:57 AM
zack committed R65:2ed370ca0e35: fix typo in docstrings/comments (tnx codespell) (authored by zack).
fix typo in docstrings/comments (tnx codespell)
Feb 23 2019, 1:56 AM
zack committed R65:09d740f9bdef: docs: add absolute anchor to documentation index (authored by zack).
docs: add absolute anchor to documentation index
Feb 23 2019, 1:56 AM
zack committed R65:a76713a6eef1: docs: enable httpdomain sphinx extension (authored by zack).
docs: enable httpdomain sphinx extension
Feb 23 2019, 1:56 AM
zack committed R65:00de27cc80d3: api.service: fix docstring to avoid bogus cross-ref to "default" (authored by zack).
api.service: fix docstring to avoid bogus cross-ref to "default"
Feb 23 2019, 1:56 AM
zack committed R65:8aaea2f5d134: sanitize docstrings for sphinx (authored by zack).
sanitize docstrings for sphinx
Feb 23 2019, 1:56 AM
zack committed R65:8f1188f45cbf: docs/: add sphinx apidoc generation skeleton (authored by zack).
docs/: add sphinx apidoc generation skeleton
Feb 23 2019, 1:56 AM
zack committed R65:4dd525745214: API doc: add warning about API instability (authored by zack).
API doc: add warning about API instability
Feb 23 2019, 1:56 AM
zack added a reverting change for R65:8b283648aa4c: css: make footer colors reasonable: R65:8023e562c1d7: API doc: make footer link font colors be uniform.
Feb 23 2019, 1:56 AM
zack committed R65:8023e562c1d7: API doc: make footer link font colors be uniform (authored by zack).
API doc: make footer link font colors be uniform
Feb 23 2019, 1:56 AM
zack committed R65:e77f375133a1: API doc: add usual copyright/license/contact footer (authored by zack).
API doc: add usual copyright/license/contact footer
Feb 23 2019, 1:56 AM
zack committed R65:c2e9f5c5ed23: API doc CSS: remove inline-block hack, fixing non-clickable TOC items (authored by zack).
API doc CSS: remove inline-block hack, fixing non-clickable TOC items
Feb 23 2019, 1:56 AM
zack committed R65:a83afcac339e: API doc: vertically distantiate jquery search box and preceding text (authored by zack).
API doc: vertically distantiate jquery search box and preceding text
Feb 23 2019, 1:56 AM