bump dep on swh.web.client to have recent SWHID-class changes

LGTM too. I've also tried it locally, and it worked just fine, both interactively and when running the test suite.

migrate from SWHID to new CoreSWHID class

developer setup: uniform rendering of pip install (minor)

typing: make recent mypy (0.812) pass again

developer setup: simplify package installation by delegating to bin/install

bin/pip-swh-packages: remove [extras] markers before topo-sorting

bin/install: install packages one by one and enforce prerequisites

docs: add link to screencast and ICSE 2021 paper

docs: link to official SANER 2020 paper in the proceedings

@KShivendu it sounds like you haven't installed the various swh-* modules with pip install -e. Did you follow the developer setup tutorial? The steps in there should take care of that.

In D5107#128987, @vlorentz wrote:

@zack added to D5081: https://forge.softwareheritage.org/D5081?vs=18237&id=18239#toc

if that's fine with you, I'll do the same here

I've only added a nitpicky comment about docstrings.

Thanks for this summary/status, very useful. Regarding goals, I think we also want a read goal for time to first byte, a performance metric that is particularly bad in the current filesystem-based object storage. Not sure what would be a reasonable goal though. Poke @olasd: any idea about a good target for this?

I'm not approving it (and only resigning as reviewer instead) just to allow others to have a look, given it's something quite foundational in the data model.

followed sequence of:

Size of SHA256, SWHID, Content | SHA256 | SWHID | Content

The object storage is a collection of RBD images containing a sequence of objects (SHA256 + SWHID + content).
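A minimal sketch of the layout described above, assuming each object is stored as a header holding the three sizes, followed by the SHA256, the SWHID, and the content. The header encoding (three big-endian unsigned 64-bit integers) and the names `pack_object` and `unpack_objects` are illustrative assumptions, not the actual on-disk format, and the RBD layer is not modelled at all (the "image" here is just a bytes object).

```python
import hashlib
import struct

# Assumed header: three big-endian unsigned 64-bit sizes
# (SHA256 size, SWHID size, content size). Hypothetical encoding.
HEADER = struct.Struct(">QQQ")


def pack_object(swhid: str, content: bytes) -> bytes:
    """Serialize one object as: sizes header, SHA256, SWHID, content."""
    sha256 = hashlib.sha256(content).digest()
    swhid_bytes = swhid.encode("ascii")
    header = HEADER.pack(len(sha256), len(swhid_bytes), len(content))
    return header + sha256 + swhid_bytes + content


def unpack_objects(image: bytes):
    """Yield (sha256, swhid, content) tuples from a sequence of packed objects."""
    offset = 0
    while offset < len(image):
        sha_len, swhid_len, content_len = HEADER.unpack_from(image, offset)
        offset += HEADER.size
        sha256 = image[offset:offset + sha_len]
        offset += sha_len
        swhid = image[offset:offset + swhid_len].decode("ascii")
        offset += swhid_len
        content = image[offset:offset + content_len]
        offset += content_len
        yield sha256, swhid, content
```

Reading an image back is then a single sequential scan; any lookup by SHA256 or SWHID would need a separate index, which this sketch does not cover.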

(I've finally caught up with the backlog in this task, sorry I'm late to the party.)

I'm requesting changes as the idea of hexlifying URLs behind the "swh:1:ori:" prefix is still being discussed (on the swh-devel list) and, for now, I'm against that specific part of the proposal.

In T2912#58062, @rdicosmo wrote:

Thanks @vsellier, that seems quite ok indeed. The only question left is to know if the estimator implemented is monotonic (i.e. we will never have negative bumps in the graph :-))
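For illustration only, the monotonicity property asked about above can be checked on a series of recorded estimates with a few lines of Python. The function name `is_monotonic` and the idea of feeding it a list of sampled counter values are assumptions made for this sketch; it says nothing about how the estimator itself is implemented.

```python
from typing import Iterable


def is_monotonic(samples: Iterable[int]) -> bool:
    """Return True if the samples never decrease (no negative bumps)."""
    previous = None
    for value in samples:
        if previous is not None and value < previous:
            return False
        previous = value
    return True
```

A plateau (two equal consecutive values) is accepted here; only a strict decrease counts as a negative bump.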

In T3008#58039, @olasd wrote:

I've written a basic documentation page: https://wiki.softwareheritage.org/wiki/HedgeDoc

@zack, do you see anything else you would like to have documented?

sql/blob-size-stats: scripts to compute basic stats about archive blob sizes

for context: this change is only going to be committed in a separate branch that we're using for experiments with swh-scanner, hence the testing requirements are different/lower than for master

ah, yes, of course, sorry I didn't notice this either in the previous diff :-)

Why do we still have tests that check error messages meant for human consumption, again? :-)
If the difference in how they fail is meaningful (which I don't know), they should raise different exceptions and we should just check their types.
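A small sketch of the suggestion above: distinct failure modes raise distinct exception types, and the tests assert on the type rather than on the human-readable message. The names `MissingFieldError`, `InvalidTypeError` and `validate` are hypothetical and not taken from the code under review.

```python
import unittest


class MissingFieldError(ValueError):
    """Raised when a required field is absent (hypothetical example type)."""


class InvalidTypeError(ValueError):
    """Raised when a field has the wrong type (hypothetical example type)."""


def validate(record: dict) -> None:
    """Validate a record; each failure mode has its own exception type."""
    if "id" not in record:
        raise MissingFieldError("record has no 'id' field")
    if not isinstance(record["id"], str):
        raise InvalidTypeError(
            f"'id' must be a string, got {type(record['id']).__name__}"
        )


class ValidateTest(unittest.TestCase):
    # The tests only check exception types, so the messages can be
    # reworded freely without breaking the test suite.
    def test_missing_field(self):
        with self.assertRaises(MissingFieldError):
            validate({})

    def test_invalid_type(self):
        with self.assertRaises(InvalidTypeError):
            validate({"id": 42})
```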

check-in slides for Boa workshop

looks great, thanks!

CONTRIBUTORS: add Loïc Dachary

bin/update fails unless pre-commit is installed

Thanks for the bug report. Do you have a log of what happens upon bin/update if pre-commit is missing? (It sure is needed for committing, but I'm surprised it's needed for bin/update.)
Also, is it needed for a raw "bin/update" or only for "bin/update -i" (which also installs Python dependencies)?

oh, also, this diff should close T2647 in its commit message

note that this diff should close T2595, so the commit message should be updated

My answer is a very emphatic YES.

In T2912#55860, @rdicosmo wrote:

In T2912#55849, @olasd wrote:

I think we should be able to decouple these counters completely from the loaders, and have them directly updated/handled by a client of the swh-journal. This would be a "centralized" component, but which we can parallelize quite heavily thanks to basic kafka design. We can also leverage the way kafka clients do parallelism to sidestep the locking issues arising in a potentially distributed filter.

Maybe my writing was not all that clear: I also had in mind a single centralised component (the ArchiveCounter) per Bloom filter, receiving the lists newcontents of ids from the loaders.
Getting the feed of ids from swh-journal instead of from the loaders is really neat: we avoid touching the loader code, and we gain a better capability of monitoring the load on the ArchiveCounter, so I'm all for it :-)
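A rough sketch of the idea discussed above: a single counter component that owns one Bloom filter and only increments when an incoming id was (probably) not seen before. Everything here is an assumption made for illustration: the `BloomFilter` class, its sizing, the BLAKE2b-based hashing, and the `process` method are hypothetical, and no swh-journal or Kafka API is used (ids are passed in as plain iterables of bytes).

```python
import hashlib
from typing import Iterable, Iterator


class BloomFilter:
    """Tiny Bloom filter; size and hash count are arbitrary example values."""

    def __init__(self, size_bits: int = 1 << 20, num_hashes: int = 5) -> None:
        self.size_bits = size_bits
        self.num_hashes = num_hashes
        self.bits = bytearray(size_bits // 8 + 1)

    def _positions(self, item: bytes) -> Iterator[int]:
        # Derive independent-looking hashes by salting BLAKE2b with the index.
        for i in range(self.num_hashes):
            digest = hashlib.blake2b(item, salt=i.to_bytes(8, "big")).digest()
            yield int.from_bytes(digest[:8], "big") % self.size_bits

    def add(self, item: bytes) -> bool:
        """Insert item; return True if it was (probably) not seen before."""
        is_new = False
        for pos in self._positions(item):
            byte, bit = divmod(pos, 8)
            if not self.bits[byte] & (1 << bit):
                is_new = True
                self.bits[byte] |= 1 << bit
        return is_new


class ArchiveCounter:
    """Counts distinct ids; in practice it would be fed by a journal client."""

    def __init__(self) -> None:
        self.filter = BloomFilter()
        self.count = 0

    def process(self, ids: Iterable[bytes]) -> None:
        for object_id in ids:
            if self.filter.add(object_id):
                self.count += 1
```

With this design the count can only grow, since ids are never removed from the filter. Bloom filter false positives make it a slight underestimate rather than an overestimate.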

Activity dates: Jan 5 2021 to Mar 5 2021