Software Heritage

Dec 11 2020

douardda closed D4690: Encode datetime as Timestamp in msgpack.
Dec 11 2020, 10:09 AM
douardda updated the diff for D4690: Encode datetime as Timestamp in msgpack.

split the commit into 3 parts

Dec 11 2020, 9:40 AM

Dec 9 2020

douardda retitled D4698: Add support for ExtID in the storage from [WIP][RFC] Add hg revisions to the test data set to [WIP][RFC] Add a revision_id_from_vcs() method to the Storage.
Dec 9 2020, 5:17 PM
douardda updated the summary of D4698: Add support for ExtID in the storage.
Dec 9 2020, 5:14 PM
douardda added a revision to T2849: Design and implement a mapping from "original VCS ids" to SWHIDs to help incremental loaders: D4698: Add support for ExtID in the storage.
Dec 9 2020, 5:13 PM · Storage manager
douardda created D4698: Add support for ExtID in the storage.
Dec 9 2020, 5:13 PM
douardda updated the diff for D4633: Add a documentation/specification of the journal messages formats.

fix and improve the long int encoding examples

Dec 9 2020, 11:07 AM
douardda added a comment to T2849: Design and implement a mapping from "original VCS ids" to SWHIDs to help incremental loaders.

Question: who should be responsible for filling this table? The loader, or the storage (as a side effect of revision_add)?

Dec 9 2020, 9:21 AM · Storage manager

Dec 8 2020

douardda created D4690: Encode datetime as Timestamp in msgpack.
Dec 8 2020, 5:13 PM
douardda committed rDJNLd8cd3f9bbeb8: Fix dependency on msgpack (>= 1.0.0) (authored by douardda).
Fix dependency on msgpack (>= 1.0.0)
Dec 8 2020, 5:09 PM
douardda added a comment to T2849: Design and implement a mapping from "original VCS ids" to SWHIDs to help incremental loaders.

Ok, so the plan for a first step is to keep it as simple as possible: implement what @olasd proposed in the task, put this table in the storage, and provide a simple batch get API endpoint.

Dec 8 2020, 3:23 PM · Storage manager
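A minimal in-memory sketch of the plan above: a mapping table from original VCS ids to SWHIDs, plus a batch get. All names here (`ExtIDMapping`, `InMemoryExtIDStore`, `get_from_extid`, the `"hg-nodeid"` type label) are hypothetical illustrations, not the swh-storage API.

```python
from dataclasses import dataclass
from typing import Dict, Iterable, List, Tuple


@dataclass(frozen=True)
class ExtIDMapping:
    extid_type: str  # kind of original id, e.g. "hg-nodeid" (hypothetical label)
    extid: bytes     # the id as known by the original VCS
    target: str      # SWHID of the corresponding archived object


class InMemoryExtIDStore:
    """Hypothetical store keyed by (extid_type, extid)."""

    def __init__(self) -> None:
        self._by_extid: Dict[Tuple[str, bytes], ExtIDMapping] = {}

    def add(self, mappings: Iterable[ExtIDMapping]) -> int:
        """Insert mappings, ignoring ones already present; return how many were new."""
        added = 0
        for m in mappings:
            key = (m.extid_type, m.extid)
            if key not in self._by_extid:
                self._by_extid[key] = m
                added += 1
        return added

    def get_from_extid(
        self, extid_type: str, extids: Iterable[bytes]
    ) -> List[ExtIDMapping]:
        """Batch get: return the known mappings, silently skipping unknown ids."""
        return [
            self._by_extid[(extid_type, e)]
            for e in extids
            if (extid_type, e) in self._by_extid
        ]
```

Keying on the (type, id) pair keeps ids from different VCSs from colliding. Under these assumptions, an incremental loader would send the batch of node ids it sees and skip the ones that come back.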
douardda closed D4686: Remove dependency on swh.core.
Dec 8 2020, 2:59 PM
douardda committed rDJNL5c00685cda4a: Remove dependency on swh.core (authored by douardda).
Remove dependency on swh.core
Dec 8 2020, 2:59 PM
douardda committed rDCORE6492e6173e57: Get rid of the arrow datetime format in swh.core.api.serializers (authored by douardda).
Get rid of the arrow datetime format in swh.core.api.serializers
Dec 8 2020, 2:58 PM
douardda closed D4684: Get rid of the arrow datetime format in swh.core.api.serializers.
Dec 8 2020, 2:58 PM
douardda committed rDCORE5bbbba6197d8: Drop support for legacy msgpack encodings in msgpack_loads (authored by douardda).
Drop support for legacy msgpack encodings in msgpack_loads
Dec 8 2020, 2:58 PM
douardda closed D4683: Drop support for legacy msgpack encodings in msgpack_loads.
Dec 8 2020, 2:58 PM
douardda added inline comments to D4684: Get rid of the arrow datetime format in swh.core.api.serializers.
Dec 8 2020, 2:54 PM
douardda updated the diff for D4684: Get rid of the arrow datetime format in swh.core.api.serializers.

fix typing according to olasd's comment

Dec 8 2020, 2:51 PM
douardda added a comment to D4684: Get rid of the arrow datetime format in swh.core.api.serializers.

Cool, thanks! Can you push a new tag for swh-core once it has landed? This will make swh-web CI happy.

Dec 8 2020, 2:49 PM
douardda created D4686: Remove dependency on swh.core.
Dec 8 2020, 2:47 PM
douardda updated the diff for D4684: Get rid of the arrow datetime format in swh.core.api.serializers.

and requirements-http.txt

Dec 8 2020, 12:33 PM
douardda updated the diff for D4684: Get rid of the arrow datetime format in swh.core.api.serializers.

ci msg

Dec 8 2020, 12:31 PM
douardda created D4684: Get rid of the arrow datetime format in swh.core.api.serializers.
Dec 8 2020, 12:30 PM
douardda added a comment to P894 Command-Line Input.

after this patch I have:

Dec 8 2020, 12:20 PM
douardda updated the language for P894 Command-Line Input from autodetect to diff.
Dec 8 2020, 12:19 PM
douardda edited P894 Command-Line Input.
Dec 8 2020, 12:19 PM
douardda created P894 Command-Line Input.
Dec 8 2020, 12:18 PM
douardda created D4683: Drop support for legacy msgpack encodings in msgpack_loads.
Dec 8 2020, 11:51 AM
douardda added a comment to D4679: README: Explain the current caveats on elasticsearch 7.10.0.

Also the "Currently" is not appropriate: there is no chance that 7.10.0 will be updated, so I'd get rid of the "currently" in this caveat entry description.

Dec 8 2020, 11:04 AM
douardda added a comment to D4679: README: Explain the current caveats on elasticsearch 7.10.0.

Not sure the Debian-specific part is needed here. It's only a transient problem that will hit very few people (most probably only swh team mates).

Dec 8 2020, 11:02 AM

Dec 7 2020

douardda accepted D4669: cli: Allow object-type declaration through cli or configuration.
Dec 7 2020, 4:07 PM
douardda accepted D4670: cli: Allow topic prefix declaration through cli or configuration.

not sure why this is needed but otherwise lgtm

Dec 7 2020, 4:06 PM
douardda accepted D4671: cli: Subscribe journal client to origin_visit_status.

(not sure why tests are red) lgtm

Dec 7 2020, 4:04 PM
douardda updated the diff for D4633: Add a documentation/specification of the journal messages formats.

fix indentation problems

Dec 7 2020, 3:34 PM
douardda added a comment to T2827: Deploy an instance of hedgedoc.
In T2827#54141, @olasd wrote:

Just to be sure (I'm confused even after reading the history document), HedgeDoc is the *community* fork of the HackMD code base (which was named CodiMD), and has no relationship with the original upstream company running hackmd.io. Correct?

Dec 7 2020, 12:43 PM · System administration
douardda requested changes to D4673: Data persistence for a devel setting.
Dec 7 2020, 10:58 AM
douardda raised the priority of T2849: Design and implement a mapping from "original VCS ids" to SWHIDs to help incremental loaders from Normal to High.
Dec 7 2020, 10:30 AM · Storage manager

Dec 4 2020

douardda updated subscribers of D4649: HgLoaderFromDisk: Only load new commits.
Dec 4 2020, 5:24 PM
douardda accepted D4643: HgLoaderFromDisk: uneventful load when unchanged.
Dec 4 2020, 5:04 PM
douardda added inline comments to D4633: Add a documentation/specification of the journal messages formats.
Dec 4 2020, 5:02 PM
douardda retitled D4633: Add a documentation/specification of the journal messages formats from [WIP] Add a documentation/specification of the journal messages formats to Add a documentation/specification of the journal messages formats.
Dec 4 2020, 3:59 PM
douardda updated the diff for D4633: Add a documentation/specification of the journal messages formats.

improve the doc

Dec 4 2020, 3:59 PM
douardda closed T2834: Use msgpack extension types instead of custom swh encoders/decoders as Resolved.

Done for the journal. We might want to do the same in the low-level RPC stack (swh.core.api) some day (or replace it later with gRPC or similar :-) ).

Dec 4 2020, 3:12 PM · Journal
douardda committed rDJNL8dd97125df2d: Use msgpack Timestamp extension type to encode datetime objects in the journal (authored by douardda).
Use msgpack Timestamp extension type to encode datetime objects in the journal
Dec 4 2020, 2:48 PM
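The Timestamp extension (ext type -1) is defined by the msgpack specification, and msgpack-python >= 1.0 can produce it with `packb(..., datetime=True)` and decode it back with `unpackb(..., timestamp=3)`. As a stdlib-only sketch of what ends up on the wire (not the swh.journal code itself), the three layouts from the spec can be written out by hand. The library only packs timezone-aware datetimes this way and strips the tzinfo; the sketch follows the same rule and decodes to UTC, so the original UTC offset is not preserved.

```python
import datetime
import struct

EPOCH = datetime.datetime(1970, 1, 1, tzinfo=datetime.timezone.utc)


def pack_timestamp(dt: datetime.datetime) -> bytes:
    """Encode an aware datetime as a msgpack Timestamp ext (type -1)."""
    if dt.tzinfo is None:
        raise ValueError("naive datetimes cannot be encoded as Timestamp")
    delta = dt - EPOCH
    # timedelta normalizes days (may be negative) and seconds/microseconds (>= 0),
    # so this yields floor seconds and a non-negative sub-second part.
    seconds = delta.days * 86400 + delta.seconds
    nanoseconds = delta.microseconds * 1000
    if seconds >> 34 == 0:
        data64 = (nanoseconds << 34) | seconds
        if data64 & 0xFFFFFFFF00000000 == 0:
            # timestamp 32: fixext 4, type -1, uint32 seconds
            return b"\xd6\xff" + struct.pack(">I", data64)
        # timestamp 64: fixext 8, type -1, 30-bit nanoseconds + 34-bit seconds
        return b"\xd7\xff" + struct.pack(">Q", data64)
    # timestamp 96: ext 8, length 12, type -1, uint32 nanoseconds + int64 seconds
    return b"\xc7\x0c\xff" + struct.pack(">Iq", nanoseconds, seconds)


def unpack_timestamp(data: bytes) -> datetime.datetime:
    """Decode any of the three Timestamp layouts into a UTC datetime."""
    if data[:2] == b"\xd6\xff":
        (seconds,) = struct.unpack(">I", data[2:6])
        nanoseconds = 0
    elif data[:2] == b"\xd7\xff":
        (data64,) = struct.unpack(">Q", data[2:10])
        nanoseconds, seconds = data64 >> 34, data64 & 0x3FFFFFFFF
    elif data[:3] == b"\xc7\x0c\xff":
        nanoseconds, seconds = struct.unpack(">Iq", data[3:15])
    else:
        raise ValueError("not a msgpack Timestamp")
    # datetime only has microsecond precision, so sub-microsecond digits are dropped.
    return EPOCH + datetime.timedelta(seconds=seconds, microseconds=nanoseconds // 1000)
```

Whole-second instants between 1970 and 2106 take the compact 6-byte form; pre-1970 dates fall back to the 15-byte form.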
douardda closed D4655: Use msgpack extension type to encode datetime objects in the journal.
Dec 4 2020, 2:47 PM
douardda requested changes to D4649: HgLoaderFromDisk: Only load new commits.
Dec 4 2020, 2:38 PM
douardda requested changes to D4643: HgLoaderFromDisk: uneventful load when unchanged.

overall lgtm but see comments

Dec 4 2020, 2:26 PM
douardda updated the diff for D4655: Use msgpack extension type to encode datetime objects in the journal.

Add a few tests for msgpack codecs (dates and long integers)

Dec 4 2020, 10:45 AM
douardda updated the diff for D4655: Use msgpack extension type to encode datetime objects in the journal.

Improve (?) the commit message

Dec 4 2020, 10:11 AM
douardda updated the diff for D4655: Use msgpack extension type to encode datetime objects in the journal.

Simplify the backward-compatibility code a bit (no need for "genericity" there)

Dec 4 2020, 10:06 AM
douardda updated the diff for D4655: Use msgpack extension type to encode datetime objects in the journal.

Use the (poorly or not documented) timestamp/datetime feature flags of msgpack to handle datetime objects

Dec 4 2020, 9:51 AM

Dec 3 2020

douardda updated the diff for D4633: Add a documentation/specification of the journal messages formats.

forgot to save before commit...

Dec 3 2020, 3:14 PM
douardda updated the diff for D4633: Add a documentation/specification of the journal messages formats.

Update the documentation for (WIP) extended type based datetime encoding

Dec 3 2020, 3:11 PM
douardda added a revision to T2834: Use msgpack extension types instead of custom swh encoders/decoders: D4655: Use msgpack extension type to encode datetime objects in the journal.
Dec 3 2020, 1:14 PM · Journal
douardda created D4655: Use msgpack extension type to encode datetime objects in the journal.
Dec 3 2020, 1:14 PM
douardda committed rDSCH3c8707599ae1: Replace usage of arrow datetime objects in favor of pure datetime ones (authored by douardda).
Replace usage of arrow datetime objects in favor of pure datetime ones
Dec 3 2020, 10:59 AM
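A minimal sketch of how common arrow idioms map onto the standard library. Which arrow calls swh-scheduler actually used is an assumption here; the helper names `utcnow`, `parse` and `shift` are hypothetical.

```python
from datetime import datetime, timedelta, timezone


def utcnow() -> datetime:
    # arrow.utcnow(): a timezone-aware "now" in UTC
    return datetime.now(tz=timezone.utc)


def parse(value: str) -> datetime:
    # arrow.get("2020-12-03T10:59:00+00:00"): parse an ISO 8601 string
    dt = datetime.fromisoformat(value)
    # arrow treats input without an offset as UTC; mirror that here
    return dt if dt.tzinfo else dt.replace(tzinfo=timezone.utc)


def shift(dt: datetime, **kwargs: float) -> datetime:
    # arrow's .shift(hours=1): plain timedelta arithmetic
    return dt + timedelta(**kwargs)
```

One difference to keep in mind: `timedelta` accepts weeks, days, hours, minutes and seconds, but not months or years, which arrow's `shift()` does accept.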
douardda closed D4642: Replace usage of arrow datetime objects in favor of pure datetime ones.
Dec 3 2020, 10:59 AM
douardda closed T2835: Get rid of the arrow library as Resolved by committing rDSCH3c8707599ae1: Replace usage of arrow datetime objects in favor of pure datetime ones.
Dec 3 2020, 10:59 AM · Scheduling utilities
douardda added a comment to T2837: Investigate using zfs snapshots for incremental database backups, instead of rsync.

can storage be easily added to the backup server?

Dec 3 2020, 10:45 AM · System administration
douardda updated the diff for D4642: Replace usage of arrow datetime objects in favor of pure datetime ones.

Replace usage of arrow for its humanize() feature by the humanize module

Dec 3 2020, 10:31 AM

Dec 2 2020

douardda added inline comments to D4642: Replace usage of arrow datetime objects in favor of pure datetime ones.
Dec 2 2020, 2:37 PM
douardda created P889 (An Untitled Masterwork).
Dec 2 2020, 12:33 PM
douardda added a revision to T2835: Get rid of the arrow library: D4642: Replace usage of arrow datetime objects in favor of pure datetime ones.
Dec 2 2020, 9:52 AM · Scheduling utilities
douardda created D4642: Replace usage of arrow datetime objects in favor of pure datetime ones.
Dec 2 2020, 9:52 AM

Dec 1 2020

douardda accepted D4541: Add content lru cache to HgLoaderFromDisk.
Dec 1 2020, 5:33 PM
douardda accepted D4540: Add tree diffing in HgLoaderFromDisk.

lgtm *but*:

  • you should remove the ".84" part in the timestamps (I first read 1h -> 42h)
  • I'd like the commit message to give a bit more detail on what this diff really does (if I get the idea right, it keeps the directory structure between 2 revisions and "applies the modifications" rather than rebuilding the whole structure, right?)
Dec 1 2020, 5:27 PM
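To illustrate the idea described in the review (keep the previous revision's tree and apply only the changes), here is a minimal sketch on a plain nested-dict model. The dict model and `apply_changes` are hypothetical; the real loader works on its own directory objects.

```python
from typing import Dict, Optional, Union

# Hypothetical tree model: directory name -> subtree, file name -> file id.
Tree = Dict[str, Union[str, "Tree"]]


def apply_changes(tree: Tree, changes: Dict[str, Optional[str]]) -> Tree:
    """Update `tree` in place; a value of None means the path was removed."""
    for path, file_id in changes.items():
        *dirs, name = path.split("/")
        # Walk down (creating directories as needed), remembering each level.
        stack = [tree]
        for d in dirs:
            stack.append(stack[-1].setdefault(d, {}))
        if file_id is None:
            stack[-1].pop(name, None)
            # Prune directories left empty by the deletion, deepest first.
            for parent, d in zip(reversed(stack[:-1]), reversed(dirs)):
                if parent[d]:
                    break
                del parent[d]
        else:
            stack[-1][name] = file_id
    return tree
```

Only the directories along changed paths are touched, which is what makes this cheaper than rebuilding every directory for each revision.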
douardda triaged T2835: Get rid of the arrow library as High priority.
Dec 1 2020, 2:12 PM · Scheduling utilities
douardda updated the diff for D4633: Add a documentation/specification of the journal messages formats.

typos and (some) fixes reported by vlorentz

Dec 1 2020, 12:33 PM
douardda added inline comments to D4633: Add a documentation/specification of the journal messages formats.
Dec 1 2020, 12:12 PM
douardda added a comment to D4633: Add a documentation/specification of the journal messages formats.

Oh, and you need to add something to explain anonymized topics

Dec 1 2020, 11:59 AM
douardda triaged T2834: Use msgpack extension types instead of custom swh encoders/decoders as High priority.
Dec 1 2020, 11:58 AM · Journal
douardda created T2834: Use msgpack extension types instead of custom swh encoders/decoders.
Dec 1 2020, 11:58 AM · Journal
douardda updated the diff for D4633: Add a documentation/specification of the journal messages formats.

typo

Dec 1 2020, 9:25 AM

Nov 30 2020

douardda added a comment to D4633: Add a documentation/specification of the journal messages formats.

Note: I have not proofread it yet, so it is probably full of typos and errors...

Nov 30 2020, 5:22 PM
douardda added inline comments to D4633: Add a documentation/specification of the journal messages formats.
Nov 30 2020, 5:17 PM
douardda added a revision to T2818: Document the journal topics: D4633: Add a documentation/specification of the journal messages formats.
Nov 30 2020, 5:15 PM · Documentation
douardda created D4633: Add a documentation/specification of the journal messages formats.
Nov 30 2020, 5:15 PM
douardda accepted D3435: Add mercurial.from_disk.HgLoaderFromDisk.
Nov 30 2020, 11:44 AM
douardda accepted D4627: Support multiple log level specifications in the swh cli.

LGTM. I was wondering whether a regexp would be simpler for the validation step, but it would make the error messages more generic and less user-friendly.

Nov 30 2020, 10:39 AM
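To show the trade-off mentioned in the review, here is a sketch of explicit validation that reports a specific error for each failure mode, where a single regexp would only say "does not match". The `[logger:]LEVEL` syntax and the `parse_log_levels` helper are assumptions for illustration, not necessarily the exact swh CLI behavior.

```python
import logging
from typing import Dict, Iterable, Optional

LEVELS = {
    name: getattr(logging, name)
    for name in ("CRITICAL", "ERROR", "WARNING", "INFO", "DEBUG")
}


def parse_log_levels(specs: Iterable[str]) -> Dict[Optional[str], int]:
    """Parse specs like "INFO" or "swh.core:DEBUG"; None keys the root logger."""
    result: Dict[Optional[str], int] = {}
    for spec in specs:
        logger, sep, level = spec.rpartition(":")
        if sep and not logger:
            raise ValueError(f"{spec!r}: empty logger name before ':'")
        if not level:
            raise ValueError(f"{spec!r}: missing log level")
        try:
            result[logger if sep else None] = LEVELS[level.upper()]
        except KeyError:
            raise ValueError(
                f"{spec!r}: unknown log level {level!r}, "
                f"expected one of {', '.join(LEVELS)}"
            ) from None
    return result
```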
douardda updated the task description for T2827: Deploy an instance of hedgedoc.
Nov 30 2020, 10:07 AM · System administration
douardda triaged T2827: Deploy an instance of hedgedoc as Wishlist priority.
Nov 30 2020, 9:54 AM · System administration

Nov 27 2020

douardda requested changes to D3435: Add mercurial.from_disk.HgLoaderFromDisk.

Ok, we are getting close, I think. Just a few more comments to handle and we are done.

Nov 27 2020, 4:38 PM
douardda raised the priority of T2818: Document the journal topics from Normal to High.
Nov 27 2020, 10:18 AM · Documentation
douardda added a comment to T2818: Document the journal topics.

I think the best would be to put this directly in swh-docs (rDDOC) because it's kind of "transverse" between swh-journal, swh-storage and swh-indexer.

Nov 27 2020, 9:59 AM · Documentation
douardda triaged T2818: Document the journal topics as Normal priority.
Nov 27 2020, 9:46 AM · Documentation

Nov 25 2020

douardda accepted D4595: api: Fix endpoint responses that must not be cached.

Thanks!
It's not completely clear why @never_cache no longer worked, but meh.

Nov 25 2020, 4:14 PM
douardda created P880 (An Untitled Masterwork).
Nov 25 2020, 10:43 AM
douardda updated subscribers of T2810: API endpoint /vault/directory/<dirhash>/ should not be cached by varnish.

This may be better suited to a fix in the web API, via proper cache config headers, as @olasd mentioned on IRC (probably via https://docs.djangoproject.com/en/3.1/topics/cache/#downstream-caches )

Nov 25 2020, 10:42 AM · Web app, System administration
douardda added a project to T2810: API endpoint /vault/directory/<dirhash>/ should not be cached by varnish: Web app.
Nov 25 2020, 10:41 AM · Web app, System administration
douardda triaged T2810: API endpoint /vault/directory/<dirhash>/ should not be cached by varnish as High priority.
Nov 25 2020, 10:15 AM · Web app, System administration

Nov 24 2020

douardda edited P877 (An Untitled Masterwork).
Nov 24 2020, 2:19 PM
douardda edited P877 (An Untitled Masterwork).
Nov 24 2020, 2:19 PM
douardda created P877 (An Untitled Masterwork).
Nov 24 2020, 2:18 PM
douardda accepted D4568: Add support for large negative integers in msgpack encoding.

bit sad indeed, but LGTM

Nov 24 2020, 10:04 AM
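msgpack's native integers cover only the range from -2**63 to 2**64 - 1, so larger values (including large negative ones) need an extension type. A minimal sketch of one way to do this, assuming a signed big-endian two's-complement payload in an "ext 8" frame; the type code `LONG_INT_EXT_TYPE = 1` and the payload layout are hypothetical, not the encoding swh actually uses.

```python
import struct

# Native msgpack integer range: int64 minimum up to uint64 maximum.
MSGPACK_INT_MIN = -(2**63)
MSGPACK_UINT_MAX = 2**64 - 1
# Hypothetical extension type code, chosen for illustration only.
LONG_INT_EXT_TYPE = 1


def pack_long_int(n: int) -> bytes:
    """Wrap an out-of-range int in an 'ext 8' frame (payload < 256 bytes)."""
    if MSGPACK_INT_MIN <= n <= MSGPACK_UINT_MAX:
        raise ValueError("fits in a native msgpack int; no extension needed")
    # bit_length() counts magnitude bits; +8 adds a sign bit and rounds up to bytes.
    payload = n.to_bytes((n.bit_length() + 8) // 8, "big", signed=True)
    if len(payload) > 255:
        raise ValueError("payload too large for this ext 8 sketch")
    return b"\xc7" + struct.pack(">Bb", len(payload), LONG_INT_EXT_TYPE) + payload


def unpack_long_int(data: bytes) -> int:
    """Decode a frame produced by pack_long_int."""
    length, ext_type = struct.unpack(">Bb", data[1:3])
    if data[0] != 0xC7 or ext_type != LONG_INT_EXT_TYPE:
        raise ValueError("not a long-int extension frame")
    return int.from_bytes(data[3:3 + length], "big", signed=True)
```

With msgpack-python, the same payload would typically be produced from a `default=` hook returning `msgpack.ExtType(code, payload)` and decoded in an `ext_hook=` callback.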

Nov 23 2020

douardda accepted D4463: Remove the <external_identifier> tag from the protocol..

lgtm (besides the retries_left=3 stuff)

Nov 23 2020, 12:47 PM
douardda accepted D4550: requirements-test.txt: Explicit swh.core[db] as test requirement.
Nov 23 2020, 12:44 PM
douardda accepted D4554: requirements-test.txt: Drop no longer needed test dependency.

ok ok ok

Nov 23 2020, 12:42 PM
douardda accepted D4548: requirements-test.txt: Drop no longer needed pytest-postgresql requirement.
Nov 23 2020, 12:36 PM
douardda added a comment to D4554: requirements-test.txt: Drop no longer needed test dependency.

swh.core[db] is already a dependency in requirements-swh.txt, so why would it be needed here?

But more importantly, I disagree with the validity of the diff because pytest-postgresql is used directly in this package (in conftest.py), not only by means of the swh.core.tests.db package.

Nov 23 2020, 10:53 AM