
Dec 1 2017

ardumont created P197 errors in indexer when uffizi's disk's full -> OSError: [Errno 28] No space left on device.
Dec 1 2017, 1:52 PM · System administrators, Indexer, Storage manager

Nov 13 2017

ardumont closed T823: Gitorious import: Overflow error in revision time as Resolved by committing rDLDG120f23dd0bf2: swh.loader.git.disk: Force further checks on objects.
Nov 13 2017, 6:40 PM · Origin-Gitorious, Storage manager, Git loader

Nov 10 2017

ardumont added a comment to T823: Gitorious import: Overflow error in revision time.

PR got merged \m/

Nov 10 2017, 6:29 PM · Origin-Gitorious, Storage manager, Git loader
olasd added a revision to T567: adapt SQL storage for repository snapshot objects: D268: Add snapshot models.
Nov 10 2017, 6:06 PM · Storage manager

Nov 6 2017

olasd added a parent task for T698: Migrate the content store to a new (internal) primary key scheme: T835: Migrate away from using sha1s as foreign keys in the database.
Nov 6 2017, 2:38 PM · Object storage, Storage manager

Nov 5 2017

olasd added a parent task for T830: Remove tables occurrence and occurrence_history: T565: embrace repository snapshot object in the data model (meta task).
Nov 5 2017, 9:29 PM · Storage manager, Archive content
olasd created T830: Remove tables occurrence and occurrence_history.
Nov 5 2017, 9:28 PM · Storage manager, Archive content
zack added a comment to T829: Remove duplication between fetch_history and origin_visit.

(agreed)

Nov 5 2017, 8:27 PM · Storage manager, Archive content
olasd triaged T829: Remove duplication between fetch_history and origin_visit as Normal priority.
Nov 5 2017, 7:53 PM · Storage manager, Archive content

Nov 4 2017

olasd created T829: Remove duplication between fetch_history and origin_visit.
Nov 4 2017, 3:28 PM · Storage manager, Archive content
ardumont added a comment to T823: Gitorious import: Overflow error in revision time.

PR got merged \m/

Nov 4 2017, 12:58 PM · Origin-Gitorious, Storage manager, Git loader

Oct 31 2017

ardumont added a comment to T823: Gitorious import: Overflow error in revision time.

Follow up on this:

Oct 31 2017, 3:38 PM · Origin-Gitorious, Storage manager, Git loader

Oct 27 2017

ardumont triaged T823: Gitorious import: Overflow error in revision time as Normal priority.
Oct 27 2017, 2:26 PM · Origin-Gitorious, Storage manager, Git loader
ardumont added a comment to T823: Gitorious import: Overflow error in revision time.

The revision in question is:

Oct 27 2017, 2:18 PM · Origin-Gitorious, Storage manager, Git loader
ardumont added a comment to T823: Gitorious import: Overflow error in revision time.

Debugging some more, the date generating this error is the following, which does indeed raise the initial overflow error:
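
A minimal sketch of the kind of check that can reject such a date before it is stored, assuming the revision time is a Unix timestamp in seconds. The helper name `is_representable_timestamp` is hypothetical and is not taken from swh.loader.git; the feed does not show the offending date or where the overflow was raised, so neither is reproduced here.

```python
from datetime import datetime, timezone


def is_representable_timestamp(seconds):
    """Return True if `seconds` (Unix time) maps to a valid datetime.

    Hypothetical helper for illustration only.
    """
    try:
        datetime.fromtimestamp(seconds, tz=timezone.utc)
    except (OverflowError, ValueError, OSError):
        # Out-of-range values raise one of these, depending on the platform.
        return False
    return True
```

A loader could call such a check on each author and committer date and refuse (or flag) objects for which it returns False.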

Oct 27 2017, 2:10 PM · Origin-Gitorious, Storage manager, Git loader
ardumont created T823: Gitorious import: Overflow error in revision time.
Oct 27 2017, 2:09 PM · Origin-Gitorious, Storage manager, Git loader

Oct 26 2017

ardumont created T818: indexer DB should not use bytea for mimetype and encoding columns.
Oct 26 2017, 12:37 PM · Storage manager, Indexer

Oct 9 2017

olasd closed T482: First swh-storage-archiver run to catch up uffizi as Resolved.

It's been running for a while now :)

Oct 9 2017, 11:51 AM · Storage manager
olasd closed T482: First swh-storage-archiver run to catch up uffizi, a subtask of T240: content archiver, as Resolved.
Oct 9 2017, 11:51 AM · Storage manager

Sep 18 2017

olasd added a comment to T760: swh api clients often fail with a BadStatusLine exception.

As a proof of concept, nginx has been manually deployed to uffizi on port 15003. It does alleviate the BadStatusLine issues the archiver previously encountered under high load. This "just" needs to be properly deployed.

Sep 18 2017, 8:50 PM · Vault, Object storage, Storage manager

Sep 15 2017

zack added a comment to T727: Provide a version of swh_directory_walk/_one without the join on the contents table.
Sep 15 2017, 10:37 AM · Storage manager
zack closed T111: Performance analysis of read queries as Wontfix.

looks like this is no longer of interest

Sep 15 2017, 10:02 AM · Storage manager
zack closed T598: Store content -> revision cache in azure table storage as Wontfix.

we're taking a different route for this now, based on @grouss WIP

Sep 15 2017, 9:58 AM · Storage manager
zack placed T454: object storage backend that can read from/write to S3 up for grabs.
Sep 15 2017, 9:55 AM · Storage manager

Sep 13 2017

olasd created T760: swh api clients often fail with a BadStatusLine exception.
Sep 13 2017, 12:33 PM · Vault, Object storage, Storage manager

Sep 11 2017

ardumont renamed T757: Memory leak in swh.storage.api.server from Memroy leak in swh.storage.api.server to Memory leak in swh.storage.api.server.
Sep 11 2017, 4:37 PM · Storage manager
olasd created T757: Memory leak in swh.storage.api.server.
Sep 11 2017, 12:15 PM · Storage manager

Jun 6 2017

ardumont closed T721: Improve license indexer's unknown license policy as Resolved.
Jun 6 2017, 6:26 PM · Indexer, Storage manager
ardumont updated the task description for T721: Improve license indexer's unknown license policy.
Jun 6 2017, 2:26 PM · Indexer, Storage manager

Jun 2 2017

olasd created T727: Provide a version of swh_directory_walk/_one without the join on the contents table.
Jun 2 2017, 4:07 PM · Storage manager

May 29 2017

ardumont created T721: Improve license indexer's unknown license policy.
May 29 2017, 11:01 AM · Indexer, Storage manager

Apr 26 2017

ardumont closed T703: Make the loaders compute blake2s256 hash for new contents as Resolved.
Apr 26 2017, 11:43 AM · Data Model, Storage manager
ardumont added a parent task for T703: Make the loaders compute blake2s256 hash for new contents: T692: worker to efficiently (re)compute content blob checksums.
Apr 26 2017, 11:43 AM · Data Model, Storage manager

Apr 5 2017

zack moved T687: swh-storage: consider preventing bogus permission values from Restricted Project Column to Restricted Project Column on the Restricted Project board.
Apr 5 2017, 2:05 PM · Storage manager, Restricted Project

Mar 31 2017

ardumont closed D200: Deal with new checksum blake2s256 in storage by committing rDSTOc94ba898a288: d/control: Add python3-swh.journal dependency with version.
Mar 31 2017, 12:28 PM · Storage manager
ardumont added a comment to T49: DB schema: add missing unicity constraint on origin (type, url).

OK, I've taken a closer look at the duplicates. So far, the only duplicated origins are of type 'ftp' (from the GNU injection) and 'git':

Mar 31 2017, 10:46 AM · Restricted Project, Storage manager

Mar 30 2017

Harbormaster failed remote builds in B842: Diff 668 for D200: Deal with new checksum blake2s256 in storage!
Mar 30 2017, 2:21 PM · Storage manager
ardumont updated the diff for D200: Deal with new checksum blake2s256 in storage.

Do not use the new blake2s256 column for filtering yet (content_add, skipped_content_add)

Mar 30 2017, 2:21 PM · Storage manager

Mar 29 2017

ardumont updated the diff for D200: Deal with new checksum blake2s256 in storage.

Fixes according to latest review:

  • fix unique key computation (using tuple)
  • fix sql about missing default value (on new column)
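
One possible reading of "unique key computation (using tuple)" is sketched below. It assumes each content is a dict carrying `sha1`, `sha1_git` and `sha256` checksums; the names `KEY_FIELDS`, `content_key` and `deduplicate` are hypothetical and do not come from swh.storage.

```python
KEY_FIELDS = ("sha1", "sha1_git", "sha256")  # assumed checksum fields


def content_key(content):
    """Build a hashable key from the checksums of one content."""
    return tuple(content[field] for field in KEY_FIELDS)


def deduplicate(contents):
    """Keep the first content seen for each distinct key."""
    seen = {}
    for content in contents:
        seen.setdefault(content_key(content), content)
    return list(seen.values())
```

A tuple is hashable and compares element by element, so it can serve directly as a dictionary key, which a list or dict of checksums cannot.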
Mar 29 2017, 4:37 PM · Storage manager
ardumont added a comment to D200: Deal with new checksum blake2s256 in storage.

Looks good with one easily solved caveat inline.

Mar 29 2017, 4:33 PM · Storage manager
olasd added a comment to D200: Deal with new checksum blake2s256 in storage.

Looks good with one easily solved caveat inline.

Mar 29 2017, 3:48 PM · Storage manager
ardumont updated the diff for D200: Deal with new checksum blake2s256 in storage.

Use sql/bin/db-upgrade to generate the 103-104 sql migration script

Mar 29 2017, 12:08 PM · Storage manager

Mar 28 2017

zack added a comment to T532: Vault API.

Just a couple of comments:

  • as discussed on IRC, having the object storage fully streaming is a goal per se, no matter what the Vault needs. If the Vault needs it, its priority is just higher; but the goal remains nonetheless (please file this as a separate task, so that we can collect knowledge and TODO items about that in a dedicated space).
  • I might be wrong, but it seems to me that an underlying assumption of Option 2 above is that we will not cache cooked objects. That's wrong. The Vault is, conceptually, a cache and should remain so. The reason is that we expect Vault usage to be really "spike-y". Most of the content we archive will never be requested because it will remain available at its original hosting place most of the time. But when something disappears from there, especially if it's some "famous" content, we will have people looking for it in Software Heritage; possibly many people at the same time. To cater for those use cases we need to be sure we can do the cooking only once, and serve the result multiple times subsequently at essentially zero cost. Then, of course, the cache policy and how aggressive we will be in deletion is totally up for discussion and will need some data points (that we don't have yet) for tuning.
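
The "cook once, serve many times" idea in the second point can be sketched as follows. This is only an illustration: the class name `CookingCache`, its `fetch` method and the `cook` callable are hypothetical and are not the Vault API. Eviction is deliberately left out, since the cache policy is described above as still open.

```python
class CookingCache:
    """Cache cooked bundles so that each object is cooked at most once."""

    def __init__(self, cook):
        self._cook = cook        # callable: object id -> cooked bundle
        self._bundles = {}       # object id -> cooked bundle
        self.cook_calls = 0      # how many times cooking actually ran

    def fetch(self, obj_id):
        """Return the bundle for obj_id, cooking it only on a cache miss."""
        if obj_id not in self._bundles:
            self.cook_calls += 1
            self._bundles[obj_id] = self._cook(obj_id)
        return self._bundles[obj_id]
```

With this shape, a spike of requests for the same object pays the cooking cost once; every later request is a dictionary lookup.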
Mar 28 2017, 1:28 PM · Vault
ardumont added a comment to D200: Deal with new checksum blake2s256 in storage.

Ok, dropping the unique indexes and creating simple index then.

Mar 28 2017, 10:05 AM · Storage manager
ardumont updated the diff for D200: Deal with new checksum blake2s256 in storage.

Improve test on skipped_content_add

Mar 28 2017, 9:54 AM · Storage manager

Mar 27 2017

ardumont added inline comments to D200: Deal with new checksum blake2s256 in storage.
Mar 27 2017, 4:31 PM · Storage manager
ardumont updated the diff for D200: Deal with new checksum blake2s256 in storage.
  • swh.storage: Use aggregate key to filter on missing skipped contents
Mar 27 2017, 4:08 PM · Storage manager
Harbormaster failed remote builds in B835: Diff 659 for D200: Deal with new checksum blake2s256 in storage!
Mar 27 2017, 3:59 PM · Storage manager
Harbormaster failed remote builds in B837: Diff 661 for D200: Deal with new checksum blake2s256 in storage!
Mar 27 2017, 3:15 PM · Storage manager
ardumont updated the diff for D200: Deal with new checksum blake2s256 in storage.
  • swh.storage: Extract key variable for insertion
  • swh.storage: Use upsert scheme on (skipped_)content_add function
  • Revert "swh.storage: Use upsert scheme on (skipped_)content_add function"
  • db version 104: Update schema properly
Mar 27 2017, 3:14 PM · Storage manager
ardumont added a comment to D200: Deal with new checksum blake2s256 in storage.
In D200#4134, @olasd wrote:

There is some confusion in our current schema about what the unicity expectations are. This diff adds some on top, so we should clear them up before moving any further.

Mar 27 2017, 3:04 PM · Storage manager
olasd added a comment to D200: Deal with new checksum blake2s256 in storage.

There is some confusion in our current schema about what the unicity expectations are. This diff adds some on top, so we should clear them up before moving any further.

Mar 27 2017, 1:41 PM · Storage manager
olasd added a comment to T698: Migrate the content store to a new (internal) primary key scheme.

I agree that metadata exports need to keep meaningful intrinsic identifiers as well.

Mar 27 2017, 12:48 PM · Object storage, Storage manager
ardumont updated the summary of D200: Deal with new checksum blake2s256 in storage.
Mar 27 2017, 12:28 PM · Storage manager
ardumont updated the summary of D200: Deal with new checksum blake2s256 in storage.
Mar 27 2017, 12:13 PM · Storage manager
Harbormaster failed remote builds in B835: Diff 659 for D200: Deal with new checksum blake2s256 in storage!
Mar 27 2017, 11:51 AM · Storage manager
ardumont updated the diff for D200: Deal with new checksum blake2s256 in storage.

Fix sql formatting which was off

Mar 27 2017, 11:50 AM · Storage manager
ardumont updated the task description for T703: Make the loaders compute blake2s256 hash for new contents.
Mar 27 2017, 10:41 AM · Data Model, Storage manager

Mar 25 2017

ardumont retitled D200: Deal with new checksum blake2s256 in storage from Deal with new checksum blake2s256 in schema to Deal with new checksum blake2s256 in storage.
Mar 25 2017, 1:07 AM · Storage manager

Mar 24 2017

ardumont updated the task description for T703: Make the loaders compute blake2s256 hash for new contents.
Mar 24 2017, 2:37 PM · Data Model, Storage manager
ardumont updated the task description for T703: Make the loaders compute blake2s256 hash for new contents.
Mar 24 2017, 2:18 PM · Data Model, Storage manager
ardumont updated the task description for T703: Make the loaders compute blake2s256 hash for new contents.
Mar 24 2017, 2:16 PM · Data Model, Storage manager
ardumont updated the task description for T703: Make the loaders compute blake2s256 hash for new contents.
Mar 24 2017, 2:16 PM · Data Model, Storage manager
ardumont updated the task description for T703: Make the loaders compute blake2s256 hash for new contents.
Mar 24 2017, 2:15 PM · Data Model, Storage manager
ardumont renamed T703: Make the loaders compute blake2s256 hash for new contents from Make the loaders compute blake2s256 checksums to Make the loaders compute blake2s256 hash for new contents.
Mar 24 2017, 2:13 PM · Data Model, Storage manager
ardumont created T703: Make the loaders compute blake2s256 hash for new contents.
Mar 24 2017, 2:12 PM · Data Model, Storage manager
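
For reference, a minimal sketch of computing a blake2s256 checksum next to other content checksums with Python's standard `hashlib`. The function name `compute_checksums` and the exact set of checksums returned are assumptions made for illustration; the task only names blake2s256.

```python
import hashlib


def compute_checksums(data):
    """Return hex digests for one content blob (illustrative field set)."""
    return {
        "sha1": hashlib.sha1(data).hexdigest(),
        "sha256": hashlib.sha256(data).hexdigest(),
        # blake2s with a 32-byte digest, i.e. a 256-bit checksum
        "blake2s256": hashlib.blake2s(data, digest_size=32).hexdigest(),
    }
```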

Mar 23 2017

olasd added a comment to T532: Vault API.

From IRC with permission.

16:46:39         seirl ╡ olasd: i don't remember if you had an opinion on that too
16:49:32         olasd ╡ my opinion is that we should try very hard to avoid doing long-running stuff without checkpoints
16:50:59               ╡ I don't think it's reasonable to expect a connection to stay open for hours
16:53:37               ╡ this disqualifies any client on an unreliable connection, which is maybe half the world?
16:54:14         seirl ╡ okay, i'm not excluding trying to find a way to "resume" the download
16:55:06               ╡ that way we can just store the state of the cookers, which is pretty small
16:55:29               ╡ also, people on unstable connections tend to not want to download 52GB files
16:55:38         olasd ╡ except when they do
16:55:45     nicolas17 ╡ o/
16:56:03               ╡ I don't mind downloading 52GB
16:56:32         olasd ╡ swh doesn't intend to serve {people on stable, fast connections}, it intends to serve people
16:56:41     nicolas17 ╡ but if it's bigger than 100MB and you don't support resuming then I hate you
16:56:44         seirl ╡ okay there's a misunderstanding by what I meant by that
16:56:53               ╡ assuming we DO implement checkpoints
16:57:04               ╡ (and resuming)
16:57:15               ╡ people with unstable connections are usually people with slow download speeds
16:57:47               ╡ so they won't be impacted a lot by the fact that streaming the response while it's being cooked has a lower throughput
16:57:55         olasd ╡ I still don't think streaming is a reasonable default
16:58:09         seirl ╡ okay
16:59:13         olasd ╡ however, I think making the objstorage support chunking is a reasonable goal
16:59:34               ╡ even if it's restricted to the local api for now
16:59:55         seirl ╡ oh, i hadn't thought of chunking the bundles
17:00:02     nicolas17 ╡ if I start downloading and you stream the response, and the connection drops, what happens? will it keep processing and storing the result in the server, or will it abort?
17:00:29         seirl ╡ nicolas17: i was thinking about storing the state of the processing (which is small) somewhere
17:00:34               ╡ in maybe an LRU cache
17:00:48               ╡ if the user reconnects, the state is restored and the processing can continue
17:01:15     nicolas17 ╡ would this be a plain HTTP download from the user's viewpoint?
17:01:21         seirl ╡ yeah
17:01:27     nicolas17 ╡ would the state be restored such that the file being produced is bitwise identical?
17:01:33         seirl ╡ that's the idea
17:01:45               ╡ we can deduce which state to retrieve from the Range: header
17:02:06     nicolas17 ╡ great then
17:03:04         olasd ╡ nicolas17: mind if I paste this conversation to the forge ?
17:03:15     nicolas17 ╡ go ahead
17:03:17             * ╡ olasd is lazy
17:03:23         seirl ╡ that said i perfectly understand that wanting the retrieval to be fast and simple for the users is an important goal, if we're not concerned about the storage and we can easily do chunking that might be a good way to go
17:03:42     nicolas17 ╡ the bitwise-identical thing is important or HTTP-level resuming would cause a corrupted mess :P
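
A rough sketch of the "deduce which state to retrieve from the Range: header" idea from the conversation above. It assumes the client sends a single `bytes=N-` range and that the server has saved cooker states at known byte offsets; `range_start` and `checkpoint_for` are hypothetical names, not existing Vault code.

```python
import bisect
import re

_RANGE_RE = re.compile(r"^bytes=(\d+)-(\d*)$")


def range_start(header):
    """Return the first byte requested by a single-range Range header."""
    match = _RANGE_RE.match(header.strip())
    if match is None:
        raise ValueError("unsupported Range header: {!r}".format(header))
    return int(match.group(1))


def checkpoint_for(offset, checkpoint_offsets):
    """Return the largest saved checkpoint offset that is <= offset.

    `checkpoint_offsets` must be sorted in ascending order.
    """
    index = bisect.bisect_right(checkpoint_offsets, offset) - 1
    if index < 0:
        raise ValueError("no checkpoint at or before offset {}".format(offset))
    return checkpoint_offsets[index]
```

The cooker would be restored from that checkpoint and its output discarded up to the requested offset, which only works if the regenerated bytes are bitwise identical, as noted in the conversation.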
Mar 23 2017, 5:09 PM · Vault
seirl added a comment to T532: Vault API.

Currently the cookers store their bundles in an objstorage. The current design of the objstorage requires having the whole object in RAM, and it would require significant changes to be able to "stream" big objects to the objstorage. This is a big problem for the cooking requests of big repositories.
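
To illustrate what "streaming" an object could look like, here is a minimal sketch that copies a file-like source to a destination chunk by chunk, so that memory use is bounded by the chunk size rather than the object size. `store_stream` and `CHUNK_SIZE` are hypothetical; this is not the objstorage API.

```python
import hashlib
import io

CHUNK_SIZE = 64 * 1024  # assumed chunk size, in bytes


def store_stream(source, destination, chunk_size=CHUNK_SIZE):
    """Copy `source` to `destination` in chunks.

    Returns (sha1 hex digest, total size), both computed incrementally.
    """
    digest = hashlib.sha1()
    size = 0
    while True:
        chunk = source.read(chunk_size)
        if not chunk:
            break
        digest.update(chunk)
        destination.write(chunk)
        size += len(chunk)
    return digest.hexdigest(), size
```

One consequence of this shape is that the checksum is only known once the last chunk has been written, so a content-addressed store would have to write to a temporary location first and rename afterwards.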

Mar 23 2017, 4:42 PM · Vault

Mar 6 2017

zack added a comment to T698: Migrate the content store to a new (internal) primary key scheme.

Ack on the principle. But noting down a caveat for use case (3).

Mar 6 2017, 3:01 PM · Object storage, Storage manager
zack updated the task description for T698: Migrate the content store to a new (internal) primary key scheme.
Mar 6 2017, 2:56 PM · Object storage, Storage manager

Mar 3 2017

olasd created T698: Migrate the content store to a new (internal) primary key scheme.
Mar 3 2017, 3:37 PM · Object storage, Storage manager
ardumont closed D185: storage: open content_update endpoint to permit update on content rows by committing rDSTO96c0a217f1c7: storage: open content_update endpoint.
Mar 3 2017, 10:26 AM · Storage manager
ardumont added a comment to D185: storage: open content_update endpoint to permit update on content rows.

There are limits to the current implementation as per the todo description in the code (swh.storage.content_update).

Mar 3 2017, 10:17 AM · Storage manager

Mar 2 2017

ardumont updated the diff for D185: storage: open content_update endpoint to permit update on content rows.

Rebased to master

Mar 2 2017, 6:47 PM · Storage manager
ardumont added inline comments to D185: storage: open content_update endpoint to permit update on content rows.
Mar 2 2017, 3:08 PM · Storage manager
ardumont updated the diff for D185: storage: open content_update endpoint to permit update on content rows.

fixup some bad sql comments commit

Mar 2 2017, 2:54 PM · Storage manager
ardumont updated the diff for D185: storage: open content_update endpoint to permit update on content rows.
  • storage.content_update: Simplify sql update implementation
  • storage.content_update: Simplify sql update implementation
  • storage.content_update: Move altering schema tests in their own class
Mar 2 2017, 2:52 PM · Storage manager
ardumont added a comment to D185: storage: open content_update endpoint to permit update on content rows.

I didn't dive deep into the docstring, but at least I gave a stab at the SQL query.

Mar 2 2017, 1:56 PM · Storage manager
ardumont added inline comments to D185: storage: open content_update endpoint to permit update on content rows.
Mar 2 2017, 11:44 AM · Storage manager
olasd added a comment to D185: storage: open content_update endpoint to permit update on content rows.

I didn't dive deep into the docstring, but at least I gave a stab at the SQL query.

Mar 2 2017, 11:25 AM · Storage manager
ardumont added a project to D185: storage: open content_update endpoint to permit update on content rows: Storage manager.
Mar 2 2017, 11:09 AM · Storage manager

Feb 24 2017

ardumont added a comment to T494: swh-journal: archiver-client: Keep archiver table in sync with new contents.

I have a working POC for this which uses swh-journal as basis.

Feb 24 2017, 6:11 PM · Journal, Restricted Project, Storage manager
ardumont claimed T494: swh-journal: archiver-client: Keep archiver table in sync with new contents.
Feb 24 2017, 11:31 AM · Journal, Restricted Project, Storage manager

Feb 22 2017

ardumont added a comment to T49: DB schema: add missing unicity constraint on origin (type, url).

My current thinking on the general topic of "what are origins for distributions/package manager environments" is that in those contexts an origin should be a pair <distributor, package>. So, for instance, <pypi, django>, or <debian, ocaml>.

Feb 22 2017, 9:36 AM · Restricted Project, Storage manager
zack added a comment to T49: DB schema: add missing unicity constraint on origin (type, url).
In T49#12329, @ardumont wrote:

There exist some duplicated origins (well, at least regarding the loader-tar's origins):

softwareheritage=> select * from origin where type='ftp' limit 10;
   id    | type |                      url                       | lister | project
---------+------+------------------------------------------------+--------+---------
 4423668 | ftp  | rsync://ftp.gnu.org/gnu/3dldf                  |        |
 4423671 | ftp  | rsync://ftp.gnu.org/gnu/3dldf                  |        |
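
A small sketch of how such duplicates could be listed, using an in-memory SQLite table as a stand-in for the real PostgreSQL `origin` table. The two 'ftp' rows come from the output above; the 'git' row and the reduced three-column schema are made up for the example.

```python
import sqlite3


def duplicated_origins(connection):
    """Return (type, url, copies) for every (type, url) pair seen more than once."""
    query = """
        SELECT type, url, COUNT(*) AS copies
        FROM origin
        GROUP BY type, url
        HAVING COUNT(*) > 1
        ORDER BY type, url
    """
    return connection.execute(query).fetchall()


connection = sqlite3.connect(":memory:")
connection.execute(
    "CREATE TABLE origin (id INTEGER PRIMARY KEY, type TEXT, url TEXT)"
)
connection.executemany(
    "INSERT INTO origin VALUES (?, ?, ?)",
    [
        (4423668, "ftp", "rsync://ftp.gnu.org/gnu/3dldf"),
        (4423671, "ftp", "rsync://ftp.gnu.org/gnu/3dldf"),
        (1, "git", "https://example.org/repo.git"),  # made-up row
    ],
)
```

Rows returned by this query are exactly the ones that would violate a unicity constraint on (type, url), so it has to come back empty before such a constraint can be added.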
Feb 22 2017, 9:15 AM · Restricted Project, Storage manager
ardumont added a comment to T49: DB schema: add missing unicity constraint on origin (type, url).

There exist some duplicated origins (well, at least regarding the loader-tar's origins):

Feb 22 2017, 8:56 AM · Restricted Project, Storage manager

Feb 18 2017

olasd added a comment to T687: swh-storage: consider preventing bogus permission values.
softwareheritage=> select distinct(perms) from directory_entry_file;
 perms  
--------
  33200
  33248
  33276
  33261
 100644
  16877
 120000
  33152
  32768
  33188
  33216
  33225
  33060
      0
  40960
 295332
  33268
 100755
  33184
  33252
  33272
  33279
  33196
  33277
  33189
  16888
  16895
  33260
  33256
  33204
  16893
  41471
  16832
  33206
(34 rows)
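
A hedged sketch for sorting values like the ones above into rough groups. It assumes the canonical set is the usual git tree entry modes and that values such as 100644 are octal digit strings stored as decimal integers (33188 is 0o100644). The task does not state which values it considers bogus, so `CANONICAL_MODES` and `classify_perms` are illustrative assumptions only.

```python
# Assumed canonical modes: regular file, executable, symlink, directory, submodule.
CANONICAL_MODES = {0o100644, 0o100755, 0o120000, 0o040000, 0o160000}


def classify_perms(value):
    """Classify one integer from the perms column."""
    if value in CANONICAL_MODES:
        return "canonical"
    try:
        # Reinterpret the decimal digits as an octal literal, e.g. 100644 -> 0o100644.
        if int(str(value), 8) in CANONICAL_MODES:
            return "octal-as-decimal"
    except ValueError:
        pass  # the digits contain 8 or 9, so this cannot be an octal literal
    return "other"
```

Under these assumptions 33188 and 40960 are canonical, 100644, 100755 and 120000 look like octal digits stored as decimal, and values such as 16877, 0 or 295332 fall into the remaining group that would need a closer look.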

....

Feb 18 2017, 12:16 PM · Storage manager, Restricted Project

Feb 17 2017

olasd created T687: swh-storage: consider preventing bogus permission values.
Feb 17 2017, 7:23 PM · Storage manager, Restricted Project

Feb 15 2017

olasd added a comment to T75: Check integrity of directories, revisions, and releases.
In T75#12186, @olasd wrote:

After all those fixes, we were down to 8911 releases with improper checksums, all of them synthetic (from swh-loader-tar).

Their checksums were computed with a wrong algorithm (appending a newline to the stored message, and treating an integral timestamp as a floating point value), and have now been fixed.

All releases should now have a proper identifier.
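
To show why the two mistakes mentioned above change the identifier, here is a simplified sketch that assumes release identifiers are computed like git tag object hashes. The manifest layout in `tag_like_id` is a reduced approximation, and the target, name, author and date values are all made up.

```python
import hashlib


def tag_like_id(target_hex, name, author, timestamp, offset, message):
    """SHA-1 over a git-tag-like manifest (simplified layout)."""
    body = "object {}\ntype commit\ntag {}\ntagger {} {} {}\n\n{}".format(
        target_hex, name, author, timestamp, offset, message
    ).encode("utf-8")
    header = "tag {}".format(len(body)).encode("ascii") + b"\0"
    return hashlib.sha1(header + body).hexdigest()


target = "0" * 40                      # made-up target identifier
author = "Jane Doe <jane@example.org>"  # made-up author

reference = tag_like_id(target, "v1.0", author, 1487181600, "+0000", "release v1.0")
# Mistake 1: a newline appended to the stored message.
with_newline = tag_like_id(target, "v1.0", author, 1487181600, "+0000", "release v1.0\n")
# Mistake 2: an integral timestamp formatted as a float ("1487181600.0").
with_float = tag_like_id(target, "v1.0", author, 1487181600.0, "+0000", "release v1.0")
```

All three identifiers differ, because a single extra byte in the hashed manifest is enough to produce a different SHA-1.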

Feb 15 2017, 6:07 PM · Archive content, Restricted Project
olasd added a comment to T75: Check integrity of directories, revisions, and releases.

I've looked at the 31k releases with improper checksums.

Feb 15 2017, 6:03 PM · Archive content, Restricted Project

Feb 14 2017

olasd closed T680: fix off-by-1us timestamp in revisions coming from SVN loader as Resolved.

This issue has been solved and the fix deployed everywhere.

Feb 14 2017, 11:21 PM · Storage manager, SVN Loader, Restricted Project
olasd added a comment to T680: fix off-by-1us timestamp in revisions coming from SVN loader.

I just actually stopped the SVN loaders :)

Feb 14 2017, 2:52 PM · Storage manager, SVN Loader, Restricted Project
zack created T680: fix off-by-1us timestamp in revisions coming from SVN loader.
Feb 14 2017, 9:50 AM · Storage manager, SVN Loader, Restricted Project

Feb 13 2017

zack added a project to T672: Keep an up to date count of the number of objects in each archive: Restricted Project.
Feb 13 2017, 3:33 PM · Restricted Project, Storage manager

Feb 12 2017

zack moved T49: DB schema: add missing unicity constraint on origin (type, url) from Restricted Project Column to Restricted Project Column on the Restricted Project board.
Feb 12 2017, 6:40 PM · Restricted Project, Storage manager
zack moved T494: swh-journal: archiver-client: Keep archiver table in sync with new contents from Restricted Project Column to Restricted Project Column on the Restricted Project board.
Feb 12 2017, 6:39 PM · Journal, Restricted Project, Storage manager
zack moved T75: Check integrity of directories, revisions, and releases from Restricted Project Column to Restricted Project Column on the Restricted Project board.
Feb 12 2017, 6:38 PM · Archive content, Restricted Project
zack added a project to T49: DB schema: add missing unicity constraint on origin (type, url): Restricted Project.
Feb 12 2017, 6:35 PM · Restricted Project, Storage manager
zack placed T494: swh-journal: archiver-client: Keep archiver table in sync with new contents up for grabs.
Feb 12 2017, 6:17 PM · Journal, Restricted Project, Storage manager
zack lowered the priority of T598: Store content -> revision cache in azure table storage from High to Low.
Feb 12 2017, 6:17 PM · Storage manager