
Object storage
Active · Public

Members

  • This project does not have any members.

Recent Activity

Yesterday

dachary added a comment to T3149: Benchmark software for the object storage.

After writing 1TB in 40 DB (40 * 25GB), the WAL is ~200GB i.e. ~20%:

Sat, May 8, 10:35 AM · Object storage
dachary added a comment to T3149: Benchmark software for the object storage.
$ ansible-playbook -i inventory tests-run.yml && ssh -t $runner direnv exec bench python bench/bench.py --file-count-ro 500 --rw-workers 40 --ro-workers 40 --file-size 50000 --no-warmup
Sat, May 8, 8:46 AM · Object storage

Mon, May 3

dachary closed T3065: Using git to store objects, a subtask of T3054: Scale out object storage design, as Wontfix.
Mon, May 3, 5:49 PM · Roadmap 2021, meta-task, Object storage
dachary closed T3065: Using git to store objects as Wontfix.
Mon, May 3, 5:49 PM · Object storage
dachary added a comment to T3065: Using git to store objects.

While this is very creative, there is no benefit in storing small objects in git for the Software Heritage workload.

Mon, May 3, 5:48 PM · Object storage
dachary closed T3050: Using libcephsqlite to store objects, a subtask of T3054: Scale out object storage design, as Wontfix.
Mon, May 3, 5:47 PM · Roadmap 2021, meta-task, Object storage
dachary closed T3050: Using libcephsqlite to store objects as Wontfix.
Mon, May 3, 5:47 PM · Object storage
dachary added a comment to T3050: Using libcephsqlite to store objects.

There is no need to use Ceph for the Write Storage: PostgreSQL performs well and there is no scaling problem. The size of the Write Storage is limited, by design.

Mon, May 3, 5:47 PM · Object storage
dachary closed T3055: Ceph and immutable & append only storage, a subtask of T3056: Ceph as an object storage, as Wontfix.
Mon, May 3, 5:45 PM · Object storage
dachary closed T3055: Ceph and immutable & append only storage as Wontfix.
Mon, May 3, 5:45 PM · Object storage
dachary added a comment to T3055: Ceph and immutable & append only storage.

This was discussed during the Ceph Developer Summit 2021, and the conclusion was that RADOS is not the place to implement immutable optimizations. RGW is a better fit.

Mon, May 3, 5:45 PM · Object storage
dachary added a comment to T3149: Benchmark software for the object storage.
  • Group the two postgresql NVMe drives in a single logical volume to get more storage: 30 write workers using 100GB Shards require 3TB of postgresql storage
  • Set up a second postgresql server as a standby replica of the master: it may negatively impact the performance of the master cluster and should be included in the benchmark
Mon, May 3, 2:59 PM · Object storage
dachary added a comment to T3149: Benchmark software for the object storage.
$ bench.py --file-count-ro 200 --rw-workers 20 --ro-workers 80 --file-size 50000 --no-warmup
...
WARNING:root:Objects write 5.8K/s
WARNING:root:Bytes write 117.9MB/s
WARNING:root:Objects read 1.3K/s
WARNING:root:Bytes read 100.4MB/s
Mon, May 3, 7:26 AM · Object storage

Sun, May 2

dachary added a comment to T3149: Benchmark software for the object storage.
$ bench.py --file-count-ro 200 --rw-workers 20 --ro-workers 80 --file-size 50000 --rand-ratio 10
...
WARNING:root:Objects write 5.8K/s
WARNING:root:Bytes write 118.4MB/s
WARNING:root:Objects read 12.3K/s
WARNING:root:Bytes read 850.3MB/s
Sun, May 2, 9:41 AM · Object storage

Sat, May 1

dachary added a comment to T3149: Benchmark software for the object storage.

Fixed a race condition that caused postgresql database drops to fail.

Sat, May 1, 5:19 PM · Object storage

Tue, Apr 27

dachary added a comment to T3149: Benchmark software for the object storage.

The rewrite to use processes was trivial and preliminary tests yield the expected results. Most of the time was spent on two problems:

Tue, Apr 27, 2:02 PM · Object storage

Tue, Apr 20

dachary added a comment to T3149: Benchmark software for the object storage.

Struggled most of today because there is a bottleneck when using threads and postgres from a single client. However, when running 4 processes, it performs as expected. The benchmark should be rewritten to use a process pool instead of a thread pool, which should not be too complicated. I tried to add a warmup phase so that all concurrent threads/processes do not start at the same time, but it does not make any visible difference.
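
A minimal sketch of the thread-pool to process-pool switch described above, using Python's standard concurrent.futures module. The names write_worker and run_workers are hypothetical and are not taken from bench.py; the worker body is a placeholder that only reports which process ran it.

```python
import os
from concurrent.futures import ProcessPoolExecutor
from typing import List


def write_worker(worker_id: int) -> int:
    """Hypothetical stand-in for one benchmark writer.

    A real worker would open its own database connection here, because
    connections cannot be shared between processes. This placeholder only
    returns the id of the process that executed it.
    """
    return os.getpid()


def run_workers(worker_count: int) -> List[int]:
    # Each task is dispatched to a separate worker process rather than a thread.
    with ProcessPoolExecutor(max_workers=worker_count) as pool:
        return list(pool.map(write_worker, range(worker_count)))


if __name__ == "__main__":
    pids = run_workers(4)
    print(f"{len(pids)} tasks ran in {len(set(pids))} process(es)")
```

ThreadPoolExecutor and ProcessPoolExecutor share the same interface, so in a sketch like this the switch is mostly a matter of changing the executor class and making sure the worker function and its arguments can be pickled.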

Tue, Apr 20, 9:05 PM · Object storage

Mon, Apr 19

dachary added a comment to T3149: Benchmark software for the object storage.

Completed the tests for the rewrite; it is working. Time to run it on grid5000.

Mon, Apr 19, 2:49 PM · Object storage
olasd claimed T1954: Up-to-date objstorage mirror on S3.

Some partitions have reached the tail of the journal and everything is still running smoothly, yay.

Mon, Apr 19, 12:08 PM · System administration, Object storage
olasd added a subtask for T1954: Up-to-date objstorage mirror on S3: T2003: Content replayer may try to copy objects before they are available from an objstorage.
Mon, Apr 19, 12:07 PM · System administration, Object storage

Sun, Apr 18

dachary added a comment to T3149: Benchmark software for the object storage.

rbd bench on the images created

Sun, Apr 18, 3:15 PM · Object storage

Sat, Apr 17

dachary added a comment to T3149: Benchmark software for the object storage.

There is a 3% space overhead on the RBD data pool. 6TB data, 3TB parity = 9TB. Actual 9.3TB, i.e. ~+3%.
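
The arithmetic behind that figure can be checked with a short sketch. The function name space_overhead is hypothetical; the three sizes are the ones quoted in the comment.

```python
def space_overhead(data_tb: float, parity_tb: float, actual_tb: float) -> float:
    """Relative overhead of the measured usage over data plus parity."""
    expected_tb = data_tb + parity_tb
    return (actual_tb - expected_tb) / expected_tb


# Figures quoted in the comment: 6TB data, 3TB parity, 9.3TB actually used.
overhead = space_overhead(6.0, 3.0, 9.3)
print(f"overhead: {overhead:.1%}")
```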

Sat, Apr 17, 10:19 PM · Object storage
dachary added a comment to T3108: Grid5000 for benchmarking.

https://www.grid5000.fr/w/Grenoble:Network shows the network topology

Sat, Apr 17, 5:26 PM · Object storage
dachary added a comment to T3149: Benchmark software for the object storage.

Complete rewrite to:

Sat, Apr 17, 5:24 PM · Object storage

Fri, Apr 16

vsellier placed T2743: Investigate why logging is inconsistent up for grabs.
Fri, Apr 16, 11:48 AM · Vault, Object storage, Core & foundations

Wed, Apr 14

dachary renamed T3249: Deleting and erasing an object from Object deletion to Deleting and erasing an object.
Wed, Apr 14, 5:37 PM · Object storage
dachary added a subtask for T3054: Scale out object storage design: T3249: Deleting and erasing an object.
Wed, Apr 14, 5:37 PM · Roadmap 2021, meta-task, Object storage
dachary added a parent task for T3249: Deleting and erasing an object: T3054: Scale out object storage design.
Wed, Apr 14, 5:37 PM · Object storage
dachary changed the status of T3249: Deleting and erasing an object from Open to Work in Progress.
Wed, Apr 14, 5:37 PM · Object storage
dachary added a comment to T3149: Benchmark software for the object storage.
  • Add a reader to continuously read from images to simulate a read workload
  • Randomize the payload instead of using easily compressible data (postgres does a good job compressing it, which does not reflect reality)
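
A small sketch of why the payload matters, assuming zlib as a stand-in compressor (PostgreSQL uses its own compression, so the exact ratios will differ). The helper compression_ratio and the 50,000 byte size are illustrative choices, not taken from bench.py.

```python
import os
import zlib


def compression_ratio(payload: bytes) -> float:
    """Compressed size divided by original size; lower means more compressible."""
    return len(zlib.compress(payload)) / len(payload)


size = 50_000
repetitive_payload = b"a" * size      # easily compressible test data
random_payload = os.urandom(size)     # incompressible, closer to a worst case

print(f"repetitive: {compression_ratio(repetitive_payload):.4f}")
print(f"random:     {compression_ratio(random_payload):.4f}")
```

The repetitive payload shrinks to a tiny fraction of its size while the random one does not shrink at all, so a benchmark fed with repetitive data would measure far less storage I/O than one fed with random data.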
Wed, Apr 14, 5:33 PM · Object storage

Mon, Apr 12

olasd added a comment to T1954: Up-to-date objstorage mirror on S3.

The process has been restarted and is well underway (we have 800 million objects left to copy, at around 500 ops/s, so the ETA until reaching the tail of the log is around 3 weeks now).
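
The estimate can be reproduced with a short sketch, assuming that "500 ops" means 500 copy operations per second and that the rate stays constant. The function name eta_days is hypothetical.

```python
SECONDS_PER_DAY = 24 * 60 * 60


def eta_days(objects_left: int, ops_per_second: float) -> float:
    """Days needed to copy the remaining objects at a constant rate."""
    return objects_left / ops_per_second / SECONDS_PER_DAY


# Figures quoted in the comment: 800 million objects left, about 500 operations per second (assumed unit).
days = eta_days(800_000_000, 500)
print(f"{days:.1f} days (~{days / 7:.1f} weeks)")
```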

Mon, Apr 12, 5:11 PM · System administration, Object storage
olasd claimed T3085: Complete and updated copy of the archive on S3 (objects+graph).
Mon, Apr 12, 5:04 PM · meta-task, Roadmap 2021, System administration, Object storage
dachary added a comment to T3149: Benchmark software for the object storage.
  • bench.py --file-count-ro 20 --rw-workers 20 --packer-workers 20 --file-size 1024 --fake-ro yields WARNING:root:Objects write 17.7K/s
  • bench.py --file-count-ro 40 --rw-workers 40 --packer-workers 20 --file-size 1024 --fake-ro yields WARNING:root:Objects write 13.8K/s
Mon, Apr 12, 9:21 AM · Object storage

Apr 7 2021

dachary added a comment to T3149: Benchmark software for the object storage.

The benchmark was moved to a temporary repository for convenience (easier than uploading here every time). https://git.easter-eggs.org/biceps/biceps

Apr 7 2021, 6:25 PM · Object storage

Apr 6 2021

dachary closed T3210: Ceph Quincy CDS & immutable objects as Resolved.
Apr 6 2021, 11:33 PM · Object storage
dachary closed T3210: Ceph Quincy CDS & immutable objects, a subtask of T3054: Scale out object storage design, as Resolved.
Apr 6 2021, 11:33 PM · Roadmap 2021, meta-task, Object storage
dachary updated the task description for T3054: Scale out object storage design.
Apr 6 2021, 11:33 PM · Roadmap 2021, meta-task, Object storage
dachary added a comment to T3210: Ceph Quincy CDS & immutable objects.

Takeaways from the session:

Apr 6 2021, 6:35 PM · Object storage
dachary updated the task description for T3210: Ceph Quincy CDS & immutable objects.
Apr 6 2021, 1:46 PM · Object storage
dachary updated the task description for T3210: Ceph Quincy CDS & immutable objects.
Apr 6 2021, 1:46 PM · Object storage
dachary added a subtask for T3054: Scale out object storage design: T3210: Ceph Quincy CDS & immutable objects.
Apr 6 2021, 1:39 PM · Roadmap 2021, meta-task, Object storage
dachary added a parent task for T3210: Ceph Quincy CDS & immutable objects: T3054: Scale out object storage design.
Apr 6 2021, 1:39 PM · Object storage
dachary changed the status of T3210: Ceph Quincy CDS & immutable objects from Open to Work in Progress.
Apr 6 2021, 1:33 PM · Object storage
vlorentz added a comment to T1487: Add a public API endpoint to retrieve a set of files with a given name.

@KShivendu The linked script is a start. As it is, it requires direct access to the DB; so you need to create abstractions for it in swh-storage and swh-web

Apr 6 2021, 12:50 PM · Easy hack, Storage manager, Object storage

Apr 5 2021

KShivendu added a comment to T1487: Add a public API endpoint to retrieve a set of files with a given name.

Hi guys. Any pointers on where to start?

Apr 5 2021, 1:57 PM · Easy hack, Storage manager, Object storage

Mar 30 2021

dachary renamed T3186: Ceph Sepia lab for performance testing from Ceph Sepia lab for testing to Ceph Sepia lab for performance testing.
Mar 30 2021, 10:14 AM · Object storage
dachary added a subtask for T3054: Scale out object storage design: T3186: Ceph Sepia lab for performance testing.
Mar 30 2021, 10:13 AM · Roadmap 2021, meta-task, Object storage
dachary added a parent task for T3186: Ceph Sepia lab for performance testing: T3054: Scale out object storage design.
Mar 30 2021, 10:13 AM · Object storage
dachary changed the status of T3186: Ceph Sepia lab for performance testing from Open to Work in Progress.
Mar 30 2021, 10:13 AM · Object storage

Mar 26 2021

dachary updated the task description for T3054: Scale out object storage design.
Mar 26 2021, 11:52 PM · Roadmap 2021, meta-task, Object storage