In T3127#66631, @rdicosmo wrote: Nice to see this moving forward!
These entries in the counter log look suspicious, though; they are not origins:
b'atlassian@bitbucket.org' 2
b'taylorhakes@github.com' 2
b'bunnyhero@bitbucket.org' 1
b'dtrebbien@bitbucket.org' 1
b'eldargab@github.com' 1
b'git@github.com' 1
b'schierlm@git.code.sf.net' 1
b'tomakehurst@github.com' 1
b'wenshao@github.com' 1
b'zimbra-mirror@bitbucket.org' 1
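These keys look like the `user@host` part of SSH-style Git URLs (presumably something like `git@github.com:owner/repo.git`) rather than origin URLs. Below is a minimal sketch for flagging such keys when auditing the counter log. It assumes that valid keys are absolute URLs with both a scheme and a host. The function name and the well-formed sample URL are hypothetical.

```python
from urllib.parse import urlparse

def is_suspicious_counter_key(key: bytes) -> bool:
    """True if a counter key does not look like an origin URL.

    Assumption: well-formed origin keys carry a scheme and a host, e.g.
    b'https://github.com/owner/repo'; a bare b'git@github.com' has neither.
    """
    parsed = urlparse(key.decode("utf-8", errors="replace"))
    return not (parsed.scheme and parsed.netloc)

# Two keys from the log above plus one hypothetical well-formed origin URL.
sample = [b"atlassian@bitbucket.org", b"git@github.com", b"https://github.com/owner/repo"]
print([k for k in sample if is_suspicious_counter_key(k)])
# → [b'atlassian@bitbucket.org', b'git@github.com']
```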
Jun 22 2021
Nice to see this moving forward!
Regarding this, to ease the mapping between a lister and an instance name, we may want to rework the instance names in the scheduler model (listers table) so that the value is actually the netloc of the origin.
Great work! Awesome.
After some analysis, the data we need to properly implement this are:
- the set of lister names and their instance names, in order to organize origins by forge type (gitlab, cgit, sourceforge, ...)
- a precise or estimated count for the origins listed by a given lister instance
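Both pieces of data could be derived from listed origins as in the minimal sketch below. It assumes a hypothetical iterable of `(lister_name, origin_url)` pairs. It uses the origin URL's netloc as the instance name, following the proposal above; this is not the current scheduler schema. The sample URLs are made up for illustration.

```python
from collections import Counter
from typing import Iterable, Tuple
from urllib.parse import urlparse

def count_origins_per_instance(listed: Iterable[Tuple[str, str]]) -> Counter:
    """Count listed origins per (lister name, instance name).

    The instance name is assumed to be the origin URL's netloc.
    """
    counts: Counter = Counter()
    for lister_name, origin_url in listed:
        netloc = urlparse(origin_url).netloc
        if netloc:  # ignore values that are not absolute URLs
            counts[(lister_name, netloc)] += 1
    return counts

# Hypothetical sample data.
listed = [
    ("gitlab", "https://gitlab.com/a/b"),
    ("gitlab", "https://gitlab.com/c/d"),
    ("gitlab", "https://framagit.org/e/f"),
    ("cgit", "https://git.kernel.org/pub/scm/git/git.git"),
]
counts = count_origins_per_instance(listed)
print(sorted({lister for lister, _ in counts}))  # → ['cgit', 'gitlab']
print(counts[("gitlab", "gitlab.com")])          # → 2
```

The keys of the resulting counter give the set of (lister, instance) names, and the values give the per-instance origin counts.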
A table with the possible node counts relative to the replication factor was added to the HedgeDoc document: https://hedgedoc.softwareheritage.org/m2MBUViUQl2r9dwcq3-_Nw?both
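The per-node figures behind such a table follow from standard Cassandra replication arithmetic:

- Each of N nodes owns roughly RF/N of the data.
- A QUORUM needs floor(RF/2)+1 replicas.
- Any RF minus quorum nodes can be down while every token range still reaches QUORUM.

The back-of-the-envelope sketch below assumes perfectly balanced token ownership, which real vnode clusters only approximate. The function name and the 1000 GB input are illustrative.

```python
def cassandra_sizing(total_data_gb: float, nodes: int, rf: int) -> dict:
    """Rough per-node figures for a Cassandra keyspace.

    Assumes perfectly balanced token ownership (real vnode clusters deviate)
    and 1 <= rf <= nodes.
    """
    if not 1 <= rf <= nodes:
        raise ValueError("replication factor must be between 1 and the node count")
    quorum = rf // 2 + 1
    return {
        "owns_pct": 100.0 * rf / nodes,               # effective ownership per node
        "data_per_node_gb": total_data_gb * rf / nodes,
        "quorum": quorum,                             # replicas needed for QUORUM
        "tolerated_failures": rf - quorum,            # any such set of nodes may be down
    }

# Illustrative grid for 1000 GB of data.
for rf in (1, 2, 3):
    for nodes in (3, 4, 6):
        s = cassandra_sizing(1000.0, nodes, rf)
        print(f"RF={rf} nodes={nodes} owns={s['owns_pct']:.1f}% "
              f"per-node={s['data_per_node_gb']:.0f} GB tolerated={s['tolerated_failures']}")
```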
Jun 18 2021
Several tests were run with Cassandra nodes on the parasilo cluster [1].
The configuration was kept identical across runs to calibrate them:
- ZFS is used to manage the datasets
- the commit logs are on the 200 GB SSD
- the data are on the four 600 GB HDDs configured in RAID0
- default memory configuration (8 GB / default GC, not G1)
- Cassandra configuration: [2]
Landed, but some more changes to the build pipeline need to happen.
Currently looking into it...
Landed and deployed on the docs site [1]
ardumont closed T3389: Create FAQ in docs for developers, a subtask of T3387: Create FAQ in docs, as Resolved.
Jun 17 2021
ardumont added a revision to T3388: Create FAQ in docs for users: D5888: users faq: Define faq with categories.
moranegg renamed T3388: Create FAQ in docs for users from Create FAQ for users to Create FAQ in docs for users.
Jun 16 2021
Some notes on how to perform common actions with Cassandra: https://hedgedoc.softwareheritage.org/m2MBUViUQl2r9dwcq3-_Nw
Jun 15 2021
The environment can be stopped and rebuilt as long as the disks remain reserved on the servers.
vsellier updated the task description for T3357: Perform some tests of the cassandra storage on Grid5000.
Jun 11 2021
moranegg triaged T3377: Add icon/button in moderation view to go to deposit in new tab as Normal priority.
moranegg triaged T3376: Visualize metadata of a deposit in the admin (moderation) view as Normal priority.
Jun 10 2021
Status of the automation:
- Cassandra nodes are OK: OS installation, ZFS configuration according to the defined environment (except for a problem during the first initialization with new disks), startup, and cluster configuration
- swh-storage node is OK: OS installation, gunicorn/swh-storage installation, and startup
- Cassandra database initialization:
root@parasilo-3:~# nodetool status
Datacenter: datacenter1
=======================
Status=Up/Down
|/ State=Normal/Leaving/Joining/Moving
--  Address      Load        Tokens  Owns (effective)  Host ID                               Rack
UN  172.16.97.3  78.85 KiB   256     31.6%             49d46dd8-4640-45eb-9d4c-b6b16fc954ab  rack1
UN  172.16.97.5  105.45 KiB  256     26.0%             47e99bb4-4846-4e03-a06c-53ea2862172d  rack1
UN  172.16.97.4  98.35 KiB   256     18.1%             e2aeff29-c89a-4c7a-9352-77aaf78e91b3  rack1
UN  172.16.97.2  78.85 KiB   256     24.3%             edd1b72b-4c35-44bd-b7e5-316f41a156c4  rack1
root@parasilo-3:~# cqlsh 172.16.97.3
Connected to swh-storage at 172.16.97.3:9042
[cqlsh 6.0.0 | Cassandra 4.0 | CQL spec 3.4.5 | Native protocol v5]
cqlsh> desc KEYSPACES
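To automate this check, the node lines of `nodetool status` can be parsed to confirm that every node is Up/Normal (`UN`). The sketch below assumes the column layout shown above: state, address, load, tokens, effective ownership, host ID, rack. The regular expression and function name are hypothetical. They may need adjusting for other Cassandra versions, or when ownership is reported as `?`.

```python
import re
from typing import Dict, List

# State code ([U]p/[D]own + [N]ormal/[L]eaving/[J]oining/[M]oving), address,
# load (value and unit), token count, effective ownership, host ID, rack.
NODE_LINE = re.compile(
    r"^(?P<state>[UD][NLJM])\s+(?P<address>\S+)\s+(?P<load>[\d.]+ \S+)\s+"
    r"(?P<tokens>\d+)\s+(?P<owns>[\d.]+)%\s+(?P<host_id>\S+)\s+(?P<rack>\S+)$"
)

def parse_nodetool_status(output: str) -> List[Dict[str, str]]:
    """Return one dict per node line; header lines are ignored."""
    return [m.groupdict() for line in output.splitlines()
            if (m := NODE_LINE.match(line.strip()))]

SAMPLE = """\
Datacenter: datacenter1
=======================
Status=Up/Down
|/ State=Normal/Leaving/Joining/Moving
--  Address      Load        Tokens  Owns (effective)  Host ID                               Rack
UN  172.16.97.3  78.85 KiB   256     31.6%             49d46dd8-4640-45eb-9d4c-b6b16fc954ab  rack1
UN  172.16.97.5  105.45 KiB  256     26.0%             47e99bb4-4846-4e03-a06c-53ea2862172d  rack1
UN  172.16.97.4  98.35 KiB   256     18.1%             e2aeff29-c89a-4c7a-9352-77aaf78e91b3  rack1
UN  172.16.97.2  78.85 KiB   256     24.3%             edd1b72b-4c35-44bd-b7e5-316f41a156c4  rack1
"""
nodes = parse_nodetool_status(SAMPLE)
print(len(nodes), all(n["state"] == "UN" for n in nodes))  # → 4 True
```

In a script, the text could come from `subprocess.run(["nodetool", "status"], capture_output=True, text=True).stdout`. The walrus operator requires Python 3.8 or later.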
Jun 8 2021
ardumont edited projects for T3082: Improve Save Code Now handling, added: System administration; removed System administrators.
Jun 3 2021
I experimented with Grid5000 to understand how jobs work and how to initialize the reserved nodes.
ardumont moved T3357: Perform some tests of the cassandra storage on Grid5000 from Backlog to in-progress on the System administration board.
Jun 2 2021
vsellier changed the status of T3357: Perform some tests of the cassandra storage on Grid5000 from Open to Work in Progress.
May 27 2021
great ;)
As an example, the Save Code Now queue statistics are now displayed on the status.io page [1]. The data are refreshed every 5 minutes.
May 26 2021
vlorentz closed T2602: Investigate how to upgrade the schema of the Cassandra storage, a subtask of T2214: Scale-out graph and database storage in production, as Resolved.
May 25 2021
Metrics can easily be pushed to the status page.
A simple PoC for the Save Code Now requests is available here: https://forge.softwareheritage.org/source/snippets/browse/master/sysadmin/status.io/update_metrics.py
May 20 2021
From the status.swh.org point of view, status.io provides an API endpoint to push metrics. It should be possible to add some metrics (up to 10 with our plan) to expose the behavior of the platform (daily, weekly and monthly statistics).
As a first step, we could expose the number of pending Save Code Now requests and the number of origin visits to have some live data. An example of a status page with metrics: https://status.docker.com/
I'm working on a code snippet to test the integration feasibility/complexity.
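The sketch below covers only the collection side of such a snippet and is deliberately independent of the status.io API. The record shape (`save_request_status`), the metric name, and the `fetch_requests`/`push_metric` callables are hypothetical placeholders. The actual push to status.io is left to `push_metric` and is not shown.

```python
import time
from typing import Callable, Iterable, List, Mapping

REFRESH_SECONDS = 5 * 60  # refresh period, matching the 5 minutes used for the status page

def count_pending(requests: Iterable[Mapping[str, str]]) -> int:
    """Count Save Code Now requests still pending (hypothetical record shape)."""
    return sum(1 for r in requests if r.get("save_request_status") == "pending")

def push_once(
    fetch_requests: Callable[[], Iterable[Mapping[str, str]]],
    push_metric: Callable[[str, int, int], None],
    now: Callable[[], float] = time.time,
) -> int:
    """Compute one data point and hand it to push_metric(name, timestamp, value)."""
    value = count_pending(fetch_requests())
    push_metric("save_code_now_pending", int(now()), value)
    return value

def run_forever(fetch_requests, push_metric, interval: int = REFRESH_SECONDS) -> None:
    """Push a data point every `interval` seconds (never returns)."""
    while True:
        push_once(fetch_requests, push_metric)
        time.sleep(interval)

# Usage with in-memory stand-ins instead of the real web API and status.io:
points: List[tuple] = []
fake_requests = [
    {"save_request_status": "pending"},
    {"save_request_status": "accepted"},
    {"save_request_status": "pending"},
]
push_once(lambda: fake_requests, lambda *p: points.append(p), now=lambda: 1_621_500_000)
print(points)  # → [('save_code_now_pending', 1621500000, 2)]
```

Injecting the fetch and push callables keeps the metric logic testable without network access.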
vsellier changed the status of T3129: Reliable monitoring of services: for users and for admins from Open to Work in Progress.
May 17 2021
May 15 2021
dachary removed a subtask for T3054: Scale out object storage design: T3327: Hardware architecture for the object storage.
dachary added a subtask for T3054: Scale out object storage design: T3327: Hardware architecture for the object storage.
May 10 2021
anlambert closed T3272: Authenticated users should be able to browse their save code now requests, a subtask of T3082: Improve Save Code Now handling, as Resolved.
vlorentz changed the status of T843: Vault: Add a "git bare" tarball cooker, a subtask of T3096: Efficient and reliable download via the Vault, from Open to Work in Progress.
vlorentz changed the status of T3096: Efficient and reliable download via the Vault from Open to Work in Progress.
vlorentz moved T3096: Efficient and reliable download via the Vault from Backlog to Work in progress on the Roadmap 2021 board.
May 8 2021
zack updated the task description for T3316: SWHID v2: determine binary-to-text encoding for checksum part.
zack triaged T3316: SWHID v2: determine binary-to-text encoding for checksum part as Normal priority.
rdicosmo moved T2194: Archive Integration (Web API) from Backlog to Work in progress on the Roadmap 2021 board.
rdicosmo moved T3118: Documentation for users and ambassadors from Backlog to Work in progress on the Roadmap 2021 board.
rdicosmo moved T2912: Next generation archive counters from Pending validation to Done on the Roadmap 2021 board.
rdicosmo moved T3082: Improve Save Code Now handling from Backlog to Work in progress on the Roadmap 2021 board.
May 3 2021
dachary closed T3065: Using git to store objects, a subtask of T3054: Scale out object storage design, as Wontfix.
dachary closed T3050: Using libcephsqlite to store objects, a subtask of T3054: Scale out object storage design, as Wontfix.
anlambert changed the status of T3272: Authenticated users should be able to browse their save code now requests, a subtask of T3082: Improve Save Code Now handling, from Open to Work in Progress.
Apr 28 2021
> I also recall now that Vincent added a graph [1] fairly recently.
This was to try to compare the counter approaches.
So that's still using the old plumbing, at least for that part.
In T2912#64208, @ardumont wrote: What about the old counter pipeline? Has it been decommissioned already?
I don't think so, as I do not recall seeing any cleanup diffs.
In any case, it's not part of what's currently deployed (so there's no risk of data mangling, if that's part of the concern).
Apr 27 2021
moranegg updated the task description for T2624: Create strategy for documentation with a map or a full table of content.
moranegg changed the status of T3128: Improve deposit integration, management and display from Open to Work in Progress.
Apr 26 2021
What about the old counter pipeline? Has it been decommissioned already?
In T2912#64174, @ardumont wrote: Last bits deployed on archive.s.o (including the author counters).
Last bits deployed on archive.s.o (including the author counters).
ardumont added a comment to T3213: Enable save code now of software source code archives for specific users.
One or two concerns remain before actually acting on it.
rdicosmo moved T2912: Next generation archive counters from Work in progress to Pending validation on the Roadmap 2021 board.
zack added a comment to T3087: Implement support for takedown notices (infra, admin tools, workflow).
In T3087#63887, @rdicosmo wrote: In T3087#63791, @douardda wrote: So what about exports of the archive available on git-annex?
Apr 24 2021
ardumont added a comment to T3213: Enable save code now of software source code archives for specific users.
If I understand correctly, url+time+length+filename+version are used in a heuristic to avoid (down)loading something that is already ingested over and over again.
rdicosmo added a comment to T3213: Enable save code now of software source code archives for specific users.
In T3213#64118, @ardumont wrote: I recall it's part of creating a primary key (of sorts) composed of all the properties mentioned above (when the artifact does not already provide hashes).
This is to avoid fetching again things that were already fetched.
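Here is a minimal sketch of that heuristic. It assumes that artifacts are dicts shaped like the lister output quoted further down in this thread, and that the keys of previously ingested artifacts are available as a set. The function names are hypothetical, and this is not the loader's actual code.

```python
from typing import Any, Iterable, List, Mapping, Set, Tuple

# Properties combined into the key, per the discussion above.
KEY_FIELDS = ("url", "time", "length", "filename", "version")

ArtifactKey = Tuple[Any, ...]

def artifact_key(artifact: Mapping[str, Any]) -> ArtifactKey:
    """Primary key (of sorts) for an artifact that provides no checksums."""
    return tuple(artifact.get(field) for field in KEY_FIELDS)

def artifacts_to_fetch(
    artifacts: Iterable[Mapping[str, Any]], known_keys: Set[ArtifactKey]
) -> List[Mapping[str, Any]]:
    """Keep only artifacts whose key was not seen in a previous visit."""
    return [a for a in artifacts if artifact_key(a) not in known_keys]

elib = {
    "url": "https://ftp.gnu.org/old-gnu/emacs/elib-1.0.tar.gz",
    "time": "1995-12-12T08:00:00+00:00",
    "length": 58335,
    "version": "1.0",
    "filename": "elib-1.0.tar.gz",
}
known = {artifact_key(elib)}
print(artifacts_to_fetch([elib], known))  # → []
```

If any of these properties changes, for instance a different `length` on the server, the key changes and the artifact is fetched again.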
ardumont added a comment to T3213: Enable save code now of software source code archives for specific users.
(submitted too early)
rdicosmo added a comment to T3213: Enable save code now of software source code archives for specific users.
In T3213#64001, @ardumont wrote: Currently users only provide a URL in Save Code Now, but the loader expects a bit more [1] (recall that it's the lister which actually provides those). The loader expects to be provided with a list of artifacts (possibly only one in our case). Still, such artifacts are described through the following:
- artifact url
- time
- length (could be derived from the URL by querying the server, but not all servers provide it...)
- version (could also be derived from the URL with a heuristic, but that's regexp-hell-ish and error-prone)
- filename (could be derived from the URL without too much risk, I think...)
I gather the Save Code Now UI could be enriched (and displayed according to the chosen visit type), but that becomes more involved for people in general. Another road would be to make some of those properties optional...
Thoughts?
[1]
"url": "https://ftp.gnu.org/old-gnu/emacs/", "artifacts": [{"url": "https://ftp.gnu.org/old-gnu/emacs/elib-1.0.tar.gz", "time": "1995-12-12T08:00:00+00:00", "length": 58335, "version": "1.0", "filename": "elib-1.0.tar.gz", }, ... ] ...
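If Save Code Now only receives the artifact URL, `filename` and a candidate `version` could be derived from it as sketched below. This is a hedged illustration: the suffix list and the version regular expression are assumptions. As noted above, version guessing is error-prone; this regex returns `None` for a name like `gcc-4.8.0-rc1`, for example.

```python
import posixpath
import re
from typing import Optional
from urllib.parse import unquote, urlparse

# Archive suffixes stripped before guessing the version (non-exhaustive assumption).
ARCHIVE_SUFFIXES = (".tar.gz", ".tar.bz2", ".tar.xz", ".tgz", ".tar", ".zip")

# "<name>-<version>" where the version starts with a digit, e.g. "elib-1.0" -> "1.0".
VERSION_RE = re.compile(r"^.+?-(?P<version>\d+(?:\.\d+)*[\w.]*)$")

def filename_from_url(url: str) -> str:
    """Last path component of the URL, percent-decoded."""
    return posixpath.basename(unquote(urlparse(url).path))

def guess_version(filename: str) -> Optional[str]:
    """Heuristic version guess; returns None when the name does not match."""
    stem = filename
    for suffix in ARCHIVE_SUFFIXES:
        if stem.endswith(suffix):
            stem = stem[: -len(suffix)]
            break
    match = VERSION_RE.match(stem)
    return match.group("version") if match else None

url = "https://ftp.gnu.org/old-gnu/emacs/elib-1.0.tar.gz"
name = filename_from_url(url)
print(name, guess_version(name))  # → elib-1.0.tar.gz 1.0
```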
Apr 23 2021
vlorentz moved T2214: Scale-out graph and database storage in production from Backlog to Work in progress on the Roadmap 2021 board.
vsellier added a revision to T2912: Next generation archive counters: D5588: Activate swh-counters on all the webapps.
vsellier closed T3251: Count authors from revisions and releases, a subtask of T2912: Next generation archive counters, as Resolved.
Apr 22 2021
ardumont added a comment to T3213: Enable save code now of software source code archives for specific users.
I stand by what I said regarding the scheduling logic: it's as simple as I described earlier... But...