Paste P163

Average length, variance, and standard deviation of ~42M language-indexed contents

Authored by ardumont on May 29 2017, 7:09 PM.
date: Tue May 29 2017
mean:
```
softwareheritage=> select avg(length) from content_language cl inner join content c on cl.id=c.sha1;
avg
--------------------
26862.385193867011
(1 row)
```
variance:
```
softwareheritage=> select variance(length) from content_language cl inner join content c on cl.id=c.sha1;
variance
-----------------------
125835685708.88915180
(1 row)
```
standard deviation:
```
softwareheritage=> select stddev(length) from content_language cl inner join content c on cl.id=c.sha1;
stddev
-----------------
355008.08759433
(1 row)
```
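As a quick sanity check on the figures above: in PostgreSQL, `variance` and `stddev` are aliases for the sample estimators `var_samp` and `stddev_samp`, so on identical rows the standard deviation should equal the square root of the variance. The sketch below only redoes that arithmetic with the pasted values. The suggested reason for the small gap (rows still being added to `content_language` between the two queries) is an assumption, not something the paste states.

```python
import math

# Values copied from the psql output above.
variance = 125835685708.88915180
stddev = 355008.08759433

# On identical rows, stddev_samp == sqrt(var_samp).
expected_stddev = math.sqrt(variance)
gap = stddev - expected_stddev
relative_gap = gap / stddev

print(f"sqrt(variance)  = {expected_stddev:.2f}")
print(f"reported stddev = {stddev:.2f}")
print(f"gap             = {gap:.2f} ({relative_gap:.4%})")
```

The two figures differ by roughly 275, under 0.1% of the reported value, which would be consistent with the queries having run over slightly different row sets.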
The sha1 and length of each of those contents were extracted and stored in uffizi:/srv/storage/space/lists/content-language-id-size.txt.gz (format: <sha1>\t<length>).
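The same three statistics could in principle be recomputed offline from that export. The following is a minimal sketch, assuming the file holds one `<sha1>\t<length>` record per line with an integer length. The helper names `read_lengths` and `running_stats` are hypothetical, and this is not the snippet referenced below. It uses Welford's online algorithm so the ~42M rows never need to be held in memory, and it returns the sample variance (dividing by n - 1), matching what PostgreSQL's `variance` and `stddev` compute.

```python
import gzip
import math
import sys


def read_lengths(path):
    """Yield the integer length from each '<sha1>\\t<length>' line (assumed format)."""
    with gzip.open(path, "rt") as f:
        for line in f:
            line = line.rstrip("\n")
            if not line:
                continue  # tolerate blank lines
            _sha1, length = line.split("\t")
            yield int(length)


def running_stats(values):
    """Return (count, mean, sample variance, sample stddev) in a single pass."""
    n = 0
    mean = 0.0
    m2 = 0.0  # running sum of squared deviations from the current mean
    for x in values:
        n += 1
        delta = x - mean
        mean += delta / n
        m2 += delta * (x - mean)
    if n < 2:
        raise ValueError("need at least two values for a sample variance")
    variance = m2 / (n - 1)
    return n, mean, variance, math.sqrt(variance)


if __name__ == "__main__" and len(sys.argv) > 1:
    # Hypothetical invocation: python stats.py content-language-id-size.txt.gz
    n, mean, variance, stddev = running_stats(read_lengths(sys.argv[1]))
    print(f"count={n} mean={mean:.6f} variance={variance:.6f} stddev={stddev:.6f}")
```

A single pass keeps memory use constant, and Welford's update avoids the precision loss of accumulating a raw sum of squares over lengths that vary by several orders of magnitude.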
The graph at https://forge.softwareheritage.org/F2250694 was drawn from that file using the snippet at https://forge.softwareheritage.org/rDSNIP49ffa63356d7bcee7ea259381d078a3a87359bed.
