P163
average length, variance, standard deviation on ~42m language indexed contents
Authored by ardumont on May 29 2017, 7:09 PM.
date: Mon May 29 2017
mean:
```
softwareheritage=> select avg(length) from content_language cl inner join content c on cl.id=c.sha1;
avg
--------------------
26862.385193867011
(1 row)
```
variance:
```
softwareheritage=> select variance(length) from content_language cl inner join content c on cl.id=c.sha1;
variance
-----------------------
125835685708.88915180
(1 row)
```
standard deviation:
```
softwareheritage=> select stddev(length) from content_language cl inner join content c on cl.id=c.sha1;
stddev
-----------------
355008.08759433
(1 row)
```
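As a quick consistency check: in PostgreSQL, `variance` and `stddev` are the sample variance and sample standard deviation, so `stddev` should equal the square root of `variance` when both are computed over the same rows. The sketch below only redoes that arithmetic on the two figures reported above. The square root comes out near 354,733, slightly below the reported 355,008 (a gap of roughly 0.08%). One possible explanation, which is an assumption and not something stated in this paste, is that rows were added to `content_language` between the two queries.

```python
import math

# Values copied verbatim from the psql output above.
variance = 125835685708.88915180
reported_stddev = 355008.08759433

# If both queries had seen exactly the same rows, these two would match.
derived_stddev = math.sqrt(variance)
relative_gap = abs(reported_stddev - derived_stddev) / reported_stddev

print(f"sqrt(variance) = {derived_stddev:.2f}")
print(f"relative gap   = {relative_gap:.4%}")
```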
The sha1 and length of those contents were extracted and stored in uffizi:/srv/storage/space/lists/content-language-id-size.txt.gz, in the format `<sha1>\t<length>`.
The graph at https://forge.softwareheritage.org/F2250694 was drawn from that data using the snippet at https://forge.softwareheritage.org/rDSNIP49ffa63356d7bcee7ea259381d078a3a87359bed.
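The same three statistics can in principle be recomputed from the extracted file without going back to the database. The following is a minimal sketch, assuming the file holds one `<sha1>\t<length>` record per line as described above. The function name `length_stats` is hypothetical and this is not the referenced snippet. It uses Welford's online algorithm so that the ~42m rows are streamed without being held in memory, and it returns the sample variance to match what PostgreSQL's `variance` and `stddev` compute.

```python
import gzip
import math


def length_stats(path):
    """Return (count, mean, sample variance, sample stddev) of the length column.

    Assumes a gzip-compressed text file with one "<sha1>\t<length>" record per line.
    """
    count = 0
    mean = 0.0
    m2 = 0.0  # running sum of squared deviations from the current mean (Welford)
    with gzip.open(path, "rt") as lines:
        for line in lines:
            line = line.rstrip("\n")
            if not line:
                continue  # tolerate blank lines
            _sha1, length = line.split("\t")
            x = int(length)
            count += 1
            delta = x - mean
            mean += delta / count
            m2 += delta * (x - mean)
    # Sample variance (divide by n - 1), undefined for fewer than two rows.
    variance = m2 / (count - 1) if count > 1 else float("nan")
    return count, mean, variance, math.sqrt(variance)
```

With a local copy of the file, a call such as `length_stats("content-language-id-size.txt.gz")` would return the four values in one pass. Small differences from the psql figures are to be expected from floating-point accumulation.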
ardumont mentioned this in T722: Improve language indexer performance (May 30 2017, 11:40 AM).