Software Heritage

Puppetize the elasticsearch closing index script
Closed, Public

Authored by ardumont on Apr 12 2021, 1:35 PM.

Details

Summary

This installs the cron on the logstash instance.

Related to T3221

Test Plan

octocatalog-diff and vagrant are happy:

bin/octocatalog-diff --octocatalog-diff-args --no-truncate-details --to staging logstash0

Found host logstash0.internal.softwareheritage.org
...
*** Running octocatalog-diff on host logstash0.internal.softwareheritage.org
I, [2021-04-12T13:34:31.298208 #12191]  INFO -- : Catalogs compiled for logstash0.internal.softwareheritage.org
I, [2021-04-12T13:34:32.433237 #12191]  INFO -- : Diffs computed for logstash0.internal.softwareheritage.org
diff origin/production/logstash0.internal.softwareheritage.org current/logstash0.internal.softwareheritage.org
*******************************************
+ Concat_file[profile::cron::elasticsearch] =>
   parameters =>
      "group": "root"
      "mode": "0644"
      "owner": "root"
      "path": "/etc/puppet-cron.d/elasticsearch"
      "tag": "profile::cron::elasticsearch"
*******************************************
+ Concat_fragment[profile::cron::elasticsearch-close-index] =>
   parameters =>
      "order": "10"
      "tag": "profile::cron::elasticsearch"
      "target": "profile::cron::elasticsearch"
      "content": >>>
# Cron snippet elasticsearch-close-index
30 5 * * * root /usr/local/bin/elasticsearch_close_index.py --host esnode2.internal.softwareheritage.org --host esnode3.internal.softwareheritage.org --host esnode1.internal.softwareheritage.org --timeout 1200
<<<
*******************************************
+ Concat_fragment[profile::cron::elasticsearch::_header] =>
   parameters =>
      "order": "00"
      "tag": "profile::cron::elasticsearch"
      "target": "profile::cron::elasticsearch"
      "content": >>>
# Managed by puppet (module profile::cron), manual changes will be lost
<<<
*******************************************
+ File[/etc/cron.d/puppet-elasticsearch] =>
   parameters =>
      "ensure": "link"
      "target": "/etc/puppet-cron.d/elasticsearch"
*******************************************
+ File[/usr/local/bin/elasticsearch_close_index.py] =>
   parameters =>
      "ensure": "present"
      "group": "root"
      "mode": "0755"
      "owner": "root"
      "content": >>>
#!/usr/bin/env python3

import click
import datetime
import iso8601

import elasticsearch

@click.command()
@click.option('--host', '-h',
              # with multiple=True, click expects a list (or tuple) default
              default=['esnode1.internal.softwareheritage.org'],
              multiple=True,
              help="Elasticsearch node instances")
@click.option('--timeout', '-t', default=1200)
def main(host, timeout):
    hosts = host  # `host` is a list of multiple hosts
    today = datetime.date.today()
    days = lambda n: datetime.timedelta(days=n)

    es = elasticsearch.Elasticsearch(hosts=hosts, timeout=timeout)

    for l in sorted(es.cat.indices(h='i,sth,status').splitlines()):
        i, throttled, status = l.split()
        throttled = throttled == 'true'
        if throttled and status != 'open':
            continue
        if i.startswith('.'):
            continue
        date = i.split('-')[-1]
        if not date.startswith('20'):
            continue
        date = date.replace('.', '-')
        date = iso8601.parse_date(date).date()
        info = es.indices.get(i)[i]
        shards = int(info['settings']['index']['number_of_shards'])
        if not throttled and date < today - days(7):
            print('freezing', i)
            es.indices.freeze(i, wait_for_active_shards=shards)
            status = 'open'
        if status == 'open' and date < today - days(30):
            print('closing', i)
            es.indices.close(i)


if __name__ == '__main__':
    main()
<<<
*******************************************
+ Package[python3-click] =>
   parameters =>
      "ensure": "present"
*******************************************
+ Package[python3-elasticsearch] =>
   parameters =>
      "ensure": "present"
*******************************************
+ Package[python3-iso8601] =>
   parameters =>
      "ensure": "present"
*******************************************
+ Profile::Cron::D[elasticsearch-close-index] =>
   parameters =>
      "command": "/usr/local/bin/elasticsearch_close_index.py --host esnode2.internal.softwareheritage.org --host esnode3.internal.softwareheritage.org --host esnode1.internal.softwareheritage.org --timeout 1200"
      "month": "fqdn_rand"
      "target": "elasticsearch"
      "unique_tag": "elasticsearch-close-index"
      "user": "root"
*******************************************
+ Profile::Cron::File[elasticsearch] =>
   parameters =>
      "target": "elasticsearch"
*******************************************
*** End octocatalog-diff on logstash0.internal.softwareheritage.org
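The freeze-then-close rule in elasticsearch_close_index.py can be read as a small decision function, which makes it easy to check against concrete dates without a running cluster. The following is a minimal sketch under that reading: `plan_actions` and its parameter names are hypothetical, the defaults mirror the hardcoded 7 and 30 days, `throttled` stands for the `sth` (search.throttled) column, and `status` is the `open`/`close` value reported by `_cat/indices`.

```python
import datetime
from typing import List


def plan_actions(index_date: datetime.date, throttled: bool, status: str,
                 today: datetime.date, freeze_after_days: int = 7,
                 close_after_days: int = 30) -> List[str]:
    """Return the actions the script would take for one index, in order.

    Hypothetical helper mirroring the loop body of the script; it never
    talks to Elasticsearch.
    """
    actions: List[str] = []
    if throttled and status != 'open':
        # Already frozen and closed: nothing left to do.
        return actions
    if not throttled and index_date < today - datetime.timedelta(days=freeze_after_days):
        actions.append('freeze')
        status = 'open'  # as in the script: a frozen index is still open
    if status == 'open' and index_date < today - datetime.timedelta(days=close_after_days):
        actions.append('close')
    return actions


today = datetime.date(2021, 4, 12)
print(plan_actions(datetime.date(2021, 4, 1), False, 'open', today))  # ['freeze']
print(plan_actions(datetime.date(2021, 3, 1), False, 'open', today))  # ['freeze', 'close']
print(plan_actions(datetime.date(2021, 3, 1), True, 'open', today))   # ['close']
```

Note that an unfrozen index older than the close threshold is frozen and closed in the same run, and that comparisons are strict: an index exactly 7 days old is left alone.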

Diff Detail

Repository
rSPSITE puppet-swh-site
Lint
Automatic diff as part of commit; lint not applicable.
Unit
Automatic diff as part of commit; unit tests not applicable.

Event Timeline

ardumont created this revision.
ardumont added inline comments.
site-modules/profile/manifests/elasticsearch/index_janitor.pp
32

Hmm, I need to rework the hour for this cron.
The rest remains reviewable, though.

Move the period to once a month; that should be enough

Use the right repo to update the diff

Adapt according to suggestions:

  • Rename the variable to hosts, which click supports ;)
  • Extract the freezing and closing days as CLI flags
  • Add a CLI docstring
  • Let the cron trigger once a day
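Click lets an option declare its Python parameter name explicitly, which is what makes the `hosts` rename possible while keeping the `--host` flag. A minimal sketch of that option layout, assuming the flag names olasd suggests inline (`--freeze-after-days`, `--close-after-days`) and keeping the script's defaults; the body only echoes the parsed values and is not the landed script.

```python
import click


@click.command()
@click.option('--host', '-h', 'hosts',  # 'hosts' names the Python parameter
              default=['esnode1.internal.softwareheritage.org'],
              multiple=True,
              help='Elasticsearch node instances (repeatable)')
@click.option('--timeout', '-t', default=1200, show_default=True)
@click.option('--freeze-after-days', default=7, show_default=True,
              help='Freeze indexes older than this many days')
@click.option('--close-after-days', default=30, show_default=True,
              help='Close indexes older than this many days')
def main(hosts, timeout, freeze_after_days, close_after_days):
    """Freeze, then close, old time-based Elasticsearch indexes."""
    # multiple=True yields a tuple of every --host value given
    click.echo(f'hosts={list(hosts)} timeout={timeout} '
               f'freeze={freeze_after_days} close={close_after_days}')


if __name__ == '__main__':
    main()
```

With this layout, the cron command line stays unchanged (`--host` repeated per node), and the thresholds become tunable without editing the file.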
site-modules/profile/manifests/elasticsearch/index_janitor.pp
32

Nope, once a day is fine, so it's back ;)

olasd added a subscriber: olasd.

I'm sure some of the assumptions behind this script are going to break, eventually. But for now it's better than nothing!

site-modules/profile/files/elasticsearch/elasticsearch_close_index.py
14

freeze-after-days?

15

close-after-days?

38–39

Add a comment: ignore dot-prefixed indexes (e.g. Kibana settings).
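The skip conditions around those lines (dot-prefixed indexes, names without a date suffix) can be isolated into one small function that carries the requested comment. A minimal sketch, assuming index names end in `-YYYY.MM.DD` (as implied by the script's `replace('.', '-')`); `index_date` is a hypothetical name, `logstash-2021.04.12` is an illustrative index name, and it uses `datetime.strptime` instead of `iso8601` so it needs only the standard library.

```python
import datetime
from typing import Optional


def index_date(index: str) -> Optional[datetime.date]:
    """Return the date encoded in a time-based index name, or None to skip it."""
    # Ignore dot-prefixed indexes (e.g. kibana settings).
    if index.startswith('.'):
        return None
    # Time-based indexes end in -YYYY.MM.DD, e.g. "logstash-2021.04.12".
    suffix = index.split('-')[-1]
    if not suffix.startswith('20'):
        return None
    try:
        return datetime.datetime.strptime(suffix, '%Y.%m.%d').date()
    except ValueError:
        # Unlike the script, skip names whose suffix is not a valid date
        # instead of raising.
        return None
```

Returning `None` rather than raising is a deliberate deviation: one oddly named index would otherwise abort the whole daily run.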

site-modules/profile/manifests/elasticsearch/index_janitor.pp
28

This should probably be wrapped in /usr/bin/chronic, unless we want daily cron spam.
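chronic (from moreutils) swallows a command's output and only shows it when the command fails, so cron mails only on errors. The Python sketch below illustrates that behaviour; `run_quietly` is a hypothetical name, and unlike chronic it merges stderr into stdout and only looks at the exit status.

```python
import subprocess
import sys


def run_quietly(argv):
    """Run argv, printing its captured output only on a non-zero exit status.

    Rough illustration of what /usr/bin/chronic does for a cron job.
    """
    proc = subprocess.run(argv, stdout=subprocess.PIPE,
                          stderr=subprocess.STDOUT, text=True)
    if proc.returncode != 0:
        # Failure: surface everything the command printed (cron mails it).
        sys.stdout.write(proc.stdout)
    return proc.returncode
```

In the cron entry itself, the suggestion amounts to prefixing the command with /usr/bin/chronic, so the daily "freezing"/"closing" prints no longer generate mail unless the script fails.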

This revision is now accepted and ready to land. Apr 12 2021, 4:00 PM
ardumont added inline comments.
site-modules/profile/files/elasticsearch/elasticsearch_close_index.py
15

Yes, thanks a lot, those names would not come to me... (both ;)

ardumont marked an inline comment as done.

Adapt according to the good points raised