diff --git a/README.rst b/README.rst index 27b22f6..87f8da6 100644 --- a/README.rst +++ b/README.rst @@ -1,162 +1,185 @@ Prometheus Proxmox VE Exporter ============================== |Build Status| |Package Version| This is an exporter that exposes information gathered from Proxmox VE node for use by the Prometheus monitoring system. Installation ------------ Note: Python 2 is not supported anymore as of version 2.0.0. Instead use Python 3.6 or better. Using pip: ========== .. code:: shell python3 -m pip install prometheus-pve-exporter Using docker: ============= .. code:: shell docker pull prompve/prometheus-pve-exporter Example: Display usage message: .. code:: shell docker run -it --rm prompve/prometheus-pve-exporter --help Example: Run the image with a mounted configuration file and published port: .. code:: shell docker run --name prometheus-pve-exporter -d -p 127.0.0.1:9221:9221 -v /path/to/pve.yml:/etc/pve.yml prompve/prometheus-pve-exporter Prometheus PVE Exporter will now be reachable at http://localhost:9221/. 
Usage ----- :: - usage: pve_exporter [-h] [config] [port] [address] + usage: pve_exporter [-h] [--collector.status] [--collector.version] + [--collector.node] [--collector.cluster] + [--collector.resources] [--collector.config] + [config] [port] [address] positional arguments: - config Path to configuration file (pve.yml) - port Port on which the exporter is listening (9221) - address Address to which the exporter will bind + config Path to configuration file (pve.yml) + port Port on which the exporter is listening (9221) + address Address to which the exporter will bind optional arguments: - -h, --help show this help message and exit + -h, --help show this help message and exit + --collector.status, --no-collector.status + Exposes Node/VM/CT-Status (default: True) + --collector.version, --no-collector.version + Exposes PVE version info (default: True) + --collector.node, --no-collector.node + Exposes PVE node info (default: True) + --collector.cluster, --no-collector.cluster + Exposes PVE cluster info (default: True) + --collector.resources, --no-collector.resources + Exposes PVE resources info (default: True) + --collector.config, --no-collector.config + Exposes PVE onboot status (default: True) + Use `::` for the `address` argument in order to bind to both IPv6 and IPv4 sockets on dual stacked machines. Visit http://localhost:9221/pve?target=1.2.3.4 where 1.2.3.4 is the IP of the Proxmox VE node to get metrics from. Specify the ``module`` request parameter, to choose which module to use from the config file. The ``target`` request parameter defaults to ``localhost``. Hence if ``pve_exporter`` is deployed directly on the proxmox host, ``target`` can be omitted. +Use the ``--collector.X`` / ``--no-collector.X`` flags to enable or disable selected +collectors. + +Note that the config collector results in one API call per guest VM/CT. +It is therefore recommended to disable this collector using the +``--no-collector.config`` flag on big deployments. 
+ See the wiki_ for more examples and docs. Authentication -------------- Example ``pve.yml`` for password authentication: .. code:: yaml default: user: prometheus@pve password: sEcr3T! Example ``pve.yml`` for `token authentication`_: .. code:: yaml default: user: prometheus@pve token_name: "..." token_value: "..." The configuration is passed directly into `proxmoxer.ProxmoxAPI()`_. Note: When operating PVE with self-signed certificates, then it is necessary to either import the certificate into the local trust store (see this `SE answer`_ for Debian/Ubuntu) or add ``verify_ssl: false`` to the config dict as a sibling to the credentials. Note that PVE `supports Let's Encrypt`_ out ouf the box. In many cases setting up trusted certificates is the better option than operating with self-signed certs. Proxmox VE Configuration ------------------------ For security reasons it is essential to add a user with read-only access (PVEAuditor role) for the purpose of metrics collection. Prometheus Configuration ------------------------ The PVE exporter can be deployed either directly on a Proxmox VE node or onto a separate machine. Example config for PVE exporter running on PVE node: .. code:: yaml scrape_configs: - job_name: 'pve' static_configs: - targets: - 192.168.1.2:9221 # Proxmox VE node with PVE exporter. - 192.168.1.3:9221 # Proxmox VE node with PVE exporter. metrics_path: /pve params: module: [default] Example config for PVE exporter running on Prometheus host: .. code:: yaml scrape_configs: - job_name: 'pve' static_configs: - targets: - 192.168.1.2 # Proxmox VE node. - 192.168.1.3 # Proxmox VE node. metrics_path: /pve params: module: [default] relabel_configs: - source_labels: [__address__] target_label: __param_target - source_labels: [__param_target] target_label: instance - target_label: __address__ replacement: 127.0.0.1:9221 # PVE exporter. Grafana Dashboards ------------------ * `Proxmox via Prometheus by Pietro Saccardi`_ .. 
|Build Status| image:: https://travis-ci.com/prometheus-pve/prometheus-pve-exporter.svg?branch=master :target: https://travis-ci.com/prometheus-pve/prometheus-pve-exporter .. |Package Version| image:: https://img.shields.io/pypi/v/prometheus-pve-exporter.svg :target: https://pypi.python.org/pypi/prometheus-pve-exporter .. _wiki: https://github.com/prometheus-pve/prometheus-pve-exporter/wiki .. _`token authentication`: https://pve.proxmox.com/wiki/User_Management#pveum_tokens .. _`proxmoxer.ProxmoxAPI()`: https://pypi.python.org/pypi/proxmoxer .. _`SE answer`: https://askubuntu.com/a/1007236 .. _`supports Let's Encrypt`: https://pve.proxmox.com/pve-docs/pve-admin-guide.html#sysadmin_certificate_management .. _`Proxmox via Prometheus by Pietro Saccardi`: https://grafana.com/dashboards/10347 diff --git a/src/pve_exporter/cli.py b/src/pve_exporter/cli.py index 05ce2d0..672021b 100755 --- a/src/pve_exporter/cli.py +++ b/src/pve_exporter/cli.py @@ -1,34 +1,108 @@ """ Proxmox VE exporter for the Prometheus monitoring system. 
""" from argparse import ArgumentParser import yaml from pve_exporter.http import start_http_server from pve_exporter.config import config_from_yaml +from pve_exporter.collector import CollectorsOptions + +try: + from argparse import BooleanOptionalAction +except ImportError: + from argparse import Action + # https://github.com/python/cpython/blob/master/Lib/argparse.py#L856 + # pylint: disable=all + class BooleanOptionalAction(Action): + def __init__(self, + option_strings, + dest, + default=None, + type=None, + choices=None, + required=False, + help=None, + metavar=None): + + _option_strings = [] + for option_string in option_strings: + _option_strings.append(option_string) + + if option_string.startswith('--'): + option_string = '--no-' + option_string[2:] + _option_strings.append(option_string) + + if help is not None and default is not None: + help += f" (default: {default})" + + super().__init__( + option_strings=_option_strings, + dest=dest, + nargs=0, + default=default, + type=type, + choices=choices, + required=required, + help=help, + metavar=metavar) + + def __call__(self, parser, namespace, values, option_string=None): + if option_string in self.option_strings: + setattr(namespace, self.dest, not option_string.startswith('--no-')) + + def format_usage(self): + return ' | '.join(self.option_strings) def main(): """ Main entry point. 
""" parser = ArgumentParser() + parser.add_argument('--collector.status', dest='collector_status', + action=BooleanOptionalAction, default=True, + help='Exposes Node/VM/CT-Status') + parser.add_argument('--collector.version', dest='collector_version', + action=BooleanOptionalAction, default=True, + help='Exposes PVE version info') + parser.add_argument('--collector.node', dest='collector_node', + action=BooleanOptionalAction, default=True, + help='Exposes PVE node info') + parser.add_argument('--collector.cluster', dest='collector_cluster', + action=BooleanOptionalAction, default=True, + help='Exposes PVE cluster info') + parser.add_argument('--collector.resources', dest='collector_resources', + action=BooleanOptionalAction, default=True, + help='Exposes PVE resources info') + parser.add_argument('--collector.config', dest='collector_config', + action=BooleanOptionalAction, default=True, + help='Exposes PVE onboot status') parser.add_argument('config', nargs='?', default='pve.yml', help='Path to configuration file (pve.yml)') parser.add_argument('port', nargs='?', type=int, default='9221', help='Port on which the exporter is listening (9221)') parser.add_argument('address', nargs='?', default='', help='Address to which the exporter will bind') params = parser.parse_args() + collectors = CollectorsOptions( + status=params.collector_status, + version=params.collector_version, + node=params.collector_node, + cluster=params.collector_cluster, + resources=params.collector_resources, + config=params.collector_config + ) + # Load configuration. 
with open(params.config) as handle: config = config_from_yaml(yaml.safe_load(handle)) if config.valid: - start_http_server(config, params.port, params.address) + start_http_server(config, params.port, params.address, collectors) else: parser.error(str(config)) diff --git a/src/pve_exporter/collector.py b/src/pve_exporter/collector.py index 476573b..d4bfc35 100644 --- a/src/pve_exporter/collector.py +++ b/src/pve_exporter/collector.py @@ -1,299 +1,316 @@ """ Prometheus collecters for Proxmox VE cluster. """ # pylint: disable=too-few-public-methods import itertools +import collections from proxmoxer import ProxmoxAPI from prometheus_client import CollectorRegistry, generate_latest from prometheus_client.core import GaugeMetricFamily +CollectorsOptions = collections.namedtuple('CollectorsOptions', [ + 'status', + 'version', + 'node', + 'cluster', + 'resources', + 'config', +]) + class StatusCollector: """ Collects Proxmox VE Node/VM/CT-Status # HELP pve_up Node/VM/CT-Status is online/running # TYPE pve_up gauge pve_up{id="node/proxmox-host"} 1.0 pve_up{id="cluster/pvec"} 1.0 pve_up{id="lxc/101"} 1.0 pve_up{id="qemu/102"} 1.0 """ def __init__(self, pve): self._pve = pve def collect(self): # pylint: disable=missing-docstring status_metrics = GaugeMetricFamily( 'pve_up', 'Node/VM/CT-Status is online/running', labels=['id']) for entry in self._pve.cluster.status.get(): if entry['type'] == 'node': label_values = [entry['id']] status_metrics.add_metric(label_values, entry['online']) elif entry['type'] == 'cluster': label_values = ['cluster/{:s}'.format(entry['name'])] status_metrics.add_metric(label_values, entry['quorate']) else: raise ValueError('Got unexpected status entry type {:s}'.format(entry['type'])) for resource in self._pve.cluster.resources.get(type='vm'): label_values = [resource['id']] status_metrics.add_metric(label_values, resource['status'] == 'running') yield status_metrics class VersionCollector: """ Collects Proxmox VE build information. 
E.g.: # HELP pve_version_info Proxmox VE version info # TYPE pve_version_info gauge pve_version_info{release="15",repoid="7599e35a",version="4.4"} 1.0 """ LABEL_WHITELIST = ['release', 'repoid', 'version'] def __init__(self, pve): self._pve = pve def collect(self): # pylint: disable=missing-docstring version_items = self._pve.version.get().items() version = {key: value for key, value in version_items if key in self.LABEL_WHITELIST} labels, label_values = zip(*version.items()) metric = GaugeMetricFamily( 'pve_version_info', 'Proxmox VE version info', labels=labels ) metric.add_metric(label_values, 1) yield metric class ClusterNodeCollector: """ Collects Proxmox VE cluster node information. E.g.: # HELP pve_node_info Node info # TYPE pve_node_info gauge pve_node_info{id="node/proxmox-host", level="c", name="proxmox-host", nodeid="0"} 1.0 """ def __init__(self, pve): self._pve = pve def collect(self): # pylint: disable=missing-docstring nodes = [entry for entry in self._pve.cluster.status.get() if entry['type'] == 'node'] labels = ['id', 'level', 'name', 'nodeid'] if nodes: info_metrics = GaugeMetricFamily( 'pve_node_info', 'Node info', labels=labels) for node in nodes: label_values = [str(node[key]) for key in labels] info_metrics.add_metric(label_values, 1) yield info_metrics class ClusterInfoCollector: """ Collects Proxmox VE cluster information. E.g.: # HELP pve_cluster_info Cluster info # TYPE pve_cluster_info gauge pve_cluster_info{id="cluster/pvec",nodes="2",quorate="1",version="2"} 1.0 """ def __init__(self, pve): self._pve = pve def collect(self): # pylint: disable=missing-docstring clusters = [entry for entry in self._pve.cluster.status.get() if entry['type'] == 'cluster'] if clusters: # Remove superflous keys. for cluster in clusters: del cluster['type'] # Add cluster-prefix to id. for cluster in clusters: cluster['id'] = 'cluster/{:s}'.format(cluster['name']) del cluster['name'] # Yield remaining data. 
labels = clusters[0].keys() info_metrics = GaugeMetricFamily( 'pve_cluster_info', 'Cluster info', labels=labels) for cluster in clusters: label_values = [str(cluster[key]) for key in labels] info_metrics.add_metric(label_values, 1) yield info_metrics class ClusterResourcesCollector: """ Collects Proxmox VE cluster resources information, i.e. memory, storage, cpu usage for cluster nodes and guests. """ def __init__(self, pve): self._pve = pve def collect(self): # pylint: disable=missing-docstring metrics = { 'maxdisk': GaugeMetricFamily( 'pve_disk_size_bytes', 'Size of storage device', labels=['id']), 'disk': GaugeMetricFamily( 'pve_disk_usage_bytes', 'Disk usage in bytes', labels=['id']), 'maxmem': GaugeMetricFamily( 'pve_memory_size_bytes', 'Size of memory', labels=['id']), 'mem': GaugeMetricFamily( 'pve_memory_usage_bytes', 'Memory usage in bytes', labels=['id']), 'netout': GaugeMetricFamily( 'pve_network_transmit_bytes', 'Number of bytes transmitted over the network', labels=['id']), 'netin': GaugeMetricFamily( 'pve_network_receive_bytes', 'Number of bytes received over the network', labels=['id']), 'diskwrite': GaugeMetricFamily( 'pve_disk_write_bytes', 'Number of bytes written to storage', labels=['id']), 'diskread': GaugeMetricFamily( 'pve_disk_read_bytes', 'Number of bytes read from storage', labels=['id']), 'cpu': GaugeMetricFamily( 'pve_cpu_usage_ratio', 'CPU usage (value between 0.0 and pve_cpu_usage_limit)', labels=['id']), 'maxcpu': GaugeMetricFamily( 'pve_cpu_usage_limit', 'Maximum allowed CPU usage', labels=['id']), 'uptime': GaugeMetricFamily( 'pve_uptime_seconds', 'Number of seconds since the last boot', labels=['id']), 'shared': GaugeMetricFamily( 'pve_storage_shared', 'Whether or not the storage is shared among cluster nodes', labels=['id']), } info_metrics = { 'guest': GaugeMetricFamily( 'pve_guest_info', 'VM/CT info', labels=['id', 'node', 'name', 'type']), 'storage': GaugeMetricFamily( 'pve_storage_info', 'Storage info', labels=['id', 'node', 
'storage']), } info_lookup = { 'lxc': { 'labels': ['id', 'node', 'name', 'type'], 'gauge': info_metrics['guest'], }, 'qemu': { 'labels': ['id', 'node', 'name', 'type'], 'gauge': info_metrics['guest'], }, 'storage': { 'labels': ['id', 'node', 'storage'], 'gauge': info_metrics['storage'], }, } for resource in self._pve.cluster.resources.get(): restype = resource['type'] if restype in info_lookup: label_values = [resource.get(key, '') for key in info_lookup[restype]['labels']] info_lookup[restype]['gauge'].add_metric(label_values, 1) label_values = [resource['id']] for key, metric_value in resource.items(): if key in metrics: metrics[key].add_metric(label_values, metric_value) return itertools.chain(metrics.values(), info_metrics.values()) class ClusterNodeConfigCollector: """ Collects Proxmox VE VM information directly from config, i.e. boot, name, onboot, etc. For manual test: "pvesh get /nodes////config" # HELP pve_onboot_status Proxmox vm config onboot value # TYPE pve_onboot_status gauge pve_onboot_status{id="qemu/113",node="XXXX",type="qemu"} 1.0 """ def __init__(self, pve): self._pve = pve def collect(self): # pylint: disable=missing-docstring metrics = { 'onboot': GaugeMetricFamily( 'pve_onboot_status', 'Proxmox vm config onboot value', labels=['id', 'node', 'type']), } for node in self._pve.nodes.get(): if node["status"] == "online": # Qemu vmtype = 'qemu' for vmdata in self._pve.nodes(node['node']).qemu.get(): config = self._pve.nodes(node['node']).qemu(vmdata['vmid']).config.get().items() for key, metric_value in config: label_values = ["%s/%s" % (vmtype, vmdata['vmid']), node['node'], vmtype] if key in metrics: metrics[key].add_metric(label_values, metric_value) # LXC vmtype = 'lxc' for vmdata in self._pve.nodes(node['node']).lxc.get(): config = self._pve.nodes(node['node']).lxc(vmdata['vmid']).config.get().items() for key, metric_value in config: label_values = ["%s/%s" % (vmtype, vmdata['vmid']), node['node'], vmtype] if key in metrics: 
metrics[key].add_metric(label_values, metric_value) return metrics.values() -def collect_pve(config, host): +def collect_pve(config, host, options: CollectorsOptions): """Scrape a host and return prometheus text format for it""" pve = ProxmoxAPI(host, **config) registry = CollectorRegistry() - registry.register(StatusCollector(pve)) - registry.register(ClusterResourcesCollector(pve)) - registry.register(ClusterNodeCollector(pve)) - registry.register(ClusterInfoCollector(pve)) - registry.register(ClusterNodeConfigCollector(pve)) - registry.register(VersionCollector(pve)) + if options.status: + registry.register(StatusCollector(pve)) + if options.resources: + registry.register(ClusterResourcesCollector(pve)) + if options.node: + registry.register(ClusterNodeCollector(pve)) + if options.cluster: + registry.register(ClusterInfoCollector(pve)) + if options.config: + registry.register(ClusterNodeConfigCollector(pve)) + if options.version: + registry.register(VersionCollector(pve)) + return generate_latest(registry) diff --git a/src/pve_exporter/http.py b/src/pve_exporter/http.py index 95bfb9c..be648b5 100644 --- a/src/pve_exporter/http.py +++ b/src/pve_exporter/http.py @@ -1,138 +1,139 @@ """ HTTP API for Proxmox VE prometheus collector. """ import logging import time from prometheus_client import CONTENT_TYPE_LATEST, Summary, Counter, generate_latest from werkzeug.routing import Map, Rule from werkzeug.serving import run_simple from werkzeug.wrappers import Request, Response from werkzeug.exceptions import InternalServerError from .collector import collect_pve class PveExporterApplication: """ Proxmox VE prometheus collector HTTP handler. 
""" # pylint: disable=no-self-use - def __init__(self, config, duration, errors): + def __init__(self, config, duration, errors, collectors): self._config = config self._duration = duration self._errors = errors - - self._url_map = Map([ - Rule('/', endpoint='index'), - Rule('/metrics', endpoint='metrics'), - Rule('/pve', endpoint='pve'), - ]) - - self._args = { - 'pve': ['module', 'target'] - } - - self._views = { - 'index': self.on_index, - 'metrics': self.on_metrics, - 'pve': self.on_pve, - } + self._collectors = collectors self._log = logging.getLogger(__name__) def on_pve(self, module='default', target='localhost'): """ Request handler for /pve route """ if module in self._config: start = time.time() - output = collect_pve(self._config[module], target) + output = collect_pve(self._config[module], target, self._collectors) response = Response(output) response.headers['content-type'] = CONTENT_TYPE_LATEST self._duration.labels(module).observe(time.time() - start) else: response = Response("Module '{0}' not found in config".format(module)) response.status_code = 400 return response def on_metrics(self): """ Request handler for /metrics route """ response = Response(generate_latest()) response.headers['content-type'] = CONTENT_TYPE_LATEST return response def on_index(self): """ Request handler for index route (/). """ response = Response( """ Proxmox VE Exporter

Proxmox VE Exporter

Visit /pve?target=1.2.3.4 to use.

""" ) response.headers['content-type'] = 'text/html' return response def view(self, endpoint, values, args): """ Werkzeug views mapping method. """ + allowed_args = { + 'pve': ['module', 'target'] + } + + view_registry = { + 'index': self.on_index, + 'metrics': self.on_metrics, + 'pve': self.on_pve, + } + params = dict(values) - if endpoint in self._args: - params.update({key: args[key] for key in self._args[endpoint] if key in args}) + if endpoint in allowed_args: + params.update({key: args[key] for key in allowed_args[endpoint] if key in args}) try: - return self._views[endpoint](**params) + return view_registry[endpoint](**params) except Exception as error: # pylint: disable=broad-except self._log.exception("Exception thrown while rendering view") self._errors.labels(args.get('module', 'default')).inc() raise InternalServerError from error @Request.application def __call__(self, request): - urls = self._url_map.bind_to_environ(request.environ) + url_map = Map([ + Rule('/', endpoint='index'), + Rule('/metrics', endpoint='metrics'), + Rule('/pve', endpoint='pve'), + ]) + + urls = url_map.bind_to_environ(request.environ) view_func = lambda endpoint, values: self.view(endpoint, values, request.args) return urls.dispatch(view_func, catch_http_exceptions=True) -def start_http_server(config, port, address=''): +def start_http_server(config, port, address, collectors): """ Start a HTTP API server for Proxmox VE prometheus collector. """ duration = Summary( 'pve_collection_duration_seconds', 'Duration of collections by the PVE exporter', ['module'], ) errors = Counter( 'pve_request_errors_total', 'Errors in requests to PVE exporter', ['module'], ) # Initialize metrics. for module in config.keys(): # pylint: disable=no-member errors.labels(module) # pylint: disable=no-member duration.labels(module) - app = PveExporterApplication(config, duration, errors) + app = PveExporterApplication(config, duration, errors, collectors) run_simple(address, port, app, threaded=True)