Getting Started with Prometheus — Pt. 1

Weston Bassler
Jul 21, 2018 · 10 min read
Source: CNCF.io

Recently I have been deep diving into how best to collect metrics for monitoring and alerting on services and infrastructure. Doing this in a dynamic, distributed system running microservices is no easy task. Up to this point I have used tools such as check_mk (Nagios based), the ELK stack, and the TICK stack for much of the metric collection and alerting. These stacks have worked fine so far, but they have taken a lot of work and can get complex because they require managing multiple different tools and stacks.

I have been noticing a lot of hype behind a project called Prometheus. It is becoming a standard for many of the kinds of systems that I currently work on (DC/OS, Kubernetes, Kafka, etc.). It appears to be gaining a major community, and many companies are switching to it; some are even converting their existing metrics to the Prometheus format. From my exposure so far, it looks to be the right move.

In this series of posts, I am going to go through my journey of learning and understanding how to use Prometheus. My goal is to take what I have learned and then apply it to my DC/OS cluster and use it as a service for monitoring not only DC/OS, but all the other frameworks that I am using there such as Marathon, Kubernetes and Kafka. ALL THE THINGS. In this first post, I am going to show how you can get started using Prometheus and describe many of its components. My hope is that after reading this you will have a better understanding of Prometheus, as well as gain enough knowledge to get started on your own.

I've started a GitHub project for this if you're interested, and will update it along the way as I add to this series.

Before we begin: My goal is to keep these posts as short as possible to remain interesting and engaging. With this being the first one, and with the need to explain Prometheus and its components, that is not likely going to be the case here. You have to crawl before you walk, but I understand that in the world of tech sometimes you must RUN before you crawl. For this reason, I have broken this post up into two separate sections: “Introduction” and “Getting Started”. The “Introduction” is the lengthy explanation of Prometheus and its different components. The “boring” stuff. If you would like to skip that and just try it out yourself, I suggest jumping ahead to the “Getting Started” section.

Introduction

I want to start off by explaining what Prometheus is and what purpose it serves.

What is Prometheus?

Prometheus is an open-source systems monitoring and alerting toolkit originally built at SoundCloud. Since its inception in 2012, many companies and organizations have adopted Prometheus, and the project has a very active developer and user community. It is now a standalone open source project and maintained independently of any company. To emphasize this, and to clarify the project’s governance structure, Prometheus joined the Cloud Native Computing Foundation in 2016 as the second hosted project, after Kubernetes.

— prometheus.io

Prometheus is an open source monitoring and alerting toolkit, but I would like to point out that it is not like the ordinary monitoring and alerting toolkits you might be familiar with. Instead of using protocols such as SNMP or some sort of agent service, Prometheus pulls (“scrapes”) metrics from a client (“target”) over HTTP and places the data into its time series database, which you can query using its own DSL. If you have ever worked with a time series database before, it's easy to understand why one is used for a monitoring system.
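To make that concrete, here is a rough sketch of what a scraped target's /metrics page looks like in the Prometheus exposition format (the metric name and labels are just illustrative):

# HELP http_requests_total Total number of HTTP requests handled.
# TYPE http_requests_total counter
http_requests_total{method="get",code="200"} 1027
http_requests_total{method="post",code="400"} 3

On each scrape, Prometheus pulls this plain-text page, parses each line into a sample and stores it in the time series database along with its labels.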

Prometheus uses “exporters” that are installed/configured on the clients in order to convert and expose their metrics in the Prometheus format. The Prometheus server then scrapes the exporter for metrics. A list of existing exporters can be found here, but you also have the ability to create your own. Client libraries are currently available for Python, Java, Go, Node and Clojure.
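To give a feel for the client libraries, here is a minimal sketch of a tiny custom exporter using the Python library (prometheus_client); the metric name and port are my own choices, not anything standard:

# pip install prometheus_client
from prometheus_client import start_http_server, Counter
import random
import time

# Example counter; the name and help text are arbitrary.
DEMO_REQUESTS = Counter('demo_requests_total', 'Total demo requests processed')

if __name__ == '__main__':
    # Expose metrics at http://localhost:8000/metrics for Prometheus to scrape.
    start_http_server(8000)
    while True:
        DEMO_REQUESTS.inc()          # simulate work by bumping the counter
        time.sleep(random.random())

Point a scrape config at port 8000 and the counter shows up like any other metric.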

One other thing to note here is that Prometheus comes with a UI out of the box that can be accessed on port 9090 of the Prometheus server. Users also have the ability to build dashboards and integrate their favorite visualization software, such as Grafana, Chronograf or Kibana.

Prometheus uses a separate component for alerting called the AlertManager. The AlertManager receives alerts from the Prometheus server and is responsible for grouping them, making sense of them and forwarding a notification to your chosen notification system. The AlertManager currently supports email, Slack, VictorOps, HipChat, webhooks and many more. One of the cooler features of alerting in Prometheus is the ability to template your notifications so you can customize how they look when you receive them. I'm planning to do an entire blog post on just this component in the future.
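As a preview, here is a hedged sketch of a minimal AlertManager configuration that routes every alert to a Slack channel; the webhook URL and channel name are placeholders:

route:
  receiver: 'slack-notifications'
  group_by: ['alertname']
  group_wait: 30s
  repeat_interval: 4h

receivers:
  - name: 'slack-notifications'
    slack_configs:
      - api_url: 'https://hooks.slack.com/services/REPLACE/ME'
        channel: '#alerts'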

Pretty simple so far. Let’s do a quick recap of some of the terms that I found to be important:

  • Prometheus Server — The main server that scrapes targets and stores the scraped metrics in a time series DB.
  • Scrape — The pull method the Prometheus server uses to retrieve metrics from its targets.
  • Target — A client endpoint that the Prometheus server retrieves metrics from.
  • Exporter — A library or agent on the target that converts and exposes existing metrics in the Prometheus format.
  • AlertManager — The component responsible for handling alerts.

Obviously this is not everything required to run Prometheus, but these are important starting points.

Being a visual learner myself, I find the following overview image of the Prometheus architecture helpful.

source: prometheus.io/overview

I really like this image because it shows the flow of how Prometheus works. It ties together everything that we have discussed so far: Prometheus scrapes metrics from exporters, stores them in the TSDB on the Prometheus server, and pushes alerts to the AlertManager.

One thing I did not discuss that is shown, however, is the Pushgateway. The reason I didn't discuss it is that I am not sure I have a use case for it yet and I haven't studied it very closely. You can read more about it here if you would like.

One of the other key pieces the image shows that I did not discuss is the “Service discovery” component. This is one of the best features of Prometheus in my opinion and a big reason why it is a good fit for me: it is built for dynamic environments. Although I will not be showing this component in this post, I would like to briefly talk about it because it will be key down the line when we try to get metrics from frameworks such as Kubernetes and Marathon, and from EC2 instances. Service discovery acts as the source of truth for what is running in the environment.

Service discovery allows Prometheus to automatically discover targets, which come and go frequently in a distributed system or microservice-orchestration style of architecture. We will not be responsible for continuously updating a static list of target addresses each time a service or piece of infrastructure is added or removed; Prometheus will discover the change and start or stop scraping for us. More to come on this in later posts.
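To give a rough idea of what that looks like, here is a hedged sketch of a scrape config that uses EC2 service discovery (the region and port are placeholders); Prometheus ships similar *_sd_configs blocks for Kubernetes, Marathon, Consul and others:

scrape_configs:
  - job_name: 'ec2-nodes'
    ec2_sd_configs:
      - region: us-east-1
        port: 9100      # e.g. node_exporter on each discovered instance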

The last thing I want to provide in the introduction is a list of resources that I have found helpful while trying to learn Prometheus over the last few weeks. There is quite a bit of material on the internet and a fast growing community, but I have a few resources that have really helped speed up the learning process for me:

Getting Started

Now that we have a decent understanding of Prometheus, let's roll through the process of getting a Prometheus server up and scraping some metrics together. For this example I am going to use Docker locally on my machine to run my Prometheus server and scrape metrics from the Docker daemon itself (see: Collect Docker Metrics with Prometheus in the official docs). I am going to use Docker for Mac, but you can follow along anywhere you can run the Docker Engine.

Configure Docker
First we need to reconfigure Docker. Docker has recently added experimental support for exposing Prometheus-compatible metrics on port 9323 of the engine. We need to enable this so that Prometheus can scrape it. Add the following to your daemon.json (where it lives depends on your platform and version of Docker).

{
  "metrics-addr" : "127.0.0.1:9323",
  "experimental" : true
}

For Docker on Mac, click on Preferences > Daemon > Advanced, paste in the JSON above, and then click “Apply & Restart” when done.
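If you want to sanity check the endpoint before wiring up Prometheus: on Linux you can hit it directly, while on Docker for Mac the address only resolves from inside a container, so something like the following should work (the curl image is just one convenient option):

curl http://127.0.0.1:9323/metrics                                                 # Linux
docker run --rm curlimages/curl http://docker.for.mac.host.internal:9323/metrics   # Docker for Mac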

Configure Prometheus
Next we need to configure Prometheus. Prometheus uses a configuration file (/etc/prometheus/prometheus.yml) that specifies everything it needs to begin scraping and storing metrics. The configuration file includes things such as the scrape interval, the rule evaluation interval and the rule files to load (neither of which we have discussed yet), and the scrape configs for your specified targets.

Let’s take a look at the example provided in the Docker docs:

# my global config
global:
  scrape_interval: 15s     # Set the scrape interval to every 15 seconds. Default is every 1 minute.
  evaluation_interval: 15s # Evaluate rules every 15 seconds. The default is every 1 minute.
  # scrape_timeout is set to the global default (10s).

  # Attach these labels to any time series or alerts when communicating with
  # external systems (federation, remote storage, Alertmanager).
  external_labels:
    monitor: 'codelab-monitor'

# Load rules once and periodically evaluate them according to the global 'evaluation_interval'.
rule_files:
  # - "first.rules"
  # - "second.rules"

# A scrape configuration containing exactly one endpoint to scrape:
# Here it's Prometheus itself.
scrape_configs:
  # The job name is added as a label `job=<job_name>` to any timeseries scraped from this config.
  - job_name: 'prometheus'

    # metrics_path defaults to '/metrics'
    # scheme defaults to 'http'.

    static_configs:
      - targets: ['docker.for.mac.localhost:9090']

  - job_name: 'docker'
    # metrics_path defaults to '/metrics'
    # scheme defaults to 'http'.

    static_configs:
      - targets: ['docker.for.mac.host.internal:9323']

In our config above, we are going to scrape our targets every 15 seconds. If you look at the scrape_configs section, you can see that we have two “job_name” entries. In simplest terms, a “job_name” begins the configuration for how to scrape a set of targets. In this example we will be scraping the Prometheus server itself at port 9090 and the Docker Engine at port 9323, each every 15 seconds.

Take the above example and save it as “prometheus.yml”. We will mount this file into the Prometheus server container we are about to run.

Run Prometheus
Now we are ready to run Prometheus from the latest Prometheus Docker image with the config file we created above! For this post we aren't going to persist the metrics to disk, so be aware that the collected data will be lost if the container gets deleted.

docker run -d --name prometheus -p 9090:9090 -v $PWD/prometheus.yml:/etc/prometheus/prometheus.yml prom/prometheus
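If you do want the data to stick around, one hedged variant is to also mount a named volume at the image's default data directory (/prometheus); the volume name here is my own:

docker run -d --name prometheus -p 9090:9090 \
  -v $PWD/prometheus.yml:/etc/prometheus/prometheus.yml \
  -v prometheus-data:/prometheus \
  prom/prometheus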

Once it starts you can pull up your browser at http://localhost:9090 to see the UI.

Looking Around
The Prometheus UI provides some pretty nice features out of the box. You can view the current Prometheus environment (under the “Status” tab), execute queries using PromQL (under the “Graph” tab) and see what is currently alerting (under the “Alerts” tab). I suggest spending some time looking around and playing with the query engine to get familiar with how it works. Below are some examples of using the engine:

Prometheus is scraping my Docker Engine for the number of stopped, running and paused containers. Pretty cool!
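If you want something to try right away, here are a couple of example expressions for the “Graph” tab; the Docker Engine metric name below is what my daemon exposes, but it may differ by Docker version:

up                                                           # 1 if a target's last scrape succeeded
engine_daemon_container_states_containers                    # containers by state, as exposed by the Docker daemon
engine_daemon_container_states_containers{state="running"}   # only the running count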

Add Grafana
Although Prometheus has a UI out of the box, you may also want to integrate your own visualization tool. I am personally a fan of Grafana, so I am going to quickly show how you can integrate it and import pre-built Grafana dashboards. I will be using this in later demos as well.

We are going to run Grafana in Docker as well, linked to the Prometheus container.

docker run -d --name grafana --link prometheus -p 3000:3000 grafana/grafana

Open your browser to port 3000 and log in with the default admin:admin credentials. You will be asked to change the password upon successful login, or you can skip it.

Once you log in, click on “Add Data Source” so we can add our Prometheus server as the source. Because the Grafana container is linked to the Prometheus container, the data source URL should be http://prometheus:9090 with the type set to “Prometheus”. Click “Save & Test” when done.

Now let’s import an existing dashboard for our Docker Engine. Click the “+” in the top left and then “Import”. Enter the dashboard ID “1229” and then click “Load”.

Select the “Prometheus” data source created above and click “Import”. Your dashboard will appear.

Nothing too amazing here, but being able to import pre-built dashboards into Grafana is pretty awesome! These will get more interesting as we move forward with “real” examples.

As mentioned, this was a very high level overview of Prometheus. Hopefully I have demonstrated how easy it is to get started playing around with Prometheus and gain a better understanding of it. In the upcoming articles of the series I will start to deep dive and discuss things such as Prometheus Configuration, PromQL, Storage, Service Discovery, Alerting with AlertManager and so on. The end goal is to start using Prometheus to monitor and alert on ALL THE THINGS!
