As a full-stack developer and DevOps engineer, having deep visibility into your application's performance is critical. Understanding metrics like request rates, error rates, response times, and resource usage allows you to optimize bottlenecks and quickly resolve issues.
In this comprehensive guide, I will share best practices for monitoring Python applications with Prometheus.
Why Prometheus?
Over the last several years, Prometheus has emerged as the de facto standard open-source monitoring and alerting toolkit for modern cloud-native applications.
Here are some key capabilities that make Prometheus an ideal choice:
Multi-dimensional data model: Prometheus has an extremely powerful data model allowing you to slice and dice metrics across multiple dimensions like instance, endpoint, and response code. This is invaluable when troubleshooting sporadic issues.
Customizable dashboards: While Prometheus has a built-in expression browser, tools like Grafana provide unlimited flexibility for custom monitoring dashboards to suit your needs.
Efficiency: The Prometheus server has a small resource footprint and stores time series durably on local disk; a modest instance can run comfortably in a few hundred megabytes of RAM.
Alerting: Complex alerting rules can be configured to send alerts only when critical conditions are met based on multiple metric criteria.
Scalability: Prometheus was designed from the ground up for monitoring large-scale environments with thousands of instances. Its federation capabilities make it possible to monitor even larger environments.
Simply put, you get an enterprise-grade solution completely for free that works at any scale while being easier to operate than many commercial products.
Architecture Overview
At a high level, monitoring Python apps with Prometheus involves:
- Instrumenting application code via Prometheus client libraries to expose metrics
- Scraping and storing metrics in a Prometheus server
- Analyzing metrics through Prometheus UI and Grafana dashboards
- Creating alerting rules for notifications
The Python client libraries expose an easy API to create Counters, Gauges, Summaries and Histograms. The Prometheus server scrapes these over HTTP when it discovers the app endpoints.
Now that you understand the basics, let's go through the components one by one.
Installing Prometheus Server
The Prometheus server can be deployed using Docker with a single command:
docker run -d --name=prometheus -p 9090:9090 prom/prometheus
This launches Prometheus in a docker container with the following defaults:
- Local data stored in /prometheus/data (persists across container restarts)
- Metrics exposed on port 9090
- Web UI available on port 9090
The default config monitors just the Prometheus server itself, exposing stats like its own memory usage and scrape metrics.
In Kubernetes-based environments, additional configuration is often unnecessary because service discovery mechanisms detect targets automatically.
For our demo, we will add scrape targets manually pointing to our sample Python app which will be deployed separately.
Python Prometheus Client
The official Prometheus Python client library is available as the prometheus_client package, installable via pip.
It has no external dependencies making it very easy to integrate:
pip install prometheus_client
The client exposes a metrics registry used to create Counters, Gauges, Summaries and Histograms, and the registry's contents are served at an exposition endpoint for scraping.
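As a minimal sketch (assuming prometheus_client is installed; the metric name here is purely illustrative), you can create a metric against a dedicated registry and render the exposition text directly:

```python
from prometheus_client import Counter, CollectorRegistry, generate_latest

# Use a dedicated registry rather than the global default one
registry = CollectorRegistry()

# Create a counter registered against it and increment it once
requests_total = Counter('demo_requests_total', 'Demo request count',
                         registry=registry)
requests_total.inc()

# generate_latest renders the registry in the Prometheus text format
output = generate_latest(registry).decode()
print(output)
```

This is exactly the text a /metrics endpoint would serve to the Prometheus scraper.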
Instrumentation Example
Here is a sample Flask application that illustrates what instrumenting with a Counter and a Histogram looks like:
from flask import Flask
from prometheus_client import Counter, Histogram, generate_latest, CONTENT_TYPE_LATEST
import random
import time

app = Flask(__name__)

REQUEST_COUNT = Counter('app_requests_total', 'Total Request Count')
RESPONSE_TIME = Histogram('app_response_time_seconds', 'Response time distribution')

@app.route('/')
def hello():
    REQUEST_COUNT.inc()
    # Simulate random response time
    rand = random.random()
    time.sleep(rand)
    # Observe request time
    RESPONSE_TIME.observe(rand)
    return "Hello World!"

@app.route('/metrics')
def metrics():
    # Expose all registered metrics in the Prometheus text format
    return generate_latest(), 200, {'Content-Type': CONTENT_TYPE_LATEST}

if __name__ == "__main__":
    app.run(host='0.0.0.0', port=80)
This defines two metrics:
- app_requests_total: Increments per request
- app_response_time_seconds: Observes response time
That's all that's needed to instrument a Flask application. A few lines of code give deep visibility. More advanced instrumentation would involve:
- Breaking down requests counter across endpoints
- Adding error counters
- Capturing outlier response times
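For instance, the request counter above could be broken down by endpoint and status code using labels. A hedged sketch, again assuming prometheus_client is installed (endpoint and status values are illustrative):

```python
from prometheus_client import Counter, CollectorRegistry, generate_latest

registry = CollectorRegistry()

# Declare label names up front; each unique label combination
# becomes a separate time series
REQUEST_COUNT = Counter('app_requests_total', 'Total Request Count',
                        ['endpoint', 'status'], registry=registry)

# Increment the series for specific endpoint/status pairs
REQUEST_COUNT.labels(endpoint='/', status='200').inc()
REQUEST_COUNT.labels(endpoint='/login', status='500').inc()

output = generate_latest(registry).decode()
print(output)
```

In PromQL you could then sum across labels, e.g. `sum by (endpoint) (rate(app_requests_total[1m]))`, to see per-endpoint traffic.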
But this shows just how easy the Python Prometheus client makes the process.
Now let's look at how the server configuration works.
Configuring Prometheus Server
For Prometheus to collect metrics from our example app, we need to add the app as a target.
The Prometheus config file prometheus.yml accepts a list of scrape configurations each having a set of targets.
So for our app listening on port 80, we can add:
scrape_configs:
  - job_name: 'pythonapp'
    static_configs:
      - targets: ['localhost:80']
This defines our app as part of the 'pythonapp' job, with metrics exposed on local port 80. Multiple targets can be added under the same job.
Once the updated file is placed in the container's /etc/prometheus folder and the configuration is reloaded (by restarting the container or sending SIGHUP to the Prometheus process), our app metrics will start being scraped every 15 seconds by default. Note that when Prometheus itself runs in a container, localhost refers to that container, so you may need to point the target at the host's address instead.
The new target shows up under Status > Targets in the web UI.
Prometheus Query Language
One of the most powerful Prometheus capabilities is the ability to slice and dice through metrics using the Prometheus Query Language (PromQL).
Some examples of analysis that can be done on metrics collected from our sample app:
Total Requests
app_requests_total
Requests per second
rate(app_requests_total[1m])
99th Percentile Latency
histogram_quantile(0.99, rate(app_response_time_seconds_bucket[5m]))
The syntax allows mathematical and boolean operators on metrics enabling complex analysis.
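To build intuition for what these functions compute, here is a simplified pure-Python sketch of the logic behind rate() and histogram_quantile(). This ignores counter resets and Prometheus's extrapolation details, so it is illustrative only:

```python
def simple_rate(samples):
    """Per-second rate from (timestamp, counter_value) samples in a window.

    Simplified: assumes a monotonically increasing counter with no resets.
    """
    (t0, v0), (tn, vn) = samples[0], samples[-1]
    return (vn - v0) / (tn - t0)

def simple_histogram_quantile(q, buckets):
    """Estimate the q-quantile from cumulative histogram buckets.

    buckets: sorted list of (upper_bound, cumulative_count), ending with +Inf.
    Linearly interpolates within the bucket containing the target rank.
    """
    total = buckets[-1][1]
    target = q * total
    prev_bound, prev_count = 0.0, 0.0
    for bound, count in buckets:
        if count >= target:
            if bound == float('inf'):
                # Cannot interpolate into the +Inf bucket; fall back to the
                # upper bound of the last finite bucket
                return prev_bound
            return prev_bound + (bound - prev_bound) * \
                (target - prev_count) / (count - prev_count)
        prev_bound, prev_count = bound, count

# A counter that grew by 120 over 60 seconds -> 2 requests/second
print(simple_rate([(0, 0), (30, 60), (60, 120)]))  # 2.0

# 90th percentile estimated from cumulative latency buckets
buckets = [(0.1, 50), (0.25, 80), (0.5, 95), (float('inf'), 100)]
print(simple_histogram_quantile(0.9, buckets))
```

The key takeaway: quantiles from histograms are interpolated estimates whose accuracy depends on your bucket boundaries, which is why choosing buckets around your latency SLOs matters.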
While running ad-hoc queries in the UI is helpful during investigations, standing up a Grafana instance connected to Prometheus data source allows creating persistent dashboards tailored to application needs.
Grafana for Metrics Visualization
Grafana has become the standard tool in monitoring stacks for building custom dashboards tied to data sources like Prometheus.
The out-of-the-box support for Prometheus in Grafana makes it very easy to get started. Simply:
- Install Grafana
- Add Prometheus data source
- Create dashboards/panels for target metrics
And you can quickly build graphs displaying, for example, requests per second.
Some types of panels you can build on Python app metrics:
Timeseries: Plots metric value over time
Graph: Displays metric correlation
Bar gauge: Indicates threshold crossing
Heatmaps: Visualize latency distribution
Tables: For a metric summary
The customization options are endless allowing you to build the exact dashboards for visibility needs.
Here is a sample dashboard layout with 4 panels:

- Requests per second timeseries
- Average response time gauge
- Response time 95th percentile indicator
- Latency distribution heatmap
The power here is the ability to slice and dice easily by endpoints, instances etc. using template variables.
Recording Rules for Aggregated Views
On high-traffic sites generating billions of metric samples each day, querying rates and latencies directly can get expensive.
This is where recording rules come in very handy.
Recording rules allow pre-computing frequently needed aggregated views from raw metrics to optimize querying.
For example:
groups:
  - name: app_rules
    rules:
      - record: app_requests_per_second
        expr: rate(app_requests_total[1m])
This computes the per second request rate saving additional processing each query. Any frequently used complex PromQL expressions can be simplified in this way.
Think of recording rules as OLAP cubes for metrics allowing fast access to aggregations.
Alerting Rules for Real-Time Notifications
Metrics and dashboards provide insight only when actively looked at. To get real-time notifications when critical conditions occur, Prometheus alerting rules need to be leveraged.
Here is a sample rule:
groups:
  - name: app_alerts
    rules:
      - alert: HighLatency
        expr: histogram_quantile(0.99, rate(app_response_time_seconds_bucket[1m])) > 0.25
        for: 1m
This checks if the 99th percentile latency exceeds 250ms for over 1 minute duration before sending alerts.
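The "for" clause means the condition must hold continuously before the alert transitions from pending to firing. A simplified pure-Python sketch of that state machine (evaluation interval and histories here are illustrative):

```python
def alert_state(breach_history, for_seconds, interval_seconds):
    """Track a Prometheus-style pending -> firing transition.

    breach_history: per-evaluation booleans (True = condition breached).
    Returns the alert state after the final evaluation.
    """
    consecutive = 0
    state = 'inactive'
    for breached in breach_history:
        if breached:
            consecutive += 1
            held = consecutive * interval_seconds
            state = 'firing' if held >= for_seconds else 'pending'
        else:
            # Any evaluation below the threshold resets the timer
            consecutive = 0
            state = 'inactive'
    return state

# Latency above threshold for four 15s evaluations = 60s -> fires
print(alert_state([True, True, True, True], 60, 15))   # firing

# A brief dip resets the timer, so the alert is only pending again
print(alert_state([True, True, False, True], 60, 15))  # pending
```

This reset-on-recovery behavior is what keeps short latency blips from paging anyone.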
Multiple criteria can be combined with boolean logic to trigger alerts only during actual fault scenarios.
The alerts can be routed to email, PagerDuty, Slack etc. with further integration with OpsGenie or ServiceNow possible for incident management.
Containerizing Python Apps
In production scenarios, Python apps are typically deployed as containers orchestrated by Kubernetes or Docker Swarm.
The good news is Prometheus works seamlessly in such containerized environments. Using a discovery mechanism like file-based, DNS, or Kubernetes service discovery, targets can be detected automatically.
A common convention, honored by many Kubernetes scrape configurations, is to annotate pods with the following:
prometheus.io/scrape: "true"
prometheus.io/path: "/metrics"
prometheus.io/port: "8000"
This allows Prometheus to seamlessly pick up targets as apps scale up and down.
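For example, a pod spec carrying these annotations might look like this (the pod and image names are illustrative, and the annotations only take effect if your Prometheus scrape configuration is set up to honor them):

```yaml
apiVersion: v1
kind: Pod
metadata:
  name: pythonapp
  annotations:
    prometheus.io/scrape: "true"
    prometheus.io/path: "/metrics"
    prometheus.io/port: "8000"
spec:
  containers:
    - name: pythonapp
      image: mycompany/pythonapp:latest  # illustrative image name
      ports:
        - containerPort: 8000
```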
Optimizing Prometheus for Scale
For high-scale production workloads, Prometheus should be configured with the following in mind:
Federation: Sharding Prometheus by function (app vs. database vs. infrastructure) allows distributed scraping suitable even for clusters with tens of thousands of instances.
Remote storage: Local storage is best suited to holding the last few weeks of data. For long-term retention, configure remote write to a system such as Thanos, Cortex, or Mimir, which can archive to object storage like AWS S3.
Recording rules: As discussed earlier, pre-aggregating metrics avoids heavy-duty PromQL querying. Dashboards should be connected to aggregated views rather than raw metrics.
Alert optimization: Too many unnecessary alerts flood the system and reduce trust in it. Carefully evaluate rules so they alert only on real infrastructure and application issues.
There are a whole host of optimizations worth mastering before running large deployments.
Closing Thoughts
I hope this post served as a good introduction to instrumenting, monitoring, visualizing and alerting on key Python application metrics with the Prometheus stack.
Mastering these concepts can save your team countless hours troubleshooting performance issues, and prevent revenue loss by letting you fix problems quickly.
Do check out my other in-depth posts on advanced Prometheus metrics analysis and building performant Prometheus deployments as you grow more proficient.


