Prometheus Query Cheat Sheet: 50+ Essential PromQL

The Prometheus Query Cheat Sheet is your ultimate reference for mastering PromQL — the powerful query language used to extract, analyze, and visualize time-series metrics in Prometheus. Whether you’re a DevOps engineer, SRE, or developer, this guide simplifies the most important queries, operators, and aggregation functions you’ll need for effective monitoring.

Prometheus Query Language (PromQL) lets you perform complex operations on metrics, including filtering, mathematical computations, and data aggregation — all in real time. By the end of this guide, you’ll be able to query metrics confidently, visualize performance trends, and create meaningful dashboards.

What Is Prometheus and PromQL?

Prometheus is an open-source monitoring and alerting system originally developed at SoundCloud and now a graduated Cloud Native Computing Foundation (CNCF) project. It scrapes metrics from configured targets at regular intervals, stores them as time-series data, and lets you query that data with PromQL (Prometheus Query Language).

PromQL is the backbone of Prometheus: it lets you filter, aggregate, and transform your metrics at query time. You can retrieve metrics for CPU usage, memory consumption, network activity, HTTP request latency, and much more.

Example:

up

This simple query returns one series per scrape target, with the value 1 if the last scrape succeeded and 0 if it failed.

Why Use This Prometheus Query Cheat Sheet?

Prometheus has a vast query language, and memorizing every function or operator isn’t practical. This Prometheus Query Cheat Sheet gives you:

  • A quick reference for all key PromQL commands.
  • Ready-to-use examples for system and application monitoring.
  • Aggregation, mathematical, and rate function usage.
  • Simplified explanations for faster debugging and dashboard building.

PromQL Query Types Explained

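Prometheus evaluates a PromQL expression in one of two ways, and the expression itself returns one of a few data types:

1. Instant Queries

An instant query evaluates an expression at a single point in time (by default, now) and returns the most recent sample of each matching series.

node_memory_MemFree_bytes

2. Range Queries

A range query evaluates an expression repeatedly, at a fixed step across a time window, which is how graphs in Grafana and the Prometheus UI are drawn. This is separate from a range vector selector such as [5m], which hands a window of raw samples to a function like rate():

rate(http_requests_total[5m])

3. Vector Queries

A selector such as http_requests_total returns an instant vector: one sample for every time series that shares the metric name, with one series per label combination. Adding a range such as [5m] turns it into a range vector, which must be passed to a function before it can be graphed.

Understanding these types helps you choose the right query format for alerts, graphs, or analysis.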
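As a rough illustration of the data types an expression can return, the independent queries below use the same metric in different forms. The job="api" matcher is only an assumed label value.

```promql
# Instant vector: the latest sample of every matching series.
http_requests_total{job="api"}

# Range vector: all samples from the last 5 minutes for each series.
# It cannot be graphed directly; pass it to a function such as rate().
http_requests_total{job="api"}[5m]

# Instant vector again: rate() collapses each 5m window to one value.
rate(http_requests_total{job="api"}[5m])

# Scalar: a plain number with no labels.
scalar(sum(up{job="api"}))
```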

Basic Prometheus Query Examples

Here are a few commonly used PromQL queries for everyday monitoring.

Purpose | Query Example
Show the up/down state of all targets | up
Show CPU time per core and mode | rate(node_cpu_seconds_total[1m])
Show memory in use | node_memory_MemTotal_bytes - node_memory_MemAvailable_bytes
Disk usage percentage | (node_filesystem_size_bytes - node_filesystem_free_bytes) / node_filesystem_size_bytes * 100
HTTP request rate (per second) | rate(http_requests_total[5m])

These queries are the foundation of system-level monitoring.

Using Labels in Prometheus Queries

Labels allow fine-grained filtering of metrics. Each time series in Prometheus is uniquely identified by its metric name and label set.

Filter by Label

http_requests_total{method="GET"}

Exclude a Label

http_requests_total{method!="POST"}

Multiple Label Filters

http_requests_total{method="GET", handler="/api"}

Labels help you zoom in on specific applications, endpoints, or nodes for precise analysis.
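Beyond exact matches, PromQL also supports regular-expression matchers (=~ and !~). The sketch below assumes a status label and some handler values that may not exist in your setup.

```promql
# Regex match: any 5xx status (status is an assumed label name).
http_requests_total{status=~"5.."}

# Negative regex match: everything except health and metrics endpoints.
http_requests_total{handler!~"/healthz|/metrics"}

# Regexes are fully anchored, so this matches handlers that start with /api.
http_requests_total{handler=~"/api.*"}
```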

Working with Rate and Increase Functions

The rate() and increase() functions are essential for analyzing time-series changes.

1. rate()

Shows the per-second average rate of increase of a counter over the given range, adjusting for counter resets.

rate(http_requests_total[5m])

2. increase()

Displays the total increase over a time range.

increase(http_requests_total[1h])

3. irate()

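Per-second rate computed from only the last two samples in the range. It reacts quickly to spikes, which suits real-time dashboards but makes it too volatile for most alerts.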

irate(node_cpu_seconds_total[1m])

Use rate() for smoother graphs and irate() for more responsive charts.
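To make the relationship between these functions concrete, here is a small set of independent queries on the same counter. The job label used for grouping is an assumption about how your targets are labeled.

```promql
# Requests per second, averaged over the last 5 minutes.
rate(http_requests_total[5m])

# Total requests in the last 5 minutes.
increase(http_requests_total[5m])

# increase() is rate() multiplied by the window length in seconds (5m = 300s),
# so this returns the same values as the query above.
rate(http_requests_total[5m]) * 300

# Take rate() first, then aggregate, so a counter reset on one series
# is handled before the series are summed.
sum by (job) (rate(http_requests_total[5m]))
```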

Aggregation Operators in PromQL

Aggregations summarize data across multiple dimensions.

Operator | Description
sum() | Sum of all values
avg() | Average of all values
max() | Maximum value
min() | Minimum value
count() | Count of series
stddev() | Standard deviation
topk() | Top K series by value

Examples:

sum(rate(http_requests_total[5m])) by (method)
avg(rate(node_cpu_seconds_total[5m])) by (mode)
topk(5, rate(http_requests_total[5m]))

Aggregation is vital for service-level metrics like average latency or total traffic.
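A few more aggregation patterns, sketched as independent queries. They reuse metrics that appear elsewhere in this cheat sheet, and the chosen labels and values (3, 0.9) are illustrative only.

```promql
# Keep every label except instance and cpu (the opposite of by).
sum without (instance, cpu) (rate(node_cpu_seconds_total[5m]))

# Number of targets currently up, per job.
count by (job) (up == 1)

# The 3 handlers with the lowest request rate.
bottomk(3, sum by (handler) (rate(http_requests_total[5m])))

# 90th percentile of 1-minute load across all instances.
quantile(0.9, node_load1)
```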

Mathematical Operations in Prometheus Queries

PromQL allows math operations between metrics and constants.

Examples:

node_memory_MemFree_bytes / node_memory_MemTotal_bytes * 100
rate(http_requests_total[5m]) * 60

You can even combine metrics:

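(rate(http_requests_errors_total[1m]) / rate(http_requests_total[1m])) * 100

This calculates the error percentage by dividing the error rate by the total request rate. Both metrics must carry the same label sets for their series to pair up.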
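Many applications expose a single request counter with a status label rather than a separate error counter. Assuming such a label exists (its name varies between client libraries), an error percentage could be sketched like this:

```promql
# Percentage of requests that returned a 5xx status, across all series.
# The status label is an assumption; use your own label name.
sum(rate(http_requests_total{status=~"5.."}[5m]))
  /
sum(rate(http_requests_total[5m]))
  * 100

# The same ratio broken down per handler.
sum by (handler) (rate(http_requests_total{status=~"5.."}[5m]))
  /
sum by (handler) (rate(http_requests_total[5m]))
  * 100
```

Aggregating both sides with the same by clause keeps the label sets identical, so the division matches cleanly.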

Comparison and Logical Operators

Use these operators to compare or combine metrics.

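Operator | Meaning
== | Equal to
!= | Not equal to
> | Greater than
< | Less than
>= | Greater than or equal to
<= | Less than or equal to
and, or, unless | Logical (set) operations between instant vectors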

Example:

up == 0

Lists all targets that are down.

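1 - avg by (instance) (rate(node_cpu_seconds_total{mode="idle"}[5m])) > 0.9

Shows nodes with CPU usage above 90%. The raw node_cpu_seconds_total counter only ever grows, so usage is derived from the share of time spent outside idle mode.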
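The queries below sketch how comparison and set operators behave in practice. The load threshold of 4 and the "batch" job name are arbitrary assumptions chosen for illustration.

```promql
# By default a comparison filters: only series with value 0 are returned.
up == 0

# With bool, every series is kept and the result is 1 (true) or 0 (false).
up == bool 0

# and: high load only on instances that are also up.
node_load1 > 4 and on(instance) up == 1

# unless: high load, excluding instances that belong to an assumed "batch" job.
node_load1 > 4 unless on(instance) up{job="batch"}
```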

Rate vs. Increase vs. Irate

Let’s summarize these commonly confused PromQL functions:

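Function | Purpose | Use Case
rate() | Per-second average over range | Smooth trends, alerting
increase() | Total increase over range | Totals per window, such as requests per hour
irate() | Per-second rate from the last two samples | Live, high-resolution dashboards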

All three apply only to counters. Choose a range that covers at least two scrapes; a common rule of thumb is a range of about four times the scrape interval.

Working with Histogram Metrics

Histogram metrics measure distribution, such as request durations.

Example:

histogram_quantile(0.95, sum(rate(http_request_duration_seconds_bucket[5m])) by (le))

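This returns an estimate of the 95th percentile latency for HTTP requests, interpolated linearly within the bucket that contains the quantile.

How the query works:

  • le ("less than or equal") is the label holding each bucket's upper bound, and buckets are cumulative.
  • sum by (le) adds up the per-bucket rates across all series while keeping the le label that histogram_quantile() needs.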

Histograms are crucial for performance and SLA/SLO analysis.
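A histogram also exposes _sum and _count series, which allow a few other common calculations. The sketch below assumes the histogram has a handler label and a bucket boundary at 0.5 seconds, neither of which is guaranteed in your instrumentation.

```promql
# Average request duration over the last 5 minutes.
sum(rate(http_request_duration_seconds_sum[5m]))
  /
sum(rate(http_request_duration_seconds_count[5m]))

# 95th percentile per handler: keep le plus the label to split by.
histogram_quantile(
  0.95,
  sum by (le, handler) (rate(http_request_duration_seconds_bucket[5m]))
)

# Fraction of requests served within 0.5s (assumes a bucket with le="0.5").
sum(rate(http_request_duration_seconds_bucket{le="0.5"}[5m]))
  /
sum(rate(http_request_duration_seconds_count[5m]))
```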

Vector Matching in Prometheus Queries

When performing operations between two metrics, Prometheus uses vector matching to align series by labels.

Example:

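sum(rate(http_requests_total{job="api"}[5m])) / sum(rate(http_requests_total{job="frontend"}[5m]))

Both sides are aggregated down to a single series first. Without that step the division returns nothing, because the job label differs on each side and no series pair up.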

Use:

  • on() to match specific labels.
  • ignoring() to exclude labels from matching.

Example:

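rate(errors_total[5m]) / ignoring(instance) rate(requests_total[5m])

This matches series on every label except instance. It only works when, after dropping instance, each side still has at most one series per label set; otherwise, aggregate both sides first (for example with sum by (job)).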
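When one side has more series than the other, group_left (or group_right) permits many-to-one matching. The following is a sketch that assumes node exporter is running; errors_total and requests_total are the same placeholder metric names used in the example above.

```promql
# Many-to-one match: attach the nodename label from node_uname_info
# (a node exporter series whose value is always 1) to every load series.
node_load1 * on(instance) group_left(nodename) node_uname_info

# One-to-one match on an explicit label list: per-job error ratio.
sum by (job) (rate(errors_total[5m]))
  / on(job)
sum by (job) (rate(requests_total[5m]))
```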

Prometheus Query Cheat Sheet for Node Exporter Metrics

Here are the most used PromQL commands for host-level metrics:

Metric | Description | Example
CPU Usage | CPU time used per mode | rate(node_cpu_seconds_total[5m])
Memory Usage | Fraction of memory in use | 1 - (node_memory_MemAvailable_bytes / node_memory_MemTotal_bytes)
Disk Usage | Used space percentage | (node_filesystem_size_bytes - node_filesystem_free_bytes) / node_filesystem_size_bytes * 100
Load Average | 1-minute system load | node_load1
Network In | Bytes received per second | rate(node_network_receive_bytes_total[5m])
Network Out | Bytes sent per second | rate(node_network_transmit_bytes_total[5m])

These form the backbone of system observability in Prometheus.
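The raw metrics above are often combined into per-host figures. The queries below are one possible sketch; the filesystem types excluded in the last query are an assumption and should be adjusted to your hosts.

```promql
# CPU utilization per instance as a percentage (100 minus the idle share).
100 * (1 - avg by (instance) (rate(node_cpu_seconds_total{mode="idle"}[5m])))

# Load per core: 1-minute load divided by the number of CPUs.
node_load1
  / on(instance)
count by (instance) (node_cpu_seconds_total{mode="idle"})

# Disk usage percentage, skipping in-memory and overlay filesystems.
100 * (1 - node_filesystem_avail_bytes{fstype!~"tmpfs|overlay"}
         / node_filesystem_size_bytes{fstype!~"tmpfs|overlay"})
```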

PromQL Cheat Sheet for Kubernetes Metrics

Prometheus integrates deeply with Kubernetes, allowing cluster-level insights.

Use Case | Query Example
Pod CPU Usage | sum(rate(container_cpu_usage_seconds_total[5m])) by (pod)
Pod Memory Usage | sum(container_memory_usage_bytes) by (pod)
Node Disk Usage | 100 - (node_filesystem_avail_bytes / node_filesystem_size_bytes * 100)
Pod Restart Count | sum(increase(kube_pod_container_status_restarts_total[1h])) by (pod)
Running Pods | sum(kube_pod_status_phase{phase="Running"})

These queries are essential for monitoring Kubernetes cluster health.
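A few refinements are common in real clusters. The sketch below assumes cAdvisor metrics with namespace, pod, and container labels and a kube-state-metrics deployment; label names can differ between Kubernetes versions and scrape configurations.

```promql
# CPU cores used per namespace. container!="" drops the pod-level
# aggregate series that cAdvisor also exports, avoiding double counting.
sum by (namespace) (rate(container_cpu_usage_seconds_total{container!=""}[5m]))

# Working-set memory per pod, which excludes inactive file cache.
sum by (namespace, pod) (container_memory_working_set_bytes{container!=""})

# Pods currently in a Pending, Failed, or Unknown phase (kube-state-metrics).
sum by (namespace, pod) (kube_pod_status_phase{phase=~"Pending|Failed|Unknown"}) > 0
```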

Alerting Rules with Prometheus Queries

Prometheus evaluates alerting rules written as PromQL expressions and sends firing alerts to Alertmanager, which handles grouping, routing, and notifications.

Example Rule:

- alert: HighCPULoad
  # Fraction of non-idle CPU time per instance, averaged over 5 minutes.
  expr: 1 - avg by (instance) (rate(node_cpu_seconds_total{mode="idle"}[5m])) > 0.9
  for: 2m
  labels:
    severity: critical
  annotations:
    summary: "High CPU usage detected on {{ $labels.instance }}"

By mastering PromQL, you can design intelligent alerts to catch anomalies early.
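Any expression that returns series only when something is wrong can serve as the expr of an alerting rule. The examples below are sketches; the job name, the status label, and the 5% threshold are assumptions to adapt.

```promql
# Target is unreachable (pair with a for: duration in the rule).
up == 0

# No series at all for a job, e.g. the target vanished from service discovery.
# job="api" is an assumed job name.
absent(up{job="api"})

# More than 5% of requests failing; status is an assumed label name.
sum(rate(http_requests_total{status=~"5.."}[5m]))
  /
sum(rate(http_requests_total[5m]))
  > 0.05
```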

Advanced PromQL Functions

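Function | Purpose
avg_over_time() | Average of a series over a time range
sum_over_time() | Sum of all samples in a time range
max_over_time() | Maximum in range
quantile_over_time() | Quantile (percentile) of the samples in a range
predict_linear() | Forecast a gauge's value using linear regression
deriv() | Per-second derivative of a gauge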

Example:

predict_linear(http_requests_total[1h], 3600)

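Extrapolates the series one hour (3600 seconds) ahead using a linear regression over the last hour of samples. predict_linear() is designed for gauges; on a counter like this one it predicts the raw counter value, not a request rate.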
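These functions are most often applied to gauges. The independent queries below are a sketch using node exporter metrics; the window sizes (6h, 1d, 1h) are arbitrary choices.

```promql
# True when a filesystem is predicted to run out of space within 4 hours,
# based on the trend of the last 6 hours.
predict_linear(node_filesystem_avail_bytes[6h], 4 * 3600) < 0

# Average 1-minute load over the past day.
avg_over_time(node_load1[1d])

# 95th percentile of load samples over the past day.
quantile_over_time(0.95, node_load1[1d])

# Subquery: peak 5-minute request rate seen in the last hour,
# evaluated at 1-minute resolution.
max_over_time(rate(http_requests_total[5m])[1h:1m])
```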

PromQL Query Optimization Tips

  1. Use shorter time ranges for faster queries.
  2. Filter labels precisely to reduce data scans.
  3. Use recording rules for repetitive queries.
  4. Avoid expensive vector-matching joins and high-cardinality aggregations in dashboards.
  5. Test queries incrementally.

These optimizations make dashboards load faster and alerts more responsive.
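As a rough illustration of the filtering and recording-rule tips, compare the three queries below. The recorded series name job:http_requests:rate5m is hypothetical; it only follows the level:metric:operations naming convention and would have to be defined in your own rules file.

```promql
# Broad: reads every series of the metric across all jobs.
sum(rate(http_requests_total[5m]))

# Narrower: label matchers cut the number of series read.
sum(rate(http_requests_total{job="api", handler="/api"}[5m]))

# Precomputed: query a series written by a recording rule instead.
job:http_requests:rate5m{job="api"}
```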

Best Practices for Prometheus Querying

  • Wrap counters in rate() or increase() rather than graphing their raw values (e.g., request counts).
  • Use gauge metrics for instantaneous values (e.g., temperature, memory).
  • Normalize units (seconds, bytes, ratios) so panels are comparable.
  • Use recording rules to precompute metrics for dashboards.
  • Regularly test queries in Prometheus UI or Grafana Explore.

A disciplined PromQL strategy improves observability and scalability.

Conclusion

This Prometheus Query Cheat Sheet gives you everything you need to query, aggregate, and analyze metrics efficiently. From system monitoring and Kubernetes metrics to alerting and forecasting, PromQL’s flexibility makes it a must-have skill for any DevOps engineer.

Mastering these commands and best practices will help you troubleshoot faster, optimize performance, and create meaningful dashboards — all while unlocking the full power of Prometheus.

FAQs About Prometheus Query Cheat Sheet

1. What is PromQL in Prometheus?

PromQL (Prometheus Query Language) is used to query and analyze time-series data collected by Prometheus. It supports filtering, aggregation, and mathematical operations.

2. How do I query data in Prometheus?

You can query data using the Prometheus UI or API. For example:
rate(http_requests_total[5m])

3. What is the difference between rate() and increase()?

rate() shows the per-second rate of increase, while increase() shows the total increase over the selected range.

4. Can Prometheus queries be used in Grafana?

Yes, Grafana supports Prometheus as a data source. You can use PromQL queries directly in Grafana dashboards.

5. How do I create custom alerts in Prometheus?

Use alerting rules in YAML format with PromQL expressions. Prometheus evaluates these rules and forwards firing alerts to Alertmanager, which sends the notifications.
