Advanced Query Patterns

This module covers advanced PromQL patterns including rate functions, subqueries, logical operators, and time-based functions. You will learn how to write complex queries for real-world monitoring scenarios.

Rate Functions

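Rate functions measure how quickly a counter grows. Counters only go up, except when they reset to zero (for example, when a process restarts), and rate functions detect and compensate for those resets. This makes them essential for working with counter metrics.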

The rate() Function

The rate() function calculates the per-second average rate of increase:

rate(http_requests_total[5m])

This calculates the average per-second rate of HTTP requests over the last 5 minutes.

The irate() Function

The irate() function calculates the per-second instant rate based on the last two data points:

irate(http_requests_total[5m])

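Use irate() for graphs of volatile, fast-moving counters where you need the most recent rate. Use rate() for alerts and slow-moving counters: it averages over the whole range, so the result is smoother and less sensitive to single-sample spikes.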

The increase() Function

The increase() function calculates the total increase over a time range:

increase(http_requests_total[1h])

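This estimates the number of requests received in the last hour for each series. Because increase() extrapolates to the edges of the range, the result can be a non-integer even for a counter that only increments by whole numbers.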
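The Prometheus documentation describes increase() as shorthand for rate() multiplied by the number of seconds in the range, so the two expressions below should return the same values. Comparing them is a quick way to sanity-check a result:

# Total increase over the last hour
increase(http_requests_total[1h])

# Equivalent: average per-second rate multiplied by 3600 seconds
rate(http_requests_total[1h]) * 3600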

Rate with Aggregation

Combine rate functions with aggregations:

# Total request rate per service
sum(rate(http_requests_total[5m])) by (service)

# Average request rate per namespace
avg(rate(http_requests_total[5m])) by (namespace)
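The order matters: apply rate() to each raw counter first and aggregate afterwards. The sketch below contrasts the two orders. The second form needs a subquery because rate() only accepts a range vector. Summing raw counters first hides resets of individual counters, and rate() then reads a reset in any one of them as a reset of the whole sum, which produces spurious spikes. The prefix form sum by (service) (...) used here is equivalent to the sum(...) by (service) form above.

# Recommended: per-series rate first, then sum
sum by (service) (rate(http_requests_total[5m]))

# Avoid: sum the raw counters first, then take the rate through a subquery
rate(sum by (service) (http_requests_total)[5m:1m])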

Working with Gauges

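Gauges represent values that can go up or down, such as memory usage or queue length. Do not apply rate() or increase() to gauges, because those functions treat every decrease as a counter reset. Use the functions below instead.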

Instant Values

For gauges, you typically query the current value:

container_memory_usage_bytes

Average Over Time

Calculate the average value over a time range:

avg_over_time(container_memory_usage_bytes[5m])

Max and Min Over Time

# Maximum memory usage over the last hour
max_over_time(container_memory_usage_bytes[1h])

# Minimum memory usage over the last hour
min_over_time(container_memory_usage_bytes[1h])
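Comparing the peak with the average over the same window is one way to spot spiky workloads. A large gap suggests short bursts that an average alone would hide. This is a sketch. The container!="" matcher assumes cAdvisor-style metrics, where series with an empty container label describe the whole pod rather than a single container:

# Bytes by which the 1-hour peak exceeds the 1-hour average, per container
max_over_time(container_memory_usage_bytes{container!=""}[1h])
  - avg_over_time(container_memory_usage_bytes{container!=""}[1h])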

Rate of Change for Gauges

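Use delta() for the difference between the first and last values in a range, or deriv() for the per-second slope estimated with simple linear regression:

# Change in memory usage over 5 minutes
delta(container_memory_usage_bytes[5m])

# Per-second rate of change (linear regression over 5 minutes)
deriv(container_memory_usage_bytes[5m])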
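Because deriv() returns a per-second slope, you can filter or scale it like any other value. The sketch below flags containers whose memory has been trending upward, which can be a sign of a leak, and projects that trend one hour ahead. The 30-minute window is an arbitrary choice for illustration:

# Containers whose memory usage trended upward over the last 30 minutes
deriv(container_memory_usage_bytes{container!=""}[30m]) > 0

# Approximate change in bytes over the next hour if the current trend continues
deriv(container_memory_usage_bytes{container!=""}[30m]) * 3600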

Subqueries

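A subquery evaluates an instant query at regular intervals over a time range and returns the results as a range vector. You can then pass that range vector to functions such as avg_over_time() or max_over_time().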

Basic Subquery Syntax

<instant_query>[<range>:<resolution>]
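For example, the two subqueries below differ only in resolution. The resolution part is optional. When you leave it empty, Prometheus falls back to the global evaluation interval. Either subquery returns a range vector, so it is normally wrapped in a function such as max_over_time() rather than graphed directly:

# Evaluate the 5-minute rate once per minute over the last 30 minutes
rate(http_requests_total[5m])[30m:1m]

# Same, but at the default (global evaluation interval) resolution
rate(http_requests_total[5m])[30m:]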

Subquery Example

# Average of the 5-minute rate, evaluated every 30 seconds over the last hour
avg_over_time(rate(http_requests_total[5m])[1h:30s])

Common Subquery Patterns

# Maximum 5-minute rate over the last hour
max_over_time(rate(http_requests_total[5m])[1h:5m])

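# 95th percentile of memory usage over the last day, sampled once per hour
# (the plain range selector container_memory_usage_bytes[1d] would use every raw sample)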
quantile_over_time(0.95, container_memory_usage_bytes[1d:1h])
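Subqueries are most useful where a plain range selector is not allowed, for instance when the inner expression is an aggregation. This sketch finds each service's peak total request rate over the last day, sampled every 5 minutes:

# Peak per-service request rate over the last day
max_over_time(sum by (service) (rate(http_requests_total[5m]))[1d:5m])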

Logical Operators

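Logical operators combine instant vectors by comparing their label sets. By default, two series match only if all of their labels (other than the metric name) are identical. Use on(...) to match on a listed subset of labels, or ignoring(...) to exclude labels from matching.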

AND Operator

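The and operator returns the left-hand series that have a matching series on the right-hand side, and the values always come from the left. Here on(namespace, pod, container) matches on those three labels only, in case the two metrics carry other labels that differ:

# Memory usage only for containers that also have CPU metrics
container_memory_usage_bytes and on(namespace, pod, container) container_cpu_usage_seconds_total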
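Logical operators are often combined with comparison filters. This sketch returns memory usage only for containers that are using both more than 1 GiB of memory and more than half a CPU core. Both thresholds are arbitrary values chosen for illustration:

(container_memory_usage_bytes{container!=""} > 1073741824)
  and on(namespace, pod, container)
(rate(container_cpu_usage_seconds_total{container!=""}[5m]) > 0.5)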

OR Operator

The or operator returns every series from the left-hand side, plus those series from the right-hand side whose label sets have no match on the left:

# Combine metrics from two different sources
metric_a or metric_b
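A common use of or is to supply a default when the left-hand side is empty. vector(0) produces a single series with value 0 and no labels. In this sketch the label-free sum on the left matches it, so the query returns 0 instead of no data when no 5xx responses were recorded:

# Total 5xx rate, or 0 when there are no matching series
sum(rate(http_requests_total{status=~"5.."}[5m])) or vector(0)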

Unless Operator

The unless operator returns the left-hand series that have no matching series on the right-hand side:

# Memory usage for containers without CPU metrics
container_memory_usage_bytes unless on(namespace, pod, container) container_cpu_usage_seconds_total
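The unless operator is handy for finding gaps in configuration. The sketch below lists containers that run without a memory limit. It assumes, as with cAdvisor, that container_spec_memory_limit_bytes reports 0 for a container that has no limit:

# Containers with no memory limit set
container_memory_usage_bytes{container!=""}
  unless on(namespace, pod, container)
(container_spec_memory_limit_bytes{container!=""} > 0)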

Time-Based Functions

PromQL provides functions for working with timestamps and time ranges.

time() Function

Returns the evaluation time in seconds since the Unix epoch. For an instant query this is the current time. In a graph, it is the time of each step:

time()

timestamp() Function

Returns the timestamp of each sample, in seconds since the Unix epoch:

timestamp(container_memory_usage_bytes)
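Subtracting timestamp() from time() gives the age of each sample in seconds. This helps you spot targets that have recently stopped reporting. Series with no sample in the lookback window (5 minutes by default) drop out of the result entirely:

# Seconds since each series last received a sample
time() - timestamp(container_memory_usage_bytes)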

Time Calculations

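Calculate time differences, or use the offset modifier to shift a selector back in time:

# Unix timestamp one hour before the evaluation time
time() - 3600

# Values as they were 1 hour ago (only series that existed then are returned)
container_memory_usage_bytes offset 1h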
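Both techniques combine naturally with arithmetic. The sketch below computes the change in memory over the last hour, the request rate compared with the same time last week, and process uptime. The uptime line assumes the target exports process_start_time_seconds, a metric that many Prometheus client libraries provide:

# Change in memory usage compared with one hour ago
container_memory_usage_bytes - container_memory_usage_bytes offset 1h

# Request rate relative to the same time last week (1 means unchanged)
sum by (service) (rate(http_requests_total[5m]))
  / sum by (service) (rate(http_requests_total[5m] offset 1w))

# Process uptime in seconds
time() - process_start_time_seconds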

Mathematical Functions

PromQL includes various mathematical functions for transforming values.

Basic Math Functions

# Absolute value: distance of memory usage from 1 GiB (1073741824 bytes)
abs(container_memory_usage_bytes - 1073741824)

# Square root
sqrt(container_memory_usage_bytes)

# Logarithm (natural log)
ln(container_memory_usage_bytes)

# Logarithm base 10
log10(container_memory_usage_bytes)

# Exponentiation
exp(rate(http_requests_total[5m]))
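Arithmetic operators and clamping functions round out these transformations. This sketch converts bytes to GiB with the power operator. It also caps a memory percentage at 100 so that brief overshoots do not stretch a graph's axis, and the > 0 filter skips containers without a limit:

# Memory usage in GiB (1024^3 bytes)
container_memory_usage_bytes / 1024^3

# Memory as a percentage of the limit, capped at 100
clamp_max(100 * container_memory_usage_bytes / (container_spec_memory_limit_bytes > 0), 100)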

Rounding Functions

# Memory in MiB (1048576 bytes), rounded to the nearest integer
round(container_memory_usage_bytes / 1048576)

# Memory in MiB, rounded up
ceil(container_memory_usage_bytes / 1048576)

# Memory in MiB, rounded down
floor(container_memory_usage_bytes / 1048576)
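round() also accepts an optional second argument, to_nearest, which rounds to the nearest multiple of that value instead of the nearest integer. For example, to round memory to the nearest 10 MiB:

# Memory in MiB, rounded to the nearest multiple of 10
round(container_memory_usage_bytes / 1048576, 10)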

Label Manipulation

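These functions add or change labels in query results.

label_replace()

label_replace() matches a regular expression against the value of a source label. If it matches, the replacement string is written to a destination label, and the replacement can refer to capture groups such as $1. The expression must match the entire value, and series that do not match are returned unchanged. In this example, a namespace named shop-prod gets environment="prod":

label_replace(
  container_memory_usage_bytes,
  "environment",
  "$1",
  "namespace",
  ".*-(prod|staging|dev)"
)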
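Because the regular expression must match the entire source value, label_replace() is also useful for trimming generated suffixes. This sketch derives a hypothetical workload label from pod names that end in a ReplicaSet-style hash and random suffix. For example, frontend-7d9f8b6c5d-x2k4q becomes frontend. The suffix lengths in the pattern are an assumption about typical generated names. Pods whose names do not fit the pattern keep their labels and get no workload label:

label_replace(
  container_memory_usage_bytes,
  "workload",
  "$1",
  "pod",
  "(.*)-[a-z0-9]{5,10}-[a-z0-9]{5}"
)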

label_join()

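label_join() joins the values of several source labels, separated by a delimiter, into a new label. With the arguments below, namespace="shop", pod="web-1", and container="nginx" produce full_name="shop-web-1-nginx":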

label_join(
  container_memory_usage_bytes,
  "full_name",
  "-",
  "namespace",
  "pod",
  "container"
)

Advanced Patterns for OpenShift

CPU Utilization Percentage

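Calculate CPU usage as a percentage of one CPU core. A value of 100 means one full core, so multi-threaded containers can exceed 100. The container!="" matcher skips the pod-level series that cAdvisor reports with an empty container label, which would otherwise skew the results:

# CPU utilization per container
100 * rate(container_cpu_usage_seconds_total{container!=""}[5m])

# Average CPU utilization per namespace
avg(100 * rate(container_cpu_usage_seconds_total{container!=""}[5m])) by (namespace)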
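To express usage relative to each container's CPU limit instead of one core, divide by the limit. This sketch assumes kube-state-metrics is installed and exposes limits as kube_pod_container_resource_limits with resource="cpu", measured in cores. Containers without a CPU limit have no such series and drop out of the result:

# CPU usage as a percentage of each container's CPU limit
100 * sum by (namespace, pod, container) (rate(container_cpu_usage_seconds_total{container!=""}[5m]))
  / sum by (namespace, pod, container) (kube_pod_container_resource_limits{resource="cpu"})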

Memory Utilization Percentage

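Calculate memory usage relative to each container's limit. cAdvisor reports a limit of 0 for containers without a memory limit, which would make the division return +Inf, so the > 0 filter drops them:

# Memory utilization percentage (containers with a limit only)
100 * container_memory_usage_bytes / (container_spec_memory_limit_bytes > 0)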
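You can turn the same ratio into an alert-style filter. This sketch uses container_memory_working_set_bytes, which is often preferred over container_memory_usage_bytes because it excludes inactive page cache. It returns only containers above 90% of their limit, and the threshold is an example value:

# Containers using more than 90% of their memory limit
100 * container_memory_working_set_bytes{container!=""}
  / (container_spec_memory_limit_bytes{container!=""} > 0)
  > 90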

Request Rate with Error Rate

Calculate error rates:

# Error rate as percentage of total requests
100 * sum(rate(http_requests_total{status=~"5.."}[5m])) by (service) /
  sum(rate(http_requests_total[5m])) by (service)
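Adding a comparison keeps only the services whose error percentage exceeds a threshold, which is the usual shape of an alerting expression. The 5% threshold here is only an example:

# Services where more than 5% of requests failed with a 5xx status
(
  100 * sum by (service) (rate(http_requests_total{status=~"5.."}[5m]))
    / sum by (service) (rate(http_requests_total[5m]))
) > 5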

Pod Restart Rate

Monitor pod restarts:

# Rate of pod container restarts
rate(kube_pod_container_status_restarts_total[5m])
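Restarts are rare events, so a per-second rate over 5 minutes produces very small numbers. Counting restarts over a longer window is often easier to read. This sketch lists containers that restarted at least once in the last hour:

# Containers that restarted at least once in the last hour
increase(kube_pod_container_status_restarts_total[1h]) > 0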

Resource Quota Usage

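Monitor resource quota usage. kube-state-metrics exposes each quota as kube_resourcequota, with a type label of used or hard. Dividing the two gives the percentage of the quota consumed. The ignoring(type) clause lets the used and hard series match even though their type labels differ. Note that a requests.memory quota counts the memory requested by pods, not the memory they actually use:

# Memory request quota usage percentage
100 * kube_resourcequota{resource="requests.memory", type="used"}
  / ignoring(type) kube_resourcequota{resource="requests.memory", type="hard"}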

Query Optimization

Avoid Overly Long Ranges

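Every sample in the range has to be loaded, so long ranges increase query time and memory usage. They also change the meaning of the result: a 1-day range averages the rate over the whole day and hides short spikes.

# Good: 5-minute range
rate(http_requests_total[5m])

# Less efficient, and much smoother: 1-day range
rate(http_requests_total[1d])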
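A common rule of thumb is that the range should span at least four scrape intervals. This ensures rate() always has several samples to work with, even if a scrape is missed. Assuming a 30-second scrape interval, that means a range of at least 2 minutes:

# Shortest comfortable range for a 30-second scrape interval (4 x 30s)
rate(http_requests_total[2m])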

Use Appropriate Resolution

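For subqueries, choose a resolution that matches the detail you need. A resolution finer than the scrape interval adds evaluation work without adding information:

# Good: 1-minute resolution (60 evaluations of the inner query per series)
avg_over_time(rate(http_requests_total[5m])[1h:1m])

# Less efficient: 1-second resolution (3,600 evaluations per series)
avg_over_time(rate(http_requests_total[5m])[1h:1s])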

Limit Result Size

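Use topk() or bottomk() to limit the number of series returned. This keeps graphs and tables readable, although Prometheus still evaluates the full inner expression: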

# Only top 10 results
topk(10, container_memory_usage_bytes)
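topk() and bottomk() work on any instant vector, including aggregated results, and accept a by clause to rank within groups. This sketch ranks namespaces by total memory and then finds the three largest containers within each namespace:

# The 10 namespaces using the most memory
topk(10, sum by (namespace) (container_memory_usage_bytes{container!=""}))

# The 3 containers using the most memory within each namespace
topk by (namespace) (3, container_memory_usage_bytes{container!=""})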

Debugging Queries

Check for Missing Metrics

Use absent() to detect if a metric is missing:

# Returns 1 if metric is missing, empty if present
absent(http_requests_total)
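absent() is most often used in alerting rules. When the selector contains equality matchers, the returned series carries those labels, which makes the result self-describing. The job name below is hypothetical:

# Returns {job="my-app"} with value 1 if no series with job="my-app" exist
absent(up{job="my-app"})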

Validate Label Values

Check what label values exist:

# See all namespace values
group by (namespace) (container_memory_usage_bytes)
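To see how many series share each label value, replace group with count. This is a quick way to find the namespaces contributing the most series:

# Number of series per namespace
count by (namespace) (container_memory_usage_bytes)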

Test Query Components

Break complex queries into parts:

# Test the inner query first
rate(http_requests_total[5m])

# Then add aggregation
sum(rate(http_requests_total[5m])) by (service)

Summary

In this module, you learned:

  • Rate functions: rate(), irate(), and increase() for counters

  • Functions for gauges: avg_over_time(), max_over_time(), delta(), deriv()

  • Subqueries for time-based analysis

  • Logical operators: and, or, unless

  • Time-based functions time() and timestamp(), and the offset modifier

  • Mathematical functions for value transformation

  • Label manipulation functions

  • Advanced patterns for OpenShift monitoring

  • Query optimization techniques

  • Debugging strategies for complex queries

You now have the knowledge to write sophisticated Prometheus queries for real-world monitoring scenarios. Practice these patterns in your OpenShift environment to become proficient with PromQL.