Aggregation Functions

This module covers PromQL aggregation functions, which summarize data across multiple time series. You will learn how to group metrics and calculate sums, averages, and other statistics.

Understanding Aggregation

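Aggregation functions (called aggregation operators in the PromQL documentation) reduce many time series to fewer. They group the input series by their label values and apply a function (sum, average, max, etc.) to each group, producing one output series per group. Without a by or without clause, all series form a single group and the result is a single series.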

Basic Aggregation Syntax

<aggregation-function>([parameter,] <vector expression>) [without|by (<label list>)]
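The grouping clause can also be written before the parenthesized arguments. The two queries below should be equivalent; the metric is the one used throughout this module.

```promql
# Clause after the arguments (the form used in this module)
sum(container_memory_usage_bytes) by (namespace)

# Same query with the clause before the arguments
sum by (namespace) (container_memory_usage_bytes)
```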

The `by` Clause

The by clause lists the labels to group by; only those labels are kept in the result:

sum(container_memory_usage_bytes) by (namespace)

This groups all time series by namespace and sums the memory usage for each namespace.
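To make the grouping concrete, here is a sketch with three hypothetical input series. The label values and sample values are invented for illustration only.

```promql
# Hypothetical input series (illustrative values; real series carry more labels):
#   container_memory_usage_bytes{namespace="app", pod="web-1", container="web"}  100
#   container_memory_usage_bytes{namespace="app", pod="web-2", container="web"}  150
#   container_memory_usage_bytes{namespace="db",  pod="pg-0",  container="pg"}   400
sum(container_memory_usage_bytes) by (namespace)
# Expected result: one series per namespace, labeled only with namespace:
#   {namespace="app"}  250
#   {namespace="db"}   400
```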

The `without` Clause

The without clause lists the labels to drop; series are then grouped by all of their remaining labels:

sum(container_memory_usage_bytes) without (pod, container)

This drops the pod and container labels and sums the series that share the same values for every remaining label.
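The difference from by shows up when series carry labels you did not mention. In this hypothetical sketch (invented values, and a node label assumed to be present), without keeps node, so the two app series stay in separate groups:

```promql
# Hypothetical input series (illustrative values):
#   container_memory_usage_bytes{namespace="app", pod="web-1", container="web", node="n1"}  100
#   container_memory_usage_bytes{namespace="app", pod="web-2", container="web", node="n2"}  150
sum(container_memory_usage_bytes) without (pod, container)
# Expected result: every label except pod and container (and the metric name) is kept:
#   {namespace="app", node="n1"}  100
#   {namespace="app", node="n2"}  150
```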

Sum Aggregation

The sum() function adds up all values in each group.

Basic Sum

# Total memory usage across all containers
sum(container_memory_usage_bytes)
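One caveat when summing cAdvisor memory metrics: in many Kubernetes and OpenShift setups, container_memory_usage_bytes also includes cgroup series for whole pods (with an empty container label) and for the node itself, so an unfiltered sum can count the same memory more than once. A minimal sketch, assuming your series follow that labeling, keeps only per-container series:

```promql
# Assumption: pod-level and node-level cgroup series have container="".
# Keeping only series with a non-empty container label avoids double counting.
sum(container_memory_usage_bytes{container!=""})
```

Check the labels on your own series by querying the bare metric first before relying on this filter.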

Sum by Label

# Total memory usage per namespace
sum(container_memory_usage_bytes) by (namespace)

# Total CPU usage per node, in cores (rate() turns the CPU-seconds counter into a per-second value)
sum(rate(container_cpu_usage_seconds_total[5m])) by (node)

Sum with Multiple Labels

# Total memory usage per namespace and pod
sum(container_memory_usage_bytes) by (namespace, pod)

Average Aggregation

The avg() function calculates the average value for each group.

Basic Average

# Average memory usage across all containers
avg(container_memory_usage_bytes)

Average by Label

# Average memory usage per namespace
avg(container_memory_usage_bytes) by (namespace)

# Average per-container CPU usage per node, in cores
avg(rate(container_cpu_usage_seconds_total[5m])) by (node)
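For intuition, avg() is the per-group sum divided by the per-group series count. The sketch below should return the same values as avg(container_memory_usage_bytes) by (namespace), because both sides of the division carry only the namespace label and therefore match one-to-one:

```promql
# Per-namespace average written out as sum / count
sum(container_memory_usage_bytes) by (namespace)
  /
count(container_memory_usage_bytes) by (namespace)
```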

Min and Max Aggregations

The min() and max() functions find the minimum and maximum values in each group.

Minimum Values

# Minimum memory usage per namespace
min(container_memory_usage_bytes) by (namespace)

Maximum Values

# Maximum memory usage per namespace
max(container_memory_usage_bytes) by (namespace)
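Because min() and max() with the same by clause produce matching label sets, their results can be subtracted directly. This sketch gives the spread between the largest and smallest container memory usage in each namespace:

```promql
# Spread of memory usage within each namespace, in bytes
max(container_memory_usage_bytes) by (namespace)
  -
min(container_memory_usage_bytes) by (namespace)
```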

Count Aggregation

The count() function counts the number of time series in each group.

Count Time Series

# Number of container memory series per namespace (roughly, the container count)
count(container_memory_usage_bytes) by (namespace)

# Count of pods per node
count(kube_pod_info) by (node)
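count() ignores sample values and only counts series, so it pairs well with a comparison filter. A sketch, using 1e9 bytes (about 1 GB) as an arbitrary example threshold:

```promql
# The comparison drops series at or below the threshold; count() then
# counts the remaining series in each namespace.
count(container_memory_usage_bytes > 1e9) by (namespace)
```

Namespaces with no series above the threshold do not appear in the result at all, rather than showing a count of 0.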

Statistical Aggregations

PromQL provides several statistical aggregation functions.

Standard Deviation

The stddev() function calculates the population standard deviation of the values in each group:

# Standard deviation of memory usage per namespace
stddev(container_memory_usage_bytes) by (namespace)

Variance

The stdvar() function calculates the population variance of the values in each group:

# Variance of memory usage per namespace
stdvar(container_memory_usage_bytes) by (namespace)
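The variance is the square of the standard deviation, so the following sketch should match stdvar(container_memory_usage_bytes) by (namespace) up to floating-point rounding:

```promql
# Squaring the per-namespace standard deviation yields the variance
(stddev(container_memory_usage_bytes) by (namespace)) ^ 2
```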

Quantiles

The quantile() function calculates the requested quantile (a value between 0 and 1) of the values across the series in each group:

# 95th percentile of container memory usage within each namespace
quantile(0.95, container_memory_usage_bytes) by (namespace)

# 50th percentile (median) of per-container CPU usage per node, in cores
quantile(0.5, rate(container_cpu_usage_seconds_total[5m])) by (node)

Common quantile values:
* 0.50 - Median
* 0.90 - 90th percentile
* 0.95 - 95th percentile
* 0.99 - 99th percentile
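Note that quantile() works across series at a single point in time. To get a percentile of one series over a time window, PromQL has the separate quantile_over_time() function; the 1h window below is an arbitrary example:

```promql
# 95th percentile across the containers of each namespace, at each evaluation time
quantile(0.95, container_memory_usage_bytes) by (namespace)

# 95th percentile of each container's own memory usage over the last hour
quantile_over_time(0.95, container_memory_usage_bytes[1h])
```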

Top-K and Bottom-K

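The topk() and bottomk() functions return the K series with the largest or smallest values. Unlike the other aggregations, they return the original input series with all their labels intact.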

Top-K

# Top 5 containers by memory usage
topk(5, container_memory_usage_bytes)

# Top 3 namespaces by total memory usage
topk(3, sum(container_memory_usage_bytes) by (namespace))

Bottom-K

# Bottom 5 containers by memory usage
bottomk(5, container_memory_usage_bytes)

# Bottom 3 nodes by CPU usage, in cores
bottomk(3, sum(rate(container_cpu_usage_seconds_total[5m])) by (node))
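topk() and bottomk() also accept a grouping clause. In that case K series are selected within each group, and the returned series keep all of their original labels. A sketch:

```promql
# Top 3 containers by memory usage within each namespace
topk(3, container_memory_usage_bytes) by (namespace)
```

When graphed as a range query, the selection is made independently at each step, so a graph can show more than K distinct series over time.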

Combining Aggregations

Aggregations can be nested: the output of the inner aggregation becomes the input vector of the outer one.

Nested Aggregations

# For each namespace: the average, across pods, of each pod's largest container memory usage
avg(max(container_memory_usage_bytes) by (namespace, pod)) by (namespace)

Aggregation with Filtering

# Sum of memory usage for production namespaces only
sum(container_memory_usage_bytes{namespace=~"prod.*"}) by (namespace)
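Label matchers also accept negated regular expressions, and PromQL regex matches are anchored to the whole label value. A sketch that leaves out OpenShift's own platform namespaces, assuming they share the openshift- prefix:

```promql
# Exclude namespaces whose names start with "openshift-"
sum(container_memory_usage_bytes{namespace!~"openshift-.*"}) by (namespace)
```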

Rate Functions with Aggregation

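Counters such as http_requests_total only increase (apart from resets), so their raw values are rarely useful on their own. Apply rate() to each series first and then aggregate the resulting per-second rates. rate() needs the individual series to detect counter resets correctly, so take the rate before the sum, not after.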

Rate and Sum

# Total request rate per service
sum(rate(http_requests_total[5m])) by (service)
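Summed rates can also be divided to form ratios. A sketch of a per-service error ratio, assuming http_requests_total carries the status code in a label named code (a hypothetical label name; use whatever your instrumentation exports):

```promql
# Share of requests answered with a 5xx status, per service.
# Assumption: the status code is exported in a label called "code".
sum(rate(http_requests_total{code=~"5.."}[5m])) by (service)
  /
sum(rate(http_requests_total[5m])) by (service)
```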

Rate and Average

# Average per-series request rate in each namespace
avg(rate(http_requests_total[5m])) by (namespace)

Common Aggregation Patterns in OpenShift

Total Resource Usage per Namespace

# Total CPU usage per namespace, in cores
sum(rate(container_cpu_usage_seconds_total[5m])) by (namespace)

# Total memory usage per namespace
sum(container_memory_usage_bytes) by (namespace)

Average Resource Usage per Node

# Average per-container CPU usage per node, in cores
avg(rate(container_cpu_usage_seconds_total[5m])) by (node)

# Average memory usage per node
avg(container_memory_usage_bytes) by (node)

Pod Count per Namespace

# Number of pods per namespace
count(kube_pod_info) by (namespace)

Top Resource Consumers

# Top 10 containers by memory usage
topk(10, container_memory_usage_bytes)

# Top 5 namespaces by total CPU usage, in cores
topk(5, sum(rate(container_cpu_usage_seconds_total[5m])) by (namespace))
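The same pattern ranks pods instead of containers: aggregate to one series per pod first, then apply topk(). A sketch, assuming (as in many cAdvisor setups) that pod-level cgroup series carry an empty container label and should be excluded:

```promql
# Top 10 pods by CPU usage, in cores; per-container rates are summed per pod.
# Assumption: pod-level cgroup series have container="" and are filtered out.
topk(10, sum(rate(container_cpu_usage_seconds_total{container!=""}[5m])) by (namespace, pod))
```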

Best Practices

Choose Appropriate Aggregation

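* Use sum() for additive quantities such as bytes or request rates (for counters, apply rate() first)
* Use avg() for the typical per-series value in a group, such as average usage per container
* Use max() or min() to find the extremes within a group
* Use quantile() for percentile analysis across series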

Group by Meaningful Labels

Group by labels that make sense for your use case:

# Good: Group by namespace for resource planning
sum(container_memory_usage_bytes) by (namespace)

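# Less useful for resource planning: instance is the scrape target (for cAdvisor metrics, typically a node's kubelet), not the team or application that owns the workload
sum(container_memory_usage_bytes) by (instance)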

Avoid Over-Aggregation

Don’t aggregate away information you might need:

# Too aggregated: a single cluster-wide total with no breakdown
sum(container_memory_usage_bytes)

# Better: Keeps namespace detail
sum(container_memory_usage_bytes) by (namespace)

Summary

In this module, you learned:

* How aggregation functions work in PromQL
* Grouping with the by and without clauses
* Common aggregation functions: sum(), avg(), min(), max(), count()
* Statistical functions: stddev(), stdvar(), quantile()
* The topk() and bottomk() functions
* Combining aggregations with rate functions
* Common aggregation patterns for OpenShift
* Best practices for effective aggregation

In the next module, you will learn advanced query patterns including rate functions, subqueries, and logical operators.