Exercises

Exercise 1: Rate Functions

Practice using rate functions to calculate per-second rates from counter metrics.

Step 1: Basic Rate Calculation

If you have HTTP request metrics, calculate the request rate:

rate(http_requests_total[5m])

If you don’t have HTTP metrics, use a counter metric available in your cluster. You can find available metrics using:

curl -H "Authorization: Bearer $(oc whoami -t)" \
  "https://prometheus-k8s-openshift-monitoring.apps.<cluster-domain>/api/v1/label/__name__/values"

Step 2: Rate with Aggregation

Calculate the total request rate per service:

sum(rate(http_requests_total[5m])) by (service)

Step 3: Compare rate() and irate()

Compare the results of rate() and irate():

# Average rate
rate(http_requests_total[5m])

# Instant rate
irate(http_requests_total[5m])

Observe the differences. When would you use each?

Exercise 2: Gauge Functions

Practice using functions designed for gauge metrics.

Step 1: Average Over Time

Calculate the average memory usage over the last 5 minutes:

avg_over_time(container_memory_usage_bytes[5m])

Step 2: Maximum Over Time

Find the maximum memory usage over the last hour:

max_over_time(container_memory_usage_bytes[1h])

Compare this with the current value. What does this tell you?
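
To make the comparison concrete, one option is to divide the current value by the hour's maximum; results close to 1 mean the container is already near its recent peak. This is a sketch, not a required step:

container_memory_usage_bytes / max_over_time(container_memory_usage_bytes[1h])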

Step 3: Delta Function

Calculate the change in memory usage over 5 minutes:

delta(container_memory_usage_bytes[5m])

Positive values indicate a memory increase; negative values indicate a decrease.
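
If you want a per-second rate of change rather than the absolute change over the window, deriv() plays the same role for gauges that rate() plays for counters; a brief sketch:

# Per-second trend of memory usage, estimated by linear regression
deriv(container_memory_usage_bytes[5m])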

Exercise 3: Subqueries

Practice writing subqueries for time-based analysis.

Step 1: Basic Subquery

Calculate the average 5-minute rate over the last hour, evaluated every minute:

avg_over_time(rate(http_requests_total[5m])[1h:1m])

If you don’t have HTTP metrics, adapt this to a counter metric in your cluster.

Step 2: Maximum Rate Over Time

Find the maximum 5-minute rate over the last 6 hours:

max_over_time(rate(http_requests_total[5m])[6h:5m])

Step 3: Percentile Over Time

Calculate the 95th percentile of memory usage over the last day:

quantile_over_time(0.95, container_memory_usage_bytes[1d:1h])

Exercise 4: Logical Operators

Practice using logical operators to combine conditions.

Step 1: AND Operator

Find containers that have both memory and CPU metrics:

container_memory_usage_bytes and container_cpu_usage_seconds_total

How many containers match this condition?
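
One way to answer that is to wrap the expression in count():

count(container_memory_usage_bytes and container_cpu_usage_seconds_total)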

Step 2: UNLESS Operator

Find containers with memory metrics but no CPU metrics:

container_memory_usage_bytes unless container_cpu_usage_seconds_total

Step 3: Complex Logical Expression

Combine multiple conditions. For example, find memory usage for containers in production namespaces that also have CPU metrics:

container_memory_usage_bytes{namespace=~"prod.*"} and container_cpu_usage_seconds_total

Exercise 5: Mathematical Functions

Practice using mathematical functions to transform metric values.

Step 1: Convert Bytes to Megabytes

Convert memory usage from bytes to megabytes (strictly, mebibytes: 1 MiB = 1,048,576 bytes):

container_memory_usage_bytes / 1048576

Step 2: Round Values

Round memory usage in megabytes to the nearest integer:

round(container_memory_usage_bytes / 1048576)

Step 3: Calculate Percentage

If you have limit metrics, calculate memory usage as a percentage:

100 * container_memory_usage_bytes / container_spec_memory_limit_bytes
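
Containers with no memory limit typically report the limit as 0, which turns this division into +Inf. A hedged variant that filters those series out first (assuming your environment reports unlimited containers as 0):

100 * container_memory_usage_bytes / (container_spec_memory_limit_bytes > 0)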

Exercise 6: Advanced OpenShift Patterns

Practice writing advanced queries for OpenShift-specific monitoring scenarios.

Step 1: CPU Utilization Percentage

Calculate CPU usage per namespace, expressed as a percentage of a single CPU core:

sum(100 * rate(container_cpu_usage_seconds_total[5m])) by (namespace)

Which namespace has the highest CPU utilization?
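
If you would rather express utilization relative to total cluster capacity than to a single core, you could divide by the core count. This sketch assumes the machine_cpu_cores metric is available (it is exposed by the kubelet's cAdvisor endpoint in most clusters):

100 * sum(rate(container_cpu_usage_seconds_total[5m])) by (namespace)
  / scalar(sum(machine_cpu_cores))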

Step 2: Pod Restart Rate

Calculate the rate of pod container restarts:

sum(rate(kube_pod_container_status_restarts_total[5m])) by (namespace, pod)

Are there any pods with high restart rates?
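
To spot offenders directly, one option is to count restarts over the last hour and keep only non-zero results; a sketch:

sum(increase(kube_pod_container_status_restarts_total[1h])) by (namespace, pod) > 0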

Step 3: Top Memory Consumers

Find the top 5 containers by memory usage, showing their current values:

topk(5, container_memory_usage_bytes)

Now find the top 5 by average memory usage over the last hour:

topk(5, avg_over_time(container_memory_usage_bytes[1h]))

Compare the results. Are they the same?

Exercise 7: Query Optimization

Practice optimizing queries for better performance.

Step 1: Compare Range Durations

Compare query execution time for different range durations:

# Short range
rate(http_requests_total[1m])

# Longer range
rate(http_requests_total[15m])

Note the execution time difference (if visible in your Prometheus UI).
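
If the UI does not show timings, one option is to time the same query from the command line against the Prometheus API, reusing the route and token from Exercise 1 (fill in the <cluster-domain> placeholder as before):

time curl -s -H "Authorization: Bearer $(oc whoami -t)" \
  "https://prometheus-k8s-openshift-monitoring.apps.<cluster-domain>/api/v1/query" \
  --data-urlencode 'query=rate(http_requests_total[15m])'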

Step 2: Limit Results

Instead of returning all results, limit to top 10:

# All results
container_memory_usage_bytes

# Top 10 only
topk(10, container_memory_usage_bytes)

Step 3: Optimize Subquery Resolution

Compare subqueries with different resolutions:

# Higher resolution (more data points)
avg_over_time(rate(http_requests_total[5m])[1h:30s])

# Lower resolution (fewer data points)
avg_over_time(rate(http_requests_total[5m])[1h:5m])

Use the resolution that provides sufficient detail without excessive computation.

Exercise 8: Debugging Queries

Practice debugging techniques for complex queries.

Step 1: Check for Missing Metrics

Check if a specific metric exists:

absent(nonexistent_metric)

This should return 1 because the metric does not exist. Try it with an existing metric and the result is empty.
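
absent() also accepts label matchers, which helps when the metric exists but a particular series is missing; the namespace value here is purely illustrative:

absent(container_memory_usage_bytes{namespace="some-namespace-that-does-not-exist"})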

Step 2: Break Down Complex Queries

Start with a simple query and build up complexity:

# Step 1: Basic metric
container_memory_usage_bytes

# Step 2: Add filter
container_memory_usage_bytes{namespace="default"}

# Step 3: Add aggregation
sum(container_memory_usage_bytes{namespace="default"})

# Step 4: Add a range function (rate() applies only to counters; for a gauge
# like this one, delta() or avg_over_time() is the appropriate choice)
sum(delta(container_memory_usage_bytes{namespace="default"}[5m]))

Step 3: Validate Label Values

Check what label values exist for a specific label:

# List the unique namespace values (one result series per namespace)
count by (namespace) (container_memory_usage_bytes)
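
You can also ask the Prometheus HTTP API for a label's values directly, using the same route and token as in Exercise 1:

curl -H "Authorization: Bearer $(oc whoami -t)" \
  "https://prometheus-k8s-openshift-monitoring.apps.<cluster-domain>/api/v1/label/namespace/values"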

Verification

After completing these exercises, verify your understanding:

  • Can you calculate rates from counter metrics?

  • Can you use functions appropriate for gauge metrics?

  • Can you write subqueries for time-based analysis?

  • Can you use logical operators to combine conditions?

  • Can you apply mathematical functions to transform values?

  • Can you write optimized queries for OpenShift monitoring?

  • Can you debug complex queries effectively?

If you can answer yes to all these questions, you have successfully completed the Prometheus Query Workshop! You are now equipped to write effective PromQL queries for monitoring your OpenShift 4.16 infrastructure.