
Anomaly Detection

Realm9 automatically detects unusual spending patterns across your cloud infrastructure, helping you catch cost issues before they become significant problems.

Accessing Anomalies

Navigate to FinOps > Anomalies in the sidebar, or visit /finops/anomalies.

How Anomaly Detection Works

Detection Methods

Realm9 uses a hybrid detection approach:

Method      | Description                     | Best For
Statistical | Z-score and EWMA algorithms     | Gradual changes, trends
Provider    | AWS Cost Anomaly Detection API  | AWS-specific anomalies
Hybrid      | Combined statistical + provider | Most accurate detection

Detection Algorithm

  1. Baseline Calculation: Establishes normal spending patterns using a 30-day rolling average
  2. Seasonality Adjustment: Accounts for weekly patterns (weekday vs. weekend)
  3. Threshold Calculation: Uses a modified Z-score for outlier detection (a simplified sketch follows this list)
  4. Confidence Scoring: Assigns a confidence level (0-100%) to each detection
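
To make the statistical method concrete, here is a minimal sketch of modified Z-score detection over a 30-day window. It is an illustration, not Realm9's actual implementation: the 3.5 threshold is an assumption and the seasonality adjustment is omitted for brevity.

// Illustration of modified Z-score outlier detection over a rolling window.
// Not Realm9's actual implementation: the 3.5 threshold is an assumption and
// the seasonality adjustment is omitted for brevity.

function median(values: number[]): number {
  const sorted = [...values].sort((a, b) => a - b);
  const mid = Math.floor(sorted.length / 2);
  return sorted.length % 2 === 0 ? (sorted[mid - 1] + sorted[mid]) / 2 : sorted[mid];
}

// Modified Z-score: 0.6745 * (x - median) / MAD, robust to outliers already in the window.
function modifiedZScore(value: number, window: number[]): number {
  if (window.length === 0) return 0;
  const med = median(window);
  const mad = median(window.map((v) => Math.abs(v - med)));
  if (mad === 0) return 0; // flat baseline, no spread to compare against
  return (0.6745 * (value - med)) / mad;
}

// Flag today's cost against the previous 30 days of daily costs.
function isAnomalous(todayCost: number, last30Days: number[], threshold = 3.5): boolean {
  return Math.abs(modifiedZScore(todayCost, last30Days)) > threshold;
}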

Detection Frequency

  • Anomaly detection runs during each cost sync
  • Typically executes daily for daily-granularity cost data
  • Near-real-time for services with hourly data

Anomaly Dashboard

Summary Cards

Quick overview showing:

  • Total Active Anomalies: Count of unresolved anomalies
  • Critical/High: Severe anomalies requiring immediate attention
  • Cost Impact: Total estimated cost impact of all anomalies
  • New Today: Anomalies detected in the last 24 hours
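
A minimal sketch of how these figures could be derived from anomaly records, using the CostAnomaly interface from the data model section below; the exact aggregation Realm9 performs is an assumption.

// Sketch: derive the summary card values from CostAnomaly records
// (see the data model section below). Aggregation details are assumptions.
function summarize(anomalies: CostAnomaly[], now: Date = new Date()) {
  const active = anomalies.filter((a) => a.status === 'OPEN' || a.status === 'INVESTIGATING');
  const dayAgo = new Date(now.getTime() - 24 * 60 * 60 * 1000);
  return {
    totalActive: active.length,
    criticalHigh: active.filter((a) => a.severity === 'CRITICAL' || a.severity === 'HIGH').length,
    costImpact: active.reduce((sum, a) => sum + a.variance, 0),
    newToday: anomalies.filter((a) => a.detectedAt >= dayAgo).length,
  };
}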

Filters

Filter anomalies by:

  • Status: Open, Investigating, Resolved, Ignored
  • Severity: Critical, High, Medium, Low
  • Time Range: Last 7/30/90 days
  • Cloud Connection: Specific AWS/Azure account
  • Service: Specific cloud service
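
For records already fetched from the API, the same kind of filtering can be applied client-side; a small sketch using fields from the CostAnomaly data model below (the dashboard's own filtering logic is not shown here).

// Client-side filtering over fetched records, mirroring the status/severity/service filters.
interface AnomalyFilter {
  status?: CostAnomaly['status'];
  severity?: CostAnomaly['severity'];
  service?: string;
}

function applyFilter(anomalies: CostAnomaly[], filter: AnomalyFilter): CostAnomaly[] {
  return anomalies.filter(
    (a) =>
      (!filter.status || a.status === filter.status) &&
      (!filter.severity || a.severity === filter.severity) &&
      (!filter.service || a.service === filter.service)
  );
}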

Anomaly Severity Levels

Severity | Cost Variance | Description
Critical | > 500%        | Extreme deviation, immediate action needed
High     | 200-500%      | Significant deviation, investigate quickly
Medium   | 100-200%      | Moderate deviation, review when possible
Low      | 50-100%       | Minor deviation, monitor for patterns
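
As an illustration, the variance bands above translate to a classification roughly like the following; how Realm9 treats values that fall exactly on a boundary is an assumption.

type Severity = 'LOW' | 'MEDIUM' | 'HIGH' | 'CRITICAL';

// Map cost variance (as a percentage above expected cost) to a severity level.
// Behavior at exact boundaries is an assumption.
function severityFromVariance(variancePercent: number): Severity | null {
  if (variancePercent > 500) return 'CRITICAL';
  if (variancePercent > 200) return 'HIGH';
  if (variancePercent > 100) return 'MEDIUM';
  if (variancePercent >= 50) return 'LOW';
  return null; // deviations below 50% are not classified as anomalies
}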

Anomaly Details

Click on an anomaly to see detailed information:

Overview Section

  • Detected At: When the anomaly was first detected
  • Severity: Classification based on variance
  • Source: Detection method (Statistical/Provider/Hybrid)
  • Confidence: Percentage confidence in the detection

Cost Information

  • Actual Cost: The actual cost recorded
  • Expected Cost: What the cost should have been
  • Variance: Difference between actual and expected
  • Percentage Change: Relative change from expected
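
For example, if the expected cost for a service is $500 and the actual recorded cost is $1,750, the variance is $1,250 and the percentage change is (1,750 - 500) / 500 = +250%, which falls in the High severity band.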

Scope

Where the anomaly occurred:

  • Service: Which cloud service (e.g., EC2, RDS, S3)
  • Region: Geographic region
  • Account: AWS Account ID / Azure Subscription
  • Namespace: Kubernetes namespace (if applicable)

Root Cause Analysis

Automated analysis showing:

  • Top Contributors: Resources/usage types causing the spike
  • Related Changes: Recent Terraform runs or deployments
  • Forecast Impact: Projected future cost if pattern continues

Daily Trend Chart

Interactive chart showing:

  • Historical cost pattern (30-day context)
  • Anomaly period highlighted
  • Expected cost baseline

Managing Anomalies

Status Workflow

OPEN → INVESTIGATING → RESOLVED
           ↓
       IGNORED

Status Actions

Action              | When to Use
Investigate         | Started reviewing the anomaly
Resolve             | Root cause identified and addressed
Ignore              | Expected behavior, not an issue
Mark False Positive | Detection was incorrect
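
A sketch of the transitions implied by the workflow above, using the status values from the data model below; whether other transitions are allowed (for example, ignoring an anomaly directly from Open, or reopening a resolved one) is not specified here.

type AnomalyStatus = 'OPEN' | 'INVESTIGATING' | 'RESOLVED' | 'IGNORED';

// Transitions implied by the workflow diagram; anything beyond these is an assumption.
const allowedTransitions: Record<AnomalyStatus, AnomalyStatus[]> = {
  OPEN: ['INVESTIGATING'],
  INVESTIGATING: ['RESOLVED', 'IGNORED'],
  RESOLVED: [],
  IGNORED: [],
};

function canTransition(from: AnomalyStatus, to: AnomalyStatus): boolean {
  return allowedTransitions[from].includes(to);
}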

Resolution Notes

When resolving an anomaly, document:

  • Root cause identified
  • Actions taken
  • Prevention measures

This helps with:

  • Audit trail
  • Team knowledge sharing
  • Pattern recognition for future anomalies

Anomaly Notifications

Alert Channels

Configure alerts via:

  • Email: Immediate notification to specified addresses
  • Slack: Real-time alerts to channel (requires integration)
  • Webhook: Custom integrations

Alert Configuration

Set thresholds for notifications:

  • Minimum severity level to notify
  • Cost impact threshold
  • Services to include/exclude
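
For illustration only, an alert rule combining these options might be modeled as below. The field names here are hypothetical and do not reflect Realm9's actual configuration schema.

// Hypothetical shape for an alert rule; all field names are illustrative only.
interface AnomalyAlertRule {
  minSeverity: 'LOW' | 'MEDIUM' | 'HIGH' | 'CRITICAL';
  minCostImpact: number;        // only alert above this estimated cost impact
  includeServices?: string[];   // limit alerts to these services
  excludeServices?: string[];   // suppress alerts for these services
  channels: ('EMAIL' | 'SLACK' | 'WEBHOOK')[];
}

const exampleRule: AnomalyAlertRule = {
  minSeverity: 'HIGH',
  minCostImpact: 100,
  channels: ['EMAIL', 'SLACK'],
};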

Best Practices

Investigation Workflow

  1. Triage: Review severity and cost impact
  2. Investigate: Check root cause analysis
  3. Correlate: Look for related changes (deployments, Terraform)
  4. Action: Resolve issue or mark as expected
  5. Document: Add resolution notes

Reducing False Positives

  1. Allow 30+ days of data for accurate baselines
  2. Mark seasonal patterns as ignored
  3. Exclude known variable workloads
  4. Adjust sensitivity settings if needed

Anomaly Prevention

After resolving anomalies:

  1. Set up cost budgets for affected services
  2. Add alerts for specific thresholds
  3. Review IAM policies if unauthorized resources created
  4. Update tagging to improve cost attribution

API Access

# Get all anomalies
curl -X GET https://realm9.app/api/finops/anomalies \
  -H "Authorization: Bearer $TOKEN"

# Get specific anomaly
curl -X GET https://realm9.app/api/finops/anomalies/{id} \
  -H "Authorization: Bearer $TOKEN"

# Update anomaly status
curl -X PATCH https://realm9.app/api/finops/anomalies/{id} \
  -H "Authorization: Bearer $TOKEN" \
  -H "Content-Type: application/json" \
  -d '{
    "status": "RESOLVED",
    "resolutionNotes": "Root cause identified and fixed"
  }'
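
The same endpoints can be called from TypeScript. A minimal sketch using fetch, assuming the responses deserialize to the CostAnomaly data model described below:

// Minimal sketch of calling the anomaly endpoints with fetch.
// Assumes responses match the CostAnomaly shape documented below.
const BASE_URL = 'https://realm9.app/api/finops';

async function listAnomalies(token: string): Promise<CostAnomaly[]> {
  const res = await fetch(`${BASE_URL}/anomalies`, {
    headers: { Authorization: `Bearer ${token}` },
  });
  if (!res.ok) throw new Error(`Request failed: ${res.status}`);
  return res.json();
}

async function resolveAnomaly(token: string, id: string, notes: string): Promise<CostAnomaly> {
  const res = await fetch(`${BASE_URL}/anomalies/${id}`, {
    method: 'PATCH',
    headers: { Authorization: `Bearer ${token}`, 'Content-Type': 'application/json' },
    body: JSON.stringify({ status: 'RESOLVED', resolutionNotes: notes }),
  });
  if (!res.ok) throw new Error(`Request failed: ${res.status}`);
  return res.json();
}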

Anomaly Data Model

Key Fields

interface CostAnomaly {
  id: string
  detectedAt: Date
  severity: 'LOW' | 'MEDIUM' | 'HIGH' | 'CRITICAL'
  source: 'STATISTICAL' | 'PROVIDER' | 'HYBRID'
  confidence: number // 0-100

  actualCost: number
  expectedCost: number
  variance: number
  currency: string

  scope: string // 'SERVICE' | 'REGION' | 'ACCOUNT'
  scopeValue: string
  service?: string
  region?: string

  status: 'OPEN' | 'INVESTIGATING' | 'RESOLVED' | 'IGNORED'
  resolvedAt?: Date
  resolvedBy?: string
  resolutionNotes?: string
}
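
For example, a simple triage helper can use this model to order open anomalies by absolute cost variance:

// Example: order open anomalies by absolute cost variance for triage.
function triageQueue(anomalies: CostAnomaly[]): CostAnomaly[] {
  return anomalies
    .filter((a) => a.status === 'OPEN')
    .sort((a, b) => Math.abs(b.variance) - Math.abs(a.variance));
}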

Troubleshooting

No Anomalies Detected

  1. Ensure at least 14 days of cost data exists
  2. Check that cost sync has completed successfully
  3. Verify anomaly detection is enabled in settings

Too Many False Positives

  1. Increase the detection threshold
  2. Mark known variable workloads as ignored
  3. Allow more time for baseline calculation

Missing Anomalies

  1. Check minimum variance threshold
  2. Verify the affected service is included
  3. Ensure cost data exists for the time period
