
Anomaly Detection

Realm9 automatically detects unusual spending patterns across your cloud infrastructure, helping you catch cost issues before they become significant problems.

Accessing Anomalies

Navigate to FinOps > Anomalies in the sidebar, or visit /finops/anomalies.

How Anomaly Detection Works

Detection Methods

Realm9 uses a hybrid detection approach:

Method      | Description                     | Best For
Statistical | Z-score and EWMA algorithms     | Gradual changes, trends
Provider    | AWS Cost Anomaly Detection API  | AWS-specific anomalies
Hybrid      | Combined statistical + provider | Most accurate detection

Detection Algorithm

  1. Baseline Calculation: Establishes normal spending patterns using a 30-day rolling average
  2. Seasonality Adjustment: Accounts for weekly patterns (weekday vs. weekend)
  3. Threshold Calculation: Uses a modified Z-score for outlier detection (a simplified sketch follows this list)
  4. Confidence Scoring: Assigns a confidence level (0-100%) to each detection
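
To make the statistical method concrete, here is a minimal sketch of modified Z-score detection over a 30-day window. It is an illustration, not Realm9's actual implementation: the 3.5 threshold is an assumption and the seasonality adjustment is omitted for brevity.

// Illustration of modified Z-score outlier detection over a rolling window.
// Not Realm9's actual implementation: the 3.5 threshold is an assumption and
// the seasonality adjustment is omitted for brevity.

function median(values: number[]): number {
  const sorted = [...values].sort((a, b) => a - b);
  const mid = Math.floor(sorted.length / 2);
  return sorted.length % 2 === 0 ? (sorted[mid - 1] + sorted[mid]) / 2 : sorted[mid];
}

// Modified Z-score: 0.6745 * (x - median) / MAD, robust to outliers already in the window.
function modifiedZScore(value: number, window: number[]): number {
  if (window.length === 0) return 0;
  const med = median(window);
  const mad = median(window.map((v) => Math.abs(v - med)));
  if (mad === 0) return 0; // flat baseline, no spread to compare against
  return (0.6745 * (value - med)) / mad;
}

// Flag today's cost against the previous 30 days of daily costs.
function isAnomalous(todayCost: number, last30Days: number[], threshold = 3.5): boolean {
  return Math.abs(modifiedZScore(todayCost, last30Days)) > threshold;
}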

Detection Frequency

  • Anomaly detection runs during each cost sync
  • Typically executes daily for daily-granularity cost data
  • Near-real-time for services with hourly data

Anomaly Dashboard

Summary Cards

Quick overview showing:

  • Total Active Anomalies: Count of unresolved anomalies
  • Critical/High: Severe anomalies requiring immediate attention
  • Cost Impact: Total estimated cost impact of all anomalies
  • New Today: Anomalies detected in the last 24 hours
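
A minimal sketch of how these figures could be derived from anomaly records, using the CostAnomaly interface from the data model section below; the exact aggregation Realm9 performs is an assumption.

// Sketch: derive the summary card values from CostAnomaly records
// (see the data model section below). Aggregation details are assumptions.
function summarize(anomalies: CostAnomaly[], now: Date = new Date()) {
  const active = anomalies.filter((a) => a.status === 'OPEN' || a.status === 'INVESTIGATING');
  const dayAgo = new Date(now.getTime() - 24 * 60 * 60 * 1000);
  return {
    totalActive: active.length,
    criticalHigh: active.filter((a) => a.severity === 'CRITICAL' || a.severity === 'HIGH').length,
    costImpact: active.reduce((sum, a) => sum + a.variance, 0),
    newToday: anomalies.filter((a) => a.detectedAt >= dayAgo).length,
  };
}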

Filters

Filter anomalies by:

  • Status: Open, Investigating, Resolved, Ignored
  • Severity: Critical, High, Medium, Low
  • Time Range: Last 7/30/90 days
  • Cloud Connection: Specific AWS/Azure account
  • Service: Specific cloud service
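
For records already fetched from the API, the same kind of filtering can be applied client-side; a small sketch using fields from the CostAnomaly data model below (the dashboard's own filtering logic is not shown here).

// Client-side filtering over fetched records, mirroring the status/severity/service filters.
interface AnomalyFilter {
  status?: CostAnomaly['status'];
  severity?: CostAnomaly['severity'];
  service?: string;
}

function applyFilter(anomalies: CostAnomaly[], filter: AnomalyFilter): CostAnomaly[] {
  return anomalies.filter(
    (a) =>
      (!filter.status || a.status === filter.status) &&
      (!filter.severity || a.severity === filter.severity) &&
      (!filter.service || a.service === filter.service)
  );
}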

Anomaly Severity Levels

Severity | Cost Variance | Description
Critical | > 500%        | Extreme deviation, immediate action needed
High     | 200-500%      | Significant deviation, investigate quickly
Medium   | 100-200%      | Moderate deviation, review when possible
Low      | 50-100%       | Minor deviation, monitor for patterns
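
As an illustration, the variance bands above translate to a classification roughly like the following; how Realm9 treats values that fall exactly on a boundary is an assumption.

type Severity = 'LOW' | 'MEDIUM' | 'HIGH' | 'CRITICAL';

// Map cost variance (as a percentage above expected cost) to a severity level.
// Behavior at exact boundaries is an assumption.
function severityFromVariance(variancePercent: number): Severity | null {
  if (variancePercent > 500) return 'CRITICAL';
  if (variancePercent > 200) return 'HIGH';
  if (variancePercent > 100) return 'MEDIUM';
  if (variancePercent >= 50) return 'LOW';
  return null; // deviations below 50% are not classified as anomalies
}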

Anomaly Details

Click on an anomaly to see detailed information:

Overview Section

  • Detected At: When the anomaly was first detected
  • Severity: Classification based on variance
  • Source: Detection method (Statistical/Provider/Hybrid)
  • Confidence: Percentage confidence in the detection

Cost Information

  • Actual Cost: The actual cost recorded
  • Expected Cost: What the cost should have been
  • Variance: Difference between actual and expected
  • Percentage Change: Relative change from expected
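
For example, if the expected cost for a service is $500 and the actual recorded cost is $1,750, the variance is $1,250 and the percentage change is (1,750 - 500) / 500 = +250%, which falls in the High severity band.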

Scope

Where the anomaly occurred:

  • Service: Which cloud service (e.g., EC2, RDS, S3)
  • Region: Geographic region
  • Account: AWS Account ID / Azure Subscription
  • Namespace: Kubernetes namespace (if applicable)

Root Cause Analysis

Automated analysis showing:

  • Top Contributors: Resources/usage types causing the spike
  • Related Changes: Recent Terraform runs or deployments
  • Forecast Impact: Projected future cost if pattern continues

Daily Trend Chart

Interactive chart showing:

  • Historical cost pattern (30-day context)
  • Anomaly period highlighted
  • Expected cost baseline

Managing Anomalies

Status Workflow

OPEN → INVESTIGATING → RESOLVED
           ↓
       IGNORED

Status Actions

Action              | When to Use
Investigate         | Started reviewing the anomaly
Resolve             | Root cause identified and addressed
Ignore              | Expected behavior, not an issue
Mark False Positive | Detection was incorrect
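
A sketch of the transitions implied by the workflow above, using the status values from the data model below; whether other transitions are allowed (for example, ignoring an anomaly directly from Open, or reopening a resolved one) is not specified here.

type AnomalyStatus = 'OPEN' | 'INVESTIGATING' | 'RESOLVED' | 'IGNORED';

// Transitions implied by the workflow diagram; anything beyond these is an assumption.
const allowedTransitions: Record<AnomalyStatus, AnomalyStatus[]> = {
  OPEN: ['INVESTIGATING'],
  INVESTIGATING: ['RESOLVED', 'IGNORED'],
  RESOLVED: [],
  IGNORED: [],
};

function canTransition(from: AnomalyStatus, to: AnomalyStatus): boolean {
  return allowedTransitions[from].includes(to);
}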

Resolution Notes

When resolving an anomaly, document:

  • Root cause identified
  • Actions taken
  • Prevention measures

This helps with:

  • Audit trail
  • Team knowledge sharing
  • Pattern recognition for future anomalies

Anomaly Notifications

Alert Channels

Configure alerts via:

  • Email: Immediate notification to specified addresses
  • Slack: Real-time alerts to channel (requires integration)
  • Webhook: Custom integrations

Alert Configuration

Set thresholds for notifications:

  • Minimum severity level to notify
  • Cost impact threshold
  • Services to include/exclude
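
For illustration only, an alert rule combining these options might be modeled as below. The field names here are hypothetical and do not reflect Realm9's actual configuration schema.

// Hypothetical shape for an alert rule; all field names are illustrative only.
interface AnomalyAlertRule {
  minSeverity: 'LOW' | 'MEDIUM' | 'HIGH' | 'CRITICAL';
  minCostImpact: number;        // only alert above this estimated cost impact
  includeServices?: string[];   // limit alerts to these services
  excludeServices?: string[];   // suppress alerts for these services
  channels: ('EMAIL' | 'SLACK' | 'WEBHOOK')[];
}

const exampleRule: AnomalyAlertRule = {
  minSeverity: 'HIGH',
  minCostImpact: 100,
  channels: ['EMAIL', 'SLACK'],
};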

Best Practices

Investigation Workflow

  1. Triage: Review severity and cost impact
  2. Investigate: Check root cause analysis
  3. Correlate: Look for related changes (deployments, Terraform)
  4. Action: Resolve issue or mark as expected
  5. Document: Add resolution notes

Reducing False Positives

  1. Allow 30+ days of data for accurate baselines
  2. Mark seasonal patterns as ignored
  3. Exclude known variable workloads
  4. Adjust sensitivity settings if needed

Anomaly Prevention

After resolving anomalies:

  1. Set up cost budgets for affected services
  2. Add alerts for specific thresholds
  3. Review IAM policies if unauthorized resources created
  4. Update tagging to improve cost attribution

API Access

# Get all anomalies
curl -X GET https://realm9.app/api/finops/anomalies \
  -H "Authorization: Bearer $TOKEN"

# Get specific anomaly
curl -X GET https://realm9.app/api/finops/anomalies/{id} \
  -H "Authorization: Bearer $TOKEN"

# Update anomaly status
curl -X PATCH https://realm9.app/api/finops/anomalies/{id} \
  -H "Authorization: Bearer $TOKEN" \
  -H "Content-Type: application/json" \
  -d '{
    "status": "RESOLVED",
    "resolutionNotes": "Root cause identified and fixed"
  }'
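
The same endpoints can be called from TypeScript. A minimal sketch using fetch, assuming the responses deserialize to the CostAnomaly data model described below:

// Minimal sketch of calling the anomaly endpoints with fetch.
// Assumes responses match the CostAnomaly shape documented below.
const BASE_URL = 'https://realm9.app/api/finops';

async function listAnomalies(token: string): Promise<CostAnomaly[]> {
  const res = await fetch(`${BASE_URL}/anomalies`, {
    headers: { Authorization: `Bearer ${token}` },
  });
  if (!res.ok) throw new Error(`Request failed: ${res.status}`);
  return res.json();
}

async function resolveAnomaly(token: string, id: string, notes: string): Promise<CostAnomaly> {
  const res = await fetch(`${BASE_URL}/anomalies/${id}`, {
    method: 'PATCH',
    headers: { Authorization: `Bearer ${token}`, 'Content-Type': 'application/json' },
    body: JSON.stringify({ status: 'RESOLVED', resolutionNotes: notes }),
  });
  if (!res.ok) throw new Error(`Request failed: ${res.status}`);
  return res.json();
}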

Anomaly Data Model

Key Fields

interface CostAnomaly {
  id: string
  detectedAt: Date
  severity: 'LOW' | 'MEDIUM' | 'HIGH' | 'CRITICAL'
  source: 'STATISTICAL' | 'PROVIDER' | 'HYBRID'
  confidence: number // 0-100

  actualCost: number
  expectedCost: number
  variance: number
  currency: string

  scope: string // 'SERVICE' | 'REGION' | 'ACCOUNT'
  scopeValue: string
  service?: string
  region?: string

  status: 'OPEN' | 'INVESTIGATING' | 'RESOLVED' | 'IGNORED'
  resolvedAt?: Date
  resolvedBy?: string
  resolutionNotes?: string
}
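
For example, a simple triage helper can use this model to order open anomalies by absolute cost variance:

// Example: order open anomalies by absolute cost variance for triage.
function triageQueue(anomalies: CostAnomaly[]): CostAnomaly[] {
  return anomalies
    .filter((a) => a.status === 'OPEN')
    .sort((a, b) => Math.abs(b.variance) - Math.abs(a.variance));
}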

Troubleshooting

No Anomalies Detected

  1. Ensure at least 14 days of cost data exists
  2. Check that cost sync has completed successfully
  3. Verify anomaly detection is enabled in settings

Too Many False Positives

  1. Increase the detection threshold
  2. Mark known variable workloads as ignored
  3. Allow more time for baseline calculation

Missing Anomalies

  1. Check minimum variance threshold
  2. Verify the affected service is included
  3. Ensure cost data exists for the time period
