Anomaly Detection
Realm9 automatically detects unusual spending patterns across your cloud infrastructure, helping you catch cost issues before they become significant problems.
Accessing Anomalies
Navigate to FinOps > Anomalies in the sidebar, or visit /finops/anomalies.
How Anomaly Detection Works
Detection Methods
Realm9 uses a hybrid detection approach:
| Method | Description | Best For |
|---|---|---|
| Statistical | Z-score and EWMA algorithms | Gradual changes, trends |
| Provider | AWS Cost Anomaly Detection API | AWS-specific anomalies |
| Hybrid | Combined statistical + provider | Most accurate detection |
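To give a feel for the statistical method, the sketch below computes an exponentially weighted moving average (EWMA) over daily costs and flags days that stray far from it. The smoothing factor `alpha`, the 50% tolerance, and both function names are assumptions chosen for illustration; Realm9's actual parameters are not documented here.

```typescript
// Illustrative EWMA sketch. `alpha` and `tolerance` are assumed values,
// not Realm9's actual settings.
function ewma(values: number[], alpha: number): number[] {
  const smoothed: number[] = [];
  let previous: number | undefined;
  for (const value of values) {
    // The first point seeds the average; later points blend the new
    // value with the running average.
    previous = previous === undefined ? value : alpha * value + (1 - alpha) * previous;
    smoothed.push(previous);
  }
  return smoothed;
}

// Flags each day whose cost differs from the EWMA of all earlier days
// by more than `tolerance` (expressed as a fraction of that EWMA).
function flagEwmaDeviations(dailyCosts: number[], alpha = 0.3, tolerance = 0.5): boolean[] {
  const smoothed = ewma(dailyCosts, alpha);
  const flags: boolean[] = [];
  let baseline: number | undefined;
  dailyCosts.forEach((cost, i) => {
    flags.push(
      baseline !== undefined && baseline > 0 && Math.abs(cost - baseline) / baseline > tolerance
    );
    baseline = smoothed[i];
  });
  return flags;
}
```

Because the average weights recent days more heavily, it follows gradual drift in spending and still reacts to a sudden jump, which fits the "gradual changes, trends" use case in the table.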
Detection Algorithm
- Baseline Calculation: Establishes normal spending patterns using a 30-day rolling average
- Seasonality Adjustment: Accounts for weekly patterns (weekday vs. weekend)
- Threshold Calculation: Uses a modified Z-score for outlier detection
- Confidence Scoring: Assigns a confidence level (0-100%) to each detection
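The steps above can be sketched roughly as follows. This is not Realm9's implementation: it uses the standard modified Z-score, 0.6745 × (x − median) / MAD, over a 30-day window, and handles seasonality by comparing a day only against earlier days of the same type (weekday or weekend). The `DailyCost` shape, the function names, and the same-day-type approach are assumptions. A cut-off of 3.5 is the commonly cited threshold for this score.

```typescript
interface DailyCost {
  date: Date;
  cost: number;
}

function median(values: number[]): number {
  const sorted = values.slice().sort((a, b) => a - b);
  const mid = Math.floor(sorted.length / 2);
  const upper = sorted[mid] ?? NaN; // NaN for an empty input
  const lower = sorted[mid - 1] ?? upper;
  return sorted.length % 2 === 0 ? (lower + upper) / 2 : upper;
}

// Assumed seasonality rule: Saturday and Sunday (UTC) form one group,
// Monday to Friday the other.
function isWeekend(date: Date): boolean {
  const day = date.getUTCDay();
  return day === 0 || day === 6;
}

// Modified Z-score of `today` against the last `windowDays` entries of
// `history`, restricted to days of the same type as `today`.
function modifiedZScore(history: DailyCost[], today: DailyCost, windowDays = 30): number {
  const sameTypeCosts = history
    .slice(-windowDays)
    .filter((d) => isWeekend(d.date) === isWeekend(today.date))
    .map((d) => d.cost);
  if (sameTypeCosts.length === 0) return 0; // no baseline yet

  const med = median(sameTypeCosts);
  // Median absolute deviation: a spread measure that outliers barely move.
  const mad = median(sameTypeCosts.map((c) => Math.abs(c - med)));
  // With zero spread the score is undefined; this sketch returns 0,
  // a real detector would need a fallback here.
  if (mad === 0) return 0;
  return (0.6745 * (today.cost - med)) / mad;
}
```

Median and MAD are used instead of mean and standard deviation because a single earlier spike inside the window would otherwise inflate the baseline and mask the next one.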
Detection Frequency
- Anomaly detection runs during each cost sync
- For cost data with daily granularity, this typically means once per day
- For services that report hourly data, detection is near-real-time
Anomaly Dashboard
Summary Cards
Quick overview showing:
- Total Active Anomalies: Count of unresolved anomalies
- Critical/High: Severe anomalies requiring immediate attention
- Cost Impact: Total estimated cost impact of all anomalies
- New Today: Anomalies detected in the last 24 hours
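If you pull anomalies through the API, the same four figures can be approximated client-side. The sketch below assumes that "active" means a status of OPEN or INVESTIGATING and that cost impact is the sum of each anomaly's `variance`; the dashboard's exact rules are not specified here. `SummaryInput`, `DashboardSummary`, and `summarize` are hypothetical names.

```typescript
type Severity = 'LOW' | 'MEDIUM' | 'HIGH' | 'CRITICAL';
type Status = 'OPEN' | 'INVESTIGATING' | 'RESOLVED' | 'IGNORED';

// Subset of the anomaly fields needed for the summary.
interface SummaryInput {
  detectedAt: Date;
  severity: Severity;
  status: Status;
  variance: number;
}

interface DashboardSummary {
  totalActive: number;
  criticalOrHigh: number;
  costImpact: number;
  newToday: number;
}

function summarize(anomalies: SummaryInput[], now: Date = new Date()): DashboardSummary {
  const dayMs = 24 * 60 * 60 * 1000;
  // Assumption: OPEN and INVESTIGATING count as unresolved.
  const active = anomalies.filter((a) => a.status === 'OPEN' || a.status === 'INVESTIGATING');
  return {
    totalActive: active.length,
    criticalOrHigh: active.filter((a) => a.severity === 'CRITICAL' || a.severity === 'HIGH').length,
    // Assumption: impact is summed over every anomaly passed in,
    // regardless of status.
    costImpact: anomalies.reduce((sum, a) => sum + a.variance, 0),
    // Detected within the 24 hours before `now`.
    newToday: anomalies.filter((a) => {
      const age = now.getTime() - a.detectedAt.getTime();
      return age >= 0 && age < dayMs;
    }).length,
  };
}
```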
Filters
Filter anomalies by:
- Status: Open, Investigating, Resolved, Ignored
- Severity: Critical, High, Medium, Low
- Time Range: Last 7/30/90 days
- Cloud Connection: Specific AWS/Azure account
- Service: Specific cloud service
Anomaly Severity Levels
| Severity | Cost Variance | Description |
|---|---|---|
| Critical | > 500% | Extreme deviation, immediate action needed |
| High | 200-500% | Significant deviation, investigate quickly |
| Medium | 100-200% | Moderate deviation, review when possible |
| Low | 50-100% | Minor deviation, monitor for patterns |
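The table maps directly onto a small classifier. The ranges share their boundary values (200% appears in both High and Medium), so this sketch assumes a shared boundary belongs to the higher tier, except 500%, which the table places in High because Critical is strictly greater than 500%. It also assumes that a variance below 50% yields no anomaly. `classifySeverity` is a hypothetical helper, not part of the product API.

```typescript
type Severity = 'LOW' | 'MEDIUM' | 'HIGH' | 'CRITICAL';

// Maps a cost variance (percent above expected) to a severity level.
// Returns null when the variance is below the lowest tier.
function classifySeverity(percentChange: number): Severity | null {
  if (percentChange > 500) return 'CRITICAL';
  if (percentChange >= 200) return 'HIGH'; // assumed: 200% counts as High
  if (percentChange >= 100) return 'MEDIUM'; // assumed: 100% counts as Medium
  if (percentChange >= 50) return 'LOW';
  return null;
}
```

For example, a service that normally costs $100 per day and suddenly costs $350 has a variance of 250% and would be classified as High.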
Anomaly Details
Click on an anomaly to see detailed information:
Overview Section
- Detected At: When the anomaly was first detected
- Severity: Classification based on variance
- Source: Detection method (Statistical/Provider/Hybrid)
- Confidence: Percentage confidence in the detection
Cost Information
- Actual Cost: The actual cost recorded
- Expected Cost: The cost predicted from the baseline for the same period
- Variance: Difference between actual and expected
- Percentage Change: Relative change from expected
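The relationship between these four values is simple arithmetic. The sketch below assumes that variance is the absolute difference in currency units and that percentage change is measured relative to the expected cost; `costDelta` is a hypothetical helper.

```typescript
// Derives variance and percentage change from actual and expected cost.
function costDelta(
  actualCost: number,
  expectedCost: number
): { variance: number; percentChange: number } {
  const variance = actualCost - expectedCost;
  // With no expected cost there is nothing to compare against,
  // so the relative change is undefined.
  const percentChange = expectedCost === 0 ? NaN : (variance / expectedCost) * 100;
  return { variance, percentChange };
}
```

With an actual cost of 450 and an expected cost of 150, the variance is 300 and the percentage change is 200%.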
Scope
Where the anomaly occurred:
- Service: Which cloud service (for example EC2, RDS, or S3)
- Region: Geographic region
- Account: AWS Account ID / Azure Subscription
- Namespace: Kubernetes namespace (if applicable)
Root Cause Analysis
Automated analysis showing:
- Top Contributors: Resources/usage types causing the spike
- Related Changes: Recent Terraform runs or deployments
- Forecast Impact: Projected future cost if pattern continues
Daily Trend Chart
Interactive chart showing:
- Historical cost pattern (30-day context)
- Anomaly period highlighted
- Expected cost baseline
Managing Anomalies
Status Workflow
OPEN → INVESTIGATING → RESOLVED
An anomaly can instead be moved to IGNORED.
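One way to enforce this workflow in client code is a transition table. The diagram does not make clear which states lead to IGNORED, so the sketch assumes both OPEN and INVESTIGATING do, and that RESOLVED and IGNORED are terminal. Treat the table as an assumption rather than the product's actual rules.

```typescript
type Status = 'OPEN' | 'INVESTIGATING' | 'RESOLVED' | 'IGNORED';

// Assumed transitions; adjust to match the behavior you observe.
const allowedTransitions: Record<Status, Status[]> = {
  OPEN: ['INVESTIGATING', 'IGNORED'],
  INVESTIGATING: ['RESOLVED', 'IGNORED'],
  RESOLVED: [],
  IGNORED: [],
};

function canTransition(from: Status, to: Status): boolean {
  return allowedTransitions[from].indexOf(to) !== -1;
}
```

Checking `canTransition` before sending a status update lets a script fail early instead of relying on the server to reject an invalid change.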
Status Actions
| Action | When to Use |
|---|---|
| Investigate | Started reviewing the anomaly |
| Resolve | Root cause identified and addressed |
| Ignore | Expected behavior, not an issue |
| Mark False Positive | Detection was incorrect |
Resolution Notes
When resolving an anomaly, document:
- Root cause identified
- Actions taken
- Prevention measures
This helps with:
- Audit trail
- Team knowledge sharing
- Pattern recognition for future anomalies
Anomaly Notifications
Alert Channels
Configure alerts via:
- Email: Immediate notification to specified addresses
- Slack: Real-time alerts to a channel (requires the Slack integration)
- Webhook: Custom integrations
Alert Configuration
Set thresholds for notifications:
- Minimum severity level to notify
- Cost impact threshold
- Services to include/exclude
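The three thresholds combine as a filter: an anomaly triggers a notification only if it passes all of them. The sketch below shows that logic for a webhook consumer or custom integration. The `AlertRule` shape and its field names are hypothetical and do not reflect Realm9's configuration schema.

```typescript
type Severity = 'LOW' | 'MEDIUM' | 'HIGH' | 'CRITICAL';

// Hypothetical rule shape, for illustration only.
interface AlertRule {
  minSeverity: Severity;
  minCostImpact: number; // same currency as the anomaly's variance
  includeServices?: string[]; // if set, only these services notify
  excludeServices?: string[]; // these services never notify
}

interface AlertCandidate {
  severity: Severity;
  variance: number;
  service?: string;
}

const severityRank: Record<Severity, number> = { LOW: 0, MEDIUM: 1, HIGH: 2, CRITICAL: 3 };

function shouldNotify(anomaly: AlertCandidate, rule: AlertRule): boolean {
  if (severityRank[anomaly.severity] < severityRank[rule.minSeverity]) return false;
  if (anomaly.variance < rule.minCostImpact) return false;
  // An anomaly without a service never matches an include list.
  const service = anomaly.service ?? '';
  if (rule.excludeServices !== undefined && rule.excludeServices.indexOf(service) !== -1) return false;
  if (rule.includeServices !== undefined && rule.includeServices.indexOf(service) === -1) return false;
  return true;
}
```

Starting with a high minimum severity and lowering it once the baseline has stabilized helps keep alert volume manageable.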
Best Practices
Investigation Workflow
- Triage: Review severity and cost impact
- Investigate: Check root cause analysis
- Correlate: Look for related changes (deployments, Terraform)
- Action: Resolve issue or mark as expected
- Document: Add resolution notes
Reducing False Positives
- Allow 30+ days of data for accurate baselines
- Mark seasonal patterns as ignored
- Exclude known variable workloads
- Adjust sensitivity settings if needed
Anomaly Prevention
After resolving anomalies:
- Set up cost budgets for affected services
- Add alerts for specific thresholds
- Review IAM policies if unauthorized resources were created
- Update tagging to improve cost attribution
API Access
# Get all anomalies
curl -X GET https://realm9.app/api/finops/anomalies \
  -H "Authorization: Bearer $TOKEN"

# Get a specific anomaly (replace {id} with the anomaly ID)
curl -X GET https://realm9.app/api/finops/anomalies/{id} \
  -H "Authorization: Bearer $TOKEN"

# Update anomaly status (replace {id} with the anomaly ID)
curl -X PATCH https://realm9.app/api/finops/anomalies/{id} \
  -H "Authorization: Bearer $TOKEN" \
  -H "Content-Type: application/json" \
  -d '{
    "status": "RESOLVED",
    "resolutionNotes": "Root cause identified and fixed"
  }'
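The status update can also be issued from TypeScript. The sketch below only builds the request, reusing the endpoint, headers, and body fields from the curl example; the response format is not documented here, so it is left unhandled. `buildStatusUpdate` and `StatusUpdateRequest` are hypothetical names.

```typescript
type Status = 'OPEN' | 'INVESTIGATING' | 'RESOLVED' | 'IGNORED';

interface StatusUpdateRequest {
  url: string;
  method: 'PATCH';
  headers: Record<string, string>;
  body: string;
}

// Builds the PATCH request without sending it.
function buildStatusUpdate(
  anomalyId: string,
  status: Status,
  token: string,
  resolutionNotes?: string
): StatusUpdateRequest {
  const payload: { status: Status; resolutionNotes?: string } = { status };
  if (resolutionNotes !== undefined) payload.resolutionNotes = resolutionNotes;
  return {
    // The ID is escaped so it cannot alter the request path.
    url: `https://realm9.app/api/finops/anomalies/${encodeURIComponent(anomalyId)}`,
    method: 'PATCH',
    headers: {
      Authorization: `Bearer ${token}`,
      'Content-Type': 'application/json',
    },
    body: JSON.stringify(payload),
  };
}
```

To send it in a runtime that provides `fetch`, pass the parts through: `fetch(req.url, { method: req.method, headers: req.headers, body: req.body })`.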
Anomaly Data Model
Key Fields
interface CostAnomaly {
  id: string
  detectedAt: Date
  severity: 'LOW' | 'MEDIUM' | 'HIGH' | 'CRITICAL'
  source: 'STATISTICAL' | 'PROVIDER' | 'HYBRID'
  confidence: number // 0-100
  actualCost: number
  expectedCost: number
  variance: number
  currency: string
  scope: string // 'SERVICE' | 'REGION' | 'ACCOUNT'
  scopeValue: string
  service?: string
  region?: string
  status: 'OPEN' | 'INVESTIGATING' | 'RESOLVED' | 'IGNORED'
  resolvedAt?: Date
  resolvedBy?: string
  resolutionNotes?: string
}
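JSON has no date type, so a client that reads this model over HTTP has to convert `detectedAt` and `resolvedAt` itself. The sketch below assumes the API serializes both fields as ISO 8601 strings, which is not confirmed here. `parseAnomaly` is a hypothetical helper.

```typescript
// Assumption: these fields arrive as ISO 8601 strings.
const dateFields = ['detectedAt', 'resolvedAt'];

// Parses one anomaly from JSON text, reviving the date fields into Date objects.
function parseAnomaly(json: string): Record<string, unknown> {
  return JSON.parse(json, (key: string, value: unknown) =>
    dateFields.indexOf(key) !== -1 && typeof value === 'string' ? new Date(value) : value
  );
}
```

All other fields pass through unchanged, so the result should be validated against `CostAnomaly` before it is used as one.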
Troubleshooting
No Anomalies Detected
- Ensure at least 14 days of cost data exists
- Check that cost sync has completed successfully
- Verify anomaly detection is enabled in settings
Too Many False Positives
- Increase the detection threshold
- Mark known variable workloads as ignored
- Allow more time for baseline calculation
Missing Anomalies
- Check minimum variance threshold
- Verify the affected service is included
- Ensure cost data exists for the time period
