System Overview

Cluster: prod-us-east-1 — 12 services, 47 pods

Throughput (healthy): 0 req/s, +12.4% vs last hour
Avg Latency (healthy): 0ms, -8.2% vs last hour
Error Rate (elevated): 0%, +1.2% vs last hour
Active Alerts (3 active): 2 warnings, 1 critical

Live Log Stream

Service Topology

API GW: 2ms
Auth: 8ms
Order: 124ms
User: 12ms
Pay: TIMEOUT
Notif: 23ms
Stripe API: DNS ERR
Legend: Healthy, Warning, Error

AI Summary

Critical — Payment Service

Circuit breaker OPEN on payment-svc. Root cause: DNS resolution failure for stripe-api.com, starting 8 minutes ago. 847 failed transactions.
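
For readers unfamiliar with the term, the sketch below shows roughly how a circuit breaker reaches the OPEN state after repeated upstream failures. It is an illustration only, not payment-svc's actual implementation: the class name, the failure threshold, and the reset timeout are all assumptions made for the example.

```python
import time


class CircuitBreaker:
    """Minimal circuit breaker: opens after N consecutive failures (hypothetical)."""

    def __init__(self, failure_threshold=3, reset_timeout=30.0, clock=time.monotonic):
        self.failure_threshold = failure_threshold  # assumed value, not from the dashboard
        self.reset_timeout = reset_timeout          # seconds before one trial call is allowed
        self.clock = clock
        self.failures = 0
        self.opened_at = None

    @property
    def state(self):
        if self.opened_at is None:
            return "CLOSED"
        if self.clock() - self.opened_at >= self.reset_timeout:
            return "HALF_OPEN"
        return "OPEN"

    def call(self, fn, *args, **kwargs):
        if self.state == "OPEN":
            # Fail fast instead of waiting on an upstream that is known to be down.
            raise RuntimeError("circuit open: call rejected")
        try:
            result = fn(*args, **kwargs)
        except Exception:
            self.failures += 1
            # Open on reaching the threshold, or re-open after a failed trial call.
            if self.failures >= self.failure_threshold or self.opened_at is not None:
                self.opened_at = self.clock()
            raise
        # Any success closes the breaker and clears the failure count.
        self.failures = 0
        self.opened_at = None
        return result
```

While the breaker is OPEN, calls are rejected immediately. Once the timeout has passed, a single trial call is let through, and the breaker closes again only if that call succeeds.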

Warning — Order Service

Latency spike: avg 124ms (baseline 15ms). Caused by inventory cache miss cascade. Recommend cache warm-up.
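
To put the spike in proportion, the figures above can be compared directly. This is a small sketch: the function names and the 3x alerting factor are assumptions for illustration, not values taken from the dashboard.

```python
def latency_ratio(current_ms, baseline_ms):
    """Return how many times the baseline the current latency is."""
    if baseline_ms <= 0:
        raise ValueError("baseline must be positive")
    return current_ms / baseline_ms


def is_spike(current_ms, baseline_ms, factor=3.0):
    """Flag a spike when latency is at least `factor` times baseline (assumed factor)."""
    return latency_ratio(current_ms, baseline_ms) >= factor


ratio = latency_ratio(124, 15)
print(f"{ratio:.1f}x baseline")  # prints "8.3x baseline"
```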

Warning — Memory Pressure

GC pause frequency increasing on user-svc. Memory at 72% (threshold: 80%). Trending upward.
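
"Trending upward" can be turned into a rough time-to-threshold estimate. The sketch below assumes a straight-line trend between the first and last reading. The sample readings are made up for illustration, and only the 72% current value and the 80% threshold come from the summary above.

```python
def minutes_until_threshold(samples, threshold_pct):
    """Estimate minutes until usage hits the threshold.

    `samples` is a list of (minute, percent) readings, oldest first.
    Returns 0.0 if already at or over the threshold, and None if usage
    is flat or falling (no crossing can be predicted).
    """
    (t0, m0), (t1, m1) = samples[0], samples[-1]
    if m1 >= threshold_pct:
        return 0.0
    if t1 <= t0 or m1 <= m0:
        return None
    slope = (m1 - m0) / (t1 - t0)  # percentage points per minute
    return (threshold_pct - m1) / slope


# Hypothetical readings ending at the 72% reported for user-svc.
samples = [(0, 66.0), (10, 69.0), (20, 72.0)]
eta = minutes_until_threshold(samples, threshold_pct=80.0)
print(f"about {eta:.0f} minutes to threshold")
```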

Recommendation

Switch payment-svc to the backup Stripe endpoint. Run: logpilot remediate --service payment-svc

Ask Pilot

Pilot: I've detected a critical issue with the payment service. The circuit breaker opened 8 minutes ago due to DNS resolution failures. Would you like me to run a full RCA or apply the recommended fix?
User: Run full RCA first, then show me the fix options.
Pilot: RCA Complete
Root cause: DNS resolver config change in payment-svc v2.4.0 (deployed 43min ago)
Impact: 847 failed txns, $12.4K estimated revenue impact
Fix options:
1. Recommended: Switch to backup endpoint
2. Rollback to v2.3.9
3. Manual DNS config patch

Active Patterns

DNS resolution failure: 234×
Cache miss cascade: 89×
GC pause > 200ms: 47×
Connection timeout (suppressed): 12.4K×

Quick Actions