causely
Use Causely directly in Cursor through a preconfigured MCP server. Query service health, root causes, SLOs, metrics, and topology through natural conversation — grounded in system ontology and live causal intelligence.
MCP
causely
MCP server: causely
{
"url": "https://api.causely.app/mcp"
}
Skill
causely-alert-triage
# Causely Alert Triage Skill
Read `references/complete-investigation.md` for the full 25-tool inventory and evidence strategy.
---
## Core tools for alert-driven triage
| Tool | Use when | What it returns |
|---|---|---|
| `get_entities(query=, entity_types=)` | Resolve the service/entity from the alert | Entity IDs for the affected service |
| `get_alerts(entity_ids=)` | See all alerts firing + mapping state | Alert name, symptom mapping, severity, count, timestamps |
| `get_root_causes(symptom_ids=)` | Find diagnosed cause behind a mapped alert | Root causes with evidence, blast radius, remediation |
| `triage(entity_name=)` | Quick full-picture health check | Root causes, symptoms, impact — all in one call |
| `get_symptoms(entity_ids=)` | Check which alerts promoted to symptoms | Named signals in the causal graph |
| `ask_causely(question=)` | Free-form query when the affected service is unknown | NL fallback for broad questions; it cannot resolve raw alert names |
---
## Core rule: alerts → entities → causes
External alerting systems (PagerDuty, Datadog, Alertmanager) fire raw alert names. Causely maps some alerts to named symptoms in its causal model. The workflow bridges from alert → entity → mapped symptom → root cause.
**`ask_causely` cannot resolve raw alert names.** Don't use it for "what is causing KubeContainerWaiting?" — use the structured workflow below.
---
## Decision tree
**Alert received — service name known:**
```
triage(entity_name="<service>") ← 1 call
→ if root causes found: that's likely what triggered the alert
→ description = evidence, remediation = what to do
→ done in most cases
```
If you need to see the specific alert and its mapping status:
```
get_entities(query="<service>", entity_types=["Service"]) ← 1 call
get_alerts(entity_ids=[id], active_only=true) ← 1 call
→ find the alert by name
→ mapping_state = "mapped" → Causely has incorporated it
→ mapping_state = "unmapped" → Causely hasn't promoted it to a symptom
→ if mapped: symptom_name → get_root_causes(symptom_ids=[...]) for cause
```
**Alert received — service name unknown:**
```
ask_causely("What active root causes are there right now?") ← 1 call
→ scan results for the alert pattern or affected service
→ then triage the identified service
```
**Alert name known, want to check if Causely knows about it:**
```
get_entities(query="<service>") ← 1 call
get_alerts(entity_ids=[id], alert_name_filters=["<alert-name>"]) ← 1 call
→ mapping_state tells you if Causely has incorporated this alert
→ if mapped: follow symptom_name → root cause chain
→ if unmapped: alert is noise or not yet incorporated
```
**Alert noise audit ("how noisy are our alerts?"):**
```
get_entities(query="<service>") ← 1 call
get_alerts(entity_ids=[id], mapping_state_filters=["unmapped"]) ← 1 call
→ high-count unmapped alerts = noise candidates for tuning
→ compare with get_alerts(mapping_state_filters=["mapped"]) for signal-to-noise
```
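The audit above reduces to a small aggregation over the `get_alerts` results. The following is a minimal sketch, not Causely's implementation: it assumes each alert is a dict with `alert_name`, `mapping_state`, and `count` keys (names taken from this skill's field descriptions; the exact response schema is unconfirmed). `noise_report` and its `min_count` threshold are hypothetical.

```python
from typing import List, Tuple


def noise_report(alerts: List[dict], min_count: int = 10) -> Tuple[List[str], float]:
    """Return (noisy unmapped alert names, mapped share of all firings).

    Assumed alert shape (unconfirmed):
    {"alert_name": str, "mapping_state": "mapped" | "unmapped", "count": int}
    """
    total = sum(a["count"] for a in alerts)
    mapped = sum(a["count"] for a in alerts if a["mapping_state"] == "mapped")
    # High-count unmapped alerts are the tuning candidates, loudest first.
    noisy = sorted(
        (a for a in alerts if a["mapping_state"] == "unmapped" and a["count"] >= min_count),
        key=lambda a: a["count"],
        reverse=True,
    )
    signal_share = mapped / total if total else 0.0
    return [a["alert_name"] for a in noisy], signal_share
```

A low mapped share together with a long noisy list suggests alert tuning is worth the effort. Unmapped alerts below the threshold are left out of the list, not judged irrelevant.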
**Multiple alerts firing at once:**
```
get_root_causes(active_only=true) ← 1 call
→ check if multiple alerts map to the same root cause
→ impact_service_graph shows propagation → many alerts, one origin
```
---
## Mapping state guide
| mapping_state | Meaning | Action |
|---|---|---|
| `mapped` | Causely has promoted this alert to a named symptom | Follow `symptom_name` → `get_root_causes(symptom_ids=)` for diagnosis |
| `unmapped` | Causely hasn't incorporated this alert | May be noise, or a new signal type not yet configured |
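
The guide can be applied mechanically to a `get_alerts` response. The sketch below is illustrative only: it assumes each alert is a dict with `alert_name`, `mapping_state`, and `symptom_name` keys, which mirrors the field names used in this skill but is not a confirmed schema. `plan_next_steps` is a hypothetical helper, and resolving a symptom name to the ID that `get_root_causes(symptom_ids=)` expects is not shown.

```python
from typing import Dict, List


def plan_next_steps(alerts: List[dict]) -> Dict[str, List[str]]:
    """Split alerts into symptoms to diagnose and alerts needing manual review.

    Assumed alert shape (unconfirmed):
    {"alert_name": str, "mapping_state": "mapped" | "unmapped", "symptom_name": str | None}
    """
    plan: Dict[str, List[str]] = {"diagnose_symptoms": [], "needs_review": []}
    for alert in alerts:
        symptom = alert.get("symptom_name")
        if alert.get("mapping_state") == "mapped" and symptom:
            # Mapped: follow the symptom into the root cause lookup (once per symptom).
            if symptom not in plan["diagnose_symptoms"]:
                plan["diagnose_symptoms"].append(symptom)
        else:
            # Unmapped: possibly noise, possibly a signal not yet configured.
            plan["needs_review"].append(alert["alert_name"])
    return plan
```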
---
## Output format
### 🔔 Alert triage: [alert name]
**Alert:** [alert_name from get_alerts or user's description]
**Service:** [entity name]
**Status:** [firing / resolved] · **Severity:** [from alert]
**Causely mapping:** ✅ Mapped to symptom "[symptom_name]" / ❌ Unmapped
**Root cause:** [from triage or get_root_causes — name + entity + portal link]
**Evidence:** [from description field]
**Blast radius:** [from impacted_services]
**Customer impact:** [from impacted_customers]
**Owner:** [from causely.ai/team label]
**Recommended actions:** [from remediation field]
**Links:** [portal links]
---
## Important behaviours
- **Start with `triage` when you have a service name.** It's faster and gives the full picture without needing to resolve alert → symptom → root cause manually.
- **Use `get_alerts` when the user specifically wants to see alert-level detail** — mapping status, alert counts, firing times.
- **Don't use `ask_causely` for alert name resolution** — it can't resolve raw Alertmanager or Datadog alert names to Causely entities.
- **Unmapped ≠ irrelevant**: an unmapped alert might be a real signal that Causely hasn't been configured to ingest yet. Don't dismiss it.
- **Multiple alerts, one cause**: when the user reports several alerts, check `get_root_causes` first; they often share a single origin visible in the impact graph.
Skill
causely-change-impact
# Causely Change Impact Skill
Read `references/complete-investigation.md` for the full 25-tool inventory and evidence strategy.
---
## Core tools for change impact
| Tool | Use when | What it returns |
|---|---|---|
| `triage(entity_name=)` | Quick post-deploy check for one service | Root causes with `started_at` timestamps to compare against deploy time |
| `reliability_delta(service=)` | Metric regression check for one service | Before/after avg+max for CPU, memory, latency, error rate + verdict (PASS/WARNING/REGRESSION/WAIT) |
| `fleet_reliability_delta(team= or namespace= or services=)` | Batch regression check across multiple services | Summary table with per-service verdicts |
| `get_events(entity_id=)` | Find the deploy event / correlate changes | Lifecycle events (deploys, restarts, scaling, config changes) with timestamps |
| `get_config(entity_id=)` | Inspect config drift | Raw config files (manifests, specs) to compare |
| `get_metrics(entity_ids=, metrics=, window_minutes=)` | Custom metric comparison over time window | Time-series data for specific metrics |
| `get_root_causes(active_only=true)` | System-wide post-deploy sweep | All active RCs with `started_at` to filter by deploy time |
---
## Decision tree
**Single-service post-deploy check (recommended path):**
```
reliability_delta(service="<service>") ← 1 call
→ verdict: PASS / WARNING / REGRESSION / WAIT
→ per-metric delta: CPU, memory, latency, error rate before vs after
→ if REGRESSION → recommend rollback
→ if WAIT → deploy too recent, re-run later
→ if PASS → deploy is clean
```
If `reliability_delta` returns REGRESSION or WARNING, add context:
```
triage(entity_name="<service>") ← 2nd call
→ root cause started_at vs deploy time = causal correlation
→ description = evidence of what broke
→ remediation = what to do next
```
**Fleet-wide post-deploy validation:**
```
fleet_reliability_delta(team="<team>" or namespace="<ns>") ← 1 call
→ summary table: service | verdict | release time | per-metric delta
→ verdict counts: REGRESSION / WARNING / PASS / WAIT
→ triage only REGRESSION services for detail
```
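The fleet step can be sketched as a filter over the summary table. This is an assumption-laden example: the rows are modelled as dicts with `service` and `verdict` keys, which is a guess at how the table might be parsed, and `summarize_fleet` is a hypothetical helper.

```python
from collections import Counter
from typing import Dict, List, Tuple


def summarize_fleet(rows: List[dict]) -> Tuple[Dict[str, int], List[str]]:
    """Return (verdict counts, services that deserve a follow-up triage call).

    Assumed row shape (unconfirmed): {"service": str, "verdict": str}
    """
    counts = Counter(row["verdict"] for row in rows)
    # Only REGRESSION services get the extra triage call.
    to_triage = [row["service"] for row in rows if row["verdict"] == "REGRESSION"]
    return dict(counts), to_triage
```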
**Triage-only path (when reliability_delta not needed):**
```
triage(entity_name="<service>") ← 1 call
→ root cause started_at before deploy? → change not the cause
→ root cause started_at after deploy? → change is suspect
→ description = evidence of what broke
→ impacted_services = downstream blast radius
→ impacted_customers = customer impact
→ done
```
Only add extra calls if:
- Need to see the actual deploy event → `get_entities` → `get_events(entity_id=, message_contains="version")`
- Need config comparison → `get_entities` → `get_config(entity_id=)`
- Need custom metric time-series → `get_entities` → `get_metrics(entity_ids=, metrics=[...], window_minutes=60)`
- `has_stored_logs=true` AND description generic → `get_logs(root_cause_id=, limit=10, severity_filter=ERROR)`
**Canary / blue-green:**
```
reliability_delta(service="<service-v1>") ← 1 call
reliability_delta(service="<service-v2>") ← 1 call
→ compare verdicts: regression on v2 only = canary failure
```
---
## Verdict logic
| Signal | Verdict | Action |
|---|---|---|
| `reliability_delta` → PASS, no new root causes | ✅ Safe | Deploy is clean |
| `reliability_delta` → WARNING | ⚠️ Monitor | Watch for escalation; re-check in 30 min |
| `reliability_delta` → REGRESSION | 🔴 Rollback recommended | Recommend rollback; run `triage` to confirm a new root cause correlates with the deploy |
| `reliability_delta` → WAIT | ⏳ Too early | Re-run after more post-deploy data accumulates |
| Root cause `started_at` before deploy | ✅ Pre-existing | Change not the cause |
| Root cause `started_at` after deploy | 🔴 Suspect | Check description for confirmation |
| No root causes at all | ✅ Safe | Service is healthy |
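
One way to combine the rows of this table is sketched below. The table does not say which signal wins when several apply, so the precedence used here (REGRESSION, then WAIT, then new root causes, then WARNING) is an assumption. `deploy_verdict` is a hypothetical helper, and timestamps are assumed to be already parsed into `datetime` objects.

```python
from datetime import datetime
from typing import List


def deploy_verdict(
    delta_verdict: str,
    deploy_time: datetime,
    root_cause_starts: List[datetime],
) -> str:
    """Combine a reliability_delta verdict with root cause timing.

    Precedence is an assumption, not documented behaviour.
    """
    # Root causes that began before the deploy are treated as pre-existing.
    new_causes = [t for t in root_cause_starts if t > deploy_time]
    if delta_verdict == "REGRESSION":
        return "Rollback recommended"
    if delta_verdict == "WAIT":
        return "Too early"
    if new_causes:
        return "Suspect"
    if delta_verdict == "WARNING":
        return "Monitor"
    return "Safe"
```

A "Suspect" result still calls for reading the root cause description before acting, as the table notes.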
---
## Output format
### 🚀 Deployment validation report
**Service:** [service-name] · **Deploy time:** [from reliability_delta or get_events] · **Report:** [now]
**Verdict:** ✅ Safe / ⚠️ Monitor / 🔴 Rollback recommended / ⏳ Too early
**Metric deltas:**
| Metric | Before (avg) | After (avg) | Delta | Status |
|---|---|---|---|---|
| [from reliability_delta response] |
**New root causes since deploy:** [name + started_at, or "None detected"]
**Evidence:** [from description field; supplement with get_logs only if generic AND has_stored_logs=true]
**Blast radius:** [from impacted_services]
**Customer impact:** [from impacted_customers]
**Owner:** [from causely.ai/team label or team_health]
**Recommended actions:** [from remediation field; rollback recommendation if 🔴]
**Links:** [portal links]
Skill
causely-correlated-incidents
# Causely Correlated Incidents Skill
Read `references/complete-investigation.md` for the full 25-tool inventory and evidence strategy.
---
## Core tools for correlation analysis
| Tool | Use when | What it returns |
|---|---|---|
| `get_root_causes(active_only=true)` | All active issues — primary correlation tool | All RCs with `impact_service_graph` edges showing propagation paths |
| `triage(entity_name=)` | Named service cascade investigation | Per-entity root causes with impact graph |
| `get_topology(entity_id=, mode=)` | Full dependency/dependent graph (beyond active incidents) | Node + edge graph: dependencies, dependents, or dataflow |
| `get_alerts(entity_ids=)` | Alert correlation across entities | Firing alerts with mapping state — find unmapped shared alerts |
| `get_environment_health(namespaces=)` | Scoped health check for affected namespace | Overall status + active root causes in scope |
| `ask_causely(question=)` | Cross-entity synthesis when names aren't clear | Free-form NL query for broad pattern detection |
---
## Core rule: one sweep, read the graphs
**`get_root_causes(active_only=true)` returns everything needed for correlation in one call:**
- Each root cause includes `impact_service_graph.edges` — a node appearing as source in multiple graphs is the shared origin
- `impacted_services` shows blast radius per root cause
- `impacted_customers` shows customer-facing impact
- `description` is the synthesised evidence — read it, don't re-fetch it
Do not follow up with `get_symptoms` — symptoms are already included in the root cause response.
---
## Decision tree
**Widespread outage:**
```
get_root_causes(active_only=true) ← 1 call
→ look for shared node IDs across impact_service_graphs
→ shared node = correlation origin
→ description on that root cause = evidence
→ impacted_customers across all RCs = customer impact
→ done, unless description generic AND has_stored_logs=true:
→ get_logs(root_cause_id=, limit=10, severity_filter=ERROR) ← optional 2nd call
```
**"Are these two incidents related?":**
```
get_root_causes(active_only=true) ← 1 call (covers both services)
→ compare impact_service_graph.nodes for shared IDs
→ compare started_at: starts within minutes of each other = likely correlated
→ done
```
**Named service, cascade suspected:**
```
triage(entity_name="<service>") ← 1 call
→ read impact_service_graph: trace edges from root to leaves
→ impacted_services = confirmed downstream blast radius
→ done
```
**Full dependency graph (beyond active incidents):**
```
get_entities(query="<service>", entity_types=["Service"]) ← 1 call
get_topology(entity_id=<id>, mode=dependents, levels=3) ← 1 call
→ all services that call this entity (upstream callers exposed if it degrades)
→ or mode=dependencies for what this entity calls (downstream services it relies on)
→ or mode=dataflow for full end-to-end data movement
```
**Alert-level correlation (shared alert patterns across services):**
```
get_entities(query="<service-a>") ← 1 call
get_entities(query="<service-b>") ← 1 call
get_alerts(entity_ids=[id_a, id_b], active_only=true) ← 1 call
→ shared alert_names across entities = correlated signals
→ mapped alerts → get_root_causes(symptom_ids=) for cause
```
---
## Correlation methods
1. **Impact graph overlap**: shared node IDs in `impact_service_graph` across multiple root causes → same origin
2. **Temporal correlation**: root causes with `started_at` within minutes of each other → likely same trigger
3. **Topology correlation**: `get_topology(mode=dependents)` shows all upstream callers — if the degraded entity is a shared dependency, all dependents are at risk
4. **Alert pattern correlation**: same `alert_name` firing across multiple entities simultaneously → shared infrastructure cause
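
The first two methods can be sketched in a few lines. This is a rough illustration under stated assumptions: each root cause is a dict with `id`, `started_at` (already parsed to `datetime`), and `impact_service_graph.edges` entries shaped as `{"source": ..., "target": ...}`. The edge keys, the helper names, and the five-minute window are all hypothetical.

```python
from collections import defaultdict
from datetime import datetime, timedelta
from typing import Dict, List, Set


def shared_origin_nodes(root_causes: List[dict]) -> Dict[str, Set[str]]:
    """Map each edge source that appears in 2+ impact graphs to those root cause IDs."""
    seen: Dict[str, Set[str]] = defaultdict(set)
    for rc in root_causes:
        for edge in rc["impact_service_graph"]["edges"]:
            seen[edge["source"]].add(rc["id"])
    return {node: ids for node, ids in seen.items() if len(ids) > 1}


def started_together(root_causes: List[dict], window: timedelta = timedelta(minutes=5)) -> bool:
    """True when all root causes started within `window` of each other."""
    starts = [rc["started_at"] for rc in root_causes]
    if len(starts) < 2:
        return False  # nothing to correlate
    return max(starts) - min(starts) <= window
```

A shared source node plus a tight start window is stronger evidence than either signal alone.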
---
## Output format
### 🔴 Multi-service incident summary
**Affected services:** [from impacted_services across root causes]
**Correlation:** ✅ Correlated / ⚠️ Partial / ❓ Unconfirmed — [origin entity if known]
**Root cause:** [name + entity + portal link from get_root_causes]
**Propagation path:** [from impact_service_graph edges, or get_topology if called]
**Evidence:** [from description field; supplement with get_logs if generic AND has_stored_logs=true]
**Blast radius:** [from impact_service_graph — total affected services count + names]
**Customer impact:** [from impacted_customers]
**Owner:** [from causely.ai/team label or team_health]
**Timeline:** [started_at per root cause, in order]
**Recommended action:** [from remediation field — single fix that resolves the origin]
**Links:** [all portal links]
Skill
causely-health-reporting
# Causely Health Reporting Skill
Read `references/complete-investigation.md` for the full 25-tool inventory and evidence strategy.
---
## Core tools for health reporting
| Tool | Use when | What it returns |
|---|---|---|
| `get_environment_health()` | Global or namespace-scoped health overview | Overall status (HEALTHY/DEGRADED/CRITICAL) + active root causes + remediation |
| `get_service_summary(service=)` | Comprehensive single-service report | Symptoms, root causes, SLOs, metrics, deps, slow queries, events, error logs — all in one call |
| `get_root_causes(active_only=true)` | All active issues with evidence | Structured JSON: description, impacted_services, impacted_customers per RC |
| `team_health(team=)` | Team-scoped standup | Degraded/critical services first, healthy grouped at end |
| `get_entity_health(entity_id=)` | Non-service entity health (DBs, pods, queues) | Symptoms, root causes, events, logs, metrics for one entity |
| `get_slo(entity_ids=)` | SLO error budget and burn rate | Per-SLO: budget remaining %, burn rate, at-risk/violated flags |
| `ask_causely(question=)` | System-wide SLO overview (no entity IDs needed) | "Which services have SLOs at risk or violated?" |
| `get_symptoms(active_only=false, lookback_hours=N)` | Historical flapping/recurring signals | Timeline of symptom start/end for trend analysis |
---
## Decision tree
**Morning standup / system sweep (recommended path):**
```
get_environment_health() ← 1 call
→ overall status: HEALTHY / DEGRADED / CRITICAL
→ active root causes with severity, remediation
→ done for quick overview
```
For more detail on each root cause:
```
get_root_causes(active_only=true) ← 1 call
→ group by severity: Critical → High → Medium → Low
→ description = evidence per issue
→ impacted_customers = customer impact per issue
→ entity.labels["causely.ai/team"] = owner (if set)
→ done
```
**Namespace-scoped health:**
```
get_environment_health(namespaces=["otel-demo"]) ← 1 call
→ scoped status + root causes for that namespace only
```
**Full service report (all dimensions):**
```
get_service_summary(service="<service>") ← 1 call
→ status, symptoms, root causes, SLOs, metrics, deps, slow queries, events, error logs: everything in one call
→ done — do NOT chain 5 separate tools
```
**SLO-focused report:**
```
ask_causely("Which services have SLOs at risk or violated?") ← 1 call (no entity IDs needed)
→ or, to scope the check to specific services:
get_entities(query="<service>") → get_slo(entity_ids=[...], only_at_risk=true)
```
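When working from a raw `get_slo` result, the "at risk" filter used in the briefing can be sketched as below. A burn rate above 1.0 conventionally means the error budget is being spent faster than the SLO period allows. The `name`, `burn_rate`, and `violated` keys are assumed field names and `slos_needing_attention` is a hypothetical helper.

```python
from typing import List


def slos_needing_attention(slos: List[dict]) -> List[str]:
    """Return names of SLOs that are violated or burning budget too fast.

    Assumed SLO shape (unconfirmed):
    {"name": str, "burn_rate": float, "violated": bool}
    """
    flagged: List[str] = []
    for slo in slos:
        if slo.get("violated") or slo.get("burn_rate", 0.0) > 1.0:
            flagged.append(slo["name"])
    return flagged
```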
**Team standup:**
```
team_health(team="<team>") ← 1 call
→ degraded/critical services listed first
→ for each degraded: get_service_summary(service=) if full detail needed
```
**Weekly report / trend analysis:**
```
get_root_causes(active_only=false, lookback_hours=168) ← 1 call
→ count per service to find recurring offenders
→ compare started_at / ended_at for flapping patterns
```
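The counting step can be sketched as follows. This is a simplified illustration: root causes are modelled as dicts with `service` and `name` keys (assumed names), and "flapping" is approximated as the same root cause name recurring on the same service within the lookback window. Both helpers are hypothetical.

```python
from collections import Counter
from typing import List, Tuple


def recurring_offenders(root_causes: List[dict], min_occurrences: int = 2) -> List[Tuple[str, int]]:
    """Services with at least `min_occurrences` root causes, most frequent first."""
    counts = Counter(rc["service"] for rc in root_causes)
    return [(svc, n) for svc, n in counts.most_common() if n >= min_occurrences]


def flapping(root_causes: List[dict], min_occurrences: int = 2) -> List[Tuple[str, str]]:
    """(service, root cause name) pairs that recurred in the window."""
    pairs = Counter((rc["service"], rc["name"]) for rc in root_causes)
    return sorted(pair for pair, n in pairs.items() if n >= min_occurrences)
```

A stricter flapping check would also compare `started_at` and `ended_at` gaps, which this sketch omits.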
**Non-service entity health (DBs, queues, pods):**
```
get_entities(query="<name>", entity_types=["Database"]) ← 1 call
get_entity_health(entity_id=<id>) ← 1 call
→ symptoms, root causes, events, logs, metrics
```
---
## Output formats
### Morning / standup briefing
**🟢 / 🟡 / 🔴 System health: [from get_environment_health status]**
*[N] active root causes as of [time]*
| Service | Root cause | Severity | Since | Evidence | Customer impact | Owner |
|---|---|---|---|---|---|---|
| [from response] | [name] | [sev] | [started_at] | [from description] | [impacted_customers or "none"] | [team label or "unknown"] |
**SLOs at risk:** [from get_slo or ask_causely: list services with burn rate > 1.0 or a violated SLO]
**Watch:** [anything Critical or active >6h]
---
### Full service report
**[Service] — [status from get_service_summary]**
**Active issues:** [root causes with severity + remediation]
**SLOs:** [budget remaining + burn rate]
**Key metrics:** [CPU, memory, error rate, p99 latency from resource metrics section]
**Dependencies:** [health of upstream/downstream services]
**Recent events:** [deploys, restarts, config changes]
---
### On-call handoff
🔴 **Active now:** [severity · service · root cause · started_at]
🟡 **SLOs burning:** [services with burn rate > 1.0]
⚠️ **Owner gaps:** [services missing causely.ai/team label]
📋 **Watch list:** [services with recurring root causes in the past 24h]
Skill
causely-k8s-investigation
# Causely K8s Investigation Skill
Read `references/complete-investigation.md` for the full 25-tool inventory and evidence strategy.
---
## Core tools for K8s investigation
| Tool | Use when | What it returns |
|---|---|---|
| `triage(entity_name=)` | Service-level health check — always start here | Root causes with infra-layer evidence (OOMKill, pod failure, memory pressure) |
| `get_entities(query=, entity_types=)` | Resolve K8s entities to IDs | Entity IDs for pods, containers, nodes, databases |
| `get_entity_health(entity_id=)` | Non-service entity health (pods, nodes, DBs, containers) | Symptoms, root causes, events, logs, metrics for one entity |
| `get_events(entity_id=)` | Lifecycle events (restarts, scaling, scheduling) | Timestamped events: OOMKill, CrashLoopBackOff, eviction, deploy, config change |
| `get_config(entity_id=)` | Inspect K8s manifests and resource specs | Raw config files: deployment spec, resource limits, HPA config |
| `get_metrics(entity_ids=, metrics=)` | Container/pod resource utilisation | CPU, memory, network I/O snapshots or time-series |
| `get_logs(entity_id=)` | Live container/pod logs | Real-time log stream for a running entity |
| `get_root_causes(active_only=true)` | System-wide infra sweep | All active RCs — filter for K8s-related root causes |
| `list_namespaces()` | Discover valid namespaces | Namespace names for scoping investigations |
| `list_clusters()` | Discover valid clusters | Cluster names for multi-cluster queries |
---
## Entity name format
| Type | Format | Example |
|---|---|---|
| K8s service | `namespace/service-name` | `default/animal-service` |
| ECS task / VM | `cluster/task-name-hash` | `chaos/quarkus-workshop-hero-service-2b62b3ef` |
| Node | AWS/GCP hostname | `ip-192-168-12-32.us-east-2.compute.internal` |
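
A small parser makes the formats concrete. This is only a sketch: it splits on the first `/` and treats anything without a slash (such as a node hostname) as unscoped. `split_entity_name` is a hypothetical helper, not part of the Causely toolset.

```python
from typing import Optional, Tuple


def split_entity_name(entity_name: str) -> Tuple[Optional[str], str]:
    """Split 'scope/name' into (scope, name); return (None, name) when there is no slash.

    The scope is a namespace for K8s services and a cluster for ECS tasks / VMs.
    """
    scope, sep, name = entity_name.partition("/")
    if not sep:
        return None, entity_name
    return scope, name
```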
---
## Decision tree
**Service name known — start at service level:**
```
triage(entity_name="<namespace/service>") ← 1 call
→ infra root causes: "Memory congestion", "Pod Failure", "OOMKill", "Node pressure"
→ description = evidence (memory %, restart counts, disk errors)
→ impacted_services = blast radius
→ done
```
**Need pod/container-level detail:**
```
get_entities(query="<pod-name>", entity_types=["Container","Pod"]) ← 1 call
get_entity_health(entity_id=<id>) ← 1 call
→ symptoms, root causes, events, logs, metrics for that specific entity
```
**Why did my pod restart?**
```
get_entities(query="<pod-name>") ← 1 call
get_events(entity_id=<id>, severity_filter=WARNING) ← 1 call
→ look for OOMKill, CrashLoopBackOff, Evicted events with timestamps
→ if OOMKill: get_config(entity_id=) to check resource limits
→ if CrashLoopBackOff: get_logs(entity_id=, limit=20, severity_filter=ERROR)
```
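The branching at the end of this path can be written as a lookup table. The sketch assumes each event is a dict with a `reason` key holding the event type; that key name is a guess, and `follow_up_tools` is a hypothetical helper. Evicted events have no follow-up call in the tree above, so none is mapped here.

```python
from typing import List

# Follow-up tool per event type, taken from the decision tree above.
FOLLOW_UP = {
    "OOMKill": "get_config",         # check resource limits
    "CrashLoopBackOff": "get_logs",  # read recent ERROR logs
}


def follow_up_tools(events: List[dict]) -> List[str]:
    """Return the follow-up tools suggested by the events, without duplicates."""
    tools: List[str] = []
    for event in events:
        tool = FOLLOW_UP.get(event.get("reason", ""))
        if tool and tool not in tools:
            tools.append(tool)
    return tools
```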
**Resource utilisation check:**
```
get_entities(query="<service>", entity_types=["Service"]) ← 1 call
get_metrics(entity_ids=[id], metrics=["cpu_usage", "memory_usage", "memory_limit"]) ← 1 call
→ compare usage vs limits
→ if near limit: check get_config for resource requests/limits
```
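The "compare usage vs limits" step might look like the following. This is a sketch under assumptions: the latest metric values are passed as a flat dict keyed by the metric names used above, both in the same unit, and the 0.9 threshold for "near limit" is an arbitrary illustrative choice. `memory_pressure` is a hypothetical helper.

```python
def memory_pressure(latest: dict, threshold: float = 0.9) -> bool:
    """True when memory usage is at or above `threshold` of the configured limit.

    Assumed input (unconfirmed): {"memory_usage": number, "memory_limit": number}
    """
    limit = latest.get("memory_limit")
    if not limit:
        # No limit set (or unknown): usage cannot be compared against it.
        return False
    return latest.get("memory_usage", 0) / limit >= threshold
```

A `True` result is the cue to inspect requests and limits with `get_config`.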
**Inspect K8s config / resource limits:**
```
get_entities(query="<service>") ← 1 call
get_config(entity_id=<id>) ← 1 call
→ deployment spec, resource limits, HPA config, environment variables
```
**Service name unknown / namespace sweep:**
```
get_environment_health(namespaces=["<namespace>"]) ← 1 call
→ overall namespace status + active root causes
→ or:
get_root_causes(active_only=true) ← 1 call
→ filter for namespace/entity names matching the namespace
→ description = evidence for each RC
→ only triage the single highest-severity hit for detail
```
**Triage returns "No Incident Data Found":**
- Service is healthy at the service level — the infra issue may be at pod/container level
- Try `get_entities(query="<name>")` → `get_entity_health(entity_id=)` for pod-level health
- Or `get_root_causes(active_only=true)` and filter for the entity name pattern
---
## Output format
### 🔴 / 🟡 / 🟢 [Service/Entity] — [Status]
**Root cause (infra layer):** [name + entity + portal link]
**Evidence:** [from description field — specific metrics, counts, log patterns; supplement with get_logs only if description is generic AND has_stored_logs=true]
**Resource state:** [from get_metrics if called — CPU/memory usage vs limits]
**Configuration:** [from get_config if called — relevant resource limits, HPA settings]
**Recent events:** [from get_events if called — OOMKill, restarts, scaling events with timestamps]
**Blast radius:** [from impacted_services, or "None identified"]
**Customer impact:** [from impacted_customers, or "None identified"]
**Owner / team:** [from causely.ai/team label or team_health, or "Not registered"]
**Recommended actions:** [from remediation field + k8s-specific steps: adjust resource limits, cordon/drain node, review HPA, check liveness probes]
**Links:** [portal links from response]
Skill
causely-postmortem
# Causely Postmortem & Ticket Skill
Read `references/complete-investigation.md` for the full 25-tool inventory and evidence strategy.
---
## Core tools for postmortems and tickets
| Tool | Use when | What it returns |
|---|---|---|
| `postmortem(root_cause_id=)` | Generate full postmortem from Causely data | Markdown + structured fields: title, summary, timeline, root cause, blast radius, contributing factors, action items |
| `generate_ticket(task=)` | Create an engineering ticket draft | Structured JSON: title, description, context, requirements, acceptance criteria, notes |
| `get_root_causes(active_only=false, lookback_hours=N)` | Find the root cause ID for postmortem | Historical root causes with IDs |
| `triage(entity_name=, start_time=, end_time=)` | Scoped incident summary for a time window | Markdown narrative with root causes, symptoms, impact |
| `get_events(entity_id=)` | Build incident timeline | Lifecycle events (deploys, restarts, config changes) |
| `get_symptoms(active_only=false, lookback_hours=N)` | Reconstruct signal timeline | Historical symptom start/end for timeline building |
---
## Decision tree
**Generate postmortem — root cause ID known:**
```
postmortem(root_cause_id="<id>") ← 1 call
→ complete postmortem: title, summary, timeline, root cause, blast radius, contributing factors, action items
→ done
```
**Generate postmortem — root cause ID unknown:**
```
get_root_causes(active_only=false, lookback_hours=48, root_cause_name="<name>") ← 1 call
→ find the matching root cause, get its ID
→ or: triage(entity_name="<service>", start_time=, end_time=) to find RCs in window
postmortem(root_cause_id=<id>) ← 2nd call
→ complete postmortem
```
**Generate postmortem — by service + time window (legacy path):**
```
postmortem(service="<service>", incident_start="2025-03-14T00:00:00Z") ← 1 call
→ postmortem scoped to that service and time
```
**Generate postmortem — by root cause name:**
```
postmortem(root_cause_name="<name>", entity_name="<service>") ← 1 call
→ if ambiguous: returns ambiguity_candidates → re-submit with root_cause_id
```
**Enrich postmortem with additional context:**
```
get_entities(query="<service>") → get_events(entity_id=<id>) ← timeline enrichment
get_symptoms(active_only=false, lookback_hours=48, entity_ids=[id]) ← signal timeline
→ add deploy events, symptom transitions to the postmortem narrative
```
**Generate remediation ticket from postmortem:**
```
postmortem(root_cause_id=<id>) ← 1 call
→ extract action items from postmortem
generate_ticket(task="<action item description>") ← 1 call per ticket
→ structured ticket: title, description, acceptance criteria
```
**Generate ticket without postmortem (standalone):**
```
generate_ticket(task="<description of the remediation work>") ← 1 call
→ Jira/GitHub/Linear-ready ticket draft
```
---
## Postmortem input priority
Use the first applicable lookup path:
1. **`root_cause_id`** — preferred; directly identifies the root cause
2. **`root_cause_name` + `entity_name`** — resolves by name; returns candidates if multiple match
3. **`service` + `incident_start`** — legacy path; requires service name and RFC3339 start time
`incident_id` alone is not resolvable — always pair it with one of the paths above.
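
The priority order can be expressed as a small argument builder. This is a sketch, not the tool's own logic: `postmortem_args` is a hypothetical helper that picks the first complete lookup path from the list above, using the parameter names shown in this skill.

```python
from typing import Optional


def postmortem_args(
    root_cause_id: Optional[str] = None,
    root_cause_name: Optional[str] = None,
    entity_name: Optional[str] = None,
    service: Optional[str] = None,
    incident_start: Optional[str] = None,
) -> dict:
    """Return the arguments for the first applicable postmortem lookup path."""
    if root_cause_id:
        return {"root_cause_id": root_cause_id}
    if root_cause_name and entity_name:
        return {"root_cause_name": root_cause_name, "entity_name": entity_name}
    if service and incident_start:
        # Legacy path: incident_start is expected as an RFC3339 timestamp.
        return {"service": service, "incident_start": incident_start}
    raise ValueError(
        "need root_cause_id, root_cause_name + entity_name, or service + incident_start"
    )
```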
---
## Output format
### 📋 Incident postmortem
[Postmortem markdown from the `postmortem` tool — includes title, summary, timeline, root cause analysis, blast radius, contributing factors, and action items]
---
### 🎫 Remediation tickets
For each action item from the postmortem:
**Title:** [from generate_ticket]
**Priority:** [inferred from severity]
**Description:** [from generate_ticket — context + requirements]
**Acceptance criteria:** [from generate_ticket]
---
## Important behaviours
- **Prefer `root_cause_id`** over other lookup paths — it's the most reliable and unambiguous.
- **Handle ambiguity gracefully**: if `postmortem(root_cause_name=)` returns `ambiguity_candidates`, present the candidates to the user and ask them to pick one, then re-call with the selected `root_cause_id`.
- **Don't re-investigate**: the postmortem tool synthesises from Causely's data layer. Do not separately call triage + get_root_causes + get_logs to rebuild what postmortem already returns.
- **Tickets are forward-looking**: use `generate_ticket` for remediation work, not for documenting what happened (that's the postmortem).
- **Surface portal links** so engineers can drill into the Causely data behind the postmortem.
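
The ambiguity-handling behaviour above can be sketched with the tool call and the user prompt injected as plain callables. Everything here is illustrative: `call_postmortem` stands in for however the `postmortem` tool is invoked, `choose` stands in for asking the user, and the assumption that each candidate carries a `root_cause_id` key is unconfirmed.

```python
from typing import Callable, List


def postmortem_with_disambiguation(
    call_postmortem: Callable[..., dict],
    choose: Callable[[List[dict]], dict],
    **lookup: str,
) -> dict:
    """Call postmortem once; if it is ambiguous, re-call with the chosen root_cause_id."""
    result = call_postmortem(**lookup)
    candidates = result.get("ambiguity_candidates")
    if candidates:
        # Present the candidates and let the user pick; never guess silently.
        picked = choose(candidates)
        result = call_postmortem(root_cause_id=picked["root_cause_id"])
    return result
```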