
causely

Use Causely directly in Cursor through a preconfigured MCP server. Query service health, root causes, SLOs, metrics, and topology through natural conversation — grounded in system ontology and live causal intelligence.

MCP

causely

MCP server: causely

{
  "url": "https://api.causely.app/mcp"
}
Skill

causely-alert-triage


# Causely Alert Triage Skill

Read `references/complete-investigation.md` for the full 25-tool inventory and evidence strategy.

---

## Core tools for alert-driven triage

| Tool | Use when | What it returns |
|---|---|---|
| `get_entities(query=, entity_types=)` | Resolve the service/entity from the alert | Entity IDs for the affected service |
| `get_alerts(entity_ids=)` | See all alerts firing + mapping state | Alert name, symptom mapping, severity, count, timestamps |
| `get_root_causes(symptom_ids=)` | Find diagnosed cause behind a mapped alert | Root causes with evidence, blast radius, remediation |
| `triage(entity_name=)` | Quick full-picture health check | Root causes, symptoms, impact — all in one call |
| `get_symptoms(entity_ids=)` | Check which alerts promoted to symptoms | Named signals in the causal graph |
| `ask_causely(question=)` | Free-form query when the structured path can't tie the alert to a service | NL fallback for broad alert-to-cause questions (it cannot look up raw alert names) |

---

## Core rule: alerts → entities → causes

External alerting systems (PagerDuty, Datadog, Alertmanager) fire raw alert names. Causely maps some alerts to named symptoms in its causal model. The workflow bridges from alert → entity → mapped symptom → root cause.

**`ask_causely` cannot resolve raw alert names.** Don't use it for "what is causing KubeContainerWaiting?" — use the structured workflow below.

---

## Decision tree

**Alert received — service name known:**
```
triage(entity_name="<service>")                            ← 1 call
  → if root causes found: that's likely what triggered the alert
  → description = evidence, remediation = what to do
  → done in most cases
```

If you need to see the specific alert and its mapping status:
```
get_entities(query="<service>", entity_types=["Service"])   ← 1 call
get_alerts(entity_ids=[id], active_only=true)               ← 1 call
  → find the alert by name
  → mapping_state = "mapped" → Causely has incorporated it
  → mapping_state = "unmapped" → Causely hasn't promoted it to a symptom
  → if mapped: symptom_name → get_root_causes(symptom_ids=[...]) for cause
```
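The mapped/unmapped branch above can be sketched as a small routing helper. This is a minimal sketch, assuming alert records arrive as plain dicts keyed by the `mapping_state` and `symptom_name` fields named in this skill; the function name and the sample symptom name are hypothetical, not part of the Causely API.

```python
from typing import Any


def next_step_for_alert(alert: dict[str, Any]) -> str:
    """Return the follow-up the decision tree suggests for one alert record."""
    state = alert.get("mapping_state")
    if state == "mapped":
        # Mapped alerts carry a symptom; follow it to get_root_causes.
        return f"get_root_causes for symptom '{alert.get('symptom_name')}'"
    if state == "unmapped":
        # Unmapped is not the same as irrelevant: flag it for review.
        return "review: possible noise or not-yet-ingested signal"
    return "unknown mapping_state: inspect the raw alert"


# Hypothetical sample records (shape assumed, values illustrative).
alerts = [
    {"alert_name": "KubeContainerWaiting", "mapping_state": "unmapped"},
    {"alert_name": "HighErrorRate", "mapping_state": "mapped",
     "symptom_name": "RequestErrorRate_High"},
]
steps = {a["alert_name"]: next_step_for_alert(a) for a in alerts}
```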

**Alert received — service name unknown:**
```
ask_causely("What active root causes are there right now?")  ← 1 call
  → scan results for the alert pattern or affected service
  → then triage the identified service
```

**Alert name known, want to check if Causely knows about it:**
```
get_entities(query="<service>")                             ← 1 call
get_alerts(entity_ids=[id], alert_name_filters=["<alert-name>"])  ← 1 call
  → mapping_state tells you if Causely has incorporated this alert
  → if mapped: follow symptom_name → root cause chain
  → if unmapped: alert is noise or not yet incorporated
```

**Alert noise audit ("how noisy are our alerts?"):**
```
get_entities(query="<service>")                             ← 1 call
get_alerts(entity_ids=[id], mapping_state_filters=["unmapped"])  ← 1 call
  → high-count unmapped alerts = noise candidates for tuning
  → compare with get_alerts(entity_ids=[id], mapping_state_filters=["mapped"]) for signal-to-noise
```
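One way to turn the two `get_alerts` results into a signal-to-noise figure is sketched below. It assumes each alert record is a dict with `alert_name`, `mapping_state`, and a numeric `count` key; the key name `count`, the helper name, and the sample alerts are assumptions for illustration.

```python
def noise_report(alerts, top_n=3):
    """Summarise mapped vs unmapped alert volume and list noise candidates."""
    unmapped_alerts = [a for a in alerts if a["mapping_state"] == "unmapped"]
    mapped = sum(a["count"] for a in alerts if a["mapping_state"] == "mapped")
    unmapped = sum(a["count"] for a in unmapped_alerts)
    total = mapped + unmapped
    # Highest-count unmapped alerts are the first candidates for tuning.
    candidates = sorted(unmapped_alerts, key=lambda a: a["count"], reverse=True)
    return {
        "mapped_share": mapped / total if total else None,
        "noise_candidates": [a["alert_name"] for a in candidates[:top_n]],
    }


# Hypothetical sample data.
report = noise_report([
    {"alert_name": "HighErrorRate", "mapping_state": "mapped", "count": 10},
    {"alert_name": "PodRestartInfo", "mapping_state": "unmapped", "count": 5},
    {"alert_name": "DiskLatencyWarn", "mapping_state": "unmapped", "count": 25},
])
```

A low `mapped_share` only suggests tuning candidates; an unmapped alert may still be a real signal that has not been incorporated yet.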

**Multiple alerts firing at once:**
```
get_root_causes(active_only=true)                           ← 1 call
  → check if multiple alerts map to the same root cause
  → impact_service_graph shows propagation → many alerts, one origin
```

---

## Mapping state guide

| mapping_state | Meaning | Action |
|---|---|---|
| `mapped` | Causely has promoted this alert to a named symptom | Follow `symptom_name` → `get_root_causes(symptom_ids=)` for diagnosis |
| `unmapped` | Causely hasn't incorporated this alert | May be noise, or a new signal type not yet configured |

---

## Output format

### 🔔 Alert triage: [alert name]

**Alert:** [alert_name from get_alerts or user's description]
**Service:** [entity name]
**Status:** [firing / resolved] · **Severity:** [from alert]
**Causely mapping:** ✅ Mapped to symptom "[symptom_name]" / ❌ Unmapped

**Root cause:** [from triage or get_root_causes — name + entity + portal link]

**Evidence:** [from description field]

**Blast radius:** [from impacted_services]

**Customer impact:** [from impacted_customers]

**Owner:** [from causely.ai/team label]

**Recommended actions:** [from remediation field]

**Links:** [portal links]

---

## Important behaviours

- **Start with `triage` when you have a service name.** It's faster and gives the full picture without needing to resolve alert → symptom → root cause manually.
- **Use `get_alerts` when the user specifically wants to see alert-level detail** — mapping status, alert counts, firing times.
- **Don't use `ask_causely` for alert name resolution** — it can't resolve raw Alertmanager or Datadog alert names to Causely entities.
- **Unmapped ≠ irrelevant**: an unmapped alert might be a real signal that Causely hasn't been configured to ingest yet. Don't dismiss it.
- **Multiple alerts, one cause**: when the user reports several alerts, check `get_root_causes` first — they often share a single origin visible in the impact graph.
Skill

causely-change-impact


# Causely Change Impact Skill

Read `references/complete-investigation.md` for the full 25-tool inventory and evidence strategy.

---

## Core tools for change impact

| Tool | Use when | What it returns |
|---|---|---|
| `triage(entity_name=)` | Quick post-deploy check for one service | Root causes with `started_at` timestamps to compare against deploy time |
| `reliability_delta(service=)` | Metric regression check for one service | Before/after avg+max for CPU, memory, latency, error rate + verdict (PASS/WARNING/REGRESSION/WAIT) |
| `fleet_reliability_delta(team= or namespace= or services=)` | Batch regression check across multiple services | Summary table with per-service verdicts |
| `get_events(entity_id=)` | Find the deploy event / correlate changes | Lifecycle events (deploys, restarts, scaling, config changes) with timestamps |
| `get_config(entity_id=)` | Inspect config drift | Raw config files (manifests, specs) to compare |
| `get_metrics(entity_ids=, metrics=, window_minutes=)` | Custom metric comparison over time window | Time-series data for specific metrics |
| `get_root_causes(active_only=true)` | System-wide post-deploy sweep | All active RCs with `started_at` to filter by deploy time |

---

## Decision tree

**Single-service post-deploy check (recommended path):**
```
reliability_delta(service="<service>")                    ← 1 call
  → verdict: PASS / WARNING / REGRESSION / WAIT
  → per-metric delta: CPU, memory, latency, error rate before vs after
  → if REGRESSION → recommend rollback
  → if WAIT → deploy too recent, re-run later
  → if PASS → deploy is clean
```

If `reliability_delta` returns REGRESSION or WARNING, add context:
```
triage(entity_name="<service>")                           ← 2nd call
  → root cause started_at vs deploy time = causal correlation
  → description = evidence of what broke
  → remediation = what to do next
```

**Fleet-wide post-deploy validation:**
```
fleet_reliability_delta(team="<team>" or namespace="<ns>")  ← 1 call
  → summary table: service | verdict | release time | per-metric delta
  → verdict counts: REGRESSION / WARNING / PASS / WAIT
  → triage only REGRESSION services for detail
```
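The "triage only REGRESSION services" step can be sketched as a filter over the summary rows. This assumes the fleet summary has been parsed into dicts with `service` and `verdict` keys; those key names and the sample services are hypothetical.

```python
from collections import Counter


def fleet_summary(rows):
    """Count verdicts and pick the services that warrant a follow-up triage."""
    counts = Counter(row["verdict"] for row in rows)
    to_triage = [row["service"] for row in rows if row["verdict"] == "REGRESSION"]
    return counts, to_triage


# Hypothetical parsed rows from a fleet-wide check.
counts, to_triage = fleet_summary([
    {"service": "cart", "verdict": "PASS"},
    {"service": "checkout", "verdict": "REGRESSION"},
    {"service": "search", "verdict": "PASS"},
    {"service": "ads", "verdict": "WAIT"},
])
```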

**Triage-only path (when reliability_delta not needed):**
```
triage(entity_name="<service>")                            ← 1 call
  → root cause started_at before deploy? → change not the cause
  → root cause started_at after deploy? → change is suspect
  → description = evidence of what broke
  → impacted_services = downstream blast radius
  → impacted_customers = customer impact
  → done
```
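The before/after comparison above is a timestamp check. A minimal sketch follows, assuming `started_at` and the deploy time are RFC3339 strings and that root causes are dicts with `name` and `started_at` keys; the helper names and sample values are hypothetical.

```python
from datetime import datetime


def parse_ts(ts: str) -> datetime:
    # Normalise a trailing "Z", which fromisoformat rejects before Python 3.11.
    return datetime.fromisoformat(ts.replace("Z", "+00:00"))


def classify_root_causes(root_causes, deploy_time):
    """Split root causes into pre-existing vs suspect relative to the deploy."""
    deployed = parse_ts(deploy_time)
    result = {"pre_existing": [], "suspect": []}
    for rc in root_causes:
        started_before = parse_ts(rc["started_at"]) < deployed
        result["pre_existing" if started_before else "suspect"].append(rc["name"])
    return result


# Hypothetical sample: one issue predates the deploy, one follows it.
classified = classify_root_causes(
    [
        {"name": "Memory congestion", "started_at": "2025-03-14T09:40:00Z"},
        {"name": "Pod Failure", "started_at": "2025-03-14T10:07:00Z"},
    ],
    deploy_time="2025-03-14T10:00:00Z",
)
```

A root cause that starts after the deploy is only a suspect; the `description` field still has to confirm it.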

Only add extra calls if:
- Need to see the actual deploy event → `get_entities` → `get_events(entity_id=, message_contains="version")`
- Need config comparison → `get_entities` → `get_config(entity_id=)`
- Need custom metric time-series → `get_entities` → `get_metrics(entity_ids=, metrics=[...], window_minutes=60)`
- `has_stored_logs=true` AND description generic → `get_logs(root_cause_id=, limit=10, severity_filter=ERROR)`

**Canary / blue-green:**
```
reliability_delta(service="<service-v1>")                  ← 1 call
reliability_delta(service="<service-v2>")                  ← 1 call
  → compare verdicts: regression on v2 only = canary failure
```
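The canary rule reduces to a two-verdict comparison. A small sketch, assuming the verdict strings are exactly those listed for `reliability_delta`; the function name is hypothetical.

```python
def canary_failed(stable_verdict: str, canary_verdict: str) -> bool:
    """True when only the new version regressed."""
    return canary_verdict == "REGRESSION" and stable_verdict != "REGRESSION"
```

If both versions report REGRESSION, the cause is probably shared (for example a common dependency) rather than the canary itself, which is why that case returns `False` here.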

---

## Verdict logic

| Signal | Verdict | Action |
|---|---|---|
| `reliability_delta` → PASS, no new root causes | ✅ Safe | Deploy is clean |
| `reliability_delta` → WARNING | ⚠️ Monitor | Watch for escalation; re-check in 30 min |
| `reliability_delta` → REGRESSION | 🔴 Rollback recommended | Recommend rollback; run `triage` to confirm a new root cause that correlates with the deploy |
| `reliability_delta` → WAIT | ⏳ Too early | Re-run after more post-deploy data accumulates |
| Root cause `started_at` before deploy | ✅ Pre-existing | Change not the cause |
| Root cause `started_at` after deploy | 🔴 Suspect | Check description for confirmation |
| No root causes at all | ✅ Safe | Service is healthy |

---

## Output format

### 🚀 Deployment validation report

**Service:** [service-name] · **Deploy time:** [from reliability_delta or get_events] · **Report:** [now]

**Verdict:** ✅ Safe / ⚠️ Monitor / 🔴 Rollback recommended / ⏳ Too early

**Metric deltas:**
| Metric | Before (avg) | After (avg) | Delta | Status |
|---|---|---|---|---|
| [metric, from reliability_delta response] | [before] | [after] | [delta] | [status] |

**New root causes since deploy:** [name + started_at, or "None detected"]

**Evidence:** [from description field; supplement with get_logs only if generic AND has_stored_logs=true]

**Blast radius:** [from impacted_services]

**Customer impact:** [from impacted_customers]

**Owner:** [from causely.ai/team label or team_health]

**Recommended actions:** [from remediation field; rollback recommendation if 🔴]

**Links:** [portal links]
Skill

causely-correlated-incidents


# Causely Correlated Incidents Skill

Read `references/complete-investigation.md` for the full 25-tool inventory and evidence strategy.

---

## Core tools for correlation analysis

| Tool | Use when | What it returns |
|---|---|---|
| `get_root_causes(active_only=true)` | All active issues — primary correlation tool | All RCs with `impact_service_graph` edges showing propagation paths |
| `triage(entity_name=)` | Named service cascade investigation | Per-entity root causes with impact graph |
| `get_topology(entity_id=, mode=)` | Full dependency/dependent graph (beyond active incidents) | Node + edge graph: dependencies, dependents, or dataflow |
| `get_alerts(entity_ids=)` | Alert correlation across entities | Firing alerts with mapping state — find unmapped shared alerts |
| `get_environment_health(namespaces=)` | Scoped health check for affected namespace | Overall status + active root causes in scope |
| `ask_causely(question=)` | Cross-entity synthesis when names aren't clear | Free-form NL query for broad pattern detection |

---

## Core rule: one sweep, read the graphs

**`get_root_causes(active_only=true)` returns everything needed for correlation in one call:**
- Each root cause includes `impact_service_graph.edges` — a node appearing as source in multiple graphs is the shared origin
- `impacted_services` shows blast radius per root cause
- `impacted_customers` shows customer-facing impact
- `description` is the synthesised evidence — read it, don't re-fetch it

Do not follow up with `get_symptoms` — symptoms are already included in the root cause response.

---

## Decision tree

**Widespread outage:**
```
get_root_causes(active_only=true)                          ← 1 call
  → look for shared node IDs across impact_service_graphs
  → shared node = correlation origin
  → description on that root cause = evidence
  → impacted_customers across all RCs = customer impact
  → done, unless description generic AND has_stored_logs=true:
       → get_logs(root_cause_id=, limit=10, severity_filter=ERROR)   ← optional 2nd call
```
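Finding the shared node across impact graphs can be sketched as follows. The sketch assumes each root cause is a dict with an `id` and an `impact_service_graph` whose `edges` are dicts with `source` and `target` keys; the edge key names and sample IDs are assumptions, since this skill only names the `edges` field.

```python
from collections import defaultdict


def shared_origins(root_causes):
    """Map each node that is an edge source in 2+ graphs to those root cause IDs."""
    seen = defaultdict(set)  # node id -> IDs of root causes listing it as a source
    for rc in root_causes:
        for edge in rc["impact_service_graph"]["edges"]:
            seen[edge["source"]].add(rc["id"])
    return {node: sorted(ids) for node, ids in seen.items() if len(ids) > 1}


# Hypothetical sample: both root causes propagate from the same database.
origins = shared_origins([
    {"id": "rc-1", "impact_service_graph": {"edges": [
        {"source": "db", "target": "orders"},
        {"source": "orders", "target": "web"},
    ]}},
    {"id": "rc-2", "impact_service_graph": {"edges": [
        {"source": "db", "target": "billing"},
    ]}},
])
```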

**"Are these two incidents related?":**
```
get_root_causes(active_only=true)                          ← 1 call (covers both services)
  → compare impact_service_graph.nodes for shared IDs
  → compare started_at: within minutes of each other = likely correlated
  → done
```
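The `started_at` comparison can be made explicit with a time window. A minimal sketch, assuming RFC3339 timestamps; the five-minute default is an arbitrary illustration, not a Causely threshold, and the function name is hypothetical.

```python
from datetime import datetime, timedelta


def started_close_together(started_a: str, started_b: str, window_minutes: int = 5) -> bool:
    """True when two root causes began within the given window of each other."""
    a = datetime.fromisoformat(started_a.replace("Z", "+00:00"))
    b = datetime.fromisoformat(started_b.replace("Z", "+00:00"))
    return abs(a - b) <= timedelta(minutes=window_minutes)
```

Temporal proximity is supporting evidence only; shared nodes in the impact graphs are the stronger signal.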

**Named service, cascade suspected:**
```
triage(entity_name="<service>")                            ← 1 call
  → read impact_service_graph: trace edges from root to leaves
  → impacted_services = confirmed downstream blast radius
  → done
```

**Full dependency graph (beyond active incidents):**
```
get_entities(query="<service>", entity_types=["Service"])   ← 1 call
get_topology(entity_id=<id>, mode=dependents, levels=3)     ← 1 call
  → all services that call this entity (upstream blast radius victims)
  → or mode=dependencies for what this entity calls (downstream risk)
  → or mode=dataflow for full end-to-end data movement
```

**Alert-level correlation (shared alert patterns across services):**
```
get_entities(query="<service-a>")                           ← 1 call
get_entities(query="<service-b>")                           ← 1 call
get_alerts(entity_ids=[id_a, id_b], active_only=true)       ← 1 call
  → shared alert_names across entities = correlated signals
  → mapped alerts → get_root_causes(symptom_ids=) for cause
```

---

## Correlation methods

1. **Impact graph overlap**: shared node IDs in `impact_service_graph` across multiple root causes → same origin
2. **Temporal correlation**: root causes with `started_at` within minutes of each other → likely same trigger
3. **Topology correlation**: `get_topology(mode=dependents)` shows all upstream callers — if the degraded entity is a shared dependency, all dependents are at risk
4. **Alert pattern correlation**: same `alert_name` firing across multiple entities simultaneously → shared infrastructure cause

---

## Output format

### 🔴 Multi-service incident summary

**Affected services:** [from impacted_services across root causes]

**Correlation:** ✅ Correlated / ⚠️ Partial / ❓ Unconfirmed — [origin entity if known]

**Root cause:** [name + entity + portal link from get_root_causes]

**Propagation path:** [from impact_service_graph edges, or get_topology if called]

**Evidence:** [from description field; supplement with get_logs if generic AND has_stored_logs=true]

**Blast radius:** [from impact_service_graph — total affected services count + names]

**Customer impact:** [from impacted_customers]

**Owner:** [from causely.ai/team label or team_health]

**Timeline:** [started_at per root cause, in order]

**Recommended action:** [from remediation field — single fix that resolves the origin]

**Links:** [all portal links]
Skill

causely-health-reporting


# Causely Health Reporting Skill

Read `references/complete-investigation.md` for the full 25-tool inventory and evidence strategy.

---

## Core tools for health reporting

| Tool | Use when | What it returns |
|---|---|---|
| `get_environment_health()` | Global or namespace-scoped health overview | Overall status (HEALTHY/DEGRADED/CRITICAL) + active root causes + remediation |
| `get_service_summary(service=)` | Comprehensive single-service report | Symptoms, root causes, SLOs, metrics, deps, slow queries, events, error logs — all in one call |
| `get_root_causes(active_only=true)` | All active issues with evidence | Structured JSON: description, impacted_services, impacted_customers per RC |
| `team_health(team=)` | Team-scoped standup | Degraded/critical services first, healthy grouped at end |
| `get_entity_health(entity_id=)` | Non-service entity health (DBs, pods, queues) | Symptoms, root causes, events, logs, metrics for one entity |
| `get_slo(entity_ids=)` | SLO error budget and burn rate | Per-SLO: budget remaining %, burn rate, at-risk/violated flags |
| `ask_causely(question=)` | System-wide SLO overview (no entity IDs needed) | "Which services have SLOs at risk or violated?" |
| `get_symptoms(active_only=false, lookback_hours=N)` | Historical flapping/recurring signals | Timeline of symptom start/end for trend analysis |

---

## Decision tree

**Morning standup / system sweep (recommended path):**
```
get_environment_health()                                  ← 1 call
  → overall status: HEALTHY / DEGRADED / CRITICAL
  → active root causes with severity, remediation
  → done for quick overview
```

For more detail on each root cause:
```
get_root_causes(active_only=true)                         ← 1 call
  → group by severity: Critical → High → Medium → Low
  → description = evidence per issue
  → impacted_customers = customer impact per issue
  → entity.labels["causely.ai/team"] = owner (if set)
  → done
```
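Grouping by severity and reading the owner label might look like the sketch below. It assumes root causes are dicts with `name`, `severity`, and a nested `entity.labels` mapping as referenced above; the helper name and sample records are hypothetical.

```python
SEVERITY_ORDER = {"Critical": 0, "High": 1, "Medium": 2, "Low": 3}


def briefing_rows(root_causes):
    """Order root causes Critical -> Low and attach the owning team if labelled."""
    ordered = sorted(
        root_causes,
        # Unrecognised severities sort after the known ones.
        key=lambda rc: SEVERITY_ORDER.get(rc["severity"], len(SEVERITY_ORDER)),
    )
    rows = []
    for rc in ordered:
        labels = rc.get("entity", {}).get("labels", {})
        rows.append((rc["severity"], rc["name"], labels.get("causely.ai/team", "unknown")))
    return rows


# Hypothetical sample records.
rows = briefing_rows([
    {"name": "Slow queries", "severity": "Low",
     "entity": {"labels": {"causely.ai/team": "payments"}}},
    {"name": "Memory congestion", "severity": "Critical", "entity": {"labels": {}}},
])
```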

**Namespace-scoped health:**
```
get_environment_health(namespaces=["otel-demo"])           ← 1 call
  → scoped status + root causes for that namespace only
```

**Full service report (all dimensions):**
```
get_service_summary(service="<service>")                   ← 1 call
  → status, symptoms, root causes, SLOs, metrics, deps,
    slow queries, events, error logs — everything in one call
  → done — do NOT chain 5 separate tools
```

**SLO-focused report:**
```
ask_causely("Which services have SLOs at risk or violated?")  ← 1 call (no entity IDs needed)
  → or if you have entity IDs:
get_entities(query="<service>") → get_slo(entity_ids=[...], only_at_risk=true)
```
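Selecting the SLOs worth reporting can be sketched as a filter on burn rate and violation state. This assumes each SLO is a dict with `name`, `burn_rate`, and `violated` keys; those key names are assumptions, and the 1.0 threshold mirrors the "burn rate > 1.0" rule used in the briefing formats of this skill.

```python
def slos_needing_attention(slos, burn_rate_threshold=1.0):
    """Return names of SLOs that are violated or burning budget too fast."""
    return [
        slo["name"]
        for slo in slos
        if slo.get("violated") or slo.get("burn_rate", 0.0) > burn_rate_threshold
    ]


# Hypothetical sample records.
flagged = slos_needing_attention([
    {"name": "checkout-latency", "burn_rate": 2.4, "violated": False},
    {"name": "cart-availability", "burn_rate": 0.3, "violated": False},
    {"name": "search-errors", "burn_rate": 0.9, "violated": True},
])
```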

**Team standup:**
```
team_health(team="<team>")                                 ← 1 call
  → degraded/critical services listed first
  → for each degraded: get_service_summary(service=) if full detail needed
```

**Weekly report / trend analysis:**
```
get_root_causes(active_only=false, lookback_hours=168)     ← 1 call
  → count per service to find recurring offenders
  → compare started_at / ended_at for flapping patterns
```
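Counting root causes per service to surface recurring offenders can be sketched with a `Counter`. The sketch assumes each historical root cause is a dict carrying a `service` key; that key name, the helper name, and the sample services are hypothetical.

```python
from collections import Counter


def recurring_offenders(root_causes, min_occurrences=2):
    """List (service, count) pairs for services that appear repeatedly."""
    counts = Counter(rc["service"] for rc in root_causes)
    return [(svc, n) for svc, n in counts.most_common() if n >= min_occurrences]


# Hypothetical week of root causes.
history = [{"service": s} for s in ["cart", "checkout", "cart", "search", "checkout", "cart"]]
offenders = recurring_offenders(history)
```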

**Non-service entity health (DBs, queues, pods):**
```
get_entities(query="<name>", entity_types=["Database"])     ← 1 call
get_entity_health(entity_id=<id>)                           ← 1 call
  → symptoms, root causes, events, logs, metrics
```

---

## Output formats

### Morning / standup briefing

**🟢 / 🟡 / 🔴 System health: [from get_environment_health status]**
*[N] active root causes as of [time]*

| Service | Root cause | Severity | Since | Evidence | Customer impact | Owner |
|---|---|---|---|---|---|---|
| [from response] | [name] | [sev] | [started_at] | [from description] | [impacted_customers or "none"] | [team label or "unknown"] |

**SLOs at risk:** [from get_slo or ask_causely — list services with burn rate > 1.0 or violated]

**Watch:** [anything Critical or active >6h]

---

### Full service report

**[Service] — [status from get_service_summary]**

**Active issues:** [root causes with severity + remediation]
**SLOs:** [budget remaining + burn rate]
**Key metrics:** [CPU, memory, error rate, p99 latency from resource metrics section]
**Dependencies:** [health of upstream/downstream services]
**Recent events:** [deploys, restarts, config changes]

---

### On-call handoff

🔴 **Active now:** [severity · service · root cause · started_at]
🟡 **SLOs burning:** [services with burn rate > 1.0]
⚠️ **Owner gaps:** [services missing causely.ai/team label]
📋 **Watch list:** [services with recurring root causes in the past 24h]
Skill

causely-k8s-investigation


# Causely K8s Investigation Skill

Read `references/complete-investigation.md` for the full 25-tool inventory and evidence strategy.

---

## Core tools for K8s investigation

| Tool | Use when | What it returns |
|---|---|---|
| `triage(entity_name=)` | Service-level health check — always start here | Root causes with infra-layer evidence (OOMKill, pod failure, memory pressure) |
| `get_entities(query=, entity_types=)` | Resolve K8s entities to IDs | Entity IDs for pods, containers, nodes, databases |
| `get_entity_health(entity_id=)` | Non-service entity health (pods, nodes, DBs, containers) | Symptoms, root causes, events, logs, metrics for one entity |
| `get_events(entity_id=)` | Lifecycle events (restarts, scaling, scheduling) | Timestamped events: OOMKill, CrashLoopBackOff, eviction, deploy, config change |
| `get_config(entity_id=)` | Inspect K8s manifests and resource specs | Raw config files: deployment spec, resource limits, HPA config |
| `get_metrics(entity_ids=, metrics=)` | Container/pod resource utilisation | CPU, memory, network I/O snapshots or time-series |
| `get_logs(entity_id=)` | Live container/pod logs | Real-time log stream for a running entity |
| `get_root_causes(active_only=true)` | System-wide infra sweep | All active RCs — filter for K8s-related root causes |
| `list_namespaces()` | Discover valid namespaces | Namespace names for scoping investigations |
| `list_clusters()` | Discover valid clusters | Cluster names for multi-cluster queries |

---

## Entity name format

| Type | Format | Example |
|---|---|---|
| K8s service | `namespace/service-name` | `default/animal-service` |
| ECS task / VM | `cluster/task-name-hash` | `chaos/quarkus-workshop-hero-service-2b62b3ef` |
| Node | AWS/GCP hostname | `ip-192-168-12-32.us-east-2.compute.internal` |

---

## Decision tree

**Service name known — start at service level:**
```
triage(entity_name="<namespace/service>")                  ← 1 call
  → infra root causes: "Memory congestion", "Pod Failure", "OOMKill", "Node pressure"
  → description = evidence (memory %, restart counts, disk errors)
  → impacted_services = blast radius
  → done
```
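Because `triage` expects the scoped `namespace/service-name` form from the entity name format table, it can help to validate names before calling it. A minimal sketch with a hypothetical helper; node hostnames carry no scope, so they come back with `None`.

```python
def split_entity_name(entity_name: str):
    """Split 'scope/name' into (scope, name); unscoped names return (None, name)."""
    scope, sep, name = entity_name.partition("/")
    if not sep:
        # No slash: e.g. a node hostname, which has no namespace or cluster prefix.
        return None, entity_name
    if not scope or not name:
        raise ValueError(f"malformed entity name: {entity_name!r}")
    return scope, name
```

For example, `split_entity_name("default/animal-service")` yields the namespace and service separately, while a bare service name signals that the namespace still has to be resolved.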

**Need pod/container-level detail:**
```
get_entities(query="<pod-name>", entity_types=["Container","Pod"])  ← 1 call
get_entity_health(entity_id=<id>)                          ← 1 call
  → symptoms, root causes, events, logs, metrics for that specific entity
```

**Why did my pod restart?**
```
get_entities(query="<pod-name>")                           ← 1 call
get_events(entity_id=<id>, severity_filter=WARNING)         ← 1 call
  → look for OOMKill, CrashLoopBackOff, Evicted events with timestamps
  → if OOMKill: get_config(entity_id=) to check resource limits
  → if CrashLoopBackOff: get_logs(entity_id=, limit=20, severity_filter=ERROR)
```
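The branching on event type can be sketched as below. It assumes each event is a dict with a `reason` key holding values such as `OOMKill` or `CrashLoopBackOff`; the key name and helper name are assumptions, and the returned strings are just the tool names from this skill.

```python
def follow_up_tools(events):
    """Pick the next tool calls based on which restart reasons were observed."""
    reasons = {event["reason"] for event in events}
    tools = []
    if "OOMKill" in reasons:
        tools.append("get_config")  # check resource limits
    if "CrashLoopBackOff" in reasons:
        tools.append("get_logs")    # look for errors at startup
    return tools


# Hypothetical sample events.
tools = follow_up_tools([
    {"reason": "OOMKill", "timestamp": "2025-03-14T10:02:00Z"},
    {"reason": "OOMKill", "timestamp": "2025-03-14T10:09:00Z"},
])
```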

**Resource utilisation check:**
```
get_entities(query="<service>", entity_types=["Service"])   ← 1 call
get_metrics(entity_ids=[id], metrics=["cpu_usage", "memory_usage", "memory_limit"])  ← 1 call
  → compare usage vs limits
  → if near limit: check get_config for resource requests/limits
```
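Comparing usage against limits is simple arithmetic once the metrics are in hand. A minimal sketch, assuming the latest values have been collected into a dict keyed by the metric names used above; the 0.9 "near limit" ratio is an arbitrary illustration, not a Causely default.

```python
def memory_headroom(metrics, near_limit_ratio=0.9):
    """Return usage/limit ratio and whether it crosses the near-limit mark."""
    usage, limit = metrics["memory_usage"], metrics["memory_limit"]
    if limit <= 0:
        return None  # no limit configured, so a ratio is meaningless
    ratio = usage / limit
    return {"ratio": ratio, "near_limit": ratio >= near_limit_ratio}


# Hypothetical sample values in MiB.
headroom = memory_headroom({"memory_usage": 480, "memory_limit": 512})
```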

**Inspect K8s config / resource limits:**
```
get_entities(query="<service>")                            ← 1 call
get_config(entity_id=<id>)                                  ← 1 call
  → deployment spec, resource limits, HPA config, environment variables
```

**Service name unknown / namespace sweep:**
```
get_environment_health(namespaces=["<namespace>"])           ← 1 call
  → overall namespace status + active root causes
  → or:
get_root_causes(active_only=true)                           ← 1 call
  → filter for namespace/entity names matching the namespace
  → description = evidence for each RC
  → only triage the single highest-severity hit for detail
```

**Triage returns "No Incident Data Found":**
- Service is healthy at the service level — the infra issue may be at pod/container level
- Try `get_entities(query="<name>")` → `get_entity_health(entity_id=)` for pod-level health
- Or `get_root_causes(active_only=true)` and filter for the entity name pattern

---

## Output format

### 🔴 / 🟡 / 🟢 [Service/Entity] — [Status]

**Root cause (infra layer):** [name + entity + portal link]

**Evidence:** [from description field — specific metrics, counts, log patterns; supplement with get_logs only if description is generic AND has_stored_logs=true]

**Resource state:** [from get_metrics if called — CPU/memory usage vs limits]

**Configuration:** [from get_config if called — relevant resource limits, HPA settings]

**Recent events:** [from get_events if called — OOMKill, restarts, scaling events with timestamps]

**Blast radius:** [from impacted_services, or "None identified"]

**Customer impact:** [from impacted_customers, or "None identified"]

**Owner / team:** [from causely.ai/team label or team_health, or "Not registered"]

**Recommended actions:** [from remediation field + k8s-specific steps: adjust resource limits, cordon/drain node, review HPA, check liveness probes]

**Links:** [portal links from response]
Skill

causely-postmortem


# Causely Postmortem & Ticket Skill

Read `references/complete-investigation.md` for the full 25-tool inventory and evidence strategy.

---

## Core tools for postmortems and tickets

| Tool | Use when | What it returns |
|---|---|---|
| `postmortem(root_cause_id=)` | Generate full postmortem from Causely data | Markdown + structured fields: title, summary, timeline, root cause, blast radius, contributing factors, action items |
| `generate_ticket(task=)` | Create an engineering ticket draft | Structured JSON: title, description, context, requirements, acceptance criteria, notes |
| `get_root_causes(active_only=false, lookback_hours=N)` | Find the root cause ID for postmortem | Historical root causes with IDs |
| `triage(entity_name=, start_time=, end_time=)` | Scoped incident summary for a time window | Markdown narrative with root causes, symptoms, impact |
| `get_events(entity_id=)` | Build incident timeline | Lifecycle events (deploys, restarts, config changes) |
| `get_symptoms(active_only=false, lookback_hours=N)` | Reconstruct signal timeline | Historical symptom start/end for timeline building |

---

## Decision tree

**Generate postmortem — root cause ID known:**
```
postmortem(root_cause_id="<id>")                           ← 1 call
  → complete postmortem: title, summary, timeline, root cause,
    blast radius, contributing factors, action items
  → done
```

**Generate postmortem — root cause ID unknown:**
```
get_root_causes(active_only=false, lookback_hours=48, root_cause_name="<name>")  ← 1 call
  → find the matching root cause, get its ID
  → or: triage(entity_name="<service>", start_time=, end_time=) to find RCs in window

postmortem(root_cause_id=<id>)                             ← 2nd call
  → complete postmortem
```

**Generate postmortem — by service + time window (legacy path):**
```
postmortem(service="<service>", incident_start="2025-03-14T00:00:00Z")  ← 1 call
  → postmortem scoped to that service and time
```

**Generate postmortem — by root cause name:**
```
postmortem(root_cause_name="<name>", entity_name="<service>")  ← 1 call
  → if ambiguous: returns ambiguity_candidates → re-submit with root_cause_id
```
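The three lookup paths shown above can be folded into one helper that picks the first applicable one, in the priority order this skill documents (`root_cause_id`, then name plus entity, then service plus start time). This is a sketch of argument selection only; the helper is hypothetical and does not call Causely.

```python
def postmortem_args(root_cause_id=None, root_cause_name=None, entity_name=None,
                    service=None, incident_start=None):
    """Choose the arguments for a postmortem call from whatever is known."""
    if root_cause_id:
        return {"root_cause_id": root_cause_id}
    if root_cause_name and entity_name:
        return {"root_cause_name": root_cause_name, "entity_name": entity_name}
    if service and incident_start:
        return {"service": service, "incident_start": incident_start}
    raise ValueError("no resolvable lookup path: provide one of the three combinations")
```

If the name-based path returns `ambiguity_candidates`, the caller would re-run the helper with the `root_cause_id` the user selects.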

**Enrich postmortem with additional context:**
```
get_entities(query="<service>") → get_events(entity_id=<id>)  ← timeline enrichment
get_symptoms(active_only=false, lookback_hours=48, entity_ids=[id])  ← signal timeline
  → add deploy events, symptom transitions to the postmortem narrative
```

**Generate remediation ticket from postmortem:**
```
postmortem(root_cause_id=<id>)                             ← 1 call
  → extract action items from postmortem
generate_ticket(task="<action item description>")           ← 1 call per ticket
  → structured ticket: title, description, acceptance criteria
```

**Generate ticket without postmortem (standalone):**
```
generate_ticket(task="<description of the remediation work>")  ← 1 call
  → Jira/GitHub/Linear-ready ticket draft
```

---

## Postmortem input priority

Use the first applicable lookup path:
1. **`root_cause_id`** — preferred; directly identifies the root cause
2. **`root_cause_name` + `entity_name`** — resolves by name; returns candidates if multiple match
3. **`service` + `incident_start`** — legacy path; requires service name and RFC3339 start time

`incident_id` alone is not resolvable — always pair it with one of the paths above.

---

## Output format

### 📋 Incident postmortem

[Postmortem markdown from the `postmortem` tool — includes title, summary, timeline, root cause analysis, blast radius, contributing factors, and action items]

---

### 🎫 Remediation tickets

For each action item from the postmortem:

**Title:** [from generate_ticket]
**Priority:** [inferred from severity]
**Description:** [from generate_ticket — context + requirements]
**Acceptance criteria:** [from generate_ticket]

---

## Important behaviours

- **Prefer `root_cause_id`** over other lookup paths — it's the most reliable and unambiguous.
- **Handle ambiguity gracefully**: if `postmortem(root_cause_name=)` returns `ambiguity_candidates`, present the candidates to the user and ask them to pick one, then re-call with the selected `root_cause_id`.
- **Don't re-investigate**: the postmortem tool synthesises from Causely's data layer. Do not separately call triage + get_root_causes + get_logs to rebuild what postmortem already returns.
- **Tickets are forward-looking**: use `generate_ticket` for remediation work, not for documenting what happened (that's the postmortem).
- **Surface portal links** so engineers can drill into the Causely data behind the postmortem.

Source: https://github.com/causely-oss/causely-client