prompt-optimizer


Prompt Optimizer: iteratively optimizes prompts and rules with six-dimension scoring, auto-detecting the mode and applying the matching scoring criteria.

prompt-engineering, prompt-optimization, scoring, rules, cursor-rules, iterative, evaluation

1 Skill

# Prompt Optimizer

A semi-automatic iterative optimization skill inspired by the autoresearch
paradigm. Supports two modes — **Prompt Mode** for task-specific prompts and
**Rules Mode** for persistent system-level rules — with auto-detection.

## Workflow

### Step 1: Receive Input

Accept a prompt or rule as inline text or as a file path. If the user provides
a file path, read the file contents. If neither is provided, ask the user for
the text they want to optimize.

### Step 1.5: Auto-Detect Mode

Classify the input as **Prompt** or **Rule** based on these signals:

| Signal | Prompt | Rule |
|--------|--------|------|
| Describes a single task with expected output | Yes | No |
| Uses persistent behavioral language ("always", "never", "when X do Y") | No | Yes |
| Contains role/persona definition for ongoing use | No | Yes |
| Expects a one-time deliverable | Yes | No |
| Located in `.cursor/rules/` or user_rules config | No | Yes |
| References other rules or system-level concerns | No | Yes |

If ambiguous, ask the user to confirm.
Display the detected mode: `[Mode: Prompt]` or `[Mode: Rules]`.
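
The signal table above can be approximated with a small heuristic. The sketch below is illustrative only: the skill leaves classification to the model, and the keyword patterns, the two-hit threshold, and the `detect_mode` name are assumptions rather than part of the skill.

```python
import re
from typing import Optional

# Hypothetical patterns for "persistent behavioral language".
RULE_PATTERNS = [
    r"\balways\b",
    r"\bnever\b",
    r"\bwhen\b.+\b(do|use|prefer|avoid)\b",
]

def detect_mode(text: str, path: Optional[str] = None) -> str:
    """Return 'Rules', 'Prompt', or 'Ambiguous' for the given input."""
    # Location signal: files under .cursor/rules/ are treated as rules.
    if path and ".cursor/rules/" in path.replace("\\", "/"):
        return "Rules"
    hits = sum(bool(re.search(p, text, re.IGNORECASE)) for p in RULE_PATTERNS)
    if hits >= 2:
        return "Rules"
    if hits == 0:
        return "Prompt"
    return "Ambiguous"  # a single weak signal: ask the user to confirm

mode = detect_mode("Always use TypeScript. Never commit secrets.")
print(f"[Mode: {mode}]")
```

A keyword check like this only covers two rows of the table; the remaining signals (one-time deliverable, persona definition, references to other rules) still need the model's judgment.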

### Step 2: Evaluate — 6 Dimensions (1-10)

Select the scoring table matching the detected mode.

**Prompt Mode** — for task-specific, one-off prompts:

| Dim | Name | Guiding Question |
|-----|------|------------------|
| C | **Clarity** | Would a context-free LLM interpret this unambiguously? |
| S | **Specificity** | Are constraints, output format, and expected behavior explicit? |
| T | **Structure** | Is the information logically organized with clear hierarchy? |
| O | **Completeness** | Does it cover context, examples, edge cases, and error handling? |
| E | **Efficiency** | Is every sentence carrying necessary information? Zero fluff? |
| R | **Robustness** | Would 10 runs produce consistent, high-quality outputs? |

**Rules Mode** — for persistent system-level rules:

| Dim | Name | Guiding Question |
|-----|------|------------------|
| C | **Clarity** | Would any LLM unambiguously understand the behavioral intent? |
| S | **Scope Fit** | Is the rule's breadth appropriate — not too broad, not too narrow? |
| T | **Structure** | Is the rule well-organized and easy to scan during every conversation? |
| O | **Coverage** | Does it handle the relevant scenarios without over-specifying? |
| E | **Efficiency** | Is the token cost justified given this runs on EVERY conversation? |
| R | **Composability** | Does this rule coexist peacefully with other rules? No conflicts? |

By default, the composite score is the unweighted average of all 6 dimensions.
The user may override this by supplying custom weights.
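
As a minimal sketch of the composite calculation, the function below computes an unweighted or weighted average. The name `composite_score`, the weight format, and rounding to one decimal place (inferred from the values shown in the version history example) are assumptions.

```python
from typing import Mapping, Optional

def composite_score(scores: Mapping[str, float],
                    weights: Optional[Mapping[str, float]] = None) -> float:
    """Average the dimension scores, optionally weighted per dimension."""
    if weights is None:
        # Default: every dimension counts equally.
        weights = {dim: 1.0 for dim in scores}
    total_weight = sum(weights[dim] for dim in scores)
    if total_weight <= 0:
        raise ValueError("weights must sum to a positive number")
    weighted_sum = sum(scores[dim] * weights[dim] for dim in scores)
    return round(weighted_sum / total_weight, 1)

v1 = {"C": 5, "S": 4, "T": 6, "O": 3, "E": 7, "R": 4}
print(composite_score(v1))  # unweighted: 29 / 6

# Hypothetical override: count Robustness twice as much as the others.
double_r = {"C": 1, "S": 1, "T": 1, "O": 1, "E": 1, "R": 2}
print(composite_score(v1, double_r))  # weighted: 33 / 7
```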

For detailed scoring rubrics and anchor examples, see
[scoring-rubric.md](scoring-rubric.md).

### Step 3: Output the Scorecard

Use this exact format:

**Prompt Mode**:
```
== Prompt Scorecard v{N} ==
Clarity:      {score}/10  {delta}
Specificity:  {score}/10  {delta}
Structure:    {score}/10  {delta}
Completeness: {score}/10  {delta}
Efficiency:   {score}/10  {delta}
Robustness:   {score}/10  {delta}
------------------------------
Composite:    {avg}/10    {delta}

Weakest:  {dimension_name}
Verdict:  {one-line diagnosis}
```

**Rules Mode**:
```
== Rules Scorecard v{N} ==
Clarity:       {score}/10  {delta}
Scope Fit:     {score}/10  {delta}
Structure:     {score}/10  {delta}
Coverage:      {score}/10  {delta}
Efficiency:    {score}/10  {delta}
Composability: {score}/10  {delta}
------------------------------
Composite:     {avg}/10    {delta}

Weakest:  {dimension_name}
Verdict:  {one-line diagnosis}
```

- For v1, leave `{delta}` blank.
- For v2+, show delta as `(+1)`, `(-1)`, or `(=)` relative to previous version.
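
The scorecard layout and delta rules can be reproduced mechanically. The following is a sketch, not part of the skill: `format_delta` and `render_scorecard` are hypothetical helpers, and the composite is assumed to be the unweighted average rounded to one decimal place.

```python
from typing import Mapping, Optional

def format_delta(current: float, previous: Optional[float]) -> str:
    """Return '' for v1, otherwise '(+1)', '(-1)', or '(=)'."""
    if previous is None:
        return ""
    diff = round(current - previous, 1)
    return "(=)" if diff == 0 else f"({diff:+g})"

def average(scores: Mapping[str, float]) -> float:
    return round(sum(scores.values()) / len(scores), 1)

def render_scorecard(mode: str, version: int, scores: Mapping[str, float],
                     previous: Optional[Mapping[str, float]] = None,
                     verdict: str = "") -> str:
    # Pad labels so the score column lines up as in the template.
    width = max(len(name) for name in scores) + 1

    def row(name: str, value: float, delta: str, gap: str) -> str:
        label = f"{name}:"
        return f"{label:<{width}} {value}/10{gap}{delta}".rstrip()

    lines = [f"== {mode} Scorecard v{version} =="]
    for name, score in scores.items():
        prev = previous[name] if previous else None
        lines.append(row(name, score, format_delta(score, prev), "  "))
    lines.append("-" * 30)
    prev_avg = average(previous) if previous else None
    avg = average(scores)
    lines.append(row("Composite", avg, format_delta(avg, prev_avg), "    "))
    lines.append("")
    lines.append(f"Weakest:  {min(scores, key=scores.get)}")
    lines.append(f"Verdict:  {verdict}")
    return "\n".join(lines)

v1 = {"Clarity": 5, "Specificity": 4, "Structure": 6,
      "Completeness": 3, "Efficiency": 7, "Robustness": 4}
v2 = {"Clarity": 7, "Specificity": 4, "Structure": 6,
      "Completeness": 5, "Efficiency": 7, "Robustness": 5}
print(render_scorecard("Prompt", 2, v2, previous=v1,
                       verdict="Output format is still unspecified."))
```

Passing `"Rules"` as the mode with the Rules Mode dimension names produces the second template; the label column widens automatically for `Composability`.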

### Step 4: Suggest Improvements

Focus on the weakest 1-2 dimensions only. This is a greedy strategy: small,
targeted fixes are less likely to cause regressions in other dimensions.

Each suggestion must be:
1. **Concrete** — show the exact text to add, remove, or rewrite.
2. **Justified** — explain which dimension it targets and why.
3. **Minimal** — smallest change for maximum score uplift.
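
Selecting the dimensions to target is a simple ranking step. A minimal sketch, assuming scores are held in a dictionary and that ties are broken by scorecard order (the tie-breaking rule and the `weakest_dimensions` name are assumptions):

```python
from typing import List, Mapping

def weakest_dimensions(scores: Mapping[str, float], limit: int = 2) -> List[str]:
    """Return up to `limit` dimension names with the lowest scores."""
    # sorted() is stable, so tied dimensions keep their scorecard order.
    ranked = sorted(scores.items(), key=lambda item: item[1])
    return [name for name, _ in ranked[:limit]]

v1 = {"C": 5, "S": 4, "T": 6, "O": 3, "E": 7, "R": 4}
print(weakest_dimensions(v1))  # O is lowest; S and R tie, S comes first
```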

### Step 5: User Confirmation

Present the suggested changes and wait for user confirmation:
- **Confirmed** → apply changes, go to Step 6.
- **Modified** → incorporate user adjustments, then go to Step 6.
- **Rejected** → generate alternative suggestions, return to Step 4.

### Step 6: Apply and Re-evaluate

1. Produce the new version of the prompt or rule.
2. Re-run the 6-dimension evaluation (Step 2).
3. Output the updated scorecard with deltas.
4. Append to the version history table.

### Step 7: Version History

Maintain a running table throughout the session. Column headers adapt to mode:

- **Prompt Mode**: `| Version | C | S | T | O | E | R | Composite | Change Summary |`
- **Rules Mode**: `| Version | C | SF | T | Cov | E | Comp | Composite | Change Summary |`

```
| Version | C | S | T | O | E | R | Composite | Change Summary |
|---------|---|---|---|---|---|---|-----------|----------------|
| v1      | 5 | 4 | 6 | 3 | 7 | 4 | 4.8       | baseline       |
| v2      | 7 | 4 | 6 | 5 | 7 | 5 | 5.7       | added examples |
```
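
One way to keep the running table is to store it as a list of Markdown rows and append one row per version. This is a sketch under assumptions: the helper names are hypothetical, the composite is the unweighted average rounded to one decimal place, and cells are not padded to align columns.

```python
from typing import Iterable, List

HEADERS = {
    "Prompt": ["Version", "C", "S", "T", "O", "E", "R",
               "Composite", "Change Summary"],
    "Rules": ["Version", "C", "SF", "T", "Cov", "E", "Comp",
              "Composite", "Change Summary"],
}

def table_row(cells: Iterable[object]) -> str:
    return "| " + " | ".join(str(cell) for cell in cells) + " |"

def new_history(mode: str) -> List[str]:
    """Start a table with the header and separator rows for the mode."""
    header = HEADERS[mode]
    return [table_row(header), table_row("---" for _ in header)]

def append_version(history: List[str], scores: List[int],
                   summary: str) -> List[str]:
    """Append the next version row; scores follow the header's column order."""
    version = len(history) - 1  # two header rows precede v1
    composite = round(sum(scores) / len(scores), 1)
    history.append(table_row([f"v{version}", *scores, composite, summary]))
    return history

history = new_history("Prompt")
append_version(history, [5, 4, 6, 3, 7, 4], "baseline")
append_version(history, [7, 4, 6, 5, 7, 5], "added examples")
print("\n".join(history))
```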

### Step 8: Termination

The loop ends when:
- Composite score >= 8.5, OR
- User explicitly says they are satisfied.

On termination, output:
1. The final optimized prompt or rule (complete text).
2. The full version history table.
3. A summary of key improvements made.
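
The overall loop and its termination condition can be sketched as a small driver. The callbacks `evaluate`, `revise`, and `user_satisfied` are hypothetical stand-ins for Steps 2 to 6 and for the user's confirmation, and the `max_rounds` safety cap is an added assumption that the skill does not specify.

```python
from typing import Callable, List, Tuple

def optimize(text: str,
             evaluate: Callable[[str], float],
             revise: Callable[[str], str],
             user_satisfied: Callable[[str, float], bool],
             threshold: float = 8.5,
             max_rounds: int = 10) -> Tuple[str, List[Tuple[int, float]]]:
    """Iterate until the composite reaches the threshold or the user stops."""
    history: List[Tuple[int, float]] = []
    version = 1
    while True:
        composite = evaluate(text)
        history.append((version, composite))
        if composite >= threshold or user_satisfied(text, composite):
            return text, history
        if version >= max_rounds:  # assumed cap, not part of the skill
            return text, history
        text = revise(text)
        version += 1

# Toy stand-ins: each revision appends "!" and raises the score by one point.
final_text, history = optimize(
    "x",
    evaluate=lambda t: 6.0 + t.count("!"),
    revise=lambda t: t + "!",
    user_satisfied=lambda t, score: False,
)
print(final_text, history)
```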

## Optimization Principles

1. **One thing at a time** — never rewrite the entire prompt in one iteration.
   Target the weakest dimension with surgical changes.
2. **Never break what works** — if a dimension scored 8+, do not touch the
   text responsible for that score unless absolutely necessary.
3. **Simplicity over cleverness** — if two rewrites achieve the same score
   gain, pick the shorter one.
4. **Evidence over intuition** — justify every score with a specific quote
   or absence from the prompt text.
5. **Respect user intent** — the optimization must preserve the user's
   original purpose. If unclear, ask before changing.

## Quick Examples

For complete optimization walkthroughs (from low-score to high-score), see
[examples.md](examples.md).
