mixedbread-skills

Agent skills for search, RAG, and document parsing with Mixedbread.

# Mixedbread Parsing

Parse documents, extract structured content, and run OCR using the Parsing API. Supports PDFs, Word documents, PowerPoint presentations, and images.

Docs: https://www.mixedbread.com/docs/parsing/overview.md
Agent-readable docs: https://www.mixedbread.com/docs/llms.txt
Latest docs search: https://www.mixedbread.com/question?q=parsing&section=docs

## Setup

```bash
pip install mixedbread          # Python
npm install @mixedbread/sdk     # TypeScript
```

```bash
export MXBAI_API_KEY=your_api_key
```

## Quick Start

**Python:**
```python
from mixedbread import Mixedbread

mxbai = Mixedbread()

# Upload and parse a document (waits for completion)
job = mxbai.parsing.jobs.upload_and_poll(
    file=open("report.pdf", "rb"),
    return_format="markdown",
)

for chunk in job.result.chunks:
    print(chunk.content)
```

**TypeScript:**
```typescript
import Mixedbread from '@mixedbread/sdk';
import fs from 'fs';

const mxbai = new Mixedbread();

const job = await mxbai.parsing.jobs.uploadAndPoll(
    fs.createReadStream('report.pdf'),
    { return_format: 'markdown' },
);

for (const chunk of job.result.chunks) {
    console.log(chunk.content);
}
```

## Decision Tree

- **Which convenience method?**
  - File on disk → `upload_and_poll()` (uploads + creates job + polls)
  - File already uploaded via Files API → `create_and_poll()` (creates job + polls)
  - Need async control → `upload()` or `create()` then `poll()` separately
- **Which parsing mode?**
  - Born-digital PDF (selectable text) → `fast` mode. Fastest, lowest cost. Extracts text, structure, and layout.
  - Scanned document, image, or complex layout → `high_quality` mode. Uses OCR. Extracts text with confidence scores, handles rotated/skewed pages, multi-column layouts.
- **Need specific elements only?** → Set `element_types` to reduce processing time
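
The mode branch of this decision tree can be written as a small function. This is a sketch, not an SDK feature. `choose_parsing_mode` and its `has_selectable_text` and `complex_layout` flags are hypothetical, and the caller must detect whether a PDF has a text layer. The returned string is the value you would pass as `mode=`.

```python
from pathlib import Path

# Image extensions from the "Supported File Types" list; images have no text layer.
IMAGE_EXTENSIONS = {".jpeg", ".png", ".webp", ".avif"}


def choose_parsing_mode(path: str, has_selectable_text: bool, complex_layout: bool = False) -> str:
    """Return "fast" or "high_quality" following the decision tree (hypothetical helper)."""
    if Path(path).suffix.lower() in IMAGE_EXTENSIONS:
        # Images always need OCR.
        return "high_quality"
    if complex_layout or not has_selectable_text:
        # Scanned pages or complex layouts: use OCR-based parsing.
        return "high_quality"
    # Born-digital document with selectable text: OCR would add cost without benefit.
    return "fast"
```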

## Supported File Types

PDF (`.pdf`), Word (`.doc`, `.docx`, `.dotx`, `.docm`, `.dotm`, `.odt`, `.rtf`), Slides (`.ppt`, `.pptx`, `.ppsx`, `.pptm`, `.potm`, `.ppsm`, `.odp`), Images (`.jpeg`, `.png`, `.webp`, `.avif`).

Element types: `text`, `title`, `section-header`, `header`, `footer`, `page-number`, `list-item`, `figure`, `picture`, `table`, `form`, `footnote`, `caption`, `formula`.
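
You can run a pre-flight extension check before any upload, following the rule to verify the file format before parsing. This is a minimal sketch. The extension set is copied from the list above, and matching uses the suffix only, so a renamed or corrupt file will still pass. The helper names are hypothetical.

```python
from pathlib import Path

# Extensions copied from the supported-file-types list above.
SUPPORTED_EXTENSIONS = {
    ".pdf",
    ".doc", ".docx", ".dotx", ".docm", ".dotm", ".odt", ".rtf",
    ".ppt", ".pptx", ".ppsx", ".pptm", ".potm", ".ppsm", ".odp",
    ".jpeg", ".png", ".webp", ".avif",
}


def is_supported(path: str) -> bool:
    """Return True if the file's extension is in the supported list (case-insensitive)."""
    return Path(path).suffix.lower() in SUPPORTED_EXTENSIONS


def partition_supported(paths: list[str]) -> tuple[list[str], list[str]]:
    """Split paths into (supported, unsupported) so unsupported files can be converted first."""
    supported = [p for p in paths if is_supported(p)]
    unsupported = [p for p in paths if not is_supported(p)]
    return supported, unsupported
```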

## Workflows

### Extract Tables from Documents

Filter for table elements to pull structured data from reports.

**Python:**
```python
job = mxbai.parsing.jobs.upload_and_poll(
    file=open("financial-report.pdf", "rb"),
    element_types=["table"],
    return_format="html",
    mode="high_quality",
)
for chunk in job.result.chunks:
    for element in chunk.elements:
        if element.type == "table":
            print(f"Page {element.page}, confidence {element.confidence:.2f}")
            print(element.content)
```

**TypeScript:**
```typescript
const job = await mxbai.parsing.jobs.uploadAndPoll(
    fs.createReadStream('financial-report.pdf'),
    { element_types: ['table'], return_format: 'html', mode: 'high_quality' },
);
for (const chunk of job.result.chunks) {
    for (const element of chunk.elements) {
        if (element.type === 'table') {
            console.log(`Page ${element.page}, confidence ${element.confidence.toFixed(2)}`);
            console.log(element.content);
        }
    }
}
```

### Batch Parse Multiple Files

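Start one parsing job per file without waiting for each job to finish, then poll all jobs. The Python loop uploads files one after another. The TypeScript version uploads them concurrently with `Promise.all`: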

**Python:**
```python
import os

jobs = []
for filename in os.listdir("./documents"):
    if filename.lower().endswith(".pdf"):
        # Close each file handle once its upload request has been sent
        with open(os.path.join("./documents", filename), "rb") as f:
            job = mxbai.parsing.jobs.upload(
                file=f,
                return_format="markdown",
            )
        jobs.append(job)

# Poll all jobs
for job in jobs:
    completed = mxbai.parsing.jobs.poll(job_id=job.id)
    print(f"{completed.filename}: {len(completed.result.chunks)} chunks")
```

**TypeScript:**
```typescript
import { readdirSync, createReadStream } from 'fs';
import path from 'path';

const files = readdirSync('./documents').filter(f => f.endsWith('.pdf'));
const jobs = await Promise.all(
    files.map(f => mxbai.parsing.jobs.upload(
        createReadStream(path.join('./documents', f)),
        { return_format: 'markdown' },
    )),
);

// Poll all jobs
for (const job of jobs) {
    const completed = await mxbai.parsing.jobs.poll(job.id);
    console.log(`${completed.filename}: ${completed.result.chunks.length} chunks`);
}
```
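
The Python loop above sends uploads one at a time. A thread pool is one way to get concurrent uploads in Python as well. This is a minimal sketch that assumes the `Mixedbread` client can be shared safely across threads; check the SDK documentation before relying on that. `start_all` is a hypothetical helper that accepts whatever function submits a single job.

```python
from concurrent.futures import ThreadPoolExecutor
from typing import Callable, TypeVar

T = TypeVar("T")
R = TypeVar("R")


def start_all(items: list[T], start_job: Callable[[T], R], max_workers: int = 8) -> list[R]:
    """Call `start_job` on every item concurrently and return the results in input order.

    `start_job` should submit one parsing job and return without waiting for parsing.
    """
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        # pool.map keeps input order; a worker's exception is re-raised when its result is reached.
        return list(pool.map(start_job, items))
```

For example, `start_job` could open one path with `with open(path, "rb") as f:` and return `mxbai.parsing.jobs.upload(file=f, return_format="markdown")`. You would then poll the returned jobs as shown above. Keep `max_workers` modest to reduce the chance of hitting rate limits.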

## Rules

### CRITICAL
- **Don't double-parse.** Store uploads auto-parse documents. Files uploaded with `parsing_strategy: "high_quality"` automatically get OCR text (images), summaries (images), and transcriptions (audio & video) extracted. These are available as fields on search result chunks. There is no benefit to also running the Parsing API on the same file. Use the Parsing API only for standalone document extraction outside of stores.
- **Use `upload_and_poll()` / `create_and_poll()` instead of manual polling loops.** These methods handle backoff automatically. Manual `while` loops with `retrieve()` are fragile and waste API calls.

### HIGH
- **Specify `element_types` when you only need certain elements.** Requesting all types increases processing time and response size. If you only need tables, pass `element_types=["table"]`.
- **Use `fast` mode for born-digital PDFs.** The `high_quality` mode adds OCR overhead that provides no benefit when text is already selectable.
- **Check `confidence` scores on OCR output.** Low-confidence elements (< 0.5) may contain errors. Filter or flag them.
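
Here is a small sketch of the confidence rule. It assumes each element exposes a `confidence` attribute, as in the table-extraction example, and uses the 0.5 cutoff above as the default. `split_by_confidence` is a hypothetical helper, not an SDK function.

```python
from typing import Any, Iterable


def split_by_confidence(elements: Iterable[Any], threshold: float = 0.5) -> tuple[list[Any], list[Any]]:
    """Return (trusted, flagged): elements at or above `threshold`, and the rest.

    Elements without a confidence value (None or missing) are flagged,
    because their quality cannot be judged.
    """
    trusted, flagged = [], []
    for element in elements:
        confidence = getattr(element, "confidence", None)
        if confidence is not None and confidence >= threshold:
            trusted.append(element)
        else:
            flagged.append(element)
    return trusted, flagged
```

To use it, pass every element of a job: `split_by_confidence(el for chunk in job.result.chunks for el in chunk.elements)`.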

### MEDIUM
- **Check `job.error` before retrying failed jobs.** Common causes: unsupported file type, corrupt file, file too large. Blindly retrying wastes quota.
- **Use `content_to_embed` for embedding pipelines.** Each chunk provides both `content` (full text) and `content_to_embed` (optimized for embedding). Use the latter when feeding into vector stores outside Mixedbread.
- **Verify file format before parsing.** Only PDF, Word, PowerPoint, and images are supported. Convert other formats first.
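
For an external vector store, you typically embed one text and store another. This sketch embeds `content_to_embed` and keeps the full `content` as the stored payload. Falling back to `content` when `content_to_embed` is empty is an assumption of this sketch, not documented behavior, and `embedding_pairs` is a hypothetical helper.

```python
from typing import Any, Iterable


def embedding_pairs(chunks: Iterable[Any]) -> list[tuple[str, str]]:
    """Return (text_to_embed, text_to_store) pairs, one per parsed chunk."""
    pairs = []
    for chunk in chunks:
        full_text = chunk.content
        # Assumed fallback: embed the full text if no optimized text is present.
        to_embed = getattr(chunk, "content_to_embed", None) or full_text
        pairs.append((to_embed, full_text))
    return pairs
```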

## Troubleshooting

| Symptom | Cause | Fix |
|---------|-------|-----|
| Job stuck in `pending` | Queue is busy | Use `poll()` with a longer `poll_timeout_ms`. Check job status with `retrieve()`. |
| Job status `failed` | Unsupported file type, corrupt file, or file too large | Check `job.error` for details. Verify file format is supported. |
| Empty chunks in result | No extractable content: blank pages, or scanned pages parsed in `fast` mode | Verify the file has content. Use `high_quality` mode for scanned documents. |
| Low confidence scores | Scanned or low-resolution source | Use `high_quality` mode for better OCR accuracy. |
| Missing tables or figures | Element types not requested | Set `element_types` to include `table` and `figure` explicitly. |
| `upload_and_poll()` timeout | Very large document or slow processing | Increase `poll_timeout_ms`, or use `upload()` + `poll()` separately for more control. |

# Mixedbread Search

Create and search managed knowledge bases using the Stores API. Stores are multimodal search indexes that handle text, images, tables, audio, and video across 100+ languages.

Docs: https://www.mixedbread.com/docs/stores/overview.md
Agent-readable docs: https://www.mixedbread.com/docs/llms.txt
Latest docs search: https://www.mixedbread.com/question?q=stores&section=docs

## Setup

```bash
pip install mixedbread          # Python
npm install @mixedbread/sdk     # TypeScript
```

```bash
export MXBAI_API_KEY=your_api_key
```

## Quick Start

**Python:**
```python
import os
from mixedbread import Mixedbread

mxbai = Mixedbread(api_key=os.environ["MXBAI_API_KEY"])

store = mxbai.stores.create(name="my-docs", description="Product documentation")

mxbai.stores.files.upload(
    store_identifier=store.id,
    file=open("guide.pdf", "rb"),
    metadata={"category": "guides", "version": "2.0"},
)

results = mxbai.stores.search(
    query="How does authentication work?",
    store_identifiers=["my-docs"],
    top_k=5,
)
for chunk in results.data:
    print(f"{chunk.score:.3f} | {chunk.filename}: {chunk.text[:100]}")
```

**TypeScript:**
```typescript
import { Mixedbread } from '@mixedbread/sdk';
import fs from 'fs';

const mxbai = new Mixedbread({
    apiKey: process.env.MXBAI_API_KEY!,
});

const store = await mxbai.stores.create({
    name: 'my-docs',
    description: 'Product documentation',
});

await mxbai.stores.files.upload({
    storeIdentifier: store.id,
    file: fs.createReadStream('guide.pdf'),
    body: { metadata: { category: 'guides', version: '2.0' } },
});

const results = await mxbai.stores.search({
    query: 'How does authentication work?',
    store_identifiers: ['my-docs'],
    top_k: 5,
});
```

## Decision Tree

- **What kind of retrieval do you need?**
  - Simple keyword/semantic lookup → Standard `search()` with `top_k`
  - Natural-language answer with citations → `question_answering()` with `cite` enabled
  - Complex multi-hop question → `search()` or `question_answering()` with `agentic` set in `search_options`
  - Combine internal docs with live web → Add `"mixedbread/web"` to `store_identifiers`
- **Do you need metadata filtering?**
  - Don't know what metadata exists → Call `metadata_facets()` first
  - Know the fields → Build `filters` with `all`/`any`/`none` combinators
- **Do you need higher relevance?**
  - Yes → Set `"rerank": true` in `search_options`, or use `{"rerank": {"model": "mixedbread-ai/mxbai-rerank-large-v2"}}` to choose a model
- **Do you need OCR, summaries, or transcriptions from files?**
  - Yes → Upload files with `config: {"parsing_strategy": "high_quality"}`. Stores auto-extract OCR text, summaries, and transcriptions — no separate parsing needed.
  - No / text-only documents → Default `parsing_strategy` (`"fast"`) is sufficient.
- **Is the store temporary (e.g., PR review)?**
  - Yes → Set `expires_after` with a day limit at creation
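
The retrieval choices above mostly come down to what goes into `search_options`. This sketch assembles that dict from a few flags. `build_search_options` is hypothetical. It emits only the option keys used in this document (`rerank`, `return_metadata`, `agentic`, `max_rounds`, `queries_per_round`) and omits `rerank` when it is off, which leaves the server default in place.

```python
from typing import Any, Optional


def build_search_options(
    rerank: bool = True,
    rerank_model: Optional[str] = None,
    return_metadata: bool = False,
    agentic: bool = False,
    max_rounds: Optional[int] = None,
    queries_per_round: Optional[int] = None,
) -> dict[str, Any]:
    """Assemble a `search_options` dict for `search()` or `question_answering()`."""
    options: dict[str, Any] = {}
    if rerank_model:
        # Choosing a model implies reranking is on.
        options["rerank"] = {"model": rerank_model}
    elif rerank:
        options["rerank"] = True
    if return_metadata:
        options["return_metadata"] = True
    if agentic:
        agentic_config = {
            key: value
            for key, value in (("max_rounds", max_rounds), ("queries_per_round", queries_per_round))
            if value is not None
        }
        # No explicit settings means "use the defaults", expressed as True.
        options["agentic"] = agentic_config or True
    return options
```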

## Workflows

### Build a Searchable Knowledge Base

Create a store, upload documents, and search. Most of the time you do not need to poll for finished files. Only gate on processing when the workflow depends on complete batch coverage, such as benchmarks or recall evaluation.

**Python:**
```python
store = mxbai.stores.create(
    name="product-docs",
    description="Product documentation",
    config={"contextualization": {"with_metadata": ["title", "category"]}},
)

mxbai.stores.files.upload(
    store_identifier=store.id,
    file=open("guide.pdf", "rb"),
    metadata={"title": "Setup Guide", "category": "guides"},
)
mxbai.stores.files.upload(
    store_identifier=store.id,
    file=open("faq.md", "rb"),
    metadata={"title": "FAQ", "category": "support"},
)

results = mxbai.stores.search(
    query="How do I reset my password?",
    store_identifiers=["product-docs"],
    top_k=5,
    search_options={"rerank": True, "return_metadata": True},
)
for chunk in results.data:
    print(f"{chunk.score:.3f} | {chunk.filename}: {chunk.text[:100]}")

# Optional: poll store.file_counts if you need deterministic full-batch coverage (benchmarks, migrations).
```

**TypeScript:**
```typescript
const store = await mxbai.stores.create({
    name: 'product-docs',
    description: 'Product documentation',
    config: { contextualization: { with_metadata: ['title', 'category'] } },
});

await mxbai.stores.files.upload({
    storeIdentifier: store.id,
    file: fs.createReadStream('guide.pdf'),
    body: { metadata: { title: 'Setup Guide', category: 'guides' } },
});
await mxbai.stores.files.upload({
    storeIdentifier: store.id,
    file: fs.createReadStream('faq.md'),
    body: { metadata: { title: 'FAQ', category: 'support' } },
});

const results = await mxbai.stores.search({
    query: 'How do I reset my password?',
    store_identifiers: ['product-docs'],
    top_k: 5,
    search_options: { rerank: true, return_metadata: true },
});

// Optional: poll store.file_counts if you need deterministic full-batch coverage (benchmarks, migrations).
```

### Filter-Driven Search

Discover available metadata, then build targeted filters.

**Python:**
```python
facets = mxbai.stores.metadata_facets(store_identifiers=["product-docs"])
for key, values in facets.facets.items():
    print(f"{key}: {values}")

results = mxbai.stores.search(
    query="deployment guide",
    store_identifiers=["product-docs"],
    top_k=10,
    filters={
        "all": [
            {"key": "category", "operator": "eq", "value": "guides"},
            {"key": "status", "operator": "not_eq", "value": "archived"},
        ]
    },
    search_options={"rerank": True, "return_metadata": True},
)
```

**TypeScript:**
```typescript
const facets = await mxbai.stores.metadataFacets({
    store_identifiers: ['product-docs'],
});
for (const [key, values] of Object.entries(facets.facets ?? {})) {
    console.log(`${key}: ${JSON.stringify(values)}`);
}

const results = await mxbai.stores.search({
    query: 'deployment guide',
    store_identifiers: ['product-docs'],
    top_k: 10,
    filters: {
        all: [
            { key: 'category', operator: 'eq', value: 'guides' },
            { key: 'status', operator: 'not_eq', value: 'archived' },
        ],
    },
    search_options: { rerank: true, return_metadata: true },
});
```

Filter operators: `eq`, `not_eq`, `gt`, `gte`, `lt`, `lte`, `in`, `not_in`, `like`, `starts_with`, `not_like`, `regex`. Combine with `all` (AND), `any` (OR), `none` (NOT).
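
The `filters` dict in the example above can also be built with small helpers that reject unknown operators before a request is sent. This is a sketch. The function names are hypothetical, and a mistyped key still needs to be caught with `metadata_facets()`. Whether combinators can be nested inside each other is not covered here.

```python
from typing import Any

# Operators listed in this section.
OPERATORS = {
    "eq", "not_eq", "gt", "gte", "lt", "lte",
    "in", "not_in", "like", "starts_with", "not_like", "regex",
}


def cond(key: str, operator: str, value: Any) -> dict[str, Any]:
    """Build one filter condition, raising ValueError for an unlisted operator."""
    if operator not in OPERATORS:
        raise ValueError(f"unknown filter operator: {operator!r}")
    return {"key": key, "operator": operator, "value": value}


def all_of(*conditions: dict) -> dict:
    """AND: every condition must match."""
    return {"all": list(conditions)}


def any_of(*conditions: dict) -> dict:
    """OR: at least one condition must match."""
    return {"any": list(conditions)}


def none_of(*conditions: dict) -> dict:
    """NOT: no condition may match."""
    return {"none": list(conditions)}
```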

### Web-Augmented Search

Include `"mixedbread/web"` in `store_identifiers` to combine store search with live web results. This is a reserved store identifier, so no setup is required. To search only the web, pass `["mixedbread/web"]` as the sole identifier.

**Python:**
```python
results = mxbai.stores.search(
    query="latest best practices",
    store_identifiers=["my-docs", "mixedbread/web"],
)
```

**TypeScript:**
```typescript
const results = await mxbai.stores.search({
    query: 'latest best practices',
    store_identifiers: ['my-docs', 'mixedbread/web'],
});
```

### Question Answering

Get a generated answer with cited sources. The answer may contain `<cite i="n"/>` tags referencing the sources list.

**Python:**
```python
result = mxbai.stores.question_answering(
    query="What are the rate limits?",
    store_identifiers=["my-docs"],
    top_k=10,
    qa_options={"cite": True},
    search_options={"rerank": True},
)
print(result.answer)
for source in result.sources:
    print(f"  {source.filename} (score: {source.score:.3f})")
```

**TypeScript:**
```typescript
const result = await mxbai.stores.questionAnswering({
    query: 'What are the rate limits?',
    store_identifiers: ['my-docs'],
    top_k: 10,
    qa_options: { cite: true },
    search_options: { rerank: true },
});
console.log(result.answer);
for (const source of result.sources) {
    console.log(`  ${source.filename} (score: ${source.score.toFixed(3)})`);
}
```
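
This sketch post-processes the cite tags in `result.answer`. The tag pattern follows the `<cite i="n"/>` form described above. This section does not say whether `n` counts from 0 or 1, so the helper takes a `base` argument, which you should check against a real response. All function names are hypothetical.

```python
import re
from typing import Any, Sequence

# Matches self-closing citation tags such as <cite i="2"/> or <cite i="2" />.
CITE_TAG = re.compile(r'<cite\s+i="(\d+)"\s*/>')


def cited_indices(answer: str) -> list[int]:
    """Return citation indices in order of first appearance, without duplicates."""
    seen: list[int] = []
    for match in CITE_TAG.finditer(answer):
        index = int(match.group(1))
        if index not in seen:
            seen.append(index)
    return seen


def resolve_citations(answer: str, sources: Sequence[Any], base: int = 0) -> list[Any]:
    """Map citation indices to entries in `sources`, skipping out-of-range indices.

    `base` is the index of the first source (assumed convention, verify it).
    """
    resolved = []
    for index in cited_indices(answer):
        position = index - base
        if 0 <= position < len(sources):
            resolved.append(sources[position])
    return resolved


def strip_citations(answer: str) -> str:
    """Remove citation tags, e.g. before showing the answer as plain text."""
    return CITE_TAG.sub("", answer)
```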

### Question Answering with Agentic Fallback

When QA returns no sources, retry with agentic search for deeper retrieval. Always re-call `question_answering()` — do not fall back to raw `search()`, which loses the generated answer.

**Python:**
```python
result = mxbai.stores.question_answering(
    query="Compare the pricing tiers and their feature differences",
    store_identifiers=["my-docs"],
    top_k=10,
    qa_options={"cite": True},
    search_options={"rerank": True},
)

if not result.sources:
    result = mxbai.stores.question_answering(
        query="Compare the pricing tiers and their feature differences",
        store_identifiers=["my-docs"],
        top_k=10,
        qa_options={"cite": True},
        search_options={
            "rerank": True,
            "agentic": {"max_rounds": 3},
        },
    )

print(result.answer)
for source in result.sources:
    print(f"  {source.filename} (score: {source.score:.3f})")
```
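
The two calls above differ only in `search_options`, so you can factor the retry into a function that takes the QA call as a parameter. This is a sketch. `answer_with_agentic_fallback` is hypothetical, and injecting `ask` keeps it independent of the SDK, which also makes it easy to test.

```python
from typing import Any, Callable

# Takes a search_options dict and returns a QA result with a `sources` attribute.
AskFn = Callable[[dict], Any]


def answer_with_agentic_fallback(ask: AskFn, max_rounds: int = 3) -> Any:
    """Run QA with reranking; if no sources come back, retry once with agentic search."""
    result = ask({"rerank": True})
    if not result.sources:
        # Retry QA, not raw search, so the generated answer is kept.
        result = ask({"rerank": True, "agentic": {"max_rounds": max_rounds}})
    return result
```

For example, `ask` could be `lambda opts: mxbai.stores.question_answering(query=q, store_identifiers=["my-docs"], top_k=10, qa_options={"cite": True}, search_options=opts)`.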

### Agentic Search

For complex questions requiring multi-step retrieval. The system decomposes your query into sub-queries and runs multiple rounds. Works in both `search()` and `question_answering()`.

**Python:**
```python
results = mxbai.stores.search(
    query="Compare the pricing tiers and their feature differences",
    store_identifiers=["product-docs"],
    search_options={
        "agentic": {
            "max_rounds": 3,
            "queries_per_round": 2,
        }
    },
)
```

**TypeScript:**
```typescript
const results = await mxbai.stores.search({
    query: 'Compare the pricing tiers and their feature differences',
    store_identifiers: ['product-docs'],
    search_options: {
        agentic: {
            max_rounds: 3,
            queries_per_round: 2,
        },
    },
});
```

Set `agentic` to `true` for default settings, or pass an object to control `max_rounds` and `queries_per_round`.

## Response Shapes

**Search results** (`search()` returns):
```python
response.data  # list of chunks
chunk.text       # str — the matched text
chunk.score      # float — relevance score (0–1)
chunk.filename   # str — source file name
chunk.file_id    # str — source file ID
chunk.store_id   # str — store the chunk belongs to
chunk.metadata   # dict — attached metadata (when return_metadata is enabled)
chunk.type       # str — chunk type (e.g. "text", "image_url")
chunk.image_url  # dict | None — image payload for image chunks
chunk.ocr_text   # str | None — OCR text for image-heavy chunks
chunk.summary    # str | None — auto-generated summary for image chunks (high_quality mode)
chunk.transcription # str | None — transcription for audio/video chunks (high_quality mode)
```
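
Image and audio/video chunks may carry their content in `ocr_text`, `summary`, or `transcription` rather than `text`. A prompt builder should therefore check all four fields. This is a sketch. The field names come from the listing above, but the preference order and the `[filename]` layout are choices made for this example, and both helpers are hypothetical.

```python
from typing import Any, Optional


def chunk_context(chunk: Any) -> Optional[str]:
    """Return the first non-empty of text, ocr_text, summary, transcription (assumed order)."""
    for field in ("text", "ocr_text", "summary", "transcription"):
        value = getattr(chunk, field, None)
        if value:
            return value
    return None


def build_context(chunks: list[Any]) -> str:
    """Join chunk texts labelled with their filenames, skipping chunks with no usable text."""
    parts = []
    for chunk in chunks:
        text = chunk_context(chunk)
        if text:
            parts.append(f"[{chunk.filename}]\n{text}")
    return "\n\n".join(parts)
```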

**QA results** (`question_answering()` returns):
```python
result.answer    # str — generated answer, may contain <cite i="n"/> tags
result.sources   # list of source objects
source.filename  # str
source.score     # float
source.file_id   # str
source.text      # str — the source chunk text
source.image_url # dict | None — image payload with url/format for image chunks
```

## Store Management

```python
stores = mxbai.stores.list(limit=20)
for store in stores.data:
    print(store.name)

store = mxbai.stores.retrieve(store_identifier="my-docs")
print(store.file_counts)  # e.g. {"completed": 5, "in_progress": 2, "failed": 0}

files = mxbai.stores.files.list(store_identifier="my-docs", limit=20)
for file in files.data:
    print(file.filename, file.status)

# Delete last, after you are done reading the store's files
mxbai.stores.delete(store_identifier="my-docs")
```

## Rules

### CRITICAL
- **Store names must be lowercase letters, numbers, hyphens, and periods only.** Invalid names cause creation to fail. No spaces, underscores, or uppercase.
- **For field-level contextualization, use the documented `{"with_metadata": [...]}` form.** The other documented modes are `true` (all metadata) and `false` (none). Dot notation is supported for nested fields.
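
A local check can catch invalid store names before `stores.create()` fails. This sketch tests only the character rule stated above, not any length or other server-side limits. The helper names are hypothetical. The same rule applies to store names passed to the CLI.

```python
import re

# Only lowercase ASCII letters, digits, hyphens, and periods.
STORE_NAME = re.compile(r"[a-z0-9.-]+")


def is_valid_store_name(name: str) -> bool:
    """Check the documented character rule; other server-side limits are not checked."""
    return STORE_NAME.fullmatch(name) is not None


def suggest_store_name(raw: str) -> str:
    """Lowercase `raw` and replace every disallowed character with a hyphen."""
    return re.sub(r"[^a-z0-9.-]", "-", raw.lower())
```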

### HIGH
- **Do not block on full ingestion unless completeness matters.** Stores process files asynchronously, and completed files become searchable as they finish. Most of the time, especially for interactive flows, upload and search immediately without polling. Poll file status or `file_counts` only when the workflow depends on complete batch coverage, such as benchmarks, migrations, or sync verification.
- **Use `metadata_facets()` before building filters.** Don't guess metadata keys — discover them. Typos in filter keys silently return no results.
- **Enable `rerank` for production search.** Reranking significantly improves relevance. Only skip it for latency-sensitive prototyping.
- **Use `parsing_strategy: "high_quality"` to enable automatic content extraction.** When set in per-file config at upload time, high quality mode extracts OCR text and summaries for images, and transcriptions for audio and video. These fields are directly usable as LLM context. The default `"fast"` strategy indexes content without these additional extractions.
- **Use standard search for simple lookups.** Agentic search adds latency from multiple retrieval rounds. Only use it for complex, multi-hop questions.

### MEDIUM
- **Set `expires_after` for temporary stores.** PR review stores, demo stores, and test stores should auto-expire to avoid accumulating unused indexes.
- **One store per knowledge domain, not per query.** Stores are persistent indexes meant to be reused. Create once, search many times.
- **Use chunk scores to filter low-relevance noise.** If you need a minimum relevance cutoff, post-filter on `chunk.score` (for example `>= 0.3`) after retrieval.
- **Start with default `agentic` settings.** Only increase `max_rounds` if results are insufficient.
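
Here is a sketch of the score post-filter, using the example cutoff of 0.3 as the default. The optional `min_keep` is a design choice for this example, not an API feature. It keeps the top results even when none clear the threshold, which guards against the "score cutoff too high" case in the table below. `filter_by_score` is hypothetical.

```python
from typing import Any, Iterable


def filter_by_score(chunks: Iterable[Any], min_score: float = 0.3, min_keep: int = 0) -> list[Any]:
    """Return chunks with `score >= min_score`, highest first.

    If fewer than `min_keep` chunks pass, return the `min_keep` best-scoring ones instead.
    """
    ranked = sorted(chunks, key=lambda chunk: chunk.score, reverse=True)
    kept = [chunk for chunk in ranked if chunk.score >= min_score]
    if len(kept) < min_keep:
        kept = ranked[:min_keep]
    return kept
```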

## Troubleshooting

| Symptom | Cause | Fix |
|---------|-------|-----|
| No results returned | Newly uploaded files are still processing, or the store name/query is wrong | Retry after processing completes for at least one file. For completeness-sensitive runs, verify the expected files are `completed` before evaluating results. |
| No results returned | Score cutoff too high | Lower or remove your post-filter threshold. |
| No results returned | Wrong `store_identifiers` | Verify the store name or ID matches exactly. |
| Metadata filters return nothing | Wrong key name or value | Use `metadata_facets()` to discover actual keys and values. |
| Slow agentic search | Too many rounds or queries | Reduce `max_rounds` or `queries_per_round`. Use standard search if the query is simple. |
| API key error | Invalid or missing key | Verify `MXBAI_API_KEY` is set. Get a key at https://platform.mixedbread.com/platform?next=api-keys |

# mxbai CLI

The `mxbai` CLI manages stores, uploads files, performs semantic search, and syncs directories with Mixedbread from the terminal.

Docs: https://www.mixedbread.com/cli.md
Agent-readable docs: https://www.mixedbread.com/docs/llms.txt
Latest docs search: https://www.mixedbread.com/question?q=cli&section=cli

## Installation

```bash
npm install -g @mixedbread/cli    # global
npm install --save-dev @mixedbread/cli  # project-local (use npx mxbai)
```

Requires Node.js >= 20.0. Verify with `mxbai --version`.

## Authentication

Resolved in priority order:

1. **Flag:** `--api-key mxb_xxxxx` or `--saved-key <name>`
2. **Environment variable:** `export MXBAI_API_KEY=mxb_xxxxx`
3. **Config file:** `mxbai config set api_key mxb_xxxxx`

Get your API key at https://platform.mixedbread.com/platform?next=api-keys

## Quick Start

```bash
# Create a store and upload docs
mxbai store create "my-docs" --description "Product documentation"
mxbai store upload "my-docs" "docs/**/*.md"

# Search
mxbai store search "my-docs" "How does authentication work?"

# Sync changed files (hash-based detection by default)
mxbai store sync "my-docs" "docs/**"
```

## Decision Tree

- **Upload vs Sync?**
  - One-time or manual upload → `mxbai store upload`
  - Ongoing updates (especially CI/CD) → `mxbai store sync`
- **Which change detection for sync?**
  - In a git repo with known base commit → `--from-git HEAD~1` (fastest)
  - Outside git or need exact comparison → hash-based detection (default, compares content hashes)
- **CLI vs SDK?**
  - Shell scripts, CI/CD, one-off tasks → CLI
  - Application code, custom logic, programmatic access → Python/TypeScript SDK

## Workflows

### CI/CD Documentation Sync

Sync documentation to a store on every push using the default hash-based change detection.

**GitHub Actions:**
```yaml
name: Sync Documentation
on:
  push:
    branches: [main]
    paths:
      - 'docs/**'

jobs:
  sync:
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v4

      - name: Install mxbai CLI
        run: npm install -g @mixedbread/cli

      - name: Sync docs to store
        env:
          MXBAI_API_KEY: ${{ secrets.MXBAI_API_KEY }}
        run: |
          mxbai store sync my-docs "docs/**/*.md" \
            --strategy high_quality \
            --yes
```

For faster change detection in git repos, add `--from-git HEAD~1` (requires `fetch-depth: 2`) or `--from-git origin/main` (requires `fetch-depth: 0`).

**Key points:**
- Always pass `--yes` — CI environments are non-interactive and commands hang without it
- Use `--from-git` for faster change detection in git repos
- Store the API key as a secret via `MXBAI_API_KEY`
- Use `--dry-run` in a separate step to preview changes before applying
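
Sometimes the sync has to run from a Python build script rather than a YAML step. In that case you can assemble the same command as an argument list. This is a sketch. `sync_command` and `run_sync` are hypothetical helpers, and they use only the flags shown in this section. `mxbai` must already be on `PATH`. The glob pattern goes through as a single argument with no shell, matching the quoted pattern in the workflow above.

```python
import subprocess
from typing import Optional


def sync_command(
    store: str,
    pattern: str,
    strategy: Optional[str] = None,
    from_git: Optional[str] = None,
    dry_run: bool = False,
) -> list[str]:
    """Build an `mxbai store sync` argv; `--yes` is always added for non-interactive runs."""
    argv = ["mxbai", "store", "sync", store, pattern]
    if strategy:
        argv += ["--strategy", strategy]
    if from_git:
        argv += ["--from-git", from_git]
    if dry_run:
        argv.append("--dry-run")
    argv.append("--yes")
    return argv


def run_sync(**kwargs) -> None:
    """Run the sync; check=True raises CalledProcessError on a non-zero exit, failing the CI step."""
    subprocess.run(sync_command(**kwargs), check=True)
```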

**Preview before syncing:**
```bash
mxbai store sync "my-docs" "docs/**" --dry-run
```

### Multi-Environment Setup

Manage separate API keys for staging and production.

```bash
# Add keys for different environments
mxbai config keys add mxb_xxxxx production
mxbai config keys add mxb_xxxxx staging

# Set production as default
mxbai config keys set-default production

# Use staging for a specific command
mxbai store search staging-docs "query" --saved-key staging
```

### Upload with Manifest

Use a manifest file for complex uploads with per-file metadata and strategy overrides.

```yaml
# upload-manifest.yaml
version: "1"
defaults:
  strategy: fast
  metadata:
    team: engineering
files:
  - path: docs/getting-started.md
    metadata:
      title: Getting Started Guide
      priority: high
  - path: docs/api-reference.md
    strategy: high_quality
    metadata:
      title: API Reference
  - path: reports/*.pdf
    metadata:
      category: reports
```

```bash
mxbai store upload "my-docs" --manifest upload-manifest.yaml
```

### Store Aliases

Create short aliases for frequently used stores:

```bash
mxbai config set aliases.docs "my-documentation-store"
mxbai config set aliases.prod "str_abc123"

# Use aliases in any command
mxbai store search docs "how to deploy"
mxbai store upload prod "files/**/*.md"
```

## Rules

### CRITICAL
- **Always pass `--yes` in CI/CD.** Without it, sync and delete commands hang waiting for interactive confirmation that never comes. CI environments don't have a TTY.
- **`--contextualization` on upload/sync is deprecated since v2.2.0.** Configure contextualization at store creation with `mxbai store create --contextualization`. The flag on upload/sync shows a warning and is ignored.

### HIGH
- **`--parallel` max is 200.** The CLI validates and rejects values above 200. Default is 100.
- **`--force` sync re-uploads everything.** It bypasses change detection entirely. Use sparingly — typically only for periodic full re-syncs (e.g., weekly cron).
- **Store names: lowercase letters, numbers, hyphens, and periods only.** Invalid names cause creation to fail.

### MEDIUM
- **Use `--dry-run` before first sync.** Preview what would be uploaded, changed, or deleted before committing.
- **Use store aliases for frequently-used stores.** Avoids typos and long store names in commands.
- **Use `--unique` on upload to skip duplicates.** Prevents re-uploading files that already exist (based on content hash).

## Troubleshooting

| Symptom | Cause | Fix |
|---------|-------|-----|
| "Command not found" | Node.js < 20 or not globally installed | Verify Node.js >= 20. Try `npx mxbai` for project-local installs. |
| "No API key" | No key configured | Run `mxbai config keys add <key>` or set `MXBAI_API_KEY` env var. |
| Sync hangs in CI | Missing `--yes` flag | Pass `--yes` for non-interactive mode. |
| Upload timeout for large files | Default multipart settings insufficient | Tune `--multipart-threshold`, `--multipart-part-size`, `--multipart-concurrency`. |
| Store not found | Wrong name or alias | Check aliases with `mxbai config get aliases`. Verify name uses valid characters. |
| Contextualization warning | Deprecated flag on upload/sync | Set contextualization at store creation instead. |
| Sync detects no changes | Hash-based detection with modified metadata only | Use `--force` to re-upload, or `--from-git` to detect changes via git. |
| `--from-git` misses files | `fetch-depth` too shallow in CI | Set `fetch-depth: 0` for full history, or `fetch-depth: 2` minimum for `HEAD~1`. |

Source: https://github.com/mixedbread-ai/skills