This file is a merged representation of the entire codebase, combining all repository files into a single document.
Generated by Repopack on: 2026-01-18T09:46:51.430Z

================================================================
File Summary
================================================================

Purpose:
--------
This file contains a packed representation of the entire repository's contents.
It is designed to be easily consumable by AI systems for analysis, code review,
or other automated processes.

File Format:
------------
The content is organized as follows:
1. This summary section
2. Repository information
3. Repository structure
4. Multiple file entries, each consisting of:
  a. A separator line (================)
  b. The file path (File: path/to/file)
  c. Another separator line
  d. The full contents of the file
  e. A blank line

Usage Guidelines:
-----------------
- This file should be treated as read-only. Any changes should be made to the
  original repository files, not this packed version.
- When processing this file, use the file path to distinguish
  between different files in the repository.
- Be aware that this file may contain sensitive information. Handle it with
  the same level of security as you would the original repository.

Notes:
------
- Some files may have been excluded based on .gitignore rules and Repopack's
  configuration.
- Binary files are not included in this packed representation. Please refer to
  the Repository Structure section for a complete list of file paths, including
  binary files.

Additional Info:
----------------

For more information about Repopack, visit: https://github.com/yamadashy/repopack

================================================================
Repository Structure
================================================================
pages/
  en/
    api/
      benchmark.mdx
      cli.mdx
      corpus.mdx
      custom.mdx
      evaluation.mdx
      generation.mdx
      prompt.mdx
      reranker.mdx
      retriever.mdx
      router.mdx
    demo/
      deepresearch.mdx
      lightresearch.mdx
      llm.mdx
      rag.mdx
    develop_guide/
      case_study.mdx
      code_integration.mdx
      dataset.mdx
      debug.mdx
      parallel.mdx
    getting_started/
      installation.mdx
      introduction.mdx
      quick_start.mdx
      update.mdx
    pipeline/
      light_deepresearch.mdx
      rag.mdx
      search_o1.mdx
      visrag.mdx
    rag_client/
      branch.mdx
      data_and_params.mdx
      loop.mdx
      multi_agents.mdx
      pipeline.mdx
      serial.mdx
    rag_servers/
      benchmark.mdx
      corpus.mdx
      custom.mdx
      evaluation.mdx
      generation.mdx
      overview.mdx
      prompt.mdx
      reranker.mdx
      retriever.mdx
      router.mdx
    ui/
      prepare.mdx
      start.mdx

================================================================
Repository Files
================================================================

================
File: pages/en/api/benchmark.mdx
================
---
title: "Benchmark"
icon: "chart-line"
---

## `get_data`

**Signature**
```python
@app.tool(output="benchmark->q_ls,gt_ls")
def get_data(benchmark: Dict[str, Any]) -> Dict[str, List[Any]]
```

**Function**
- Multi-format Loading: Loads evaluation datasets from local `.jsonl`, `.json`, or `.parquet` files.
- Dynamic Field Mapping: Uses `key_map` to map column names in the raw data (such as `question`, `answer`) to standardized output keys (usually `q_ls` and `gt_ls`).
- Data Preprocessing: Built-in support for random shuffling (`shuffle`) and truncation to a fixed number of items (`limit`).
- Demo Input: In Demos, also receives the user's input and treats it as a single data item (`q_ls`).

**Output Format (JSON)**
```json
{
  "q_ls": ["Question 1", "Question 2"],
  "gt_ls": [["Answer A1", "Answer A2"], ["Answer B"]]
}
```

---

## Configuration

```yaml servers/benchmark/parameter.yaml icon="/images/yaml.svg"
benchmark:
  name: nq
  path: data/sample_nq_10.jsonl
  key_map:
    q_ls: question
    gt_ls: golden_answers
  shuffle: false
  seed: 42
  limit: -1
```

Parameter Description:

| Parameter | Type | Description |
|---|---|---|
| `name` | str | Evaluation set name, used only for logging and identification (Example: `nq`) |
| `path` | str | Data file path, supports `.jsonl`, `.json`, `.parquet` |
| `key_map` | dict | Field mapping table, mapping raw fields to tool output keys |
|  | `q_ls` | str| Raw field name mapped to Question List (e.g., question column in file) |
|  | `gt_ls` | str| Raw field name mapped to Golden Answer List (e.g., golden_answers column in file) |
| `shuffle` | bool | Whether to shuffle sample order (default `false`) |
| `seed` | int | Random seed (effective when `shuffle=true`) |
| `limit` | int | Maximum number of samples to load. Default `-1` loads all; a positive integer N keeps only the first N items |
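
The parameters above can be illustrated with a minimal sketch. It is not the server's implementation: `load_benchmark` and `read_jsonl` are hypothetical helper names, only the `.jsonl` case is shown, and applying `shuffle` before `limit` is an assumption.

```python
import json
import random
from typing import Any, Dict, List


def read_jsonl(path: str) -> List[Dict[str, Any]]:
    """Read one JSON object per non-empty line (hypothetical helper)."""
    with open(path, encoding="utf-8") as f:
        return [json.loads(line) for line in f if line.strip()]


def load_benchmark(
    rows: List[Dict[str, Any]],
    key_map: Dict[str, str],
    shuffle: bool = False,
    seed: int = 42,
    limit: int = -1,
) -> Dict[str, List[Any]]:
    """Map raw columns to output keys, then optionally shuffle and truncate."""
    rows = list(rows)  # copy so the caller's list is not reordered
    if shuffle:
        # Assumption: shuffling happens before truncation.
        random.Random(seed).shuffle(rows)
    if limit > 0:
        rows = rows[:limit]  # keep only the first N items
    return {out_key: [row[raw_key] for row in rows]
            for out_key, raw_key in key_map.items()}


rows = [
    {"question": "Q1", "golden_answers": ["A1", "A2"]},
    {"question": "Q2", "golden_answers": ["B"]},
    {"question": "Q3", "golden_answers": ["C"]},
]
result = load_benchmark(
    rows, {"q_ls": "question", "gt_ls": "golden_answers"}, limit=2
)
print(result)
# {'q_ls': ['Q1', 'Q2'], 'gt_ls': [['A1', 'A2'], ['B']]}
```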

================
File: pages/en/api/cli.mdx
================
---
title: "CLI"
icon: "command"
---

## help — Command Overview

```bash
ultrarag --help
```

```text
usage: ultrarag [-h] {build,run,show} ...

UltraRAG CLI

positional arguments:
  {build,run,show}
    build           Build the configuration
    run             Run the pipeline with the given configuration
    show            Show ui interfaces

options:
  -h, --help        show this help message and exit
```

---

## build — Build Configuration

**Purpose**: Automatically generate a unified parameter configuration file based on `pipeline.yaml`.

```bash
ultrarag build <CONFIG> [--log_level DEBUG|INFO|WARN|ERROR]
```

| Parameter | Description |
|---|---|
| `<CONFIG>` | Pipeline configuration file path, e.g., `examples/pipeline.yaml` |
| `--log_level` | Log level, default `INFO` |

---

## run — Execute Pipeline

**Purpose**: Start Pipeline execution.

```bash
ultrarag run <CONFIG> [--param <PARAM_FILE>] [--is_demo] [--log_level DEBUG|INFO|WARN|ERROR]
```

| Parameter | Description |
|---|---|
| `<CONFIG>` | Path to `pipeline.yaml` to be executed |
| `--param <PARAM_FILE>` | Custom parameter file path, overrides default files under `parameter/` |
| `--is_demo` | Enable demo mode |
| `--log_level` | Log level, default `INFO` |

---

## show ui — Start Web Control Interface

**Purpose**: Start UI interface.

```bash
ultrarag show ui [--host 127.0.0.1] [--port 5050] [--log_level DEBUG|INFO|WARN|ERROR]
```

| Parameter | Description |
|---|---|
| `--host` | Web service binding IP (default `127.0.0.1`) |
| `--port` | Web service port (default `5050`) |
| `--log_level` | Log level, default `INFO` |

Access `http://<host>:<port>` after startup to use the UI.

================
File: pages/en/api/corpus.mdx
================
---
title: "Corpus"
icon: "file"
---


## `build_text_corpus`

**Signature**
```python
@app.tool(output="parse_file_path,text_corpus_save_path->None")
async def build_text_corpus(parse_file_path: str, text_corpus_save_path: str) -> None
```

**Function**
- Supports .txt / .md;
- Supports .docx (reads paragraphs and tables);
- Supports .pdf / .xps / .oxps / .epub / .mobi / .fb2 (pure text extraction via pymupdf).
- Recursively processes in directory mode.

**Output Format (JSONL)**
```json
{"id": "<stem>", "title": "<stem>", "contents": "<full text>"}
```
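
A minimal sketch of the record format above, covering only the `.txt` / `.md` case. The helper names `collect_text_records` and `write_jsonl` are hypothetical, and the `.docx` and pymupdf-based branches are intentionally left out.

```python
import json
import tempfile
from pathlib import Path
from typing import Dict, List

TEXT_SUFFIXES = {".txt", ".md"}  # only the plain-text formats listed above


def collect_text_records(root: Path) -> List[Dict[str, str]]:
    """Return one record per .txt/.md file, walking directories recursively."""
    paths = [root] if root.is_file() else sorted(root.rglob("*"))
    records = []
    for path in paths:
        if path.is_file() and path.suffix.lower() in TEXT_SUFFIXES:
            text = path.read_text(encoding="utf-8")
            # The file stem doubles as both id and title, as in the format above.
            records.append({"id": path.stem, "title": path.stem, "contents": text})
    return records


def write_jsonl(records: List[Dict[str, str]], out_path: Path) -> None:
    """Write one JSON object per line."""
    out_path.parent.mkdir(parents=True, exist_ok=True)
    with out_path.open("w", encoding="utf-8") as f:
        for record in records:
            f.write(json.dumps(record, ensure_ascii=False) + "\n")


# Demo on a throwaway directory tree.
with tempfile.TemporaryDirectory() as tmp:
    root = Path(tmp)
    (root / "docs").mkdir()
    (root / "docs" / "intro.md").write_text("# Intro", encoding="utf-8")
    (root / "notes.txt").write_text("hello", encoding="utf-8")
    out_path = root / "corpora" / "text.jsonl"
    write_jsonl(collect_text_records(root), out_path)
    lines = out_path.read_text(encoding="utf-8").splitlines()
    demo_ids = [json.loads(line)["id"] for line in lines]

print(demo_ids)  # ['intro', 'notes']
```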
---

## `build_image_corpus`

**Signature**
```python
@app.tool(output="parse_file_path,image_corpus_save_path->None")
async def build_image_corpus(parse_file_path: str, image_corpus_save_path: str) -> None
```

**Function**
- **Only supports PDF**: Renders each page as a JPG (RGB) at 144 DPI and verifies that the file is valid.
- Recursively processes in directory mode.

**Output Index (JSONL)**
```json
{"id": 0, "image_id": "paper/page_0.jpg", "image_path": "image/paper/page_0.jpg"}
```
---

## `mineru_parse`

**Signature**
```python
@app.tool(output="parse_file_path,mineru_dir,mineru_extra_params->None")
async def mineru_parse(
    parse_file_path: str, 
    mineru_dir: str, 
    mineru_extra_params: Optional[Dict[str, Any]] = None
) -> None
```

**Function**
- Calls the `mineru` CLI to parse a PDF file or a directory into structured output, written to `mineru_dir`.

---

## `build_mineru_corpus`

**Signature**
```python
@app.tool(output="mineru_dir,parse_file_path,text_corpus_save_path,image_corpus_save_path->None")
async def build_mineru_corpus(
    mineru_dir: str, 
    parse_file_path: str, 
    text_corpus_save_path: str, 
    image_corpus_save_path: str
) -> None
```

**Function**
- Aggregates MinerU parsing artifacts into **Text Corpus JSONL** and **Image Index JSONL**.


**Output Format (JSONL)**
- Text:
```json
{"id": "<stem>", "title": "<stem>", "contents": "<markdown full text>"}
```
- Image:
```json
{"id": 0, "image_id": "paper/page_0.jpg", "image_path": "images/paper/page_0.jpg"}
```
---

## `chunk_documents`

**Signature**
```python
@app.tool(output="raw_chunk_path,chunk_backend_configs,chunk_backend,tokenizer_or_token_counter,chunk_size,chunk_path,use_title->None")
async def chunk_documents(
    raw_chunk_path: str,
    chunk_backend_configs: Dict[str, Any],
    chunk_backend: str = "token",
    tokenizer_or_token_counter: str = "character",
    chunk_size: int = 256,
    chunk_path: Optional[str] = None,
    use_title: bool = True,
) -> None
```

**Function**
- Chunks the input text corpus (JSONL records with `id`, `title`, and `contents` fields) into passages using the selected backend.
- Chunk Backend: Supports `token` / `sentence` / `recursive`.
- Tokenizer: `tokenizer_or_token_counter` can be `word`, `character`, or a `tiktoken` encoding name (e.g., `gpt2`).
- Chunk Size: Controls chunk size via `chunk_size` (overlap defaults to size/4).
- Optionally prepends the document title to each chunk (`use_title`).

**Output Format (JSONL)**
```json
{"id": 0, "doc_id": "paper", "title": "paper", "contents": "Chunked text"}
```
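
To make the size and overlap parameters concrete, here is a minimal character-window sketch. It is not one of the three backends: `chunk_by_characters` and `chunk_records` are hypothetical names, and joining the title to the chunk with a newline is an assumption.

```python
from typing import Any, Dict, List


def chunk_by_characters(text: str, chunk_size: int, chunk_overlap: int) -> List[str]:
    """Slide a fixed-size character window; consecutive windows share `chunk_overlap` chars."""
    if chunk_size <= 0 or not 0 <= chunk_overlap < chunk_size:
        raise ValueError("need chunk_size > 0 and 0 <= chunk_overlap < chunk_size")
    step = chunk_size - chunk_overlap
    chunks = []
    for start in range(0, len(text), step):
        chunks.append(text[start:start + chunk_size])
        if start + chunk_size >= len(text):
            break  # the window already reached the end of the text
    return chunks


def chunk_records(
    doc: Dict[str, str],
    chunk_size: int,
    chunk_overlap: int,
    use_title: bool = True,
    start_id: int = 0,
) -> List[Dict[str, Any]]:
    """Turn one corpus record into chunk records shaped like the output format above."""
    records = []
    pieces = chunk_by_characters(doc["contents"], chunk_size, chunk_overlap)
    for offset, piece in enumerate(pieces):
        # Assumption: the title is joined to the chunk with a newline.
        contents = f"{doc['title']}\n{piece}" if use_title else piece
        records.append({
            "id": start_id + offset,
            "doc_id": doc["id"],
            "title": doc["title"],
            "contents": contents,
        })
    return records


doc = {"id": "paper", "title": "paper", "contents": "abcdefghij"}
for record in chunk_records(doc, chunk_size=4, chunk_overlap=1, use_title=False):
    print(record)
```

With `chunk_size=4` and `chunk_overlap=1` the window advances three characters at a time, so the example yields the chunks `abcd`, `defg`, and `ghij`.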

---

## Configuration

```yaml servers/corpus/parameter.yaml icon="/images/yaml.svg"
# servers/corpus/parameter.yaml
parse_file_path: data/UltraRAG.pdf
text_corpus_save_path: corpora/text.jsonl
image_corpus_save_path: corpora/image.jsonl

# mineru
mineru_dir: corpora/
mineru_extra_params:
  source: modelscope

# chunking parameters
raw_chunk_path: corpora/text.jsonl
chunk_path: corpora/chunks.jsonl
use_title: false
chunk_backend: sentence # choices=["token", "sentence", "recursive"]
tokenizer_or_token_counter: character
chunk_size: 512
chunk_backend_configs:
  token:
    chunk_overlap: 50
  sentence:
    chunk_overlap: 50
    min_sentences_per_chunk: 1
    delim: "['.', '!', '?', '；', '。', '！', '？', '\\n']"
  recursive:
    min_characters_per_chunk: 12
```

Parameter Description:

| Parameter | Type | Description |
|---|---|---|
| `parse_file_path` | str | Input file or directory path |
| `text_corpus_save_path` | str | Text corpus output path (JSONL) |
| `image_corpus_save_path` | str | Image corpus index output path (JSONL) |
| `mineru_dir` | str | MinerU output root directory |
| `mineru_extra_params` | dict | MinerU extra parameters, such as `source`, `layout`, etc. |
| `raw_chunk_path` | str | Chunking input file path (JSONL format) |
| `chunk_path` | str | Chunking output path |
| `use_title` | bool | Whether to prepend the document title to each chunk |
| `chunk_backend` | str | Select chunking method: `token`, `sentence`, `recursive` |
| `tokenizer_or_token_counter` | str | Tokenizer or counting method. Options: `word`, `character` or `tiktoken` model name (e.g., `gpt2`) |
| `chunk_size` | int | Target size for each chunk |
| `chunk_backend_configs` | dict | Configuration items for each chunking method (see below) |

`chunk_backend_configs` Detailed Parameters:

| Backend Type | Parameter | Description |
|---|---|---|
| **token** | `chunk_overlap` | Overlapping tokens between chunks |
| **sentence** | `chunk_overlap` | Overlapping count between chunks |
|  | `min_sentences_per_chunk` | Minimum number of sentences per chunk |
|  | `delim` | Sentence delimiter list (Python list in string format) |
| **recursive** | `min_characters_per_chunk` | Minimum character unit for recursive splitting |

================
File: pages/en/api/custom.mdx
================
---
title: "Custom"
icon: "puzzle-piece"
---

## Search-R1 Tools

### `search_r1_query_extract`
```python
@app.tool(output="ans_ls->extract_query_list")
def search_r1_query_extract(ans_ls: List[str]) -> Dict[str, List[str]]
```
- **Function**: Extracts query content from model response.
- **Logic**: Uses regex `r"<search>([^<]*)"` to extract the content of the last `<search>` tag. If no tag is found, returns "There is no query."; if the query does not end with `?`, one is appended automatically.
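
The logic above can be sketched as follows. Only the regex and the fallback string come from the description; the function names and the whitespace stripping are assumptions.

```python
import re
from typing import Dict, List

SEARCH_PATTERN = re.compile(r"<search>([^<]*)")


def extract_last_query(answer: str) -> str:
    """Return the content of the last <search> tag, ending with a question mark."""
    matches = SEARCH_PATTERN.findall(answer)
    if not matches:
        return "There is no query."
    query = matches[-1].strip()  # assumption: surrounding whitespace is dropped
    if not query.endswith("?"):
        query += "?"
    return query


def extract_queries(ans_ls: List[str]) -> Dict[str, List[str]]:
    """Batch wrapper mirroring the `ans_ls->extract_query_list` mapping."""
    return {"extract_query_list": [extract_last_query(ans) for ans in ans_ls]}


print(extract_queries([
    "<search>first</search> then <search> who wrote Hamlet </search>",
    "no tags here",
]))
# {'extract_query_list': ['who wrote Hamlet?', 'There is no query.']}
```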

### `r1_searcher_query_extract`
```python
@app.tool(output="ans_ls->extract_query_list")
def r1_searcher_query_extract(ans_ls: List[str]) -> Dict[str, List[str]]
```
- **Function**: Extracts query from R1-Searcher response.
- **Logic**: Uses regex `r"<|begin_of_query|>([^<]*)"` to extract the last tag content.

---

## IRCoT & IterRetGen Tools

### `iterretgen_nextquery`
```python
@app.tool(output="q_ls,ret_psg->nextq_ls")
def iterretgen_nextquery(q_ls: List[str], ans_ls: List[str | Any]) -> Dict[str, List[str]]
```
- **Function**: Iterative retrieval generation.
- **Logic**: `next_query = f"{q} {ans}"`. Concatenates original question and generated answer as the Query for the next retrieval.
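
A minimal sketch of that concatenation, using a hypothetical function name and pairing questions with answers by position:

```python
from typing import Any, Dict, List


def build_next_queries(q_ls: List[str], ans_ls: List[Any]) -> Dict[str, List[str]]:
    """Join each question with its generated answer to form the next retrieval query."""
    return {"nextq_ls": [f"{q} {ans}" for q, ans in zip(q_ls, ans_ls)]}


print(build_next_queries(["Who wrote Hamlet?"], ["William Shakespeare"]))
# {'nextq_ls': ['Who wrote Hamlet? William Shakespeare']}
```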

### `ircot_get_first_sent`
```python
@app.tool(output="ans_ls->q_ls")
def ircot_get_first_sent(ans_ls: List[str]) -> Dict[str, List[str]]
```
- **Function**: Extracts the first sentence of the answer (up to the first period, question mark, or exclamation mark).

### `ircot_extract_ans`
```python
@app.tool(output="ans_ls->pred_ls")
def ircot_extract_ans(ans_ls: List[str]) -> Dict[str, List[str]]
```
- **Function**: Extracts the final answer.
- **Logic**: Matches content after `so the answer is [...]`.
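
One way such a match could look is sketched below. The exact pattern, the case-insensitive matching, the removal of a trailing period, and the fallback to the whole response are all assumptions, not the tool's confirmed behavior.

```python
import re

# Assumption: the marker may be followed by a colon and/or whitespace.
ANSWER_PATTERN = re.compile(r"so the answer is[:\s]*(.+)", re.IGNORECASE | re.DOTALL)


def extract_final_answer(answer: str) -> str:
    """Return the text following 'so the answer is', without a trailing period."""
    match = ANSWER_PATTERN.search(answer)
    if match is None:
        return answer.strip()  # assumption: fall back to the whole response
    return match.group(1).strip().rstrip(".")


print(extract_final_answer("He was born in 1564. So the answer is: 1564."))  # 1564
```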

---

## Search-o1 Tools

### `search_o1_init_list`
```python
@app.tool(output="q_ls->total_subq_list,total_reason_list,total_final_info_list")
def search_o1_init_list(q_ls: List[str]) -> Dict[str, List[Any]]
```
- **Function**: Initializes accumulation lists required by Search-o1 (sub-questions, reasoning, final info), initially filled with `<PAD>`.

### `search_o1_combine_list`
```python
@app.tool(output="total_subq_list, extract_query_list, total_reason_list, extract_reason_list->total_subq_list, total_reason_list")
def search_o1_combine_list(...)
```
- **Function**: Appends the extracted Query and Reasoning of the current step to the total lists.

### `search_o1_query_extract`
```python
@app.tool(output="ans_ls->extract_query_list")
def search_o1_query_extract(ans_ls: List[str]) -> Dict[str, List[str]]
```
- **Function**: Extracts content between `<|begin_search_query|>...<|end_search_query|>`.

### `search_o1_reasoning_extract`
```python
@app.tool(output="ans_ls->extract_reason_list")
def search_o1_reasoning_extract(ans_ls: List[str]) -> Dict[str, List[str]]
```
- **Function**: Extracts all text before `<|begin_search_query|>` as the reasoning process.

### `search_o1_extract_final_information`
```python
@app.tool(output="ans_ls->extract_final_infor_list")
def search_o1_extract_final_information(ans_ls: List[str]) -> Dict[str, List[str]]
```
- **Function**: Extracts content after `**Final Information**` marker.

---

## Utility Tools

### `output_extract_from_boxed`
```python
@app.tool(output="ans_ls->pred_ls")
def output_extract_from_boxed(ans_ls: List[str]) -> Dict[str, List[str]]
```
- **Function**: Extracts answer from LaTeX `\boxed{...}`. Supports nested bracket handling and format cleaning.
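
Nested braces are the reason a plain regex is not enough here. The sketch below counts brace depth instead; picking the last `\boxed{` occurrence and returning `None` on failure are assumptions, and the format cleaning step is not modeled.

```python
from typing import Optional


def extract_last_boxed(text: str) -> Optional[str]:
    """Return the content of the last \\boxed{...}, honoring nested braces."""
    marker = "\\boxed{"
    start = text.rfind(marker)
    if start == -1:
        return None
    begin = start + len(marker)
    depth = 1  # we are already inside the opening brace of \boxed{
    for i in range(begin, len(text)):
        ch = text[i]
        if ch == "{":
            depth += 1
        elif ch == "}":
            depth -= 1
            if depth == 0:
                return text[begin:i].strip()
    return None  # braces never balanced


print(extract_last_boxed(r"Thus the result is \boxed{\frac{1}{2}}."))  # \frac{1}{2}
```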

### `merge_passages`
```python
@app.tool(output="temp_psg,ret_psg->ret_psg")
def merge_passages(temp_psg: List[str | Any], ret_psg: List[str | Any]) -> Dict[str, List[str | Any]]
```
- **Function**: Appends `temp_psg` list to `ret_psg` list.

### `evisrag_output_extract_from_special`
```python
@app.tool(output="ans_ls->pred_ls")
def evisrag_output_extract_from_special(ans_ls: List[str]) -> Dict[str, List[str]]
```
- **Function**: Extracts answer from `<answer>...</answer>` tags.

### `assign_citation_ids` / `assign_citation_ids_stateful`
- `assign_citation_ids`: Assigns citation IDs in the form of `[1]`, `[2]` to retrieved passages.
- `assign_citation_ids_stateful`: Uses `CitationRegistry` class to maintain global citation IDs (cross-step deduplication).
- `init_citation_registry`: Resets global citation registry.

---

## SurveyCPM Tools

### `surveycpm_state_init`
```python
@app.tool(output="instruction_ls->state_ls,cursor_ls,survey_ls,step_ls,extend_time_ls,extend_result_ls,retrieved_info_ls,parsed_ls")
def surveycpm_state_init(instruction_ls: List[str]) -> Dict[str, List]
```
- **Function**: Initializes SurveyCPM state machine.
- **Initial State**: `state="search"`, `cursor="outline"`, `step=0`.

### `surveycpm_parse_search_response`
```python
@app.tool(output="response_ls,surveycpm_hard_mode->keywords_ls,parsed_ls")
def surveycpm_parse_search_response(response_ls: List[str], surveycpm_hard_mode: bool = True) -> Dict[str, List]
```
- **Function**: Parses search instructions (JSON or XML format) generated by the model, extracts keyword list.

### `surveycpm_process_passages`
```python
@app.tool(output="ret_psg_ls->retrieved_info_ls")
def surveycpm_process_passages(ret_psg_ls: List[List[List[str]]]) -> Dict[str, List[str]]
```
- **Function**: Processes retrieved passages, deduplicates, limits quantity (Top-K), and concatenates into string.

### `surveycpm_after_init_plan` / `after_write` / `after_extend`
- **Function**: Parses Agent response for different stages (initialize outline, write content, extend plan).
- **Logic**:
  - Calls `surveycpm_parse_response` to validate format and content.
  - Updates `survey_ls` (outline structure) and `cursor_ls` (current cursor position) if successful.
  - Keeps original state for retry if failed.

### `surveycpm_update_state`
```python
@app.tool(output="state_ls,cursor_ls,extend_time_ls,extend_result_ls,step_ls,parsed_ls,surveycpm_max_step,surveycpm_max_extend_step->state_ls,extend_time_ls,step_ls")
def surveycpm_update_state(...)
```
- **Function**: Core state machine logic.
- **State Transition**:
  - `search` -> `analyst-init_plan` (cursor="outline")
  - `search` -> `write` (cursor=section-X)
  - `write` -> `search` (continue writing) or `analyst-extend_plan` (finished current section)
  - `analyst-extend_plan` -> `search` (extend success) or `done` (no extension)
  - Exceeds max steps -> `done`
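
The transitions listed above can be sketched as a single function. This is an illustration only: the boolean inputs `section_finished` and `extended` are hypothetical stand-ins for whatever the tool derives from its parsed results, `surveycpm_max_extend_step` is not modeled, and states without a listed transition are returned unchanged.

```python
def next_state(
    state: str,
    cursor: str,
    step: int,
    max_step: int,
    *,
    section_finished: bool = False,  # hypothetical signal
    extended: bool = False,          # hypothetical signal
) -> str:
    """Return the next SurveyCPM state for the transitions listed above."""
    if step > max_step:
        return "done"  # exceeding the step budget always ends the run
    if state == "search":
        # An "outline" cursor leads to planning; a section cursor leads to writing.
        return "analyst-init_plan" if cursor == "outline" else "write"
    if state == "write":
        return "analyst-extend_plan" if section_finished else "search"
    if state == "analyst-extend_plan":
        return "search" if extended else "done"
    return state  # transitions not listed above are left unchanged in this sketch


print(next_state("search", "outline", step=0, max_step=140))    # analyst-init_plan
print(next_state("search", "section-1", step=3, max_step=140))  # write
```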

### `surveycpm_format_output`
```python
@app.tool(output="survey_ls,instruction_ls->ans_ls")
def surveycpm_format_output(survey_ls: List[str], instruction_ls: List[str]) -> Dict[str, List[str]]
```
- **Function**: Converts final Survey JSON to Markdown format.
- **Processing**: Automatically handles heading levels (# ## ###), citation formatting (`\cite{...}` to `[1]`), and text cleaning.

---

## Configuration

```yaml servers/custom/parameter.yaml icon="/images/yaml.svg"
surveycpm_hard_mode: false
surveycpm_max_step: 140
surveycpm_max_extend_step: 12
```

| Parameter | Type | Description |
|---|---|---|
| `surveycpm_hard_mode` | bool | Whether to enable SurveyCPM's strict parsing mode (validate JSON field integrity) |
| `surveycpm_max_step` | int | Maximum total execution steps; execution is forced to end once exceeded |
| `surveycpm_max_extend_step` | int | Maximum number of plan extensions |

================
File: pages/en/api/evaluation.mdx
================
---
title: "Evaluation"
icon: "clipboard-check"
---

## `evaluate`

**Signature**
```python
@app.tool(output="pred_ls,gt_ls,metrics,save_path->eval_res")
def evaluate(
    pred_ls: List[str],
    gt_ls: List[List[str]],
    metrics: List[str] | None,
    save_path: str,
) -> Dict[str, Any]
```

**Function**
- Executes automatic metric evaluation for QA/Generation tasks.
- Supported Metrics: `acc`, `em`, `coverem`, `stringem`, `f1`, `rouge-1`, `rouge-2`, `rouge-l`.
- Results are automatically saved as a `.json` file and printed as a Markdown table.


---

## `evaluate_trec`

**Signature**
```python
@app.tool(output="run_path,qrels_path,ir_metrics,ks,save_path->eval_res")
def evaluate_trec(
    run_path: str,
    qrels_path: str,
    metrics: List[str] | None,
    ks: List[int] | None,
    save_path: str,
)
```

**Function**
- Performs IR retrieval metric evaluation based on `pytrec_eval`.
- Reads standard TREC format:
  - **qrels**: `<qid> <iter> <docid> <rel>`
  - **run**: `<qid> Q0 <docid> <rank> <score> <tag>`
- Supported Metrics: `mrr`, `map`, `recall@k`, `precision@k`, `ndcg@k`.
- Automatically aggregates statistics and outputs as table.
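
To show how the two TREC files relate, the sketch below parses them and computes `recall@k` and `mrr` in plain Python. It deliberately does not use `pytrec_eval`, all function names are hypothetical, and its tie-breaking and averaging conventions may differ from the tool's.

```python
from collections import defaultdict
from typing import Dict, Iterable, List, Set


def parse_qrels(lines: Iterable[str]) -> Dict[str, Set[str]]:
    """Parse `<qid> <iter> <docid> <rel>` lines into the relevant docids per query."""
    relevant: Dict[str, Set[str]] = defaultdict(set)
    for line in lines:
        if line.strip():
            qid, _iter, docid, rel = line.split()
            if int(rel) > 0:
                relevant[qid].add(docid)
    return dict(relevant)


def parse_run(lines: Iterable[str]) -> Dict[str, List[str]]:
    """Parse `<qid> Q0 <docid> <rank> <score> <tag>` lines, ranking by descending score."""
    scored = defaultdict(list)
    for line in lines:
        if line.strip():
            qid, _q0, docid, _rank, score, _tag = line.split()
            scored[qid].append((float(score), docid))
    return {qid: [d for _, d in sorted(items, key=lambda x: -x[0])]
            for qid, items in scored.items()}


def recall_at_k(relevant: Dict[str, Set[str]], run: Dict[str, List[str]], k: int) -> float:
    """Mean fraction of relevant documents found in the top k, over queries in qrels."""
    scores = [len(rel & set(run.get(qid, [])[:k])) / len(rel)
              for qid, rel in relevant.items()]
    return sum(scores) / len(scores) if scores else 0.0


def mrr(relevant: Dict[str, Set[str]], run: Dict[str, List[str]]) -> float:
    """Mean reciprocal rank of the first relevant document (0 if none is retrieved)."""
    scores = []
    for qid, rel in relevant.items():
        ranks = [r for r, d in enumerate(run.get(qid, []), start=1) if d in rel]
        scores.append(1.0 / ranks[0] if ranks else 0.0)
    return sum(scores) / len(scores) if scores else 0.0


qrels = parse_qrels(["q1 0 d1 1", "q1 0 d2 0", "q1 0 d3 1", "q2 0 d5 1"])
run = parse_run([
    "q1 Q0 d2 1 0.9 demo", "q1 Q0 d1 2 0.8 demo", "q1 Q0 d3 3 0.1 demo",
    "q2 Q0 d4 1 0.7 demo", "q2 Q0 d5 2 0.6 demo",
])
print(recall_at_k(qrels, run, k=2))  # 0.75
print(mrr(qrels, run))               # 0.5
```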

---

## `evaluate_trec_pvalue`

**Signature**
```python
@app.tool(
    output="run_new_path,run_old_path,qrels_path,ir_metrics,ks,n_resamples,save_path->eval_res"
)
def evaluate_trec_pvalue(
    run_new_path: str,
    run_old_path: str,
    qrels_path: str,
    metrics: List[str] | None,
    ks: List[int] | None,
    n_resamples: int | None,
    save_path: str,
)
```

**Function**
- Tests whether the difference between two TREC run files is significant, using a **two-sided permutation test** to calculate the p-value.
- Default resampling count `n_resamples=10000`.
- Outputs mean, difference, p-value, and significance flag.
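
For intuition, a two-sided paired permutation test on per-query scores can be sketched in plain Python as below. This is a sign-flip variant chosen for illustration; the function name, the `seed` parameter, and the add-one p-value correction are assumptions, and the tool may rely on a library routine instead.

```python
import random
from typing import Sequence


def paired_permutation_pvalue(
    new_scores: Sequence[float],
    old_scores: Sequence[float],
    n_resamples: int = 10000,
    seed: int = 0,
) -> float:
    """Two-sided p-value for the mean per-query difference between two runs."""
    if len(new_scores) != len(old_scores) or not new_scores:
        raise ValueError("need two non-empty score lists of equal length")
    diffs = [a - b for a, b in zip(new_scores, old_scores)]
    n = len(diffs)
    observed = abs(sum(diffs) / n)
    rng = random.Random(seed)
    extreme = 0
    for _ in range(n_resamples):
        # Under the null hypothesis the two runs are exchangeable per query,
        # so each difference keeps or flips its sign with equal probability.
        permuted = sum(d if rng.random() < 0.5 else -d for d in diffs) / n
        if abs(permuted) >= observed - 1e-12:
            extreme += 1
    # Add-one correction keeps the estimate strictly positive.
    return (extreme + 1) / (n_resamples + 1)


same = paired_permutation_pvalue([0.5, 0.7, 0.2], [0.5, 0.7, 0.2])
better = paired_permutation_pvalue([1.0] * 10, [0.0] * 10)
print(same)           # 1.0: identical runs are never significant
print(better < 0.05)  # True: a consistent per-query gain is significant
```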


---

## Configuration

```yaml servers/evaluation/parameter.yaml icon="/images/yaml.svg"
save_path: output/evaluate_results.json

# QA task
metrics: [ 'acc', 'f1', 'em', 'coverem', 'stringem', 'rouge-1', 'rouge-2', 'rouge-l' ]

# Retrieval task
qrels_path: data/qrels.txt
run_path: data/run_a.txt
ks: [ 1, 5, 10, 20, 50, 100 ]
ir_metrics: [ "mrr", "map", "recall", "ndcg", "precision" ]

# significance test
run_new_path: data/run_a.txt
run_old_path: data/run_b.txt
n_resamples: 10000
```

Parameter Description:

| Parameter | Type | Description |
|---|---|---|
| `save_path` | str | Evaluation result save path (automatically appends timestamp) |
| `metrics` | list[str] | Metric set used for QA / Generation tasks |
| `qrels_path` | str | TREC format ground truth file path |
| `run_path` | str | Result file for retrieval task |
| `ks` | list[int] | Truncation levels for calculating NDCG@K, P@K, Recall@K, etc. |
| `ir_metrics` | list[str] | Retrieval task metric names, supports `mrr`, `map`, `recall`, `ndcg`, `precision` |
| `run_new_path` | str | Run file path generated by new model (significance analysis) |
| `run_old_path` | str | Run file path of old model (significance analysis) |
| `n_resamples` | int | Resampling count for Permutation Test |

================
File: pages/en/api/generation.mdx
================
---
title: "Generation"
icon: "pen-nib"
---

## `generation_init`

**Signature**
```python
def generation_init(
    backend_configs: Dict[str, Any],
    sampling_params: Dict[str, Any],
    extra_params: Optional[Dict[str, Any]] = None,
    backend: str = "vllm",
) -> None
```

**Function**
- Initializes inference backend and sampling parameters.
- Supports `vllm`, `openai`, `hf` backends.
- `extra_params` can be used to pass `chat_template_kwargs` or other backend-specific parameters.

---

## `generate`

**Signature**
```python
async def generate(
    prompt_ls: List[Union[str, Dict[str, Any]]],
    system_prompt: str = "",
) -> Dict[str, List[str]]
```

**Function**
- Plain text conversation generation.
- Accepts a list of Prompts; each item may be a string or an OpenAI-format dictionary.

**Output Format (JSON)**
```json
{"ans_ls": ["answer for prompt_0", "answer for prompt_1", "..."]}
```

---

## `multimodal_generate`

**Signature**
```python
async def multimodal_generate(
    multimodal_path: List[List[str]],
    prompt_ls: List[Union[str, Dict[str, Any]]],
    system_prompt: str = "",
    image_tag: Optional[str] = None,
) -> Dict[str, List[str]]
```

**Function**
- Text-image multimodal conversation generation.
- `multimodal_path`: List of image paths corresponding to each Prompt (supports local path or URL).
- `image_tag`: If specified (e.g., `<img>`), inserts image at that tag's position in Prompt; otherwise defaults to appending to end of Prompt.

**Output Format (JSON)**
```json
{"ans_ls": ["answer with images for prompt_0", "..."]}
```

---

## `multiturn_generate`

**Signature**
```python
async def multiturn_generate(
    messages: List[Dict[str, str]],
    system_prompt: str = "",
) -> Dict[str, List[str]]
```

**Function**
- Multi-turn conversation generation.
- Handles a single conversation per call; batched Prompts are not supported.

**Output Format (JSON)**
```json
{"ans_ls": ["assistant response"]}
```

---

## `vllm_shutdown`

**Signature**
```python
def vllm_shutdown() -> None
```

**Function**
- Explicitly shuts down vLLM engine and releases VRAM resources.
- Valid only when using `vllm` backend.

---

## Configuration

```yaml servers/generation/parameter.yaml icon="/images/yaml.svg"
# servers/generation/parameter.yaml
backend: vllm # options: vllm, openai, hf
backend_configs:
  vllm:
    model_name_or_path: openbmb/MiniCPM4-8B
    gpu_ids: "2,3"
    gpu_memory_utilization: 0.9
    dtype: auto
    trust_remote_code: true
  openai:
    model_name: MiniCPM4-8B
    base_url: http://localhost:8000/v1
    api_key: "abc"
    concurrency: 8
    retries: 3
    base_delay: 1.0
  hf:
    model_name_or_path: openbmb/MiniCPM4-8B
    gpu_ids: '2,3'
    trust_remote_code: true
    batch_size: 8
sampling_params:
  temperature: 0.7
  top_p: 0.8
  max_tokens: 2048
extra_params:
  chat_template_kwargs:
    enable_thinking: false
system_prompt: ""
image_tag: null
```

Parameter Description:

| Parameter | Type | Description |
|---|---|---|
| `backend` | str | Specify generation backend, options `vllm`, `openai`, or `hf` (Transformers) |
| `backend_configs` | dict | Model and runtime environment configuration for each backend |
| `sampling_params` | dict | Sampling parameters to control generation diversity and length |
| `extra_params` | dict | Extra parameters, e.g., `chat_template_kwargs` |
| `system_prompt` | str | Global system prompt, added to context as `system` message |
| `image_tag` | str | Image placeholder tag (if needed) |

`backend_configs` Detailed Description:

| Backend | Parameter | Description |
|---|---|---|
| **vllm** | `model_name_or_path` | Model name or path |
|  | `gpu_ids` | GPU IDs used (e.g., `"0,1"`) |
|  | `gpu_memory_utilization` | GPU memory utilization ratio (0–1) |
|  | `dtype` | Data type (e.g., `auto`, `bfloat16`) |
|  | `trust_remote_code` | Whether to trust remote code |
| **openai** | `model_name` | OpenAI model name or self-hosted compatible model |
|  | `base_url` | API base URL |
|  | `api_key` | API Key |
|  | `concurrency` | Max concurrent requests |
|  | `retries` | API retry count |
|  | `base_delay` | Base wait time for each retry (seconds) |
| **hf** | `model_name_or_path` | Transformers model path |
|  | `gpu_ids` | GPU IDs (same as above) |
|  | `trust_remote_code` | Whether to trust remote code |
|  | `batch_size` | Batch size per inference |

`sampling_params` Detailed Description:

| Parameter | Type | Description |
|---|---|---|
| `temperature` | float | Controls randomness, higher means more diverse generation |
| `top_p` | float | Nucleus sampling threshold |
| `max_tokens` | int | Max generated tokens |

================
File: pages/en/api/prompt.mdx
================
---
title: "Prompt"
icon: "terminal"
---

## QA Prompts

### `qa_boxed`

**Signature**

```python
@app.prompt(output="q_ls,template->prompt_ls")
def qa_boxed(
    q_ls: List[str], 
    template: str | Path
) -> List[PromptMessage]
```

**Function**

Basic Q&A Prompt.

Loads the specified Jinja2 template and renders each question in the question list into a Prompt.

Template Variable: `{{ question }}`

### `qa_boxed_multiple_choice`

**Signature**

```python
@app.prompt(output="q_ls,choices_ls,template->prompt_ls")
def qa_boxed_multiple_choice(
    q_ls: List[str],
    choices_ls: List[List[str]],
    template: str | Path,
) -> List[PromptMessage]
```

**Function**

Multiple Choice Q&A Prompt.

Automatically formats the choice list into "A: ..., B: ..." form and injects it into the template.

Template Variables: `{{ question }}`, `{{ choices }}`
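
The choice formatting described above can be sketched as follows. The function name is hypothetical, and the comma separator is taken literally from the "A: ..., B: ..." form; the actual separator may differ.

```python
from string import ascii_uppercase
from typing import List


def format_choices(choices: List[str]) -> str:
    """Label each choice with a capital letter and join them into one string."""
    if len(choices) > len(ascii_uppercase):
        raise ValueError("at most 26 choices can be labeled A-Z")
    return ", ".join(
        f"{label}: {choice}" for label, choice in zip(ascii_uppercase, choices)
    )


print(format_choices(["Paris", "Rome", "Madrid"]))
# A: Paris, B: Rome, C: Madrid
```

The resulting string is what would be passed to the template as `{{ choices }}`.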

### `qa_rag_boxed`

**Signature**

```python
@app.prompt(output="q_ls,ret_psg,template->prompt_ls")
def qa_rag_boxed(
    q_ls: List[str], 
    ret_psg: List[str | Any], 
    template: str | Path
) -> list[PromptMessage]
```

**Function**

Standard RAG Prompt.

Concatenates retrieved passage lists and injects into template.

Template Variables: `{{ question }}`, `{{ documents }}`

### `qa_rag_boxed_multiple_choice`

**Signature**

```python
@app.prompt(output="q_ls,choices_ls,ret_psg,template->prompt_ls")
def qa_rag_boxed_multiple_choice(
    q_ls: List[str],
    choices_ls: List[List[str]],
    ret_psg: List[List[str]],
    template: str | Path,
) -> List[PromptMessage]
```

**Function**

Multiple Choice Q&A Prompt with Retrieval Context.

Template Variables: `{{ question }}`, `{{ documents }}`, `{{ choices }}`

---

## RankCoT Prompts

### `RankCoT_kr`

**Signature**

```python
@app.prompt(output="q_ls,ret_psg,kr_template->prompt_ls")
def RankCoT_kr(
    q_ls: List[str],
    ret_psg: List[str | Any],
    template: str | Path,
) -> list[PromptMessage]
```

**Function**

RankCoT Phase 1: Knowledge Retrieval Prompt.

Template Variables: `{{ question }}`, `{{ documents }}`

### `RankCoT_qa`

**Signature**

```python
@app.prompt(output="q_ls,kr_ls,qa_template->prompt_ls")
def RankCoT_qa(
    q_ls: List[str],
    kr_ls: List[str],
    template: str | Path,
) -> list[PromptMessage]
```

**Function**

RankCoT Phase 2: Chain-of-Thought based Q&A Prompt.

Template Variables: `{{ question }}`, `{{ CoT }}` (Here CoT is usually knowledge generated in the previous phase)

---

## IRCoT Prompts

### `ircot_next_prompt`

**Signature**

```python
@app.prompt(output="memory_q_ls,memory_ret_psg,template->prompt_ls")
def ircot_next_prompt(
    memory_q_ls: List[List[str | None]],
    memory_ret_psg: List[List[List[str]] | None],
    template: str | Path,
) -> List[PromptMessage]
```

**Function**

IRCoT (Interleaved Retrieval CoT) Iterative Prompt Generation.

Constructs next round's Prompt based on historical retrieval results and chain of thought. Supports single-turn and multi-turn history concatenation.

Template Variables: `{{ documents }}`, `{{ question }}`, `{{ cur_answer }}`

---

## WebNote Prompts

### `webnote_init_page`

**Signature**

```python
@app.prompt(output="q_ls,plan_ls,webnote_init_page_template->prompt_ls")
def webnote_init_page(
    q_ls: List[str],
    plan_ls: List[str],
    template: str | Path,
) -> List[PromptMessage]
```

**Function**

WebNote Agent: Initialize note page.

Template Variables: `{{ question }}`, `{{ plan }}`

### `webnote_gen_plan`

**Signature**

```python
@app.prompt(output="q_ls,webnote_gen_plan_template->prompt_ls")
def webnote_gen_plan(
    q_ls: List[str],
    template: str | Path,
) -> List[PromptMessage]
```

**Function**

WebNote Agent: Generate search plan.

Template Variable: `{{ question }}`

### `webnote_gen_subq`

**Signature**

```python
@app.prompt(output="q_ls,plan_ls,page_ls,webnote_gen_subq_template->prompt_ls")
def webnote_gen_subq(
    q_ls: List[str],
    plan_ls: List[str],
    page_ls: List[str],
    template: str | Path,
) -> List[PromptMessage]
```

**Function**

WebNote Agent: Generate sub-questions.

Template Variables: `{{ question }}`, `{{ plan }}`, `{{ page }}`

### `webnote_fill_page`

**Signature**

```python
@app.prompt(output="q_ls,plan_ls,page_ls,subq_ls,psg_ls,webnote_fill_page_template->prompt_ls")
def webnote_fill_page(
    q_ls: List[str],
    plan_ls: List[str],
    page_ls: List[str],
    subq_ls: List[str],
    psg_ls: List[Any],
    template: str | Path,
) -> List[PromptMessage]
```

**Function**

WebNote Agent: Fill notes based on retrieval results.

Template Variables: `{{ question }}`, `{{ plan }}`, `{{ sub_question }}`, `{{ docs_text }}`, `{{ page }}`

### `webnote_gen_answer`

**Signature**

```python
@app.prompt(output="q_ls,page_ls,webnote_gen_answer_template->prompt_ls")
def webnote_gen_answer(
    q_ls: List[str],
    page_ls: List[str],
    template: str | Path,
) -> List[PromptMessage]
```

**Function**

WebNote Agent: Generate final answer based on notes.

Template Variables: `{{ question }}`, `{{ page }}`

---

## Search-R1 & R1-Searcher

### `search_r1_gen`

**Signature**

```python
@app.prompt(output="prompt_ls,ans_ls,ret_psg,search_r1_gen_template->prompt_ls")
def search_r1_gen(
    prompt_ls: List[PromptMessage],
    ans_ls: List[str],
    ret_psg: List[str | Any],
    template: str | Path,
) -> List[PromptMessage]
```

**Function**

Generation prompt for Search-R1-style models.

Keeps only the top 3 retrieved passages and injects them into the context.

Template Variables: `{{ history }}`, `{{ answer }}`, `{{ passages }}`

### `r1_searcher_gen`

**Signature**

```python
@app.prompt(output="prompt_ls,ans_ls,ret_psg,r1_searcher_gen_template->prompt_ls")
def r1_searcher_gen(
    prompt_ls: List[PromptMessage],
    ans_ls: List[str],
    ret_psg: List[str | Any],
    template: str | Path,
) -> List[PromptMessage]
```

**Function**

Generation prompt for R1-Searcher.

Keeps only the top 5 retrieved passages.

Template Variables: `{{ history }}`, `{{ answer }}`, `{{ passages }}`
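
The truncation described for `search_r1_gen` (top 3) and `r1_searcher_gen` (top 5) can be pictured with a minimal sketch. It assumes each entry of `ret_psg` is a list of passages already sorted best-first; `keep_top_k` is a hypothetical helper, not part of the prompt server.

```python
from typing import Any, List


def keep_top_k(ret_psg: List[List[Any]], k: int) -> List[List[Any]]:
    """Keep at most the first k passages per query (assumes best-first order)."""
    return [passages[:k] for passages in ret_psg]


ret_psg = [["p1", "p2", "p3", "p4", "p5", "p6"], ["q1", "q2"]]
search_r1_view = keep_top_k(ret_psg, k=3)    # Search-R1 style: top 3
r1_searcher_view = keep_top_k(ret_psg, k=5)  # R1-Searcher style: top 5
```

Queries with fewer than `k` passages simply keep everything they have.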

---

## Search-o1 Prompts

### `search_o1_init`

**Signature**

```python
@app.prompt(output="q_ls,searcho1_reasoning_template->prompt_ls")
def search_o1_init(
    q_ls: List[str],
    template: str | Path,
) -> List[PromptMessage]
```

**Function**

Search-O1 Initial Reasoning Prompt.

Template Variable: `{{ question }}`

### `search_o1_reasoning_indocument`

**Signature**

```python
@app.prompt(output="extract_query_list,ret_psg,total_reason_list,searcho1_refine_template->prompt_ls")
def search_o1_reasoning_indocument(
    extract_query_list: List[str], 
    ret_psg: List[List[str]],       
    total_reason_list: List[List[str]], 
    template: str | Path,
) -> List[PromptMessage]
```

**Function**

Search-O1 Reasoning Refinement Prompt.

Merges historical reasoning steps (first + last 3 steps) with current retrieved documents for next step reasoning.

Template Variables: `{{ prev_reasoning }}`, `{{ search_query }}`, `{{ document }}`
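
As a rough illustration of the "first + last 3 steps" selection, the sketch below reads it as "the first step plus the last three steps". That reading, the helper name `select_reasoning_steps`, and the newline join are all assumptions rather than the server's confirmed behavior.

```python
from typing import List


def select_reasoning_steps(steps: List[str], tail: int = 3) -> List[str]:
    """Keep the first step plus the last `tail` steps, never duplicating a step."""
    if len(steps) <= tail + 1:
        return list(steps)  # short histories are kept whole
    return [steps[0]] + steps[len(steps) - tail:]


history = ["step 1", "step 2", "step 3", "step 4", "step 5", "step 6"]
# Hypothetical value for the {{ prev_reasoning }} template variable
prev_reasoning = "\n".join(select_reasoning_steps(history))
```

Keeping the first step preserves the original framing of the problem, while the tail keeps the most recent context.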

### `search_o1_insert`

**Signature**

```python
@app.prompt(output="q_ls,total_subq_list,total_final_info_list,searcho1_reasoning_template->prompt_ls") 
def search_o1_insert(
    q_ls: List[str],
    total_subq_list: List[List[str]], 
    total_final_info_list: List[List[str]],
    template: str | Path,
) -> List[PromptMessage]
```

**Function**

Search-O1 Formatted Insertion Prompt.

Explicitly inserts `<|begin_search_query|>` and search result tags into Prompt to construct complete chain-of-thought context.

---

## EVisRAG & Multi-branch Prompts

### `gen_subq`

**Signature**

```python
@app.prompt(output="q_ls,ret_psg,gen_subq_template->prompt_ls")
def gen_subq(
    q_ls: List[str],
    ret_psg: List[str | Any],
    template: str | Path,
) -> List[PromptMessage]
```

**Function**

Loop/Branch Demo: Generate sub-questions based on documents.

Template Variables: `{{ question }}`, `{{ documents }}`

### `evisrag_vqa`

**Signature**

```python
@app.prompt(output="q_ls,ret_psg,evisrag_template->prompt_ls")
def evisrag_vqa(
    q_ls: List[str], 
    ret_psg: List[str | Any], 
    template: str | Path
) -> list[PromptMessage]
```

**Function**

Multimodal VQA RAG Prompt.

Automatically inserts one `<image>` token into the prompt for each retrieved image.

Template Variable: `{{ question }}` (contains automatically injected image tokens)
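
A minimal sketch of the token injection, assuming the tokens are prepended to the question and that each entry of `ret_psg` lists the retrieved images for one question. The placement and the helper `inject_image_tokens` are hypothetical.

```python
from typing import List


def inject_image_tokens(question: str, num_images: int, token: str = "<image>") -> str:
    """Prepend one image token per retrieved image to the question text."""
    return token * num_images + question


q_ls = ["What does the chart show?", "Which year had the highest value?"]
ret_psg = [["img_a.png", "img_b.png"], ["img_c.png"]]
filled: List[str] = [inject_image_tokens(q, len(imgs)) for q, imgs in zip(q_ls, ret_psg)]
```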

---

## SurveyCPM Prompts

### `surveycpm_search`

**Signature**

```python
@app.prompt(output="instruction_ls,survey_ls,cursor_ls,surveycpm_search_template->prompt_ls")
def surveycpm_search(
    instruction_ls: List[str],
    survey_ls: List[str],
    cursor_ls: List[str | None],
    surveycpm_search_template: str | Path,
) -> List[PromptMessage]
```

**Function**

Survey Agent: Decide next search content.

Parses JSON format outline, generates text description of current outline.

Template Variables: `{{ user_query }}`, `{{ current_outline }}`, `{{ current_instruction }}`
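
To make "parses a JSON outline and generates a text description" concrete, here is a small sketch. The outline schema (`title` plus nested `sections`) and the numbering format are hypothetical; the actual SurveyCPM outline format is not documented here.

```python
import json
from typing import Any, Dict, List


def describe_outline(survey_json: str) -> str:
    """Render a JSON outline (hypothetical schema) as indented, numbered lines."""
    outline: Dict[str, Any] = json.loads(survey_json)
    lines: List[str] = [outline.get("title", "(untitled)")]

    def walk(sections: List[Dict[str, Any]], prefix: str, depth: int) -> None:
        for i, section in enumerate(sections, start=1):
            number = f"{prefix}{i}"
            lines.append(f"{'  ' * depth}{number} {section['title']}")
            walk(section.get("sections", []), f"{number}.", depth + 1)

    walk(outline.get("sections", []), "", 1)
    return "\n".join(lines)


survey = json.dumps(
    {
        "title": "Retrieval-Augmented Generation",
        "sections": [
            {"title": "Background", "sections": [{"title": "Dense retrieval"}]},
            {"title": "Open problems"},
        ],
    }
)
# Hypothetical value for the {{ current_outline }} template variable
current_outline = describe_outline(survey)
```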

### `surveycpm_init_plan`

**Signature**

```python
@app.prompt(output="instruction_ls,retrieved_info_ls,surveycpm_init_plan_template->prompt_ls")
def surveycpm_init_plan(
    instruction_ls: List[str],
    retrieved_info_ls: List[str],
    surveycpm_init_plan_template: str | Path,
) -> List[PromptMessage]
```

**Function**

Survey Agent: Initialize outline plan.

Template Variables: `{{ user_query }}`, `{{ current_information }}`

### `surveycpm_write`

**Signature**

```python
@app.prompt(output="instruction_ls,survey_ls,cursor_ls,retrieved_info_ls,surveycpm_write_template->prompt_ls")
def surveycpm_write(
    instruction_ls: List[str],
    survey_ls: List[str],
    cursor_ls: List[str | None],
    retrieved_info_ls: List[str],
    surveycpm_write_template: str | Path,
) -> List[PromptMessage]
```

**Function**

Survey Agent: Write specific section content.

Template Variables: `{{ user_query }}`, `{{ current_survey }}`, `{{ current_instruction }}`, `{{ current_information }}`

### `surveycpm_extend_plan`

**Signature**

```python
@app.prompt(output="instruction_ls,survey_ls,surveycpm_extend_plan_template->prompt_ls")
def surveycpm_extend_plan(
    instruction_ls: List[str],
    survey_ls: List[str],
    surveycpm_extend_plan_template: str | Path,
) -> List[PromptMessage]
```

**Function**

Survey Agent: Extend or modify outline plan.

Template Variables: `{{ user_query }}`, `{{ current_survey }}`

---

## Configuration

```yaml servers/prompt/parameter.yaml icon="/images/yaml.svg"
# QA
template: prompt/qa_boxed.jinja

# RankCoT
kr_template: prompt/RankCoT_knowledge_refinement.jinja
qa_template: prompt/RankCoT_question_answering.jinja

# Search-R1
search_r1_gen_template: prompt/search_r1_append.jinja

# R1-Searcher
r1_searcher_gen_template: prompt/r1_searcher_append.jinja

# Search-o1
searcho1_reasoning_template: prompt/search_o1_reasoning.jinja
searcho1_refine_template: prompt/search_o1_refinement.jinja


# For other prompts, please add parameters here as needed

# Take webnote as an example:
webnote_gen_plan_template: prompt/webnote_gen_plan.jinja
webnote_init_page_template: prompt/webnote_init_page.jinja
webnote_gen_subq_template: prompt/webnote_gen_subq.jinja
webnote_fill_page_template: prompt/webnote_fill_page.jinja
webnote_gen_answer_template: prompt/webnote_gen_answer.jinja

# SurveyCPM
surveycpm_search_template: prompt/surveycpm_search.jinja
surveycpm_init_plan_template: prompt/surveycpm_init_plan.jinja
surveycpm_write_template: prompt/surveycpm_write.jinja
surveycpm_extend_plan_template: prompt/surveycpm_extend_plan.jinja
```

| Parameter | Description |
|---|---|
| `template` | Basic QA template path |
| `kr_template` | RankCoT knowledge refinement template path |
| `qa_template` | RankCoT Q&A template path |
| `*_template` | Jinja2 template file path corresponding to each module function |

================
File: pages/en/api/reranker.mdx
================
---
title: "Reranker"
icon: "ranking-star"
---

## `reranker_init`

**Signature**
```python
async def reranker_init(
    model_name_or_path: str,
    backend_configs: Dict[str, Any],
    batch_size: int,
    gpu_ids: Optional[object] = None,
    backend: str = "infinity",
) -> None
```

**Function**
- Initializes reranker backend and model.

---

## `reranker_rerank`

**Signature**
```python
async def reranker_rerank(
    query_list: List[str],
    passages_list: List[List[str]],
    top_k: int = 5,
    query_instruction: str = "",
) -> Dict[str, List[Any]]
```

**Function**
- **Reranks** the candidate passages for each query and returns the `top_k` best passages per query.

**Output Format (JSON)**
```json
{
  "rerank_psg": [
    ["best passage for q0", "..."],
    ["best passage for q1", "..."]
  ]
}
```
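
The input and output shapes can be illustrated with a toy stand-in. The word-overlap score below is only a placeholder for the real reranker model, and `toy_rerank` is a hypothetical function; only the `rerank_psg` key and the per-query nesting come from the output format above.

```python
from typing import Any, Dict, List


def overlap_score(query: str, passage: str) -> int:
    """Toy relevance score: number of distinct query words found in the passage."""
    return len(set(query.lower().split()) & set(passage.lower().split()))


def toy_rerank(
    query_list: List[str], passages_list: List[List[str]], top_k: int = 5
) -> Dict[str, List[Any]]:
    """Sort each query's passages by score (best first) and keep top_k."""
    rerank_psg = []
    for query, passages in zip(query_list, passages_list):
        ranked = sorted(passages, key=lambda p: overlap_score(query, p), reverse=True)
        rerank_psg.append(ranked[:top_k])
    return {"rerank_psg": rerank_psg}


result = toy_rerank(
    ["capital of france"],
    [["Berlin is in Germany", "Paris is the capital of France", "France borders Spain"]],
    top_k=2,
)
```

`query_list[i]` is paired with `passages_list[i]`, so the two lists should have the same length.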

---

## Configuration

```yaml servers/reranker/parameter.yaml icon="/images/yaml.svg"
model_name_or_path: openbmb/MiniCPM-Reranker-Light
backend: sentence_transformers # options: infinity, sentence_transformers, openai
backend_configs:
  infinity:
    bettertransformer: false
    pooling_method: auto
    device: cuda
    model_warmup: false
    trust_remote_code: true
  sentence_transformers:
    device: cuda
    trust_remote_code: true
  openai:
    model_name: text-embedding-3-small
    base_url: "https://api.openai.com/v1"
    api_key: ""

gpu_ids: 0
top_k: 5
batch_size: 16
query_instruction: ""
```

Parameter Description:

| Parameter | Type | Description |
|---|---|---|
| `model_name_or_path` | str | Model path or name (local or HuggingFace repo) |
| `backend` | str | Select backend type: `infinity`, `sentence_transformers` or `openai` |
| `backend_configs` | dict | Exclusive parameter settings for each backend |
| `gpu_ids` | str/int | Specify GPU ID (multiple GPUs allowed, e.g., `"0,1"`) |
| `top_k` | int | Number of reranked results returned |
| `batch_size` | int | Number of samples per batch |
| `query_instruction` | str | Query prefix hint, used for prompt engineering or query modification |

`backend_configs` Detailed Description:

| Backend | Parameter | Description |
|---|---|---|
| **infinity** | `device` | Device type (cuda / cpu) |
|  | `bettertransformer` | Whether to enable accelerated inference |
|  | `pooling_method` | Vector pooling strategy |
|  | `model_warmup` | Whether to warmup model |
|  | `trust_remote_code` | Whether to trust remote code (Required for HuggingFace models) |
| **sentence_transformers** | `device` | Device type (cuda / cpu) |
|  | `trust_remote_code` | Whether to trust remote code |
| **openai** | `model_name` | API Model name |
|  | `base_url` | API access address |
|  | `api_key` | OpenAI API Key |

================
File: pages/en/api/retriever.mdx
================
---
title: "Retriever"
icon: "magnifying-glass"
---

## `retriever_init`

**Signature**
```python
async def retriever_init(
    model_name_or_path: str,
    backend_configs: Dict[str, Any],
    batch_size: int,
    corpus_path: str,
    gpu_ids: Optional[object] = None,
    is_multimodal: bool = False,
    backend: str = "sentence_transformers",
    index_backend: str = "faiss",
    index_backend_configs: Optional[Dict[str, Any]] = None,
    is_demo: bool = False,
    collection_name: str = "",
) -> None
```

**Function**
- Initializes retrieval service.
- Embedding Backend (backend): Responsible for converting text/images into vectors (Infinity, SentenceTransformers, OpenAI, BM25).
- Index Backend (index_backend): Responsible for vector storage and retrieval (FAISS, Milvus).
- Demo Mode: If is_demo=True, forces OpenAI + Milvus configuration, ignoring some parameters.

---

## `retriever_embed`

**Signature**
```python
async def retriever_embed(
    embedding_path: Optional[str] = None,
    overwrite: bool = False,
    is_multimodal: bool = False,
) -> None
```

**Function**
- (Non-Demo Mode) Computes vector representations of the corpus in batches and saves them as a `.npy` file.
- Only applies to Dense Retriever backend (BM25 not supported).


---

## `retriever_index`

**Signature**
```python
async def retriever_index(
    embedding_path: str,
    overwrite: bool = False,
    collection_name: str = "",
    corpus_path: str = ""
) -> None
```

**Function**
- Builds retrieval index.
- FAISS: Reads embedding_path (.npy) to build local index file.
- Milvus / Demo: Reads corpus_path (.jsonl), generates vectors and inserts into specified collection_name.

---

## `retriever_search`

**Signature**
```python
async def retriever_search(
    query_list: List[str],
    top_k: int = 5,
    query_instruction: str = "",
    collection_name: str = "",
) -> Dict[str, List[List[str]]]
```

**Function**
- Retrieves passages for one or more queries.
- Automatically handles query vectorization (adds query_instruction) and finds Top-K in specified collection_name (for Milvus) or default index.

**Output Format (JSON)**
```json
{"ret_psg": [["passage 1", "passage 2"], ["..."]]}
```

---

## `retriever_batch_search`

**Signature**
```python
async def retriever_batch_search(
    batch_query_list: List[List[str]],
    top_k: int = 5,
    query_instruction: str = "",
    collection_name: str = "",
) -> Dict[str, List[List[List[str]]]]
```

**Function**
- Batch version of retriever_search, accepts nested list input.

**Output Format (JSON)**
```json
{"ret_psg_ls": [[["psg 1-1"], ["psg 1-2"]], [["psg 2-1"]]]}
```
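
The extra nesting level in `ret_psg_ls` is easiest to see next to the single-level result. In this sketch, `fake_search` and `fake_batch_search` are stand-ins that return placeholder strings; whether the real batch tool delegates to `retriever_search` in this way is an assumption.

```python
from typing import Dict, List


def fake_search(query_list: List[str], top_k: int = 5) -> Dict[str, List[List[str]]]:
    """Stand-in for `retriever_search`: top_k placeholder passages per query."""
    return {"ret_psg": [[f"{q} :: psg {i}" for i in range(1, top_k + 1)] for q in query_list]}


def fake_batch_search(
    batch_query_list: List[List[str]], top_k: int = 5
) -> Dict[str, List[List[List[str]]]]:
    """Run the single-level search on each inner query list, keeping the nesting."""
    return {"ret_psg_ls": [fake_search(queries, top_k)["ret_psg"] for queries in batch_query_list]}


out = fake_batch_search([["q1-1", "q1-2"], ["q2-1"]], top_k=1)
```

The outer index selects the sample, the middle index selects the query within that sample, and the innermost list holds that query's passages.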

---

## `bm25_index`

**Signature**
```python
async def bm25_index(
    overwrite: bool = False,
) -> None
```

**Function**
- When `backend="bm25"`, builds BM25 sparse index and saves it.

---

## `bm25_search`

**Signature**
```python
async def bm25_search(
    query_list: List[str],
    top_k: int = 5,
) -> Dict[str, List[List[str]]]
```

**Function**
- Keyword retrieval based on BM25 algorithm.

**Output Format (JSON)**
```json
{"ret_psg": [["passage 1", "passage 2"], ["..."]]}
```

---

## `retriever_deploy_search`

**Signature**
```python
async def retriever_deploy_search(
    retriever_url: str,
    query_list: List[str],
    top_k: int = 5,
    query_instruction: str = "",
) -> Dict[str, List[List[str]]]
```

**Function**
- Acts as a client and queries the remote retrieval service deployed at `retriever_url`.

**Output Format (JSON)**
```json
{"ret_psg": [["passage 1", "passage 2"], ["..."]]}
```

---

## `retriever_websearch`

**Signature**
```python
async def retriever_websearch(
    query_list: List[str],
    top_k: Optional[int] | None = 5,
    retrieve_thread_num: Optional[int] | None = 1,
    websearch_backend: str = "tavily",
    websearch_backend_configs: Optional[Dict[str, Any]] | None = None,
) -> Dict[str, List[List[str]]]
```

**Function**
- Unified Web retrieval tool. Choose backend via `websearch_backend` (`tavily`, `exa`, `zhipuai`) and configure in `websearch_backend_configs`.

**Output Format (JSON)**
```json
{"ret_psg": [["snippet 1", "snippet 2"], ["..."]]}
```

---

## `retriever_batch_websearch`

**Signature**
```python
async def retriever_batch_websearch(
    batch_query_list: List[List[str]],
    top_k: Optional[int] | None = 5,
    retrieve_thread_num: Optional[int] | None = 1,
    websearch_backend: str = "tavily",
    websearch_backend_configs: Optional[Dict[str, Any]] | None = None,
) -> Dict[str, List[List[List[str]]]]
```

**Function**
- Batch Web retrieval for SurveyCPM-style pipelines.

**Output Format (JSON)**
```json
{"ret_psg_ls": [[["snippet 1", "snippet 2"]], [["..."]]]}
```


---

## Configuration

```yaml servers/retriever/parameter.yaml icon="/images/yaml.svg"
model_name_or_path: openbmb/MiniCPM-Embedding-Light
corpus_path: data/corpus_example.jsonl
embedding_path: embedding/embedding.npy
collection_name: wiki

# Embedding Backend Configuration
backend: sentence_transformers # options: infinity, sentence_transformers, openai, bm25
backend_configs:
  infinity:
    bettertransformer: false
    pooling_method: auto
    model_warmup: false
    trust_remote_code: true
  sentence_transformers:
    trust_remote_code: true
    sentence_transformers_encode:
      normalize_embeddings: false
      encode_chunk_size: 256
      q_prompt_name: query
      psg_prompt_name: document
      psg_task: null
      q_task: null
  openai:
    model_name: text-embedding-3-small
    base_url: "https://api.openai.com/v1"
    api_key: "abc"
  bm25:
    lang: en
    save_path: index/bm25

# Index Backend Configuration
index_backend: faiss # options: faiss, milvus
index_backend_configs:
  faiss:
    index_use_gpu: True
    index_chunk_size: 10000
    index_path: index/index.index
  milvus:
    uri: index/milvus_demo.db # Local file for Lite, or http://host:port
    token: null
    id_field_name: id
    vector_field_name: vector
    text_field_name: contents
    index_params:
      index_type: AUTOINDEX
      metric_type: IP

# Websearch Backend Configuration
websearch_backend: tavily # options: tavily, exa, zhipuai
websearch_backend_configs:
  exa:
    api_key: ""
    retries: 3
    base_delay: 1.0
    search_kwargs: {}
  tavily:
    api_key: ""
    retries: 3
    base_delay: 1.0
    search_kwargs: {}
  zhipuai:
    api_key: ""
    base_url: "https://open.bigmodel.cn/api/paas/v4/web_search"
    search_engine: "search_std"
    search_intent: false
    search_recency_filter: "noLimit"
    content_size: "medium"
    retries: 3
    base_delay: 1.0
    search_kwargs: {}

batch_size: 16
top_k: 5
gpu_ids: "1"
query_instruction: ""
is_multimodal: false
overwrite: false
retrieve_thread_num: 1
retriever_url: "http://127.0.0.1:64501"
is_demo: false
```

Parameter Description:

| Parameter | Type | Description |
|---|---|---|
| `model_name_or_path` | str | Retrieval model path or name (e.g., HuggingFace model ID) |
| `corpus_path` | str | Input corpus JSONL file path |
| `embedding_path` | str | Vector file save path (`.npy`) |
| `collection_name` | str | Milvus collection name |
| `backend` | str | Select retrieval backend: `infinity`, `sentence_transformers`, `openai`, `bm25` |
| `index_backend` | str | Index backend: `faiss`, `milvus` |
| `backend_configs` | dict | Parameter configuration for each backend (see table below) |
| `index_backend_configs` | dict | Parameter configuration for each index backend (see table below) |
| `websearch_backend` | str | Web search backend: `tavily`, `exa`, `zhipuai` |
| `websearch_backend_configs` | dict | Web search backend configs (api_key, retries, payload, etc.) |
| `batch_size` | int | Batch size for vector generation or retrieval |
| `top_k` | int | Number of returned candidate passages |
| `gpu_ids` | str | Specify visible GPU devices, e.g., `"0,1"` |
| `query_instruction` | str | Query prefix (used by instruction-tuning models) |
| `is_multimodal` | bool | Whether to enable multimodal embedding (e.g., image) |
| `overwrite` | bool | Whether to overwrite if embedding or index file already exists |
| `retrieve_thread_num` | int | Concurrent thread number for Web retrieval |
| `retriever_url` | str | URL of deployed retriever server |
| `is_demo` | bool | Demo mode switch (forces OpenAI+Milvus, simplified configuration) |

`backend_configs` Sub-items:

| Backend | Parameter | Type | Description |
|-------|---|---|---|
| **infinity** | `bettertransformer` | bool| Whether to enable efficient inference optimization |
|  | `pooling_method` | str| Pooling method (e.g., `auto`, `mean`) |
|  | `model_warmup` |bool | Whether to preload model into VRAM |
|  | `trust_remote_code` |bool | Whether to trust remote code (Applicable to custom models) |
| **sentence_transformers** | `trust_remote_code` | bool| Whether to trust remote model code |
|  | `sentence_transformers_encode` | dict | Detailed encoding parameters (see table below) |
| **openai** | `model_name` | str| OpenAI model name (e.g., `text-embedding-3-small`) |
|  | `base_url` | str| API base address |
|  | `api_key` | str| OpenAI API Key |
| **bm25** | `lang` | str| Language (determines stop words and tokenizer) |
|  | `save_path` | str| Save directory for BM25 sparse index |

`sentence_transformers_encode` Parameters:

| Parameter | Type | Description |
|---|---|---|
| `normalize_embeddings` | bool | Whether to normalize vectors |
| `encode_chunk_size` | int | Encoding chunk size (avoid VRAM overflow) |
| `q_prompt_name` | str | Query template name |
| `psg_prompt_name` | str | Passage template name |
| `q_task` | str | Task description for queries (for models that require a task to be specified) |
| `psg_task` | str | Task description for passages (for models that require a task to be specified) |


`index_backend_configs` Parameters:

| Backend | Parameter | Type | Description |
|-------|---|---|---|
| **faiss** | `index_use_gpu` | bool | Whether to use GPU for building and searching the index |
|  | `index_chunk_size` | int | Batch size when building the index |
|  | `index_path` | str | Save path for the FAISS index file (`.index`) |
| **milvus** | `uri` | str | Milvus connection address (a local file path enables Milvus Lite) |
|  | `token` | str | Auth token (if needed) |
|  | `id_field_name` | str | Primary key field name (default `id`) |
|  | `vector_field_name` | str | Vector field name (default `vector`) |
|  | `text_field_name` | str | Text content field name (default `contents`) |
|  | `id_max_length` | int | Maximum length of the string primary key |
|  | `text_max_length` | int | Maximum length of the text field (truncated if exceeded) |
|  | `metric_type` | str | Distance metric type (e.g., `IP` inner product, `L2` Euclidean distance) |
|  | `index_params` | dict | Index construction parameters (e.g., `index_type: AUTOINDEX`) |
|  | `search_params` | dict | Retrieval parameters (e.g., `nprobe`) |
|  | `index_chunk_size` | int | Batch size when inserting data |

================
File: pages/en/api/router.mdx
================
---
title: "Router"
icon: "code-branch"
---

## `route1` / `route2`

**Signature**
```python
@app.tool(output="query_list")
def route1(query_list: List[str]) -> Dict[str, List[Dict[str, str]]]
def route2(query_list: List[str]) -> Dict[str, List[Dict[str, str]]]
```

**Function**
- Basic routing examples.
- `route1`: If query content is "1", state set to "state1", otherwise "state2".
- `route2`: Forces state to "state2".

---

## `ircot_check_end`

**Signature**
```python
@app.tool(output="ans_ls->ans_ls")
def ircot_check_end(ans_ls: List[str]) -> Dict[str, List[Dict[str, str]]]
```

**Function**
- IRCoT process check.
- Checks if answer contains `"so the answer is"` (case insensitive).
- If contained, marks state as `"complete"`, otherwise `"incomplete"`.
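
A minimal sketch of this check, written as a plain function without the `@app.tool` decorator. The `"state"` values follow the description above; the `"data"` key holding the original answer is an assumption about the item layout.

```python
from typing import Dict, List


def check_end(ans_ls: List[str]) -> Dict[str, List[Dict[str, str]]]:
    """Tag each answer as complete when it contains the stop phrase, ignoring case."""
    tagged = [
        {
            "data": ans,  # assumed key name for the original value
            "state": "complete" if "so the answer is" in ans.lower() else "incomplete",
        }
        for ans in ans_ls
    ]
    return {"ans_ls": tagged}


routed = check_end(["So the answer is Paris.", "I need to look up the river first."])
```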

---

## `search_r1_check`

**Signature**
```python
@app.tool(output="ans_ls->ans_ls")
def search_r1_check(ans_ls: List[str]) -> Dict[str, List[Dict[str, str]]]
```

**Function**
- Checks if Search-R1 generation has ended.
- Criteria: Text contains `<|endoftext|>` or `<|im_end|>`.
- If condition met, marks as `"complete"`, otherwise `"incomplete"`.

---

## `webnote_check_page`

**Signature**
```python
@app.tool(output="page_ls->page_ls")
def webnote_check_page(page_ls: List[str]) -> Dict[str, List[Dict[str, str]]]
```

**Function**
- WebNote process check.
- If page content contains `"to be filled"` (case insensitive), marks as `"incomplete"`, otherwise `"complete"`.

---

## `r1_searcher_check`

**Signature**
```python
@app.tool(output="ans_ls->ans_ls")
def r1_searcher_check(ans_ls: List[str]) -> Dict[str, List[Dict[str, str]]]
```

**Function**
- Checks if R1-Searcher generation has ended.
- Criteria: Text contains `<|endoftext|>`, `<|im_end|>` or `</answer>`.
- If condition met, marks as `"complete"`, otherwise `"incomplete"`.

---

## `search_o1_check`

**Signature**
```python
@app.tool(output="ans_ls,q_ls,total_subq_list,total_reason_list,total_final_info_list->ans_ls,q_ls,total_subq_list,total_reason_list,total_final_info_list")
def search_o1_check(
    ans_ls: List[str],
    q_ls: List[str],
    total_subq_list: List[List[Any]],
    total_reason_list: List[List[Any]],
    total_final_info_list: List[List[Any]],
) -> Dict[str, List[Dict[str, Any]]]
```

**Function**
- Search-o1 process state check.
- Checks special markers in answer:
  - If it contains `<|end_search_query|>`: state is set to `"retrieve"` (continue retrieval).
  - If it contains `<|im_end|>`, or in any other case: state is set to `"stop"` (stop retrieval, output answer).
- Synchronously updates state for all associated lists (`q_ls`, `subq`, `reason`, `info`).
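
The synchronized update can be sketched as follows: one state is decided per sample from its answer, then the same state is attached to that sample's entry in every list. Only three of the five lists are shown to keep the example short, and the `"data"` key and the helper names are assumptions.

```python
from typing import Any, Dict, List


def tag(values: List[Any], states: List[str]) -> List[Dict[str, Any]]:
    """Pair every value with the state decided for its sample."""
    return [{"data": v, "state": s} for v, s in zip(values, states)]


def o1_check(
    ans_ls: List[str], q_ls: List[str], total_subq_list: List[List[Any]]
) -> Dict[str, List[Dict[str, Any]]]:
    """Decide retrieve/stop from each answer and apply it to all lists."""
    states = ["retrieve" if "<|end_search_query|>" in ans else "stop" for ans in ans_ls]
    return {
        "ans_ls": tag(ans_ls, states),
        "q_ls": tag(q_ls, states),
        "total_subq_list": tag(total_subq_list, states),
    }


routed = o1_check(
    ["<|begin_search_query|>river length<|end_search_query|>", "Final answer.<|im_end|>"],
    ["q0", "q1"],
    [["river length"], []],
)
```

Because every list carries the same per-sample state, all fields of one sample travel down the same branch.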

---

## `check_model_state`

**Signature**
```python
@app.tool(output="ans_ls->ans_ls")
def check_model_state(ans_ls: List[str]) -> Dict[str, List[Dict[str, str]]]
```

**Function**
- General model state check.
- If answer contains `<search>` tag, marks state as `"continue"`, otherwise `"stop"`.

---

## `surveycpm_state_router`

**Signature**
```python
@app.tool(output="state_ls,cursor_ls,survey_ls,step_ls,extend_time_ls,extend_result_ls->state_ls,cursor_ls,survey_ls,step_ls,extend_time_ls,extend_result_ls")
def surveycpm_state_router(
    state_ls: List[str],
    cursor_ls: List[str | None],
    survey_ls: List[str],
    step_ls: List[int],
    extend_time_ls: List[int],
    extend_result_ls: List[str],
) -> Dict[str, List[Dict[str, Any]]]
```

**Function**
- SurveyCPM dedicated router.
- This is a Pass-through tool that packages all input list elements (state, cursor, outline, etc.) into a dictionary with a `"state"` field.
- Purpose: To enable UltraRAG framework to automatically dispatch data to corresponding Pipeline branches based on the `state` field.
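
Conceptually, dispatching on the `state` field amounts to grouping samples by state. The sketch below is not the framework's dispatch code; `wrap_with_state`, `group_by_state`, and the `"data"` key are hypothetical, and the state names are taken from the branch names in the DeepResearch pipeline.

```python
from collections import defaultdict
from typing import Any, Dict, List


def wrap_with_state(values: List[Any], state_ls: List[str]) -> List[Dict[str, Any]]:
    """Pass values through unchanged, attaching each sample's state."""
    return [{"data": v, "state": s} for v, s in zip(values, state_ls)]


def group_by_state(tagged: List[Dict[str, Any]]) -> Dict[str, List[int]]:
    """Collect sample indices per state, i.e. which samples go to which branch."""
    branches: Dict[str, List[int]] = defaultdict(list)
    for idx, item in enumerate(tagged):
        branches[item["state"]].append(idx)
    return dict(branches)


state_ls = ["search", "write", "search"]
cursor_ls = [None, "1.2", None]
tagged_cursors = wrap_with_state(cursor_ls, state_ls)
branches = group_by_state(tagged_cursors)
```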

================
File: pages/en/demo/deepresearch.mdx
================
---
title: "DeepResearch"
icon: "flask"
---

DeepResearch is UltraRAG's most powerful research reasoning process to date, designed specifically for generating academic-level in-depth reviews of over 10,000 words. It autonomously completes search, plan formulation, content writing, and outline expansion through complex state machine scheduling.

<Tip>To obtain the best generation results, we strongly recommend downloading and using the supporting proprietary model SurveyCPM.</Tip>

## 1. Pipeline Structure Overview

The DeepResearch Pipeline adopts highly flexible state routing logic (Router), containing cycles of multiple core stages.

```yaml examples/DeepResearch.yaml icon="/images/yaml.svg"
# DeepResearch Demo for UltraRAG UI

# MCP Server
servers:
  benchmark: servers/benchmark
  generation: servers/generation
  retriever: servers/retriever
  prompt: servers/prompt
  router: servers/router
  custom: servers/custom

# MCP Client Pipeline
pipeline:
- benchmark.get_data:
    output:
      q_ls: instruction_ls
- retriever.retriever_init
- generation.generation_init
- custom.surveycpm_init_citation_registry
- custom.surveycpm_state_init
- loop:
    times: 140
    steps:
    - branch:
        router:
        - router.surveycpm_state_router
        branches:
          search:
          - prompt.surveycpm_search:
              output:
                prompt_ls: search_prompt_ls          
          - generation.generate:
              input:
                prompt_ls: search_prompt_ls
              output:
                ans_ls: search_response_ls
          - custom.surveycpm_parse_search_response:
              input:
                response_ls: search_response_ls       
          - retriever.retriever_batch_search:
              input:
                batch_query_list: keywords_ls
          - custom.surveycpm_process_passages_with_citation
          - custom.surveycpm_update_state
          analyst-init_plan:
          - prompt.surveycpm_init_plan:
              output:
                prompt_ls: init_plan_prompt_ls
          - generation.generate:
              input:
                prompt_ls: init_plan_prompt_ls
              output:
                ans_ls: init_plan_response_ls
          - custom.surveycpm_after_init_plan:
              input:
                response_ls: init_plan_response_ls
          - custom.surveycpm_update_state
          write:
          - prompt.surveycpm_write:
              output:
                prompt_ls: write_prompt_ls
          - generation.generate:
              input:
                prompt_ls: write_prompt_ls
              output:
                ans_ls: write_response_ls
          - custom.surveycpm_after_write:
              input:
                response_ls: write_response_ls
          - custom.surveycpm_update_state
          analyst-extend_plan:
          - prompt.surveycpm_extend_plan:
              output:
                prompt_ls: extend_prompt_ls         
          - generation.generate:
              input:
                prompt_ls: extend_prompt_ls
              output:
                ans_ls: extend_response_ls
          - custom.surveycpm_after_extend:
              input:
                response_ls: extend_response_ls
          - custom.surveycpm_update_state
          done: []
- custom.surveycpm_format_output:
    output:
      ans_ls: final_survey_ls

```

## 2. Compile Pipeline File

Execute the following command to compile this workflow:

```shell
ultrarag build examples/DeepResearch.yaml
```

## 3. Configure Running Parameters

Modify `examples/parameter/DeepResearch_parameter.yaml`.

<Note>Want to adjust research depth? Please adjust as needed in `custom` configuration: increase `surveycpm_max_step` to extend research time, increase `surveycpm_max_extend_step` to obtain more detailed extended content. If you have extremely high requirements for quality, be sure to enable `surveycpm_hard_mode`.</Note>

```yaml examples/parameter/DeepResearch_parameter.yaml icon="/images/yaml.svg" 
benchmark:
  benchmark:
    key_map:
      gt_ls: golden_answers
      q_ls: question
    limit: -1
    name: nq
    path: data/sample_nq_10.jsonl
    seed: 42
    shuffle: false
custom:
  surveycpm_hard_mode: false
  surveycpm_max_extend_step: 12
  surveycpm_max_step: 140
generation:
  backend: vllm  # [!code --]
  backend: openai # [!code ++]
  backend_configs:
    hf:
      batch_size: 8
      gpu_ids: 2,3
      model_name_or_path: openbmb/MiniCPM4-8B
      trust_remote_code: true
    openai:
      api_key: abc
      base_delay: 1.0
      base_url: http://localhost:8000/v1 # [!code --]
      base_url: http://localhost:65506/v1 # [!code ++]
      concurrency: 8
      model_name: MiniCPM4-8B # [!code --]
      model_name: surveycpm # [!code ++]
      retries: 3
    vllm:
      dtype: auto
      gpu_ids: 2,3
      gpu_memory_utilization: 0.9
      model_name_or_path: openbmb/MiniCPM4-8B
      trust_remote_code: true
  extra_params:
    chat_template_kwargs:
      enable_thinking: false
  sampling_params:
    max_tokens: 2048
    temperature: 0.7
    top_p: 0.8
  system_prompt: '' # [!code --]
  system_prompt: 'You are a professional UltraRAG Q&A assistant. Please remember to answer questions in Chinese.' # [!code ++]
prompt:
  surveycpm_extend_plan_template: prompt/surveycpm_extend_plan.jinja
  surveycpm_init_plan_template: prompt/surveycpm_init_plan.jinja
  surveycpm_search_template: prompt/surveycpm_search.jinja
  surveycpm_write_template: prompt/surveycpm_write.jinja
retriever: 
  backend: sentence_transformers # [!code --]
  backend: openai   # [!code ++]
  backend_configs:
    bm25:
      lang: en
      save_path: index/bm25
    infinity:
      bettertransformer: false
      model_warmup: false
      pooling_method: auto
      trust_remote_code: true
    openai:
      api_key: abc
      base_url: https://api.openai.com/v1 # [!code --]
      base_url: http://localhost:65504/v1 # [!code ++]
      model_name: text-embedding-3-small # [!code --]
      model_name: qwen-embedding # [!code ++]
    sentence_transformers:
      sentence_transformers_encode:
        encode_chunk_size: 256
        normalize_embeddings: false
        psg_prompt_name: document
        psg_task: null
        q_prompt_name: query
        q_task: null
      trust_remote_code: true
  batch_size: 16
  collection_name: wiki
  corpus_path: data/corpus_example.jsonl
  gpu_ids: '1'
  index_backend: faiss
  index_backend_configs:
    faiss:
      index_chunk_size: 10000
      index_path: index/index.index
      index_use_gpu: true
    milvus:
      id_field_name: id
      id_max_length: 64
      index_chunk_size: 1000
      index_params:
        index_type: AUTOINDEX
        metric_type: IP
      metric_type: IP
      search_params:
        metric_type: IP
        params: {}
      text_field_name: contents
      text_max_length: 60000
      token: null
      uri: index/milvus_demo.db
      vector_field_name: vector
  is_demo: false
  is_multimodal: false
  model_name_or_path: openbmb/MiniCPM-Embedding-Light
  query_instruction: '' # [!code --]
  query_instruction: 'Query: ' # [!code ++]
  top_k: 5 # [!code --]
  top_k: 20 # [!code ++]
```

## 4. Effect Demonstration

After configuration is complete, start the DeepResearch Pipeline in UltraRAG UI.

<Note>Since the generation of a 10,000-word review involves a large amount of concurrent retrieval and multi-turn reasoning, it usually takes more than 10 minutes. You can use the UI's background running function and come back to check the final report after the task is completed.</Note>


![](/images/demo/deepresearch.png)

================
File: pages/en/demo/lightresearch.mdx
================
---
title: "LightResearch"
icon: "lightbulb"
---

LightResearch is a lightweight implementation of Deep Research. It simulates an expert-level research workflow: it automatically generates a research plan, breaks the topic into sub-questions, retrieves evidence over multiple iterative rounds, and writes a long-form review.

## 1. Pipeline Structure Overview

```yaml examples/LightResearch.yaml icon="/images/yaml.svg"
# LightResearch Demo for UltraRAG UI

# MCP Server
servers:
  benchmark: servers/benchmark
  generation: servers/generation
  retriever: servers/retriever
  prompt: servers/prompt
  router: servers/router
  custom: servers/custom

# MCP Client Pipeline
pipeline:
- benchmark.get_data
- retriever.retriever_init
- generation.generation_init
- custom.init_citation_registry
- prompt.webnote_gen_plan
- generation.generate:
    output:
      ans_ls: plan_ls
- prompt.webnote_init_page
- generation.generate:
    output:
      ans_ls: page_ls
- loop:
    times: 10
    steps:
    - branch:
        router:
        - router.webnote_check_page
        branches:
          incomplete:
          - prompt.webnote_gen_subq
          - generation.generate:
              output:
                ans_ls: subq_ls
          - retriever.retriever_search:
              input:
                query_list: subq_ls
              output:
                ret_psg: psg_ls
          - custom.assign_citation_ids_stateful:
              input:
                ret_psg: psg_ls
              output:
                ret_psg: psg_ls
          - prompt.webnote_fill_page
          - generation.generate:
              output:
                ans_ls: page_ls
          complete: []
- prompt.webnote_gen_answer
- generation.generate

```
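The `loop` and `branch` structure in the pipeline above can be read as ordinary control flow. The following is a minimal sketch in plain Python, not UltraRAG's client code: `check_page` and `fill_page` are hypothetical stubs standing in for `router.webnote_check_page` and for the whole `incomplete` branch. Because `complete` maps to an empty list, the sketch simply does nothing in such an iteration; whether the real client also stops looping early is not stated here.

```python
from typing import Callable, Tuple


def run_webnote_loop(
    page: str,
    check_page: Callable[[str], str],
    fill_page: Callable[[str], str],
    times: int = 10,
) -> Tuple[str, int]:
    """Mirror the loop/branch layout of the LightResearch pipeline.

    check_page is a hypothetical stand-in for router.webnote_check_page and
    must return "incomplete" or "complete". fill_page is a hypothetical
    stand-in for the incomplete branch (sub-question generation, retrieval,
    citation assignment, page update).
    """
    rounds_used = 0
    for _ in range(times):           # loop: times: 10
        state = check_page(page)     # branch: router
        if state == "incomplete":    # branches: incomplete
            page = fill_page(page)
            rounds_used += 1
        # branches: complete is an empty list, so nothing runs here
    return page, rounds_used
```

With a stub router that reports `complete` after three fills, the function returns the filled page and a round count of 3, even though the loop itself still iterates ten times.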

## 2. Compile Pipeline File

Execute the following command to compile this complex workflow:

```shell
ultrarag build examples/LightResearch.yaml
```

## 3. Configure Running Parameters

Modify `examples/parameter/LightResearch_parameter.yaml`.

```yaml examples/parameter/LightResearch_parameter.yaml icon="/images/yaml.svg" 
benchmark:
  benchmark:
    key_map:
      gt_ls: golden_answers
      q_ls: question
    limit: -1
    name: nq
    path: data/sample_nq_10.jsonl
    seed: 42
    shuffle: false
generation:
  backend: vllm  # [!code --]
  backend: openai # [!code ++]
  backend_configs:
    hf:
      batch_size: 8
      gpu_ids: 2,3
      model_name_or_path: openbmb/MiniCPM4-8B
      trust_remote_code: true
    openai:
      api_key: abc
      base_delay: 1.0
      base_url: http://localhost:8000/v1 # [!code --]
      base_url: http://localhost:65503/v1 # [!code ++]
      concurrency: 8
      model_name: MiniCPM4-8B # [!code --]
      model_name: qwen3-32b # [!code ++]
      retries: 3
    vllm:
      dtype: auto
      gpu_ids: 2,3
      gpu_memory_utilization: 0.9
      model_name_or_path: openbmb/MiniCPM4-8B
      trust_remote_code: true
  extra_params:
    chat_template_kwargs:
      enable_thinking: false
  sampling_params:
    max_tokens: 2048
    temperature: 0.7
    top_p: 0.8
  system_prompt: '' # [!code --]
  system_prompt: 'You are a professional UltraRAG Q&A assistant. Please remember to answer questions in Chinese.' # [!code ++]
prompt:
  webnote_fill_page_template: prompt/webnote_fill_page.jinja # [!code --]
  webnote_fill_page_template: prompt/webnote_fill_page_citation.jinja # [!code ++]
  webnote_gen_answer_template: prompt/webnote_gen_answer.jinja # [!code --]
  webnote_gen_answer_template: prompt/webnote_gen_report.jinja # [!code ++]
  webnote_gen_plan_template: prompt/webnote_gen_plan.jinja
  webnote_gen_subq_template: prompt/webnote_gen_subq.jinja
  webnote_init_page_template: prompt/webnote_init_page.jinja
retriever: 
  backend: sentence_transformers # [!code --]
  backend: openai   # [!code ++]
  backend_configs:
    bm25:
      lang: en
      save_path: index/bm25
    infinity:
      bettertransformer: false
      model_warmup: false
      pooling_method: auto
      trust_remote_code: true
    openai:
      api_key: abc
      base_url: https://api.openai.com/v1 # [!code --]
      base_url: http://localhost:65504/v1 # [!code ++]
      model_name: text-embedding-3-small # [!code --]
      model_name: qwen-embedding # [!code ++]
    sentence_transformers:
      sentence_transformers_encode:
        encode_chunk_size: 256
        normalize_embeddings: false
        psg_prompt_name: document
        psg_task: null
        q_prompt_name: query
        q_task: null
      trust_remote_code: true
  batch_size: 16
  collection_name: wiki
  corpus_path: data/corpus_example.jsonl
  gpu_ids: '1'
  index_backend: faiss
  index_backend_configs:
    faiss:
      index_chunk_size: 10000
      index_path: index/index.index
      index_use_gpu: true
    milvus:
      id_field_name: id
      id_max_length: 64
      index_chunk_size: 1000
      index_params:
        index_type: AUTOINDEX
        metric_type: IP
      metric_type: IP
      search_params:
        metric_type: IP
        params: {}
      text_field_name: contents
      text_max_length: 60000
      token: null
      uri: index/milvus_demo.db
      vector_field_name: vector
  is_demo: false
  is_multimodal: false
  model_name_or_path: openbmb/MiniCPM-Embedding-Light
  query_instruction: '' # [!code --]
  query_instruction: 'Query: ' # [!code ++]
  top_k: 5
```

## 4. Effect Demonstration

After configuration is complete, start the LightResearch Pipeline in UltraRAG UI.

Unlike standard RAG, the system goes through multiple rounds of autonomous reasoning and retrieval before giving its final answer, and then generates an in-depth review with multi-level headings, detailed arguments, and accurate citations.

![](/images/demo/lightresearch.png)

================
File: pages/en/demo/llm.mdx
================
---
title: "LLM"
icon: "robot"
---

To quickly demonstrate Large Language Model (LLM) capabilities in UltraRAG UI, we provide a preset Pipeline. Before running it, compile the Pipeline and configure its parameters.

## 1. Pipeline Structure Overview

```yaml examples/LLM.yaml icon="/images/yaml.svg"
# LLM Demo for UltraRAG UI

# MCP Server
servers:
  benchmark: servers/benchmark
  prompt: servers/prompt
  generation: servers/generation

# MCP Client Pipeline
pipeline:
- benchmark.get_data
- prompt.qa_boxed
- generation.generation_init
- generation.generate
```

## 2. Compile Pipeline File

Execute the following command in the terminal to compile:

```shell
ultrarag build examples/LLM.yaml
```

## 3. Configure Running Parameters

Modify `examples/parameter/LLM_parameter.yaml` according to your environment needs. The following example shows how to switch the backend from `vLLM` to `OpenAI API` standard interface and adjust the model name and system prompt.

```yaml examples/parameter/LLM_parameter.yaml icon="/images/yaml.svg" 
benchmark:
  benchmark:
    key_map:
      gt_ls: golden_answers
      q_ls: question
    limit: -1
    name: nq
    path: data/sample_nq_10.jsonl
    seed: 42
    shuffle: false
generation:
  backend: vllm  # [!code --]
  backend: openai # [!code ++]
  backend_configs:
    hf:
      batch_size: 8
      gpu_ids: 2,3
      model_name_or_path: openbmb/MiniCPM4-8B
      trust_remote_code: true
    openai:
      api_key: abc
      base_delay: 1.0
      base_url: http://localhost:8000/v1 # [!code --]
      base_url: http://localhost:65503/v1 # [!code ++]
      concurrency: 8
      model_name: MiniCPM4-8B # [!code --]
      model_name: qwen3-32b # [!code ++]
      retries: 3
    vllm:
      dtype: auto
      gpu_ids: 2,3
      gpu_memory_utilization: 0.9
      model_name_or_path: openbmb/MiniCPM4-8B
      trust_remote_code: true
  extra_params:
    chat_template_kwargs:
      enable_thinking: false
  sampling_params:
    max_tokens: 2048
    temperature: 0.7
    top_p: 0.8
  system_prompt: '' # [!code --]
  system_prompt: 'You are a professional UltraRAG Q&A assistant. Please remember to answer questions in Chinese.' # [!code ++]
prompt:
  template: prompt/qa_boxed.jinja # [!code --]
  template: prompt/qa_simple.jinja # [!code ++]
```
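The `# [!code --]` and `# [!code ++]` comments in the parameter listings read as diff annotations: a `--` line is the generated default to remove, and the `++` line after it is the value to use instead. That reading is an assumption based on how the pairs appear in these examples. Under that assumption, a small helper (the name `apply_code_diff_markers` is hypothetical) shows what the edited file should contain:

```python
def apply_code_diff_markers(text: str) -> str:
    """Return the text that remains after applying the diff annotations."""
    kept = []
    for line in text.splitlines():
        if "# [!code --]" in line:
            continue  # removed line: drop it entirely
        if "# [!code ++]" in line:
            # added line: keep the content, strip the annotation
            line = line.split("# [!code ++]")[0].rstrip()
        kept.append(line)
    return "\n".join(kept)
```

Applied to the `generation` section above, this leaves a single `backend: openai` line and drops `backend: vllm`.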

## 4. Effect Demonstration

After configuration is complete, start UltraRAG UI and select LLM Pipeline in the interface to start interaction.

![](/images/demo/LLM.png)

================
File: pages/en/demo/rag.mdx
================
---
title: "RAG"
icon: "magnifying-glass"
---

To let you explore Retrieval-Augmented Generation (RAG) in UltraRAG UI, we provide a standardized RAG Pipeline. It covers the full chain of document retrieval, citation annotation, and augmented generation.

## 1. Pipeline Structure Overview

```yaml examples/RAG.yaml icon="/images/yaml.svg"
# RAG Demo for UltraRAG UI

# MCP Server
servers:
  benchmark: servers/benchmark
  retriever: servers/retriever
  prompt: servers/prompt
  generation: servers/generation
  evaluation: servers/evaluation
  custom: servers/custom

# MCP Client Pipeline
pipeline:
- benchmark.get_data
- retriever.retriever_init
- generation.generation_init
- retriever.retriever_search
- custom.assign_citation_ids
- prompt.qa_rag_boxed
- generation.generate
```

## 2. Compile Pipeline File

Execute the following command in the terminal to compile:

```shell
ultrarag build examples/RAG.yaml
```

## 3. Configure Running Parameters

Modify `examples/parameter/RAG_parameter.yaml`. In RAG scenarios, in addition to configuring the LLM generation backend, you also need to focus on configuring the Embedding retrieval backend.

```yaml examples/parameter/RAG_parameter.yaml icon="/images/yaml.svg" 
benchmark:
  benchmark:
    key_map:
      gt_ls: golden_answers
      q_ls: question
    limit: -1
    name: nq
    path: data/sample_nq_10.jsonl
    seed: 42
    shuffle: false
generation:
  backend: vllm  # [!code --]
  backend: openai # [!code ++]
  backend_configs:
    hf:
      batch_size: 8
      gpu_ids: 2,3
      model_name_or_path: openbmb/MiniCPM4-8B
      trust_remote_code: true
    openai:
      api_key: abc
      base_delay: 1.0
      base_url: http://localhost:8000/v1 # [!code --]
      base_url: http://localhost:65503/v1 # [!code ++]
      concurrency: 8
      model_name: MiniCPM4-8B # [!code --]
      model_name: qwen3-32b # [!code ++]
      retries: 3
    vllm:
      dtype: auto
      gpu_ids: 2,3
      gpu_memory_utilization: 0.9
      model_name_or_path: openbmb/MiniCPM4-8B
      trust_remote_code: true
  extra_params:
    chat_template_kwargs:
      enable_thinking: false
  sampling_params:
    max_tokens: 2048
    temperature: 0.7
    top_p: 0.8
  system_prompt: '' # [!code --]
  system_prompt: 'You are a professional UltraRAG Q&A assistant. Please remember to answer questions in Chinese.' # [!code ++]
prompt:
  template: prompt/qa_boxed.jinja # [!code --]
  template: prompt/qa_rag_citation.jinja # [!code ++]
retriever: 
  backend: sentence_transformers # [!code --]
  backend: openai   # [!code ++]
  backend_configs:
    bm25:
      lang: en
      save_path: index/bm25
    infinity:
      bettertransformer: false
      model_warmup: false
      pooling_method: auto
      trust_remote_code: true
    openai:
      api_key: abc
      base_url: https://api.openai.com/v1 # [!code --]
      base_url: http://localhost:65504/v1 # [!code ++]
      model_name: text-embedding-3-small # [!code --]
      model_name: qwen-embedding # [!code ++]
    sentence_transformers:
      sentence_transformers_encode:
        encode_chunk_size: 256
        normalize_embeddings: false
        psg_prompt_name: document
        psg_task: null
        q_prompt_name: query
        q_task: null
      trust_remote_code: true
  batch_size: 16
  collection_name: wiki
  corpus_path: data/corpus_example.jsonl
  gpu_ids: '1'
  index_backend: faiss
  index_backend_configs:
    faiss:
      index_chunk_size: 10000
      index_path: index/index.index
      index_use_gpu: true
    milvus:
      id_field_name: id
      id_max_length: 64
      index_chunk_size: 1000
      index_params:
        index_type: AUTOINDEX
        metric_type: IP
      metric_type: IP
      search_params:
        metric_type: IP
        params: {}
      text_field_name: contents
      text_max_length: 60000
      token: null
      uri: index/milvus_demo.db
      vector_field_name: vector
  is_demo: false
  is_multimodal: false
  model_name_or_path: openbmb/MiniCPM-Embedding-Light
  query_instruction: '' # [!code --]
  query_instruction: 'Query: ' # [!code ++]
  top_k: 5 # [!code --]
  top_k: 20 # [!code ++]

```

## 4. Effect Demonstration

After configuration is complete, start UltraRAG UI, select RAG Pipeline in the interface, and select the corresponding knowledge base. You will see how the LLM combines retrieved document fragments to give more accurate and cited answers.

![](/images/demo/RAG.png)

================
File: pages/en/develop_guide/case_study.mdx
================
---
title: "Case Study"
icon: "chart-line"
---

UltraRAG provides a convenient Case Study visualization mechanism to help researchers quickly check and analyze whether the built Pipeline works as expected.

After running the Pipeline, the system will automatically generate a memory log file in the output folder.
Just execute the following command to start the Case Study visualization webpage:

```shell
python ./script/case_study.py \
  --data output/memory_nq_rag_branch_20251014_152438.json \
  --host 127.0.0.1 \
  --port 8070 \
  --title "Case Study Viewer"
```

<Note>Please adjust parameters such as file path and port number according to the actual situation.</Note>
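Since the memory log name embeds the Pipeline name and a timestamp, it can be convenient to pick up the newest log automatically. The sketch below assumes that logs match `memory_*.json` inside the output folder, a pattern inferred only from the example file name above. The helper names are hypothetical, and the command mirrors the flags shown in the shell example.

```python
from pathlib import Path
from typing import List, Optional


def latest_memory_log(output_dir: str = "output") -> Optional[Path]:
    """Return the most recently modified memory_*.json file, or None."""
    candidates = sorted(
        Path(output_dir).glob("memory_*.json"),
        key=lambda p: p.stat().st_mtime,
    )
    return candidates[-1] if candidates else None


def case_study_command(
    data: Path, host: str = "127.0.0.1", port: int = 8070
) -> List[str]:
    """Build the argument list for the case study viewer shown above."""
    return [
        "python", "./script/case_study.py",
        "--data", str(data),
        "--host", host,
        "--port", str(port),
        "--title", "Case Study Viewer",
    ]
```

The returned list can be passed to `subprocess.run` from the project root once a log file has been found.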

After successful startup, you can access the corresponding address through a browser to enter the Case Study Viewer interface. The interface displays the inputs and outputs of each stage of the Pipeline in an intuitive visual way, facilitating the step-by-step review of the reasoning chain and intermediate states.

![](/images/develop_guide/case_study.png)

On the page, you can step through the Cases one by one and compare the input question, retrieval results, and model output for each. This makes it easier to locate potential problems, refine the Pipeline design, and verify that the reasoning process meets expectations.

================
File: pages/en/develop_guide/code_integration.mdx
================
---
title: "Code Integration"
icon: "code-merge"
---

Through two methods, **ToolCall** and **PipelineCall**, you can directly call UltraRAG's capabilities in your local code.

## ToolCall

When you only need a specific function of UltraRAG (such as data loading, encoding, retrieval, etc.) and do not need to run the full Pipeline, you can call it as a function via `ToolCall`.

<Tip>`ToolCall` requires using `initialize` to specify the Servers to be enabled first.</Tip>
<Tip>We recommend an absolute path for `server_root`, e.g., `/home/user/project/UltraRAG/servers`.</Tip>

```python script/api_usage_example.py
from ultrarag.api import initialize, ToolCall

# Enable only the Servers you intend to call.
initialize(["benchmark"], server_root="servers")

# Parameters for benchmark.get_data; they mirror the benchmark section
# of a Pipeline parameter file.
benchmark_param_dict = {
    "key_map": {
        "gt_ls": "golden_answers",
        "q_ls": "question",
    },
    "limit": -1,
    "seed": 42,
    "name": "nq",
    "path": "data/sample_nq_10.jsonl",
}
benchmark = ToolCall.benchmark.get_data(benchmark_param_dict)

```

<Note>The usage method is consistent with ordinary Python functions; just pass in the corresponding parameters as needed.</Note>


```python script/api_usage_example.py
from ultrarag.api import initialize, ToolCall

# Enable both Servers used below.
initialize(["benchmark", "retriever"], server_root="servers")

# Load the benchmark data and take the question list.
benchmark_param_dict = {
    "key_map": {
        "gt_ls": "golden_answers",
        "q_ls": "question",
    },
    "limit": -1,
    "seed": 42,
    "name": "nq",
    "path": "data/sample_nq_10.jsonl",
}
benchmark = ToolCall.benchmark.get_data(benchmark_param_dict)
query_list = benchmark["q_ls"]

# Initialize the retriever, overriding only the embedding model.
retriever_init_param_dict = {
    "model_name_or_path": "Qwen/Qwen3-Embedding-0.6B",
}
ToolCall.retriever.retriever_init(**retriever_init_param_dict)

# Retrieve the top 5 passages for each question.
result = ToolCall.retriever.retriever_search(
    query_list=query_list,
    top_k=5,
)
retrieve_passages = result["ret_psg"]

```

<Note>Only pass the parameters you wish to modify; other parameters will be automatically completed from the Server's default parameter file.</Note>


## PipelineCall

If you wish to run an entire UltraRAG Pipeline directly locally and obtain the execution results of all steps, you can use `PipelineCall`.

<Note>You need to first generate the `pipeline_parameter.yaml` and parameter files for the corresponding Pipeline via UltraRAG's `build` function.</Note>

```python script/api_usage_example.py
from ultrarag.api import PipelineCall

result = PipelineCall(
    pipeline_file="examples/rag_deploy.yaml",
    parameter_file="examples/parameter/rag_deploy_parameter.yaml",
)

final_step_result = result['final_result']
all_steps_result = result['all_results']

```

<Note>`final_result` is the result of the last step of the Pipeline, and `all_results` contains the results of all steps.</Note>

================
File: pages/en/develop_guide/dataset.mdx
================
---
title: "Evaluation Data"
icon: "table"
---

We have organized and preprocessed the most commonly used public evaluation datasets and corpora in current RAG research, and released them on both [ModelScope](https://modelscope.cn/datasets/UltraRAG/UltraRAG_Benchmark) and [Hugging Face](https://huggingface.co/datasets/UltraRAG/UltraRAG_Benchmark).
You can download and use them directly, without additional cleaning or conversion, and plug them straight into UltraRAG's evaluation pipeline.

## Benchmark

The following table summarizes the currently supported task types and statistical information of corresponding datasets:

| Task Type | Dataset Name | Raw Data Count | Eval Sample Count |
|:---|:---|:---|:---|
| QA | [NQ](https://huggingface.co/datasets/google-research-datasets/nq_open) | 3,610 | 1,000 |
| QA | [TriviaQA](https://nlp.cs.washington.edu/triviaqa/) | 11,313 | 1,000 |
| QA | [PopQA](https://huggingface.co/datasets/akariasai/PopQA) | 14,267 | 1,000 |
| QA | [AmbigQA](https://huggingface.co/datasets/sewon/ambig_qa) | 2,002 | 1,000 |
| QA | [MarcoQA](https://huggingface.co/datasets/microsoft/ms_marco/viewer/v2.1/validation) | 55,636 | 1,000 |
| QA | [WebQuestions](https://huggingface.co/datasets/stanfordnlp/web_questions) | 2,032 | 1,000 |
| VQA | [MP-DocVQA](https://huggingface.co/datasets/openbmb/VisRAG-Ret-Test-MP-DocVQA) | 591 | 591 |
| VQA | [ChartQA](https://huggingface.co/datasets/openbmb/VisRAG-Ret-Test-ChartQA) | 63 | 63 |
| VQA | [InfoVQA](https://huggingface.co/datasets/openbmb/VisRAG-Ret-Test-InfoVQA) | 718 | 718 |
| VQA | [PlotQA](https://huggingface.co/datasets/openbmb/VisRAG-Ret-Test-PlotQA) | 863 | 863 |
| Multi-hop QA | [HotpotQA](https://huggingface.co/datasets/hotpotqa/hotpot_qa) | 7,405 | 1,000 |
| Multi-hop QA | [2WikiMultiHopQA](https://www.dropbox.com/scl/fi/heid2pkiswhfaqr5g0piw/data.zip?e=2&file_subpath=%2Fdata&rlkey=ira57daau8lxfj022xvk1irju) | 12,576 | 1,000 |
| Multi-hop QA | [Musique](https://drive.google.com/file/d/1tGdADlNjWFaHLeZZGShh2IRcpO6Lv24h/view) | 2,417 | 1,000 |
| Multi-hop QA | [Bamboogle](https://huggingface.co/datasets/chiayewken/bamboogle) | 125 | 125 |
| Multi-hop QA | [StrategyQA](https://huggingface.co/datasets/tasksource/strategy-qa) | 2,290 | 1,000 |
| Multi-hop VQA | [SlideVQA](https://huggingface.co/datasets/openbmb/VisRAG-Ret-Test-SlideVQA) | 556 | 556 |
| Multiple-choice | [ARC](https://huggingface.co/datasets/allenai/ai2_arc) | 3,548 | 1,000 |
| Multiple-choice | [MMLU](https://huggingface.co/datasets/cais/mmlu) | 14,042 | 1,000 |
| Multiple-choice VQA | [ArXivQA](https://huggingface.co/datasets/openbmb/VisRAG-Ret-Test-ArxivQA) | 816 | 816 |
| Long-form QA | [ASQA](https://huggingface.co/datasets/din0s/asqa) | 948 | 948 |
| Fact-verification | [FEVER](https://fever.ai/dataset/fever.html) | 13,332 | 1,000 |
| Dialogue | [WoW](https://huggingface.co/datasets/facebook/kilt_tasks) | 3,054 | 1,000 |
| Slot-filling | [T-REx](https://huggingface.co/datasets/facebook/kilt_tasks) | 5,000 | 1,000 |

**Data Format Specification**

To ensure full compatibility with UltraRAG modules, we recommend storing test data as `.jsonl` files that follow the format specifications below.

Non-multiple-choice data format:

```json icon="/images/json.svg"
{
  "id": 0, 
  "question": "where does the karate kid 2010 take place", 
  "golden_answers": ["China", "Beijing", "Beijing, China"], 
  "meta_data": {} 
}
```

Multiple-choice data format:

```json icon="/images/json.svg"
{
  "id": 0, 
  "question": "Mast Co. converted from the FIFO method for inventory valuation to the LIFO method for financial statement and tax purposes. During a period of inflation would Mast's ending inventory and income tax payable using LIFO be higher or lower than FIFO? Ending inventory Income tax payable", 
  "golden_answers": ["A"], 
  "choices": ["Lower Lower", "Higher Higher", "Lower Higher", "Higher Lower"], 
  "meta_data": {"subject": "professional_accounting"}
}
```
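Before running an evaluation, it can help to check a custom `.jsonl` file against the two record formats above. The following is a minimal sketch, not part of UltraRAG. It assumes that `id`, `question`, `golden_answers`, and `meta_data` are all required, and that multiple-choice answers are letters indexing into `choices` (`"A"` for the first choice), as the example suggests. The function names are hypothetical.

```python
import json
from typing import Dict, List

# Assumption: every record carries these four fields.
REQUIRED_FIELDS = ("id", "question", "golden_answers", "meta_data")


def validate_record(record: dict) -> List[str]:
    """Return the problems found in one benchmark record (empty if none)."""
    problems = [
        f"missing field: {name}" for name in REQUIRED_FIELDS if name not in record
    ]
    answers = record.get("golden_answers", [])
    if not isinstance(answers, list) or not answers:
        problems.append("golden_answers must be a non-empty list")
        answers = []
    choices = record.get("choices")
    if choices is not None:
        # Assumption: answers are letters, "A" -> choices[0], "B" -> choices[1], ...
        valid_letters = {chr(ord("A") + i) for i in range(len(choices))}
        for answer in answers:
            if not isinstance(answer, str) or answer not in valid_letters:
                problems.append(f"answer {answer!r} does not match any choice")
    return problems


def validate_jsonl(path: str) -> Dict[int, List[str]]:
    """Map 1-based line numbers to the problems found on that line."""
    report = {}
    with open(path, encoding="utf-8") as handle:
        for line_no, line in enumerate(handle, start=1):
            if not line.strip():
                continue  # skip blank lines
            problems = validate_record(json.loads(line))
            if problems:
                report[line_no] = problems
    return report
```

An empty report from `validate_jsonl` means every line parsed as JSON and passed these checks; it says nothing about whether the answers themselves are correct.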

## Corpus

UltraRAG provides multi-source, high-quality standardized corpora, covering both text and image modalities, facilitating the construction of multi-scenario RAG systems.
The following is statistical information on currently collected corpora:

| Corpus Name | Document Count |
|:---|:---|
| Wiki-2018 | 21,015,324 |
| Wiki-2024 | 30,463,973 |
| MP-DocVQA | 741 |
| ChartQA | 500 |
| InfoVQA | 459 |
| PlotQA | 9,593 |
| SlideVQA | 1,284 |
| ArXivQA | 8,066 |

**Data Format Specification**

Text corpus format:

```json icon="/images/json.svg"
{
  "id": "15106858", 
  "contents": "Arrowhead Stadium 1970s practice would eventually spread to the other NFL stadiums as the 1970s progressed, finally becoming mandatory league-wide in the 1978 season (after being used in Super Bowl XII), and become almost near-universal at the lower levels of football. On January 20, 1974, Arrowhead Stadium hosted the Pro Bowl. Due to an ice storm and brutally cold temperatures the week leading up to the game, the game's participants worked out at the facilities of the San Diego Chargers. On game day, the temperature soared to 41 F, melting most of the ice and snow that accumulated during the week. The AFC defeated the NFC, 15–13."
}
```

Image corpus format:

```json icon="/images/json.svg"
{
  "id": 0, 
  "image_id": "37313.jpeg", 
  "image_path": "image/37313.jpg"
}
```
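To bring your own passages into the text corpus format, one record per line with `id` and `contents` is enough according to the example above. The sketch below writes such a file. The helper name is hypothetical, and storing `id` as a string follows the text example rather than a documented requirement.

```python
import json
from typing import Iterable, Tuple


def write_text_corpus(passages: Iterable[Tuple[object, str]], path: str) -> int:
    """Write (id, contents) pairs as one JSON object per line; return the count."""
    count = 0
    with open(path, "w", encoding="utf-8") as handle:
        for doc_id, contents in passages:
            record = {"id": str(doc_id), "contents": contents}
            # ensure_ascii=False keeps non-ASCII text readable in the file
            handle.write(json.dumps(record, ensure_ascii=False) + "\n")
            count += 1
    return count
```

The resulting file can then be referenced from `corpus_path` in the retriever parameters, in the same way as `data/corpus_example.jsonl` in the demo configurations.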

================
File: pages/en/develop_guide/debug.mdx
================
---
title: "Debugging"
icon: "bug"
---

To develop and debug UltraRAG more efficiently, you can use the built-in debugging function of VSCode to run specific Pipeline configuration files and perform breakpoint debugging on the execution process.

## Step 1: Compile and Configure Parameters

Taking `examples/rag_full.yaml` as an example, first compile the Pipeline and complete the parameter configuration. This process is the same as in [Quick Start](/pages/en/getting_started/quick_start), except that you do not need to execute the run command here.

## Step 2: Create Debug Configuration File

Create a `.vscode/launch.json` file in the project root directory and write the following content:

```json .vscode/launch.json icon="/images/json.svg" highlight="11"
{
    "configurations": [
        {
            "name": "UltraRAG Debug",
            "type": "debugpy",
            "request": "launch",
            "program": "${workspaceFolder}/src/ultrarag/client.py",
            "console": "integratedTerminal",
            "args": [
                "run",
                "${workspaceFolder}/examples/rag_full.yaml" 
            ],
            "cwd": "${workspaceFolder}"
        }
    ]
}
```

<Note>If you want to debug other Pipelines, just replace the path in args with the corresponding YAML file path.</Note>

## Step 3: Start Debugging

1. Open the Debug panel on the left side of VSCode (shortcut Ctrl+Shift+D).
2. Select **UltraRAG Debug** in the debug configuration.
3. Click the green ▶ Start button to run and debug in the integrated terminal within VSCode.

You can set breakpoints in any Python file to observe the execution flow and variable states step by step.

================
File: pages/en/develop_guide/parallel.mdx
================
---
title: "Parallel Experiment"
icon: "chart-waterfall"
---

During experiments, we often need to perform parallel or batch experiments on the same Pipeline using different hyperparameter configurations. UltraRAG provides a flexible parameter file mechanism that supports quick configuration switching without modifying the main Pipeline.

## Step 1: Compile Pipeline

Consistent with the regular running process, first compile the Pipeline:

```shell
ultrarag build examples/rag_full.yaml
```

## Step 2: Run

After modifying the corresponding fields in the generated parameter file, execute the following command directly to run:

```shell
ultrarag run examples/rag_full.yaml
```

## Step 3: Create New Parameter File

If you want to test different parameter combinations on the same Pipeline, you can create a new parameter file, for example `examples/parameter/rag_full_parameter_new.yaml` (the file name can be customized).

Then, when running the command, specify the configuration file using the `--param` argument:

```shell
ultrarag run examples/rag_full.yaml --param examples/parameter/rag_full_parameter_new.yaml
```

In this way, the system will execute the same Pipeline using the new parameter file, making it easy to perform parallel and batch comparisons of multiple experiments.
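To launch several parameter files against the same Pipeline, the `--param` command above can be generated and run from a short script. This is a sketch under assumptions: the function names are hypothetical, and whether concurrent runs are safe depends on your setup, since runs that share `gpu_ids` or index paths may interfere with each other.

```python
import subprocess
from concurrent.futures import ThreadPoolExecutor
from typing import List, Sequence


def build_run_commands(pipeline: str, param_files: Sequence[str]) -> List[List[str]]:
    """One `ultrarag run <pipeline> --param <file>` command per parameter file."""
    return [["ultrarag", "run", pipeline, "--param", param] for param in param_files]


def run_all(commands: Sequence[Sequence[str]], max_workers: int = 2) -> List[int]:
    """Run the commands concurrently and return their exit codes in order."""
    def run_one(cmd: Sequence[str]) -> int:
        return subprocess.run(list(cmd)).returncode

    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        return list(pool.map(run_one, commands))
```

For example, `run_all(build_run_commands("examples/rag_full.yaml", [...]))` with your own list of parameter files starts one process per file, at most `max_workers` at a time. Setting `max_workers=1` turns this into a sequential batch.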

================
File: pages/en/getting_started/installation.mdx
================
---
title: "Installation"
icon: "download"
---

UltraRAG provides two installation methods: local source code installation (we recommend `uv` for package management) and Docker container deployment.

## Source Code Installation

We strongly recommend using [uv](https://github.com/astral-sh/uv) to manage the Python environment and dependencies, as it can greatly improve installation speed.

**Prepare Environment**

If you haven't installed uv yet, install it with either of the following commands:

```shell
# Option 1: install with pip
pip install uv
# Option 2: install with the standalone install script
curl -LsSf https://astral.sh/uv/install.sh | sh
```

**Download Source Code**

```shell
git clone https://github.com/OpenBMB/UltraRAG.git --depth 1
cd UltraRAG
```

**Install Dependencies**

Please choose a synchronization method according to your usage scenario:

- **Core Dependencies**: If you only need to run basic core functions, such as using only UltraRAG UI:
  ```shell
  uv sync
  ```

- **Full Installation**: If you want to fully experience UltraRAG's retrieval, generation, corpus processing, and evaluation functions, please run:
  ```shell
  uv sync --extra retriever --extra generation --extra corpus --extra evaluation
  ```
- **On-Demand Installation**: If you only need to run specific modules, retain the corresponding `--extra` as needed, for example:

  ```shell
  uv sync --extra retriever   # Retrieval module only
  uv sync --extra generation  # Generation module only
  ```

## Docker Container Deployment

If you don't want to configure a local Python environment, you can use Docker for a one-click start.

```shell
# 1. Download code
git clone https://github.com/OpenBMB/UltraRAG.git --depth 1
cd UltraRAG
# 2. Build image
docker build -t ultrarag:latest .
# 3. Start container (-p maps container port 5050 to host port 5050)
docker run -it --gpus all -p 5050:5050 ultrarag:latest
```
<Tip>The container automatically runs UltraRAG UI after startup. You can access `http://localhost:5050` directly in your browser.</Tip>

## Verify Installation

After installation is complete, run the following example command to check if the environment is normal:

```shell
ultrarag run examples/experiments/sayhello.yaml
```

Seeing the following output indicates a successful installation:

```
Hello, UltraRAG v3!
```

================
File: pages/en/getting_started/introduction.mdx
================
---
title: "Introduction"
icon: "hand-wave"
---

## UltraRAG

UltraRAG is the first lightweight RAG development framework designed based on the [Model Context Protocol (MCP)](https://modelcontextprotocol.io/docs/getting-started/intro) architecture, specifically built for research exploration and industrial prototyping. It standardizes core RAG components (such as [Retriever](/pages/en/rag_servers/retriever), [Generation](/pages/en/rag_servers/generation), etc.) into independent MCP Servers, enabling flexible extension based on function-level Tool interfaces. Leveraging the process scheduling capabilities of the MCP Client, developers can achieve precise orchestration of complex control structures (such as conditions, loops, etc.) through YAML configuration. Furthermore, the system supports seamless migration of algorithmic logic to conversational demo interfaces, greatly optimizing the full-link efficiency of complex RAG system development.

<p align="center">
  <img src="/images/getting_started/intro.png" width="95%" />
</p>
<Columns cols={2}>

  <Card title="📑 Pipeline · Process Definition">
    **Core Blueprint**: Task logic written by users via YAML, defining the execution order and business logic of each component, realizing configurable inference processes.
  </Card>
  <Card title="🕹️ Client · Scheduling Hub">
    **Command Center**: Responsible for parsing Pipeline configurations, unifying the coordination of tool calls and data transfer between Servers, ensuring precise process execution.
  </Card>
  <Card title="⚙️ Server · Functional Execution">
    **Capability Carrier**: Standardizes core functions into independent services, supporting rapid extension and flexible combination of new modules through simple interfaces.
  </Card>
  <Card title="🖥️ UI · Interactive Demo">
    **Visual Portal**: Instantly converts logic defined in YAML into an intuitive conversational Web UI with a single command, significantly improving system debugging efficiency and demonstration effects.
  </Card>
</Columns>

## Why UltraRAG?

RAG systems are undergoing a paradigm shift from static chain concatenation to autonomous reasoning systems, increasingly relying on the model's active reasoning, dynamic retrieval, and conditional decision-making. However, traditional frameworks often face bottlenecks such as lack of flexibility, deep module coupling, and loose structures when dealing with multi-turn interactions and dynamic updates, making it difficult for researchers to efficiently reproduce and horizontally compare results.

UltraRAG aims to break this deadlock by providing developers with a standardized, decoupled, and minimalist new development paradigm:

<Card title="🚀 Low-Code Orchestration of Complex Processes">
  **Inference Orchestration**: Natively supports control structures such as serial, loop, and conditional branching. Developers only need to write a YAML configuration file to implement complex iterative RAG logic within dozens of lines of code.
</Card>

<Card title="⚡ Modular Extension and Reproduction">
  **Atomic Servers**: Decouples functions into independent Servers based on the MCP architecture. New functions only need to be registered as function-level Tools to seamlessly integrate into the process, achieving extremely high reusability.
</Card>

<Card title="📊 Unified Evaluation and Benchmark Comparison">
  **Research Efficiency**: Built-in standardized evaluation processes, ready-to-use mainstream scientific research [Benchmarks](/pages/en/develop_guide/dataset). Through unified metric management and baseline integration, the reproducibility of experiments and comparison efficiency are significantly improved.
</Card>

<Card title="✨ Rapid Prototyping of Interactions">
  **One-Click Delivery**: Say goodbye to tedious UI development. With just one command, Pipeline logic can be instantly converted into an interactable conversational Web UI, shortening the distance from algorithm to demonstration.
</Card>

================
File: pages/en/getting_started/quick_start.mdx
================
---
title: "Quick Start"
icon: "rocket-launch"
---

This section will help you quickly understand how to run a complete RAG Pipeline based on UltraRAG. The usage process of UltraRAG mainly includes the following three stages:

- Write Pipeline configuration file
- Compile Pipeline and adjust parameters
- Run Pipeline

In addition, you can also analyze and evaluate the running results through visualization tools.

<Tip>If you haven't installed UltraRAG yet, please refer to [Installation](/pages/en/getting_started/installation).</Tip>

<Tip>For a more complete RAG development practice, please check the full documentation.</Tip>

## Step 1: Write Pipeline Configuration File

<Info>Please ensure that the current working directory is located at the UltraRAG root directory</Info>

Create and write your Pipeline configuration file in the `examples` folder, for example:

```yaml examples/rag_full.yaml icon="/images/yaml.svg"
# Vanilla RAG with Corpus Indexing Demo

# MCP Server
servers:
  benchmark: servers/benchmark
  retriever: servers/retriever
  prompt: servers/prompt
  generation: servers/generation
  evaluation: servers/evaluation
  custom: servers/custom

# MCP Client Pipeline
pipeline:
- benchmark.get_data
- retriever.retriever_init
- retriever.retriever_embed
- retriever.retriever_index
- retriever.retriever_search
- generation.generation_init
- prompt.qa_rag_boxed
- generation.generate
- custom.output_extract_from_boxed
- evaluation.evaluate
```

UltraRAG's Pipeline configuration file needs to include the following two parts:

- `servers`: Declares the modules (Servers) that the current process depends on. For example, the `retriever` Server is required for the retrieval stage.
- `pipeline`: Defines the order in which the functions (Tools) of each Server are called. This example shows a complete process covering data loading, corpus encoding, index construction, retrieval, generation, and evaluation.
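
The relationship between the two sections can be checked mechanically. The following is a minimal sketch, not UltraRAG's own loader: it assumes the configuration has already been parsed into a Python dictionary, and the helper name `undeclared_servers` is hypothetical. It verifies that every `server.tool` step refers to a Server declared under `servers`.

```python
# Illustrative only: a plain-dict copy of the configuration above.
config = {
    "servers": {
        "benchmark": "servers/benchmark",
        "retriever": "servers/retriever",
        "prompt": "servers/prompt",
        "generation": "servers/generation",
        "evaluation": "servers/evaluation",
        "custom": "servers/custom",
    },
    "pipeline": [
        "benchmark.get_data",
        "retriever.retriever_init",
        "retriever.retriever_embed",
        "retriever.retriever_index",
        "retriever.retriever_search",
        "generation.generation_init",
        "prompt.qa_rag_boxed",
        "generation.generate",
        "custom.output_extract_from_boxed",
        "evaluation.evaluate",
    ],
}


def undeclared_servers(config):
    """Return server names used by pipeline steps but missing from `servers`."""
    declared = set(config["servers"])
    missing = []
    for step in config["pipeline"]:
        # A step is either a "server.tool" string or a single-key mapping
        # such as {"generation.generate": {...}}. Control structures
        # (loop, branch) have no dot in their key and are skipped here.
        name = step if isinstance(step, str) else next(iter(step))
        if "." not in name:
            continue
        server = name.split(".", 1)[0]
        if server not in declared and server not in missing:
            missing.append(server)
    return missing


print(undeclared_servers(config))  # prints [] because every server is declared
```

A non-empty result would point to a step whose Server was never listed, which is an easy mistake to make when copying steps between Pipeline files.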

## Step 2: Compile Pipeline and Adjust Parameters

Before running the pipeline, you need to configure its runtime parameters. UltraRAG provides a `build` command that automatically generates the complete parameter file required by the current Pipeline.
It reads each Server's `parameter.yaml` file, collects every parameter used by this Pipeline, and consolidates them into a single standalone configuration file. Execute the following command:

```shell
ultrarag build examples/rag_full.yaml
```

After execution, the terminal will output content as follows:

![](/images/getting_started/rag_build.png)
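
Conceptually, the consolidation step can be pictured as below. This is a simplified sketch under stated assumptions rather than the actual `build` implementation: each Server's `parameter.yaml` is represented as an already-parsed dictionary, the whole dictionary is copied instead of being filtered per Tool, and `consolidate_parameters` is a hypothetical helper name.

```python
def consolidate_parameters(pipeline_steps, server_parameters):
    """Collect the parameters of every Server referenced by the pipeline.

    pipeline_steps: list of "server.tool" strings.
    server_parameters: mapping of server name -> parsed parameter dict.
    """
    merged = {}
    for step in pipeline_steps:
        server = step.split(".", 1)[0]
        # A Server without parameters still appears, as an empty mapping.
        merged.setdefault(server, dict(server_parameters.get(server, {})))
    return merged


steps = [
    "benchmark.get_data",
    "retriever.retriever_init",
    "custom.output_extract_from_boxed",
]
params = {
    "benchmark": {"benchmark": {"name": "nq", "limit": -1}},
    "retriever": {"top_k": 5},
    "generation": {"backend": "vllm"},  # not referenced by `steps`, so omitted
}
merged = consolidate_parameters(steps, params)
print(sorted(merged))
```

Only Servers that the Pipeline actually calls end up in the result, which mirrors why the generated file contains an entry such as `custom: {}` for a Server that has no parameters.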

The system will generate the corresponding parameter configuration file in the `examples/parameters/` folder. Open the file and modify relevant parameters according to the actual situation, for example:

```yaml examples/parameters/rag_full_parameter.yaml icon="/images/yaml.svg"
benchmark:
  benchmark:
    key_map:
      gt_ls: golden_answers
      q_ls: question
    limit: -1
    name: nq
    path: data/sample_nq_10.jsonl
    seed: 42
    shuffle: false
custom: {}
evaluation:
  metrics:
  - acc
  - f1
  - em
  - coverem
  - stringem
  - rouge-1
  - rouge-2
  - rouge-l
  save_path: output/evaluate_results.json
generation:
  backend: vllm
  backend_configs:
    hf:
      batch_size: 8
      gpu_ids: 2,3
      model_name_or_path: openbmb/MiniCPM4-8B
      trust_remote_code: true
    openai:
      api_key: 'abc'
      base_delay: 1.0
      base_url: http://localhost:8000/v1
      concurrency: 8
      model_name: MiniCPM4-8B
      retries: 3
    vllm:
      dtype: auto
      gpu_ids: 5
      gpu_memory_utilization: 0.5
      model_name_or_path: openbmb/MiniCPM4-8B  # [!code --]
      model_name_or_path: Qwen/Qwen3-8B # [!code ++]
      trust_remote_code: true
  extra_params:
    chat_template_kwargs:
      enable_thinking: false
  sampling_params:
    max_tokens: 2048
    temperature: 0.7
    top_p: 0.8
  system_prompt: ''
prompt:
  template: prompt/qa_boxed.jinja # [!code --]
  template: prompt/qa_rag_boxed.jinja  # [!code ++]
retriever:
  backend: sentence_transformers
  backend_configs:
    bm25:
      lang: en
      save_path: index/bm25
    infinity:
      bettertransformer: false
      model_warmup: false
      pooling_method: auto
      trust_remote_code: true
    openai:
      api_key: 'abc'
      base_url: https://api.openai.com/v1
      model_name: text-embedding-3-small
    sentence_transformers:
      sentence_transformers_encode:
        encode_chunk_size: 256
        normalize_embeddings: false
        psg_prompt_name: document
        psg_task: null
        q_prompt_name: query
        q_task: null
      trust_remote_code: true
  batch_size: 16
  collection_name: wiki
  corpus_path: data/corpus_example.jsonl
  embedding_path: embedding/embedding.npy
  gpu_ids: '5'
  index_backend: faiss
  index_backend_configs:
    faiss:
      index_chunk_size: 10000
      index_path: index/index.index
      index_use_gpu: true
    milvus:
      id_field_name: id
      id_max_length: 64
      index_chunk_size: 1000
      index_params:
        index_type: AUTOINDEX
        metric_type: IP
      metric_type: IP
      search_params:
        metric_type: IP
        params: {}
      text_field_name: contents
      text_max_length: 60000
      token: null
      uri: index/milvus_demo.db
      vector_field_name: vector
  is_demo: false
  is_multimodal: false
  model_name_or_path: openbmb/MiniCPM-Embedding-Light  # [!code --]
  model_name_or_path: Qwen/Qwen3-Embedding-0.6B  # [!code ++]
  overwrite: false
  query_instruction: ''
  top_k: 5
```

You can modify the parameters according to the actual situation, for example:

- Adjust `template` to the RAG template `prompt/qa_rag_boxed.jinja`;
- Replace `model_name_or_path` of the retriever and generator with the paths to your locally downloaded models;
- If running in a multi-GPU environment, modify `gpu_ids` to match available devices.
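
If you prefer to script such edits instead of changing the file by hand, the idea can be sketched as follows. This is only an illustration: it assumes the parameter file has been loaded into a nested dictionary (shown here as a trimmed literal), and `set_param` is a hypothetical helper, not part of UltraRAG.

```python
def set_param(params, dotted_key, value):
    """Set a nested value such as 'generation.backend_configs.vllm.gpu_ids'."""
    *parents, leaf = dotted_key.split(".")
    node = params
    for key in parents:
        node = node[key]  # raises KeyError if the path does not exist
    node[leaf] = value
    return params


# Trimmed copy of the generated parameters, for illustration only.
params = {
    "prompt": {"template": "prompt/qa_boxed.jinja"},
    "generation": {
        "backend_configs": {
            "vllm": {"gpu_ids": 5, "model_name_or_path": "openbmb/MiniCPM4-8B"}
        }
    },
    "retriever": {
        "gpu_ids": "5",
        "model_name_or_path": "openbmb/MiniCPM-Embedding-Light",
    },
}

set_param(params, "prompt.template", "prompt/qa_rag_boxed.jinja")
set_param(params, "generation.backend_configs.vllm.model_name_or_path", "Qwen/Qwen3-8B")
set_param(params, "retriever.model_name_or_path", "Qwen/Qwen3-Embedding-0.6B")
print(params["prompt"]["template"])
```

Raising `KeyError` on an unknown path is a deliberate choice in this sketch: it catches typos in parameter names instead of silently adding new keys.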

## Step 3: Run Pipeline

When the parameter configuration is complete, you can run the entire process with a single command:

```shell
ultrarag run examples/rag_full.yaml
```

The system will sequentially execute the various Servers and Tools defined in the configuration file, and output running logs and progress information in the terminal in real time:

![](/images/getting_started/rag_run.png)

After running, the results (such as generated content, evaluation reports, etc.) will be automatically saved in the corresponding output path, such as `output/memory_nq_rag_full_20251010_145420.json` in this example, which can be directly used for subsequent analysis and visual display.

## Step 4: Visual Analysis Case Study

After completing the process run, you can quickly analyze the generation results through the built-in visualization tool. Execute the following command to start the Case Study Viewer:

```shell
python ./script/case_study.py \
  --data output/memory_nq_rag_full_20251010_145420.json \
  --host 127.0.0.1 \
  --port 8080 \
  --title "Case Study Viewer"
```

After the viewer starts, the terminal displays the access address. Open it in a browser to interactively browse and analyze the results in the Case Study Viewer.
An example of the interface is shown below:

![](/images/getting_started/case1.png)
![](/images/getting_started/case2.png)

## Summary

At this point, you have completed the full RAG practice process from **Pipeline Configuration**, **Parameter Compilation** to **Process Running** and **Visual Analysis**.
UltraRAG makes the construction, operation, and analysis of RAG systems more efficient, intuitive, and reproducible through a modular MCP architecture and a unified evaluation system.

Based on this, you can:

- Replace different models or retrievers to explore various combination effects;
- Customize new Servers and Tools to extend system functions;
- Use the evaluation module to quickly compare experimental results and conduct systematic research.

================
File: pages/en/getting_started/update.mdx
================
---
title: "Changelog"
icon: "sparkles"
---

<Update label="Jan 12, 2026" description="UltraRAG v2.1.3" tags={["Releases"]}>
  **2026-01-12**

  <Card
    title="UltraRAG v2.1.3 Update"
    icon="github"
    href="https://github.com/OpenBMB/UltraRAG/releases/tag/v0.2.1.3"
    arrow="true"
    cta="View Release"
  >
    Improved system stability and fixed implementation bugs, such as those in Search-o1.
  </Card>
</Update>

<Update label="Nov 25, 2025" description="UltraRAG v2.1.2" tags={["Releases"]}>
  **2025-11-25**

  <Card
    title="UltraRAG v2.1.2 Update"
    icon="github"
    href="https://github.com/OpenBMB/UltraRAG/releases/tag/v0.2.1.2"
    arrow="true"
    cta="View Release"
  >
    Provided ToolCall and PipelineCall features for direct local code invocation.
  </Card>
</Update>

<Update label="Nov 13, 2025" description="UltraRAG v2.1.1" tags={["Releases"]}>
  **2025-11-13**

  <Card
    title="UltraRAG v2.1.1 Update"
    icon="github"
    href="https://github.com/OpenBMB/UltraRAG/releases/tag/v0.2.1.1"
    arrow="true"
    cta="View Release"
  >
    Decoupled retrieval and indexing architecture and supported Milvus/Faiss, comprehensively improving stability and flexibility.
  </Card>
</Update>

<Update label="Oct 22, 2025" description="UltraRAG v2.1" tags={["Releases"]}>
  **2025-10-22**

  <Card
    title="UltraRAG v2.1 Update"
    icon="github"
    href="https://github.com/OpenBMB/UltraRAG/releases/tag/v0.2.1"
    arrow="true"
    cta="View Release"
  >
    RAG Servers fully upgraded — refactored document parsing and knowledge base construction processes, enhanced multi-modal RAG capabilities, and supported more backend frameworks.
  </Card>
</Update>

<Update label="Aug 28, 2025" description="UltraRAG v2.0" tags={["Releases"]}>
  **2025-08-28**

  <Card
    title="UltraRAG v2.0 Update"
    icon="github"
    href="https://github.com/OpenBMB/UltraRAG/releases/tag/v0.2.0"
    arrow="true"
    cta="View Release"
  >
    Low-code RAG framework based on MCP, helping researchers build complex processes more efficiently and achieve innovative R&D.
  </Card>
</Update>

================
File: pages/en/pipeline/light_deepresearch.mdx
================
---
title: "WebNote"
icon: "cloud"
---

<Note>We recorded an explanatory video for this Demo: [📺 bilibili](https://www.bilibili.com/video/BV1p8JfziEwM/?spm_id_from=333.337.search-card.all.click).</Note>

## What is DeepResearch?

Deep Research (also known as Agentic Deep Research) refers to an intelligent research agent where a Large Language Model (LLM) collaborates with tools (such as search, browser, code execution, memory storage, etc.) to complete complex research tasks in a closed loop of "multi-turn reasoning → retrieval → verification → fusion".

Unlike single-retrieval RAG (Retrieval-Augmented Generation), Deep Research works more like a human expert: it first makes a plan, then keeps exploring, adjusting direction, and verifying information, and finally outputs a well-structured report with cited sources.

## Prerequisites

This tutorial builds the example on the UltraRAG framework. Since many users may not have access to compute servers, the entire process is implemented on a MacBook Air (M2) to keep the environment lightweight and easy to reproduce.

### API Preparation

- Retrieval API: We use [Tavily Web Search](https://www.tavily.com/). You can get 1000 free calls upon initial registration.
- LLM API: You can choose any large model service according to your habits. In this tutorial, we use gpt-5-nano as an example.

### API Settings

We provide two ways to pass the API Key: environment variables and explicit parameters. Environment variables are recommended as they are safer and avoid API Key leakage in logs.

In the UltraRAG root directory, rename the template file `.env.dev` to `.env`, and fill in your key information, for example:

```
LLM_API_KEY="your llm key"
TAVILY_API_KEY="your retriever key"
```
UltraRAG will automatically read this file and load relevant configurations at startup.
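
To make the file format concrete, here is a small sketch of how `KEY="value"` lines like the ones above can be turned into environment variables. It is not UltraRAG's loader (how the framework reads `.env` internally is not shown here), and `parse_env_text` is a hypothetical helper that only handles this simple form.

```python
import os


def parse_env_text(text):
    """Parse simple KEY="value" lines; blank lines and # comments are skipped."""
    values = {}
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#") or "=" not in line:
            continue
        key, _, value = line.partition("=")
        values[key.strip()] = value.strip().strip('"').strip("'")
    return values


env_text = 'LLM_API_KEY="your llm key"\nTAVILY_API_KEY="your retriever key"\n'

for key, value in parse_env_text(env_text).items():
    # setdefault keeps a value that is already exported in the shell.
    os.environ.setdefault(key, value)

print(sorted(parse_env_text(env_text)))
```

Because the keys live in the environment rather than in the parameter YAML, they do not end up in files that are committed or shared.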

## Pipeline Introduction

In this example, we will implement a lightweight Deep Research Pipeline. It has the following basic functions:

- Plan Formulation: The model first formulates a solution plan based on the user's question;
- Sub-question Generation and Retrieval: Decompose big questions into retrievable sub-questions and call Web search tools to obtain relevant information;
- Report Organization and Filling: Gradually improve the content of the research report;
- Reasoning and Final Generation: After the report is completed, the model gives the final answer.

The flow chart is shown below:

![](/images/pipeline/light_deep/pipe.png)

The pipeline is mainly divided into two stages:

1. **Initialization Phase:** The model generates a plan based on the user's question and constructs an initial report page accordingly.

2. **Iterative Filling Phase:**

   - The system checks whether the current report page is fully filled.
   - The criterion is whether the string "to be filled" still appears in the page.
   - If the report is not yet complete, the model generates a new sub-question from the user's question, the plan, and the current page, and triggers Web retrieval.
   - The retrieved documents are used to update the page, and the next round of checking begins.
   - This process repeats until the page is filled.

Finally, the model generates a complete answer based on the user's question and the final report page.
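
The control flow described above can be sketched in plain Python. This is an illustration of the logic only, not the code UltraRAG runs: the model and search calls are passed in as ordinary functions, every function name here is hypothetical, and the toy stand-ins at the bottom exist only so the sketch can be executed.

```python
PLACEHOLDER = "to be filled"


def page_is_complete(page):
    """Completion criterion from the text: no placeholder string remains."""
    return PLACEHOLDER not in page


def run_webnote(question, gen_plan, init_page, gen_subq, search,
                fill_page, gen_answer, max_rounds=10):
    # Initialization phase: plan first, then an initial report page.
    plan = gen_plan(question)
    page = init_page(question, plan)
    # Iterative filling phase, bounded by max_rounds.
    for _ in range(max_rounds):
        if page_is_complete(page):
            break
        subq = gen_subq(question, plan, page)
        passages = search(subq)
        page = fill_page(question, plan, page, subq, passages)
    return gen_answer(question, page)


# Toy stand-ins for the LLM and the web search, for demonstration only.
def toy_plan(question):
    return "1. Overview\n2. Regions"


def toy_init_page(question, plan):
    return "Overview: to be filled\nRegions: to be filled"


def toy_subq(question, plan, page):
    pending = [line for line in page.splitlines() if PLACEHOLDER in line]
    return pending[0].split(":")[0]


def toy_search(subq):
    return [f"notes about {subq}"]


def toy_fill(question, plan, page, subq, passages):
    return page.replace(f"{subq}: {PLACEHOLDER}", f"{subq}: {passages[0]}", 1)


def toy_answer(question, page):
    return page


answer = run_webnote("Introduce Teyvat Continent", toy_plan, toy_init_page,
                     toy_subq, toy_search, toy_fill, toy_answer)
print(answer)
```

The round limit matters in practice: if the model never removes the last placeholder, the loop still terminates and the final answer is produced from whatever the page contains at that point.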

The implementation of this example is concise and relies mainly on custom extensions to the router and prompt Tools. Interested users can view the source code directly. The complete pipeline definition is as follows:

```yaml examples/webnote_websearch.yaml icon="/images/yaml.svg"
# MCP Server
servers:
  benchmark: servers/benchmark
  generation: servers/generation
  retriever: servers/retriever
  prompt: servers/prompt
  router: servers/router

# MCP Client Pipeline
pipeline:
- benchmark.get_data
- generation.generation_init
- prompt.webnote_gen_plan
- generation.generate:
    output:
      ans_ls: plan_ls
- prompt.webnote_init_page
- generation.generate:
    output:
      ans_ls: page_ls
- loop:
    times: 10
    steps:
    - branch:
        router:
        - router.webnote_check_page
        branches:
          incomplete:
          - prompt.webnote_gen_subq
          - generation.generate:
              output:
                ans_ls: subq_ls
          - retriever.retriever_websearch:
              input:
                query_list: subq_ls
              output:
                ret_psg: psg_ls
          - prompt.webnote_fill_page
          - generation.generate:
              output:
                ans_ls: page_ls
          complete: []
- prompt.webnote_gen_answer
- generation.generate
```

## Run

### Construct Question Data

First, create a new file named `sample_light_ds.jsonl` under the data folder and write the questions you want to research. For example:

```json data/sample_light_ds.jsonl icon="/images/json.svg"
{"id": 0, "question": "Introduce Teyvat Continent", "golden_answers": [], "meta_data": {}}
```
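
When you have several questions, the file can also be generated with a few lines of Python. The sketch below only reproduces the record layout shown above; `make_record` and `to_jsonl` are hypothetical helper names, and the write call is left as a comment so that the snippet has no side effects.

```python
import json


def make_record(idx, question, golden_answers=None):
    """Build one record with the same four fields as the example above."""
    return {
        "id": idx,
        "question": question,
        "golden_answers": list(golden_answers or []),
        "meta_data": {},
    }


def to_jsonl(questions):
    """Serialize one JSON object per line, numbering the questions from 0."""
    lines = [
        json.dumps(make_record(i, q), ensure_ascii=False)
        for i, q in enumerate(questions)
    ]
    return "\n".join(lines) + "\n"


jsonl_text = to_jsonl(["Introduce Teyvat Continent"])
print(jsonl_text, end="")

# To create the file:
# with open("data/sample_light_ds.jsonl", "w", encoding="utf-8") as f:
#     f.write(jsonl_text)
```

`golden_answers` stays empty here because open-ended research questions have no reference answer to evaluate against.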

### Construct Parameter Configuration File

Execute the following command to generate the parameter file corresponding to the pipeline:

```shell
ultrarag build examples/webnote_websearch.yaml
```

Modify parameters according to actual conditions, for example:

```yaml examples/parameters/webnote_websearch_parameter.yaml icon="/images/yaml.svg"
benchmark:
  benchmark:
    key_map:
      gt_ls: golden_answers
      q_ls: question
    limit: -1
    name: nq # [!code --]
    path: data/sample_nq_10.jsonl # [!code --]
    name: ds # [!code ++]
    path: data/sample_light_ds.jsonl # [!code ++]
    seed: 42
    shuffle: false
generation:
  backend: vllm # [!code --]
  backend: openai # [!code ++]
  backend_configs:
    hf:
      batch_size: 8
      gpu_ids: 2,3
      model_name_or_path: openbmb/MiniCPM4-8B
      trust_remote_code: true
    openai:
      api_key: abc
      base_delay: 1.0
      base_url: http://localhost:8000/v1 # [!code --]
      base_url: https://api.openai.com/v1 # [!code ++]
      concurrency: 8
      model_name: MiniCPM4-8B # [!code --]
      model_name: gpt-5-nano # [!code ++]
      retries: 3
    vllm:
      dtype: auto
      gpu_ids: 2,3
      gpu_memory_utilization: 0.9
      model_name_or_path: openbmb/MiniCPM4-8B
      trust_remote_code: true
  extra_params:
    chat_template_kwargs:
      enable_thinking: false
  sampling_params:
    max_tokens: 2048
    temperature: 0.7
    top_p: 0.8 # [!code --]
  system_prompt: ''
prompt:
  webnote_fill_page_template: prompt/webnote_fill_page.jinja
  webnote_gen_answer_template: prompt/webnote_gen_answer.jinja
  webnote_gen_plan_template: prompt/webnote_gen_plan.jinja
  webnote_gen_subq_template: prompt/webnote_gen_subq.jinja
  webnote_init_page_template: prompt/webnote_init_page.jinja
retriever:
  retrieve_thread_num: 1
  top_k: 5
  websearch_backend: tavily
  websearch_backend_configs:
    tavily:
      api_key: ""
      retries: 3
      base_delay: 1.0
      search_kwargs: {}
```


### Start

Before running, make sure your API keys are set as described in API Settings. Then start the pipeline:
```shell
ultrarag run examples/webnote_websearch.yaml
```

After running, you can visually view the generated content through the Case Study Viewer:

```shell
python ./script/case_study.py \
  --data output/memory_ds_light_deepresearch_20250909_152727.json \
  --host 127.0.0.1 \
  --port 8070 \
  --title "Case Study Viewer"
```

This will open the result page in the browser, allowing you to intuitively analyze the execution process and generated content of the pipeline.

![](/images/pipeline/light_deep/result.png)

================
File: pages/en/pipeline/rag.mdx
================
---
title: "Vanilla RAG"
icon: "ice-cream"
---

<Note>We recorded an explanatory video for this Demo: [📺 bilibili](https://www.bilibili.com/video/BV1B9apz4E7K/?share_source=copy_web&vd_source=7035ae721e76c8149fb74ea7a2432710).</Note>

## What is RAG?

> Imagine you are taking an open-book exam. You are the large language model yourself, capable of understanding questions and writing answers.
> But you can't remember all the knowledge points. At this time, you are allowed to bring a textbook or reference book into the exam room — this is retrieval.
> When you find relevant content in the book, and then combine it with your own understanding to write the answer, the answer is both accurate and grounded.
> This is RAG — Retrieval-Augmented Generation.

RAG (Retrieval-Augmented Generation) is a technology that allows Large Language Models (LLMs) to "retrieve" relevant documents or knowledge bases before "generating" answers, and then combine this information to generate responses.

### Process

**Retrieval Phase**: Find the most relevant content from the document library (such as knowledge bases, web pages, etc.) based on user questions;
![](/images/pipeline/rag/retrieve_stage.png)

**Generation Phase**: Use the retrieved content as context and input it to the LLM, allowing it to generate answers based on this information.
![](/images/pipeline/rag/gen_stage.png)

### Benefits

- Improve accuracy and reduce "hallucinations"
- Keep answers up to date and domain-specific without retraining the model
- Enhance credibility

## Corpus Encoding and Indexing

Before using RAG, original documents need to be converted into vector representations and a retrieval index needs to be established. In this way, when a user asks a question, the system can quickly find the most relevant content in the large-scale corpus.
- **Embedding**: Convert natural language text into vectors so that computers can compare semantic similarities mathematically.
- **Indexing**: Organize these vectors, for example using FAISS, so that retrieval can instantly find the most relevant entries among millions of documents.

![](/images/pipeline/rag/emb_index.png)
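
The two ideas can be illustrated with a toy example in plain Python. This is a sketch for intuition only: the three-dimensional vectors are made up by hand (a real embedding model produces them), the search is brute force, and none of the function names come from UltraRAG. Libraries such as FAISS perform the same kind of similarity search far more efficiently at scale.

```python
import math


def normalize(vec):
    """Scale a vector to unit length so inner product equals cosine similarity."""
    norm = math.sqrt(sum(x * x for x in vec))
    return [x / norm for x in vec] if norm else list(vec)


def inner_product(a, b):
    return sum(x * y for x, y in zip(a, b))


def search(query_vec, index, top_k=2):
    """Brute-force search: score every stored vector, return the best ids."""
    q = normalize(query_vec)
    scored = [(inner_product(q, vec), doc_id) for doc_id, vec in index.items()]
    scored.sort(reverse=True)
    return [doc_id for _, doc_id in scored[:top_k]]


# Hand-made "embeddings"; the ids are illustrative.
index = {
    "2066692": normalize([0.9, 0.1, 0.0]),
    "15106858": normalize([0.8, 0.3, 0.1]),
    "other": normalize([0.0, 0.2, 0.9]),
}

print(search([1.0, 0.2, 0.0], index))  # prints ['2066692', '15106858']
```

The query vector points in nearly the same direction as the first two documents, so they rank highest, while the unrelated third document is left out of the top 2.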

### Example Corpus (Wiki Text)

```json data/corpus_example.jsonl icon="/images/json.svg"
{"id": "2066692", "contents": "Truman Sports Complex The Harry S. Truman Sports...."}
{"id": "15106858", "contents": "Arrowhead Stadium 1970s...."}
```

This is a typical Wiki corpus, where `id` is the unique identifier of the document, and `contents` is the actual text content. We will vectorize `contents` and build an index later.

### Write Encoding and Indexing Pipeline

```yaml examples/corpus_index.yaml icon="/images/yaml.svg"
# MCP Server
servers:
  retriever: servers/retriever

# MCP Client Pipeline
pipeline:
- retriever.retriever_init
- retriever.retriever_embed
- retriever.retriever_index
```
Here a minimal three-step process is defined: Initialization → Embedding → Indexing.

### Compile Pipeline File

```shell
ultrarag build examples/corpus_index.yaml
```

### Modify Parameter File

```yaml examples/parameters/corpus_index_parameter.yaml icon="/images/yaml.svg" 
retriever:
  backend: sentence_transformers
  backend_configs:
    bm25:
      lang: en
      save_path: index/bm25
    infinity:
      bettertransformer: false
      model_warmup: false
      pooling_method: auto
      trust_remote_code: true
    openai:
      api_key: abc
      base_url: https://api.openai.com/v1
      model_name: text-embedding-3-small
    sentence_transformers:
      sentence_transformers_encode:
        encode_chunk_size: 256
        normalize_embeddings: false
        psg_prompt_name: document
        psg_task: null
        q_prompt_name: query
        q_task: null
      trust_remote_code: true
  batch_size: 16
  collection_name: wiki
  corpus_path: data/corpus_example.jsonl
  embedding_path: embedding/embedding.npy
  gpu_ids: '1'
  index_backend: faiss
  index_backend_configs:
    faiss:
      index_chunk_size: 10000
      index_path: index/index.index
      index_use_gpu: true
    milvus:
      id_field_name: id
      id_max_length: 64
      index_chunk_size: 1000
      index_params:
        index_type: AUTOINDEX
        metric_type: IP
      metric_type: IP
      search_params:
        metric_type: IP
        params: {}
      text_field_name: contents
      text_max_length: 60000
      token: null
      uri: index/milvus_demo.db
      vector_field_name: vector
  is_demo: false
  is_multimodal: false
  model_name_or_path: openbmb/MiniCPM-Embedding-Light # [!code --]
  model_name_or_path: Qwen/Qwen3-Embedding-0.6B # [!code ++]
  overwrite: false
```

### Run Pipeline File

```shell
ultrarag run examples/corpus_index.yaml
```

The encoding and indexing phase usually processes a large corpus and can take a long time. It is recommended to use `screen` or `nohup` to keep the task running in the background, for example:

```shell
nohup ultrarag run examples/corpus_index.yaml > log.txt 2>&1 &
```

After successful execution, you will get the corresponding corpus vector and index files, which can be directly used by the subsequent RAG Pipeline for retrieval.

## Build RAG Pipeline

After the corpus index is ready, the next step is to combine the Retriever and the Large Language Model (LLM) into a complete RAG Pipeline: the Retriever finds documents relevant to each question, and the model uses them to generate the final answer.

### Retrieval Process

![](/images/pipeline/rag/retrieve.png)

### Generation Process

![](/images/pipeline/rag/gen_stage.png)

### Data Format (Taking NQ Dataset as Example)

```json data/sample_nq_10.jsonl icon="/images/json.svg"
{"id": 0, "question": "when was the last time anyone was on the moon", "golden_answers": ["14 December 1972 UTC", "December 1972"], "meta_data": {}}
{"id": 1, "question": "who wrote he ain't heavy he's my brother lyrics", "golden_answers": ["Bobby Scott", "Bob Russell"], "meta_data": {}}
{"id": 2, "question": "how many seasons of the bastard executioner are there", "golden_answers": ["one", "one season"], "meta_data": {}}
{"id": 3, "question": "when did the eagles win last super bowl", "golden_answers": ["2017"], "meta_data": {}}
{"id": 4, "question": "who won last year's ncaa women's basketball", "golden_answers": ["South Carolina"], "meta_data": {}}
```
Each sample contains a question, standard answers (`golden_answers`), and additional information (`meta_data`), which will be used as input and evaluation benchmarks later.
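
The benchmark parameters map these fields through `key_map` (`q_ls: question`, `gt_ls: golden_answers`). One plausible reading of that mapping is sketched below; it is an assumption about the behavior, not the actual `benchmark.get_data` code, and `load_columns` is a hypothetical helper.

```python
import json


def load_columns(jsonl_text, key_map):
    """Turn JSONL records into column lists named by the keys of key_map."""
    records = [json.loads(line) for line in jsonl_text.splitlines() if line.strip()]
    return {
        target: [record[source] for record in records]
        for target, source in key_map.items()
    }


sample = (
    '{"id": 0, "question": "when was the last time anyone was on the moon", '
    '"golden_answers": ["14 December 1972 UTC", "December 1972"], "meta_data": {}}\n'
    '{"id": 3, "question": "when did the eagles win last super bowl", '
    '"golden_answers": ["2017"], "meta_data": {}}\n'
)

columns = load_columns(sample, {"q_ls": "question", "gt_ls": "golden_answers"})
print(columns["q_ls"])
print(columns["gt_ls"])
```

Under this reading, a dataset with different field names only needs a different `key_map`, while the rest of the Pipeline keeps working with `q_ls` and `gt_ls`.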

### Write RAG Pipeline

```yaml examples/rag.yaml icon="/images/yaml.svg"
# MCP Server
servers:
  benchmark: servers/benchmark
  retriever: servers/retriever
  prompt: servers/prompt
  generation: servers/generation
  evaluation: servers/evaluation
  custom: servers/custom

# MCP Client Pipeline
pipeline:
- benchmark.get_data
- retriever.retriever_init
- retriever.retriever_search
- generation.generation_init
- prompt.qa_rag_boxed
- generation.generate
- custom.output_extract_from_boxed
- evaluation.evaluate
```

The entire process completes sequentially:
1. Read data → 2. Initialize retriever and search → 3. Start LLM service → 4. Assemble Prompt → 5. Generate answer → 6. Extract result → 7. Evaluate performance.
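
Step 6 depends on the model wrapping its final answer in `\boxed{...}`, which is what the `qa_rag_boxed` prompt asks for by name. A minimal sketch of such an extraction follows. It is an illustration rather than the source of `custom.output_extract_from_boxed`; taking the last box and returning `None` when no box is found are assumptions of this sketch.

```python
def extract_boxed(text):
    """Return the content of the last \\boxed{...} in text, or None."""
    marker = "\\boxed{"
    start = text.rfind(marker)
    if start == -1:
        return None
    depth = 1
    chars = []
    for ch in text[start + len(marker):]:
        if ch == "{":
            depth += 1
        elif ch == "}":
            depth -= 1
            if depth == 0:
                # Matching close brace of the box: stop here.
                return "".join(chars).strip()
        chars.append(ch)
    return None  # braces never balanced


print(extract_boxed("The last landing was in \\boxed{December 1972}."))
```

Counting brace depth instead of using a simple regular expression keeps nested content such as `\boxed{\frac{1}{2}}` intact.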

### Compile Pipeline File

```shell
ultrarag build examples/rag.yaml
```

### Modify Parameter File (Specify Dataset, Model, and Retrieval Configuration)

```yaml examples/parameters/rag_parameter.yaml icon="/images/yaml.svg"
benchmark:
  benchmark:
    key_map:
      gt_ls: golden_answers
      q_ls: question
    limit: -1
    name: nq
    path: data/sample_nq_10.jsonl
    seed: 42
    shuffle: false
custom: {}
evaluation:
  metrics:
  - acc
  - f1
  - em
  - coverem
  - stringem
  - rouge-1
  - rouge-2
  - rouge-l
  save_path: output/evaluate_results.json
generation:
  backend: vllm
  backend_configs:
    hf:
      batch_size: 8
      gpu_ids: 2,3
      model_name_or_path: openbmb/MiniCPM4-8B
      trust_remote_code: true
    openai:
      api_key: abc
      base_delay: 1.0
      base_url: http://localhost:8000/v1
      concurrency: 8
      model_name: MiniCPM4-8B
      retries: 3
    vllm:
      dtype: auto
      gpu_ids: 2,3
      gpu_memory_utilization: 0.9
      model_name_or_path: openbmb/MiniCPM4-8B  # [!code --]
      model_name_or_path: Qwen/Qwen3-8B # [!code ++]
      trust_remote_code: true
  extra_params:
    chat_template_kwargs:
      enable_thinking: false
  sampling_params:
    max_tokens: 2048
    temperature: 0.7
    top_p: 0.8
  system_prompt: ''
prompt:
  template: prompt/qa_boxed.jinja # [!code --]
  template: prompt/qa_rag_boxed.jinja  # [!code ++]
retriever:
  backend: sentence_transformers
  backend_configs:
    bm25:
      lang: en
      save_path: index/bm25
    infinity:
      bettertransformer: false
      model_warmup: false
      pooling_method: auto
      trust_remote_code: true
    openai:
      api_key: abc
      base_url: https://api.openai.com/v1
      model_name: text-embedding-3-small
    sentence_transformers:
      sentence_transformers_encode:
        encode_chunk_size: 256
        normalize_embeddings: false
        psg_prompt_name: document
        psg_task: null
        q_prompt_name: query
        q_task: null
      trust_remote_code: true
  batch_size: 16
  collection_name: wiki
  corpus_path: data/corpus_example.jsonl
  gpu_ids: '1'
  index_backend: faiss
  index_backend_configs:
    faiss:
      index_chunk_size: 10000
      index_path: index/index.index
      index_use_gpu: true
    milvus:
      id_field_name: id
      id_max_length: 64
      index_chunk_size: 1000
      index_params:
        index_type: AUTOINDEX
        metric_type: IP
      metric_type: IP
      search_params:
        metric_type: IP
        params: {}
      text_field_name: contents
      text_max_length: 60000
      token: null
      uri: index/milvus_demo.db
      vector_field_name: vector
  is_demo: false
  is_multimodal: false
  model_name_or_path: openbmb/MiniCPM-Embedding-Light  # [!code --]
  model_name_or_path: Qwen/Qwen3-Embedding-0.6B  # [!code ++]
  query_instruction: ''
  top_k: 5
```
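
For orientation, the `em` and `f1` entries under `evaluation.metrics` are commonly defined as in the sketch below, which follows the widely used SQuAD-style normalization (lowercase, strip punctuation and articles, collapse whitespace). UltraRAG's own implementation may differ in details, so treat this as a reference for the idea rather than for exact scores.

```python
import re
import string
from collections import Counter


def normalize_answer(text):
    """Lowercase, drop punctuation and articles, collapse whitespace."""
    text = text.lower()
    text = "".join(ch for ch in text if ch not in string.punctuation)
    text = re.sub(r"\b(a|an|the)\b", " ", text)
    return " ".join(text.split())


def exact_match(prediction, golden_answers):
    """1.0 if the prediction equals any golden answer after normalization."""
    pred = normalize_answer(prediction)
    return float(any(pred == normalize_answer(g) for g in golden_answers))


def f1_score(prediction, golden_answers):
    """Best token-level F1 between the prediction and any golden answer."""
    pred_tokens = normalize_answer(prediction).split()
    best = 0.0
    for gold in golden_answers:
        gold_tokens = normalize_answer(gold).split()
        overlap = sum((Counter(pred_tokens) & Counter(gold_tokens)).values())
        if overlap == 0:
            continue
        precision = overlap / len(pred_tokens)
        recall = overlap / len(gold_tokens)
        best = max(best, 2 * precision * recall / (precision + recall))
    return best


golds = ["14 December 1972 UTC", "December 1972"]
print(exact_match("December 1972", golds))
print(round(f1_score("in December 1972", golds), 2))
```

Exact match rewards only a full match, while F1 gives partial credit when the prediction contains extra or missing words, which is why the two are usually reported together.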

### Run Pipeline File

```shell
ultrarag run examples/rag.yaml
```

### View Generation Results

Use the visualization script to quickly browse model outputs.

```shell
python ./script/case_study.py \
  --data output/memory_nq_rag_full_20251010_145420.json \
  --host 127.0.0.1 \
  --port 8080 \
  --title "Case Study Viewer"
```

================
File: pages/en/pipeline/search_o1.mdx
================
---
title: "Search-o1"
icon: "searchengin" 
---

## Introduction

Search-o1 proposes a framework that combines large reasoning models with **Agentic Retrieval-Augmented Generation (Agentic RAG)** and a **Reason-in-Documents** module. When the model encounters a knowledge gap during reasoning, it actively retrieves external information, refines it, and injects the result into the reasoning chain, thereby improving reasoning accuracy and robustness on complex tasks such as science, mathematics, and programming.

<Note>Paper link: [Arxiv](https://arxiv.org/abs/2501.05366).</Note>

### Process

![](/images/pipeline/searcho1/suanfa.png)

In short, Search-o1 starts reasoning with the original question; once an information gap is identified, it generates a sub-question and triggers retrieval; then it refines the retrieved response, extracting key information to inject back into the reasoning process until a credible final answer is formed.
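
The routing decision at the heart of this loop can be sketched as follows. The `<|end_search_query|>` marker and the `retrieve` / `stop` branch names appear in the configuration in the Reproduction section; the opening marker `<|begin_search_query|>` and both function names are assumptions made for this illustration, which is not the code behind `router.search_o1_check`.

```python
BEGIN = "<|begin_search_query|>"  # assumed opening marker
END = "<|end_search_query|>"      # stop string listed in the parameter file


def extract_search_query(output):
    """Return the pending query if generation stopped on END, else None."""
    text = output.rstrip()
    if not text.endswith(END):
        return None
    start = text.rfind(BEGIN)
    if start == -1:
        return None
    return text[start + len(BEGIN):-len(END)].strip()


def route(output):
    """Choose the branch: retrieve when a query is pending, otherwise stop."""
    return "retrieve" if extract_search_query(output) is not None else "stop"


step = "I need the exact date. <|begin_search_query|>last crewed moon landing date<|end_search_query|>"
print(route(step), "->", extract_search_query(step))
print(route("So the answer is \\boxed{December 1972}."))
```

Keeping the stop string in the output is what makes this check possible: the router can tell a generation that paused to search from one that finished with a final answer.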

## Reproduction

### Write Pipeline

Based on the above logic, the following Pipeline can be written:

```yaml examples/search_o1.yaml icon="/images/yaml.svg"
# Search-o1 Demo

# MCP Servers
servers:
  benchmark: servers/benchmark
  generation: servers/generation
  retriever: servers/retriever
  prompt: servers/prompt
  evaluation: servers/evaluation
  router: servers/router
  custom: servers/custom

# MCP Client Pipeline
pipeline:
- benchmark.get_data
- retriever.retriever_init
- generation.generation_init
- custom.search_o1_init_list
- prompt.search_o1_init
- generation.generate
- loop:
    times: 10
    steps:
    - branch:
        router:
        - router.search_o1_check
        branches:
          retrieve:
          - custom.search_o1_query_extract
          - retriever.retriever_search:
              input:
                query_list: extract_query_list
          - custom.search_o1_reasoning_extract
          - custom.search_o1_combine_list
          - prompt.search_o1_reasoning_indocument
          - generation.generate
          - custom.search_o1_extract_final_information
          - custom.search_o1_combine_final_information
          - prompt.search_o1_insert
          - generation.generate
          stop: []   
- custom.output_extract_from_boxed
- evaluation.evaluate
```

### Compile Pipeline File

```shell
ultrarag build examples/search_o1.yaml
```

### Modify Parameter File

```yaml examples/parameters/search_o1_parameter.yaml icon="/images/yaml.svg" 
benchmark:
  benchmark:
    key_map:
      gt_ls: golden_answers
      q_ls: question
    limit: -1
    name: nq
    path: data/sample_nq_10.jsonl
    seed: 42
    shuffle: false
custom: {}
evaluation:
  metrics:
  - acc
  - f1
  - em
  - coverem
  - stringem
  - rouge-1
  - rouge-2
  - rouge-l
  save_path: output/evaluate_results.json
generation:
  backend: vllm
  backend_configs:
    hf:
      batch_size: 8
      gpu_ids: 2,3
      model_name_or_path: openbmb/MiniCPM4-8B
      trust_remote_code: true
    openai:
      api_key: abc
      base_delay: 1.0
      base_url: http://localhost:8000/v1
      concurrency: 8
      model_name: MiniCPM4-8B
      retries: 3
    vllm:
      dtype: auto
      gpu_ids: 2,3
      gpu_memory_utilization: 0.9
      model_name_or_path: openbmb/MiniCPM4-8B # [!code --]
      model_name_or_path: Qwen/QwQ-32B # [!code ++]
      trust_remote_code: true
  extra_params:
    chat_template_kwargs: # [!code --]
      enable_thinking: false # [!code --]
    top_k: 20 # [!code ++]
    repetition_penalty: 1.05 # [!code ++]
    include_stop_str_in_output: true # [!code ++]
    stop: # [!code ++]
    - <|im_end|> # [!code ++]
    - <|end_search_query|> # [!code ++]
  sampling_params:
    max_tokens: 2048 # [!code --]
    max_tokens: 32768 # [!code ++]
    temperature: 0.7
    top_p: 0.8
  system_prompt: ''
prompt:
  searcho1_reasoning_template: prompt/search_o1_reasoning.jinja
  searcho1_refine_template: prompt/search_o1_refinement.jinja
retriever:
  backend: sentence_transformers
  backend_configs:
    bm25:
      lang: en
      save_path: index/bm25
    infinity:
      bettertransformer: false
      model_warmup: false
      pooling_method: auto
      trust_remote_code: true
    openai:
      api_key: abc
      base_url: https://api.openai.com/v1
      model_name: text-embedding-3-small
    sentence_transformers:
      sentence_transformers_encode:
        encode_chunk_size: 256
        normalize_embeddings: false
        psg_prompt_name: document
        psg_task: null
        q_prompt_name: query
        q_task: null
      trust_remote_code: true
  batch_size: 16
  collection_name: wiki
  corpus_path: data/corpus_example.jsonl
  gpu_ids: '1'
  index_backend: faiss
  index_backend_configs:
    faiss:
      index_chunk_size: 10000
      index_path: index/index.index
      index_use_gpu: true
    milvus:
      id_field_name: id
      id_max_length: 64
      index_chunk_size: 1000
      index_params:
        index_type: AUTOINDEX
        metric_type: IP
      metric_type: IP
      search_params:
        metric_type: IP
        params: {}
      text_field_name: contents
      text_max_length: 60000
      token: null
      uri: index/milvus_demo.db
      vector_field_name: vector
  is_demo: false
  is_multimodal: false
  model_name_or_path: openbmb/MiniCPM-Embedding-Light # [!code --]
  model_name_or_path: Qwen/Qwen3-Embedding-0.6B # [!code ++]
  query_instruction: ''
  top_k: 5
```

### Run Pipeline File

```shell
ultrarag run examples/search_o1.yaml
```

================
File: pages/en/pipeline/visrag.mdx
================
---
title: "VisRAG"
icon: "eyes"
---

## Introduction
A common everyday scenario: you buy a new device, do not know how to set up a certain function, and paging through the manual is slow and tedious. An intelligent assistant that can answer such questions directly would greatly improve the experience.


For example, suppose a user has purchased a Nikon Z7 camera and wants to know in which scenarios the electronic vibration reduction function is unavailable.
Asking an LLM directly yields the following answer:

```
The electronic vibration reduction function of Nikon Z7 is mainly used to provide help in scenarios such as still life and portrait shooting that require stable images. However, in certain specific scenarios, the electronic vibration reduction function is unavailable. According to

According to Nikon official documents and common usage scenarios, the electronic vibration reduction function is unavailable in the following cases:

1. Shooting moving objects: Since the electronic vibration reduction function adjusts the shutter speed according to the camera's movement, when shooting moving objects, the electronic vibration reduction function will become unstable and cannot work properly.

2. Using long exposure: When shooting with long exposure, camera shake becomes obvious, and the electronic vibration reduction function cannot effectively compensate for this shake.

3. Using long exposure in low light environments: When shooting long exposure in low light environments, camera shake becomes more obvious, and the electronic vibration reduction function cannot effectively compensate for this shake.

Therefore, the electronic vibration reduction function of Nikon Z7 is unavailable in scenarios such as shooting moving objects, using long exposure, and using long exposure in low light environments.
```
As this example shows, the model produces a fluent answer, but because it has no real document to rely on, it often produces "plausible but incorrect" results.

To solve this problem, UltraRAG implements a multi-modal retrieval-augmented Pipeline based on [VisRAG](https://arxiv.org/abs/2410.10594).
Instead of relying on cumbersome text parsing, it feeds screenshots of the relevant document pages directly into a vision-language model, enabling document Q&A grounded in visual semantics.

## Build Personal Knowledge Base

Take the Nikon user manual as an example. You can [click here to download](https://download.nikonimglib.com/archive4/ywJ4K00fa2Lr05vv5OS00pV5Hg36/Z7Z6UM_TH(Sc)07.pdf) the PDF file.

We use UltraRAG's Corpus Server to convert this PDF directly into an image corpus:

```yaml examples/build_image_corpus.yaml icon="/images/yaml.svg"
# MCP Server
servers:
  corpus: servers/corpus

# MCP Client Pipeline
pipeline:
- corpus.build_image_corpus
```

Execute the following command:

```shell
ultrarag build examples/build_image_corpus.yaml
```

Modify parameters as follows:

```yaml examples/parameters/build_image_corpus_parameter.yaml icon="/images/yaml.svg"
corpus:
  image_corpus_save_path: corpora/image.jsonl
  parse_file_path: data/UltraRAG.pdf # [!code --]
  parse_file_path: data/nikon.pdf # [!code ++]
```

Run Pipeline:
```shell
ultrarag run examples/build_image_corpus.yaml
```

After execution, the image corpus file will be automatically generated:
```json corpora/image.jsonl icon="/images/json.svg"
{"id": 0, "image_id": "nikon/page_0.jpg", "image_path": "image/nikon/page_0.jpg"}
{"id": 1, "image_id": "nikon/page_1.jpg", "image_path": "image/nikon/page_1.jpg"}
{"id": 2, "image_id": "nikon/page_2.jpg", "image_path": "image/nikon/page_2.jpg"}
{"id": 3, "image_id": "nikon/page_3.jpg", "image_path": "image/nikon/page_3.jpg"}
...
```
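
Before indexing, it can be useful to sanity-check the generated file. The following is a minimal sketch, not part of UltraRAG: `read_image_corpus` and `REQUIRED_KEYS` are hypothetical names, and the required keys are simply the ones visible in the sample output above.

```python
import json
from typing import Iterable, List

# Keys taken from the sample records above (an assumption about the full schema).
REQUIRED_KEYS = {"id", "image_id", "image_path"}


def read_image_corpus(lines: Iterable[str]) -> List[dict]:
    """Parse image-corpus JSONL lines and check that the expected keys exist."""
    records = []
    for number, line in enumerate(lines, start=1):
        if not line.strip():
            continue  # skip blank lines
        record = json.loads(line)
        missing = REQUIRED_KEYS - record.keys()
        if missing:
            raise ValueError(f"line {number}: missing keys {sorted(missing)}")
        records.append(record)
    return records


sample = [
    '{"id": 0, "image_id": "nikon/page_0.jpg", "image_path": "image/nikon/page_0.jpg"}',
    '{"id": 1, "image_id": "nikon/page_1.jpg", "image_path": "image/nikon/page_1.jpg"}',
]
pages = read_image_corpus(sample)
```

To check a real file, pass an open file handle instead of `sample`, for example `read_image_corpus(open("corpora/image.jsonl", encoding="utf-8"))`.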

Next, use the Retriever Server to perform vector encoding and indexing on the image corpus:

```yaml examples/corpus_index.yaml icon="/images/yaml.svg"
# MCP Server
servers:
  retriever: servers/retriever

# MCP Client Pipeline
pipeline:
- retriever.retriever_init
- retriever.retriever_embed
- retriever.retriever_index
```

Execute the following command:

```shell
ultrarag build examples/corpus_index.yaml
```

Modify parameters:

```yaml examples/parameters/corpus_index_parameter.yaml icon="/images/yaml.svg"
retriever:
  backend: sentence_transformers
  backend_configs:
    bm25:
      lang: en
      save_path: index/bm25
    infinity:
      bettertransformer: false
      model_warmup: false
      pooling_method: auto
      trust_remote_code: true
    openai:
      api_key: abc
      base_url: https://api.openai.com/v1
      model_name: text-embedding-3-small
    sentence_transformers:
      sentence_transformers_encode:
        encode_chunk_size: 256
        normalize_embeddings: false
        psg_prompt_name: document # [!code --]
        psg_task: null # [!code --]
        q_prompt_name: query # [!code --]
        q_task: null # [!code --]
        psg_prompt_name: null # [!code ++]
        psg_task: retrieval # [!code ++]
        q_prompt_name: query # [!code ++]
        q_task: retrieval # [!code ++]
      trust_remote_code: true
  batch_size: 16
  collection_name: wiki
  corpus_path: data/corpus_example.jsonl # [!code --]
  corpus_path: corpora/image.jsonl # [!code ++]
  embedding_path: embedding/embedding.npy
  gpu_ids: '1'
  index_backend: faiss
  index_backend_configs:
    faiss:
      index_chunk_size: 10000
      index_path: index/index.index
      index_use_gpu: true
    milvus:
      id_field_name: id
      id_max_length: 64
      index_chunk_size: 1000
      index_params:
        index_type: AUTOINDEX
        metric_type: IP
      metric_type: IP
      search_params:
        metric_type: IP
        params: {}
      text_field_name: contents
      text_max_length: 60000
      token: null
      uri: index/milvus_demo.db
      vector_field_name: vector
  is_demo: false
  is_multimodal: false # [!code --]
  is_multimodal: true # [!code ++]
  model_name_or_path: openbmb/MiniCPM-Embedding-Light # [!code --]
  model_name_or_path: jinaai/jina-embeddings-v4 # [!code ++]
  overwrite: false
```

Run index construction:
```shell
ultrarag run examples/corpus_index.yaml
```

## VisRAG

Prepare user query file:

```json data/test.jsonl icon="/images/json.svg"
{"id": 0, "question": "In which scenarios is the electronic vibration reduction function of Nikon Z7 unavailable?", "golden_answers": [], "meta_data": {}}
```

Define VisRAG Pipeline:
```yaml examples/visrag.yaml icon="/images/yaml.svg"
# MCP Server
servers:
  benchmark: servers/benchmark
  retriever: servers/retriever
  prompt: servers/prompt
  generation: servers/generation
  evaluation: servers/evaluation
  custom: servers/custom

# MCP Client Pipeline
pipeline:
- benchmark.get_data
- retriever.retriever_init
- retriever.retriever_search
- generation.generation_init
- prompt.qa_boxed
- generation.multimodal_generate:
    input:
      multimodal_path: ret_psg
```

Execute the following command:

```shell
ultrarag build examples/visrag.yaml
```

Modify parameters:

```yaml examples/parameters/visrag_parameter.yaml icon="/images/yaml.svg"
benchmark:
  benchmark:
    key_map:
      gt_ls: golden_answers
      q_ls: question
    limit: -1
    name: nq # [!code --]
    path: data/sample_nq_10.jsonl # [!code --]
    name: test # [!code ++]
    path: data/test.jsonl # [!code ++]
    seed: 42
    shuffle: false
generation:
  backend: vllm
  backend_configs:
    hf:
      batch_size: 8
      gpu_ids: 2,3
      model_name_or_path: openbmb/MiniCPM4-8B
      trust_remote_code: true
    openai:
      api_key: abc
      base_delay: 1.0
      base_url: http://localhost:8000/v1
      concurrency: 8
      model_name: MiniCPM4-8B
      retries: 3
    vllm:
      dtype: auto
      gpu_ids: 2,3
      gpu_memory_utilization: 0.9
      model_name_or_path: openbmb/MiniCPM4-8B # [!code --]
      model_name_or_path: openbmb/MiniCPM-V-4 # [!code ++]
      trust_remote_code: true
  extra_params:
    chat_template_kwargs:
      enable_thinking: false
  image_tag: null
  sampling_params:
    max_tokens: 2048
    temperature: 0.7
    top_p: 0.8
  system_prompt: ''
prompt:
  template: prompt/qa_boxed.jinja # [!code --]
  template: prompt/visrag.jinja # [!code ++]
retriever:
  backend: sentence_transformers
  backend_configs:
    bm25:
      lang: en
      save_path: index/bm25
    infinity:
      bettertransformer: false
      model_warmup: false
      pooling_method: auto
      trust_remote_code: true
    openai:
      api_key: abc
      base_url: https://api.openai.com/v1
      model_name: text-embedding-3-small
    sentence_transformers:
      sentence_transformers_encode:
        encode_chunk_size: 256
        normalize_embeddings: false
        psg_prompt_name: document # [!code --]
        psg_task: null # [!code --]
        q_prompt_name: query # [!code --]
        q_task: null # [!code --]
        psg_prompt_name: null # [!code ++]
        psg_task: retrieval # [!code ++]
        q_prompt_name: query # [!code ++]
        q_task: retrieval # [!code ++]
      trust_remote_code: true
  batch_size: 16
  collection_name: wiki
  corpus_path: data/corpus_example.jsonl # [!code --]
  corpus_path: corpora/image.jsonl # [!code ++]
  gpu_ids: '1'
  index_backend: faiss
  index_backend_configs:
    faiss:
      index_chunk_size: 10000
      index_path: index/index.index
      index_use_gpu: true
    milvus:
      id_field_name: id
      id_max_length: 64
      index_chunk_size: 1000
      index_params:
        index_type: AUTOINDEX
        metric_type: IP
      metric_type: IP
      search_params:
        metric_type: IP
        params: {}
      text_field_name: contents
      text_max_length: 60000
      token: null
      uri: index/milvus_demo.db
      vector_field_name: vector
  is_demo: false
  is_multimodal: false # [!code --]
  model_name_or_path: openbmb/MiniCPM-Embedding-Light # [!code --]
  is_multimodal: true # [!code ++]
  model_name_or_path: jinaai/jina-embeddings-v4 # [!code ++]
  query_instruction: ''
  top_k: 5


```

Run this Pipeline:
```shell
ultrarag run examples/visrag.yaml
```

Execute the following command to start Case Study Viewer:

```shell
python ./script/case_study.py \
  --data output/memory_test_visrag_20251015_163425.json \
  --host 127.0.0.1 \
  --port 8070 \
  --title "Case Study Viewer"
```

The system will automatically display the screenshots of the retrieved manual pages:
![](/images/pipeline/visrag_result.png)

The answer generated by the model is grounded in the actual image content. For example:

```
The electronic vibration reduction function of Nikon Z7 is unavailable in the following scenarios:

1. When the frame size is 1920×1080.

2. At 120p, 1920×1080, 100p, or 1920×1080 (slow motion).

These information can be found from the text part in the image, specifically in the paragraph describing the electronic vibration reduction function of Nikon Z7.
```

With visual semantic enhancement, the system can answer user questions more accurately. This approach is especially well suited to multi-modal documents such as manuals, textbooks, and reports.

================
File: pages/en/rag_client/branch.mdx
================
---
title: "Branch Structure"
icon: "code-branch"
---

In complex reasoning tasks, it is often necessary to decide whether the subsequent process should continue based on the model's intermediate output or current state. For example, judging the content generated by the model:

- If the model generates a new query, enter the next round of retrieval;
- If the model has output the final answer, terminate the process.

To achieve the above capabilities, UltraRAG provides a Branch Structure for building controllable reasoning processes with conditional jump logic.

<Check>The Router Server is the partner of the branch structure. It is responsible for judging the current state and returning a state label to drive the process direction.</Check>

<Note>If you do not yet know how to implement a Router Tool, please refer to [Router Server](/pages/en/rag_servers/router).</Note>

## Usage Example


```yaml examples/rag_branch.yaml icon="/images/yaml.svg" highlight="22-37"
# MCP Server
servers:
  benchmark: servers/benchmark
  retriever: servers/retriever
  prompt: servers/prompt
  generation: servers/generation
  evaluation: servers/evaluation
  custom: servers/custom
  router: servers/router

# MCP Client Pipeline
pipeline:
- benchmark.get_data
- retriever.retriever_init
- generation.generation_init
- retriever.retriever_search
- loop:
    times: 10
    steps:
    - prompt.check_passages
    - generation.generate
    - branch:
        router:
        - router.check_model_state
        branches:
          continue:
          - prompt.gen_subq
          - generation.generate:
              output:
                ans_ls: subq_ls
          - retriever.retriever_search:
              input:
                query_list: subq_ls
              output:
                ret_psg: temp_psg
          - custom.merge_passages
          stop: []
- prompt.qa_rag_boxed
- generation.generate
- custom.output_extract_from_boxed
- evaluation.evaluate
```

In the branch structure, the Pipeline uses the following keywords to define the control logic of the dynamic process:

- `branch`: Declares the starting point of a branch structure;
- `router`: Specifies the Router Tool used to decide which branch to take; it must return a state label (such as `continue` / `stop` in the example above);
- `branches`: Defines the sequence of execution steps corresponding to each state. The key is the state label, which must be consistent with the state value returned by the Router Tool; the value is the list of steps to be executed under that state (can be empty, indicating process termination).
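
The dispatch logic described by these keywords can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not UltraRAG's actual implementation: `run_branch`, the shared `state` dictionary, and the toy router below are hypothetical, and each step is modeled as a plain function that mutates the shared state.

```python
from typing import Callable, Dict, List

Step = Callable[[dict], None]


def run_branch(router: Callable[[dict], str],
               branches: Dict[str, List[Step]],
               state: dict) -> str:
    """Ask the router for a state label, then run the steps listed under it."""
    label = router(state)
    if label not in branches:
        # The label returned by the router must match a key under `branches`.
        raise KeyError(f"router returned unknown label: {label!r}")
    for step in branches[label]:
        step(state)
    return label


# Toy router (hypothetical): stop once the model output contains a boxed answer.
def check_model_state(state: dict) -> str:
    return "stop" if "\\boxed" in state["ans"] else "continue"


def gen_subq(state: dict) -> None:
    state["subq"] = f"follow-up for: {state['ans']}"


branches = {"continue": [gen_subq], "stop": []}

state_a = {"ans": "need more evidence"}
label_a = run_branch(check_model_state, branches, state_a)

state_b = {"ans": "\\boxed{1972}"}
label_b = run_branch(check_model_state, branches, state_b)
```

In this sketch the empty `stop` list simply runs no steps; how the surrounding loop reacts to that label is left to the caller.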

================
File: pages/en/rag_client/data_and_params.mdx
================
---
title: "Data Flow"
icon: "arrows-split-up-and-left"
---

In UltraRAG, the Pipeline achieves data binding through variable names: each tool declares its input parameters and output variables during registration, and the Pipeline relies on these variable names to pass and share data between steps during execution.

This mechanism is simple and intuitive, facilitating the construction of sequential data flows. However, in multi-turn calls or complex control structures, variable name conflicts or data overwriting issues may occur. For this reason, UltraRAG provides a parameter renaming mechanism, allowing developers to flexibly rename variables in the Pipeline without modifying the source code.

## How Does Data Flow?

Each tool declares its input and output variable names during registration, thereby determining the entry and exit of the data flow. For example:

<CodeGroup>

```python Class Like Example icon="python" highlight={4}
def __init__(self, mcp_inst):
    mcp_inst.tool(
        self.retriever_search,
        output="q_ls,top_k->ret_psg",
    )

def retriever_search(self, q_ls, top_k) -> ...:
    ...
    return {"ret_psg": ...}
```

```python Decorate Like Example icon="python" highlight={2}
@app.tool(
    output="q_ls,top_k->ret_psg"
)
def retriever_search(q_ls, top_k) -> ...:
    ...
    return {"ret_psg": ...}
```

</CodeGroup>

Here, the definition indicates:

- The tool receives two input variables: `q_ls` and `top_k`
- The tool returns one output variable: `ret_psg`
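
To make the `inputs->outputs` notation concrete, here is a small sketch of how such a string could be split into variable names. `parse_io_spec` is a hypothetical helper written for illustration; how UltraRAG parses the string internally is not shown here.

```python
from typing import List, Tuple


def parse_io_spec(spec: str) -> Tuple[List[str], List[str]]:
    """Split an 'a,b->c,d' string into input names and output names."""
    left, sep, right = spec.partition("->")
    if not sep:
        raise ValueError(f"missing '->' in spec: {spec!r}")
    inputs = [name.strip() for name in left.split(",") if name.strip()]
    outputs = [name.strip() for name in right.split(",") if name.strip()]
    return inputs, outputs


inputs, outputs = parse_io_spec("q_ls,top_k->ret_psg")
```

Here `inputs` holds the two input variable names and `outputs` holds the single output variable name, matching the two bullet points above.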

If you call the same tool (such as `retriever_search`) multiple times and wish to pass in different data variables (e.g., `q_ls` the first time and `subq_ls` the second time),
you need a way to tell the Pipeline that these variables are actually **"synonyms"**.

## Parameter Renaming Mechanism

To solve variable name conflicts and binding ambiguity issues, UltraRAG provides a flexible Parameter Renaming Mechanism.
You can use the `input:` and `output:` fields directly in `pipeline.yaml` to specify the mapping between parameters and variables explicitly. This redirects data binding without modifying the Server's internal code.

```yaml  icon="/images/yaml.svg"
- module.tool:
    input:
      function_parameter_name: variable_name_in_pipeline
    output:
      tool_output_key: variable_name_in_pipeline
```

This mechanism follows the principle of "explicit binding by name":
`input:` maps the function's input parameter names, and `output:` maps the output keys defined during tool registration.

<Note>The simplest approach is to keep the parameter names in the function definition identical to the names used during tool registration, so that you never need to distinguish between the two binding rules above.</Note>
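
The binding rule can be illustrated with a toy variable store. This is a sketch under simplifying assumptions, not UltraRAG's runtime: `call_tool` and `store` are hypothetical, and the function's parameter names are assumed to coincide with the registered names, as the note above recommends.

```python
import inspect
from typing import Any, Callable, Dict, Optional


def call_tool(func: Callable[..., Dict[str, Any]],
              store: Dict[str, Any],
              input_map: Optional[Dict[str, str]] = None,
              output_map: Optional[Dict[str, str]] = None) -> None:
    """Bind pipeline variables to parameters, call the tool, store its outputs."""
    input_map = input_map or {}
    output_map = output_map or {}
    kwargs = {}
    for param in inspect.signature(func).parameters:
        # `input:` redirects a parameter to a differently named variable.
        kwargs[param] = store[input_map.get(param, param)]
    result = func(**kwargs)
    for key, value in result.items():
        # `output:` redirects an output key to a differently named variable.
        store[output_map.get(key, key)] = value


# Hypothetical stand-in for a retrieval tool.
def retriever_search(query_list, top_k):
    return {"ret_psg": [[f"{q}-doc{i}" for i in range(top_k)] for q in query_list]}


store = {"sub_q_ls": ["moon landing"], "top_k": 2}
call_tool(
    retriever_search,
    store,
    input_map={"query_list": "sub_q_ls"},
    output_map={"ret_psg": "round1_result"},
)
```

After the call, the result is stored under `round1_result` rather than `ret_psg`, and `top_k` was bound by its own name because no mapping was given for it.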

### Example 1: Input Variable Renaming

Suppose the tool function is declared as follows:

```python icon="python"
async def retriever_search(
        self,
        query_list: List[str],
        top_k: Optional[int] | None = None,
        query_instruction: str = "",
        use_openai: bool = False,
    ) -> Dict[str, List[List[str]]]:
```

You can explicitly rename the input variable in the Pipeline:

```yaml  icon="/images/yaml.svg"
- retriever.retriever_search:
    input:
      query_list: sub_q_ls
```

Here, the tool originally expects to receive an input parameter named `query_list`, but we map it to the variable `sub_q_ls` in the Pipeline via `input:`, thereby achieving seamless binding.

<Tip>Input parameter mapping is performed based on the parameter names in the function declaration.</Tip>

### Example 2: Output Variable Renaming

Suppose the tool is defined as follows during registration:

```python icon="python"
mcp_inst.tool(
    self.retriever_search,
    output="q_ls,top_k,query_instruction,use_openai->ret_psg",
)
```

You can rewrite the output variable name in the Pipeline:

```yaml  icon="/images/yaml.svg"
- retriever.retriever_search:
    output:
      ret_psg: round1_result
```

In this case, regardless of the variable name returned inside the function, as long as the output key is registered as `ret_psg`,
the result will be mapped to `round1_result` for use in subsequent steps.

<Tip>Output variable mapping is performed based on the output key specified during tool registration.</Tip>

If a downstream module depends on this output result:

```python icon="python"
@app.prompt(output="q_ls,ret_psg,template->prompt_ls")
def qa_rag_boxed(
    q_ls: List[str], ret_psg: List[str | Any], template: str | Path
) -> list[PromptMessage]:
```

Then you can explicitly complete input redirection in the Pipeline:

```yaml  icon="/images/yaml.svg"
- prompt.qa_rag_boxed:
    input:
      ret_psg: round1_result
```

In this way, the input `ret_psg` expected by `qa_rag_boxed` will be read from `round1_result` of the previous step, achieving data transfer.


### Example 3: Renaming Input and Output Simultaneously

```yaml icon="/images/yaml.svg"
- retriever.retriever_search:
    input:
      query_list: round1_query
    output:
      ret_psg: round1_result
```

This pattern is particularly common in loop structures: each round of retrieval can use new input and output variables to avoid naming conflicts.

<Tip>Reasonable use of parameter renaming allows your RAG process to remain clean and controllable in complex scenarios such as multi-turn iterations and dynamic branches without modifying the source code.</Tip>

================
File: pages/en/rag_client/loop.mdx
================
---
title: "Loop Structure"
icon: "rotate-right"
---

In tasks such as multi-turn reasoning, multi-hop Q&A, or multi-turn retrieval, a single execution process often fails to obtain an ideal final answer. In this case, a Loop Structure can be used to repeatedly execute specific modules, thereby achieving iterative information refinement and continuous result optimization.

## Usage Example

```yaml examples/rag_loop.yaml icon="/images/yaml.svg" highlight="16-28"
# MCP Server
servers:
  benchmark: servers/benchmark
  retriever: servers/retriever
  prompt: servers/prompt
  generation: servers/generation
  evaluation: servers/evaluation
  custom: servers/custom

# MCP Client Pipeline
pipeline:
- benchmark.get_data
- retriever.retriever_init
- generation.generation_init
- retriever.retriever_search
- loop:
    times: 3
    steps:
    - prompt.gen_subq
    - generation.generate:
        output:
          ans_ls: subq_ls
    - retriever.retriever_search:
        input:
          query_list: subq_ls
        output:
          ret_psg: temp_psg
    - custom.merge_passages
- prompt.qa_rag_boxed
- generation.generate
- custom.output_extract_from_boxed
- evaluation.evaluate
```

In the loop structure, the Pipeline uses the following keywords to define tool modules that need to be executed repeatedly:

- `loop`: Declares a loop block, indicating that the steps inside it will be executed repeatedly;
- `times`: Specifies the maximum number of iterations for the loop;
- `steps`: Defines the sequence of tool calls to be executed in each round of the loop.

<Note>If you wish to dynamically control loop termination conditions, you can use it in conjunction with the Branch Structure (branch) and Router Server. See [Branch Structure](/pages/en/rag_client/branch) for details.</Note>
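
The semantics of `times` and `steps` can be sketched as a bounded loop. This is an illustrative model only: `run_loop` and the `stop` flag are hypothetical, and the early-exit check is just one assumed way a branch could end the loop before `times` is reached.

```python
from typing import Callable, List

Step = Callable[[dict], None]


def run_loop(times: int, steps: List[Step], state: dict) -> int:
    """Run `steps` in order for at most `times` rounds; return rounds executed."""
    rounds = 0
    for _ in range(times):
        for step in steps:
            step(state)
        rounds += 1
        # Hypothetical early exit: a step (e.g. a branch) may set this flag.
        if state.get("stop"):
            break
    return rounds


def gen_subq(state: dict) -> None:
    state["subq"] = f"subq-{len(state['passages'])}"


def merge_passages(state: dict) -> None:
    state["passages"].append(f"passage for {state['subq']}")


def stop_after_two(state: dict) -> None:
    state["stop"] = len(state["passages"]) >= 2


# Fixed number of rounds, as in the example above.
state = {"passages": []}
rounds = run_loop(3, [gen_subq, merge_passages], state)

# Upper bound of 10 rounds, ended early by the flag.
early = {"passages": []}
early_rounds = run_loop(10, [gen_subq, merge_passages, stop_after_two], early)
```

The second call shows why `times` is described as a maximum: the loop ends as soon as the stop condition holds, even though more rounds were allowed.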

================
File: pages/en/rag_client/multi_agents.mdx
================
---
title: "Module Reuse"
icon: "book-copy"
---

In many practical scenarios, you may want to use multiple different Retriever or Generation modules in the same Pipeline to perform different logical tasks, such as hybrid retrieval or multi-agent systems.
In fact, this only requires setting different parameters for the same module.

For this reason, UltraRAG provides a simple and flexible mechanism: by configuring different aliases for the same Server module, you can reuse the module and call each alias independently.

## Usage Example

### Step 1: Configure Alias Server

In `pipeline.yaml`, you can define multiple aliases for the same path under the `servers` field:

```yaml examples/hybrid_search.yaml icon="/images/yaml.svg" highlight="4,5"
# MCP Server
servers:
  benchmark: servers/benchmark
  dense: servers/retriever
  bm25: servers/retriever
  custom: servers/custom

# MCP Client Pipeline
pipeline:
- benchmark.get_data
- dense.retriever_init
- bm25.retriever_init
- dense.retriever_search:
    output:
      ret_psg: dense_psg
- bm25.bm25_search:
    output:
      ret_psg: sparse_psg
- custom.merge_passages:
    input:
      ret_psg: dense_psg
      temp_psg: sparse_psg
```

In this example, both `dense` and `bm25` point to the same module path `servers/retriever`, but they are built and called as two independent Server instances.

### Step 2: Call Separately in Pipeline

In the Pipeline definition section, you can use their aliases just like calling different modules:

```yaml examples/hybrid_search.yaml icon="/images/yaml.svg" highlight="11-18"
# MCP Server
servers:
  benchmark: servers/benchmark
  dense: servers/retriever
  bm25: servers/retriever
  custom: servers/custom

# MCP Client Pipeline
pipeline:
- benchmark.get_data
- dense.retriever_init
- bm25.retriever_init
- dense.retriever_search:
    output:
      ret_psg: dense_psg
- bm25.bm25_search:
    output:
      ret_psg: sparse_psg
- custom.merge_passages:
    input:
      ret_psg: dense_psg
      temp_psg: sparse_psg
```

In this way, UltraRAG automatically distinguishes the two instances at runtime:
each alias has its own parameter file, runtime context, and cache space, so the instances can be called side by side without interfering with each other.
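
The idea of one instance per alias can be modeled as follows. This is a conceptual sketch, not UltraRAG code: `ServerInstance` and `build_servers` are hypothetical names, and the parameter values are illustrative only.

```python
from dataclasses import dataclass, field
from typing import Any, Dict


@dataclass
class ServerInstance:
    alias: str
    path: str
    params: Dict[str, Any] = field(default_factory=dict)


def build_servers(servers: Dict[str, str]) -> Dict[str, ServerInstance]:
    """Create one independent instance per alias, even when paths repeat."""
    return {alias: ServerInstance(alias, path) for alias, path in servers.items()}


servers = {
    "benchmark": "servers/benchmark",
    "dense": "servers/retriever",
    "bm25": "servers/retriever",
    "custom": "servers/custom",
}
instances = build_servers(servers)

# Each alias keeps its own parameters (the values here are illustrative).
instances["dense"].params["backend"] = "sentence_transformers"
instances["bm25"].params["backend"] = "bm25"
```

Both aliases share the path `servers/retriever`, yet changing the parameters of one leaves the other untouched.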

================
File: pages/en/rag_client/pipeline.mdx
================
---
title: "Pipeline Control"
icon: "diagram-project"
---

## Pipeline Introduction

In UltraRAG, a Pipeline is a process script used to define "how the inference task is executed". It is like a "task schedule" clarifying the operations the system needs to perform at each step.

You can flexibly combine functions (Tools) in different modules (Servers) through the Pipeline to build a complete, reproducible, and controllable RAG inference process. For example:

- Load data → Retrieve documents → Construct prompt → Call large model → Evaluate results;
- Or in multi-turn generation, decide whether to re-retrieve or stop generation early based on the model's intermediate performance.

<Note>With a single YAML file, you can define and run a complete RAG inference process.</Note>

## Writing Specifications

In UltraRAG, Pipelines are written in the form of YAML files to define the complete task execution process. A Pipeline file usually consists of two top-level structures:

- `servers`: Declares all MCP Server modules used in the current process. Each Server corresponds to a functional module (such as retrieval, generation, evaluation, etc.), where the key is the module name and the value is its path in the project.
- `pipeline`: Defines the execution logic of the task. Each item represents an execution step or process control node, supporting control structures such as serial, loop, and branch judgment.

```yaml examples/rag_full.yaml icon="/images/yaml.svg"
servers:
  benchmark: servers/benchmark
  retriever: servers/retriever
  prompt: servers/prompt
  generation: servers/generation
  evaluation: servers/evaluation
  custom: servers/custom

pipeline:
- benchmark.get_data
- retriever.retriever_init
- retriever.retriever_embed
- retriever.retriever_index
- retriever.retriever_search
- generation.generation_init
- prompt.qa_rag_boxed
- generation.generate
- custom.output_extract_from_boxed
- evaluation.evaluate
```

================
File: pages/en/rag_client/serial.mdx
================
---
title: "Serial Structure"
icon: "arrow-right-arrow-left"
---

The Serial Structure is the most basic and commonly used execution mode in the Pipeline: multiple steps are executed in order. A step can consume the output of the previous step (if any) or run independently of it. A standard RAG workflow can usually be built with the serial structure alone, connecting every stage from data loading to result evaluation.

## Usage Example

```yaml examples/rag.yaml icon="/images/yaml.svg"
# MCP Server
servers:
  benchmark: servers/benchmark
  retriever: servers/retriever
  prompt: servers/prompt
  generation: servers/generation
  evaluation: servers/evaluation
  custom: servers/custom

# MCP Client Pipeline
pipeline:
- benchmark.get_data
- retriever.retriever_init
- retriever.retriever_search
- generation.generation_init
- prompt.qa_rag_boxed
- generation.generate
- custom.output_extract_from_boxed
- evaluation.evaluate
```

In the serial structure, each line of the Pipeline represents a call to a Tool. Its basic syntax is:

```yaml
- server_name.tool_name
```

- `server_name`: The module name called, which must be declared in advance in the servers section;
- `tool_name`: The function name registered via a `@tool(...)` or `@prompt(...)` decorator in that module.

This structure is suitable for most single-turn Q&A or reasoning tasks and is also the basis for understanding more complex processes (such as loops, branches).
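
The rule that every `server_name` must be declared under `servers` can be checked mechanically. The sketch below assumes the YAML file has already been parsed into a dictionary (written inline here) and covers only serial steps; `split_step` and `undeclared_servers` are hypothetical helpers, not part of UltraRAG.

```python
from typing import Any, Dict, List, Tuple


def split_step(step: Any) -> Tuple[str, str]:
    """Return (server_name, tool_name) for a 'server.tool' step.

    A step is either a plain string or a one-key mapping whose key is the
    'server.tool' name (the form used when `input:`/`output:` are given).
    """
    name = step if isinstance(step, str) else next(iter(step))
    server, sep, tool = name.partition(".")
    if not sep or not server or not tool:
        raise ValueError(f"expected 'server_name.tool_name', got {name!r}")
    return server, tool


def undeclared_servers(config: Dict[str, Any]) -> List[str]:
    """List steps whose server_name is missing from the `servers` section."""
    declared = set(config["servers"])
    return [
        f"{server}.{tool}"
        for server, tool in map(split_step, config["pipeline"])
        if server not in declared
    ]


# Parsed form of a small pipeline; `generation` is deliberately left undeclared.
config = {
    "servers": {
        "benchmark": "servers/benchmark",
        "retriever": "servers/retriever",
    },
    "pipeline": [
        "benchmark.get_data",
        {"retriever.retriever_search": {"input": {"query_list": "subq_ls"}}},
        "generation.generate",
    ],
}
missing = undeclared_servers(config)
```

Control nodes such as `loop` and `branch` are out of scope for this sketch and would need to be unpacked before calling `split_step`.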

================
File: pages/en/rag_servers/benchmark.mdx
================
---
title: "Benchmark"
icon: "chart-line"
---

## Function

The Benchmark Server is used to load evaluation datasets, commonly used in the data configuration phase of benchmark testing, Q&A tasks, or generation tasks.

<Info>We strongly recommend preprocessing data into `.jsonl` format.</Info>

Example data:

```json data/sample_nq_10.jsonl icon="/images/json.svg"
{"id": 0, "question": "when was the last time anyone was on the moon", "golden_answers": ["14 December 1972 UTC", "December 1972"], "meta_data": {}}
{"id": 1, "question": "who wrote he ain't heavy he's my brother lyrics", "golden_answers": ["Bobby Scott", "Bob Russell"], "meta_data": {}}
{"id": 2, "question": "how many seasons of the bastard executioner are there", "golden_answers": ["one", "one season"], "meta_data": {}}
{"id": 3, "question": "when did the eagles win last super bowl", "golden_answers": ["2017"], "meta_data": {}}
{"id": 4, "question": "who won last year's ncaa women's basketball", "golden_answers": ["South Carolina"], "meta_data": {}}
```

## Usage Examples

### Basic Usage

```yaml examples/load_data.yaml icon="/images/yaml.svg"
# MCP Server
servers:
  benchmark: servers/benchmark

# MCP Client Pipeline
pipeline:
- benchmark.get_data
```

Run the following command to compile the Pipeline:

```shell
ultrarag build examples/load_data.yaml
```

Modify corresponding fields according to the actual situation:

```yaml examples/parameters/load_data_parameter.yaml icon="/images/yaml.svg"
benchmark:
  benchmark:
    key_map:
      gt_ls: golden_answers
      q_ls: question
    limit: -1
    name: nq
    path: data/sample_nq_10.jsonl
    seed: 42
    shuffle: false
```
Run the following command to execute the Pipeline:

```shell
ultrarag run examples/load_data.yaml
```

After completion, the system will automatically load and output data samples, providing input support for subsequent retrieval and generation tasks.
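
The role of `key_map`, `limit`, `shuffle`, and `seed` can be illustrated with a small loader. This is a sketch, not the Benchmark Server's code: `load_benchmark` is hypothetical, and two behaviors are assumptions, namely that a negative `limit` means "use every record" and that shuffling happens before the limit is applied.

```python
import json
import random
from typing import Any, Dict, Iterable, List


def load_benchmark(lines: Iterable[str],
                   key_map: Dict[str, str],
                   limit: int = -1,
                   shuffle: bool = False,
                   seed: int = 42) -> Dict[str, List[Any]]:
    """Read JSONL records and regroup them into one list per mapped variable."""
    records = [json.loads(line) for line in lines if line.strip()]
    if shuffle:
        # A seeded generator keeps the shuffled order reproducible.
        random.Random(seed).shuffle(records)
    if limit >= 0:
        # Assumption: a negative limit means "use every record".
        records = records[:limit]
    return {var: [rec[field] for rec in records] for var, field in key_map.items()}


sample = [
    '{"id": 0, "question": "when was the last time anyone was on the moon", '
    '"golden_answers": ["14 December 1972 UTC", "December 1972"], "meta_data": {}}',
    '{"id": 3, "question": "when did the eagles win last super bowl", '
    '"golden_answers": ["2017"], "meta_data": {}}',
]
data = load_benchmark(sample, {"q_ls": "question", "gt_ls": "golden_answers"})
```

Each key in `key_map` becomes a pipeline variable, and its value names the field to read from every record, so `data["q_ls"]` holds the questions and `data["gt_ls"]` the corresponding answer lists.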

### Add Dataset Loading Fields

In some cases, loading only the `query` and `ground_truth` fields is not enough, and you also want to use other information in the dataset, such as retrieved `passage` entries.
In this case, you can modify the code of the Benchmark Server to add fields that need to be returned.

<Note>You can extend other fields (such as cot, retrieved_passages, etc.) in the same way, just add the corresponding key names synchronously in the decorator output and key_map.</Note>
<Check>If you have generated results (such as the pred field), you can use it together with [Evaluation Server](/pages/en/rag_servers/evaluation) to achieve rapid evaluation.</Check>

The following example demonstrates how to add the `id_ls` field in the `get_data` function:
```python servers/benchmark/src/benchmark.py icon="python"
@app.tool(output="benchmark->q_ls,gt_ls") # [!code --]
@app.tool(output="benchmark->q_ls,gt_ls,id_ls") # [!code ++]
def get_data(
    benchmark: Dict[str, Any],
) -> Dict[str, List[Any]]:
```

Then, run the following command to recompile the Pipeline:

```shell
ultrarag build examples/load_data.yaml
```

In the generated parameter file, add the field `id_ls` and specify its corresponding key name in the original data:

```yaml examples/parameters/load_data_parameter.yaml icon="/images/yaml.svg"
benchmark:
  benchmark:
    key_map:
      gt_ls: golden_answers
      q_ls: question
      id_ls: id  # [!code ++]
    limit: -1
    name: nq
    path: data/sample_nq_10.jsonl
    seed: 42
    shuffle: false
```

After making these changes, rerun the Pipeline to load data samples that include the `id` field.

================
File: pages/en/rag_servers/corpus.mdx
================
---
title: "Corpus"
icon: "file"
---

## Function

The Corpus Server is the core component in UltraRAG for processing raw corpus documents. It supports parsing, extracting, and standardizing text or image content from various data sources, and provides multiple chunking strategies to convert raw documents into formats that can be directly used for subsequent retrieval and generation.

The main functions of the Corpus Server include:

- **Document Parsing**: Supports content extraction from multiple file types (such as .pdf, .txt, .md, .docx, etc.).
- **Corpus Construction**: Saves parsed content as a standardized .jsonl structure, where each line corresponds to an independent document.
- **Image Conversion**: Supports converting PDF pages into image corpora, preserving layout and visual structure information.
- **Text Chunking**: Provides multiple splitting strategies such as Token, Sentence, Recursive, etc.

Example data:

Text Modality:

```json data/corpus_example.jsonl icon="/images/json.svg"
{"id": "2066692", "contents": "Truman Sports Complex The Harry S. Truman Sports...."}
{"id": "15106858", "contents": "Arrowhead Stadium 1970s...."}
```

Image Modality:
```json icon="/images/json.svg"
{"id": 0, "image_id": "UltraRAG/page_0.jpg", "image_path": "image/UltraRAG/page_0.jpg"}
{"id": 1, "image_id": "UltraRAG/page_1.jpg", "image_path": "image/UltraRAG/page_1.jpg"}
{"id": 2, "image_id": "UltraRAG/page_2.jpg", "image_path": "image/UltraRAG/page_2.jpg"}
```

## Document Parsing Examples

### Text Parsing

The Corpus Server supports multiple text parsing formats, including `.pdf, .txt, .md, .docx, .xps, .oxps, .epub, .mobi, .fb2`, etc.

```yaml examples/build_text_corpus.yaml icon="/images/yaml.svg"
# MCP Server
servers:
  corpus: servers/corpus

# MCP Client Pipeline
pipeline:
- corpus.build_text_corpus
```

Compile Pipeline:

```shell
ultrarag build examples/build_text_corpus.yaml
```

Adjust the fields in the generated parameter file to match your setup:

```yaml examples/parameters/build_text_corpus_parameter.yaml icon="/images/yaml.svg"
corpus:
  parse_file_path: data/UltraRAG.pdf
  text_corpus_save_path: corpora/text.jsonl
```

`parse_file_path` can be a single file or a folder path. When it is a folder, the system traverses it and batch-reads every parsable file inside.

Run Pipeline:

```shell
ultrarag run examples/build_text_corpus.yaml
```

After successful execution, the system will automatically parse the text and output a standardized corpus file, for example:
```json icon="/images/json.svg"
{"id": "UltraRAG", "title": "UltraRAG", "contents": "xxxxx"}
```
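
The output format above can be illustrated with a small stand-in that handles only plain-text files. This is a sketch, not the Corpus Server's parser: the function name `build_plain_text_corpus` is hypothetical, it covers only `.txt` and `.md`, and using the file stem for both `id` and `title` is an assumption inferred from the example record.

```python
import json
from pathlib import Path
from typing import List

TEXT_SUFFIXES = {".txt", ".md"}  # Assumption: the sketch ignores every other format.


def build_plain_text_corpus(parse_file_path: str, save_path: str) -> int:
    """Write one JSONL record per plain-text file and return the record count."""
    src = Path(parse_file_path)
    # Accept either a single file or a folder that is traversed recursively.
    candidates: List[Path] = [src] if src.is_file() else sorted(src.rglob("*"))
    files = [p for p in candidates if p.is_file() and p.suffix.lower() in TEXT_SUFFIXES]

    out = Path(save_path)
    out.parent.mkdir(parents=True, exist_ok=True)
    with out.open("w", encoding="utf-8") as f:
        for p in files:
            record = {
                "id": p.stem,
                "title": p.stem,
                "contents": p.read_text(encoding="utf-8").strip(),
            }
            f.write(json.dumps(record, ensure_ascii=False) + "\n")
    return len(files)
```

Each line of the resulting file is an independent JSON object, so downstream tools can stream it without loading the whole corpus into memory.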

### PDF to Image

In multi-modal RAG scenarios, [one approach](https://arxiv.org/abs/2410.10594) is to directly convert document pages into images and perform retrieval and generation in the form of complete images.
The advantage of this method is that it can preserve the document's layout, format, and visual structure, making retrieval and understanding closer to real reading scenarios.

```yaml examples/build_image_corpus.yaml icon="/images/yaml.svg"
# MCP Server
servers:
  corpus: servers/corpus

# MCP Client Pipeline
pipeline:
- corpus.build_image_corpus
```

Compile Pipeline:

```shell
ultrarag build examples/build_image_corpus.yaml
```

Adjust the fields in the generated parameter file to match your setup:

```yaml examples/parameters/build_image_corpus_parameter.yaml icon="/images/yaml.svg"
corpus:
  image_corpus_save_path: corpora/image.jsonl
  parse_file_path: data/UltraRAG.pdf
```
Similarly, the `parse_file_path` parameter can be specified as either a single file or a folder path. When set to a folder, the system will automatically traverse and process all files within it.

Run Pipeline:

```shell
ultrarag run examples/build_image_corpus.yaml
```

After successful execution, the system will save the generated image corpus file. Each record contains the image identifier and relative path. The generated .jsonl file can be directly used as input for multi-modal retrieval or generation tasks. Output example:

```json icon="/images/json.svg"
{"id": 0, "image_id": "UltraRAG/page_0.jpg", "image_path": "image/UltraRAG/page_0.jpg"}
{"id": 1, "image_id": "UltraRAG/page_1.jpg", "image_path": "image/UltraRAG/page_1.jpg"}
{"id": 2, "image_id": "UltraRAG/page_2.jpg", "image_path": "image/UltraRAG/page_2.jpg"}
```
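
As an illustration of the record layout only, the sketch below builds the same kind of records for a document with a known page count. It does not render any PDF pages, and the helper name `image_corpus_records` and the `image_root` default are assumptions taken from the example output rather than from the Corpus Server code.

```python
import json
from typing import Dict, List, Union


def image_corpus_records(
    doc_name: str, num_pages: int, image_root: str = "image"
) -> List[Dict[str, Union[int, str]]]:
    """Build one record per page, mirroring the example output (hypothetical helper)."""
    return [
        {
            "id": page,
            "image_id": f"{doc_name}/page_{page}.jpg",
            "image_path": f"{image_root}/{doc_name}/page_{page}.jpg",
        }
        for page in range(num_pages)
    ]


# Serialize the records as JSONL, one object per line.
for record in image_corpus_records("UltraRAG", 3):
    print(json.dumps(record))
```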

### MinerU Parsing

[MinerU](https://github.com/opendatalab/MinerU) is an industry-acclaimed PDF parsing framework that supports high-precision text and layout structure extraction.
UltraRAG seamlessly integrates MinerU as a built-in tool, which can be called directly in the Pipeline to achieve one-stop PDF → Text + Image corpus construction.

```yaml examples/build_mineru_corpus.yaml icon="/images/yaml.svg"
# MCP Server
servers:
  corpus: servers/corpus

# MCP Client Pipeline
pipeline:
- corpus.mineru_parse
- corpus.build_mineru_corpus
```

Compile Pipeline:

```shell
ultrarag build examples/build_mineru_corpus.yaml
```

Adjust the fields in the generated parameter file to match your setup:

```yaml examples/parameters/build_mineru_corpus_parameter.yaml icon="/images/yaml.svg"
corpus:
  image_corpus_save_path: corpora/image.jsonl    # Image corpus save path
  mineru_dir: corpora/                           # MinerU parsing result save directory
  mineru_extra_params:
    source: modelscope                           # Model download source (default is Hugging Face, optional modelscope)
  parse_file_path: data/UltraRAG.pdf             # File or folder path to parse
  text_corpus_save_path: corpora/text.jsonl      # Text corpus save path
```

Similarly, the `parse_file_path` parameter can be either a single file or a folder path.

Run Pipeline (the first execution downloads the MinerU model, which may be slow):

```shell
ultrarag run examples/build_mineru_corpus.yaml
```

After successful execution, the system outputs the corresponding Text Corpus and Image Corpus files. Their formats match the outputs of `build_text_corpus` and `build_image_corpus`, so they can be used directly for multi-modal retrieval and generation tasks.

## Document Chunking Examples

UltraRAG integrates the [chonkie](https://docs.chonkie.ai/common/welcome) document chunking library and provides three built-in chunking strategies, `Token Chunker`, `Sentence Chunker`, and `Recursive Chunker`, to handle different types of text structure.

- `Token Chunker`: Chunks by tokenizer, word, or character, suitable for general text.
- `Sentence Chunker`: Splits by sentence boundaries, ensuring semantic integrity.
- `Recursive Chunker`: Suitable for well-structured long documents (such as books, papers), capable of automatically dividing content by hierarchy.

```yaml examples/corpus_chunk.yaml icon="/images/yaml.svg"
# MCP Server
servers:
  corpus: servers/corpus

# MCP Client Pipeline
pipeline:
- corpus.chunk_documents
```

Compile Pipeline:

```shell
ultrarag build examples/corpus_chunk.yaml
```

Adjust the fields in the generated parameter file to match your setup:

```yaml examples/parameters/corpus_chunk_parameter.yaml icon="/images/yaml.svg"
corpus:
  chunk_backend: token    # Chunking strategy, optional token / sentence / recursive
  chunk_backend_configs:
    recursive:
      min_characters_per_chunk: 12  # Minimum length per chunk to prevent being too short
    sentence:
      chunk_overlap: 50              # Overlapping characters of adjacent chunks
      delim: '[''.'', ''!'', ''?'', ''\n'']'  # Sentence delimiter
      min_sentences_per_chunk: 1  # Minimum sentences per chunk
    token:
      chunk_overlap: 50             # Overlapping tokens of adjacent chunks
  chunk_path: corpora/chunks.jsonl      # Output path for chunked corpus
  chunk_size: 256                      # Maximum tokens per chunk
  raw_chunk_path: corpora/text.jsonl    # Raw text corpus path
  tokenizer_or_token_counter: character # Tokenizer used
  use_title: false                     # Whether to append title to the beginning of each chunk
```

Run Pipeline:

```shell
ultrarag run examples/corpus_chunk.yaml
```

After execution, the system will output standardized chunked corpus files, which can be directly used for subsequent retrieval and generation modules.
Output example:
```json icon="/images/json.svg"
{"id": 0, "doc_id": "UltraRAG", "title": "UltraRAG", "contents": "xxxxx"}
{"id": 1, "doc_id": "UltraRAG", "title": "UltraRAG", "contents": "xxxxx"}
{"id": 2, "doc_id": "UltraRAG", "title": "UltraRAG", "contents": "xxxxx"}
```
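
To show how `chunk_size`, `chunk_overlap`, and `use_title` interact, here is a minimal character-level sliding-window chunker. It is a sketch written for illustration, not chonkie's implementation: the function names are hypothetical, and prepending the title followed by a newline when `use_title` is enabled is an assumption.

```python
from typing import Dict, List, Union


def chunk_characters(text: str, chunk_size: int = 256, chunk_overlap: int = 50) -> List[str]:
    """Split text into windows of chunk_size characters that overlap by chunk_overlap."""
    if chunk_size <= 0:
        raise ValueError("chunk_size must be positive")
    if not 0 <= chunk_overlap < chunk_size:
        raise ValueError("chunk_overlap must be in [0, chunk_size)")
    step = chunk_size - chunk_overlap
    chunks: List[str] = []
    for start in range(0, len(text), step):
        chunks.append(text[start : start + chunk_size])
        if start + chunk_size >= len(text):
            break  # The last window already reaches the end of the text.
    return chunks


def chunk_document(
    doc: Dict[str, str],
    chunk_size: int = 256,
    chunk_overlap: int = 50,
    use_title: bool = False,
    start_id: int = 0,
) -> List[Dict[str, Union[int, str]]]:
    """Turn one corpus record into chunk records shaped like the output example."""
    records = []
    for offset, piece in enumerate(chunk_characters(doc["contents"], chunk_size, chunk_overlap)):
        # Assumption: use_title prepends the title and a newline to every chunk.
        contents = f"{doc['title']}\n{piece}" if use_title else piece
        records.append(
            {"id": start_id + offset, "doc_id": doc["id"], "title": doc["title"], "contents": contents}
        )
    return records
```

With `chunk_size=4` and `chunk_overlap=1`, the string `"abcdefghij"` yields the windows `abcd`, `defg`, and `ghij`: each window starts three characters after the previous one and repeats its last character.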

<Note>You can call parsing tools and chunking tools in the same Pipeline to build your own personalized knowledge base.</Note>

================
File: pages/en/rag_servers/custom.mdx
================
---
title: "Custom"
icon: "puzzle-piece"
---

## Function

The Custom Server is used to store custom tool functions that cannot be classified into standard modules (such as Retriever, Generation, Evaluation, etc.).
It provides developers with a flexible extension space for implementing various logical components that cooperate with core RAG modules, such as:
- Data cleaning and preprocessing
- Keyword extraction or feature construction
- Specific task logic (such as answer extraction, formatting, filtering, etc.)

<Note>The Custom Server is your free toolbox — any functional logic that does not belong to core Servers can be defined and reused here.</Note>

## Implementation Example

The following example, `output_extract_from_boxed`, shows how to define and register a custom Tool.

```python servers/custom/src/custom.py icon="python"
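import re
from typing import Dict, List

# `app` is assumed to be the UltraRAG_MCP_Server instance defined earlier in custom.py.


@app.tool(output="ans_ls->pred_ls")
def output_extract_from_boxed(ans_ls: List[str]) -> Dict[str, List[str]]:
    def extract(ans: str) -> str:
        # Use the last \boxed{ occurrence; fall back to the whole string if there is none.
        start = ans.rfind(r"\boxed{")
        if start == -1:
            content = ans.strip()
        else:
            i = start + len(r"\boxed{")
            brace_level = 1
            end = i
            # Scan forward until the brace opened by \boxed{ is closed.
            while end < len(ans) and brace_level > 0:
                if ans[end] == "{":
                    brace_level += 1
                elif ans[end] == "}":
                    brace_level -= 1
                end += 1
            content = ans[i : end - 1].strip()
            # Strip surrounding math delimiters: $...$ and \( ... \).
            content = re.sub(r"^\$+|\$+$", "", content).strip()
            content = re.sub(r"^\\\(|\\\)$", "", content).strip()
            # Unwrap a single \text{...} wrapper.
            if content.startswith(r"\text{") and content.endswith("}"):
                content = content[len(r"\text{") : -1].strip()
            # Drop surrounding parentheses.
            content = content.strip("()").strip()

        # Replace remaining backslashes with spaces, then double spaces with single ones.
        content = content.replace("\\", " ")
        content = content.replace("  ", " ")
        return content

    return {"pred_ls": [extract(ans) for ans in ans_ls]}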
```

This tool extracts the final answer wrapped in `\boxed{...}` from each model output string.
The result is mapped to the variable `pred_ls` for use by downstream evaluation or post-processing modules.
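
To try the extraction logic outside a Pipeline, the snippet below repeats the inner helper as a plain function, without the `@app.tool` decorator, and runs it on a few sample strings. The name `extract_boxed` and the sample inputs are made up for this illustration; the logic is copied from the tool above.

```python
import re


def extract_boxed(ans: str) -> str:
    """Standalone copy of the inner `extract` helper, for experimentation."""
    start = ans.rfind(r"\boxed{")
    if start == -1:
        content = ans.strip()
    else:
        i = start + len(r"\boxed{")
        brace_level = 1
        end = i
        while end < len(ans) and brace_level > 0:
            if ans[end] == "{":
                brace_level += 1
            elif ans[end] == "}":
                brace_level -= 1
            end += 1
        content = ans[i : end - 1].strip()
        content = re.sub(r"^\$+|\$+$", "", content).strip()
        content = re.sub(r"^\\\(|\\\)$", "", content).strip()
        if content.startswith(r"\text{") and content.endswith("}"):
            content = content[len(r"\text{") : -1].strip()
        content = content.strip("()").strip()
    content = content.replace("\\", " ")
    content = content.replace("  ", " ")
    return content


samples = [
    r"The answer is \boxed{42}.",       # plain boxed value
    r"So we get \boxed{\text{Paris}}",  # boxed value wrapped in \text{...}
    "  no box here  ",                  # no \boxed{...}: the stripped input is returned
]
for s in samples:
    print(repr(extract_boxed(s)))
# prints '42', 'Paris', 'no box here'
```

Because the helper replaces any remaining backslashes with spaces, answers that contain LaTeX commands (for example fractions) come back with the command names left in as plain text, which is worth keeping in mind when choosing evaluation metrics.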

## Usage Example

After defining the custom tool, you only need to register the custom module in the Pipeline and call the corresponding Tool:

```yaml examples/rag_full.yaml icon="/images/yaml.svg" highlight="8,20"
# MCP Server
servers:
  benchmark: servers/benchmark
  retriever: servers/retriever
  prompt: servers/prompt
  generation: servers/generation
  evaluation: servers/evaluation
  custom: servers/custom

# MCP Client Pipeline
pipeline:
- benchmark.get_data
- retriever.retriever_init
- retriever.retriever_embed
- retriever.retriever_index
- retriever.retriever_search
- generation.generation_init
- prompt.qa_rag_boxed
- generation.generate
- custom.output_extract_from_boxed
- evaluation.evaluate
```
In this example, `custom.output_extract_from_boxed` is used to extract the standardized answer from the model output,
and then it is passed to `evaluation.evaluate` for evaluation.

================
File: pages/en/rag_servers/evaluation.mdx
================
---
title: "Evaluation"
icon: "clipboard-check"
---

## Function

The Evaluation Server provides a set of comprehensive automated evaluation tools for systematic and reproducible performance evaluation of model outputs in retrieval and generation tasks.
It supports various mainstream metrics, including ranking-based, matching-based, and summarization-based evaluations, and can be directly embedded at the end of the Pipeline to achieve automatic calculation and saving of evaluation results.

### Retrieval

| Metric Name | Type | Description |
|:---|:---|:---|
| `MRR` | float | Mean Reciprocal Rank: the average over queries of 1/rank of the first relevant document. |
| `MAP` | float | Mean Average Precision: the mean over queries of average precision, which averages the precision at each rank where a relevant document appears. |
| `Recall` | float | Recall rate, measuring how many relevant documents the retrieval system can find. |
| `Precision` | float | Precision rate, measuring how many of the retrieval results are relevant documents. |
| `NDCG` | float | Normalized Discounted Cumulative Gain, evaluating the consistency between retrieval results and ideal ranking. |


### Generation
| Metric Name | Type | Description |
|:---|:---|:---|
| `EM` | float | Exact Match, prediction is exactly the same as any reference. |
| `Acc` | float | Answer contains any form of the reference answer (loose matching). |
| `StringEM` | float | Soft match ratio for multiple sets of answers (commonly used for multiple choice/nested QA). |
| `CoverEM` | float | Whether the reference answer is completely covered by the predicted text. |
| `F1` | float | Token-level F1 score. |
| `Rouge_1` | float | 1-gram ROUGE-F1. |
| `Rouge_2` | float | 2-gram ROUGE-F1. |
| `Rouge_L` | float | Longest Common Subsequence (LCS) based ROUGE. |

## Usage Examples

### Retrieval

#### TREC File Evaluation

In information retrieval, TREC format files are standardized evaluation interfaces used to measure model performance in ranking, recall, etc.
TREC evaluation usually consists of two types of files: qrel (human-annotated true relevance) and run (system retrieval output results).

**I. qrel file ("ground truth", human-annotated relevance)**

The qrel file is used to store human-annotated true relevance judgments of "which documents are relevant to which query".
During evaluation, the system output retrieval results will be compared with the qrel file to calculate metrics (such as MAP, NDCG, Recall, Precision, etc.).

Format (4 columns, space-separated):
```
<query_id>  <iter>  <doc_id>  <relevance>
```
- `query_id`: Query ID
- `iter`: Iteration field, conventionally `0` (legacy field, can be ignored)
- `doc_id`: Document ID
- `relevance`: Relevance annotation (usually 0 means irrelevant, 1 or higher means relevant)

Example:
```
1 0 DOC123 1
1 0 DOC456 0
2 0 DOC321 1
2 0 DOC654 1
```

**II. run file (system output retrieval results)**

The run file saves the output results of the retrieval system and is used to compare with the qrel file to evaluate performance.
Each line represents a document returned by a query and its score information.

Format (6 columns, space-separated):
```
<query_id>  Q0  <doc_id>  <rank>  <score>  <run_name>
```
- `query_id`: Query ID
- `Q0`: The literal string `Q0` (required by the TREC format)
- `doc_id`: Document ID
- `rank`: Ranking position (1 means most relevant)
- `score`: System score
- `run_name`: System name (e.g., bm25, dense_retriever)

Example:
```
1 Q0 DOC123 1 12.34 bm25
1 Q0 DOC456 2 11.21 bm25
2 Q0 DOC654 1 13.89 bm25
2 Q0 DOC321 2 12.01 bm25
```
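
The two formats can be tied together with a short sketch that parses the example qrel and run lines above and computes MRR, Recall@k, and Precision@k. This is an illustration, not the Evaluation Server's code: all function names are hypothetical, and ordering each query's documents by score (rather than by the rank column) is a choice made for this sketch.

```python
from collections import defaultdict
from typing import Callable, Dict, List, Set

QRELS_TEXT = """\
1 0 DOC123 1
1 0 DOC456 0
2 0 DOC321 1
2 0 DOC654 1
"""

RUN_TEXT = """\
1 Q0 DOC123 1 12.34 bm25
1 Q0 DOC456 2 11.21 bm25
2 Q0 DOC654 1 13.89 bm25
2 Q0 DOC321 2 12.01 bm25
"""


def parse_qrels(text: str) -> Dict[str, Set[str]]:
    """Map each query ID to the set of documents judged relevant (relevance > 0)."""
    relevant: Dict[str, Set[str]] = defaultdict(set)
    for line in text.splitlines():
        if not line.strip():
            continue
        query_id, _iter, doc_id, relevance = line.split()
        if int(relevance) > 0:
            relevant[query_id].add(doc_id)
    return dict(relevant)


def parse_run(text: str) -> Dict[str, List[str]]:
    """Map each query ID to its documents, ordered by descending score."""
    scored = defaultdict(list)
    for line in text.splitlines():
        if not line.strip():
            continue
        query_id, _q0, doc_id, _rank, score, _run_name = line.split()
        scored[query_id].append((float(score), doc_id))
    return {q: [d for _, d in sorted(pairs, key=lambda p: -p[0])] for q, pairs in scored.items()}


def mean_over_queries(
    qrels: Dict[str, Set[str]],
    run: Dict[str, List[str]],
    per_query: Callable[[Set[str], List[str]], float],
) -> float:
    """Average a per-query metric over queries that have at least one relevant document."""
    values = [per_query(rel, run.get(q, [])) for q, rel in qrels.items() if rel]
    return sum(values) / len(values) if values else 0.0


def reciprocal_rank(relevant: Set[str], ranked: List[str]) -> float:
    for rank, doc_id in enumerate(ranked, start=1):
        if doc_id in relevant:
            return 1.0 / rank
    return 0.0


def recall_at(k: int) -> Callable[[Set[str], List[str]], float]:
    return lambda relevant, ranked: len(relevant & set(ranked[:k])) / len(relevant)


def precision_at(k: int) -> Callable[[Set[str], List[str]], float]:
    return lambda relevant, ranked: len(relevant & set(ranked[:k])) / k


qrels, run = parse_qrels(QRELS_TEXT), parse_run(RUN_TEXT)
print("MRR:", mean_over_queries(qrels, run, reciprocal_rank))      # 1.0
print("Recall@1:", mean_over_queries(qrels, run, recall_at(1)))    # 0.75
print("Precision@2:", mean_over_queries(qrels, run, precision_at(2)))  # 0.75
```

In this data, query 1 has one relevant document ranked first, and query 2 has two relevant documents in the top two positions. That gives a reciprocal rank of 1 for both queries, Recall@1 of 1 and 0.5, and Precision@2 of 0.5 and 1.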

<Note>You can click the following links to download example files: [qrels.test](https://github.com/usnistgov/trec_eval/blob/main/test/qrels.test) and [results.test](https://github.com/usnistgov/trec_eval/blob/main/test/results.test)</Note>

```yaml examples/eval_trec.yaml icon="/images/yaml.svg"
# MCP Server
servers:
  evaluation: servers/evaluation

# MCP Client Pipeline
pipeline:
- evaluation.evaluate_trec
```

Run the following command to compile the Pipeline:

```shell
ultrarag build examples/eval_trec.yaml
```

```yaml examples/parameters/eval_trec_parameter.yaml icon="/images/yaml.svg"
evaluation:
  ir_metrics:
  - mrr
  - map
  - recall
  - ndcg
  - precision
  ks:
  - 1
  - 5
  - 10
  - 20
  - 50
  - 100
  qrels_path: data/qrels.txt # [!code --]
  run_path: data/run_a.txt # [!code --]
  qrels_path: data/qrels.test # [!code ++]
  run_path: data/results.test # [!code ++]
  save_path: output/evaluate_results.json

```

Run the following command to execute this Pipeline:

```shell
ultrarag run examples/eval_trec.yaml
```

#### Significance Analysis

Significance Testing is used to judge whether the performance difference between two retrieval systems is "real" rather than caused by random fluctuations.
The core question it answers is: Is the improvement of system A statistically significant?

In retrieval tasks, system performance is usually measured by average metrics of multiple queries (such as MAP, NDCG, Recall, etc.).
However, the improvement of the average value is not necessarily reliable because there is randomness between different queries.
Significance analysis evaluates whether system improvement is stable and reproducible through statistical test methods.

Common significance analysis methods include:

- **Permutation Test**: By randomly exchanging the query results of system A and system B multiple times (e.g., 10000 times), construct a random distribution of differences. If the actual difference exceeds 95% of random cases (p < 0.05), the improvement is considered significant.
- **Paired t-test**: Assuming that the query scores of the two systems follow a normal distribution, calculate the significance of the difference between their means.

UltraRAG has a built-in Two-sided Permutation Test, outputting the following key statistical information during automatic evaluation:

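- **A_mean / B_mean**: Mean metric value of the new system (A) and the old system (B);
- **Diff(A-B)**: Improvement magnitude, i.e. A_mean minus B_mean;
- **p_value**: p-value of the test, i.e. the probability of observing a difference at least this large if the two systems were actually equivalent;
- **significant**: Significance judgment (True when p < 0.05).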

```yaml examples/eval_trec_pvalue.yaml icon="/images/yaml.svg"
# MCP Server
servers:
  evaluation: servers/evaluation

# MCP Client Pipeline
pipeline:
- evaluation.evaluate_trec_pvalue
```

Run the following command to compile the Pipeline:

```shell
ultrarag build examples/eval_trec_pvalue.yaml
```

```yaml examples/parameters/eval_trec_pvalue_parameter.yaml icon="/images/yaml.svg"
evaluation:
  ir_metrics:
  - mrr
  - map
  - recall
  - ndcg
  - precision
  ks:
  - 1
  - 5
  - 10
  - 20
  - 50
  - 100
  n_resamples: 10000
  qrels_path: data/qrels.txt
  run_new_path: data/run_a.txt
  run_old_path: data/run_b.txt
  save_path: output/evaluate_results.json
```

Run the following command to execute this Pipeline:

```shell
ultrarag run examples/eval_trec_pvalue.yaml
```
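
The idea behind the two-sided permutation test can be sketched in a few lines of standard-library Python. This is not UltraRAG's implementation: the function name `paired_permutation_test` is hypothetical, and the sketch assumes the paired (sign-flip) variant, where each query's scores for the two systems are randomly swapped, plus the common "+1" correction for the p-value.

```python
import random
from typing import Dict, Sequence, Union


def paired_permutation_test(
    a_scores: Sequence[float],
    b_scores: Sequence[float],
    n_resamples: int = 10000,
    alpha: float = 0.05,
    seed: int = 0,
) -> Dict[str, Union[float, bool]]:
    """Two-sided paired permutation test on per-query metric values."""
    if len(a_scores) != len(b_scores) or not a_scores:
        raise ValueError("both systems need the same, non-zero number of queries")
    rng = random.Random(seed)
    n = len(a_scores)
    diffs = [a - b for a, b in zip(a_scores, b_scores)]
    observed = sum(diffs) / n

    extreme = 0
    for _ in range(n_resamples):
        # Swapping A and B for one query is the same as flipping the sign of its difference.
        permuted = sum(d if rng.random() < 0.5 else -d for d in diffs) / n
        if abs(permuted) >= abs(observed) - 1e-12:
            extreme += 1

    # "+1" correction: count the observed assignment as one of the permutations.
    p_value = (extreme + 1) / (n_resamples + 1)
    return {
        "A_mean": sum(a_scores) / n,
        "B_mean": sum(b_scores) / n,
        "Diff(A-B)": observed,
        "p_value": p_value,
        "significant": p_value < alpha,
    }
```

For example, calling it with twenty queries where system A scores 0.9 and system B scores 0.1 on every query gives a very small p-value, while calling it with two identical score lists gives a p-value of 1.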

### Generation
#### Basic Usage

```yaml examples/rag_full.yaml icon="/images/yaml.svg" highlight="5,19"
servers:
  benchmark: servers/benchmark
  retriever: servers/retriever
  prompt: servers/prompt
  generation: servers/generation
  evaluation: servers/evaluation
  custom: servers/custom

pipeline:
- benchmark.get_data
- retriever.retriever_init
- retriever.retriever_embed
- retriever.retriever_index
- retriever.retriever_search
- generation.generation_init
- prompt.qa_rag_boxed
- generation.generate
- custom.output_extract_from_boxed
- evaluation.evaluate
```

Simply add the `evaluation.evaluate` tool at the end of the Pipeline to automatically calculate all specified evaluation metrics after the task execution is completed, and output the results to the path set in the configuration file.


#### Evaluate Existing Results

If you already have the result file generated by the model and wish to evaluate it directly, you can organize the results into a standardized JSONL format. The file should at least contain fields representing answer labels and generation results, for example:

```json icon="/images/json.svg"
{"id": 0, "question": "when was the last time anyone was on the moon", "golden_answers": ["14 December 1972 UTC", "December 1972"], "pred_answer": "December 14, 1973"}
{"id": 1, "question": "who wrote he ain't heavy he's my brother lyrics", "golden_answers": ["Bobby Scott", "Bob Russell"], "pred_answer": "The documents do not provide information about the author of the lyrics to \"He Ain't Heavy, He's My Brother.\""}
```
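
To make the matching-based metrics concrete, the sketch below computes Exact Match and token-level F1 for records shaped like the ones above. It is an illustration under stated assumptions, not the Evaluation Server's code: the function names are hypothetical, and the normalization (lowercasing, removing punctuation and English articles) follows a common QA convention that may differ from what UltraRAG applies.

```python
import re
import string
from collections import Counter
from typing import List


def normalize(text: str) -> str:
    """Lowercase, drop punctuation and articles, and collapse whitespace (assumed convention)."""
    text = text.lower()
    text = "".join(ch for ch in text if ch not in string.punctuation)
    text = re.sub(r"\b(a|an|the)\b", " ", text)
    return " ".join(text.split())


def exact_match(pred: str, golds: List[str]) -> float:
    """1.0 if the normalized prediction equals any normalized reference, else 0.0."""
    return float(any(normalize(pred) == normalize(g) for g in golds))


def token_f1(pred: str, golds: List[str]) -> float:
    """Best token-level F1 between the prediction and any reference."""
    pred_tokens = normalize(pred).split()
    best = 0.0
    for gold in golds:
        gold_tokens = normalize(gold).split()
        overlap = sum((Counter(pred_tokens) & Counter(gold_tokens)).values())
        if overlap == 0:
            continue
        precision = overlap / len(pred_tokens)
        recall = overlap / len(gold_tokens)
        best = max(best, 2 * precision * recall / (precision + recall))
    return best


golds = ["14 December 1972 UTC", "December 1972"]
pred = "December 14, 1973"
print(exact_match(pred, golds))         # 0.0
print(round(token_f1(pred, golds), 4))  # 0.5714
```

The prediction shares the tokens `december` and `14` with the first reference, giving precision 2/3 and recall 2/4, so F1 is 4/7. It is not an exact match because the year differs.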


```yaml examples/evaluate_results.yaml icon="/images/yaml.svg"
# MCP Server
servers:
  benchmark: servers/benchmark
  evaluation: servers/evaluation

# MCP Client Pipeline
pipeline:
- benchmark.get_data
- evaluation.evaluate
```

To allow the Benchmark Server to read the generation results, you need to add the `pred_ls` field in the `get_data` function:
```python servers/benchmark/src/benchmark.py icon="python"
@app.tool(output="benchmark->q_ls,gt_ls") # [!code --]
@app.tool(output="benchmark->q_ls,gt_ls,pred_ls") # [!code ++]
def get_data(
    benchmark: Dict[str, Any],
) -> Dict[str, List[Any]]:
```

Then, run the following command to compile the Pipeline:

```shell
ultrarag build examples/evaluate_results.yaml
```

In the generated parameter file, add the field `pred_ls` and specify its corresponding key name in the original data, and modify the data path and name to point to the new evaluation file:

```yaml examples/parameters/evaluate_results_parameter.yaml icon="/images/yaml.svg"
benchmark:
  benchmark:
    key_map:
      gt_ls: golden_answers
      q_ls: question
      pred_ls: pred_answer  # [!code ++]
    limit: -1
    name: nq  # [!code --]
    path: data/sample_nq_10.jsonl # [!code --]
    name: evaluate  # [!code ++]
    path: data/test_evaluate.jsonl # [!code ++]
    seed: 42
    shuffle: false
evaluation:
  metrics:
  - acc
  - f1
  - em
  - coverem
  - stringem
  - rouge-1
  - rouge-2
  - rouge-l
  save_path: output/evaluate_results.json
```

Run the following command to execute this Pipeline:

```shell
ultrarag run examples/evaluate_results.yaml
```

================
File: pages/en/rag_servers/generation.mdx
================
---
title: "Generation"
icon: "pen-nib"
---

## Function

The Generation Server is the core module in UltraRAG responsible for calling and deploying Large Language Models (LLMs).
It receives input prompts (Prompts) constructed by the Prompt Server and generates corresponding output results.
This module supports two modes: **Text Generation** and **Image-Text Multi-modal Generation**, flexibly adapting to different task scenarios (such as Q&A, reasoning, summarization, visual Q&A, etc.).

The Generation Server is natively compatible with the following mainstream backends: [vLLM](https://github.com/vllm-project/vllm), [HuggingFace](https://github.com/huggingface/transformers),
and [OpenAI](https://platform.openai.com/docs/quickstart).

## Usage Examples

### Text Generation

The following example shows how to use the Generation Server to run a basic text generation task. The Pipeline first builds the input prompt through the Prompt Server, then calls the LLM to generate an answer, and finally extracts and evaluates the result.

```yaml examples/vanilla_llm.yaml icon="/images/yaml.svg" highlight="5,12,14"
# MCP Server
servers:
  benchmark: servers/benchmark
  prompt: servers/prompt
  generation: servers/generation
  evaluation: servers/evaluation
  custom: servers/custom

# MCP Client Pipeline
pipeline:
- benchmark.get_data
- generation.generation_init
- prompt.qa_boxed
- generation.generate
- custom.output_extract_from_boxed
- evaluation.evaluate
```

Run the following command to compile the Pipeline:

```shell
ultrarag build examples/vanilla_llm.yaml
```

Modify parameters:
```yaml examples/parameters/vanilla_llm_parameter.yaml icon="/images/yaml.svg"
benchmark:
  benchmark:
    key_map:
      gt_ls: golden_answers
      q_ls: question
    limit: -1
    name: nq
    path: data/sample_nq_10.jsonl
    seed: 42
    shuffle: false
custom: {}
evaluation:
  metrics:
  - acc
  - f1
  - em
  - coverem
  - stringem
  - rouge-1
  - rouge-2
  - rouge-l
  save_path: output/evaluate_results.json
generation:
  backend: vllm
  backend_configs:
    hf:
      batch_size: 8
      gpu_ids: 2,3
      model_name_or_path: openbmb/MiniCPM4-8B
      trust_remote_code: true
    openai:
      api_key: abc
      base_delay: 1.0
      base_url: http://localhost:8000/v1
      concurrency: 8
      model_name: MiniCPM4-8B
      retries: 3
    vllm:
      dtype: auto
      gpu_ids: 2,3
      gpu_memory_utilization: 0.9
      model_name_or_path: openbmb/MiniCPM4-8B
      trust_remote_code: true
  extra_params:
    chat_template_kwargs:
      enable_thinking: false
  sampling_params:
    max_tokens: 2048
    temperature: 0.7
    top_p: 0.8
  system_prompt: ''
prompt:
  template: prompt/qa_boxed.jinja
```

Run Pipeline:
```shell
ultrarag run examples/vanilla_llm.yaml
```

### Multi-modal Inference

In multi-modal scenarios, the Generation Server can not only process text inputs but also combine visual information such as images to complete more complex reasoning tasks. The following example shows how to implement this.

First, prepare an example dataset (including image paths):

```json data/test.jsonl icon="/images/json.svg"
{"id": 0, "question": "when was the last time anyone was on the moon", "golden_answers": ["14 December 1972 UTC", "December 1972"], "image":["image/page_0.jpg"],"meta_data": {}}
```

Before performing multi-modal generation, you need to add a new field `multimodal_path` in the `get_data` function of the Benchmark Server to specify the image input path.
<Note>Please refer to [Add Dataset Loading Fields](/pages/en/rag_servers/benchmark) for how to add new fields.</Note>

```yaml examples/vanilla_vlm.yaml icon="/images/yaml.svg" highlight="5,12,14"
# MCP Server
servers:
  benchmark: servers/benchmark
  prompt: servers/prompt
  generation: servers/generation
  evaluation: servers/evaluation
  custom: servers/custom

# MCP Client Pipeline
pipeline:
- benchmark.get_data
- generation.generation_init
- prompt.qa_boxed
- generation.multimodal_generate
- custom.output_extract_from_boxed
- evaluation.evaluate
```

Run the following command to compile the Pipeline:

```shell
ultrarag build examples/vanilla_vlm.yaml
```

Modify parameters:
```yaml examples/parameters/vanilla_vlm_parameter.yaml icon="/images/yaml.svg"
benchmark:
  benchmark:
    key_map:
      gt_ls: golden_answers
      q_ls: question
      multimodal_path: image # [!code ++]
    limit: -1
    name: nq # [!code --]
    path: data/sample_nq_10.jsonl # [!code --]
    name: test # [!code ++]
    path: data/test.jsonl # [!code ++]
    seed: 42
    shuffle: false
custom: {}
evaluation:
  metrics:
  - acc
  - f1
  - em
  - coverem
  - stringem
  - rouge-1
  - rouge-2
  - rouge-l
  save_path: output/evaluate_results.json
generation:
  backend: vllm
  backend_configs:
    hf:
      batch_size: 8
      gpu_ids: 2,3
      model_name_or_path: openbmb/MiniCPM4-8B
      trust_remote_code: true
    openai:
      api_key: abc
      base_delay: 1.0
      base_url: http://localhost:8000/v1
      concurrency: 8
      model_name: MiniCPM4-8B
      retries: 3
    vllm:
      dtype: auto
      gpu_ids: 2,3
      gpu_memory_utilization: 0.9
      model_name_or_path: openbmb/MiniCPM4-8B # [!code --]
      model_name_or_path: openbmb/MiniCPM-V-4 # [!code ++]
      trust_remote_code: true
  extra_params:
    chat_template_kwargs:
      enable_thinking: false
  image_tag: null
  sampling_params:
    max_tokens: 2048
    temperature: 0.7
    top_p: 0.8
  system_prompt: ''
prompt:
  template: prompt/qa_boxed.jinja
```

Run:
```shell
ultrarag run examples/vanilla_vlm.yaml
```

<Tip>Set `image_tag` (for example `<IMG>`) to mark where the image should be inserted in the prompt. If it is empty, the image is placed at the leftmost position of the input.</Tip>

### Deploy Model

UltraRAG is fully compatible with the OpenAI API interface specification, so any model that conforms to this interface standard can be directly accessed without additional adaptation or code modification.
The following example shows how to use [vLLM](https://docs.vllm.ai/en/latest/cli/serve.html#parallelconfig) to deploy a local model.

**Step 1: Background Model Deployment**

Taking Qwen3-32B as an example, it is recommended to use multi-card parallelism to ensure inference speed.

**Screen (Run directly on host)**

1. Create session:

```shell
screen -S llm
```

2. Start command:

```shell script/vllm_serve.sh
CUDA_VISIBLE_DEVICES=0,1 python -m vllm.entrypoints.openai.api_server \
    --served-model-name qwen3-32b \
    --model Qwen/Qwen3-32B \
    --trust-remote-code \
    --host 0.0.0.0 \
    --port 65503 \
    --max-model-len 32768 \
    --gpu-memory-utilization 0.9 \
    --tensor-parallel-size 2 \
    --enforce-eager
```

Seeing output similar to the following indicates that the model service has started successfully:

```
(APIServer pid=2811812) INFO:     Started server process [2811812]
(APIServer pid=2811812) INFO:     Waiting for application startup.
(APIServer pid=2811812) INFO:     Application startup complete.
```

3. Exit session: Press `Ctrl + A`, then `D`, to detach from the session and keep the service running in the background.
If you need to re-enter the session, execute:

```shell
screen -r llm
```

**Step 2: Modify Pipeline Parameters**

Modify parameters:
```yaml examples/parameters/vanilla_llm_parameter.yaml icon="/images/yaml.svg"
benchmark:
  benchmark:
    key_map:
      gt_ls: golden_answers
      q_ls: question
    limit: -1
    name: nq
    path: data/sample_nq_10.jsonl
    seed: 42
    shuffle: false
custom: {}
evaluation:
  metrics:
  - acc
  - f1
  - em
  - coverem
  - stringem
  - rouge-1
  - rouge-2
  - rouge-l
  save_path: output/evaluate_results.json
generation:
  backend: vllm # [!code --]
  backend: openai # [!code ++]
  backend_configs:
    hf:
      batch_size: 8
      gpu_ids: 2,3
      model_name_or_path: openbmb/MiniCPM4-8B
      trust_remote_code: true
    openai:
      api_key: abc
      base_delay: 1.0
      base_url: http://localhost:8000/v1 # [!code --]
      base_url: http://127.0.0.1:65503/v1 # [!code ++]
      concurrency: 8
      model_name: MiniCPM4-8B # [!code --]
      model_name: qwen3-32b # [!code ++]
      retries: 3
    vllm:
      dtype: auto
      gpu_ids: 2,3
      gpu_memory_utilization: 0.9
      model_name_or_path: openbmb/MiniCPM4-8B
      trust_remote_code: true
  extra_params:
    chat_template_kwargs:
      enable_thinking: false
  sampling_params:
    max_tokens: 2048
    temperature: 0.7
    top_p: 0.8
  system_prompt: ''
prompt:
  template: prompt/qa_boxed.jinja
```

After completing the configuration, run the Pipeline as usual with `ultrarag run examples/vanilla_llm.yaml`.
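
Before running the full Pipeline, it can help to check that the deployed service answers at all. The sketch below builds a request for the OpenAI-compatible `/chat/completions` route using only the standard library. The helper names `build_chat_request` and `send_chat_request` are hypothetical, and the port and model name in the usage note are taken from the vLLM launch command above; adjust them to your deployment.

```python
import json
import urllib.request


def build_chat_request(
    base_url: str,
    model_name: str,
    prompt: str,
    api_key: str = "abc",
    max_tokens: int = 2048,
    temperature: float = 0.7,
    top_p: float = 0.8,
) -> urllib.request.Request:
    """Build a POST request for an OpenAI-compatible chat completions endpoint."""
    payload = {
        "model": model_name,
        "messages": [{"role": "user", "content": prompt}],
        "max_tokens": max_tokens,
        "temperature": temperature,
        "top_p": top_p,
    }
    return urllib.request.Request(
        url=base_url.rstrip("/") + "/chat/completions",
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json", "Authorization": f"Bearer {api_key}"},
        method="POST",
    )


def send_chat_request(request: urllib.request.Request, timeout: float = 60.0) -> str:
    """Send the request and return the text of the first choice."""
    with urllib.request.urlopen(request, timeout=timeout) as response:
        body = json.loads(response.read().decode("utf-8"))
    return body["choices"][0]["message"]["content"]
```

With the service from Step 1 running, `send_chat_request(build_chat_request("http://127.0.0.1:65503/v1", "qwen3-32b", "Say hello."))` should return a short completion. A connection error usually means the port or host does not match the launch command.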

================
File: pages/en/rag_servers/overview.mdx
================
---
title: "Overview"
icon: "books"
---

## Server Introduction

In a typical RAG system, the overall process usually consists of multiple functional modules, such as Retriever, Generator, etc. These modules undertake different tasks and work synergistically through process orchestration to complete complex question-answering and reasoning processes.

In UltraRAG, based on the MCP (Model Context Protocol) architecture, we have uniformly encapsulated these functional modules and proposed a more standardized implementation method — Server.

<Note>A Server is essentially a RAG module component with independent functions.</Note>

Each Server encapsulates one core task (such as retrieval, generation, or evaluation) and exposes it through function-level Tools with standardized interfaces. Servers can therefore be combined, called, and reused freely within a complete Pipeline, which keeps the system modular and extensible.

## Server Development

This section walks through the complete process of building a custom Server from scratch, using a simple example.

### Step 1: Create Server File

First, create a folder named `sayhello` under the `servers` folder, and create a source code directory `sayhello/src` in it. Then, create a file `sayhello.py` in the `src` directory as the main program entry of the Server.

In UltraRAG, all Servers are instantiated through the base class `UltraRAG_MCP_Server`. The example is as follows:

```python servers/sayhello/src/sayhello.py icon="python"
from ultrarag.server import UltraRAG_MCP_Server

app = UltraRAG_MCP_Server("sayhello")

if __name__ == "__main__":
    # Start the sayhello server using stdio transport
    app.run(transport="stdio")
```

### Step 2: Implement Tool Functions

Use the `@app.tool` decorator to register tool functions (Tools). These functions are called during Pipeline execution to implement the actual logic.

For example, the following code defines a minimal greeting function `greet`, which takes a name as input and returns the corresponding greeting:

```python servers/sayhello/src/sayhello.py icon="python"
from typing import Dict
from ultrarag.server import UltraRAG_MCP_Server

app = UltraRAG_MCP_Server("sayhello")

@app.tool(output="name->msg")
def greet(name: str) -> Dict[str, str]:
    ret = f"Hello, {name}!"
    app.logger.info(ret)
    return {"msg": ret}

if __name__ == "__main__":
    # Start the sayhello server using stdio transport
    app.run(transport="stdio")

```
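
The `output="name->msg"` string appears to read as a mapping: the names to the left of `->` are the inputs the Tool consumes, and the names to the right are the keys of the dictionary it returns. The sketch below only illustrates that reading of the format. `parse_io_spec` is a hypothetical helper, not part of UltraRAG, and it assumes comma-separated names on both sides.

```python
from typing import List, Tuple


def parse_io_spec(spec: str) -> Tuple[List[str], List[str]]:
    """Split an 'a,b->c' style spec into input and output name lists.

    Hypothetical helper for illustration; UltraRAG's real parsing may differ.
    """
    left, sep, right = spec.partition("->")
    if not sep:
        raise ValueError(f"spec must contain '->': {spec!r}")
    # Drop surrounding whitespace and ignore empty entries on either side.
    inputs = [name.strip() for name in left.split(",") if name.strip()]
    outputs = [name.strip() for name in right.split(",") if name.strip()]
    return inputs, outputs


inputs, outputs = parse_io_spec("name->msg")
print(inputs, outputs)
```

Read this way, `greet` consumes the `name` parameter and publishes its result under `msg`, which matches the `{"msg": ret}` dictionary it returns.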

### Step 3: Configure Parameter File

Next, create a parameter configuration file `parameter.yaml` under the `sayhello` folder. This file declares the input parameters required by the Tool and their default values, so that they can be loaded and passed in automatically when the Pipeline runs.

The example is as follows:
```yaml servers/sayhello/parameter.yaml icon="/images/yaml.svg"
name: UltraRAG v3
```
Here, the parameter `name` is defined with a default value of "UltraRAG v3".

### Parameter Registration Mechanism

<Note>If there are parameter naming conflicts between different Prompt Tools, please refer to the "Multi-Prompt Tool Calling Scenario" section in [Prompt Server](/pages/en/rag_servers/prompt) for solutions.</Note>

During the build phase, UltraRAG automatically reads the `parameter.yaml` file in each Server directory and uses it to detect and register the parameters that the tool functions require. Keep the following points in mind:

- **Parameter Sharing Mechanism**: When multiple Tools share the same parameter (such as `template` or `model_name_or_path`), declare it once in `parameter.yaml`; it is reused without repeated definitions.
- **Field Overwrite Risk**: If multiple Tools require parameters that have the same name but different meanings or default values, give them distinct field names so that one does not overwrite the other in the automatically generated configuration file.
- **Context Automatic Inference Mechanism**: If an input parameter of a tool function does not appear in `parameter.yaml`, UltraRAG by default tries to infer it from the runtime context (that is, from the output of upstream Tools). A parameter therefore only needs to be defined explicitly in `parameter.yaml` when it cannot be passed through the context automatically.
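
The lookup order described above can be sketched in a few lines of Python. This is an illustration of the rule, not UltraRAG's implementation: `resolve_inputs` and its arguments are hypothetical names, and the sketch assumes that a value declared in `parameter.yaml` is used when present and that the runtime context is the fallback.

```python
from typing import Any, Dict, List


def resolve_inputs(
    required: List[str],
    parameters: Dict[str, Any],
    context: Dict[str, Any],
) -> Dict[str, Any]:
    """Resolve each required input name to a value.

    Assumption for illustration: declared parameters are looked up first,
    and names missing there fall back to outputs of upstream Tools.
    """
    resolved: Dict[str, Any] = {}
    for name in required:
        if name in parameters:
            resolved[name] = parameters[name]
        elif name in context:
            resolved[name] = context[name]
        else:
            raise KeyError(f"input {name!r} is neither declared nor produced upstream")
    return resolved


# 'template' comes from parameter.yaml; 'q_ls' and 'ret_psg' come from upstream Tools.
declared = {"template": "prompt/qa_boxed.jinja"}
upstream = {"q_ls": ["who wrote Hamlet?"], "ret_psg": [["Hamlet is a play."]]}
print(resolve_inputs(["q_ls", "ret_psg", "template"], declared, upstream))
```

The sketch also shows why a name collision is risky: two Tools that both ask for `template` would receive the same declared value.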

### Encapsulating Shared Variables via Class

In some scenarios, you may want to maintain shared state within a single Server, such as model instances, cache objects, or configurations. In that case, the Server can be encapsulated as a class, with shared variables defined and Tools registered during class initialization.

The following example encapsulates the sayhello Server as a class to share an internal variable:
```python servers/sayhello/src/sayhello.py icon="python" highlight="9"
from typing import Dict
from ultrarag.server import UltraRAG_MCP_Server

app = UltraRAG_MCP_Server("sayhello")

class Sayhello:
    def __init__(self, mcp_inst: UltraRAG_MCP_Server):
        mcp_inst.tool(self.greet, output="name->msg")
        self.sen = "Nice to meet you"

    def greet(self, name: str) -> Dict[str, str]:
        ret = f"Hello, {name}! {self.sen}!"
        app.logger.info(ret)
        return {"msg": ret}

if __name__ == "__main__":
    Sayhello(app)
    app.run(transport="stdio")
```
In this example, `self.sen` stands in for a variable that must be shared between different Tools. This approach is particularly well suited to scenarios that load models or reuse the same configuration parameters across Tools.
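
To make the sharing concrete, the sketch below registers two Tools that read the same attribute. It is a self-contained illustration rather than UltraRAG code: `StubServer` is a minimal stand-in that lets the example run without UltraRAG installed, and `farewell` is a hypothetical second Tool added only to show the shared state.

```python
from typing import Callable, Dict


class StubServer:
    """Minimal stand-in for a server object; NOT the UltraRAG class."""

    def __init__(self, name: str) -> None:
        self.name = name
        self.tools: Dict[str, Callable[..., Dict[str, str]]] = {}

    def tool(self, fn: Callable[..., Dict[str, str]], output: str) -> None:
        # Store the bound method under its function name; 'output' is
        # accepted only to mirror the call shape used above and is ignored.
        self.tools[fn.__name__] = fn


class Sayhello:
    def __init__(self, server: StubServer) -> None:
        self.sen = "Nice to meet you"  # state shared by both Tools below
        server.tool(self.greet, output="name->msg")
        server.tool(self.farewell, output="name->msg")

    def greet(self, name: str) -> Dict[str, str]:
        return {"msg": f"Hello, {name}! {self.sen}!"}

    def farewell(self, name: str) -> Dict[str, str]:
        # Hypothetical second Tool reusing the same shared attribute.
        return {"msg": f"Goodbye, {name}! {self.sen}!"}


server = StubServer("sayhello")
Sayhello(server)
print(server.tools["greet"]("UltraRAG v3"))
print(server.tools["farewell"]("UltraRAG v3"))
```

Because both Tools are bound methods of the same instance, an expensive object such as a loaded model would only need to be created once in `__init__`.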

================
File: pages/en/rag_servers/prompt.mdx
================
---
title: "Prompt"
icon: "terminal"
---

## Function

The Prompt Tool is the core component for constructing language model inputs (prompts).
Each Prompt Tool is defined with the `@app.prompt` decorator. Its main responsibility is to load the corresponding template file based on the input content (such as questions and retrieved passages) and generate a standardized PromptMessage
that can be passed directly to the Large Language Model (LLM) for generation or reasoning.

## Implementation Example

### Step 1: Prepare Prompt Template

Please save your prompt template as a file ending with `.jinja`, for example:

```jinja prompt/qa_rag_boxed.jinja icon="/images/jinja.svg"
Please answer the following question based on the given documents.
Think step by step.
Provide your final answer in the format \boxed{YOUR_ANSWER}.

Documents:
{{documents}}

Question: {{question}}
```

### Step 2: Implement Tool in Prompt Server

Call the `load_prompt_template` method to load the template, and implement a tool function in the Prompt Server to assemble the prompt:

```python servers/prompt/src/prompt.py icon="python"
@app.prompt(output="q_ls,ret_psg,template->prompt_ls")
def qa_rag_boxed(
    q_ls: List[str], ret_psg: List[str | Any], template: str | Path
) -> list[PromptMessage]:
    template: Template = load_prompt_template(template)
    ret = []
    for q, psg in zip(q_ls, ret_psg):
        passage_text = "\n".join(psg)
        p = template.render(question=q, documents=passage_text)
        ret.append(p)
    return ret
```
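
The data shapes are the part of this function that is easiest to get wrong: `q_ls` holds one question per sample, and `ret_psg` holds one list of passages per question. The sketch below reproduces the zip-and-join logic with the standard library only. The `render` function is a deliberately tiny stand-in for Jinja that handles nothing but plain `{{name}}` placeholders, and `build_prompts` is a hypothetical name used for illustration.

```python
import re
from typing import Dict, List

# Shortened copy of the template above, kept inline so the sketch is self-contained.
TEMPLATE = (
    "Please answer the following question based on the given documents.\n"
    "\n"
    "Documents:\n"
    "{{documents}}\n"
    "\n"
    "Question: {{question}}"
)


def render(template: str, values: Dict[str, str]) -> str:
    """Replace {{name}} placeholders; a stand-in for Jinja, not a Jinja engine."""
    return re.sub(r"\{\{\s*(\w+)\s*\}\}", lambda m: values[m.group(1)], template)


def build_prompts(q_ls: List[str], ret_psg: List[List[str]]) -> List[str]:
    prompts = []
    # Pair each question with its own passage list, then join passages by newline.
    for question, passages in zip(q_ls, ret_psg):
        documents = "\n".join(passages)
        prompts.append(render(TEMPLATE, {"question": question, "documents": documents}))
    return prompts


prompts = build_prompts(
    ["Who wrote Hamlet?"],
    [["Hamlet is a tragedy.", "It was written by William Shakespeare."]],
)
print(prompts[0])
```

Note that `zip` silently stops at the shorter input, so a mismatch between the number of questions and the number of passage lists would drop samples rather than raise an error.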

## Usage Example

Before calling the model generation tool, you need to construct the input prompt through the corresponding Prompt Tool.

```yaml examples/rag_full.yaml icon="/images/yaml.svg" highlight="4,16"
servers:
  benchmark: servers/benchmark
  retriever: servers/retriever
  prompt: servers/prompt
  generation: servers/generation
  evaluation: servers/evaluation
  custom: servers/custom

pipeline:
- benchmark.get_data
- retriever.retriever_init
- retriever.retriever_embed
- retriever.retriever_index
- retriever.retriever_search
- generation.generation_init
- prompt.qa_rag_boxed
- generation.generate
- custom.output_extract_from_boxed
- evaluation.evaluate
```

## Multi-Prompt Tool Calling Scenario

In complex Pipelines, the model often performs different tasks at different stages: for example, first generating sub-questions and then generating the final answer from new retrieval results.
In that case, several Prompt Tools must be configured in the same Pipeline, each with its own prompt construction logic.

```yaml examples/rag_loop.yaml icon="/images/yaml.svg" highlight="19,29"
# MCP Server
servers:
  benchmark: servers/benchmark
  retriever: servers/retriever
  prompt: servers/prompt
  generation: servers/generation
  evaluation: servers/evaluation
  custom: servers/custom

# MCP Client Pipeline
pipeline:
- benchmark.get_data
- retriever.retriever_init
- generation.generation_init
- retriever.retriever_search
- loop:
    times: 3
    steps:
    - prompt.gen_subq
    - generation.generate:
        output:
          ans_ls: subq_ls
    - retriever.retriever_search:
        input:
          query_list: subq_ls
        output:
          ret_psg: temp_psg
    - custom.merge_passages
- prompt.qa_rag_boxed
- generation.generate
- custom.output_extract_from_boxed
- evaluation.evaluate
```

If each task should load a different template, give each Prompt Tool its own template field name when registering it:

```python servers/prompt/src/prompt.py icon="python" highlight="1,13"
@app.prompt(output="q_ls,ret_psg,template->prompt_ls")
def qa_rag_boxed(
    q_ls: List[str], ret_psg: List[str | Any], template: str | Path
) -> list[PromptMessage]:
    template: Template = load_prompt_template(template)
    ret = []
    for q, psg in zip(q_ls, ret_psg):
        passage_text = "\n".join(psg)
        p = template.render(question=q, documents=passage_text)
        ret.append(p)
    return ret

@app.prompt(output="q_ls,ret_psg,gen_subq_template->prompt_ls")
def gen_subq(
    q_ls: List[str],
    ret_psg: List[str | Any],
    template: str | Path,
) -> List[PromptMessage]:
    template: Template = load_prompt_template(template)
    all_prompts = []
    for q, psg in zip(q_ls, ret_psg):
        passage_text = "\n".join(psg)
        p = template.render(question=q, documents=passage_text)
        all_prompts.append(p)
    return all_prompts
```

Next, add the corresponding template field to `servers/prompt/parameter.yaml`:

<Note>Please ensure this modification is completed before executing the build command.</Note>

```yaml servers/prompt/parameter.yaml icon="/images/yaml.svg" 
# servers/prompt/parameter.yaml

# QA
template: prompt/qa_boxed.jinja

# RankCoT
kr_template: prompt/RankCoT_knowledge_refinement.jinja
qa_template: prompt/RankCoT_question_answering.jinja

# Search-R1
search_r1_gen_template: prompt/search_r1_append.jinja

# R1-Searcher
r1_searcher_gen_template: prompt/r1_searcher_append.jinja

# For other prompts, please add parameters here as needed

# Take webnote as an example:
webnote_gen_plan_template: prompt/webnote_gen_plan.jinja
webnote_init_page_template: prompt/webnote_init_page.jinja
webnote_gen_subq_template: prompt/webnote_gen_subq.jinja
webnote_fill_page_template: prompt/webnote_fill_page.jinja
webnote_gen_answer_template: prompt/webnote_gen_answer.jinja

gen_subq_template: prompt/gen_subq.jinja  # [!code ++]
```

Run the following command to compile the Pipeline:

```shell
ultrarag build examples/rag_loop.yaml
```

The system will automatically register the new field in the generated parameter file:

```yaml examples/rag_loop_parameter.yaml icon="/images/yaml.svg" highlight="3"
...
prompt:
  gen_subq_template: prompt/gen_subq.jinja
  template: prompt/qa_boxed.jinja
retriever:
  backend: sentence_transformers
...
```

The Pipeline can then be run as usual.

================
File: pages/en/rag_servers/reranker.mdx
================
---
title: "Reranker"
icon: "ranking-star"
---

## Function

The Reranker Server is a module in UltraRAG used to re-rank retrieval results.
It receives preliminary retrieval results from the Retriever Server and reorders candidate documents based on semantic relevance, thereby improving the precision of the retrieval stage and the quality of the final generation results.
This module natively supports various mainstream backends including [Sentence-Transformers](https://github.com/UKPLab/sentence-transformers), [Infinity](https://github.com/michaelfeil/infinity), and [OpenAI](https://platform.openai.com/docs/guides/embeddings).
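
Conceptually, re-ranking scores every candidate passage against the query, sorts by that score, and keeps the best `top_k`. The sketch below shows only this control flow. The `token_overlap` scorer is a toy stand-in for a real reranker model, and both function names are hypothetical rather than part of the Reranker Server.

```python
from typing import Callable, List, Tuple


def token_overlap(query: str, passage: str) -> float:
    """Toy relevance score: fraction of query tokens that appear in the passage."""
    q_tokens = set(query.lower().split())
    p_tokens = set(passage.lower().split())
    return len(q_tokens & p_tokens) / len(q_tokens) if q_tokens else 0.0


def rerank(
    query: str,
    passages: List[str],
    score_fn: Callable[[str, str], float] = token_overlap,
    top_k: int = 5,
) -> List[Tuple[str, float]]:
    """Score each candidate, sort by descending score, and keep the top_k."""
    scored = [(passage, score_fn(query, passage)) for passage in passages]
    # sorted() is stable, so ties keep their original retrieval order.
    return sorted(scored, key=lambda item: item[1], reverse=True)[:top_k]


candidates = [
    "Berlin is the capital of Germany",
    "Paris is the capital of France",
    "Bananas are yellow",
]
for passage, score in rerank("capital of France", candidates, top_k=2):
    print(f"{score:.2f}  {passage}")
```

In a real setup, `score_fn` would be replaced by a model call, which is why the scoring function is passed in as a parameter here.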

## Usage Examples

```yaml examples/corpus_rerank.yaml icon="/images/yaml.svg" highlight="5,14,15"
# MCP Server
servers:
  benchmark: servers/benchmark
  retriever: servers/retriever
  reranker: servers/reranker

# MCP Client Pipeline
pipeline:
- benchmark.get_data
- retriever.retriever_init
- retriever.retriever_embed
- retriever.retriever_index
- retriever.retriever_search
- reranker.reranker_init
- reranker.reranker_rerank
```

Run the following command to compile the Pipeline:

```shell
ultrarag build examples/corpus_rerank.yaml
```

Modify parameters:
```yaml examples/parameters/corpus_rerank_parameter.yaml icon="/images/yaml.svg"
benchmark:
  benchmark:
    key_map:
      gt_ls: golden_answers
      q_ls: question
    limit: -1
    name: nq
    path: data/sample_nq_10.jsonl
    seed: 42
    shuffle: false
reranker:
  backend: sentence_transformers
  backend_configs:
    infinity:
      bettertransformer: false
      device: cuda
      model_warmup: false
      pooling_method: auto
      trust_remote_code: true
    openai:
      api_key: ''
      base_url: https://api.openai.com/v1
      model_name: text-embedding-3-small
    sentence_transformers:
      device: cuda
      trust_remote_code: true
  batch_size: 16
  gpu_ids: 0
  model_name_or_path: openbmb/MiniCPM-Reranker-Light # [!code --]
  model_name_or_path: BAAI/bge-reranker-large # [!code ++]
  query_instruction: ''
  top_k: 5
retriever:
  backend: sentence_transformers
  backend_configs:
    bm25:
      lang: en
      save_path: index/bm25
    infinity:
      bettertransformer: false
      model_warmup: false
      pooling_method: auto
      trust_remote_code: true
    openai:
      api_key: abc
      base_url: https://api.openai.com/v1
      model_name: text-embedding-3-small
    sentence_transformers:
      sentence_transformers_encode:
        encode_chunk_size: 256
        normalize_embeddings: false
        psg_prompt_name: document
        psg_task: null
        q_prompt_name: query
        q_task: null
      trust_remote_code: true
  batch_size: 16
  collection_name: wiki
  corpus_path: data/corpus_example.jsonl
  embedding_path: embedding/embedding.npy
  gpu_ids: 0,1 # [!code --]
  gpu_ids: 1 # [!code ++]
  index_backend: faiss
  index_backend_configs:
    faiss:
      index_chunk_size: 10000
      index_path: index/index.index
      index_use_gpu: true
    milvus:
      id_field_name: id
      id_max_length: 64
      index_chunk_size: 1000
      index_params:
        index_type: AUTOINDEX
        metric_type: IP
      metric_type: IP
      search_params:
        metric_type: IP
        params: {}
      text_field_name: contents
      text_max_length: 60000
      token: null
      uri: index/milvus_demo.db
      vector_field_name: vector
  is_demo: false
  is_multimodal: false
  model_name_or_path: openbmb/MiniCPM-Embedding-Light # [!code --]
  model_name_or_path: Qwen/Qwen3-Embedding-0.6B # [!code ++]
  overwrite: false
  query_instruction: ''
  top_k: 5
```

Run Pipeline:
```shell
ultrarag run examples/corpus_rerank.yaml
```

================
File: pages/en/rag_servers/retriever.mdx
================
---
title: "Retriever"
icon: "magnifying-glass"
---

## Function

The Retriever Server is the core retrieval module in UltraRAG, integrating model loading, text encoding, index construction, and retrieval query functions.
It natively supports multiple backend interfaces such as [Sentence-Transformers](https://github.com/UKPLab/sentence-transformers), [Infinity](https://github.com/michaelfeil/infinity), and [OpenAI](https://platform.openai.com/docs/guides/embeddings), enabling flexible adaptation to corpora of different scales and types to meet the needs of large-scale vectorization and efficient document recall.

## Usage Examples

### Corpus Encoding and Indexing

The following example shows how to use the Retriever Server to perform encoding and index construction on a corpus.

```yaml examples/corpus_index.yaml icon="/images/yaml.svg"
# MCP Server
servers:
  retriever: servers/retriever

# MCP Client Pipeline
pipeline:
- retriever.retriever_init
- retriever.retriever_embed
- retriever.retriever_index
```

Run the following command to compile the Pipeline:

```shell
ultrarag build examples/corpus_index.yaml
```

Modify the generated parameter file to fit your setup. Two typical scenarios are shown below: text corpus encoding and image corpus encoding.

1. **Text Corpus Encoding**

Example: Using `Qwen3-Embedding-0.6B` to vectorize a text corpus.
```yaml examples/parameters/corpus_index_parameter.yaml icon="/images/yaml.svg" highlight="2"
retriever:
  backend: sentence_transformers # sentence_transformers is used as the example here
  backend_configs:
    bm25:
      lang: en
      save_path: index/bm25
    infinity:
      bettertransformer: false
      model_warmup: false
      pooling_method: auto
      trust_remote_code: true
    openai:
      api_key: ''
      base_url: https://api.openai.com/v1
      model_name: text-embedding-3-small
    sentence_transformers:
      sentence_transformers_encode:
        encode_chunk_size: 256
        normalize_embeddings: false
        psg_prompt_name: document
        psg_task: null
        q_prompt_name: query
        q_task: null
      trust_remote_code: true
  batch_size: 16
  collection_name: wiki
  corpus_path: data/corpus_example.jsonl
  embedding_path: embedding/embedding.npy
  gpu_ids: '1'
  index_backend: faiss
  index_backend_configs:
    faiss:
      index_chunk_size: 10000
      index_path: index/index.index
      index_use_gpu: true
    milvus:
      id_field_name: id
      id_max_length: 64
      index_chunk_size: 1000
      index_params:
        index_type: AUTOINDEX
        metric_type: IP
      metric_type: IP
      search_params:
        metric_type: IP
        params: {}
      text_field_name: contents
      text_max_length: 60000
      token: null
      uri: index/milvus_demo.db
      vector_field_name: vector
  is_demo: false
  is_multimodal: false
  model_name_or_path: openbmb/MiniCPM-Embedding-Light # [!code --]
  model_name_or_path: Qwen/Qwen3-Embedding-0.6B # [!code ++]
  overwrite: false
```

2. **Image Corpus Encoding**

Example: Using `jinaai/jina-embeddings-v4` to vectorize an image corpus.
```yaml examples/parameters/corpus_index_parameter.yaml icon="/images/yaml.svg"
retriever:
  backend: sentence_transformers
  backend_configs:
    bm25:
      lang: en
      save_path: index/bm25
    infinity:
      bettertransformer: false
      model_warmup: false
      pooling_method: auto
      trust_remote_code: true
    openai:
      api_key: ''
      base_url: https://api.openai.com/v1
      model_name: text-embedding-3-small
    sentence_transformers:
      sentence_transformers_encode:
        encode_chunk_size: 256
        normalize_embeddings: false
        psg_prompt_name: document # [!code --]
        psg_task: null # [!code --]
        q_prompt_name: query # [!code --]
        q_task: null # [!code --]
        psg_prompt_name: null # [!code ++]
        psg_task: retrieval # [!code ++]
        q_prompt_name: query # [!code ++]
        q_task: retrieval # [!code ++]
      trust_remote_code: true
  batch_size: 16
  collection_name: wiki
  corpus_path: data/corpus_example.jsonl # [!code --]
  corpus_path: corpora/image.jsonl # [!code ++]
  embedding_path: embedding/embedding.npy
  gpu_ids: 0,1 # [!code --]
  gpu_ids: 1 # [!code ++]
  index_backend: faiss
  index_backend_configs:
    faiss:
      index_chunk_size: 10000
      index_path: index/index.index
      index_use_gpu: true
    milvus:
      id_field_name: id
      id_max_length: 64
      index_chunk_size: 1000
      index_params:
        index_type: AUTOINDEX
        metric_type: IP
      metric_type: IP
      search_params:
        metric_type: IP
        params: {}
      text_field_name: contents
      text_max_length: 60000
      token: null
      uri: index/milvus_demo.db
      vector_field_name: vector
  is_demo: false
  is_multimodal: false # [!code --]
  is_multimodal: true # [!code ++]
  model_name_or_path: openbmb/MiniCPM-Embedding-Light # [!code --]
  model_name_or_path: jinaai/jina-embeddings-v4 # [!code ++]
  overwrite: false
```

Run the following command to execute this Pipeline:

```shell
ultrarag run examples/corpus_index.yaml
```

The encoding and indexing phase usually processes a large corpus and takes a long time. It is recommended to use `screen` or `nohup` to run the task in the background, for example:

```shell
nohup ultrarag run examples/corpus_index.yaml > log.txt 2>&1 &
```

### Vector Retrieval

The following example shows how to use the Retriever Server to perform vector retrieval tasks on the constructed index.

```yaml examples/corpus_search.yaml icon="/images/yaml.svg"
# MCP Server
servers:
  benchmark: servers/benchmark
  retriever: servers/retriever

# MCP Client Pipeline
pipeline:
- benchmark.get_data
- retriever.retriever_init
- retriever.retriever_search

```

Run the following command to compile the Pipeline:

```shell
ultrarag build examples/corpus_search.yaml
```

Modify parameters:
```yaml examples/parameters/corpus_search_parameter.yaml icon="/images/yaml.svg"
benchmark:
  benchmark:
    key_map:
      gt_ls: golden_answers
      q_ls: question
    limit: -1
    name: nq
    path: data/sample_nq_10.jsonl
    seed: 42
    shuffle: false
retriever:
  backend: sentence_transformers
  backend_configs:
    bm25:
      lang: en
      save_path: index/bm25
    infinity:
      bettertransformer: false
      model_warmup: false
      pooling_method: auto
      trust_remote_code: true
    openai:
      api_key: ''
      base_url: https://api.openai.com/v1
      model_name: text-embedding-3-small
    sentence_transformers:
      sentence_transformers_encode:
        encode_chunk_size: 256
        normalize_embeddings: false
        psg_prompt_name: document
        psg_task: null
        q_prompt_name: query
        q_task: null
      trust_remote_code: true
  batch_size: 16
  collection_name: wiki
  corpus_path: data/corpus_example.jsonl
  gpu_ids: '1'
  index_backend: faiss
  index_backend_configs:
    faiss:
      index_chunk_size: 10000
      index_path: index/index.index
      index_use_gpu: true
    milvus:
      id_field_name: id
      id_max_length: 64
      index_chunk_size: 1000
      index_params:
        index_type: AUTOINDEX
        metric_type: IP
      metric_type: IP
      search_params:
        metric_type: IP
        params: {}
      text_field_name: contents
      text_max_length: 60000
      token: null
      uri: index/milvus_demo.db
      vector_field_name: vector
  is_demo: false
  is_multimodal: false
  model_name_or_path: openbmb/MiniCPM-Embedding-Light # [!code --]
  model_name_or_path: Qwen/Qwen3-Embedding-0.6B # [!code ++]
  query_instruction: ''
  top_k: 5
```

Run Pipeline:

```shell
ultrarag run examples/corpus_search.yaml
```
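
As a mental model for what the search step computes, the sketch below ranks corpus vectors against a query vector by brute force. It is an illustration under stated assumptions, not the Retriever Server's code: it assumes inner-product similarity (the `metric_type: IP` entries in the Milvus settings suggest this, but the metric used with FAISS is not shown here), and `inner_product` and `search` are hypothetical names. A real index backend replaces the linear scan with a far more efficient structure.

```python
from typing import List, Sequence, Tuple


def inner_product(a: Sequence[float], b: Sequence[float]) -> float:
    return sum(x * y for x, y in zip(a, b))


def search(
    query_vec: Sequence[float],
    corpus_vecs: List[Sequence[float]],
    top_k: int = 5,
) -> List[Tuple[int, float]]:
    """Return (passage index, score) pairs for the top_k highest inner products."""
    scores = [(idx, inner_product(query_vec, vec)) for idx, vec in enumerate(corpus_vecs)]
    return sorted(scores, key=lambda item: item[1], reverse=True)[:top_k]


corpus = [[1.0, 0.0], [0.0, 1.0], [0.6, 0.8]]
print(search([1.0, 0.0], corpus, top_k=2))
```

With unnormalized embeddings (`normalize_embeddings: false` above), inner product also rewards vector length, so scores are not directly comparable to cosine similarity.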

### BM25 Retrieval

In addition to vector retrieval, UltraRAG includes the classic BM25 text retrieval algorithm. BM25 is a sparse retrieval method that refines Term Frequency-Inverse Document Frequency (TF-IDF) weighting and is often used for fast, lightweight lexical (keyword) matching. In practice, BM25 can complement dense retrieval to improve retrieval coverage and recall diversity.
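
For readers unfamiliar with the scoring function, here is a minimal sketch of textbook Okapi BM25 with whitespace tokenization. It is not the implementation behind the `bm25` backend, whose tokenizer and parameters may differ. The defaults `k1=1.5` and `b=0.75` are common choices rather than values taken from UltraRAG, and the IDF uses the non-negative `log(1 + ...)` variant.

```python
import math
from collections import Counter
from typing import List


def bm25_scores(query: str, docs: List[str], k1: float = 1.5, b: float = 0.75) -> List[float]:
    """Score every document against the query with Okapi BM25."""
    tokenized = [doc.lower().split() for doc in docs]
    n_docs = len(tokenized)
    if n_docs == 0:
        return []
    avg_len = sum(len(tokens) for tokens in tokenized) / n_docs or 1.0

    # Document frequency: in how many documents does each term occur?
    doc_freq: Counter = Counter()
    for tokens in tokenized:
        doc_freq.update(set(tokens))

    scores = []
    for tokens in tokenized:
        term_freq = Counter(tokens)
        score = 0.0
        for term in query.lower().split():
            tf = term_freq[term]
            if tf == 0:
                continue
            # Rare terms get a higher weight than common ones.
            idf = math.log((n_docs - doc_freq[term] + 0.5) / (doc_freq[term] + 0.5) + 1.0)
            # Term frequency saturates via k1 and is normalized by document length via b.
            length_norm = k1 * (1.0 - b + b * len(tokens) / avg_len)
            score += idf * tf * (k1 + 1.0) / (tf + length_norm)
        scores.append(score)
    return scores


docs = ["the cat sat on the mat", "dogs chase the cat", "quantum physics lecture"]
print(bm25_scores("cat mat", docs))
```

A document that shares no token with the query scores exactly zero, which is the main gap that dense retrieval fills.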

**Step 1: Build BM25 Index**

Before using BM25 for retrieval, you need to tokenize the corpus documents and build a sparse index.

```yaml examples/bm25_index.yaml icon="/images/yaml.svg"
# MCP Server
servers:
  retriever: servers/retriever

# MCP Client Pipeline
pipeline:
- retriever.retriever_init
- retriever.bm25_index
```

Run the following command to compile the Pipeline:

```shell
ultrarag build examples/bm25_index.yaml
```

Modify parameters:
```yaml examples/parameters/bm25_index_parameter.yaml icon="/images/yaml.svg"
retriever:
  backend: sentence_transformers # [!code --]
  backend: bm25 # [!code ++]
  backend_configs:
    bm25:
      lang: en
      save_path: index/bm25
    infinity:
      bettertransformer: false
      model_warmup: false
      pooling_method: auto
      trust_remote_code: true
    openai:
      api_key: ''
      base_url: https://api.openai.com/v1
      model_name: text-embedding-3-small
    sentence_transformers:
      sentence_transformers_encode:
        encode_chunk_size: 256
        normalize_embeddings: false
        psg_prompt_name: document
        psg_task: null
        q_prompt_name: query
        q_task: null
      trust_remote_code: true
  batch_size: 16
  collection_name: wiki
  corpus_path: data/corpus_example.jsonl
  gpu_ids: '1'
  index_backend: faiss
  index_backend_configs:
    faiss:
      index_chunk_size: 10000
      index_path: index/index.index
      index_use_gpu: true
    milvus:
      id_field_name: id
      id_max_length: 64
      index_chunk_size: 1000
      index_params:
        index_type: AUTOINDEX
        metric_type: IP
      metric_type: IP
      search_params:
        metric_type: IP
        params: {}
      text_field_name: contents
      text_max_length: 60000
      token: null
      uri: index/milvus_demo.db
      vector_field_name: vector
  is_demo: false
  is_multimodal: false
  model_name_or_path: openbmb/MiniCPM-Embedding-Light
  overwrite: false
```
Run:
```shell
ultrarag run examples/bm25_index.yaml
```

**Step 2: Execute BM25 Retrieval**

Once the index has been built, BM25-based document retrieval can be run.

```yaml examples/bm25_search.yaml icon="/images/yaml.svg"
# MCP Server
servers:
  benchmark: servers/benchmark
  retriever: servers/retriever

# MCP Client Pipeline
pipeline:
- benchmark.get_data
- retriever.retriever_init
- retriever.bm25_search
```

Compile Pipeline:

```shell
ultrarag build examples/bm25_search.yaml
```

Modify parameters:
```yaml examples/parameters/bm25_search_parameter.yaml icon="/images/yaml.svg"
benchmark:
  benchmark:
    key_map:
      gt_ls: golden_answers
      q_ls: question
    limit: -1
    name: nq
    path: data/sample_nq_10.jsonl
    seed: 42
    shuffle: false
retriever:
  backend: sentence_transformers # [!code --]
  backend: bm25 # [!code ++]
  backend_configs:
    bm25:
      lang: en
      save_path: index/bm25
    infinity:
      bettertransformer: false
      model_warmup: false
      pooling_method: auto
      trust_remote_code: true
    openai:
      api_key: ''
      base_url: https://api.openai.com/v1
      model_name: text-embedding-3-small
    sentence_transformers:
      sentence_transformers_encode:
        encode_chunk_size: 256
        normalize_embeddings: false
        psg_prompt_name: document
        psg_task: null
        q_prompt_name: query
        q_task: null
      trust_remote_code: true
  batch_size: 16
  collection_name: wiki
  corpus_path: data/corpus_example.jsonl
  gpu_ids: '1'
  index_backend: faiss
  index_backend_configs:
    faiss:
      index_chunk_size: 10000
      index_path: index/index.index
      index_use_gpu: true
    milvus:
      id_field_name: id
      id_max_length: 64
      index_chunk_size: 1000
      index_params:
        index_type: AUTOINDEX
        metric_type: IP
      metric_type: IP
      search_params:
        metric_type: IP
        params: {}
      text_field_name: contents
      text_max_length: 60000
      token: null
      uri: index/milvus_demo.db
      vector_field_name: vector
  is_demo: false
  is_multimodal: false
  model_name_or_path: openbmb/MiniCPM-Embedding-Light
  top_k: 5
```

Run retrieval process:
```shell
ultrarag run examples/bm25_search.yaml
```

### Hybrid Retrieval

In practice, a single retrieval method often struggles to balance recall and precision.
For example, BM25 excels at keyword matching, while vector retrieval has advantages in semantic understanding.
UltraRAG therefore supports hybrid retrieval, which fuses sparse retrieval (BM25) with dense retrieval to combine the strengths of both and further improve retrieval diversity and robustness.

The following example runs BM25 and vector retrieval in the same Pipeline and merges the results through a custom module.

<Note>You can refer to this example to flexibly extend retrieval methods into any combination, such as combining local knowledge bases with online Web retrieval, or fusing multi-modal retrieval results such as text and images, to build a more powerful hybrid retrieval Pipeline.</Note>

```yaml examples/hybrid_search.yaml icon="/images/yaml.svg"
# MCP Server
servers:
  benchmark: servers/benchmark
  dense: servers/retriever
  bm25: servers/retriever
  custom: servers/custom

# MCP Client Pipeline
pipeline:
- benchmark.get_data
- dense.retriever_init
- bm25.retriever_init
- dense.retriever_search:
    output:
      ret_psg: dense_psg
- bm25.bm25_search:
    output:
      ret_psg: sparse_psg
- custom.merge_passages:
    input:
      ret_psg: dense_psg
      temp_psg: sparse_psg
```

<Note>This Pipeline involves [Parameter Renaming](/pages/en/rag_client/data_and_params) and [Module Reuse](/pages/en/rag_client/multi_agents) mechanisms. You can click the links to view detailed instructions.</Note>
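
One simple way to fuse the two result lists is a per-query union that drops duplicates while preserving order. The sketch below shows that strategy. It is an assumption about what a merge step could do, not the code of `custom.merge_passages`, and `merge_ranked_lists` is a hypothetical name.

```python
from typing import List


def merge_ranked_lists(
    dense_psg: List[List[str]],
    sparse_psg: List[List[str]],
) -> List[List[str]]:
    """For each query, append sparse hits to dense hits, dropping duplicates."""
    merged = []
    for dense, sparse in zip(dense_psg, sparse_psg):
        seen = set()
        combined = []
        for passage in dense + sparse:
            if passage not in seen:
                seen.add(passage)
                combined.append(passage)
        merged.append(combined)
    return merged


dense = [["Paris is the capital of France.", "France is in Europe."]]
sparse = [["France is in Europe.", "The capital city of France is Paris."]]
print(merge_ranked_lists(dense, sparse))
```

This keeps dense results ahead of sparse ones. If the two rankings should be weighted equally, a score-based scheme such as reciprocal rank fusion is a common alternative.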

Run the following command to compile the Pipeline:

```shell
ultrarag build examples/hybrid_search.yaml
```

Modify parameters:
```yaml examples/parameters/hybrid_search_parameter.yaml icon="/images/yaml.svg"
benchmark:
  benchmark:
    key_map:
      gt_ls: golden_answers
      q_ls: question
    limit: -1
    name: nq
    path: data/sample_nq_10.jsonl
    seed: 42
    shuffle: false
bm25:
  backend: sentence_transformers # [!code --]
  backend: bm25 # [!code ++]
  backend_configs:
    bm25:
      lang: en
      save_path: index/bm25
    infinity:
      bettertransformer: false
      model_warmup: false
      pooling_method: auto
      trust_remote_code: true
    openai:
      api_key: ''
      base_url: https://api.openai.com/v1
      model_name: text-embedding-3-small
    sentence_transformers:
      sentence_transformers_encode:
        encode_chunk_size: 256
        normalize_embeddings: false
        psg_prompt_name: document
        psg_task: null
        q_prompt_name: query
        q_task: null
      trust_remote_code: true
  batch_size: 16
  collection_name: wiki
  corpus_path: data/corpus_example.jsonl
  gpu_ids: '1'
  index_backend: faiss
  index_backend_configs:
    faiss:
      index_chunk_size: 10000
      index_path: index/index.index
      index_use_gpu: true
    milvus:
      id_field_name: id
      id_max_length: 64
      index_chunk_size: 1000
      index_params:
        index_type: AUTOINDEX
        metric_type: IP
      metric_type: IP
      search_params:
        metric_type: IP
        params: {}
      text_field_name: contents
      text_max_length: 60000
      token: null
      uri: index/milvus_demo.db
      vector_field_name: vector
  is_demo: false
  is_multimodal: false
  model_name_or_path: openbmb/MiniCPM-Embedding-Light
  top_k: 5
custom: {}
dense:
  backend: sentence_transformers
  backend_configs:
    bm25:
      lang: en
      save_path: index/bm25
    infinity:
      bettertransformer: false
      model_warmup: false
      pooling_method: auto
      trust_remote_code: true
    openai:
      api_key: ''
      base_url: https://api.openai.com/v1
      model_name: text-embedding-3-small
    sentence_transformers:
      sentence_transformers_encode:
        encode_chunk_size: 256
        normalize_embeddings: false
        psg_prompt_name: document
        psg_task: null
        q_prompt_name: query
        q_task: null
      trust_remote_code: true
  batch_size: 16
  collection_name: wiki
  corpus_path: data/corpus_example.jsonl
  gpu_ids: '1'
  index_backend: faiss
  index_backend_configs:
    faiss:
      index_chunk_size: 10000
      index_path: index/index.index
      index_use_gpu: true
    milvus:
      id_field_name: id
      id_max_length: 64
      index_chunk_size: 1000
      index_params:
        index_type: AUTOINDEX
        metric_type: IP
      metric_type: IP
      search_params:
        metric_type: IP
        params: {}
      text_field_name: contents
      text_max_length: 60000
      token: null
      uri: index/milvus_demo.db
      vector_field_name: vector
  is_demo: false
  is_multimodal: false
  model_name_or_path: openbmb/MiniCPM-Embedding-Light # [!code --]
  model_name_or_path: Qwen/Qwen3-Embedding-0.6B # [!code ++]
  query_instruction: ''
  top_k: 5
```

Run Hybrid Search Pipeline:
```shell
ultrarag run examples/hybrid_search.yaml
```

### Deploy Retrieval Model

UltraRAG follows the OpenAI API specification, so any Embedding model served behind an OpenAI-compatible endpoint can be used directly, without extra adaptation or code changes.
The following example shows how to deploy a local retrieval model using [vLLM](https://docs.vllm.ai/en/latest/cli/serve.html#parallelconfig).

**Step 1: Background Model Deployment**

We recommend running the service inside a Screen session so that it stays in the background while its logs and status remain viewable in real time.

Enter a new Screen session:

```shell
screen -S retriever
```

Execute the following command to deploy the model (taking Qwen3-Embedding-0.6B as an example):

```shell script/vllm_serve_emb.sh
CUDA_VISIBLE_DEVICES=2 python -m vllm.entrypoints.openai.api_server \
    --served-model-name qwen-embedding \
    --model Qwen/Qwen3-Embedding-0.6B \
    --trust-remote-code \
    --host 0.0.0.0 \
    --port 65504 \
    --task embed \
    --gpu-memory-utilization 0.2
```

Seeing output similar to the following indicates that the model service has started successfully:

```
(APIServer pid=2270761) INFO:     Started server process [2270761]
(APIServer pid=2270761) INFO:     Waiting for application startup.
(APIServer pid=2270761) INFO:     Application startup complete.
```

Press `Ctrl + A`, then `D`, to detach from the session and keep the service running in the background.
To re-enter the session, execute:

```shell
screen -r retriever
```
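
Once the service reports a successful startup, one way to confirm that it answers embedding requests is to call the OpenAI-style `/v1/embeddings` route. The sketch below uses only the Python standard library. The helper names `build_embedding_request` and `embed` are illustrative and not part of UltraRAG, the host and port are assumed to match the command above, and the API key is a placeholder.

```python
import json
import urllib.request
from typing import List

BASE_URL = "http://127.0.0.1:65504/v1"  # assumed: same host/port as the vLLM command
MODEL_NAME = "qwen-embedding"            # must match --served-model-name


def build_embedding_request(texts: List[str],
                            base_url: str = BASE_URL,
                            model: str = MODEL_NAME,
                            api_key: str = "abc") -> urllib.request.Request:
    """Build (but do not send) an OpenAI-style embeddings request."""
    payload = json.dumps({"model": model, "input": texts}).encode("utf-8")
    return urllib.request.Request(
        url=f"{base_url.rstrip('/')}/embeddings",
        data=payload,
        headers={
            "Content-Type": "application/json",
            # Placeholder key; included because OpenAI-style clients normally send one.
            "Authorization": f"Bearer {api_key}",
        },
        method="POST",
    )


def embed(texts: List[str]) -> List[List[float]]:
    """Send the request and return one vector per input text."""
    with urllib.request.urlopen(build_embedding_request(texts), timeout=30) as resp:
        body = json.load(resp)
    # OpenAI-style responses carry one {"index", "embedding"} object per input.
    items = sorted(body["data"], key=lambda item: item["index"])
    return [item["embedding"] for item in items]
```

Calling `embed(["what is RAG?"])` while the service is running should return a single vector. A connection error usually means the host or port does not match the deployment command.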

**Step 2: Modify Pipeline Parameters**

Taking the corpus_search Pipeline as an example, switch the retrieval backend to `openai`, point `base_url` to the local vLLM service, and set `model_name` to the served model name:
```yaml examples/parameters/corpus_search_parameter.yaml icon="/images/yaml.svg"
benchmark:
  benchmark:
    key_map:
      gt_ls: golden_answers
      q_ls: question
    limit: -1
    name: nq
    path: data/sample_nq_10.jsonl
    seed: 42
    shuffle: false
retriever:
  backend: sentence_transformers # [!code --]
  backend: openai # [!code ++]
  backend_configs:
    bm25:
      lang: en
      save_path: index/bm25
    infinity:
      bettertransformer: false
      model_warmup: false
      pooling_method: auto
      trust_remote_code: true
    openai:
      api_key: '' # [!code --]
      api_key: 'abc' # [!code ++]
      base_url: https://api.openai.com/v1 # [!code --]
      base_url: http://127.0.0.1:65504/v1 # [!code ++]
      model_name: text-embedding-3-small # [!code --]
      model_name: qwen-embedding # [!code ++]
    sentence_transformers:
      sentence_transformers_encode:
        encode_chunk_size: 256
        normalize_embeddings: false
        psg_prompt_name: document
        psg_task: null
        q_prompt_name: query
        q_task: null
      trust_remote_code: true
  batch_size: 16
  collection_name: wiki
  corpus_path: data/corpus_example.jsonl
  gpu_ids: '1'
  index_backend: faiss
  index_backend_configs:
    faiss:
      index_chunk_size: 10000
      index_path: index/index.index
      index_use_gpu: true
    milvus:
      id_field_name: id
      id_max_length: 64
      index_chunk_size: 1000
      index_params:
        index_type: AUTOINDEX
        metric_type: IP
      metric_type: IP
      search_params:
        metric_type: IP
        params: {}
      text_field_name: contents
      text_max_length: 60000
      token: null
      uri: index/milvus_demo.db
      vector_field_name: vector
  is_demo: false
  is_multimodal: false
  model_name_or_path: openbmb/MiniCPM-Embedding-Light
  query_instruction: ''
  top_k: 5

```

Once configured, run the Pipeline exactly as you would for ordinary vector retrieval.

### Web Search API

UltraRAG natively integrates three mainstream Web retrieval APIs: [Tavily](https://www.tavily.com/), [Exa](https://exa.ai/), and [GLM](https://docs.z.ai/guides/tools/web-search).
These APIs can be directly used as the retrieval backend of the Retriever Server to achieve online information retrieval and real-time knowledge enhancement.

**Step 1: Configure API Key**

You need to set the API Key of the corresponding service before use. You can manually export environment variables before running the Pipeline:

```shell
export TAVILY_API_KEY="your retriever key"
```

We recommend managing all keys in a single `.env` configuration file.
In the UltraRAG root directory, rename the template file `.env.dev` to `.env` and fill in your keys, for example:

```
LLM_API_KEY=
RETRIEVER_API_KEY=
TAVILY_API_KEY=tvly-dev-yourapikeyhere
EXA_API_KEY=
ZHIPUAI_API_KEY=
```
UltraRAG will automatically read this file and load relevant configurations at startup.
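
For readers unfamiliar with the `.env` format, the sketch below shows roughly what loading such a file involves: each `KEY=VALUE` line becomes an environment variable. It is an illustration only. `parse_env_text` and `load_env_text` are hypothetical helpers, and UltraRAG's own loader is not shown in this document and may differ, for example in how it treats quotes or variables that are already set.

```python
import os
from typing import Dict


def parse_env_text(text: str) -> Dict[str, str]:
    """Parse KEY=VALUE lines, skipping blank lines and # comments."""
    values: Dict[str, str] = {}
    for raw_line in text.splitlines():
        line = raw_line.strip()
        if not line or line.startswith("#") or "=" not in line:
            continue
        key, _, value = line.partition("=")
        # Drop optional surrounding quotes around the value.
        values[key.strip()] = value.strip().strip("'\"")
    return values


def load_env_text(text: str, override: bool = False) -> None:
    """Copy parsed values into os.environ; empty values are ignored."""
    for key, value in parse_env_text(text).items():
        if value and (override or key not in os.environ):
            os.environ[key] = value


sample = """
LLM_API_KEY=
TAVILY_API_KEY=tvly-dev-yourapikeyhere
"""
print(parse_env_text(sample))
```

In this sketch, variables already exported in the shell take precedence unless `override=True` is passed. That is a design choice of the example, not a documented UltraRAG behavior.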

**Step 2: Web Search**

The following example demonstrates how to use the unified Web search tool:

```yaml examples/web_search.yaml icon="/images/yaml.svg"
# MCP Server
servers:
  benchmark: servers/benchmark
  retriever: servers/retriever

# MCP Client Pipeline
pipeline:
- benchmark.get_data
- retriever.retriever_websearch
```

Compile Pipeline:

```shell
ultrarag build examples/web_search.yaml
```

Fill in the data path and retrieval parameters in the automatically generated parameter file:
```yaml examples/parameters/web_search_parameter.yaml icon="/images/yaml.svg"
benchmark:
  benchmark:
    key_map:
      gt_ls: golden_answers
      q_ls: question
    limit: -1
    name: nq
    path: data/sample_nq_10.jsonl
    seed: 42
    shuffle: false
retriever:
  retrieve_thread_num: 1
  top_k: 5
  websearch_backend: tavily
  websearch_backend_configs:
    tavily:
      api_key: ""
      retries: 3
      base_delay: 1.0
      search_kwargs: {}
```
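
The `retries` and `base_delay` fields suggest a retry policy around each Tavily call. How UltraRAG applies them is not shown in this section. The sketch below assumes one common interpretation: `retries` is the total number of attempts, and the wait doubles after each failure, starting from `base_delay` seconds. `call_with_retries` is a hypothetical helper.

```python
import time
from typing import Callable, TypeVar

T = TypeVar("T")


def call_with_retries(func: Callable[[], T],
                      retries: int = 3,
                      base_delay: float = 1.0,
                      sleep: Callable[[float], None] = time.sleep) -> T:
    """Call func up to `retries` times, doubling the wait after each failure."""
    if retries < 1:
        raise ValueError("retries must be at least 1")
    for attempt in range(retries):
        try:
            return func()
        except Exception:
            if attempt == retries - 1:
                raise  # out of attempts: surface the last error
            # Assumed schedule: base_delay, 2 * base_delay, 4 * base_delay, ...
            sleep(base_delay * (2 ** attempt))
    raise AssertionError("unreachable")
```

Passing the `sleep` function as a parameter keeps the helper easy to test without real waiting.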

Execute the following command to start the Web retrieval process:
```shell
ultrarag run examples/web_search.yaml
```

<Note>Switch Web retrieval backend via `websearch_backend` and `websearch_backend_configs`.</Note>

### Deploy Retriever Server

When you evaluate multiple benchmarks or models against the same corpus, re-initializing the retriever server on every run reloads the large corpus and index each time, which is slow and wasteful.

UltraRAG therefore provides a resident Retriever Server deployment script that keeps the retriever running on the CPU or GPU, so the corpus and index are loaded only once and experiments run faster.

**Step 1: Parameter Settings**

As with an ordinary retriever server, prepare the configuration file first:

```json script/deploy_retriever_config.json icon="/images/json.svg"
{
  "model_name_or_path": "openbmb/MiniCPM-Embedding-Light",
  "corpus_path": "data/corpus_example.jsonl",
  "collection_name": "ultrarag_embeddings",

  "backend": "sentence_transformers",
  "backend_configs": {
    "infinity": {
      "bettertransformer": false,
      "pooling_method": "auto",
      "model_warmup": false,
      "trust_remote_code": true
    },
    "sentence_transformers": {
      "trust_remote_code": true,
      "sentence_transformers_encode": {
        "normalize_embeddings": false,
        "encode_chunk_size": 10000,
        "q_prompt_name": "query",
        "psg_prompt_name": "document",
        "psg_task": null,
        "q_task": null
      }
    },
    "openai": {
      "model_name": "text-embedding-3-small",
      "base_url": "https://api.openai.com/v1",
      "api_key": ""
    },
    "bm25": {
      "lang": "en",
      "save_path": "index/bm25"
    }
  },

  "index_backend": "faiss",
  "index_backend_configs": {
    "faiss": {
      "index_use_gpu": true,
      "index_chunk_size": 50000,
      "index_path": "index/index.index"
    },
    "milvus": {
      "uri": "index/milvus_demo.db",
      "token": null,
      "id_field_name": "id",
      "vector_field_name": "vector",
      "text_field_name": "contents",
      "id_max_length": 64,
      "text_max_length": 60000,
      "metric_type": "IP",
      "index_params": {
        "index_type": "AUTOINDEX",
        "metric_type": "IP"
      },
      "search_params": {
        "metric_type": "IP",
        "params": {}
      },
      "index_chunk_size": 50000
    }
  },

  "batch_size": 16,
  "gpu_ids": "0,1",
  "is_multimodal": false,
  "is_demo": false
}
```
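
The file lists settings for every backend, but only the entries named by `backend` and `index_backend` are selected, so a quick consistency check before launching can catch typos. The sketch below is a hypothetical pre-flight check and not part of the deployment script. The set of required keys is an assumption drawn from the fields shown above.

```python
import json
from typing import Any, Dict, List

# Assumed minimum: the selector keys and the sections they point into.
REQUIRED_KEYS = ["backend", "backend_configs", "index_backend", "index_backend_configs"]


def check_retriever_config(cfg: Dict[str, Any]) -> List[str]:
    """Return a list of problems; an empty list means the config looks consistent."""
    problems = [f"missing key: {key}" for key in REQUIRED_KEYS if key not in cfg]
    if problems:
        return problems
    if cfg["backend"] not in cfg["backend_configs"]:
        problems.append(f"backend '{cfg['backend']}' has no entry in backend_configs")
    if cfg["index_backend"] not in cfg["index_backend_configs"]:
        problems.append(
            f"index_backend '{cfg['index_backend']}' has no entry in index_backend_configs")
    return problems


def check_retriever_config_file(path: str) -> List[str]:
    """Load a JSON config file and run the same checks."""
    with open(path, encoding="utf-8") as handle:
        return check_retriever_config(json.load(handle))
```

For example, `check_retriever_config_file("script/deploy_retriever_config.json")` would return an empty list for the configuration above.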

**Step 2: Background Deployment**

It is recommended to use Screen so that the retriever can run in the background for a long time and logs can be viewed at any time.

Create Screen session:

```shell
screen -S retriever
```

Start retriever server:

```shell script/deploy_retriever_server.py
python ./script/deploy_retriever_server.py \
    --config_path script/deploy_retriever_config.json \
    --host 0.0.0.0 \
    --port 64501
```

Once started, the Server stays resident in memory, so the corpus and index are not reloaded for each run.

**Step 3: Online Retrieval**

During online retrieval, there is no need to re-initialize the retriever; just specify the deployed address in the Pipeline:

```yaml examples/deploy_corpus_search.yaml icon="/images/yaml.svg" 
#  Deploy Corpus Search Demo

# MCP Server
servers:
  benchmark: servers/benchmark
  retriever: servers/retriever

# MCP Client Pipeline
pipeline:
- benchmark.get_data
- retriever.retriever_deploy_search
```

Run the following command to compile the Pipeline:

```shell
ultrarag build examples/deploy_corpus_search.yaml
```

Modify parameters:
```yaml examples/parameters/deploy_corpus_search_parameter.yaml icon="/images/yaml.svg"
benchmark:
  benchmark:
    key_map:
      gt_ls: golden_answers
      q_ls: question
    limit: -1
    name: nq
    path: data/sample_nq_10.jsonl
    seed: 42
    shuffle: false
retriever:
  query_instruction: ''
  retriever_url: http://127.0.0.1:64501
  top_k: 5
```
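
Because this Pipeline depends on a server that was started separately, it can help to confirm that `retriever_url` is reachable before running. The sketch below only checks that a TCP connection can be opened. It makes no assumption about the server's HTTP routes, which are not documented in this section. `split_host_port` and `is_reachable` are hypothetical helpers.

```python
import socket
from typing import Tuple
from urllib.parse import urlsplit


def split_host_port(url: str) -> Tuple[str, int]:
    """Extract host and port from a URL such as http://127.0.0.1:64501."""
    parts = urlsplit(url)
    if parts.hostname is None:
        raise ValueError(f"no host in URL: {url!r}")
    default_port = 443 if parts.scheme == "https" else 80
    return parts.hostname, parts.port or default_port


def is_reachable(url: str, timeout: float = 2.0) -> bool:
    """Return True if a TCP connection to the URL's host and port succeeds."""
    host, port = split_host_port(url)
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        return False
```

If `is_reachable("http://127.0.0.1:64501")` returns `False`, check that the Screen session from Step 2 is still running and that the port matches.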

Run Pipeline:
```shell
ultrarag run examples/deploy_corpus_search.yaml
```

================
File: pages/en/rag_servers/router.mdx
================
---
title: "Router"
icon: "code-branch"
---

<Note>It is recommended to study this section together with the tutorial [Branch Structure](/pages/en/rag_client/branch).</Note>

## Function

In complex RAG reasoning tasks, it is often necessary to dynamically decide the subsequent execution path based on intermediate results (such as the model's current generated content or retrieval results).
The Router Server is designed for this purpose: it judges the current state based on the input and returns a custom branch label (state identifier) that drives branch jumps and dynamic control in the Pipeline.

## Implementation Example

The following example shows how to implement a Router Tool.

Suppose in the current RAG process, the model needs to judge whether the currently retrieved documents contain enough information to answer the question: if the information is sufficient, end the process, otherwise continue retrieval.

You can implement a Router Tool like this:

```python servers/router/src/router.py icon="python"
from typing import Dict, List

# `app` is the Router Server instance defined elsewhere in the server module.
@app.tool(output="ans_ls->ans_ls")
def check_model_state(ans_ls: List[str]) -> Dict[str, List[Dict[str, str]]]:
    def check_state(text: str) -> bool:
        # A "<search>" tag means the model is asking for more retrieval.
        return "<search>" in text

    ans_ls = [
        {
            "data": answer,
            "state": "continue" if check_state(answer) else "stop",
        }
        for answer in ans_ls
    ]
    return {"ans_ls": ans_ls}
```

This Tool will tag each answer with a state label to guide the subsequent process execution:

- `continue`: Insufficient information, need to continue retrieval;
- `stop`: Information is sufficient, terminate the process.
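
To see the labels concretely, the routing logic can be exercised outside the server. The sketch below repeats the same check as a plain function, without the `@app.tool` decorator, so that it runs on its own. The name `label_answers` and the sample answers are made up for illustration.

```python
from typing import Dict, List


def label_answers(ans_ls: List[str]) -> Dict[str, List[Dict[str, str]]]:
    """Tag each answer with 'continue' if it asks for a search, else 'stop'."""
    return {
        "ans_ls": [
            {"data": answer, "state": "continue" if "<search>" in answer else "stop"}
            for answer in ans_ls
        ]
    }


result = label_answers([
    "I need more evidence. <search>capital of Australia</search>",
    "The answer is Canberra.",
])
for item in result["ans_ls"]:
    print(item["state"], "|", item["data"])
# continue | I need more evidence. <search>capital of Australia</search>
# stop | The answer is Canberra.
```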

## Usage Example

The Router Tool defined above is used together with the `branch:` and `router:` structures, which perform the dynamic jumps based on the state labels.

```yaml examples/rag_branch.yaml icon="/images/yaml.svg" highlight="9,24,26,37"
# MCP Server
servers:
  benchmark: servers/benchmark
  retriever: servers/retriever
  prompt: servers/prompt
  generation: servers/generation
  evaluation: servers/evaluation
  custom: servers/custom
  router: servers/router

# MCP Client Pipeline
pipeline:
- benchmark.get_data
- retriever.retriever_init
- generation.generation_init
- retriever.retriever_search
- loop:
    times: 10
    steps:
    - prompt.check_passages
    - generation.generate
    - branch:
        router:
        - router.check_model_state
        branches:
          continue:
          - prompt.gen_subq
          - generation.generate:
              output:
                ans_ls: subq_ls
          - retriever.retriever_search:
              input:
                query_list: subq_ls
              output:
                ret_psg: temp_psg
          - custom.merge_passages
          stop: []
- prompt.qa_rag_boxed
- generation.generate
- custom.output_extract_from_boxed
- evaluation.evaluate
```

This example demonstrates a typical cyclic reasoning process:
When `router.check_model_state` judges that the model output contains the `<search>` identifier, it enters the `continue` branch to continue retrieval;
otherwise, it enters the `stop` branch to directly end the loop.
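
The control flow can be summarized in plain Python. The sketch below simulates the loop for a single question with stand-in functions. `run_loop`, `fake_generate`, and `fake_search` are hypothetical and replace the real prompt, generation, and retrieval servers, so the example illustrates only the branching, not UltraRAG's executor.

```python
from typing import Callable, List, Tuple

MAX_ROUNDS = 10  # mirrors `times: 10` in the loop


def run_loop(generate: Callable[[List[str]], str],
             search: Callable[[str], List[str]],
             passages: List[str]) -> Tuple[int, List[str]]:
    """Repeat generate -> route until the router says 'stop' or rounds run out."""
    rounds = 0
    for _ in range(MAX_ROUNDS):
        rounds += 1
        answer = generate(passages)
        state = "continue" if "<search>" in answer else "stop"
        if state == "stop":
            break  # the `stop` branch is empty, so the loop ends here
        # `continue` branch, simplified: retrieve again and merge the passages.
        passages = passages + search(answer)
    return rounds, passages


def fake_generate(passages: List[str]) -> str:
    # Stand-in model: asks for more evidence until three passages are available.
    return "<search>more</search>" if len(passages) < 3 else "enough"


def fake_search(query: str) -> List[str]:
    # Stand-in retriever: always returns one extra passage.
    return ["extra passage"]


print(run_loop(fake_generate, fake_search, ["initial passage"]))
# (3, ['initial passage', 'extra passage', 'extra passage'])
```

In the real Pipeline the `continue` branch first generates a sub-question and searches with it. The simulation collapses those steps into a single `search` call.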

================
File: pages/en/ui/prepare.mdx
================
---
title: "Deployment Guide"
icon: "box"
---

This guide will walk you through the full-stack deployment of UltraRAG UI, including the Large Language Model (LLM), Retrieval Model (Embedding), and Milvus Vector Database.

## Model Inference Service Deployment

UltraRAG UI uniformly uses the OpenAI API protocol for invocation. You can choose to run directly on the host using `Screen` or use `Docker` for containerized deployment.

### LLM Deployment

Taking Qwen3-32B as an example, we recommend multi-GPU parallelism to keep inference fast.

**Screen (Run directly on host)**

1. Create session:

```shell
screen -S llm
```

2. Start command:

```shell script/vllm_serve.sh
CUDA_VISIBLE_DEVICES=0,1 python -m vllm.entrypoints.openai.api_server \
    --served-model-name qwen3-32b \
    --model Qwen/Qwen3-32B \
    --trust-remote-code \
    --host 0.0.0.0 \
    --port 65503 \
    --max-model-len 32768 \
    --gpu-memory-utilization 0.9 \
    --tensor-parallel-size 2 \
    --enforce-eager
```

Seeing output similar to the following indicates that the model service has started successfully:

```
(APIServer pid=2811812) INFO:     Started server process [2811812]
(APIServer pid=2811812) INFO:     Waiting for application startup.
(APIServer pid=2811812) INFO:     Application startup complete.
```

3. Exit session: Press `Ctrl + A`, then `D`, to detach and keep the service running in the background.
If you need to re-enter the session, execute:

```shell
screen -r llm
```

**Docker (Containerized Deployment)**

```shell
docker run -d --gpus all \
  -e CUDA_VISIBLE_DEVICES=0,1 \
  -v /parent_dir_of_models:/workspace \
  -p 29001:65503 \
  --ipc=host \
  --name vllm_qwen \
  vllm/vllm-openai:latest \
  --served-model-name qwen3-32b \
  --model Qwen/Qwen3-32B \
  --trust-remote-code \
  --host 0.0.0.0 \
  --port 65503 \
  --max-model-len 32768 \
  --gpu-memory-utilization 0.9 \
  --tensor-parallel-size 2 \
  --enforce-eager
```
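
Both deployments expose the same OpenAI-style chat route, and only the base URL differs. With the Screen command the service listens on port 65503, while the Docker command publishes container port 65503 on host port 29001 (`-p 29001:65503`). The sketch below builds a chat request with the Python standard library. `build_chat_request` and `chat` are illustrative helpers, not part of UltraRAG, and the `max_tokens` value is an arbitrary choice.

```python
import json
import urllib.request

SCREEN_BASE_URL = "http://127.0.0.1:65503/v1"  # service started directly on the host
DOCKER_BASE_URL = "http://127.0.0.1:29001/v1"  # host port from `-p 29001:65503`


def build_chat_request(prompt: str, base_url: str,
                       model: str = "qwen3-32b") -> urllib.request.Request:
    """Build (but do not send) an OpenAI-style chat completion request."""
    payload = {
        "model": model,  # must match --served-model-name
        "messages": [{"role": "user", "content": prompt}],
        "max_tokens": 64,
    }
    return urllib.request.Request(
        url=f"{base_url.rstrip('/')}/chat/completions",
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def chat(prompt: str, base_url: str = SCREEN_BASE_URL) -> str:
    """Send the request and return the text of the first choice."""
    with urllib.request.urlopen(build_chat_request(prompt, base_url), timeout=120) as resp:
        body = json.load(resp)
    return body["choices"][0]["message"]["content"]
```

For the Docker deployment, call `chat("Hello", base_url=DOCKER_BASE_URL)`. Using the container port from outside the container is a common cause of connection errors.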

### Retrieval Model Deployment

The following uses Qwen3-Embedding-0.6B as an example. Embedding models usually occupy less GPU memory than an LLM.

**Screen (Run directly on host)**

1. Create session:

```shell
screen -S retriever
```

2. Start command:

```shell script/vllm_serve_emb.sh
CUDA_VISIBLE_DEVICES=2 python -m vllm.entrypoints.openai.api_server \
    --served-model-name qwen-embedding \
    --model Qwen/Qwen3-Embedding-0.6B \
    --trust-remote-code \
    --host 0.0.0.0 \
    --port 65504 \
    --task embed \
    --gpu-memory-utilization 0.2
```
**Docker (Containerized Deployment)**

```shell
docker run -d --gpus all \
  -e CUDA_VISIBLE_DEVICES=2 \
  -v /parent_dir_of_models:/workspace \
  -p 29002:65504 \
  --ipc=host \
  --name vllm_qwen_emb \
  vllm/vllm-openai:latest \
  --served-model-name qwen-embedding \
  --model Qwen/Qwen3-Embedding-0.6B \
  --trust-remote-code \
  --host 0.0.0.0 \
  --port 65504 \
  --task embed \
  --gpu-memory-utilization 0.2
```

## Vector Database Deployment (Milvus)

Milvus is used for efficient storage and retrieval of vector data.

**Official Deployment**

```shell
# Milvus Standalone (docker): https://milvus.io/docs/install_standalone-docker.md
curl -sfL https://raw.githubusercontent.com/milvus-io/milvus/master/scripts/standalone_embed.sh -o standalone_embed.sh
bash standalone_embed.sh start
```

**Custom Deployment**

If you need to customize ports (e.g., to prevent port conflicts) or data paths, you can use the following script:
```shell start_milvus.sh highlight="7,8,10"
#!/usr/bin/env bash
set -e

CONTAINER_NAME=milvus-ultrarag
MILVUS_IMAGE=milvusdb/milvus:latest

GRPC_PORT=29901
HTTP_PORT=29902

DATA_DIR=/root/ultrarag-demo/milvus/

echo "==> Starting Milvus (standalone)"
echo "==> gRPC: ${GRPC_PORT}, HTTP: ${HTTP_PORT}"
echo "==> Data dir: ${DATA_DIR}"

mkdir -p ${DATA_DIR}
chown -R 1000:1000 ${DATA_DIR} 2>/dev/null || true

docker run -d \
  --name ${CONTAINER_NAME} \
  --restart unless-stopped \
  --security-opt seccomp:unconfined \
  -e DEPLOY_MODE=STANDALONE \
  -e ETCD_USE_EMBED=true \
  -e COMMON_STORAGETYPE=local \
  -v ${DATA_DIR}:/var/lib/milvus \
  -p ${GRPC_PORT}:19530 \
  -p ${HTTP_PORT}:9091 \
  --health-cmd="curl -f http://localhost:9091/healthz" \
  --health-interval=30s \
  --health-start-period=60s \
  --health-timeout=10s \
  --health-retries=3 \
  ${MILVUS_IMAGE} \
  milvus run standalone

echo "==> Waiting for Milvus to become healthy..."
sleep 5
docker ps | grep ${CONTAINER_NAME} || true
```

Adjust `GRPC_PORT`, `HTTP_PORT`, and `DATA_DIR` as needed, then run the following command to deploy:

```shell
bash start_milvus.sh
```

After successful deployment, you can check the status of Milvus with the following command:

```shell
docker ps | grep milvus-ultrarag
```

If everything is normal, you should be able to see the Milvus container running.
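
The script only sleeps for five seconds, while the container's health check allows a 60-second start period, so Milvus may still be starting when the script returns. Polling the same `/healthz` endpoint used by `--health-cmd` is one way to wait until it is ready. In the sketch below, `http_ok` and `wait_until_healthy` are hypothetical helpers, and the URL assumes the `HTTP_PORT=29902` mapping from the script.

```python
import time
import urllib.error
import urllib.request
from typing import Callable

HEALTH_URL = "http://127.0.0.1:29902/healthz"  # host HTTP_PORT -> container port 9091


def http_ok(url: str, timeout: float = 3.0) -> bool:
    """Return True if the URL answers with HTTP 200."""
    try:
        with urllib.request.urlopen(url, timeout=timeout) as resp:
            return resp.status == 200
    except (urllib.error.URLError, OSError):
        return False


def wait_until_healthy(probe: Callable[[], bool],
                       attempts: int = 30,
                       interval: float = 2.0,
                       sleep: Callable[[float], None] = time.sleep) -> bool:
    """Call probe until it returns True or the attempts are exhausted."""
    for attempt in range(attempts):
        if probe():
            return True
        if attempt < attempts - 1:
            sleep(interval)  # wait before the next probe
    return False
```

A typical call is `wait_until_healthy(lambda: http_ok(HEALTH_URL))`, which probes for roughly a minute with the default settings.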

<Tip>UI Configuration Tip: After successful startup, fill in the `GRPC_PORT` address (e.g., `tcp://127.0.0.1:29901`) in `Knowledge Base` -> `Configure DB` of UltraRAG UI, then click Connect. A `Connected` status indicates success.</Tip>

================
File: pages/en/ui/start.mdx
================
---
title: "Quick Start"
icon: "atom"
---

UltraRAG UI is more than a chat interface; it is a complete RAG development and debugging platform.

![](/images/ui/chat_menu.png)

## Startup Command

Use the following command to start the UI service:

```bash
ultrarag show ui [OPTIONS]
```

### Common Options

- `--port <INTEGER>`: Specify service port, default is `5050`.
- `--host <TEXT>`: Specify binding address, default is `127.0.0.1`.

### Examples

**Start UI Service:**

```bash
ultrarag show ui
```

**Start UI on a custom host and port:**

```bash
ultrarag show ui --host 0.0.0.0 --port 5050
```

After startup, access `http://127.0.0.1:5050` in your browser to enter the system.

## 1. Chat

Entering the system displays the chat page by default. You can directly select a compiled Pipeline to start a conversation.

<Tip>We provide deployment tutorials for several developed Pipelines in [Typical Scenarios](/pages/en/demo/llm). You can also customize Pipelines according to your needs.</Tip>

![](/images/ui/chat_menu.png)

### Pipeline Switching

You can quickly switch between Pipelines with configured parameters through the dropdown menu in the upper left corner.

![](/images/ui/pipeline_select.png)

### Select Knowledge Base

Click the Knowledge Base icon to mount a knowledge base you have built. You can then ask questions grounded in its documents.

![](/images/ui/kb_select.png)

### Background Running

For time-consuming Pipelines such as Deep Research, background running mode is supported. After the task execution is completed, the results will be automatically loaded into the current chat window.

![](/images/ui/background.png)

## 2. Knowledge Base
UltraRAG UI provides full-process knowledge base management functions, supporting file upload, chunking, and embedding management.

![](/images/ui/kb_menu.png)

### Connect Vector Database

<Tip>Not sure how to deploy the Milvus vector database? Please refer to the [Deployment Guide](/pages/en/ui/prepare).</Tip>

Click Configure DB to connect to the deployed Milvus vector database. The knowledge base functions become available once the connection is established.

![](/images/ui/kb_connect.png)

### Build Knowledge Base

Click New Collection to upload documents and create a dedicated knowledge base.

![](/images/ui/kb_upload.png)

Click the Settings button to customize the chunking strategy and Embedding model parameters.

<Tip>Configuration is usually required on first use. To offer a configuration-free deployment where users do not need to set anything manually, edit the relevant parameters in `examples/parameter/corpus_chunk_parameter.yaml` and `examples/parameter/milvus_index_parameter.yaml` in advance. The setting steps below can then be skipped.</Tip>

![](/images/ui/kb_chunk.png)
![](/images/ui/kb_index.png)


## 3. Pipeline Builder
After startup, users with administrator privileges can see the Settings entry in the sidebar. Click it to open the advanced configuration page.

### Visual Pipeline Construction

The drag-and-drop canvas on the left and the code editor on the right stay synchronized in both directions in real time. You can assemble a Pipeline visually like building blocks, or fine-tune it in the code editor.

![](/images/ui/pipeline_build.png)

### Configure Parameters

After clicking the Build button to parse the Pipeline, you can view and modify running parameters in the parameter panel.

![](/images/ui/pipeline_param.png)

### Prompt Management

Prompts can be created, edited, and deleted online, and applied to the Pipeline with one click.

![](/images/ui/pipeline_prompt.png)

### AI Assistant

The system has a built-in AI assistant that can assist you in building Pipelines, adjusting parameters, and writing Prompts.

<Tip>Before using this function for the first time, you need to click Settings to configure the API Key and related model parameters.</Tip>

**Usage Example: Optimize Prompt**

Suppose you have a basic Prompt and want to adjust it to a style suitable for the legal field:

1. Open the AI assistant and enter the original Prompt and modification requirements.

![](/images/ui/ai_1.png)

2. Click Apply, and the AI assistant will automatically generate optimized content and replace the original Prompt.

![](/images/ui/ai_2.png)
