# DeepSeek MCP Server - Complete Documentation

A Model Context Protocol (MCP) server that integrates DeepSeek AI models with MCP-compatible clients such as Claude Code and Gemini CLI. A hosted remote endpoint is available at deepseek-mcp.tahirl.com/mcp with BYOK (Bring Your Own Key) authentication; local stdio and self-hosted Streamable HTTP transports (with Docker deployment) are also supported. Features include multi-turn sessions, model fallback with a circuit breaker, MCP Resources, chat completion, thinking mode, JSON output, function calling, multimodal content support, model-aware cost tracking, and 12 prompt templates.

## Installation

### Remote (No Install)

```bash
# Claude Code — connect to hosted endpoint with your own API key
claude mcp add --transport http deepseek \
  https://deepseek-mcp.tahirl.com/mcp \
  --header "Authorization: Bearer YOUR_DEEPSEEK_API_KEY"
```

```json
// Cursor / Windsurf / VS Code
{
  "mcpServers": {
    "deepseek": {
      "url": "https://deepseek-mcp.tahirl.com/mcp",
      "headers": { "Authorization": "Bearer ${DEEPSEEK_API_KEY}" }
    }
  }
}
```

### Local (stdio)

```bash
# Claude Code (all projects)
claude mcp add -s user deepseek npx @arikusi/deepseek-mcp-server -e DEEPSEEK_API_KEY=your-key

# Gemini CLI
gemini mcp add deepseek npx @arikusi/deepseek-mcp-server -e DEEPSEEK_API_KEY=your-key

# Global npm install
npm install -g @arikusi/deepseek-mcp-server

# Docker
docker run -d -p 3000:3000 -e DEEPSEEK_API_KEY=your-key ghcr.io/arikusi/deepseek-mcp-server
```

Get API key: https://platform.deepseek.com

## Models

Both models run DeepSeek-V3.2 with unified pricing. They are the same underlying model in different modes:
- `deepseek-chat` = V3.2 non-thinking mode
- `deepseek-reasoner` = V3.2 thinking mode (transparently routed as `deepseek-chat` + `thinking: {type: "enabled"}`)

### deepseek-chat
- General conversations, coding, content generation
- Non-thinking mode by default. Can enable thinking via `thinking: {type: "enabled"}` parameter
- 128K context, 8K max output (default 4K), 64K max output when thinking enabled
- Supports thinking mode, JSON mode, function calling, FIM completion
- Pricing: $0.028/1M cache hit, $0.28/1M cache miss, $0.42/1M output

### deepseek-reasoner
- Complex reasoning, math, logic, multi-step tasks
- Transparently routed as deepseek-chat with thinking:{type:"enabled"}
- This enables full feature parity including function calling, which the raw deepseek-reasoner API does not support
- 128K context, 64K max output (default 32K)
- Supports JSON mode, function calling (via transparent routing)
- Pricing: $0.028/1M cache hit, $0.28/1M cache miss, $0.42/1M output

### Model Routing Table

| User selects | Sent to API | Thinking | reasoning_content | Function calling |
|---|---|---|---|---|
| deepseek-chat | deepseek-chat | Off | No | Yes |
| deepseek-chat + thinking:enabled | deepseek-chat + thinking | On | Yes | Yes |
| deepseek-reasoner | deepseek-chat + thinking | On | Yes | Yes |
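
The routing table above can be sketched as one pure function. This is an illustrative sketch with assumed names (`routeModel`, `RoutedRequest`); the real logic lives in `src/deepseek-client.ts`:

```typescript
// Sketch of the transparent model routing described above: deepseek-reasoner
// is an alias for deepseek-chat + thinking enabled, which is what gives it
// function-calling parity with deepseek-chat.
type ThinkingParam = { type: "enabled" | "disabled" };

interface RoutedRequest {
  model: "deepseek-chat";   // deepseek-reasoner is never sent to the API directly
  thinking?: ThinkingParam; // present only when thinking is on
}

function routeModel(
  requested: "deepseek-chat" | "deepseek-reasoner",
  thinking?: ThinkingParam
): RoutedRequest {
  const thinkingOn =
    requested === "deepseek-reasoner" || thinking?.type === "enabled";
  return thinkingOn
    ? { model: "deepseek-chat", thinking: { type: "enabled" } }
    : { model: "deepseek-chat" };
}

console.log(routeModel("deepseek-reasoner"));
// → { model: 'deepseek-chat', thinking: { type: 'enabled' } }
```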

## Tool: deepseek_chat

### Input Parameters

| Parameter | Type | Required | Default | Description |
|-----------|------|----------|---------|-------------|
| messages | Array<{role, content, tool_call_id?}> | Yes | - | Conversation messages. Roles: system, user, assistant, tool. Content can be string or array of content parts (text/image_url) when multimodal is enabled |
| model | "deepseek-chat" \| "deepseek-reasoner" | No | "deepseek-chat" | Model to use |
| temperature | number (0-2) | No | 1.0 | Sampling temperature. Ignored when thinking enabled |
| max_tokens | number (1-65536) | No | model max | Max tokens: chat=8192, reasoner=65536. Both have 128K context |
| stream | boolean | No | false | Enable streaming mode |
| tools | Array<ToolDefinition> (max 128) | No | - | Function calling tool definitions |
| tool_choice | "auto" \| "none" \| "required" \| {type, function} | No | "auto" | Which tool to call |
| thinking | {type: "enabled" \| "disabled"} | No | - | Enable thinking mode |
| json_mode | boolean | No | false | Enable JSON output mode (supported by both models) |
| session_id | string | No | - | Session ID for multi-turn conversations |

### Multi-Turn Sessions

Use `session_id` to maintain conversation context across requests:

```json
// First request - creates session
{
  "messages": [{"role": "user", "content": "What is the capital of France?"}],
  "session_id": "my-session"
}

// Second request - previous context is automatically prepended
{
  "messages": [{"role": "user", "content": "What about Germany?"}],
  "session_id": "my-session"
}
```

Sessions are stored in memory. In STDIO transport they live for the lifetime of the server process. In HTTP transport each MCP session gets its own isolated SessionStore, so session_id values are scoped to the MCP session that created them and are not visible to other connected HTTP clients. Sessions expire after SESSION_TTL_MINUTES (default: 30). Omit session_id for stateless single-turn requests.
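
The session behavior above (TTL expiry plus a MAX_SESSION_MESSAGES sliding window) can be sketched like this. Names are illustrative, not the actual API of `src/session.ts`:

```typescript
// Minimal sketch of the in-memory session semantics described above.
interface Message { role: string; content: string; }
interface Session { messages: Message[]; lastUsed: number; }

const TTL_MS = 30 * 60 * 1000;     // SESSION_TTL_MINUTES default
const MAX_SESSION_MESSAGES = 200;  // sliding-window cap

const sessions = new Map<string, Session>();

function appendMessages(id: string, incoming: Message[], now = Date.now()): Message[] {
  let s = sessions.get(id);
  if (s && now - s.lastUsed > TTL_MS) {
    sessions.delete(id); // expired: start a fresh session
    s = undefined;
  }
  if (!s) {
    s = { messages: [], lastUsed: now };
    sessions.set(id, s);
  }
  s.messages.push(...incoming);
  // Sliding window: keep only the most recent messages.
  if (s.messages.length > MAX_SESSION_MESSAGES) {
    s.messages = s.messages.slice(-MAX_SESSION_MESSAGES);
  }
  s.lastUsed = now;
  return s.messages; // full context prepended to the API request
}
```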

### Thinking Mode

There are two ways to enable thinking mode:

1. Select deepseek-reasoner (automatically routed as chat + thinking):
```json
{
  "messages": [{"role": "user", "content": "Analyze quicksort complexity"}],
  "model": "deepseek-reasoner"
}
```

2. Explicitly enable thinking on deepseek-chat:
```json
{
  "messages": [{"role": "user", "content": "Analyze quicksort complexity"}],
  "model": "deepseek-chat",
  "thinking": {"type": "enabled"}
}
```

Both produce identical results. When thinking is active, temperature/top_p/frequency_penalty/presence_penalty are automatically ignored. The response includes a reasoning_content field with the chain-of-thought.

### JSON Output Mode

Get structured JSON responses:

```json
{
  "messages": [{"role": "user", "content": "Return a json object with user data"}],
  "model": "deepseek-chat",
  "json_mode": true
}
```

Include the word "json" in your prompt for best results. Supported by both models.

### Tool Definition Format

```json
{
  "type": "function",
  "function": {
    "name": "function_name",
    "description": "What the function does",
    "parameters": {
      "type": "object",
      "properties": {
        "param1": { "type": "string", "description": "..." }
      },
      "required": ["param1"]
    }
  }
}
```

### Response Format

```json
{
  "content": "Response text",
  "reasoning_content": "Chain-of-thought (present when thinking is enabled)",
  "model": "deepseek-chat",
  "usage": {
    "prompt_tokens": 10,
    "completion_tokens": 20,
    "total_tokens": 30,
    "prompt_cache_hit_tokens": 8,
    "prompt_cache_miss_tokens": 2
  },
  "finish_reason": "stop",
  "tool_calls": [
    {
      "id": "call_abc123",
      "type": "function",
      "function": { "name": "fn_name", "arguments": "{\"key\":\"value\"}" }
    }
  ],
  "cost_usd": 0.000012,
  "session_id": "my-session"
}
```

## Tool: deepseek_sessions

Manage conversation sessions:

```json
{"action": "list"}                                   // List all active sessions
{"action": "delete", "session_id": "my-session"}     // Delete specific session
{"action": "clear"}                                  // Clear all sessions
```

## MCP Resources

Three read-only resources:

| URI | Description |
|-----|-------------|
| deepseek://models | Model list with capabilities, context limits, pricing |
| deepseek://config | Current server config (API key masked: sk-****1234) |
| deepseek://usage | Real-time usage stats (requests, tokens, costs, sessions, cache ratio) |

## Model Fallback & Circuit Breaker

**Fallback**: On retryable errors (429, 503, timeout), automatically tries the other model:
- deepseek-chat fails → tries deepseek-reasoner
- deepseek-reasoner fails → tries deepseek-chat
- Response includes fallback info when fallback was used

**Circuit Breaker**: Protects against cascading failures:
- CLOSED → normal operation
- After CIRCUIT_BREAKER_THRESHOLD failures (default 5) → OPEN (fast-fail for CIRCUIT_BREAKER_RESET_TIMEOUT ms)
- After timeout → HALF_OPEN (probe with 1 request)
- Probe succeeds → CLOSED; Probe fails → OPEN again

Disable with FALLBACK_ENABLED=false.

## Function Calling Flow

1. Send messages with `tools` array defining available functions
2. Model responds with `tool_calls` containing function name and arguments
3. Execute the function locally
4. Send result back as a `tool` role message with `tool_call_id`
5. Model generates final response using the tool result
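
Steps 2-4 can be sketched with a stubbed model response (no real API call; `get_weather` and its result are made up for illustration):

```typescript
// Simulated round trip: the model returned a tool_call, we run the
// function locally, then build the `tool` role message to send back.
interface ToolCall {
  id: string;
  type: "function";
  function: { name: string; arguments: string };
}

// Step 2: the model answered with a tool_call instead of text.
const toolCall: ToolCall = {
  id: "call_abc123",
  type: "function",
  function: { name: "get_weather", arguments: '{"city":"Paris"}' },
};

// Step 3: execute the function locally.
const localFunctions: Record<string, (args: any) => string> = {
  get_weather: (args) => `Sunny in ${args.city}`,
};
const result = localFunctions[toolCall.function.name](
  JSON.parse(toolCall.function.arguments)
);

// Step 4: send the result back as a `tool` role message;
// tool_call_id must match the id from the model's tool_call.
const toolMessage = { role: "tool", tool_call_id: toolCall.id, content: result };
console.log(toolMessage.content); // → Sunny in Paris
```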

## Cost Tracking

V3.2 provides cache-aware pricing. Cost display shows:
- Total cost in USD
- Cache hit ratio (percentage of prompt tokens served from cache)
- Estimated savings from cache hits

Example output: `$0.0042 (cache hit: 80%, saved ~$0.0168)`
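
The arithmetic follows directly from the V3.2 pricing listed under Models ($0.028/1M cache hit, $0.28/1M cache miss, $0.42/1M output). Function names here are illustrative; the real logic lives in `src/cost.ts`:

```typescript
// Cache-aware cost arithmetic: cached prompt tokens are billed at the hit
// rate, uncached at the miss rate, and savings are the difference.
const PRICE_PER_M = { cacheHit: 0.028, cacheMiss: 0.28, output: 0.42 };

function costUsd(hitTokens: number, missTokens: number, outputTokens: number): number {
  return (
    (hitTokens * PRICE_PER_M.cacheHit +
      missTokens * PRICE_PER_M.cacheMiss +
      outputTokens * PRICE_PER_M.output) / 1_000_000
  );
}

// Savings estimate: what the cached tokens would have cost at the miss rate.
function cacheSavingsUsd(hitTokens: number): number {
  return (hitTokens * (PRICE_PER_M.cacheMiss - PRICE_PER_M.cacheHit)) / 1_000_000;
}

// 800K cached + 200K uncached prompt tokens, 100K output tokens:
console.log(costUsd(800_000, 200_000, 100_000).toFixed(4)); // "0.1204"
console.log(cacheSavingsUsd(800_000).toFixed(4));           // "0.2016"
```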

## Prompt Templates (12 total)

### Core Reasoning
- debug_with_reasoning(code, error?, language?)
- code_review_deep(code, language?, focus: security|performance|quality|all)
- research_synthesis(topic, context?, depth: brief|moderate|comprehensive)
- strategic_planning(goal, context?, constraints?)
- explain_like_im_five(topic, audience: child|beginner|intermediate)

### Advanced
- mathematical_proof(statement, context?)
- argument_validation(argument, type: informal|formal|both)
- creative_ideation(challenge, constraints?, quantity: 1-20)
- cost_comparison(task, estimated_tokens)
- pair_programming(task, language, style: beginner|intermediate|expert)

### Function Calling
- function_call_debug(tools_json, messages_json, error?)
- create_function_schema(description, examples?)

## Remote Endpoint (Hosted)

A hosted BYOK endpoint is available at: https://deepseek-mcp.tahirl.com/mcp

Send your DeepSeek API key as `Authorization: Bearer <key>`. No server-side key stored. Powered by Cloudflare Workers (global edge, zero cold start, free tier).

Endpoints:
- `GET /health` — health check (status, version, transport, timestamp)
- `GET /` — server info (name, version, description, endpoints)
- `POST /mcp` — MCP JSON-RPC requests (requires Bearer auth)

## HTTP Transport (Self-Hosted)

Run your own HTTP server instead of stdio:

```bash
TRANSPORT=http HTTP_PORT=3000 DEEPSEEK_API_KEY=your-key node dist/index.js
```

Endpoints:
- `GET /health` — health check (status, version, uptime)
- `POST /mcp` — MCP JSON-RPC requests (Streamable HTTP protocol)
- `GET /mcp` — SSE stream (requires Mcp-Session-Id header)
- `DELETE /mcp` — terminate session

Each MCP session gets its own McpServer instance AND its own SessionStore instance (session-store isolation added in 1.7.0 to prevent cross-session data exposure). DeepSeekClient is shared across sessions since it is a stateless API client.
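
The isolation model can be sketched as a map from MCP session id to per-session instances, with one shared client. The `Fake*` classes are illustrative stand-ins for `McpServer` / `SessionStore` / `DeepSeekClient`:

```typescript
// Sketch of per-session isolation: each MCP session id maps to its own
// server + store, while a single stateless API client is shared.
class FakeSessionStore { data = new Map<string, unknown>(); }
class FakeServer {}
class FakeApiClient {} // stateless, so safe to share across sessions

const sharedClient = new FakeApiClient();
const perSession = new Map<string, { server: FakeServer; store: FakeSessionStore }>();

function getSession(mcpSessionId: string) {
  let entry = perSession.get(mcpSessionId);
  if (!entry) {
    // New MCP session: fresh server and a fresh, isolated store,
    // so session_id values are never visible to other HTTP clients.
    entry = { server: new FakeServer(), store: new FakeSessionStore() };
    perSession.set(mcpSessionId, entry);
  }
  return { ...entry, client: sharedClient };
}
```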

## Docker

```bash
docker build -t deepseek-mcp-server .
docker run -d -p 3000:3000 -e DEEPSEEK_API_KEY=your-key deepseek-mcp-server
```

Docker defaults to HTTP transport. Health check is built in (wget to /health every 30s).

## Architecture

```
src/
  index.ts              # Entry point, bootstrap
  server.ts             # McpServer factory (version from package.json)
  deepseek-client.ts    # DeepSeek API wrapper (circuit breaker + fallback)
  config.ts             # Zod-validated config from env vars
  cost.ts               # V3.2 cache-aware cost calculation
  schemas.ts            # Zod input validation schemas
  types.ts              # TypeScript types + type guards
  errors.ts             # Custom error classes
  session.ts            # In-memory session store
  circuit-breaker.ts    # Circuit breaker pattern
  usage-tracker.ts      # Usage statistics tracker
  transport-http.ts     # Streamable HTTP transport (Express)
  tools/
    deepseek-chat.ts    # deepseek_chat tool (sessions + fallback)
    deepseek-sessions.ts # deepseek_sessions tool
    index.ts            # Tool registration aggregator
  resources/
    models.ts           # deepseek://models resource
    config.ts           # deepseek://config resource
    usage.ts            # deepseek://usage resource
    index.ts            # Resource registration aggregator
  prompts/
    core.ts             # 5 core reasoning prompts
    advanced.ts         # 5 advanced prompts
    function-calling.ts # 2 function calling prompts
    index.ts            # Prompt registration aggregator
```

## Configuration (Environment Variables)

| Variable | Default | Description |
|----------|---------|-------------|
| DEEPSEEK_API_KEY | (required) | DeepSeek API key |
| DEEPSEEK_BASE_URL | https://api.deepseek.com | Custom API endpoint |
| DEFAULT_MODEL | deepseek-chat | Default model for requests |
| SHOW_COST_INFO | true | Show cost info in responses |
| REQUEST_TIMEOUT | 60000 | Request timeout (ms) |
| MAX_RETRIES | 2 | Max retry count |
| SKIP_CONNECTION_TEST | false | Skip startup API connection test |
| MAX_MESSAGE_LENGTH | 100000 | Max message content length (chars) |
| SESSION_TTL_MINUTES | 30 | Session time-to-live in minutes |
| MAX_SESSIONS | 100 | Maximum concurrent sessions |
| FALLBACK_ENABLED | true | Enable automatic model fallback |
| CIRCUIT_BREAKER_THRESHOLD | 5 | Consecutive failures before circuit opens |
| CIRCUIT_BREAKER_RESET_TIMEOUT | 30000 | Milliseconds before circuit half-opens |
| MAX_SESSION_MESSAGES | 200 | Max messages per session (sliding window) |
| ENABLE_MULTIMODAL | false | Enable multimodal (image) input support |
| TRANSPORT | stdio | Transport mode: stdio or http |
| HTTP_PORT | 3000 | HTTP server port (when TRANSPORT=http) |

## Tech Stack

- TypeScript 5.7 (strict mode)
- @modelcontextprotocol/sdk for MCP protocol
- OpenAI SDK v6 for API compatibility
- Zod v4 for validation
- Vitest for testing (265 tests, ~89% line coverage)
- Node.js 18+

## Links

- npm: https://www.npmjs.com/package/@arikusi/deepseek-mcp-server
- GitHub: https://github.com/arikusi/deepseek-mcp-server
- DeepSeek API: https://api-docs.deepseek.com
- MCP Spec: https://modelcontextprotocol.io
