tests — reasoning
The reasoning module gives the agent structured problem-solving capabilities built on Chain-of-Thought (CoT), Tree-of-Thought (ToT), and Monte Carlo Tree Search (MCTS). It offers both automatic complexity detection and user-controlled modes to guide the agent's thinking process.
This module is composed of three main parts:
- ReasoningFacade: The core interface for executing different reasoning strategies and managing their lifecycle.
- ReasoningMiddleware: Integrates reasoning capabilities into the agent's message processing pipeline by detecting problem complexity and injecting guidance.
- Think Command Handlers: Provide user-facing commands (/think) to control reasoning modes and initiate problem-solving.
Core Concepts
The module supports several reasoning modes, each offering a different trade-off between computational cost, latency, and problem-solving depth:
- shallow (Chain-of-Thought, CoT): A quick, single-pass approach in which the LLM generates a sequence of thoughts leading to a final answer. Suitable for simpler problems or when a fast, approximate solution is acceptable.
- medium (Tree-of-Thought, ToT): Explores a limited search space, generating multiple thought paths and evaluating them to find a better solution. More robust than shallow for moderately complex problems.
- deep (Monte Carlo Tree Search, MCTS): Uses a more extensive search strategy, building a tree of possibilities and navigating it with MCTS. Ideal for complex problems that require deeper exploration.
- exhaustive (MCTS): The most thorough search mode, designed for highly challenging problems where maximum effort is justified. It typically involves higher computational cost and latency.
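As a rough illustration, the mode-to-technique mapping above can be written as a typed lookup table. The ThinkingMode union matches the values getActiveThinkingMode() is documented to return; Technique, TECHNIQUE_BY_MODE, MODE_ORDER, and isMoreThorough are hypothetical names used only for this sketch.

```typescript
// Modes accepted by /think (null elsewhere means "no explicit mode").
type ThinkingMode = "shallow" | "medium" | "deep" | "exhaustive";

// Hypothetical label for the underlying search technique.
type Technique = "cot" | "tot" | "mcts";

// One entry per mode; Record<> makes the compiler reject a missing mode.
const TECHNIQUE_BY_MODE: Record<ThinkingMode, Technique> = {
  shallow: "cot",     // single-pass Chain-of-Thought
  medium: "tot",      // limited Tree-of-Thought search
  deep: "mcts",       // MCTS over a larger tree
  exhaustive: "mcts", // MCTS with the largest effort budget
};

// Modes ordered from cheapest to most thorough.
const MODE_ORDER: readonly ThinkingMode[] = ["shallow", "medium", "deep", "exhaustive"];

// True when mode `a` sits further along the cost/depth scale than `b`.
function isMoreThorough(a: ThinkingMode, b: ThinkingMode): boolean {
  return MODE_ORDER.indexOf(a) > MODE_ORDER.indexOf(b);
}
```

An explicit ordering like MODE_ORDER is one simple way to express "escalate to a more robust mode" without hard-coding pairs of modes.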
Beyond explicit mode selection, the module also features:
- Auto-Selection: The system can automatically choose a reasoning mode based on the perceived complexity of a problem description.
- Auto-Escalation: If an initial shallow reasoning attempt yields low confidence, the system can automatically retry the problem with a more robust mode (e.g., medium) to improve solution quality.
- Usage Tracking: Monitors API calls, estimated tokens, and total time spent across all reasoning operations.
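Usage tracking amounts to accumulating counters across runs. The following is a minimal sketch, assuming a hypothetical UsageStats shape; the field names are illustrative rather than the module's actual types, and only the getUsage()/resetUsage() pair mirrors methods named for ReasoningFacade.

```typescript
// Hypothetical shape for accumulated reasoning usage.
interface UsageStats {
  cotCalls: number;
  totCalls: number;
  mctsCalls: number;
  totalTimeMs: number;
  estimatedTokens: number;
}

type Technique = "cot" | "tot" | "mcts";

class UsageTracker {
  private stats: UsageStats = UsageTracker.empty();

  private static empty(): UsageStats {
    return { cotCalls: 0, totCalls: 0, mctsCalls: 0, totalTimeMs: 0, estimatedTokens: 0 };
  }

  // Record one completed reasoning run.
  record(technique: Technique, elapsedMs: number, tokens: number): void {
    if (technique === "cot") this.stats.cotCalls++;
    else if (technique === "tot") this.stats.totCalls++;
    else this.stats.mctsCalls++;
    this.stats.totalTimeMs += elapsedMs;
    this.stats.estimatedTokens += tokens;
  }

  // Return a copy so callers cannot mutate the internal counters.
  getUsage(): UsageStats {
    return { ...this.stats };
  }

  // Clear all counters.
  resetUsage(): void {
    this.stats = UsageTracker.empty();
  }
}
```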
Architecture Overview
The following diagram illustrates how the different components of the reasoning module interact:
```mermaid
graph TD
    subgraph User Interaction
        User -->|/think command| ThinkHandlers
    end
    subgraph Agent Core
        AgentLoop -->|User Message| ReasoningMiddleware
    end
    subgraph Reasoning Logic
        ThinkHandlers -->|setActiveThinkingMode| GlobalThinkingMode
        ReasoningMiddleware -->|getActiveThinkingMode| GlobalThinkingMode
        ReasoningMiddleware -->|detectComplexity| ReasoningMiddleware
        ReasoningMiddleware -->|injects <reasoning_guidance>| LLMInput
        LLMInput --> LLM(LLM Model)
        ThinkHandlers -->|calls solve()| ReasoningFacade
        ReasoningFacade -->|dispatches to| TreeOfThoughtReasoner
        ReasoningFacade -->|dispatches to| ChainOfThoughtReasoner
        ReasoningFacade -->|tracks usage| UsageStats
    end
```
Components
ReasoningFacade (src/agent/reasoning/reasoning-facade.ts)
The ReasoningFacade acts as the primary interface for interacting with the underlying reasoning engines (Chain-of-Thought, Tree-of-Thought). It abstracts away the specifics of different reasoning algorithms, providing a unified solve() method. It also handles API key management, usage tracking, result formatting, and the heuristics for auto-selecting reasoning modes and auto-escalation.
Key Responsibilities:
- Initialization: Configured with an API key and an optional base URL for the LLM.
- Unified solve() Method: The core entry point for initiating a reasoning process. It takes a Problem object (containing description, constraints, examples) and ReasoningOptions (specifying mode, autoEscalate, etc.).
- Mode Dispatch: Based on the selected or auto-detected mode, solve() dispatches the problem to either the ChainOfThoughtReasoner (for shallow mode) or the TreeOfThoughtReasoner (for medium, deep, and exhaustive modes).
- Auto-Selection Logic: If no explicit mode is provided in ReasoningOptions, solve() uses heuristics based on the length of the problem's description and the presence of constraints and examples to choose a default reasoning mode.
- Auto-Escalation: If options.autoEscalate is true and an initial shallow attempt yields low confidence (e.g., < 0.5), solve() automatically retries the problem in medium mode to improve solution quality.
- Usage Tracking: getUsage() returns statistics on reasoning calls (CoT, ToT, MCTS), total time, and estimated tokens; resetUsage() clears these statistics.
- Result Formatting: formatResult() takes the raw output (either a CoTResult or a ReasoningResult) and formats it as a human-readable string for display.
- Singleton Management: getReasoningFacade() and resetReasoningFacade() manage a singleton ReasoningFacade instance, ensuring consistent API key usage and usage tracking across the application.
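The auto-selection and auto-escalation rules above can be sketched as follows. This is a hypothetical illustration, not the facade's code: the length thresholds in autoSelectMode are invented, the reasoner is passed in as a plain function instead of the real ChainOfThoughtReasoner/TreeOfThoughtReasoner classes, and SolveResult contains only the fields this sketch needs. Only the 0.5 confidence threshold comes from the example above.

```typescript
type ThinkingMode = "shallow" | "medium" | "deep" | "exhaustive";

interface Problem {
  description: string;
  constraints?: string[];
  examples?: string[];
}

interface ReasoningOptions {
  mode?: ThinkingMode;
  autoEscalate?: boolean;
}

// Hypothetical result shape.
interface SolveResult {
  mode: ThinkingMode;
  confidence: number; // 0..1
  answer: string;
}

// Stand-in for dispatching to a CoT or ToT reasoner.
type Reasoner = (problem: Problem, mode: ThinkingMode) => Promise<SolveResult>;

// Illustrative heuristic; thresholds are assumptions, not the module's values.
function autoSelectMode(problem: Problem): ThinkingMode {
  const hasConstraints = (problem.constraints?.length ?? 0) > 0;
  const hasExamples = (problem.examples?.length ?? 0) > 0;
  const len = problem.description.length;
  if (len > 1000 && hasConstraints) return "deep";
  if (len > 300 || hasConstraints || hasExamples) return "medium";
  return "shallow";
}

const ESCALATION_THRESHOLD = 0.5; // the "< 0.5" example from the text

async function solve(problem: Problem, options: ReasoningOptions, reason: Reasoner): Promise<SolveResult> {
  const mode = options.mode ?? autoSelectMode(problem);
  const first = await reason(problem, mode);
  // Escalate only from shallow, and only when the caller opted in.
  if (options.autoEscalate && mode === "shallow" && first.confidence < ESCALATION_THRESHOLD) {
    return reason(problem, "medium");
  }
  return first;
}
```

Injecting the reasoner as a parameter keeps the sketch testable without network calls; a real facade would hold its reasoners internally.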
ReasoningMiddleware (src/agent/middleware/reasoning-middleware.ts)
The ReasoningMiddleware integrates the reasoning capabilities into the agent's message processing pipeline. Its primary role is to detect the complexity of user prompts and, when appropriate, inject a system message (<reasoning_guidance>) that encourages the LLM to reason in a more structured or advanced way.
Key Responsibilities:
- Complexity Detection: The detectComplexity(message: string) function analyzes a string (typically the user's last message) for keywords and length and assigns a complexity score and level (none, cot, tot, mcts). Signals include action verbs, constraint language, exploration language, multi-step indicators, and a length bonus.
- Guidance Injection (beforeTurn):
  - During the beforeTurn phase of the agent loop, the middleware checks the currently active thinking mode (via getActiveThinkingMode).
  - Explicit Mode: If an explicit mode is set (e.g., via /think deep), it always injects <reasoning_guidance> into the system messages, instructing the LLM to use that specific reasoning mode.
  - Auto-Detect Mode: If no explicit mode is set and auto-detection is enabled, it calls detectComplexity() on the last user message. If the detected level is tot or mcts, it injects <reasoning_guidance> to prompt the LLM for more advanced reasoning.
  - Prevention of Double Injection: Ensures that reasoning guidance is not injected multiple times into the message history.
- Auto-Detection Toggle: setAutoDetect(enabled: boolean) enables or disables automatic complexity detection and guidance injection.
- Factory Function: createReasoningMiddleware() instantiates the middleware with optional configuration.
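A keyword-and-length scorer of the kind described for detectComplexity() might look like the sketch below. The signal categories follow the text, but every keyword list, weight, and level threshold here is a hypothetical choice for illustration. Under these assumed weights, the example message in "How it Works" ("Please design a robust, scalable architecture ... considering trade-offs ...") scores 4 and lands at tot.

```typescript
type ComplexityLevel = "none" | "cot" | "tot" | "mcts";

interface ComplexityResult {
  score: number;
  level: ComplexityLevel;
}

// Hypothetical signal patterns and weights (not the module's real values).
const SIGNALS: { pattern: RegExp; weight: number }[] = [
  { pattern: /\b(design|refactor|implement|optimi[sz]e|debug)\b/i, weight: 2 },           // action verbs
  { pattern: /\b(must|without|constraint|at most|at least|trade-?offs?)\b/i, weight: 2 }, // constraint language
  { pattern: /\b(alternatives?|explore|compare|options)\b/i, weight: 2 },                 // exploration language
  { pattern: /\b(first|then|finally|step)\b/i, weight: 1 },                               // multi-step indicators
];

function detectComplexity(message: string): ComplexityResult {
  let score = 0;
  for (const { pattern, weight } of SIGNALS) {
    if (pattern.test(message)) score += weight;
  }
  // Length bonus: one point per 200 characters, capped at 3.
  score += Math.min(3, Math.floor(message.length / 200));

  let level: ComplexityLevel = "none";
  if (score >= 7) level = "mcts";
  else if (score >= 4) level = "tot";
  else if (score >= 2) level = "cot";
  return { score, level };
}
```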
Think Command Handlers (src/commands/handlers/think-handlers.ts)
The think-handlers module provides user-facing commands (/think) to control the agent's reasoning behavior. Users can manually set reasoning modes, check status, and directly initiate reasoning for specific problems.
Key Responsibilities:
- Command Parsing (handleThink): The main /think handler accepts several argument forms:
  - No arguments: Displays help text, including the current reasoning mode and available options.
  - off: Disables the active reasoning mode by setting it to null.
  - status: Shows the current reasoning mode and configuration details (e.g., max iterations, depth) for the active mode. It also indicates when no reasoning runs have occurred yet.
  - A mode name (shallow, medium, deep, exhaustive): Sets the global active reasoning mode, which ReasoningMiddleware then uses to decide whether to inject guidance.
  - A problem description: If an API key (GROK_API_KEY) is available, initiates a reasoning run for the problem using the currently active mode (or an auto-selected mode if none is set).
  - A mode name followed by a problem description: Sets the specified mode, then initiates a reasoning run for the problem.
- Global Mode Management: getActiveThinkingMode() returns the currently active reasoning mode ('shallow', 'medium', 'deep', 'exhaustive', or null); setActiveThinkingMode(mode: ThinkingMode | null) sets this global state.
- API Key Requirement: Any problem-solving initiated via /think requires the GROK_API_KEY environment variable; if it is not set, an error message is returned.
- Error Handling: Errors during reasoning attempts (e.g., LLM API timeouts) are handled gracefully and reported to the user.
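The argument forms handled by handleThink can be modeled as a discriminated union plus a small parser. This is a hypothetical sketch: ThinkAction, isMode, and parseThinkArgs are illustrative names, and the real handler may tokenize differently (for example, by stripping quotes around the problem text, which this sketch leaves in place).

```typescript
type ThinkingMode = "shallow" | "medium" | "deep" | "exhaustive";

// Hypothetical description of what a /think invocation asks for.
type ThinkAction =
  | { kind: "help" }
  | { kind: "off" }
  | { kind: "status" }
  | { kind: "setMode"; mode: ThinkingMode }
  | { kind: "solve"; problem: string }
  | { kind: "setModeAndSolve"; mode: ThinkingMode; problem: string };

const MODES: readonly string[] = ["shallow", "medium", "deep", "exhaustive"];

function isMode(word: string): word is ThinkingMode {
  return MODES.includes(word);
}

// Classifies the text after "/think" into one of the documented forms.
function parseThinkArgs(args: string): ThinkAction {
  const trimmed = args.trim();
  if (trimmed === "") return { kind: "help" };
  if (trimmed === "off") return { kind: "off" };
  if (trimmed === "status") return { kind: "status" };

  const first = trimmed.split(/\s+/)[0] ?? "";
  if (isMode(first)) {
    // Everything after the mode word is the problem text (possibly empty).
    const problem = trimmed.slice(first.length).trim();
    return problem === ""
      ? { kind: "setMode", mode: first }
      : { kind: "setModeAndSolve", mode: first, problem };
  }
  return { kind: "solve", problem: trimmed };
}
```

A discriminated union lets the handler switch on kind and have the compiler confirm that every form is covered.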
How it Works
User-Initiated Reasoning via /think
- A user types a command like /think deep "How do I refactor this complex module?".
- The handleThink function in think-handlers.ts parses the command.
- It calls setActiveThinkingMode('deep') to update the global reasoning mode.
- It then constructs a Problem object from the problem description and calls ReasoningFacade.solve() with the problem and the specified mode.
- ReasoningFacade.solve() executes the appropriate reasoning strategy (in this case, TreeOfThoughtReasoner for deep mode).
- ReasoningFacade.formatResult() formats the raw result from the reasoner into a human-readable string.
- handleThink returns this formatted result, which is displayed to the user.
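The steps above can be condensed into a single async function. This sketch assumes a minimal hypothetical Facade interface exposing only solve() and formatResult(), and a module-level activeMode variable standing in for the state behind get/setActiveThinkingMode(); the API key is passed in explicitly (a caller would supply process.env.GROK_API_KEY) so the block needs no Node-specific imports.

```typescript
type ThinkingMode = "shallow" | "medium" | "deep" | "exhaustive";

interface Problem {
  description: string;
}

// Hypothetical subset of the facade: only the two calls the flow uses.
interface Facade {
  solve(problem: Problem, options: { mode?: ThinkingMode }): Promise<unknown>;
  formatResult(result: unknown): string;
}

// Stand-in for the global state behind get/setActiveThinkingMode().
let activeMode: ThinkingMode | null = null;

async function runThink(
  mode: ThinkingMode | null,
  problemText: string,
  facade: Facade,
  apiKey: string | undefined,
): Promise<string> {
  if (mode !== null) activeMode = mode; // corresponds to setActiveThinkingMode(mode)
  if (!apiKey) return "Error: GROK_API_KEY is not set.";

  // Omit mode entirely when none is active so the facade can auto-select.
  const options = activeMode !== null ? { mode: activeMode } : {};
  try {
    const raw = await facade.solve({ description: problemText }, options);
    return facade.formatResult(raw);
  } catch (err) {
    // Report failures such as API timeouts instead of throwing.
    return `Reasoning failed: ${err instanceof Error ? err.message : String(err)}`;
  }
}
```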
Agent-Initiated Reasoning via ReasoningMiddleware
- A user sends a message to the agent, e.g., "Please design a robust, scalable architecture for a new microservice, considering trade-offs between latency and cost."
- As part of the agent's message processing loop, ReasoningMiddleware.beforeTurn() is invoked.
- The middleware first checks getActiveThinkingMode():
  - If the user has explicitly set a mode (e.g., /think medium), the middleware injects <reasoning_guidance> into the system messages, instructing the LLM to use that specific reasoning mode.
  - If no explicit mode is set (and auto-detection is enabled), detectComplexity() analyzes the user's message. For the example message above, it would likely detect a tot or mcts complexity level because of keywords such as "design," "robust," "scalable," and "trade-offs."
- When an explicit mode is set, or the detected level is tot or mcts, the middleware injects a system message containing <reasoning_guidance> into the message history. This guidance is a prompt-engineering technique that steers the LLM toward using its internal reasoning capabilities more effectively.
- The LLM receives the augmented message history, including the reasoning guidance, and is expected to reason accordingly, which can lead to a more structured, higher-quality response.
- Important Note: ReasoningMiddleware only guides the LLM; it does not call ReasoningFacade.solve(). The LLM itself is expected to perform the reasoning based on the guidance in the system message.
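The injection decision described above can be sketched as a pure function over the message list. Everything here is hypothetical: the Message shape, the guidance wording, the placement after the leading system messages, and the detect callback (standing in for detectComplexity()). It illustrates the documented rules: explicit mode always injects, auto-detect injects only for tot or mcts, and existing guidance is never duplicated.

```typescript
type ThinkingMode = "shallow" | "medium" | "deep" | "exhaustive";
type ComplexityLevel = "none" | "cot" | "tot" | "mcts";

interface Message {
  role: "system" | "user" | "assistant";
  content: string;
}

const GUIDANCE_TAG = "<reasoning_guidance>";

// Returns the messages to send, with at most one guidance message added.
function injectGuidance(
  messages: Message[],
  explicitMode: ThinkingMode | null,
  autoDetect: boolean,
  detect: (text: string) => ComplexityLevel,
): Message[] {
  // Prevent double injection: leave history untouched if guidance exists.
  if (messages.some((m) => m.role === "system" && m.content.includes(GUIDANCE_TAG))) {
    return messages;
  }

  let hint: string | null = null;
  if (explicitMode !== null) {
    hint = `Use ${explicitMode} reasoning for this request.`;
  } else if (autoDetect) {
    const lastUser = [...messages].reverse().find((m) => m.role === "user");
    const level = lastUser ? detect(lastUser.content) : "none";
    if (level === "tot" || level === "mcts") {
      hint = `This request looks complex (${level}); consider alternatives before answering.`;
    }
  }
  if (hint === null) return messages;

  // Place the guidance after any leading system messages.
  const guidance: Message = { role: "system", content: `${GUIDANCE_TAG}\n${hint}\n</reasoning_guidance>` };
  const firstNonSystem = messages.findIndex((m) => m.role !== "system");
  const at = firstNonSystem === -1 ? messages.length : firstNonSystem;
  return [...messages.slice(0, at), guidance, ...messages.slice(at)];
}
```

Returning a new array instead of mutating the input keeps the check for existing guidance reliable across repeated turns.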
Configuration and Usage
API Key
All reasoning operations that involve external LLM calls (i.e., anything beyond local complexity detection) require an API key. This is typically configured via an environment variable:
```bash
export GROK_API_KEY="your_grok_api_key_here"
```
/think Command
The /think command provides direct control over the agent's reasoning behavior:
- /think: Displays help text, including the current reasoning mode.
- /think off: Disables any active explicit reasoning mode.
- /think status: Shows the current reasoning mode and its detailed configuration (e.g., max iterations, max depth, expansion strategy). Also displays accumulated usage statistics if any reasoning runs have occurred.
- /think shallow|medium|deep|exhaustive: Sets the global explicit reasoning mode. ReasoningMiddleware then uses this mode to inject guidance for subsequent user prompts.
- /think followed by a problem description: Initiates a reasoning run for the problem using the currently active reasoning mode (or an auto-selected mode if none is explicitly set).
- /think deep followed by a problem description: Sets the reasoning mode to deep and then initiates a reasoning run for the problem. The mode remains deep for subsequent interactions until changed.
ReasoningMiddleware Auto-Detection
The automatic complexity detection and guidance injection by ReasoningMiddleware can be toggled programmatically:
```typescript
import { createReasoningMiddleware } from '../../src/agent/middleware/reasoning-middleware.js';

const middleware = createReasoningMiddleware(); // Auto-detect is enabled by default
middleware.setAutoDetect(false); // Disable auto-detection
// ... later ...
middleware.setAutoDetect(true); // Re-enable auto-detection
```