Merge branch 'BeehiveInnovations:main' into fix/google-allowed-models-restriction

Authored by Ming on 2025-06-16 21:17:19 +08:00; committed by GitHub.
29 changed files with 1533 additions and 494 deletions

CLAUDE.local.md (new file)

@@ -0,0 +1 @@
- Before any commit / push to GitHub, always run the code quality checks first and confirm that they pass. Use @code_quality_checks.sh and confirm that 100% of unit tests pass.


@@ -112,6 +112,11 @@ docker logs zen-mcp-redis
### Testing
Simulation tests are available to test the MCP server in a 'live' scenario, using your configured
API keys to ensure the models are working and the server is able to communicate back and forth.
IMPORTANT: Any time code is changed or updated, you MUST first restart the server with ./run-server.sh OR
pass `--rebuild` to the `communication_simulator_test.py` script (if running it for the first time after changes) so that the tests run against the latest code.
#### Run All Simulator Tests
```bash
# Run the complete test suite


@@ -80,6 +80,7 @@ Claude is brilliant, but sometimes you need:
- **Local model support** - Run models like Llama 3.2 locally via Ollama, vLLM, or LM Studio for privacy and cost control
- **Dynamic collaboration** - Models can request additional context and follow-up replies from Claude mid-analysis
- **Smart file handling** - Automatically expands directories, manages token limits based on model capacity
- **Vision support** - Analyze images, diagrams, screenshots, and visual content with vision-capable models
- **[Bypass MCP's token limits](docs/advanced-usage.md#working-with-large-prompts)** - Work around MCP's 25K limit automatically
- **[Context revival across sessions](docs/context-revival.md)** - Continue conversations even after Claude's context resets, with other models maintaining full history
@@ -314,6 +315,7 @@ and then debate with the other models to give me a final verdict
- Technology comparisons and best practices
- Architecture and design discussions
- Can reference files for context: `"Use gemini to explain this algorithm with context from algorithm.py"`
- **Image support**: Include screenshots, diagrams, UI mockups for visual analysis: `"Chat with gemini about this error dialog screenshot to understand the user experience issue"`
- **Dynamic collaboration**: Gemini can request additional files or context during the conversation if needed for a more thorough response
- **Web search capability**: Analyzes when web searches would be helpful and recommends specific searches for Claude to perform, ensuring access to current documentation and best practices
@@ -337,6 +339,7 @@ with the best architecture for my project
- Offers alternative perspectives and approaches
- Validates architectural decisions and design patterns
- Can reference specific files for context: `"Use gemini to think deeper about my API design with reference to api/routes.py"`
- **Image support**: Analyze architectural diagrams, flowcharts, design mockups: `"Think deeper about this system architecture diagram with gemini pro using max thinking mode"`
- **Enhanced Critical Evaluation (v2.10.0)**: After Gemini's analysis, Claude is prompted to critically evaluate the suggestions, consider context and constraints, identify risks, and synthesize a final recommendation - ensuring a balanced, well-considered solution
- **Web search capability**: When enabled (default: true), identifies areas where current documentation or community solutions would strengthen the analysis and suggests specific searches for Claude
@@ -362,6 +365,7 @@ I need an actionable plan but break it down into smaller quick-wins that we can
- Supports specialized reviews: security, performance, quick
- Can enforce coding standards: `"Use gemini to review src/ against PEP8 standards"`
- Filters by severity: `"Get gemini to review auth/ - only report critical vulnerabilities"`
- **Image support**: Review code from screenshots, error dialogs, or visual bug reports: `"Review this error screenshot and the related auth.py file for potential security issues"`
### 4. `precommit` - Pre-Commit Validation
**Comprehensive review of staged/unstaged git changes across multiple repositories**
@@ -408,6 +412,7 @@ Use zen and perform a thorough precommit ensuring there aren't any new regressio
- `review_type`: full|security|performance|quick
- `severity_filter`: Filter by issue severity
- `max_depth`: How deep to search for nested repos
- `images`: Screenshots of requirements, design mockups, or error states for validation context
### 5. `debug` - Expert Debugging Assistant
**Root cause analysis for complex problems**
@@ -428,6 +433,7 @@ Use zen and perform a thorough precommit ensuring there aren't any new regressio
- Supports runtime info and previous attempts
- Provides structured root cause analysis with validation steps
- Can request additional context when needed for thorough analysis
- **Image support**: Include error screenshots, stack traces, console output: `"Debug this error using gemini with the stack trace screenshot and the failing test.py"`
- **Web search capability**: When enabled (default: true), identifies when searching for error messages, known issues, or documentation would help solve the problem and recommends specific searches for Claude
### 6. `analyze` - Smart File Analysis
**General-purpose code understanding and exploration**
@@ -447,6 +453,7 @@ Use zen and perform a thorough precommit ensuring there aren't any new regressio
- Supports specialized analysis types: architecture, performance, security, quality
- Uses file paths (not content) for clean terminal output
- Can identify patterns, anti-patterns, and refactoring opportunities
- **Image support**: Analyze architecture diagrams, UML charts, flowcharts: `"Analyze this system diagram with gemini to understand the data flow and identify bottlenecks"`
- **Web search capability**: When enabled with `use_websearch` (default: true), the model can request Claude to perform web searches and share results back to enhance analysis with current documentation, design patterns, and best practices
### 7. `refactor` - Intelligent Code Refactoring
@@ -489,6 +496,7 @@ did *not* discover.
- **Conservative approach** - Careful dependency analysis to prevent breaking changes
- **Multi-file analysis** - Understands cross-file relationships and dependencies
- **Priority sequencing** - Recommends implementation order for refactoring changes
- **Image support**: Analyze code architecture diagrams, legacy system charts: `"Refactor this legacy module using gemini pro with the current architecture diagram"`
**Refactor Types (Progressive Priority System):**
@@ -529,7 +537,8 @@ Claude can use to efficiently trace execution flows and map dependencies within
- Creates structured instructions for call-flow graph generation
- Provides detailed formatting requirements for consistent output
- Supports any programming language with automatic convention detection
- Output can be used as an input into another tool, such as `chat` along with related code files to perform a logical call-flow analysis
- **Image support**: Analyze visual call flow diagrams, sequence diagrams: `"Generate tracer analysis for this payment flow using the sequence diagram"`
#### Example Prompts:
```
@@ -564,6 +573,7 @@ suites that cover realistic failure scenarios and integration points that shorte
- Prioritizes smallest test files for pattern detection
- Can reference existing test files: `"Generate tests following patterns from tests/unit/"`
- Specific code coverage - target specific functions/classes rather than testing everything
- **Image support**: Test UI components, analyze visual requirements: `"Generate tests for this login form using the UI mockup screenshot"`
### 10. `version` - Server Information
```
@@ -626,6 +636,7 @@ This server enables **true AI collaboration** between Claude and multiple AI mod
- **Automatic 25K limit bypass**: Each exchange sends only incremental context, allowing unlimited total conversation size
- Up to 10 exchanges per conversation (configurable via `MAX_CONVERSATION_TURNS`) with 3-hour expiry (configurable via `CONVERSATION_TIMEOUT_HOURS`)
- Thread-safe with Redis persistence across all tools
- **Image context preservation** - Images and visual references are maintained across conversation turns and tool switches
**Cross-tool & Cross-Model Continuation Example:**
```
@@ -659,7 +670,7 @@ DEFAULT_MODEL=auto # Claude picks the best model automatically
# API Keys (at least one required)
GEMINI_API_KEY=your-gemini-key # Enables Gemini Pro & Flash
-OPENAI_API_KEY=your-openai-key # Enables O3, O3mini, O4-mini, O4-mini-high
+OPENAI_API_KEY=your-openai-key # Enables O3, O3mini, O4-mini, O4-mini-high, GPT-4.1
```
**Available Models:**
@@ -669,6 +680,7 @@ OPENAI_API_KEY=your-openai-key # Enables O3, O3mini, O4-mini, O4-mini-high
- **`o3mini`**: Balanced speed/quality
- **`o4-mini`**: Latest reasoning model, optimized for shorter contexts
- **`o4-mini-high`**: Enhanced O4 with higher reasoning effort
- **`gpt4.1`**: GPT-4.1 with 1M context window
- **Custom models**: via OpenRouter or local APIs (Ollama, vLLM, etc.)
For detailed configuration options, see the [Advanced Usage Guide](docs/advanced-usage.md).


@@ -25,6 +25,8 @@
"supports_extended_thinking": "Whether the model supports extended reasoning tokens (currently none do via OpenRouter or custom APIs)", "supports_extended_thinking": "Whether the model supports extended reasoning tokens (currently none do via OpenRouter or custom APIs)",
"supports_json_mode": "Whether the model can guarantee valid JSON output", "supports_json_mode": "Whether the model can guarantee valid JSON output",
"supports_function_calling": "Whether the model supports function/tool calling", "supports_function_calling": "Whether the model supports function/tool calling",
"supports_images": "Whether the model can process images/visual input",
"max_image_size_mb": "Maximum total size in MB for all images combined (capped at 40MB max for custom models)",
"is_custom": "Set to true for models that should ONLY be used with custom API endpoints (Ollama, vLLM, etc.). False or omitted for OpenRouter/cloud models.", "is_custom": "Set to true for models that should ONLY be used with custom API endpoints (Ollama, vLLM, etc.). False or omitted for OpenRouter/cloud models.",
"description": "Human-readable description of the model" "description": "Human-readable description of the model"
}, },
@@ -35,6 +37,8 @@
"supports_extended_thinking": false, "supports_extended_thinking": false,
"supports_json_mode": true, "supports_json_mode": true,
"supports_function_calling": true, "supports_function_calling": true,
"supports_images": true,
"max_image_size_mb": 10.0,
"is_custom": true, "is_custom": true,
"description": "Example custom/local model for Ollama, vLLM, etc." "description": "Example custom/local model for Ollama, vLLM, etc."
} }
@@ -47,7 +51,9 @@
"supports_extended_thinking": false, "supports_extended_thinking": false,
"supports_json_mode": false, "supports_json_mode": false,
"supports_function_calling": false, "supports_function_calling": false,
"description": "Claude 3 Opus - Most capable Claude model" "supports_images": true,
"max_image_size_mb": 5.0,
"description": "Claude 3 Opus - Most capable Claude model with vision"
}, },
{ {
"model_name": "anthropic/claude-3-sonnet", "model_name": "anthropic/claude-3-sonnet",
@@ -56,7 +62,9 @@
"supports_extended_thinking": false, "supports_extended_thinking": false,
"supports_json_mode": false, "supports_json_mode": false,
"supports_function_calling": false, "supports_function_calling": false,
"description": "Claude 3 Sonnet - Balanced performance" "supports_images": true,
"max_image_size_mb": 5.0,
"description": "Claude 3 Sonnet - Balanced performance with vision"
}, },
{ {
"model_name": "anthropic/claude-3-haiku", "model_name": "anthropic/claude-3-haiku",
@@ -65,7 +73,9 @@
"supports_extended_thinking": false, "supports_extended_thinking": false,
"supports_json_mode": false, "supports_json_mode": false,
"supports_function_calling": false, "supports_function_calling": false,
"description": "Claude 3 Haiku - Fast and efficient" "supports_images": true,
"max_image_size_mb": 5.0,
"description": "Claude 3 Haiku - Fast and efficient with vision"
}, },
{ {
"model_name": "google/gemini-2.5-pro-preview", "model_name": "google/gemini-2.5-pro-preview",
@@ -74,7 +84,9 @@
"supports_extended_thinking": false, "supports_extended_thinking": false,
"supports_json_mode": true, "supports_json_mode": true,
"supports_function_calling": false, "supports_function_calling": false,
"description": "Google's Gemini 2.5 Pro via OpenRouter" "supports_images": true,
"max_image_size_mb": 20.0,
"description": "Google's Gemini 2.5 Pro via OpenRouter with vision"
}, },
{ {
"model_name": "google/gemini-2.5-flash-preview-05-20", "model_name": "google/gemini-2.5-flash-preview-05-20",
@@ -83,7 +95,9 @@
"supports_extended_thinking": false, "supports_extended_thinking": false,
"supports_json_mode": true, "supports_json_mode": true,
"supports_function_calling": false, "supports_function_calling": false,
"description": "Google's Gemini 2.5 Flash via OpenRouter" "supports_images": true,
"max_image_size_mb": 15.0,
"description": "Google's Gemini 2.5 Flash via OpenRouter with vision"
}, },
{ {
"model_name": "mistral/mistral-large", "model_name": "mistral/mistral-large",
@@ -92,7 +106,9 @@
"supports_extended_thinking": false, "supports_extended_thinking": false,
"supports_json_mode": true, "supports_json_mode": true,
"supports_function_calling": true, "supports_function_calling": true,
"description": "Mistral's largest model" "supports_images": false,
"max_image_size_mb": 0.0,
"description": "Mistral's largest model (text-only)"
}, },
{ {
"model_name": "meta-llama/llama-3-70b", "model_name": "meta-llama/llama-3-70b",
@@ -101,7 +117,9 @@
"supports_extended_thinking": false, "supports_extended_thinking": false,
"supports_json_mode": false, "supports_json_mode": false,
"supports_function_calling": false, "supports_function_calling": false,
"description": "Meta's Llama 3 70B model" "supports_images": false,
"max_image_size_mb": 0.0,
"description": "Meta's Llama 3 70B model (text-only)"
}, },
{ {
"model_name": "deepseek/deepseek-r1-0528", "model_name": "deepseek/deepseek-r1-0528",
@@ -110,7 +128,9 @@
"supports_extended_thinking": true, "supports_extended_thinking": true,
"supports_json_mode": true, "supports_json_mode": true,
"supports_function_calling": false, "supports_function_calling": false,
"description": "DeepSeek R1 with thinking mode - advanced reasoning capabilities" "supports_images": false,
"max_image_size_mb": 0.0,
"description": "DeepSeek R1 with thinking mode - advanced reasoning capabilities (text-only)"
}, },
{ {
"model_name": "perplexity/llama-3-sonar-large-32k-online", "model_name": "perplexity/llama-3-sonar-large-32k-online",
@@ -119,7 +139,9 @@
"supports_extended_thinking": false, "supports_extended_thinking": false,
"supports_json_mode": false, "supports_json_mode": false,
"supports_function_calling": false, "supports_function_calling": false,
"description": "Perplexity's online model with web search" "supports_images": false,
"max_image_size_mb": 0.0,
"description": "Perplexity's online model with web search (text-only)"
}, },
{ {
"model_name": "openai/o3", "model_name": "openai/o3",
@@ -128,7 +150,9 @@
"supports_extended_thinking": false, "supports_extended_thinking": false,
"supports_json_mode": true, "supports_json_mode": true,
"supports_function_calling": true, "supports_function_calling": true,
"description": "OpenAI's o3 model - well-rounded and powerful across domains" "supports_images": true,
"max_image_size_mb": 20.0,
"description": "OpenAI's o3 model - well-rounded and powerful across domains with vision"
}, },
{ {
"model_name": "openai/o3-mini", "model_name": "openai/o3-mini",
@@ -137,7 +161,9 @@
"supports_extended_thinking": false, "supports_extended_thinking": false,
"supports_json_mode": true, "supports_json_mode": true,
"supports_function_calling": true, "supports_function_calling": true,
"description": "OpenAI's o3-mini model - balanced performance and speed" "supports_images": true,
"max_image_size_mb": 20.0,
"description": "OpenAI's o3-mini model - balanced performance and speed with vision"
}, },
{ {
"model_name": "openai/o3-mini-high", "model_name": "openai/o3-mini-high",
@@ -146,7 +172,9 @@
"supports_extended_thinking": false, "supports_extended_thinking": false,
"supports_json_mode": true, "supports_json_mode": true,
"supports_function_calling": true, "supports_function_calling": true,
"description": "OpenAI's o3-mini with high reasoning effort - optimized for complex problems" "supports_images": true,
"max_image_size_mb": 20.0,
"description": "OpenAI's o3-mini with high reasoning effort - optimized for complex problems with vision"
}, },
{ {
"model_name": "openai/o3-pro", "model_name": "openai/o3-pro",
@@ -155,7 +183,9 @@
"supports_extended_thinking": false, "supports_extended_thinking": false,
"supports_json_mode": true, "supports_json_mode": true,
"supports_function_calling": true, "supports_function_calling": true,
"description": "OpenAI's o3-pro model - professional-grade reasoning and analysis" "supports_images": true,
"max_image_size_mb": 20.0,
"description": "OpenAI's o3-pro model - professional-grade reasoning and analysis with vision"
}, },
{ {
"model_name": "openai/o4-mini", "model_name": "openai/o4-mini",
@@ -164,7 +194,9 @@
"supports_extended_thinking": false, "supports_extended_thinking": false,
"supports_json_mode": true, "supports_json_mode": true,
"supports_function_calling": true, "supports_function_calling": true,
"description": "OpenAI's o4-mini model - optimized for shorter contexts with rapid reasoning" "supports_images": true,
"max_image_size_mb": 20.0,
"description": "OpenAI's o4-mini model - optimized for shorter contexts with rapid reasoning and vision"
}, },
{ {
"model_name": "openai/o4-mini-high", "model_name": "openai/o4-mini-high",
@@ -173,7 +205,9 @@
"supports_extended_thinking": false, "supports_extended_thinking": false,
"supports_json_mode": true, "supports_json_mode": true,
"supports_function_calling": true, "supports_function_calling": true,
"description": "OpenAI's o4-mini with high reasoning effort - enhanced for complex tasks" "supports_images": true,
"max_image_size_mb": 20.0,
"description": "OpenAI's o4-mini with high reasoning effort - enhanced for complex tasks with vision"
}, },
{ {
"model_name": "llama3.2", "model_name": "llama3.2",
@@ -182,8 +216,10 @@
"supports_extended_thinking": false, "supports_extended_thinking": false,
"supports_json_mode": false, "supports_json_mode": false,
"supports_function_calling": false, "supports_function_calling": false,
"supports_images": false,
"max_image_size_mb": 0.0,
"is_custom": true, "is_custom": true,
"description": "Local Llama 3.2 model via custom endpoint (Ollama/vLLM) - 128K context window" "description": "Local Llama 3.2 model via custom endpoint (Ollama/vLLM) - 128K context window (text-only)"
} }
] ]
} }


@@ -14,7 +14,7 @@ import os
# These values are used in server responses and for tracking releases
# IMPORTANT: This is the single source of truth for version and author info
# Semantic versioning: MAJOR.MINOR.PATCH
-__version__ = "4.7.5"
+__version__ = "4.8.0"
# Last update date in ISO format
__updated__ = "2025-06-16"
# Primary maintainer


@@ -8,13 +8,13 @@ services:
- "6379:6379" - "6379:6379"
volumes: volumes:
- redis_data:/data - redis_data:/data
command: redis-server --save 60 1 --loglevel warning --maxmemory 64mb --maxmemory-policy allkeys-lru command: redis-server --save 60 1 --loglevel warning --maxmemory 512mb --maxmemory-policy allkeys-lru
deploy: deploy:
resources: resources:
limits: limits:
memory: 1G memory: 1G
reservations: reservations:
memory: 256M memory: 128M
zen-mcp: zen-mcp:
build: . build: .


@@ -11,6 +11,7 @@ This guide covers advanced features, configuration options, and workflows for po
- [Context Revival: AI Memory Beyond Context Limits](#context-revival-ai-memory-beyond-context-limits)
- [Collaborative Workflows](#collaborative-workflows)
- [Working with Large Prompts](#working-with-large-prompts)
- [Vision Support](#vision-support)
- [Web Search Integration](#web-search-integration)
- [System Prompts](#system-prompts)
@@ -25,7 +26,7 @@ DEFAULT_MODEL=auto # Claude picks the best model automatically
# API Keys (at least one required)
GEMINI_API_KEY=your-gemini-key # Enables Gemini Pro & Flash
-OPENAI_API_KEY=your-openai-key # Enables O3, O3-mini, O4-mini, O4-mini-high
+OPENAI_API_KEY=your-openai-key # Enables O3, O3-mini, O4-mini, O4-mini-high, GPT-4.1
```
**How Auto Mode Works:**
@@ -43,6 +44,7 @@ OPENAI_API_KEY=your-openai-key # Enables O3, O3-mini, O4-mini, O4-mini-high
| **`o3-mini`** | OpenAI | 200K tokens | Balanced speed/quality | Moderate complexity tasks |
| **`o4-mini`** | OpenAI | 200K tokens | Latest reasoning model | Optimized for shorter contexts |
| **`o4-mini-high`** | OpenAI | 200K tokens | Enhanced reasoning | Complex tasks requiring deeper analysis |
| **`gpt4.1`** | OpenAI | 1M tokens | Latest GPT-4 with extended context | Large codebase analysis, comprehensive reviews |
| **`llama`** (Llama 3.2) | Custom/Local | 128K tokens | Local inference, privacy | On-device analysis, cost-free processing |
| **Any model** | OpenRouter | Varies | Access to GPT-4, Claude, Llama, etc. | User-specified or based on task requirements |
@@ -57,6 +59,7 @@ You can specify a default model instead of auto mode:
DEFAULT_MODEL=gemini-2.5-pro-preview-06-05 # Always use Gemini Pro
DEFAULT_MODEL=flash # Always use Flash
DEFAULT_MODEL=o3 # Always use O3
DEFAULT_MODEL=gpt4.1 # Always use GPT-4.1
```
**Important:** After changing any configuration in `.env` (including `DEFAULT_MODEL`, API keys, or other settings), restart the server with `./run-server.sh` to apply the changes.
@@ -67,10 +70,12 @@ Regardless of your default setting, you can specify models per request:
- "Use **flash** to quickly format this code" - "Use **flash** to quickly format this code"
- "Use **o3** to debug this logic error" - "Use **o3** to debug this logic error"
- "Review with **o4-mini** for balanced analysis" - "Review with **o4-mini** for balanced analysis"
- "Use **gpt4.1** for comprehensive codebase analysis"
**Model Capabilities:**
- **Gemini Models**: Support thinking modes (minimal to max), web search, 1M context
- **O3 Models**: Excellent reasoning, systematic analysis, 200K context
- **GPT-4.1**: Extended context window (1M tokens), general capabilities
## Model Usage Restrictions
@@ -186,7 +191,7 @@ All tools that work with files support **both individual files and entire direct
**`analyze`** - Analyze files or directories
- `files`: List of file paths or directories (required)
- `question`: What to analyze (required)
-- `model`: auto|pro|flash|o3|o3-mini|o4-mini|o4-mini-high (default: server default)
+- `model`: auto|pro|flash|o3|o3-mini|o4-mini|o4-mini-high|gpt4.1 (default: server default)
- `analysis_type`: architecture|performance|security|quality|general
- `output_format`: summary|detailed|actionable
- `thinking_mode`: minimal|low|medium|high|max (default: medium, Gemini only)
@@ -201,7 +206,7 @@ All tools that work with files support **both individual files and entire direct
**`codereview`** - Review code files or directories
- `files`: List of file paths or directories (required)
-- `model`: auto|pro|flash|o3|o3-mini|o4-mini|o4-mini-high (default: server default)
+- `model`: auto|pro|flash|o3|o3-mini|o4-mini|o4-mini-high|gpt4.1 (default: server default)
- `review_type`: full|security|performance|quick
- `focus_on`: Specific aspects to focus on
- `standards`: Coding standards to enforce
@@ -217,7 +222,7 @@ All tools that work with files support **both individual files and entire direct
**`debug`** - Debug with file context
- `error_description`: Description of the issue (required)
-- `model`: auto|pro|flash|o3|o3-mini|o4-mini|o4-mini-high (default: server default)
+- `model`: auto|pro|flash|o3|o3-mini|o4-mini|o4-mini-high|gpt4.1 (default: server default)
- `error_context`: Stack trace or logs
- `files`: Files or directories related to the issue
- `runtime_info`: Environment details
@@ -233,7 +238,7 @@ All tools that work with files support **both individual files and entire direct
**`thinkdeep`** - Extended analysis with file context
- `current_analysis`: Your current thinking (required)
-- `model`: auto|pro|flash|o3|o3-mini|o4-mini|o4-mini-high (default: server default)
+- `model`: auto|pro|flash|o3|o3-mini|o4-mini|o4-mini-high|gpt4.1 (default: server default)
- `problem_context`: Additional context
- `focus_areas`: Specific aspects to focus on
- `files`: Files or directories for context
@@ -249,7 +254,7 @@ All tools that work with files support **both individual files and entire direct
**`testgen`** - Comprehensive test generation with edge case coverage
- `files`: Code files or directories to generate tests for (required)
- `prompt`: Description of what to test, testing objectives, and scope (required)
-- `model`: auto|pro|flash|o3|o3-mini|o4-mini|o4-mini-high (default: server default)
+- `model`: auto|pro|flash|o3|o3-mini|o4-mini|o4-mini-high|gpt4.1 (default: server default)
- `test_examples`: Optional existing test files as style/pattern reference
- `thinking_mode`: minimal|low|medium|high|max (default: medium, Gemini only)
@@ -264,7 +269,7 @@ All tools that work with files support **both individual files and entire direct
- `files`: Code files or directories to analyze for refactoring opportunities (required)
- `prompt`: Description of refactoring goals, context, and specific areas of focus (required)
- `refactor_type`: codesmells|decompose|modernize|organization (required)
-- `model`: auto|pro|flash|o3|o3-mini|o4-mini|o4-mini-high (default: server default)
+- `model`: auto|pro|flash|o3|o3-mini|o4-mini|o4-mini-high|gpt4.1 (default: server default)
- `focus_areas`: Specific areas to focus on (e.g., 'performance', 'readability', 'maintainability', 'security')
- `style_guide_examples`: Optional existing code files to use as style/pattern reference
- `thinking_mode`: minimal|low|medium|high|max (default: medium, Gemini only)
@@ -357,6 +362,47 @@ To help choose the right tool for your needs:
- `refactor` vs `codereview`: refactor suggests structural improvements, codereview finds bugs/issues
- `refactor` vs `analyze`: refactor provides actionable refactoring steps, analyze provides understanding
## Vision Support
The Zen MCP server supports vision-capable models for analyzing images, diagrams, screenshots, and visual content. Vision support works seamlessly with all tools and conversation threading.
**Supported Models:**
- **Gemini 2.5 Pro & Flash**: Excellent for diagrams, architecture analysis, UI mockups (up to 20MB total)
- **OpenAI O3/O4 series**: Strong for visual debugging, error screenshots (up to 20MB total)
- **Claude models via OpenRouter**: Good for code screenshots, visual analysis (up to 5MB total)
- **Custom models**: Support varies by model, with 40MB maximum enforced for abuse prevention
**Usage Examples:**
```bash
# Debug with error screenshots
"Use zen to debug this error with the stack trace screenshot and error.py"
# Architecture analysis with diagrams
"Analyze this system architecture diagram with gemini pro for bottlenecks"
# UI review with mockups
"Chat with flash about this UI mockup - is the layout intuitive?"
# Code review with visual context
"Review this authentication code along with the error dialog screenshot"
```
**Image Formats Supported:**
- **Images**: JPG, PNG, GIF, WebP, BMP, SVG, TIFF
- **Documents**: PDF (where supported by model)
- **Data URLs**: Base64-encoded images from Claude
**Key Features:**
- **Automatic validation**: File type, magic bytes, and size validation
- **Conversation context**: Images persist across tool switches and continuation
- **Budget management**: Automatic dropping of old images when limits exceeded
- **Model capability-aware**: Only sends images to vision-capable models
**Best Practices:**
- Describe images when including them: "screenshot of login error", "system architecture diagram"
- Use appropriate models: Gemini for complex diagrams, O3 for debugging visuals
- Consider image sizes: Larger images consume more of the model's capacity
## Working with Large Prompts
The MCP protocol has a combined request+response limit of approximately 25K tokens. This server intelligently works around this limitation by automatically handling large prompts as files:


@@ -112,6 +112,8 @@ class ModelCapabilities:
supports_system_prompts: bool = True
supports_streaming: bool = True
supports_function_calling: bool = False
supports_images: bool = False # Whether model can process images
max_image_size_mb: float = 0.0 # Maximum total size for all images in MB
# Temperature constraint object - preferred way to define temperature limits
temperature_constraint: TemperatureConstraint = field(
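These two capability fields let callers decide whether to attach images at all before building a request. A minimal sketch of how a tool might gate images on them (the `get_capabilities` accessor name and the size check are illustrative assumptions, not code from this commit):

```python
import logging
import os

logger = logging.getLogger(__name__)

def filter_images_for_model(provider, model_name: str, images: list[str]) -> list[str]:
    """Hypothetical helper: drop images the target model cannot accept."""
    caps = provider.get_capabilities(model_name)  # assumed accessor returning ModelCapabilities
    if not images:
        return []
    if not caps.supports_images:
        logger.warning("Model %s does not support images; ignoring %d image(s)", model_name, len(images))
        return []
    # Enforce the combined size budget declared by the provider (file paths only).
    total_mb = sum(os.path.getsize(p) for p in images if os.path.exists(p)) / (1024 * 1024)
    if total_mb > caps.max_image_size_mb:
        logger.warning("Images total %.1f MB, exceeding the %.1f MB budget", total_mb, caps.max_image_size_mb)
    return images
```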


@@ -1,6 +1,8 @@
"""Gemini model provider implementation.""" """Gemini model provider implementation."""
import base64
import logging
import os
import time
from typing import Optional
@@ -21,11 +23,15 @@ class GeminiModelProvider(ModelProvider):
"context_window": 1_048_576, # 1M tokens "context_window": 1_048_576, # 1M tokens
"supports_extended_thinking": True, "supports_extended_thinking": True,
"max_thinking_tokens": 24576, # Flash 2.5 thinking budget limit "max_thinking_tokens": 24576, # Flash 2.5 thinking budget limit
"supports_images": True, # Vision capability
"max_image_size_mb": 20.0, # Conservative 20MB limit for reliability
},
"gemini-2.5-pro-preview-06-05": {
"context_window": 1_048_576, # 1M tokens
"supports_extended_thinking": True,
"max_thinking_tokens": 32768, # Pro 2.5 thinking budget limit
"supports_images": True, # Vision capability
"max_image_size_mb": 32.0, # Higher limit for Pro model
},
# Shorthands
"flash": "gemini-2.5-flash-preview-05-20",
@@ -84,6 +90,8 @@ class GeminiModelProvider(ModelProvider):
supports_system_prompts=True,
supports_streaming=True,
supports_function_calling=True,
supports_images=config.get("supports_images", False),
max_image_size_mb=config.get("max_image_size_mb", 0.0),
temperature_constraint=temp_constraint,
)
@@ -95,6 +103,7 @@ class GeminiModelProvider(ModelProvider):
temperature: float = 0.7,
max_output_tokens: Optional[int] = None,
thinking_mode: str = "medium",
images: Optional[list[str]] = None,
**kwargs,
) -> ModelResponse:
"""Generate content using Gemini model."""
@@ -102,12 +111,34 @@ class GeminiModelProvider(ModelProvider):
resolved_name = self._resolve_model_name(model_name)
self.validate_parameters(model_name, temperature)
-# Combine system prompt with user prompt if provided
+# Prepare content parts (text and potentially images)
parts = []
# Add system and user prompts as text
if system_prompt:
full_prompt = f"{system_prompt}\n\n{prompt}"
else:
full_prompt = prompt
parts.append({"text": full_prompt})
# Add images if provided and model supports vision
if images and self._supports_vision(resolved_name):
for image_path in images:
try:
image_part = self._process_image(image_path)
if image_part:
parts.append(image_part)
except Exception as e:
logger.warning(f"Failed to process image {image_path}: {e}")
# Continue with other images and text
continue
elif images and not self._supports_vision(resolved_name):
logger.warning(f"Model {resolved_name} does not support images, ignoring {len(images)} image(s)")
# Create contents structure
contents = [{"parts": parts}]
# Prepare generation config
generation_config = types.GenerateContentConfig(
temperature=temperature,
@@ -139,7 +170,7 @@ class GeminiModelProvider(ModelProvider):
# Generate content
response = self.client.models.generate_content(
model=resolved_name,
-contents=full_prompt,
+contents=contents,
config=generation_config,
)
@@ -274,3 +305,51 @@ class GeminiModelProvider(ModelProvider):
usage["total_tokens"] = usage["input_tokens"] + usage["output_tokens"] usage["total_tokens"] = usage["input_tokens"] + usage["output_tokens"]
return usage return usage
def _supports_vision(self, model_name: str) -> bool:
"""Check if the model supports vision (image processing)."""
# Gemini 2.5 models support vision
vision_models = {
"gemini-2.5-flash-preview-05-20",
"gemini-2.5-pro-preview-06-05",
"gemini-2.0-flash",
"gemini-1.5-pro",
"gemini-1.5-flash",
}
return model_name in vision_models
def _process_image(self, image_path: str) -> Optional[dict]:
"""Process an image for Gemini API."""
try:
if image_path.startswith("data:image/"):
# Handle data URL: data:image/png;base64,iVBORw0...
header, data = image_path.split(",", 1)
mime_type = header.split(";")[0].split(":")[1]
return {"inline_data": {"mime_type": mime_type, "data": data}}
else:
# Handle file path - translate for Docker environment
from utils.file_types import get_image_mime_type
from utils.file_utils import translate_path_for_environment
translated_path = translate_path_for_environment(image_path)
logger.debug(f"Translated image path from '{image_path}' to '{translated_path}'")
if not os.path.exists(translated_path):
logger.warning(f"Image file not found: {translated_path} (original: {image_path})")
return None
# Use translated path for all subsequent operations
image_path = translated_path
# Detect MIME type from file extension using centralized mappings
ext = os.path.splitext(image_path)[1].lower()
mime_type = get_image_mime_type(ext)
# Read and encode the image
with open(image_path, "rb") as f:
image_data = base64.b64encode(f.read()).decode()
return {"inline_data": {"mime_type": mime_type, "data": image_data}}
except Exception as e:
logger.error(f"Error processing image {image_path}: {e}")
return None
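For reference, the `contents` payload this code builds when a single PNG file is attached looks roughly like the following (values illustrative, base64 truncated):

```python
contents = [
    {
        "parts": [
            {"text": "System prompt...\n\nWhat shape do you see in this image?"},
            {"inline_data": {"mime_type": "image/png", "data": "iVBORw0KGgoAAAANSUhEUg..."}},
        ]
    }
]
```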


@@ -23,22 +23,38 @@ class OpenAIModelProvider(OpenAICompatibleProvider):
"o3": { "o3": {
"context_window": 200_000, # 200K tokens "context_window": 200_000, # 200K tokens
"supports_extended_thinking": False, "supports_extended_thinking": False,
"supports_images": True, # O3 models support vision
"max_image_size_mb": 20.0, # 20MB per OpenAI docs
},
"o3-mini": {
"context_window": 200_000, # 200K tokens
"supports_extended_thinking": False,
"supports_images": True, # O3 models support vision
"max_image_size_mb": 20.0, # 20MB per OpenAI docs
},
"o3-pro": {
"context_window": 200_000, # 200K tokens
"supports_extended_thinking": False,
"supports_images": True, # O3 models support vision
"max_image_size_mb": 20.0, # 20MB per OpenAI docs
},
"o4-mini": {
"context_window": 200_000, # 200K tokens
"supports_extended_thinking": False,
"supports_images": True, # O4 models support vision
"max_image_size_mb": 20.0, # 20MB per OpenAI docs
},
"o4-mini-high": {
"context_window": 200_000, # 200K tokens
"supports_extended_thinking": False,
"supports_images": True, # O4 models support vision
"max_image_size_mb": 20.0, # 20MB per OpenAI docs
},
"gpt-4.1-2025-04-14": {
"context_window": 1_000_000, # 1M tokens
"supports_extended_thinking": False,
"supports_images": True, # GPT-4.1 supports vision
"max_image_size_mb": 20.0, # 20MB per OpenAI docs
},
# Shorthands
"mini": "o4-mini", # Default 'mini' to latest mini model
@@ -46,6 +62,7 @@ class OpenAIModelProvider(OpenAICompatibleProvider):
"o4mini": "o4-mini", "o4mini": "o4-mini",
"o4minihigh": "o4-mini-high", "o4minihigh": "o4-mini-high",
"o4minihi": "o4-mini-high", "o4minihi": "o4-mini-high",
"gpt4.1": "gpt-4.1-2025-04-14",
}
def __init__(self, api_key: str, **kwargs):
@@ -76,7 +93,7 @@ class OpenAIModelProvider(OpenAICompatibleProvider):
# O3 and O4 reasoning models only support temperature=1.0
temp_constraint = FixedTemperatureConstraint(1.0)
else:
-# Other OpenAI models support 0.0-2.0 range
+# Other OpenAI models (including GPT-4.1) support 0.0-2.0 range
temp_constraint = RangeTemperatureConstraint(0.0, 2.0, 0.7)
return ModelCapabilities(
@@ -88,6 +105,8 @@ class OpenAIModelProvider(OpenAICompatibleProvider):
supports_system_prompts=True,
supports_streaming=True,
supports_function_calling=True,
supports_images=config.get("supports_images", False),
max_image_size_mb=config.get("max_image_size_mb", 0.0),
temperature_constraint=temp_constraint,
)


@@ -1,5 +1,6 @@
"""Base class for OpenAI-compatible API providers.""" """Base class for OpenAI-compatible API providers."""
import base64
import ipaddress
import logging
import os
@@ -229,6 +230,7 @@ class OpenAICompatibleProvider(ModelProvider):
system_prompt: Optional[str] = None,
temperature: float = 0.7,
max_output_tokens: Optional[int] = None,
images: Optional[list[str]] = None,
**kwargs,
) -> ModelResponse:
"""Generate content using the OpenAI-compatible API.
@@ -255,7 +257,32 @@ class OpenAICompatibleProvider(ModelProvider):
messages = []
if system_prompt:
messages.append({"role": "system", "content": system_prompt})
-messages.append({"role": "user", "content": prompt})
# Prepare user message with text and potentially images
user_content = []
user_content.append({"type": "text", "text": prompt})
# Add images if provided and model supports vision
if images and self._supports_vision(model_name):
for image_path in images:
try:
image_content = self._process_image(image_path)
if image_content:
user_content.append(image_content)
except Exception as e:
logging.warning(f"Failed to process image {image_path}: {e}")
# Continue with other images and text
continue
elif images and not self._supports_vision(model_name):
logging.warning(f"Model {model_name} does not support images, ignoring {len(images)} image(s)")
# Add user message
if len(user_content) == 1:
# Only text content, use simple string format for compatibility
messages.append({"role": "user", "content": prompt})
else:
# Text + images, use content array format
messages.append({"role": "user", "content": user_content})
# Prepare completion parameters
completion_params = {
@@ -424,3 +451,66 @@ class OpenAICompatibleProvider(ModelProvider):
Default is False for OpenAI-compatible providers.
"""
return False
def _supports_vision(self, model_name: str) -> bool:
"""Check if the model supports vision (image processing).
Default implementation for OpenAI-compatible providers.
Subclasses should override with specific model support.
"""
# Common vision-capable models - only include models that actually support images
vision_models = {
"gpt-4o",
"gpt-4o-mini",
"gpt-4-turbo",
"gpt-4-vision-preview",
"gpt-4.1-2025-04-14", # GPT-4.1 supports vision
"o3",
"o3-mini",
"o3-pro",
"o4-mini",
"o4-mini-high",
# Note: Claude models would be handled by a separate provider
}
supports = model_name.lower() in vision_models
logging.debug(f"Model '{model_name}' vision support: {supports}")
return supports
def _process_image(self, image_path: str) -> Optional[dict]:
"""Process an image for OpenAI-compatible API."""
try:
if image_path.startswith("data:image/"):
# Handle data URL: data:image/png;base64,iVBORw0...
return {"type": "image_url", "image_url": {"url": image_path}}
else:
# Handle file path - translate for Docker environment
from utils.file_utils import translate_path_for_environment
translated_path = translate_path_for_environment(image_path)
logging.debug(f"Translated image path from '{image_path}' to '{translated_path}'")
if not os.path.exists(translated_path):
logging.warning(f"Image file not found: {translated_path} (original: {image_path})")
return None
# Use translated path for all subsequent operations
image_path = translated_path
# Detect MIME type from file extension using centralized mappings
from utils.file_types import get_image_mime_type
ext = os.path.splitext(image_path)[1].lower()
mime_type = get_image_mime_type(ext)
logging.debug(f"Processing image '{image_path}' with extension '{ext}' as MIME type '{mime_type}'")
# Read and encode the image
with open(image_path, "rb") as f:
image_data = base64.b64encode(f.read()).decode()
# Create data URL for OpenAI API
data_url = f"data:{mime_type};base64,{image_data}"
return {"type": "image_url", "image_url": {"url": data_url}}
except Exception as e:
logging.error(f"Error processing image {image_path}: {e}")
return None
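For comparison, the OpenAI-compatible path produces a standard Chat Completions message list when an image is attached, with the user turn switching from a plain string to a content array (values illustrative):

```python
messages = [
    {"role": "system", "content": "You are a debugging assistant."},
    {
        "role": "user",
        "content": [
            {"type": "text", "text": "What does this stack trace show?"},
            {"type": "image_url", "image_url": {"url": "data:image/png;base64,iVBORw0KGgo..."}},
        ],
    },
]
```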


@@ -23,6 +23,8 @@ class OpenRouterModelConfig:
supports_streaming: bool = True
supports_function_calling: bool = False
supports_json_mode: bool = False
supports_images: bool = False # Whether model can process images
max_image_size_mb: float = 0.0 # Maximum total size for all images in MB
is_custom: bool = False # True for models that should only be used with custom endpoints
description: str = ""
@@ -37,6 +39,8 @@ class OpenRouterModelConfig:
supports_system_prompts=self.supports_system_prompts,
supports_streaming=self.supports_streaming,
supports_function_calling=self.supports_function_calling,
supports_images=self.supports_images,
max_image_size_mb=self.max_image_size_mb,
temperature_constraint=RangeTemperatureConstraint(0.0, 2.0, 1.0),
)
@@ -66,7 +70,8 @@ class OpenRouterModelRegistry:
translated_path = translate_path_for_environment(env_path)
self.config_path = Path(translated_path)
else:
-# Default to conf/custom_models.json (already in container)
+# Default to conf/custom_models.json - use relative path from this file
+# This works both in development and container environments
self.config_path = Path(__file__).parent.parent / "conf" / "custom_models.json"
# Load configuration


@@ -24,6 +24,7 @@ from .test_redis_validation import RedisValidationTest
from .test_refactor_validation import RefactorValidationTest
from .test_testgen_validation import TestGenValidationTest
from .test_token_allocation_validation import TokenAllocationValidationTest
from .test_vision_capability import VisionCapabilityTest
from .test_xai_models import XAIModelsTest
# Test registry for dynamic loading
@@ -45,6 +46,7 @@ TEST_REGISTRY = {
"testgen_validation": TestGenValidationTest, "testgen_validation": TestGenValidationTest,
"refactor_validation": RefactorValidationTest, "refactor_validation": RefactorValidationTest,
"conversation_chain_validation": ConversationChainValidationTest, "conversation_chain_validation": ConversationChainValidationTest,
"vision_capability": VisionCapabilityTest,
"xai_models": XAIModelsTest, "xai_models": XAIModelsTest,
# "o3_pro_expensive": O3ProExpensiveTest, # COMMENTED OUT - too expensive to run by default # "o3_pro_expensive": O3ProExpensiveTest, # COMMENTED OUT - too expensive to run by default
} }
@@ -69,6 +71,7 @@ __all__ = [
"TestGenValidationTest", "TestGenValidationTest",
"RefactorValidationTest", "RefactorValidationTest",
"ConversationChainValidationTest", "ConversationChainValidationTest",
"VisionCapabilityTest",
"XAIModelsTest", "XAIModelsTest",
"TEST_REGISTRY", "TEST_REGISTRY",
] ]


@@ -0,0 +1,163 @@
#!/usr/bin/env python3
"""
Vision Capability Test
Tests vision capability with the chat tool using O3 model:
- Test file path image (PNG triangle)
- Test base64 data URL image
- Use chat tool with O3 model to analyze the images
- Verify the model correctly identifies shapes
"""
import base64
import os
from .base_test import BaseSimulatorTest
class VisionCapabilityTest(BaseSimulatorTest):
"""Test vision capability with chat tool and O3 model"""
@property
def test_name(self) -> str:
return "vision_capability"
@property
def test_description(self) -> str:
return "Vision capability test with chat tool and O3 model"
def get_triangle_png_path(self) -> str:
"""Get the path to the triangle.png file in tests directory"""
# Get the project root and find the triangle.png in tests/
current_dir = os.getcwd()
triangle_path = os.path.join(current_dir, "tests", "triangle.png")
if not os.path.exists(triangle_path):
raise FileNotFoundError(f"triangle.png not found at {triangle_path}")
abs_path = os.path.abspath(triangle_path)
self.logger.debug(f"Using triangle PNG at host path: {abs_path}")
return abs_path
def create_base64_triangle_data_url(self) -> str:
"""Create a base64 data URL from the triangle.png file"""
triangle_path = self.get_triangle_png_path()
with open(triangle_path, "rb") as f:
image_data = base64.b64encode(f.read()).decode()
data_url = f"data:image/png;base64,{image_data}"
self.logger.debug(f"Created base64 data URL with {len(image_data)} characters")
return data_url
def run_test(self) -> bool:
"""Test vision capability with O3 model"""
try:
self.logger.info("Test: Vision capability with O3 model")
# Test 1: File path image
self.logger.info(" 1.1: Testing file path image (PNG triangle)")
triangle_path = self.get_triangle_png_path()
self.logger.info(f" ✅ Using triangle PNG at: {triangle_path}")
response1, continuation_id = self.call_mcp_tool(
"chat",
{
"prompt": "What shape do you see in this image? Please be specific and only mention the shape name.",
"images": [triangle_path],
"model": "o3",
},
)
if not response1:
self.logger.error("Failed to get response from O3 model for file path test")
return False
# Check for error indicators first
response1_lower = response1.lower()
if any(
error_phrase in response1_lower
for error_phrase in [
"don't have access",
"cannot see",
"no image",
"clarification_required",
"image you're referring to",
"supply the image",
"error",
]
):
self.logger.error(f" ❌ O3 model cannot access file path image. Response: {response1[:300]}...")
return False
if "triangle" not in response1_lower:
self.logger.error(
f" ❌ O3 did not identify triangle in file path test. Response: {response1[:200]}..."
)
return False
self.logger.info(" ✅ O3 correctly identified file path image as triangle")
# Test 2: Base64 data URL image
self.logger.info(" 1.2: Testing base64 data URL image")
data_url = self.create_base64_triangle_data_url()
response2, _ = self.call_mcp_tool(
"chat",
{
"prompt": "What shape do you see in this image? Please be specific and only mention the shape name.",
"images": [data_url],
"model": "o3",
},
)
if not response2:
self.logger.error("Failed to get response from O3 model for base64 test")
return False
response2_lower = response2.lower()
if any(
error_phrase in response2_lower
for error_phrase in [
"don't have access",
"cannot see",
"no image",
"clarification_required",
"image you're referring to",
"supply the image",
"error",
]
):
self.logger.error(f" ❌ O3 model cannot access base64 image. Response: {response2[:300]}...")
return False
if "triangle" not in response2_lower:
self.logger.error(f" ❌ O3 did not identify triangle in base64 test. Response: {response2[:200]}...")
return False
self.logger.info(" ✅ O3 correctly identified base64 image as triangle")
# Optional: Test continuation with same image
if continuation_id:
self.logger.info(" 1.3: Testing continuation with same image")
response3, _ = self.call_mcp_tool(
"chat",
{
"prompt": "What color is this triangle?",
"images": [triangle_path], # Same image should be deduplicated
"continuation_id": continuation_id,
"model": "o3",
},
)
if response3:
self.logger.info(" ✅ Continuation also working correctly")
else:
self.logger.warning(" ⚠️ Continuation response not received")
self.logger.info(" ✅ Vision capability test completed successfully")
return True
except Exception as e:
self.logger.error(f"Vision capability test failed: {e}")
return False
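# Illustrative only: one way this test could be invoked directly through the simulator
# registry, assuming the package imports as `simulator_tests` and the test class can be
# constructed without arguments (both are assumptions, not confirmed by this diff):
#
#     from simulator_tests import TEST_REGISTRY
#
#     test = TEST_REGISTRY["vision_capability"]()  # hypothetical no-arg construction
#     print("passed" if test.run_test() else "failed")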

View File

@@ -1,126 +0,0 @@
"""
Test /app/ to ./ path translation for standalone mode.
Tests that internal application paths work in both Docker and standalone modes.
"""
import os
import tempfile
from unittest.mock import patch
from utils.file_utils import translate_path_for_environment
class TestAppPathTranslation:
"""Test translation of /app/ paths for different environments."""
def test_app_path_translation_in_standalone_mode(self):
"""Test that /app/ paths are translated to ./ in standalone mode."""
# Mock standalone environment (no Docker)
with patch("utils.file_utils.CONTAINER_WORKSPACE") as mock_container_workspace:
mock_container_workspace.exists.return_value = False
# Clear WORKSPACE_ROOT to simulate standalone mode
with patch.dict(os.environ, {}, clear=True):
# Test translation of internal app paths
test_cases = [
("/app/conf/custom_models.json", "./conf/custom_models.json"),
("/app/conf/other_config.json", "./conf/other_config.json"),
("/app/logs/app.log", "./logs/app.log"),
("/app/data/file.txt", "./data/file.txt"),
]
for input_path, expected_output in test_cases:
result = translate_path_for_environment(input_path)
assert result == expected_output, f"Expected {expected_output}, got {result}"
def test_allowed_app_path_unchanged_in_docker_mode(self):
"""Test that allowed /app/ paths remain unchanged in Docker mode."""
with tempfile.TemporaryDirectory() as tmpdir:
# Mock Docker environment
with patch("utils.file_utils.CONTAINER_WORKSPACE") as mock_container_workspace:
mock_container_workspace.exists.return_value = True
mock_container_workspace.__str__.return_value = "/workspace"
# Set WORKSPACE_ROOT to simulate Docker environment
with patch.dict(os.environ, {"WORKSPACE_ROOT": tmpdir}):
# Only specifically allowed internal app paths should remain unchanged in Docker
allowed_path = "/app/conf/custom_models.json"
result = translate_path_for_environment(allowed_path)
assert (
result == allowed_path
), f"Docker mode should preserve allowed path {allowed_path}, got {result}"
def test_non_allowed_app_paths_blocked_in_docker_mode(self):
"""Test that non-allowed /app/ paths are blocked in Docker mode."""
with tempfile.TemporaryDirectory() as tmpdir:
# Mock Docker environment
with patch("utils.file_utils.CONTAINER_WORKSPACE") as mock_container_workspace:
mock_container_workspace.exists.return_value = True
mock_container_workspace.__str__.return_value = "/workspace"
# Set WORKSPACE_ROOT to simulate Docker environment
with patch.dict(os.environ, {"WORKSPACE_ROOT": tmpdir}):
# Non-allowed internal app paths should be blocked in Docker for security
blocked_paths = [
"/app/conf/other_config.json",
"/app/logs/app.log",
"/app/server.py",
]
for blocked_path in blocked_paths:
result = translate_path_for_environment(blocked_path)
assert result.startswith(
"/inaccessible/"
), f"Docker mode should block non-allowed path {blocked_path}, got {result}"
def test_non_app_paths_unchanged_in_standalone(self):
"""Test that non-/app/ paths are unchanged in standalone mode."""
# Mock standalone environment
with patch("utils.file_utils.CONTAINER_WORKSPACE") as mock_container_workspace:
mock_container_workspace.exists.return_value = False
with patch.dict(os.environ, {}, clear=True):
# Non-app paths should be unchanged
test_cases = [
"/home/user/file.py",
"/etc/config.conf",
"./local/file.txt",
"relative/path.py",
"/workspace/file.py",
]
for input_path in test_cases:
result = translate_path_for_environment(input_path)
assert result == input_path, f"Non-app path {input_path} should be unchanged, got {result}"
def test_edge_cases_in_app_translation(self):
"""Test edge cases in /app/ path translation."""
# Mock standalone environment
with patch("utils.file_utils.CONTAINER_WORKSPACE") as mock_container_workspace:
mock_container_workspace.exists.return_value = False
with patch.dict(os.environ, {}, clear=True):
# Test edge cases
test_cases = [
("/app/", "./"), # Root app directory
("/app", "/app"), # Exact match without trailing slash - not translated
("/app/file", "./file"), # File directly in app
("/app//double/slash", "./double/slash"), # Handle double slashes
]
for input_path, expected_output in test_cases:
result = translate_path_for_environment(input_path)
assert (
result == expected_output
), f"Edge case {input_path}: expected {expected_output}, got {result}"

View File

@@ -0,0 +1,591 @@
"""
Integration tests for native image support feature.
Tests the complete image support pipeline:
- Conversation memory integration with images
- Tool request validation and schema support
- Provider image processing capabilities
- Cross-tool image context preservation
"""
import json
import os
import tempfile
import uuid
from unittest.mock import Mock, patch
import pytest
from tools.chat import ChatTool
from tools.debug import DebugIssueTool
from utils.conversation_memory import (
ConversationTurn,
ThreadContext,
add_turn,
create_thread,
get_conversation_image_list,
get_thread,
)
class TestImageSupportIntegration:
"""Integration tests for the complete image support feature."""
def test_conversation_turn_includes_images(self):
"""Test that ConversationTurn can store and track images."""
turn = ConversationTurn(
role="user",
content="Please analyze this diagram",
timestamp="2025-01-01T00:00:00Z",
files=["code.py"],
images=["diagram.png", "flowchart.jpg"],
tool_name="chat",
)
assert turn.images == ["diagram.png", "flowchart.jpg"]
assert turn.files == ["code.py"]
assert turn.content == "Please analyze this diagram"
def test_get_conversation_image_list_newest_first(self):
"""Test that image list prioritizes newest references."""
# Create thread context with multiple turns
context = ThreadContext(
thread_id=str(uuid.uuid4()),
created_at="2025-01-01T00:00:00Z",
last_updated_at="2025-01-01T00:00:00Z",
tool_name="chat",
turns=[
ConversationTurn(
role="user",
content="Turn 1",
timestamp="2025-01-01T00:00:00Z",
images=["old_diagram.png", "shared.png"],
),
ConversationTurn(
role="assistant", content="Turn 2", timestamp="2025-01-01T01:00:00Z", images=["middle.png"]
),
ConversationTurn(
role="user",
content="Turn 3",
timestamp="2025-01-01T02:00:00Z",
images=["shared.png", "new_diagram.png"], # shared.png appears again
),
],
initial_context={},
)
image_list = get_conversation_image_list(context)
# Should prioritize newest first, with duplicates removed (newest wins)
expected = ["shared.png", "new_diagram.png", "middle.png", "old_diagram.png"]
assert image_list == expected
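# A minimal sketch (not the actual implementation) of the newest-first de-duplication
# behaviour this test asserts for get_conversation_image_list:
#
#     def newest_first_images(turns):
#         seen, ordered = set(), []
#         for turn in reversed(turns):          # newest turn first
#             for image in turn.images or []:
#                 if image not in seen:         # most recent reference wins
#                     seen.add(image)
#                     ordered.append(image)
#         return ordered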
@patch("utils.conversation_memory.get_redis_client")
def test_add_turn_with_images(self, mock_redis):
"""Test adding a conversation turn with images."""
mock_client = Mock()
mock_redis.return_value = mock_client
# Mock the Redis operations to return success
mock_client.set.return_value = True
thread_id = create_thread("test_tool", {"initial": "context"})
# Set up initial thread context for add_turn to find
initial_context = ThreadContext(
thread_id=thread_id,
created_at="2025-01-01T00:00:00Z",
last_updated_at="2025-01-01T00:00:00Z",
tool_name="test_tool",
turns=[], # Empty initially
initial_context={"initial": "context"},
)
mock_client.get.return_value = initial_context.model_dump_json()
success = add_turn(
thread_id=thread_id,
role="user",
content="Analyze these screenshots",
files=["app.py"],
images=["screenshot1.png", "screenshot2.png"],
tool_name="debug",
)
assert success
# Mock thread context for get_thread call
updated_context = ThreadContext(
thread_id=thread_id,
created_at="2025-01-01T00:00:00Z",
last_updated_at="2025-01-01T00:00:00Z",
tool_name="test_tool",
turns=[
ConversationTurn(
role="user",
content="Analyze these screenshots",
timestamp="2025-01-01T00:00:00Z",
files=["app.py"],
images=["screenshot1.png", "screenshot2.png"],
tool_name="debug",
)
],
initial_context={"initial": "context"},
)
mock_client.get.return_value = updated_context.model_dump_json()
# Retrieve and verify the thread
context = get_thread(thread_id)
assert context is not None
assert len(context.turns) == 1
turn = context.turns[0]
assert turn.images == ["screenshot1.png", "screenshot2.png"]
assert turn.files == ["app.py"]
assert turn.content == "Analyze these screenshots"
def test_chat_tool_schema_includes_images(self):
"""Test that ChatTool schema includes images field."""
tool = ChatTool()
schema = tool.get_input_schema()
assert "images" in schema["properties"]
images_field = schema["properties"]["images"]
assert images_field["type"] == "array"
assert images_field["items"]["type"] == "string"
assert "visual context" in images_field["description"].lower()
def test_debug_tool_schema_includes_images(self):
"""Test that DebugIssueTool schema includes images field."""
tool = DebugIssueTool()
schema = tool.get_input_schema()
assert "images" in schema["properties"]
images_field = schema["properties"]["images"]
assert images_field["type"] == "array"
assert images_field["items"]["type"] == "string"
assert "error screens" in images_field["description"].lower()
def test_tool_image_validation_limits(self):
"""Test that tools validate image size limits using real provider resolution."""
tool = ChatTool()
# Create small test images (each 0.5MB, total 1MB)
small_images = []
for _ in range(2):
with tempfile.NamedTemporaryFile(suffix=".png", delete=False) as temp_file:
# Write 0.5MB of data
temp_file.write(b"\x00" * (512 * 1024))
small_images.append(temp_file.name)
try:
# Test with a model that should fail (no provider available in test environment)
result = tool._validate_image_limits(small_images, "mistral-large")
# Should return error because model not available
assert result is not None
assert result["status"] == "error"
assert "does not support image processing" in result["content"]
# Test that empty/None images always pass regardless of model
result = tool._validate_image_limits([], "any-model")
assert result is None
result = tool._validate_image_limits(None, "any-model")
assert result is None
finally:
# Clean up temp files
for img_path in small_images:
if os.path.exists(img_path):
os.unlink(img_path)
def test_image_validation_model_specific_limits(self):
"""Test that different models have appropriate size limits using real provider resolution."""
import importlib
tool = ChatTool()
# Test OpenAI O3 model (20MB limit) - Create 15MB image (should pass)
small_image_path = None
large_image_path = None
# Save original environment
original_env = {
"OPENAI_API_KEY": os.environ.get("OPENAI_API_KEY"),
"DEFAULT_MODEL": os.environ.get("DEFAULT_MODEL"),
}
try:
# Create 15MB image (under 20MB O3 limit)
with tempfile.NamedTemporaryFile(suffix=".png", delete=False) as temp_file:
temp_file.write(b"\x00" * (15 * 1024 * 1024)) # 15MB
small_image_path = temp_file.name
# Set up environment for OpenAI provider
os.environ["OPENAI_API_KEY"] = "test-key-o3-validation-test-not-real"
os.environ["DEFAULT_MODEL"] = "o3"
# Clear other provider keys to isolate to OpenAI
for key in ["GEMINI_API_KEY", "XAI_API_KEY", "OPENROUTER_API_KEY"]:
os.environ.pop(key, None)
# Reload config and clear registry
import config
importlib.reload(config)
from providers.registry import ModelProviderRegistry
ModelProviderRegistry._instance = None
result = tool._validate_image_limits([small_image_path], "o3")
assert result is None # Should pass (15MB < 20MB limit)
# Create 25MB image (over 20MB O3 limit)
with tempfile.NamedTemporaryFile(suffix=".png", delete=False) as temp_file:
temp_file.write(b"\x00" * (25 * 1024 * 1024)) # 25MB
large_image_path = temp_file.name
result = tool._validate_image_limits([large_image_path], "o3")
assert result is not None # Should fail (25MB > 20MB limit)
assert result["status"] == "error"
assert "Image size limit exceeded" in result["content"]
assert "20.0MB" in result["content"] # O3 limit
assert "25.0MB" in result["content"] # Provided size
finally:
# Clean up temp files
if small_image_path and os.path.exists(small_image_path):
os.unlink(small_image_path)
if large_image_path and os.path.exists(large_image_path):
os.unlink(large_image_path)
# Restore environment
for key, value in original_env.items():
if value is not None:
os.environ[key] = value
else:
os.environ.pop(key, None)
# Reload config and clear registry
importlib.reload(config)
ModelProviderRegistry._instance = None
@pytest.mark.asyncio
async def test_chat_tool_execution_with_images(self):
"""Test that ChatTool can execute with images parameter using real provider resolution."""
import importlib
# Create a temporary image file for testing
with tempfile.NamedTemporaryFile(suffix=".png", delete=False) as temp_file:
# Write a simple PNG header (minimal valid PNG)
png_header = b"\x89PNG\r\n\x1a\n\x00\x00\x00\rIHDR\x00\x00\x00\x01\x00\x00\x00\x01\x08\x06\x00\x00\x00\x1f\x15\xc4\x89\x00\x00\x00\rIDATx\x9cc\x00\x01\x00\x00\x05\x00\x01\r\n-\xdb\x00\x00\x00\x00IEND\xaeB`\x82"
temp_file.write(png_header)
temp_image_path = temp_file.name
# Save original environment
original_env = {
"OPENAI_API_KEY": os.environ.get("OPENAI_API_KEY"),
"DEFAULT_MODEL": os.environ.get("DEFAULT_MODEL"),
}
try:
# Set up environment for real provider resolution
os.environ["OPENAI_API_KEY"] = "sk-test-key-images-test-not-real"
os.environ["DEFAULT_MODEL"] = "gpt-4o"
# Clear other provider keys to isolate to OpenAI
for key in ["GEMINI_API_KEY", "XAI_API_KEY", "OPENROUTER_API_KEY"]:
os.environ.pop(key, None)
# Reload config and clear registry
import config
importlib.reload(config)
from providers.registry import ModelProviderRegistry
ModelProviderRegistry._instance = None
tool = ChatTool()
# Test with real provider resolution
try:
result = await tool.execute(
{"prompt": "What do you see in this image?", "images": [temp_image_path], "model": "gpt-4o"}
)
# If we get here, check the response format
assert len(result) == 1
# Should be a valid JSON response
output = json.loads(result[0].text)
assert "status" in output
# Test passed - provider accepted images parameter
except Exception as e:
# Expected: API call will fail with fake key
error_msg = str(e)
# Should NOT be a mock-related error
assert "MagicMock" not in error_msg
assert "'<' not supported between instances" not in error_msg
# Should be a real provider error (API key or network)
assert any(
phrase in error_msg
for phrase in ["API", "key", "authentication", "provider", "network", "connection", "401", "403"]
)
# Test passed - provider processed images parameter before failing on auth
finally:
# Clean up temp file
os.unlink(temp_image_path)
# Restore environment
for key, value in original_env.items():
if value is not None:
os.environ[key] = value
else:
os.environ.pop(key, None)
# Reload config and clear registry
importlib.reload(config)
ModelProviderRegistry._instance = None
@patch("utils.conversation_memory.get_redis_client")
def test_cross_tool_image_context_preservation(self, mock_redis):
"""Test that images are preserved across different tools in conversation."""
mock_client = Mock()
mock_redis.return_value = mock_client
# Mock the Redis operations to return success
mock_client.set.return_value = True
# Create initial thread with chat tool
thread_id = create_thread("chat", {"initial": "context"})
# Set up initial thread context for add_turn to find
initial_context = ThreadContext(
thread_id=thread_id,
created_at="2025-01-01T00:00:00Z",
last_updated_at="2025-01-01T00:00:00Z",
tool_name="chat",
turns=[], # Empty initially
initial_context={"initial": "context"},
)
mock_client.get.return_value = initial_context.model_dump_json()
# Add turn with images from chat tool
add_turn(
thread_id=thread_id,
role="user",
content="Here's my UI design",
images=["design.png", "mockup.jpg"],
tool_name="chat",
)
add_turn(
thread_id=thread_id, role="assistant", content="I can see your design. It looks good!", tool_name="chat"
)
# Add turn with different images from debug tool
add_turn(
thread_id=thread_id,
role="user",
content="Now I'm getting this error",
images=["error_screen.png"],
files=["error.log"],
tool_name="debug",
)
# Mock complete thread context for get_thread call
complete_context = ThreadContext(
thread_id=thread_id,
created_at="2025-01-01T00:00:00Z",
last_updated_at="2025-01-01T00:05:00Z",
tool_name="chat",
turns=[
ConversationTurn(
role="user",
content="Here's my UI design",
timestamp="2025-01-01T00:01:00Z",
images=["design.png", "mockup.jpg"],
tool_name="chat",
),
ConversationTurn(
role="assistant",
content="I can see your design. It looks good!",
timestamp="2025-01-01T00:02:00Z",
tool_name="chat",
),
ConversationTurn(
role="user",
content="Now I'm getting this error",
timestamp="2025-01-01T00:03:00Z",
images=["error_screen.png"],
files=["error.log"],
tool_name="debug",
),
],
initial_context={"initial": "context"},
)
mock_client.get.return_value = complete_context.model_dump_json()
# Retrieve thread and check image preservation
context = get_thread(thread_id)
assert context is not None
# Get conversation image list (should prioritize newest first)
image_list = get_conversation_image_list(context)
expected = ["error_screen.png", "design.png", "mockup.jpg"]
assert image_list == expected
# Verify each turn has correct images
assert context.turns[0].images == ["design.png", "mockup.jpg"]
assert context.turns[1].images is None # Assistant turn without images
assert context.turns[2].images == ["error_screen.png"]
def test_tool_request_base_class_has_images(self):
"""Test that base ToolRequest class includes images field."""
from tools.base import ToolRequest
# Create request with images
request = ToolRequest(images=["test.png", "test2.jpg"])
assert request.images == ["test.png", "test2.jpg"]
# Test default value
request_no_images = ToolRequest()
assert request_no_images.images is None
def test_data_url_image_format_support(self):
"""Test that tools can handle data URL format images."""
import importlib
tool = ChatTool()
# Test with data URL (base64 encoded 1x1 transparent PNG)
data_url = "data:image/png;base64,iVBORw0KGgoAAAANSUhEUgAAAAEAAAABCAYAAAAfFcSJAAAADUlEQVR42mNk+M9QDwADhgGAWjR9awAAAABJRU5ErkJggg=="
images = [data_url]
# Save original environment
original_env = {
"OPENAI_API_KEY": os.environ.get("OPENAI_API_KEY"),
"DEFAULT_MODEL": os.environ.get("DEFAULT_MODEL"),
}
try:
# Set up environment for OpenAI provider
os.environ["OPENAI_API_KEY"] = "test-key-data-url-test-not-real"
os.environ["DEFAULT_MODEL"] = "o3"
# Clear other provider keys to isolate to OpenAI
for key in ["GEMINI_API_KEY", "XAI_API_KEY", "OPENROUTER_API_KEY"]:
os.environ.pop(key, None)
# Reload config and clear registry
import config
importlib.reload(config)
from providers.registry import ModelProviderRegistry
ModelProviderRegistry._instance = None
# Use a model that should be available - o3 from OpenAI
result = tool._validate_image_limits(images, "o3")
assert result is None # Small data URL should pass validation
# Also test with a non-vision model to ensure validation works
result = tool._validate_image_limits(images, "mistral-large")
# This should fail because model not available with current setup
assert result is not None
assert result["status"] == "error"
assert "does not support image processing" in result["content"]
finally:
# Restore environment
for key, value in original_env.items():
if value is not None:
os.environ[key] = value
else:
os.environ.pop(key, None)
# Reload config and clear registry
importlib.reload(config)
ModelProviderRegistry._instance = None
def test_empty_images_handling(self):
"""Test that tools handle empty images lists gracefully."""
tool = ChatTool()
# Empty list should not fail validation (no need for provider setup)
result = tool._validate_image_limits([], "test_model")
assert result is None
# None should not fail validation (no need for provider setup)
result = tool._validate_image_limits(None, "test_model")
assert result is None
@patch("utils.conversation_memory.get_redis_client")
def test_conversation_memory_thread_chaining_with_images(self, mock_redis):
"""Test that images work correctly with conversation thread chaining."""
mock_client = Mock()
mock_redis.return_value = mock_client
# Mock the Redis operations to return success
mock_client.set.return_value = True
# Create parent thread with images
parent_thread_id = create_thread("chat", {"parent": "context"})
# Set up initial parent thread context for add_turn to find
parent_context = ThreadContext(
thread_id=parent_thread_id,
created_at="2025-01-01T00:00:00Z",
last_updated_at="2025-01-01T00:00:00Z",
tool_name="chat",
turns=[], # Empty initially
initial_context={"parent": "context"},
)
mock_client.get.return_value = parent_context.model_dump_json()
add_turn(
thread_id=parent_thread_id,
role="user",
content="Parent thread with images",
images=["parent1.png", "shared.png"],
tool_name="chat",
)
# Create child thread linked to parent
child_thread_id = create_thread("debug", {"child": "context"}, parent_thread_id=parent_thread_id)
add_turn(
thread_id=child_thread_id,
role="user",
content="Child thread with more images",
images=["child1.png", "shared.png"], # shared.png appears again (should prioritize newer)
tool_name="debug",
)
# Mock child thread context for get_thread call
child_context = ThreadContext(
thread_id=child_thread_id,
created_at="2025-01-01T00:00:00Z",
last_updated_at="2025-01-01T00:02:00Z",
tool_name="debug",
turns=[
ConversationTurn(
role="user",
content="Child thread with more images",
timestamp="2025-01-01T00:02:00Z",
images=["child1.png", "shared.png"],
tool_name="debug",
)
],
initial_context={"child": "context"},
parent_thread_id=parent_thread_id,
)
mock_client.get.return_value = child_context.model_dump_json()
# Get child thread and verify image collection works across chain
child_context = get_thread(child_thread_id)
assert child_context is not None
assert child_context.parent_thread_id == parent_thread_id
# Test image collection for child thread only
child_images = get_conversation_image_list(child_context)
assert child_images == ["child1.png", "shared.png"]

View File

@@ -1,290 +0,0 @@
"""
Integration tests for internal application configuration file access.
These tests verify that:
1. Specific internal config files are accessible (exact path matching)
2. Path variations and traversal attempts are blocked (security)
3. The OpenRouter model configuration loads properly
4. Normal workspace file operations continue to work
This follows the established testing patterns from test_docker_path_integration.py
by using actual file operations and module reloading instead of mocks.
"""
import importlib
import os
import tempfile
from pathlib import Path
from unittest.mock import patch
import pytest
from utils.file_utils import translate_path_for_environment
class TestInternalConfigFileAccess:
"""Test access to internal application configuration files."""
def test_allowed_internal_config_file_access(self):
"""Test that the specific internal config file is accessible."""
with tempfile.TemporaryDirectory() as tmpdir:
# Set up Docker-like environment
host_workspace = Path(tmpdir) / "host_workspace"
host_workspace.mkdir()
container_workspace = Path(tmpdir) / "container_workspace"
container_workspace.mkdir()
original_env = os.environ.copy()
try:
os.environ["WORKSPACE_ROOT"] = str(host_workspace)
# Reload modules to pick up environment
import utils.security_config
importlib.reload(utils.security_config)
importlib.reload(utils.file_utils)
# Test with Docker environment simulation
with patch("utils.file_utils.CONTAINER_WORKSPACE", container_workspace):
# The exact allowed path should pass through unchanged
result = translate_path_for_environment("/app/conf/custom_models.json")
assert result == "/app/conf/custom_models.json"
finally:
# Restore environment
os.environ.clear()
os.environ.update(original_env)
import utils.security_config
importlib.reload(utils.security_config)
importlib.reload(utils.file_utils)
def test_blocked_config_file_variations(self):
"""Test that variations of the config file path are blocked."""
with tempfile.TemporaryDirectory() as tmpdir:
host_workspace = Path(tmpdir) / "host_workspace"
host_workspace.mkdir()
container_workspace = Path(tmpdir) / "container_workspace"
container_workspace.mkdir()
original_env = os.environ.copy()
try:
os.environ["WORKSPACE_ROOT"] = str(host_workspace)
import utils.security_config
importlib.reload(utils.security_config)
importlib.reload(utils.file_utils)
with patch("utils.file_utils.CONTAINER_WORKSPACE", container_workspace):
# Test blocked variations - these should return inaccessible paths
blocked_paths = [
"/app/conf/", # Directory
"/app/conf/other_file.json", # Different file
"/app/conf/custom_models.json.backup", # Extra extension
"/app/conf/custom_models.txt", # Different extension
"/app/conf/../server.py", # Path traversal
"/app/server.py", # Application code
"/etc/passwd", # System file
]
for path in blocked_paths:
result = translate_path_for_environment(path)
assert result.startswith("/inaccessible/"), f"Path {path} should be blocked but got: {result}"
finally:
os.environ.clear()
os.environ.update(original_env)
import utils.security_config
importlib.reload(utils.security_config)
importlib.reload(utils.file_utils)
def test_workspace_files_continue_to_work(self):
"""Test that normal workspace file operations are unaffected."""
with tempfile.TemporaryDirectory() as tmpdir:
host_workspace = Path(tmpdir) / "host_workspace"
host_workspace.mkdir()
container_workspace = Path(tmpdir) / "container_workspace"
container_workspace.mkdir()
# Create a test file in the workspace
test_file = host_workspace / "src" / "test.py"
test_file.parent.mkdir(parents=True)
test_file.write_text("# test file")
original_env = os.environ.copy()
try:
os.environ["WORKSPACE_ROOT"] = str(host_workspace)
import utils.security_config
importlib.reload(utils.security_config)
importlib.reload(utils.file_utils)
with patch("utils.file_utils.CONTAINER_WORKSPACE", container_workspace):
# Normal workspace file should translate correctly
result = translate_path_for_environment(str(test_file))
expected = str(container_workspace / "src" / "test.py")
assert result == expected
finally:
os.environ.clear()
os.environ.update(original_env)
import utils.security_config
importlib.reload(utils.security_config)
importlib.reload(utils.file_utils)
def test_openrouter_config_loading_real_world(self):
"""Test that OpenRouter configuration loading works in real container environment."""
# This test validates that our fix works in the actual Docker environment
# by checking that the translate_path_for_environment function handles
# the exact internal config path correctly
with tempfile.TemporaryDirectory() as tmpdir:
host_workspace = Path(tmpdir) / "host_workspace"
host_workspace.mkdir()
container_workspace = Path(tmpdir) / "container_workspace"
container_workspace.mkdir()
original_env = os.environ.copy()
try:
os.environ["WORKSPACE_ROOT"] = str(host_workspace)
import utils.security_config
importlib.reload(utils.security_config)
importlib.reload(utils.file_utils)
with patch("utils.file_utils.CONTAINER_WORKSPACE", container_workspace):
# Test that the function correctly handles the config path
result = translate_path_for_environment("/app/conf/custom_models.json")
# The path should pass through unchanged (not be blocked)
assert result == "/app/conf/custom_models.json"
# Verify it's not marked as inaccessible
assert not result.startswith("/inaccessible/")
finally:
os.environ.clear()
os.environ.update(original_env)
import utils.security_config
importlib.reload(utils.security_config)
importlib.reload(utils.file_utils)
def test_security_boundary_comprehensive(self):
"""Comprehensive test of all security boundaries in Docker environment."""
with tempfile.TemporaryDirectory() as tmpdir:
host_workspace = Path(tmpdir) / "host_workspace"
host_workspace.mkdir()
container_workspace = Path(tmpdir) / "container_workspace"
container_workspace.mkdir()
# Create a workspace file for testing
workspace_file = host_workspace / "project" / "main.py"
workspace_file.parent.mkdir(parents=True)
workspace_file.write_text("# workspace file")
original_env = os.environ.copy()
try:
os.environ["WORKSPACE_ROOT"] = str(host_workspace)
import utils.security_config
importlib.reload(utils.security_config)
importlib.reload(utils.file_utils)
with patch("utils.file_utils.CONTAINER_WORKSPACE", container_workspace):
# Test cases: (path, should_be_allowed, description)
test_cases = [
# Allowed cases
("/app/conf/custom_models.json", True, "Exact allowed internal config"),
(str(workspace_file), True, "Workspace file"),
(str(container_workspace / "existing.py"), True, "Container path"),
# Blocked cases
("/app/conf/", False, "Directory access"),
("/app/conf/other.json", False, "Different config file"),
("/app/conf/custom_models.json.backup", False, "Config with extra extension"),
("/app/server.py", False, "Application source"),
("/etc/passwd", False, "System file"),
("../../../etc/passwd", False, "Relative path traversal"),
("/app/conf/../server.py", False, "Path traversal through config dir"),
]
for path, should_be_allowed, description in test_cases:
result = translate_path_for_environment(path)
if should_be_allowed:
# Should either pass through unchanged or translate to container path
assert not result.startswith(
"/inaccessible/"
), f"{description}: {path} should be allowed but was blocked"
else:
# Should be blocked with inaccessible path
assert result.startswith(
"/inaccessible/"
), f"{description}: {path} should be blocked but got: {result}"
finally:
os.environ.clear()
os.environ.update(original_env)
import utils.security_config
importlib.reload(utils.security_config)
importlib.reload(utils.file_utils)
def test_exact_path_matching_prevents_wildcards(self):
"""Test that using exact path matching prevents any wildcard-like behavior."""
with tempfile.TemporaryDirectory() as tmpdir:
host_workspace = Path(tmpdir) / "host_workspace"
host_workspace.mkdir()
container_workspace = Path(tmpdir) / "container_workspace"
container_workspace.mkdir()
original_env = os.environ.copy()
try:
os.environ["WORKSPACE_ROOT"] = str(host_workspace)
import utils.security_config
importlib.reload(utils.security_config)
importlib.reload(utils.file_utils)
with patch("utils.file_utils.CONTAINER_WORKSPACE", container_workspace):
# Even subtle variations should be blocked
subtle_variations = [
"/app/conf/custom_models.jsonx", # Extra char
"/app/conf/custom_models.jso", # Missing char
"/app/conf/custom_models.JSON", # Different case
"/app/conf/custom_models.json ", # Trailing space
" /app/conf/custom_models.json", # Leading space
"/app/conf/./custom_models.json", # Current dir reference
"/app/conf/subdir/../custom_models.json", # Up and down
]
for variation in subtle_variations:
result = translate_path_for_environment(variation)
assert result.startswith(
"/inaccessible/"
), f"Variation {variation} should be blocked but got: {result}"
finally:
os.environ.clear()
os.environ.update(original_env)
import utils.security_config
importlib.reload(utils.security_config)
importlib.reload(utils.file_utils)
if __name__ == "__main__":
pytest.main([__file__, "-v"])

BIN
tests/triangle.png Normal file

Binary file not shown.


View File

@@ -87,7 +87,13 @@ class AnalyzeTool(BaseTool):
}, },
"use_websearch": { "use_websearch": {
"type": "boolean", "type": "boolean",
"description": "Enable web search for documentation, best practices, and current information. Particularly useful for: brainstorming sessions, architectural design discussions, exploring industry best practices, working with specific frameworks/technologies, researching solutions to complex problems, or when current documentation and community insights would enhance the analysis.", "description": (
"Enable web search for documentation, best practices, and current information. "
"Particularly useful for: brainstorming sessions, architectural design discussions, "
"exploring industry best practices, working with specific frameworks/technologies, "
"researching solutions to complex problems, or when current documentation and "
"community insights would enhance the analysis."
),
"default": True, "default": True,
}, },
"continuation_id": { "continuation_id": {

View File

@@ -27,6 +27,7 @@ if TYPE_CHECKING:
from config import MCP_PROMPT_SIZE_LIMIT from config import MCP_PROMPT_SIZE_LIMIT
from providers import ModelProvider, ModelProviderRegistry from providers import ModelProvider, ModelProviderRegistry
from providers.base import ProviderType
from utils import check_token_limit from utils import check_token_limit
from utils.conversation_memory import ( from utils.conversation_memory import (
MAX_CONVERSATION_TURNS, MAX_CONVERSATION_TURNS,
@@ -84,6 +85,17 @@ class ToolRequest(BaseModel):
"additional findings, or answers to follow-up questions. Can be used across different tools." "additional findings, or answers to follow-up questions. Can be used across different tools."
), ),
) )
images: Optional[list[str]] = Field(
None,
description=(
"Optional image(s) for visual context. Accepts absolute file paths or "
"base64 data URLs. Only provide when user explicitly mentions images. "
"When including images, please describe what you believe each image contains "
"(e.g., 'screenshot of error dialog', 'architecture diagram', 'code snippet') "
"to aid with contextual understanding. Useful for UI discussions, diagrams, "
"visual problems, error screens, architecture mockups, and visual analysis tasks."
),
)
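# Illustrative example of the two accepted image forms (the path and data below are
# placeholders, not real assets):
#
#     request = ToolRequest(
#         images=[
#             "/absolute/path/to/screenshot.png",      # absolute file path
#             "data:image/png;base64,iVBORw0KGgo...",  # base64 data URL
#         ],
#     )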
class BaseTool(ABC): class BaseTool(ABC):
@@ -981,6 +993,141 @@ When recommending searches, be specific about what information you need and why
} }
return None return None
def _validate_image_limits(
self, images: Optional[list[str]], model_name: str, continuation_id: Optional[str] = None
) -> Optional[dict]:
"""
Validate image size against model capabilities at MCP boundary.
This performs strict validation to ensure we don't exceed model-specific
image size limits. Uses capability-based validation with actual model
configuration rather than hard-coded limits.
Args:
images: List of image paths/data URLs to validate
model_name: Name of the model to check limits against
continuation_id: Optional conversation thread identifier (accepted for parity with the call site; not used by the size check itself)
Returns:
Optional[dict]: Error response if validation fails, None if valid
"""
if not images:
return None
# Get model capabilities to check image support and size limits
try:
provider = self.get_model_provider(model_name)
capabilities = provider.get_capabilities(model_name)
except Exception as e:
logger.warning(f"Failed to get capabilities for model {model_name}: {e}")
# Fall back to checking custom models configuration
capabilities = None
# Check if model supports images at all
supports_images = False
max_size_mb = 0.0
if capabilities:
supports_images = capabilities.supports_images
max_size_mb = capabilities.max_image_size_mb
else:
# Fall back to custom models configuration
try:
import json
from pathlib import Path
custom_models_path = Path(__file__).parent.parent / "conf" / "custom_models.json"
if custom_models_path.exists():
with open(custom_models_path) as f:
custom_config = json.load(f)
# Check if model is in custom models list
for model_config in custom_config.get("models", []):
if model_config.get("model_name") == model_name or model_name in model_config.get(
"aliases", []
):
supports_images = model_config.get("supports_images", False)
max_size_mb = model_config.get("max_image_size_mb", 0.0)
break
except Exception as e:
logger.warning(f"Failed to load custom models config: {e}")
# If model doesn't support images, reject
if not supports_images:
return {
"status": "error",
"content": (
f"Image support not available: Model '{model_name}' does not support image processing. "
f"Please use a vision-capable model such as 'gemini-2.5-flash-preview-05-20', 'o3', "
f"or 'claude-3-opus' for image analysis tasks."
),
"content_type": "text",
"metadata": {
"error_type": "validation_error",
"model_name": model_name,
"supports_images": False,
"image_count": len(images),
},
}
# Calculate total size of all images
total_size_mb = 0.0
for image_path in images:
try:
if image_path.startswith("data:image/"):
# Handle data URL: data:image/png;base64,iVBORw0...
_, data = image_path.split(",", 1)
# Base64 encoding increases size by ~33%, so decode to get actual size
import base64
actual_size = len(base64.b64decode(data))
total_size_mb += actual_size / (1024 * 1024)
else:
# Handle file path
if os.path.exists(image_path):
file_size = os.path.getsize(image_path)
total_size_mb += file_size / (1024 * 1024)
else:
logger.warning(f"Image file not found: {image_path}")
# Assume a reasonable size for missing files to avoid breaking validation
total_size_mb += 1.0 # 1MB assumption
except Exception as e:
logger.warning(f"Failed to get size for image {image_path}: {e}")
# Assume a reasonable size for problematic files
total_size_mb += 1.0 # 1MB assumption
# Apply 40MB cap for custom models as requested
effective_limit_mb = max_size_mb
if hasattr(capabilities, "provider") and capabilities.provider == ProviderType.CUSTOM:
effective_limit_mb = min(max_size_mb, 40.0)
elif not capabilities: # Fallback case for custom models
effective_limit_mb = min(max_size_mb, 40.0)
# Validate against size limit
if total_size_mb > effective_limit_mb:
return {
"status": "error",
"content": (
f"Image size limit exceeded: Model '{model_name}' supports maximum {effective_limit_mb:.1f}MB "
f"for all images combined, but {total_size_mb:.1f}MB was provided. "
f"Please reduce image sizes or count and try again."
),
"content_type": "text",
"metadata": {
"error_type": "validation_error",
"model_name": model_name,
"total_size_mb": round(total_size_mb, 2),
"limit_mb": round(effective_limit_mb, 2),
"image_count": len(images),
"supports_images": supports_images,
},
}
# All validations passed
logger.debug(f"Image validation passed: {len(images)} images")
return None
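# Usage sketch: how the execute path below invokes this boundary check before calling
# the provider (the image path and model name here are placeholders):
#
#     error = self._validate_image_limits(["/tmp/diagram.png"], "o3", continuation_id)
#     if error:
#         return [TextContent(type="text", text=json.dumps(error))]
#
# The custom-models fallback above assumes conf/custom_models.json entries expose vision
# metadata with these field names (values are illustrative only):
#
#     {"model_name": "llama-3.2-vision", "aliases": ["local-llama"],
#      "supports_images": true, "max_image_size_mb": 20.0}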
def estimate_tokens_smart(self, file_path: str) -> int: def estimate_tokens_smart(self, file_path: str) -> int:
""" """
Estimate tokens for a file using file-type aware ratios. Estimate tokens for a file using file-type aware ratios.
@@ -1131,6 +1278,9 @@ When recommending searches, be specific about what information you need and why
) )
return [TextContent(type="text", text=error_output.model_dump_json())] return [TextContent(type="text", text=error_output.model_dump_json())]
# Extract and validate images from request
images = getattr(request, "images", None) or []
# Check if we have continuation_id - if so, conversation history is already embedded # Check if we have continuation_id - if so, conversation history is already embedded
continuation_id = getattr(request, "continuation_id", None) continuation_id = getattr(request, "continuation_id", None)
@@ -1215,6 +1365,12 @@ When recommending searches, be specific about what information you need and why
# Only set this after auto mode validation to prevent "auto" being used as a model name # Only set this after auto mode validation to prevent "auto" being used as a model name
self._current_model_name = model_name self._current_model_name = model_name
# Validate images at MCP boundary if any were provided
if images:
image_validation_error = self._validate_image_limits(images, model_name, continuation_id)
if image_validation_error:
return [TextContent(type="text", text=json.dumps(image_validation_error))]
temperature = getattr(request, "temperature", None) temperature = getattr(request, "temperature", None)
if temperature is None: if temperature is None:
temperature = self.get_default_temperature() temperature = self.get_default_temperature()
@@ -1247,6 +1403,7 @@ When recommending searches, be specific about what information you need and why
system_prompt=system_prompt, system_prompt=system_prompt,
temperature=temperature, temperature=temperature,
thinking_mode=thinking_mode if provider.supports_thinking_mode(model_name) else None, thinking_mode=thinking_mode if provider.supports_thinking_mode(model_name) else None,
images=images if images else None, # Pass images via kwargs
) )
logger.info(f"Received response from {provider.get_provider_type().value} API for {self.name}") logger.info(f"Received response from {provider.get_provider_type().value} API for {self.name}")
@@ -1298,6 +1455,7 @@ When recommending searches, be specific about what information you need and why
system_prompt=system_prompt, system_prompt=system_prompt,
temperature=temperature, temperature=temperature,
thinking_mode=thinking_mode if provider.supports_thinking_mode(model_name) else None, thinking_mode=thinking_mode if provider.supports_thinking_mode(model_name) else None,
images=images if images else None, # Pass images via kwargs in retry too
) )
if retry_response.content: if retry_response.content:
@@ -1398,6 +1556,7 @@ When recommending searches, be specific about what information you need and why
continuation_id = getattr(request, "continuation_id", None) continuation_id = getattr(request, "continuation_id", None)
if continuation_id: if continuation_id:
request_files = getattr(request, "files", []) or [] request_files = getattr(request, "files", []) or []
request_images = getattr(request, "images", []) or []
# Extract model metadata for conversation tracking # Extract model metadata for conversation tracking
model_provider = None model_provider = None
model_name = None model_name = None
@@ -1417,6 +1576,7 @@ When recommending searches, be specific about what information you need and why
"assistant", "assistant",
formatted_content, formatted_content,
files=request_files, files=request_files,
images=request_images,
tool_name=self.name, tool_name=self.name,
model_provider=model_provider, model_provider=model_provider,
model_name=model_name, model_name=model_name,
@@ -1519,6 +1679,7 @@ When recommending searches, be specific about what information you need and why
# Use actually processed files from file preparation instead of original request files # Use actually processed files from file preparation instead of original request files
# This ensures directories are tracked as their individual expanded files # This ensures directories are tracked as their individual expanded files
request_files = getattr(self, "_actually_processed_files", []) or getattr(request, "files", []) or [] request_files = getattr(self, "_actually_processed_files", []) or getattr(request, "files", []) or []
request_images = getattr(request, "images", []) or []
# Extract model metadata # Extract model metadata
model_provider = None model_provider = None
model_name = None model_name = None
@@ -1538,6 +1699,7 @@ When recommending searches, be specific about what information you need and why
"assistant", "assistant",
content, content,
files=request_files, files=request_files,
images=request_images,
tool_name=self.name, tool_name=self.name,
model_provider=model_provider, model_provider=model_provider,
model_name=model_name, model_name=model_name,

View File

@@ -20,12 +20,25 @@ class ChatRequest(ToolRequest):
prompt: str = Field( prompt: str = Field(
..., ...,
description="Your question, topic, or current thinking to discuss", description=(
"Your thorough, expressive question with as much context as possible. Remember: you're talking to "
"another Claude assistant who has deep expertise and can provide nuanced insights. Include your "
"current thinking, specific challenges, background context, what you've already tried, and what "
"kind of response would be most helpful. The more context and detail you provide, the more "
"valuable and targeted the response will be."
),
) )
files: Optional[list[str]] = Field( files: Optional[list[str]] = Field(
default_factory=list, default_factory=list,
description="Optional files for context (must be absolute paths)", description="Optional files for context (must be absolute paths)",
) )
images: Optional[list[str]] = Field(
default_factory=list,
description=(
"Optional images for visual context. Useful for UI discussions, diagrams, visual problems, "
"error screens, or architectural mockups."
),
)
class ChatTool(BaseTool): class ChatTool(BaseTool):
@@ -42,7 +55,8 @@ class ChatTool(BaseTool):
"Also great for: explanations, comparisons, general development questions. " "Also great for: explanations, comparisons, general development questions. "
"Use this when you want to ask questions, brainstorm ideas, get opinions, discuss topics, " "Use this when you want to ask questions, brainstorm ideas, get opinions, discuss topics, "
"share your thinking, or need explanations about concepts and approaches. " "share your thinking, or need explanations about concepts and approaches. "
"Note: If you're not currently using a top-tier model such as Opus 4 or above, these tools can provide enhanced capabilities." "Note: If you're not currently using a top-tier model such as Opus 4 or above, these tools can "
"provide enhanced capabilities."
) )
def get_input_schema(self) -> dict[str, Any]: def get_input_schema(self) -> dict[str, Any]:
@@ -51,13 +65,27 @@ class ChatTool(BaseTool):
"properties": { "properties": {
"prompt": { "prompt": {
"type": "string", "type": "string",
"description": "Your question, topic, or current thinking to discuss", "description": (
"Your thorough, expressive question with as much context as possible. Remember: you're "
"talking to another Claude assistant who has deep expertise and can provide nuanced "
"insights. Include your current thinking, specific challenges, background context, what "
"you've already tried, and what kind of response would be most helpful. The more context "
"and detail you provide, the more valuable and targeted the response will be."
),
}, },
"files": { "files": {
"type": "array", "type": "array",
"items": {"type": "string"}, "items": {"type": "string"},
"description": "Optional files for context (must be absolute paths)", "description": "Optional files for context (must be absolute paths)",
}, },
"images": {
"type": "array",
"items": {"type": "string"},
"description": (
"Optional images for visual context. Useful for UI discussions, diagrams, visual "
"problems, error screens, or architectural mockups."
),
},
"model": self.get_model_field_schema(), "model": self.get_model_field_schema(),
"temperature": { "temperature": {
"type": "number", "type": "number",
@@ -68,16 +96,29 @@ class ChatTool(BaseTool):
"thinking_mode": { "thinking_mode": {
"type": "string", "type": "string",
"enum": ["minimal", "low", "medium", "high", "max"], "enum": ["minimal", "low", "medium", "high", "max"],
"description": "Thinking depth: minimal (0.5% of model max), low (8%), medium (33%), high (67%), max (100% of model max)", "description": (
"Thinking depth: minimal (0.5% of model max), low (8%), medium (33%), high (67%), "
"max (100% of model max)"
),
}, },
"use_websearch": { "use_websearch": {
"type": "boolean", "type": "boolean",
"description": "Enable web search for documentation, best practices, and current information. Particularly useful for: brainstorming sessions, architectural design discussions, exploring industry best practices, working with specific frameworks/technologies, researching solutions to complex problems, or when current documentation and community insights would enhance the analysis.", "description": (
"Enable web search for documentation, best practices, and current information. "
"Particularly useful for: brainstorming sessions, architectural design discussions, "
"exploring industry best practices, working with specific frameworks/technologies, "
"researching solutions to complex problems, or when current documentation and "
"community insights would enhance the analysis."
),
"default": True, "default": True,
}, },
"continuation_id": { "continuation_id": {
"type": "string", "type": "string",
"description": "Thread continuation ID for multi-turn conversations. Can be used to continue conversations across different tools. Only provide this if continuing a previous conversation thread.", "description": (
"Thread continuation ID for multi-turn conversations. Can be used to continue "
"conversations across different tools. Only provide this if continuing a previous "
"conversation thread."
),
}, },
}, },
"required": ["prompt"] + (["model"] if self.is_effective_auto_mode() else []), "required": ["prompt"] + (["model"] if self.is_effective_auto_mode() else []),
@@ -157,4 +198,7 @@ Please provide a thoughtful, comprehensive response:"""
def format_response(self, response: str, request: ChatRequest, model_info: Optional[dict] = None) -> str: def format_response(self, response: str, request: ChatRequest, model_info: Optional[dict] = None) -> str:
"""Format the chat response""" """Format the chat response"""
return f"{response}\n\n---\n\n**Claude's Turn:** Evaluate this perspective alongside your analysis to form a comprehensive solution and continue with the user's request and task at hand." return (
f"{response}\n\n---\n\n**Claude's Turn:** Evaluate this perspective alongside your analysis to "
"form a comprehensive solution and continue with the user's request and task at hand."
)

View File

@@ -41,6 +41,10 @@ class CodeReviewRequest(ToolRequest):
..., ...,
description="User's summary of what the code does, expected behavior, constraints, and review objectives", description="User's summary of what the code does, expected behavior, constraints, and review objectives",
) )
images: Optional[list[str]] = Field(
None,
description="Optional images of architecture diagrams, UI mockups, design documents, or visual references for code review context",
)
review_type: str = Field("full", description="Type of review: full|security|performance|quick") review_type: str = Field("full", description="Type of review: full|security|performance|quick")
focus_on: Optional[str] = Field( focus_on: Optional[str] = Field(
None, None,
@@ -94,6 +98,11 @@ class CodeReviewTool(BaseTool):
"type": "string", "type": "string",
"description": "User's summary of what the code does, expected behavior, constraints, and review objectives", "description": "User's summary of what the code does, expected behavior, constraints, and review objectives",
}, },
"images": {
"type": "array",
"items": {"type": "string"},
"description": "Optional images of architecture diagrams, UI mockups, design documents, or visual references for code review context",
},
"review_type": { "review_type": {
"type": "string", "type": "string",
"enum": ["full", "security", "performance", "quick"], "enum": ["full", "security", "performance", "quick"],

View File

@@ -24,6 +24,10 @@ class DebugIssueRequest(ToolRequest):
None, None,
description="Files or directories that might be related to the issue (must be absolute paths)", description="Files or directories that might be related to the issue (must be absolute paths)",
) )
images: Optional[list[str]] = Field(
None,
description="Optional images showing error screens, UI issues, logs displays, or visual debugging information",
)
runtime_info: Optional[str] = Field(None, description="Environment, versions, or runtime information") runtime_info: Optional[str] = Field(None, description="Environment, versions, or runtime information")
previous_attempts: Optional[str] = Field(None, description="What has been tried already") previous_attempts: Optional[str] = Field(None, description="What has been tried already")
@@ -69,6 +73,11 @@ class DebugIssueTool(BaseTool):
"items": {"type": "string"}, "items": {"type": "string"},
"description": "Files or directories that might be related to the issue (must be absolute paths)", "description": "Files or directories that might be related to the issue (must be absolute paths)",
}, },
"images": {
"type": "array",
"items": {"type": "string"},
"description": "Optional images showing error screens, UI issues, logs displays, or visual debugging information",
},
"runtime_info": { "runtime_info": {
"type": "string", "type": "string",
"description": "Environment, versions, or runtime information", "description": "Environment, versions, or runtime information",

View File

@@ -78,6 +78,10 @@ class PrecommitRequest(ToolRequest):
None, None,
description="Optional files or directories to provide as context (must be absolute paths). These files are not part of the changes but provide helpful context like configs, docs, or related code.", description="Optional files or directories to provide as context (must be absolute paths). These files are not part of the changes but provide helpful context like configs, docs, or related code.",
) )
images: Optional[list[str]] = Field(
None,
description="Optional images showing expected UI changes, design requirements, or visual references for the changes being validated",
)
class Precommit(BaseTool): class Precommit(BaseTool):
@@ -170,6 +174,11 @@ class Precommit(BaseTool):
"items": {"type": "string"}, "items": {"type": "string"},
"description": "Optional files or directories to provide as context (must be absolute paths). These files are not part of the changes but provide helpful context like configs, docs, or related code.", "description": "Optional files or directories to provide as context (must be absolute paths). These files are not part of the changes but provide helpful context like configs, docs, or related code.",
}, },
"images": {
"type": "array",
"items": {"type": "string"},
"description": "Optional images showing expected UI changes, design requirements, or visual references for the changes being validated",
},
"use_websearch": { "use_websearch": {
"type": "boolean", "type": "boolean",
"description": "Enable web search for documentation, best practices, and current information. Particularly useful for: brainstorming sessions, architectural design discussions, exploring industry best practices, working with specific frameworks/technologies, researching solutions to complex problems, or when current documentation and community insights would enhance the analysis.", "description": "Enable web search for documentation, best practices, and current information. Particularly useful for: brainstorming sessions, architectural design discussions, exploring industry best practices, working with specific frameworks/technologies, researching solutions to complex problems, or when current documentation and community insights would enhance the analysis.",

View File

@@ -33,6 +33,10 @@ class ThinkDeepRequest(ToolRequest):
None,
description="Optional file paths or directories for additional context (must be absolute paths)",
)
images: Optional[list[str]] = Field(
None,
description="Optional images for visual analysis - diagrams, charts, system architectures, or any visual information to analyze",
)
class ThinkDeepTool(BaseTool):
@@ -60,7 +64,13 @@ class ThinkDeepTool(BaseTool):
"properties": {
"prompt": {
"type": "string",
"description": (
"Your current thinking/analysis to extend and validate. IMPORTANT: Before using this tool, "
"Claude MUST first think deeply and establish a deep understanding of the topic and question "
"by thinking through all relevant details, context, constraints, and implications. Share "
"these extended thoughts and ideas in the prompt so the model has comprehensive information "
"to work with for the best analysis."
),
},
"model": self.get_model_field_schema(),
"problem_context": {
@@ -77,6 +87,11 @@ class ThinkDeepTool(BaseTool):
"items": {"type": "string"},
"description": "Optional file paths or directories for additional context (must be absolute paths)",
},
"images": {
"type": "array",
"items": {"type": "string"},
"description": "Optional images for visual analysis - diagrams, charts, system architectures, or any visual information to analyze",
},
"temperature": {
"type": "number",
"description": "Temperature for creative thinking (0-1, default 0.7)",

View File

@@ -22,11 +22,29 @@ class TracerRequest(ToolRequest):
prompt: str = Field(
...,
description=(
"Detailed description of what to trace and WHY you need this analysis. Include context about what "
"you're trying to understand, debug, or analyze. For precision mode: describe the specific "
"method/function and what aspect of its execution flow you need to understand. For dependencies "
"mode: describe the class/module and what relationships you need to map. Example: 'I need to "
"understand how BookingManager.finalizeInvoice method is called throughout the system and what "
"side effects it has, as I'm debugging payment processing issues' rather than just "
"'BookingManager finalizeInvoice method'"
),
)
trace_mode: Literal["precision", "dependencies"] = Field(
...,
description=(
"Trace mode: 'precision' (for methods/functions - shows execution flow and usage patterns) or "
"'dependencies' (for classes/modules/protocols - shows structural relationships)"
),
)
images: list[str] = Field(
default_factory=list,
description=(
"Optional images of system architecture diagrams, flow charts, or visual references to help "
"understand the tracing context"
),
)
@@ -44,11 +62,15 @@ class TracerTool(BaseTool):
def get_description(self) -> str:
return (
"ANALYSIS PROMPT GENERATOR - Creates structured prompts for static code analysis. "
"Helps generate detailed analysis requests with specific method/function names, file paths, and "
"component context. "
"Type 'precision': For methods/functions - traces execution flow, call chains, call stacks, and "
"shows when/how they are used. "
"Type 'dependencies': For classes/modules/protocols - maps structural relationships and "
"bidirectional dependencies. "
"Returns detailed instructions on how to perform the analysis and format the results. "
"Use this to create focused analysis requests that can be fed back to Claude with the appropriate "
"code files. "
)
def get_input_schema(self) -> dict[str, Any]:
@@ -57,13 +79,26 @@ class TracerTool(BaseTool):
"properties": {
"prompt": {
"type": "string",
"description": (
"Detailed description of what to trace and WHY you need this analysis. Include context "
"about what you're trying to understand, debug, or analyze. For precision mode: describe "
"the specific method/function and what aspect of its execution flow you need to understand. "
"For dependencies mode: describe the class/module and what relationships you need to map. "
"Example: 'I need to understand how BookingManager.finalizeInvoice method is called "
"throughout the system and what side effects it has, as I'm debugging payment processing "
"issues' rather than just 'BookingManager finalizeInvoice method'"
),
},
"trace_mode": {
"type": "string",
"enum": ["precision", "dependencies"],
"description": "Trace mode: 'precision' (for methods/functions - shows execution flow and usage patterns) or 'dependencies' (for classes/modules/protocols - shows structural relationships)",
},
"images": {
"type": "array",
"items": {"type": "string"},
"description": "Optional images of system architecture diagrams, flow charts, or visual references to help understand the tracing context",
},
},
"required": ["prompt", "trace_mode"],
}

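Similarly, a hedged sketch of tracer arguments after this change, using only the fields defined in the schema above ("prompt" and "trace_mode" are required, "images" is optional); the description text and image path are illustrative:

```python
# Hypothetical tracer-tool arguments.
tracer_args = {
    "prompt": (
        "I need to understand how BookingManager.finalizeInvoice is called throughout the system "
        "and what side effects it has, as I'm debugging payment processing issues"
    ),
    "trace_mode": "precision",
    "images": ["/Users/jane/Desktop/payment-flow-diagram.png"],  # new: optional visual context
}
```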
View File

@@ -142,6 +142,7 @@ class ConversationTurn(BaseModel):
content: The actual message content/response
timestamp: ISO timestamp when this turn was created
files: List of file paths referenced in this specific turn
images: List of image paths referenced in this specific turn
tool_name: Which tool generated this turn (for cross-tool tracking)
model_provider: Provider used (e.g., "google", "openai")
model_name: Specific model used (e.g., "gemini-2.5-flash-preview-05-20", "o3-mini")
@@ -152,6 +153,7 @@ class ConversationTurn(BaseModel):
content: str
timestamp: str
files: Optional[list[str]] = None # Files referenced in this turn
images: Optional[list[str]] = None # Images referenced in this turn
tool_name: Optional[str] = None # Tool used for this turn
model_provider: Optional[str] = None # Model provider (google, openai, etc)
model_name: Optional[str] = None # Specific model used
@@ -300,6 +302,7 @@ def add_turn(
role: str,
content: str,
files: Optional[list[str]] = None,
images: Optional[list[str]] = None,
tool_name: Optional[str] = None,
model_provider: Optional[str] = None,
model_name: Optional[str] = None,
@@ -318,6 +321,7 @@ def add_turn(
role: "user" (Claude) or "assistant" (Gemini/O3/etc)
content: The actual message/response content
files: Optional list of files referenced in this turn
images: Optional list of images referenced in this turn
tool_name: Name of the tool adding this turn (for attribution)
model_provider: Provider used (e.g., "google", "openai")
model_name: Specific model used (e.g., "gemini-2.5-flash-preview-05-20", "o3-mini")
@@ -335,6 +339,7 @@ def add_turn(
- Refreshes thread TTL to configured timeout on successful update
- Turn limits prevent runaway conversations
- File references are preserved for cross-tool access with atomic ordering
- Image references are preserved for cross-tool visual context
- Model information enables cross-provider conversations
"""
logger.debug(f"[FLOW] Adding {role} turn to {thread_id} ({tool_name})")
@@ -355,6 +360,7 @@ def add_turn(
content=content,
timestamp=datetime.now(timezone.utc).isoformat(),
files=files, # Preserved for cross-tool file context
images=images, # Preserved for cross-tool visual context
tool_name=tool_name, # Track which tool generated this turn
model_provider=model_provider, # Track model provider
model_name=model_name, # Track specific model
@@ -489,6 +495,78 @@ def get_conversation_file_list(context: ThreadContext) -> list[str]:
return file_list
def get_conversation_image_list(context: ThreadContext) -> list[str]:
"""
Extract all unique images from conversation turns with newest-first prioritization.
This function implements the identical prioritization logic as get_conversation_file_list()
to ensure consistency in how images are handled across conversation turns. It walks
backwards through conversation turns (from newest to oldest) and collects unique image
references, ensuring that when the same image appears in multiple turns, the reference
from the NEWEST turn takes precedence.
PRIORITIZATION ALGORITHM:
1. Iterate through turns in REVERSE order (index len-1 down to 0)
2. For each turn, process images in the order they appear in turn.images
3. Add image to result list only if not already seen (newest reference wins)
4. Skip duplicate images that were already added from newer turns
This ensures that:
- Images from newer conversation turns appear first in the result
- When the same image is referenced multiple times, only the newest reference is kept
- The order reflects the most recent conversation context
Example:
Turn 1: images = ["diagram.png", "flow.jpg"]
Turn 2: images = ["error.png"]
Turn 3: images = ["diagram.png", "updated.png"] # diagram.png appears again
Result: ["diagram.png", "updated.png", "error.png", "flow.jpg"]
(diagram.png from Turn 3 takes precedence over Turn 1)
Args:
context: ThreadContext containing all conversation turns to process
Returns:
list[str]: Unique image paths ordered by newest reference first.
Empty list if no turns exist or no images are referenced.
Performance:
- Time Complexity: O(n*m) where n=turns, m=avg images per turn
- Space Complexity: O(i) where i=total unique images
- Uses set for O(1) duplicate detection
"""
if not context.turns:
logger.debug("[IMAGES] No turns found, returning empty image list")
return []
# Collect images by walking backwards (newest to oldest turns)
seen_images = set()
image_list = []
logger.debug(f"[IMAGES] Collecting images from {len(context.turns)} turns (newest first)")
# Process turns in reverse order (newest first) - this is the CORE of newest-first prioritization
# By iterating from len-1 down to 0, we encounter newer turns before older turns
# When we find a duplicate image, we skip it because the newer version is already in our list
for i in range(len(context.turns) - 1, -1, -1): # REVERSE: newest turn first
turn = context.turns[i]
if turn.images:
logger.debug(f"[IMAGES] Turn {i + 1} has {len(turn.images)} images: {turn.images}")
for image_path in turn.images:
if image_path not in seen_images:
# First time seeing this image - add it (this is the NEWEST reference)
seen_images.add(image_path)
image_list.append(image_path)
logger.debug(f"[IMAGES] Added new image: {image_path} (from turn {i + 1})")
else:
# Image already seen from a NEWER turn - skip this older reference
logger.debug(f"[IMAGES] Skipping duplicate image: {image_path} (newer version already included)")
logger.debug(f"[IMAGES] Final image list ({len(image_list)}): {image_list}")
return image_list
def _plan_file_inclusion_by_size(all_files: list[str], max_file_tokens: int) -> tuple[list[str], list[str], int]:
"""
Plan which files to include based on size constraints.

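A small usage sketch of the newest-first behaviour documented above, assuming `ConversationTurn` and the helper are importable from `utils.conversation_memory`, that role, content and timestamp are the only required turn fields, and noting that the function only reads the `turns` attribute of the context (as the code above shows); the image paths mirror the docstring's example:

```python
from types import SimpleNamespace

from utils.conversation_memory import ConversationTurn, get_conversation_image_list  # assumed module path

turns = [
    ConversationTurn(role="user", content="see diagrams", timestamp="2025-06-16T00:00:00Z",
                     images=["diagram.png", "flow.jpg"]),
    ConversationTurn(role="assistant", content="analysis", timestamp="2025-06-16T00:01:00Z",
                     images=["error.png"]),
    ConversationTurn(role="user", content="updated view", timestamp="2025-06-16T00:02:00Z",
                     images=["diagram.png", "updated.png"]),
]

# Stand-in carrying just the attribute the helper reads; a real ThreadContext behaves the same here.
context = SimpleNamespace(turns=turns)
print(get_conversation_image_list(context))
# Expected: ['diagram.png', 'updated.png', 'error.png', 'flow.jpg'] - newest references win.
```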
View File

@@ -88,8 +88,9 @@ TEXT_DATA = {
".lock", # Lock files
}
# Image file extensions
IMAGES = {".jpg", ".jpeg", ".png", ".gif", ".bmp", ".svg", ".webp", ".ico", ".tiff", ".tif"}
# Image file extensions - limited to what AI models actually support
# Based on OpenAI and Gemini supported formats: PNG, JPEG, GIF, WebP
IMAGES = {".jpg", ".jpeg", ".png", ".gif", ".webp"}
# Binary executable and library extensions
BINARIES = {
@@ -240,3 +241,30 @@ def get_token_estimation_ratio(file_path: str) -> float:
extension = Path(file_path).suffix.lower()
return TOKEN_ESTIMATION_RATIOS.get(extension, 3.5) # Conservative default
# MIME type mappings for image files - limited to what AI models actually support
# Based on OpenAI and Gemini supported formats: PNG, JPEG, GIF, WebP
IMAGE_MIME_TYPES = {
".jpg": "image/jpeg",
".jpeg": "image/jpeg",
".png": "image/png",
".gif": "image/gif",
".webp": "image/webp",
}
def get_image_mime_type(extension: str) -> str:
"""
Get the MIME type for an image file extension.
Args:
extension: File extension (with or without leading dot)
Returns:
MIME type string (default: image/jpeg for unknown extensions)
"""
if not extension.startswith("."):
extension = "." + extension
extension = extension.lower()
return IMAGE_MIME_TYPES.get(extension, "image/jpeg")

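A quick illustration of the lookup and its fallback, assuming the helper is importable from `utils.file_types`:

```python
from utils.file_types import get_image_mime_type  # assumed module path

print(get_image_mime_type(".png"))   # image/png
print(get_image_mime_type("webp"))   # image/webp  (leading dot is added automatically)
print(get_image_mime_type(".tiff"))  # image/jpeg  (unsupported extensions fall back to JPEG)
```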
View File

@@ -48,6 +48,36 @@ from .file_types import BINARY_EXTENSIONS, CODE_EXTENSIONS, IMAGE_EXTENSIONS, TE
from .security_config import CONTAINER_WORKSPACE, EXCLUDED_DIRS, MCP_SIGNATURE_FILES, SECURITY_ROOT, WORKSPACE_ROOT
from .token_utils import DEFAULT_CONTEXT_WINDOW, estimate_tokens
def _is_builtin_custom_models_config(path_str: str) -> bool:
"""
Check if path points to the server's built-in custom_models.json config file.
This only matches the server's internal config, not user-specified CUSTOM_MODELS_CONFIG_PATH.
We identify the built-in config by checking if it resolves to the server's conf directory.
Args:
path_str: Path to check
Returns:
True if this is the server's built-in custom_models.json config file
"""
try:
path = Path(path_str)
# Get the server root by going up from this file: utils/file_utils.py -> server_root
server_root = Path(__file__).parent.parent
builtin_config = server_root / "conf" / "custom_models.json"
# Check if the path resolves to the same file as our built-in config
# This handles both relative and absolute paths to the same file
return path.resolve() == builtin_config.resolve()
except Exception:
# If path resolution fails, it's not our built-in config
return False
logger = logging.getLogger(__name__)
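As a sanity check on the resolution logic above, a hedged example of how the predicate should behave, assuming `file_utils` lives in the `utils` package as its docstring path suggests:

```python
from pathlib import Path

import utils.file_utils as file_utils  # assumed module path

builtin = Path(file_utils.__file__).parent.parent / "conf" / "custom_models.json"
print(file_utils._is_builtin_custom_models_config(str(builtin)))              # True - resolves to the built-in config
print(file_utils._is_builtin_custom_models_config("/tmp/custom_models.json")) # False - a user-specified copy elsewhere
```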
@@ -271,7 +301,8 @@ def translate_path_for_environment(path_str: str) -> str:
tools and utilities throughout the codebase. It handles:
1. Docker host-to-container path translation (host paths -> /workspace/...)
2. Direct mode (no translation needed)
3. Internal server files (conf/custom_models.json)
4. Security validation and error handling
Docker Path Translation Logic:
- Input: /Users/john/project/src/file.py (host path from Claude)
@@ -284,32 +315,9 @@ def translate_path_for_environment(path_str: str) -> str:
Returns:
Translated path appropriate for the current environment
"""
# Handle built-in server config file - no translation needed
if _is_builtin_custom_models_config(path_str):
return path_str
# Allow access to specific internal application configuration files
# Store as relative paths so they work in both Docker and standalone modes
# Use exact paths for security - no wildcards or prefix matching
ALLOWED_INTERNAL_PATHS = {
"conf/custom_models.json",
# Add other specific internal files here as needed
}
# Check for internal app paths - extract relative part if it's an /app/ path
relative_internal_path = None
if path_str.startswith("/app/"):
relative_internal_path = path_str[5:] # Remove "/app/" prefix
if relative_internal_path.startswith("/"):
relative_internal_path = relative_internal_path[1:] # Remove leading slash if present
# Check if this is an allowed internal file
if relative_internal_path and relative_internal_path in ALLOWED_INTERNAL_PATHS:
# Translate to appropriate path for current environment
if not WORKSPACE_ROOT or not WORKSPACE_ROOT.strip() or not CONTAINER_WORKSPACE.exists():
# Standalone mode: use relative path
return "./" + relative_internal_path
else:
# Docker mode: use absolute app path
return "/app/" + relative_internal_path
# Handle other /app/ paths in standalone mode (for non-whitelisted files)
if not WORKSPACE_ROOT or not WORKSPACE_ROOT.strip() or not CONTAINER_WORKSPACE.exists():
if path_str.startswith("/app/"):
# Convert Docker internal paths to local relative paths for standalone mode