Merge branch 'BeehiveInnovations:main' into fix/google-allowed-models-restriction
CLAUDE.local.md (new file)
@@ -0,0 +1 @@
+- Before any commit / push to GitHub, always run the code quality checks first and confirm they pass. Use @code_quality_checks.sh and confirm that 100% of unit tests pass.

CLAUDE.md
@@ -112,6 +112,11 @@ docker logs zen-mcp-redis
 
 ### Testing
 
+Simulation tests are available to test the MCP server in a 'live' scenario, using your configured
+API keys to ensure the models are working and the server is able to communicate back and forth.
+IMPORTANT: Any time any code is changed or updated, you MUST first restart it with ./run-server.sh OR
+pass `--rebuild` to the `communication_simulator_test.py` script (if running it for the first time after changes) so that it's able to restart and use the latest code.
+
 #### Run All Simulator Tests
 
 ```bash
 # Run the complete test suite

README.md
@@ -80,6 +80,7 @@ Claude is brilliant, but sometimes you need:
 - **Local model support** - Run models like Llama 3.2 locally via Ollama, vLLM, or LM Studio for privacy and cost control
 - **Dynamic collaboration** - Models can request additional context and follow-up replies from Claude mid-analysis
 - **Smart file handling** - Automatically expands directories, manages token limits based on model capacity
+- **Vision support** - Analyze images, diagrams, screenshots, and visual content with vision-capable models
 - **[Bypass MCP's token limits](docs/advanced-usage.md#working-with-large-prompts)** - Work around MCP's 25K limit automatically
 - **[Context revival across sessions](docs/context-revival.md)** - Continue conversations even after Claude's context resets, with other models maintaining full history
 
@@ -314,6 +315,7 @@ and then debate with the other models to give me a final verdict
 - Technology comparisons and best practices
 - Architecture and design discussions
 - Can reference files for context: `"Use gemini to explain this algorithm with context from algorithm.py"`
+- **Image support**: Include screenshots, diagrams, UI mockups for visual analysis: `"Chat with gemini about this error dialog screenshot to understand the user experience issue"`
 - **Dynamic collaboration**: Gemini can request additional files or context during the conversation if needed for a more thorough response
 - **Web search capability**: Analyzes when web searches would be helpful and recommends specific searches for Claude to perform, ensuring access to current documentation and best practices
 
@@ -337,6 +339,7 @@ with the best architecture for my project
 - Offers alternative perspectives and approaches
 - Validates architectural decisions and design patterns
 - Can reference specific files for context: `"Use gemini to think deeper about my API design with reference to api/routes.py"`
+- **Image support**: Analyze architectural diagrams, flowcharts, design mockups: `"Think deeper about this system architecture diagram with gemini pro using max thinking mode"`
 - **Enhanced Critical Evaluation (v2.10.0)**: After Gemini's analysis, Claude is prompted to critically evaluate the suggestions, consider context and constraints, identify risks, and synthesize a final recommendation - ensuring a balanced, well-considered solution
 - **Web search capability**: When enabled (default: true), identifies areas where current documentation or community solutions would strengthen the analysis and suggests specific searches for Claude
 
@@ -362,6 +365,7 @@ I need an actionable plan but break it down into smaller quick-wins that we can
 - Supports specialized reviews: security, performance, quick
 - Can enforce coding standards: `"Use gemini to review src/ against PEP8 standards"`
 - Filters by severity: `"Get gemini to review auth/ - only report critical vulnerabilities"`
+- **Image support**: Review code from screenshots, error dialogs, or visual bug reports: `"Review this error screenshot and the related auth.py file for potential security issues"`
 
 ### 4. `precommit` - Pre-Commit Validation
 **Comprehensive review of staged/unstaged git changes across multiple repositories**
@@ -408,6 +412,7 @@ Use zen and perform a thorough precommit ensuring there aren't any new regressio
 - `review_type`: full|security|performance|quick
 - `severity_filter`: Filter by issue severity
 - `max_depth`: How deep to search for nested repos
+- `images`: Screenshots of requirements, design mockups, or error states for validation context
 ### 5. `debug` - Expert Debugging Assistant
 **Root cause analysis for complex problems**
 
@@ -428,6 +433,7 @@ Use zen and perform a thorough precommit ensuring there aren't any new regressio
 - Supports runtime info and previous attempts
 - Provides structured root cause analysis with validation steps
 - Can request additional context when needed for thorough analysis
+- **Image support**: Include error screenshots, stack traces, console output: `"Debug this error using gemini with the stack trace screenshot and the failing test.py"`
 - **Web search capability**: When enabled (default: true), identifies when searching for error messages, known issues, or documentation would help solve the problem and recommends specific searches for Claude
 ### 6. `analyze` - Smart File Analysis
 **General-purpose code understanding and exploration**
@@ -447,6 +453,7 @@ Use zen and perform a thorough precommit ensuring there aren't any new regressio
 - Supports specialized analysis types: architecture, performance, security, quality
 - Uses file paths (not content) for clean terminal output
 - Can identify patterns, anti-patterns, and refactoring opportunities
+- **Image support**: Analyze architecture diagrams, UML charts, flowcharts: `"Analyze this system diagram with gemini to understand the data flow and identify bottlenecks"`
 - **Web search capability**: When enabled with `use_websearch` (default: true), the model can request Claude to perform web searches and share results back to enhance analysis with current documentation, design patterns, and best practices
 
 ### 7. `refactor` - Intelligent Code Refactoring
@@ -489,6 +496,7 @@ did *not* discover.
 - **Conservative approach** - Careful dependency analysis to prevent breaking changes
 - **Multi-file analysis** - Understands cross-file relationships and dependencies
 - **Priority sequencing** - Recommends implementation order for refactoring changes
+- **Image support**: Analyze code architecture diagrams, legacy system charts: `"Refactor this legacy module using gemini pro with the current architecture diagram"`
 
 **Refactor Types (Progressive Priority System):**
 
@@ -529,7 +537,8 @@ Claude can use to efficiently trace execution flows and map dependencies within
 - Creates structured instructions for call-flow graph generation
 - Provides detailed formatting requirements for consistent output
 - Supports any programming language with automatic convention detection
 - Output can be used as an input into another tool, such as `chat` along with related code files to perform a logical call-flow analysis
+- **Image support**: Analyze visual call flow diagrams, sequence diagrams: `"Generate tracer analysis for this payment flow using the sequence diagram"`
 
 #### Example Prompts:
 ```
@@ -564,6 +573,7 @@ suites that cover realistic failure scenarios and integration points that shorte
 - Prioritizes smallest test files for pattern detection
 - Can reference existing test files: `"Generate tests following patterns from tests/unit/"`
 - Specific code coverage - target specific functions/classes rather than testing everything
+- **Image support**: Test UI components, analyze visual requirements: `"Generate tests for this login form using the UI mockup screenshot"`
 
 ### 10. `version` - Server Information
 ```
@@ -626,6 +636,7 @@ This server enables **true AI collaboration** between Claude and multiple AI mod
 - **Automatic 25K limit bypass**: Each exchange sends only incremental context, allowing unlimited total conversation size
 - Up to 10 exchanges per conversation (configurable via `MAX_CONVERSATION_TURNS`) with 3-hour expiry (configurable via `CONVERSATION_TIMEOUT_HOURS`)
 - Thread-safe with Redis persistence across all tools
+- **Image context preservation** - Images and visual references are maintained across conversation turns and tool switches
 
 **Cross-tool & Cross-Model Continuation Example:**
 ```
@@ -659,7 +670,7 @@ DEFAULT_MODEL=auto # Claude picks the best model automatically
 
 # API Keys (at least one required)
 GEMINI_API_KEY=your-gemini-key # Enables Gemini Pro & Flash
-OPENAI_API_KEY=your-openai-key # Enables O3, O3mini, O4-mini, O4-mini-high
+OPENAI_API_KEY=your-openai-key # Enables O3, O3mini, O4-mini, O4-mini-high, GPT-4.1
 ```
 
 **Available Models:**
@@ -669,6 +680,7 @@ OPENAI_API_KEY=your-openai-key # Enables O3, O3mini, O4-mini, O4-mini-high
 - **`o3mini`**: Balanced speed/quality
 - **`o4-mini`**: Latest reasoning model, optimized for shorter contexts
 - **`o4-mini-high`**: Enhanced O4 with higher reasoning effort
+- **`gpt4.1`**: GPT-4.1 with 1M context window
 - **Custom models**: via OpenRouter or local APIs (Ollama, vLLM, etc.)
 
 For detailed configuration options, see the [Advanced Usage Guide](docs/advanced-usage.md).

custom_models.json
@@ -25,6 +25,8 @@
       "supports_extended_thinking": "Whether the model supports extended reasoning tokens (currently none do via OpenRouter or custom APIs)",
       "supports_json_mode": "Whether the model can guarantee valid JSON output",
       "supports_function_calling": "Whether the model supports function/tool calling",
+      "supports_images": "Whether the model can process images/visual input",
+      "max_image_size_mb": "Maximum total size in MB for all images combined (capped at 40MB max for custom models)",
       "is_custom": "Set to true for models that should ONLY be used with custom API endpoints (Ollama, vLLM, etc.). False or omitted for OpenRouter/cloud models.",
       "description": "Human-readable description of the model"
     },
@@ -35,6 +37,8 @@
       "supports_extended_thinking": false,
       "supports_json_mode": true,
       "supports_function_calling": true,
+      "supports_images": true,
+      "max_image_size_mb": 10.0,
       "is_custom": true,
       "description": "Example custom/local model for Ollama, vLLM, etc."
     }
@@ -47,7 +51,9 @@
       "supports_extended_thinking": false,
       "supports_json_mode": false,
       "supports_function_calling": false,
-      "description": "Claude 3 Opus - Most capable Claude model"
+      "supports_images": true,
+      "max_image_size_mb": 5.0,
+      "description": "Claude 3 Opus - Most capable Claude model with vision"
     },
     {
       "model_name": "anthropic/claude-3-sonnet",
@@ -56,7 +62,9 @@
       "supports_extended_thinking": false,
       "supports_json_mode": false,
       "supports_function_calling": false,
-      "description": "Claude 3 Sonnet - Balanced performance"
+      "supports_images": true,
+      "max_image_size_mb": 5.0,
+      "description": "Claude 3 Sonnet - Balanced performance with vision"
     },
     {
       "model_name": "anthropic/claude-3-haiku",
@@ -65,7 +73,9 @@
       "supports_extended_thinking": false,
       "supports_json_mode": false,
       "supports_function_calling": false,
-      "description": "Claude 3 Haiku - Fast and efficient"
+      "supports_images": true,
+      "max_image_size_mb": 5.0,
+      "description": "Claude 3 Haiku - Fast and efficient with vision"
     },
     {
       "model_name": "google/gemini-2.5-pro-preview",
@@ -74,7 +84,9 @@
       "supports_extended_thinking": false,
       "supports_json_mode": true,
       "supports_function_calling": false,
-      "description": "Google's Gemini 2.5 Pro via OpenRouter"
+      "supports_images": true,
+      "max_image_size_mb": 20.0,
+      "description": "Google's Gemini 2.5 Pro via OpenRouter with vision"
     },
     {
       "model_name": "google/gemini-2.5-flash-preview-05-20",
@@ -83,7 +95,9 @@
       "supports_extended_thinking": false,
       "supports_json_mode": true,
       "supports_function_calling": false,
-      "description": "Google's Gemini 2.5 Flash via OpenRouter"
+      "supports_images": true,
+      "max_image_size_mb": 15.0,
+      "description": "Google's Gemini 2.5 Flash via OpenRouter with vision"
     },
     {
       "model_name": "mistral/mistral-large",
@@ -92,7 +106,9 @@
       "supports_extended_thinking": false,
       "supports_json_mode": true,
       "supports_function_calling": true,
-      "description": "Mistral's largest model"
+      "supports_images": false,
+      "max_image_size_mb": 0.0,
+      "description": "Mistral's largest model (text-only)"
     },
     {
       "model_name": "meta-llama/llama-3-70b",
@@ -101,7 +117,9 @@
       "supports_extended_thinking": false,
       "supports_json_mode": false,
       "supports_function_calling": false,
-      "description": "Meta's Llama 3 70B model"
+      "supports_images": false,
+      "max_image_size_mb": 0.0,
+      "description": "Meta's Llama 3 70B model (text-only)"
     },
     {
       "model_name": "deepseek/deepseek-r1-0528",
@@ -110,7 +128,9 @@
       "supports_extended_thinking": true,
       "supports_json_mode": true,
       "supports_function_calling": false,
-      "description": "DeepSeek R1 with thinking mode - advanced reasoning capabilities"
+      "supports_images": false,
+      "max_image_size_mb": 0.0,
+      "description": "DeepSeek R1 with thinking mode - advanced reasoning capabilities (text-only)"
     },
     {
       "model_name": "perplexity/llama-3-sonar-large-32k-online",
@@ -119,7 +139,9 @@
       "supports_extended_thinking": false,
       "supports_json_mode": false,
       "supports_function_calling": false,
-      "description": "Perplexity's online model with web search"
+      "supports_images": false,
+      "max_image_size_mb": 0.0,
+      "description": "Perplexity's online model with web search (text-only)"
     },
     {
       "model_name": "openai/o3",
@@ -128,7 +150,9 @@
       "supports_extended_thinking": false,
       "supports_json_mode": true,
       "supports_function_calling": true,
-      "description": "OpenAI's o3 model - well-rounded and powerful across domains"
+      "supports_images": true,
+      "max_image_size_mb": 20.0,
+      "description": "OpenAI's o3 model - well-rounded and powerful across domains with vision"
     },
     {
       "model_name": "openai/o3-mini",
@@ -137,7 +161,9 @@
       "supports_extended_thinking": false,
       "supports_json_mode": true,
       "supports_function_calling": true,
-      "description": "OpenAI's o3-mini model - balanced performance and speed"
+      "supports_images": true,
+      "max_image_size_mb": 20.0,
+      "description": "OpenAI's o3-mini model - balanced performance and speed with vision"
     },
     {
       "model_name": "openai/o3-mini-high",
@@ -146,7 +172,9 @@
       "supports_extended_thinking": false,
       "supports_json_mode": true,
       "supports_function_calling": true,
-      "description": "OpenAI's o3-mini with high reasoning effort - optimized for complex problems"
+      "supports_images": true,
+      "max_image_size_mb": 20.0,
+      "description": "OpenAI's o3-mini with high reasoning effort - optimized for complex problems with vision"
     },
     {
       "model_name": "openai/o3-pro",
@@ -155,7 +183,9 @@
       "supports_extended_thinking": false,
       "supports_json_mode": true,
       "supports_function_calling": true,
-      "description": "OpenAI's o3-pro model - professional-grade reasoning and analysis"
+      "supports_images": true,
+      "max_image_size_mb": 20.0,
+      "description": "OpenAI's o3-pro model - professional-grade reasoning and analysis with vision"
     },
     {
       "model_name": "openai/o4-mini",
@@ -164,7 +194,9 @@
       "supports_extended_thinking": false,
       "supports_json_mode": true,
       "supports_function_calling": true,
-      "description": "OpenAI's o4-mini model - optimized for shorter contexts with rapid reasoning"
+      "supports_images": true,
+      "max_image_size_mb": 20.0,
+      "description": "OpenAI's o4-mini model - optimized for shorter contexts with rapid reasoning and vision"
     },
     {
       "model_name": "openai/o4-mini-high",
@@ -173,7 +205,9 @@
       "supports_extended_thinking": false,
       "supports_json_mode": true,
       "supports_function_calling": true,
-      "description": "OpenAI's o4-mini with high reasoning effort - enhanced for complex tasks"
+      "supports_images": true,
+      "max_image_size_mb": 20.0,
+      "description": "OpenAI's o4-mini with high reasoning effort - enhanced for complex tasks with vision"
     },
     {
       "model_name": "llama3.2",
@@ -182,8 +216,10 @@
       "supports_extended_thinking": false,
       "supports_json_mode": false,
       "supports_function_calling": false,
+      "supports_images": false,
+      "max_image_size_mb": 0.0,
       "is_custom": true,
-      "description": "Local Llama 3.2 model via custom endpoint (Ollama/vLLM) - 128K context window"
+      "description": "Local Llama 3.2 model via custom endpoint (Ollama/vLLM) - 128K context window (text-only)"
     }
   ]
 }
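For orientation, a minimal sketch of how a client might consume the new capability fields from this registry. The `conf/custom_models.json` path and the top-level `"models"` key are assumptions for illustration:

```python
import json

# Load the custom model registry (path and key assumed for illustration)
with open("conf/custom_models.json") as f:
    registry = json.load(f)

for model in registry.get("models", []):
    # Only vision-capable entries should ever receive image attachments
    if model.get("supports_images", False):
        budget = model.get("max_image_size_mb", 0.0)
        print(f"{model['model_name']}: accepts up to {budget} MB of images")
```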

config.py
@@ -14,7 +14,7 @@ import os
 # These values are used in server responses and for tracking releases
 # IMPORTANT: This is the single source of truth for version and author info
 # Semantic versioning: MAJOR.MINOR.PATCH
-__version__ = "4.7.5"
+__version__ = "4.8.0"
 # Last update date in ISO format
 __updated__ = "2025-06-16"
 # Primary maintainer

docker-compose.yml
@@ -8,13 +8,13 @@ services:
       - "6379:6379"
     volumes:
       - redis_data:/data
-    command: redis-server --save 60 1 --loglevel warning --maxmemory 64mb --maxmemory-policy allkeys-lru
+    command: redis-server --save 60 1 --loglevel warning --maxmemory 512mb --maxmemory-policy allkeys-lru
     deploy:
       resources:
         limits:
           memory: 1G
         reservations:
-          memory: 256M
+          memory: 128M
 
   zen-mcp:
     build: .

docs/advanced-usage.md
@@ -11,6 +11,7 @@ This guide covers advanced features, configuration options, and workflows for po
 - [Context Revival: AI Memory Beyond Context Limits](#context-revival-ai-memory-beyond-context-limits)
 - [Collaborative Workflows](#collaborative-workflows)
 - [Working with Large Prompts](#working-with-large-prompts)
+- [Vision Support](#vision-support)
 - [Web Search Integration](#web-search-integration)
 - [System Prompts](#system-prompts)
 
@@ -25,7 +26,7 @@ DEFAULT_MODEL=auto # Claude picks the best model automatically
 
 # API Keys (at least one required)
 GEMINI_API_KEY=your-gemini-key # Enables Gemini Pro & Flash
-OPENAI_API_KEY=your-openai-key # Enables O3, O3-mini, O4-mini, O4-mini-high
+OPENAI_API_KEY=your-openai-key # Enables O3, O3-mini, O4-mini, O4-mini-high, GPT-4.1
 ```
 
 **How Auto Mode Works:**
@@ -43,6 +44,7 @@ OPENAI_API_KEY=your-openai-key # Enables O3, O3-mini, O4-mini, O4-mini-high
 | **`o3-mini`** | OpenAI | 200K tokens | Balanced speed/quality | Moderate complexity tasks |
 | **`o4-mini`** | OpenAI | 200K tokens | Latest reasoning model | Optimized for shorter contexts |
 | **`o4-mini-high`** | OpenAI | 200K tokens | Enhanced reasoning | Complex tasks requiring deeper analysis |
+| **`gpt4.1`** | OpenAI | 1M tokens | Latest GPT-4 with extended context | Large codebase analysis, comprehensive reviews |
 | **`llama`** (Llama 3.2) | Custom/Local | 128K tokens | Local inference, privacy | On-device analysis, cost-free processing |
 | **Any model** | OpenRouter | Varies | Access to GPT-4, Claude, Llama, etc. | User-specified or based on task requirements |
 
@@ -57,6 +59,7 @@ You can specify a default model instead of auto mode:
 DEFAULT_MODEL=gemini-2.5-pro-preview-06-05 # Always use Gemini Pro
 DEFAULT_MODEL=flash # Always use Flash
 DEFAULT_MODEL=o3 # Always use O3
+DEFAULT_MODEL=gpt4.1 # Always use GPT-4.1
 ```
 
 **Important:** After changing any configuration in `.env` (including `DEFAULT_MODEL`, API keys, or other settings), restart the server with `./run-server.sh` to apply the changes.
@@ -67,10 +70,12 @@ Regardless of your default setting, you can specify models per request:
 - "Use **flash** to quickly format this code"
 - "Use **o3** to debug this logic error"
 - "Review with **o4-mini** for balanced analysis"
+- "Use **gpt4.1** for comprehensive codebase analysis"
 
 **Model Capabilities:**
 - **Gemini Models**: Support thinking modes (minimal to max), web search, 1M context
 - **O3 Models**: Excellent reasoning, systematic analysis, 200K context
+- **GPT-4.1**: Extended context window (1M tokens), general capabilities
 
 ## Model Usage Restrictions
 
@@ -186,7 +191,7 @@ All tools that work with files support **both individual files and entire direct
 **`analyze`** - Analyze files or directories
 - `files`: List of file paths or directories (required)
 - `question`: What to analyze (required)
-- `model`: auto|pro|flash|o3|o3-mini|o4-mini|o4-mini-high (default: server default)
+- `model`: auto|pro|flash|o3|o3-mini|o4-mini|o4-mini-high|gpt4.1 (default: server default)
 - `analysis_type`: architecture|performance|security|quality|general
 - `output_format`: summary|detailed|actionable
 - `thinking_mode`: minimal|low|medium|high|max (default: medium, Gemini only)
@@ -201,7 +206,7 @@ All tools that work with files support **both individual files and entire direct
 
 **`codereview`** - Review code files or directories
 - `files`: List of file paths or directories (required)
-- `model`: auto|pro|flash|o3|o3-mini|o4-mini|o4-mini-high (default: server default)
+- `model`: auto|pro|flash|o3|o3-mini|o4-mini|o4-mini-high|gpt4.1 (default: server default)
 - `review_type`: full|security|performance|quick
 - `focus_on`: Specific aspects to focus on
 - `standards`: Coding standards to enforce
@@ -217,7 +222,7 @@ All tools that work with files support **both individual files and entire direct
 
 **`debug`** - Debug with file context
 - `error_description`: Description of the issue (required)
-- `model`: auto|pro|flash|o3|o3-mini|o4-mini|o4-mini-high (default: server default)
+- `model`: auto|pro|flash|o3|o3-mini|o4-mini|o4-mini-high|gpt4.1 (default: server default)
 - `error_context`: Stack trace or logs
 - `files`: Files or directories related to the issue
 - `runtime_info`: Environment details
@@ -233,7 +238,7 @@ All tools that work with files support **both individual files and entire direct
 
 **`thinkdeep`** - Extended analysis with file context
 - `current_analysis`: Your current thinking (required)
-- `model`: auto|pro|flash|o3|o3-mini|o4-mini|o4-mini-high (default: server default)
+- `model`: auto|pro|flash|o3|o3-mini|o4-mini|o4-mini-high|gpt4.1 (default: server default)
 - `problem_context`: Additional context
 - `focus_areas`: Specific aspects to focus on
 - `files`: Files or directories for context
@@ -249,7 +254,7 @@ All tools that work with files support **both individual files and entire direct
 **`testgen`** - Comprehensive test generation with edge case coverage
 - `files`: Code files or directories to generate tests for (required)
 - `prompt`: Description of what to test, testing objectives, and scope (required)
-- `model`: auto|pro|flash|o3|o3-mini|o4-mini|o4-mini-high (default: server default)
+- `model`: auto|pro|flash|o3|o3-mini|o4-mini|o4-mini-high|gpt4.1 (default: server default)
 - `test_examples`: Optional existing test files as style/pattern reference
 - `thinking_mode`: minimal|low|medium|high|max (default: medium, Gemini only)
 
@@ -264,7 +269,7 @@ All tools that work with files support **both individual files and entire direct
 - `files`: Code files or directories to analyze for refactoring opportunities (required)
 - `prompt`: Description of refactoring goals, context, and specific areas of focus (required)
 - `refactor_type`: codesmells|decompose|modernize|organization (required)
-- `model`: auto|pro|flash|o3|o3-mini|o4-mini|o4-mini-high (default: server default)
+- `model`: auto|pro|flash|o3|o3-mini|o4-mini|o4-mini-high|gpt4.1 (default: server default)
 - `focus_areas`: Specific areas to focus on (e.g., 'performance', 'readability', 'maintainability', 'security')
 - `style_guide_examples`: Optional existing code files to use as style/pattern reference
 - `thinking_mode`: minimal|low|medium|high|max (default: medium, Gemini only)
@@ -357,6 +362,47 @@ To help choose the right tool for your needs:
 - `refactor` vs `codereview`: refactor suggests structural improvements, codereview finds bugs/issues
 - `refactor` vs `analyze`: refactor provides actionable refactoring steps, analyze provides understanding
 
+## Vision Support
+
+The Zen MCP server supports vision-capable models for analyzing images, diagrams, screenshots, and visual content. Vision support works seamlessly with all tools and conversation threading.
+
+**Supported Models:**
+- **Gemini 2.5 Pro & Flash**: Excellent for diagrams, architecture analysis, UI mockups (up to 20MB total)
+- **OpenAI O3/O4 series**: Strong for visual debugging, error screenshots (up to 20MB total)
+- **Claude models via OpenRouter**: Good for code screenshots, visual analysis (up to 5MB total)
+- **Custom models**: Support varies by model, with 40MB maximum enforced for abuse prevention
+
+**Usage Examples:**
+```bash
+# Debug with error screenshots
+"Use zen to debug this error with the stack trace screenshot and error.py"
+
+# Architecture analysis with diagrams
+"Analyze this system architecture diagram with gemini pro for bottlenecks"
+
+# UI review with mockups
+"Chat with flash about this UI mockup - is the layout intuitive?"
+
+# Code review with visual context
+"Review this authentication code along with the error dialog screenshot"
+```
+
+**Image Formats Supported:**
+- **Images**: JPG, PNG, GIF, WebP, BMP, SVG, TIFF
+- **Documents**: PDF (where supported by model)
+- **Data URLs**: Base64-encoded images from Claude
+
+**Key Features:**
+- **Automatic validation**: File type, magic bytes, and size validation
+- **Conversation context**: Images persist across tool switches and continuation
+- **Budget management**: Automatic dropping of old images when limits exceeded
+- **Model capability-aware**: Only sends images to vision-capable models
+
+**Best Practices:**
+- Describe images when including them: "screenshot of login error", "system architecture diagram"
+- Use appropriate models: Gemini for complex diagrams, O3 for debugging visuals
+- Consider image sizes: Larger images consume more of the model's capacity
+
 ## Working with Large Prompts
 
 The MCP protocol has a combined request+response limit of approximately 25K tokens. This server intelligently works around this limitation by automatically handling large prompts as files:
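The "budget management" bullet above can be made concrete with a small sketch: given a model's combined image budget in MB, older images are dropped first so the most recent ones survive. The helper and data shapes below are illustrative, not the server's actual implementation:

```python
def enforce_image_budget(images: list[tuple[str, float]], budget_mb: float) -> list[str]:
    """Keep the newest images that fit within budget_mb.

    images: (path, size_mb) pairs ordered oldest -> newest (illustrative shape).
    """
    kept: list[str] = []
    used = 0.0
    for path, size_mb in reversed(images):  # walk newest -> oldest
        if used + size_mb <= budget_mb:
            kept.append(path)
            used += size_mb
    kept.reverse()  # restore chronological order
    return kept

# A 5 MB (Claude-sized) budget drops the oldest screenshot first:
print(enforce_image_budget([("old.png", 3.0), ("new.png", 4.0)], budget_mb=5.0))
# -> ['new.png']
```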

providers/base.py
@@ -112,6 +112,8 @@ class ModelCapabilities:
     supports_system_prompts: bool = True
     supports_streaming: bool = True
     supports_function_calling: bool = False
+    supports_images: bool = False  # Whether model can process images
+    max_image_size_mb: float = 0.0  # Maximum total size for all images in MB
 
     # Temperature constraint object - preferred way to define temperature limits
     temperature_constraint: TemperatureConstraint = field(
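A hedged sketch of how these two new fields might be consulted before attaching images to a request; the helper is illustrative and assumes only the fields added in this hunk:

```python
def images_allowed(caps, images: list[str], total_size_mb: float) -> bool:
    """Illustrative gate: attach images only to capable models within budget."""
    if not images:
        return True  # nothing to gate
    if not caps.supports_images:
        return False  # text-only model: images must be dropped
    return total_size_mb <= caps.max_image_size_mb  # combined-size budget
```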

providers/gemini.py
@@ -1,6 +1,8 @@
 """Gemini model provider implementation."""
 
+import base64
 import logging
+import os
 import time
 from typing import Optional
 
@@ -21,11 +23,15 @@ class GeminiModelProvider(ModelProvider):
             "context_window": 1_048_576,  # 1M tokens
             "supports_extended_thinking": True,
             "max_thinking_tokens": 24576,  # Flash 2.5 thinking budget limit
+            "supports_images": True,  # Vision capability
+            "max_image_size_mb": 20.0,  # Conservative 20MB limit for reliability
         },
         "gemini-2.5-pro-preview-06-05": {
             "context_window": 1_048_576,  # 1M tokens
             "supports_extended_thinking": True,
             "max_thinking_tokens": 32768,  # Pro 2.5 thinking budget limit
+            "supports_images": True,  # Vision capability
+            "max_image_size_mb": 32.0,  # Higher limit for Pro model
         },
         # Shorthands
         "flash": "gemini-2.5-flash-preview-05-20",
@@ -84,6 +90,8 @@ class GeminiModelProvider(ModelProvider):
             supports_system_prompts=True,
             supports_streaming=True,
             supports_function_calling=True,
+            supports_images=config.get("supports_images", False),
+            max_image_size_mb=config.get("max_image_size_mb", 0.0),
             temperature_constraint=temp_constraint,
         )
 
@@ -95,6 +103,7 @@ class GeminiModelProvider(ModelProvider):
         temperature: float = 0.7,
         max_output_tokens: Optional[int] = None,
         thinking_mode: str = "medium",
+        images: Optional[list[str]] = None,
         **kwargs,
     ) -> ModelResponse:
         """Generate content using Gemini model."""
@@ -102,12 +111,34 @@
         resolved_name = self._resolve_model_name(model_name)
         self.validate_parameters(model_name, temperature)
 
-        # Combine system prompt with user prompt if provided
+        # Prepare content parts (text and potentially images)
+        parts = []
+
+        # Add system and user prompts as text
         if system_prompt:
             full_prompt = f"{system_prompt}\n\n{prompt}"
         else:
             full_prompt = prompt
+
+        parts.append({"text": full_prompt})
+
+        # Add images if provided and model supports vision
+        if images and self._supports_vision(resolved_name):
+            for image_path in images:
+                try:
+                    image_part = self._process_image(image_path)
+                    if image_part:
+                        parts.append(image_part)
+                except Exception as e:
+                    logger.warning(f"Failed to process image {image_path}: {e}")
+                    # Continue with other images and text
+                    continue
+        elif images and not self._supports_vision(resolved_name):
+            logger.warning(f"Model {resolved_name} does not support images, ignoring {len(images)} image(s)")
+
+        # Create contents structure
+        contents = [{"parts": parts}]
 
         # Prepare generation config
         generation_config = types.GenerateContentConfig(
             temperature=temperature,
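For orientation, a hedged sketch of the `contents` structure the hunk above assembles when one image part is attached (values illustrative):

```python
# One user turn whose "parts" mix a text part with an inline_data image part
contents = [
    {
        "parts": [
            {"text": "You are a helpful assistant.\n\nDescribe this diagram."},
            {"inline_data": {"mime_type": "image/png", "data": "<base64 payload>"}},
        ]
    }
]
```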
@@ -139,7 +170,7 @@ class GeminiModelProvider(ModelProvider):
         # Generate content
         response = self.client.models.generate_content(
             model=resolved_name,
-            contents=full_prompt,
+            contents=contents,
             config=generation_config,
         )
 
@@ -274,3 +305,51 @@
         usage["total_tokens"] = usage["input_tokens"] + usage["output_tokens"]
 
         return usage
+
+    def _supports_vision(self, model_name: str) -> bool:
+        """Check if the model supports vision (image processing)."""
+        # Gemini 2.5 models support vision
+        vision_models = {
+            "gemini-2.5-flash-preview-05-20",
+            "gemini-2.5-pro-preview-06-05",
+            "gemini-2.0-flash",
+            "gemini-1.5-pro",
+            "gemini-1.5-flash",
+        }
+        return model_name in vision_models
+
+    def _process_image(self, image_path: str) -> Optional[dict]:
+        """Process an image for Gemini API."""
+        try:
+            if image_path.startswith("data:image/"):
+                # Handle data URL: data:image/png;base64,iVBORw0...
+                header, data = image_path.split(",", 1)
+                mime_type = header.split(";")[0].split(":")[1]
+                return {"inline_data": {"mime_type": mime_type, "data": data}}
+            else:
+                # Handle file path - translate for Docker environment
+                from utils.file_types import get_image_mime_type
+                from utils.file_utils import translate_path_for_environment
+
+                translated_path = translate_path_for_environment(image_path)
+                logger.debug(f"Translated image path from '{image_path}' to '{translated_path}'")
+
+                if not os.path.exists(translated_path):
+                    logger.warning(f"Image file not found: {translated_path} (original: {image_path})")
+                    return None
+
+                # Use translated path for all subsequent operations
+                image_path = translated_path
+
+                # Detect MIME type from file extension using centralized mappings
+                ext = os.path.splitext(image_path)[1].lower()
+                mime_type = get_image_mime_type(ext)
+
+                # Read and encode the image
+                with open(image_path, "rb") as f:
+                    image_data = base64.b64encode(f.read()).decode()
+
+                return {"inline_data": {"mime_type": mime_type, "data": image_data}}
+        except Exception as e:
+            logger.error(f"Error processing image {image_path}: {e}")
+            return None
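A hedged usage sketch of the two input forms `_process_image` accepts; constructor arguments and paths are illustrative:

```python
provider = GeminiModelProvider(api_key="...")  # constructor args assumed

# File path: translated for Docker, MIME-typed by extension, base64-encoded
part = provider._process_image("docs/architecture.png")
# -> {"inline_data": {"mime_type": "image/png", "data": "<base64>"}}

# Data URL: the declared MIME type and payload are passed through as-is
part = provider._process_image("data:image/png;base64,iVBORw0...")
```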

providers/openai_provider.py
@@ -23,22 +23,38 @@ class OpenAIModelProvider(OpenAICompatibleProvider):
         "o3": {
             "context_window": 200_000,  # 200K tokens
             "supports_extended_thinking": False,
+            "supports_images": True,  # O3 models support vision
+            "max_image_size_mb": 20.0,  # 20MB per OpenAI docs
         },
         "o3-mini": {
             "context_window": 200_000,  # 200K tokens
             "supports_extended_thinking": False,
+            "supports_images": True,  # O3 models support vision
+            "max_image_size_mb": 20.0,  # 20MB per OpenAI docs
         },
         "o3-pro": {
             "context_window": 200_000,  # 200K tokens
             "supports_extended_thinking": False,
+            "supports_images": True,  # O3 models support vision
+            "max_image_size_mb": 20.0,  # 20MB per OpenAI docs
         },
         "o4-mini": {
             "context_window": 200_000,  # 200K tokens
             "supports_extended_thinking": False,
+            "supports_images": True,  # O4 models support vision
+            "max_image_size_mb": 20.0,  # 20MB per OpenAI docs
         },
         "o4-mini-high": {
             "context_window": 200_000,  # 200K tokens
             "supports_extended_thinking": False,
+            "supports_images": True,  # O4 models support vision
+            "max_image_size_mb": 20.0,  # 20MB per OpenAI docs
+        },
+        "gpt-4.1-2025-04-14": {
+            "context_window": 1_000_000,  # 1M tokens
+            "supports_extended_thinking": False,
+            "supports_images": True,  # GPT-4.1 supports vision
+            "max_image_size_mb": 20.0,  # 20MB per OpenAI docs
         },
         # Shorthands
         "mini": "o4-mini",  # Default 'mini' to latest mini model
@@ -46,6 +62,7 @@ class OpenAIModelProvider(OpenAICompatibleProvider):
         "o4mini": "o4-mini",
         "o4minihigh": "o4-mini-high",
         "o4minihi": "o4-mini-high",
+        "gpt4.1": "gpt-4.1-2025-04-14",
     }
 
     def __init__(self, api_key: str, **kwargs):
@@ -76,7 +93,7 @@ class OpenAIModelProvider(OpenAICompatibleProvider):
             # O3 and O4 reasoning models only support temperature=1.0
             temp_constraint = FixedTemperatureConstraint(1.0)
         else:
-            # Other OpenAI models support 0.0-2.0 range
+            # Other OpenAI models (including GPT-4.1) support 0.0-2.0 range
             temp_constraint = RangeTemperatureConstraint(0.0, 2.0, 0.7)
 
         return ModelCapabilities(
@@ -88,6 +105,8 @@ class OpenAIModelProvider(OpenAICompatibleProvider):
             supports_system_prompts=True,
             supports_streaming=True,
             supports_function_calling=True,
+            supports_images=config.get("supports_images", False),
+            max_image_size_mb=config.get("max_image_size_mb", 0.0),
             temperature_constraint=temp_constraint,
         )
 
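A hedged sketch of what the new shorthand should yield, assuming OpenAIModelProvider resolves aliases with the same `_resolve_model_name` helper seen in the Gemini hunk:

```python
provider = OpenAIModelProvider(api_key="sk-...")  # key illustrative
assert provider._resolve_model_name("gpt4.1") == "gpt-4.1-2025-04-14"

# The full name maps to the new 1M-context, vision-capable entry above
caps = provider.get_capabilities("gpt-4.1-2025-04-14")
assert caps.supports_images and caps.max_image_size_mb == 20.0
```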

providers/openai_compatible.py
@@ -1,5 +1,6 @@
 """Base class for OpenAI-compatible API providers."""
 
+import base64
 import ipaddress
 import logging
 import os
@@ -229,6 +230,7 @@ class OpenAICompatibleProvider(ModelProvider):
         system_prompt: Optional[str] = None,
         temperature: float = 0.7,
         max_output_tokens: Optional[int] = None,
+        images: Optional[list[str]] = None,
         **kwargs,
     ) -> ModelResponse:
         """Generate content using the OpenAI-compatible API.
@@ -255,7 +257,32 @@
         messages = []
         if system_prompt:
             messages.append({"role": "system", "content": system_prompt})
-        messages.append({"role": "user", "content": prompt})
+
+        # Prepare user message with text and potentially images
+        user_content = []
+        user_content.append({"type": "text", "text": prompt})
+
+        # Add images if provided and model supports vision
+        if images and self._supports_vision(model_name):
+            for image_path in images:
+                try:
+                    image_content = self._process_image(image_path)
+                    if image_content:
+                        user_content.append(image_content)
+                except Exception as e:
+                    logging.warning(f"Failed to process image {image_path}: {e}")
+                    # Continue with other images and text
+                    continue
+        elif images and not self._supports_vision(model_name):
+            logging.warning(f"Model {model_name} does not support images, ignoring {len(images)} image(s)")
+
+        # Add user message
+        if len(user_content) == 1:
+            # Only text content, use simple string format for compatibility
+            messages.append({"role": "user", "content": prompt})
+        else:
+            # Text + images, use content array format
+            messages.append({"role": "user", "content": user_content})
 
         # Prepare completion parameters
         completion_params = {
@@ -424,3 +451,66 @@ class OpenAICompatibleProvider(ModelProvider):
|
|||||||
Default is False for OpenAI-compatible providers.
|
Default is False for OpenAI-compatible providers.
|
||||||
"""
|
"""
|
||||||
return False
|
return False
|
||||||
|
|
||||||
|
def _supports_vision(self, model_name: str) -> bool:
|
||||||
|
"""Check if the model supports vision (image processing).
|
||||||
|
|
||||||
|
Default implementation for OpenAI-compatible providers.
|
||||||
|
Subclasses should override with specific model support.
|
||||||
|
"""
|
||||||
|
# Common vision-capable models - only include models that actually support images
|
||||||
|
vision_models = {
|
||||||
|
"gpt-4o",
|
||||||
|
"gpt-4o-mini",
|
||||||
|
"gpt-4-turbo",
|
||||||
|
"gpt-4-vision-preview",
|
||||||
|
"gpt-4.1-2025-04-14", # GPT-4.1 supports vision
|
||||||
|
"o3",
|
||||||
|
"o3-mini",
|
||||||
|
"o3-pro",
|
||||||
|
"o4-mini",
|
||||||
|
"o4-mini-high",
|
||||||
|
# Note: Claude models would be handled by a separate provider
|
||||||
|
}
|
||||||
|
supports = model_name.lower() in vision_models
|
||||||
|
logging.debug(f"Model '{model_name}' vision support: {supports}")
|
||||||
|
return supports
|
||||||
|
|
||||||
|
def _process_image(self, image_path: str) -> Optional[dict]:
|
||||||
|
"""Process an image for OpenAI-compatible API."""
|
||||||
|
try:
|
||||||
|
if image_path.startswith("data:image/"):
|
||||||
|
# Handle data URL: data:image/png;base64,iVBORw0...
|
||||||
|
return {"type": "image_url", "image_url": {"url": image_path}}
|
||||||
|
else:
|
||||||
|
# Handle file path - translate for Docker environment
|
||||||
|
from utils.file_utils import translate_path_for_environment
|
||||||
|
|
||||||
|
translated_path = translate_path_for_environment(image_path)
|
||||||
|
logging.debug(f"Translated image path from '{image_path}' to '{translated_path}'")
|
||||||
|
|
||||||
|
if not os.path.exists(translated_path):
|
||||||
|
logging.warning(f"Image file not found: {translated_path} (original: {image_path})")
|
||||||
|
return None
|
||||||
|
|
||||||
|
# Use translated path for all subsequent operations
|
||||||
|
image_path = translated_path
|
||||||
|
|
||||||
|
# Detect MIME type from file extension using centralized mappings
|
||||||
|
from utils.file_types import get_image_mime_type
|
||||||
|
|
||||||
|
ext = os.path.splitext(image_path)[1].lower()
|
||||||
|
mime_type = get_image_mime_type(ext)
|
||||||
|
logging.debug(f"Processing image '{image_path}' with extension '{ext}' as MIME type '{mime_type}'")
|
||||||
|
|
||||||
|
# Read and encode the image
|
||||||
|
with open(image_path, "rb") as f:
|
||||||
|
image_data = base64.b64encode(f.read()).decode()
|
||||||
|
|
||||||
|
# Create data URL for OpenAI API
|
||||||
|
data_url = f"data:{mime_type};base64,{image_data}"
|
||||||
|
|
||||||
|
return {"type": "image_url", "image_url": {"url": data_url}}
|
||||||
|
except Exception as e:
|
||||||
|
logging.error(f"Error processing image {image_path}: {e}")
|
||||||
|
return None
|
||||||
|
|||||||
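Taken together, `_supports_vision` and `_process_image` mean a text-plus-image request ends up as a standard Chat Completions content array. A minimal sketch of the resulting payload shape (field names follow the OpenAI API; the prompt text and file path are placeholder values):

```python
# Shape of the messages list produced above for a text + image request.
# The prompt and the base64 payload are hypothetical placeholders.
messages = [
    {"role": "system", "content": "You are a helpful assistant."},
    {
        "role": "user",
        "content": [
            {"type": "text", "text": "What is in this image?"},
            {
                "type": "image_url",
                # _process_image() inlines local files as base64 data URLs
                "image_url": {"url": "data:image/png;base64,iVBORw0KGgo..."},
            },
        ],
    },
]
```

Text-only requests keep the plain string `content` form, which preserves compatibility with OpenAI-compatible endpoints that do not accept content arrays.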
@@ -23,6 +23,8 @@ class OpenRouterModelConfig:
     supports_streaming: bool = True
     supports_function_calling: bool = False
     supports_json_mode: bool = False
+    supports_images: bool = False  # Whether model can process images
+    max_image_size_mb: float = 0.0  # Maximum total size for all images in MB
     is_custom: bool = False  # True for models that should only be used with custom endpoints
     description: str = ""

@@ -37,6 +39,8 @@ class OpenRouterModelConfig:
            supports_system_prompts=self.supports_system_prompts,
            supports_streaming=self.supports_streaming,
            supports_function_calling=self.supports_function_calling,
+           supports_images=self.supports_images,
+           max_image_size_mb=self.max_image_size_mb,
            temperature_constraint=RangeTemperatureConstraint(0.0, 2.0, 1.0),
        )

@@ -66,7 +70,8 @@ class OpenRouterModelRegistry:
             translated_path = translate_path_for_environment(env_path)
             self.config_path = Path(translated_path)
         else:
-            # Default to conf/custom_models.json (already in container)
+            # Default to conf/custom_models.json - use relative path from this file
+            # This works both in development and container environments
             self.config_path = Path(__file__).parent.parent / "conf" / "custom_models.json"

         # Load configuration
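For a concrete sense of what the two new config fields carry, here is a hypothetical custom-model entry expressed as the parsed dict it would arrive as. The model values are made up, and keyword construction is assumed from the dataclass-style field declarations above:

```python
# Hypothetical registry entry; keys assumed to mirror the dataclass fields.
entry = {
    "supports_images": True,       # model can accept image input
    "max_image_size_mb": 20.0,     # total budget across all attached images
}
config = OpenRouterModelConfig(**entry)  # other fields keep their defaults
assert config.supports_images and config.max_image_size_mb == 20.0
```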
@@ -24,6 +24,7 @@ from .test_redis_validation import RedisValidationTest
 from .test_refactor_validation import RefactorValidationTest
 from .test_testgen_validation import TestGenValidationTest
 from .test_token_allocation_validation import TokenAllocationValidationTest
+from .test_vision_capability import VisionCapabilityTest
 from .test_xai_models import XAIModelsTest

 # Test registry for dynamic loading
@@ -45,6 +46,7 @@ TEST_REGISTRY = {
     "testgen_validation": TestGenValidationTest,
     "refactor_validation": RefactorValidationTest,
     "conversation_chain_validation": ConversationChainValidationTest,
+    "vision_capability": VisionCapabilityTest,
     "xai_models": XAIModelsTest,
     # "o3_pro_expensive": O3ProExpensiveTest,  # COMMENTED OUT - too expensive to run by default
 }
@@ -69,6 +71,7 @@ __all__ = [
     "TestGenValidationTest",
     "RefactorValidationTest",
     "ConversationChainValidationTest",
+    "VisionCapabilityTest",
     "XAIModelsTest",
     "TEST_REGISTRY",
 ]
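The registry entry is what makes the new test discoverable by name. Roughly, the runner's dynamic loading amounts to the following sketch (the runner internals and constructor arguments are assumed from the registry shape, not shown in this diff):

```python
# Sketch of name-based dispatch implied by TEST_REGISTRY (runner not shown here).
from simulator_tests import TEST_REGISTRY

test = TEST_REGISTRY["vision_capability"]()  # constructor args, if any, assumed default
passed = test.run_test()
print(f"{test.test_name}: {'PASS' if passed else 'FAIL'}")
```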
163
simulator_tests/test_vision_capability.py
Normal file
@@ -0,0 +1,163 @@
+#!/usr/bin/env python3
+"""
+Vision Capability Test
+
+Tests vision capability with the chat tool using O3 model:
+- Test file path image (PNG triangle)
+- Test base64 data URL image
+- Use chat tool with O3 model to analyze the images
+- Verify the model correctly identifies shapes
+"""
+
+import base64
+import os
+
+from .base_test import BaseSimulatorTest
+
+
+class VisionCapabilityTest(BaseSimulatorTest):
+    """Test vision capability with chat tool and O3 model"""
+
+    @property
+    def test_name(self) -> str:
+        return "vision_capability"
+
+    @property
+    def test_description(self) -> str:
+        return "Vision capability test with chat tool and O3 model"
+
+    def get_triangle_png_path(self) -> str:
+        """Get the path to the triangle.png file in tests directory"""
+        # Get the project root and find the triangle.png in tests/
+        current_dir = os.getcwd()
+        triangle_path = os.path.join(current_dir, "tests", "triangle.png")
+
+        if not os.path.exists(triangle_path):
+            raise FileNotFoundError(f"triangle.png not found at {triangle_path}")
+
+        abs_path = os.path.abspath(triangle_path)
+        self.logger.debug(f"Using triangle PNG at host path: {abs_path}")
+        return abs_path
+
+    def create_base64_triangle_data_url(self) -> str:
+        """Create a base64 data URL from the triangle.png file"""
+        triangle_path = self.get_triangle_png_path()
+
+        with open(triangle_path, "rb") as f:
+            image_data = base64.b64encode(f.read()).decode()
+
+        data_url = f"data:image/png;base64,{image_data}"
+        self.logger.debug(f"Created base64 data URL with {len(image_data)} characters")
+        return data_url
+
+    def run_test(self) -> bool:
+        """Test vision capability with O3 model"""
+        try:
+            self.logger.info("Test: Vision capability with O3 model")
+
+            # Test 1: File path image
+            self.logger.info("  1.1: Testing file path image (PNG triangle)")
+            triangle_path = self.get_triangle_png_path()
+            self.logger.info(f"  ✅ Using triangle PNG at: {triangle_path}")
+
+            response1, continuation_id = self.call_mcp_tool(
+                "chat",
+                {
+                    "prompt": "What shape do you see in this image? Please be specific and only mention the shape name.",
+                    "images": [triangle_path],
+                    "model": "o3",
+                },
+            )
+
+            if not response1:
+                self.logger.error("Failed to get response from O3 model for file path test")
+                return False
+
+            # Check for error indicators first
+            response1_lower = response1.lower()
+            if any(
+                error_phrase in response1_lower
+                for error_phrase in [
+                    "don't have access",
+                    "cannot see",
+                    "no image",
+                    "clarification_required",
+                    "image you're referring to",
+                    "supply the image",
+                    "error",
+                ]
+            ):
+                self.logger.error(f"  ❌ O3 model cannot access file path image. Response: {response1[:300]}...")
+                return False
+
+            if "triangle" not in response1_lower:
+                self.logger.error(
+                    f"  ❌ O3 did not identify triangle in file path test. Response: {response1[:200]}..."
+                )
+                return False
+
+            self.logger.info("  ✅ O3 correctly identified file path image as triangle")
+
+            # Test 2: Base64 data URL image
+            self.logger.info("  1.2: Testing base64 data URL image")
+            data_url = self.create_base64_triangle_data_url()
+
+            response2, _ = self.call_mcp_tool(
+                "chat",
+                {
+                    "prompt": "What shape do you see in this image? Please be specific and only mention the shape name.",
+                    "images": [data_url],
+                    "model": "o3",
+                },
+            )
+
+            if not response2:
+                self.logger.error("Failed to get response from O3 model for base64 test")
+                return False
+
+            response2_lower = response2.lower()
+            if any(
+                error_phrase in response2_lower
+                for error_phrase in [
+                    "don't have access",
+                    "cannot see",
+                    "no image",
+                    "clarification_required",
+                    "image you're referring to",
+                    "supply the image",
+                    "error",
+                ]
+            ):
+                self.logger.error(f"  ❌ O3 model cannot access base64 image. Response: {response2[:300]}...")
+                return False
+
+            if "triangle" not in response2_lower:
+                self.logger.error(f"  ❌ O3 did not identify triangle in base64 test. Response: {response2[:200]}...")
+                return False
+
+            self.logger.info("  ✅ O3 correctly identified base64 image as triangle")
+
+            # Optional: Test continuation with same image
+            if continuation_id:
+                self.logger.info("  1.3: Testing continuation with same image")
+                response3, _ = self.call_mcp_tool(
+                    "chat",
+                    {
+                        "prompt": "What color is this triangle?",
+                        "images": [triangle_path],  # Same image should be deduplicated
+                        "continuation_id": continuation_id,
+                        "model": "o3",
+                    },
+                )
+
+                if response3:
+                    self.logger.info("  ✅ Continuation also working correctly")
+                else:
+                    self.logger.warning("  ⚠️ Continuation response not received")
+
+            self.logger.info("  ✅ Vision capability test completed successfully")
+            return True
+
+        except Exception as e:
+            self.logger.error(f"Vision capability test failed: {e}")
+            return False
@@ -1,126 +0,0 @@
-"""
-Test /app/ to ./ path translation for standalone mode.
-
-Tests that internal application paths work in both Docker and standalone modes.
-"""
-
-import os
-import tempfile
-from unittest.mock import patch
-
-from utils.file_utils import translate_path_for_environment
-
-
-class TestAppPathTranslation:
-    """Test translation of /app/ paths for different environments."""
-
-    def test_app_path_translation_in_standalone_mode(self):
-        """Test that /app/ paths are translated to ./ in standalone mode."""
-
-        # Mock standalone environment (no Docker)
-        with patch("utils.file_utils.CONTAINER_WORKSPACE") as mock_container_workspace:
-            mock_container_workspace.exists.return_value = False
-
-            # Clear WORKSPACE_ROOT to simulate standalone mode
-            with patch.dict(os.environ, {}, clear=True):
-
-                # Test translation of internal app paths
-                test_cases = [
-                    ("/app/conf/custom_models.json", "./conf/custom_models.json"),
-                    ("/app/conf/other_config.json", "./conf/other_config.json"),
-                    ("/app/logs/app.log", "./logs/app.log"),
-                    ("/app/data/file.txt", "./data/file.txt"),
-                ]
-
-                for input_path, expected_output in test_cases:
-                    result = translate_path_for_environment(input_path)
-                    assert result == expected_output, f"Expected {expected_output}, got {result}"
-
-    def test_allowed_app_path_unchanged_in_docker_mode(self):
-        """Test that allowed /app/ paths remain unchanged in Docker mode."""
-
-        with tempfile.TemporaryDirectory() as tmpdir:
-            # Mock Docker environment
-            with patch("utils.file_utils.CONTAINER_WORKSPACE") as mock_container_workspace:
-                mock_container_workspace.exists.return_value = True
-                mock_container_workspace.__str__.return_value = "/workspace"
-
-                # Set WORKSPACE_ROOT to simulate Docker environment
-                with patch.dict(os.environ, {"WORKSPACE_ROOT": tmpdir}):
-
-                    # Only specifically allowed internal app paths should remain unchanged in Docker
-                    allowed_path = "/app/conf/custom_models.json"
-                    result = translate_path_for_environment(allowed_path)
-                    assert (
-                        result == allowed_path
-                    ), f"Docker mode should preserve allowed path {allowed_path}, got {result}"
-
-    def test_non_allowed_app_paths_blocked_in_docker_mode(self):
-        """Test that non-allowed /app/ paths are blocked in Docker mode."""
-
-        with tempfile.TemporaryDirectory() as tmpdir:
-            # Mock Docker environment
-            with patch("utils.file_utils.CONTAINER_WORKSPACE") as mock_container_workspace:
-                mock_container_workspace.exists.return_value = True
-                mock_container_workspace.__str__.return_value = "/workspace"
-
-                # Set WORKSPACE_ROOT to simulate Docker environment
-                with patch.dict(os.environ, {"WORKSPACE_ROOT": tmpdir}):
-
-                    # Non-allowed internal app paths should be blocked in Docker for security
-                    blocked_paths = [
-                        "/app/conf/other_config.json",
-                        "/app/logs/app.log",
-                        "/app/server.py",
-                    ]
-
-                    for blocked_path in blocked_paths:
-                        result = translate_path_for_environment(blocked_path)
-                        assert result.startswith(
-                            "/inaccessible/"
-                        ), f"Docker mode should block non-allowed path {blocked_path}, got {result}"
-
-    def test_non_app_paths_unchanged_in_standalone(self):
-        """Test that non-/app/ paths are unchanged in standalone mode."""
-
-        # Mock standalone environment
-        with patch("utils.file_utils.CONTAINER_WORKSPACE") as mock_container_workspace:
-            mock_container_workspace.exists.return_value = False
-
-            with patch.dict(os.environ, {}, clear=True):
-
-                # Non-app paths should be unchanged
-                test_cases = [
-                    "/home/user/file.py",
-                    "/etc/config.conf",
-                    "./local/file.txt",
-                    "relative/path.py",
-                    "/workspace/file.py",
-                ]
-
-                for input_path in test_cases:
-                    result = translate_path_for_environment(input_path)
-                    assert result == input_path, f"Non-app path {input_path} should be unchanged, got {result}"
-
-    def test_edge_cases_in_app_translation(self):
-        """Test edge cases in /app/ path translation."""
-
-        # Mock standalone environment
-        with patch("utils.file_utils.CONTAINER_WORKSPACE") as mock_container_workspace:
-            mock_container_workspace.exists.return_value = False
-
-            with patch.dict(os.environ, {}, clear=True):
-
-                # Test edge cases
-                test_cases = [
-                    ("/app/", "./"),  # Root app directory
-                    ("/app", "/app"),  # Exact match without trailing slash - not translated
-                    ("/app/file", "./file"),  # File directly in app
-                    ("/app//double/slash", "./double/slash"),  # Handle double slashes
-                ]
-
-                for input_path, expected_output in test_cases:
-                    result = translate_path_for_environment(input_path)
-                    assert (
-                        result == expected_output
-                    ), f"Edge case {input_path}: expected {expected_output}, got {result}"
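Before they were removed, these tests pinned down a simple standalone-mode contract. A condensed sketch of just that branch, consistent with the edge cases asserted above (the real `translate_path_for_environment` in utils/file_utils.py also handles the Docker allow-list covered by the other deleted integration tests below):

```python
# Condensed sketch of the standalone-mode contract these tests encoded.
def translate_standalone(path: str) -> str:
    if path.startswith("/app/"):
        # "/app/conf/x.json" -> "./conf/x.json"; "/app/" -> "./";
        # "/app//double/slash" -> "./double/slash"
        return "./" + path[len("/app/"):].lstrip("/")
    return path  # "/app" without a trailing slash and non-/app/ paths pass through
```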
591
tests/test_image_support_integration.py
Normal file
@@ -0,0 +1,591 @@
+"""
+Integration tests for native image support feature.
+
+Tests the complete image support pipeline:
+- Conversation memory integration with images
+- Tool request validation and schema support
+- Provider image processing capabilities
+- Cross-tool image context preservation
+"""
+
+import json
+import os
+import tempfile
+import uuid
+from unittest.mock import Mock, patch
+
+import pytest
+
+from tools.chat import ChatTool
+from tools.debug import DebugIssueTool
+from utils.conversation_memory import (
+    ConversationTurn,
+    ThreadContext,
+    add_turn,
+    create_thread,
+    get_conversation_image_list,
+    get_thread,
+)
+
+
+class TestImageSupportIntegration:
+    """Integration tests for the complete image support feature."""
+
+    def test_conversation_turn_includes_images(self):
+        """Test that ConversationTurn can store and track images."""
+        turn = ConversationTurn(
+            role="user",
+            content="Please analyze this diagram",
+            timestamp="2025-01-01T00:00:00Z",
+            files=["code.py"],
+            images=["diagram.png", "flowchart.jpg"],
+            tool_name="chat",
+        )
+
+        assert turn.images == ["diagram.png", "flowchart.jpg"]
+        assert turn.files == ["code.py"]
+        assert turn.content == "Please analyze this diagram"
+
+    def test_get_conversation_image_list_newest_first(self):
+        """Test that image list prioritizes newest references."""
+        # Create thread context with multiple turns
+        context = ThreadContext(
+            thread_id=str(uuid.uuid4()),
+            created_at="2025-01-01T00:00:00Z",
+            last_updated_at="2025-01-01T00:00:00Z",
+            tool_name="chat",
+            turns=[
+                ConversationTurn(
+                    role="user",
+                    content="Turn 1",
+                    timestamp="2025-01-01T00:00:00Z",
+                    images=["old_diagram.png", "shared.png"],
+                ),
+                ConversationTurn(
+                    role="assistant", content="Turn 2", timestamp="2025-01-01T01:00:00Z", images=["middle.png"]
+                ),
+                ConversationTurn(
+                    role="user",
+                    content="Turn 3",
+                    timestamp="2025-01-01T02:00:00Z",
+                    images=["shared.png", "new_diagram.png"],  # shared.png appears again
+                ),
+            ],
+            initial_context={},
+        )
+
+        image_list = get_conversation_image_list(context)
+
+        # Should prioritize newest first, with duplicates removed (newest wins)
+        expected = ["shared.png", "new_diagram.png", "middle.png", "old_diagram.png"]
+        assert image_list == expected
+
+    @patch("utils.conversation_memory.get_redis_client")
+    def test_add_turn_with_images(self, mock_redis):
+        """Test adding a conversation turn with images."""
+        mock_client = Mock()
+        mock_redis.return_value = mock_client
+
+        # Mock the Redis operations to return success
+        mock_client.set.return_value = True
+
+        thread_id = create_thread("test_tool", {"initial": "context"})
+
+        # Set up initial thread context for add_turn to find
+        initial_context = ThreadContext(
+            thread_id=thread_id,
+            created_at="2025-01-01T00:00:00Z",
+            last_updated_at="2025-01-01T00:00:00Z",
+            tool_name="test_tool",
+            turns=[],  # Empty initially
+            initial_context={"initial": "context"},
+        )
+        mock_client.get.return_value = initial_context.model_dump_json()
+
+        success = add_turn(
+            thread_id=thread_id,
+            role="user",
+            content="Analyze these screenshots",
+            files=["app.py"],
+            images=["screenshot1.png", "screenshot2.png"],
+            tool_name="debug",
+        )
+
+        assert success
+
+        # Mock thread context for get_thread call
+        updated_context = ThreadContext(
+            thread_id=thread_id,
+            created_at="2025-01-01T00:00:00Z",
+            last_updated_at="2025-01-01T00:00:00Z",
+            tool_name="test_tool",
+            turns=[
+                ConversationTurn(
+                    role="user",
+                    content="Analyze these screenshots",
+                    timestamp="2025-01-01T00:00:00Z",
+                    files=["app.py"],
+                    images=["screenshot1.png", "screenshot2.png"],
+                    tool_name="debug",
+                )
+            ],
+            initial_context={"initial": "context"},
+        )
+        mock_client.get.return_value = updated_context.model_dump_json()
+
+        # Retrieve and verify the thread
+        context = get_thread(thread_id)
+        assert context is not None
+        assert len(context.turns) == 1
+
+        turn = context.turns[0]
+        assert turn.images == ["screenshot1.png", "screenshot2.png"]
+        assert turn.files == ["app.py"]
+        assert turn.content == "Analyze these screenshots"
+
+    def test_chat_tool_schema_includes_images(self):
+        """Test that ChatTool schema includes images field."""
+        tool = ChatTool()
+        schema = tool.get_input_schema()
+
+        assert "images" in schema["properties"]
+        images_field = schema["properties"]["images"]
+        assert images_field["type"] == "array"
+        assert images_field["items"]["type"] == "string"
+        assert "visual context" in images_field["description"].lower()
+
+    def test_debug_tool_schema_includes_images(self):
+        """Test that DebugIssueTool schema includes images field."""
+        tool = DebugIssueTool()
+        schema = tool.get_input_schema()
+
+        assert "images" in schema["properties"]
+        images_field = schema["properties"]["images"]
+        assert images_field["type"] == "array"
+        assert images_field["items"]["type"] == "string"
+        assert "error screens" in images_field["description"].lower()
+
+    def test_tool_image_validation_limits(self):
+        """Test that tools validate image size limits using real provider resolution."""
+        tool = ChatTool()
+
+        # Create small test images (each 0.5MB, total 1MB)
+        small_images = []
+        for _ in range(2):
+            with tempfile.NamedTemporaryFile(suffix=".png", delete=False) as temp_file:
+                # Write 0.5MB of data
+                temp_file.write(b"\x00" * (512 * 1024))
+                small_images.append(temp_file.name)
+
+        try:
+            # Test with a model that should fail (no provider available in test environment)
+            result = tool._validate_image_limits(small_images, "mistral-large")
+            # Should return error because model not available
+            assert result is not None
+            assert result["status"] == "error"
+            assert "does not support image processing" in result["content"]
+
+            # Test that empty/None images always pass regardless of model
+            result = tool._validate_image_limits([], "any-model")
+            assert result is None
+
+            result = tool._validate_image_limits(None, "any-model")
+            assert result is None
+
+        finally:
+            # Clean up temp files
+            for img_path in small_images:
+                if os.path.exists(img_path):
+                    os.unlink(img_path)
+
+    def test_image_validation_model_specific_limits(self):
+        """Test that different models have appropriate size limits using real provider resolution."""
+        import importlib
+
+        tool = ChatTool()
+
+        # Test OpenAI O3 model (20MB limit) - Create 15MB image (should pass)
+        small_image_path = None
+        large_image_path = None
+
+        # Save original environment
+        original_env = {
+            "OPENAI_API_KEY": os.environ.get("OPENAI_API_KEY"),
+            "DEFAULT_MODEL": os.environ.get("DEFAULT_MODEL"),
+        }
+
+        try:
+            # Create 15MB image (under 20MB O3 limit)
+            with tempfile.NamedTemporaryFile(suffix=".png", delete=False) as temp_file:
+                temp_file.write(b"\x00" * (15 * 1024 * 1024))  # 15MB
+                small_image_path = temp_file.name
+
+            # Set up environment for OpenAI provider
+            os.environ["OPENAI_API_KEY"] = "test-key-o3-validation-test-not-real"
+            os.environ["DEFAULT_MODEL"] = "o3"
+
+            # Clear other provider keys to isolate to OpenAI
+            for key in ["GEMINI_API_KEY", "XAI_API_KEY", "OPENROUTER_API_KEY"]:
+                os.environ.pop(key, None)
+
+            # Reload config and clear registry
+            import config
+
+            importlib.reload(config)
+            from providers.registry import ModelProviderRegistry
+
+            ModelProviderRegistry._instance = None
+
+            result = tool._validate_image_limits([small_image_path], "o3")
+            assert result is None  # Should pass (15MB < 20MB limit)
+
+            # Create 25MB image (over 20MB O3 limit)
+            with tempfile.NamedTemporaryFile(suffix=".png", delete=False) as temp_file:
+                temp_file.write(b"\x00" * (25 * 1024 * 1024))  # 25MB
+                large_image_path = temp_file.name
+
+            result = tool._validate_image_limits([large_image_path], "o3")
+            assert result is not None  # Should fail (25MB > 20MB limit)
+            assert result["status"] == "error"
+            assert "Image size limit exceeded" in result["content"]
+            assert "20.0MB" in result["content"]  # O3 limit
+            assert "25.0MB" in result["content"]  # Provided size
+
+        finally:
+            # Clean up temp files
+            if small_image_path and os.path.exists(small_image_path):
+                os.unlink(small_image_path)
+            if large_image_path and os.path.exists(large_image_path):
+                os.unlink(large_image_path)
+
+            # Restore environment
+            for key, value in original_env.items():
+                if value is not None:
+                    os.environ[key] = value
+                else:
+                    os.environ.pop(key, None)
+
+            # Reload config and clear registry
+            importlib.reload(config)
+            ModelProviderRegistry._instance = None
+
+    @pytest.mark.asyncio
+    async def test_chat_tool_execution_with_images(self):
+        """Test that ChatTool can execute with images parameter using real provider resolution."""
+        import importlib
+
+        # Create a temporary image file for testing
+        with tempfile.NamedTemporaryFile(suffix=".png", delete=False) as temp_file:
+            # Write a simple PNG header (minimal valid PNG)
+            png_header = b"\x89PNG\r\n\x1a\n\x00\x00\x00\rIHDR\x00\x00\x00\x01\x00\x00\x00\x01\x08\x06\x00\x00\x00\x1f\x15\xc4\x89\x00\x00\x00\rIDATx\x9cc\x00\x01\x00\x00\x05\x00\x01\r\n-\xdb\x00\x00\x00\x00IEND\xaeB`\x82"
+            temp_file.write(png_header)
+            temp_image_path = temp_file.name
+
+        # Save original environment
+        original_env = {
+            "OPENAI_API_KEY": os.environ.get("OPENAI_API_KEY"),
+            "DEFAULT_MODEL": os.environ.get("DEFAULT_MODEL"),
+        }
+
+        try:
+            # Set up environment for real provider resolution
+            os.environ["OPENAI_API_KEY"] = "sk-test-key-images-test-not-real"
+            os.environ["DEFAULT_MODEL"] = "gpt-4o"
+
+            # Clear other provider keys to isolate to OpenAI
+            for key in ["GEMINI_API_KEY", "XAI_API_KEY", "OPENROUTER_API_KEY"]:
+                os.environ.pop(key, None)
+
+            # Reload config and clear registry
+            import config
+
+            importlib.reload(config)
+            from providers.registry import ModelProviderRegistry
+
+            ModelProviderRegistry._instance = None
+
+            tool = ChatTool()
+
+            # Test with real provider resolution
+            try:
+                result = await tool.execute(
+                    {"prompt": "What do you see in this image?", "images": [temp_image_path], "model": "gpt-4o"}
+                )
+
+                # If we get here, check the response format
+                assert len(result) == 1
+                # Should be a valid JSON response
+                output = json.loads(result[0].text)
+                assert "status" in output
+                # Test passed - provider accepted images parameter
+
+            except Exception as e:
+                # Expected: API call will fail with fake key
+                error_msg = str(e)
+                # Should NOT be a mock-related error
+                assert "MagicMock" not in error_msg
+                assert "'<' not supported between instances" not in error_msg
+
+                # Should be a real provider error (API key or network)
+                assert any(
+                    phrase in error_msg
+                    for phrase in ["API", "key", "authentication", "provider", "network", "connection", "401", "403"]
+                )
+                # Test passed - provider processed images parameter before failing on auth
+
+        finally:
+            # Clean up temp file
+            os.unlink(temp_image_path)
+
+            # Restore environment
+            for key, value in original_env.items():
+                if value is not None:
+                    os.environ[key] = value
+                else:
+                    os.environ.pop(key, None)
+
+            # Reload config and clear registry
+            importlib.reload(config)
+            ModelProviderRegistry._instance = None
+
+    @patch("utils.conversation_memory.get_redis_client")
+    def test_cross_tool_image_context_preservation(self, mock_redis):
+        """Test that images are preserved across different tools in conversation."""
+        mock_client = Mock()
+        mock_redis.return_value = mock_client
+
+        # Mock the Redis operations to return success
+        mock_client.set.return_value = True
+
+        # Create initial thread with chat tool
+        thread_id = create_thread("chat", {"initial": "context"})
+
+        # Set up initial thread context for add_turn to find
+        initial_context = ThreadContext(
+            thread_id=thread_id,
+            created_at="2025-01-01T00:00:00Z",
+            last_updated_at="2025-01-01T00:00:00Z",
+            tool_name="chat",
+            turns=[],  # Empty initially
+            initial_context={"initial": "context"},
+        )
+        mock_client.get.return_value = initial_context.model_dump_json()
+
+        # Add turn with images from chat tool
+        add_turn(
+            thread_id=thread_id,
+            role="user",
+            content="Here's my UI design",
+            images=["design.png", "mockup.jpg"],
+            tool_name="chat",
+        )
+
+        add_turn(
+            thread_id=thread_id, role="assistant", content="I can see your design. It looks good!", tool_name="chat"
+        )
+
+        # Add turn with different images from debug tool
+        add_turn(
+            thread_id=thread_id,
+            role="user",
+            content="Now I'm getting this error",
+            images=["error_screen.png"],
+            files=["error.log"],
+            tool_name="debug",
+        )
+
+        # Mock complete thread context for get_thread call
+        complete_context = ThreadContext(
+            thread_id=thread_id,
+            created_at="2025-01-01T00:00:00Z",
+            last_updated_at="2025-01-01T00:05:00Z",
+            tool_name="chat",
+            turns=[
+                ConversationTurn(
+                    role="user",
+                    content="Here's my UI design",
+                    timestamp="2025-01-01T00:01:00Z",
+                    images=["design.png", "mockup.jpg"],
+                    tool_name="chat",
+                ),
+                ConversationTurn(
+                    role="assistant",
+                    content="I can see your design. It looks good!",
+                    timestamp="2025-01-01T00:02:00Z",
+                    tool_name="chat",
+                ),
+                ConversationTurn(
+                    role="user",
+                    content="Now I'm getting this error",
+                    timestamp="2025-01-01T00:03:00Z",
+                    images=["error_screen.png"],
+                    files=["error.log"],
+                    tool_name="debug",
+                ),
+            ],
+            initial_context={"initial": "context"},
+        )
+        mock_client.get.return_value = complete_context.model_dump_json()
+
+        # Retrieve thread and check image preservation
+        context = get_thread(thread_id)
+        assert context is not None
+
+        # Get conversation image list (should prioritize newest first)
+        image_list = get_conversation_image_list(context)
+        expected = ["error_screen.png", "design.png", "mockup.jpg"]
+        assert image_list == expected
+
+        # Verify each turn has correct images
+        assert context.turns[0].images == ["design.png", "mockup.jpg"]
+        assert context.turns[1].images is None  # Assistant turn without images
+        assert context.turns[2].images == ["error_screen.png"]
+
+    def test_tool_request_base_class_has_images(self):
+        """Test that base ToolRequest class includes images field."""
+        from tools.base import ToolRequest
+
+        # Create request with images
+        request = ToolRequest(images=["test.png", "test2.jpg"])
+        assert request.images == ["test.png", "test2.jpg"]
+
+        # Test default value
+        request_no_images = ToolRequest()
+        assert request_no_images.images is None
+
+    def test_data_url_image_format_support(self):
+        """Test that tools can handle data URL format images."""
+        import importlib
+
+        tool = ChatTool()
+
+        # Test with data URL (base64 encoded 1x1 transparent PNG)
+        data_url = "data:image/png;base64,iVBORw0KGgoAAAANSUhEUgAAAAEAAAABCAYAAAAfFcSJAAAADUlEQVR42mNk+M9QDwADhgGAWjR9awAAAABJRU5ErkJggg=="
+        images = [data_url]
+
+        # Save original environment
+        original_env = {
+            "OPENAI_API_KEY": os.environ.get("OPENAI_API_KEY"),
+            "DEFAULT_MODEL": os.environ.get("DEFAULT_MODEL"),
+        }
+
+        try:
+            # Set up environment for OpenAI provider
+            os.environ["OPENAI_API_KEY"] = "test-key-data-url-test-not-real"
+            os.environ["DEFAULT_MODEL"] = "o3"
+
+            # Clear other provider keys to isolate to OpenAI
+            for key in ["GEMINI_API_KEY", "XAI_API_KEY", "OPENROUTER_API_KEY"]:
+                os.environ.pop(key, None)
+
+            # Reload config and clear registry
+            import config
+
+            importlib.reload(config)
+            from providers.registry import ModelProviderRegistry
+
+            ModelProviderRegistry._instance = None
+
+            # Use a model that should be available - o3 from OpenAI
+            result = tool._validate_image_limits(images, "o3")
+            assert result is None  # Small data URL should pass validation
+
+            # Also test with a non-vision model to ensure validation works
+            result = tool._validate_image_limits(images, "mistral-large")
+            # This should fail because model not available with current setup
+            assert result is not None
+            assert result["status"] == "error"
+            assert "does not support image processing" in result["content"]
+
+        finally:
+            # Restore environment
+            for key, value in original_env.items():
+                if value is not None:
+                    os.environ[key] = value
+                else:
+                    os.environ.pop(key, None)
+
+            # Reload config and clear registry
+            importlib.reload(config)
+            ModelProviderRegistry._instance = None
+
+    def test_empty_images_handling(self):
+        """Test that tools handle empty images lists gracefully."""
+        tool = ChatTool()
+
+        # Empty list should not fail validation (no need for provider setup)
+        result = tool._validate_image_limits([], "test_model")
+        assert result is None
+
+        # None should not fail validation (no need for provider setup)
+        result = tool._validate_image_limits(None, "test_model")
+        assert result is None
+
+    @patch("utils.conversation_memory.get_redis_client")
+    def test_conversation_memory_thread_chaining_with_images(self, mock_redis):
+        """Test that images work correctly with conversation thread chaining."""
+        mock_client = Mock()
+        mock_redis.return_value = mock_client
+
+        # Mock the Redis operations to return success
+        mock_client.set.return_value = True
+
+        # Create parent thread with images
+        parent_thread_id = create_thread("chat", {"parent": "context"})
+
+        # Set up initial parent thread context for add_turn to find
+        parent_context = ThreadContext(
+            thread_id=parent_thread_id,
+            created_at="2025-01-01T00:00:00Z",
+            last_updated_at="2025-01-01T00:00:00Z",
+            tool_name="chat",
+            turns=[],  # Empty initially
+            initial_context={"parent": "context"},
+        )
+        mock_client.get.return_value = parent_context.model_dump_json()
+        add_turn(
+            thread_id=parent_thread_id,
+            role="user",
+            content="Parent thread with images",
+            images=["parent1.png", "shared.png"],
+            tool_name="chat",
+        )
+
+        # Create child thread linked to parent
+        child_thread_id = create_thread("debug", {"child": "context"}, parent_thread_id=parent_thread_id)
+        add_turn(
+            thread_id=child_thread_id,
+            role="user",
+            content="Child thread with more images",
+            images=["child1.png", "shared.png"],  # shared.png appears again (should prioritize newer)
+            tool_name="debug",
+        )
+
+        # Mock child thread context for get_thread call
+        child_context = ThreadContext(
+            thread_id=child_thread_id,
+            created_at="2025-01-01T00:00:00Z",
+            last_updated_at="2025-01-01T00:02:00Z",
+            tool_name="debug",
+            turns=[
+                ConversationTurn(
+                    role="user",
+                    content="Child thread with more images",
+                    timestamp="2025-01-01T00:02:00Z",
+                    images=["child1.png", "shared.png"],
+                    tool_name="debug",
+                )
+            ],
+            initial_context={"child": "context"},
+            parent_thread_id=parent_thread_id,
+        )
+        mock_client.get.return_value = child_context.model_dump_json()
+
+        # Get child thread and verify image collection works across chain
+        child_context = get_thread(child_thread_id)
+        assert child_context is not None
+        assert child_context.parent_thread_id == parent_thread_id
+
+        # Test image collection for child thread only
+        child_images = get_conversation_image_list(child_context)
+        assert child_images == ["child1.png", "shared.png"]
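The ordering assertions in these tests imply one specific rule: walk turns newest to oldest and keep the first sighting of each image. A sketch of just that logic, consistent with the expected lists above (the real implementation lives in utils/conversation_memory.py and may differ in detail):

```python
# Ordering logic implied by the assertions: the newest turn wins, duplicates
# keep their newest position, and order within a turn is preserved.
def image_list_newest_first(turns) -> list[str]:
    seen, result = set(), []
    for turn in reversed(turns):  # newest turn first
        for image in turn.images or []:
            if image not in seen:
                seen.add(image)
                result.append(image)
    return result
```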
@@ -1,290 +0,0 @@
-"""
-Integration tests for internal application configuration file access.
-
-These tests verify that:
-1. Specific internal config files are accessible (exact path matching)
-2. Path variations and traversal attempts are blocked (security)
-3. The OpenRouter model configuration loads properly
-4. Normal workspace file operations continue to work
-
-This follows the established testing patterns from test_docker_path_integration.py
-by using actual file operations and module reloading instead of mocks.
-"""
-
-import importlib
-import os
-import tempfile
-from pathlib import Path
-from unittest.mock import patch
-
-import pytest
-
-from utils.file_utils import translate_path_for_environment
-
-
-class TestInternalConfigFileAccess:
-    """Test access to internal application configuration files."""
-
-    def test_allowed_internal_config_file_access(self):
-        """Test that the specific internal config file is accessible."""
-
-        with tempfile.TemporaryDirectory() as tmpdir:
-            # Set up Docker-like environment
-            host_workspace = Path(tmpdir) / "host_workspace"
-            host_workspace.mkdir()
-            container_workspace = Path(tmpdir) / "container_workspace"
-            container_workspace.mkdir()
-
-            original_env = os.environ.copy()
-            try:
-                os.environ["WORKSPACE_ROOT"] = str(host_workspace)
-
-                # Reload modules to pick up environment
-                import utils.security_config
-
-                importlib.reload(utils.security_config)
-                importlib.reload(utils.file_utils)
-
-                # Test with Docker environment simulation
-                with patch("utils.file_utils.CONTAINER_WORKSPACE", container_workspace):
-                    # The exact allowed path should pass through unchanged
-                    result = translate_path_for_environment("/app/conf/custom_models.json")
-                    assert result == "/app/conf/custom_models.json"
-
-            finally:
-                # Restore environment
-                os.environ.clear()
-                os.environ.update(original_env)
-                import utils.security_config
-
-                importlib.reload(utils.security_config)
-                importlib.reload(utils.file_utils)
-
-    def test_blocked_config_file_variations(self):
-        """Test that variations of the config file path are blocked."""
-
-        with tempfile.TemporaryDirectory() as tmpdir:
-            host_workspace = Path(tmpdir) / "host_workspace"
-            host_workspace.mkdir()
-            container_workspace = Path(tmpdir) / "container_workspace"
-            container_workspace.mkdir()
-
-            original_env = os.environ.copy()
-            try:
-                os.environ["WORKSPACE_ROOT"] = str(host_workspace)
-
-                import utils.security_config
-
-                importlib.reload(utils.security_config)
-                importlib.reload(utils.file_utils)
-
-                with patch("utils.file_utils.CONTAINER_WORKSPACE", container_workspace):
-                    # Test blocked variations - these should return inaccessible paths
-                    blocked_paths = [
-                        "/app/conf/",  # Directory
-                        "/app/conf/other_file.json",  # Different file
-                        "/app/conf/custom_models.json.backup",  # Extra extension
-                        "/app/conf/custom_models.txt",  # Different extension
-                        "/app/conf/../server.py",  # Path traversal
-                        "/app/server.py",  # Application code
-                        "/etc/passwd",  # System file
-                    ]
-
-                    for path in blocked_paths:
-                        result = translate_path_for_environment(path)
-                        assert result.startswith(
-                            "/inaccessible/"
-                        ), f"Path {path} should be blocked but got: {result}"
-
-            finally:
-                os.environ.clear()
-                os.environ.update(original_env)
-                import utils.security_config
-
-                importlib.reload(utils.security_config)
-                importlib.reload(utils.file_utils)
-
-    def test_workspace_files_continue_to_work(self):
-        """Test that normal workspace file operations are unaffected."""
-
-        with tempfile.TemporaryDirectory() as tmpdir:
-            host_workspace = Path(tmpdir) / "host_workspace"
-            host_workspace.mkdir()
-            container_workspace = Path(tmpdir) / "container_workspace"
-            container_workspace.mkdir()
-
-            # Create a test file in the workspace
-            test_file = host_workspace / "src" / "test.py"
-            test_file.parent.mkdir(parents=True)
-            test_file.write_text("# test file")
-
-            original_env = os.environ.copy()
-            try:
-                os.environ["WORKSPACE_ROOT"] = str(host_workspace)
-
-                import utils.security_config
-
-                importlib.reload(utils.security_config)
-                importlib.reload(utils.file_utils)
-
-                with patch("utils.file_utils.CONTAINER_WORKSPACE", container_workspace):
-                    # Normal workspace file should translate correctly
-                    result = translate_path_for_environment(str(test_file))
-                    expected = str(container_workspace / "src" / "test.py")
-                    assert result == expected
-
-            finally:
-                os.environ.clear()
-                os.environ.update(original_env)
-                import utils.security_config
-
-                importlib.reload(utils.security_config)
-                importlib.reload(utils.file_utils)
-
-    def test_openrouter_config_loading_real_world(self):
-        """Test that OpenRouter configuration loading works in real container environment."""
-
-        # This test validates that our fix works in the actual Docker environment
-        # by checking that the translate_path_for_environment function handles
-        # the exact internal config path correctly
-
-        with tempfile.TemporaryDirectory() as tmpdir:
-            host_workspace = Path(tmpdir) / "host_workspace"
-            host_workspace.mkdir()
-            container_workspace = Path(tmpdir) / "container_workspace"
-            container_workspace.mkdir()
-
-            original_env = os.environ.copy()
-            try:
-                os.environ["WORKSPACE_ROOT"] = str(host_workspace)
-
-                import utils.security_config
-
-                importlib.reload(utils.security_config)
-                importlib.reload(utils.file_utils)
-
-                with patch("utils.file_utils.CONTAINER_WORKSPACE", container_workspace):
-                    # Test that the function correctly handles the config path
-                    result = translate_path_for_environment("/app/conf/custom_models.json")
-
-                    # The path should pass through unchanged (not be blocked)
-                    assert result == "/app/conf/custom_models.json"
-
-                    # Verify it's not marked as inaccessible
-                    assert not result.startswith("/inaccessible/")
-
-            finally:
-                os.environ.clear()
-                os.environ.update(original_env)
-                import utils.security_config
-
-                importlib.reload(utils.security_config)
-                importlib.reload(utils.file_utils)
-
-    def test_security_boundary_comprehensive(self):
-        """Comprehensive test of all security boundaries in Docker environment."""
-
-        with tempfile.TemporaryDirectory() as tmpdir:
-            host_workspace = Path(tmpdir) / "host_workspace"
-            host_workspace.mkdir()
-            container_workspace = Path(tmpdir) / "container_workspace"
-            container_workspace.mkdir()
-
-            # Create a workspace file for testing
-            workspace_file = host_workspace / "project" / "main.py"
-            workspace_file.parent.mkdir(parents=True)
-            workspace_file.write_text("# workspace file")
-
-            original_env = os.environ.copy()
-            try:
-                os.environ["WORKSPACE_ROOT"] = str(host_workspace)
-
-                import utils.security_config
-
-                importlib.reload(utils.security_config)
-                importlib.reload(utils.file_utils)
-
-                with patch("utils.file_utils.CONTAINER_WORKSPACE", container_workspace):
-                    # Test cases: (path, should_be_allowed, description)
-                    test_cases = [
-                        # Allowed cases
-                        ("/app/conf/custom_models.json", True, "Exact allowed internal config"),
-                        (str(workspace_file), True, "Workspace file"),
-                        (str(container_workspace / "existing.py"), True, "Container path"),
-                        # Blocked cases
-                        ("/app/conf/", False, "Directory access"),
-                        ("/app/conf/other.json", False, "Different config file"),
-                        ("/app/conf/custom_models.json.backup", False, "Config with extra extension"),
-                        ("/app/server.py", False, "Application source"),
-                        ("/etc/passwd", False, "System file"),
-                        ("../../../etc/passwd", False, "Relative path traversal"),
-                        ("/app/conf/../server.py", False, "Path traversal through config dir"),
-                    ]
-
-                    for path, should_be_allowed, description in test_cases:
-                        result = translate_path_for_environment(path)
-
-                        if should_be_allowed:
-                            # Should either pass through unchanged or translate to container path
-                            assert not result.startswith(
-                                "/inaccessible/"
-                            ), f"{description}: {path} should be allowed but was blocked"
-                        else:
-                            # Should be blocked with inaccessible path
-                            assert result.startswith(
-                                "/inaccessible/"
-                            ), f"{description}: {path} should be blocked but got: {result}"
-
-            finally:
-                os.environ.clear()
-                os.environ.update(original_env)
-                import utils.security_config
-
-                importlib.reload(utils.security_config)
-                importlib.reload(utils.file_utils)
-
-    def test_exact_path_matching_prevents_wildcards(self):
-        """Test that using exact path matching prevents any wildcard-like behavior."""
-
-        with tempfile.TemporaryDirectory() as tmpdir:
-            host_workspace = Path(tmpdir) / "host_workspace"
-            host_workspace.mkdir()
-            container_workspace = Path(tmpdir) / "container_workspace"
-            container_workspace.mkdir()
-
-            original_env = os.environ.copy()
-            try:
-                os.environ["WORKSPACE_ROOT"] = str(host_workspace)
-
-                import utils.security_config
-
-                importlib.reload(utils.security_config)
-                importlib.reload(utils.file_utils)
-
-                with patch("utils.file_utils.CONTAINER_WORKSPACE", container_workspace):
-                    # Even subtle variations should be blocked
-                    subtle_variations = [
-                        "/app/conf/custom_models.jsonx",  # Extra char
-                        "/app/conf/custom_models.jso",  # Missing char
-                        "/app/conf/custom_models.JSON",  # Different case
-                        "/app/conf/custom_models.json ",  # Trailing space
-                        " /app/conf/custom_models.json",  # Leading space
-                        "/app/conf/./custom_models.json",  # Current dir reference
-                        "/app/conf/subdir/../custom_models.json",  # Up and down
-                    ]
-
-                    for variation in subtle_variations:
-                        result = translate_path_for_environment(variation)
-                        assert result.startswith(
-                            "/inaccessible/"
-                        ), f"Variation {variation} should be blocked but got: {result}"
-
-            finally:
-                os.environ.clear()
-                os.environ.update(original_env)
-                import utils.security_config
-
-                importlib.reload(utils.security_config)
-                importlib.reload(utils.file_utils)
-
-
-if __name__ == "__main__":
-    pytest.main([__file__, "-v"])
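These deleted tests documented the Docker-mode rule for internal paths: an exact string match against a small allow-list, with everything else pushed under an /inaccessible/ prefix. A condensed sketch of only that branch, with the set contents and prefix handling inferred from the assertions above (the real logic, including workspace translation, is in utils/file_utils.py):

```python
# Exact-match allow-list implied by the deleted assertions: no case folding,
# no whitespace trimming, no "."/".." normalization before the comparison.
ALLOWED_INTERNAL_PATHS = {"/app/conf/custom_models.json"}

def translate_internal_docker(path: str) -> str:
    if path in ALLOWED_INTERNAL_PATHS:
        return path  # passes through unchanged
    return "/inaccessible/" + path.lstrip("/")  # blocked, per the tests
```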
BIN
tests/triangle.png
Normal file
Binary file not shown. After: 51 KiB
@@ -87,7 +87,13 @@ class AnalyzeTool(BaseTool):
             },
             "use_websearch": {
                 "type": "boolean",
-                "description": "Enable web search for documentation, best practices, and current information. Particularly useful for: brainstorming sessions, architectural design discussions, exploring industry best practices, working with specific frameworks/technologies, researching solutions to complex problems, or when current documentation and community insights would enhance the analysis.",
+                "description": (
+                    "Enable web search for documentation, best practices, and current information. "
+                    "Particularly useful for: brainstorming sessions, architectural design discussions, "
+                    "exploring industry best practices, working with specific frameworks/technologies, "
+                    "researching solutions to complex problems, or when current documentation and "
+                    "community insights would enhance the analysis."
+                ),
                 "default": True,
             },
             "continuation_id": {
162
tools/base.py
@@ -27,6 +27,7 @@ if TYPE_CHECKING:

 from config import MCP_PROMPT_SIZE_LIMIT
 from providers import ModelProvider, ModelProviderRegistry
+from providers.base import ProviderType
 from utils import check_token_limit
 from utils.conversation_memory import (
     MAX_CONVERSATION_TURNS,
@@ -84,6 +85,17 @@ class ToolRequest(BaseModel):
             "additional findings, or answers to follow-up questions. Can be used across different tools."
         ),
     )
+    images: Optional[list[str]] = Field(
+        None,
+        description=(
+            "Optional image(s) for visual context. Accepts absolute file paths or "
+            "base64 data URLs. Only provide when user explicitly mentions images. "
+            "When including images, please describe what you believe each image contains "
+            "(e.g., 'screenshot of error dialog', 'architecture diagram', 'code snippet') "
+            "to aid with contextual understanding. Useful for UI discussions, diagrams, "
+            "visual problems, error screens, architecture mockups, and visual analysis tasks."
+        ),
+    )


 class BaseTool(ABC):
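For illustration, tool arguments exercising the new field might look like this (tool-agnostic; the paths and the truncated data URL are hypothetical):

```python
# Either form is accepted; the description above asks Claude to say what each image shows.
arguments = {
    "prompt": "The retry button overlaps the error message on narrow screens - screenshot attached.",
    "images": [
        "/Users/me/Desktop/error_dialog.png",               # absolute file path
        "data:image/png;base64,iVBORw0KGgoAAAANSUhEUg...",  # base64 data URL (truncated)
    ],
}
```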
@@ -981,6 +993,141 @@ When recommending searches, be specific about what information you need and why
             }
         return None

+    def _validate_image_limits(
+        self, images: Optional[list[str]], model_name: str, continuation_id: Optional[str] = None
+    ) -> Optional[dict]:
+        """
+        Validate image size against model capabilities at the MCP boundary.
+
+        This performs strict validation to ensure we don't exceed model-specific
+        image size limits. Uses capability-based validation with actual model
+        configuration rather than hard-coded limits.
+
+        Args:
+            images: List of image paths/data URLs to validate
+            model_name: Name of the model to check limits against
+
+        Returns:
+            Optional[dict]: Error response if validation fails, None if valid
+        """
+        if not images:
+            return None
+
+        # Get model capabilities to check image support and size limits
+        try:
+            provider = self.get_model_provider(model_name)
+            capabilities = provider.get_capabilities(model_name)
+        except Exception as e:
+            logger.warning(f"Failed to get capabilities for model {model_name}: {e}")
+            # Fall back to checking custom models configuration
+            capabilities = None
+
+        # Check if the model supports images at all
+        supports_images = False
+        max_size_mb = 0.0
+
+        if capabilities:
+            supports_images = capabilities.supports_images
+            max_size_mb = capabilities.max_image_size_mb
+        else:
+            # Fall back to custom models configuration
+            try:
+                import json
+                from pathlib import Path
+
+                custom_models_path = Path(__file__).parent.parent / "conf" / "custom_models.json"
+                if custom_models_path.exists():
+                    with open(custom_models_path) as f:
+                        custom_config = json.load(f)
+
+                    # Check if the model is in the custom models list
+                    for model_config in custom_config.get("models", []):
+                        if model_config.get("model_name") == model_name or model_name in model_config.get(
+                            "aliases", []
+                        ):
+                            supports_images = model_config.get("supports_images", False)
+                            max_size_mb = model_config.get("max_image_size_mb", 0.0)
+                            break
+            except Exception as e:
+                logger.warning(f"Failed to load custom models config: {e}")
+
+        # If the model doesn't support images, reject
+        if not supports_images:
+            return {
+                "status": "error",
+                "content": (
+                    f"Image support not available: Model '{model_name}' does not support image processing. "
+                    f"Please use a vision-capable model such as 'gemini-2.5-flash-preview-05-20', 'o3', "
+                    f"or 'claude-3-opus' for image analysis tasks."
+                ),
+                "content_type": "text",
+                "metadata": {
+                    "error_type": "validation_error",
+                    "model_name": model_name,
+                    "supports_images": False,
+                    "image_count": len(images),
+                },
+            }
+
+        # Calculate the total size of all images
+        total_size_mb = 0.0
+        for image_path in images:
+            try:
+                if image_path.startswith("data:image/"):
+                    # Handle data URL: data:image/png;base64,iVBORw0...
+                    _, data = image_path.split(",", 1)
+                    # Base64 encoding inflates size by ~33%, so decode to get the actual size
+                    import base64
+
+                    actual_size = len(base64.b64decode(data))
+                    total_size_mb += actual_size / (1024 * 1024)
+                else:
+                    # Handle file path
+                    if os.path.exists(image_path):
+                        file_size = os.path.getsize(image_path)
+                        total_size_mb += file_size / (1024 * 1024)
+                    else:
+                        logger.warning(f"Image file not found: {image_path}")
+                        # Assume a reasonable size for missing files to avoid breaking validation
+                        total_size_mb += 1.0  # 1MB assumption
+            except Exception as e:
+                logger.warning(f"Failed to get size for image {image_path}: {e}")
+                # Assume a reasonable size for problematic files
+                total_size_mb += 1.0  # 1MB assumption
+
+        # Apply 40MB cap for custom models as requested
+        effective_limit_mb = max_size_mb
+        if hasattr(capabilities, "provider") and capabilities.provider == ProviderType.CUSTOM:
+            effective_limit_mb = min(max_size_mb, 40.0)
+        elif not capabilities:  # Fallback case for custom models
+            effective_limit_mb = min(max_size_mb, 40.0)
+
+        # Validate against the size limit
+        if total_size_mb > effective_limit_mb:
+            return {
+                "status": "error",
+                "content": (
+                    f"Image size limit exceeded: Model '{model_name}' supports maximum {effective_limit_mb:.1f}MB "
+                    f"for all images combined, but {total_size_mb:.1f}MB was provided. "
+                    f"Please reduce image sizes or count and try again."
+                ),
+                "content_type": "text",
+                "metadata": {
+                    "error_type": "validation_error",
+                    "model_name": model_name,
+                    "total_size_mb": round(total_size_mb, 2),
+                    "limit_mb": round(effective_limit_mb, 2),
+                    "image_count": len(images),
+                    "supports_images": supports_images,
+                },
+            }
+
+        # All validations passed
+        logger.debug(f"Image validation passed: {len(images)} images")
+        return None
+
     def estimate_tokens_smart(self, file_path: str) -> int:
         """
         Estimate tokens for a file using file-type aware ratios.
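A note on the size math in `_validate_image_limits`: base64 text is roughly 4/3 the size of the bytes it encodes, which is why the validator decodes data URLs before counting megabytes. A quick standalone check of that arithmetic:

```python
import base64

raw = b"\x89PNG" + b"\x00" * (1024 * 1024)  # ~1 MiB of stand-in image bytes
encoded = base64.b64encode(raw)

print(len(encoded) / len(raw))                         # ~1.33: base64 inflation factor
print(len(base64.b64decode(encoded)) / (1024 * 1024))  # ~1.0: the MiB that count toward the limit
```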
@@ -1131,6 +1278,9 @@ When recommending searches, be specific about what information you need and why
             )
             return [TextContent(type="text", text=error_output.model_dump_json())]

+        # Extract and validate images from request
+        images = getattr(request, "images", None) or []
+
         # Check if we have continuation_id - if so, conversation history is already embedded
         continuation_id = getattr(request, "continuation_id", None)
@@ -1215,6 +1365,12 @@ When recommending searches, be specific about what information you need and why
         # Only set this after auto mode validation to prevent "auto" being used as a model name
         self._current_model_name = model_name

+        # Validate images at MCP boundary if any were provided
+        if images:
+            image_validation_error = self._validate_image_limits(images, model_name, continuation_id)
+            if image_validation_error:
+                return [TextContent(type="text", text=json.dumps(image_validation_error))]
+
         temperature = getattr(request, "temperature", None)
         if temperature is None:
             temperature = self.get_default_temperature()
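When that check fails, the caller gets the serialized error instead of a model response. The shape mirrors the validator above; the model name and numbers here are illustrative only:

```python
# Illustrative payload; values are made up, keys match _validate_image_limits.
image_validation_error = {
    "status": "error",
    "content": (
        "Image size limit exceeded: Model 'some-model' supports maximum 20.0MB "
        "for all images combined, but 27.5MB was provided. "
        "Please reduce image sizes or count and try again."
    ),
    "content_type": "text",
    "metadata": {
        "error_type": "validation_error",
        "model_name": "some-model",
        "total_size_mb": 27.5,
        "limit_mb": 20.0,
        "image_count": 3,
        "supports_images": True,
    },
}
```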
@@ -1247,6 +1403,7 @@ When recommending searches, be specific about what information you need and why
                 system_prompt=system_prompt,
                 temperature=temperature,
                 thinking_mode=thinking_mode if provider.supports_thinking_mode(model_name) else None,
+                images=images if images else None,  # Pass images via kwargs
             )

             logger.info(f"Received response from {provider.get_provider_type().value} API for {self.name}")
@@ -1298,6 +1455,7 @@ When recommending searches, be specific about what information you need and why
                 system_prompt=system_prompt,
                 temperature=temperature,
                 thinking_mode=thinking_mode if provider.supports_thinking_mode(model_name) else None,
+                images=images if images else None,  # Pass images via kwargs in retry too
             )

             if retry_response.content:
@@ -1398,6 +1556,7 @@ When recommending searches, be specific about what information you need and why
         continuation_id = getattr(request, "continuation_id", None)
         if continuation_id:
             request_files = getattr(request, "files", []) or []
+            request_images = getattr(request, "images", []) or []
             # Extract model metadata for conversation tracking
             model_provider = None
             model_name = None
@@ -1417,6 +1576,7 @@ When recommending searches, be specific about what information you need and why
                 "assistant",
                 formatted_content,
                 files=request_files,
+                images=request_images,
                 tool_name=self.name,
                 model_provider=model_provider,
                 model_name=model_name,
@@ -1519,6 +1679,7 @@ When recommending searches, be specific about what information you need and why
         # Use actually processed files from file preparation instead of original request files
         # This ensures directories are tracked as their individual expanded files
         request_files = getattr(self, "_actually_processed_files", []) or getattr(request, "files", []) or []
+        request_images = getattr(request, "images", []) or []
         # Extract model metadata
         model_provider = None
         model_name = None
@@ -1538,6 +1699,7 @@ When recommending searches, be specific about what information you need and why
             "assistant",
             content,
             files=request_files,
+            images=request_images,
             tool_name=self.name,
             model_provider=model_provider,
             model_name=model_name,
@@ -20,12 +20,25 @@ class ChatRequest(ToolRequest):

     prompt: str = Field(
         ...,
-        description="Your question, topic, or current thinking to discuss",
+        description=(
+            "Your thorough, expressive question with as much context as possible. Remember: you're talking to "
+            "another Claude assistant who has deep expertise and can provide nuanced insights. Include your "
+            "current thinking, specific challenges, background context, what you've already tried, and what "
+            "kind of response would be most helpful. The more context and detail you provide, the more "
+            "valuable and targeted the response will be."
+        ),
     )
     files: Optional[list[str]] = Field(
         default_factory=list,
         description="Optional files for context (must be absolute paths)",
     )
+    images: Optional[list[str]] = Field(
+        default_factory=list,
+        description=(
+            "Optional images for visual context. Useful for UI discussions, diagrams, visual problems, "
+            "error screens, or architectural mockups."
+        ),
+    )


 class ChatTool(BaseTool):
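A quick sanity check that the new field round-trips through the Pydantic request model (the import path and file paths are assumed for illustration):

```python
from tools.chat import ChatRequest  # assumed module path

request = ChatRequest(
    prompt="The login dialog renders off-screen on small displays - screenshot attached for reference.",
    files=["/Users/me/project/ui/dialog.py"],
    images=["/Users/me/Desktop/login_dialog.png"],
)
assert request.images == ["/Users/me/Desktop/login_dialog.png"]
```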
@@ -42,7 +55,8 @@ class ChatTool(BaseTool):
             "Also great for: explanations, comparisons, general development questions. "
             "Use this when you want to ask questions, brainstorm ideas, get opinions, discuss topics, "
             "share your thinking, or need explanations about concepts and approaches. "
-            "Note: If you're not currently using a top-tier model such as Opus 4 or above, these tools can provide enhanced capabilities."
+            "Note: If you're not currently using a top-tier model such as Opus 4 or above, these tools can "
+            "provide enhanced capabilities."
         )

     def get_input_schema(self) -> dict[str, Any]:
@@ -51,13 +65,27 @@ class ChatTool(BaseTool):
             "properties": {
                 "prompt": {
                     "type": "string",
-                    "description": "Your question, topic, or current thinking to discuss",
+                    "description": (
+                        "Your thorough, expressive question with as much context as possible. Remember: you're "
+                        "talking to another Claude assistant who has deep expertise and can provide nuanced "
+                        "insights. Include your current thinking, specific challenges, background context, what "
+                        "you've already tried, and what kind of response would be most helpful. The more context "
+                        "and detail you provide, the more valuable and targeted the response will be."
+                    ),
                 },
                 "files": {
                     "type": "array",
                     "items": {"type": "string"},
                     "description": "Optional files for context (must be absolute paths)",
                 },
+                "images": {
+                    "type": "array",
+                    "items": {"type": "string"},
+                    "description": (
+                        "Optional images for visual context. Useful for UI discussions, diagrams, visual "
+                        "problems, error screens, or architectural mockups."
+                    ),
+                },
                 "model": self.get_model_field_schema(),
                 "temperature": {
                     "type": "number",
@@ -68,16 +96,29 @@ class ChatTool(BaseTool):
                 "thinking_mode": {
                     "type": "string",
                     "enum": ["minimal", "low", "medium", "high", "max"],
-                    "description": "Thinking depth: minimal (0.5% of model max), low (8%), medium (33%), high (67%), max (100% of model max)",
+                    "description": (
+                        "Thinking depth: minimal (0.5% of model max), low (8%), medium (33%), high (67%), "
+                        "max (100% of model max)"
+                    ),
                 },
                 "use_websearch": {
                     "type": "boolean",
-                    "description": "Enable web search for documentation, best practices, and current information. Particularly useful for: brainstorming sessions, architectural design discussions, exploring industry best practices, working with specific frameworks/technologies, researching solutions to complex problems, or when current documentation and community insights would enhance the analysis.",
+                    "description": (
+                        "Enable web search for documentation, best practices, and current information. "
+                        "Particularly useful for: brainstorming sessions, architectural design discussions, "
+                        "exploring industry best practices, working with specific frameworks/technologies, "
+                        "researching solutions to complex problems, or when current documentation and "
+                        "community insights would enhance the analysis."
+                    ),
                     "default": True,
                 },
                 "continuation_id": {
                     "type": "string",
-                    "description": "Thread continuation ID for multi-turn conversations. Can be used to continue conversations across different tools. Only provide this if continuing a previous conversation thread.",
+                    "description": (
+                        "Thread continuation ID for multi-turn conversations. Can be used to continue "
+                        "conversations across different tools. Only provide this if continuing a previous "
+                        "conversation thread."
+                    ),
                 },
             },
             "required": ["prompt"] + (["model"] if self.is_effective_auto_mode() else []),
@@ -157,4 +198,7 @@ Please provide a thoughtful, comprehensive response:"""

     def format_response(self, response: str, request: ChatRequest, model_info: Optional[dict] = None) -> str:
         """Format the chat response"""
-        return f"{response}\n\n---\n\n**Claude's Turn:** Evaluate this perspective alongside your analysis to form a comprehensive solution and continue with the user's request and task at hand."
+        return (
+            f"{response}\n\n---\n\n**Claude's Turn:** Evaluate this perspective alongside your analysis to "
+            "form a comprehensive solution and continue with the user's request and task at hand."
+        )
@@ -41,6 +41,10 @@ class CodeReviewRequest(ToolRequest):
         ...,
         description="User's summary of what the code does, expected behavior, constraints, and review objectives",
     )
+    images: Optional[list[str]] = Field(
+        None,
+        description="Optional images of architecture diagrams, UI mockups, design documents, or visual references for code review context",
+    )
     review_type: str = Field("full", description="Type of review: full|security|performance|quick")
     focus_on: Optional[str] = Field(
         None,
@@ -94,6 +98,11 @@ class CodeReviewTool(BaseTool):
                     "type": "string",
                     "description": "User's summary of what the code does, expected behavior, constraints, and review objectives",
                 },
+                "images": {
+                    "type": "array",
+                    "items": {"type": "string"},
+                    "description": "Optional images of architecture diagrams, UI mockups, design documents, or visual references for code review context",
+                },
                 "review_type": {
                     "type": "string",
                     "enum": ["full", "security", "performance", "quick"],
@@ -24,6 +24,10 @@ class DebugIssueRequest(ToolRequest):
         None,
         description="Files or directories that might be related to the issue (must be absolute paths)",
     )
+    images: Optional[list[str]] = Field(
+        None,
+        description="Optional images showing error screens, UI issues, log displays, or visual debugging information",
+    )
     runtime_info: Optional[str] = Field(None, description="Environment, versions, or runtime information")
     previous_attempts: Optional[str] = Field(None, description="What has been tried already")

@@ -69,6 +73,11 @@ class DebugIssueTool(BaseTool):
                     "items": {"type": "string"},
                     "description": "Files or directories that might be related to the issue (must be absolute paths)",
                 },
+                "images": {
+                    "type": "array",
+                    "items": {"type": "string"},
+                    "description": "Optional images showing error screens, UI issues, log displays, or visual debugging information",
+                },
                 "runtime_info": {
                     "type": "string",
                     "description": "Environment, versions, or runtime information",
@@ -78,6 +78,10 @@ class PrecommitRequest(ToolRequest):
         None,
         description="Optional files or directories to provide as context (must be absolute paths). These files are not part of the changes but provide helpful context like configs, docs, or related code.",
     )
+    images: Optional[list[str]] = Field(
+        None,
+        description="Optional images showing expected UI changes, design requirements, or visual references for the changes being validated",
+    )


 class Precommit(BaseTool):
@@ -170,6 +174,11 @@ class Precommit(BaseTool):
                     "items": {"type": "string"},
                     "description": "Optional files or directories to provide as context (must be absolute paths). These files are not part of the changes but provide helpful context like configs, docs, or related code.",
                 },
+                "images": {
+                    "type": "array",
+                    "items": {"type": "string"},
+                    "description": "Optional images showing expected UI changes, design requirements, or visual references for the changes being validated",
+                },
                 "use_websearch": {
                     "type": "boolean",
                     "description": "Enable web search for documentation, best practices, and current information. Particularly useful for: brainstorming sessions, architectural design discussions, exploring industry best practices, working with specific frameworks/technologies, researching solutions to complex problems, or when current documentation and community insights would enhance the analysis.",
@@ -33,6 +33,10 @@ class ThinkDeepRequest(ToolRequest):
         None,
         description="Optional file paths or directories for additional context (must be absolute paths)",
     )
+    images: Optional[list[str]] = Field(
+        None,
+        description="Optional images for visual analysis - diagrams, charts, system architectures, or any visual information to analyze",
+    )


 class ThinkDeepTool(BaseTool):
@@ -60,7 +64,13 @@ class ThinkDeepTool(BaseTool):
             "properties": {
                 "prompt": {
                     "type": "string",
-                    "description": "Your current thinking/analysis to extend and validate. IMPORTANT: Before using this tool, Claude MUST first think deeply and establish a deep understanding of the topic and question by thinking through all relevant details, context, constraints, and implications. Share these extended thoughts and ideas in the prompt so the model has comprehensive information to work with for the best analysis.",
+                    "description": (
+                        "Your current thinking/analysis to extend and validate. IMPORTANT: Before using this tool, "
+                        "Claude MUST first think deeply and establish a deep understanding of the topic and question "
+                        "by thinking through all relevant details, context, constraints, and implications. Share "
+                        "these extended thoughts and ideas in the prompt so the model has comprehensive information "
+                        "to work with for the best analysis."
+                    ),
                 },
                 "model": self.get_model_field_schema(),
                 "problem_context": {
@@ -77,6 +87,11 @@ class ThinkDeepTool(BaseTool):
                     "items": {"type": "string"},
                     "description": "Optional file paths or directories for additional context (must be absolute paths)",
                 },
+                "images": {
+                    "type": "array",
+                    "items": {"type": "string"},
+                    "description": "Optional images for visual analysis - diagrams, charts, system architectures, or any visual information to analyze",
+                },
                 "temperature": {
                     "type": "number",
                     "description": "Temperature for creative thinking (0-1, default 0.7)",
@@ -22,11 +22,29 @@ class TracerRequest(ToolRequest):

     prompt: str = Field(
         ...,
-        description="Detailed description of what to trace and WHY you need this analysis. Include context about what you're trying to understand, debug, or analyze. For precision mode: describe the specific method/function and what aspect of its execution flow you need to understand. For dependencies mode: describe the class/module and what relationships you need to map. Example: 'I need to understand how BookingManager.finalizeInvoice method is called throughout the system and what side effects it has, as I'm debugging payment processing issues' rather than just 'BookingManager finalizeInvoice method'",
+        description=(
+            "Detailed description of what to trace and WHY you need this analysis. Include context about what "
+            "you're trying to understand, debug, or analyze. For precision mode: describe the specific "
+            "method/function and what aspect of its execution flow you need to understand. For dependencies "
+            "mode: describe the class/module and what relationships you need to map. Example: 'I need to "
+            "understand how BookingManager.finalizeInvoice method is called throughout the system and what "
+            "side effects it has, as I'm debugging payment processing issues' rather than just "
+            "'BookingManager finalizeInvoice method'"
+        ),
     )
     trace_mode: Literal["precision", "dependencies"] = Field(
         ...,
-        description="Trace mode: 'precision' (for methods/functions - shows execution flow and usage patterns) or 'dependencies' (for classes/modules/protocols - shows structural relationships)",
+        description=(
+            "Trace mode: 'precision' (for methods/functions - shows execution flow and usage patterns) or "
+            "'dependencies' (for classes/modules/protocols - shows structural relationships)"
+        ),
+    )
+    images: list[str] = Field(
+        default_factory=list,
+        description=(
+            "Optional images of system architecture diagrams, flow charts, or visual references to help "
+            "understand the tracing context"
+        ),
     )
@@ -44,11 +62,15 @@ class TracerTool(BaseTool):
     def get_description(self) -> str:
         return (
             "ANALYSIS PROMPT GENERATOR - Creates structured prompts for static code analysis. "
-            "Helps generate detailed analysis requests with specific method/function names, file paths, and component context. "
-            "Type 'precision': For methods/functions - traces execution flow, call chains, call stacks, and shows when/how they are used. "
-            "Type 'dependencies': For classes/modules/protocols - maps structural relationships and bidirectional dependencies. "
+            "Helps generate detailed analysis requests with specific method/function names, file paths, and "
+            "component context. "
+            "Type 'precision': For methods/functions - traces execution flow, call chains, call stacks, and "
+            "shows when/how they are used. "
+            "Type 'dependencies': For classes/modules/protocols - maps structural relationships and "
+            "bidirectional dependencies. "
             "Returns detailed instructions on how to perform the analysis and format the results. "
-            "Use this to create focused analysis requests that can be fed back to Claude with the appropriate code files. "
+            "Use this to create focused analysis requests that can be fed back to Claude with the appropriate "
+            "code files. "
         )

     def get_input_schema(self) -> dict[str, Any]:
@@ -57,13 +79,26 @@ class TracerTool(BaseTool):
             "properties": {
                 "prompt": {
                     "type": "string",
-                    "description": "Detailed description of what to trace and WHY you need this analysis. Include context about what you're trying to understand, debug, or analyze. For precision mode: describe the specific method/function and what aspect of its execution flow you need to understand. For dependencies mode: describe the class/module and what relationships you need to map. Example: 'I need to understand how BookingManager.finalizeInvoice method is called throughout the system and what side effects it has, as I'm debugging payment processing issues' rather than just 'BookingManager finalizeInvoice method'",
+                    "description": (
+                        "Detailed description of what to trace and WHY you need this analysis. Include context "
+                        "about what you're trying to understand, debug, or analyze. For precision mode: describe "
+                        "the specific method/function and what aspect of its execution flow you need to understand. "
+                        "For dependencies mode: describe the class/module and what relationships you need to map. "
+                        "Example: 'I need to understand how BookingManager.finalizeInvoice method is called "
+                        "throughout the system and what side effects it has, as I'm debugging payment processing "
+                        "issues' rather than just 'BookingManager finalizeInvoice method'"
+                    ),
                 },
                 "trace_mode": {
                     "type": "string",
                     "enum": ["precision", "dependencies"],
                     "description": "Trace mode: 'precision' (for methods/functions - shows execution flow and usage patterns) or 'dependencies' (for classes/modules/protocols - shows structural relationships)",
                 },
+                "images": {
+                    "type": "array",
+                    "items": {"type": "string"},
+                    "description": "Optional images of system architecture diagrams, flow charts, or visual references to help understand the tracing context",
+                },
             },
             "required": ["prompt", "trace_mode"],
         }
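For illustration, a tracer request using the new field; the import path and image path are assumptions, and the prompt reuses the description's own example:

```python
from tools.tracer import TracerRequest  # assumed module path

request = TracerRequest(
    prompt=(
        "I need to understand how BookingManager.finalizeInvoice is called throughout the system "
        "and what side effects it has, as I'm debugging payment processing issues"
    ),
    trace_mode="precision",
    images=["/Users/me/Desktop/booking_flow.png"],  # hypothetical architecture diagram
)
```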
@@ -142,6 +142,7 @@ class ConversationTurn(BaseModel):
         content: The actual message content/response
         timestamp: ISO timestamp when this turn was created
         files: List of file paths referenced in this specific turn
+        images: List of image paths referenced in this specific turn
         tool_name: Which tool generated this turn (for cross-tool tracking)
         model_provider: Provider used (e.g., "google", "openai")
         model_name: Specific model used (e.g., "gemini-2.5-flash-preview-05-20", "o3-mini")
@@ -152,6 +153,7 @@ class ConversationTurn(BaseModel):
     content: str
     timestamp: str
     files: Optional[list[str]] = None  # Files referenced in this turn
+    images: Optional[list[str]] = None  # Images referenced in this turn
     tool_name: Optional[str] = None  # Tool used for this turn
     model_provider: Optional[str] = None  # Model provider (google, openai, etc)
     model_name: Optional[str] = None  # Specific model used
@@ -300,6 +302,7 @@ def add_turn(
     role: str,
     content: str,
     files: Optional[list[str]] = None,
+    images: Optional[list[str]] = None,
     tool_name: Optional[str] = None,
     model_provider: Optional[str] = None,
     model_name: Optional[str] = None,
@@ -318,6 +321,7 @@ def add_turn(
         role: "user" (Claude) or "assistant" (Gemini/O3/etc)
         content: The actual message/response content
         files: Optional list of files referenced in this turn
+        images: Optional list of images referenced in this turn
         tool_name: Name of the tool adding this turn (for attribution)
         model_provider: Provider used (e.g., "google", "openai")
         model_name: Specific model used (e.g., "gemini-2.5-flash-preview-05-20", "o3-mini")
@@ -335,6 +339,7 @@ def add_turn(
         - Refreshes thread TTL to configured timeout on successful update
         - Turn limits prevent runaway conversations
         - File references are preserved for cross-tool access with atomic ordering
+        - Image references are preserved for cross-tool visual context
         - Model information enables cross-provider conversations
     """
     logger.debug(f"[FLOW] Adding {role} turn to {thread_id} ({tool_name})")
@@ -355,6 +360,7 @@ def add_turn(
         content=content,
         timestamp=datetime.now(timezone.utc).isoformat(),
         files=files,  # Preserved for cross-tool file context
+        images=images,  # Preserved for cross-tool visual context
         tool_name=tool_name,  # Track which tool generated this turn
         model_provider=model_provider,  # Track model provider
         model_name=model_name,  # Track specific model
@@ -489,6 +495,78 @@ def get_conversation_file_list(context: ThreadContext) -> list[str]:
     return file_list


+def get_conversation_image_list(context: ThreadContext) -> list[str]:
+    """
+    Extract all unique images from conversation turns with newest-first prioritization.
+
+    This function implements the identical prioritization logic as get_conversation_file_list()
+    to ensure consistency in how images are handled across conversation turns. It walks
+    backwards through conversation turns (from newest to oldest) and collects unique image
+    references, ensuring that when the same image appears in multiple turns, the reference
+    from the NEWEST turn takes precedence.
+
+    PRIORITIZATION ALGORITHM:
+    1. Iterate through turns in REVERSE order (index len-1 down to 0)
+    2. For each turn, process images in the order they appear in turn.images
+    3. Add image to result list only if not already seen (newest reference wins)
+    4. Skip duplicate images that were already added from newer turns
+
+    This ensures that:
+    - Images from newer conversation turns appear first in the result
+    - When the same image is referenced multiple times, only the newest reference is kept
+    - The order reflects the most recent conversation context
+
+    Example:
+        Turn 1: images = ["diagram.png", "flow.jpg"]
+        Turn 2: images = ["error.png"]
+        Turn 3: images = ["diagram.png", "updated.png"]  # diagram.png appears again
+
+        Result: ["diagram.png", "updated.png", "error.png", "flow.jpg"]
+        (diagram.png from Turn 3 takes precedence over Turn 1)
+
+    Args:
+        context: ThreadContext containing all conversation turns to process
+
+    Returns:
+        list[str]: Unique image paths ordered by newest reference first.
+                   Empty list if no turns exist or no images are referenced.
+
+    Performance:
+        - Time Complexity: O(n*m) where n=turns, m=avg images per turn
+        - Space Complexity: O(i) where i=total unique images
+        - Uses set for O(1) duplicate detection
+    """
+    if not context.turns:
+        logger.debug("[IMAGES] No turns found, returning empty image list")
+        return []
+
+    # Collect images by walking backwards (newest to oldest turns)
+    seen_images = set()
+    image_list = []
+
+    logger.debug(f"[IMAGES] Collecting images from {len(context.turns)} turns (newest first)")
+
+    # Process turns in reverse order (newest first) - this is the CORE of newest-first prioritization
+    # By iterating from len-1 down to 0, we encounter newer turns before older turns
+    # When we find a duplicate image, we skip it because the newer version is already in our list
+    for i in range(len(context.turns) - 1, -1, -1):  # REVERSE: newest turn first
+        turn = context.turns[i]
+        if turn.images:
+            logger.debug(f"[IMAGES] Turn {i + 1} has {len(turn.images)} images: {turn.images}")
+            for image_path in turn.images:
+                if image_path not in seen_images:
+                    # First time seeing this image - add it (this is the NEWEST reference)
+                    seen_images.add(image_path)
+                    image_list.append(image_path)
+                    logger.debug(f"[IMAGES] Added new image: {image_path} (from turn {i + 1})")
+                else:
+                    # Image already seen from a NEWER turn - skip this older reference
+                    logger.debug(f"[IMAGES] Skipping duplicate image: {image_path} (newer version already included)")
+
+    logger.debug(f"[IMAGES] Final image list ({len(image_list)}): {image_list}")
+    return image_list
+
+
 def _plan_file_inclusion_by_size(all_files: list[str], max_file_tokens: int) -> tuple[list[str], list[str], int]:
     """
     Plan which files to include based on size constraints.
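The docstring's example can be checked with a minimal stand-in for the turn objects; this is a sketch, not the real `ThreadContext`:

```python
from types import SimpleNamespace

turns = [
    SimpleNamespace(images=["diagram.png", "flow.jpg"]),     # Turn 1 (oldest)
    SimpleNamespace(images=["error.png"]),                   # Turn 2
    SimpleNamespace(images=["diagram.png", "updated.png"]),  # Turn 3 (newest)
]

seen, result = set(), []
for turn in reversed(turns):  # newest turn first
    for image in turn.images or []:
        if image not in seen:  # newest reference wins; older duplicates are skipped
            seen.add(image)
            result.append(image)

print(result)  # ['diagram.png', 'updated.png', 'error.png', 'flow.jpg']
```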
@@ -88,8 +88,9 @@ TEXT_DATA = {
     ".lock",  # Lock files
 }

-# Image file extensions
-IMAGES = {".jpg", ".jpeg", ".png", ".gif", ".bmp", ".svg", ".webp", ".ico", ".tiff", ".tif"}
+# Image file extensions - limited to what AI models actually support
+# Based on OpenAI and Gemini supported formats: PNG, JPEG, GIF, WebP
+IMAGES = {".jpg", ".jpeg", ".png", ".gif", ".webp"}

 # Binary executable and library extensions
 BINARIES = {
@@ -240,3 +241,30 @@ def get_token_estimation_ratio(file_path: str) -> float:

     extension = Path(file_path).suffix.lower()
     return TOKEN_ESTIMATION_RATIOS.get(extension, 3.5)  # Conservative default
+
+
+# MIME type mappings for image files - limited to what AI models actually support
+# Based on OpenAI and Gemini supported formats: PNG, JPEG, GIF, WebP
+IMAGE_MIME_TYPES = {
+    ".jpg": "image/jpeg",
+    ".jpeg": "image/jpeg",
+    ".png": "image/png",
+    ".gif": "image/gif",
+    ".webp": "image/webp",
+}
+
+
+def get_image_mime_type(extension: str) -> str:
+    """
+    Get the MIME type for an image file extension.
+
+    Args:
+        extension: File extension (with or without leading dot)
+
+    Returns:
+        MIME type string (default: image/jpeg for unknown extensions)
+    """
+    if not extension.startswith("."):
+        extension = "." + extension
+    extension = extension.lower()
+    return IMAGE_MIME_TYPES.get(extension, "image/jpeg")
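Usage of the new helper, including the dot/case normalization and the documented JPEG fallback:

```python
from utils.file_types import get_image_mime_type

print(get_image_mime_type(".png"))   # image/png
print(get_image_mime_type("WEBP"))   # image/webp - leading dot added, case folded
print(get_image_mime_type(".tiff"))  # image/jpeg - unknown extension falls back to the default
```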
@@ -48,6 +48,36 @@ from .file_types import BINARY_EXTENSIONS, CODE_EXTENSIONS, IMAGE_EXTENSIONS, TE
 from .security_config import CONTAINER_WORKSPACE, EXCLUDED_DIRS, MCP_SIGNATURE_FILES, SECURITY_ROOT, WORKSPACE_ROOT
 from .token_utils import DEFAULT_CONTEXT_WINDOW, estimate_tokens


+def _is_builtin_custom_models_config(path_str: str) -> bool:
+    """
+    Check if path points to the server's built-in custom_models.json config file.
+
+    This only matches the server's internal config, not user-specified CUSTOM_MODELS_CONFIG_PATH.
+    We identify the built-in config by checking if it resolves to the server's conf directory.
+
+    Args:
+        path_str: Path to check
+
+    Returns:
+        True if this is the server's built-in custom_models.json config file
+    """
+    try:
+        path = Path(path_str)
+
+        # Get the server root by going up from this file: utils/file_utils.py -> server_root
+        server_root = Path(__file__).parent.parent
+        builtin_config = server_root / "conf" / "custom_models.json"
+
+        # Check if the path resolves to the same file as our built-in config
+        # This handles both relative and absolute paths to the same file
+        return path.resolve() == builtin_config.resolve()
+
+    except Exception:
+        # If path resolution fails, it's not our built-in config
+        return False
+
+
 logger = logging.getLogger(__name__)
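The resolution-based equality above is also why the test suite's `./` and `subdir/..` variants end up blocked: in the test environment the server root is a repo checkout rather than `/app`, so no `/app/...` candidate can resolve to the built-in config. A self-contained sketch of the comparison itself, with a purely hypothetical location:

```python
from pathlib import Path

# Hypothetical built-in config location, for illustration only
builtin = Path("/srv/zen/conf/custom_models.json")

def is_builtin(candidate: str) -> bool:
    # Exact equality of fully resolved paths: '.' and '..' segments normalize
    # away, but extra characters, case changes, and stray whitespace produce a
    # different path and therefore fail the check.
    try:
        return Path(candidate).resolve() == builtin.resolve()
    except OSError:
        return False

print(is_builtin("/srv/zen/conf/./custom_models.json"))  # True: normalizes to the same file
print(is_builtin("/srv/zen/conf/custom_models.jsonx"))   # False: extra character
print(is_builtin(" /srv/zen/conf/custom_models.json"))   # False: leading space is part of the name
```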
@@ -271,7 +301,8 @@ def translate_path_for_environment(path_str: str) -> str:
     tools and utilities throughout the codebase. It handles:
     1. Docker host-to-container path translation (host paths -> /workspace/...)
     2. Direct mode (no translation needed)
-    3. Security validation and error handling
+    3. Internal server files (conf/custom_models.json)
+    4. Security validation and error handling

     Docker Path Translation Logic:
     - Input: /Users/john/project/src/file.py (host path from Claude)
@@ -284,32 +315,9 @@ def translate_path_for_environment(path_str: str) -> str:
     Returns:
         Translated path appropriate for the current environment
     """
-    # Allow access to specific internal application configuration files
-    # Store as relative paths so they work in both Docker and standalone modes
-    # Use exact paths for security - no wildcards or prefix matching
-    ALLOWED_INTERNAL_PATHS = {
-        "conf/custom_models.json",
-        # Add other specific internal files here as needed
-    }
-
-    # Check for internal app paths - extract relative part if it's an /app/ path
-    relative_internal_path = None
-    if path_str.startswith("/app/"):
-        relative_internal_path = path_str[5:]  # Remove "/app/" prefix
-        if relative_internal_path.startswith("/"):
-            relative_internal_path = relative_internal_path[1:]  # Remove leading slash if present
-
-    # Check if this is an allowed internal file
-    if relative_internal_path and relative_internal_path in ALLOWED_INTERNAL_PATHS:
-        # Translate to appropriate path for current environment
-        if not WORKSPACE_ROOT or not WORKSPACE_ROOT.strip() or not CONTAINER_WORKSPACE.exists():
-            # Standalone mode: use relative path
-            return "./" + relative_internal_path
-        else:
-            # Docker mode: use absolute app path
-            return "/app/" + relative_internal_path
-
-    # Handle other /app/ paths in standalone mode (for non-whitelisted files)
+    # Handle built-in server config file - no translation needed
+    if _is_builtin_custom_models_config(path_str):
+        return path_str
     if not WORKSPACE_ROOT or not WORKSPACE_ROOT.strip() or not CONTAINER_WORKSPACE.exists():
         if path_str.startswith("/app/"):
             # Convert Docker internal paths to local relative paths for standalone mode