- OpenAI provider now checks custom models registry for user configurations
- Custom models with supports_temperature=false no longer send temperature to API
- Fixes 400 errors for custom o3/gpt-5 models configured without temperature support
- Added comprehensive tests to verify the fix works correctly
- Maintains backward compatibility with built-in models
Fixes #245
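A minimal sketch of the behaviour described above. It assumes a hypothetical `build_request_params` helper and a plain dict standing in for the custom models registry; the actual provider code is not part of this message.

```python
from typing import Any, Optional

# Hypothetical stand-in for entries loaded from conf/custom_models.json.
CUSTOM_MODELS: dict[str, dict[str, Any]] = {
    "my-o3": {"supports_temperature": False},
    "my-chat-model": {"supports_temperature": True},
}


def build_request_params(model: str, temperature: Optional[float]) -> dict[str, Any]:
    """Build API parameters, omitting temperature when the model rejects it."""
    params: dict[str, Any] = {"model": model}
    config = CUSTOM_MODELS.get(model, {})
    # Assumption of this sketch: models without an entry keep sending temperature.
    if temperature is not None and config.get("supports_temperature", True):
        params["temperature"] = temperature
    return params
```

With this shape, a custom model flagged `supports_temperature=false` never carries a `temperature` field, which is the condition that previously produced the 400 errors.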
- Fix ModelContext constructor call in consensus tool (remove invalid parameters)
- Refactor temperature pattern matching for better readability per code review
- All tests now passing (799/799 passed)
- Fix consensus tool hardcoded temperature=0.2 bypassing model capabilities
- Add intelligent temperature inference for unknown custom models
- Support multi-model collaboration (O3, Gemini, Claude, Mistral, DeepSeek)
- Only OpenAI O-series and DeepSeek reasoner models reject temperature
- Most reasoning models (Gemini Pro, Claude, Mistral) DO support temperature
- Comprehensive logging for temperature decisions and user guidance
Resolves: https://github.com/BeehiveInnovations/zen-mcp-server/issues/245
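One way the name-based inference could look, as a rough sketch only. The function name and the regular expressions are assumptions made for illustration; they encode the rule stated above that only OpenAI O-series and DeepSeek reasoner models reject temperature.

```python
import re

# Hypothetical patterns for models that reject the temperature parameter.
_NO_TEMPERATURE_PATTERNS = [
    # OpenAI O-series: "o3", "o3-pro", "openai/o4-mini", ...
    re.compile(r"(^|[/_-])o\d+($|[-_])", re.IGNORECASE),
    # DeepSeek reasoner variants.
    re.compile(r"deepseek.*reasoner", re.IGNORECASE),
]


def infer_supports_temperature(model_name: str) -> bool:
    """Guess temperature support for a model that has no explicit configuration."""
    return not any(p.search(model_name) for p in _NO_TEMPERATURE_PATTERNS)
```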
- Add GEMINI_BASE_URL configuration option in .env.example
- Implement custom endpoint support in GeminiModelProvider using HttpOptions
- Update registry to pass base_url parameter to Gemini provider
- Maintain backward compatibility - uses default Google endpoint when not configured
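The optional endpoint override can be sketched as follows. `gemini_client_kwargs` is a hypothetical helper, and a plain dict stands in for the SDK's `HttpOptions` object so that the example runs without third-party packages.

```python
import os
from typing import Any, Mapping, Optional


def gemini_client_kwargs(api_key: str, env: Optional[Mapping[str, str]] = None) -> dict[str, Any]:
    """Return client keyword arguments, adding a custom endpoint only when configured."""
    env = os.environ if env is None else env
    kwargs: dict[str, Any] = {"api_key": api_key}
    base_url = env.get("GEMINI_BASE_URL", "").strip()
    if base_url:
        # Stand-in for HttpOptions; the real provider would build that object here.
        kwargs["http_options"] = {"base_url": base_url}
    return kwargs
```

When `GEMINI_BASE_URL` is unset or empty, no `http_options` entry is produced, so the default Google endpoint applies.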
This commit updates all references to Claude Opus 4 and Sonnet 4 to their newer 4.1 versions throughout the codebase.
The changes include:
- Updating model names in `conf/custom_models.json` and `providers/dial.py`.
- Updating aliases and descriptions to match the new model versions.
- Updating `.env.example` to reflect the new model names.
- Updating all relevant test suites to use the new model names and ensure all tests pass.
- Remove redundant path checks between Path("conf/custom_models.json") and Path.cwd() variants
- Implement proper importlib.resources.files('conf') approach for robust packaging
- Create conf/__init__.py to make conf a proper Python package
- Update pyproject.toml to include conf* in package discovery
- Clean up verbose comments and simplify resource loading logic
- Fix test mocking to use correct importlib.resources.files target
- All tests passing (8/8) with proper resource and fallback functionality
Addresses all gemini-code-assist bot feedback from PR #227
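A sketch of the resource-first loading with a file system fallback, under stated assumptions: `load_custom_models` and its parameters are hypothetical names, and the exception list is a guess at what a failed resource lookup may raise.

```python
import json
from importlib import resources
from pathlib import Path
from typing import Any


def load_custom_models(
    package: str = "conf",
    filename: str = "custom_models.json",
    fallback_dir: Path = Path("conf"),
) -> dict[str, Any]:
    """Load the config from package resources, falling back to the file system."""
    try:
        resource = resources.files(package).joinpath(filename)
        return json.loads(resource.read_text(encoding="utf-8"))
    except (ImportError, OSError, TypeError):
        # Package not importable or resource missing: try the file system next.
        pass
    try:
        return json.loads((fallback_dir / filename).read_text(encoding="utf-8"))
    except FileNotFoundError:
        # Missing config is handled gracefully with an empty result.
        return {}
```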
Improvements based on gemini-code-assist bot feedback:
1. **Proper importlib.resources implementation:**
- Use files("providers") / "../conf/custom_models.json" for resource loading
- Prioritize resource loading over file system paths for packaged environments
- Maintain backward compatibility with explicit config paths and env variables
2. **Remove redundant path checks:**
- Eliminated duplicate Path("conf/custom_models.json") and Path.cwd() / "conf/custom_models.json"
- Streamlined fallback logic to development path + working directory only
3. **Enhanced test coverage:**
- Mock-based testing of actual fallback scenarios with Path.exists
- Proper resource loading simulation and failure testing
- Comprehensive coverage of both resource and file system modes
4. **Robust error handling:**
- Graceful fallback from resources to file system when resource loading fails
- Clear logging of which loading method is being used
- Better error messages indicating resource vs file system loading
The implementation now follows Python packaging best practices using importlib.resources
while maintaining full backward compatibility and robust fallback behavior.
Tested: All 8 test cases pass, resource loading works in development,
file system fallback works when resources fail.
Resolves issues #203, #186, #206, #185 where OpenRouter model registry
completely failed to load in uvx installations due to inaccessible
conf/custom_models.json file.
Changes:
- Implement multiple path resolution strategy in OpenRouterModelRegistry
- Development: Path(__file__).parent.parent / "conf" / "custom_models.json"
- UVX working dir: Path("conf/custom_models.json")
- Current working dir: Path.cwd() / "conf" / "custom_models.json"
- Add importlib-resources fallback for Python < 3.9 compatibility
- Add comprehensive test suite for path resolution scenarios
- Ensure graceful handling when config files are missing
The fix restores full OpenRouter functionality (15 models, 62+ aliases)
for users installing via uvx while maintaining backward compatibility
for development and explicit config scenarios.
Tested: All path resolution scenarios pass, OpenRouter models load correctly
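The path resolution order listed above can be sketched like this. Both function names are illustrative, not taken from `OpenRouterModelRegistry`.

```python
from pathlib import Path
from typing import Iterable, Optional


def candidate_config_paths(module_file: str) -> list[Path]:
    """Candidate locations, in the order described in the change list."""
    return [
        Path(module_file).parent.parent / "conf" / "custom_models.json",  # development
        Path("conf/custom_models.json"),                                  # uvx working dir
        Path.cwd() / "conf" / "custom_models.json",                       # current working dir
    ]


def first_existing(candidates: Iterable[Path]) -> Optional[Path]:
    """Return the first candidate that exists as a file, or None if none do."""
    for candidate in candidates:
        if candidate.is_file():
            return candidate
    return None
```

Returning `None` rather than raising leaves the caller free to continue with an empty registry when no config file is found.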
- Use o3-pro throughout the codebase instead of o3-pro-2025-06-10
- Update test expectations to match o3-pro model name
- Update cassette to use o3-pro for consistency
- Ensure responses endpoint routing works correctly with o3-pro
🤖 Generated with [Claude Code](https://claude.ai/code)
Co-Authored-By: Claude <noreply@anthropic.com>
- Use new output_text field format for o3-pro responses
- Update test expectations to use resolved model name o3-pro-2025-06-10
- Keep HTTP transport recorder and PII sanitization improvements
- Preserve both bug fix and recent GPT-5 updates
Co-Authored-By: Claude <noreply@anthropic.com>
Improvements to model name resolution
Improved instructions for multi-step workflows when continuation is available
Improved instructions for chat tool
Improved preferred model resolution, moved code from registry -> each provider
Updated tests
- Update docs/vcr-testing.md with new PII sanitization features
- Document transport_helpers.inject_transport() for simpler test setup
- Add sanitize_cassettes.py script documentation
- Update file structure to include all new components
- Fix PEP 8: Move copy import to top of openai_compatible.py
- Enhance security notes about automatic sanitization
Co-Authored-By: Claude <noreply@anthropic.com>
✨ Key improvements:
• Added public reset_for_testing() method to registry for clean test state management
• Updated test setup/teardown to use new public API instead of private attributes
• Enhanced inject_transport helper to ensure OpenAI provider registration
• Migrated additional test files to use inject_transport pattern
• Reduced code duplication by ~30 lines across test files
🔧 Technical details:
• transport_helpers.py: Always register OpenAI provider for transport tests
• test_o3_pro_output_text_fix.py: Use reset_for_testing() API, remove redundant registration
• test_o3_pro_fixture_bisect.py: Migrate all 4 test methods to inject_transport
• test_o3_pro_simplified.py: Migrate both test methods to inject_transport
• providers/registry.py: Add reset_for_testing() public method
✅ Quality assurance:
• All 7 o3-pro tests pass with new helper pattern
• No regression in test isolation or provider state management
• Improved maintainability through centralized transport injection
• Follows single responsibility principle with focused helper function
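For illustration, a public reset hook on a singleton registry might look like the sketch below. `ProviderRegistry` is a simplified, hypothetical stand-in for the class in `providers/registry.py`.

```python
from typing import Any, Optional


class ProviderRegistry:
    """Simplified singleton registry used only to illustrate the reset hook."""

    _instance: Optional["ProviderRegistry"] = None

    def __new__(cls) -> "ProviderRegistry":
        if cls._instance is None:
            cls._instance = super().__new__(cls)
            cls._instance._providers = {}
        return cls._instance

    def register(self, name: str, provider: Any) -> None:
        self._providers[name] = provider

    def get_available_providers(self) -> list[str]:
        return list(self._providers)

    @classmethod
    def reset_for_testing(cls) -> None:
        """Drop the shared instance so the next test starts from a clean state."""
        cls._instance = None
```

Tests can then call `ProviderRegistry.reset_for_testing()` in setup and teardown instead of touching private attributes.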
Co-Authored-By: Claude <noreply@anthropic.com>
- Fix TOCTOU race condition by removing os.path.exists() check before file open
- Move imports (base64, binascii, os, utils.file_types) to top of file
- Replace broad Exception catch with specific binascii.Error for base64 decoding
- Maintain proper error handling and test compatibility
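The two fixes above follow a common pattern, sketched here with hypothetical helper names: open the file directly rather than checking for existence first, and catch only the decoding error that base64 can raise.

```python
import base64
import binascii


def read_image_bytes(path: str) -> bytes:
    """Read a file without a prior existence check, avoiding the TOCTOU window."""
    try:
        with open(path, "rb") as handle:
            return handle.read()
    except FileNotFoundError as exc:
        raise ValueError(f"Image file not found: {path}") from exc


def decode_base64_image(data: str) -> bytes:
    """Decode base64 data, translating only the specific decoding failure."""
    try:
        return base64.b64decode(data, validate=True)
    except binascii.Error as exc:
        raise ValueError("Invalid base64 image data") from exc
```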
Consolidates duplicated image validation logic from individual providers
into a reusable base class method. This improves maintainability and
ensures consistent validation across all providers.
- Added validate_image() method to ModelProvider base class
- Supports both file paths and data URLs
- Validates image format, size, and MIME types
- Added DEFAULT_MAX_IMAGE_SIZE_MB class constant (20MB)
- Refactored Gemini and OpenAI providers to use base validation
- Added comprehensive test suite with 19 tests
- Used minimal mocking approach with concrete test provider class
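A rough sketch of the MIME type and size checks that such a shared method performs. The helper names and the set of allowed MIME types are assumptions; only the 20 MB default comes from the change list.

```python
DEFAULT_MAX_IMAGE_SIZE_MB = 20

# Assumed set of accepted types; the real list lives in the provider code.
ALLOWED_IMAGE_MIME_TYPES = frozenset({"image/png", "image/jpeg", "image/gif", "image/webp"})


def mime_type_from_data_url(data_url: str) -> str:
    """Extract the MIME type from a base64 data URL such as data:image/png;base64,..."""
    if not data_url.startswith("data:") or ";base64," not in data_url:
        raise ValueError("Invalid data URL format")
    return data_url[len("data:"):].split(";base64,", 1)[0].lower()


def check_image(mime_type: str, size_bytes: int,
                max_size_mb: float = DEFAULT_MAX_IMAGE_SIZE_MB) -> None:
    """Raise ValueError when the image type or size is not acceptable."""
    if mime_type not in ALLOWED_IMAGE_MIME_TYPES:
        raise ValueError(f"Unsupported image type: {mime_type}")
    if size_bytes > max_size_mb * 1024 * 1024:
        raise ValueError(f"Image exceeds {max_size_mb} MB limit")
```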
- Fix lint errors: trailing whitespace and deprecated typing imports
- Update test mock for o3-pro response format (output.content[] → output_text)
- Implement robust test isolation with monkeypatch fixture
- Clear provider registry cache to prevent test interference
- Ensure o3-pro tests pass in both individual and full suite execution
Co-Authored-By: Claude <noreply@anthropic.com>
- Fix o3-pro response parsing to use output_text convenience field
- Replace respx with custom httpx transport solution for better reliability
- Implement comprehensive PII sanitization to prevent secret exposure
- Add HTTP request/response recording with cassette format for testing
- Sanitize all existing cassettes to remove exposed API keys
- Update documentation to reflect new HTTP transport recorder
- Add test suite for PII sanitization and HTTP recording
This change:
1. Fixes timeout issues with o3-pro API calls (was 2+ minutes, now ~15-22 seconds)
2. Properly captures response content without httpx.ResponseNotRead exceptions
3. Preserves original HTTP response format including gzip compression
4. Prevents future secret exposure with automatic PII sanitization
5. Enables reliable replay testing for o3-pro interactions
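A minimal sketch of cassette sanitization, assuming regex-based redaction. The patterns, header names, and function names are illustrative and are not the project's actual sanitizer.

```python
import re

# Hypothetical patterns: API keys with an "sk-" prefix and bearer tokens.
_SECRET_PATTERNS = [
    (re.compile(r"sk-[A-Za-z0-9_-]{20,}"), "sk-SANITIZED"),
    (re.compile(r"(Bearer\s+)[A-Za-z0-9._-]+", re.IGNORECASE), r"\1SANITIZED"),
]

_SENSITIVE_HEADERS = {"authorization", "x-api-key", "api-key"}


def sanitize_text(text: str) -> str:
    """Replace anything that looks like a secret inside a recorded body."""
    for pattern, replacement in _SECRET_PATTERNS:
        text = pattern.sub(replacement, text)
    return text


def sanitize_headers(headers: dict[str, str]) -> dict[str, str]:
    """Redact sensitive header values before a request is written to a cassette."""
    return {
        name: "SANITIZED" if name.lower() in _SENSITIVE_HEADERS else value
        for name, value in headers.items()
    }
```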
Co-Authored-By: Claude <noreply@anthropic.com>
Moved aliases into SUPPORTED_MODELS instead of declaring them as shorthand entries, in line with how custom_models are declared
Further refactoring to cleanup some code
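As an illustration of aliases declared alongside each model, consider the sketch below. The table contents and `resolve_model_name` are hypothetical.

```python
# Illustrative table; the real entries carry full capability data.
SUPPORTED_MODELS = {
    "o3-pro": {"aliases": ["o3pro"]},
    "gemini-2.5-flash": {"aliases": ["flash"]},
}


def resolve_model_name(name: str) -> str:
    """Map an alias or canonical name to the canonical model name."""
    wanted = name.lower()
    for canonical, config in SUPPORTED_MODELS.items():
        aliases = [alias.lower() for alias in config.get("aliases", [])]
        if wanted == canonical.lower() or wanted in aliases:
            return canonical
    # Unknown names pass through unchanged in this sketch.
    return name
```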
## Description
This PR implements a new [DIAL](https://dialx.ai/dial_api) (Data & AI Layer) provider for the Zen MCP Server, enabling unified access to multiple AI models through the DIAL API platform. DIAL provides enterprise-grade AI model access with deployment-specific routing similar to Azure OpenAI.
## Changes Made
- [x] Added atexit support:
  - Ensures automatic cleanup of provider resources (HTTP clients, connection pools) on server shutdown
  - Fixed a bug by using `ModelProviderRegistry.get_available_providers()` instead of accessing the private `_providers` attribute
  - Works with SIGTERM/Ctrl+C for graceful shutdown in both development and containerized environments
- [x] Added new DIAL provider (`providers/dial.py`) inheriting from `OpenAICompatibleProvider`
- [x] Updated server.py to register DIAL provider during initialization
- [x] Updated provider registry to include DIAL provider type
- [x] Implemented deployment-specific routing for DIAL's Azure OpenAI-style endpoints
- [x] Implemented performance optimizations:
  - Connection pooling with httpx for better performance
  - Thread-safe client caching with double-check locking pattern
  - Proper resource cleanup with `close()` method
- [x] Added comprehensive unit tests with 16 test cases (`tests/test_dial_provider.py`)
- [x] Added DIAL configuration to `.env.example` with documentation
- [x] Added support for configurable API version via `DIAL_API_VERSION` environment variable
- [x] Added DIAL model restrictions support via `DIAL_ALLOWED_MODELS` environment variable
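The client caching and cleanup items above can be sketched as follows. `ClientCache` is a hypothetical class, and the factory argument stands in for whatever builds the per-deployment HTTP client.

```python
import threading
from typing import Any, Callable


class ClientCache:
    """Per-deployment client cache using double-checked locking."""

    def __init__(self, factory: Callable[[str], Any]) -> None:
        self._factory = factory
        self._clients: dict[str, Any] = {}
        self._lock = threading.Lock()

    def get(self, deployment: str) -> Any:
        client = self._clients.get(deployment)  # first check, without the lock
        if client is None:
            with self._lock:
                client = self._clients.get(deployment)  # second check, under the lock
                if client is None:
                    client = self._factory(deployment)
                    self._clients[deployment] = client
        return client

    def close(self) -> None:
        """Close every cached client that exposes a close() method."""
        with self._lock:
            clients = list(self._clients.values())
            self._clients.clear()
        for client in clients:
            close = getattr(client, "close", None)
            if callable(close):
                close()
```

Registering the cleanup with `atexit.register(cache.close)` would run it at interpreter shutdown, which is the role the atexit item describes.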
### Supported DIAL Models:
- OpenAI models: o3, o4-mini (and their dated versions)
- Google models: gemini-2.5-pro, gemini-2.5-flash (including search variant)
- Anthropic models: Claude 4 Opus/Sonnet (with and without thinking mode)
### Environment Variables:
- `DIAL_API_KEY`: Required API key for DIAL authentication
- `DIAL_API_HOST`: Optional base URL (defaults to https://core.dialx.ai)
- `DIAL_API_VERSION`: Optional API version header (defaults to 2025-01-01-preview)
- `DIAL_ALLOWED_MODELS`: Optional comma-separated list of allowed models
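A small sketch of how these variables and their defaults could be read. `DialConfig` and `load_dial_config` are hypothetical names; the default values are the ones listed above.

```python
import os
from dataclasses import dataclass
from typing import Mapping, Optional


@dataclass(frozen=True)
class DialConfig:
    api_key: str
    host: str
    api_version: str
    allowed_models: Optional[frozenset[str]]  # None means no restriction


def load_dial_config(env: Mapping[str, str] = os.environ) -> DialConfig:
    """Read DIAL settings from the environment, applying the documented defaults."""
    api_key = env.get("DIAL_API_KEY", "").strip()
    if not api_key:
        raise ValueError("DIAL_API_KEY is required")
    raw_allowed = env.get("DIAL_ALLOWED_MODELS", "")
    allowed = frozenset(m.strip().lower() for m in raw_allowed.split(",") if m.strip())
    return DialConfig(
        api_key=api_key,
        host=env.get("DIAL_API_HOST", "https://core.dialx.ai").rstrip("/"),
        api_version=env.get("DIAL_API_VERSION", "2025-01-01-preview"),
        allowed_models=allowed or None,
    )
```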
### Breaking Changes:
- None
### Dependencies:
- No new dependencies added (uses existing OpenAI SDK with custom routing)
* feat: Update Claude model references from v3 to v4
- Update model configurations from claude-3-opus to claude-4-opus
- Update model configurations from claude-3-sonnet to claude-4-sonnet
- Maintain backward compatibility through existing aliases (opus, sonnet, claude)
- Update provider registry preferred models list
- Update all test cases and assertions to reflect new model names
- Update documentation and examples consistently across all files
- Add Claude 4 model support while preserving existing functionality
Files modified: 15 (config, docs, providers, tests, tools)
Pattern: Systematic claude-3-* → claude-4-* model reference migration
Co-Authored-By: Claude <noreply@anthropic.com>
* PR feedback: changed anthropic/claude-4-opus -> anthropic/claude-opus-4 and anthropic/claude-4-haiku -> anthropic/claude-3.5-haiku
* changed anthropic/claude-4-sonnet -> anthropic/claude-sonnet-4
* PR feedback removed specific model from test mock
* PR feedback removed base.py
---------
Co-authored-by: Omry Nachman <omry@wix.com>
Co-authored-by: Claude <noreply@anthropic.com>
Description: This feature adds support for UTF-8 encoding in JSON responses, allowing for proper handling of special characters and emojis.
- Implement unit tests for UTF-8 encoding in various model providers including Gemini, OpenAI, and OpenAI Compatible.
- Validate UTF-8 support in token counting, content generation, and error handling.
- Introduce tests for JSON serialization ensuring proper handling of French characters and emojis.
- Create tests for language instruction generation based on locale settings.
- Validate UTF-8 handling in workflow tools including AnalyzeTool, CodereviewTool, and DebugIssueTool.
- Ensure that all tests check for correct UTF-8 character preservation and proper JSON formatting.
- Add integration tests to verify the interaction between locale settings and model responses.
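For context, the standard library behaviour these tests exercise can be shown in a few lines. By default `json.dumps` escapes non-ASCII characters, and `ensure_ascii=False` preserves them; whether the project uses exactly this flag everywhere is an assumption.

```python
import json

payload = {"message": "Réponse générée avec succès 🎉"}

# Default behaviour: non-ASCII characters are written as \uXXXX escapes.
ascii_escaped = json.dumps(payload)

# With ensure_ascii=False the accented characters and emoji are kept as-is.
utf8_preserved = json.dumps(payload, ensure_ascii=False)

# Both forms decode back to the same Python object.
round_trip = json.loads(utf8_preserved.encode("utf-8").decode("utf-8"))
```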
* WIP: new workflow architecture
* WIP: further improvements and cleanup
* WIP: cleanup and docs, replace old tool with new
* WIP: new planner implementation using workflow
* WIP: precommit tool working as a workflow instead of a basic tool
Support for passing False to use_assistant_model to skip external models completely and use Claude only
* WIP: precommit workflow version swapped with old
* WIP: codereview
* WIP: replaced codereview
* WIP: replaced refactor
* WIP: workflow for thinkdeep
* WIP: ensure files get embedded correctly
* WIP: thinkdeep replaced with workflow version
* WIP: improved messaging when an external model's response is received
* WIP: analyze tool swapped
* WIP: updated tests
* Extract only the content when building history
* Use "relevant_files" for workflow tools only
* WIP: fixed get_completion_next_steps_message missing param
* Fixed tests
Request for files consistently
* Fixed tests
* New testgen workflow tool
Updated docs
* Swap testgen workflow
* Fix CI test failures by excluding API-dependent tests
- Update GitHub Actions workflow to exclude simulation tests that require API keys
- Fix collaboration tests to properly mock workflow tool expert analysis calls
- Update test assertions to handle new workflow tool response format
- Ensure unit tests run without external API dependencies in CI
Co-Authored-By: Claude <noreply@anthropic.com>
* WIP - Update tests to match new tools
* Should help with https://github.com/BeehiveInnovations/zen-mcp-server/issues/97
Clear python cache when running script: https://github.com/BeehiveInnovations/zen-mcp-server/issues/96
Improved retry error logging
Cleanup
* WIP - chat tool using new architecture and improved code sharing
* Removed todo
* Cleanup old name
* Tweak wordings
Migrate old tests
* Support for Flash 2.0 and Flash Lite 2.0
Fixed test
* Improved consensus to use the workflow base class
* Allow images
* Replaced old consensus tool
* Cleanup tests
* Tests for prompt size
* New tool: docgen
Tests for prompt size
Fixes: https://github.com/BeehiveInnovations/zen-mcp-server/issues/107
Use available token size limits: https://github.com/BeehiveInnovations/zen-mcp-server/issues/105
* Improved docgen prompt
Exclude TestGen from pytest inclusion
* Updated errors
* Lint
* DocGen instructed not to fix bugs, surface them and stick to d
* WIP
* Stop Claude from being lazy and only documenting a small handful
* More style rules
---------
Co-authored-by: Claude <noreply@anthropic.com>
Fix for: https://github.com/BeehiveInnovations/zen-mcp-server/issues/102
- Removed centralized MODEL_CAPABILITIES_DESC from config.py
- Added model descriptions to individual provider SUPPORTED_MODELS
- Updated _get_available_models() to use ModelProviderRegistry for API key filtering
- Added comprehensive test suite validating bug reproduction and fix
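A sketch of the idea behind the fix: descriptions live with each provider, and only providers whose API key is configured contribute models. The table, the placeholder descriptions, and `get_available_models` are assumptions for illustration.

```python
from typing import Mapping

# Hypothetical per-provider tables keyed by the API key variable that enables them.
PROVIDER_MODELS: dict[str, dict[str, str]] = {
    "GEMINI_API_KEY": {"gemini-2.5-pro": "placeholder description for gemini-2.5-pro"},
    "OPENAI_API_KEY": {"o3": "placeholder description for o3"},
}


def get_available_models(env: Mapping[str, str]) -> dict[str, str]:
    """Return model descriptions only for providers with a configured API key."""
    available: dict[str, str] = {}
    for key_name, models in PROVIDER_MODELS.items():
        if env.get(key_name, "").strip():
            available.update(models)
    return available
```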
* Migration from docker to standalone server
Migration handling
Fixed tests
Use simpler in-memory storage
Support for concurrent logging to disk
Simplified direct connections to localhost
* Migration from docker / redis to standalone script
Updated tests
Updated run script
Fixed requirements
Use dotenv
Ask if user would like to install MCP in Claude Desktop once
Updated docs
* More cleanup and references to docker removed
* Cleanup
* Comments
* Fixed tests
* Fix GitHub Actions workflow for standalone Python architecture
- Install requirements-dev.txt for pytest and testing dependencies
- Remove Docker setup from simulation tests (now standalone)
- Simplify linting job to use requirements-dev.txt
- Update simulation tests to run directly without Docker
Fixes unit test failures in CI due to missing pytest dependency.
Co-Authored-By: Claude <noreply@anthropic.com>
* Remove simulation tests from GitHub Actions
- Removed simulation-tests job that makes real API calls
- Keep only unit tests (mocked, no API costs) and linting
- Simulation tests should be run manually with real API keys
- Reduces CI costs and complexity
GitHub Actions now only runs:
- Unit tests (569 tests, all mocked)
- Code quality checks (ruff, black)
Co-Authored-By: Claude <noreply@anthropic.com>
* Fixed tests
---------
Co-authored-by: Claude <noreply@anthropic.com>