The is_dangerous_path() function only did exact string matching,
allowing attackers to bypass protection by accessing subdirectories:
- /etc was blocked but /etc/passwd was allowed
- C:\Windows was blocked but C:\Windows\System32\... was allowed
This minimal fix changes is_dangerous_path() to use PREFIX MATCHING:
- Now blocks dangerous directories AND all their subdirectories
- Paths like /etcbackup are still allowed (not under /etc)
- No changes to DANGEROUS_PATHS list
Security:
- Fixes CWE-22: Path Traversal vulnerability
- Reported by: Team off-course (K-Shield.Jr 15th)
Fixes#312Fixes#293
feat: Azure OpenAI / Azure AI Foundry support. Models should be defined in conf/azure_models.json (or a custom path). See .env.example for environment variables or see readme. https://github.com/BeehiveInnovations/zen-mcp-server/issues/265
feat: OpenRouter / Custom Models / Azure can separately also use custom config paths now (see .env.example )
refactor: Model registry class made abstract, OpenRouter / Custom Provider / Azure OpenAI now subclass these
refactor: breaking change: `is_custom` property has been removed from model_capabilities.py (and thus custom_models.json) given each models are now read from separate configuration files
fix: restrictions should resolve canonical names for openrouter
fix: tools now correctly return restricted list by presenting model names in schema
fix: tests updated to ensure these manage their expected env vars properly
perf: cache model alias resolution to avoid repeated checks
## Description
This PR implements a new [DIAL](https://dialx.ai/dial_api) (Data & AI Layer) provider for the Zen MCP Server, enabling unified access to multiple AI models through the DIAL API platform. DIAL provides enterprise-grade AI model access with deployment-specific routing similar to Azure OpenAI.
## Changes Made
- [x] Added support of atexit:
- Ensures automatic cleanup of provider resources (HTTP clients, connection pools) on server shutdown
- Fixed bug using ModelProviderRegistry.get_available_providers() instead of accessing private _providers
- Works with SIGTERM/Ctrl+C for graceful shutdown in both development and containerized environments
- [x] Added new DIAL provider (`providers/dial.py`) inheriting from `OpenAICompatibleProvider`
- [x] Updated server.py to register DIAL provider during initialization
- [x] Updated provider registry to include DIAL provider type
- [x] Implemented deployment-specific routing for DIAL's Azure OpenAI-style endpoints
- [x] Implemented performance optimizations:
- Connection pooling with httpx for better performance
- Thread-safe client caching with double-check locking pattern
- Proper resource cleanup with `close()` method
- [x] Added comprehensive unit tests with 16 test cases (`tests/test_dial_provider.py`)
- [x] Added DIAL configuration to `.env.example` with documentation
- [x] Added support for configurable API version via `DIAL_API_VERSION` environment variable
- [x] Added DIAL model restrictions support via `DIAL_ALLOWED_MODELS` environment variable
### Supported DIAL Models:
- OpenAI models: o3, o4-mini (and their dated versions)
- Google models: gemini-2.5-pro, gemini-2.5-flash (including search variant)
- Anthropic models: Claude 4 Opus/Sonnet (with and without thinking mode)
### Environment Variables:
- `DIAL_API_KEY`: Required API key for DIAL authentication
- `DIAL_API_HOST`: Optional base URL (defaults to https://core.dialx.ai)
- `DIAL_API_VERSION`: Optional API version header (defaults to 2025-01-01-preview)
- `DIAL_ALLOWED_MODELS`: Optional comma-separated list of allowed models
### Breaking Changes:
- None
### Dependencies:
- No new dependencies added (uses existing OpenAI SDK with custom routing)
* WIP: new workflow architecture
* WIP: further improvements and cleanup
* WIP: cleanup and docks, replace old tool with new
* WIP: cleanup and docks, replace old tool with new
* WIP: new planner implementation using workflow
* WIP: precommit tool working as a workflow instead of a basic tool
Support for passing False to use_assistant_model to skip external models completely and use Claude only
* WIP: precommit workflow version swapped with old
* WIP: codereview
* WIP: replaced codereview
* WIP: replaced codereview
* WIP: replaced refactor
* WIP: workflow for thinkdeep
* WIP: ensure files get embedded correctly
* WIP: thinkdeep replaced with workflow version
* WIP: improved messaging when an external model's response is received
* WIP: analyze tool swapped
* WIP: updated tests
* Extract only the content when building history
* Use "relevant_files" for workflow tools only
* WIP: updated tests
* Extract only the content when building history
* Use "relevant_files" for workflow tools only
* WIP: fixed get_completion_next_steps_message missing param
* Fixed tests
Request for files consistently
* Fixed tests
Request for files consistently
* Fixed tests
* New testgen workflow tool
Updated docs
* Swap testgen workflow
* Fix CI test failures by excluding API-dependent tests
- Update GitHub Actions workflow to exclude simulation tests that require API keys
- Fix collaboration tests to properly mock workflow tool expert analysis calls
- Update test assertions to handle new workflow tool response format
- Ensure unit tests run without external API dependencies in CI
🤖 Generated with [Claude Code](https://claude.ai/code)
Co-Authored-By: Claude <noreply@anthropic.com>
* WIP - Update tests to match new tools
* WIP - Update tests to match new tools
* WIP - Update tests to match new tools
* Should help with https://github.com/BeehiveInnovations/zen-mcp-server/issues/97
Clear python cache when running script: https://github.com/BeehiveInnovations/zen-mcp-server/issues/96
Improved retry error logging
Cleanup
* WIP - chat tool using new architecture and improved code sharing
* Removed todo
* Removed todo
* Cleanup old name
* Tweak wordings
* Tweak wordings
Migrate old tests
* Support for Flash 2.0 and Flash Lite 2.0
* Support for Flash 2.0 and Flash Lite 2.0
* Support for Flash 2.0 and Flash Lite 2.0
Fixed test
* Improved consensus to use the workflow base class
* Improved consensus to use the workflow base class
* Allow images
* Allow images
* Replaced old consensus tool
* Cleanup tests
* Tests for prompt size
* New tool: docgen
Tests for prompt size
Fixes: https://github.com/BeehiveInnovations/zen-mcp-server/issues/107
Use available token size limits: https://github.com/BeehiveInnovations/zen-mcp-server/issues/105
* Improved docgen prompt
Exclude TestGen from pytest inclusion
* Updated errors
* Lint
* DocGen instructed not to fix bugs, surface them and stick to d
* WIP
* Stop claude from being lazy and only documenting a small handful
* More style rules
---------
Co-authored-by: Claude <noreply@anthropic.com>
* WIP: new workflow architecture
* WIP: further improvements and cleanup
* WIP: cleanup and docks, replace old tool with new
* WIP: cleanup and docks, replace old tool with new
* WIP: new planner implementation using workflow
* WIP: precommit tool working as a workflow instead of a basic tool
Support for passing False to use_assistant_model to skip external models completely and use Claude only
* WIP: precommit workflow version swapped with old
* WIP: codereview
* WIP: replaced codereview
* WIP: replaced codereview
* WIP: replaced refactor
* WIP: workflow for thinkdeep
* WIP: ensure files get embedded correctly
* WIP: thinkdeep replaced with workflow version
* WIP: improved messaging when an external model's response is received
* WIP: analyze tool swapped
* WIP: updated tests
* Extract only the content when building history
* Use "relevant_files" for workflow tools only
* WIP: updated tests
* Extract only the content when building history
* Use "relevant_files" for workflow tools only
* WIP: fixed get_completion_next_steps_message missing param
* Fixed tests
Request for files consistently
* Fixed tests
Request for files consistently
* Fixed tests
* New testgen workflow tool
Updated docs
* Swap testgen workflow
* Fix CI test failures by excluding API-dependent tests
- Update GitHub Actions workflow to exclude simulation tests that require API keys
- Fix collaboration tests to properly mock workflow tool expert analysis calls
- Update test assertions to handle new workflow tool response format
- Ensure unit tests run without external API dependencies in CI
🤖 Generated with [Claude Code](https://claude.ai/code)
Co-Authored-By: Claude <noreply@anthropic.com>
* WIP - Update tests to match new tools
* WIP - Update tests to match new tools
---------
Co-authored-by: Claude <noreply@anthropic.com>
* Migration from docker to standalone server
Migration handling
Fixed tests
Use simpler in-memory storage
Support for concurrent logging to disk
Simplified direct connections to localhost
* Migration from docker / redis to standalone script
Updated tests
Updated run script
Fixed requirements
Use dotenv
Ask if user would like to install MCP in Claude Desktop once
Updated docs
* More cleanup and references to docker removed
* Cleanup
* Comments
* Fixed tests
* Fix GitHub Actions workflow for standalone Python architecture
- Install requirements-dev.txt for pytest and testing dependencies
- Remove Docker setup from simulation tests (now standalone)
- Simplify linting job to use requirements-dev.txt
- Update simulation tests to run directly without Docker
Fixes unit test failures in CI due to missing pytest dependency.
🤖 Generated with [Claude Code](https://claude.ai/code)
Co-Authored-By: Claude <noreply@anthropic.com>
* Remove simulation tests from GitHub Actions
- Removed simulation-tests job that makes real API calls
- Keep only unit tests (mocked, no API costs) and linting
- Simulation tests should be run manually with real API keys
- Reduces CI costs and complexity
GitHub Actions now only runs:
- Unit tests (569 tests, all mocked)
- Code quality checks (ruff, black)
🤖 Generated with [Claude Code](https://claude.ai/code)
Co-Authored-By: Claude <noreply@anthropic.com>
* Fixed tests
* Fixed tests
---------
Co-authored-by: Claude <noreply@anthropic.com>
* WIP
Refactor resolving mode_names, should be done once at MCP call boundary
Pass around model context instead
Consensus tool allows one to get a consensus from multiple models, optionally assigning one a 'for' or 'against' stance to find nuanced responses.
* Deduplication of model resolution, model_context should be available before reaching deeper parts of the code
Improved abstraction when building conversations
Throw programmer errors early
* Guardrails
Support for `model:option` format at MCP boundary so future tools can use additional options if needed instead of handling this only for consensus
Model name now supports an optional ":option" for future use
* Simplified async flow
* Improved model for request to support natural language
Simplified async flow
* Improved model for request to support natural language
Simplified async flow
* Fix consensus tool async/sync patterns to match codebase standards
CRITICAL FIXES:
- Converted _get_consensus_responses from async to sync (matches other tools)
- Converted store_conversation_turn from async to sync (add_turn is synchronous)
- Removed unnecessary asyncio imports and sleep calls
- Fixed ClosedResourceError in MCP protocol during long consensus operations
PATTERN ALIGNMENT:
- Consensus tool now follows same sync patterns as all other tools
- Only execute() and prepare_prompt() are async (base class requirement)
- All internal operations are synchronous like analyze, chat, debug, etc.
TESTING:
- MCP simulation test now passes: consensus_stance ✅
- Two-model consensus works correctly in ~35 seconds
- Unknown stance handling defaults to neutral with warnings
- All 9 unit tests pass (100% success rate)
The consensus tool async patterns were anomalous in the codebase.
This fix aligns it with the established synchronous patterns used
by all other tools while maintaining full functionality.
🤖 Generated with [Claude Code](https://claude.ai/code)
Co-Authored-By: Claude <noreply@anthropic.com>
* Fixed call order and added new test
* Cleanup dead comments
Docs for the new tool
Improved tests
---------
Co-authored-by: Claude <noreply@anthropic.com>
- Kept version 4.8.0 for new features
- Preserved our _is_builtin_custom_models_config approach over main's ALLOWED_INTERNAL_PATHS
- Our targeted solution is cleaner than the general whitelist approach
Fixed MagicMock comparison errors across multiple test suites by:
- Adding proper ModelCapabilities mocks with real values instead of MagicMock objects
- Updating test_auto_mode.py with correct provider mocking for model availability tests
- Updating test_thinking_modes.py with proper capabilities mocking in all thinking mode tests
- Updating test_tools.py with proper capabilities mocking for CodeReview and Analyze tools
- Fixing test_large_prompt_handling.py by adding proper provider mocking to prevent errors before large prompt detection
Fixed pytest collection warnings by:
- Renaming TestGenRequest to TestGenerationRequest to avoid pytest collecting it as a test class
- Renaming TestGenTool to TestGenerationTool to avoid pytest collecting it as a test class
- Updated all imports and references across server.py, tools/__init__.py, and test files
All 459 tests now pass without warnings or MagicMock comparison errors.
🤖 Generated with [Claude Code](https://claude.ai/code)
Co-Authored-By: Claude <noreply@anthropic.com>
- Exit early at MCP boundary if files won't fit within given context of chosen model
- Encourage claude to re-run with better context
- Check file sizes before embedding
- Drop files from older conversations when building continuations and give priority to newer files
- List and mention excluded files to Claude on return
- Improved tests
- Improved precommit prompt
- Added a new Low severity to precommit
- Improved documentation of file embedding strategy
- Refactor
Supports decomposing large components and files, finding codesmells, finding modernizing opportunities as well as code organization opportunities. Fix this mega-classes today!
Line numbers added to embedded code for better references from model -> claude
OPENROUTER_ALLOWED_MODELS environment variable support to further limit the models to allow from within Claude. This will put a limit on top of even the ones listed in custom_models.json
Simulation tests to confirm threading and history traversal
Chain of communication and branching validation tests from live simulation
Temperature enforcement per model