Renamed setup script to avoid confusion (https://github.com/BeehiveInnovations/zen-mcp-server/issues/35)
- Further fixes to tests
- Pass O3 simulation test when keys are not set, along with a notice
- Updated docs on testing, simulation tests / contributing
- Support for OpenAI o4-mini and o4-mini-high
.github/pull_request_template.md (37 lines changed, vendored)

@@ -36,10 +36,32 @@ Please provide a clear and concise description of what this PR does.
 
 ## Testing
 
-- [ ] Unit tests pass
-- [ ] Integration tests pass (if applicable)
-- [ ] Manual testing completed
-- [ ] Documentation updated (if needed)
+**Please review our [Testing Guide](../docs/testing.md) before submitting.**
+
+### Run all linting and tests (required):
+```bash
+# Activate virtual environment first
+source venv/bin/activate
+
+# Run all linting checks
+ruff check .
+black --check .
+isort --check-only .
+
+# Run all unit tests
+python -m pytest -xvs
+
+# If you made tool changes, also run simulator tests
+python communication_simulator_test.py
+```
+
+- [ ] All linting passes (ruff, black, isort)
+- [ ] All unit tests pass
+- [ ] **For new features**: Unit tests added in `tests/`
+- [ ] **For tool changes**: Simulator tests added in `simulator_tests/`
+- [ ] **For bug fixes**: Tests added to prevent regression
+- [ ] Simulator tests pass (if applicable)
+- [ ] Manual testing completed with realistic scenarios
 
 ## Related Issues
 
@@ -48,11 +70,12 @@ Fixes #(issue number)
 ## Checklist
 
 - [ ] PR title follows the format guidelines above
-- [ ] Code follows the project's style guidelines
+- [ ] Activated venv and ran all linting: `source venv/bin/activate && ruff check . && black --check . && isort --check-only .`
 - [ ] Self-review completed
-- [ ] Tests added/updated as needed
+- [ ] **Tests added for ALL changes** (see Testing section above)
 - [ ] Documentation updated as needed
-- [ ] All tests passing
+- [ ] All unit tests passing: `python -m pytest -xvs`
+- [ ] Relevant simulator tests passing (if tool changes)
 - [ ] Ready for review
 
 ## Additional Notes
README.md (17 lines changed)

@@ -124,7 +124,7 @@ git clone https://github.com/BeehiveInnovations/zen-mcp-server.git
 cd zen-mcp-server
 
 # One-command setup (includes Redis for AI conversations)
-./setup-docker.sh
+./run-server.sh
 ```
 
 **What this does:**
@@ -153,6 +153,9 @@ nano .env
 # WORKSPACE_ROOT=/Users/your-username (automatically configured)
 
 # Note: At least one API key OR custom URL is required
+
+# After making changes to .env, restart the server:
+# ./run-server.sh
 ```
 
 ### 4. Configure Claude
@@ -184,7 +187,7 @@ This will open a folder revealing `claude_desktop_config.json`.
 
 2. ** Update Docker Configuration**
 
-The setup script shows you the exact configuration. It looks like this. When you ran `setup-docker.sh` it should
+The setup script shows you the exact configuration. It looks like this. When you ran `run-server.sh` it should
 have produced a configuration for you to copy:
 
 ```json
@@ -500,18 +503,24 @@ DEFAULT_MODEL=auto # Claude picks the best model automatically
 
 # API Keys (at least one required)
 GEMINI_API_KEY=your-gemini-key # Enables Gemini Pro & Flash
-OPENAI_API_KEY=your-openai-key # Enables O3, O3-mini
+OPENAI_API_KEY=your-openai-key # Enables O3, O3mini, O4-mini, O4-mini-high
 ```
 
 **Available Models:**
 - **`pro`** (Gemini 2.5 Pro): Extended thinking, deep analysis
 - **`flash`** (Gemini 2.0 Flash): Ultra-fast responses
 - **`o3`**: Strong logical reasoning
-- **`o3-mini`**: Balanced speed/quality
+- **`o3mini`**: Balanced speed/quality
+- **`o4-mini`**: Latest reasoning model, optimized for shorter contexts
+- **`o4-mini-high`**: Enhanced O4 with higher reasoning effort
 - **Custom models**: via OpenRouter or local APIs (Ollama, vLLM, etc.)
 
 For detailed configuration options, see the [Advanced Usage Guide](docs/advanced-usage.md).
 
+## Testing
+
+For information on running tests and contributing, see the [Testing Guide](docs/testing.md).
+
 ## License
 
 Apache 2.0 License - see LICENSE file for details.
communication_simulator_test.py

@@ -17,7 +17,7 @@ Usage:
     --tests: Run specific tests only (space-separated)
     --list-tests: List all available tests
     --individual: Run a single test individually
-    --rebuild: Force rebuild Docker environment using setup-docker.sh
+    --rebuild: Force rebuild Docker environment using run-server.sh
 
 Available tests:
     basic_conversation - Basic conversation flow with chat tool
@@ -115,9 +115,9 @@ class CommunicationSimulator:
             self.temp_dir = tempfile.mkdtemp(prefix="mcp_test_")
             self.logger.debug(f"Created temp directory: {self.temp_dir}")
 
-            # Only run setup-docker.sh if rebuild is requested
+            # Only run run-server.sh if rebuild is requested
             if self.rebuild:
-                if not self._run_setup_docker():
+                if not self._run_server_script():
                     return False
 
             # Always verify containers are running (regardless of rebuild)
@@ -127,34 +127,34 @@ class CommunicationSimulator:
             self.logger.error(f"Failed to setup test environment: {e}")
             return False
 
-    def _run_setup_docker(self) -> bool:
-        """Run the setup-docker.sh script"""
+    def _run_server_script(self) -> bool:
+        """Run the run-server.sh script"""
         try:
-            self.logger.info("Running setup-docker.sh...")
+            self.logger.info("Running run-server.sh...")
 
-            # Check if setup-docker.sh exists
-            setup_script = "./setup-docker.sh"
+            # Check if run-server.sh exists
+            setup_script = "./run-server.sh"
             if not os.path.exists(setup_script):
-                self.logger.error(f"setup-docker.sh not found at {setup_script}")
+                self.logger.error(f"run-server.sh not found at {setup_script}")
                 return False
 
             # Make sure it's executable
             result = self._run_command(["chmod", "+x", setup_script], capture_output=True)
             if result.returncode != 0:
-                self.logger.error(f"Failed to make setup-docker.sh executable: {result.stderr}")
+                self.logger.error(f"Failed to make run-server.sh executable: {result.stderr}")
                 return False
 
             # Run the setup script
             result = self._run_command([setup_script], capture_output=True)
             if result.returncode != 0:
-                self.logger.error(f"setup-docker.sh failed: {result.stderr}")
+                self.logger.error(f"run-server.sh failed: {result.stderr}")
                 return False
 
-            self.logger.info("setup-docker.sh completed successfully")
+            self.logger.info("run-server.sh completed successfully")
             return True
 
         except Exception as e:
-            self.logger.error(f"Failed to run setup-docker.sh: {e}")
+            self.logger.error(f"Failed to run run-server.sh: {e}")
             return False
 
     def _verify_existing_containers(self) -> bool:
@@ -345,9 +345,9 @@ class CommunicationSimulator:
         try:
             self.logger.info("Cleaning up test environment...")
 
-            # Note: We don't stop Docker services ourselves - let setup-docker.sh handle Docker lifecycle
+            # Note: We don't stop Docker services ourselves - let run-server.sh handle Docker lifecycle
             if not self.keep_logs:
-                self.logger.info("Test completed. Docker containers left running (use setup-docker.sh to manage)")
+                self.logger.info("Test completed. Docker containers left running (use run-server.sh to manage)")
             else:
                 self.logger.info("Keeping logs and Docker services running for inspection")
 
@@ -375,7 +375,7 @@ def parse_arguments():
     parser.add_argument("--tests", "-t", nargs="+", help="Specific tests to run (space-separated)")
    parser.add_argument("--list-tests", action="store_true", help="List available tests and exit")
    parser.add_argument("--individual", "-i", help="Run a single test individually")
-    parser.add_argument("--rebuild", action="store_true", help="Force rebuild Docker environment using setup-docker.sh")
+    parser.add_argument("--rebuild", action="store_true", help="Force rebuild Docker environment using run-server.sh")
 
     return parser.parse_args()
 
@@ -130,15 +130,42 @@
       "supports_function_calling": true,
       "description": "OpenAI's o3 model - well-rounded and powerful across domains"
     },
+    {
+      "model_name": "openai/o3-mini",
+      "aliases": ["o3-mini", "o3mini"],
+      "context_window": 200000,
+      "supports_extended_thinking": false,
+      "supports_json_mode": true,
+      "supports_function_calling": true,
+      "description": "OpenAI's o3-mini model - balanced performance and speed"
+    },
     {
       "model_name": "openai/o3-mini-high",
-      "aliases": ["o3-mini", "o3mini", "o3-mini-high", "o3mini-high"],
+      "aliases": ["o3-mini-high", "o3mini-high"],
       "context_window": 200000,
       "supports_extended_thinking": false,
       "supports_json_mode": true,
       "supports_function_calling": true,
       "description": "OpenAI's o3-mini with high reasoning effort - optimized for complex problems"
     },
+    {
+      "model_name": "openai/o4-mini",
+      "aliases": ["o4-mini", "o4mini"],
+      "context_window": 200000,
+      "supports_extended_thinking": false,
+      "supports_json_mode": true,
+      "supports_function_calling": true,
+      "description": "OpenAI's o4-mini model - optimized for shorter contexts with rapid reasoning"
+    },
+    {
+      "model_name": "openai/o4-mini-high",
+      "aliases": ["o4-mini-high", "o4mini-high", "o4minihigh", "o4minihi"],
+      "context_window": 200000,
+      "supports_extended_thinking": false,
+      "supports_json_mode": true,
+      "supports_function_calling": true,
+      "description": "OpenAI's o4-mini with high reasoning effort - enhanced for complex tasks"
+    },
     {
       "model_name": "llama3.2",
       "aliases": ["local-llama", "local", "llama3.2", "ollama-llama"],
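The JSON entries above register OpenRouter model names together with their shorthand aliases. As a rough illustration of how such a registry can be consumed, the sketch below loads the file and resolves an alias to its canonical `model_name`. The registry path `conf/custom_models.json` and the top-level `"models"` key are assumptions for the example, not confirmed by this diff:

```python
import json

def resolve_alias(alias: str, registry_path: str = "conf/custom_models.json") -> str:
    """Map a shorthand alias (e.g. 'o4mini') to its canonical model name."""
    with open(registry_path) as f:
        registry = json.load(f)

    for entry in registry.get("models", []):  # assumed top-level key
        if alias == entry["model_name"] or alias in entry.get("aliases", []):
            return entry["model_name"]
    raise ValueError(f"Unknown model alias: {alias}")

# e.g. resolve_alias("o4mini") would return "openai/o4-mini" given the entries above
```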
config.py (35 lines changed)

@@ -14,7 +14,7 @@ import os
 # These values are used in server responses and for tracking releases
 # IMPORTANT: This is the single source of truth for version and author info
 # Semantic versioning: MAJOR.MINOR.PATCH
-__version__ = "4.3.0"
+__version__ = "4.3.1"
 # Last update date in ISO format
 __updated__ = "2025-06-14"
 # Primary maintainer
@@ -32,23 +32,44 @@ IS_AUTO_MODE = DEFAULT_MODEL.lower() == "auto"
 
 # Model capabilities descriptions for auto mode
 # These help Claude choose the best model for each task
+#
+# IMPORTANT: These are the built-in natively supported models:
+# - When GEMINI_API_KEY is set: Enables "flash", "pro" (and their full names)
+# - When OPENAI_API_KEY is set: Enables "o3", "o3mini", "o4-mini", "o4-mini-high"
+# - When both are set: All models below are available
+# - When neither is set but OpenRouter/Custom API is configured: These model
+#   aliases will automatically map to equivalent models via the proxy provider
+#
+# In auto mode (DEFAULT_MODEL=auto), Claude will see these descriptions and
+# intelligently select the best model for each task. The descriptions appear
+# in the tool schema to guide Claude's selection based on task requirements.
 MODEL_CAPABILITIES_DESC = {
+    # Gemini models - Available when GEMINI_API_KEY is configured
     "flash": "Ultra-fast (1M context) - Quick analysis, simple queries, rapid iterations",
     "pro": "Deep reasoning + thinking mode (1M context) - Complex problems, architecture, deep analysis",
+    # OpenAI models - Available when OPENAI_API_KEY is configured
     "o3": "Strong reasoning (200K context) - Logical problems, code generation, systematic analysis",
     "o3-mini": "Fast O3 variant (200K context) - Balanced performance/speed, moderate complexity",
-    # Full model names also supported
+    "o4-mini": "Latest reasoning model (200K context) - Optimized for shorter contexts, rapid reasoning",
+    "o4-mini-high": "Enhanced O4 mini (200K context) - Higher reasoning effort for complex tasks",
+    # Full model names also supported (for explicit specification)
     "gemini-2.5-flash-preview-05-20": "Ultra-fast (1M context) - Quick analysis, simple queries, rapid iterations",
     "gemini-2.5-pro-preview-06-05": (
         "Deep reasoning + thinking mode (1M context) - Complex problems, architecture, deep analysis"
     ),
 }
 
-# Note: When only OpenRouter is configured, these model aliases automatically map to equivalent models:
-# - "flash" → "google/gemini-2.5-flash-preview-05-20"
-# - "pro" → "google/gemini-2.5-pro-preview-06-05"
-# - "o3" → "openai/gpt-4o"
-# - "o3-mini" → "openai/gpt-4o-mini"
+# OpenRouter/Custom API Fallback Behavior:
+# When only OpenRouter or Custom API is configured (no native API keys), these
+# model aliases automatically map to equivalent models through the proxy:
+# - "flash" → "google/gemini-2.5-flash-preview-05-20" (via OpenRouter)
+# - "pro" → "google/gemini-2.5-pro-preview-06-05" (via OpenRouter)
+# - "o3" → "openai/o3" (via OpenRouter)
+# - "o3mini" → "openai/o3-mini" (via OpenRouter)
+# - "o4-mini" → "openai/o4-mini" (via OpenRouter)
+# - "o4-mini-high" → "openai/o4-mini-high" (via OpenRouter)
+#
+# This ensures the same model names work regardless of which provider is configured.
 
 
 # Temperature defaults for different tool types
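For a rough sense of what "appear in the tool schema" means in the new comment block, here is a minimal, purely illustrative sketch: in auto mode the alias names and descriptions from `MODEL_CAPABILITIES_DESC` can be folded into each tool's `model` parameter so Claude can choose. The schema field names below are assumptions, not the server's actual schema-building code:

```python
MODEL_CAPABILITIES_DESC = {
    "flash": "Ultra-fast (1M context) - Quick analysis, simple queries, rapid iterations",
    "o4-mini": "Latest reasoning model (200K context) - Optimized for shorter contexts, rapid reasoning",
}

# Hypothetical: expose the aliases as an enum plus guidance text on the "model" parameter
model_schema = {
    "type": "string",
    "enum": sorted(MODEL_CAPABILITIES_DESC),
    "description": "Choose a model. " + "; ".join(
        f"'{name}': {desc}" for name, desc in MODEL_CAPABILITIES_DESC.items()
    ),
}
```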
@@ -55,6 +55,8 @@ DEFAULT_MODEL=flash # Always use Flash
 DEFAULT_MODEL=o3 # Always use O3
 ```
 
+**Important:** After changing any configuration in `.env` (including `DEFAULT_MODEL`, API keys, or other settings), restart the server with `./run-server.sh` to apply the changes.
+
 **Per-Request Model Override:**
 Regardless of your default setting, you can specify models per request:
 - "Use **pro** for deep security analysis of auth.py"
docs/testing.md (new file, 126 lines)

@@ -0,0 +1,126 @@
+# Testing Guide
+
+This project includes comprehensive test coverage through unit tests and integration simulator tests.
+
+## Running Tests
+
+### Prerequisites
+- Python virtual environment activated: `source venv/bin/activate`
+- All dependencies installed: `pip install -r requirements.txt`
+- Docker containers running (for simulator tests): `./run-server.sh`
+
+### Unit Tests
+
+Run all unit tests with pytest:
+```bash
+# Run all tests with verbose output
+python -m pytest -xvs
+
+# Run specific test file
+python -m pytest tests/test_providers.py -xvs
+```
+
+### Simulator Tests
+
+Simulator tests replicate real-world Claude CLI interactions with the MCP server running in Docker. Unlike unit tests that test isolated functions, simulator tests validate the complete end-to-end flow including:
+- Actual MCP protocol communication
+- Docker container interactions
+- Multi-turn conversations across tools
+- Log output validation
+
+**Important**: Simulator tests require `LOG_LEVEL=DEBUG` in your `.env` file to validate detailed execution logs.
+
+#### Running All Simulator Tests
+```bash
+# Run all simulator tests
+python communication_simulator_test.py
+
+# Run with verbose output for debugging
+python communication_simulator_test.py --verbose
+
+# Keep Docker logs after tests for inspection
+python communication_simulator_test.py --keep-logs
+```
+
+#### Running Individual Tests
+To run a single simulator test in isolation (useful for debugging or test development):
+
+```bash
+# Run a specific test by name
+python communication_simulator_test.py --individual basic_conversation
+
+# Examples of available tests:
+python communication_simulator_test.py --individual content_validation
+python communication_simulator_test.py --individual cross_tool_continuation
+python communication_simulator_test.py --individual redis_validation
+```
+
+#### Other Options
+```bash
+# List all available simulator tests with descriptions
+python communication_simulator_test.py --list-tests
+
+# Run multiple specific tests (not all)
+python communication_simulator_test.py --tests basic_conversation content_validation
+
+# Force Docker environment rebuild before running tests
+python communication_simulator_test.py --rebuild
+```
+
+### Code Quality Checks
+
+Before committing, ensure all linting passes:
+```bash
+# Run all linting checks
+ruff check .
+black --check .
+isort --check-only .
+
+# Auto-fix issues
+ruff check . --fix
+black .
+isort .
+```
+
+## What Each Test Suite Covers
+
+### Unit Tests (256 tests)
+Test isolated components and functions:
+- **Provider functionality**: Model initialization, API interactions, capability checks
+- **Tool operations**: All MCP tools (chat, analyze, debug, etc.)
+- **Conversation memory**: Threading, continuation, history management
+- **File handling**: Path validation, token limits, deduplication
+- **Auto mode**: Model selection logic and fallback behavior
+
+### Simulator Tests (14 tests)
+Validate real-world usage scenarios by simulating actual Claude prompts:
+- **Basic conversations**: Multi-turn chat functionality with real prompts
+- **Cross-tool continuation**: Context preservation across different tools
+- **File deduplication**: Efficient handling of repeated file references
+- **Model selection**: Proper routing to configured providers
+- **Token allocation**: Context window management in practice
+- **Redis validation**: Conversation persistence and retrieval
+
+## Contributing: Test Requirements
+
+When contributing to this project:
+
+1. **New features MUST include tests**:
+   - Add unit tests in `tests/` for new functions or classes
+   - Test both success and error cases
+
+2. **Tool changes require simulator tests**:
+   - Add simulator tests in `simulator_tests/` for new or modified tools
+   - Use realistic prompts that demonstrate the feature
+   - Validate output through Docker logs
+
+3. **Test naming conventions**:
+   - Unit tests: `test_<feature>_<scenario>.py`
+   - Simulator tests: `test_<tool>_<behavior>.py`
+
+4. **Before submitting PR**:
+   - Run all unit tests: `python -m pytest -xvs`
+   - Run relevant simulator tests
+   - Ensure all linting passes
+
+Remember: Tests are documentation. They show how features are intended to be used and help prevent regressions.
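To make the contribution requirements in the new guide concrete, here is a minimal sketch of a unit test following the `test_<feature>_<scenario>.py` convention. The file name, test bodies, and the provider import path are illustrative assumptions, not part of this commit:

```python
# tests/test_o4_mini_aliases.py -- hypothetical example file name
import pytest

from providers.openai_provider import OpenAIModelProvider  # assumed import path


def test_o4_mini_alias_validates_like_full_name():
    """Success case: shorthand aliases validate like the full name."""
    provider = OpenAIModelProvider(api_key="test-key")
    assert provider.validate_model_name("o4mini")


def test_unknown_model_raises_value_error():
    """Error case: unsupported names raise ValueError from get_capabilities."""
    provider = OpenAIModelProvider(api_key="test-key")
    with pytest.raises(ValueError):
        provider.get_capabilities("not-a-model")
```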
@@ -43,7 +43,7 @@ cat .env
 If you need to update your API keys, edit the `.env` file and then run:
 
 ```bash
-./setup-docker.sh
+./run-server.sh
 ```
 
 This will validate your configuration and restart the services.
@@ -73,7 +73,7 @@ See [Logging Documentation](logging.md) for more details on accessing logs.
 
 **"API key environment variable is required"**
 - Add your API key to the `.env` file
-- Run: `./setup-docker.sh` to validate and restart
+- Run: `./run-server.sh` to validate and restart
 
 **File path errors**
 - Always use absolute paths: `/Users/you/project/file.py`
@@ -1,7 +1,7 @@
 {
   "comment": "macOS configuration using Docker",
   "comment2": "Ensure Docker is running and containers are started",
-  "comment3": "Run './setup-docker.sh' first to set up the environment",
+  "comment3": "Run './run-server.sh' first to set up the environment",
   "mcpServers": {
     "zen": {
       "command": "docker",
@@ -1,7 +1,7 @@
 {
   "comment": "Windows configuration using WSL with Docker",
   "comment2": "Ensure Docker Desktop is running and WSL integration is enabled",
-  "comment3": "Run './setup-docker.sh' in WSL first to set up the environment",
+  "comment3": "Run './run-server.sh' in WSL first to set up the environment",
   "mcpServers": {
     "zen": {
       "command": "wsl.exe",
@@ -22,6 +22,19 @@ class OpenAIModelProvider(OpenAICompatibleProvider):
             "context_window": 200_000,  # 200K tokens
             "supports_extended_thinking": False,
         },
+        "o4-mini": {
+            "context_window": 200_000,  # 200K tokens
+            "supports_extended_thinking": False,
+        },
+        "o4-mini-high": {
+            "context_window": 200_000,  # 200K tokens
+            "supports_extended_thinking": False,
+        },
+        # Shorthands
+        "o3mini": "o3-mini",
+        "o4mini": "o4-mini",
+        "o4minihigh": "o4-mini-high",
+        "o4minihi": "o4-mini-high",
     }
 
     def __init__(self, api_key: str, **kwargs):
@@ -32,14 +45,17 @@ class OpenAIModelProvider(OpenAICompatibleProvider):
 
     def get_capabilities(self, model_name: str) -> ModelCapabilities:
         """Get capabilities for a specific OpenAI model."""
-        if model_name not in self.SUPPORTED_MODELS:
+        # Resolve shorthand
+        resolved_name = self._resolve_model_name(model_name)
+
+        if resolved_name not in self.SUPPORTED_MODELS or isinstance(self.SUPPORTED_MODELS[resolved_name], str):
             raise ValueError(f"Unsupported OpenAI model: {model_name}")
 
-        config = self.SUPPORTED_MODELS[model_name]
+        config = self.SUPPORTED_MODELS[resolved_name]
 
         # Define temperature constraints per model
-        if model_name in ["o3", "o3-mini"]:
-            # O3 models only support temperature=1.0
+        if resolved_name in ["o3", "o3-mini", "o4-mini", "o4-mini-high"]:
+            # O3 and O4 reasoning models only support temperature=1.0
             temp_constraint = FixedTemperatureConstraint(1.0)
         else:
             # Other OpenAI models support 0.0-2.0 range
@@ -63,10 +79,19 @@ class OpenAIModelProvider(OpenAICompatibleProvider):
 
     def validate_model_name(self, model_name: str) -> bool:
         """Validate if the model name is supported."""
-        return model_name in self.SUPPORTED_MODELS
+        resolved_name = self._resolve_model_name(model_name)
+        return resolved_name in self.SUPPORTED_MODELS and isinstance(self.SUPPORTED_MODELS[resolved_name], dict)
 
     def supports_thinking_mode(self, model_name: str) -> bool:
         """Check if the model supports extended thinking mode."""
         # Currently no OpenAI models support extended thinking
         # This may change with future O3 models
         return False
 
+    def _resolve_model_name(self, model_name: str) -> str:
+        """Resolve model shorthand to full name."""
+        # Check if it's a shorthand
+        shorthand_value = self.SUPPORTED_MODELS.get(model_name)
+        if isinstance(shorthand_value, str):
+            return shorthand_value
+        return model_name
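With the shorthand entries above, resolution is a single dictionary lookup: string values are aliases that point at a real entry, dict values are the actual model configs. A small usage sketch of the provider change (illustrative; assumes a valid key and the import path shown):

```python
from providers.openai_provider import OpenAIModelProvider  # assumed import path

provider = OpenAIModelProvider(api_key="test-key")

provider.validate_model_name("o4mini")   # True - alias resolves to "o4-mini"
provider.validate_model_name("o3mini")   # True - alias resolves to "o3-mini"
provider.validate_model_name("gpt-4o")   # False - not in SUPPORTED_MODELS

# Shorthands share the capabilities of the model they resolve to,
# including the fixed temperature=1.0 constraint for reasoning models.
caps = provider.get_capabilities("o4minihigh")
assert caps.context_window == 200_000
```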
run-server.sh (renamed from setup-docker.sh)

@@ -3,8 +3,12 @@
 # Exit on any error, undefined variables, and pipe failures
 set -euo pipefail
 
-# Modern Docker setup script for Zen MCP Server with Redis
-# This script sets up the complete Docker environment including Redis for conversation threading
+# Run/Restart script for Zen MCP Server with Redis
+# This script builds, starts, and manages the Docker environment including Redis for conversation threading
+# Run this script to:
+# - Initial setup of the Docker environment
+# - Restart services after changing .env configuration
+# - Rebuild and restart after code changes
 
 # Spinner function for long-running operations
 show_spinner() {
@@ -71,6 +71,15 @@ class O3ModelSelectionTest(BaseSimulatorTest):
             self.logger.info(" ℹ️ Only OpenRouter configured - O3 models will be routed through OpenRouter")
             return self._run_openrouter_o3_test()
 
+        # If neither OpenAI nor OpenRouter is configured, skip the test
+        if not has_openai and not has_openrouter:
+            self.logger.info(" ⚠️ Neither OpenAI nor OpenRouter API keys configured - skipping test")
+            self.logger.info(
+                " ℹ️ This test requires either OPENAI_API_KEY or OPENROUTER_API_KEY to be set in .env"
+            )
+            self.logger.info(" ✅ Test skipped (no API keys configured)")
+            return True  # Return True to indicate test passed/skipped
+
         # Original test for when OpenAI is configured
         self.logger.info(" ℹ️ OpenAI API configured - expecting direct OpenAI API calls")
 
@@ -85,8 +85,10 @@ def mock_provider_availability(request, monkeypatch):
     the tools don't require model selection unless explicitly testing auto mode.
     """
     # Skip this fixture for tests that need real providers
-    if hasattr(request, "node") and request.node.get_closest_marker("no_mock_provider"):
-        return
+    if hasattr(request, "node"):
+        marker = request.node.get_closest_marker("no_mock_provider")
+        if marker:
+            return
 
     from unittest.mock import MagicMock
 
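A test opts out of this autouse fixture by carrying the `no_mock_provider` marker, which the fixture looks up via `get_closest_marker`. A minimal sketch of the test side (the test body is illustrative only):

```python
import pytest


@pytest.mark.no_mock_provider  # tell the fixture to leave real provider lookups alone
def test_with_real_provider_registry():
    # Illustrative body: exercises code paths that need unmocked providers
    ...
```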
@@ -2,7 +2,6 @@
 Test that conversation history is correctly mapped to tool-specific fields
 """
 
-import os
 from datetime import datetime
 from unittest.mock import MagicMock, patch
 
@@ -130,8 +129,7 @@ async def test_unknown_tool_defaults_to_prompt():
     with patch("utils.conversation_memory.get_thread", return_value=mock_context):
         with patch("utils.conversation_memory.add_turn", return_value=True):
             with patch("utils.conversation_memory.build_conversation_history", return_value=("History", 500)):
-                # The test uses the conftest fixture which should handle provider mocking
-                # We just need to ensure the arguments are correct
+                # The autouse fixture should handle provider mocking
                 arguments = {
                     "continuation_id": "test-thread-456",
                     "prompt": "User input",
@@ -72,7 +72,10 @@ class TestOpenRouterProvider:
         assert provider._resolve_model_name("opus") == "anthropic/claude-3-opus"
         assert provider._resolve_model_name("sonnet") == "anthropic/claude-3-sonnet"
         assert provider._resolve_model_name("o3") == "openai/o3"
-        assert provider._resolve_model_name("o3-mini") == "openai/o3-mini-high"
+        assert provider._resolve_model_name("o3-mini") == "openai/o3-mini"
+        assert provider._resolve_model_name("o3mini") == "openai/o3-mini"
+        assert provider._resolve_model_name("o4-mini") == "openai/o4-mini"
+        assert provider._resolve_model_name("o4-mini-high") == "openai/o4-mini-high"
         assert provider._resolve_model_name("claude") == "anthropic/claude-3-sonnet"
         assert provider._resolve_model_name("mistral") == "mistral/mistral-large"
         assert provider._resolve_model_name("deepseek") == "deepseek/deepseek-r1-0528"
@@ -183,12 +183,31 @@
         assert capabilities.context_window == 200_000
         assert not capabilities.supports_extended_thinking
 
+    def test_get_capabilities_o4_mini(self):
+        """Test getting O4-mini model capabilities"""
+        provider = OpenAIModelProvider(api_key="test-key")
+
+        capabilities = provider.get_capabilities("o4-mini")
+
+        assert capabilities.provider == ProviderType.OPENAI
+        assert capabilities.model_name == "o4-mini"
+        assert capabilities.context_window == 200_000
+        assert not capabilities.supports_extended_thinking
+        # Check temperature constraint is fixed at 1.0
+        assert capabilities.temperature_constraint.value == 1.0
+
     def test_validate_model_names(self):
         """Test model name validation"""
         provider = OpenAIModelProvider(api_key="test-key")
 
         assert provider.validate_model_name("o3")
-        assert provider.validate_model_name("o3-mini")
+        assert provider.validate_model_name("o3mini")
+        assert provider.validate_model_name("o3-mini")  # Backwards compatibility
+        assert provider.validate_model_name("o4-mini")
+        assert provider.validate_model_name("o4mini")
+        assert provider.validate_model_name("o4-mini-high")
+        assert provider.validate_model_name("o4minihigh")
+        assert provider.validate_model_name("o4minihi")
         assert not provider.validate_model_name("gpt-4o")
         assert not provider.validate_model_name("invalid-model")
 
@@ -197,4 +216,7 @@
         provider = OpenAIModelProvider(api_key="test-key")
 
         assert not provider.supports_thinking_mode("o3")
+        assert not provider.supports_thinking_mode("o3mini")
         assert not provider.supports_thinking_mode("o3-mini")
+        assert not provider.supports_thinking_mode("o4-mini")
+        assert not provider.supports_thinking_mode("o4-mini-high")