# Testing Strategy & Guidelines

## Overview

This document outlines the comprehensive testing strategy for the Gemini MCP Server project, including unit testing, integration testing, and quality assurance practices that align with CLAUDE.md collaboration patterns.

## Testing Philosophy

### Test-Driven Development (TDD)

**TDD Cycle**:

1. **Red**: Write a failing test for the new functionality
2. **Green**: Implement the minimal code needed to pass the test
3. **Refactor**: Improve the code while maintaining test coverage
4. **Repeat**: Continue the cycle for all new features

**Example TDD Flow**:

```python
# 1. Write a failing test
def test_chat_tool_should_process_simple_prompt():
    tool = ChatTool()
    result = tool.execute({"prompt": "Hello"})
    assert result.status == "success"
    assert "hello" in result.content.lower()


# 2. Implement minimal functionality
class ChatTool:
    def execute(self, request):
        return ToolOutput(content="Hello!", status="success")


# 3. Refactor and enhance
```

### Testing Pyramid

```
        /\
       /  \       E2E Tests (Few, High-Value)
      /____\      Integration Tests (Some, Key Paths)
     /______\     Unit Tests (Many, Fast, Isolated)
    /________\
```

**Distribution**:
- **70% Unit Tests**: Fast, isolated, comprehensive coverage
- **20% Integration Tests**: Component interaction validation
- **10% End-to-End Tests**: Complete workflow validation

## Test Categories

### 1. Unit Tests

**Purpose**: Test individual functions and classes in isolation

**Location**: `tests/unit/`

**Example Structure**:

```python
# tests/unit/test_file_utils.py
import pytest
from unittest.mock import Mock, patch, mock_open

from exceptions import SecurityError
from utils.file_utils import validate_file_path, read_file_with_token_limit


class TestFileUtils:
    """Unit tests for file utility functions."""

    def test_validate_file_path_with_safe_path(self):
        """Test that safe file paths pass validation."""
        safe_path = "/workspace/tools/chat.py"
        assert validate_file_path(safe_path) is True

    def test_validate_file_path_with_traversal_attack(self):
        """Test that directory traversal attempts are blocked."""
        dangerous_path = "/workspace/../../../etc/passwd"
        with pytest.raises(SecurityError):
            validate_file_path(dangerous_path)

    @patch('builtins.open', new_callable=mock_open, read_data="test content")
    def test_read_file_with_token_limit(self, mock_file):
        """Test file reading with token budget enforcement."""
        content = read_file_with_token_limit("/test/file.py", max_tokens=100)
        assert "test content" in content
        mock_file.assert_called_once_with("/test/file.py", 'r', encoding='utf-8')
```

**Unit Test Guidelines**:
- **Isolation**: Mock external dependencies (file system, network, database)
- **Fast Execution**: Each test should complete in milliseconds
- **Single Responsibility**: One test per behavior/scenario
- **Descriptive Names**: Test names should describe the scenario and expected outcome (illustrated below)
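
These guidelines combine naturally in practice. The following is a hedged sketch reusing the helpers from the example above; the assertions are illustrative, not a specification of `read_file_with_token_limit`:

```python
# Illustrative sketch: one parametrized case per budget value, isolated from the real file system.
import pytest
from unittest.mock import mock_open, patch

from utils.file_utils import read_file_with_token_limit


@pytest.mark.parametrize("max_tokens", [1, 10, 100])
def test_read_file_with_token_limit_stays_within_budget(max_tokens):
    """Descriptive name: states the behavior under test and the expected outcome."""
    with patch('builtins.open', new_callable=mock_open, read_data="test content") as mock_file:
        content = read_file_with_token_limit("/test/file.py", max_tokens=max_tokens)

    assert isinstance(content, str)
    mock_file.assert_called_once()  # the file is read exactly once, never touched for real
```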

### 2. Integration Tests

**Purpose**: Test component interactions and system integration

**Location**: `tests/integration/`

**Example Structure**:

```python
# tests/integration/test_tool_execution.py
import pytest
import asyncio
from unittest.mock import patch

from server import call_tool
from tools.chat import ChatTool
from utils.conversation_memory import ConversationMemory


class TestToolExecution:
    """Integration tests for the tool execution pipeline."""

    @pytest.fixture
    def mock_redis(self):
        """Mock Redis for conversation memory testing."""
        with patch('redis.Redis') as mock:
            yield mock

    @pytest.fixture
    def conversation_memory(self, mock_redis):
        """Create conversation memory with mocked Redis."""
        return ConversationMemory("redis://mock")

    async def test_chat_tool_execution_with_memory(self, conversation_memory):
        """Test chat tool execution with conversation memory integration."""
        # Arrange
        request = {
            "name": "chat",
            "arguments": {
                "prompt": "Hello",
                "continuation_id": "test-thread-123"
            }
        }

        # Act
        result = await call_tool(request["name"], request["arguments"])

        # Assert
        assert len(result) == 1
        assert result[0].type == "text"
        assert "hello" in result[0].text.lower()

    async def test_tool_execution_error_handling(self):
        """Test error handling in the tool execution pipeline."""
        # Test with an invalid tool name
        with pytest.raises(ToolNotFoundError):
            await call_tool("nonexistent_tool", {})
```

**Integration Test Guidelines**:
- **Real Component Interaction**: Test actual component communication
- **Mock External Services**: Mock external APIs (Gemini, Redis) for reliability
- **Error Scenarios**: Test error propagation and handling
- **Async Testing**: Use pytest-asyncio for async test functions (see the sketch below)
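
The async examples above assume pytest-asyncio is active for these tests. A minimal sketch of the two common ways to opt a module in; the marker name comes from pytest-asyncio itself, and `asyncio_mode = auto` in pytest.ini is an alternative:

```python
# Sketch: enabling pytest-asyncio for coroutine tests.
import asyncio

import pytest

# Option 1: mark every test in the module at once.
pytestmark = pytest.mark.asyncio


async def test_event_loop_is_available():
    await asyncio.sleep(0)  # any awaited call works once the marker is active


# Option 2: mark individual coroutine tests explicitly (redundant here, shown for comparison).
@pytest.mark.asyncio
async def test_single_marked_coroutine():
    assert await asyncio.sleep(0) is None
```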

### 3. Live Integration Tests

**Purpose**: Test real API integration with external services

**Location**: `tests/live/`

**Requirements**:
- Valid `GEMINI_API_KEY` environment variable
- Redis server running (for conversation memory tests)
- Network connectivity

**Example Structure**:

```python
# tests/live/test_gemini_integration.py
import pytest
import os

from tools.chat import ChatTool
from tools.models import GeminiClient


@pytest.mark.live
@pytest.mark.skipif(not os.getenv("GEMINI_API_KEY"), reason="API key required")
class TestGeminiIntegration:
    """Live tests requiring actual Gemini API access."""

    def setup_method(self):
        """Set up for live testing."""
        self.api_key = os.getenv("GEMINI_API_KEY")
        self.client = GeminiClient(self.api_key)

    async def test_basic_gemini_request(self):
        """Test a basic Gemini API request/response."""
        response = await self.client.generate_response(
            prompt="Say 'test successful'",
            thinking_mode="minimal"
        )
        assert "test successful" in response.lower()

    async def test_chat_tool_with_real_api(self):
        """Test ChatTool with real Gemini API integration."""
        tool = ChatTool()
        result = await tool.execute({
            "prompt": "What is 2+2?",
            "thinking_mode": "minimal"
        })

        assert result.status == "success"
        assert "4" in result.content
```

**Live Test Guidelines**:
- **Skip When Unavailable**: Skip tests if API keys or external services are unavailable
- **Rate Limiting**: Respect API rate limits with delays (see the sketch below)
- **Minimal Mode**: Use minimal thinking mode for speed
- **Cleanup**: Clean up any created resources
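
One hedged way to pace live requests is an autouse fixture in `tests/live/conftest.py`; the file location and the one-second delay are assumptions, to be tuned to the actual Gemini quota:

```python
# Sketch: insert a short pause after each live test to stay under API rate limits.
import time

import pytest


@pytest.fixture(autouse=True)
def pause_between_live_tests():
    """Let the test run first, then wait briefly before the next live request."""
    yield
    time.sleep(1.0)  # assumed conservative gap, not a documented quota
```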

### 4. Security Tests

**Purpose**: Validate security measures and vulnerability prevention

**Location**: `tests/security/`

**Example Structure**:

```python
# tests/security/test_path_validation.py
import pytest

from utils.file_utils import validate_file_path
from exceptions import SecurityError


class TestSecurityValidation:
    """Security-focused tests for input validation."""

    @pytest.mark.parametrize("dangerous_path", [
        "../../../etc/passwd",
        "/etc/shadow",
        "~/../../root/.ssh/id_rsa",
        "/var/log/auth.log",
        "\\..\\..\\windows\\system32\\config\\sam"
    ])
    def test_dangerous_path_rejection(self, dangerous_path):
        """Test that dangerous file paths are rejected."""
        with pytest.raises(SecurityError):
            validate_file_path(dangerous_path)

    def test_secret_sanitization_in_logs(self):
        """Test that sensitive data is sanitized in log output."""
        request_data = {
            "prompt": "Hello",
            "api_key": "sk-secret123",
            "token": "bearer-token-456"
        }

        sanitized = sanitize_for_logging(request_data)

        assert sanitized["api_key"] == "[REDACTED]"
        assert sanitized["token"] == "[REDACTED]"
        assert sanitized["prompt"] == "Hello"  # Non-sensitive data preserved
```

## Test Configuration

### pytest Configuration

**pytest.ini**:

```ini
[pytest]
testpaths = tests
python_files = test_*.py *_test.py
python_classes = Test*
python_functions = test_*
addopts =
    -v
    --strict-markers
    --disable-warnings
    --cov=tools
    --cov=utils
    --cov-report=html
    --cov-report=term-missing
    --cov-fail-under=80

markers =
    unit: Unit tests (fast, isolated)
    integration: Integration tests (component interaction)
    live: Live tests requiring API keys and external services
    security: Security-focused tests
    slow: Tests that take more than 1 second
```
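
The markers declared above are applied in test code with ordinary pytest decorators; a small sketch with placeholder test bodies:

```python
# Sketch: applying the markers registered in pytest.ini.
import pytest

# Mark every test in this module as a unit test.
pytestmark = pytest.mark.unit


def test_fast_isolated_behavior():
    assert 1 + 1 == 2


# Individual tests can stack additional markers, e.g. for the "slow" category.
@pytest.mark.slow
def test_expensive_code_path():
    assert sum(range(1_000_000)) >= 0
```

Because `--strict-markers` is enabled, any marker not listed under `markers` is rejected at collection time, which keeps the category names consistent across the suite.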

**conftest.py**:

```python
# tests/conftest.py
import pytest
import asyncio
from unittest.mock import Mock, patch


@pytest.fixture(scope="session")
def event_loop():
    """Create an instance of the default event loop for the test session."""
    loop = asyncio.get_event_loop_policy().new_event_loop()
    yield loop
    loop.close()


@pytest.fixture
def mock_gemini_client():
    """Mock Gemini client for testing without API calls."""
    with patch('tools.models.GeminiClient') as mock:
        mock_instance = Mock()
        mock_instance.generate_response.return_value = "Mocked response"
        mock.return_value = mock_instance
        yield mock_instance


@pytest.fixture
def mock_redis():
    """Mock Redis client for testing without a Redis server."""
    with patch('redis.Redis') as mock:
        yield mock


@pytest.fixture
def sample_file_content():
    """Sample file content for testing file processing."""
    return """
def example_function():
    # This is a sample function
    return "hello world"


class ExampleClass:
    def method(self):
        pass
"""


@pytest.fixture
def temp_project_directory(tmp_path):
    """Create a temporary project directory structure for testing."""
    project_dir = tmp_path / "test_project"
    project_dir.mkdir()

    # Create subdirectories
    (project_dir / "tools").mkdir()
    (project_dir / "utils").mkdir()
    (project_dir / "tests").mkdir()

    # Create sample files
    (project_dir / "tools" / "sample.py").write_text("# Sample tool")
    (project_dir / "utils" / "helper.py").write_text("# Helper utility")

    return project_dir
```

## Test Data Management

### Test Fixtures

**File-based Fixtures**:

```python
# tests/fixtures/sample_code.py
PYTHON_CODE_SAMPLE = '''
import asyncio
from typing import Dict, List


async def process_data(items: List[str]) -> Dict[str, int]:
    """Process a list of items and return counts."""
    result = {}
    for item in items:
        result[item] = len(item)
    return result
'''

JAVASCRIPT_CODE_SAMPLE = '''
async function processData(items) {
    const result = {};
    for (const item of items) {
        result[item] = item.length;
    }
    return result;
}
'''

ERROR_LOGS_SAMPLE = '''
2025-01-11 23:45:12 ERROR [tool_execution] Tool 'analyze' failed: File not found
Traceback (most recent call last):
  File "/app/tools/analyze.py", line 45, in execute
    content = read_file(file_path)
  File "/app/utils/file_utils.py", line 23, in read_file
    with open(file_path, 'r') as f:
FileNotFoundError: [Errno 2] No such file or directory: '/nonexistent/file.py'
'''
```
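
Tests import these constants directly; a brief sketch, assuming the `tests` directory is importable as a package:

```python
# Sketch: consuming a file-based fixture in a test.
from tests.fixtures.sample_code import PYTHON_CODE_SAMPLE


def test_python_sample_exposes_async_function():
    """The sample should contain the symbols the analysis tools are exercised against."""
    assert "async def process_data" in PYTHON_CODE_SAMPLE
    assert "List[str]" in PYTHON_CODE_SAMPLE
```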

### Mock Data Factories

**ToolOutput Factory**:

```python
# tests/factories.py
from typing import Dict, Any, List


def create_tool_output(
    content: str = "Default response",
    status: str = "success",
    metadata: Dict[str, Any] = None,
    files_processed: List[str] = None
) -> ToolOutput:
    """Factory for creating ToolOutput test instances."""
    return ToolOutput(
        content=content,
        metadata=metadata or {},
        files_processed=files_processed or [],
        status=status
    )


def create_thread_context(
    thread_id: str = "test-thread-123",
    files: List[str] = None
) -> ThreadContext:
    """Factory for creating ThreadContext test instances."""
    return ThreadContext(
        thread_id=thread_id,
        conversation_files=set(files or []),
        tool_history=[],
        context_tokens=0
    )
```
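
In tests, the factories keep the arrange step to a single line; a usage sketch based on the default values defined above:

```python
# Sketch: building test data through the factories instead of hand-written literals.
from tests.factories import create_thread_context, create_tool_output


def test_factories_supply_sensible_defaults():
    output = create_tool_output()
    assert output.status == "success"
    assert output.files_processed == []

    context = create_thread_context(files=["tools/chat.py"])
    assert "tools/chat.py" in context.conversation_files
```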

## Mocking Strategies

### External Service Mocking

**Gemini API Mocking**:

```python
from typing import Dict
from unittest.mock import patch


class MockGeminiClient:
    """Mock Gemini client for testing."""

    def __init__(self, responses: Dict[str, str] = None):
        self.responses = responses or {
            "default": "This is a mocked response from Gemini"
        }
        self.call_count = 0

    async def generate_response(self, prompt: str, **kwargs) -> str:
        """Mock response generation."""
        self.call_count += 1

        # Return a specific response for matching prompts
        for key, response in self.responses.items():
            if key in prompt.lower():
                return response

        return self.responses.get("default", "Mock response")


# Usage in tests
@patch('tools.models.GeminiClient', MockGeminiClient)
def test_with_mocked_gemini():
    # Test implementation
    pass
```

**File System Mocking**:

```python
from unittest.mock import mock_open, patch

from utils.file_utils import read_file


@patch('builtins.open', mock_open(read_data="file content"))
@patch('os.path.exists', return_value=True)
@patch('os.path.getsize', return_value=1024)
def test_file_operations(mock_getsize, mock_exists):
    """Test file operations with a mocked file system."""
    content = read_file("/mocked/file.py")
    assert content == "file content"
```
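
When real file-system behavior matters more than raw speed, the same helper can be exercised against pytest's built-in `tmp_path` fixture instead of `mock_open`, mirroring the `temp_project_directory` fixture in conftest.py. A brief sketch, assuming `read_file` accepts a path string as in the example above:

```python
# Sketch: testing read_file against a real temporary file rather than a mocked one.
from utils.file_utils import read_file


def test_read_file_with_real_temp_file(tmp_path):
    """tmp_path provides an isolated directory that pytest cleans up automatically."""
    sample = tmp_path / "sample.py"
    sample.write_text("# temporary sample", encoding="utf-8")

    content = read_file(str(sample))

    assert "temporary sample" in content
```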

## Performance Testing

### Load Testing

**Concurrent Tool Execution**:

```python
# tests/performance/test_load.py
import asyncio
import pytest
import time

from server import call_tool


@pytest.mark.slow
class TestPerformance:
    """Performance tests for system load handling."""

    async def test_concurrent_tool_execution(self):
        """Test system performance under concurrent load."""
        start_time = time.time()

        # Create 10 concurrent tool execution tasks
        tasks = []
        for i in range(10):
            task = asyncio.create_task(
                call_tool("chat", {"prompt": f"Request {i}"})
            )
            tasks.append(task)

        # Wait for all tasks to complete
        results = await asyncio.gather(*tasks)

        end_time = time.time()
        execution_time = end_time - start_time

        # Verify all requests succeeded
        assert len(results) == 10
        assert all(len(result) == 1 for result in results)

        # Performance assertion (adjust based on requirements)
        assert execution_time < 30.0  # All requests should complete within 30s

    async def test_memory_usage_stability(self):
        """Test that memory usage remains stable under load."""
        import psutil
        import gc

        process = psutil.Process()
        initial_memory = process.memory_info().rss

        # Execute multiple operations
        for i in range(100):
            await call_tool("chat", {"prompt": f"Memory test {i}"})

            # Force garbage collection periodically
            if i % 10 == 0:
                gc.collect()

        final_memory = process.memory_info().rss
        memory_growth = final_memory - initial_memory

        # Memory growth should be reasonable (adjust threshold as needed)
        assert memory_growth < 100 * 1024 * 1024  # Less than 100MB growth
```

## Test Execution

### Running Tests

**Basic Test Execution**:

```bash
# Run all tests
pytest

# Run specific test categories
pytest -m unit                 # Unit tests only
pytest -m integration          # Integration tests only
pytest -m "not live"           # All tests except live tests
pytest -m "live and not slow"  # Live tests that are fast

# Run with coverage
pytest --cov=tools --cov=utils --cov-report=html

# Run a specific test file
pytest tests/unit/test_file_utils.py -v

# Run a specific test method
pytest tests/unit/test_file_utils.py::TestFileUtils::test_validate_file_path_with_safe_path -v
```

**Continuous Integration**:

```bash
#!/bin/bash
# CI test script
set -e

echo "Running unit tests..."
pytest -m unit --cov=tools --cov=utils --cov-fail-under=80

echo "Running integration tests..."
pytest -m integration

echo "Running security tests..."
pytest -m security

echo "Checking code quality..."
flake8 tools/ utils/ tests/
mypy tools/ utils/

echo "All tests passed!"
```

### Test Reports

**Coverage Reports**:

```bash
# Generate an HTML coverage report
pytest --cov=tools --cov=utils --cov-report=html
open htmlcov/index.html

# Generate a terminal coverage report
pytest --cov=tools --cov=utils --cov-report=term-missing
```

**Test Results Export**:

```bash
# Export test results to JUnit XML (for CI integration)
pytest --junitxml=test-results.xml

# Report timing information for the slowest tests
pytest --durations=10  # Show the 10 slowest tests
```

## Quality Metrics

### Coverage Targets

**Minimum Coverage Requirements**:
- **Overall Coverage**: 80%
- **Critical Modules**: 90% (security, file_utils, conversation_memory)
- **Tool Modules**: 85%
- **Utility Modules**: 80%

**Coverage Enforcement**:

```bash
# Fail the build if coverage drops below the threshold
pytest --cov-fail-under=80
```

### Test Quality Metrics

**Test Suite Characteristics**:
- **Fast Execution**: The unit test suite should complete in under 30 seconds
- **Reliable**: Tests should have a flaky failure rate below 1%
- **Maintainable**: Test code should follow the same quality standards as production code
- **Comprehensive**: All critical paths and edge cases covered

## Integration with Development Workflow

### Pre-commit Testing

**Git Hook Integration**:

```bash
#!/bin/sh
# .git/hooks/pre-commit

echo "Running pre-commit tests..."

# Run fast tests before commit
pytest -m "unit and not slow" --cov-fail-under=80

if [ $? -ne 0 ]; then
    echo "Tests failed. Commit blocked."
    exit 1
fi

echo "Pre-commit tests passed."
```

### CI/CD Integration

**GitHub Actions Workflow**:

```yaml
name: Test Suite
on: [push, pull_request]

jobs:
  test:
    runs-on: ubuntu-latest
    strategy:
      matrix:
        python-version: ["3.9", "3.10", "3.11"]

    steps:
      - uses: actions/checkout@v3

      - name: Set up Python ${{ matrix.python-version }}
        uses: actions/setup-python@v4
        with:
          python-version: ${{ matrix.python-version }}

      - name: Install dependencies
        run: |
          pip install -r requirements.txt
          pip install pytest pytest-cov pytest-asyncio

      - name: Run unit tests
        run: pytest -m unit --cov=tools --cov=utils --cov-report=xml --cov-fail-under=80

      - name: Run integration tests
        run: pytest -m integration

      - name: Run security tests
        run: pytest -m security

      - name: Upload coverage reports
        uses: codecov/codecov-action@v3
        with:
          file: ./coverage.xml
```

## Detailed Test Structure Analysis

For a comprehensive analysis of the existing test suite, including detailed breakdowns of all 17 test files, security testing patterns, and collaboration feature validation, see:

**[Test Structure Documentation](test-structure.md)** - Complete analysis of existing test organization, mocking strategies, and quality assurance patterns

---

This comprehensive testing strategy ensures high-quality, reliable code while maintaining development velocity and supporting the collaborative patterns defined in CLAUDE.md.