Testing Strategy & Guidelines
Overview
This document outlines the comprehensive testing strategy for the Gemini MCP Server project, including unit testing, integration testing, and quality assurance practices that align with CLAUDE.md collaboration patterns.
Testing Philosophy
Test-Driven Development (TDD)
TDD Cycle:
- Red: Write failing test for new functionality
- Green: Implement minimal code to pass the test
- Refactor: Improve code while maintaining test coverage
- Repeat: Continue cycle for all new features
Example TDD Flow:
```python
# 1. Write failing test
def test_chat_tool_should_process_simple_prompt():
    tool = ChatTool()
    result = tool.execute({"prompt": "Hello"})
    assert result.status == "success"
    assert "hello" in result.content.lower()


# 2. Implement minimal functionality
class ChatTool:
    def execute(self, request):
        return ToolOutput(content="Hello!", status="success")


# 3. Refactor and enhance
```
Testing Pyramid
```text
    /\
   /  \       E2E Tests (Few, High-Value)
  /____\      Integration Tests (Some, Key Paths)
 /______\     Unit Tests (Many, Fast, Isolated)
/________\
```
Distribution:
- 70% Unit Tests: Fast, isolated, comprehensive coverage
- 20% Integration Tests: Component interaction validation
- 10% End-to-End Tests: Complete workflow validation
Test Categories
1. Unit Tests
Purpose: Test individual functions and classes in isolation
Location: tests/unit/
Example Structure:
```python
# tests/unit/test_file_utils.py
import pytest
from unittest.mock import Mock, patch, mock_open

from exceptions import SecurityError
from utils.file_utils import validate_file_path, read_file_with_token_limit


class TestFileUtils:
    """Unit tests for file utility functions."""

    def test_validate_file_path_with_safe_path(self):
        """Test that safe file paths pass validation."""
        safe_path = "/workspace/tools/chat.py"
        assert validate_file_path(safe_path) is True

    def test_validate_file_path_with_traversal_attack(self):
        """Test that directory traversal attempts are blocked."""
        dangerous_path = "/workspace/../../../etc/passwd"
        with pytest.raises(SecurityError):
            validate_file_path(dangerous_path)

    @patch('builtins.open', new_callable=mock_open, read_data="test content")
    def test_read_file_with_token_limit(self, mock_file):
        """Test file reading with token budget enforcement."""
        content = read_file_with_token_limit("/test/file.py", max_tokens=100)
        assert "test content" in content
        mock_file.assert_called_once_with("/test/file.py", 'r', encoding='utf-8')
```
Unit Test Guidelines:
- Isolation: Mock external dependencies (file system, network, database); see the sketch after this list
- Fast Execution: Each test should complete in milliseconds
- Single Responsibility: One test per behavior/scenario
- Descriptive Names: Test names should describe the scenario and expected outcome
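As an illustration of the isolation guideline, the sketch below unit-tests ChatTool against the `mock_gemini_client` fixture defined later in `tests/conftest.py`, so no network call is made. It assumes ChatTool obtains its client via `tools.models.GeminiClient` and exposes a synchronous `execute`, which may differ from the real implementation.

```python
# Illustrative sketch only; adjust to ChatTool's real interface.
import pytest

from tools.chat import ChatTool


@pytest.mark.unit
def test_chat_tool_returns_success_without_network(mock_gemini_client):
    """ChatTool should succeed using only the patched Gemini client."""
    tool = ChatTool()
    result = tool.execute({"prompt": "Hello"})

    # The fixture patches tools.models.GeminiClient, so the only possible
    # response is the canned one; a real API call would fail this test setup.
    assert result.status == "success"
    mock_gemini_client.generate_response.assert_called_once()
```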
2. Integration Tests
Purpose: Test component interactions and system integration
Location: tests/integration/
Example Structure:
```python
# tests/integration/test_tool_execution.py
import pytest
import asyncio
from unittest.mock import patch

from server import call_tool
from tools.chat import ChatTool
from utils.conversation_memory import ConversationMemory


class TestToolExecution:
    """Integration tests for tool execution pipeline."""

    @pytest.fixture
    def mock_redis(self):
        """Mock Redis for conversation memory testing."""
        with patch('redis.Redis') as mock:
            yield mock

    @pytest.fixture
    def conversation_memory(self, mock_redis):
        """Create conversation memory with mocked Redis."""
        return ConversationMemory("redis://mock")

    async def test_chat_tool_execution_with_memory(self, conversation_memory):
        """Test chat tool execution with conversation memory integration."""
        # Arrange
        request = {
            "name": "chat",
            "arguments": {
                "prompt": "Hello",
                "continuation_id": "test-thread-123"
            }
        }

        # Act
        result = await call_tool(request["name"], request["arguments"])

        # Assert
        assert len(result) == 1
        assert result[0].type == "text"
        assert "hello" in result[0].text.lower()

    async def test_tool_execution_error_handling(self):
        """Test error handling in tool execution pipeline."""
        # Test with invalid tool name
        with pytest.raises(ToolNotFoundError):
            await call_tool("nonexistent_tool", {})
```
Integration Test Guidelines:
- Real Component Interaction: Test actual component communication
- Mock External Services: Mock external APIs (Gemini, Redis) for reliability
- Error Scenarios: Test error propagation and handling
- Async Testing: Use pytest-asyncio for async code testing (see the sketch after this list)
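For the async-testing guideline above, a minimal sketch of the pytest-asyncio marker (assuming the package is installed, as in the CI workflow later in this document, and that the Gemini client is mocked as shown earlier):

```python
import pytest

from server import call_tool


@pytest.mark.asyncio
@pytest.mark.integration
async def test_chat_tool_returns_single_text_block(mock_gemini_client):
    """pytest-asyncio supplies the event loop that runs this coroutine."""
    result = await call_tool("chat", {"prompt": "Hello"})
    assert len(result) == 1
    assert result[0].type == "text"
```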
3. Live Integration Tests
Purpose: Test real API integration with external services
Location: tests/live/
Requirements:
- Valid GEMINI_API_KEY environment variable
- Redis server running (for conversation memory tests)
- Network connectivity
Example Structure:
```python
# tests/live/test_gemini_integration.py
import pytest
import os

from tools.chat import ChatTool
from tools.models import GeminiClient


@pytest.mark.live
@pytest.mark.skipif(not os.getenv("GEMINI_API_KEY"), reason="API key required")
class TestGeminiIntegration:
    """Live tests requiring actual Gemini API access."""

    def setup_method(self):
        """Set up for live testing."""
        self.api_key = os.getenv("GEMINI_API_KEY")
        self.client = GeminiClient(self.api_key)

    async def test_basic_gemini_request(self):
        """Test basic Gemini API request/response."""
        response = await self.client.generate_response(
            prompt="Say 'test successful'",
            thinking_mode="minimal"
        )
        assert "test successful" in response.lower()

    async def test_chat_tool_with_real_api(self):
        """Test ChatTool with real Gemini API integration."""
        tool = ChatTool()
        result = await tool.execute({
            "prompt": "What is 2+2?",
            "thinking_mode": "minimal"
        })
        assert result.status == "success"
        assert "4" in result.content
```
Live Test Guidelines:
- Skip When Unavailable: Skip if API keys or services unavailable
- Rate Limiting: Respect API rate limits with delays (see the sketch after this list)
- Minimal Mode: Use minimal thinking mode for speed
- Cleanup: Clean up any created resources
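One possible implementation of the rate-limiting guideline is an autouse fixture in a live-test `conftest.py` that pauses after each test; the one-second delay below is a placeholder, not a documented Gemini quota.

```python
# tests/live/conftest.py (illustrative sketch)
import time

import pytest


@pytest.fixture(autouse=True)
def pause_between_live_tests():
    """Add a short delay after every live test to stay under API rate limits."""
    yield
    time.sleep(1.0)  # Placeholder; tune to the quota of the API key in use.
```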
4. Security Tests
Purpose: Validate security measures and vulnerability prevention
Location: tests/security/
Example Structure:
```python
# tests/security/test_path_validation.py
import pytest

from utils.file_utils import validate_file_path
from exceptions import SecurityError


class TestSecurityValidation:
    """Security-focused tests for input validation."""

    @pytest.mark.parametrize("dangerous_path", [
        "../../../etc/passwd",
        "/etc/shadow",
        "~/../../root/.ssh/id_rsa",
        "/var/log/auth.log",
        "\\..\\..\\windows\\system32\\config\\sam"
    ])
    def test_dangerous_path_rejection(self, dangerous_path):
        """Test that dangerous file paths are rejected."""
        with pytest.raises(SecurityError):
            validate_file_path(dangerous_path)

    def test_secret_sanitization_in_logs(self):
        """Test that sensitive data is sanitized in log output."""
        request_data = {
            "prompt": "Hello",
            "api_key": "sk-secret123",
            "token": "bearer-token-456"
        }

        sanitized = sanitize_for_logging(request_data)

        assert sanitized["api_key"] == "[REDACTED]"
        assert sanitized["token"] == "[REDACTED]"
        assert sanitized["prompt"] == "Hello"  # Non-sensitive data preserved
```
Test Configuration
pytest Configuration
pytest.ini:
```ini
[tool:pytest]
testpaths = tests
python_files = test_*.py *_test.py
python_classes = Test*
python_functions = test_*
addopts =
    -v
    --strict-markers
    --disable-warnings
    --cov=tools
    --cov=utils
    --cov-report=html
    --cov-report=term-missing
    --cov-fail-under=80
markers =
    unit: Unit tests (fast, isolated)
    integration: Integration tests (component interaction)
    live: Live tests requiring API keys and external services
    security: Security-focused tests
    slow: Tests that take more than 1 second
```
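Because `--strict-markers` is enabled, any marker used in a test must be declared in the list above. Markers are then applied per test or per class, for example:

```python
import pytest


@pytest.mark.unit
def test_addition():
    assert 1 + 1 == 2


@pytest.mark.integration
@pytest.mark.slow
class TestConversationPersistence:
    """Class-level markers apply to every test method in the class."""

    def test_thread_round_trip(self):
        ...
```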
conftest.py:
```python
# tests/conftest.py
import pytest
import asyncio
from unittest.mock import Mock, patch


@pytest.fixture(scope="session")
def event_loop():
    """Create an instance of the default event loop for the test session."""
    loop = asyncio.get_event_loop_policy().new_event_loop()
    yield loop
    loop.close()


@pytest.fixture
def mock_gemini_client():
    """Mock Gemini client for testing without API calls."""
    with patch('tools.models.GeminiClient') as mock:
        mock_instance = Mock()
        mock_instance.generate_response.return_value = "Mocked response"
        mock.return_value = mock_instance
        yield mock_instance


@pytest.fixture
def mock_redis():
    """Mock Redis client for testing without Redis server."""
    with patch('redis.Redis') as mock:
        yield mock


@pytest.fixture
def sample_file_content():
    """Sample file content for testing file processing."""
    return """
def example_function():
    # This is a sample function
    return "hello world"


class ExampleClass:
    def method(self):
        pass
"""


@pytest.fixture
def temp_project_directory(tmp_path):
    """Create temporary project directory structure for testing."""
    project_dir = tmp_path / "test_project"
    project_dir.mkdir()

    # Create subdirectories
    (project_dir / "tools").mkdir()
    (project_dir / "utils").mkdir()
    (project_dir / "tests").mkdir()

    # Create sample files
    (project_dir / "tools" / "sample.py").write_text("# Sample tool")
    (project_dir / "utils" / "helper.py").write_text("# Helper utility")

    return project_dir
```
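As a usage sketch, a test simply lists the fixtures it needs as parameters and pytest injects them (the test name below is hypothetical):

```python
def test_fixture_injection(temp_project_directory, mock_gemini_client):
    """Fixtures are resolved by name; no explicit setup code is required."""
    sample_tool = temp_project_directory / "tools" / "sample.py"
    assert sample_tool.read_text() == "# Sample tool"

    # The mocked client returns the canned string instead of calling the API.
    assert mock_gemini_client.generate_response() == "Mocked response"
```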
Test Data Management
Test Fixtures
File-based Fixtures:
```python
# tests/fixtures/sample_code.py
PYTHON_CODE_SAMPLE = '''
import asyncio
from typing import Dict, List


async def process_data(items: List[str]) -> Dict[str, int]:
    """Process a list of items and return counts."""
    result = {}
    for item in items:
        result[item] = len(item)
    return result
'''

JAVASCRIPT_CODE_SAMPLE = '''
async function processData(items) {
    const result = {};
    for (const item of items) {
        result[item] = item.length;
    }
    return result;
}
'''

ERROR_LOGS_SAMPLE = '''
2025-01-11 23:45:12 ERROR [tool_execution] Tool 'analyze' failed: File not found
Traceback (most recent call last):
  File "/app/tools/analyze.py", line 45, in execute
    content = read_file(file_path)
  File "/app/utils/file_utils.py", line 23, in read_file
    with open(file_path, 'r') as f:
FileNotFoundError: [Errno 2] No such file or directory: '/nonexistent/file.py'
'''
```
Mock Data Factories
ToolOutput Factory:
```python
# tests/factories.py
from typing import Dict, Any, List


def create_tool_output(
    content: str = "Default response",
    status: str = "success",
    metadata: Dict[str, Any] = None,
    files_processed: List[str] = None
) -> ToolOutput:
    """Factory for creating ToolOutput test instances."""
    return ToolOutput(
        content=content,
        metadata=metadata or {},
        files_processed=files_processed or [],
        status=status
    )


def create_thread_context(
    thread_id: str = "test-thread-123",
    files: List[str] = None
) -> ThreadContext:
    """Factory for creating ThreadContext test instances."""
    return ThreadContext(
        thread_id=thread_id,
        conversation_files=set(files or []),
        tool_history=[],
        context_tokens=0
    )
```
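A short usage sketch (the module path `tests.factories` assumes the `tests` directory is importable as a package):

```python
from tests.factories import create_tool_output, create_thread_context


def test_factories_allow_selective_overrides():
    output = create_tool_output(content="Done", files_processed=["tools/chat.py"])
    assert output.status == "success"            # default preserved
    assert output.files_processed == ["tools/chat.py"]

    context = create_thread_context(files=["utils/file_utils.py"])
    assert "utils/file_utils.py" in context.conversation_files
```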
Mocking Strategies
External Service Mocking
Gemini API Mocking:
```python
from typing import Dict
from unittest.mock import patch


class MockGeminiClient:
    """Mock Gemini client for testing."""

    def __init__(self, responses: Dict[str, str] = None):
        self.responses = responses or {
            "default": "This is a mocked response from Gemini"
        }
        self.call_count = 0

    async def generate_response(self, prompt: str, **kwargs) -> str:
        """Mock response generation."""
        self.call_count += 1

        # Return specific response for specific prompts
        for key, response in self.responses.items():
            if key in prompt.lower():
                return response
        return self.responses.get("default", "Mock response")


# Usage in tests
@patch('tools.models.GeminiClient', MockGeminiClient)
def test_with_mocked_gemini():
    # Test implementation
    pass
```
File System Mocking:
```python
from unittest.mock import mock_open, patch

from utils.file_utils import read_file


@patch('builtins.open', mock_open(read_data="file content"))
@patch('os.path.exists', return_value=True)
@patch('os.path.getsize', return_value=1024)
def test_file_operations(mock_getsize, mock_exists):
    """Test file operations with mocked file system."""
    content = read_file("/mocked/file.py")
    assert content == "file content"
```
Performance Testing
Load Testing
Concurrent Tool Execution:
```python
# tests/performance/test_load.py
import asyncio
import pytest
import time

from server import call_tool


@pytest.mark.slow
class TestPerformance:
    """Performance tests for system load handling."""

    async def test_concurrent_tool_execution(self):
        """Test system performance under concurrent load."""
        start_time = time.time()

        # Create 10 concurrent tool execution tasks
        tasks = []
        for i in range(10):
            task = asyncio.create_task(
                call_tool("chat", {"prompt": f"Request {i}"})
            )
            tasks.append(task)

        # Wait for all tasks to complete
        results = await asyncio.gather(*tasks)

        end_time = time.time()
        execution_time = end_time - start_time

        # Verify all requests succeeded
        assert len(results) == 10
        assert all(len(result) == 1 for result in results)

        # Performance assertion (adjust based on requirements)
        assert execution_time < 30.0  # All requests should complete within 30s

    async def test_memory_usage_stability(self):
        """Test that memory usage remains stable under load."""
        import psutil
        import gc

        process = psutil.Process()
        initial_memory = process.memory_info().rss

        # Execute multiple operations
        for i in range(100):
            await call_tool("chat", {"prompt": f"Memory test {i}"})

            # Force garbage collection periodically
            if i % 10 == 0:
                gc.collect()

        final_memory = process.memory_info().rss
        memory_growth = final_memory - initial_memory

        # Memory growth should be reasonable (adjust threshold as needed)
        assert memory_growth < 100 * 1024 * 1024  # Less than 100MB growth
```
Test Execution
Running Tests
Basic Test Execution:
```bash
# Run all tests
pytest

# Run specific test categories
pytest -m unit                   # Unit tests only
pytest -m integration            # Integration tests only
pytest -m "not live"             # All tests except live tests
pytest -m "live and not slow"    # Live tests that are fast

# Run with coverage
pytest --cov=tools --cov=utils --cov-report=html

# Run specific test file
pytest tests/unit/test_file_utils.py -v

# Run specific test method
pytest tests/unit/test_file_utils.py::TestFileUtils::test_validate_file_path -v
```
Continuous Integration:
```bash
#!/bin/bash
# CI test script
set -e

echo "Running unit tests..."
pytest -m unit --cov=tools --cov=utils --cov-fail-under=80

echo "Running integration tests..."
pytest -m integration

echo "Running security tests..."
pytest -m security

echo "Checking code quality..."
flake8 tools/ utils/ tests/
mypy tools/ utils/

echo "All tests passed!"
```
Test Reports
Coverage Reports:
```bash
# Generate HTML coverage report
pytest --cov=tools --cov=utils --cov-report=html
open htmlcov/index.html

# Generate terminal coverage report
pytest --cov=tools --cov=utils --cov-report=term-missing
```
Test Results Export:
```bash
# Export test results to JUnit XML (for CI integration)
pytest --junitxml=test-results.xml

# Export test results with timing information
pytest --durations=10  # Show 10 slowest tests
```
Quality Metrics
Coverage Targets
Minimum Coverage Requirements:
- Overall Coverage: 80%
- Critical Modules: 90% (security, file_utils, conversation_memory)
- Tool Modules: 85%
- Utility Modules: 80%
Coverage Enforcement:
```bash
# Fail build if coverage drops below threshold
pytest --cov-fail-under=80
```
Test Quality Metrics
Test Suite Characteristics:
- Fast Execution: Unit test suite should complete in <30 seconds
- Reliable: Tests should have <1% flaky failure rate
- Maintainable: Test code should follow same quality standards as production code
- Comprehensive: All critical paths and edge cases covered
Integration with Development Workflow
Pre-commit Testing
Git Hook Integration:
```bash
#!/bin/sh
# .git/hooks/pre-commit

echo "Running pre-commit tests..."

# Run fast tests before commit
pytest -m "unit and not slow" --cov-fail-under=80

if [ $? -ne 0 ]; then
    echo "Tests failed. Commit blocked."
    exit 1
fi

echo "Pre-commit tests passed."
```
CI/CD Integration
GitHub Actions Workflow:
```yaml
name: Test Suite

on: [push, pull_request]

jobs:
  test:
    runs-on: ubuntu-latest
    strategy:
      matrix:
        # Quote versions so YAML does not read 3.10 as the float 3.1
        python-version: ["3.9", "3.10", "3.11"]

    steps:
      - uses: actions/checkout@v3

      - name: Set up Python ${{ matrix.python-version }}
        uses: actions/setup-python@v4
        with:
          python-version: ${{ matrix.python-version }}

      - name: Install dependencies
        run: |
          pip install -r requirements.txt
          pip install pytest pytest-cov pytest-asyncio

      - name: Run unit tests
        run: pytest -m unit --cov=tools --cov=utils --cov-fail-under=80

      - name: Run integration tests
        run: pytest -m integration

      - name: Run security tests
        run: pytest -m security

      - name: Upload coverage reports
        uses: codecov/codecov-action@v3
        with:
          file: ./coverage.xml
```
Detailed Test Structure Analysis
For a comprehensive analysis of the existing test suite, including detailed breakdowns of all 17 test files, security testing patterns, and collaboration feature validation, see:
Test Structure Documentation - Complete analysis of existing test organization, mocking strategies, and quality assurance patterns
This comprehensive testing strategy ensures high-quality, reliable code while maintaining development velocity and supporting the collaborative patterns defined in CLAUDE.md.