# Testing Strategy & Guidelines
## Overview
This document outlines the comprehensive testing strategy for the Gemini MCP Server project, including unit testing, integration testing, and quality assurance practices that align with CLAUDE.md collaboration patterns.
## Testing Philosophy
### Test-Driven Development (TDD)
**TDD Cycle**:
1. **Red**: Write failing test for new functionality
2. **Green**: Implement minimal code to pass the test
3. **Refactor**: Improve code while maintaining test coverage
4. **Repeat**: Continue cycle for all new features
**Example TDD Flow**:
```python
# 1. Write failing test
def test_chat_tool_should_process_simple_prompt():
    tool = ChatTool()
    result = tool.execute({"prompt": "Hello"})
    assert result.status == "success"
    assert "hello" in result.content.lower()


# 2. Implement minimal functionality
class ChatTool:
    def execute(self, request):
        return ToolOutput(content="Hello!", status="success")


# 3. Refactor and enhance while keeping the test green
```
### Testing Pyramid
```
    /\
   /  \      E2E Tests (Few, High-Value)
  /____\     Integration Tests (Some, Key Paths)
 /______\    Unit Tests (Many, Fast, Isolated)
/________\
```
**Distribution** (a marker-based selection sketch follows the list):
- **70% Unit Tests**: Fast, isolated, comprehensive coverage
- **20% Integration Tests**: Component interaction validation
- **10% End-to-End Tests**: Complete workflow validation
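To make the distribution enforceable in practice, each layer can be selected via the markers registered in `pytest.ini` (shown later in this document). A minimal sketch of tagging a whole module at once; the module name is illustrative:
```python
# tests/unit/test_example.py
import pytest

# Tag every test in this module so `pytest -m unit` selects it
pytestmark = pytest.mark.unit


def test_addition_stays_in_the_unit_layer():
    assert 1 + 1 == 2
```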
## Test Categories
### 1. Unit Tests
**Purpose**: Test individual functions and classes in isolation
**Location**: `tests/unit/`
**Example Structure**:
```python
# tests/unit/test_file_utils.py
import pytest
from unittest.mock import patch, mock_open

from exceptions import SecurityError
from utils.file_utils import validate_file_path, read_file_with_token_limit


class TestFileUtils:
    """Unit tests for file utility functions."""

    def test_validate_file_path_with_safe_path(self):
        """Test that safe file paths pass validation."""
        safe_path = "/workspace/tools/chat.py"
        assert validate_file_path(safe_path) is True

    def test_validate_file_path_with_traversal_attack(self):
        """Test that directory traversal attempts are blocked."""
        dangerous_path = "/workspace/../../../etc/passwd"
        with pytest.raises(SecurityError):
            validate_file_path(dangerous_path)

    @patch('builtins.open', new_callable=mock_open, read_data="test content")
    def test_read_file_with_token_limit(self, mock_file):
        """Test file reading with token budget enforcement."""
        content = read_file_with_token_limit("/test/file.py", max_tokens=100)
        assert "test content" in content
        mock_file.assert_called_once_with("/test/file.py", 'r', encoding='utf-8')
```
**Unit Test Guidelines**:
- **Isolation**: Mock external dependencies (file system, network, database); a `monkeypatch`-based sketch follows this list
- **Fast Execution**: Each test should complete in milliseconds
- **Single Responsibility**: One test per behavior/scenario
- **Descriptive Names**: Test names should describe the scenario and expected outcome
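For isolation, the examples above use `unittest.mock`; pytest's built-in `monkeypatch` fixture is an equally valid option when a test only needs to stub environment state or attributes. A minimal sketch (the environment variable matches the one used by the live tests below):
```python
import os


def test_reads_api_key_from_environment(monkeypatch):
    """Stub the environment instead of relying on a real API key."""
    monkeypatch.setenv("GEMINI_API_KEY", "unit-test-key")

    # Any code path that calls os.getenv("GEMINI_API_KEY") now sees the stub
    assert os.getenv("GEMINI_API_KEY") == "unit-test-key"
```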
### 2. Integration Tests
**Purpose**: Test component interactions and system integration
**Location**: `tests/integration/`
**Example Structure**:
```python
# tests/integration/test_tool_execution.py
import pytest
from unittest.mock import patch

from server import call_tool
from utils.conversation_memory import ConversationMemory


class TestToolExecution:
    """Integration tests for the tool execution pipeline."""

    @pytest.fixture
    def mock_redis(self):
        """Mock Redis for conversation memory testing."""
        with patch('redis.Redis') as mock:
            yield mock

    @pytest.fixture
    def conversation_memory(self, mock_redis):
        """Create conversation memory with mocked Redis."""
        return ConversationMemory("redis://mock")

    async def test_chat_tool_execution_with_memory(self, conversation_memory):
        """Test chat tool execution with conversation memory integration."""
        # Arrange
        request = {
            "name": "chat",
            "arguments": {
                "prompt": "Hello",
                "continuation_id": "test-thread-123"
            }
        }

        # Act
        result = await call_tool(request["name"], request["arguments"])

        # Assert
        assert len(result) == 1
        assert result[0].type == "text"
        assert "hello" in result[0].text.lower()

    async def test_tool_execution_error_handling(self):
        """Test error handling in the tool execution pipeline."""
        # ToolNotFoundError is assumed to be importable from the server module
        # alongside call_tool; test with an invalid tool name.
        with pytest.raises(ToolNotFoundError):
            await call_tool("nonexistent_tool", {})
```
**Integration Test Guidelines**:
- **Real Component Interaction**: Test actual component communication
- **Mock External Services**: Mock external APIs (Gemini, Redis) for reliability
- **Error Scenarios**: Test error propagation and handling
- **Async Testing**: Use pytest-asyncio for async code testing (see the marker sketch after this list)
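The async examples in this document omit per-test markers, which works when pytest-asyncio is configured with `asyncio_mode = auto`. If the project keeps pytest-asyncio's default strict mode instead, each coroutine test needs an explicit marker, roughly as sketched below (the tool arguments are illustrative):
```python
import pytest

from server import call_tool


@pytest.mark.integration
@pytest.mark.asyncio
async def test_chat_tool_returns_text_block():
    """Explicitly marked coroutine test for pytest-asyncio strict mode."""
    result = await call_tool("chat", {"prompt": "Hello"})
    assert result[0].type == "text"
```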
### 3. Live Integration Tests
**Purpose**: Test real API integration with external services
**Location**: `tests/live/`
**Requirements**:
- Valid `GEMINI_API_KEY` environment variable
- Redis server running (for conversation memory tests)
- Network connectivity
**Example Structure**:
```python
# tests/live/test_gemini_integration.py
import os

import pytest

from tools.chat import ChatTool
from tools.models import GeminiClient


@pytest.mark.live
@pytest.mark.skipif(not os.getenv("GEMINI_API_KEY"), reason="API key required")
class TestGeminiIntegration:
    """Live tests requiring actual Gemini API access."""

    def setup_method(self):
        """Set up for live testing."""
        self.api_key = os.getenv("GEMINI_API_KEY")
        self.client = GeminiClient(self.api_key)

    async def test_basic_gemini_request(self):
        """Test basic Gemini API request/response."""
        response = await self.client.generate_response(
            prompt="Say 'test successful'",
            thinking_mode="minimal"
        )
        assert "test successful" in response.lower()

    async def test_chat_tool_with_real_api(self):
        """Test ChatTool with real Gemini API integration."""
        tool = ChatTool()
        result = await tool.execute({
            "prompt": "What is 2+2?",
            "thinking_mode": "minimal"
        })
        assert result.status == "success"
        assert "4" in result.content
```
**Live Test Guidelines**:
- **Skip When Unavailable**: Skip if API keys or services unavailable
- **Rate Limiting**: Respect API rate limits with delays (see the pacing fixture sketch after this list)
- **Minimal Mode**: Use minimal thinking mode for speed
- **Cleanup**: Clean up any created resources
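One lightweight way to respect rate limits is an autouse fixture that pauses after each live test. A minimal sketch, assuming a `tests/live/conftest.py` and an illustrative one-second budget:
```python
# tests/live/conftest.py (assumed location)
import time

import pytest


@pytest.fixture(autouse=True)
def pace_live_requests():
    """Pause after each live test to stay under the API rate limit."""
    yield
    time.sleep(1.0)  # illustrative delay; tune to the real quota
```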
### 4. Security Tests
**Purpose**: Validate security measures and vulnerability prevention
**Location**: `tests/security/`
**Example Structure**:
```python
# tests/security/test_path_validation.py
import pytest

from exceptions import SecurityError
from utils.file_utils import validate_file_path


class TestSecurityValidation:
    """Security-focused tests for input validation."""

    @pytest.mark.parametrize("dangerous_path", [
        "../../../etc/passwd",
        "/etc/shadow",
        "~/../../root/.ssh/id_rsa",
        "/var/log/auth.log",
        "\\..\\..\\windows\\system32\\config\\sam",
    ])
    def test_dangerous_path_rejection(self, dangerous_path):
        """Test that dangerous file paths are rejected."""
        with pytest.raises(SecurityError):
            validate_file_path(dangerous_path)

    def test_secret_sanitization_in_logs(self):
        """Test that sensitive data is sanitized in log output."""
        request_data = {
            "prompt": "Hello",
            "api_key": "sk-secret123",
            "token": "bearer-token-456"
        }

        # sanitize_for_logging is assumed to be provided by the project's
        # logging utilities; import it from wherever it is defined.
        sanitized = sanitize_for_logging(request_data)

        assert sanitized["api_key"] == "[REDACTED]"
        assert sanitized["token"] == "[REDACTED]"
        assert sanitized["prompt"] == "Hello"  # Non-sensitive data preserved
```
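Alongside the rejection cases, it helps to assert the positive path in the same module so the validator does not become over-restrictive. A short sketch reusing `validate_file_path`; the workspace paths are illustrative:
```python
@pytest.mark.parametrize("safe_path", [
    "/workspace/tools/chat.py",
    "/workspace/utils/file_utils.py",
])
def test_safe_workspace_paths_accepted(safe_path):
    """Paths inside the workspace should pass validation."""
    assert validate_file_path(safe_path) is True
```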
## Test Configuration
### pytest Configuration
**pytest.ini**:
```ini
[tool:pytest]
testpaths = tests
python_files = test_*.py *_test.py
python_classes = Test*
python_functions = test_*
addopts =
    -v
    --strict-markers
    --disable-warnings
    --cov=tools
    --cov=utils
    --cov-report=html
    --cov-report=term-missing
    --cov-fail-under=80
markers =
    unit: Unit tests (fast, isolated)
    integration: Integration tests (component interaction)
    live: Live tests requiring API keys and external services
    security: Security-focused tests
    slow: Tests that take more than 1 second
```
**conftest.py**:
```python
# tests/conftest.py
import asyncio
from unittest.mock import Mock, patch

import pytest


@pytest.fixture(scope="session")
def event_loop():
    """Create an instance of the default event loop for the test session."""
    loop = asyncio.get_event_loop_policy().new_event_loop()
    yield loop
    loop.close()


@pytest.fixture
def mock_gemini_client():
    """Mock Gemini client for testing without API calls."""
    with patch('tools.models.GeminiClient') as mock:
        mock_instance = Mock()
        mock_instance.generate_response.return_value = "Mocked response"
        mock.return_value = mock_instance
        yield mock_instance


@pytest.fixture
def mock_redis():
    """Mock Redis client for testing without a Redis server."""
    with patch('redis.Redis') as mock:
        yield mock


@pytest.fixture
def sample_file_content():
    """Sample file content for testing file processing."""
    return '''
def example_function():
    # This is a sample function
    return "hello world"


class ExampleClass:
    def method(self):
        pass
'''


@pytest.fixture
def temp_project_directory(tmp_path):
    """Create a temporary project directory structure for testing."""
    project_dir = tmp_path / "test_project"
    project_dir.mkdir()

    # Create subdirectories
    (project_dir / "tools").mkdir()
    (project_dir / "utils").mkdir()
    (project_dir / "tests").mkdir()

    # Create sample files
    (project_dir / "tools" / "sample.py").write_text("# Sample tool")
    (project_dir / "utils" / "helper.py").write_text("# Helper utility")

    return project_dir
```
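As a quick illustration of how these fixtures compose, a test simply requests them by name; a minimal sketch using `mock_gemini_client` and `temp_project_directory`:
```python
def test_fixture_composition(mock_gemini_client, temp_project_directory):
    """Fixtures are injected by name and can be combined freely."""
    # The patched client returns the canned response configured in conftest.py
    assert mock_gemini_client.generate_response() == "Mocked response"

    # The temporary project tree has the layout built by the fixture
    assert (temp_project_directory / "tools" / "sample.py").exists()
```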
## Test Data Management
### Test Fixtures
**File-based Fixtures**:
```python
# tests/fixtures/sample_code.py
PYTHON_CODE_SAMPLE = '''
import asyncio
from typing import Dict, List
async def process_data(items: List[str]) -> Dict[str, int]:
"""Process a list of items and return counts."""
result = {}
for item in items:
result[item] = len(item)
return result
'''
JAVASCRIPT_CODE_SAMPLE = '''
async function processData(items) {
const result = {};
for (const item of items) {
result[item] = item.length;
}
return result;
}
'''
ERROR_LOGS_SAMPLE = '''
2025-01-11 23:45:12 ERROR [tool_execution] Tool 'analyze' failed: File not found
Traceback (most recent call last):
File "/app/tools/analyze.py", line 45, in execute
content = read_file(file_path)
File "/app/utils/file_utils.py", line 23, in read_file
with open(file_path, 'r') as f:
FileNotFoundError: [Errno 2] No such file or directory: '/nonexistent/file.py'
'''
```
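These constants can then be reused across tests without touching the real file system; a minimal sketch, assuming the `tests` directory is importable as a package:
```python
import pytest

from tests.fixtures.sample_code import PYTHON_CODE_SAMPLE, JAVASCRIPT_CODE_SAMPLE


@pytest.mark.parametrize("code_sample", [PYTHON_CODE_SAMPLE, JAVASCRIPT_CODE_SAMPLE])
def test_language_samples_are_reusable(code_sample):
    """Both language samples build a `result` mapping that analysis tests rely on."""
    assert "result" in code_sample
```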
### Mock Data Factories
**ToolOutput Factory**:
```python
# tests/factories.py
from typing import Any, Dict, List

# ToolOutput and ThreadContext are the project's own models; import them
# from wherever they are defined (tools.models is assumed here).
from tools.models import ToolOutput, ThreadContext


def create_tool_output(
    content: str = "Default response",
    status: str = "success",
    metadata: Dict[str, Any] = None,
    files_processed: List[str] = None,
) -> ToolOutput:
    """Factory for creating ToolOutput test instances."""
    return ToolOutput(
        content=content,
        metadata=metadata or {},
        files_processed=files_processed or [],
        status=status,
    )


def create_thread_context(
    thread_id: str = "test-thread-123",
    files: List[str] = None,
) -> ThreadContext:
    """Factory for creating ThreadContext test instances."""
    return ThreadContext(
        thread_id=thread_id,
        conversation_files=set(files or []),
        tool_history=[],
        context_tokens=0,
    )
```
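In tests, the factories keep setup noise down by letting each case override only the fields it cares about; a minimal sketch, assuming `tests/factories.py` is importable (the overridden values are illustrative):
```python
from tests.factories import create_tool_output


def test_factory_defaults_and_overrides():
    """Factories provide sensible defaults and accept targeted overrides."""
    default_output = create_tool_output()
    assert default_output.status == "success"

    error_output = create_tool_output(content="Boom", status="error")
    assert error_output.status == "error"
    assert error_output.files_processed == []
```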
## Mocking Strategies
### External Service Mocking
**Gemini API Mocking**:
```python
from typing import Dict
from unittest.mock import patch


class MockGeminiClient:
    """Mock Gemini client for testing."""

    def __init__(self, responses: Dict[str, str] = None):
        self.responses = responses or {
            "default": "This is a mocked response from Gemini"
        }
        self.call_count = 0

    async def generate_response(self, prompt: str, **kwargs) -> str:
        """Mock response generation."""
        self.call_count += 1

        # Return a specific response when its key appears in the prompt
        for key, response in self.responses.items():
            if key in prompt.lower():
                return response
        return self.responses.get("default", "Mock response")


# Usage in tests
@patch('tools.models.GeminiClient', MockGeminiClient)
def test_with_mocked_gemini():
    # Test implementation
    pass
```
**File System Mocking**:
```python
from unittest.mock import mock_open, patch

from utils.file_utils import read_file


@patch('builtins.open', mock_open(read_data="file content"))
@patch('os.path.exists', return_value=True)
@patch('os.path.getsize', return_value=1024)
def test_file_operations(mock_getsize, mock_exists):
    """Test file operations with a mocked file system."""
    content = read_file("/mocked/file.py")
    assert content == "file content"
```
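When the code under test only needs a real file on disk, pytest's built-in `tmp_path` fixture is often simpler and less brittle than patching `open`; a minimal sketch, still assuming `read_file` from `utils.file_utils`:
```python
from utils.file_utils import read_file


def test_file_operations_with_tmp_path(tmp_path):
    """Exercise real file I/O against a throwaway temporary file."""
    source = tmp_path / "sample.py"
    source.write_text("file content")

    content = read_file(str(source))
    assert content == "file content"
```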
## Performance Testing
### Load Testing
**Concurrent Tool Execution**:
```python
# tests/performance/test_load.py
import asyncio
import time

import pytest

from server import call_tool


@pytest.mark.slow
class TestPerformance:
    """Performance tests for system load handling."""

    async def test_concurrent_tool_execution(self):
        """Test system performance under concurrent load."""
        start_time = time.time()

        # Create 10 concurrent tool execution tasks
        tasks = []
        for i in range(10):
            task = asyncio.create_task(
                call_tool("chat", {"prompt": f"Request {i}"})
            )
            tasks.append(task)

        # Wait for all tasks to complete
        results = await asyncio.gather(*tasks)

        end_time = time.time()
        execution_time = end_time - start_time

        # Verify all requests succeeded
        assert len(results) == 10
        assert all(len(result) == 1 for result in results)

        # Performance assertion (adjust based on requirements)
        assert execution_time < 30.0  # All requests should complete within 30s

    async def test_memory_usage_stability(self):
        """Test that memory usage remains stable under load."""
        import gc

        import psutil

        process = psutil.Process()
        initial_memory = process.memory_info().rss

        # Execute multiple operations
        for i in range(100):
            await call_tool("chat", {"prompt": f"Memory test {i}"})

            # Force garbage collection periodically
            if i % 10 == 0:
                gc.collect()

        final_memory = process.memory_info().rss
        memory_growth = final_memory - initial_memory

        # Memory growth should be reasonable (adjust threshold as needed)
        assert memory_growth < 100 * 1024 * 1024  # Less than 100MB growth
```
## Test Execution
### Running Tests
**Basic Test Execution**:
```bash
# Run all tests
pytest
# Run specific test categories
pytest -m unit # Unit tests only
pytest -m integration # Integration tests only
pytest -m "not live" # All tests except live tests
pytest -m "live and not slow" # Live tests that are fast
# Run with coverage
pytest --cov=tools --cov=utils --cov-report=html
# Run specific test file
pytest tests/unit/test_file_utils.py -v
# Run specific test method
pytest tests/unit/test_file_utils.py::TestFileUtils::test_validate_file_path -v
```
**Continuous Integration**:
```bash
#!/bin/bash
# CI test script
set -e

echo "Running unit tests..."
pytest -m unit --cov=tools --cov=utils --cov-fail-under=80

echo "Running integration tests..."
pytest -m integration

echo "Running security tests..."
pytest -m security

echo "Checking code quality..."
flake8 tools/ utils/ tests/
mypy tools/ utils/

echo "All tests passed!"
```
### Test Reports
**Coverage Reports**:
```bash
# Generate HTML coverage report
pytest --cov=tools --cov=utils --cov-report=html
open htmlcov/index.html
# Generate terminal coverage report
pytest --cov=tools --cov=utils --cov-report=term-missing
```
**Test Results Export**:
```bash
# Export test results to JUnit XML (for CI integration)
pytest --junitxml=test-results.xml
# Export test results with timing information
pytest --durations=10 # Show 10 slowest tests
```
## Quality Metrics
### Coverage Targets
**Minimum Coverage Requirements**:
- **Overall Coverage**: 80%
- **Critical Modules**: 90% (security, file_utils, conversation_memory)
- **Tool Modules**: 85%
- **Utility Modules**: 80%
**Coverage Enforcement**:
```bash
# Fail build if coverage drops below threshold
pytest --cov-fail-under=80
```
### Test Quality Metrics
**Test Suite Characteristics**:
- **Fast Execution**: Unit test suite should complete in <30 seconds
- **Reliable**: Tests should have <1% flaky failure rate
- **Maintainable**: Test code should follow same quality standards as production code
- **Comprehensive**: All critical paths and edge cases covered
## Integration with Development Workflow
### Pre-commit Testing
**Git Hook Integration**:
```bash
#!/bin/sh
# .git/hooks/pre-commit
echo "Running pre-commit tests..."

# Run fast tests before commit
pytest -m "unit and not slow" --cov-fail-under=80

if [ $? -ne 0 ]; then
    echo "Tests failed. Commit blocked."
    exit 1
fi

echo "Pre-commit tests passed."
```
### CI/CD Integration
**GitHub Actions Workflow**:
```yaml
name: Test Suite

on: [push, pull_request]

jobs:
  test:
    runs-on: ubuntu-latest
    strategy:
      matrix:
        python-version: ["3.9", "3.10", "3.11"]

    steps:
      - uses: actions/checkout@v3

      - name: Set up Python ${{ matrix.python-version }}
        uses: actions/setup-python@v4
        with:
          python-version: ${{ matrix.python-version }}

      - name: Install dependencies
        run: |
          pip install -r requirements.txt
          pip install pytest pytest-cov pytest-asyncio

      - name: Run unit tests
        run: pytest -m unit --cov=tools --cov=utils --cov-report=xml --cov-fail-under=80

      - name: Run integration tests
        run: pytest -m integration

      - name: Run security tests
        run: pytest -m security

      - name: Upload coverage reports
        uses: codecov/codecov-action@v3
        with:
          file: ./coverage.xml
```
## Detailed Test Structure Analysis
For a comprehensive analysis of the existing test suite, including detailed breakdowns of all 17 test files, security testing patterns, and collaboration feature validation, see:
**[Test Structure Documentation](test-structure.md)** - Complete analysis of existing test organization, mocking strategies, and quality assurance patterns
---
This comprehensive testing strategy ensures high-quality, reliable code while maintaining development velocity and supporting the collaborative patterns defined in CLAUDE.md.