# Test Structure Documentation
## Overview
This document provides a comprehensive analysis of the existing test structure in the Gemini MCP Server project. The test suite consists of **15 test modules** (plus shared fixtures in `conftest.py`) organized to validate all aspects of the system, from unit-level functionality to complex AI collaboration workflows.
## Test Organization
### Test Directory Structure
```
tests/
├── __init__.py # Package initialization
├── conftest.py # Global test configuration and fixtures
├── test_claude_continuation.py # Claude continuation opportunities
├── test_collaboration.py # AI-to-AI collaboration features
├── test_config.py # Configuration validation
├── test_conversation_history_bug.py # Bug fix regression tests
├── test_conversation_memory.py # Redis-based conversation persistence
├── test_cross_tool_continuation.py # Cross-tool conversation threading
├── test_docker_path_integration.py # Docker environment path translation
├── test_large_prompt_handling.py # Large prompt detection and handling
├── test_live_integration.py # Live API testing (excluded from CI)
├── test_precommit.py # Pre-commit validation and git integration
├── test_prompt_regression.py # Normal prompt handling regression
├── test_server.py # Main server functionality
├── test_thinking_modes.py # Thinking mode functionality
├── test_tools.py # Individual tool implementations
└── test_utils.py # Utility function testing
```
## Test Categories and Analysis
### 1. Core Functionality Tests
#### `test_server.py` - Main Server Functionality
**Purpose**: Tests the core MCP server implementation and tool dispatch mechanism
**Key Test Areas**:
- **Server startup and initialization**
- **Tool registration and availability**
- **Request routing and handling**
- **Error propagation and handling**
**Example Coverage**:
```python
# Tests tool listing functionality
def test_list_tools(): ...

# Tests tool execution pipeline
async def test_call_tool(): ...

# Tests error handling for invalid tools
async def test_call_invalid_tool(): ...
```
#### `test_config.py` - Configuration Management
**Purpose**: Validates configuration loading, environment variable handling, and settings validation
**Key Areas**:
- **Environment variable parsing**
- **Default value handling**
- **Configuration validation**
- **Error handling for missing required config**
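As a rough illustration of the environment-variable handling above, the sketch below uses pytest's built-in `monkeypatch` fixture. The `config` module and `DEFAULT_MODEL` attribute are hypothetical stand-ins; adapt them to the real configuration API.
```python
import importlib

def test_env_var_override(monkeypatch):
    # Hypothetical: assumes config reads DEFAULT_MODEL from the environment
    # at import time, so we reload the module after patching the variable.
    monkeypatch.setenv("DEFAULT_MODEL", "gemini-custom")
    import config
    importlib.reload(config)
    assert config.DEFAULT_MODEL == "gemini-custom"
```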
#### `test_tools.py` - Tool Implementation Testing
**Purpose**: Tests individual tool implementations with comprehensive input validation
**Key Features**:
- **Absolute path enforcement across all tools**
- **Parameter validation for each tool**
- **Error handling for malformed inputs**
- **Tool-specific behavior validation**
**Critical Security Testing**:
```python
# Tests that all tools enforce absolute paths
async def test_tool_absolute_path_requirement(): ...

# Tests path traversal attack prevention
async def test_tool_path_traversal_prevention(): ...
```
#### `test_utils.py` - Utility Function Testing
**Purpose**: Tests file utilities, token counting, and directory handling functions
**Coverage Areas**:
- **File reading and processing**
- **Token counting and limits**
- **Directory traversal and expansion**
- **Path validation and security**
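A minimal sketch of a token-limit check over these areas is shown below; `estimate_tokens` and `MAX_CONTENT_TOKENS` are hypothetical names, so substitute the project's actual utilities.
```python
# Hypothetical names: replace with the real token utilities in utils.py.
from utils import estimate_tokens, MAX_CONTENT_TOKENS

def test_oversized_content_exceeds_budget():
    oversized = "token " * (MAX_CONTENT_TOKENS * 2)
    # Content with more words than the budget must be flagged as too large
    assert estimate_tokens(oversized) > MAX_CONTENT_TOKENS
```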
### 2. Advanced Feature Tests
#### `test_collaboration.py` - AI-to-AI Collaboration
**Purpose**: Tests dynamic context requests and collaborative AI workflows
**Key Scenarios**:
- **Clarification request parsing**
- **Dynamic context expansion**
- **AI-to-AI communication protocols**
- **Collaboration workflow validation**
**Example Test**:
```python
async def test_clarification_request_parsing():
    """Test parsing of AI clarification requests for additional context."""
    # Validates that Gemini can request additional files/context
    # and Claude can respond appropriately
```
#### `test_cross_tool_continuation.py` - Cross-Tool Threading
**Purpose**: Tests conversation continuity across different tools
**Critical Features**:
- **Continuation ID persistence**
- **Context preservation between tools**
- **Thread management across tool switches**
- **File context sharing between AI agents**
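A sketch of the continuation flow follows, assuming tool instances like `analyze_tool`/`chat_tool` and a `continuation_id` response field (all hypothetical; check the real response schema):
```python
async def test_continuation_id_survives_tool_switch():
    """Sketch: a thread started in one tool continues in another."""
    first = await analyze_tool.execute({"files": ["/abs/path/module.py"]})
    thread_id = first["continuation_id"]  # hypothetical response field

    second = await chat_tool.execute({
        "prompt": "Summarize the analysis above",
        "continuation_id": thread_id,
    })
    # The second tool should keep, not replace, the original thread
    assert second["continuation_id"] == thread_id
```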
#### `test_conversation_memory.py` - Memory Persistence
**Purpose**: Tests Redis-based conversation storage and retrieval
**Test Coverage**:
- **Conversation storage and retrieval**
- **Thread context management**
- **TTL (time-to-live) handling**
- **Memory cleanup and optimization**
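The TTL behavior can be pinned down with the Redis mocking pattern described later in this document. In this sketch, `save_thread` is a hypothetical helper name, and `setex` is assumed to be called positionally:
```python
from unittest.mock import Mock, patch

@patch("utils.conversation_memory.get_redis_client")
def test_thread_saved_with_ttl(mock_get_client):
    mock_client = Mock()
    mock_get_client.return_value = mock_client

    save_thread("thread-123", {"turns": []})  # hypothetical helper

    # setex stores the value with an expiry; a plain set would never expire
    (key, ttl, _value), _ = mock_client.setex.call_args
    assert "thread-123" in key
    assert ttl > 0
```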
#### `test_thinking_modes.py` - Cognitive Load Management
**Purpose**: Tests thinking mode functionality across all tools
**Validation Areas**:
- **Token budget enforcement**
- **Mode selection and application**
- **Performance characteristics**
- **Quality vs. cost trade-offs**
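As a hedged sketch of budget enforcement, a parametrized test can pin each mode to its token budget. The mode names, budget values, and `resolve_thinking_budget` helper are placeholders for the real mapping in the tool configuration:
```python
import pytest

from config import resolve_thinking_budget  # hypothetical import path

# Hypothetical mode/budget pairs -- use the values from the real config.
@pytest.mark.parametrize("mode,expected_budget", [
    ("minimal", 128),
    ("low", 2048),
    ("medium", 8192),
    ("high", 16384),
])
def test_mode_maps_to_budget(mode, expected_budget):
    assert resolve_thinking_budget(mode) == expected_budget
```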
### 3. Specialized Testing
#### `test_large_prompt_handling.py` - Scale Testing
**Purpose**: Tests handling of prompts exceeding MCP token limits
**Key Scenarios**:
- **Large prompt detection (>50,000 characters)**
- **Automatic file-based prompt handling**
- **MCP token limit workarounds**
- **Response capacity preservation**
**Critical Flow Testing**:
```python
async def test_large_prompt_file_handling():
    """Test that large prompts are automatically handled via file mechanism."""
    # Validates the workaround for MCP's 25K token limit
```
#### `test_docker_path_integration.py` - Environment Testing
**Purpose**: Tests Docker environment path translation and workspace mounting
**Coverage**:
- **Host-to-container path mapping**
- **Workspace directory access**
- **Cross-platform path handling**
- **Security boundary enforcement**
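A minimal sketch of the path-mapping check, assuming a `translate_path` helper that maps a mounted host directory onto `/workspace` (both names hypothetical):
```python
from utils.docker_paths import translate_path  # hypothetical import path

def test_host_path_maps_into_workspace():
    # Hypothetical helper and mount point -- adjust to the real translation
    # logic and the workspace root configured for the container.
    host_path = "/Users/dev/project/src/main.py"
    container_path = translate_path(host_path)
    assert container_path.startswith("/workspace/")
```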
#### `test_precommit.py` - Quality Gate Testing
**Purpose**: Tests pre-commit validation and git integration
**Validation Areas**:
- **Git repository discovery**
- **Change detection and analysis**
- **Multi-repository support**
- **Security scanning of changes**
### 4. Regression and Bug Fix Tests
#### `test_conversation_history_bug.py` - Bug Fix Validation
**Purpose**: Regression test for conversation history duplication bug
**Specific Coverage**:
- **Conversation deduplication**
- **History consistency**
- **Memory leak prevention**
- **Thread integrity**
#### `test_prompt_regression.py` - Normal Operation Validation
**Purpose**: Ensures normal prompt handling continues to work correctly
**Test Focus**:
- **Standard prompt processing**
- **Backward compatibility**
- **Feature regression prevention**
- **Performance baseline maintenance**
#### `test_claude_continuation.py` - Session Management
**Purpose**: Tests Claude continuation opportunities and session management
**Key Areas**:
- **Session state management**
- **Continuation opportunity detection**
- **Context preservation**
- **Session cleanup and termination**
### 5. Live Integration Testing
#### `test_live_integration.py` - Real API Testing
**Purpose**: Tests actual Gemini API integration (excluded from regular CI)
**Requirements**:
- Valid `GEMINI_API_KEY` environment variable
- Network connectivity to Google AI services
- Redis server for conversation memory testing
**Test Categories**:
- **Basic API request/response validation**
- **Tool execution with real Gemini responses**
- **Conversation threading with actual AI**
- **Error handling with real API responses**
**Exclusion from CI**:
```python
@pytest.mark.skipif(not os.getenv("GEMINI_API_KEY"), reason="API key required")
class TestLiveIntegration:
    """Tests requiring actual Gemini API access."""
```
## Test Configuration Analysis
### `conftest.py` - Global Test Setup
**Key Fixtures and Configuration**:
#### Environment Isolation
```python
# Ensures tests run in isolated sandbox environment
os.environ["MCP_PROJECT_ROOT"] = str(temp_dir)
```
#### Dummy API Keys
```python
# Provides safe dummy keys for testing without real credentials
os.environ["GEMINI_API_KEY"] = "dummy-key-for-testing"
```
#### Cross-Platform Compatibility
```python
# Handles Windows async event loop configuration
if platform.system() == "Windows":
    asyncio.set_event_loop_policy(asyncio.WindowsProactorEventLoopPolicy())
```
#### Project Path Fixtures
```python
@pytest.fixture
def project_path():
    """Provides safe project path for file operations in tests."""
```
### `pytest.ini` - Test Runner Configuration
**Key Settings**:
```ini
[pytest]
testpaths = tests
python_files = test_*.py
python_classes = Test*
python_functions = test_*
asyncio_mode = auto
addopts =
    -v
    --strict-markers
    --tb=short
```
## Mocking Strategies
### 1. Gemini API Mocking
**Pattern Used**:
```python
@patch("tools.base.BaseTool.create_model")
async def test_tool_execution(self, mock_create_model):
mock_model = Mock()
mock_model.generate_content.return_value = Mock(
candidates=[Mock(content=Mock(parts=[Mock(text="Mocked response")]))]
)
mock_create_model.return_value = mock_model
```
**Benefits**:
- **No API key required** for unit and integration tests
- **Predictable responses** for consistent testing
- **Fast execution** without network dependencies
- **Cost-effective** testing without API charges
### 2. Redis Memory Mocking
**Pattern Used**:
```python
@patch("utils.conversation_memory.get_redis_client")
def test_conversation_flow(self, mock_redis):
mock_client = Mock()
mock_redis.return_value = mock_client
# Test conversation persistence logic
```
**Advantages**:
- **No Redis server required** for testing
- **Controlled state** for predictable test scenarios
- **Error simulation** for resilience testing
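For example, the error-simulation case can be sketched by giving the mocked client a `side_effect`; `load_thread` is a hypothetical helper for reading a conversation back:
```python
import redis
from unittest.mock import Mock, patch

@patch("utils.conversation_memory.get_redis_client")
def test_redis_outage_degrades_gracefully(mock_get_client):
    mock_client = Mock()
    # Every read raises, simulating an unreachable Redis server
    mock_client.get.side_effect = redis.ConnectionError("connection refused")
    mock_get_client.return_value = mock_client

    # Hypothetical expectation: the helper swallows the error, returns None
    assert load_thread("thread-123") is None
```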
### 3. File System Mocking
**Pattern Used**:
```python
@patch("builtins.open", mock_open(read_data="test file content"))
@patch("os.path.exists", return_value=True)
def test_file_operations():
# Test file reading without actual files
```
**Security Benefits**:
- **No file system access** during testing
- **Path validation testing** without security risks
- **Consistent test data** across environments
## Security Testing Focus
### Path Validation Testing
**Critical Security Tests**:
1. **Absolute path enforcement** - All tools must reject relative paths
2. **Directory traversal prevention** - Block `../` and similar patterns
3. **Symlink attack prevention** - Detect and block symbolic link attacks
4. **Sandbox boundary enforcement** - Restrict access to allowed directories
**Example Security Test**:
```python
async def test_path_traversal_attack_prevention():
    """Test that directory traversal attacks are blocked."""
    dangerous_paths = [
        "../../../etc/passwd",
        "/etc/shadow",
        "~/../../root/.ssh/id_rsa",
    ]
    for path in dangerous_paths:
        with pytest.raises(SecurityError):
            await tool.execute({"files": [path]})
```
### Docker Security Testing
**Container Security Validation**:
- **Workspace mounting** - Verify read-only access enforcement
- **Path translation** - Test host-to-container path mapping
- **Privilege boundaries** - Ensure container cannot escape sandbox
## Test Execution Patterns
### Parallel Test Execution
**Strategy**: Tests are designed for parallel execution with proper isolation
**Benefits**:
- **Faster test suite** execution
- **Resource efficiency** for CI/CD
- **Scalable testing** for large codebases
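Isolation is what makes parallel runs safe. One way to express it, reusing the `MCP_PROJECT_ROOT` variable shown in the `conftest.py` section above, is a per-test sandbox fixture (a sketch, not the project's actual conftest):
```python
import pytest

@pytest.fixture
def isolated_workspace(tmp_path, monkeypatch):
    """Give every test its own sandbox so parallel workers never collide."""
    monkeypatch.setenv("MCP_PROJECT_ROOT", str(tmp_path))
    return tmp_path
```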
### Conditional Test Execution
**Live Test Skipping**:
```python
@pytest.mark.skipif(not os.getenv("GEMINI_API_KEY"), reason="API key required")
```
**Platform-Specific Tests**:
```python
@pytest.mark.skipif(platform.system() == "Windows", reason="Unix-specific test")
```
## Test Quality Metrics
### Coverage Analysis
**Current Test Coverage by Category**:
- **Tool Functionality**: All 7 tools comprehensively tested
- **Server Operations**: Complete request/response cycle coverage
- **Security Validation**: Path safety and access control testing
- **Collaboration Features**: AI-to-AI communication patterns
- **Memory Management**: Conversation persistence and threading
- **Error Handling**: Graceful degradation and error recovery
### Test Reliability
**Design Characteristics**:
- **Deterministic**: Tests produce consistent results
- **Isolated**: No test dependencies or shared state
- **Fast**: Unit tests complete in milliseconds
- **Comprehensive**: Edge cases and error conditions covered
## Integration with Development Workflow
### Test-Driven Development Support
**TDD Cycle Integration**:
1. **Red**: Write failing test for new functionality
2. **Green**: Implement minimal code to pass test
3. **Refactor**: Improve code while maintaining test coverage
### Pre-Commit Testing
**Quality Gates**:
- **Security validation** before commits
- **Functionality regression** prevention
- **Code quality** maintenance
- **Performance baseline** protection
### CI/CD Integration
**GitHub Actions Workflow**:
- **Multi-Python version** testing (3.10, 3.11, 3.12)
- **Parallel test execution** for efficiency
- **Selective live testing** when API keys available
- **Coverage reporting** and quality gates
## Best Practices Demonstrated
### 1. Comprehensive Mocking
Every external dependency is properly mocked for reliable testing
### 2. Security-First Approach
Strong emphasis on security validation and vulnerability prevention
### 3. Collaboration Testing
Extensive testing of AI-to-AI communication and workflow patterns
### 4. Real-World Scenarios
Tests cover actual usage patterns and edge cases
### 5. Maintainable Structure
Clear organization and focused test files for easy maintenance
## Recommendations for Contributors
### Adding New Tests
1. **Follow Naming Conventions**: Use descriptive test names that explain the scenario
2. **Maintain Isolation**: Mock all external dependencies
3. **Test Security**: Include path validation and security checks
4. **Cover Edge Cases**: Test error conditions and boundary cases
5. **Document Purpose**: Use docstrings to explain test objectives
### Test Quality Standards
1. **Fast Execution**: Unit tests should complete in milliseconds
2. **Predictable Results**: Tests should be deterministic
3. **Clear Assertions**: Use descriptive assertion messages
4. **Proper Cleanup**: Ensure tests don't leave side effects
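Standards 3 and 4 can be illustrated with pytest's built-in `tmp_path` fixture, which removes its directory automatically:
```python
def test_round_trip_write(tmp_path):
    target = tmp_path / "sample.txt"
    target.write_text("hello")
    # Descriptive assertion message (standard 3); tmp_path is cleaned up
    # by pytest itself, so the test leaves no side effects (standard 4).
    assert target.read_text() == "hello", "file content should round-trip"
```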
### Testing New Features
1. **Start with Unit Tests**: Test individual components first
2. **Add Integration Tests**: Test component interactions
3. **Include Security Tests**: Validate security measures
4. **Test Collaboration**: If relevant, test AI-to-AI workflows
---
This test structure demonstrates a mature, production-ready testing approach that ensures code quality, security, and reliability while supporting the collaborative AI development patterns that make this project unique.