# Test Structure Documentation

## Overview

This document provides a comprehensive analysis of the test structure in the Gemini MCP Server project. The suite consists of **15 specialized test modules** (17 files in total, including `conftest.py` and the package `__init__.py`), organized to validate every layer of the system, from unit-level functionality to complex AI collaboration workflows.

## Test Organization

### Test Directory Structure

```
tests/
├── __init__.py                       # Package initialization
├── conftest.py                       # Global test configuration and fixtures
├── test_claude_continuation.py      # Claude continuation opportunities
├── test_collaboration.py            # AI-to-AI collaboration features
├── test_config.py                   # Configuration validation
├── test_conversation_history_bug.py # Bug fix regression tests
├── test_conversation_memory.py      # Redis-based conversation persistence
├── test_cross_tool_continuation.py  # Cross-tool conversation threading
├── test_docker_path_integration.py  # Docker environment path translation
├── test_large_prompt_handling.py    # Large prompt detection and handling
├── test_live_integration.py         # Live API testing (excluded from CI)
├── test_precommit.py                # Pre-commit validation and git integration
├── test_prompt_regression.py        # Normal prompt handling regression
├── test_server.py                   # Main server functionality
├── test_thinking_modes.py           # Thinking mode functionality
├── test_tools.py                    # Individual tool implementations
└── test_utils.py                    # Utility function testing
```

## Test Categories and Analysis

### 1. Core Functionality Tests

#### `test_server.py` - Main Server Functionality
**Purpose**: Tests the core MCP server implementation and tool dispatch mechanism

**Key Test Areas**:
- **Server startup and initialization**
- **Tool registration and availability**
- **Request routing and handling**
- **Error propagation and handling**

**Example Coverage**:
```python
# Tests tool listing functionality
def test_list_tools(): ...

# Tests tool execution pipeline
async def test_call_tool(): ...

# Tests error handling for invalid tools
async def test_call_invalid_tool(): ...
```

#### `test_config.py` - Configuration Management
**Purpose**: Validates configuration loading, environment variable handling, and settings validation

**Key Areas**:
- **Environment variable parsing**
- **Default value handling**
- **Configuration validation**
- **Error handling for missing required config**
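
A minimal sketch of this style of test follows, using pytest's built-in `monkeypatch` fixture. The `config` module and the `DEFAULT_THINKING_MODE` setting name are assumptions about the project layout, used purely for illustration:

```python
import importlib

import config  # assumed project module; the real settings module may differ


def test_env_var_overrides_default(monkeypatch):
    """An environment variable should take precedence over the built-in default."""
    monkeypatch.setenv("DEFAULT_THINKING_MODE", "high")
    importlib.reload(config)  # re-evaluate module-level os.getenv() calls
    assert config.DEFAULT_THINKING_MODE == "high"


def test_default_applies_when_unset(monkeypatch):
    """With the variable absent, the module should fall back to its default."""
    monkeypatch.delenv("DEFAULT_THINKING_MODE", raising=False)
    importlib.reload(config)
    assert config.DEFAULT_THINKING_MODE  # a non-empty default is expected
```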

#### `test_tools.py` - Tool Implementation Testing
**Purpose**: Tests individual tool implementations with comprehensive input validation

**Key Features**:
- **Absolute path enforcement across all tools**
- **Parameter validation for each tool**
- **Error handling for malformed inputs**
- **Tool-specific behavior validation**

**Critical Security Testing**:
```python
# Tests that all tools enforce absolute paths
async def test_tool_absolute_path_requirement(): ...

# Tests path traversal attack prevention
async def test_tool_path_traversal_prevention(): ...
```

#### `test_utils.py` - Utility Function Testing
**Purpose**: Tests file utilities, token counting, and directory handling functions

**Coverage Areas**:
- **File reading and processing**
- **Token counting and limits**
- **Directory traversal and expansion**
- **Path validation and security**
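
As an illustration of the style, a sketch of such a test appears below; `read_files` and `estimate_tokens` are hypothetical names standing in for the project's actual utility functions:

```python
# Hypothetical imports; the real utils modules may expose different names.
from utils.file_utils import read_files
from utils.token_utils import estimate_tokens


def test_read_files_returns_content(tmp_path):
    """Reading a real temporary file should surface its content in the result."""
    sample = tmp_path / "example.py"
    sample.write_text("print('hello')\n")
    assert "print('hello')" in read_files([str(sample)])


def test_token_estimate_is_monotonic():
    """A longer input should never be estimated at fewer tokens than a shorter one."""
    assert estimate_tokens("word " * 1000) >= estimate_tokens("word")
```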

### 2. Advanced Feature Tests

#### `test_collaboration.py` - AI-to-AI Collaboration
**Purpose**: Tests dynamic context requests and collaborative AI workflows

**Key Scenarios**:
- **Clarification request parsing**
- **Dynamic context expansion**
- **AI-to-AI communication protocols**
- **Collaboration workflow validation**

**Example Test**:
```python
async def test_clarification_request_parsing():
    """Test parsing of AI clarification requests for additional context."""
    # Validates that Gemini can request additional files/context
    # and Claude can respond appropriately
```

#### `test_cross_tool_continuation.py` - Cross-Tool Threading
**Purpose**: Tests conversation continuity across different tools

**Critical Features**:
- **Continuation ID persistence**
- **Context preservation between tools**
- **Thread management across tool switches**
- **File context sharing between AI agents**
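
A condensed sketch of the pattern these tests follow is shown below, with the Gemini model mocked as elsewhere in the suite. The `handle_call_tool` entry point and the `continuation_id` response field are assumptions for illustration, not the verified API:

```python
import json
from unittest.mock import Mock, patch

from server import handle_call_tool  # assumed dispatch entry point


@patch("tools.base.BaseTool.create_model")
async def test_continuation_id_crosses_tools(mock_create_model):
    """A thread started by one tool should be resumable from another."""
    mock_model = Mock()
    mock_model.generate_content.return_value = Mock(
        candidates=[Mock(content=Mock(parts=[Mock(text="mocked analysis")]))]
    )
    mock_create_model.return_value = mock_model

    first = await handle_call_tool("analyze", {"prompt": "Assess this design"})
    thread_id = json.loads(first[0].text).get("continuation_id")  # assumed field

    # Resume the same conversation thread from a different tool.
    second = await handle_call_tool(
        "codereview",
        {"prompt": "Review the implementation", "continuation_id": thread_id},
    )
    assert second, "continuation across tools should produce a response"
```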

#### `test_conversation_memory.py` - Memory Persistence
**Purpose**: Tests Redis-based conversation storage and retrieval

**Test Coverage**:
- **Conversation storage and retrieval**
- **Thread context management**
- **TTL (time-to-live) handling**
- **Memory cleanup and optimization**
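
A sketch of how a TTL check might look, reusing the Redis mocking pattern described later in this document; `create_thread` and the exact expiring-write call are assumptions:

```python
from unittest.mock import Mock, patch

from utils.conversation_memory import create_thread  # assumed API


@patch("utils.conversation_memory.get_redis_client")
def test_thread_is_written_with_expiry(mock_get_client):
    """Persisted threads should carry a TTL so stale conversations expire."""
    mock_client = Mock()
    mock_get_client.return_value = mock_client

    create_thread("chat", {"prompt": "hello"})

    # Expect an expiring write: either setex(...) or set(..., ex=...).
    assert mock_client.setex.called or mock_client.set.called
```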

#### `test_thinking_modes.py` - Cognitive Load Management
**Purpose**: Tests thinking mode functionality across all tools

**Validation Areas**:
- **Token budget enforcement**
- **Mode selection and application**
- **Performance characteristics**
- **Quality vs. cost trade-offs**
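
For instance, a budget-ordering check might look like the sketch below; `get_thinking_budget` is a hypothetical helper name, and the mode list is illustrative:

```python
# Hypothetical helper; the project may expose budgets through tool config instead.
from tools.base import get_thinking_budget


def test_budgets_grow_with_mode():
    """Deeper thinking modes should never receive a smaller token budget."""
    modes = ["minimal", "low", "medium", "high", "max"]
    budgets = [get_thinking_budget(mode) for mode in modes]
    assert budgets == sorted(budgets)
```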

### 3. Specialized Testing

#### `test_large_prompt_handling.py` - Scale Testing
**Purpose**: Tests handling of prompts exceeding MCP token limits

**Key Scenarios**:
- **Large prompt detection (>50,000 characters)**
- **Automatic file-based prompt handling**
- **MCP token limit workarounds**
- **Response capacity preservation**

**Critical Flow Testing**:
```python
async def test_large_prompt_file_handling():
    """Test that large prompts are automatically handled via file mechanism."""
    # Validates the workaround for MCP's 25K token limit
```

#### `test_docker_path_integration.py` - Environment Testing
**Purpose**: Tests Docker environment path translation and workspace mounting

**Coverage**:
- **Host-to-container path mapping**
- **Workspace directory access**
- **Cross-platform path handling**
- **Security boundary enforcement**
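
A minimal sketch of a path-mapping test is shown below; the `translate_path_for_environment` helper, the `WORKSPACE_ROOT` variable, and the `/workspace` mount point are assumptions about the Docker setup:

```python
# Hypothetical helper and environment variable names, for illustration only.
from utils.file_utils import translate_path_for_environment


def test_host_path_maps_into_container_workspace(monkeypatch):
    """Host paths under the mounted root should be rewritten to /workspace."""
    monkeypatch.setenv("WORKSPACE_ROOT", "/Users/dev/project")
    translated = translate_path_for_environment("/Users/dev/project/src/main.py")
    assert translated == "/workspace/src/main.py"
```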

#### `test_precommit.py` - Quality Gate Testing
**Purpose**: Tests pre-commit validation and git integration

**Validation Areas**:
- **Git repository discovery**
- **Change detection and analysis**
- **Multi-repository support**
- **Security scanning of changes**

### 4. Regression and Bug Fix Tests

#### `test_conversation_history_bug.py` - Bug Fix Validation
**Purpose**: Regression test for the conversation history duplication bug

**Specific Coverage**:
- **Conversation deduplication**
- **History consistency**
- **Memory leak prevention**
- **Thread integrity**

#### `test_prompt_regression.py` - Normal Operation Validation
**Purpose**: Ensures normal prompt handling continues to work correctly

**Test Focus**:
- **Standard prompt processing**
- **Backward compatibility**
- **Feature regression prevention**
- **Performance baseline maintenance**

#### `test_claude_continuation.py` - Session Management
**Purpose**: Tests Claude continuation opportunities and session management

**Key Areas**:
- **Session state management**
- **Continuation opportunity detection**
- **Context preservation**
- **Session cleanup and termination**

### 5. Live Integration Testing

#### `test_live_integration.py` - Real API Testing
**Purpose**: Tests actual Gemini API integration (excluded from regular CI)

**Requirements**:
- Valid `GEMINI_API_KEY` environment variable
- Network connectivity to Google AI services
- Redis server for conversation memory testing

**Test Categories**:
- **Basic API request/response validation**
- **Tool execution with real Gemini responses**
- **Conversation threading with actual AI**
- **Error handling with real API responses**

**Exclusion from CI**:
```python
@pytest.mark.skipif(not os.getenv("GEMINI_API_KEY"), reason="API key required")
class TestLiveIntegration:
    """Tests requiring actual Gemini API access."""
```

## Test Configuration Analysis

### `conftest.py` - Global Test Setup

**Key Fixtures and Configuration**:

#### Environment Isolation
```python
# Ensures tests run in an isolated sandbox environment
os.environ["MCP_PROJECT_ROOT"] = str(temp_dir)
```

#### Dummy API Keys
```python
# Provides safe dummy keys for testing without real credentials
os.environ["GEMINI_API_KEY"] = "dummy-key-for-testing"
```

#### Cross-Platform Compatibility
```python
# Handles Windows async event loop configuration
if platform.system() == "Windows":
    asyncio.set_event_loop_policy(asyncio.WindowsProactorEventLoopPolicy())
```

#### Project Path Fixtures
```python
@pytest.fixture
def project_path():
    """Provides a safe project path for file operations in tests."""
```

### `pytest.ini` - Test Runner Configuration

**Key Settings**:
```ini
[pytest]
testpaths = tests
python_files = test_*.py
python_classes = Test*
python_functions = test_*
asyncio_mode = auto
addopts =
    -v
    --strict-markers
    --tb=short
```

## Mocking Strategies

### 1. Gemini API Mocking

**Pattern Used**:
```python
@patch("tools.base.BaseTool.create_model")
async def test_tool_execution(self, mock_create_model):
    mock_model = Mock()
    mock_model.generate_content.return_value = Mock(
        candidates=[Mock(content=Mock(parts=[Mock(text="Mocked response")]))]
    )
    mock_create_model.return_value = mock_model
```

**Benefits**:
- **No API key required** for unit and integration tests
- **Predictable responses** for consistent testing
- **Fast execution** without network dependencies
- **Cost-effective** testing without API charges

### 2. Redis Memory Mocking

**Pattern Used**:
```python
@patch("utils.conversation_memory.get_redis_client")
def test_conversation_flow(self, mock_redis):
    mock_client = Mock()
    mock_redis.return_value = mock_client
    # Test conversation persistence logic
```

**Advantages**:
- **No Redis server required** for testing
- **Controlled state** for predictable test scenarios
- **Error simulation** for resilience testing
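
The error-simulation point deserves a concrete shape. A sketch follows, assuming a `get_thread` helper that is designed to swallow connection failures and return `None` rather than crash:

```python
from unittest.mock import Mock, patch

from utils.conversation_memory import get_thread  # assumed API


@patch("utils.conversation_memory.get_redis_client")
def test_redis_outage_degrades_gracefully(mock_get_client):
    """A Redis failure should read as a missing thread, not an unhandled crash."""
    mock_client = Mock()
    mock_client.get.side_effect = ConnectionError("redis unavailable")
    mock_get_client.return_value = mock_client

    assert get_thread("some-thread-id") is None
```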

### 3. File System Mocking

**Pattern Used**:
```python
@patch("builtins.open", mock_open(read_data="test file content"))
@patch("os.path.exists", return_value=True)
def test_file_operations(mock_exists):
    # Test file reading without actual files
    ...
```

**Security Benefits**:
- **No file system access** during testing
- **Path validation testing** without security risks
- **Consistent test data** across environments

## Security Testing Focus

### Path Validation Testing

**Critical Security Tests**:
1. **Absolute path enforcement** - All tools must reject relative paths
2. **Directory traversal prevention** - Block `../` and similar patterns
3. **Symlink attack prevention** - Detect and block symbolic link attacks
4. **Sandbox boundary enforcement** - Restrict access to allowed directories

**Example Security Test**:
```python
async def test_path_traversal_attack_prevention():
    """Test that directory traversal attacks are blocked."""
    dangerous_paths = [
        "../../../etc/passwd",
        "/etc/shadow",
        "~/../../root/.ssh/id_rsa",
    ]

    for path in dangerous_paths:
        with pytest.raises(SecurityError):
            await tool.execute({"files": [path]})
```

### Docker Security Testing

**Container Security Validation**:
- **Workspace mounting** - Verify read-only access enforcement
- **Path translation** - Test host-to-container path mapping
- **Privilege boundaries** - Ensure container cannot escape sandbox

## Test Execution Patterns

### Parallel Test Execution

**Strategy**: Tests are designed for parallel execution with proper isolation, as the sketch below illustrates

**Benefits**:
- **Faster test suite** execution
- **Resource efficiency** for CI/CD
- **Scalable testing** for large codebases
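
Isolation is what makes parallel runs safe. The sketch below uses pytest's built-in `tmp_path` and `monkeypatch` fixtures, which give each test a private directory and environment:

```python
def test_is_safe_under_parallel_execution(tmp_path, monkeypatch):
    """Per-test temp dirs and env patches prevent workers from sharing state."""
    monkeypatch.setenv("MCP_PROJECT_ROOT", str(tmp_path))  # unique per test
    scratch = tmp_path / "scratch.txt"
    scratch.write_text("isolated")
    assert scratch.read_text() == "isolated"
```

With that discipline in place, a runner such as pytest-xdist (`pytest -n auto`) can fan tests out across CPU cores without cross-test interference.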

### Conditional Test Execution

**Live Test Skipping**:
```python
@pytest.mark.skipif(not os.getenv("GEMINI_API_KEY"), reason="API key required")
```

**Platform-Specific Tests**:
```python
@pytest.mark.skipif(platform.system() == "Windows", reason="Unix-specific test")
```

## Test Quality Metrics

### Coverage Analysis

**Current Test Coverage by Category**:
- ✅ **Tool Functionality**: All 7 tools comprehensively tested
- ✅ **Server Operations**: Complete request/response cycle coverage
- ✅ **Security Validation**: Path safety and access control testing
- ✅ **Collaboration Features**: AI-to-AI communication patterns
- ✅ **Memory Management**: Conversation persistence and threading
- ✅ **Error Handling**: Graceful degradation and error recovery

### Test Reliability

**Design Characteristics**:
- **Deterministic**: Tests produce consistent results
- **Isolated**: No test dependencies or shared state
- **Fast**: Unit tests complete in milliseconds
- **Comprehensive**: Edge cases and error conditions covered

## Integration with Development Workflow

### Test-Driven Development Support

**TDD Cycle Integration** (see the sketch after this list):
1. **Red**: Write a failing test for the new functionality
2. **Green**: Implement minimal code to pass the test
3. **Refactor**: Improve the code while maintaining test coverage
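
Compressed into one file, the cycle looks like this; `count_tokens` is an invented function used purely to illustrate the rhythm:

```python
# Red: this test fails until count_tokens exists and handles the empty case.
def test_count_tokens_empty_string():
    assert count_tokens("") == 0


# Green: the minimal implementation that makes the test pass.
def count_tokens(text: str) -> int:
    return len(text.split())

# Refactor: swap in a real tokenizer later; the test pins the behavior.
```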

### Pre-Commit Testing

**Quality Gates**:
- **Security validation** before commits
- **Functionality regression** prevention
- **Code quality** maintenance
- **Performance baseline** protection

### CI/CD Integration

**GitHub Actions Workflow**:
- **Multi-Python version** testing (3.10, 3.11, 3.12)
- **Parallel test execution** for efficiency
- **Selective live testing** when API keys are available
- **Coverage reporting** and quality gates

## Best Practices Demonstrated

### 1. Comprehensive Mocking
Every external dependency is properly mocked for reliable testing.

### 2. Security-First Approach
Strong emphasis on security validation and vulnerability prevention.

### 3. Collaboration Testing
Extensive testing of AI-to-AI communication and workflow patterns.

### 4. Real-World Scenarios
Tests cover actual usage patterns and edge cases.

### 5. Maintainable Structure
Clear organization and focused test files for easy maintenance.

## Recommendations for Contributors

### Adding New Tests

1. **Follow Naming Conventions**: Use descriptive test names that explain the scenario
2. **Maintain Isolation**: Mock all external dependencies
3. **Test Security**: Include path validation and security checks
4. **Cover Edge Cases**: Test error conditions and boundary cases
5. **Document Purpose**: Use docstrings to explain test objectives

### Test Quality Standards

1. **Fast Execution**: Unit tests should complete in milliseconds
2. **Predictable Results**: Tests should be deterministic
3. **Clear Assertions**: Use descriptive assertion messages (illustrated below)
4. **Proper Cleanup**: Ensure tests don't leave side effects
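
The third point is cheap to apply and pays off when a test fails in CI. A small illustration with invented values:

```python
def test_response_respects_token_limit():
    response_tokens, limit = 80, 100  # illustrative values
    assert response_tokens <= limit, (
        f"response used {response_tokens} tokens, over the {limit}-token limit"
    )
```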

### Testing New Features

1. **Start with Unit Tests**: Test individual components first
2. **Add Integration Tests**: Test component interactions
3. **Include Security Tests**: Validate security measures
4. **Test Collaboration**: If relevant, test AI-to-AI workflows

---

This test structure demonstrates a mature, production-ready testing approach that ensures code quality, security, and reliability while supporting the collaborative AI development patterns that make this project unique.