Test Structure Documentation
Overview
This document provides a comprehensive analysis of the existing test structure in the Gemini MCP Server project. The test suite consists of 15 specialized test modules, plus shared fixtures in conftest.py, organized to validate the system from unit-level functionality to complex AI-collaboration workflows.
Test Organization
Test Directory Structure
```
tests/
├── __init__.py                       # Package initialization
├── conftest.py                       # Global test configuration and fixtures
├── test_claude_continuation.py       # Claude continuation opportunities
├── test_collaboration.py             # AI-to-AI collaboration features
├── test_config.py                    # Configuration validation
├── test_conversation_history_bug.py  # Bug fix regression tests
├── test_conversation_memory.py       # Redis-based conversation persistence
├── test_cross_tool_continuation.py   # Cross-tool conversation threading
├── test_docker_path_integration.py   # Docker environment path translation
├── test_large_prompt_handling.py     # Large prompt detection and handling
├── test_live_integration.py          # Live API testing (excluded from CI)
├── test_precommit.py                 # Pre-commit validation and git integration
├── test_prompt_regression.py         # Normal prompt handling regression
├── test_server.py                    # Main server functionality
├── test_thinking_modes.py            # Thinking mode functionality
├── test_tools.py                     # Individual tool implementations
└── test_utils.py                     # Utility function testing
```
Test Categories and Analysis
1. Core Functionality Tests
test_server.py - Main Server Functionality
Purpose: Tests the core MCP server implementation and tool dispatch mechanism
Key Test Areas:
- Server startup and initialization
- Tool registration and availability
- Request routing and handling
- Error propagation and handling
Example Coverage:
```python
# Tests tool listing functionality
def test_list_tools()

# Tests tool execution pipeline
async def test_call_tool()

# Tests error handling for invalid tools
async def test_call_invalid_tool()
```
test_config.py - Configuration Management
Purpose: Validates configuration loading, environment variable handling, and settings validation
Key Areas:
- Environment variable parsing
- Default value handling
- Configuration validation
- Error handling for missing required config
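The sketch below illustrates the environment-variable pattern these tests exercise. It is a minimal example under stated assumptions, not the project's actual code: the `config` module layout and the `DEFAULT_MODEL` setting name are hypothetical.
```python
import importlib

def test_env_var_override(monkeypatch):
    # monkeypatch restores the environment afterwards, keeping the test isolated.
    monkeypatch.setenv("DEFAULT_MODEL", "test-model")

    # Reload so module-level settings re-read os.environ (assumes the
    # hypothetical config module resolves env vars at import time).
    import config  # hypothetical module name
    importlib.reload(config)
    assert config.DEFAULT_MODEL == "test-model"
```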
test_tools.py - Tool Implementation Testing
Purpose: Tests individual tool implementations with comprehensive input validation
Key Features:
- Absolute path enforcement across all tools
- Parameter validation for each tool
- Error handling for malformed inputs
- Tool-specific behavior validation
Critical Security Testing:
```python
# Tests that all tools enforce absolute paths
async def test_tool_absolute_path_requirement()

# Tests path traversal attack prevention
async def test_tool_path_traversal_prevention()
```
test_utils.py - Utility Function Testing
Purpose: Tests file utilities, token counting, and directory handling functions
Coverage Areas:
- File reading and processing
- Token counting and limits
- Directory traversal and expansion
- Path validation and security
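As a hedged sketch of the token-limit checks (the `read_files` helper and its `max_tokens` parameter are assumptions, not a confirmed API):
```python
def test_oversized_file_is_bounded(tmp_path):
    # Write a file far larger than any reasonable token budget.
    big_file = tmp_path / "big.txt"
    big_file.write_text("x" * 1_000_000)

    from utils.file_utils import read_files  # hypothetical helper
    content = read_files([str(big_file)], max_tokens=1_000)

    # The utility should truncate or summarize rather than return
    # the full million characters unchanged.
    assert len(content) < 1_000_000
```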
2. Advanced Feature Tests
test_collaboration.py - AI-to-AI Collaboration
Purpose: Tests dynamic context requests and collaborative AI workflows
Key Scenarios:
- Clarification request parsing
- Dynamic context expansion
- AI-to-AI communication protocols
- Collaboration workflow validation
Example Test:
```python
async def test_clarification_request_parsing():
    """Test parsing of AI clarification requests for additional context."""
    # Validates that Gemini can request additional files/context
    # and Claude can respond appropriately
```
test_cross_tool_continuation.py - Cross-Tool Threading
Purpose: Tests conversation continuity across different tools
Critical Features:
- Continuation ID persistence
- Context preservation between tools
- Thread management across tool switches
- File context sharing between AI agents
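A minimal sketch of the cross-tool flow these tests cover; the tool fixtures and the `extract_continuation_id` helper are hypothetical placeholders, not the project's confirmed API:
```python
async def test_thread_survives_tool_switch(analyze_tool, codereview_tool):
    # Start a thread in one tool (the fixtures injecting tools are assumed).
    first = await analyze_tool.execute({"prompt": "Assess this design"})
    thread_id = extract_continuation_id(first)  # hypothetical helper

    # Resume the same thread from a different tool.
    second = await codereview_tool.execute(
        {"prompt": "Now review the code", "continuation_id": thread_id}
    )
    assert extract_continuation_id(second) == thread_id
```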
test_conversation_memory.py - Memory Persistence
Purpose: Tests Redis-based conversation storage and retrieval
Test Coverage:
- Conversation storage and retrieval
- Thread context management
- TTL (time-to-live) handling
- Memory cleanup and optimization
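Following the Redis-mocking pattern described later in this document, a TTL check might look like the sketch below; `save_thread` and the exact Redis calls it makes are assumptions for illustration:
```python
from unittest.mock import Mock, patch

@patch("utils.conversation_memory.get_redis_client")
def test_thread_persisted_with_ttl(mock_get_client):
    mock_client = Mock()
    mock_get_client.return_value = mock_client

    from utils.conversation_memory import save_thread  # hypothetical API
    save_thread("thread-123", {"turns": []})

    # Persistence should set an expiry (setex, or set followed by expire)
    # so abandoned conversations are cleaned up automatically.
    assert mock_client.setex.called or mock_client.expire.called
```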
test_thinking_modes.py - Cognitive Load Management
Purpose: Tests thinking mode functionality across all tools
Validation Areas:
- Token budget enforcement
- Mode selection and application
- Performance characteristics
- Quality vs. cost trade-offs
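For example, a budget-ordering check might look like this sketch (the `THINKING_BUDGETS` mapping and the mode names are assumptions):
```python
def test_budgets_scale_with_mode():
    from config import THINKING_BUDGETS  # hypothetical mode -> token budget map

    # Deeper thinking modes should never receive a smaller budget.
    order = ["minimal", "low", "medium", "high", "max"]
    budgets = [THINKING_BUDGETS[mode] for mode in order]
    assert budgets == sorted(budgets)
```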
3. Specialized Testing
test_large_prompt_handling.py - Scale Testing
Purpose: Tests handling of prompts exceeding MCP token limits
Key Scenarios:
- Large prompt detection (>50,000 characters)
- Automatic file-based prompt handling
- MCP token limit workarounds
- Response capacity preservation
Critical Flow Testing:
```python
async def test_large_prompt_file_handling():
    """Test that large prompts are automatically handled via file mechanism."""
    # Validates the workaround for MCP's 25K token limit
```
test_docker_path_integration.py - Environment Testing
Purpose: Tests Docker environment path translation and workspace mounting
Coverage:
- Host-to-container path mapping
- Workspace directory access
- Cross-platform path handling
- Security boundary enforcement
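A hedged sketch of the host-to-container mapping check; `translate_path`, the `WORKSPACE_ROOT` variable, and the `/workspace` mount point are assumptions for illustration:
```python
def test_host_path_maps_into_container(monkeypatch):
    monkeypatch.setenv("WORKSPACE_ROOT", "/Users/alice/project")

    from utils.file_utils import translate_path  # hypothetical helper
    translated = translate_path("/Users/alice/project/src/main.py")

    # Host paths under the mounted workspace should resolve inside the container.
    assert translated == "/workspace/src/main.py"
```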
test_precommit.py - Quality Gate Testing
Purpose: Tests pre-commit validation and git integration
Validation Areas:
- Git repository discovery
- Change detection and analysis
- Multi-repository support
- Security scanning of changes
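A minimal sketch of repository discovery (`find_repositories` is an assumed helper name; the fake `.git` directory stands in for a real repository):
```python
def test_discovers_nested_git_repo(tmp_path):
    repo = tmp_path / "service-a"
    repo.mkdir()
    (repo / ".git").mkdir()  # marker directory is enough for discovery

    from tools.precommit import find_repositories  # hypothetical API
    assert str(repo) in find_repositories(str(tmp_path))
```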
4. Regression and Bug Fix Tests
test_conversation_history_bug.py - Bug Fix Validation
Purpose: Regression test for conversation history duplication bug
Specific Coverage:
- Conversation deduplication
- History consistency
- Memory leak prevention
- Thread integrity
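The regression check can be sketched as follows; `build_conversation_history` is an assumed function name, and the thread shape is illustrative only:
```python
def test_turns_not_duplicated_in_history():
    from utils.conversation_memory import build_conversation_history  # hypothetical

    thread = {"turns": [{"role": "user", "content": "unique-marker"}]}

    # Rebuilding history for the same thread must include each turn exactly once.
    history = build_conversation_history(thread)
    assert history.count("unique-marker") == 1
```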
test_prompt_regression.py - Normal Operation Validation
Purpose: Ensures normal prompt handling continues to work correctly
Test Focus:
- Standard prompt processing
- Backward compatibility
- Feature regression prevention
- Performance baseline maintenance
test_claude_continuation.py - Session Management
Purpose: Tests Claude continuation opportunities and session management
Key Areas:
- Session state management
- Continuation opportunity detection
- Context preservation
- Session cleanup and termination
5. Live Integration Testing
test_live_integration.py - Real API Testing
Purpose: Tests actual Gemini API integration (excluded from regular CI)
Requirements:
- Valid `GEMINI_API_KEY` environment variable
- Network connectivity to Google AI services
- Redis server for conversation memory testing
Test Categories:
- Basic API request/response validation
- Tool execution with real Gemini responses
- Conversation threading with actual AI
- Error handling with real API responses
Exclusion from CI:
```python
@pytest.mark.skipif(not os.getenv("GEMINI_API_KEY"), reason="API key required")
class TestLiveIntegration:
    """Tests requiring actual Gemini API access."""
```
Test Configuration Analysis
conftest.py - Global Test Setup
Key Fixtures and Configuration:
Environment Isolation
```python
# Ensures tests run in isolated sandbox environment
os.environ["MCP_PROJECT_ROOT"] = str(temp_dir)
```
Dummy API Keys
```python
# Provides safe dummy keys for testing without real credentials
os.environ["GEMINI_API_KEY"] = "dummy-key-for-testing"
```
Cross-Platform Compatibility
```python
# Handles Windows async event loop configuration
if platform.system() == "Windows":
    asyncio.set_event_loop_policy(asyncio.WindowsProactorEventLoopPolicy())
```
Project Path Fixtures
```python
@pytest.fixture
def project_path():
    """Provides safe project path for file operations in tests."""
```
pytest.ini - Test Runner Configuration
Key Settings:
```ini
[pytest]
testpaths = tests
python_files = test_*.py
python_classes = Test*
python_functions = test_*
asyncio_mode = auto
addopts =
    -v
    --strict-markers
    --tb=short
```
Mocking Strategies
1. Gemini API Mocking
Pattern Used:
@patch("tools.base.BaseTool.create_model")
async def test_tool_execution(self, mock_create_model):
mock_model = Mock()
mock_model.generate_content.return_value = Mock(
candidates=[Mock(content=Mock(parts=[Mock(text="Mocked response")]))]
)
mock_create_model.return_value = mock_model
Benefits:
- No API key required for unit and integration tests
- Predictable responses for consistent testing
- Fast execution without network dependencies
- Cost-effective testing without API charges
2. Redis Memory Mocking
Pattern Used:
@patch("utils.conversation_memory.get_redis_client")
def test_conversation_flow(self, mock_redis):
mock_client = Mock()
mock_redis.return_value = mock_client
# Test conversation persistence logic
Advantages:
- No Redis server required for testing
- Controlled state for predictable test scenarios
- Error simulation for resilience testing
3. File System Mocking
Pattern Used:
@patch("builtins.open", mock_open(read_data="test file content"))
@patch("os.path.exists", return_value=True)
def test_file_operations():
# Test file reading without actual files
Security Benefits:
- No file system access during testing
- Path validation testing without security risks
- Consistent test data across environments
Security Testing Focus
Path Validation Testing
Critical Security Tests:
- Absolute path enforcement - All tools must reject relative paths
- Directory traversal prevention - Block `../` and similar patterns
- Symlink attack prevention - Detect and block symbolic link attacks
- Sandbox boundary enforcement - Restrict access to allowed directories
Example Security Test:
```python
async def test_path_traversal_attack_prevention():
    """Test that directory traversal attacks are blocked."""
    dangerous_paths = [
        "../../../etc/passwd",
        "/etc/shadow",
        "~/../../root/.ssh/id_rsa",
    ]
    for path in dangerous_paths:
        with pytest.raises(SecurityError):
            await tool.execute({"files": [path]})
```
Docker Security Testing
Container Security Validation:
- Workspace mounting - Verify read-only access enforcement
- Path translation - Test host-to-container path mapping
- Privilege boundaries - Ensure container cannot escape sandbox
Test Execution Patterns
Parallel Test Execution
Strategy: Tests are designed for parallel execution with proper isolation
Benefits:
- Faster test suite execution
- Resource efficiency for CI/CD
- Scalable testing for large codebases
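The isolation that makes parallel runs safe typically comes from per-test fixtures. The sketch below combines `tmp_path` and `monkeypatch` (standard pytest fixtures) with the environment variables this project's conftest.py already sets; the fixture name is illustrative:
```python
import pytest

@pytest.fixture
def isolated_env(tmp_path, monkeypatch):
    # Each test gets a private sandbox root and dummy credentials, so
    # parallel workers (e.g., via pytest-xdist) never share state.
    monkeypatch.setenv("MCP_PROJECT_ROOT", str(tmp_path))
    monkeypatch.setenv("GEMINI_API_KEY", "dummy-key-for-testing")
    return tmp_path
```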
Conditional Test Execution
Live Test Skipping:
```python
@pytest.mark.skipif(not os.getenv("GEMINI_API_KEY"), reason="API key required")
```
Platform-Specific Tests:
```python
@pytest.mark.skipif(platform.system() == "Windows", reason="Unix-specific test")
```
Test Quality Metrics
Coverage Analysis
Current Test Coverage by Category:
- ✅ Tool Functionality: All 7 tools comprehensively tested
- ✅ Server Operations: Complete request/response cycle coverage
- ✅ Security Validation: Path safety and access control testing
- ✅ Collaboration Features: AI-to-AI communication patterns
- ✅ Memory Management: Conversation persistence and threading
- ✅ Error Handling: Graceful degradation and error recovery
Test Reliability
Design Characteristics:
- Deterministic: Tests produce consistent results
- Isolated: No test dependencies or shared state
- Fast: Unit tests complete in milliseconds
- Comprehensive: Edge cases and error conditions covered
Integration with Development Workflow
Test-Driven Development Support
TDD Cycle Integration:
- Red: Write failing test for new functionality
- Green: Implement minimal code to pass test
- Refactor: Improve code while maintaining test coverage
Pre-Commit Testing
Quality Gates:
- Security validation before commits
- Functionality regression prevention
- Code quality maintenance
- Performance baseline protection
CI/CD Integration
GitHub Actions Workflow:
- Multi-Python version testing (3.10, 3.11, 3.12)
- Parallel test execution for efficiency
- Selective live testing when API keys available
- Coverage reporting and quality gates
Best Practices Demonstrated
1. Comprehensive Mocking
Every external dependency is properly mocked for reliable testing
2. Security-First Approach
Strong emphasis on security validation and vulnerability prevention
3. Collaboration Testing
Extensive testing of AI-to-AI communication and workflow patterns
4. Real-World Scenarios
Tests cover actual usage patterns and edge cases
5. Maintainable Structure
Clear organization and focused test files for easy maintenance
Recommendations for Contributors
Adding New Tests
- Follow Naming Conventions: Use descriptive test names that explain the scenario
- Maintain Isolation: Mock all external dependencies
- Test Security: Include path validation and security checks
- Cover Edge Cases: Test error conditions and boundary cases
- Document Purpose: Use docstrings to explain test objectives
Test Quality Standards
- Fast Execution: Unit tests should complete in milliseconds
- Predictable Results: Tests should be deterministic
- Clear Assertions: Use descriptive assertion messages
- Proper Cleanup: Ensure tests don't leave side effects
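Taken together, these standards look like the following self-contained example (purely illustrative, using only pytest built-ins):
```python
def test_reads_back_written_content(tmp_path):
    # Isolated input (tmp_path is cleaned up automatically), deterministic
    # expectation, and a descriptive assertion message.
    sample = tmp_path / "sample.txt"
    sample.write_text("hello")

    content = sample.read_text()
    assert content == "hello", f"unexpected file content: {content!r}"
```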
Testing New Features
- Start with Unit Tests: Test individual components first
- Add Integration Tests: Test component interactions
- Include Security Tests: Validate security measures
- Test Collaboration: If relevant, test AI-to-AI workflows
This test structure demonstrates a mature, production-ready testing approach that ensures code quality, security, and reliability while supporting the collaborative AI development patterns that make this project unique.