Testing Strategy & Guidelines
Overview
This document outlines the comprehensive testing strategy for the Gemini MCP Server project, including unit testing, integration testing, and quality assurance practices that align with CLAUDE.md collaboration patterns.
Testing Philosophy
Test-Driven Development (TDD)
TDD Cycle:
- Red: Write failing test for new functionality
- Green: Implement minimal code to pass the test
- Refactor: Improve code while maintaining test coverage
- Repeat: Continue cycle for all new features
Example TDD Flow:
```python
# 1. Write failing test
def test_chat_tool_should_process_simple_prompt():
    tool = ChatTool()
    result = tool.execute({"prompt": "Hello"})
    assert result.status == "success"
    assert "hello" in result.content.lower()


# 2. Implement minimal functionality
class ChatTool:
    def execute(self, request):
        return ToolOutput(content="Hello!", status="success")


# 3. Refactor and enhance
```
Testing Pyramid
```text
    /\
   /  \       E2E Tests (Few, High-Value)
  /____\      Integration Tests (Some, Key Paths)
 /______\     Unit Tests (Many, Fast, Isolated)
/________\
```
Distribution:
- 70% Unit Tests: Fast, isolated, comprehensive coverage
- 20% Integration Tests: Component interaction validation
- 10% End-to-End Tests: Complete workflow validation
Test Categories
1. Unit Tests
Purpose: Test individual functions and classes in isolation
Location: tests/unit/
Example Structure:
```python
# tests/unit/test_file_utils.py
import pytest
from unittest.mock import Mock, patch, mock_open

from exceptions import SecurityError
from utils.file_utils import validate_file_path, read_file_with_token_limit


class TestFileUtils:
    """Unit tests for file utility functions."""

    def test_validate_file_path_with_safe_path(self):
        """Test that safe file paths pass validation."""
        safe_path = "/workspace/tools/chat.py"
        assert validate_file_path(safe_path) is True

    def test_validate_file_path_with_traversal_attack(self):
        """Test that directory traversal attempts are blocked."""
        dangerous_path = "/workspace/../../../etc/passwd"
        with pytest.raises(SecurityError):
            validate_file_path(dangerous_path)

    @patch('builtins.open', new_callable=mock_open, read_data="test content")
    def test_read_file_with_token_limit(self, mock_file):
        """Test file reading with token budget enforcement."""
        content = read_file_with_token_limit("/test/file.py", max_tokens=100)
        assert "test content" in content
        mock_file.assert_called_once_with("/test/file.py", 'r', encoding='utf-8')
```
Unit Test Guidelines:
- Isolation: Mock external dependencies (file system, network, database); see the sketch after this list
- Fast Execution: Each test should complete in milliseconds
- Single Responsibility: One test per behavior/scenario
- Descriptive Names: Test names should describe the scenario and expected outcome
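As an illustration of the isolation guideline, the sketch below unit-tests ChatTool against the `mock_gemini_client` fixture defined later in `tests/conftest.py`, so no network call is made. It assumes ChatTool obtains its client via `tools.models.GeminiClient` and exposes a synchronous `execute`, which may differ from the real implementation.

```python
# Illustrative sketch only; adjust to ChatTool's real interface.
import pytest

from tools.chat import ChatTool


@pytest.mark.unit
def test_chat_tool_returns_success_without_network(mock_gemini_client):
    """ChatTool should succeed using only the patched Gemini client."""
    tool = ChatTool()
    result = tool.execute({"prompt": "Hello"})

    # The fixture patches tools.models.GeminiClient, so the only possible
    # response is the canned one; a real API call would fail this test setup.
    assert result.status == "success"
    mock_gemini_client.generate_response.assert_called_once()
```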
2. Integration Tests
Purpose: Test component interactions and system integration
Location: tests/integration/
Example Structure:
```python
# tests/integration/test_tool_execution.py
import pytest
import asyncio
from unittest.mock import patch

from server import call_tool
from tools.chat import ChatTool
from utils.conversation_memory import ConversationMemory


class TestToolExecution:
    """Integration tests for tool execution pipeline."""

    @pytest.fixture
    def mock_redis(self):
        """Mock Redis for conversation memory testing."""
        with patch('redis.Redis') as mock:
            yield mock

    @pytest.fixture
    def conversation_memory(self, mock_redis):
        """Create conversation memory with mocked Redis."""
        return ConversationMemory("redis://mock")

    async def test_chat_tool_execution_with_memory(self, conversation_memory):
        """Test chat tool execution with conversation memory integration."""
        # Arrange
        request = {
            "name": "chat",
            "arguments": {
                "prompt": "Hello",
                "continuation_id": "test-thread-123"
            }
        }

        # Act
        result = await call_tool(request["name"], request["arguments"])

        # Assert
        assert len(result) == 1
        assert result[0].type == "text"
        assert "hello" in result[0].text.lower()

    async def test_tool_execution_error_handling(self):
        """Test error handling in tool execution pipeline."""
        # Test with invalid tool name
        with pytest.raises(ToolNotFoundError):
            await call_tool("nonexistent_tool", {})
```
Integration Test Guidelines:
- Real Component Interaction: Test actual component communication
- Mock External Services: Mock external APIs (Gemini, Redis) for reliability
- Error Scenarios: Test error propagation and handling
- Async Testing: Use pytest-asyncio for async code testing (see the sketch after this list)
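For the async-testing guideline above, a minimal sketch of the pytest-asyncio marker (assuming the package is installed, as in the CI workflow later in this document, and that the Gemini client is mocked as shown earlier):

```python
import pytest

from server import call_tool


@pytest.mark.asyncio
@pytest.mark.integration
async def test_chat_tool_returns_single_text_block(mock_gemini_client):
    """pytest-asyncio supplies the event loop that runs this coroutine."""
    result = await call_tool("chat", {"prompt": "Hello"})
    assert len(result) == 1
    assert result[0].type == "text"
```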
3. Live Integration Tests
Purpose: Test real API integration with external services
Location: tests/live/
Requirements:
- Valid GEMINI_API_KEY environment variable
- Redis server running (for conversation memory tests)
- Network connectivity
Example Structure:
```python
# tests/live/test_gemini_integration.py
import pytest
import os

from tools.chat import ChatTool
from tools.models import GeminiClient


@pytest.mark.live
@pytest.mark.skipif(not os.getenv("GEMINI_API_KEY"), reason="API key required")
class TestGeminiIntegration:
    """Live tests requiring actual Gemini API access."""

    def setup_method(self):
        """Set up for live testing."""
        self.api_key = os.getenv("GEMINI_API_KEY")
        self.client = GeminiClient(self.api_key)

    async def test_basic_gemini_request(self):
        """Test basic Gemini API request/response."""
        response = await self.client.generate_response(
            prompt="Say 'test successful'",
            thinking_mode="minimal"
        )
        assert "test successful" in response.lower()

    async def test_chat_tool_with_real_api(self):
        """Test ChatTool with real Gemini API integration."""
        tool = ChatTool()
        result = await tool.execute({
            "prompt": "What is 2+2?",
            "thinking_mode": "minimal"
        })
        assert result.status == "success"
        assert "4" in result.content
```
Live Test Guidelines:
- Skip When Unavailable: Skip if API keys or services unavailable
- Rate Limiting: Respect API rate limits with delays (see the sketch after this list)
- Minimal Mode: Use minimal thinking mode for speed
- Cleanup: Clean up any created resources
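One possible implementation of the rate-limiting guideline is an autouse fixture in a live-test `conftest.py` that pauses after each test; the one-second delay below is a placeholder, not a documented Gemini quota.

```python
# tests/live/conftest.py (illustrative sketch)
import time

import pytest


@pytest.fixture(autouse=True)
def pause_between_live_tests():
    """Add a short delay after every live test to stay under API rate limits."""
    yield
    time.sleep(1.0)  # Placeholder; tune to the quota of the API key in use.
```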
4. Security Tests
Purpose: Validate security measures and vulnerability prevention
Location: tests/security/
Example Structure:
```python
# tests/security/test_path_validation.py
import pytest

from utils.file_utils import validate_file_path
from exceptions import SecurityError


class TestSecurityValidation:
    """Security-focused tests for input validation."""

    @pytest.mark.parametrize("dangerous_path", [
        "../../../etc/passwd",
        "/etc/shadow",
        "~/../../root/.ssh/id_rsa",
        "/var/log/auth.log",
        "\\..\\..\\windows\\system32\\config\\sam"
    ])
    def test_dangerous_path_rejection(self, dangerous_path):
        """Test that dangerous file paths are rejected."""
        with pytest.raises(SecurityError):
            validate_file_path(dangerous_path)

    def test_secret_sanitization_in_logs(self):
        """Test that sensitive data is sanitized in log output."""
        request_data = {
            "prompt": "Hello",
            "api_key": "sk-secret123",
            "token": "bearer-token-456"
        }

        sanitized = sanitize_for_logging(request_data)

        assert sanitized["api_key"] == "[REDACTED]"
        assert sanitized["token"] == "[REDACTED]"
        assert sanitized["prompt"] == "Hello"  # Non-sensitive data preserved
```
Test Configuration
pytest Configuration
pytest.ini:
```ini
[tool:pytest]
testpaths = tests
python_files = test_*.py *_test.py
python_classes = Test*
python_functions = test_*
addopts =
    -v
    --strict-markers
    --disable-warnings
    --cov=tools
    --cov=utils
    --cov-report=html
    --cov-report=term-missing
    --cov-fail-under=80
markers =
    unit: Unit tests (fast, isolated)
    integration: Integration tests (component interaction)
    live: Live tests requiring API keys and external services
    security: Security-focused tests
    slow: Tests that take more than 1 second
```
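Because `--strict-markers` is enabled, any marker used in a test must be declared in the list above. Markers are then applied per test or per class, for example:

```python
import pytest


@pytest.mark.unit
def test_addition():
    assert 1 + 1 == 2


@pytest.mark.integration
@pytest.mark.slow
class TestConversationPersistence:
    """Class-level markers apply to every test method in the class."""

    def test_thread_round_trip(self):
        ...
```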
conftest.py:
```python
# tests/conftest.py
import pytest
import asyncio
from unittest.mock import Mock, patch


@pytest.fixture(scope="session")
def event_loop():
    """Create an instance of the default event loop for the test session."""
    loop = asyncio.get_event_loop_policy().new_event_loop()
    yield loop
    loop.close()


@pytest.fixture
def mock_gemini_client():
    """Mock Gemini client for testing without API calls."""
    with patch('tools.models.GeminiClient') as mock:
        mock_instance = Mock()
        mock_instance.generate_response.return_value = "Mocked response"
        mock.return_value = mock_instance
        yield mock_instance


@pytest.fixture
def mock_redis():
    """Mock Redis client for testing without Redis server."""
    with patch('redis.Redis') as mock:
        yield mock


@pytest.fixture
def sample_file_content():
    """Sample file content for testing file processing."""
    return """
def example_function():
    # This is a sample function
    return "hello world"


class ExampleClass:
    def method(self):
        pass
"""


@pytest.fixture
def temp_project_directory(tmp_path):
    """Create temporary project directory structure for testing."""
    project_dir = tmp_path / "test_project"
    project_dir.mkdir()

    # Create subdirectories
    (project_dir / "tools").mkdir()
    (project_dir / "utils").mkdir()
    (project_dir / "tests").mkdir()

    # Create sample files
    (project_dir / "tools" / "sample.py").write_text("# Sample tool")
    (project_dir / "utils" / "helper.py").write_text("# Helper utility")

    return project_dir
```
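As a usage sketch, a test simply lists the fixtures it needs as parameters and pytest injects them (the test name below is hypothetical):

```python
def test_fixture_injection(temp_project_directory, mock_gemini_client):
    """Fixtures are resolved by name; no explicit setup code is required."""
    sample_tool = temp_project_directory / "tools" / "sample.py"
    assert sample_tool.read_text() == "# Sample tool"

    # The mocked client returns the canned string instead of calling the API.
    assert mock_gemini_client.generate_response() == "Mocked response"
```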
Test Data Management
Test Fixtures
File-based Fixtures:
```python
# tests/fixtures/sample_code.py
PYTHON_CODE_SAMPLE = '''
import asyncio
from typing import Dict, List


async def process_data(items: List[str]) -> Dict[str, int]:
    """Process a list of items and return counts."""
    result = {}
    for item in items:
        result[item] = len(item)
    return result
'''

JAVASCRIPT_CODE_SAMPLE = '''
async function processData(items) {
    const result = {};
    for (const item of items) {
        result[item] = item.length;
    }
    return result;
}
'''

ERROR_LOGS_SAMPLE = '''
2025-01-11 23:45:12 ERROR [tool_execution] Tool 'analyze' failed: File not found
Traceback (most recent call last):
  File "/app/tools/analyze.py", line 45, in execute
    content = read_file(file_path)
  File "/app/utils/file_utils.py", line 23, in read_file
    with open(file_path, 'r') as f:
FileNotFoundError: [Errno 2] No such file or directory: '/nonexistent/file.py'
'''
```
Mock Data Factories
ToolOutput Factory:
```python
# tests/factories.py
from typing import Dict, Any, List


def create_tool_output(
    content: str = "Default response",
    status: str = "success",
    metadata: Dict[str, Any] = None,
    files_processed: List[str] = None
) -> ToolOutput:
    """Factory for creating ToolOutput test instances."""
    return ToolOutput(
        content=content,
        metadata=metadata or {},
        files_processed=files_processed or [],
        status=status
    )


def create_thread_context(
    thread_id: str = "test-thread-123",
    files: List[str] = None
) -> ThreadContext:
    """Factory for creating ThreadContext test instances."""
    return ThreadContext(
        thread_id=thread_id,
        conversation_files=set(files or []),
        tool_history=[],
        context_tokens=0
    )
```
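A short usage sketch (the module path `tests.factories` assumes the `tests` directory is importable as a package):

```python
from tests.factories import create_tool_output, create_thread_context


def test_factories_allow_selective_overrides():
    output = create_tool_output(content="Done", files_processed=["tools/chat.py"])
    assert output.status == "success"            # default preserved
    assert output.files_processed == ["tools/chat.py"]

    context = create_thread_context(files=["utils/file_utils.py"])
    assert "utils/file_utils.py" in context.conversation_files
```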
Mocking Strategies
External Service Mocking
Gemini API Mocking:
```python
from typing import Dict
from unittest.mock import patch


class MockGeminiClient:
    """Mock Gemini client for testing."""

    def __init__(self, responses: Dict[str, str] = None):
        self.responses = responses or {
            "default": "This is a mocked response from Gemini"
        }
        self.call_count = 0

    async def generate_response(self, prompt: str, **kwargs) -> str:
        """Mock response generation."""
        self.call_count += 1

        # Return specific response for specific prompts
        for key, response in self.responses.items():
            if key in prompt.lower():
                return response
        return self.responses.get("default", "Mock response")


# Usage in tests
@patch('tools.models.GeminiClient', MockGeminiClient)
def test_with_mocked_gemini():
    # Test implementation
    pass
```
File System Mocking:
```python
from unittest.mock import mock_open, patch

from utils.file_utils import read_file


@patch('builtins.open', mock_open(read_data="file content"))
@patch('os.path.exists', return_value=True)
@patch('os.path.getsize', return_value=1024)
def test_file_operations(mock_getsize, mock_exists):
    """Test file operations with mocked file system."""
    content = read_file("/mocked/file.py")
    assert content == "file content"
```
Performance Testing
Load Testing
Concurrent Tool Execution:
```python
# tests/performance/test_load.py
import asyncio
import pytest
import time

from server import call_tool


@pytest.mark.slow
class TestPerformance:
    """Performance tests for system load handling."""

    async def test_concurrent_tool_execution(self):
        """Test system performance under concurrent load."""
        start_time = time.time()

        # Create 10 concurrent tool execution tasks
        tasks = []
        for i in range(10):
            task = asyncio.create_task(
                call_tool("chat", {"prompt": f"Request {i}"})
            )
            tasks.append(task)

        # Wait for all tasks to complete
        results = await asyncio.gather(*tasks)

        end_time = time.time()
        execution_time = end_time - start_time

        # Verify all requests succeeded
        assert len(results) == 10
        assert all(len(result) == 1 for result in results)

        # Performance assertion (adjust based on requirements)
        assert execution_time < 30.0  # All requests should complete within 30s

    async def test_memory_usage_stability(self):
        """Test that memory usage remains stable under load."""
        import psutil
        import gc

        process = psutil.Process()
        initial_memory = process.memory_info().rss

        # Execute multiple operations
        for i in range(100):
            await call_tool("chat", {"prompt": f"Memory test {i}"})

            # Force garbage collection periodically
            if i % 10 == 0:
                gc.collect()

        final_memory = process.memory_info().rss
        memory_growth = final_memory - initial_memory

        # Memory growth should be reasonable (adjust threshold as needed)
        assert memory_growth < 100 * 1024 * 1024  # Less than 100MB growth
```
Test Execution
Running Tests
Basic Test Execution:
```bash
# Run all tests
pytest

# Run specific test categories
pytest -m unit                   # Unit tests only
pytest -m integration            # Integration tests only
pytest -m "not live"             # All tests except live tests
pytest -m "live and not slow"    # Live tests that are fast

# Run with coverage
pytest --cov=tools --cov=utils --cov-report=html

# Run specific test file
pytest tests/unit/test_file_utils.py -v

# Run specific test method
pytest tests/unit/test_file_utils.py::TestFileUtils::test_validate_file_path -v
```
Continuous Integration:
```bash
#!/bin/bash
# CI test script
set -e

echo "Running unit tests..."
pytest -m unit --cov=tools --cov=utils --cov-fail-under=80

echo "Running integration tests..."
pytest -m integration

echo "Running security tests..."
pytest -m security

echo "Checking code quality..."
flake8 tools/ utils/ tests/
mypy tools/ utils/

echo "All tests passed!"
```
Test Reports
Coverage Reports:
```bash
# Generate HTML coverage report
pytest --cov=tools --cov=utils --cov-report=html
open htmlcov/index.html

# Generate terminal coverage report
pytest --cov=tools --cov=utils --cov-report=term-missing
```
Test Results Export:
```bash
# Export test results to JUnit XML (for CI integration)
pytest --junitxml=test-results.xml

# Export test results with timing information
pytest --durations=10  # Show 10 slowest tests
```
Quality Metrics
Coverage Targets
Minimum Coverage Requirements:
- Overall Coverage: 80%
- Critical Modules: 90% (security, file_utils, conversation_memory)
- Tool Modules: 85%
- Utility Modules: 80%
Coverage Enforcement:
```bash
# Fail build if coverage drops below threshold
pytest --cov-fail-under=80
```
Test Quality Metrics
Test Suite Characteristics:
- Fast Execution: Unit test suite should complete in <30 seconds
- Reliable: Tests should have <1% flaky failure rate
- Maintainable: Test code should follow same quality standards as production code
- Comprehensive: All critical paths and edge cases covered
Integration with Development Workflow
Pre-commit Testing
Git Hook Integration:
```bash
#!/bin/sh
# .git/hooks/pre-commit

echo "Running pre-commit tests..."

# Run fast tests before commit
pytest -m "unit and not slow" --cov-fail-under=80

if [ $? -ne 0 ]; then
    echo "Tests failed. Commit blocked."
    exit 1
fi

echo "Pre-commit tests passed."
```
CI/CD Integration
GitHub Actions Workflow:
```yaml
name: Test Suite

on: [push, pull_request]

jobs:
  test:
    runs-on: ubuntu-latest
    strategy:
      matrix:
        # Quote versions so YAML does not read 3.10 as the float 3.1
        python-version: ["3.9", "3.10", "3.11"]

    steps:
      - uses: actions/checkout@v3

      - name: Set up Python ${{ matrix.python-version }}
        uses: actions/setup-python@v4
        with:
          python-version: ${{ matrix.python-version }}

      - name: Install dependencies
        run: |
          pip install -r requirements.txt
          pip install pytest pytest-cov pytest-asyncio

      - name: Run unit tests
        run: pytest -m unit --cov=tools --cov=utils --cov-fail-under=80

      - name: Run integration tests
        run: pytest -m integration

      - name: Run security tests
        run: pytest -m security

      - name: Upload coverage reports
        uses: codecov/codecov-action@v3
        with:
          file: ./coverage.xml
```
Detailed Test Structure Analysis
For a comprehensive analysis of the existing test suite, including detailed breakdowns of all 17 test files, security testing patterns, and collaboration feature validation, see:
Test Structure Documentation - Complete analysis of existing test organization, mocking strategies, and quality assurance patterns
This comprehensive testing strategy ensures high-quality, reliable code while maintaining development velocity and supporting the collaborative patterns defined in CLAUDE.md.