New tool: testgen
Generates unit tests and encourages the model to auto-detect the test framework and testing style from existing samples (if available).
README.md (30 lines changed)

@@ -49,6 +49,7 @@ and review into consideration to aid with its pre-commit review.
- [`precommit`](#4-precommit---pre-commit-validation) - Pre-commit validation
- [`debug`](#5-debug---expert-debugging-assistant) - Debugging help
- [`analyze`](#6-analyze---smart-file-analysis) - File analysis
- [`testgen`](#7-testgen---comprehensive-test-generation) - Test generation with edge cases

- **Advanced Usage**
  - [Advanced Features](#advanced-features) - AI-to-AI conversations, large prompts, web search

@@ -254,6 +255,7 @@ Just ask Claude naturally:
- **Pre-commit validation?** → `precommit` (validate git changes before committing)
- **Something's broken?** → `debug` (root cause analysis, error tracing)
- **Want to understand code?** → `analyze` (architecture, patterns, dependencies)
- **Need comprehensive tests?** → `testgen` (generates test suites with edge cases)
- **Server info?** → `get_version` (version and configuration details)

**Auto Mode:** When `DEFAULT_MODEL=auto`, Claude automatically picks the best model for each task. You can override with: "Use flash for quick analysis" or "Use o3 to debug this".

@@ -274,7 +276,8 @@ Just ask Claude naturally:
4. [`precommit`](#4-precommit---pre-commit-validation) - Validate git changes before committing
5. [`debug`](#5-debug---expert-debugging-assistant) - Root cause analysis and debugging
6. [`analyze`](#6-analyze---smart-file-analysis) - General-purpose file and code analysis
7. [`get_version`](#7-get_version---server-information) - Get server version and configuration
7. [`testgen`](#7-testgen---comprehensive-test-generation) - Comprehensive test generation with edge case coverage
8. [`get_version`](#8-get_version---server-information) - Get server version and configuration

### 1. `chat` - General Development Chat & Collaborative Thinking

**Your thinking partner - bounce ideas, get second opinions, brainstorm collaboratively**

@@ -421,7 +424,30 @@ Use zen and perform a thorough precommit ensuring there aren't any new regressio
- Uses file paths (not content) for clean terminal output
- Can identify patterns, anti-patterns, and refactoring opportunities
- **Web search capability**: When enabled with `use_websearch` (default: true), the model can request Claude to perform web searches and share results back to enhance analysis with current documentation, design patterns, and best practices

### 7. `get_version` - Server Information
### 7. `testgen` - Comprehensive Test Generation

**Generates thorough test suites with edge case coverage** based on existing code and the test framework in use.

**Thinking Mode (Extended thinking models):** Default is `medium` (8,192 tokens). Use `high` for complex systems with many interactions or `max` for critical systems requiring exhaustive test coverage.

#### Example Prompts:

**Basic Usage:**
```
"Use zen to generate tests for User.login() method"
"Generate comprehensive tests for the sorting method in src/new_sort.py using o3"
"Create tests for edge cases not already covered in our tests using gemini pro"
```

**Key Features:**
- Multi-agent workflow analyzing code paths and identifying realistic failure modes
- Generates framework-specific tests following project conventions
- Supports test pattern following when examples are provided
- Dynamic token allocation (25% for test examples, 75% for main code) - see the sketch after this list
- Prioritizes smallest test files for pattern detection
- Can reference existing test files: `"Generate tests following patterns from tests/unit/"`
- Specific code coverage - target specific functions/classes rather than testing everything
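
The dynamic token allocation mentioned above works roughly as follows. This is a simplified sketch of the logic added in `tools/testgen.py` in this commit; the numbers are illustrative, not real measurements:

```
# Illustrative sketch of testgen's token budgeting (values are made up)
context_window = 200_000                    # model context size reported by the provider
available = int(context_window * 0.75)      # 75% of the context is reserved for prompt content
example_budget = int(available * 0.25)      # test examples may use at most 25% of that
example_tokens_used = 12_000                # whatever the sampled example files actually consume
code_budget = available - example_tokens_used - 5_000  # remainder (minus a reserve) for the code under test
print(available, example_budget, code_budget)  # 150000 37500 133000
```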

### 8. `get_version` - Server Information
```
"Get zen to show its version"
```

@@ -14,7 +14,7 @@ import os
# These values are used in server responses and for tracking releases
# IMPORTANT: This is the single source of truth for version and author info
# Semantic versioning: MAJOR.MINOR.PATCH
__version__ = "4.3.3"
__version__ = "4.4.0"
# Last update date in ISO format
__updated__ = "2025-06-14"
# Primary maintainer

@@ -245,6 +245,20 @@ All tools that work with files support **both individual files and entire direct
"Use o3 to think deeper about the logical flow in this algorithm"
```

**`testgen`** - Comprehensive test generation with edge case coverage
- `files`: Code files or directories to generate tests for (required)
- `prompt`: Description of what to test, testing objectives, and scope (required)
- `model`: auto|pro|flash|o3|o3-mini|o4-mini|o4-mini-high (default: server default)
- `test_examples`: Optional existing test files as style/pattern reference
- `thinking_mode`: minimal|low|medium|high|max (default: medium, Gemini only)

```
"Generate tests for User.login() method with edge cases" (auto mode picks best model)
"Use pro to generate comprehensive tests for src/payment.py with max thinking mode"
"Use o3 to generate tests for algorithm correctness in sort_functions.py"
"Generate tests following patterns from tests/unit/ for new auth module"
```
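
For reference, natural-language prompts like the ones above map onto a `testgen` tool call whose arguments look roughly like this. This is a hedged sketch based on the tool's input schema added in this commit; the file paths are placeholders and must be absolute:

```
# Approximate shape of a "testgen" MCP tool call (paths are placeholders)
arguments = {
    "files": ["/abs/path/to/src/payment.py"],        # required: code to generate tests for
    "prompt": "Test payment validation edge cases",  # required: objectives and scope
    "model": "pro",                                  # optional unless DEFAULT_MODEL=auto
    "test_examples": ["/abs/path/to/tests/unit/test_billing.py"],  # optional style reference
    "thinking_mode": "high",                         # optional: minimal|low|medium|high|max
}
```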

## Collaborative Workflows

### Design → Review → Implement

@@ -277,13 +291,15 @@ To help choose the right tool for your needs:
1. **Have a specific error/exception?** → Use `debug`
2. **Want to find bugs/issues in code?** → Use `codereview`
3. **Want to understand how code works?** → Use `analyze`
4. **Have analysis that needs extension/validation?** → Use `thinkdeep`
5. **Want to brainstorm or discuss?** → Use `chat`
4. **Need comprehensive test coverage?** → Use `testgen`
5. **Have analysis that needs extension/validation?** → Use `thinkdeep`
6. **Want to brainstorm or discuss?** → Use `chat`

**Key Distinctions:**
- `analyze` vs `codereview`: analyze explains, codereview prescribes fixes
- `chat` vs `thinkdeep`: chat is open-ended, thinkdeep extends specific analysis
- `debug` vs `codereview`: debug diagnoses runtime errors, review finds static issues
- `testgen` vs `debug`: testgen creates test suites, debug finds issues and recommends fixes

## Working with Large Prompts

@@ -44,6 +44,7 @@ from tools import (
    CodeReviewTool,
    DebugIssueTool,
    Precommit,
    TestGenTool,
    ThinkDeepTool,
)
from tools.models import ToolOutput

@@ -144,6 +145,7 @@ TOOLS = {
    "analyze": AnalyzeTool(),  # General-purpose file and code analysis
    "chat": ChatTool(),  # Interactive development chat and brainstorming
    "precommit": Precommit(),  # Pre-commit validation of git changes
    "testgen": TestGenTool(),  # Comprehensive test generation with edge case coverage
}

@@ -19,6 +19,7 @@ from .test_openrouter_fallback import OpenRouterFallbackTest
from .test_openrouter_models import OpenRouterModelsTest
from .test_per_tool_deduplication import PerToolDeduplicationTest
from .test_redis_validation import RedisValidationTest
from .test_testgen_validation import TestGenValidationTest
from .test_token_allocation_validation import TokenAllocationValidationTest

# Test registry for dynamic loading

@@ -36,6 +37,7 @@ TEST_REGISTRY = {
    "openrouter_fallback": OpenRouterFallbackTest,
    "openrouter_models": OpenRouterModelsTest,
    "token_allocation_validation": TokenAllocationValidationTest,
    "testgen_validation": TestGenValidationTest,
    "conversation_chain_validation": ConversationChainValidationTest,
}

@@ -54,6 +56,7 @@ __all__ = [
    "OpenRouterFallbackTest",
    "OpenRouterModelsTest",
    "TokenAllocationValidationTest",
    "TestGenValidationTest",
    "ConversationChainValidationTest",
    "TEST_REGISTRY",
]

simulator_tests/test_testgen_validation.py (new file, 131 lines)

@@ -0,0 +1,131 @@
#!/usr/bin/env python3
"""
TestGen Tool Validation Test

Tests the testgen tool by:
- Creating a test code file with a specific function
- Using testgen to generate tests with a specific function name
- Validating that the output contains the expected test function
- Confirming the format matches test generation patterns
"""

from .base_test import BaseSimulatorTest


class TestGenValidationTest(BaseSimulatorTest):
    """Test testgen tool validation with specific function name"""

    @property
    def test_name(self) -> str:
        return "testgen_validation"

    @property
    def test_description(self) -> str:
        return "TestGen tool validation with specific test function"

    def run_test(self) -> bool:
        """Test testgen tool with specific function name validation"""
        try:
            self.logger.info("Test: TestGen tool validation")

            # Setup test files
            self.setup_test_files()

            # Create a specific code file for test generation
            test_code_content = '''"""
Sample authentication module for testing testgen
"""


class UserAuthenticator:
    """Handles user authentication logic"""

    def __init__(self):
        self.failed_attempts = {}
        self.max_attempts = 3

    def validate_password(self, username, password):
        """Validate user password with security checks"""
        if not username or not password:
            return False

        if username in self.failed_attempts:
            if self.failed_attempts[username] >= self.max_attempts:
                return False  # Account locked

        # Simple validation for demo
        if len(password) < 8:
            self._record_failed_attempt(username)
            return False

        if password == "password123":  # Demo valid password
            self._reset_failed_attempts(username)
            return True

        self._record_failed_attempt(username)
        return False

    def _record_failed_attempt(self, username):
        """Record a failed login attempt"""
        self.failed_attempts[username] = self.failed_attempts.get(username, 0) + 1

    def _reset_failed_attempts(self, username):
        """Reset failed attempts after successful login"""
        if username in self.failed_attempts:
            del self.failed_attempts[username]
'''

            # Create the auth code file
            auth_file = self.create_additional_test_file("user_auth.py", test_code_content)

            # Test testgen tool with specific requirements
            self.logger.info(" 1.1: Generate tests with specific function name")
            response, continuation_id = self.call_mcp_tool(
                "testgen",
                {
                    "files": [auth_file],
                    "prompt": "Generate comprehensive tests for the UserAuthenticator.validate_password method. Include tests for edge cases, security scenarios, and account locking. Use the specific test function name 'test_password_validation_edge_cases' for one of the test methods.",
                    "model": "flash",
                },
            )

            if not response:
                self.logger.error("Failed to get testgen response")
                return False

            self.logger.info(" 1.2: Validate response contains expected test function")

            # Check that the response contains the specific test function name
            if "test_password_validation_edge_cases" not in response:
                self.logger.error("Response does not contain the requested test function name")
                self.logger.debug(f"Response content: {response[:500]}...")
                return False

            # Check for common test patterns
            test_patterns = [
                "def test_",  # Test function definition
                "assert",  # Assertion statements
                "UserAuthenticator",  # Class being tested
                "validate_password",  # Method being tested
            ]

            missing_patterns = []
            for pattern in test_patterns:
                if pattern not in response:
                    missing_patterns.append(pattern)

            if missing_patterns:
                self.logger.error(f"Response missing expected test patterns: {missing_patterns}")
                self.logger.debug(f"Response content: {response[:500]}...")
                return False

            self.logger.info(" ✅ TestGen tool validation successful")
            self.logger.info(" ✅ Generated tests contain expected function name")
            self.logger.info(" ✅ Generated tests follow proper test patterns")

            return True

        except Exception as e:
            self.logger.error(f"TestGen validation test failed: {e}")
            return False
        finally:
            self.cleanup_test_files()

@@ -7,6 +7,7 @@ from .chat_prompt import CHAT_PROMPT
from .codereview_prompt import CODEREVIEW_PROMPT
from .debug_prompt import DEBUG_ISSUE_PROMPT
from .precommit_prompt import PRECOMMIT_PROMPT
from .testgen_prompt import TESTGEN_PROMPT
from .thinkdeep_prompt import THINKDEEP_PROMPT

__all__ = [

@@ -16,4 +17,5 @@ __all__ = [
    "ANALYZE_PROMPT",
    "CHAT_PROMPT",
    "PRECOMMIT_PROMPT",
    "TESTGEN_PROMPT",
]

systemprompts/testgen_prompt.py (new file, 100 lines)

@@ -0,0 +1,100 @@
"""
TestGen tool system prompt
"""

TESTGEN_PROMPT = """
ROLE
You are a principal software engineer who specialises in writing bullet-proof production code **and** surgical,
high-signal test suites. You reason about control flow, data flow, mutation, concurrency, failure modes, and security
in equal measure. Your mission: design and write tests that surface real-world defects before code ever leaves CI.

IF MORE INFORMATION IS NEEDED
If you need additional context (e.g., test framework details, dependencies, existing test patterns) to provide
accurate test generation, you MUST respond ONLY with this JSON format (and nothing else). Do NOT ask for the
same file you've been provided unless for some reason its content is missing or incomplete:
{"status": "clarification_required", "question": "<your brief question>",
"files_needed": ["[file name here]", "[or some folder/]"]}

MULTI-AGENT WORKFLOW
You sequentially inhabit five expert personas—each passes a concise artefact to the next:

1. **Context Profiler** – derives language(s), test framework(s), build tooling, domain constraints, and existing
test idioms from the code snapshot provided.
2. **Path Analyzer** – builds a map of reachable code paths (happy, error, exceptional) plus any external interactions
that are directly involved (network, DB, file-system, IPC).
3. **Adversarial Thinker** – enumerates realistic failures, boundary conditions, race conditions, and misuse patterns
that historically break similar systems.
4. **Risk Prioritizer** – ranks findings by production impact and likelihood; discards speculative or out-of-scope cases.
5. **Test Scaffolder** – produces deterministic, isolated tests that follow the *project's* conventions (assert style,
fixture layout, naming, any mocking strategy, language and tooling, etc.).

TEST-GENERATION STRATEGY
- Start from public API / interface boundaries, then walk inward to critical private helpers.
- Analyze function signatures, parameters, return types, and side effects.
- Map all code paths, including happy paths and error conditions.
- Test behaviour, not implementation details, unless white-box inspection is required to reach untestable paths.
- Include both positive and negative test cases.
- Prefer property-based or table-driven tests where inputs form simple algebraic domains.
- Stub or fake **only** the minimal surface area needed; prefer in-memory fakes over mocks when feasible.
- Flag any code that cannot be tested deterministically and suggest realistic refactors (seams, dependency injection,
pure functions).
- Surface concurrency hazards with stress or fuzz tests when the language/runtime supports them.
- Focus on realistic failure modes that actually occur in production.
- Remain within the scope of the language, framework, and project. Do not over-step. Do not add unnecessary dependencies.

EDGE-CASE TAXONOMY (REAL-WORLD, HIGH-VALUE)
- **Data Shape Issues**: `null` / `undefined`, zero-length, surrogate-pair emojis, malformed UTF-8, mixed EOLs.
- **Numeric Boundaries**: −1, 0, 1, `MAX_…`, floating-point rounding, 64-bit truncation.
- **Temporal Pitfalls**: DST shifts, leap seconds, 29 Feb, Unix epoch 2038, timezone conversions.
- **Collections & Iteration**: off-by-one, concurrent modification, empty vs singleton vs large (>10⁶ items).
- **State & Sequence**: API calls out of order, idempotency violations, replay attacks.
- **External Dependencies**: slow responses, 5xx, malformed JSON/XML, TLS errors, retry storms, cancelled promises.
- **Concurrency / Async**: race conditions, deadlocks, promise rejection leaks, thread starvation.
- **Resource Exhaustion**: memory spikes, file-descriptor leaks, connection-pool saturation.
- **Locale & Encoding**: RTL scripts, uncommon locales, locale-specific formatting.
- **Security Surfaces**: injection (SQL, shell, LDAP), path traversal, privilege escalation on shared state.

TEST QUALITY PRINCIPLES
- Clear Arrange-Act-Assert sections (or given/when/then per project style), while retaining and applying project,
language, and framework norms and best practices.
- One behavioural assertion per test unless grouping is conventional.
- Fast: sub-100 ms per unit test; parallelisable; no remote calls.
- Deterministic: seeded randomness only; fixed, stable clocks when time matters.
- Self-documenting: names read like specs; failures explain *why*, not just *what*.

FRAMEWORK SELECTION
Always autodetect from the repository. When a test framework or existing tests are not found, detect from existing
code; examples:
- **Swift / Objective-C** → XCTest (Xcode default) or Swift Testing (Apple-provided frameworks).
- **C# / .NET** → xUnit.net preferred; fall back to NUnit or MSTest if they dominate the repo.
- **C / C++** → GoogleTest (gtest/gmock) or Catch2, matching existing tooling.
- **JS/TS** → Jest, Vitest, Mocha, or project-specific wrapper.
- **Python** → pytest, unittest.
- **Java/Kotlin** → JUnit 5, TestNG.
- **Go** → built-in `testing`, `testify`.
- **Rust** → `#[test]`, `proptest`.
- **Anything Else** → follow existing conventions; never introduce a new framework without strong justification.

IF FRAMEWORK SELECTION FAILS
If you are unable to confidently determine which framework to use based on the existing test samples supplied, or if
additional test samples would help in making a final decision, you MUST respond ONLY with this JSON
format (and nothing else). Do NOT ask for the same file you've been provided unless for some reason its content
is missing or incomplete:
{"status": "test_sample_needed", "reason": "<brief reason why additional sampling is required>"}

SCOPE CONTROL
Stay strictly within the presented codebase, tech stack, and domain.
Do **not** invent features, frameworks, or speculative integrations.
Do **not** write tests for functions or classes that do not exist.
If a test idea falls outside project scope, discard it.
If a test would be a "good to have" but seems impossible given the current structure or setup of the project, highlight
it but do not pursue it or offer refactoring ideas.

DELIVERABLE
Return only the artefacts (analysis summary, coverage plan, and generated tests) that fit the detected framework
and code / project layout.
No extra commentary, no generic boilerplate.
Comment and document the logic and the reason / hypothesis behind each test in the delivered code.

Remember: your value is catching the hard bugs—not inflating coverage numbers.
"""

@@ -26,10 +26,11 @@ class TestServerTools:
        assert "analyze" in tool_names
        assert "chat" in tool_names
        assert "precommit" in tool_names
        assert "testgen" in tool_names
        assert "get_version" in tool_names

        # Should have exactly 7 tools
        assert len(tools) == 7
        # Should have exactly 8 tools (including testgen)
        assert len(tools) == 8

        # Check descriptions are verbose
        for tool in tools:

tests/test_testgen.py (new file, 381 lines)

@@ -0,0 +1,381 @@
"""
Tests for TestGen tool implementation
"""

import json
import tempfile
from pathlib import Path
from unittest.mock import Mock, patch

import pytest

from tests.mock_helpers import create_mock_provider
from tools.testgen import TestGenRequest, TestGenTool


class TestTestGenTool:
    """Test the TestGen tool"""

    @pytest.fixture
    def tool(self):
        return TestGenTool()

    @pytest.fixture
    def temp_files(self):
        """Create temporary test files"""
        with tempfile.TemporaryDirectory() as temp_dir:
            temp_path = Path(temp_dir)

            # Create sample code files
            code_file = temp_path / "calculator.py"
            code_file.write_text(
                """
def add(a, b):
    '''Add two numbers'''
    return a + b

def divide(a, b):
    '''Divide two numbers'''
    if b == 0:
        raise ValueError("Cannot divide by zero")
    return a / b
"""
            )

            # Create sample test files (different sizes)
            small_test = temp_path / "test_small.py"
            small_test.write_text(
                """
import unittest

class TestBasic(unittest.TestCase):
    def test_simple(self):
        self.assertEqual(1 + 1, 2)
"""
            )

            large_test = temp_path / "test_large.py"
            large_test.write_text(
                """
import unittest
from unittest.mock import Mock, patch

class TestComprehensive(unittest.TestCase):
    def setUp(self):
        self.mock_data = Mock()

    def test_feature_one(self):
        # Comprehensive test with lots of setup
        result = self.process_data()
        self.assertIsNotNone(result)

    def test_feature_two(self):
        # Another comprehensive test
        with patch('some.module') as mock_module:
            mock_module.return_value = 'test'
            result = self.process_data()
            self.assertEqual(result, 'expected')

    def process_data(self):
        return "test_result"
"""
            )

            yield {
                "temp_dir": temp_dir,
                "code_file": str(code_file),
                "small_test": str(small_test),
                "large_test": str(large_test),
            }

    def test_tool_metadata(self, tool):
        """Test tool metadata"""
        assert tool.get_name() == "testgen"
        assert "COMPREHENSIVE TEST GENERATION" in tool.get_description()
        assert "BE SPECIFIC about scope" in tool.get_description()
        assert tool.get_default_temperature() == 0.2  # Analytical temperature

        # Check model category
        from tools.models import ToolModelCategory

        assert tool.get_model_category() == ToolModelCategory.EXTENDED_REASONING

    def test_input_schema_structure(self, tool):
        """Test input schema structure"""
        schema = tool.get_input_schema()

        # Required fields
        assert "files" in schema["properties"]
        assert "prompt" in schema["properties"]
        assert "files" in schema["required"]
        assert "prompt" in schema["required"]

        # Optional fields
        assert "test_examples" in schema["properties"]
        assert "thinking_mode" in schema["properties"]
        assert "continuation_id" in schema["properties"]

        # Should not have temperature or use_websearch
        assert "temperature" not in schema["properties"]
        assert "use_websearch" not in schema["properties"]

        # Check test_examples description
        test_examples_desc = schema["properties"]["test_examples"]["description"]
        assert "absolute paths" in test_examples_desc
        assert "smallest representative tests" in test_examples_desc

    def test_request_model_validation(self):
        """Test request model validation"""
        # Valid request
        valid_request = TestGenRequest(files=["/tmp/test.py"], prompt="Generate tests for calculator functions")
        assert valid_request.files == ["/tmp/test.py"]
        assert valid_request.prompt == "Generate tests for calculator functions"
        assert valid_request.test_examples is None

        # With test examples
        request_with_examples = TestGenRequest(
            files=["/tmp/test.py"], prompt="Generate tests", test_examples=["/tmp/test_example.py"]
        )
        assert request_with_examples.test_examples == ["/tmp/test_example.py"]

        # Invalid request (missing required fields)
        with pytest.raises(ValueError):
            TestGenRequest(files=["/tmp/test.py"])  # Missing prompt

    @pytest.mark.asyncio
    @patch("tools.base.BaseTool.get_model_provider")
    async def test_execute_success(self, mock_get_provider, tool, temp_files):
        """Test successful execution"""
        # Mock provider
        mock_provider = create_mock_provider()
        mock_provider.get_provider_type.return_value = Mock(value="google")
        mock_provider.generate_content.return_value = Mock(
            content="Generated comprehensive test suite with edge cases",
            usage={"input_tokens": 100, "output_tokens": 200},
            model_name="gemini-2.5-flash-preview-05-20",
            metadata={"finish_reason": "STOP"},
        )
        mock_get_provider.return_value = mock_provider

        result = await tool.execute(
            {"files": [temp_files["code_file"]], "prompt": "Generate comprehensive tests for the calculator functions"}
        )

        # Verify result structure
        assert len(result) == 1
        response_data = json.loads(result[0].text)
        assert response_data["status"] == "success"
        assert "Generated comprehensive test suite" in response_data["content"]

    @pytest.mark.asyncio
    @patch("tools.base.BaseTool.get_model_provider")
    async def test_execute_with_test_examples(self, mock_get_provider, tool, temp_files):
        """Test execution with test examples"""
        mock_provider = create_mock_provider()
        mock_provider.generate_content.return_value = Mock(
            content="Generated tests following the provided examples",
            usage={"input_tokens": 150, "output_tokens": 250},
            model_name="gemini-2.5-flash-preview-05-20",
            metadata={"finish_reason": "STOP"},
        )
        mock_get_provider.return_value = mock_provider

        result = await tool.execute(
            {
                "files": [temp_files["code_file"]],
                "prompt": "Generate tests following existing patterns",
                "test_examples": [temp_files["small_test"]],
            }
        )

        # Verify result
        assert len(result) == 1
        response_data = json.loads(result[0].text)
        assert response_data["status"] == "success"

    def test_process_test_examples_empty(self, tool):
        """Test processing empty test examples"""
        content, note = tool._process_test_examples([], None)
        assert content == ""
        assert note == ""

    def test_process_test_examples_budget_allocation(self, tool, temp_files):
        """Test token budget allocation for test examples"""
        with patch.object(tool, "filter_new_files") as mock_filter:
            mock_filter.return_value = [temp_files["small_test"], temp_files["large_test"]]

            with patch.object(tool, "_prepare_file_content_for_prompt") as mock_prepare:
                mock_prepare.return_value = "Mocked test content"

                # Test with available tokens
                content, note = tool._process_test_examples(
                    [temp_files["small_test"], temp_files["large_test"]], None, available_tokens=100000
                )

                # Should allocate 25% of 100k = 25k tokens for test examples
                mock_prepare.assert_called_once()
                call_args = mock_prepare.call_args
                assert call_args[1]["max_tokens"] == 25000  # 25% of 100k

    def test_process_test_examples_size_sorting(self, tool, temp_files):
        """Test that test examples are sorted by size (smallest first)"""
        with patch.object(tool, "filter_new_files") as mock_filter:
            # Return files in random order
            mock_filter.return_value = [temp_files["large_test"], temp_files["small_test"]]

            with patch.object(tool, "_prepare_file_content_for_prompt") as mock_prepare:
                mock_prepare.return_value = "test content"

                tool._process_test_examples(
                    [temp_files["large_test"], temp_files["small_test"]], None, available_tokens=50000
                )

                # Check that files were passed in size order (smallest first)
                call_args = mock_prepare.call_args[0]
                files_passed = call_args[0]

                # Verify smallest file comes first
                assert files_passed[0] == temp_files["small_test"]
                assert files_passed[1] == temp_files["large_test"]

    @pytest.mark.asyncio
    async def test_prepare_prompt_structure(self, tool, temp_files):
        """Test prompt preparation structure"""
        request = TestGenRequest(files=[temp_files["code_file"]], prompt="Test the calculator functions")

        with patch.object(tool, "_prepare_file_content_for_prompt") as mock_prepare:
            mock_prepare.return_value = "mocked file content"

            prompt = await tool.prepare_prompt(request)

            # Check prompt structure
            assert "=== USER CONTEXT ===" in prompt
            assert "Test the calculator functions" in prompt
            assert "=== CODE TO TEST ===" in prompt
            assert "mocked file content" in prompt
            assert tool.get_system_prompt() in prompt

    @pytest.mark.asyncio
    async def test_prepare_prompt_with_examples(self, tool, temp_files):
        """Test prompt preparation with test examples"""
        request = TestGenRequest(
            files=[temp_files["code_file"]], prompt="Generate tests", test_examples=[temp_files["small_test"]]
        )

        with patch.object(tool, "_prepare_file_content_for_prompt") as mock_prepare:
            mock_prepare.return_value = "mocked content"

            with patch.object(tool, "_process_test_examples") as mock_process:
                mock_process.return_value = ("test examples content", "Note: examples included")

                prompt = await tool.prepare_prompt(request)

                # Check test examples section
                assert "=== TEST EXAMPLES FOR STYLE REFERENCE ===" in prompt
                assert "test examples content" in prompt
                assert "Note: examples included" in prompt

    def test_format_response(self, tool):
        """Test response formatting"""
        request = TestGenRequest(files=["/tmp/test.py"], prompt="Generate tests")

        raw_response = "Generated test cases with edge cases"
        formatted = tool.format_response(raw_response, request)

        # Check formatting includes next steps
        assert raw_response in formatted
        assert "**Next Steps:**" in formatted
        assert "Review Generated Tests" in formatted
        assert "Setup Test Environment" in formatted

    @pytest.mark.asyncio
    async def test_error_handling_invalid_files(self, tool):
        """Test error handling for invalid file paths"""
        result = await tool.execute(
            {"files": ["relative/path.py"], "prompt": "Generate tests"}  # Invalid: not absolute
        )

        # Should return error for relative path
        response_data = json.loads(result[0].text)
        assert response_data["status"] == "error"
        assert "absolute" in response_data["content"]

    @pytest.mark.asyncio
    async def test_large_prompt_handling(self, tool):
        """Test handling of large prompts"""
        large_prompt = "x" * 60000  # Exceeds MCP_PROMPT_SIZE_LIMIT

        result = await tool.execute({"files": ["/tmp/test.py"], "prompt": large_prompt})

        # Should return resend_prompt status
        response_data = json.loads(result[0].text)
        assert response_data["status"] == "resend_prompt"
        assert "too large" in response_data["content"]

    def test_token_budget_calculation(self, tool):
        """Test token budget calculation logic"""
        # Mock model capabilities
        with patch.object(tool, "get_model_provider") as mock_get_provider:
            mock_provider = create_mock_provider(context_window=200000)
            mock_get_provider.return_value = mock_provider

            # Simulate model name being set
            tool._current_model_name = "test-model"

            with patch.object(tool, "_process_test_examples") as mock_process:
                mock_process.return_value = ("test content", "")

                with patch.object(tool, "_prepare_file_content_for_prompt") as mock_prepare:
                    mock_prepare.return_value = "code content"

                    request = TestGenRequest(
                        files=["/tmp/test.py"], prompt="Test prompt", test_examples=["/tmp/example.py"]
                    )

                    # This should trigger token budget calculation
                    import asyncio

                    asyncio.run(tool.prepare_prompt(request))

                    # Verify test examples got 25% of 150k tokens (75% of 200k context)
                    mock_process.assert_called_once()
                    call_args = mock_process.call_args[0]
                    assert call_args[2] == 150000  # 75% of 200k context window

    @pytest.mark.asyncio
    async def test_continuation_support(self, tool, temp_files):
        """Test continuation ID support"""
        with patch.object(tool, "_prepare_file_content_for_prompt") as mock_prepare:
            mock_prepare.return_value = "code content"

            request = TestGenRequest(
                files=[temp_files["code_file"]], prompt="Continue testing", continuation_id="test-thread-123"
            )

            await tool.prepare_prompt(request)

            # Verify continuation_id was passed to _prepare_file_content_for_prompt
            # The method should be called twice (once for code, once for test examples logic)
            assert mock_prepare.call_count >= 1

            # Check that continuation_id was passed in at least one call
            calls = mock_prepare.call_args_list
            continuation_passed = any(
                call[0][1] == "test-thread-123" for call in calls  # continuation_id is second argument
            )
            assert continuation_passed, f"continuation_id not passed. Calls: {calls}"

    def test_no_websearch_in_prompt(self, tool, temp_files):
        """Test that web search instructions are not included"""
        request = TestGenRequest(files=[temp_files["code_file"]], prompt="Generate tests")

        with patch.object(tool, "_prepare_file_content_for_prompt") as mock_prepare:
            mock_prepare.return_value = "code content"

            import asyncio

            prompt = asyncio.run(tool.prepare_prompt(request))

            # Should not contain web search instructions
            assert "WEB SEARCH CAPABILITY" not in prompt
            assert "web search" not in prompt.lower()

@@ -284,6 +284,22 @@ class TestAbsolutePathValidation:
        assert "must be absolute" in response["content"]
        assert "code.py" in response["content"]

    @pytest.mark.asyncio
    async def test_testgen_tool_relative_path_rejected(self):
        """Test that testgen tool rejects relative paths"""
        from tools import TestGenTool

        tool = TestGenTool()
        result = await tool.execute(
            {"files": ["src/main.py"], "prompt": "Generate tests for the functions"}  # relative path
        )

        assert len(result) == 1
        response = json.loads(result[0].text)
        assert response["status"] == "error"
        assert "must be absolute" in response["content"]
        assert "src/main.py" in response["content"]

    @pytest.mark.asyncio
    @patch("tools.AnalyzeTool.get_model_provider")
    async def test_analyze_tool_accepts_absolute_paths(self, mock_get_provider):

@@ -7,6 +7,7 @@ from .chat import ChatTool
from .codereview import CodeReviewTool
from .debug import DebugIssueTool
from .precommit import Precommit
from .testgen import TestGenTool
from .thinkdeep import ThinkDeepTool

__all__ = [

@@ -16,4 +17,5 @@ __all__ = [
    "AnalyzeTool",
    "ChatTool",
    "Precommit",
    "TestGenTool",
]

@@ -2,7 +2,7 @@
Code Review tool - Comprehensive code analysis and review

This tool provides professional-grade code review capabilities using
Gemini's understanding of code patterns, best practices, and common issues.
the chosen model's understanding of code patterns, best practices, and common issues.
It can analyze individual files or entire codebases, providing actionable
feedback categorized by severity.

@@ -177,7 +177,7 @@ class CodeReviewTool(BaseTool):
            request: The validated review request

        Returns:
            str: Complete prompt for the Gemini model
            str: Complete prompt for the model

        Raises:
            ValueError: If the code exceeds token limits

tools/testgen.py (new file, 429 lines)

@@ -0,0 +1,429 @@
"""
TestGen tool - Comprehensive test suite generation with edge case coverage

This tool generates comprehensive test suites by analyzing code paths,
identifying edge cases, and producing test scaffolding that follows
project conventions when test examples are provided.

Key Features:
- Multi-file and directory support
- Framework detection from existing tests
- Edge case identification (nulls, boundaries, async issues, etc.)
- Test pattern following when examples provided
- Deterministic test example sampling for large test suites
"""

import logging
import os
from typing import Any, Optional

from mcp.types import TextContent
from pydantic import Field

from config import TEMPERATURE_ANALYTICAL
from systemprompts import TESTGEN_PROMPT

from .base import BaseTool, ToolRequest
from .models import ToolOutput

logger = logging.getLogger(__name__)


class TestGenRequest(ToolRequest):
    """
    Request model for the test generation tool.

    This model defines all parameters that can be used to customize
    the test generation process, from selecting code files to providing
    test examples for style consistency.
    """

    files: list[str] = Field(
        ...,
        description="Code files or directories to generate tests for (must be absolute paths)",
    )
    prompt: str = Field(
        ...,
        description="Description of what to test, testing objectives, and specific scope/focus areas",
    )
    test_examples: Optional[list[str]] = Field(
        None,
        description=(
            "Optional existing test files or directories to use as style/pattern reference (must be absolute paths). "
            "If not provided, the tool will determine the best testing approach based on the code structure. "
            "For large test directories, only the smallest representative tests should be included to determine testing patterns. "
            "If similar tests exist for the code being tested, include those for the most relevant patterns."
        ),
    )


class TestGenTool(BaseTool):
    """
    Test generation tool implementation.

    This tool analyzes code to generate comprehensive test suites with
    edge case coverage, following existing test patterns when examples
    are provided.
    """

    def get_name(self) -> str:
        return "testgen"

    def get_description(self) -> str:
        return (
            "COMPREHENSIVE TEST GENERATION - Creates thorough test suites with edge case coverage. "
            "Use this when you need to generate tests for code, create test scaffolding, or improve test coverage. "
            "BE SPECIFIC about scope: target specific functions/classes/modules rather than testing everything. "
            "Examples: 'Generate tests for User.login() method', 'Test payment processing validation', "
            "'Create tests for authentication error handling'. If user request is vague, either ask for "
            "clarification about specific components to test, or make focused scope decisions and explain them. "
            "Analyzes code paths, identifies realistic failure modes, and generates framework-specific tests. "
            "Supports test pattern following when examples are provided. "
            "Choose thinking_mode based on code complexity: 'low' for simple functions, "
            "'medium' for standard modules (default), 'high' for complex systems with many interactions, "
            "'max' for critical systems requiring exhaustive test coverage. "
            "Note: If you're not currently using a top-tier model such as Opus 4 or above, these tools can provide enhanced capabilities."
        )

    def get_input_schema(self) -> dict[str, Any]:
        schema = {
            "type": "object",
            "properties": {
                "files": {
                    "type": "array",
                    "items": {"type": "string"},
                    "description": "Code files or directories to generate tests for (must be absolute paths)",
                },
                "model": self.get_model_field_schema(),
                "prompt": {
                    "type": "string",
                    "description": "Description of what to test, testing objectives, and specific scope/focus areas",
                },
                "test_examples": {
                    "type": "array",
                    "items": {"type": "string"},
                    "description": (
                        "Optional existing test files or directories to use as style/pattern reference (must be absolute paths). "
                        "If not provided, the tool will determine the best testing approach based on the code structure. "
                        "For large test directories, only the smallest representative tests will be included to determine testing patterns. "
                        "If similar tests exist for the code being tested, include those for the most relevant patterns."
                    ),
                },
                "thinking_mode": {
                    "type": "string",
                    "enum": ["minimal", "low", "medium", "high", "max"],
                    "description": "Thinking depth: minimal (0.5% of model max), low (8%), medium (33%), high (67%), max (100% of model max)",
                },
                "continuation_id": {
                    "type": "string",
                    "description": "Thread continuation ID for multi-turn conversations. Can be used to continue conversations across different tools. Only provide this if continuing a previous conversation thread.",
                },
            },
            "required": ["files", "prompt"] + (["model"] if self.is_effective_auto_mode() else []),
        }

        return schema

    def get_system_prompt(self) -> str:
        return TESTGEN_PROMPT

    def get_default_temperature(self) -> float:
        return TEMPERATURE_ANALYTICAL

    def get_model_category(self):
        """TestGen requires extended reasoning for comprehensive test analysis"""
        from tools.models import ToolModelCategory

        return ToolModelCategory.EXTENDED_REASONING

    def get_request_model(self):
        return TestGenRequest

    async def execute(self, arguments: dict[str, Any]) -> list[TextContent]:
        """Override execute to check prompt size before processing"""
        # First validate request
        request_model = self.get_request_model()
        request = request_model(**arguments)

        # Check prompt size if provided
        if request.prompt:
            size_check = self.check_prompt_size(request.prompt)
            if size_check:
                return [TextContent(type="text", text=ToolOutput(**size_check).model_dump_json())]

        # Continue with normal execution
        return await super().execute(arguments)

    def _process_test_examples(
        self, test_examples: list[str], continuation_id: Optional[str], available_tokens: int = None
    ) -> tuple[str, str]:
        """
        Process test example files using available token budget for optimal sampling.

        Args:
            test_examples: List of test file paths
            continuation_id: Continuation ID for filtering already embedded files
            available_tokens: Available token budget for test examples

        Returns:
            tuple: (formatted_content, summary_note)
        """
        logger.debug(f"[TESTGEN] Processing {len(test_examples)} test examples")

        if not test_examples:
            logger.debug("[TESTGEN] No test examples provided")
            return "", ""

        # Use existing file filtering to avoid duplicates in continuation
        examples_to_process = self.filter_new_files(test_examples, continuation_id)
        logger.debug(f"[TESTGEN] After filtering: {len(examples_to_process)} new test examples to process")

        if not examples_to_process:
            logger.info(f"[TESTGEN] All {len(test_examples)} test examples already in conversation history")
            return "", ""

        # Calculate token budget for test examples (25% of available tokens, or fallback)
        if available_tokens:
            test_examples_budget = int(available_tokens * 0.25)  # 25% for test examples
            logger.debug(
                f"[TESTGEN] Allocating {test_examples_budget:,} tokens (25% of {available_tokens:,}) for test examples"
            )
        else:
            test_examples_budget = 30000  # Fallback if no budget provided
            logger.debug(f"[TESTGEN] Using fallback budget of {test_examples_budget:,} tokens for test examples")

        original_count = len(examples_to_process)
        logger.debug(
            f"[TESTGEN] Processing {original_count} test example files with {test_examples_budget:,} token budget"
        )

        # Sort by file size (smallest first) for pattern-focused selection
        file_sizes = []
        for file_path in examples_to_process:
            try:
                size = os.path.getsize(file_path)
                file_sizes.append((file_path, size))
                logger.debug(f"[TESTGEN] Test example {os.path.basename(file_path)}: {size:,} bytes")
            except (OSError, FileNotFoundError) as e:
                # If we can't get size, put it at the end
                logger.warning(f"[TESTGEN] Could not get size for {file_path}: {e}")
                file_sizes.append((file_path, float("inf")))

        # Sort by size and take smallest files for pattern reference
        file_sizes.sort(key=lambda x: x[1])
        examples_to_process = [f[0] for f in file_sizes]  # All files, sorted by size
        logger.debug(
            f"[TESTGEN] Sorted test examples by size (smallest first): {[os.path.basename(f) for f in examples_to_process]}"
        )

        # Use standard file content preparation with dynamic token budget
        try:
            logger.debug(f"[TESTGEN] Preparing file content for {len(examples_to_process)} test examples")
            content = self._prepare_file_content_for_prompt(
                examples_to_process,
                continuation_id,
                "Test examples",
                max_tokens=test_examples_budget,
                reserve_tokens=1000,
            )

            # Determine how many files were actually included
            if content:
                from utils.token_utils import estimate_tokens

                used_tokens = estimate_tokens(content)
                logger.info(
                    f"[TESTGEN] Successfully embedded test examples: {used_tokens:,} tokens used ({test_examples_budget:,} available)"
                )
                if original_count > 1:
                    truncation_note = f"Note: Used {used_tokens:,} tokens ({test_examples_budget:,} available) for test examples from {original_count} files to determine testing patterns."
                else:
                    truncation_note = ""
            else:
                logger.warning("[TESTGEN] No content generated for test examples")
                truncation_note = ""

            return content, truncation_note

        except Exception as e:
            # If test example processing fails, continue without examples rather than failing
            logger.error(f"[TESTGEN] Failed to process test examples: {type(e).__name__}: {e}")
            return "", f"Warning: Could not process test examples: {str(e)}"

    async def prepare_prompt(self, request: TestGenRequest) -> str:
        """
        Prepare the test generation prompt with code analysis and optional test examples.

        This method reads the requested files, processes any test examples,
        and constructs a detailed prompt for comprehensive test generation.

        Args:
            request: The validated test generation request

        Returns:
            str: Complete prompt for the model

        Raises:
            ValueError: If the code exceeds token limits
        """
        logger.debug(f"[TESTGEN] Preparing prompt for {len(request.files)} code files")
        if request.test_examples:
            logger.debug(f"[TESTGEN] Including {len(request.test_examples)} test examples for pattern reference")

        # Check for prompt.txt in files
        prompt_content, updated_files = self.handle_prompt_file(request.files)

        # If prompt.txt was found, incorporate it into the prompt
        if prompt_content:
            logger.debug("[TESTGEN] Found prompt.txt file, incorporating content")
            request.prompt = prompt_content + "\n\n" + request.prompt

        # Update request files list
        if updated_files is not None:
            logger.debug(f"[TESTGEN] Updated files list after prompt.txt processing: {len(updated_files)} files")
            request.files = updated_files

        # Calculate available token budget for dynamic allocation
        continuation_id = getattr(request, "continuation_id", None)

        # Get model context for token budget calculation
        model_name = getattr(self, "_current_model_name", None)
        available_tokens = None

        if model_name:
            try:
                provider = self.get_model_provider(model_name)
                capabilities = provider.get_capabilities(model_name)
                # Use 75% of context for content (code + test examples), 25% for response
                available_tokens = int(capabilities.context_window * 0.75)
                logger.debug(
                    f"[TESTGEN] Token budget calculation: {available_tokens:,} tokens (75% of {capabilities.context_window:,}) for model {model_name}"
                )
            except Exception as e:
                # Fallback to conservative estimate
                logger.warning(f"[TESTGEN] Could not get model capabilities for {model_name}: {e}")
                available_tokens = 120000  # Conservative fallback
                logger.debug(f"[TESTGEN] Using fallback token budget: {available_tokens:,} tokens")

        # Process test examples first to determine token allocation
        test_examples_content = ""
        test_examples_note = ""

        if request.test_examples:
            logger.debug(f"[TESTGEN] Processing {len(request.test_examples)} test examples")
            test_examples_content, test_examples_note = self._process_test_examples(
                request.test_examples, continuation_id, available_tokens
            )
            if test_examples_content:
                logger.info("[TESTGEN] Test examples processed successfully for pattern reference")
            else:
                logger.info("[TESTGEN] No test examples content after processing")

        # Calculate remaining tokens for main code after test examples
        if test_examples_content and available_tokens:
            from utils.token_utils import estimate_tokens

            test_tokens = estimate_tokens(test_examples_content)
            remaining_tokens = available_tokens - test_tokens - 5000  # Reserve for prompt structure
            logger.debug(
                f"[TESTGEN] Token allocation: {test_tokens:,} for examples, {remaining_tokens:,} remaining for code files"
            )
        else:
            remaining_tokens = available_tokens - 10000 if available_tokens else None
            if remaining_tokens:
                logger.debug(
                    f"[TESTGEN] Token allocation: {remaining_tokens:,} tokens available for code files (no test examples)"
                )

        # Use centralized file processing logic for main code files
        logger.debug(f"[TESTGEN] Preparing {len(request.files)} code files for analysis")
        code_content = self._prepare_file_content_for_prompt(
            request.files, continuation_id, "Code to test", max_tokens=remaining_tokens, reserve_tokens=2000
        )

        if code_content:
            from utils.token_utils import estimate_tokens

            code_tokens = estimate_tokens(code_content)
            logger.info(f"[TESTGEN] Code files embedded successfully: {code_tokens:,} tokens")
        else:
            logger.warning("[TESTGEN] No code content after file processing")

        # Test generation is based on code analysis, no web search needed
        logger.debug("[TESTGEN] Building complete test generation prompt")

        # Build the complete prompt
        prompt_parts = []

        # Add system prompt
        prompt_parts.append(self.get_system_prompt())

        # Add user context
        prompt_parts.append("=== USER CONTEXT ===")
        prompt_parts.append(request.prompt)
        prompt_parts.append("=== END CONTEXT ===")

        # Add test examples if provided
        if test_examples_content:
            prompt_parts.append("\n=== TEST EXAMPLES FOR STYLE REFERENCE ===")
            if test_examples_note:
                prompt_parts.append(f"// {test_examples_note}")
            prompt_parts.append(test_examples_content)
            prompt_parts.append("=== END TEST EXAMPLES ===")

        # Add main code to test
        prompt_parts.append("\n=== CODE TO TEST ===")
        prompt_parts.append(code_content)
        prompt_parts.append("=== END CODE ===")

        # Add generation instructions
        prompt_parts.append(
            "\nPlease analyze the code and generate comprehensive tests following the multi-agent workflow specified in the system prompt."
        )
        if test_examples_content:
            prompt_parts.append(
                "Use the provided test examples as a reference for style, framework, and testing patterns."
            )

        full_prompt = "\n".join(prompt_parts)

        # Log final prompt statistics
        from utils.token_utils import estimate_tokens

        total_tokens = estimate_tokens(full_prompt)
        logger.info(f"[TESTGEN] Complete prompt prepared: {total_tokens:,} tokens, {len(full_prompt):,} characters")

        return full_prompt

    def format_response(self, response: str, request: TestGenRequest, model_info: Optional[dict] = None) -> str:
        """
        Format the test generation response.

        Args:
            response: The raw test generation from the model
            request: The original request for context
            model_info: Optional dict with model metadata

        Returns:
            str: Formatted response with next steps
        """
        return f"""{response}

---

**Next Steps:**

1. **Review Generated Tests**: Check whether the structure, coverage, and edge cases are valid and useful, and ensure they meet your requirements.
Confirm the tests cover missing scenarios, follow project conventions, and can be safely added without duplication.

2. **Setup Test Environment**: Ensure the testing framework and dependencies identified are properly configured in your project.

3. **Run Initial Tests**: Execute the generated tests to verify they work correctly with your code.

4. **Customize as Needed**: Modify the generated test code, add project-specific edge cases, and refine or adjust the test structure where your existing knowledge of the code calls for it.

5. **Integrate with CI/CD**: If a continuous integration pipeline is already set up, add the tests to it to maintain code quality.

6. **Iterate**: Refine requirements and continue the conversation if additional coverage or improvements are needed.

Remember: Review the generated tests for completeness, and adapt and integrate them to your specific project requirements and testing standards. Continue with your next step in implementation."""