🚀 Major Enhancement: Workflow-Based Tool Architecture v5.5.0 (#95)

* WIP: new workflow architecture

* WIP: further improvements and cleanup

* WIP: cleanup and docs, replace old tool with new

* WIP: new planner implementation using workflow

* WIP: precommit tool working as a workflow instead of a basic tool
Support for passing False to use_assistant_model to skip external models completely and use Claude only

* WIP: precommit workflow version swapped with old

* WIP: codereview

* WIP: replaced codereview

* WIP: replaced refactor

* WIP: workflow for thinkdeep

* WIP: ensure files get embedded correctly

* WIP: thinkdeep replaced with workflow version

* WIP: improved messaging when an external model's response is received

* WIP: analyze tool swapped

* WIP: updated tests
* Extract only the content when building history
* Use "relevant_files" for workflow tools only

* WIP: fixed get_completion_next_steps_message missing param

* Fixed tests
Request files consistently

* Fixed tests

* New testgen workflow tool
Updated docs

* Swap testgen workflow

* Fix CI test failures by excluding API-dependent tests

- Update GitHub Actions workflow to exclude simulation tests that require API keys
- Fix collaboration tests to properly mock workflow tool expert analysis calls
- Update test assertions to handle new workflow tool response format
- Ensure unit tests run without external API dependencies in CI
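
For illustration only, the pattern below is one common way to gate API-dependent tests so unit runs stay green in CI; the marker and environment-variable names are placeholders and not necessarily what this project's Actions workflow actually uses.

```python
# Hypothetical sketch: skip simulation tests when no API key is configured.
import os

import pytest

requires_api_key = pytest.mark.skipif(
    not os.environ.get("EXAMPLE_API_KEY"),  # placeholder variable name
    reason="Requires a real API key; excluded from CI runs",
)


@requires_api_key
def test_expert_analysis_round_trip():
    # Exercises the real provider end to end (only runs outside CI).
    ...
```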

🤖 Generated with [Claude Code](https://claude.ai/code)

Co-Authored-By: Claude <noreply@anthropic.com>

* WIP - Update tests to match new tools

---------

Co-authored-by: Claude <noreply@anthropic.com>
Beehive Innovations
2025-06-21 00:08:11 +04:00
committed by GitHub
parent 4dae6e457e
commit 69a3121452
76 changed files with 17111 additions and 7725 deletions


@@ -1,67 +1,155 @@
"""
TestGen tool - Comprehensive test suite generation with edge case coverage
TestGen Workflow tool - Step-by-step test generation with expert validation
This tool generates comprehensive test suites by analyzing code paths,
identifying edge cases, and producing test scaffolding that follows
project conventions when test examples are provided.
This tool provides a structured workflow for comprehensive test generation.
It guides Claude through systematic investigation steps with forced pauses between each step
to ensure thorough code examination, test planning, and pattern identification before proceeding.
The tool supports backtracking, finding updates, and expert analysis integration for
comprehensive test suite generation.
Key Features:
- Multi-file and directory support
- Framework detection from existing tests
- Edge case identification (nulls, boundaries, async issues, etc.)
- Test pattern following when examples provided
- Deterministic test example sampling for large test suites
Key features:
- Step-by-step test generation workflow with progress tracking
- Context-aware file embedding (references during investigation, full content for analysis)
- Automatic test pattern detection and framework identification
- Expert analysis integration with external models for additional test suggestions
- Support for edge case identification and comprehensive coverage
- Confidence-based workflow optimization
"""
import logging
import os
from typing import Any, Optional
from typing import TYPE_CHECKING, Any, Optional
from pydantic import Field
from pydantic import Field, model_validator
if TYPE_CHECKING:
from tools.models import ToolModelCategory
from config import TEMPERATURE_ANALYTICAL
from systemprompts import TESTGEN_PROMPT
from tools.shared.base_models import WorkflowRequest
from .base import BaseTool, ToolRequest
from .workflow.base import WorkflowTool
logger = logging.getLogger(__name__)
# Field descriptions to avoid duplication between Pydantic and JSON schema
TESTGEN_FIELD_DESCRIPTIONS = {
"files": "Code files or directories to generate tests for (must be FULL absolute paths to real files / folders - DO NOT SHORTEN)",
"prompt": "Description of what to test, testing objectives, and specific scope/focus areas. Be specific about any "
"particular component, module, class of function you would like to generate tests for.",
"test_examples": (
"Optional existing test files or directories to use as style/pattern reference (must be FULL absolute paths to real files / folders - DO NOT SHORTEN). "
"If not provided, the tool will determine the best testing approach based on the code structure. "
"For large test directories, only the smallest representative tests should be included to determine testing patterns. "
"If similar tests exist for the code being tested, include those for the most relevant patterns."
# Tool-specific field descriptions for test generation workflow
TESTGEN_WORKFLOW_FIELD_DESCRIPTIONS = {
"step": (
"What to analyze or look for in this step. In step 1, describe what you want to test and begin forming an "
"analytical approach after thinking carefully about what needs to be examined. Consider code structure, "
"business logic, critical paths, edge cases, and potential failure modes. Map out the codebase structure, "
"understand the functionality, and identify areas requiring test coverage. In later steps, continue exploring "
"with precision and adapt your understanding as you uncover more insights about testable behaviors."
),
"step_number": (
"The index of the current step in the test generation sequence, beginning at 1. Each step should build upon or "
"revise the previous one."
),
"total_steps": (
"Your current estimate for how many steps will be needed to complete the test generation analysis. "
"Adjust as new findings emerge."
),
"next_step_required": (
"Set to true if you plan to continue the investigation with another step. False means you believe the "
"test generation analysis is complete and ready for expert validation."
),
"findings": (
"Summarize everything discovered in this step about the code being tested. Include analysis of functionality, "
"critical paths, edge cases, boundary conditions, error handling, async behavior, state management, and "
"integration points. Be specific and avoid vague language—document what you now know about the code and "
"what test scenarios are needed. IMPORTANT: Document both the happy paths and potential failure modes. "
"Identify existing test patterns if examples were provided. In later steps, confirm or update past findings "
"with additional evidence."
),
"files_checked": (
"List all files (as absolute paths, do not clip or shrink file names) examined during the test generation "
"investigation so far. Include even files ruled out or found to be unrelated, as this tracks your "
"exploration path."
),
"relevant_files": (
"Subset of files_checked (as full absolute paths) that contain code directly needing tests or are essential "
"for understanding test requirements. Only list those that are directly tied to the functionality being tested. "
"This could include implementation files, interfaces, dependencies, or existing test examples."
),
"relevant_context": (
"List methods, functions, classes, or modules that need test coverage, in the format "
"'ClassName.methodName', 'functionName', or 'module.ClassName'. Prioritize critical business logic, "
"public APIs, complex algorithms, and error-prone code paths."
),
"confidence": (
"Indicate your current confidence in the test generation assessment. Use: 'exploring' (starting analysis), "
"'low' (early investigation), 'medium' (some patterns identified), 'high' (strong understanding), 'certain' "
"(only when the test plan is thoroughly complete and all test scenarios are identified). Do NOT use 'certain' "
"unless the test generation analysis is comprehensively complete, use 'high' instead not 100% sure. Using "
"'certain' prevents additional expert analysis."
),
"backtrack_from_step": (
"If an earlier finding or assessment needs to be revised or discarded, specify the step number from which to "
"start over. Use this to acknowledge investigative dead ends and correct the course."
),
"images": (
"Optional list of absolute paths to architecture diagrams, flow charts, or visual documentation that help "
"understand the code structure and test requirements. Only include if they materially assist test planning."
),
}
class TestGenerationRequest(ToolRequest):
"""
Request model for the test generation tool.
class TestGenRequest(WorkflowRequest):
"""Request model for test generation workflow investigation steps"""
This model defines all parameters that can be used to customize
the test generation process, from selecting code files to providing
test examples for style consistency.
# Required fields for each investigation step
step: str = Field(..., description=TESTGEN_WORKFLOW_FIELD_DESCRIPTIONS["step"])
step_number: int = Field(..., description=TESTGEN_WORKFLOW_FIELD_DESCRIPTIONS["step_number"])
total_steps: int = Field(..., description=TESTGEN_WORKFLOW_FIELD_DESCRIPTIONS["total_steps"])
next_step_required: bool = Field(..., description=TESTGEN_WORKFLOW_FIELD_DESCRIPTIONS["next_step_required"])
# Investigation tracking fields
findings: str = Field(..., description=TESTGEN_WORKFLOW_FIELD_DESCRIPTIONS["findings"])
files_checked: list[str] = Field(
default_factory=list, description=TESTGEN_WORKFLOW_FIELD_DESCRIPTIONS["files_checked"]
)
relevant_files: list[str] = Field(
default_factory=list, description=TESTGEN_WORKFLOW_FIELD_DESCRIPTIONS["relevant_files"]
)
relevant_context: list[str] = Field(
default_factory=list, description=TESTGEN_WORKFLOW_FIELD_DESCRIPTIONS["relevant_context"]
)
confidence: Optional[str] = Field("low", description=TESTGEN_WORKFLOW_FIELD_DESCRIPTIONS["confidence"])
# Optional backtracking field
backtrack_from_step: Optional[int] = Field(
None, description=TESTGEN_WORKFLOW_FIELD_DESCRIPTIONS["backtrack_from_step"]
)
# Optional images for visual context
images: Optional[list[str]] = Field(default=None, description=TESTGEN_WORKFLOW_FIELD_DESCRIPTIONS["images"])
# Override inherited fields to exclude them from schema (except model which needs to be available)
temperature: Optional[float] = Field(default=None, exclude=True)
thinking_mode: Optional[str] = Field(default=None, exclude=True)
use_websearch: Optional[bool] = Field(default=None, exclude=True)
@model_validator(mode="after")
def validate_step_one_requirements(self):
"""Ensure step 1 has required relevant_files field."""
if self.step_number == 1 and not self.relevant_files:
raise ValueError("Step 1 requires 'relevant_files' field to specify code files to generate tests for")
return self
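# Illustrative note (not part of the original file): assuming no other required
# base-model fields are missing, a step-1 request without relevant_files is
# rejected by the validator above (Pydantic surfaces the ValueError as a
# validation error), while the same request with
# relevant_files=["/abs/path/to/module.py"] is accepted.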
class TestGenTool(WorkflowTool):
"""
Test Generation workflow tool for step-by-step test planning and expert validation.
This tool implements a structured test generation workflow that guides users through
methodical investigation steps, ensuring thorough code examination, pattern identification,
and test scenario planning before reaching conclusions. It supports complex testing scenarios
including edge case identification, framework detection, and comprehensive coverage planning.
"""
files: list[str] = Field(..., description=TESTGEN_FIELD_DESCRIPTIONS["files"])
prompt: str = Field(..., description=TESTGEN_FIELD_DESCRIPTIONS["prompt"])
test_examples: Optional[list[str]] = Field(None, description=TESTGEN_FIELD_DESCRIPTIONS["test_examples"])
class TestGenerationTool(BaseTool):
"""
Test generation tool implementation.
This tool analyzes code to generate comprehensive test suites with
edge case coverage, following existing test patterns when examples
are provided.
"""
def __init__(self):
super().__init__()
self.initial_request = None
def get_name(self) -> str:
return "testgen"
@@ -75,390 +163,406 @@ class TestGenerationTool(BaseTool):
"'Create tests for authentication error handling'. If user request is vague, either ask for "
"clarification about specific components to test, or make focused scope decisions and explain them. "
"Analyzes code paths, identifies realistic failure modes, and generates framework-specific tests. "
"Supports test pattern following when examples are provided. "
"Choose thinking_mode based on code complexity: 'low' for simple functions, "
"'medium' for standard modules (default), 'high' for complex systems with many interactions, "
"'max' for critical systems requiring exhaustive test coverage. "
"Note: If you're not currently using a top-tier model such as Opus 4 or above, these tools can provide enhanced capabilities."
"Supports test pattern following when examples are provided. Choose thinking_mode based on "
"code complexity: 'low' for simple functions, 'medium' for standard modules (default), "
"'high' for complex systems with many interactions, 'max' for critical systems requiring "
"exhaustive test coverage. Note: If you're not currently using a top-tier model such as "
"Opus 4 or above, these tools can provide enhanced capabilities."
)
def get_input_schema(self) -> dict[str, Any]:
schema = {
"type": "object",
"properties": {
"files": {
"type": "array",
"items": {"type": "string"},
"description": TESTGEN_FIELD_DESCRIPTIONS["files"],
},
"model": self.get_model_field_schema(),
"prompt": {
"type": "string",
"description": TESTGEN_FIELD_DESCRIPTIONS["prompt"],
},
"test_examples": {
"type": "array",
"items": {"type": "string"},
"description": TESTGEN_FIELD_DESCRIPTIONS["test_examples"],
},
"thinking_mode": {
"type": "string",
"enum": ["minimal", "low", "medium", "high", "max"],
"description": "Thinking depth: minimal (0.5% of model max), low (8%), medium (33%), high (67%), max (100% of model max)",
},
"continuation_id": {
"type": "string",
"description": (
"Thread continuation ID for multi-turn conversations. Can be used to continue conversations "
"across different tools. Only provide this if continuing a previous conversation thread."
),
},
},
"required": ["files", "prompt"] + (["model"] if self.is_effective_auto_mode() else []),
}
return schema
def get_system_prompt(self) -> str:
return TESTGEN_PROMPT
def get_default_temperature(self) -> float:
return TEMPERATURE_ANALYTICAL
# Line numbers are enabled by default from base class for precise targeting
def get_model_category(self):
"""TestGen requires extended reasoning for comprehensive test analysis"""
def get_model_category(self) -> "ToolModelCategory":
"""Test generation requires thorough analysis and reasoning"""
from tools.models import ToolModelCategory
return ToolModelCategory.EXTENDED_REASONING
def get_request_model(self):
return TestGenerationRequest
def get_workflow_request_model(self):
"""Return the test generation workflow-specific request model."""
return TestGenRequest
def _process_test_examples(
self, test_examples: list[str], continuation_id: Optional[str], available_tokens: int = None
) -> tuple[str, str]:
"""
Process test example files using available token budget for optimal sampling.
def get_input_schema(self) -> dict[str, Any]:
"""Generate input schema using WorkflowSchemaBuilder with test generation-specific overrides."""
from .workflow.schema_builders import WorkflowSchemaBuilder
Args:
test_examples: List of test file paths
continuation_id: Continuation ID for filtering already embedded files
available_tokens: Available token budget for test examples
# Test generation workflow-specific field overrides
testgen_field_overrides = {
"step": {
"type": "string",
"description": TESTGEN_WORKFLOW_FIELD_DESCRIPTIONS["step"],
},
"step_number": {
"type": "integer",
"minimum": 1,
"description": TESTGEN_WORKFLOW_FIELD_DESCRIPTIONS["step_number"],
},
"total_steps": {
"type": "integer",
"minimum": 1,
"description": TESTGEN_WORKFLOW_FIELD_DESCRIPTIONS["total_steps"],
},
"next_step_required": {
"type": "boolean",
"description": TESTGEN_WORKFLOW_FIELD_DESCRIPTIONS["next_step_required"],
},
"findings": {
"type": "string",
"description": TESTGEN_WORKFLOW_FIELD_DESCRIPTIONS["findings"],
},
"files_checked": {
"type": "array",
"items": {"type": "string"},
"description": TESTGEN_WORKFLOW_FIELD_DESCRIPTIONS["files_checked"],
},
"relevant_files": {
"type": "array",
"items": {"type": "string"},
"description": TESTGEN_WORKFLOW_FIELD_DESCRIPTIONS["relevant_files"],
},
"confidence": {
"type": "string",
"enum": ["exploring", "low", "medium", "high", "certain"],
"description": TESTGEN_WORKFLOW_FIELD_DESCRIPTIONS["confidence"],
},
"backtrack_from_step": {
"type": "integer",
"minimum": 1,
"description": TESTGEN_WORKFLOW_FIELD_DESCRIPTIONS["backtrack_from_step"],
},
"images": {
"type": "array",
"items": {"type": "string"},
"description": TESTGEN_WORKFLOW_FIELD_DESCRIPTIONS["images"],
},
}
Returns:
tuple: (formatted_content, summary_note)
"""
logger.debug(f"[TESTGEN] Processing {len(test_examples)} test examples")
if not test_examples:
logger.debug("[TESTGEN] No test examples provided")
return "", ""
# Use existing file filtering to avoid duplicates in continuation
examples_to_process = self.filter_new_files(test_examples, continuation_id)
logger.debug(f"[TESTGEN] After filtering: {len(examples_to_process)} new test examples to process")
if not examples_to_process:
logger.info(f"[TESTGEN] All {len(test_examples)} test examples already in conversation history")
return "", ""
logger.debug(f"[TESTGEN] Processing {len(examples_to_process)} file paths")
# Calculate token budget for test examples (25% of available tokens, or fallback)
if available_tokens:
test_examples_budget = int(available_tokens * 0.25) # 25% for test examples
logger.debug(
f"[TESTGEN] Allocating {test_examples_budget:,} tokens (25% of {available_tokens:,}) for test examples"
)
else:
test_examples_budget = 30000 # Fallback if no budget provided
logger.debug(f"[TESTGEN] Using fallback budget of {test_examples_budget:,} tokens for test examples")
original_count = len(examples_to_process)
logger.debug(
f"[TESTGEN] Processing {original_count} test example files with {test_examples_budget:,} token budget"
# Use WorkflowSchemaBuilder with test generation-specific tool fields
return WorkflowSchemaBuilder.build_schema(
tool_specific_fields=testgen_field_overrides,
model_field_schema=self.get_model_field_schema(),
auto_mode=self.is_effective_auto_mode(),
tool_name=self.get_name(),
)
# Sort by file size (smallest first) for pattern-focused selection
file_sizes = []
for file_path in examples_to_process:
try:
size = os.path.getsize(file_path)
file_sizes.append((file_path, size))
logger.debug(f"[TESTGEN] Test example {os.path.basename(file_path)}: {size:,} bytes")
except (OSError, FileNotFoundError) as e:
# If we can't get size, put it at the end
logger.warning(f"[TESTGEN] Could not get size for {file_path}: {e}")
file_sizes.append((file_path, float("inf")))
# Sort by size and take smallest files for pattern reference
file_sizes.sort(key=lambda x: x[1])
examples_to_process = [f[0] for f in file_sizes] # All files, sorted by size
logger.debug(
f"[TESTGEN] Sorted test examples by size (smallest first): {[os.path.basename(f) for f in examples_to_process]}"
)
# Use standard file content preparation with dynamic token budget
try:
logger.debug(f"[TESTGEN] Preparing file content for {len(examples_to_process)} test examples")
content, processed_files = self._prepare_file_content_for_prompt(
examples_to_process,
continuation_id,
"Test examples",
max_tokens=test_examples_budget,
reserve_tokens=1000,
)
# Store processed files for tracking - test examples are tracked separately from main code files
# Determine how many files were actually included
if content:
from utils.token_utils import estimate_tokens
used_tokens = estimate_tokens(content)
logger.info(
f"[TESTGEN] Successfully embedded test examples: {used_tokens:,} tokens used ({test_examples_budget:,} available)"
)
if original_count > 1:
truncation_note = f"Note: Used {used_tokens:,} tokens ({test_examples_budget:,} available) for test examples from {original_count} files to determine testing patterns."
else:
truncation_note = ""
else:
logger.warning("[TESTGEN] No content generated for test examples")
truncation_note = ""
return content, truncation_note
except Exception as e:
# If test example processing fails, continue without examples rather than failing
logger.error(f"[TESTGEN] Failed to process test examples: {type(e).__name__}: {e}")
return "", f"Warning: Could not process test examples: {str(e)}"
async def prepare_prompt(self, request: TestGenerationRequest) -> str:
"""
Prepare the test generation prompt with code analysis and optional test examples.
This method reads the requested files, processes any test examples,
and constructs a detailed prompt for comprehensive test generation.
Args:
request: The validated test generation request
Returns:
str: Complete prompt for the model
Raises:
ValueError: If the code exceeds token limits
"""
logger.debug(f"[TESTGEN] Preparing prompt for {len(request.files)} code files")
if request.test_examples:
logger.debug(f"[TESTGEN] Including {len(request.test_examples)} test examples for pattern reference")
# Check for prompt.txt in files
prompt_content, updated_files = self.handle_prompt_file(request.files)
# If prompt.txt was found, incorporate it into the prompt
if prompt_content:
logger.debug("[TESTGEN] Found prompt.txt file, incorporating content")
request.prompt = prompt_content + "\n\n" + request.prompt
# Update request files list
if updated_files is not None:
logger.debug(f"[TESTGEN] Updated files list after prompt.txt processing: {len(updated_files)} files")
request.files = updated_files
# Check user input size at MCP transport boundary (before adding internal content)
user_content = request.prompt
size_check = self.check_prompt_size(user_content)
if size_check:
from tools.models import ToolOutput
raise ValueError(f"MCP_SIZE_CHECK:{ToolOutput(**size_check).model_dump_json()}")
# Calculate available token budget for dynamic allocation
continuation_id = getattr(request, "continuation_id", None)
# Get model context for token budget calculation
available_tokens = None
if hasattr(self, "_model_context") and self._model_context:
try:
capabilities = self._model_context.capabilities
# Use 75% of context for content (code + test examples), 25% for response
available_tokens = int(capabilities.context_window * 0.75)
logger.debug(
f"[TESTGEN] Token budget calculation: {available_tokens:,} tokens (75% of {capabilities.context_window:,}) for model {self._model_context.model_name}"
)
except Exception as e:
# Fallback to conservative estimate
logger.warning(f"[TESTGEN] Could not get model capabilities: {e}")
available_tokens = 120000 # Conservative fallback
logger.debug(f"[TESTGEN] Using fallback token budget: {available_tokens:,} tokens")
def get_required_actions(self, step_number: int, confidence: str, findings: str, total_steps: int) -> list[str]:
"""Define required actions for each investigation phase."""
if step_number == 1:
# Initial test generation investigation tasks
return [
"Read and understand the code files specified for test generation",
"Analyze the overall structure, public APIs, and main functionality",
"Identify critical business logic and complex algorithms that need testing",
"Look for existing test patterns or examples if provided",
"Understand dependencies, external interactions, and integration points",
"Note any potential testability issues or areas that might be hard to test",
]
elif confidence in ["exploring", "low"]:
# Need deeper investigation
return [
"Examine specific functions and methods to understand their behavior",
"Trace through code paths to identify all possible execution flows",
"Identify edge cases, boundary conditions, and error scenarios",
"Check for async operations, state management, and side effects",
"Look for non-deterministic behavior or external dependencies",
"Analyze error handling and exception cases that need testing",
]
elif confidence in ["medium", "high"]:
# Close to completion - need final verification
return [
"Verify all critical paths have been identified for testing",
"Confirm edge cases and boundary conditions are comprehensive",
"Check that test scenarios cover both success and failure cases",
"Ensure async behavior and concurrency issues are addressed",
"Validate that the testing strategy aligns with code complexity",
"Double-check that findings include actionable test scenarios",
]
else:
# No model context available (shouldn't happen in normal flow)
available_tokens = 120000 # Conservative fallback
logger.debug(f"[TESTGEN] No model context, using fallback token budget: {available_tokens:,} tokens")
# Process test examples first to determine token allocation
test_examples_content = ""
test_examples_note = ""
if request.test_examples:
logger.debug(f"[TESTGEN] Processing {len(request.test_examples)} test examples")
test_examples_content, test_examples_note = self._process_test_examples(
request.test_examples, continuation_id, available_tokens
)
if test_examples_content:
logger.info("[TESTGEN] Test examples processed successfully for pattern reference")
else:
logger.info("[TESTGEN] No test examples content after processing")
# Remove files that appear in both 'files' and 'test_examples' to avoid duplicate embedding
# Files in test_examples take precedence as they're used for pattern reference
code_files_to_process = request.files.copy()
if request.test_examples:
# Normalize paths for comparison (resolve any relative paths, handle case sensitivity)
test_example_set = {os.path.normpath(os.path.abspath(f)) for f in request.test_examples}
original_count = len(code_files_to_process)
code_files_to_process = [
f for f in code_files_to_process if os.path.normpath(os.path.abspath(f)) not in test_example_set
# General investigation needed
return [
"Continue examining the codebase for additional test scenarios",
"Gather more evidence about code behavior and dependencies",
"Test your assumptions about how the code should be tested",
"Look for patterns that confirm your testing strategy",
"Focus on areas that haven't been thoroughly examined yet",
]
duplicates_removed = original_count - len(code_files_to_process)
if duplicates_removed > 0:
logger.info(
f"[TESTGEN] Removed {duplicates_removed} duplicate files from code files list "
f"(already included in test examples for pattern reference)"
)
def should_call_expert_analysis(self, consolidated_findings, request=None) -> bool:
"""
Decide when to call external model based on investigation completeness.
# Calculate remaining tokens for main code after test examples
if test_examples_content and available_tokens:
from utils.token_utils import estimate_tokens
Always call expert analysis for test generation to get additional test ideas.
"""
# Check if user requested to skip assistant model
if request and not self.get_request_use_assistant_model(request):
return False
test_tokens = estimate_tokens(test_examples_content)
remaining_tokens = available_tokens - test_tokens - 5000 # Reserve for prompt structure
logger.debug(
f"[TESTGEN] Token allocation: {test_tokens:,} for examples, {remaining_tokens:,} remaining for code files"
# Always benefit from expert analysis for comprehensive test coverage
return len(consolidated_findings.relevant_files) > 0 or len(consolidated_findings.findings) >= 1
def prepare_expert_analysis_context(self, consolidated_findings) -> str:
"""Prepare context for external model call for test generation validation."""
context_parts = [
f"=== TEST GENERATION REQUEST ===\\n{self.initial_request or 'Test generation workflow initiated'}\\n=== END REQUEST ==="
]
# Add investigation summary
investigation_summary = self._build_test_generation_summary(consolidated_findings)
context_parts.append(
f"\\n=== CLAUDE'S TEST PLANNING INVESTIGATION ===\\n{investigation_summary}\\n=== END INVESTIGATION ==="
)
# Add relevant code elements if available
if consolidated_findings.relevant_context:
methods_text = "\\n".join(f"- {method}" for method in consolidated_findings.relevant_context)
context_parts.append(f"\\n=== CODE ELEMENTS TO TEST ===\\n{methods_text}\\n=== END CODE ELEMENTS ===")
# Add images if available
if consolidated_findings.images:
images_text = "\\n".join(f"- {img}" for img in consolidated_findings.images)
context_parts.append(f"\\n=== VISUAL DOCUMENTATION ===\\n{images_text}\\n=== END VISUAL DOCUMENTATION ===")
return "\\n".join(context_parts)
def _build_test_generation_summary(self, consolidated_findings) -> str:
"""Prepare a comprehensive summary of the test generation investigation."""
summary_parts = [
"=== SYSTEMATIC TEST GENERATION INVESTIGATION SUMMARY ===",
f"Total steps: {len(consolidated_findings.findings)}",
f"Files examined: {len(consolidated_findings.files_checked)}",
f"Relevant files identified: {len(consolidated_findings.relevant_files)}",
f"Code elements to test: {len(consolidated_findings.relevant_context)}",
"",
"=== INVESTIGATION PROGRESSION ===",
]
for finding in consolidated_findings.findings:
summary_parts.append(finding)
return "\\n".join(summary_parts)
def should_include_files_in_expert_prompt(self) -> bool:
"""Include files in expert analysis for comprehensive test generation."""
return True
def should_embed_system_prompt(self) -> bool:
"""Embed system prompt in expert analysis for proper context."""
return True
def get_expert_thinking_mode(self) -> str:
"""Use high thinking mode for thorough test generation analysis."""
return "high"
def get_expert_analysis_instruction(self) -> str:
"""Get specific instruction for test generation expert analysis."""
return (
"Please provide comprehensive test generation guidance based on the investigation findings. "
"Focus on identifying additional test scenarios, edge cases not yet covered, framework-specific "
"best practices, and providing concrete test implementation examples following the multi-agent "
"workflow specified in the system prompt."
)
# Hook method overrides for test generation-specific behavior
def prepare_step_data(self, request) -> dict:
"""
Map test generation-specific fields for internal processing.
"""
step_data = {
"step": request.step,
"step_number": request.step_number,
"findings": request.findings,
"files_checked": request.files_checked,
"relevant_files": request.relevant_files,
"relevant_context": request.relevant_context,
"confidence": request.confidence,
"images": request.images or [],
}
return step_data
def should_skip_expert_analysis(self, request, consolidated_findings) -> bool:
"""
Test generation workflow skips expert analysis when Claude has "certain" confidence.
"""
return request.confidence == "certain" and not request.next_step_required
def store_initial_issue(self, step_description: str):
"""Store initial request for expert analysis."""
self.initial_request = step_description
# Override inheritance hooks for test generation-specific behavior
def get_completion_status(self) -> str:
"""Test generation tools use test-specific status."""
return "test_generation_complete_ready_for_implementation"
def get_completion_data_key(self) -> str:
"""Test generation uses 'complete_test_generation' key."""
return "complete_test_generation"
def get_final_analysis_from_request(self, request):
"""Test generation tools use findings for final analysis."""
return request.findings
def get_confidence_level(self, request) -> str:
"""Test generation tools use 'certain' for high confidence."""
return "certain"
def get_completion_message(self) -> str:
"""Test generation-specific completion message."""
return (
"Test generation analysis complete with CERTAIN confidence. You have identified all test scenarios "
"and provided comprehensive coverage strategy. MANDATORY: Present the user with the complete test plan "
"and IMMEDIATELY proceed with creating the test files following the identified patterns and framework. "
"Focus on implementing concrete, runnable tests with proper assertions."
)
def get_skip_reason(self) -> str:
"""Test generation-specific skip reason."""
return "Claude completed comprehensive test planning with full confidence"
def get_skip_expert_analysis_status(self) -> str:
"""Test generation-specific expert analysis skip status."""
return "skipped_due_to_certain_test_confidence"
def prepare_work_summary(self) -> str:
"""Test generation-specific work summary."""
return self._build_test_generation_summary(self.consolidated_findings)
def get_completion_next_steps_message(self, expert_analysis_used: bool = False) -> str:
"""
Test generation-specific completion message.
"""
base_message = (
"TEST GENERATION ANALYSIS IS COMPLETE. You MUST now implement ALL identified test scenarios, "
"creating comprehensive test files that cover happy paths, edge cases, error conditions, and "
"boundary scenarios. Organize tests by functionality, use appropriate assertions, and follow "
"the identified framework patterns. Provide concrete, executable test code—make it easy for "
"a developer to run the tests and understand what each test validates."
)
# Add expert analysis guidance only when expert analysis was actually used
if expert_analysis_used:
expert_guidance = self.get_expert_analysis_guidance()
if expert_guidance:
return f"{base_message}\\n\\n{expert_guidance}"
return base_message
def get_expert_analysis_guidance(self) -> str:
"""
Provide specific guidance for handling expert analysis in test generation.
"""
return (
"IMPORTANT: Additional test scenarios and edge cases have been provided by the expert analysis above. "
"You MUST incorporate these suggestions into your test implementation, ensuring comprehensive coverage. "
"Validate that the expert's test ideas are practical and align with the codebase structure. Combine "
"your systematic investigation findings with the expert's additional scenarios to create a thorough "
"test suite that catches real-world bugs before they reach production."
)
def get_step_guidance_message(self, request) -> str:
"""
Test generation-specific step guidance with detailed investigation instructions.
"""
step_guidance = self.get_test_generation_step_guidance(request.step_number, request.confidence, request)
return step_guidance["next_steps"]
def get_test_generation_step_guidance(self, step_number: int, confidence: str, request) -> dict[str, Any]:
"""
Provide step-specific guidance for test generation workflow.
"""
# Generate the next steps instruction based on required actions
required_actions = self.get_required_actions(step_number, confidence, request.findings, request.total_steps)
if step_number == 1:
next_steps = (
f"MANDATORY: DO NOT call the {self.get_name()} tool again immediately. You MUST first analyze "
f"the code thoroughly using appropriate tools. CRITICAL AWARENESS: You need to understand "
f"the code structure, identify testable behaviors, find edge cases and boundary conditions, "
f"and determine the appropriate testing strategy. Use file reading tools, code analysis, and "
f"systematic examination to gather comprehensive information about what needs to be tested. "
f"Only call {self.get_name()} again AFTER completing your investigation. When you call "
f"{self.get_name()} next time, use step_number: {step_number + 1} and report specific "
f"code paths examined, test scenarios identified, and testing patterns discovered."
)
elif confidence in ["exploring", "low"]:
next_steps = (
f"STOP! Do NOT call {self.get_name()} again yet. Based on your findings, you've identified areas that need "
f"deeper analysis for test generation. MANDATORY ACTIONS before calling {self.get_name()} step {step_number + 1}:\\n"
+ "\\n".join(f"{i+1}. {action}" for i, action in enumerate(required_actions))
+ f"\\n\\nOnly call {self.get_name()} again with step_number: {step_number + 1} AFTER "
+ "completing these test planning tasks."
)
elif confidence in ["medium", "high"]:
next_steps = (
f"WAIT! Your test generation analysis needs final verification. DO NOT call {self.get_name()} immediately. REQUIRED ACTIONS:\\n"
+ "\\n".join(f"{i+1}. {action}" for i, action in enumerate(required_actions))
+ f"\\n\\nREMEMBER: Ensure you have identified all test scenarios including edge cases and error conditions. "
f"Document findings with specific test cases to implement, then call {self.get_name()} "
f"with step_number: {step_number + 1}."
)
else:
remaining_tokens = available_tokens - 10000 if available_tokens else None
if remaining_tokens:
logger.debug(
f"[TESTGEN] Token allocation: {remaining_tokens:,} tokens available for code files (no test examples)"
)
# Use centralized file processing logic for main code files (after deduplication)
logger.debug(f"[TESTGEN] Preparing {len(code_files_to_process)} code files for analysis")
code_content, processed_files = self._prepare_file_content_for_prompt(
code_files_to_process, continuation_id, "Code to test", max_tokens=remaining_tokens, reserve_tokens=2000
)
self._actually_processed_files = processed_files
if code_content:
from utils.token_utils import estimate_tokens
code_tokens = estimate_tokens(code_content)
logger.info(f"[TESTGEN] Code files embedded successfully: {code_tokens:,} tokens")
else:
logger.warning("[TESTGEN] No code content after file processing")
# Test generation is based on code analysis, no web search needed
logger.debug("[TESTGEN] Building complete test generation prompt")
# Build the complete prompt
prompt_parts = []
# Add system prompt
prompt_parts.append(self.get_system_prompt())
# Add user context
prompt_parts.append("=== USER CONTEXT ===")
prompt_parts.append(request.prompt)
prompt_parts.append("=== END CONTEXT ===")
# Add test examples if provided
if test_examples_content:
prompt_parts.append("\n=== TEST EXAMPLES FOR STYLE REFERENCE ===")
if test_examples_note:
prompt_parts.append(f"// {test_examples_note}")
prompt_parts.append(test_examples_content)
prompt_parts.append("=== END TEST EXAMPLES ===")
# Add main code to test
prompt_parts.append("\n=== CODE TO TEST ===")
prompt_parts.append(code_content)
prompt_parts.append("=== END CODE ===")
# Add generation instructions
prompt_parts.append(
"\nPlease analyze the code and generate comprehensive tests following the multi-agent workflow specified in the system prompt."
)
if test_examples_content:
prompt_parts.append(
"Use the provided test examples as a reference for style, framework, and testing patterns."
next_steps = (
f"PAUSE ANALYSIS. Before calling {self.get_name()} step {step_number + 1}, you MUST examine more code thoroughly. "
+ "Required: "
+ ", ".join(required_actions[:2])
+ ". "
+ f"Your next {self.get_name()} call (step_number: {step_number + 1}) must include "
f"NEW test scenarios from actual code analysis, not just theories. NO recursive {self.get_name()} calls "
f"without investigation work!"
)
full_prompt = "\n".join(prompt_parts)
return {"next_steps": next_steps}
# Log final prompt statistics
from utils.token_utils import estimate_tokens
total_tokens = estimate_tokens(full_prompt)
logger.info(f"[TESTGEN] Complete prompt prepared: {total_tokens:,} tokens, {len(full_prompt):,} characters")
return full_prompt
def format_response(self, response: str, request: TestGenerationRequest, model_info: Optional[dict] = None) -> str:
def customize_workflow_response(self, response_data: dict, request) -> dict:
"""
Format the test generation response.
Args:
response: The raw test generation from the model
request: The original request for context
model_info: Optional dict with model metadata
Returns:
str: Formatted response with next steps
Customize response to match test generation workflow format.
"""
return f"""{response}
# Store initial request on first step
if request.step_number == 1:
self.initial_request = request.step
---
# Convert generic status names to test generation-specific ones
tool_name = self.get_name()
status_mapping = {
f"{tool_name}_in_progress": "test_generation_in_progress",
f"pause_for_{tool_name}": "pause_for_test_analysis",
f"{tool_name}_required": "test_analysis_required",
f"{tool_name}_complete": "test_generation_complete",
}
Claude, you are now in EXECUTION MODE. Take immediate action:
if response_data["status"] in status_mapping:
response_data["status"] = status_mapping[response_data["status"]]
## Step 1: THINK & CREATE TESTS
ULTRATHINK while creating these in order to verify that every code reference, import, function name, and logic path is
100% accurate before saving.
# Rename status field to match test generation workflow
if f"{tool_name}_status" in response_data:
response_data["test_generation_status"] = response_data.pop(f"{tool_name}_status")
# Add test generation-specific status fields
response_data["test_generation_status"]["test_scenarios_identified"] = len(
self.consolidated_findings.relevant_context
)
response_data["test_generation_status"]["analysis_confidence"] = self.get_request_confidence(request)
- CREATE all test files in the correct project structure
- SAVE each test using proper naming conventions
- VALIDATE all imports, references, and dependencies are correct as required by the current framework / project / file
# Map complete_testgen to complete_test_generation
if f"complete_{tool_name}" in response_data:
response_data["complete_test_generation"] = response_data.pop(f"complete_{tool_name}")
## Step 2: DISPLAY RESULTS TO USER
After creating each test file, MUST show the user:
```
✅ Created: path/to/test_file.py
- test_function_name(): Brief description of what it tests
- test_another_function(): Brief description
- [Total: X test functions]
```
# Map the completion flag to match test generation workflow
if f"{tool_name}_complete" in response_data:
response_data["test_generation_complete"] = response_data.pop(f"{tool_name}_complete")
## Step 3: VALIDATE BY EXECUTION
CRITICAL: Run the tests immediately to confirm they work:
- Install any missing dependencies first or request user to perform step if this cannot be automated
- Execute the test suite
- Fix any failures or errors
- Confirm 100% pass rate. If there's a failure, re-iterate, go over each test, validate and understand why it's failing
return response_data
## Step 4: INTEGRATION VERIFICATION
- Verify tests integrate with existing test infrastructure
- Confirm test discovery works
- Validate test naming and organization
# Required abstract methods from BaseTool
def get_request_model(self):
"""Return the test generation workflow-specific request model."""
return TestGenRequest
## Step 5: MOVE TO NEXT ACTION
Once tests are confirmed working, immediately proceed to the next logical step for the project.
MANDATORY: Do NOT stop after generating - you MUST create, validate, run, and confirm the tests work and all of the
steps listed above are carried out correctly. Take full ownership of the testing implementation and move to your
next work. If you were supplied a more_work_required request in the response above, you MUST honor it."""
async def prepare_prompt(self, request) -> str:
"""Not used - workflow tools use execute_workflow()."""
return "" # Workflow tools use execute_workflow() directly