🚀 Major Enhancement: Workflow-Based Tool Architecture v5.5.0 (#95)

* WIP: new workflow architecture

* WIP: further improvements and cleanup

* WIP: cleanup and docs, replace old tool with new

* WIP: new planner implementation using workflow

* WIP: precommit tool working as a workflow instead of a basic tool
Support for passing False to use_assistant_model to skip external models completely and use Claude only

* WIP: precommit workflow version swapped with old

* WIP: codereview

* WIP: replaced codereview

* WIP: replaced refactor

* WIP: workflow for thinkdeep

* WIP: ensure files get embedded correctly

* WIP: thinkdeep replaced with workflow version

* WIP: improved messaging when an external model's response is received

* WIP: analyze tool swapped

* WIP: updated tests
* Extract only the content when building history
* Use "relevant_files" for workflow tools only

* WIP: fixed get_completion_next_steps_message missing param

* Fixed tests
Request files consistently

* Fixed tests

* New testgen workflow tool
Updated docs

* Swap testgen workflow

* Fix CI test failures by excluding API-dependent tests

- Update GitHub Actions workflow to exclude simulation tests that require API keys
- Fix collaboration tests to properly mock workflow tool expert analysis calls
- Update test assertions to handle new workflow tool response format
- Ensure unit tests run without external API dependencies in CI
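
For illustration only, the pattern below is one common way to gate API-dependent tests so unit runs stay green in CI; the marker and environment-variable names are placeholders and not necessarily what this project's Actions workflow actually uses.

```python
# Hypothetical sketch: skip simulation tests when no API key is configured.
import os

import pytest

requires_api_key = pytest.mark.skipif(
    not os.environ.get("EXAMPLE_API_KEY"),  # placeholder variable name
    reason="Requires a real API key; excluded from CI runs",
)


@requires_api_key
def test_expert_analysis_round_trip():
    # Exercises the real provider end to end (only runs outside CI).
    ...
```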

🤖 Generated with [Claude Code](https://claude.ai/code)

Co-Authored-By: Claude <noreply@anthropic.com>

* WIP - Update tests to match new tools

---------

Co-authored-by: Claude <noreply@anthropic.com>
Beehive Innovations
2025-06-21 00:08:11 +04:00
committed by GitHub
parent 4dae6e457e
commit 69a3121452
76 changed files with 17111 additions and 7725 deletions


@@ -1,67 +1,155 @@
"""
TestGen tool - Comprehensive test suite generation with edge case coverage
TestGen Workflow tool - Step-by-step test generation with expert validation
This tool generates comprehensive test suites by analyzing code paths,
identifying edge cases, and producing test scaffolding that follows
project conventions when test examples are provided.
This tool provides a structured workflow for comprehensive test generation.
It guides Claude through systematic investigation steps with forced pauses between each step
to ensure thorough code examination, test planning, and pattern identification before proceeding.
The tool supports backtracking, finding updates, and expert analysis integration for
comprehensive test suite generation.
Key Features:
- Multi-file and directory support
- Framework detection from existing tests
- Edge case identification (nulls, boundaries, async issues, etc.)
- Test pattern following when examples provided
- Deterministic test example sampling for large test suites
Key features:
- Step-by-step test generation workflow with progress tracking
- Context-aware file embedding (references during investigation, full content for analysis)
- Automatic test pattern detection and framework identification
- Expert analysis integration with external models for additional test suggestions
- Support for edge case identification and comprehensive coverage
- Confidence-based workflow optimization
"""
import logging
import os
from typing import Any, Optional
from typing import TYPE_CHECKING, Any, Optional
from pydantic import Field
from pydantic import Field, model_validator
if TYPE_CHECKING:
from tools.models import ToolModelCategory
from config import TEMPERATURE_ANALYTICAL
from systemprompts import TESTGEN_PROMPT
from tools.shared.base_models import WorkflowRequest
from .base import BaseTool, ToolRequest
from .workflow.base import WorkflowTool
logger = logging.getLogger(__name__)
# Field descriptions to avoid duplication between Pydantic and JSON schema
TESTGEN_FIELD_DESCRIPTIONS = {
"files": "Code files or directories to generate tests for (must be FULL absolute paths to real files / folders - DO NOT SHORTEN)",
"prompt": "Description of what to test, testing objectives, and specific scope/focus areas. Be specific about any "
"particular component, module, class of function you would like to generate tests for.",
"test_examples": (
"Optional existing test files or directories to use as style/pattern reference (must be FULL absolute paths to real files / folders - DO NOT SHORTEN). "
"If not provided, the tool will determine the best testing approach based on the code structure. "
"For large test directories, only the smallest representative tests should be included to determine testing patterns. "
"If similar tests exist for the code being tested, include those for the most relevant patterns."
# Tool-specific field descriptions for test generation workflow
TESTGEN_WORKFLOW_FIELD_DESCRIPTIONS = {
"step": (
"What to analyze or look for in this step. In step 1, describe what you want to test and begin forming an "
"analytical approach after thinking carefully about what needs to be examined. Consider code structure, "
"business logic, critical paths, edge cases, and potential failure modes. Map out the codebase structure, "
"understand the functionality, and identify areas requiring test coverage. In later steps, continue exploring "
"with precision and adapt your understanding as you uncover more insights about testable behaviors."
),
"step_number": (
"The index of the current step in the test generation sequence, beginning at 1. Each step should build upon or "
"revise the previous one."
),
"total_steps": (
"Your current estimate for how many steps will be needed to complete the test generation analysis. "
"Adjust as new findings emerge."
),
"next_step_required": (
"Set to true if you plan to continue the investigation with another step. False means you believe the "
"test generation analysis is complete and ready for expert validation."
),
"findings": (
"Summarize everything discovered in this step about the code being tested. Include analysis of functionality, "
"critical paths, edge cases, boundary conditions, error handling, async behavior, state management, and "
"integration points. Be specific and avoid vague language—document what you now know about the code and "
"what test scenarios are needed. IMPORTANT: Document both the happy paths and potential failure modes. "
"Identify existing test patterns if examples were provided. In later steps, confirm or update past findings "
"with additional evidence."
),
"files_checked": (
"List all files (as absolute paths, do not clip or shrink file names) examined during the test generation "
"investigation so far. Include even files ruled out or found to be unrelated, as this tracks your "
"exploration path."
),
"relevant_files": (
"Subset of files_checked (as full absolute paths) that contain code directly needing tests or are essential "
"for understanding test requirements. Only list those that are directly tied to the functionality being tested. "
"This could include implementation files, interfaces, dependencies, or existing test examples."
),
"relevant_context": (
"List methods, functions, classes, or modules that need test coverage, in the format "
"'ClassName.methodName', 'functionName', or 'module.ClassName'. Prioritize critical business logic, "
"public APIs, complex algorithms, and error-prone code paths."
),
"confidence": (
"Indicate your current confidence in the test generation assessment. Use: 'exploring' (starting analysis), "
"'low' (early investigation), 'medium' (some patterns identified), 'high' (strong understanding), 'certain' "
"(only when the test plan is thoroughly complete and all test scenarios are identified). Do NOT use 'certain' "
"unless the test generation analysis is comprehensively complete, use 'high' instead not 100% sure. Using "
"'certain' prevents additional expert analysis."
),
"backtrack_from_step": (
"If an earlier finding or assessment needs to be revised or discarded, specify the step number from which to "
"start over. Use this to acknowledge investigative dead ends and correct the course."
),
"images": (
"Optional list of absolute paths to architecture diagrams, flow charts, or visual documentation that help "
"understand the code structure and test requirements. Only include if they materially assist test planning."
),
}
class TestGenerationRequest(ToolRequest):
"""
Request model for the test generation tool.
class TestGenRequest(WorkflowRequest):
"""Request model for test generation workflow investigation steps"""
This model defines all parameters that can be used to customize
the test generation process, from selecting code files to providing
test examples for style consistency.
# Required fields for each investigation step
step: str = Field(..., description=TESTGEN_WORKFLOW_FIELD_DESCRIPTIONS["step"])
step_number: int = Field(..., description=TESTGEN_WORKFLOW_FIELD_DESCRIPTIONS["step_number"])
total_steps: int = Field(..., description=TESTGEN_WORKFLOW_FIELD_DESCRIPTIONS["total_steps"])
next_step_required: bool = Field(..., description=TESTGEN_WORKFLOW_FIELD_DESCRIPTIONS["next_step_required"])
# Investigation tracking fields
findings: str = Field(..., description=TESTGEN_WORKFLOW_FIELD_DESCRIPTIONS["findings"])
files_checked: list[str] = Field(
default_factory=list, description=TESTGEN_WORKFLOW_FIELD_DESCRIPTIONS["files_checked"]
)
relevant_files: list[str] = Field(
default_factory=list, description=TESTGEN_WORKFLOW_FIELD_DESCRIPTIONS["relevant_files"]
)
relevant_context: list[str] = Field(
default_factory=list, description=TESTGEN_WORKFLOW_FIELD_DESCRIPTIONS["relevant_context"]
)
confidence: Optional[str] = Field("low", description=TESTGEN_WORKFLOW_FIELD_DESCRIPTIONS["confidence"])
# Optional backtracking field
backtrack_from_step: Optional[int] = Field(
None, description=TESTGEN_WORKFLOW_FIELD_DESCRIPTIONS["backtrack_from_step"]
)
# Optional images for visual context
images: Optional[list[str]] = Field(default=None, description=TESTGEN_WORKFLOW_FIELD_DESCRIPTIONS["images"])
# Override inherited fields to exclude them from schema (except model which needs to be available)
temperature: Optional[float] = Field(default=None, exclude=True)
thinking_mode: Optional[str] = Field(default=None, exclude=True)
use_websearch: Optional[bool] = Field(default=None, exclude=True)
@model_validator(mode="after")
def validate_step_one_requirements(self):
"""Ensure step 1 has required relevant_files field."""
if self.step_number == 1 and not self.relevant_files:
raise ValueError("Step 1 requires 'relevant_files' field to specify code files to generate tests for")
return self
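# Illustrative note (not part of the original file): assuming no other required
# base-model fields are missing, a step-1 request without relevant_files is
# rejected by the validator above (Pydantic surfaces the ValueError as a
# validation error), while the same request with
# relevant_files=["/abs/path/to/module.py"] is accepted.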
class TestGenTool(WorkflowTool):
"""
Test Generation workflow tool for step-by-step test planning and expert validation.
This tool implements a structured test generation workflow that guides users through
methodical investigation steps, ensuring thorough code examination, pattern identification,
and test scenario planning before reaching conclusions. It supports complex testing scenarios
including edge case identification, framework detection, and comprehensive coverage planning.
"""
files: list[str] = Field(..., description=TESTGEN_FIELD_DESCRIPTIONS["files"])
prompt: str = Field(..., description=TESTGEN_FIELD_DESCRIPTIONS["prompt"])
test_examples: Optional[list[str]] = Field(None, description=TESTGEN_FIELD_DESCRIPTIONS["test_examples"])
class TestGenerationTool(BaseTool):
"""
Test generation tool implementation.
This tool analyzes code to generate comprehensive test suites with
edge case coverage, following existing test patterns when examples
are provided.
"""
def __init__(self):
super().__init__()
self.initial_request = None
def get_name(self) -> str:
return "testgen"
@@ -75,390 +163,406 @@ class TestGenerationTool(BaseTool):
"'Create tests for authentication error handling'. If user request is vague, either ask for "
"clarification about specific components to test, or make focused scope decisions and explain them. "
"Analyzes code paths, identifies realistic failure modes, and generates framework-specific tests. "
"Supports test pattern following when examples are provided. "
"Choose thinking_mode based on code complexity: 'low' for simple functions, "
"'medium' for standard modules (default), 'high' for complex systems with many interactions, "
"'max' for critical systems requiring exhaustive test coverage. "
"Note: If you're not currently using a top-tier model such as Opus 4 or above, these tools can provide enhanced capabilities."
"Supports test pattern following when examples are provided. Choose thinking_mode based on "
"code complexity: 'low' for simple functions, 'medium' for standard modules (default), "
"'high' for complex systems with many interactions, 'max' for critical systems requiring "
"exhaustive test coverage. Note: If you're not currently using a top-tier model such as "
"Opus 4 or above, these tools can provide enhanced capabilities."
)
def get_input_schema(self) -> dict[str, Any]:
schema = {
"type": "object",
"properties": {
"files": {
"type": "array",
"items": {"type": "string"},
"description": TESTGEN_FIELD_DESCRIPTIONS["files"],
},
"model": self.get_model_field_schema(),
"prompt": {
"type": "string",
"description": TESTGEN_FIELD_DESCRIPTIONS["prompt"],
},
"test_examples": {
"type": "array",
"items": {"type": "string"},
"description": TESTGEN_FIELD_DESCRIPTIONS["test_examples"],
},
"thinking_mode": {
"type": "string",
"enum": ["minimal", "low", "medium", "high", "max"],
"description": "Thinking depth: minimal (0.5% of model max), low (8%), medium (33%), high (67%), max (100% of model max)",
},
"continuation_id": {
"type": "string",
"description": (
"Thread continuation ID for multi-turn conversations. Can be used to continue conversations "
"across different tools. Only provide this if continuing a previous conversation thread."
),
},
},
"required": ["files", "prompt"] + (["model"] if self.is_effective_auto_mode() else []),
}
return schema
def get_system_prompt(self) -> str:
return TESTGEN_PROMPT
def get_default_temperature(self) -> float:
return TEMPERATURE_ANALYTICAL
# Line numbers are enabled by default from base class for precise targeting
def get_model_category(self):
"""TestGen requires extended reasoning for comprehensive test analysis"""
def get_model_category(self) -> "ToolModelCategory":
"""Test generation requires thorough analysis and reasoning"""
from tools.models import ToolModelCategory
return ToolModelCategory.EXTENDED_REASONING
def get_request_model(self):
return TestGenerationRequest
def get_workflow_request_model(self):
"""Return the test generation workflow-specific request model."""
return TestGenRequest
def _process_test_examples(
self, test_examples: list[str], continuation_id: Optional[str], available_tokens: int = None
) -> tuple[str, str]:
"""
Process test example files using available token budget for optimal sampling.
def get_input_schema(self) -> dict[str, Any]:
"""Generate input schema using WorkflowSchemaBuilder with test generation-specific overrides."""
from .workflow.schema_builders import WorkflowSchemaBuilder
Args:
test_examples: List of test file paths
continuation_id: Continuation ID for filtering already embedded files
available_tokens: Available token budget for test examples
# Test generation workflow-specific field overrides
testgen_field_overrides = {
"step": {
"type": "string",
"description": TESTGEN_WORKFLOW_FIELD_DESCRIPTIONS["step"],
},
"step_number": {
"type": "integer",
"minimum": 1,
"description": TESTGEN_WORKFLOW_FIELD_DESCRIPTIONS["step_number"],
},
"total_steps": {
"type": "integer",
"minimum": 1,
"description": TESTGEN_WORKFLOW_FIELD_DESCRIPTIONS["total_steps"],
},
"next_step_required": {
"type": "boolean",
"description": TESTGEN_WORKFLOW_FIELD_DESCRIPTIONS["next_step_required"],
},
"findings": {
"type": "string",
"description": TESTGEN_WORKFLOW_FIELD_DESCRIPTIONS["findings"],
},
"files_checked": {
"type": "array",
"items": {"type": "string"},
"description": TESTGEN_WORKFLOW_FIELD_DESCRIPTIONS["files_checked"],
},
"relevant_files": {
"type": "array",
"items": {"type": "string"},
"description": TESTGEN_WORKFLOW_FIELD_DESCRIPTIONS["relevant_files"],
},
"confidence": {
"type": "string",
"enum": ["exploring", "low", "medium", "high", "certain"],
"description": TESTGEN_WORKFLOW_FIELD_DESCRIPTIONS["confidence"],
},
"backtrack_from_step": {
"type": "integer",
"minimum": 1,
"description": TESTGEN_WORKFLOW_FIELD_DESCRIPTIONS["backtrack_from_step"],
},
"images": {
"type": "array",
"items": {"type": "string"},
"description": TESTGEN_WORKFLOW_FIELD_DESCRIPTIONS["images"],
},
}
Returns:
tuple: (formatted_content, summary_note)
"""
logger.debug(f"[TESTGEN] Processing {len(test_examples)} test examples")
if not test_examples:
logger.debug("[TESTGEN] No test examples provided")
return "", ""
# Use existing file filtering to avoid duplicates in continuation
examples_to_process = self.filter_new_files(test_examples, continuation_id)
logger.debug(f"[TESTGEN] After filtering: {len(examples_to_process)} new test examples to process")
if not examples_to_process:
logger.info(f"[TESTGEN] All {len(test_examples)} test examples already in conversation history")
return "", ""
logger.debug(f"[TESTGEN] Processing {len(examples_to_process)} file paths")
# Calculate token budget for test examples (25% of available tokens, or fallback)
if available_tokens:
test_examples_budget = int(available_tokens * 0.25) # 25% for test examples
logger.debug(
f"[TESTGEN] Allocating {test_examples_budget:,} tokens (25% of {available_tokens:,}) for test examples"
)
else:
test_examples_budget = 30000 # Fallback if no budget provided
logger.debug(f"[TESTGEN] Using fallback budget of {test_examples_budget:,} tokens for test examples")
original_count = len(examples_to_process)
logger.debug(
f"[TESTGEN] Processing {original_count} test example files with {test_examples_budget:,} token budget"
# Use WorkflowSchemaBuilder with test generation-specific tool fields
return WorkflowSchemaBuilder.build_schema(
tool_specific_fields=testgen_field_overrides,
model_field_schema=self.get_model_field_schema(),
auto_mode=self.is_effective_auto_mode(),
tool_name=self.get_name(),
)
# Sort by file size (smallest first) for pattern-focused selection
file_sizes = []
for file_path in examples_to_process:
try:
size = os.path.getsize(file_path)
file_sizes.append((file_path, size))
logger.debug(f"[TESTGEN] Test example {os.path.basename(file_path)}: {size:,} bytes")
except (OSError, FileNotFoundError) as e:
# If we can't get size, put it at the end
logger.warning(f"[TESTGEN] Could not get size for {file_path}: {e}")
file_sizes.append((file_path, float("inf")))
# Sort by size and take smallest files for pattern reference
file_sizes.sort(key=lambda x: x[1])
examples_to_process = [f[0] for f in file_sizes] # All files, sorted by size
logger.debug(
f"[TESTGEN] Sorted test examples by size (smallest first): {[os.path.basename(f) for f in examples_to_process]}"
)
# Use standard file content preparation with dynamic token budget
try:
logger.debug(f"[TESTGEN] Preparing file content for {len(examples_to_process)} test examples")
content, processed_files = self._prepare_file_content_for_prompt(
examples_to_process,
continuation_id,
"Test examples",
max_tokens=test_examples_budget,
reserve_tokens=1000,
)
# Store processed files for tracking - test examples are tracked separately from main code files
# Determine how many files were actually included
if content:
from utils.token_utils import estimate_tokens
used_tokens = estimate_tokens(content)
logger.info(
f"[TESTGEN] Successfully embedded test examples: {used_tokens:,} tokens used ({test_examples_budget:,} available)"
)
if original_count > 1:
truncation_note = f"Note: Used {used_tokens:,} tokens ({test_examples_budget:,} available) for test examples from {original_count} files to determine testing patterns."
else:
truncation_note = ""
else:
logger.warning("[TESTGEN] No content generated for test examples")
truncation_note = ""
return content, truncation_note
except Exception as e:
# If test example processing fails, continue without examples rather than failing
logger.error(f"[TESTGEN] Failed to process test examples: {type(e).__name__}: {e}")
return "", f"Warning: Could not process test examples: {str(e)}"
async def prepare_prompt(self, request: TestGenerationRequest) -> str:
"""
Prepare the test generation prompt with code analysis and optional test examples.
This method reads the requested files, processes any test examples,
and constructs a detailed prompt for comprehensive test generation.
Args:
request: The validated test generation request
Returns:
str: Complete prompt for the model
Raises:
ValueError: If the code exceeds token limits
"""
logger.debug(f"[TESTGEN] Preparing prompt for {len(request.files)} code files")
if request.test_examples:
logger.debug(f"[TESTGEN] Including {len(request.test_examples)} test examples for pattern reference")
# Check for prompt.txt in files
prompt_content, updated_files = self.handle_prompt_file(request.files)
# If prompt.txt was found, incorporate it into the prompt
if prompt_content:
logger.debug("[TESTGEN] Found prompt.txt file, incorporating content")
request.prompt = prompt_content + "\n\n" + request.prompt
# Update request files list
if updated_files is not None:
logger.debug(f"[TESTGEN] Updated files list after prompt.txt processing: {len(updated_files)} files")
request.files = updated_files
# Check user input size at MCP transport boundary (before adding internal content)
user_content = request.prompt
size_check = self.check_prompt_size(user_content)
if size_check:
from tools.models import ToolOutput
raise ValueError(f"MCP_SIZE_CHECK:{ToolOutput(**size_check).model_dump_json()}")
# Calculate available token budget for dynamic allocation
continuation_id = getattr(request, "continuation_id", None)
# Get model context for token budget calculation
available_tokens = None
if hasattr(self, "_model_context") and self._model_context:
try:
capabilities = self._model_context.capabilities
# Use 75% of context for content (code + test examples), 25% for response
available_tokens = int(capabilities.context_window * 0.75)
logger.debug(
f"[TESTGEN] Token budget calculation: {available_tokens:,} tokens (75% of {capabilities.context_window:,}) for model {self._model_context.model_name}"
)
except Exception as e:
# Fallback to conservative estimate
logger.warning(f"[TESTGEN] Could not get model capabilities: {e}")
available_tokens = 120000 # Conservative fallback
logger.debug(f"[TESTGEN] Using fallback token budget: {available_tokens:,} tokens")
def get_required_actions(self, step_number: int, confidence: str, findings: str, total_steps: int) -> list[str]:
"""Define required actions for each investigation phase."""
if step_number == 1:
# Initial test generation investigation tasks
return [
"Read and understand the code files specified for test generation",
"Analyze the overall structure, public APIs, and main functionality",
"Identify critical business logic and complex algorithms that need testing",
"Look for existing test patterns or examples if provided",
"Understand dependencies, external interactions, and integration points",
"Note any potential testability issues or areas that might be hard to test",
]
elif confidence in ["exploring", "low"]:
# Need deeper investigation
return [
"Examine specific functions and methods to understand their behavior",
"Trace through code paths to identify all possible execution flows",
"Identify edge cases, boundary conditions, and error scenarios",
"Check for async operations, state management, and side effects",
"Look for non-deterministic behavior or external dependencies",
"Analyze error handling and exception cases that need testing",
]
elif confidence in ["medium", "high"]:
# Close to completion - need final verification
return [
"Verify all critical paths have been identified for testing",
"Confirm edge cases and boundary conditions are comprehensive",
"Check that test scenarios cover both success and failure cases",
"Ensure async behavior and concurrency issues are addressed",
"Validate that the testing strategy aligns with code complexity",
"Double-check that findings include actionable test scenarios",
]
else:
# No model context available (shouldn't happen in normal flow)
available_tokens = 120000 # Conservative fallback
logger.debug(f"[TESTGEN] No model context, using fallback token budget: {available_tokens:,} tokens")
# Process test examples first to determine token allocation
test_examples_content = ""
test_examples_note = ""
if request.test_examples:
logger.debug(f"[TESTGEN] Processing {len(request.test_examples)} test examples")
test_examples_content, test_examples_note = self._process_test_examples(
request.test_examples, continuation_id, available_tokens
)
if test_examples_content:
logger.info("[TESTGEN] Test examples processed successfully for pattern reference")
else:
logger.info("[TESTGEN] No test examples content after processing")
# Remove files that appear in both 'files' and 'test_examples' to avoid duplicate embedding
# Files in test_examples take precedence as they're used for pattern reference
code_files_to_process = request.files.copy()
if request.test_examples:
# Normalize paths for comparison (resolve any relative paths, handle case sensitivity)
test_example_set = {os.path.normpath(os.path.abspath(f)) for f in request.test_examples}
original_count = len(code_files_to_process)
code_files_to_process = [
f for f in code_files_to_process if os.path.normpath(os.path.abspath(f)) not in test_example_set
# General investigation needed
return [
"Continue examining the codebase for additional test scenarios",
"Gather more evidence about code behavior and dependencies",
"Test your assumptions about how the code should be tested",
"Look for patterns that confirm your testing strategy",
"Focus on areas that haven't been thoroughly examined yet",
]
duplicates_removed = original_count - len(code_files_to_process)
if duplicates_removed > 0:
logger.info(
f"[TESTGEN] Removed {duplicates_removed} duplicate files from code files list "
f"(already included in test examples for pattern reference)"
)
def should_call_expert_analysis(self, consolidated_findings, request=None) -> bool:
"""
Decide when to call external model based on investigation completeness.
# Calculate remaining tokens for main code after test examples
if test_examples_content and available_tokens:
from utils.token_utils import estimate_tokens
Always call expert analysis for test generation to get additional test ideas.
"""
# Check if user requested to skip assistant model
if request and not self.get_request_use_assistant_model(request):
return False
test_tokens = estimate_tokens(test_examples_content)
remaining_tokens = available_tokens - test_tokens - 5000 # Reserve for prompt structure
logger.debug(
f"[TESTGEN] Token allocation: {test_tokens:,} for examples, {remaining_tokens:,} remaining for code files"
# Always benefit from expert analysis for comprehensive test coverage
return len(consolidated_findings.relevant_files) > 0 or len(consolidated_findings.findings) >= 1
def prepare_expert_analysis_context(self, consolidated_findings) -> str:
"""Prepare context for external model call for test generation validation."""
context_parts = [
f"=== TEST GENERATION REQUEST ===\\n{self.initial_request or 'Test generation workflow initiated'}\\n=== END REQUEST ==="
]
# Add investigation summary
investigation_summary = self._build_test_generation_summary(consolidated_findings)
context_parts.append(
f"\\n=== CLAUDE'S TEST PLANNING INVESTIGATION ===\\n{investigation_summary}\\n=== END INVESTIGATION ==="
)
# Add relevant code elements if available
if consolidated_findings.relevant_context:
methods_text = "\\n".join(f"- {method}" for method in consolidated_findings.relevant_context)
context_parts.append(f"\\n=== CODE ELEMENTS TO TEST ===\\n{methods_text}\\n=== END CODE ELEMENTS ===")
# Add images if available
if consolidated_findings.images:
images_text = "\\n".join(f"- {img}" for img in consolidated_findings.images)
context_parts.append(f"\\n=== VISUAL DOCUMENTATION ===\\n{images_text}\\n=== END VISUAL DOCUMENTATION ===")
return "\\n".join(context_parts)
def _build_test_generation_summary(self, consolidated_findings) -> str:
"""Prepare a comprehensive summary of the test generation investigation."""
summary_parts = [
"=== SYSTEMATIC TEST GENERATION INVESTIGATION SUMMARY ===",
f"Total steps: {len(consolidated_findings.findings)}",
f"Files examined: {len(consolidated_findings.files_checked)}",
f"Relevant files identified: {len(consolidated_findings.relevant_files)}",
f"Code elements to test: {len(consolidated_findings.relevant_context)}",
"",
"=== INVESTIGATION PROGRESSION ===",
]
for finding in consolidated_findings.findings:
summary_parts.append(finding)
return "\\n".join(summary_parts)
def should_include_files_in_expert_prompt(self) -> bool:
"""Include files in expert analysis for comprehensive test generation."""
return True
def should_embed_system_prompt(self) -> bool:
"""Embed system prompt in expert analysis for proper context."""
return True
def get_expert_thinking_mode(self) -> str:
"""Use high thinking mode for thorough test generation analysis."""
return "high"
def get_expert_analysis_instruction(self) -> str:
"""Get specific instruction for test generation expert analysis."""
return (
"Please provide comprehensive test generation guidance based on the investigation findings. "
"Focus on identifying additional test scenarios, edge cases not yet covered, framework-specific "
"best practices, and providing concrete test implementation examples following the multi-agent "
"workflow specified in the system prompt."
)
# Hook method overrides for test generation-specific behavior
def prepare_step_data(self, request) -> dict:
"""
Map test generation-specific fields for internal processing.
"""
step_data = {
"step": request.step,
"step_number": request.step_number,
"findings": request.findings,
"files_checked": request.files_checked,
"relevant_files": request.relevant_files,
"relevant_context": request.relevant_context,
"confidence": request.confidence,
"images": request.images or [],
}
return step_data
def should_skip_expert_analysis(self, request, consolidated_findings) -> bool:
"""
Test generation workflow skips expert analysis when Claude has "certain" confidence.
"""
return request.confidence == "certain" and not request.next_step_required
def store_initial_issue(self, step_description: str):
"""Store initial request for expert analysis."""
self.initial_request = step_description
# Override inheritance hooks for test generation-specific behavior
def get_completion_status(self) -> str:
"""Test generation tools use test-specific status."""
return "test_generation_complete_ready_for_implementation"
def get_completion_data_key(self) -> str:
"""Test generation uses 'complete_test_generation' key."""
return "complete_test_generation"
def get_final_analysis_from_request(self, request):
"""Test generation tools use findings for final analysis."""
return request.findings
def get_confidence_level(self, request) -> str:
"""Test generation tools use 'certain' for high confidence."""
return "certain"
def get_completion_message(self) -> str:
"""Test generation-specific completion message."""
return (
"Test generation analysis complete with CERTAIN confidence. You have identified all test scenarios "
"and provided comprehensive coverage strategy. MANDATORY: Present the user with the complete test plan "
"and IMMEDIATELY proceed with creating the test files following the identified patterns and framework. "
"Focus on implementing concrete, runnable tests with proper assertions."
)
def get_skip_reason(self) -> str:
"""Test generation-specific skip reason."""
return "Claude completed comprehensive test planning with full confidence"
def get_skip_expert_analysis_status(self) -> str:
"""Test generation-specific expert analysis skip status."""
return "skipped_due_to_certain_test_confidence"
def prepare_work_summary(self) -> str:
"""Test generation-specific work summary."""
return self._build_test_generation_summary(self.consolidated_findings)
def get_completion_next_steps_message(self, expert_analysis_used: bool = False) -> str:
"""
Test generation-specific completion message.
"""
base_message = (
"TEST GENERATION ANALYSIS IS COMPLETE. You MUST now implement ALL identified test scenarios, "
"creating comprehensive test files that cover happy paths, edge cases, error conditions, and "
"boundary scenarios. Organize tests by functionality, use appropriate assertions, and follow "
"the identified framework patterns. Provide concrete, executable test code—make it easy for "
"a developer to run the tests and understand what each test validates."
)
# Add expert analysis guidance only when expert analysis was actually used
if expert_analysis_used:
expert_guidance = self.get_expert_analysis_guidance()
if expert_guidance:
return f"{base_message}\\n\\n{expert_guidance}"
return base_message
def get_expert_analysis_guidance(self) -> str:
"""
Provide specific guidance for handling expert analysis in test generation.
"""
return (
"IMPORTANT: Additional test scenarios and edge cases have been provided by the expert analysis above. "
"You MUST incorporate these suggestions into your test implementation, ensuring comprehensive coverage. "
"Validate that the expert's test ideas are practical and align with the codebase structure. Combine "
"your systematic investigation findings with the expert's additional scenarios to create a thorough "
"test suite that catches real-world bugs before they reach production."
)
def get_step_guidance_message(self, request) -> str:
"""
Test generation-specific step guidance with detailed investigation instructions.
"""
step_guidance = self.get_test_generation_step_guidance(request.step_number, request.confidence, request)
return step_guidance["next_steps"]
def get_test_generation_step_guidance(self, step_number: int, confidence: str, request) -> dict[str, Any]:
"""
Provide step-specific guidance for test generation workflow.
"""
# Generate the next steps instruction based on required actions
required_actions = self.get_required_actions(step_number, confidence, request.findings, request.total_steps)
if step_number == 1:
next_steps = (
f"MANDATORY: DO NOT call the {self.get_name()} tool again immediately. You MUST first analyze "
f"the code thoroughly using appropriate tools. CRITICAL AWARENESS: You need to understand "
f"the code structure, identify testable behaviors, find edge cases and boundary conditions, "
f"and determine the appropriate testing strategy. Use file reading tools, code analysis, and "
f"systematic examination to gather comprehensive information about what needs to be tested. "
f"Only call {self.get_name()} again AFTER completing your investigation. When you call "
f"{self.get_name()} next time, use step_number: {step_number + 1} and report specific "
f"code paths examined, test scenarios identified, and testing patterns discovered."
)
elif confidence in ["exploring", "low"]:
next_steps = (
f"STOP! Do NOT call {self.get_name()} again yet. Based on your findings, you've identified areas that need "
f"deeper analysis for test generation. MANDATORY ACTIONS before calling {self.get_name()} step {step_number + 1}:\\n"
+ "\\n".join(f"{i+1}. {action}" for i, action in enumerate(required_actions))
+ f"\\n\\nOnly call {self.get_name()} again with step_number: {step_number + 1} AFTER "
+ "completing these test planning tasks."
)
elif confidence in ["medium", "high"]:
next_steps = (
f"WAIT! Your test generation analysis needs final verification. DO NOT call {self.get_name()} immediately. REQUIRED ACTIONS:\\n"
+ "\\n".join(f"{i+1}. {action}" for i, action in enumerate(required_actions))
+ f"\\n\\nREMEMBER: Ensure you have identified all test scenarios including edge cases and error conditions. "
f"Document findings with specific test cases to implement, then call {self.get_name()} "
f"with step_number: {step_number + 1}."
)
else:
remaining_tokens = available_tokens - 10000 if available_tokens else None
if remaining_tokens:
logger.debug(
f"[TESTGEN] Token allocation: {remaining_tokens:,} tokens available for code files (no test examples)"
)
# Use centralized file processing logic for main code files (after deduplication)
logger.debug(f"[TESTGEN] Preparing {len(code_files_to_process)} code files for analysis")
code_content, processed_files = self._prepare_file_content_for_prompt(
code_files_to_process, continuation_id, "Code to test", max_tokens=remaining_tokens, reserve_tokens=2000
)
self._actually_processed_files = processed_files
if code_content:
from utils.token_utils import estimate_tokens
code_tokens = estimate_tokens(code_content)
logger.info(f"[TESTGEN] Code files embedded successfully: {code_tokens:,} tokens")
else:
logger.warning("[TESTGEN] No code content after file processing")
# Test generation is based on code analysis, no web search needed
logger.debug("[TESTGEN] Building complete test generation prompt")
# Build the complete prompt
prompt_parts = []
# Add system prompt
prompt_parts.append(self.get_system_prompt())
# Add user context
prompt_parts.append("=== USER CONTEXT ===")
prompt_parts.append(request.prompt)
prompt_parts.append("=== END CONTEXT ===")
# Add test examples if provided
if test_examples_content:
prompt_parts.append("\n=== TEST EXAMPLES FOR STYLE REFERENCE ===")
if test_examples_note:
prompt_parts.append(f"// {test_examples_note}")
prompt_parts.append(test_examples_content)
prompt_parts.append("=== END TEST EXAMPLES ===")
# Add main code to test
prompt_parts.append("\n=== CODE TO TEST ===")
prompt_parts.append(code_content)
prompt_parts.append("=== END CODE ===")
# Add generation instructions
prompt_parts.append(
"\nPlease analyze the code and generate comprehensive tests following the multi-agent workflow specified in the system prompt."
)
if test_examples_content:
prompt_parts.append(
"Use the provided test examples as a reference for style, framework, and testing patterns."
next_steps = (
f"PAUSE ANALYSIS. Before calling {self.get_name()} step {step_number + 1}, you MUST examine more code thoroughly. "
+ "Required: "
+ ", ".join(required_actions[:2])
+ ". "
+ f"Your next {self.get_name()} call (step_number: {step_number + 1}) must include "
f"NEW test scenarios from actual code analysis, not just theories. NO recursive {self.get_name()} calls "
f"without investigation work!"
)
full_prompt = "\n".join(prompt_parts)
return {"next_steps": next_steps}
# Log final prompt statistics
from utils.token_utils import estimate_tokens
total_tokens = estimate_tokens(full_prompt)
logger.info(f"[TESTGEN] Complete prompt prepared: {total_tokens:,} tokens, {len(full_prompt):,} characters")
return full_prompt
def format_response(self, response: str, request: TestGenerationRequest, model_info: Optional[dict] = None) -> str:
def customize_workflow_response(self, response_data: dict, request) -> dict:
"""
Format the test generation response.
Args:
response: The raw test generation from the model
request: The original request for context
model_info: Optional dict with model metadata
Returns:
str: Formatted response with next steps
Customize response to match test generation workflow format.
"""
return f"""{response}
# Store initial request on first step
if request.step_number == 1:
self.initial_request = request.step
---
# Convert generic status names to test generation-specific ones
tool_name = self.get_name()
status_mapping = {
f"{tool_name}_in_progress": "test_generation_in_progress",
f"pause_for_{tool_name}": "pause_for_test_analysis",
f"{tool_name}_required": "test_analysis_required",
f"{tool_name}_complete": "test_generation_complete",
}
Claude, you are now in EXECUTION MODE. Take immediate action:
if response_data["status"] in status_mapping:
response_data["status"] = status_mapping[response_data["status"]]
## Step 1: THINK & CREATE TESTS
ULTRATHINK while creating these in order to verify that every code reference, import, function name, and logic path is
100% accurate before saving.
# Rename status field to match test generation workflow
if f"{tool_name}_status" in response_data:
response_data["test_generation_status"] = response_data.pop(f"{tool_name}_status")
# Add test generation-specific status fields
response_data["test_generation_status"]["test_scenarios_identified"] = len(
self.consolidated_findings.relevant_context
)
response_data["test_generation_status"]["analysis_confidence"] = self.get_request_confidence(request)
- CREATE all test files in the correct project structure
- SAVE each test using proper naming conventions
- VALIDATE all imports, references, and dependencies are correct as required by the current framework / project / file
# Map complete_testgen to complete_test_generation
if f"complete_{tool_name}" in response_data:
response_data["complete_test_generation"] = response_data.pop(f"complete_{tool_name}")
## Step 2: DISPLAY RESULTS TO USER
After creating each test file, MUST show the user:
```
✅ Created: path/to/test_file.py
- test_function_name(): Brief description of what it tests
- test_another_function(): Brief description
- [Total: X test functions]
```
# Map the completion flag to match test generation workflow
if f"{tool_name}_complete" in response_data:
response_data["test_generation_complete"] = response_data.pop(f"{tool_name}_complete")
## Step 3: VALIDATE BY EXECUTION
CRITICAL: Run the tests immediately to confirm they work:
- Install any missing dependencies first or request user to perform step if this cannot be automated
- Execute the test suite
- Fix any failures or errors
- Confirm 100% pass rate. If there's a failure, re-iterate, go over each test, validate and understand why it's failing
return response_data
## Step 4: INTEGRATION VERIFICATION
- Verify tests integrate with existing test infrastructure
- Confirm test discovery works
- Validate test naming and organization
# Required abstract methods from BaseTool
def get_request_model(self):
"""Return the test generation workflow-specific request model."""
return TestGenRequest
## Step 5: MOVE TO NEXT ACTION
Once tests are confirmed working, immediately proceed to the next logical step for the project.
MANDATORY: Do NOT stop after generating - you MUST create, validate, run, and confirm the tests work and all of the
steps listed above are carried out correctly. Take full ownership of the testing implementation and move to your
next work. If you were supplied a more_work_required request in the response above, you MUST honor it."""
async def prepare_prompt(self, request) -> str:
"""Not used - workflow tools use execute_workflow()."""
return "" # Workflow tools use execute_workflow() directly