New tool: testgen
Generates unit tests and encourages the model to auto-detect the test framework and testing style from existing samples (if available).
README.md (30 lines changed)

@@ -49,6 +49,7 @@ and review into consideration to aid with its pre-commit review.
- [`precommit`](#4-precommit---pre-commit-validation) - Pre-commit validation
- [`debug`](#5-debug---expert-debugging-assistant) - Debugging help
- [`analyze`](#6-analyze---smart-file-analysis) - File analysis
- [`testgen`](#7-testgen---comprehensive-test-generation) - Test generation with edge cases

- **Advanced Usage**
  - [Advanced Features](#advanced-features) - AI-to-AI conversations, large prompts, web search

@@ -254,6 +255,7 @@ Just ask Claude naturally:
- **Pre-commit validation?** → `precommit` (validate git changes before committing)
- **Something's broken?** → `debug` (root cause analysis, error tracing)
- **Want to understand code?** → `analyze` (architecture, patterns, dependencies)
- **Need comprehensive tests?** → `testgen` (generates test suites with edge cases)
- **Server info?** → `get_version` (version and configuration details)

**Auto Mode:** When `DEFAULT_MODEL=auto`, Claude automatically picks the best model for each task. You can override with: "Use flash for quick analysis" or "Use o3 to debug this".

@@ -274,7 +276,8 @@ Just ask Claude naturally:
4. [`precommit`](#4-precommit---pre-commit-validation) - Validate git changes before committing
5. [`debug`](#5-debug---expert-debugging-assistant) - Root cause analysis and debugging
6. [`analyze`](#6-analyze---smart-file-analysis) - General-purpose file and code analysis
7. [`get_version`](#7-get_version---server-information) - Get server version and configuration
7. [`testgen`](#7-testgen---comprehensive-test-generation) - Comprehensive test generation with edge case coverage
8. [`get_version`](#8-get_version---server-information) - Get server version and configuration

### 1. `chat` - General Development Chat & Collaborative Thinking

**Your thinking partner - bounce ideas, get second opinions, brainstorm collaboratively**

@@ -421,7 +424,30 @@ Use zen and perform a thorough precommit ensuring there aren't any new regressio
- Uses file paths (not content) for clean terminal output
- Can identify patterns, anti-patterns, and refactoring opportunities
- **Web search capability**: When enabled with `use_websearch` (default: true), the model can request Claude to perform web searches and share results back to enhance analysis with current documentation, design patterns, and best practices

### 7. `get_version` - Server Information
### 7. `testgen` - Comprehensive Test Generation

**Generates thorough test suites with edge case coverage** based on existing code and the test framework in use.

**Thinking Mode (Extended thinking models):** Default is `medium` (8,192 tokens). Use `high` for complex systems with many interactions or `max` for critical systems requiring exhaustive test coverage.

#### Example Prompts:

**Basic Usage:**
```
"Use zen to generate tests for User.login() method"
"Generate comprehensive tests for the sorting method in src/new_sort.py using o3"
"Create tests for edge cases not already covered in our tests using gemini pro"
```

**Key Features:**
- Multi-agent workflow analyzing code paths and identifying realistic failure modes
- Generates framework-specific tests following project conventions
- Supports test pattern following when examples are provided
- Dynamic token allocation (25% for test examples, 75% for main code) - see the sketch after this list
- Prioritizes smallest test files for pattern detection
- Can reference existing test files: `"Generate tests following patterns from tests/unit/"`
- Specific code coverage - target specific functions/classes rather than testing everything
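
The dynamic token allocation mentioned above works roughly as follows. This is a simplified sketch of the logic added in `tools/testgen.py` in this commit; the numbers are illustrative, not real measurements:

```
# Illustrative sketch of testgen's token budgeting (values are made up)
context_window = 200_000                    # model context size reported by the provider
available = int(context_window * 0.75)      # 75% of the context is reserved for prompt content
example_budget = int(available * 0.25)      # test examples may use at most 25% of that
example_tokens_used = 12_000                # whatever the sampled example files actually consume
code_budget = available - example_tokens_used - 5_000  # remainder (minus a reserve) for the code under test
print(available, example_budget, code_budget)  # 150000 37500 133000
```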

### 8. `get_version` - Server Information
```
"Get zen to show its version"
```

@@ -14,7 +14,7 @@ import os
# These values are used in server responses and for tracking releases
# IMPORTANT: This is the single source of truth for version and author info
# Semantic versioning: MAJOR.MINOR.PATCH
__version__ = "4.3.3"
__version__ = "4.4.0"
# Last update date in ISO format
__updated__ = "2025-06-14"
# Primary maintainer

@@ -245,6 +245,20 @@ All tools that work with files support **both individual files and entire direct
"Use o3 to think deeper about the logical flow in this algorithm"
```

**`testgen`** - Comprehensive test generation with edge case coverage
- `files`: Code files or directories to generate tests for (required)
- `prompt`: Description of what to test, testing objectives, and scope (required)
- `model`: auto|pro|flash|o3|o3-mini|o4-mini|o4-mini-high (default: server default)
- `test_examples`: Optional existing test files as style/pattern reference
- `thinking_mode`: minimal|low|medium|high|max (default: medium, Gemini only)

```
"Generate tests for User.login() method with edge cases" (auto mode picks best model)
"Use pro to generate comprehensive tests for src/payment.py with max thinking mode"
"Use o3 to generate tests for algorithm correctness in sort_functions.py"
"Generate tests following patterns from tests/unit/ for new auth module"
```
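
For reference, natural-language prompts like the ones above map onto a `testgen` tool call whose arguments look roughly like this. This is a hedged sketch based on the tool's input schema added in this commit; the file paths are placeholders and must be absolute:

```
# Approximate shape of a "testgen" MCP tool call (paths are placeholders)
arguments = {
    "files": ["/abs/path/to/src/payment.py"],        # required: code to generate tests for
    "prompt": "Test payment validation edge cases",  # required: objectives and scope
    "model": "pro",                                  # optional unless DEFAULT_MODEL=auto
    "test_examples": ["/abs/path/to/tests/unit/test_billing.py"],  # optional style reference
    "thinking_mode": "high",                         # optional: minimal|low|medium|high|max
}
```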

## Collaborative Workflows

### Design → Review → Implement

@@ -277,13 +291,15 @@ To help choose the right tool for your needs:
1. **Have a specific error/exception?** → Use `debug`
2. **Want to find bugs/issues in code?** → Use `codereview`
3. **Want to understand how code works?** → Use `analyze`
4. **Have analysis that needs extension/validation?** → Use `thinkdeep`
5. **Want to brainstorm or discuss?** → Use `chat`
4. **Need comprehensive test coverage?** → Use `testgen`
5. **Have analysis that needs extension/validation?** → Use `thinkdeep`
6. **Want to brainstorm or discuss?** → Use `chat`

**Key Distinctions:**
- `analyze` vs `codereview`: analyze explains, codereview prescribes fixes
- `chat` vs `thinkdeep`: chat is open-ended, thinkdeep extends specific analysis
- `debug` vs `codereview`: debug diagnoses runtime errors, review finds static issues
- `testgen` vs `debug`: testgen creates test suites, debug finds issues and recommends fixes

## Working with Large Prompts

@@ -44,6 +44,7 @@ from tools import (
    CodeReviewTool,
    DebugIssueTool,
    Precommit,
    TestGenTool,
    ThinkDeepTool,
)
from tools.models import ToolOutput

@@ -144,6 +145,7 @@ TOOLS = {
    "analyze": AnalyzeTool(),  # General-purpose file and code analysis
    "chat": ChatTool(),  # Interactive development chat and brainstorming
    "precommit": Precommit(),  # Pre-commit validation of git changes
    "testgen": TestGenTool(),  # Comprehensive test generation with edge case coverage
}

@@ -19,6 +19,7 @@ from .test_openrouter_fallback import OpenRouterFallbackTest
from .test_openrouter_models import OpenRouterModelsTest
from .test_per_tool_deduplication import PerToolDeduplicationTest
from .test_redis_validation import RedisValidationTest
from .test_testgen_validation import TestGenValidationTest
from .test_token_allocation_validation import TokenAllocationValidationTest

# Test registry for dynamic loading

@@ -36,6 +37,7 @@ TEST_REGISTRY = {
    "openrouter_fallback": OpenRouterFallbackTest,
    "openrouter_models": OpenRouterModelsTest,
    "token_allocation_validation": TokenAllocationValidationTest,
    "testgen_validation": TestGenValidationTest,
    "conversation_chain_validation": ConversationChainValidationTest,
}

@@ -54,6 +56,7 @@ __all__ = [
    "OpenRouterFallbackTest",
    "OpenRouterModelsTest",
    "TokenAllocationValidationTest",
    "TestGenValidationTest",
    "ConversationChainValidationTest",
    "TEST_REGISTRY",
]

simulator_tests/test_testgen_validation.py (new file, 131 lines)

@@ -0,0 +1,131 @@
#!/usr/bin/env python3
"""
TestGen Tool Validation Test

Tests the testgen tool by:
- Creating a test code file with a specific function
- Using testgen to generate tests with a specific function name
- Validating that the output contains the expected test function
- Confirming the format matches test generation patterns
"""

from .base_test import BaseSimulatorTest


class TestGenValidationTest(BaseSimulatorTest):
    """Test testgen tool validation with specific function name"""

    @property
    def test_name(self) -> str:
        return "testgen_validation"

    @property
    def test_description(self) -> str:
        return "TestGen tool validation with specific test function"

    def run_test(self) -> bool:
        """Test testgen tool with specific function name validation"""
        try:
            self.logger.info("Test: TestGen tool validation")

            # Setup test files
            self.setup_test_files()

            # Create a specific code file for test generation
            test_code_content = '''"""
Sample authentication module for testing testgen
"""


class UserAuthenticator:
    """Handles user authentication logic"""

    def __init__(self):
        self.failed_attempts = {}
        self.max_attempts = 3

    def validate_password(self, username, password):
        """Validate user password with security checks"""
        if not username or not password:
            return False

        if username in self.failed_attempts:
            if self.failed_attempts[username] >= self.max_attempts:
                return False  # Account locked

        # Simple validation for demo
        if len(password) < 8:
            self._record_failed_attempt(username)
            return False

        if password == "password123":  # Demo valid password
            self._reset_failed_attempts(username)
            return True

        self._record_failed_attempt(username)
        return False

    def _record_failed_attempt(self, username):
        """Record a failed login attempt"""
        self.failed_attempts[username] = self.failed_attempts.get(username, 0) + 1

    def _reset_failed_attempts(self, username):
        """Reset failed attempts after successful login"""
        if username in self.failed_attempts:
            del self.failed_attempts[username]
'''

            # Create the auth code file
            auth_file = self.create_additional_test_file("user_auth.py", test_code_content)

            # Test testgen tool with specific requirements
            self.logger.info(" 1.1: Generate tests with specific function name")
            response, continuation_id = self.call_mcp_tool(
                "testgen",
                {
                    "files": [auth_file],
                    "prompt": "Generate comprehensive tests for the UserAuthenticator.validate_password method. Include tests for edge cases, security scenarios, and account locking. Use the specific test function name 'test_password_validation_edge_cases' for one of the test methods.",
                    "model": "flash",
                },
            )

            if not response:
                self.logger.error("Failed to get testgen response")
                return False

            self.logger.info(" 1.2: Validate response contains expected test function")

            # Check that the response contains the specific test function name
            if "test_password_validation_edge_cases" not in response:
                self.logger.error("Response does not contain the requested test function name")
                self.logger.debug(f"Response content: {response[:500]}...")
                return False

            # Check for common test patterns
            test_patterns = [
                "def test_",  # Test function definition
                "assert",  # Assertion statements
                "UserAuthenticator",  # Class being tested
                "validate_password",  # Method being tested
            ]

            missing_patterns = []
            for pattern in test_patterns:
                if pattern not in response:
                    missing_patterns.append(pattern)

            if missing_patterns:
                self.logger.error(f"Response missing expected test patterns: {missing_patterns}")
                self.logger.debug(f"Response content: {response[:500]}...")
                return False

            self.logger.info(" ✅ TestGen tool validation successful")
            self.logger.info(" ✅ Generated tests contain expected function name")
            self.logger.info(" ✅ Generated tests follow proper test patterns")

            return True

        except Exception as e:
            self.logger.error(f"TestGen validation test failed: {e}")
            return False
        finally:
            self.cleanup_test_files()

@@ -7,6 +7,7 @@ from .chat_prompt import CHAT_PROMPT
from .codereview_prompt import CODEREVIEW_PROMPT
from .debug_prompt import DEBUG_ISSUE_PROMPT
from .precommit_prompt import PRECOMMIT_PROMPT
from .testgen_prompt import TESTGEN_PROMPT
from .thinkdeep_prompt import THINKDEEP_PROMPT

__all__ = [

@@ -16,4 +17,5 @@ __all__ = [
    "ANALYZE_PROMPT",
    "CHAT_PROMPT",
    "PRECOMMIT_PROMPT",
    "TESTGEN_PROMPT",
]

systemprompts/testgen_prompt.py (new file, 100 lines)

@@ -0,0 +1,100 @@
"""
TestGen tool system prompt
"""

TESTGEN_PROMPT = """
ROLE
You are a principal software engineer who specialises in writing bullet-proof production code **and** surgical,
high-signal test suites. You reason about control flow, data flow, mutation, concurrency, failure modes, and security
in equal measure. Your mission: design and write tests that surface real-world defects before code ever leaves CI.

IF MORE INFORMATION IS NEEDED
If you need additional context (e.g., test framework details, dependencies, existing test patterns) to provide
accurate test generation, you MUST respond ONLY with this JSON format (and nothing else). Do NOT ask for the
same file you've been provided unless for some reason its content is missing or incomplete:
{"status": "clarification_required", "question": "<your brief question>",
"files_needed": ["[file name here]", "[or some folder/]"]}

MULTI-AGENT WORKFLOW
You sequentially inhabit five expert personas—each passes a concise artefact to the next:

1. **Context Profiler** – derives language(s), test framework(s), build tooling, domain constraints, and existing
test idioms from the code snapshot provided.
2. **Path Analyzer** – builds a map of reachable code paths (happy, error, exceptional) plus any external interactions
that are directly involved (network, DB, file-system, IPC).
3. **Adversarial Thinker** – enumerates realistic failures, boundary conditions, race conditions, and misuse patterns
that historically break similar systems.
4. **Risk Prioritizer** – ranks findings by production impact and likelihood; discards speculative or out-of-scope cases.
5. **Test Scaffolder** – produces deterministic, isolated tests that follow the *project's* conventions (assert style,
fixture layout, naming, any mocking strategy, language and tooling, etc.).

TEST-GENERATION STRATEGY
- Start from public API / interface boundaries, then walk inward to critical private helpers.
- Analyze function signatures, parameters, return types, and side effects.
- Map all code paths, including happy paths and error conditions.
- Test behaviour, not implementation details, unless white-box inspection is required to reach untestable paths.
- Include both positive and negative test cases.
- Prefer property-based or table-driven tests where inputs form simple algebraic domains.
- Stub or fake **only** the minimal surface area needed; prefer in-memory fakes over mocks when feasible.
- Flag any code that cannot be tested deterministically and suggest realistic refactors (seams, dependency injection,
pure functions).
- Surface concurrency hazards with stress or fuzz tests when the language/runtime supports them.
- Focus on realistic failure modes that actually occur in production.
- Remain within the scope of the language, framework, and project. Do not over-step. Do not add unnecessary dependencies.

EDGE-CASE TAXONOMY (REAL-WORLD, HIGH-VALUE)
- **Data Shape Issues**: `null` / `undefined`, zero-length, surrogate-pair emojis, malformed UTF-8, mixed EOLs.
- **Numeric Boundaries**: −1, 0, 1, `MAX_…`, floating-point rounding, 64-bit truncation.
- **Temporal Pitfalls**: DST shifts, leap seconds, 29 Feb, Unix epoch 2038, timezone conversions.
- **Collections & Iteration**: off-by-one, concurrent modification, empty vs singleton vs large (>10⁶ items).
- **State & Sequence**: API calls out of order, idempotency violations, replay attacks.
- **External Dependencies**: slow responses, 5xx, malformed JSON/XML, TLS errors, retry storms, cancelled promises.
- **Concurrency / Async**: race conditions, deadlocks, promise rejection leaks, thread starvation.
- **Resource Exhaustion**: memory spikes, file-descriptor leaks, connection-pool saturation.
- **Locale & Encoding**: RTL scripts, uncommon locales, locale-specific formatting.
- **Security Surfaces**: injection (SQL, shell, LDAP), path traversal, privilege escalation on shared state.

TEST QUALITY PRINCIPLES
- Clear Arrange-Act-Assert sections (or given/when/then per project style), while retaining and applying project,
language, and framework norms and best practices.
- One behavioural assertion per test unless grouping is conventional.
- Fast: sub-100 ms per unit test; parallelisable; no remote calls.
- Deterministic: seeded randomness only; fixed, stable clocks when time matters.
- Self-documenting: names read like specs; failures explain *why*, not just *what*.

FRAMEWORK SELECTION
Always autodetect from the repository. When a test framework or existing tests are not found, detect from existing
code; examples:
- **Swift / Objective-C** → XCTest (Xcode default) or Swift Testing (Apple-provided frameworks).
- **C# / .NET** → xUnit.net preferred; fall back to NUnit or MSTest if they dominate the repo.
- **C / C++** → GoogleTest (gtest/gmock) or Catch2, matching existing tooling.
- **JS/TS** → Jest, Vitest, Mocha, or project-specific wrapper.
- **Python** → pytest, unittest.
- **Java/Kotlin** → JUnit 5, TestNG.
- **Go** → built-in `testing`, `testify`.
- **Rust** → `#[test]`, `proptest`.
- **Anything Else** → follow existing conventions; never introduce a new framework without strong justification.

IF FRAMEWORK SELECTION FAILS
If you are unable to confidently determine which framework to use based on the existing test samples supplied, or if
additional test samples would help in making a final decision, you MUST respond ONLY with this JSON
format (and nothing else). Do NOT ask for the same file you've been provided unless for some reason its content
is missing or incomplete:
{"status": "test_sample_needed", "reason": "<brief reason why additional sampling is required>"}

SCOPE CONTROL
Stay strictly within the presented codebase, tech stack, and domain.
Do **not** invent features, frameworks, or speculative integrations.
Do **not** write tests for functions or classes that do not exist.
If a test idea falls outside project scope, discard it.
If a test would be a "good to have" but seems impossible given the current structure or setup of the project, highlight
it but do not pursue it or offer refactoring ideas.

DELIVERABLE
Return only the artefacts (analysis summary, coverage plan, and generated tests) that fit the detected framework
and code / project layout.
No extra commentary, no generic boilerplate.
Comment and document the logic and the reason / hypothesis behind each test in the delivered code.

Remember: your value is catching the hard bugs—not inflating coverage numbers.
"""

@@ -26,10 +26,11 @@ class TestServerTools:
        assert "analyze" in tool_names
        assert "chat" in tool_names
        assert "precommit" in tool_names
        assert "testgen" in tool_names
        assert "get_version" in tool_names

        # Should have exactly 7 tools
        assert len(tools) == 7
        # Should have exactly 8 tools (including testgen)
        assert len(tools) == 8

        # Check descriptions are verbose
        for tool in tools:

tests/test_testgen.py (new file, 381 lines)

@@ -0,0 +1,381 @@
"""
Tests for TestGen tool implementation
"""

import json
import tempfile
from pathlib import Path
from unittest.mock import Mock, patch

import pytest

from tests.mock_helpers import create_mock_provider
from tools.testgen import TestGenRequest, TestGenTool


class TestTestGenTool:
    """Test the TestGen tool"""

    @pytest.fixture
    def tool(self):
        return TestGenTool()

    @pytest.fixture
    def temp_files(self):
        """Create temporary test files"""
        with tempfile.TemporaryDirectory() as temp_dir:
            temp_path = Path(temp_dir)

            # Create sample code files
            code_file = temp_path / "calculator.py"
            code_file.write_text(
                """
def add(a, b):
    '''Add two numbers'''
    return a + b

def divide(a, b):
    '''Divide two numbers'''
    if b == 0:
        raise ValueError("Cannot divide by zero")
    return a / b
"""
            )

            # Create sample test files (different sizes)
            small_test = temp_path / "test_small.py"
            small_test.write_text(
                """
import unittest

class TestBasic(unittest.TestCase):
    def test_simple(self):
        self.assertEqual(1 + 1, 2)
"""
            )

            large_test = temp_path / "test_large.py"
            large_test.write_text(
                """
import unittest
from unittest.mock import Mock, patch

class TestComprehensive(unittest.TestCase):
    def setUp(self):
        self.mock_data = Mock()

    def test_feature_one(self):
        # Comprehensive test with lots of setup
        result = self.process_data()
        self.assertIsNotNone(result)

    def test_feature_two(self):
        # Another comprehensive test
        with patch('some.module') as mock_module:
            mock_module.return_value = 'test'
            result = self.process_data()
            self.assertEqual(result, 'expected')

    def process_data(self):
        return "test_result"
"""
            )

            yield {
                "temp_dir": temp_dir,
                "code_file": str(code_file),
                "small_test": str(small_test),
                "large_test": str(large_test),
            }

    def test_tool_metadata(self, tool):
        """Test tool metadata"""
        assert tool.get_name() == "testgen"
        assert "COMPREHENSIVE TEST GENERATION" in tool.get_description()
        assert "BE SPECIFIC about scope" in tool.get_description()
        assert tool.get_default_temperature() == 0.2  # Analytical temperature

        # Check model category
        from tools.models import ToolModelCategory

        assert tool.get_model_category() == ToolModelCategory.EXTENDED_REASONING

    def test_input_schema_structure(self, tool):
        """Test input schema structure"""
        schema = tool.get_input_schema()

        # Required fields
        assert "files" in schema["properties"]
        assert "prompt" in schema["properties"]
        assert "files" in schema["required"]
        assert "prompt" in schema["required"]

        # Optional fields
        assert "test_examples" in schema["properties"]
        assert "thinking_mode" in schema["properties"]
        assert "continuation_id" in schema["properties"]

        # Should not have temperature or use_websearch
        assert "temperature" not in schema["properties"]
        assert "use_websearch" not in schema["properties"]

        # Check test_examples description
        test_examples_desc = schema["properties"]["test_examples"]["description"]
        assert "absolute paths" in test_examples_desc
        assert "smallest representative tests" in test_examples_desc

    def test_request_model_validation(self):
        """Test request model validation"""
        # Valid request
        valid_request = TestGenRequest(files=["/tmp/test.py"], prompt="Generate tests for calculator functions")
        assert valid_request.files == ["/tmp/test.py"]
        assert valid_request.prompt == "Generate tests for calculator functions"
        assert valid_request.test_examples is None

        # With test examples
        request_with_examples = TestGenRequest(
            files=["/tmp/test.py"], prompt="Generate tests", test_examples=["/tmp/test_example.py"]
        )
        assert request_with_examples.test_examples == ["/tmp/test_example.py"]

        # Invalid request (missing required fields)
        with pytest.raises(ValueError):
            TestGenRequest(files=["/tmp/test.py"])  # Missing prompt

    @pytest.mark.asyncio
    @patch("tools.base.BaseTool.get_model_provider")
    async def test_execute_success(self, mock_get_provider, tool, temp_files):
        """Test successful execution"""
        # Mock provider
        mock_provider = create_mock_provider()
        mock_provider.get_provider_type.return_value = Mock(value="google")
        mock_provider.generate_content.return_value = Mock(
            content="Generated comprehensive test suite with edge cases",
            usage={"input_tokens": 100, "output_tokens": 200},
            model_name="gemini-2.5-flash-preview-05-20",
            metadata={"finish_reason": "STOP"},
        )
        mock_get_provider.return_value = mock_provider

        result = await tool.execute(
            {"files": [temp_files["code_file"]], "prompt": "Generate comprehensive tests for the calculator functions"}
        )

        # Verify result structure
        assert len(result) == 1
        response_data = json.loads(result[0].text)
        assert response_data["status"] == "success"
        assert "Generated comprehensive test suite" in response_data["content"]

    @pytest.mark.asyncio
    @patch("tools.base.BaseTool.get_model_provider")
    async def test_execute_with_test_examples(self, mock_get_provider, tool, temp_files):
        """Test execution with test examples"""
        mock_provider = create_mock_provider()
        mock_provider.generate_content.return_value = Mock(
            content="Generated tests following the provided examples",
            usage={"input_tokens": 150, "output_tokens": 250},
            model_name="gemini-2.5-flash-preview-05-20",
            metadata={"finish_reason": "STOP"},
        )
        mock_get_provider.return_value = mock_provider

        result = await tool.execute(
            {
                "files": [temp_files["code_file"]],
                "prompt": "Generate tests following existing patterns",
                "test_examples": [temp_files["small_test"]],
            }
        )

        # Verify result
        assert len(result) == 1
        response_data = json.loads(result[0].text)
        assert response_data["status"] == "success"

    def test_process_test_examples_empty(self, tool):
        """Test processing empty test examples"""
        content, note = tool._process_test_examples([], None)
        assert content == ""
        assert note == ""

    def test_process_test_examples_budget_allocation(self, tool, temp_files):
        """Test token budget allocation for test examples"""
        with patch.object(tool, "filter_new_files") as mock_filter:
            mock_filter.return_value = [temp_files["small_test"], temp_files["large_test"]]

            with patch.object(tool, "_prepare_file_content_for_prompt") as mock_prepare:
                mock_prepare.return_value = "Mocked test content"

                # Test with available tokens
                content, note = tool._process_test_examples(
                    [temp_files["small_test"], temp_files["large_test"]], None, available_tokens=100000
                )

                # Should allocate 25% of 100k = 25k tokens for test examples
                mock_prepare.assert_called_once()
                call_args = mock_prepare.call_args
                assert call_args[1]["max_tokens"] == 25000  # 25% of 100k

    def test_process_test_examples_size_sorting(self, tool, temp_files):
        """Test that test examples are sorted by size (smallest first)"""
        with patch.object(tool, "filter_new_files") as mock_filter:
            # Return files in random order
            mock_filter.return_value = [temp_files["large_test"], temp_files["small_test"]]

            with patch.object(tool, "_prepare_file_content_for_prompt") as mock_prepare:
                mock_prepare.return_value = "test content"

                tool._process_test_examples(
                    [temp_files["large_test"], temp_files["small_test"]], None, available_tokens=50000
                )

                # Check that files were passed in size order (smallest first)
                call_args = mock_prepare.call_args[0]
                files_passed = call_args[0]

                # Verify smallest file comes first
                assert files_passed[0] == temp_files["small_test"]
                assert files_passed[1] == temp_files["large_test"]

    @pytest.mark.asyncio
    async def test_prepare_prompt_structure(self, tool, temp_files):
        """Test prompt preparation structure"""
        request = TestGenRequest(files=[temp_files["code_file"]], prompt="Test the calculator functions")

        with patch.object(tool, "_prepare_file_content_for_prompt") as mock_prepare:
            mock_prepare.return_value = "mocked file content"

            prompt = await tool.prepare_prompt(request)

            # Check prompt structure
            assert "=== USER CONTEXT ===" in prompt
            assert "Test the calculator functions" in prompt
            assert "=== CODE TO TEST ===" in prompt
            assert "mocked file content" in prompt
            assert tool.get_system_prompt() in prompt

    @pytest.mark.asyncio
    async def test_prepare_prompt_with_examples(self, tool, temp_files):
        """Test prompt preparation with test examples"""
        request = TestGenRequest(
            files=[temp_files["code_file"]], prompt="Generate tests", test_examples=[temp_files["small_test"]]
        )

        with patch.object(tool, "_prepare_file_content_for_prompt") as mock_prepare:
            mock_prepare.return_value = "mocked content"

            with patch.object(tool, "_process_test_examples") as mock_process:
                mock_process.return_value = ("test examples content", "Note: examples included")

                prompt = await tool.prepare_prompt(request)

                # Check test examples section
                assert "=== TEST EXAMPLES FOR STYLE REFERENCE ===" in prompt
                assert "test examples content" in prompt
                assert "Note: examples included" in prompt

    def test_format_response(self, tool):
        """Test response formatting"""
        request = TestGenRequest(files=["/tmp/test.py"], prompt="Generate tests")

        raw_response = "Generated test cases with edge cases"
        formatted = tool.format_response(raw_response, request)

        # Check formatting includes next steps
        assert raw_response in formatted
        assert "**Next Steps:**" in formatted
        assert "Review Generated Tests" in formatted
        assert "Setup Test Environment" in formatted

    @pytest.mark.asyncio
    async def test_error_handling_invalid_files(self, tool):
        """Test error handling for invalid file paths"""
        result = await tool.execute(
            {"files": ["relative/path.py"], "prompt": "Generate tests"}  # Invalid: not absolute
        )

        # Should return error for relative path
        response_data = json.loads(result[0].text)
        assert response_data["status"] == "error"
        assert "absolute" in response_data["content"]

    @pytest.mark.asyncio
    async def test_large_prompt_handling(self, tool):
        """Test handling of large prompts"""
        large_prompt = "x" * 60000  # Exceeds MCP_PROMPT_SIZE_LIMIT

        result = await tool.execute({"files": ["/tmp/test.py"], "prompt": large_prompt})

        # Should return resend_prompt status
        response_data = json.loads(result[0].text)
        assert response_data["status"] == "resend_prompt"
        assert "too large" in response_data["content"]

    def test_token_budget_calculation(self, tool):
        """Test token budget calculation logic"""
        # Mock model capabilities
        with patch.object(tool, "get_model_provider") as mock_get_provider:
            mock_provider = create_mock_provider(context_window=200000)
            mock_get_provider.return_value = mock_provider

            # Simulate model name being set
            tool._current_model_name = "test-model"

            with patch.object(tool, "_process_test_examples") as mock_process:
                mock_process.return_value = ("test content", "")

                with patch.object(tool, "_prepare_file_content_for_prompt") as mock_prepare:
                    mock_prepare.return_value = "code content"

                    request = TestGenRequest(
                        files=["/tmp/test.py"], prompt="Test prompt", test_examples=["/tmp/example.py"]
                    )

                    # This should trigger token budget calculation
                    import asyncio

                    asyncio.run(tool.prepare_prompt(request))

                    # Verify test examples got 25% of 150k tokens (75% of 200k context)
                    mock_process.assert_called_once()
                    call_args = mock_process.call_args[0]
                    assert call_args[2] == 150000  # 75% of 200k context window

    @pytest.mark.asyncio
    async def test_continuation_support(self, tool, temp_files):
        """Test continuation ID support"""
        with patch.object(tool, "_prepare_file_content_for_prompt") as mock_prepare:
            mock_prepare.return_value = "code content"

            request = TestGenRequest(
                files=[temp_files["code_file"]], prompt="Continue testing", continuation_id="test-thread-123"
            )

            await tool.prepare_prompt(request)

            # Verify continuation_id was passed to _prepare_file_content_for_prompt
            # The method should be called twice (once for code, once for test examples logic)
            assert mock_prepare.call_count >= 1

            # Check that continuation_id was passed in at least one call
            calls = mock_prepare.call_args_list
            continuation_passed = any(
                call[0][1] == "test-thread-123" for call in calls  # continuation_id is second argument
            )
            assert continuation_passed, f"continuation_id not passed. Calls: {calls}"

    def test_no_websearch_in_prompt(self, tool, temp_files):
        """Test that web search instructions are not included"""
        request = TestGenRequest(files=[temp_files["code_file"]], prompt="Generate tests")

        with patch.object(tool, "_prepare_file_content_for_prompt") as mock_prepare:
            mock_prepare.return_value = "code content"

            import asyncio

            prompt = asyncio.run(tool.prepare_prompt(request))

            # Should not contain web search instructions
            assert "WEB SEARCH CAPABILITY" not in prompt
            assert "web search" not in prompt.lower()

@@ -284,6 +284,22 @@ class TestAbsolutePathValidation:
        assert "must be absolute" in response["content"]
        assert "code.py" in response["content"]

    @pytest.mark.asyncio
    async def test_testgen_tool_relative_path_rejected(self):
        """Test that testgen tool rejects relative paths"""
        from tools import TestGenTool

        tool = TestGenTool()
        result = await tool.execute(
            {"files": ["src/main.py"], "prompt": "Generate tests for the functions"}  # relative path
        )

        assert len(result) == 1
        response = json.loads(result[0].text)
        assert response["status"] == "error"
        assert "must be absolute" in response["content"]
        assert "src/main.py" in response["content"]

    @pytest.mark.asyncio
    @patch("tools.AnalyzeTool.get_model_provider")
    async def test_analyze_tool_accepts_absolute_paths(self, mock_get_provider):

@@ -7,6 +7,7 @@ from .chat import ChatTool
from .codereview import CodeReviewTool
from .debug import DebugIssueTool
from .precommit import Precommit
from .testgen import TestGenTool
from .thinkdeep import ThinkDeepTool

__all__ = [

@@ -16,4 +17,5 @@ __all__ = [
    "AnalyzeTool",
    "ChatTool",
    "Precommit",
    "TestGenTool",
]

@@ -2,7 +2,7 @@
Code Review tool - Comprehensive code analysis and review

This tool provides professional-grade code review capabilities using
Gemini's understanding of code patterns, best practices, and common issues.
the chosen model's understanding of code patterns, best practices, and common issues.
It can analyze individual files or entire codebases, providing actionable
feedback categorized by severity.

@@ -177,7 +177,7 @@ class CodeReviewTool(BaseTool):
            request: The validated review request

        Returns:
            str: Complete prompt for the Gemini model
            str: Complete prompt for the model

        Raises:
            ValueError: If the code exceeds token limits

tools/testgen.py (new file, 429 lines)

@@ -0,0 +1,429 @@
"""
TestGen tool - Comprehensive test suite generation with edge case coverage

This tool generates comprehensive test suites by analyzing code paths,
identifying edge cases, and producing test scaffolding that follows
project conventions when test examples are provided.

Key Features:
- Multi-file and directory support
- Framework detection from existing tests
- Edge case identification (nulls, boundaries, async issues, etc.)
- Test pattern following when examples provided
- Deterministic test example sampling for large test suites
"""

import logging
import os
from typing import Any, Optional

from mcp.types import TextContent
from pydantic import Field

from config import TEMPERATURE_ANALYTICAL
from systemprompts import TESTGEN_PROMPT

from .base import BaseTool, ToolRequest
from .models import ToolOutput

logger = logging.getLogger(__name__)


class TestGenRequest(ToolRequest):
    """
    Request model for the test generation tool.

    This model defines all parameters that can be used to customize
    the test generation process, from selecting code files to providing
    test examples for style consistency.
    """

    files: list[str] = Field(
        ...,
        description="Code files or directories to generate tests for (must be absolute paths)",
    )
    prompt: str = Field(
        ...,
        description="Description of what to test, testing objectives, and specific scope/focus areas",
    )
    test_examples: Optional[list[str]] = Field(
        None,
        description=(
            "Optional existing test files or directories to use as style/pattern reference (must be absolute paths). "
            "If not provided, the tool will determine the best testing approach based on the code structure. "
            "For large test directories, only the smallest representative tests should be included to determine testing patterns. "
            "If similar tests exist for the code being tested, include those for the most relevant patterns."
        ),
    )


class TestGenTool(BaseTool):
    """
    Test generation tool implementation.

    This tool analyzes code to generate comprehensive test suites with
    edge case coverage, following existing test patterns when examples
    are provided.
    """

    def get_name(self) -> str:
        return "testgen"

    def get_description(self) -> str:
        return (
            "COMPREHENSIVE TEST GENERATION - Creates thorough test suites with edge case coverage. "
            "Use this when you need to generate tests for code, create test scaffolding, or improve test coverage. "
            "BE SPECIFIC about scope: target specific functions/classes/modules rather than testing everything. "
            "Examples: 'Generate tests for User.login() method', 'Test payment processing validation', "
            "'Create tests for authentication error handling'. If user request is vague, either ask for "
            "clarification about specific components to test, or make focused scope decisions and explain them. "
            "Analyzes code paths, identifies realistic failure modes, and generates framework-specific tests. "
            "Supports test pattern following when examples are provided. "
            "Choose thinking_mode based on code complexity: 'low' for simple functions, "
            "'medium' for standard modules (default), 'high' for complex systems with many interactions, "
            "'max' for critical systems requiring exhaustive test coverage. "
            "Note: If you're not currently using a top-tier model such as Opus 4 or above, these tools can provide enhanced capabilities."
        )

    def get_input_schema(self) -> dict[str, Any]:
        schema = {
            "type": "object",
            "properties": {
                "files": {
                    "type": "array",
                    "items": {"type": "string"},
                    "description": "Code files or directories to generate tests for (must be absolute paths)",
                },
                "model": self.get_model_field_schema(),
                "prompt": {
                    "type": "string",
                    "description": "Description of what to test, testing objectives, and specific scope/focus areas",
                },
                "test_examples": {
                    "type": "array",
                    "items": {"type": "string"},
                    "description": (
                        "Optional existing test files or directories to use as style/pattern reference (must be absolute paths). "
                        "If not provided, the tool will determine the best testing approach based on the code structure. "
                        "For large test directories, only the smallest representative tests will be included to determine testing patterns. "
                        "If similar tests exist for the code being tested, include those for the most relevant patterns."
                    ),
                },
                "thinking_mode": {
                    "type": "string",
                    "enum": ["minimal", "low", "medium", "high", "max"],
                    "description": "Thinking depth: minimal (0.5% of model max), low (8%), medium (33%), high (67%), max (100% of model max)",
                },
                "continuation_id": {
                    "type": "string",
                    "description": "Thread continuation ID for multi-turn conversations. Can be used to continue conversations across different tools. Only provide this if continuing a previous conversation thread.",
                },
            },
            "required": ["files", "prompt"] + (["model"] if self.is_effective_auto_mode() else []),
        }

        return schema

    def get_system_prompt(self) -> str:
        return TESTGEN_PROMPT

    def get_default_temperature(self) -> float:
        return TEMPERATURE_ANALYTICAL

    def get_model_category(self):
        """TestGen requires extended reasoning for comprehensive test analysis"""
        from tools.models import ToolModelCategory

        return ToolModelCategory.EXTENDED_REASONING

    def get_request_model(self):
        return TestGenRequest

    async def execute(self, arguments: dict[str, Any]) -> list[TextContent]:
        """Override execute to check prompt size before processing"""
        # First validate request
        request_model = self.get_request_model()
        request = request_model(**arguments)

        # Check prompt size if provided
        if request.prompt:
            size_check = self.check_prompt_size(request.prompt)
            if size_check:
                return [TextContent(type="text", text=ToolOutput(**size_check).model_dump_json())]

        # Continue with normal execution
        return await super().execute(arguments)

    def _process_test_examples(
        self, test_examples: list[str], continuation_id: Optional[str], available_tokens: int = None
    ) -> tuple[str, str]:
        """
        Process test example files using available token budget for optimal sampling.

        Args:
            test_examples: List of test file paths
            continuation_id: Continuation ID for filtering already embedded files
            available_tokens: Available token budget for test examples

        Returns:
            tuple: (formatted_content, summary_note)
        """
        logger.debug(f"[TESTGEN] Processing {len(test_examples)} test examples")

        if not test_examples:
            logger.debug("[TESTGEN] No test examples provided")
            return "", ""

        # Use existing file filtering to avoid duplicates in continuation
        examples_to_process = self.filter_new_files(test_examples, continuation_id)
        logger.debug(f"[TESTGEN] After filtering: {len(examples_to_process)} new test examples to process")

        if not examples_to_process:
            logger.info(f"[TESTGEN] All {len(test_examples)} test examples already in conversation history")
            return "", ""

        # Calculate token budget for test examples (25% of available tokens, or fallback)
        if available_tokens:
            test_examples_budget = int(available_tokens * 0.25)  # 25% for test examples
            logger.debug(
                f"[TESTGEN] Allocating {test_examples_budget:,} tokens (25% of {available_tokens:,}) for test examples"
            )
        else:
            test_examples_budget = 30000  # Fallback if no budget provided
            logger.debug(f"[TESTGEN] Using fallback budget of {test_examples_budget:,} tokens for test examples")

        original_count = len(examples_to_process)
        logger.debug(
            f"[TESTGEN] Processing {original_count} test example files with {test_examples_budget:,} token budget"
        )

        # Sort by file size (smallest first) for pattern-focused selection
        file_sizes = []
        for file_path in examples_to_process:
            try:
                size = os.path.getsize(file_path)
                file_sizes.append((file_path, size))
                logger.debug(f"[TESTGEN] Test example {os.path.basename(file_path)}: {size:,} bytes")
            except (OSError, FileNotFoundError) as e:
                # If we can't get size, put it at the end
                logger.warning(f"[TESTGEN] Could not get size for {file_path}: {e}")
                file_sizes.append((file_path, float("inf")))

        # Sort by size and take smallest files for pattern reference
        file_sizes.sort(key=lambda x: x[1])
        examples_to_process = [f[0] for f in file_sizes]  # All files, sorted by size
        logger.debug(
            f"[TESTGEN] Sorted test examples by size (smallest first): {[os.path.basename(f) for f in examples_to_process]}"
        )

        # Use standard file content preparation with dynamic token budget
        try:
            logger.debug(f"[TESTGEN] Preparing file content for {len(examples_to_process)} test examples")
            content = self._prepare_file_content_for_prompt(
                examples_to_process,
                continuation_id,
                "Test examples",
                max_tokens=test_examples_budget,
                reserve_tokens=1000,
            )

            # Determine how many files were actually included
            if content:
                from utils.token_utils import estimate_tokens

                used_tokens = estimate_tokens(content)
                logger.info(
                    f"[TESTGEN] Successfully embedded test examples: {used_tokens:,} tokens used ({test_examples_budget:,} available)"
                )
                if original_count > 1:
                    truncation_note = f"Note: Used {used_tokens:,} tokens ({test_examples_budget:,} available) for test examples from {original_count} files to determine testing patterns."
                else:
                    truncation_note = ""
            else:
                logger.warning("[TESTGEN] No content generated for test examples")
                truncation_note = ""

            return content, truncation_note

        except Exception as e:
            # If test example processing fails, continue without examples rather than failing
            logger.error(f"[TESTGEN] Failed to process test examples: {type(e).__name__}: {e}")
            return "", f"Warning: Could not process test examples: {str(e)}"

    async def prepare_prompt(self, request: TestGenRequest) -> str:
        """
        Prepare the test generation prompt with code analysis and optional test examples.

        This method reads the requested files, processes any test examples,
        and constructs a detailed prompt for comprehensive test generation.

        Args:
            request: The validated test generation request

        Returns:
            str: Complete prompt for the model

        Raises:
            ValueError: If the code exceeds token limits
        """
        logger.debug(f"[TESTGEN] Preparing prompt for {len(request.files)} code files")
        if request.test_examples:
            logger.debug(f"[TESTGEN] Including {len(request.test_examples)} test examples for pattern reference")

        # Check for prompt.txt in files
        prompt_content, updated_files = self.handle_prompt_file(request.files)

        # If prompt.txt was found, incorporate it into the prompt
        if prompt_content:
            logger.debug("[TESTGEN] Found prompt.txt file, incorporating content")
            request.prompt = prompt_content + "\n\n" + request.prompt

        # Update request files list
        if updated_files is not None:
            logger.debug(f"[TESTGEN] Updated files list after prompt.txt processing: {len(updated_files)} files")
            request.files = updated_files

        # Calculate available token budget for dynamic allocation
        continuation_id = getattr(request, "continuation_id", None)

        # Get model context for token budget calculation
        model_name = getattr(self, "_current_model_name", None)
        available_tokens = None

        if model_name:
            try:
                provider = self.get_model_provider(model_name)
                capabilities = provider.get_capabilities(model_name)
                # Use 75% of context for content (code + test examples), 25% for response
                available_tokens = int(capabilities.context_window * 0.75)
                logger.debug(
                    f"[TESTGEN] Token budget calculation: {available_tokens:,} tokens (75% of {capabilities.context_window:,}) for model {model_name}"
                )
            except Exception as e:
                # Fallback to conservative estimate
                logger.warning(f"[TESTGEN] Could not get model capabilities for {model_name}: {e}")
                available_tokens = 120000  # Conservative fallback
                logger.debug(f"[TESTGEN] Using fallback token budget: {available_tokens:,} tokens")

        # Process test examples first to determine token allocation
        test_examples_content = ""
        test_examples_note = ""

        if request.test_examples:
            logger.debug(f"[TESTGEN] Processing {len(request.test_examples)} test examples")
            test_examples_content, test_examples_note = self._process_test_examples(
                request.test_examples, continuation_id, available_tokens
            )
            if test_examples_content:
                logger.info("[TESTGEN] Test examples processed successfully for pattern reference")
            else:
                logger.info("[TESTGEN] No test examples content after processing")

        # Calculate remaining tokens for main code after test examples
        if test_examples_content and available_tokens:
            from utils.token_utils import estimate_tokens

            test_tokens = estimate_tokens(test_examples_content)
            remaining_tokens = available_tokens - test_tokens - 5000  # Reserve for prompt structure
            logger.debug(
                f"[TESTGEN] Token allocation: {test_tokens:,} for examples, {remaining_tokens:,} remaining for code files"
            )
        else:
            remaining_tokens = available_tokens - 10000 if available_tokens else None
            if remaining_tokens:
                logger.debug(
                    f"[TESTGEN] Token allocation: {remaining_tokens:,} tokens available for code files (no test examples)"
                )

        # Use centralized file processing logic for main code files
        logger.debug(f"[TESTGEN] Preparing {len(request.files)} code files for analysis")
        code_content = self._prepare_file_content_for_prompt(
            request.files, continuation_id, "Code to test", max_tokens=remaining_tokens, reserve_tokens=2000
        )

        if code_content:
            from utils.token_utils import estimate_tokens

            code_tokens = estimate_tokens(code_content)
            logger.info(f"[TESTGEN] Code files embedded successfully: {code_tokens:,} tokens")
        else:
            logger.warning("[TESTGEN] No code content after file processing")

        # Test generation is based on code analysis, no web search needed
        logger.debug("[TESTGEN] Building complete test generation prompt")

        # Build the complete prompt
        prompt_parts = []

        # Add system prompt
        prompt_parts.append(self.get_system_prompt())

        # Add user context
        prompt_parts.append("=== USER CONTEXT ===")
        prompt_parts.append(request.prompt)
        prompt_parts.append("=== END CONTEXT ===")

        # Add test examples if provided
        if test_examples_content:
            prompt_parts.append("\n=== TEST EXAMPLES FOR STYLE REFERENCE ===")
            if test_examples_note:
                prompt_parts.append(f"// {test_examples_note}")
            prompt_parts.append(test_examples_content)
            prompt_parts.append("=== END TEST EXAMPLES ===")

        # Add main code to test
        prompt_parts.append("\n=== CODE TO TEST ===")
        prompt_parts.append(code_content)
        prompt_parts.append("=== END CODE ===")

        # Add generation instructions
        prompt_parts.append(
            "\nPlease analyze the code and generate comprehensive tests following the multi-agent workflow specified in the system prompt."
        )
        if test_examples_content:
            prompt_parts.append(
                "Use the provided test examples as a reference for style, framework, and testing patterns."
            )

        full_prompt = "\n".join(prompt_parts)

        # Log final prompt statistics
        from utils.token_utils import estimate_tokens

        total_tokens = estimate_tokens(full_prompt)
        logger.info(f"[TESTGEN] Complete prompt prepared: {total_tokens:,} tokens, {len(full_prompt):,} characters")

        return full_prompt

    def format_response(self, response: str, request: TestGenRequest, model_info: Optional[dict] = None) -> str:
        """
        Format the test generation response.

        Args:
            response: The raw test generation from the model
            request: The original request for context
            model_info: Optional dict with model metadata

        Returns:
            str: Formatted response with next steps
        """
        return f"""{response}

---

**Next Steps:**

1. **Review Generated Tests**: Check whether the structure, coverage, and edge cases are valid and useful, and ensure they meet your requirements.
Confirm the tests cover missing scenarios, follow project conventions, and can be safely added without duplication.

2. **Setup Test Environment**: Ensure the testing framework and dependencies identified are properly configured in your project.

3. **Run Initial Tests**: Execute the generated tests to verify they work correctly with your code.

4. **Customize as Needed**: Modify the generated test code, add project-specific edge cases, and refine or adjust the test structure where your existing knowledge of the code calls for it.

5. **Integrate with CI/CD**: If a continuous integration pipeline is already set up, add the tests to it to maintain code quality.

6. **Iterate**: Refine requirements and continue the conversation if additional coverage or improvements are needed.

Remember: Review the generated tests for completeness, and adapt and integrate them to your specific project requirements and testing standards. Continue with your next step in implementation."""