Improved model response handling to support additional response statuses in the future
Improved testgen: encourages follow-ups with less work in between and less token generation, to avoid surpassing the 25K limit. Improved the codereview tool to request a focused code review instead when a single-pass code review would be too large or complex.
@@ -117,8 +117,6 @@ try:

    # Also keep a size-based rotation as backup (100MB max per file)
    # This prevents any single day's log from growing too large
    from logging.handlers import RotatingFileHandler

    size_handler = RotatingFileHandler(
        "/tmp/mcp_server_overflow.log", maxBytes=100 * 1024 * 1024, backupCount=3  # 100MB
    )
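For context, a handler like this only takes effect once it is attached to a logger; the wiring is not shown in this hunk, so the sketch below is illustrative only and the logger name and format string are assumptions.

import logging
from logging.handlers import RotatingFileHandler

# Size-based rotation as a backup cap, mirroring the values in the hunk above.
size_handler = RotatingFileHandler(
    "/tmp/mcp_server_overflow.log", maxBytes=100 * 1024 * 1024, backupCount=3
)
size_handler.setFormatter(logging.Formatter("%(asctime)s %(levelname)s %(name)s: %(message)s"))

logger = logging.getLogger("mcp_server")  # logger name is an assumption
logger.addHandler(size_handler)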
@@ -64,5 +64,12 @@ After listing issues, add:
• **Top 3 priority fixes** (quick bullets)
• **Positive aspects** worth retaining

IF SCOPE TOO LARGE FOR FOCUSED REVIEW
If the codebase is too large or complex to review effectively in a single response, you MUST request Claude to
provide smaller, more focused subsets for review. Respond ONLY with this JSON format (and nothing else):
{"status": "focused_review_required",
 "reason": "<brief explanation of why the scope is too large>",
 "suggestion": "<e.g., 'Review authentication module (auth.py, login.py)' or 'Focus on data layer (models/)' or 'Review payment processing functionality'>"}

Remember: If required information is missing, use the clarification JSON above instead of guessing.
"""
@@ -24,7 +24,8 @@ test idioms from the code snapshot provided.
   that are directly involved (network, DB, file-system, IPC).
3. **Adversarial Thinker** – enumerates realistic failures, boundary conditions, race conditions, and misuse patterns
   that historically break similar systems.
4. **Risk Prioritizer** – ranks findings by production impact and likelihood; discards speculative or out-of-scope cases.
4. **Risk Prioritizer** – ranks findings by production impact and likelihood; discards speculative or
   out-of-scope cases.
5. **Test Scaffolder** – produces deterministic, isolated tests that follow the *project's* conventions (assert style,
   fixture layout, naming, any mocking strategy, language and tooling etc).
@@ -41,6 +42,7 @@ pure functions).
- Surface concurrency hazards with stress or fuzz tests when the language/runtime supports them.
- Focus on realistic failure modes that actually occur in production
- Remain within scope of language, framework, project. Do not over-step. Do not add unnecessary dependencies.
- No bogus, fake tests that seemingly pass for no reason at all

EDGE-CASE TAXONOMY (REAL-WORLD, HIGH-VALUE)
- **Data Shape Issues**: `null` / `undefined`, zero-length, surrogate-pair emojis, malformed UTF-8, mixed EOLs.
@@ -93,8 +95,27 @@ it but do not approach or offer refactoring ideas.
DELIVERABLE
Return only the artefacts (analysis summary, coverage plan, and generated tests) that fit the detected framework
and code / project layout.
No extra commentary, no generic boilerplate.
Must comment and document logic, test reason / hypothesis in delivered code
Group related tests but separate them into files where this is the convention and most suitable for the project at hand.
Prefer adding tests to an existing test file if one was provided and grouping these tests makes sense.
Must document logic, test reason/hypothesis in delivered code.
MUST NOT add any additional information, introduction, or summaries around generated code. Deliver only the essentials
relevant to the test.

IF ADDITIONAL TEST CASES ARE REQUIRED
If you determine that comprehensive test coverage requires generating multiple test files or a large number of
test cases for each file that would risk exceeding context limits, you MUST follow this structured approach:

1. **Generate Essential Tests First**: Create only the most critical and high-impact tests (typically 3-5 key test
   cases covering the most important paths and failure modes). Clearly state the file these tests belong to, even if
   these should be added to an existing test file.

2. **Request Continuation**: You MUST end your message with the following added in JSON format (and nothing
   more after this). This will list the pending tests and their respective files (even if they belong to the same or
   an existing test file) as this will be used for the next follow-up test generation request.
   {"status": "more_tests_required",
    "pending_tests": "test_name (file_name), another_test_name (file_name)"}

This approach ensures comprehensive test coverage while maintaining quality and avoiding context overflow.

Remember: your value is catching the hard bugs—not inflating coverage numbers.
"""
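The continuation contract above is machine-readable, so a caller can keep requesting follow-up batches until no pending tests remain. A minimal sketch of that loop follows; request_test_generation is a hypothetical callable standing in for an invocation of the testgen tool, not part of this commit.

import json

def drain_pending_tests(request_test_generation, files, prompt):
    """Loop until the testgen tool stops asking for follow-ups.

    request_test_generation is a hypothetical callable that invokes the
    testgen tool and returns its raw text response.
    """
    response = request_test_generation(files=files, prompt=prompt)
    while True:
        start = response.rfind('{"status"')
        if start == -1:
            break  # no trailing status JSON -> generation is complete
        try:
            payload = json.loads(response[start:])
        except json.JSONDecodeError:
            break
        if payload.get("status") != "more_tests_required":
            break
        pending = payload["pending_tests"]
        # Follow up with only the pending tests so each turn stays small.
        response = request_test_generation(
            files=files,
            prompt=f"Continue test generation. Pending tests: {pending}",
        )
    return response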
263  tests/test_special_status_parsing.py  Normal file
@@ -0,0 +1,263 @@
"""
Tests for special status parsing in the base tool
"""

from pydantic import BaseModel

from tools.base import BaseTool


class MockRequest(BaseModel):
    """Mock request for testing"""

    test_field: str = "test"


class TestTool(BaseTool):
    """Minimal test tool implementation"""

    def get_name(self) -> str:
        return "test_tool"

    def get_description(self) -> str:
        return "Test tool for special status parsing"

    def get_input_schema(self) -> dict:
        return {"type": "object", "properties": {}}

    def get_system_prompt(self) -> str:
        return "Test prompt"

    def get_request_model(self):
        return MockRequest

    async def prepare_prompt(self, request) -> str:
        return "test prompt"


class TestSpecialStatusParsing:
    """Test special status parsing functionality"""

    def setup_method(self):
        """Setup test tool and request"""
        self.tool = TestTool()
        self.request = MockRequest()

    def test_full_codereview_required_parsing(self):
        """Test parsing of full_codereview_required status"""
        response_json = '{"status": "full_codereview_required", "reason": "Codebase too large for quick review"}'

        result = self.tool._parse_response(response_json, self.request)

        assert result.status == "full_codereview_required"
        assert result.content_type == "json"
        assert "reason" in result.content

    def test_full_codereview_required_without_reason(self):
        """Test parsing of full_codereview_required without optional reason"""
        response_json = '{"status": "full_codereview_required"}'

        result = self.tool._parse_response(response_json, self.request)

        assert result.status == "full_codereview_required"
        assert result.content_type == "json"

    def test_test_sample_needed_parsing(self):
        """Test parsing of test_sample_needed status"""
        response_json = '{"status": "test_sample_needed", "reason": "Cannot determine test framework"}'

        result = self.tool._parse_response(response_json, self.request)

        assert result.status == "test_sample_needed"
        assert result.content_type == "json"
        assert "reason" in result.content

    def test_more_tests_required_parsing(self):
        """Test parsing of more_tests_required status"""
        response_json = (
            '{"status": "more_tests_required", "pending_tests": "test_auth (test_auth.py), test_login (test_user.py)"}'
        )

        result = self.tool._parse_response(response_json, self.request)

        assert result.status == "more_tests_required"
        assert result.content_type == "json"
        assert "pending_tests" in result.content

    def test_clarification_required_still_works(self):
        """Test that existing clarification_required still works"""
        response_json = (
            '{"status": "clarification_required", "question": "What files need review?", "files_needed": ["src/"]}'
        )

        result = self.tool._parse_response(response_json, self.request)

        assert result.status == "clarification_required"
        assert result.content_type == "json"
        assert "question" in result.content

    def test_invalid_status_payload(self):
        """Test that invalid payloads for known statuses are handled gracefully"""
        # Missing required field 'reason' for test_sample_needed
        response_json = '{"status": "test_sample_needed"}'

        result = self.tool._parse_response(response_json, self.request)

        # Should fall back to normal processing since validation failed
        assert result.status in ["success", "continuation_available"]

    def test_unknown_status_ignored(self):
        """Test that unknown status types are ignored and treated as normal responses"""
        response_json = '{"status": "unknown_status", "data": "some data"}'

        result = self.tool._parse_response(response_json, self.request)

        # Should be treated as normal response
        assert result.status in ["success", "continuation_available"]

    def test_normal_response_unchanged(self):
        """Test that normal text responses are handled normally"""
        response_text = "This is a normal response with some analysis."

        result = self.tool._parse_response(response_text, self.request)

        # Should be processed as normal response
        assert result.status in ["success", "continuation_available"]
        assert response_text in result.content

    def test_malformed_json_handled(self):
        """Test that malformed JSON is handled gracefully"""
        response_text = '{"status": "clarification_required", "question": "incomplete json'

        result = self.tool._parse_response(response_text, self.request)

        # Should fall back to normal processing
        assert result.status in ["success", "continuation_available"]

    def test_metadata_preserved(self):
        """Test that model metadata is preserved in special status responses"""
        response_json = '{"status": "full_codereview_required", "reason": "Too complex"}'
        model_info = {"model_name": "test-model", "provider": "test-provider"}

        result = self.tool._parse_response(response_json, self.request, model_info)

        assert result.status == "full_codereview_required"
        assert result.metadata["model_used"] == "test-model"
        assert "original_request" in result.metadata

    def test_more_tests_required_detailed(self):
        """Test more_tests_required with detailed pending_tests parameter"""
        # Test the exact format expected by testgen prompt
        pending_tests = "test_authentication_edge_cases (test_auth.py), test_password_validation_complex (test_auth.py), test_user_registration_flow (test_user.py)"
        response_json = f'{{"status": "more_tests_required", "pending_tests": "{pending_tests}"}}'

        result = self.tool._parse_response(response_json, self.request)

        assert result.status == "more_tests_required"
        assert result.content_type == "json"

        # Verify the content contains the validated, parsed data
        import json

        parsed_content = json.loads(result.content)
        assert parsed_content["status"] == "more_tests_required"
        assert parsed_content["pending_tests"] == pending_tests

        # Verify Claude would receive the pending_tests parameter correctly
        assert "test_authentication_edge_cases (test_auth.py)" in parsed_content["pending_tests"]
        assert "test_password_validation_complex (test_auth.py)" in parsed_content["pending_tests"]
        assert "test_user_registration_flow (test_user.py)" in parsed_content["pending_tests"]

    def test_more_tests_required_missing_pending_tests(self):
        """Test that more_tests_required without required pending_tests field fails validation"""
        response_json = '{"status": "more_tests_required"}'

        result = self.tool._parse_response(response_json, self.request)

        # Should fall back to normal processing since validation failed (missing required field)
        assert result.status in ["success", "continuation_available"]
        assert result.content_type != "json"

    def test_test_sample_needed_missing_reason(self):
        """Test that test_sample_needed without required reason field fails validation"""
        response_json = '{"status": "test_sample_needed"}'

        result = self.tool._parse_response(response_json, self.request)

        # Should fall back to normal processing since validation failed (missing required field)
        assert result.status in ["success", "continuation_available"]
        assert result.content_type != "json"

    def test_special_status_json_format_preserved(self):
        """Test that special status responses preserve exact JSON format for Claude"""
        test_cases = [
            {
                "input": '{"status": "clarification_required", "question": "What framework to use?", "files_needed": ["tests/"]}',
                "expected_fields": ["status", "question", "files_needed"],
            },
            {
                "input": '{"status": "full_codereview_required", "reason": "Codebase too large"}',
                "expected_fields": ["status", "reason"],
            },
            {
                "input": '{"status": "test_sample_needed", "reason": "Cannot determine test framework"}',
                "expected_fields": ["status", "reason"],
            },
            {
                "input": '{"status": "more_tests_required", "pending_tests": "test_auth (test_auth.py), test_login (test_user.py)"}',
                "expected_fields": ["status", "pending_tests"],
            },
        ]

        for test_case in test_cases:
            result = self.tool._parse_response(test_case["input"], self.request)

            # Verify status is correctly detected
            import json

            input_data = json.loads(test_case["input"])
            assert result.status == input_data["status"]
            assert result.content_type == "json"

            # Verify all expected fields are preserved in the response
            parsed_content = json.loads(result.content)
            for field in test_case["expected_fields"]:
                assert field in parsed_content, f"Field {field} missing from {input_data['status']} response"
                assert (
                    parsed_content[field] == input_data[field]
                ), f"Field {field} value mismatch in {input_data['status']} response"

    def test_focused_review_required_parsing(self):
        """Test that focused_review_required status is parsed correctly"""
        import json

        json_response = {
            "status": "focused_review_required",
            "reason": "Codebase too large for single review",
            "suggestion": "Review authentication module (auth.py, login.py)",
        }

        result = self.tool._parse_response(json.dumps(json_response), self.request)

        assert result.status == "focused_review_required"
        assert result.content_type == "json"
        parsed_content = json.loads(result.content)
        assert parsed_content["status"] == "focused_review_required"
        assert parsed_content["reason"] == "Codebase too large for single review"
        assert parsed_content["suggestion"] == "Review authentication module (auth.py, login.py)"

    def test_focused_review_required_missing_suggestion(self):
        """Test that focused_review_required fails validation without suggestion"""
        import json

        json_response = {
            "status": "focused_review_required",
            "reason": "Codebase too large",
            # Missing required suggestion field
        }

        result = self.tool._parse_response(json.dumps(json_response), self.request)

        # Should fall back to normal response since validation failed
        assert result.status == "success"
        assert result.content_type == "text"
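The new test module exercises each status in the registry plus the fallback paths. Assuming a standard pytest setup (pytest installed, run from the repository root), it can be invoked on its own, for example:

# Run just the new test module programmatically; equivalent to `pytest tests/test_special_status_parsing.py -q`.
import pytest

raise SystemExit(pytest.main(["tests/test_special_status_parsing.py", "-q"]))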
@@ -284,7 +284,7 @@ class TestComprehensive(unittest.TestCase):

        # Check formatting includes new action-oriented next steps
        assert raw_response in formatted
        assert "IMMEDIATE ACTION REQUIRED" in formatted
        assert "IMMEDIATE NEXT ACTION" in formatted
        assert "ULTRATHINK" in formatted
        assert "CREATE" in formatted
        assert "VALIDATE BY EXECUTION" in formatted
@@ -37,7 +37,7 @@ from utils.conversation_memory import (
)
from utils.file_utils import read_file_content, read_files, translate_path_for_environment

from .models import ClarificationRequest, ContinuationOffer, ToolOutput
from .models import SPECIAL_STATUS_MODELS, ContinuationOffer, ToolOutput

logger = logging.getLogger(__name__)
@@ -1183,31 +1183,43 @@ When recommending searches, be specific about what information you need and why
        logger = logging.getLogger(f"tools.{self.name}")

        try:
            # Try to parse as JSON to check for clarification requests
            # Try to parse as JSON to check for special status requests
            potential_json = json.loads(raw_text.strip())

            if isinstance(potential_json, dict) and potential_json.get("status") == "clarification_required":
                # Validate the clarification request structure
                clarification = ClarificationRequest(**potential_json)
                logger.debug(f"{self.name} tool requested clarification: {clarification.question}")
                # Extract model information for metadata
                metadata = {
                    "original_request": (request.model_dump() if hasattr(request, "model_dump") else str(request))
                }
                if model_info:
                    model_name = model_info.get("model_name")
                    if model_name:
                        metadata["model_used"] = model_name
            if isinstance(potential_json, dict) and "status" in potential_json:
                status_key = potential_json.get("status")
                status_model = SPECIAL_STATUS_MODELS.get(status_key)

                return ToolOutput(
                    status="clarification_required",
                    content=clarification.model_dump_json(),
                    content_type="json",
                    metadata=metadata,
                )
                if status_model:
                    try:
                        # Use Pydantic for robust validation of the special status
                        parsed_status = status_model.model_validate(potential_json)
                        logger.debug(f"{self.name} tool detected special status: {status_key}")

                        # Extract model information for metadata
                        metadata = {
                            "original_request": (
                                request.model_dump() if hasattr(request, "model_dump") else str(request)
                            )
                        }
                        if model_info:
                            model_name = model_info.get("model_name")
                            if model_name:
                                metadata["model_used"] = model_name

                        return ToolOutput(
                            status=status_key,
                            content=parsed_status.model_dump_json(),
                            content_type="json",
                            metadata=metadata,
                        )

                    except Exception as e:
                        # Invalid payload for known status, log warning and continue as normal response
                        logger.warning(f"Invalid {status_key} payload: {e}")

        except (json.JSONDecodeError, ValueError, TypeError):
            # Not a JSON clarification request, treat as normal response
            # Not a JSON special status request, treat as normal response
            pass

        # Normal text response - format using tool-specific formatting
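The refactor above replaces the clarification-only branch with a generic lookup: any JSON payload whose status appears in SPECIAL_STATUS_MODELS is validated with the matching Pydantic model, and anything that fails validation falls back to normal text handling. The following is a condensed, self-contained sketch of that flow; the parse_special_status helper and the one-entry registry are simplified stand-ins, not the commit's actual code.

import json
from pydantic import BaseModel, ValidationError

class MoreTestsRequired(BaseModel):
    status: str = "more_tests_required"
    pending_tests: str  # required; a missing field raises ValidationError -> fallback

SPECIAL_STATUS_MODELS = {"more_tests_required": MoreTestsRequired}

def parse_special_status(raw_text: str):
    """Return (status, payload_json) if raw_text is a valid special status, else None."""
    try:
        potential_json = json.loads(raw_text.strip())
    except json.JSONDecodeError:
        return None  # plain text response
    if not isinstance(potential_json, dict) or "status" not in potential_json:
        return None
    status_key = potential_json["status"]
    status_model = SPECIAL_STATUS_MODELS.get(status_key)
    if status_model is None:
        return None  # unknown status -> treat as a normal response
    try:
        parsed = status_model.model_validate(potential_json)
    except ValidationError:
        return None  # known status, invalid payload -> fall back
    return status_key, parsed.model_dump_json()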
@@ -36,6 +36,10 @@ class ToolOutput(BaseModel):
        "success",
        "error",
        "clarification_required",
        "full_codereview_required",
        "focused_review_required",
        "test_sample_needed",
        "more_tests_required",
        "resend_prompt",
        "continuation_available",
    ] = "success"
@@ -50,6 +54,7 @@ class ToolOutput(BaseModel):
class ClarificationRequest(BaseModel):
    """Request for additional context or clarification"""

    status: Literal["clarification_required"] = "clarification_required"
    question: str = Field(..., description="Question to ask Claude for more context")
    files_needed: Optional[list[str]] = Field(
        default_factory=list, description="Specific files that are needed for analysis"
@@ -60,6 +65,48 @@ class ClarificationRequest(BaseModel):
    )


class FullCodereviewRequired(BaseModel):
    """Request for full code review when scope is too large for quick review"""

    status: Literal["full_codereview_required"] = "full_codereview_required"
    important: Optional[str] = Field(None, description="Important message about escalation")
    reason: Optional[str] = Field(None, description="Reason why full review is needed")


class FocusedReviewRequired(BaseModel):
    """Request for Claude to provide smaller, focused subsets of code for review"""

    status: Literal["focused_review_required"] = "focused_review_required"
    reason: str = Field(..., description="Why the current scope is too large for effective review")
    suggestion: str = Field(
        ..., description="Suggested approach for breaking down the review into smaller, focused parts"
    )


class TestSampleNeeded(BaseModel):
    """Request for additional test samples to determine testing framework"""

    status: Literal["test_sample_needed"] = "test_sample_needed"
    reason: str = Field(..., description="Reason why additional test samples are required")


class MoreTestsRequired(BaseModel):
    """Request for continuation to generate additional tests"""

    status: Literal["more_tests_required"] = "more_tests_required"
    pending_tests: str = Field(..., description="List of pending tests to be generated")


# Registry mapping status strings to their corresponding Pydantic models
SPECIAL_STATUS_MODELS = {
    "clarification_required": ClarificationRequest,
    "full_codereview_required": FullCodereviewRequired,
    "focused_review_required": FocusedReviewRequired,
    "test_sample_needed": TestSampleNeeded,
    "more_tests_required": MoreTestsRequired,
}


class DiagnosticHypothesis(BaseModel):
    """A debugging hypothesis with context and next steps"""
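As a quick illustration of how the registry behaves: validating a payload against the model selected by its status either yields a typed object or raises, which is what lets the parser fall back to plain-text handling. The import path below is an assumption based on the `from .models import ...` line earlier in this diff.

from pydantic import ValidationError

# Assumed importable as tools.models per the diff; the actual module path may differ.
from tools.models import SPECIAL_STATUS_MODELS

ok = {"status": "focused_review_required", "reason": "too large", "suggestion": "review auth module"}
bad = {"status": "focused_review_required", "reason": "too large"}  # missing required 'suggestion'

model = SPECIAL_STATUS_MODELS[ok["status"]]
print(model.model_validate(ok).model_dump_json())

try:
    model.model_validate(bad)
except ValidationError as exc:
    print("falls back to normal text handling:", exc.error_count(), "error(s)")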
@@ -117,7 +117,10 @@ class TestGenTool(BaseTool):
                },
                "continuation_id": {
                    "type": "string",
                    "description": "Thread continuation ID for multi-turn conversations. Can be used to continue conversations across different tools. Only provide this if continuing a previous conversation thread.",
                    "description": (
                        "Thread continuation ID for multi-turn conversations. Can be used to continue conversations "
                        "across different tools. Only provide this if continuing a previous conversation thread."
                    ),
                },
            },
            "required": ["files", "prompt"] + (["model"] if self.is_effective_auto_mode() else []),
@@ -436,7 +439,7 @@ class TestGenTool(BaseTool):

---

# IMMEDIATE ACTION REQUIRED
# IMMEDIATE NEXT ACTION

Claude, you are now in EXECUTION MODE. Take immediate action:
@@ -472,5 +475,6 @@ After creating each test file, show the user:
## Step 5: MOVE TO NEXT ACTION
Once tests are confirmed working, immediately proceed to the next logical step for the project.

**CRITICAL**: Do NOT stop after generating - you MUST create, validate, run, and confirm the tests work.
Take full ownership of the testing implementation and move to your next work."""
**CRITICAL**: Do NOT stop after generating - you MUST create, validate, run, and confirm the tests work. Take full
ownership of the testing implementation and move to your next work. If you were supplied a more_tests_required request
in the response above, you MUST honor it."""