Improved model response handling to support additional response statuses in the future

Improved testgen; encourages follow-ups with less work in between and less token generation to avoid exceeding the 25K limit

Improved codereview tool to request a focused code review when a single-pass code review would be too large or complex
Fahad
2025-06-14 18:43:56 +04:00
parent ec5fee4409
commit 442decba70
8 changed files with 383 additions and 31 deletions

View File

@@ -117,8 +117,6 @@ try:
# Also keep a size-based rotation as backup (100MB max per file)
# This prevents any single day's log from growing too large
from logging.handlers import RotatingFileHandler
size_handler = RotatingFileHandler(
"/tmp/mcp_server_overflow.log", maxBytes=100 * 1024 * 1024, backupCount=3 # 100MB
)
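
The hunk above shows the size-based handler being constructed, but the surrounding wiring sits outside the diff context. For reference only, a minimal self-contained version of this backup rotation could look like the sketch below; the formatter and root-logger attachment are assumptions, not code from this commit.

```python
# Minimal sketch of size-based backup rotation; formatter and logger wiring are
# assumed for illustration and are not part of this commit's diff.
import logging
from logging.handlers import RotatingFileHandler

size_handler = RotatingFileHandler(
    "/tmp/mcp_server_overflow.log",
    maxBytes=100 * 1024 * 1024,  # 100MB max per file
    backupCount=3,               # keep at most 3 rotated backups
)
size_handler.setFormatter(logging.Formatter("%(asctime)s %(levelname)s %(name)s: %(message)s"))
logging.getLogger().addHandler(size_handler)
```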

View File

@@ -64,5 +64,12 @@ After listing issues, add:
• **Top 3 priority fixes** (quick bullets)
• **Positive aspects** worth retaining
IF SCOPE TOO LARGE FOR A SINGLE REVIEW
If the codebase is too large or complex to review effectively in a single response, you MUST request Claude to
provide smaller, more focused subsets for review. Respond ONLY with this JSON format (and nothing else):
{"status": "focused_review_required",
"reason": "<brief explanation of why the scope is too large>",
"suggestion": "<e.g., 'Review authentication module (auth.py, login.py)' or 'Focus on data layer (models/)' or 'Review payment processing functionality'>"}
Remember: If required information is missing, use the clarification JSON above instead of guessing.
"""

View File

@@ -24,7 +24,8 @@ test idioms from the code snapshot provided.
that are directly involved (network, DB, file-system, IPC).
3. **Adversarial Thinker** enumerates realistic failures, boundary conditions, race conditions, and misuse patterns
that historically break similar systems.
4. **Risk Prioritizer** ranks findings by production impact and likelihood; discards speculative or out-of-scope cases.
4. **Risk Prioritizer** ranks findings by production impact and likelihood; discards speculative or
out-of-scope cases.
5. **Test Scaffolder** produces deterministic, isolated tests that follow the *project's* conventions (assert style,
fixture layout, naming, any mocking strategy, language and tooling etc).
@@ -41,6 +42,7 @@ pure functions).
- Surface concurrency hazards with stress or fuzz tests when the language/runtime supports them.
- Focus on realistic failure modes that actually occur in production
- Remain within scope of language, framework, project. Do not over-step. Do not add unnecessary dependencies.
- No bogus, fake tests that seemingly pass for no reason at all
EDGE-CASE TAXONOMY (REAL-WORLD, HIGH-VALUE)
- **Data Shape Issues**: `null` / `undefined`, zero-length, surrogate-pair emojis, malformed UTF-8, mixed EOLs.
@@ -93,8 +95,27 @@ it but do not approach or offer refactoring ideas.
DELIVERABLE
Return only the artefacts (analysis summary, coverage plan, and generated tests) that fit the detected framework
and code / project layout.
No extra commentary, no generic boilerplate.
Must comment and document logic, test reason / hypothesis in delivered code
Group related tests but separate them into files where this is the convention and most suitable for the project at hand.
Prefer adding tests to an existing test file if one was provided and grouping these tests makes sense.
Must document logic, test reason/hypothesis in delivered code.
MUST NOT add any additional information, introduction, or summaries around generated code. Deliver only the essentials
relevant to the test.
IF ADDITIONAL TEST CASES ARE REQUIRED
If you determine that comprehensive test coverage requires generating multiple test files or so many test cases
per file that you risk exceeding context limits, you MUST follow this structured approach:
1. **Generate Essential Tests First**: Create only the most critical and high-impact tests (typically 3-5 key test
cases covering the most important paths and failure modes). Clearly state the file these tests belong to, even if
these should be added to an existing test file.
2. **Request Continuation**: You MUST end your message with the following JSON (and nothing more after it). It
lists the pending tests and their respective files (even if they belong to the same or an existing test file), as
this will be used for the next follow-up test generation request.
{"status": "more_tests_required",
"pending_tests": "test_name (file_name), another_test_name (file_name)"}
This approach ensures comprehensive test coverage while maintaining quality and avoiding context overflow.
Remember: your value is catching the hard bugs—not inflating coverage numbers.
"""

View File

@@ -0,0 +1,263 @@
"""
Tests for special status parsing in the base tool
"""
from pydantic import BaseModel
from tools.base import BaseTool
class MockRequest(BaseModel):
"""Mock request for testing"""
test_field: str = "test"
class TestTool(BaseTool):
"""Minimal test tool implementation"""
def get_name(self) -> str:
return "test_tool"
def get_description(self) -> str:
return "Test tool for special status parsing"
def get_input_schema(self) -> dict:
return {"type": "object", "properties": {}}
def get_system_prompt(self) -> str:
return "Test prompt"
def get_request_model(self):
return MockRequest
async def prepare_prompt(self, request) -> str:
return "test prompt"
class TestSpecialStatusParsing:
"""Test special status parsing functionality"""
def setup_method(self):
"""Setup test tool and request"""
self.tool = TestTool()
self.request = MockRequest()
def test_full_codereview_required_parsing(self):
"""Test parsing of full_codereview_required status"""
response_json = '{"status": "full_codereview_required", "reason": "Codebase too large for quick review"}'
result = self.tool._parse_response(response_json, self.request)
assert result.status == "full_codereview_required"
assert result.content_type == "json"
assert "reason" in result.content
def test_full_codereview_required_without_reason(self):
"""Test parsing of full_codereview_required without optional reason"""
response_json = '{"status": "full_codereview_required"}'
result = self.tool._parse_response(response_json, self.request)
assert result.status == "full_codereview_required"
assert result.content_type == "json"
def test_test_sample_needed_parsing(self):
"""Test parsing of test_sample_needed status"""
response_json = '{"status": "test_sample_needed", "reason": "Cannot determine test framework"}'
result = self.tool._parse_response(response_json, self.request)
assert result.status == "test_sample_needed"
assert result.content_type == "json"
assert "reason" in result.content
def test_more_tests_required_parsing(self):
"""Test parsing of more_tests_required status"""
response_json = (
'{"status": "more_tests_required", "pending_tests": "test_auth (test_auth.py), test_login (test_user.py)"}'
)
result = self.tool._parse_response(response_json, self.request)
assert result.status == "more_tests_required"
assert result.content_type == "json"
assert "pending_tests" in result.content
def test_clarification_required_still_works(self):
"""Test that existing clarification_required still works"""
response_json = (
'{"status": "clarification_required", "question": "What files need review?", "files_needed": ["src/"]}'
)
result = self.tool._parse_response(response_json, self.request)
assert result.status == "clarification_required"
assert result.content_type == "json"
assert "question" in result.content
def test_invalid_status_payload(self):
"""Test that invalid payloads for known statuses are handled gracefully"""
# Missing required field 'reason' for test_sample_needed
response_json = '{"status": "test_sample_needed"}'
result = self.tool._parse_response(response_json, self.request)
# Should fall back to normal processing since validation failed
assert result.status in ["success", "continuation_available"]
def test_unknown_status_ignored(self):
"""Test that unknown status types are ignored and treated as normal responses"""
response_json = '{"status": "unknown_status", "data": "some data"}'
result = self.tool._parse_response(response_json, self.request)
# Should be treated as normal response
assert result.status in ["success", "continuation_available"]
def test_normal_response_unchanged(self):
"""Test that normal text responses are handled normally"""
response_text = "This is a normal response with some analysis."
result = self.tool._parse_response(response_text, self.request)
# Should be processed as normal response
assert result.status in ["success", "continuation_available"]
assert response_text in result.content
def test_malformed_json_handled(self):
"""Test that malformed JSON is handled gracefully"""
response_text = '{"status": "clarification_required", "question": "incomplete json'
result = self.tool._parse_response(response_text, self.request)
# Should fall back to normal processing
assert result.status in ["success", "continuation_available"]
def test_metadata_preserved(self):
"""Test that model metadata is preserved in special status responses"""
response_json = '{"status": "full_codereview_required", "reason": "Too complex"}'
model_info = {"model_name": "test-model", "provider": "test-provider"}
result = self.tool._parse_response(response_json, self.request, model_info)
assert result.status == "full_codereview_required"
assert result.metadata["model_used"] == "test-model"
assert "original_request" in result.metadata
def test_more_tests_required_detailed(self):
"""Test more_tests_required with detailed pending_tests parameter"""
# Test the exact format expected by testgen prompt
pending_tests = "test_authentication_edge_cases (test_auth.py), test_password_validation_complex (test_auth.py), test_user_registration_flow (test_user.py)"
response_json = f'{{"status": "more_tests_required", "pending_tests": "{pending_tests}"}}'
result = self.tool._parse_response(response_json, self.request)
assert result.status == "more_tests_required"
assert result.content_type == "json"
# Verify the content contains the validated, parsed data
import json
parsed_content = json.loads(result.content)
assert parsed_content["status"] == "more_tests_required"
assert parsed_content["pending_tests"] == pending_tests
# Verify Claude would receive the pending_tests parameter correctly
assert "test_authentication_edge_cases (test_auth.py)" in parsed_content["pending_tests"]
assert "test_password_validation_complex (test_auth.py)" in parsed_content["pending_tests"]
assert "test_user_registration_flow (test_user.py)" in parsed_content["pending_tests"]
def test_more_tests_required_missing_pending_tests(self):
"""Test that more_tests_required without required pending_tests field fails validation"""
response_json = '{"status": "more_tests_required"}'
result = self.tool._parse_response(response_json, self.request)
# Should fall back to normal processing since validation failed (missing required field)
assert result.status in ["success", "continuation_available"]
assert result.content_type != "json"
def test_test_sample_needed_missing_reason(self):
"""Test that test_sample_needed without required reason field fails validation"""
response_json = '{"status": "test_sample_needed"}'
result = self.tool._parse_response(response_json, self.request)
# Should fall back to normal processing since validation failed (missing required field)
assert result.status in ["success", "continuation_available"]
assert result.content_type != "json"
def test_special_status_json_format_preserved(self):
"""Test that special status responses preserve exact JSON format for Claude"""
test_cases = [
{
"input": '{"status": "clarification_required", "question": "What framework to use?", "files_needed": ["tests/"]}',
"expected_fields": ["status", "question", "files_needed"],
},
{
"input": '{"status": "full_codereview_required", "reason": "Codebase too large"}',
"expected_fields": ["status", "reason"],
},
{
"input": '{"status": "test_sample_needed", "reason": "Cannot determine test framework"}',
"expected_fields": ["status", "reason"],
},
{
"input": '{"status": "more_tests_required", "pending_tests": "test_auth (test_auth.py), test_login (test_user.py)"}',
"expected_fields": ["status", "pending_tests"],
},
]
for test_case in test_cases:
result = self.tool._parse_response(test_case["input"], self.request)
# Verify status is correctly detected
import json
input_data = json.loads(test_case["input"])
assert result.status == input_data["status"]
assert result.content_type == "json"
# Verify all expected fields are preserved in the response
parsed_content = json.loads(result.content)
for field in test_case["expected_fields"]:
assert field in parsed_content, f"Field {field} missing from {input_data['status']} response"
assert (
parsed_content[field] == input_data[field]
), f"Field {field} value mismatch in {input_data['status']} response"
def test_focused_review_required_parsing(self):
"""Test that focused_review_required status is parsed correctly"""
import json
json_response = {
"status": "focused_review_required",
"reason": "Codebase too large for single review",
"suggestion": "Review authentication module (auth.py, login.py)",
}
result = self.tool._parse_response(json.dumps(json_response), self.request)
assert result.status == "focused_review_required"
assert result.content_type == "json"
parsed_content = json.loads(result.content)
assert parsed_content["status"] == "focused_review_required"
assert parsed_content["reason"] == "Codebase too large for single review"
assert parsed_content["suggestion"] == "Review authentication module (auth.py, login.py)"
def test_focused_review_required_missing_suggestion(self):
"""Test that focused_review_required fails validation without suggestion"""
import json
json_response = {
"status": "focused_review_required",
"reason": "Codebase too large",
# Missing required suggestion field
}
result = self.tool._parse_response(json.dumps(json_response), self.request)
# Should fall back to normal response since validation failed
assert result.status == "success"
assert result.content_type == "text"
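
To exercise just this new suite, something like the snippet below works; the module path is an assumption, since the commit page does not show the new file's name:

```python
# Run only the new special-status tests; the file path below is assumed.
import sys

import pytest

sys.exit(pytest.main(["-v", "tests/test_special_status.py"]))
```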

View File

@@ -284,7 +284,7 @@ class TestComprehensive(unittest.TestCase):
# Check formatting includes new action-oriented next steps
assert raw_response in formatted
assert "IMMEDIATE ACTION REQUIRED" in formatted
assert "IMMEDIATE NEXT ACTION" in formatted
assert "ULTRATHINK" in formatted
assert "CREATE" in formatted
assert "VALIDATE BY EXECUTION" in formatted

View File

@@ -37,7 +37,7 @@ from utils.conversation_memory import (
)
from utils.file_utils import read_file_content, read_files, translate_path_for_environment
from .models import ClarificationRequest, ContinuationOffer, ToolOutput
from .models import SPECIAL_STATUS_MODELS, ContinuationOffer, ToolOutput
logger = logging.getLogger(__name__)
@@ -1183,31 +1183,43 @@ When recommending searches, be specific about what information you need and why
logger = logging.getLogger(f"tools.{self.name}")
try:
# Try to parse as JSON to check for clarification requests
# Try to parse as JSON to check for special status requests
potential_json = json.loads(raw_text.strip())
if isinstance(potential_json, dict) and potential_json.get("status") == "clarification_required":
# Validate the clarification request structure
clarification = ClarificationRequest(**potential_json)
logger.debug(f"{self.name} tool requested clarification: {clarification.question}")
# Extract model information for metadata
metadata = {
"original_request": (request.model_dump() if hasattr(request, "model_dump") else str(request))
}
if model_info:
model_name = model_info.get("model_name")
if model_name:
metadata["model_used"] = model_name
if isinstance(potential_json, dict) and "status" in potential_json:
status_key = potential_json.get("status")
status_model = SPECIAL_STATUS_MODELS.get(status_key)
return ToolOutput(
status="clarification_required",
content=clarification.model_dump_json(),
content_type="json",
metadata=metadata,
)
if status_model:
try:
# Use Pydantic for robust validation of the special status
parsed_status = status_model.model_validate(potential_json)
logger.debug(f"{self.name} tool detected special status: {status_key}")
# Extract model information for metadata
metadata = {
"original_request": (
request.model_dump() if hasattr(request, "model_dump") else str(request)
)
}
if model_info:
model_name = model_info.get("model_name")
if model_name:
metadata["model_used"] = model_name
return ToolOutput(
status=status_key,
content=parsed_status.model_dump_json(),
content_type="json",
metadata=metadata,
)
except Exception as e:
# Invalid payload for known status, log warning and continue as normal response
logger.warning(f"Invalid {status_key} payload: {e}")
except (json.JSONDecodeError, ValueError, TypeError):
# Not a JSON clarification request, treat as normal response
# Not a JSON special status request, treat as normal response
pass
# Normal text response - format using tool-specific formatting
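
Because the extracted diff interleaves removed and added lines, here is a condensed sketch of the new control flow only; metadata assembly, logging, and `ToolOutput` construction are trimmed, and the function name is invented for illustration:

```python
# Condensed sketch of the new registry-driven parsing (simplified from the diff above).
import json

from pydantic import BaseModel


def parse_special_status(raw_text: str, registry: dict[str, type[BaseModel]]):
    """Return (status, payload_json) when raw_text is a valid known special status, else None."""
    try:
        candidate = json.loads(raw_text.strip())
    except (json.JSONDecodeError, ValueError, TypeError):
        return None  # not JSON: treat as a normal text response
    if not isinstance(candidate, dict) or "status" not in candidate:
        return None
    status_model = registry.get(candidate["status"])
    if status_model is None:
        return None  # unknown status: fall through to normal handling
    try:
        parsed = status_model.model_validate(candidate)
    except Exception:
        return None  # invalid payload for a known status: fall back gracefully
    return candidate["status"], parsed.model_dump_json()
```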

View File

@@ -36,6 +36,10 @@ class ToolOutput(BaseModel):
"success",
"error",
"clarification_required",
"full_codereview_required",
"focused_review_required",
"test_sample_needed",
"more_tests_required",
"resend_prompt",
"continuation_available",
] = "success"
@@ -50,6 +54,7 @@ class ToolOutput(BaseModel):
class ClarificationRequest(BaseModel):
"""Request for additional context or clarification"""
status: Literal["clarification_required"] = "clarification_required"
question: str = Field(..., description="Question to ask Claude for more context")
files_needed: Optional[list[str]] = Field(
default_factory=list, description="Specific files that are needed for analysis"
@@ -60,6 +65,48 @@ class ClarificationRequest(BaseModel):
)
class FullCodereviewRequired(BaseModel):
"""Request for full code review when scope is too large for quick review"""
status: Literal["full_codereview_required"] = "full_codereview_required"
important: Optional[str] = Field(None, description="Important message about escalation")
reason: Optional[str] = Field(None, description="Reason why full review is needed")
class FocusedReviewRequired(BaseModel):
"""Request for Claude to provide smaller, focused subsets of code for review"""
status: Literal["focused_review_required"] = "focused_review_required"
reason: str = Field(..., description="Why the current scope is too large for effective review")
suggestion: str = Field(
..., description="Suggested approach for breaking down the review into smaller, focused parts"
)
class TestSampleNeeded(BaseModel):
"""Request for additional test samples to determine testing framework"""
status: Literal["test_sample_needed"] = "test_sample_needed"
reason: str = Field(..., description="Reason why additional test samples are required")
class MoreTestsRequired(BaseModel):
"""Request for continuation to generate additional tests"""
status: Literal["more_tests_required"] = "more_tests_required"
pending_tests: str = Field(..., description="List of pending tests to be generated")
# Registry mapping status strings to their corresponding Pydantic models
SPECIAL_STATUS_MODELS = {
"clarification_required": ClarificationRequest,
"full_codereview_required": FullCodereviewRequired,
"focused_review_required": FocusedReviewRequired,
"test_sample_needed": TestSampleNeeded,
"more_tests_required": MoreTestsRequired,
}
class DiagnosticHypothesis(BaseModel):
"""A debugging hypothesis with context and next steps"""

View File

@@ -117,7 +117,10 @@ class TestGenTool(BaseTool):
},
"continuation_id": {
"type": "string",
"description": "Thread continuation ID for multi-turn conversations. Can be used to continue conversations across different tools. Only provide this if continuing a previous conversation thread.",
"description": (
"Thread continuation ID for multi-turn conversations. Can be used to continue conversations "
"across different tools. Only provide this if continuing a previous conversation thread."
),
},
},
"required": ["files", "prompt"] + (["model"] if self.is_effective_auto_mode() else []),
@@ -436,7 +439,7 @@ class TestGenTool(BaseTool):
---
# IMMEDIATE ACTION REQUIRED
# IMMEDIATE NEXT ACTION
Claude, you are now in EXECUTION MODE. Take immediate action:
@@ -472,5 +475,6 @@ After creating each test file, show the user:
## Step 5: MOVE TO NEXT ACTION
Once tests are confirmed working, immediately proceed to the next logical step for the project.
**CRITICAL**: Do NOT stop after generating - you MUST create, validate, run, and confirm the tests work.
Take full ownership of the testing implementation and move to your next work."""
**CRITICAL**: Do NOT stop after generating - you MUST create, validate, run, and confirm the tests work. Take full
ownership of the testing implementation and move to your next work. If you were supplied a more_tests_required request
in the response above, you MUST honor it."""
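
For context, a follow-up invocation that honors a more_tests_required response might look like the arguments below; the file paths and continuation ID are purely illustrative, and only `files`, `prompt`, and `continuation_id` come from the schema shown above:

```python
# Illustrative follow-up arguments for the testgen tool after a more_tests_required
# response; file paths and the continuation ID are invented for this example.
followup_arguments = {
    "files": ["/abs/path/to/auth.py", "/abs/path/to/tests/test_auth.py"],
    "prompt": (
        "Continue test generation. Pending tests: "
        "test_password_validation_complex (test_auth.py), "
        "test_user_registration_flow (test_user.py)"
    ),
    "continuation_id": "d2f1a3c4-0000-0000-0000-000000000000",  # thread ID from the prior turn
}
```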