Improved model response handling to support additional response statuses in the future

Improved testgen; encourages follow-ups with less work in between and less token generation, to avoid exceeding the 25K limit
Improved codereview tool to request a focused code review where a single-pass code review would be too large or complex
Fahad
2025-06-14 18:43:56 +04:00
parent ec5fee4409
commit 442decba70
8 changed files with 383 additions and 31 deletions
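As a rough sketch of what "handling additional response statuses" could look like on the caller side, the snippet below branches on a parsed JSON status object. The function name, the fall-through behaviour, and everything other than the two statuses introduced in this commit are assumptions, not the actual implementation.

```python
def dispatch_status(payload: dict | None) -> str:
    """Minimal sketch: decide what to do with a parsed JSON status object (assumed shape)."""
    if not payload or "status" not in payload:
        return "plain response - use as-is"
    status = payload["status"]
    if status == "focused_review_required":      # added for the codereview tool
        return f"re-run review on: {payload.get('suggestion', 'a smaller subset')}"
    if status == "more_tests_required":          # added for the testgen tool
        return f"request follow-up tests for: {payload.get('pending_tests', '')}"
    return f"unrecognized status '{status}'"     # future statuses slot in here
```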

View File

@@ -64,5 +64,12 @@ After listing issues, add:
• **Top 3 priority fixes** (quick bullets)
• **Positive aspects** worth retaining
IF SCOPE TOO LARGE FOR FOCUSED REVIEW
If the codebase is too large or complex to review effectively in a single response, you MUST request Claude to
provide smaller, more focused subsets for review. Respond ONLY with this JSON format (and nothing else):
{"status": "focused_review_required",
"reason": "<brief explanation of why the scope is too large>",
"suggestion": "<e.g., 'Review authentication module (auth.py, login.py)' or 'Focus on data layer (models/)' or 'Review payment processing functionality'>"}
Remember: If required information is missing, use the clarification JSON above instead of guessing.
"""

View File

@@ -24,7 +24,8 @@ test idioms from the code snapshot provided.
that are directly involved (network, DB, file-system, IPC).
3. **Adversarial Thinker** enumerates realistic failures, boundary conditions, race conditions, and misuse patterns
that historically break similar systems.
4. **Risk Prioritizer** ranks findings by production impact and likelihood; discards speculative or out-of-scope cases.
5. **Test Scaffolder** produces deterministic, isolated tests that follow the *project's* conventions (assert style,
fixture layout, naming, any mocking strategy, language and tooling etc).
@@ -41,6 +42,7 @@ pure functions).
- Surface concurrency hazards with stress or fuzz tests when the language/runtime supports them.
- Focus on realistic failure modes that actually occur in production
- Remain within scope of language, framework, project. Do not over-step. Do not add unnecessary dependencies.
- No bogus, fake tests that seemingly pass for no reason at all
EDGE-CASE TAXONOMY (REAL-WORLD, HIGH-VALUE)
- **Data Shape Issues**: `null` / `undefined`, zero-length, surrogate-pair emojis, malformed UTF-8, mixed EOLs.
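As an illustration of the kind of data-shape test this taxonomy points at, the snippet below uses a stand-in `normalize_text` function; both the function and the expected behaviour are assumptions for the example, not project code.

```python
import pytest

def normalize_text(raw):
    """Stand-in for the real function under test (assumption for this example)."""
    return (raw or "").replace("\r\n", "\n")

# Hypothesis: text normalization must degrade gracefully on the shapes that most often
# arrive from user input - empty strings, missing values, emoji, and mixed line endings.
@pytest.mark.parametrize("raw", [
    "",                        # zero-length input
    None,                      # null / missing value
    "caf\u00e9 \U0001F600",    # non-ASCII plus a surrogate-pair emoji
    "line1\r\nline2\nline3",   # mixed EOLs
])
def test_normalize_text_handles_awkward_shapes(raw):
    result = normalize_text(raw)
    assert isinstance(result, str)      # never raises, always returns text
    assert "\r\n" not in result         # line endings are normalized
```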
@@ -93,8 +95,27 @@ it but do not approach or offer refactoring ideas.
DELIVERABLE
Return only the artefacts (analysis summary, coverage plan, and generated tests) that fit the detected framework
and code / project layout.
No extra commentary, no generic boilerplate.
Group related tests but separate them into files where this is the convention and most suitable for the project at hand.
Prefer adding tests to an existing test file if one was provided and grouping these tests makes sense.
Must document logic, test reason/hypothesis in delivered code.
MUST NOT add any additional information, introduction, or summaries around generated code. Deliver only the essentials
relevant to the test.
IF ADDITIONAL TEST CASES ARE REQUIRED
If you determine that comprehensive test coverage requires generating multiple test files or a large number of
test cases per file, and that this would risk exceeding context limits, you MUST follow this structured approach:
1. **Generate Essential Tests First**: Create only the most critical and high-impact tests (typically 3-5 key test
cases covering the most important paths and failure modes). Clearly state the file these tests belong to, even if
these should be added to an existing test file.
2. **Request Continuation**: You MUST end your message with the following JSON appended (and nothing more
after it). It lists the pending tests and their respective files (even if they belong to the same or
an existing test file), and will be used for the next follow-up test generation request.
{"status": "more_tests_required",
"pending_tests": "test_name (file_name), another_test_name (file_name)"}
This approach ensures comprehensive test coverage while maintaining quality and avoiding context overflow.
Remember: your value is catching the hard bugs—not inflating coverage numbers.
"""