Add Consensus Tool for Multi-Model Perspective Gathering (#67)

* WIP
Refactor model name resolution; it should happen once at the MCP call boundary.
Pass a model context around instead.
The consensus tool lets you gather a consensus from multiple models, optionally assigning each model a 'for' or 'against' stance to draw out nuanced responses (see the sketch below).
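
For illustration, a consensus request might look like the sketch below; the field names are assumptions, not the tool's exact schema:

```python
# Hypothetical arguments for the consensus tool; field names are illustrative,
# not the shipped schema.
consensus_args = {
    "prompt": "Should we adopt Postgres for the persistence layer?",
    "models": [
        {"model": "o3", "stance": "for"},         # argues in favour
        {"model": "flash", "stance": "against"},  # argues against
        {"model": "pro"},                         # no stance -> treated as neutral
    ],
}
```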

* Deduplicated model resolution; model_context should be available before reaching deeper parts of the code.
Improved abstraction when building conversations.
Throw programmer errors early.

* Guardrails
Support the `model:option` format at the MCP boundary so future tools can pass additional options if needed, instead of handling this only for consensus.
Model names now accept an optional ":option" suffix for future use (sketched below).
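
A minimal sketch of the `model:option` split at the MCP boundary; the helper name is hypothetical, not the actual implementation:

```python
def parse_model_option(raw: str) -> tuple[str, str | None]:
    """Split a "model:option" string; the option is None when absent."""
    model, _, option = raw.partition(":")
    return model.strip(), (option.strip() or None)

# parse_model_option("o3:against") -> ("o3", "against")
# parse_model_option("flash")      -> ("flash", None)
```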

* Simplified async flow

* Improved the request model to support natural language
Simplified async flow

* Fix consensus tool async/sync patterns to match codebase standards

CRITICAL FIXES:
- Converted _get_consensus_responses from async to sync (matches other tools)
- Converted store_conversation_turn from async to sync (add_turn is synchronous)
- Removed unnecessary asyncio imports and sleep calls
- Fixed ClosedResourceError in MCP protocol during long consensus operations

PATTERN ALIGNMENT:
- Consensus tool now follows same sync patterns as all other tools
- Only execute() and prepare_prompt() are async (base class requirement)
- All internal operations are synchronous, like analyze, chat, debug, etc. (see the sketch below)
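
A structural sketch of this pattern, assuming the codebase's BaseTool contract; method bodies and any names beyond those mentioned in this commit are placeholders, not the real implementation:

```python
class ConsensusTool(BaseTool):
    async def execute(self, arguments: dict) -> list:
        # async only because the base class requires it
        request = self.get_request_model()(**arguments)
        prompt = await self.prepare_prompt(request)
        responses = self._get_consensus_responses(request, prompt)  # sync helper
        self.store_conversation_turn(request, responses)            # sync helper
        return self._format_output(responses)                       # placeholder

    async def prepare_prompt(self, request) -> str:
        ...  # async per the base class, but awaits nothing internal

    def _get_consensus_responses(self, request, prompt):
        ...  # plain synchronous loop over the requested models

    def store_conversation_turn(self, request, responses) -> None:
        ...  # add_turn is synchronous, so no async/await needed here
```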

TESTING:
- MCP simulation test now passes: consensus_stance 
- Two-model consensus works correctly in ~35 seconds
- Unknown stance handling defaults to neutral with a warning (see the sketch after this list)
- All 9 unit tests pass (100% success rate)
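
The neutral fallback noted above could be as simple as the following sketch (illustrative only; the shipped validation may differ):

```python
import logging

logger = logging.getLogger(__name__)

VALID_STANCES = {"for", "against", "neutral"}

def normalize_stance(stance: str | None) -> str:
    """Map unknown or missing stances to neutral, logging a warning."""
    normalized = (stance or "neutral").strip().lower()
    if normalized not in VALID_STANCES:
        logger.warning("Unknown stance '%s', defaulting to neutral", stance)
        return "neutral"
    return normalized
```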

The consensus tool's async patterns were anomalous in the codebase.
This fix aligns the tool with the established synchronous patterns used
by all other tools while maintaining full functionality.

🤖 Generated with [Claude Code](https://claude.ai/code)

Co-Authored-By: Claude <noreply@anthropic.com>

* Fixed call order and added new test

* Cleaned up dead comments
Added docs for the new tool
Improved tests

---------

Co-authored-by: Claude <noreply@anthropic.com>
Author: Beehive Innovations
Date: 2025-06-17 10:53:17 +04:00
Committed by: GitHub
Parent: 9b98df650b
Commit: 95556ba9ea
31 changed files with 2643 additions and 324 deletions


@@ -290,23 +290,25 @@ class TestGenerationTool(BaseTool):
         continuation_id = getattr(request, "continuation_id", None)
 
         # Get model context for token budget calculation
-        model_name = getattr(self, "_current_model_name", None)
         available_tokens = None
 
-        if model_name:
+        if hasattr(self, "_model_context") and self._model_context:
             try:
-                provider = self.get_model_provider(model_name)
-                capabilities = provider.get_capabilities(model_name)
+                capabilities = self._model_context.capabilities
                 # Use 75% of context for content (code + test examples), 25% for response
                 available_tokens = int(capabilities.context_window * 0.75)
                 logger.debug(
-                    f"[TESTGEN] Token budget calculation: {available_tokens:,} tokens (75% of {capabilities.context_window:,}) for model {model_name}"
+                    f"[TESTGEN] Token budget calculation: {available_tokens:,} tokens (75% of {capabilities.context_window:,}) for model {self._model_context.model_name}"
                 )
             except Exception as e:
                 # Fallback to conservative estimate
-                logger.warning(f"[TESTGEN] Could not get model capabilities for {model_name}: {e}")
+                logger.warning(f"[TESTGEN] Could not get model capabilities: {e}")
                 available_tokens = 120000  # Conservative fallback
                 logger.debug(f"[TESTGEN] Using fallback token budget: {available_tokens:,} tokens")
+        else:
+            # No model context available (shouldn't happen in normal flow)
+            available_tokens = 120000  # Conservative fallback
+            logger.debug(f"[TESTGEN] No model context, using fallback token budget: {available_tokens:,} tokens")
 
         # Process test examples first to determine token allocation
         test_examples_content = ""