Add Consensus Tool for Multi-Model Perspective Gathering (#67)

* WIP
Refactor model_name resolution: it should happen once at the MCP call boundary, with the model context passed around instead
The consensus tool lets you gather a consensus from multiple models, optionally assigning a model a 'for' or 'against' stance to draw out more nuanced responses
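
For illustration, a tool call for the stance-based consensus described above might carry arguments shaped roughly like this; the field names (`prompt`, `models`, `stance`) are assumptions for the sketch, not the tool's confirmed schema:

```python
# Hypothetical argument payload for the consensus tool (names assumed).
consensus_args = {
    "prompt": "Should we migrate the service from REST to gRPC?",
    "models": [
        {"model": "o3", "stance": "for"},            # argues in favour
        {"model": "gemini-pro", "stance": "against"},  # argues against
        {"model": "flash"},                           # no stance -> neutral
    ],
}
```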

* Deduplicate model resolution; model_context should be available before reaching the deeper parts of the code
Improve the abstraction used when building conversations
Throw programmer errors early
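
As an illustration of the fail-fast idea (the class, attribute, and error message here are hypothetical, not the repository's actual code):

```python
class ToolSketch:
    """Hypothetical fragment showing the early programmer-error check."""

    def __init__(self):
        self._model_context = None  # attached by the MCP boundary in the real flow

    def get_model_context(self):
        # Fail loudly if the boundary did not attach a model context,
        # instead of silently re-resolving model names deeper in the stack.
        if self._model_context is None:
            raise RuntimeError(
                "model_context missing: it must be resolved once at the "
                "MCP call boundary (programmer error, not user error)"
            )
        return self._model_context
```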

* Guardrails
Support the `model:option` format at the MCP boundary so future tools can use additional options if needed, instead of handling this only for consensus
Model names now accept an optional ":option" suffix for future use
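
A sketch of the parsing implied above; the helper name and return shape are assumptions rather than the repository's actual API:

```python
from __future__ import annotations


def parse_model_option(raw: str) -> tuple[str, str | None]:
    """Split 'model:option' into (model, option); option is None if absent."""
    model, sep, option = raw.partition(":")
    return model.strip(), (option.strip() or None) if sep else None


# parse_model_option("o3:for") -> ("o3", "for")
# parse_model_option("flash")  -> ("flash", None)
```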

* Simplified async flow

* Improved the request model to support natural language
Simplified async flow

* Fix consensus tool async/sync patterns to match codebase standards

CRITICAL FIXES:
- Converted _get_consensus_responses from async to sync (matches other tools)
- Converted store_conversation_turn from async to sync (add_turn is synchronous)
- Removed unnecessary asyncio imports and sleep calls
- Fixed ClosedResourceError in MCP protocol during long consensus operations

PATTERN ALIGNMENT:
- Consensus tool now follows same sync patterns as all other tools
- Only execute() and prepare_prompt() are async (base class requirement)
- All internal operations are synchronous like analyze, chat, debug, etc.
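
A minimal, self-contained sketch of this pattern, using simplified stand-in names rather than the actual classes:

```python
import asyncio
from typing import Any


class ConsensusToolSketch:
    """Illustrative stand-in; the real class and method names may differ."""

    async def execute(self, arguments: dict[str, Any]) -> list[str]:
        # Async only because the tool entry point must be a coroutine;
        # nothing it calls internally needs to be awaited.
        prompt = await self.prepare_prompt(arguments)
        return self._get_consensus_responses(prompt, arguments["models"])

    async def prepare_prompt(self, arguments: dict[str, Any]) -> str:
        # Base-class requirement in the real code; trivial here.
        return arguments["prompt"]

    def _get_consensus_responses(self, prompt: str, models: list[str]) -> list[str]:
        # Synchronous, like the other tools: query each model in turn.
        return [f"{model} answered: {prompt}" for model in models]


if __name__ == "__main__":
    args = {"prompt": "Q?", "models": ["o3", "flash"]}
    print(asyncio.run(ConsensusToolSketch().execute(args)))
```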

TESTING:
- MCP simulation test now passes: consensus_stance 
- Two-model consensus works correctly in ~35 seconds
- Unknown stance handling defaults to neutral with warnings
- All 9 unit tests pass (100% success rate)
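
The neutral fallback for unknown stances might look roughly like the following; the function and constant names are illustrative assumptions:

```python
from __future__ import annotations

import logging

logger = logging.getLogger(__name__)

VALID_STANCES = {"for", "against", "neutral"}


def normalize_stance(stance: str | None) -> str:
    # Unknown or missing stances fall back to neutral with a warning,
    # rather than failing the whole consensus request.
    if stance is None:
        return "neutral"
    cleaned = stance.strip().lower()
    if cleaned not in VALID_STANCES:
        logger.warning("Unknown stance %r; defaulting to neutral", stance)
        return "neutral"
    return cleaned
```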

The consensus tool's async patterns were anomalous in the codebase.
This fix aligns it with the established synchronous patterns used
by all other tools while maintaining full functionality.

🤖 Generated with [Claude Code](https://claude.ai/code)

Co-Authored-By: Claude <noreply@anthropic.com>

* Fixed call order and added new test

* Clean up dead comments
Add docs for the new tool
Improve tests

---------

Co-authored-by: Claude <noreply@anthropic.com>
Author: Beehive Innovations
Date: 2025-06-17 10:53:17 +04:00
Committed by: GitHub
Parent: 9b98df650b
Commit: 95556ba9ea
31 changed files with 2643 additions and 324 deletions


@@ -425,15 +425,39 @@ class TestComprehensive(unittest.TestCase):
             files=["/tmp/test.py"], prompt="Test prompt", test_examples=["/tmp/example.py"]
         )
-        # This should trigger token budget calculation
-        import asyncio
-        asyncio.run(tool.prepare_prompt(request))
-        # Verify test examples got 25% of 150k tokens (75% of 200k context)
-        mock_process.assert_called_once()
-        call_args = mock_process.call_args[0]
-        assert call_args[2] == 150000  # 75% of 200k context window
+        # Mock the provider registry to return a provider with 200k context
+        from unittest.mock import MagicMock
+
+        from providers.base import ModelCapabilities, ProviderType
+
+        mock_provider = MagicMock()
+        mock_capabilities = ModelCapabilities(
+            provider=ProviderType.OPENAI,
+            model_name="o3",
+            friendly_name="OpenAI",
+            context_window=200000,
+            supports_images=False,
+            supports_extended_thinking=True,
+        )
+
+        with patch("providers.registry.ModelProviderRegistry.get_provider_for_model") as mock_get_provider:
+            mock_provider.get_capabilities.return_value = mock_capabilities
+            mock_get_provider.return_value = mock_provider
+
+            # Set up model context to simulate normal execution flow
+            from utils.model_context import ModelContext
+
+            tool._model_context = ModelContext("o3")  # Model with 200k context window
+
+            # This should trigger token budget calculation
+            import asyncio
+
+            asyncio.run(tool.prepare_prompt(request))
+
+            # Verify test examples got 25% of 150k tokens (75% of 200k context)
+            mock_process.assert_called_once()
+            call_args = mock_process.call_args[0]
+            assert call_args[2] == 150000  # 75% of 200k context window
 
     @pytest.mark.asyncio
     async def test_continuation_support(self, tool, temp_files):