Commit Graph

14 Commits

Author SHA1 Message Date
Beehive Innovations
95556ba9ea Add Consensus Tool for Multi-Model Perspective Gathering (#67)
* WIP
Refactor resolving mode_names, should be done once at MCP call boundary
Pass around model context instead
Consensus tool allows one to get a consensus from multiple models, optionally assigning one a 'for' or 'against' stance to find nuanced responses.

* Deduplication of model resolution, model_context should be available before reaching deeper parts of the code
Improved abstraction when building conversations
Throw programmer errors early

* Guardrails
Support for `model:option` format at MCP boundary so future tools can use additional options if needed instead of handling this only for consensus
Model name now supports an optional ":option" for future use

* Simplified async flow

* Improved model for request to support natural language
Simplified async flow

* Improved model for request to support natural language
Simplified async flow

* Fix consensus tool async/sync patterns to match codebase standards

CRITICAL FIXES:
- Converted _get_consensus_responses from async to sync (matches other tools)
- Converted store_conversation_turn from async to sync (add_turn is synchronous)
- Removed unnecessary asyncio imports and sleep calls
- Fixed ClosedResourceError in MCP protocol during long consensus operations

PATTERN ALIGNMENT:
- Consensus tool now follows same sync patterns as all other tools
- Only execute() and prepare_prompt() are async (base class requirement)
- All internal operations are synchronous like analyze, chat, debug, etc.

TESTING:
- MCP simulation test now passes: consensus_stance 
- Two-model consensus works correctly in ~35 seconds
- Unknown stance handling defaults to neutral with warnings
- All 9 unit tests pass (100% success rate)

The consensus tool async patterns were anomalous in the codebase.
This fix aligns it with the established synchronous patterns used
by all other tools while maintaining full functionality.

🤖 Generated with [Claude Code](https://claude.ai/code)

Co-Authored-By: Claude <noreply@anthropic.com>

* Fixed call order and added new test

* Cleanup dead comments
Docs for the new tool
Improved tests

---------

Co-authored-by: Claude <noreply@anthropic.com>
2025-06-17 10:53:17 +04:00
Fahad
805e8d6d01 Fix remaining TestGenRequest reference in format_response method
Fixed NameError that was causing Docker container crashes:
- Updated type annotation in format_response method from TestGenRequest to TestGenerationRequest
- This was the last missing reference from the class rename

🤖 Generated with [Claude Code](https://claude.ai/code)

Co-Authored-By: Claude <noreply@anthropic.com>
2025-06-16 06:08:58 +04:00
Fahad
2cfe0b163a Fix all failing tests and pytest collection warnings
Fixed MagicMock comparison errors across multiple test suites by:
- Adding proper ModelCapabilities mocks with real values instead of MagicMock objects
- Updating test_auto_mode.py with correct provider mocking for model availability tests
- Updating test_thinking_modes.py with proper capabilities mocking in all thinking mode tests
- Updating test_tools.py with proper capabilities mocking for CodeReview and Analyze tools
- Fixing test_large_prompt_handling.py by adding proper provider mocking to prevent errors before large prompt detection

Fixed pytest collection warnings by:
- Renaming TestGenRequest to TestGenerationRequest to avoid pytest collecting it as a test class
- Renaming TestGenTool to TestGenerationTool to avoid pytest collecting it as a test class
- Updated all imports and references across server.py, tools/__init__.py, and test files

All 459 tests now pass without warnings or MagicMock comparison errors.

🤖 Generated with [Claude Code](https://claude.ai/code)

Co-Authored-By: Claude <noreply@anthropic.com>
2025-06-16 06:02:12 +04:00
Fahad
64a1d8664e Fix directory expansion tracking in conversation memory
When directories were provided to tools, only the directory path was stored
in conversation history instead of the individual expanded files. This caused
file filtering to incorrectly skip files in continued conversations.

Changes:
- Modified _prepare_file_content_for_prompt to return (content, processed_files)
- Updated all tools to track actually processed files for conversation memory
- Ensures directories are tracked as their expanded individual files

Fixes issue where Swift directory with 46 files was not properly embedded
in conversation continuations.
2025-06-15 15:36:12 +04:00
Fahad
4becd70a82 Perform prompt size checks only at the MCP boundary
New test to confirm history build-up and system prompt does not affect prompt size checks
Also check for large prompts in focus_on
Fixed .env.example incorrectly did not comment out CUSTOM_API causing the run-server script to think at least one key exists
2025-06-15 10:37:08 +04:00
Fahad
c7835e7eef Easier access to logs at startup with -f on the run script
Improved prompt for immediate action
Additional logging of tool names
Updated documentation
Context aware decomposition system prompt
New script to run code quality checks
2025-06-15 09:25:52 +04:00
Fahad
99fab3e83d Docs added to show how a new provider is added
Docs added to show how a new tool is created
All tools should add numbers to code for models to be able to reference if needed
Enabled line numbering for code for all tools to use
Additional tests to validate line numbering is not added to git diffs
2025-06-15 07:02:27 +04:00
Fahad
b5004b91fc Major new addition: refactor tool
Supports decomposing large components and files, finding codesmells, finding modernizing opportunities as well as code organization opportunities. Fix this mega-classes today!
Line numbers added to embedded code for better references from model -> claude
2025-06-15 06:00:01 +04:00
Fahad
442decba70 Improved model response handling to handle additional response statuses in future
Improved testgen; encourages follow-ups with less work in between and less token generation to avoid surpassing the 25K barrier
Improved coderevew tool to request a focused code review instead where a single-pass code review is too large or complex
2025-06-14 18:43:56 +04:00
Fahad
698c1611d2 Lint fix 2025-06-14 16:47:59 +04:00
Fahad
d0d0a171dc Ensure duplicate file references are gracefully handled
Improved prompt to encourage immediate action
2025-06-14 16:37:02 +04:00
Fahad
ec707e021a Fix for path translation within docker 2025-06-14 16:00:54 +04:00
Fahad
acbfa1c94e Improved prompt for next steps 2025-06-14 15:51:04 +04:00
Fahad
4086306c58 New tool: testgen
Generates unit tests and encourages model to auto-detect framework and testing style from existing sample (if available)
2025-06-14 15:41:47 +04:00