Commit Graph

27 Commits

Author SHA1 Message Date
Fahad
adbc4af4a9 Update confidence enum values across workflow tools
Added new confidence values (very_high, almost_certain) to all workflow tools
to provide more granular confidence tracking. Updated enum declarations in:
- analyze.py, codereview.py, debug.py, precommit.py, secaudit.py, testgen.py
- Updated debug.py's get_required_actions to handle new confidence values
- All tools now use consistent 7-value confidence scale
- refactor.py kept its unique scale (exploring/incomplete/partial/complete)

Also fixed model thinking configuration:
- Added very_high and almost_certain to MODEL_THINKING_PREFERENCES
- Set medium thinking for very_high, high thinking for almost_certain
- Updated prompts to clarify certain means 100% local confidence

🤖 Generated with [Claude Code](https://claude.ai/code)

Co-Authored-By: Claude <noreply@anthropic.com>
2025-06-28 00:04:35 +04:00
Fahad
bc447d4bcd Generic naming to work with Gemini CLI / Claude Code 2025-06-27 23:41:20 +04:00
Fahad
6fa2d63eac Should help with https://github.com/BeehiveInnovations/zen-mcp-server/issues/97
Clear python cache when running script: https://github.com/BeehiveInnovations/zen-mcp-server/issues/96
Improved retry error logging
Cleanup
2025-06-21 05:59:19 +04:00
Beehive Innovations
69a3121452 🚀 Major Enhancement: Workflow-Based Tool Architecture v5.5.0 (#95)
* WIP: new workflow architecture

* WIP: further improvements and cleanup

* WIP: cleanup and docks, replace old tool with new

* WIP: cleanup and docks, replace old tool with new

* WIP: new planner implementation using workflow

* WIP: precommit tool working as a workflow instead of a basic tool
Support for passing False to use_assistant_model to skip external models completely and use Claude only

* WIP: precommit workflow version swapped with old

* WIP: codereview

* WIP: replaced codereview

* WIP: replaced codereview

* WIP: replaced refactor

* WIP: workflow for thinkdeep

* WIP: ensure files get embedded correctly

* WIP: thinkdeep replaced with workflow version

* WIP: improved messaging when an external model's response is received

* WIP: analyze tool swapped

* WIP: updated tests
* Extract only the content when building history
* Use "relevant_files" for workflow tools only

* WIP: updated tests
* Extract only the content when building history
* Use "relevant_files" for workflow tools only

* WIP: fixed get_completion_next_steps_message missing param

* Fixed tests
Request for files consistently

* Fixed tests
Request for files consistently

* Fixed tests

* New testgen workflow tool
Updated docs

* Swap testgen workflow

* Fix CI test failures by excluding API-dependent tests

- Update GitHub Actions workflow to exclude simulation tests that require API keys
- Fix collaboration tests to properly mock workflow tool expert analysis calls
- Update test assertions to handle new workflow tool response format
- Ensure unit tests run without external API dependencies in CI

🤖 Generated with [Claude Code](https://claude.ai/code)

Co-Authored-By: Claude <noreply@anthropic.com>

* WIP - Update tests to match new tools

* WIP - Update tests to match new tools

---------

Co-authored-by: Claude <noreply@anthropic.com>
2025-06-21 00:08:11 +04:00
Beehive Innovations
4151c3c3a5 Migration from Docker to Standalone Python Server (#73)
* Migration from docker to standalone server
Migration handling
Fixed tests
Use simpler in-memory storage
Support for concurrent logging to disk
Simplified direct connections to localhost

* Migration from docker / redis to standalone script
Updated tests
Updated run script
Fixed requirements
Use dotenv
Ask if user would like to install MCP in Claude Desktop once
Updated docs

* More cleanup and references to docker removed

* Cleanup

* Comments

* Fixed tests

* Fix GitHub Actions workflow for standalone Python architecture

- Install requirements-dev.txt for pytest and testing dependencies
- Remove Docker setup from simulation tests (now standalone)
- Simplify linting job to use requirements-dev.txt
- Update simulation tests to run directly without Docker

Fixes unit test failures in CI due to missing pytest dependency.

🤖 Generated with [Claude Code](https://claude.ai/code)

Co-Authored-By: Claude <noreply@anthropic.com>

* Remove simulation tests from GitHub Actions

- Removed simulation-tests job that makes real API calls
- Keep only unit tests (mocked, no API costs) and linting
- Simulation tests should be run manually with real API keys
- Reduces CI costs and complexity

GitHub Actions now only runs:
- Unit tests (569 tests, all mocked)
- Code quality checks (ruff, black)

🤖 Generated with [Claude Code](https://claude.ai/code)

Co-Authored-By: Claude <noreply@anthropic.com>

* Fixed tests

* Fixed tests

---------

Co-authored-by: Claude <noreply@anthropic.com>
2025-06-18 23:41:22 +04:00
Fahad
b56993a42f Refactor 2025-06-18 06:24:24 +04:00
Fahad
c3276595f9 Updated instructions 2025-06-18 05:58:50 +04:00
Beehive Innovations
95556ba9ea Add Consensus Tool for Multi-Model Perspective Gathering (#67)
* WIP
Refactor resolving mode_names, should be done once at MCP call boundary
Pass around model context instead
Consensus tool allows one to get a consensus from multiple models, optionally assigning one a 'for' or 'against' stance to find nuanced responses.

* Deduplication of model resolution, model_context should be available before reaching deeper parts of the code
Improved abstraction when building conversations
Throw programmer errors early

* Guardrails
Support for `model:option` format at MCP boundary so future tools can use additional options if needed instead of handling this only for consensus
Model name now supports an optional ":option" for future use

* Simplified async flow

* Improved model for request to support natural language
Simplified async flow

* Improved model for request to support natural language
Simplified async flow

* Fix consensus tool async/sync patterns to match codebase standards

CRITICAL FIXES:
- Converted _get_consensus_responses from async to sync (matches other tools)
- Converted store_conversation_turn from async to sync (add_turn is synchronous)
- Removed unnecessary asyncio imports and sleep calls
- Fixed ClosedResourceError in MCP protocol during long consensus operations

PATTERN ALIGNMENT:
- Consensus tool now follows same sync patterns as all other tools
- Only execute() and prepare_prompt() are async (base class requirement)
- All internal operations are synchronous like analyze, chat, debug, etc.

TESTING:
- MCP simulation test now passes: consensus_stance 
- Two-model consensus works correctly in ~35 seconds
- Unknown stance handling defaults to neutral with warnings
- All 9 unit tests pass (100% success rate)

The consensus tool async patterns were anomalous in the codebase.
This fix aligns it with the established synchronous patterns used
by all other tools while maintaining full functionality.

🤖 Generated with [Claude Code](https://claude.ai/code)

Co-Authored-By: Claude <noreply@anthropic.com>

* Fixed call order and added new test

* Cleanup dead comments
Docs for the new tool
Improved tests

---------

Co-authored-by: Claude <noreply@anthropic.com>
2025-06-17 10:53:17 +04:00
Fahad
97fa6781cf Vision support via images / pdfs etc that can be passed on to other models as part of analysis, additional context etc.
Image processing pipeline added
OpenAI GPT-4.1 support
Chat tool prompt enhancement
Lint and code quality improvements
2025-06-16 13:14:53 +04:00
Fahad
91077e3810 Performance improvements when embedding files:
- Exit early at MCP boundary if files won't fit within given context of chosen model
- Encourage claude to re-run with better context
- Check file sizes before embedding
- Drop files from older conversations when building continuations and give priority to newer files
- List and mention excluded files to Claude on return
- Improved tests
- Improved precommit prompt
- Added a new Low severity to precommit
- Improved documentation of file embedding strategy
- Refactor
2025-06-16 05:51:52 +04:00
Fahad
6dcf095c3d Improved schema description to allow Claude to pre-think harder before invoking thinkdeep 2025-06-15 19:05:23 +04:00
Fahad
07a078b4f2 Updated tests and additional tests for folder expansion during conversation tracking 2025-06-15 16:03:43 +04:00
Fahad
d36a85a3f3 Fix directory expansion tracking in conversation memory
When directories were provided to tools, only the directory path was stored
in conversation history instead of the individual expanded files. This caused
file filtering to incorrectly skip files in continued conversations.

Changes:
- Modified _prepare_file_content_for_prompt to return (content, processed_files)
- Updated all tools to track actually processed files for conversation memory
- Ensures directories are tracked as their expanded individual files

Fixes issue where Swift directory with 46 files was not properly embedded
in conversation continuations.
2025-06-15 15:36:45 +04:00
Fahad
4becd70a82 Perform prompt size checks only at the MCP boundary
New test to confirm history build-up and system prompt does not affect prompt size checks
Also check for large prompts in focus_on
Fixed .env.example incorrectly did not comment out CUSTOM_API causing the run-server script to think at least one key exists
2025-06-15 10:37:08 +04:00
Fahad
4cacd2dad9 Fixed trigger word 2025-06-14 20:14:40 +04:00
Fahad
e0a05b86f1 Add encouraging message about powerful models to schema in case it's not on Opus 4 or above
OPENROUTER_ALLOWED_MODELS environment variable support to further limit the models to allow from within Claude. This will put a limit on top of even the ones listed in custom_models.json
2025-06-14 11:34:17 +04:00
Fahad
21037c2d81 Refactored prompts for better maintainability 2025-06-14 11:09:13 +04:00
Fahad
eb388ab2f2 Categorize tools into 'model capabilities categories' to help determine which type of model to pick when in auto mode
Encourage Claude to pick the best model for the job automatically in auto mode
Lots of new tests to ensure automatic model picking works reliably based on user preference or when a matching model is not found or ambiguous
Improved error reporting when bogus model is requested and is not configured or available
2025-06-14 02:17:06 +04:00
Fahad
3aedb16101 Use the new Gemini 2.5 Flash
Updated to support Thinking Tokens as a ratio of the max allowed
Updated tests
Updated README
2025-06-12 20:46:54 +04:00
Fahad
fb66825bf6 Rebranding, refactoring, renaming, cleanup, updated docs 2025-06-12 10:40:43 +04:00
Fahad
2a067a7f4e WIP major refactor and features 2025-06-12 07:14:59 +04:00
Fahad
22a3fb91ed feat: Add comprehensive dynamic configuration system v3.3.0
## Major Features Added

### 🎯 Dynamic Configuration System
- **Environment-aware model selection**: DEFAULT_MODEL with 'pro'/'flash' shortcuts
- **Configurable thinking modes**: DEFAULT_THINKING_MODE_THINKDEEP for extended reasoning
- **All tool schemas now dynamic**: Show actual current defaults instead of hardcoded values
- **Enhanced setup workflow**: Copy from .env.example with smart customization

### 🔧 Model & Thinking Configuration
- **Smart model resolution**: Support both shortcuts ('pro', 'flash') and full model names
- **Thinking mode optimization**: Only apply thinking budget to models that support it
- **Flash model compatibility**: Works without thinking config, still beneficial via system prompts
- **Dynamic schema descriptions**: Tool parameters show current environment values

### 🚀 Enhanced Developer Experience
- **Fail-fast Docker setup**: GEMINI_API_KEY required upfront in docker-compose
- **Comprehensive startup logging**: Shows current model and thinking mode defaults
- **Enhanced get_version tool**: Reports all dynamic configuration values
- **Better .env documentation**: Clear token consumption details and model options

### 🧪 Comprehensive Testing
- **Live model validation**: New simulator test validates Pro vs Flash thinking behavior
- **Dynamic configuration tests**: Verify environment variable overrides work correctly
- **Complete test coverage**: All 139 unit tests pass, including new model config tests

### 📋 Configuration Files Updated
- **docker-compose.yml**: Fail-fast API key validation, thinking mode support
- **setup-docker.sh**: Copy from .env.example instead of manual creation
- **.env.example**: Detailed documentation with token consumption per thinking mode
- **.gitignore**: Added test-setup/ for cleanup

### 🛠 Technical Improvements
- **Removed setup.py**: Fully Docker-based deployment (no longer needed)
- **REDIS_URL smart defaults**: Auto-configured for Docker, still configurable for dev
- **All tools updated**: Consistent dynamic model parameter descriptions
- **Enhanced error handling**: Better model resolution and validation

## Breaking Changes
- Removed setup.py (Docker-only deployment)
- Model parameter descriptions now show actual defaults (dynamic)

## Migration Guide
- Update .env files using new .env.example format
- Use 'pro'/'flash' shortcuts or full model names
- Set DEFAULT_THINKING_MODE_THINKDEEP for custom thinking depth

🤖 Generated with [Claude Code](https://claude.ai/code)

Co-Authored-By: Claude <noreply@anthropic.com>
2025-06-11 20:10:25 +04:00
Fahad
5a94737516 Fix conversation history duplication and optimize file embedding
This major refactoring addresses critical bugs in conversation history management
and significantly improves token efficiency through intelligent file embedding:

**Key Improvements:**
• Fixed conversation history duplication bug by centralizing reconstruction in server.py
• Added intelligent file filtering to prevent re-embedding files already in conversation history
• Centralized file processing logic in BaseTool._prepare_file_content_for_prompt()
• Enhanced log monitoring with better categorization and file embedding visibility
• Updated comprehensive test suite to verify new architecture and edge cases

**Architecture Changes:**
• Removed duplicate conversation history reconstruction from tools/base.py
• Conversation history now handled exclusively by server.py:reconstruct_thread_context
• All tools now use centralized file processing with automatic deduplication
• Improved token efficiency by embedding unique files only once per conversation

**Performance Benefits:**
• Reduced token usage through smart file filtering
• Eliminated redundant file embeddings in continued conversations
• Better observability with detailed debug logging for file operations

🤖 Generated with [Claude Code](https://claude.ai/code)

Co-Authored-By: Claude <noreply@anthropic.com>
2025-06-11 11:40:12 +04:00
Fahad
94f542c76a fix: critical conversation history bug and improve Docker integration
This commit addresses several critical issues and improvements:

🔧 Critical Fixes:
- Fixed conversation history not being included when using continuation_id in AI-to-AI conversations
- Fixed test mock targeting issues preventing proper conversation memory validation
- Fixed Docker debug logging functionality with Gemini tools

🐛 Bug Fixes:
- Docker compose configuration for proper container command execution
- Test mock import targeting from utils.conversation_memory.* to tools.base.*
- Version bump to 3.1.0 reflecting significant improvements

🚀 Improvements:
- Enhanced Docker environment configuration with comprehensive logging setup
- Added cross-tool continuation documentation and examples in README
- Improved error handling and validation across all tools
- Better logging configuration with LOG_LEVEL environment variable support
- Enhanced conversation memory system documentation

🧪 Testing:
- Added comprehensive conversation history bug fix tests
- Added cross-tool continuation functionality tests
- All 132 tests now pass with proper conversation history validation
- Improved test coverage for AI-to-AI conversation threading

 Code Quality:
- Applied black, isort, and ruff formatting across entire codebase
- Enhanced inline documentation for conversation memory system
- Cleaned up temporary files and improved repository hygiene
- Better test descriptions and coverage for critical functionality

🤖 Generated with [Claude Code](https://claude.ai/code)

Co-Authored-By: Claude <noreply@anthropic.com>
2025-06-11 08:53:45 +04:00
Fahad
f5060367a0 WIP - communication memory 2025-06-10 19:16:51 +04:00
Fahad
3d958ef7e9 feat: enhance websearch functionality with reasoning-based approach
- Changed use_websearch default from False to True for all tools
- Redesigned websearch behavior to focus on reasoning rather than direct search
- Tools now analyze when web searches would be helpful and recommend specific searches
- Updated all tool-specific websearch instructions to be more thoughtful
- Gemini will now suggest what Claude should search for and explain why

Key improvements:
- Web search is now enabled by default for better documentation awareness
- Instead of performing searches, Gemini reasons about what searches would help
- Provides specific search recommendations for Claude to execute
- More collaborative approach between Gemini and Claude

This change makes the tools more intelligent about when external information
would be valuable while keeping Claude in control of actual web searches.

🤖 Generated with [Claude Code](https://claude.ai/code)

Co-Authored-By: Claude <noreply@anthropic.com>
2025-06-10 12:45:57 +04:00
Fahad
ba8f7192c3 refactor: rename think_deeper to thinkdeep for brevity
- Renamed `think_deeper` tool to `thinkdeep` for shorter, cleaner naming
- Updated all imports from ThinkDeeperTool to ThinkDeepTool
- Updated all references from THINK_DEEPER_PROMPT to THINKDEEP_PROMPT
- Updated tool registration in server.py
- Updated all test files to use new naming convention
- Updated README documentation to reflect new tool names
- All functionality remains the same, only naming has changed

This completes the tool renaming refactor for improved clarity and consistency.

🤖 Generated with [Claude Code](https://claude.ai/code)

Co-Authored-By: Claude <noreply@anthropic.com>
2025-06-10 12:38:38 +04:00