* WIP: new workflow architecture
* WIP: further improvements and cleanup
* WIP: cleanup and docks, replace old tool with new
* WIP: cleanup and docks, replace old tool with new
* WIP: new planner implementation using workflow
* WIP: precommit tool working as a workflow instead of a basic tool
Support for passing False to use_assistant_model to skip external models completely and use Claude only
* WIP: precommit workflow version swapped with old
* WIP: codereview
* WIP: replaced codereview
* WIP: replaced codereview
* WIP: replaced refactor
* WIP: workflow for thinkdeep
* WIP: ensure files get embedded correctly
* WIP: thinkdeep replaced with workflow version
* WIP: improved messaging when an external model's response is received
* WIP: analyze tool swapped
* WIP: updated tests
* Extract only the content when building history
* Use "relevant_files" for workflow tools only
* WIP: updated tests
* Extract only the content when building history
* Use "relevant_files" for workflow tools only
* WIP: fixed get_completion_next_steps_message missing param
* Fixed tests
Request for files consistently
* Fixed tests
Request for files consistently
* Fixed tests
* New testgen workflow tool
Updated docs
* Swap testgen workflow
* Fix CI test failures by excluding API-dependent tests
- Update GitHub Actions workflow to exclude simulation tests that require API keys
- Fix collaboration tests to properly mock workflow tool expert analysis calls
- Update test assertions to handle new workflow tool response format
- Ensure unit tests run without external API dependencies in CI
🤖 Generated with [Claude Code](https://claude.ai/code)
Co-Authored-By: Claude <noreply@anthropic.com>
* WIP - Update tests to match new tools
* WIP - Update tests to match new tools
* WIP - Update tests to match new tools
* Should help with https://github.com/BeehiveInnovations/zen-mcp-server/issues/97
Clear python cache when running script: https://github.com/BeehiveInnovations/zen-mcp-server/issues/96
Improved retry error logging
Cleanup
* WIP - chat tool using new architecture and improved code sharing
* Removed todo
* Removed todo
* Cleanup old name
* Tweak wordings
* Tweak wordings
Migrate old tests
* Support for Flash 2.0 and Flash Lite 2.0
* Support for Flash 2.0 and Flash Lite 2.0
* Support for Flash 2.0 and Flash Lite 2.0
Fixed test
* Improved consensus to use the workflow base class
* Improved consensus to use the workflow base class
* Allow images
* Allow images
* Replaced old consensus tool
* Cleanup tests
* Tests for prompt size
* New tool: docgen
Tests for prompt size
Fixes: https://github.com/BeehiveInnovations/zen-mcp-server/issues/107
Use available token size limits: https://github.com/BeehiveInnovations/zen-mcp-server/issues/105
* Improved docgen prompt
Exclude TestGen from pytest inclusion
* Updated errors
* Lint
* DocGen instructed not to fix bugs, surface them and stick to d
* WIP
* Stop claude from being lazy and only documenting a small handful
* More style rules
---------
Co-authored-by: Claude <noreply@anthropic.com>
* WIP: new workflow architecture
* WIP: further improvements and cleanup
* WIP: cleanup and docks, replace old tool with new
* WIP: cleanup and docks, replace old tool with new
* WIP: new planner implementation using workflow
* WIP: precommit tool working as a workflow instead of a basic tool
Support for passing False to use_assistant_model to skip external models completely and use Claude only
* WIP: precommit workflow version swapped with old
* WIP: codereview
* WIP: replaced codereview
* WIP: replaced codereview
* WIP: replaced refactor
* WIP: workflow for thinkdeep
* WIP: ensure files get embedded correctly
* WIP: thinkdeep replaced with workflow version
* WIP: improved messaging when an external model's response is received
* WIP: analyze tool swapped
* WIP: updated tests
* Extract only the content when building history
* Use "relevant_files" for workflow tools only
* WIP: updated tests
* Extract only the content when building history
* Use "relevant_files" for workflow tools only
* WIP: fixed get_completion_next_steps_message missing param
* Fixed tests
Request for files consistently
* Fixed tests
Request for files consistently
* Fixed tests
* New testgen workflow tool
Updated docs
* Swap testgen workflow
* Fix CI test failures by excluding API-dependent tests
- Update GitHub Actions workflow to exclude simulation tests that require API keys
- Fix collaboration tests to properly mock workflow tool expert analysis calls
- Update test assertions to handle new workflow tool response format
- Ensure unit tests run without external API dependencies in CI
🤖 Generated with [Claude Code](https://claude.ai/code)
Co-Authored-By: Claude <noreply@anthropic.com>
* WIP - Update tests to match new tools
* WIP - Update tests to match new tools
---------
Co-authored-by: Claude <noreply@anthropic.com>
* Support for Custom URLs and custom models, including locally hosted models such as ollama
* Support for native + openrouter + local models (i.e. dozens of models) means you can start delegating sub-tasks to particular models or work to local models such as localizations or other boring work etc.
* Several tests added
* precommit to also include untracked (new) files
* Logfile auto rollover
* Improved logging
This major refactoring addresses critical bugs in conversation history management
and significantly improves token efficiency through intelligent file embedding:
**Key Improvements:**
• Fixed conversation history duplication bug by centralizing reconstruction in server.py
• Added intelligent file filtering to prevent re-embedding files already in conversation history
• Centralized file processing logic in BaseTool._prepare_file_content_for_prompt()
• Enhanced log monitoring with better categorization and file embedding visibility
• Updated comprehensive test suite to verify new architecture and edge cases
**Architecture Changes:**
• Removed duplicate conversation history reconstruction from tools/base.py
• Conversation history now handled exclusively by server.py:reconstruct_thread_context
• All tools now use centralized file processing with automatic deduplication
• Improved token efficiency by embedding unique files only once per conversation
**Performance Benefits:**
• Reduced token usage through smart file filtering
• Eliminated redundant file embeddings in continued conversations
• Better observability with detailed debug logging for file operations
🤖 Generated with [Claude Code](https://claude.ai/code)
Co-Authored-By: Claude <noreply@anthropic.com>
- Renamed `think_deeper` tool to `thinkdeep` for shorter, cleaner naming
- Updated all imports from ThinkDeeperTool to ThinkDeepTool
- Updated all references from THINK_DEEPER_PROMPT to THINKDEEP_PROMPT
- Updated tool registration in server.py
- Updated all test files to use new naming convention
- Updated README documentation to reflect new tool names
- All functionality remains the same, only naming has changed
This completes the tool renaming refactor for improved clarity and consistency.
🤖 Generated with [Claude Code](https://claude.ai/code)
Co-Authored-By: Claude <noreply@anthropic.com>
- Renamed `review_code` tool to `codereview` for better naming convention
- Renamed `review_changes` tool to `precommit` to better reflect its purpose
- Updated all tool descriptions to remove "Triggers:" sections and improve clarity
- Updated all imports and references throughout the codebase
- Renamed test files to match new tool names
- Updated server.py tool registrations
- All existing functionality preserved with improved naming
This refactoring improves code organization and makes tool purposes clearer.
🤖 Generated with [Claude Code](https://claude.ai/code)
Co-Authored-By: Claude <noreply@anthropic.com>
- Rename debug_issue.py to debug.py
- Update tool name from 'debug_issue' to 'debug' throughout codebase
- Update all references in server.py, tests, and README
- Keep DebugIssueTool class name for backward compatibility
- All tests pass with the renamed tool
This makes the tool name shorter and more consistent with other
tool names like 'chat' and 'analyze'. The functionality remains
exactly the same.
🤖 Generated with Claude Code
Co-Authored-By: Claude <noreply@anthropic.com>
- Fixed review_changes tool to properly translate host paths to container paths in Docker
- Prevents "No such file or directory" errors when running in Docker containers
- Added proper error handling with clear messages when paths are inaccessible
refactor: Centralized token limit validation across all tools
- Added _validate_token_limit method to BaseTool to eliminate code duplication
- Reduced ~25 lines of duplicated code across 5 tools (analyze, chat, debug_issue, review_code, think_deeper)
- Maintains exact same error messages and behavior
feat: Enhanced large prompt handling
- Added support for prompts >50K chars by requesting file-based input
- Preserves MCP's ~25K token capacity for responses
- All tools now check prompt size before processing
test: Added comprehensive Docker path integration tests
- Tests for path translation, security validation, and error handling
- Tests for review_changes tool specifically with Docker paths
- Fixed failing think_deeper test (updated default from "max" to "high")
chore: Code quality improvements
- Applied black formatting across all files
- Fixed import sorting with isort
- All tests passing (96 tests)
- Standardized error handling follows MCP TextContent format
The changes ensure consistent behavior across all environments while reducing code duplication and improving maintainability.
🤖 Generated with [Claude Code](https://claude.ai/code)
Co-Authored-By: Claude <noreply@anthropic.com>