31 Commits

Author SHA1 Message Date
Fahad
b2dc84992d fix: rebranding, see [docs/name-change.md](docs/name-change.md) for details 2025-12-04 18:15:14 +04:00
Fahad
090931d7cf Fixed linebreaks
Cleanup
Pass excluded fields to the schema builder directly
2025-06-27 14:29:10 +04:00
Beehive Innovations
c960bcb720 Add DocGen tool with comprehensive documentation generation capabilities (#109)
* WIP: new workflow architecture

* WIP: further improvements and cleanup

* WIP: cleanup and docks, replace old tool with new

* WIP: cleanup and docks, replace old tool with new

* WIP: new planner implementation using workflow

* WIP: precommit tool working as a workflow instead of a basic tool
Support for passing False to use_assistant_model to skip external models completely and use Claude only

* WIP: precommit workflow version swapped with old

* WIP: codereview

* WIP: replaced codereview

* WIP: replaced codereview

* WIP: replaced refactor

* WIP: workflow for thinkdeep

* WIP: ensure files get embedded correctly

* WIP: thinkdeep replaced with workflow version

* WIP: improved messaging when an external model's response is received

* WIP: analyze tool swapped

* WIP: updated tests
* Extract only the content when building history
* Use "relevant_files" for workflow tools only

* WIP: updated tests
* Extract only the content when building history
* Use "relevant_files" for workflow tools only

* WIP: fixed get_completion_next_steps_message missing param

* Fixed tests
Request for files consistently

* Fixed tests
Request for files consistently

* Fixed tests

* New testgen workflow tool
Updated docs

* Swap testgen workflow

* Fix CI test failures by excluding API-dependent tests

- Update GitHub Actions workflow to exclude simulation tests that require API keys
- Fix collaboration tests to properly mock workflow tool expert analysis calls
- Update test assertions to handle new workflow tool response format
- Ensure unit tests run without external API dependencies in CI

🤖 Generated with [Claude Code](https://claude.ai/code)

Co-Authored-By: Claude <noreply@anthropic.com>

* WIP - Update tests to match new tools

* WIP - Update tests to match new tools

* WIP - Update tests to match new tools

* Should help with https://github.com/BeehiveInnovations/zen-mcp-server/issues/97
Clear python cache when running script: https://github.com/BeehiveInnovations/zen-mcp-server/issues/96
Improved retry error logging
Cleanup

* WIP - chat tool using new architecture and improved code sharing

* Removed todo

* Removed todo

* Cleanup old name

* Tweak wordings

* Tweak wordings
Migrate old tests

* Support for Flash 2.0 and Flash Lite 2.0

* Support for Flash 2.0 and Flash Lite 2.0

* Support for Flash 2.0 and Flash Lite 2.0
Fixed test

* Improved consensus to use the workflow base class

* Improved consensus to use the workflow base class

* Allow images

* Allow images

* Replaced old consensus tool

* Cleanup tests

* Tests for prompt size

* New tool: docgen
Tests for prompt size
Fixes: https://github.com/BeehiveInnovations/zen-mcp-server/issues/107
Use available token size limits: https://github.com/BeehiveInnovations/zen-mcp-server/issues/105

* Improved docgen prompt
Exclude TestGen from pytest inclusion

* Updated errors

* Lint

* DocGen instructed not to fix bugs, surface them and stick to d

* WIP

* Stop claude from being lazy and only documenting a small handful

* More style rules

---------

Co-authored-by: Claude <noreply@anthropic.com>
2025-06-22 10:21:19 +04:00
Beehive Innovations
69a3121452 🚀 Major Enhancement: Workflow-Based Tool Architecture v5.5.0 (#95)
* WIP: new workflow architecture

* WIP: further improvements and cleanup

* WIP: cleanup and docks, replace old tool with new

* WIP: cleanup and docks, replace old tool with new

* WIP: new planner implementation using workflow

* WIP: precommit tool working as a workflow instead of a basic tool
Support for passing False to use_assistant_model to skip external models completely and use Claude only

* WIP: precommit workflow version swapped with old

* WIP: codereview

* WIP: replaced codereview

* WIP: replaced codereview

* WIP: replaced refactor

* WIP: workflow for thinkdeep

* WIP: ensure files get embedded correctly

* WIP: thinkdeep replaced with workflow version

* WIP: improved messaging when an external model's response is received

* WIP: analyze tool swapped

* WIP: updated tests
* Extract only the content when building history
* Use "relevant_files" for workflow tools only

* WIP: updated tests
* Extract only the content when building history
* Use "relevant_files" for workflow tools only

* WIP: fixed get_completion_next_steps_message missing param

* Fixed tests
Request for files consistently

* Fixed tests
Request for files consistently

* Fixed tests

* New testgen workflow tool
Updated docs

* Swap testgen workflow

* Fix CI test failures by excluding API-dependent tests

- Update GitHub Actions workflow to exclude simulation tests that require API keys
- Fix collaboration tests to properly mock workflow tool expert analysis calls
- Update test assertions to handle new workflow tool response format
- Ensure unit tests run without external API dependencies in CI

🤖 Generated with [Claude Code](https://claude.ai/code)

Co-Authored-By: Claude <noreply@anthropic.com>

* WIP - Update tests to match new tools

* WIP - Update tests to match new tools

---------

Co-authored-by: Claude <noreply@anthropic.com>
2025-06-21 00:08:11 +04:00
Fahad
a509730dca New Planner tool to help you break down complex ideas, problems, and projects into multiple manageable steps. This is a self-prompt generation tool whose output can then be fed into another tool and model as required 2025-06-17 20:49:53 +04:00
Fahad
be7d80d7aa Advertise prompts, fixes https://github.com/BeehiveInnovations/zen-mcp-server/issues/63 2025-06-17 11:17:19 +04:00
Beehive Innovations
95556ba9ea Add Consensus Tool for Multi-Model Perspective Gathering (#67)
* WIP
Refactor resolving mode_names, should be done once at MCP call boundary
Pass around model context instead
Consensus tool allows one to get a consensus from multiple models, optionally assigning one a 'for' or 'against' stance to find nuanced responses.

* Deduplication of model resolution, model_context should be available before reaching deeper parts of the code
Improved abstraction when building conversations
Throw programmer errors early

* Guardrails
Support for `model:option` format at MCP boundary so future tools can use additional options if needed instead of handling this only for consensus
Model name now supports an optional ":option" for future use

* Simplified async flow

* Improved model for request to support natural language
Simplified async flow

* Improved model for request to support natural language
Simplified async flow

* Fix consensus tool async/sync patterns to match codebase standards

CRITICAL FIXES:
- Converted _get_consensus_responses from async to sync (matches other tools)
- Converted store_conversation_turn from async to sync (add_turn is synchronous)
- Removed unnecessary asyncio imports and sleep calls
- Fixed ClosedResourceError in MCP protocol during long consensus operations

PATTERN ALIGNMENT:
- Consensus tool now follows same sync patterns as all other tools
- Only execute() and prepare_prompt() are async (base class requirement)
- All internal operations are synchronous like analyze, chat, debug, etc.

TESTING:
- MCP simulation test now passes: consensus_stance 
- Two-model consensus works correctly in ~35 seconds
- Unknown stance handling defaults to neutral with warnings
- All 9 unit tests pass (100% success rate)

The consensus tool async patterns were anomalous in the codebase.
This fix aligns it with the established synchronous patterns used
by all other tools while maintaining full functionality.

🤖 Generated with [Claude Code](https://claude.ai/code)

Co-Authored-By: Claude <noreply@anthropic.com>

* Fixed call order and added new test

* Cleanup dead comments
Docs for the new tool
Improved tests

---------

Co-authored-by: Claude <noreply@anthropic.com>
2025-06-17 10:53:17 +04:00
Fahad
70b64adff3 Schema now lists all models including locally available models
New tool to list all models `listmodels`
Integration test to for all the different combinations of API keys
Tweaks to codereview prompt for a better quality input from Claude
Fixed missing 'low' severity in codereview
2025-06-16 19:07:35 +04:00
Fahad
5a49d196c8 More integration tests 2025-06-16 07:07:38 +04:00
Fahad
86728a1442 WIP 2025-06-15 15:32:41 +04:00
Fahad
b5004b91fc Major new addition: refactor tool
Supports decomposing large components and files, finding codesmells, finding modernizing opportunities as well as code organization opportunities. Fix this mega-classes today!
Line numbers added to embedded code for better references from model -> claude
2025-06-15 06:00:01 +04:00
Fahad
a4f9e22256 Renamed version tool 2025-06-14 18:54:53 +04:00
Fahad
4086306c58 New tool: testgen
Generates unit tests and encourages model to auto-detect framework and testing style from existing sample (if available)
2025-06-14 15:41:47 +04:00
Fahad
3aedb16101 Use the new Gemini 2.5 Flash
Updated to support Thinking Tokens as a ratio of the max allowed
Updated tests
Updated README
2025-06-12 20:46:54 +04:00
Fahad
79af2654b9 Use the new flash model
Updated tests
2025-06-12 13:44:09 +04:00
Fahad
fb66825bf6 Rebranding, refactoring, renaming, cleanup, updated docs 2025-06-12 10:40:43 +04:00
Fahad
2a067a7f4e WIP major refactor and features 2025-06-12 07:14:59 +04:00
Fahad
ba8f7192c3 refactor: rename think_deeper to thinkdeep for brevity
- Renamed `think_deeper` tool to `thinkdeep` for shorter, cleaner naming
- Updated all imports from ThinkDeeperTool to ThinkDeepTool
- Updated all references from THINK_DEEPER_PROMPT to THINKDEEP_PROMPT
- Updated tool registration in server.py
- Updated all test files to use new naming convention
- Updated README documentation to reflect new tool names
- All functionality remains the same, only naming has changed

This completes the tool renaming refactor for improved clarity and consistency.

🤖 Generated with [Claude Code](https://claude.ai/code)

Co-Authored-By: Claude <noreply@anthropic.com>
2025-06-10 12:38:38 +04:00
Fahad
5f8ed3aae8 refactor: rename review tools for clarity and consistency
- Renamed `review_code` tool to `codereview` for better naming convention
- Renamed `review_changes` tool to `precommit` to better reflect its purpose
- Updated all tool descriptions to remove "Triggers:" sections and improve clarity
- Updated all imports and references throughout the codebase
- Renamed test files to match new tool names
- Updated server.py tool registrations
- All existing functionality preserved with improved naming

This refactoring improves code organization and makes tool purposes clearer.

🤖 Generated with [Claude Code](https://claude.ai/code)

Co-Authored-By: Claude <noreply@anthropic.com>
2025-06-10 12:30:06 +04:00
Fahad
67f18ef3c9 refactor: rename debug_issue tool to debug for brevity
- Rename debug_issue.py to debug.py
- Update tool name from 'debug_issue' to 'debug' throughout codebase
- Update all references in server.py, tests, and README
- Keep DebugIssueTool class name for backward compatibility
- All tests pass with the renamed tool

This makes the tool name shorter and more consistent with other
tool names like 'chat' and 'analyze'. The functionality remains
exactly the same.

🤖 Generated with Claude Code

Co-Authored-By: Claude <noreply@anthropic.com>
2025-06-10 11:43:47 +04:00
Fahad
27add4d05d feat: Major refactoring and improvements v2.11.0
## 🚀 Major Improvements

### Docker Environment Simplification
- **BREAKING**: Simplified Docker configuration by auto-detecting sandbox from WORKSPACE_ROOT
- Removed redundant MCP_PROJECT_ROOT requirement for Docker setups
- Updated all Docker config examples and setup scripts
- Added security validation for dangerous WORKSPACE_ROOT paths

### Security Enhancements
- **CRITICAL**: Fixed insecure PROJECT_ROOT fallback to use current directory instead of home
- Enhanced path validation with proper Docker environment detection
- Removed information disclosure in error messages
- Strengthened symlink and path traversal protection

### File Handling Optimization
- **PERFORMANCE**: Optimized read_files() to return content only (removed summary)
- Unified file reading across all tools using standardized file_utils routines
- Fixed review_changes tool to use consistent file loading patterns
- Improved token management and reduced unnecessary processing

### Tool Improvements
- **UX**: Enhanced ReviewCodeTool to require user context for targeted reviews
- Removed deprecated _get_secure_container_path function and _sanitize_filename
- Standardized file access patterns across analyze, review_changes, and other tools
- Added contextual prompting to align reviews with user expectations

### Code Quality & Testing
- Updated all tests for new function signatures and requirements
- Added comprehensive Docker path integration tests
- Achieved 100% test coverage (95 tests passing)
- Full compliance with ruff, black, and isort linting standards

### Configuration & Deployment
- Added pyproject.toml for modern Python packaging
- Streamlined Docker setup removing redundant environment variables
- Updated setup scripts across all platforms (Windows, macOS, Linux)
- Improved error handling and validation throughout

## 🔧 Technical Changes

- **Removed**: `_get_secure_container_path()`, `_sanitize_filename()`, unused SANDBOX_MODE
- **Enhanced**: Path translation, security validation, token management
- **Standardized**: File reading patterns, error handling, Docker detection
- **Updated**: All tool prompts for better context alignment

## 🛡️ Security Notes

This release significantly improves the security posture by:
- Eliminating broad filesystem access defaults
- Adding validation for Docker environment variables
- Removing information disclosure in error paths
- Strengthening path traversal and symlink protections

🤖 Generated with [Claude Code](https://claude.ai/code)

Co-Authored-By: Claude <noreply@anthropic.com>
2025-06-10 09:50:05 +04:00
Fahad
783ba73181 refactor: cleanup and comprehensive documentation
Major changes:
- Add comprehensive documentation to all modules with detailed docstrings
- Remove unused THINKING_MODEL config (use single GEMINI_MODEL with thinking_mode param)
- Remove list_models functionality (simplified to single model configuration)
- Rename DEFAULT_MODEL to GEMINI_MODEL for clarity
- Remove unused python-dotenv dependency
- Fix missing pydantic in setup.py dependencies

Documentation improvements:
- Document security measures in file_utils.py (path validation, sandboxing)
- Add detailed comments to critical logic sections
- Document tool creation process in BaseTool
- Explain configuration values and their impact
- Add comprehensive function-level documentation

Code quality:
- Apply black formatting to all files
- Fix all ruff linting issues
- Update tests to match refactored code
- All 63 tests passing

🤖 Generated with [Claude Code](https://claude.ai/code)

Co-Authored-By: Claude <noreply@anthropic.com>
2025-06-09 19:04:24 +04:00
Fahad
fd6e2f9b64 refactor: rename review_pending_changes to review_changes
- Renamed tool from review_pending_changes to review_changes for brevity
- Enhanced tool descriptions for better MCP auto-discovery
- Updated all references throughout codebase including:
  - Tool implementation (tools/review_changes.py)
  - Test files (tests/test_review_changes.py)
  - Server registration and imports
  - Documentation in README.md
  - Tool prompts in prompts/tool_prompts.py
- Enhanced review_changes description to emphasize pre-commit usage
- All tests pass, linting and formatting checks pass

🤖 Generated with [Claude Code](https://claude.ai/code)

Co-Authored-By: Claude <noreply@anthropic.com>
2025-06-09 14:37:03 +04:00
Fahad
7ee610938b feat: add review_pending_changes tool and enforce absolute path security
- Add new review_pending_changes tool for comprehensive pre-commit reviews
- Implement filesystem sandboxing with MCP_PROJECT_ROOT
- Enforce absolute paths for all file/directory operations
- Add comprehensive git utilities for repository management
- Update all tools to use centralized path validation
- Add extensive test coverage for new features and security model
- Update documentation with new tool and path requirements
- Remove obsolete demo and guide files

🤖 Generated with [Claude Code](https://claude.ai/code)

Co-Authored-By: Claude <noreply@anthropic.com>
2025-06-09 12:42:40 +04:00
Fahad
299f7d3897 feat: add Claude-Gemini collaboration and chat capabilities
- Add collaboration demo showing dynamic context requests
- Implement chat tool for general conversations and brainstorming
- Add tool selection guide with clear boundaries
- Introduce models configuration system
- Update prompts for better tool descriptions
- Refactor server to remove redundant functionality
- Add comprehensive tests for collaboration features
- Enhance base tool with collaborative features

This enables Claude to request additional context from Gemini
during tool execution, improving analysis quality and accuracy.

🤖 Generated with [Claude Code](https://claude.ai/code)

Co-Authored-By: Claude <noreply@anthropic.com>
2025-06-09 11:17:26 +04:00
Fahad
5ccedcecd8 style: fix linting and formatting issues
- Run black formatter on all Python files
- Fix ruff linting issues:
  - Remove unused imports
  - Remove unused variables
  - Fix f-string without placeholders
- All 37 tests still pass
- Code quality improved for CI/CD compliance

🧹 Generated with [Claude Code](https://claude.ai/code)

Co-Authored-By: Claude <noreply@anthropic.com>
2025-06-09 09:37:46 +04:00
Fahad
fb5c04ea60 feat: implement comprehensive thinking modes and migrate to google-genai
Major improvements to thinking capabilities and API integration:

- Remove all output token limits for future-proof responses
- Add 5-level thinking mode system: minimal, low, medium, high, max
- Migrate from google-generativeai to google-genai library
- Implement native thinkingBudget support for Gemini 2.5 Pro
- Set medium thinking as default for all tools, max for think_deeper

🧠 Thinking Modes:
- minimal (128 tokens) - simple tasks
- low (2048 tokens) - basic reasoning
- medium (8192 tokens) - default for most tools
- high (16384 tokens) - complex analysis
- max (32768 tokens) - default for think_deeper

🔧 Technical Changes:
- Complete migration to google-genai>=1.19.0
- Remove google-generativeai dependency
- Add ThinkingConfig with thinking_budget parameter
- Update all tools to support thinking_mode parameter
- Comprehensive test suite with 37 passing unit tests
- CI-friendly testing (no API key required for unit tests)
- Live integration tests for API verification

🧪 Testing & CI:
- Add GitHub Actions workflow with multi-Python support
- Unit tests use mocks, no API key required
- Live integration tests optional with API key
- Contributing guide with development setup
- All tests pass without external dependencies

🐛 Generated with [Claude Code](https://claude.ai/code)

Co-Authored-By: Claude <noreply@anthropic.com>
2025-06-09 09:35:21 +04:00
Fahad
bccde9307f fix: make tests version-agnostic to prevent future breakage
- Remove hardcoded version checks in test_server.py
- Update test_config.py to check version format instead of specific value
- Tests now validate structure/format rather than exact versions
- Prevents test failures when bumping versions

🤖 Generated with [Claude Code](https://claude.ai/code)

Co-Authored-By: Claude <noreply@anthropic.com>
2025-06-09 05:37:14 +04:00
Fahad
4edbf0abb1 fix: update test to match version 2.4.1
Update test_handle_get_version to check for v2.4.1 instead of v2.4.0

🤖 Generated with [Claude Code](https://claude.ai/code)

Co-Authored-By: Claude <noreply@anthropic.com>
2025-06-09 05:35:17 +04:00
Fahad
c03af1629f fix: apply isort formatting to fix CI linting
Applied isort to properly sort all imports according to PEP8:
- Standard library imports first
- Third-party imports second
- Local imports last
- Alphabetical ordering within each group

All tests still passing after import reordering.
2025-06-08 22:32:27 +04:00
Fahad
1aa19548d1 feat: complete redesign to v2.4.0 - Claude's ultimate development partner
Major redesign of Gemini MCP Server with modular architecture:

- Removed all emoji characters from tool outputs for clean terminal display
- Kept review category emojis (🔴🟠🟡🟢) per user preference
- Added 4 specialized tools:
  - think_deeper: Extended reasoning and problem-solving (temp 0.7)
  - review_code: Professional code review with severity levels (temp 0.2)
  - debug_issue: Root cause analysis and debugging (temp 0.2)
  - analyze: General-purpose file analysis (temp 0.2)
- Modular architecture with base tool class and Pydantic models
- Verbose tool descriptions with natural language triggers
- Updated README with comprehensive examples and real-world use cases
- All 25 tests passing, type checking clean, critical linting clean

BREAKING CHANGE: Removed analyze_code tool in favor of specialized tools

🤖 Generated with [Claude Code](https://claude.ai/code)

Co-Authored-By: Claude <noreply@anthropic.com>
2025-06-08 22:30:45 +04:00