Commit Graph

239 Commits

Author SHA1 Message Date
Nikolai Ugelvik
be2612752a Fix auto mode when only OpenRouter is configured
The get_available_models method in ModelProviderRegistry was only checking
for providers with SUPPORTED_MODELS attribute, which OpenRouter doesn't have.
This caused auto mode to fail with "No models available" error when only
OpenRouter API key was configured.

Added special handling for OpenRouter provider to check its _registry
for available models, ensuring auto mode works correctly with OpenRouter.

Added comprehensive tests to verify:
- Auto mode works with only OpenRouter configured
- Model restrictions are respected
- Graceful handling when no providers are available
- No crashes when OpenRouter lacks _registry attribute
2025-06-14 19:21:14 +02:00
Beehive Innovations
9f973b90e5 Merge pull request #36 from lox/add-o3-pro-support
feat: Add o3-pro model support
2025-06-14 19:44:14 +04:00
Fahad
f1ad06c529 Fixed lint, tests after recent fix
Updated readme
2025-06-14 19:31:31 +04:00
Fahad
a4f9e22256 Renamed version tool 2025-06-14 18:54:53 +04:00
Fahad
442decba70 Improved model response handling to handle additional response statuses in future
Improved testgen; encourages follow-ups with less work in between and less token generation to avoid surpassing the 25K barrier
Improved coderevew tool to request a focused code review instead where a single-pass code review is too large or complex
2025-06-14 18:43:56 +04:00
Fahad
d0d0a171dc Ensure duplicate file references are gracefully handled
Improved prompt to encourage immediate action
2025-06-14 16:37:02 +04:00
Fahad
acbfa1c94e Improved prompt for next steps 2025-06-14 15:51:04 +04:00
Fahad
4086306c58 New tool: testgen
Generates unit tests and encourages model to auto-detect framework and testing style from existing sample (if available)
2025-06-14 15:41:47 +04:00
Lachlan Donald
40aa1eaeb6 Format test_auto_mode.py with black
Fix code formatting to comply with black style requirements.

🤖 Generated with [Claude Code](https://claude.ai/code)

Co-Authored-By: Claude <noreply@anthropic.com>
2025-06-14 21:09:47 +10:00
Fahad
7d33aafcab Configurable conversation limit now set to 10 exchanges. This helps when you want to manually continue a thread of thought across different models manually. 2025-06-14 14:00:13 +04:00
Fahad
bc3f98a291 Make conversation timeout configuration (so that you're able to resume a discussion manually with another model with a gap of several hours in case you stepped away) 2025-06-14 13:27:19 +04:00
Fahad
e0a05b86f1 Add encouraging message about powerful models to schema in case it's not on Opus 4 or above
OPENROUTER_ALLOWED_MODELS environment variable support to further limit the models to allow from within Claude. This will put a limit on top of even the ones listed in custom_models.json
2025-06-14 11:34:17 +04:00
Fahad
23353734cd Support for allowed model restrictions per provider
Tool escalation added to `analyze` to a graceful switch over to codereview is made when absolutely necessary
2025-06-14 10:56:53 +04:00
Lachlan Donald
69ec38d1af Add o3-pro model support and extend test coverage
- Added o3-pro model configuration to custom_models.json with 200K context
- Updated OpenAI provider to support o3-pro with fixed temperature constraint
- Extended simulator tests to include o3-pro validation scenarios

🤖 Generated with [Claude Code](https://claude.ai/code)

Co-Authored-By: Claude <noreply@anthropic.com>
2025-06-14 15:49:19 +10:00
Fahad
2c805d6637 Fixed mock comparison error 2025-06-14 09:34:56 +04:00
Fahad
746380eb7f Renamed setup script to avoid confusion (https://github.com/BeehiveInnovations/zen-mcp-server/issues/35)
Further fixes to tests
Pass O3 simulation test when keys are not set, along with a notice
Updated docs on testing, simulation tests / contributing
Support for OpenAI o4-mini and o4-mini-high
2025-06-14 09:28:20 +04:00
Fahad
c5f682c7b0 Fix tests to work with effective auto mode changes
- Added autouse fixture to mock provider availability in tests
- Updated test expectations to match new auto mode behavior
- Fixed mock provider capabilities to return proper values
- Updated claude continuation tests to set default model
- All 256 tests now passing

🤖 Generated with [Claude Code](https://claude.ai/code)

Co-Authored-By: Claude <noreply@anthropic.com>
2025-06-14 02:43:29 +04:00
Fahad
eb388ab2f2 Categorize tools into 'model capabilities categories' to help determine which type of model to pick when in auto mode
Encourage Claude to pick the best model for the job automatically in auto mode
Lots of new tests to ensure automatic model picking works reliably based on user preference or when a matching model is not found or ambiguous
Improved error reporting when bogus model is requested and is not configured or available
2025-06-14 02:17:06 +04:00
Fahad
8ac5bbb5af Fixed workspace path mapping
Refactoring
Improved system prompts, more generalized
Home folder protection and detection
Retry logic for gemini
2025-06-14 00:26:59 +04:00
Fahad
26b22a1d53 Simplified /workspace to map to a project scoped WORKSPACE_ROOT 2025-06-13 20:49:37 +04:00
Fahad
fb69ebebe4 Lint 2025-06-13 16:13:02 +04:00
Fahad
a7b27b285c Cleanup and confirm tests pass 2025-06-13 16:09:40 +04:00
Fahad
048ebf90bf Cleanup 2025-06-13 16:05:21 +04:00
Fahad
f44ca326ef Breaking change: openrouter_models.json -> custom_models.json
* Support for Custom URLs and custom models, including locally hosted models such as ollama
* Support for native + openrouter + local models (i.e. dozens of models) means you can start delegating sub-tasks to particular models or work to local models such as localizations or other boring work etc.
* Several tests added
* precommit to also include untracked (new) files
* Logfile auto rollover
* Improved logging
2025-06-13 15:22:09 +04:00
Fahad
a641159a67 Use consistent terminology
Remove test folder from .gitignore for live simulation test to pass
2025-06-13 09:28:33 +04:00
Fahad
b16f85979b Use consistent terminology 2025-06-13 09:06:12 +04:00
Fahad
0e36fcbc69 Final cleanup 2025-06-13 07:12:29 +04:00
Fahad
2cdb92460b WIP
- OpenRouter model configuration registry
- Model definition file for users to be able to control
- Additional tests
- Update instructions
2025-06-13 06:33:12 +04:00
Fahad
cd1105b741 WIP
- OpenRouter model configuration registry
- Model definition file for users to be able to control
- Update instructions
2025-06-13 05:52:26 +04:00
Fahad
a19055b76a WIP
- OpenRouter model configuration registry
- Model definition file for users to be able to control
- Update instructions
2025-06-13 05:52:16 +04:00
Fahad
52b45f2b03 WIP - OpenRouter support and related refactoring 2025-06-12 22:17:11 +04:00
Fahad
22093bbf18 Fixed tests 2025-06-12 21:00:53 +04:00
Fahad
3aedb16101 Use the new Gemini 2.5 Flash
Updated to support Thinking Tokens as a ratio of the max allowed
Updated tests
Updated README
2025-06-12 20:46:54 +04:00
Fahad
354a0fae0b Fixed tests 2025-06-12 13:51:22 +04:00
Fahad
79af2654b9 Use the new flash model
Updated tests
2025-06-12 13:44:09 +04:00
Fahad
7462599ddb Simplified thread continuations
Fixed and improved tests
2025-06-12 12:47:02 +04:00
Fahad
fb66825bf6 Rebranding, refactoring, renaming, cleanup, updated docs 2025-06-12 10:40:43 +04:00
Fahad
9a55ca8898 WIP lots of new tests and validation scenarios
Simulation tests to confirm threading and history traversal
Chain of communication and branching validation tests from live simulation
Temperature enforcement per model
2025-06-12 09:35:05 +04:00
Fahad
2a067a7f4e WIP major refactor and features 2025-06-12 07:14:59 +04:00
Fahad
22a3fb91ed feat: Add comprehensive dynamic configuration system v3.3.0
## Major Features Added

### 🎯 Dynamic Configuration System
- **Environment-aware model selection**: DEFAULT_MODEL with 'pro'/'flash' shortcuts
- **Configurable thinking modes**: DEFAULT_THINKING_MODE_THINKDEEP for extended reasoning
- **All tool schemas now dynamic**: Show actual current defaults instead of hardcoded values
- **Enhanced setup workflow**: Copy from .env.example with smart customization

### 🔧 Model & Thinking Configuration
- **Smart model resolution**: Support both shortcuts ('pro', 'flash') and full model names
- **Thinking mode optimization**: Only apply thinking budget to models that support it
- **Flash model compatibility**: Works without thinking config, still beneficial via system prompts
- **Dynamic schema descriptions**: Tool parameters show current environment values

### 🚀 Enhanced Developer Experience
- **Fail-fast Docker setup**: GEMINI_API_KEY required upfront in docker-compose
- **Comprehensive startup logging**: Shows current model and thinking mode defaults
- **Enhanced get_version tool**: Reports all dynamic configuration values
- **Better .env documentation**: Clear token consumption details and model options

### 🧪 Comprehensive Testing
- **Live model validation**: New simulator test validates Pro vs Flash thinking behavior
- **Dynamic configuration tests**: Verify environment variable overrides work correctly
- **Complete test coverage**: All 139 unit tests pass, including new model config tests

### 📋 Configuration Files Updated
- **docker-compose.yml**: Fail-fast API key validation, thinking mode support
- **setup-docker.sh**: Copy from .env.example instead of manual creation
- **.env.example**: Detailed documentation with token consumption per thinking mode
- **.gitignore**: Added test-setup/ for cleanup

### 🛠 Technical Improvements
- **Removed setup.py**: Fully Docker-based deployment (no longer needed)
- **REDIS_URL smart defaults**: Auto-configured for Docker, still configurable for dev
- **All tools updated**: Consistent dynamic model parameter descriptions
- **Enhanced error handling**: Better model resolution and validation

## Breaking Changes
- Removed setup.py (Docker-only deployment)
- Model parameter descriptions now show actual defaults (dynamic)

## Migration Guide
- Update .env files using new .env.example format
- Use 'pro'/'flash' shortcuts or full model names
- Set DEFAULT_THINKING_MODE_THINKDEEP for custom thinking depth

🤖 Generated with [Claude Code](https://claude.ai/code)

Co-Authored-By: Claude <noreply@anthropic.com>
2025-06-11 20:10:25 +04:00
Fahad
b0f17f741f Fixed invalid test assumptions 2025-06-11 18:49:30 +04:00
Fahad
e8df6a7a31 Comments 2025-06-11 17:18:40 +04:00
Fahad
780000f9c9 Lots of tests with live simulation to validate conversation continuation / preservation work across requests 2025-06-11 17:16:05 +04:00
Fahad
c90ac7561e Lots of tests with live simulation to validate conversation continuation / preservation work across requests 2025-06-11 17:03:09 +04:00
Fahad
ac763e0213 More tests 2025-06-11 14:34:51 +04:00
Fahad
98eab46abf WIP - improvements to token usage tracking, simulator added for live testing, improvements to file loading 2025-06-11 13:24:59 +04:00
Fahad
5a94737516 Fix conversation history duplication and optimize file embedding
This major refactoring addresses critical bugs in conversation history management
and significantly improves token efficiency through intelligent file embedding:

**Key Improvements:**
• Fixed conversation history duplication bug by centralizing reconstruction in server.py
• Added intelligent file filtering to prevent re-embedding files already in conversation history
• Centralized file processing logic in BaseTool._prepare_file_content_for_prompt()
• Enhanced log monitoring with better categorization and file embedding visibility
• Updated comprehensive test suite to verify new architecture and edge cases

**Architecture Changes:**
• Removed duplicate conversation history reconstruction from tools/base.py
• Conversation history now handled exclusively by server.py:reconstruct_thread_context
• All tools now use centralized file processing with automatic deduplication
• Improved token efficiency by embedding unique files only once per conversation

**Performance Benefits:**
• Reduced token usage through smart file filtering
• Eliminated redundant file embeddings in continued conversations
• Better observability with detailed debug logging for file operations

🤖 Generated with [Claude Code](https://claude.ai/code)

Co-Authored-By: Claude <noreply@anthropic.com>
2025-06-11 11:40:12 +04:00
Fahad
94f542c76a fix: critical conversation history bug and improve Docker integration
This commit addresses several critical issues and improvements:

🔧 Critical Fixes:
- Fixed conversation history not being included when using continuation_id in AI-to-AI conversations
- Fixed test mock targeting issues preventing proper conversation memory validation
- Fixed Docker debug logging functionality with Gemini tools

🐛 Bug Fixes:
- Docker compose configuration for proper container command execution
- Test mock import targeting from utils.conversation_memory.* to tools.base.*
- Version bump to 3.1.0 reflecting significant improvements

🚀 Improvements:
- Enhanced Docker environment configuration with comprehensive logging setup
- Added cross-tool continuation documentation and examples in README
- Improved error handling and validation across all tools
- Better logging configuration with LOG_LEVEL environment variable support
- Enhanced conversation memory system documentation

🧪 Testing:
- Added comprehensive conversation history bug fix tests
- Added cross-tool continuation functionality tests
- All 132 tests now pass with proper conversation history validation
- Improved test coverage for AI-to-AI conversation threading

 Code Quality:
- Applied black, isort, and ruff formatting across entire codebase
- Enhanced inline documentation for conversation memory system
- Cleaned up temporary files and improved repository hygiene
- Better test descriptions and coverage for critical functionality

🤖 Generated with [Claude Code](https://claude.ai/code)

Co-Authored-By: Claude <noreply@anthropic.com>
2025-06-11 08:53:45 +04:00
Fahad
14ccbede43 Fixes https://github.com/BeehiveInnovations/gemini-mcp-server/issues/6 2025-06-11 07:09:28 +04:00
Fahad
ce23a996c6 Conversation threading test fixes 2025-06-10 20:53:20 +04:00