Commit Graph

30 Commits

Author SHA1 Message Date
Fahad
2c534ac06e feat: centralized environment handling, ensures ZEN_MCP_FORCE_ENV_OVERRIDE is honored correctly
fix: updated tests to override env variables they need instead of relying on the current values from .env
2025-10-04 14:28:56 +04:00
Fahad
d55130a430 fix: external model name now recorded properly in responses
test: http cassettes added for improved integration tests
refactor: generic name for the CLI agent
2025-10-03 11:29:06 +04:00
Beehive Innovations
69a3121452 🚀 Major Enhancement: Workflow-Based Tool Architecture v5.5.0 (#95)
* WIP: new workflow architecture

* WIP: further improvements and cleanup

* WIP: cleanup and docks, replace old tool with new

* WIP: cleanup and docks, replace old tool with new

* WIP: new planner implementation using workflow

* WIP: precommit tool working as a workflow instead of a basic tool
Support for passing False to use_assistant_model to skip external models completely and use Claude only

* WIP: precommit workflow version swapped with old

* WIP: codereview

* WIP: replaced codereview

* WIP: replaced codereview

* WIP: replaced refactor

* WIP: workflow for thinkdeep

* WIP: ensure files get embedded correctly

* WIP: thinkdeep replaced with workflow version

* WIP: improved messaging when an external model's response is received

* WIP: analyze tool swapped

* WIP: updated tests
* Extract only the content when building history
* Use "relevant_files" for workflow tools only

* WIP: updated tests
* Extract only the content when building history
* Use "relevant_files" for workflow tools only

* WIP: fixed get_completion_next_steps_message missing param

* Fixed tests
Request for files consistently

* Fixed tests
Request for files consistently

* Fixed tests

* New testgen workflow tool
Updated docs

* Swap testgen workflow

* Fix CI test failures by excluding API-dependent tests

- Update GitHub Actions workflow to exclude simulation tests that require API keys
- Fix collaboration tests to properly mock workflow tool expert analysis calls
- Update test assertions to handle new workflow tool response format
- Ensure unit tests run without external API dependencies in CI

🤖 Generated with [Claude Code](https://claude.ai/code)

Co-Authored-By: Claude <noreply@anthropic.com>

* WIP - Update tests to match new tools

* WIP - Update tests to match new tools

---------

Co-authored-by: Claude <noreply@anthropic.com>
2025-06-21 00:08:11 +04:00
Fahad
d0da6ce9e4 Gemini model rename 2025-06-19 05:37:40 +04:00
Beehive Innovations
4151c3c3a5 Migration from Docker to Standalone Python Server (#73)
* Migration from docker to standalone server
Migration handling
Fixed tests
Use simpler in-memory storage
Support for concurrent logging to disk
Simplified direct connections to localhost

* Migration from docker / redis to standalone script
Updated tests
Updated run script
Fixed requirements
Use dotenv
Ask if user would like to install MCP in Claude Desktop once
Updated docs

* More cleanup and references to docker removed

* Cleanup

* Comments

* Fixed tests

* Fix GitHub Actions workflow for standalone Python architecture

- Install requirements-dev.txt for pytest and testing dependencies
- Remove Docker setup from simulation tests (now standalone)
- Simplify linting job to use requirements-dev.txt
- Update simulation tests to run directly without Docker

Fixes unit test failures in CI due to missing pytest dependency.

🤖 Generated with [Claude Code](https://claude.ai/code)

Co-Authored-By: Claude <noreply@anthropic.com>

* Remove simulation tests from GitHub Actions

- Removed simulation-tests job that makes real API calls
- Keep only unit tests (mocked, no API costs) and linting
- Simulation tests should be run manually with real API keys
- Reduces CI costs and complexity

GitHub Actions now only runs:
- Unit tests (569 tests, all mocked)
- Code quality checks (ruff, black)

🤖 Generated with [Claude Code](https://claude.ai/code)

Co-Authored-By: Claude <noreply@anthropic.com>

* Fixed tests

* Fixed tests

---------

Co-authored-by: Claude <noreply@anthropic.com>
2025-06-18 23:41:22 +04:00
Beehive Innovations
95556ba9ea Add Consensus Tool for Multi-Model Perspective Gathering (#67)
* WIP
Refactor resolving mode_names, should be done once at MCP call boundary
Pass around model context instead
Consensus tool allows one to get a consensus from multiple models, optionally assigning one a 'for' or 'against' stance to find nuanced responses.

* Deduplication of model resolution, model_context should be available before reaching deeper parts of the code
Improved abstraction when building conversations
Throw programmer errors early

* Guardrails
Support for `model:option` format at MCP boundary so future tools can use additional options if needed instead of handling this only for consensus
Model name now supports an optional ":option" for future use

* Simplified async flow

* Improved model for request to support natural language
Simplified async flow

* Improved model for request to support natural language
Simplified async flow

* Fix consensus tool async/sync patterns to match codebase standards

CRITICAL FIXES:
- Converted _get_consensus_responses from async to sync (matches other tools)
- Converted store_conversation_turn from async to sync (add_turn is synchronous)
- Removed unnecessary asyncio imports and sleep calls
- Fixed ClosedResourceError in MCP protocol during long consensus operations

PATTERN ALIGNMENT:
- Consensus tool now follows same sync patterns as all other tools
- Only execute() and prepare_prompt() are async (base class requirement)
- All internal operations are synchronous like analyze, chat, debug, etc.

TESTING:
- MCP simulation test now passes: consensus_stance 
- Two-model consensus works correctly in ~35 seconds
- Unknown stance handling defaults to neutral with warnings
- All 9 unit tests pass (100% success rate)

The consensus tool async patterns were anomalous in the codebase.
This fix aligns it with the established synchronous patterns used
by all other tools while maintaining full functionality.

🤖 Generated with [Claude Code](https://claude.ai/code)

Co-Authored-By: Claude <noreply@anthropic.com>

* Fixed call order and added new test

* Cleanup dead comments
Docs for the new tool
Improved tests

---------

Co-authored-by: Claude <noreply@anthropic.com>
2025-06-17 10:53:17 +04:00
Fahad
061fb8691d Merge main into feature/images - resolve conflicts favoring our approach
- Kept version 4.8.0 for new features
- Preserved our _is_builtin_custom_models_config approach over main's ALLOWED_INTERNAL_PATHS
- Our targeted solution is cleaner than the general whitelist approach
2025-06-16 13:19:08 +04:00
Fahad
97fa6781cf Vision support via images / pdfs etc that can be passed on to other models as part of analysis, additional context etc.
Image processing pipeline added
OpenAI GPT-4.1 support
Chat tool prompt enhancement
Lint and code quality improvements
2025-06-16 13:14:53 +04:00
Fahad
d6d7bf8cac Fixed internal file path translation into docker 2025-06-16 11:30:02 +04:00
Fahad
0b94dd8cdd Lint 2025-06-16 07:18:45 +04:00
Fahad
4c0bd3b86d Improved documentation for conversation / file collection strategy, context budget allocation etc 2025-06-16 07:17:35 +04:00
Fahad
91077e3810 Performance improvements when embedding files:
- Exit early at MCP boundary if files won't fit within given context of chosen model
- Encourage claude to re-run with better context
- Check file sizes before embedding
- Drop files from older conversations when building continuations and give priority to newer files
- List and mention excluded files to Claude on return
- Improved tests
- Improved precommit prompt
- Added a new Low severity to precommit
- Improved documentation of file embedding strategy
- Refactor
2025-06-16 05:51:52 +04:00
Fahad
7d33aafcab Configurable conversation limit now set to 10 exchanges. This helps when you want to manually continue a thread of thought across different models manually. 2025-06-14 14:00:13 +04:00
Fahad
bc3f98a291 Make conversation timeout configuration (so that you're able to resume a discussion manually with another model with a gap of several hours in case you stepped away) 2025-06-14 13:27:19 +04:00
Fahad
048ebf90bf Cleanup 2025-06-13 16:05:21 +04:00
Fahad
bed069826d Improvements to continuation prompts to avoid repetition and save on tokens 2025-06-13 09:44:04 +04:00
Fahad
3aedb16101 Use the new Gemini 2.5 Flash
Updated to support Thinking Tokens as a ratio of the max allowed
Updated tests
Updated README
2025-06-12 20:46:54 +04:00
Fahad
79af2654b9 Use the new flash model
Updated tests
2025-06-12 13:44:09 +04:00
Fahad
7462599ddb Simplified thread continuations
Fixed and improved tests
2025-06-12 12:47:02 +04:00
Fahad
fb66825bf6 Rebranding, refactoring, renaming, cleanup, updated docs 2025-06-12 10:40:43 +04:00
Fahad
9a55ca8898 WIP lots of new tests and validation scenarios
Simulation tests to confirm threading and history traversal
Chain of communication and branching validation tests from live simulation
Temperature enforcement per model
2025-06-12 09:35:05 +04:00
Fahad
2a067a7f4e WIP major refactor and features 2025-06-12 07:14:59 +04:00
Fahad
898373bc22 More tests 2025-06-11 18:44:34 +04:00
Fahad
4974fbc725 Extra logging and more tests 2025-06-11 18:26:13 +04:00
Fahad
ac763e0213 More tests 2025-06-11 14:34:51 +04:00
Fahad
98eab46abf WIP - improvements to token usage tracking, simulator added for live testing, improvements to file loading 2025-06-11 13:24:59 +04:00
Fahad
5a94737516 Fix conversation history duplication and optimize file embedding
This major refactoring addresses critical bugs in conversation history management
and significantly improves token efficiency through intelligent file embedding:

**Key Improvements:**
• Fixed conversation history duplication bug by centralizing reconstruction in server.py
• Added intelligent file filtering to prevent re-embedding files already in conversation history
• Centralized file processing logic in BaseTool._prepare_file_content_for_prompt()
• Enhanced log monitoring with better categorization and file embedding visibility
• Updated comprehensive test suite to verify new architecture and edge cases

**Architecture Changes:**
• Removed duplicate conversation history reconstruction from tools/base.py
• Conversation history now handled exclusively by server.py:reconstruct_thread_context
• All tools now use centralized file processing with automatic deduplication
• Improved token efficiency by embedding unique files only once per conversation

**Performance Benefits:**
• Reduced token usage through smart file filtering
• Eliminated redundant file embeddings in continued conversations
• Better observability with detailed debug logging for file operations

🤖 Generated with [Claude Code](https://claude.ai/code)

Co-Authored-By: Claude <noreply@anthropic.com>
2025-06-11 11:40:12 +04:00
Fahad
19a5c3e8ee Bumped max turns 2025-06-11 09:03:50 +04:00
Fahad
94f542c76a fix: critical conversation history bug and improve Docker integration
This commit addresses several critical issues and improvements:

🔧 Critical Fixes:
- Fixed conversation history not being included when using continuation_id in AI-to-AI conversations
- Fixed test mock targeting issues preventing proper conversation memory validation
- Fixed Docker debug logging functionality with Gemini tools

🐛 Bug Fixes:
- Docker compose configuration for proper container command execution
- Test mock import targeting from utils.conversation_memory.* to tools.base.*
- Version bump to 3.1.0 reflecting significant improvements

🚀 Improvements:
- Enhanced Docker environment configuration with comprehensive logging setup
- Added cross-tool continuation documentation and examples in README
- Improved error handling and validation across all tools
- Better logging configuration with LOG_LEVEL environment variable support
- Enhanced conversation memory system documentation

🧪 Testing:
- Added comprehensive conversation history bug fix tests
- Added cross-tool continuation functionality tests
- All 132 tests now pass with proper conversation history validation
- Improved test coverage for AI-to-AI conversation threading

 Code Quality:
- Applied black, isort, and ruff formatting across entire codebase
- Enhanced inline documentation for conversation memory system
- Cleaned up temporary files and improved repository hygiene
- Better test descriptions and coverage for critical functionality

🤖 Generated with [Claude Code](https://claude.ai/code)

Co-Authored-By: Claude <noreply@anthropic.com>
2025-06-11 08:53:45 +04:00
Fahad
f5060367a0 WIP - communication memory 2025-06-10 19:16:51 +04:00