Commit Graph

99 Commits

Author SHA1 Message Date
Fahad
70b64adff3 Schema now lists all models including locally available models
New tool to list all models `listmodels`
Integration test to for all the different combinations of API keys
Tweaks to codereview prompt for a better quality input from Claude
Fixed missing 'low' severity in codereview
2025-06-16 19:07:35 +04:00
Fahad
cb17582d8f Optimize OpenRouter registry loading with class-level caching
Instead of creating new OpenRouterModelRegistry instances multiple times
per tool (4x per tool during schema generation), we now use a shared
class-level cache in BaseTool. This reduces registry loading from 40+ times
to just once during MCP server initialization.

The optimization:
- Adds _openrouter_registry_cache as a class variable in BaseTool
- Implements _get_openrouter_registry() classmethod for lazy loading
- Ensures cache is shared across all tool subclasses
- Maintains identical functionality with improved performance

This significantly reduces startup time and resource usage when OpenRouter
is configured, especially noticeable with many custom models.

🤖 Generated with Claude Code

Co-Authored-By: Claude <noreply@anthropic.com>
2025-06-16 18:54:15 +04:00
Fahad
ed386375be Complete Redis mocking fixes for image support integration tests
- Properly mock Redis client operations to support add_turn functionality
- Set up initial thread contexts so add_turn can find existing threads
- Mock Redis set operations to return success
- Ensure all Redis-dependent tests use proper mock patterns
- All 473 unit tests now pass 100% with proper isolation

🤖 Generated with [Claude Code](https://claude.ai/code)

Co-Authored-By: Claude <noreply@anthropic.com>
2025-06-16 16:26:23 +04:00
Beehive Innovations
3049c85e3c Update base.py
Co-authored-by: gemini-code-assist[bot] <176961590+gemini-code-assist[bot]@users.noreply.github.com>
2025-06-16 14:51:34 +04:00
Fahad
97fa6781cf Vision support via images / pdfs etc that can be passed on to other models as part of analysis, additional context etc.
Image processing pipeline added
OpenAI GPT-4.1 support
Chat tool prompt enhancement
Lint and code quality improvements
2025-06-16 13:14:53 +04:00
Fahad
4c0bd3b86d Improved documentation for conversation / file collection strategy, context budget allocation etc 2025-06-16 07:17:35 +04:00
Fahad
805e8d6d01 Fix remaining TestGenRequest reference in format_response method
Fixed NameError that was causing Docker container crashes:
- Updated type annotation in format_response method from TestGenRequest to TestGenerationRequest
- This was the last missing reference from the class rename

🤖 Generated with [Claude Code](https://claude.ai/code)

Co-Authored-By: Claude <noreply@anthropic.com>
2025-06-16 06:08:58 +04:00
Fahad
2cfe0b163a Fix all failing tests and pytest collection warnings
Fixed MagicMock comparison errors across multiple test suites by:
- Adding proper ModelCapabilities mocks with real values instead of MagicMock objects
- Updating test_auto_mode.py with correct provider mocking for model availability tests
- Updating test_thinking_modes.py with proper capabilities mocking in all thinking mode tests
- Updating test_tools.py with proper capabilities mocking for CodeReview and Analyze tools
- Fixing test_large_prompt_handling.py by adding proper provider mocking to prevent errors before large prompt detection

Fixed pytest collection warnings by:
- Renaming TestGenRequest to TestGenerationRequest to avoid pytest collecting it as a test class
- Renaming TestGenTool to TestGenerationTool to avoid pytest collecting it as a test class
- Updated all imports and references across server.py, tools/__init__.py, and test files

All 459 tests now pass without warnings or MagicMock comparison errors.

🤖 Generated with [Claude Code](https://claude.ai/code)

Co-Authored-By: Claude <noreply@anthropic.com>
2025-06-16 06:02:12 +04:00
Fahad
91077e3810 Performance improvements when embedding files:
- Exit early at MCP boundary if files won't fit within given context of chosen model
- Encourage claude to re-run with better context
- Check file sizes before embedding
- Drop files from older conversations when building continuations and give priority to newer files
- List and mention excluded files to Claude on return
- Improved tests
- Improved precommit prompt
- Added a new Low severity to precommit
- Improved documentation of file embedding strategy
- Refactor
2025-06-16 05:51:52 +04:00
Fahad
7ef99fb9ce Improved schema description for precommit 2025-06-15 19:11:37 +04:00
Fahad
6dcf095c3d Improved schema description to allow Claude to pre-think harder before invoking thinkdeep 2025-06-15 19:05:23 +04:00
Fahad
ad6cff4498 Lint, bump 2025-06-15 18:43:37 +04:00
Fahad
dfed6f0cbd New tool: "tracer" helps with static analysis / call-flow generation. Does NOT use external models. Used as a quick prompt generator to aid in call-flow / dependency-chart generation. Can be used as an input into another tool / model for extended analysis and deeper thought.
Faster docker restarts
2025-06-15 18:42:10 +04:00
Fahad
6f8d3059a1 Merge branch 'main' into feature/tracer
# Conflicts:
#	tools/base.py
#	tools/debug.py
#	tools/thinkdeep.py
2025-06-15 16:09:54 +04:00
Fahad
07a078b4f2 Updated tests and additional tests for folder expansion during conversation tracking 2025-06-15 16:03:43 +04:00
Fahad
d36a85a3f3 Fix directory expansion tracking in conversation memory
When directories were provided to tools, only the directory path was stored
in conversation history instead of the individual expanded files. This caused
file filtering to incorrectly skip files in continued conversations.

Changes:
- Modified _prepare_file_content_for_prompt to return (content, processed_files)
- Updated all tools to track actually processed files for conversation memory
- Ensures directories are tracked as their expanded individual files

Fixes issue where Swift directory with 46 files was not properly embedded
in conversation continuations.
2025-06-15 15:36:45 +04:00
Fahad
99ed09be8d Update tracer tool for new file tracking system 2025-06-15 15:36:33 +04:00
Fahad
64a1d8664e Fix directory expansion tracking in conversation memory
When directories were provided to tools, only the directory path was stored
in conversation history instead of the individual expanded files. This caused
file filtering to incorrectly skip files in continued conversations.

Changes:
- Modified _prepare_file_content_for_prompt to return (content, processed_files)
- Updated all tools to track actually processed files for conversation memory
- Ensures directories are tracked as their expanded individual files

Fixes issue where Swift directory with 46 files was not properly embedded
in conversation continuations.
2025-06-15 15:36:12 +04:00
Fahad
86728a1442 WIP 2025-06-15 15:32:41 +04:00
Fahad
3bc7956239 Implement TracePath tool for static call path analysis
Add comprehensive TracePath tool that predicts and explains full call paths
and control flow without executing code. Features include:

**Core Functionality:**
- Static call path prediction with confidence levels (🟢🟡🔴)
- Multi-language support (Python, JavaScript, TypeScript, C#, Java)
- Value-driven flow analysis based on parameter combinations
- Side effects identification (database, network, filesystem)
- Polymorphism and dynamic dispatch analysis
- Entry point parsing for multiple syntax patterns

**Technical Implementation:**
- Hybrid AI-first architecture (Phase 1: pure AI, Phase 2: AST enhancement)
- Export formats: Markdown, JSON, PlantUML
- Confidence threshold filtering for speculative branches
- Integration with existing tool ecosystem and conversation threading
- Comprehensive error handling and token management

**Files Added:**
- tools/tracepath.py - Main tool implementation
- systemprompts/tracepath_prompt.py - System prompt for analysis
- tests/test_tracepath.py - Comprehensive unit tests (32 tests)

**Files Modified:**
- server.py - Tool registration
- tools/__init__.py - Tool exports
- systemprompts/__init__.py - Prompt exports

**Quality Assurance:**
- All 449 unit tests pass including 32 new TracePath tests
- Full linting and formatting compliance
- Follows established project patterns and conventions
- Multi-model validation with O3 and Gemini Pro insights

**Usage Examples:**
- "Use zen tracepath to analyze BookingManager::finalizeInvoice(invoiceId: 123)"
- "Trace payment.process_payment() with confidence levels and side effects"

🤖 Generated with [Claude Code](https://claude.ai/code)

Co-Authored-By: Claude <noreply@anthropic.com>
2025-06-15 14:07:23 +04:00
Fahad
4becd70a82 Perform prompt size checks only at the MCP boundary
New test to confirm history build-up and system prompt does not affect prompt size checks
Also check for large prompts in focus_on
Fixed .env.example incorrectly did not comment out CUSTOM_API causing the run-server script to think at least one key exists
2025-06-15 10:37:08 +04:00
Fahad
c7835e7eef Easier access to logs at startup with -f on the run script
Improved prompt for immediate action
Additional logging of tool names
Updated documentation
Context aware decomposition system prompt
New script to run code quality checks
2025-06-15 09:25:52 +04:00
Fahad
99fab3e83d Docs added to show how a new provider is added
Docs added to show how a new tool is created
All tools should add numbers to code for models to be able to reference if needed
Enabled line numbering for code for all tools to use
Additional tests to validate line numbering is not added to git diffs
2025-06-15 07:02:27 +04:00
Fahad
b5004b91fc Major new addition: refactor tool
Supports decomposing large components and files, finding codesmells, finding modernizing opportunities as well as code organization opportunities. Fix this mega-classes today!
Line numbers added to embedded code for better references from model -> claude
2025-06-15 06:00:01 +04:00
Fahad
70f1356e3e Improved trigger words to enforce large prompts are passed in as a file reference 2025-06-14 21:03:17 +04:00
Fahad
4cacd2dad9 Fixed trigger word 2025-06-14 20:14:40 +04:00
Fahad
442decba70 Improved model response handling to handle additional response statuses in future
Improved testgen; encourages follow-ups with less work in between and less token generation to avoid surpassing the 25K barrier
Improved coderevew tool to request a focused code review instead where a single-pass code review is too large or complex
2025-06-14 18:43:56 +04:00
Fahad
698c1611d2 Lint fix 2025-06-14 16:47:59 +04:00
Fahad
d0d0a171dc Ensure duplicate file references are gracefully handled
Improved prompt to encourage immediate action
2025-06-14 16:37:02 +04:00
Fahad
ec707e021a Fix for path translation within docker 2025-06-14 16:00:54 +04:00
Fahad
acbfa1c94e Improved prompt for next steps 2025-06-14 15:51:04 +04:00
Fahad
4086306c58 New tool: testgen
Generates unit tests and encourages model to auto-detect framework and testing style from existing sample (if available)
2025-06-14 15:41:47 +04:00
Fahad
e0a05b86f1 Add encouraging message about powerful models to schema in case it's not on Opus 4 or above
OPENROUTER_ALLOWED_MODELS environment variable support to further limit the models to allow from within Claude. This will put a limit on top of even the ones listed in custom_models.json
2025-06-14 11:34:17 +04:00
Fahad
21037c2d81 Refactored prompts for better maintainability 2025-06-14 11:09:13 +04:00
Fahad
23353734cd Support for allowed model restrictions per provider
Tool escalation added to `analyze` to a graceful switch over to codereview is made when absolutely necessary
2025-06-14 10:56:53 +04:00
Fahad
eb388ab2f2 Categorize tools into 'model capabilities categories' to help determine which type of model to pick when in auto mode
Encourage Claude to pick the best model for the job automatically in auto mode
Lots of new tests to ensure automatic model picking works reliably based on user preference or when a matching model is not found or ambiguous
Improved error reporting when bogus model is requested and is not configured or available
2025-06-14 02:17:06 +04:00
Fahad
7fc1186a7c Fixed for auto mode 2025-06-14 01:16:15 +04:00
Fahad
8ac5bbb5af Fixed workspace path mapping
Refactoring
Improved system prompts, more generalized
Home folder protection and detection
Retry logic for gemini
2025-06-14 00:26:59 +04:00
Fahad
048ebf90bf Cleanup 2025-06-13 16:05:21 +04:00
Fahad
f44ca326ef Breaking change: openrouter_models.json -> custom_models.json
* Support for Custom URLs and custom models, including locally hosted models such as ollama
* Support for native + openrouter + local models (i.e. dozens of models) means you can start delegating sub-tasks to particular models or work to local models such as localizations or other boring work etc.
* Several tests added
* precommit to also include untracked (new) files
* Logfile auto rollover
* Improved logging
2025-06-13 15:22:09 +04:00
Fahad
bed069826d Improvements to continuation prompts to avoid repetition and save on tokens 2025-06-13 09:44:04 +04:00
Fahad
a641159a67 Use consistent terminology
Remove test folder from .gitignore for live simulation test to pass
2025-06-13 09:28:33 +04:00
Fahad
b16f85979b Use consistent terminology 2025-06-13 09:06:12 +04:00
Fahad
c692511e69 Cleanup 2025-06-13 08:37:14 +04:00
Fahad
8abbba2d92 Improved auto mode model discovery when using OpenRouter 2025-06-13 08:35:09 +04:00
Fahad
0e36fcbc69 Final cleanup 2025-06-13 07:12:29 +04:00
Fahad
2cdb92460b WIP
- OpenRouter model configuration registry
- Model definition file for users to be able to control
- Additional tests
- Update instructions
2025-06-13 06:33:12 +04:00
Fahad
93daa2942a WIP - OpenRouter support 2025-06-12 22:45:16 +04:00
Fahad
3aedb16101 Use the new Gemini 2.5 Flash
Updated to support Thinking Tokens as a ratio of the max allowed
Updated tests
Updated README
2025-06-12 20:46:54 +04:00
Fahad
b34c63d710 Fixed web-search prompt, models can prompt claude to perform web searches on their behalf if and as needed 2025-06-12 20:05:55 +04:00