Breaking change: openrouter_models.json -> custom_models.json

* Support for Custom URLs and custom models, including locally hosted models such as ollama
* Support for native + openrouter + local models (i.e. dozens of models) means you can start delegating sub-tasks to particular models or work to local models such as localizations or other boring work etc.
* Several tests added
* precommit to also include untracked (new) files
* Logfile auto rollover
* Improved logging
This commit is contained in:
Fahad
2025-06-13 15:22:09 +04:00
parent f5fdf7b2ed
commit f44ca326ef
27 changed files with 1692 additions and 351 deletions

View File

@@ -3,7 +3,7 @@
https://github.com/user-attachments/assets/8097e18e-b926-4d8b-ba14-a979e4c58bda
<div align="center">
<b>🤖 Claude + [Gemini / O3 / OpenRouter / Any Model] = Your Ultimate AI Development Team</b>
<b>🤖 Claude + [Gemini / O3 / OpenRouter / Ollama / Any Model] = Your Ultimate AI Development Team</b>
</div>
<br/>
@@ -27,7 +27,7 @@ with context carrying forward seamlessly.
All within a single conversation thread! Gemini Pro in step 6 _knows_ what was recommended by O3 in step 3! Taking that context
and review into consideration to aid with its pre-commit review.
**Think of it as Claude Code _for_ Claude Code.** This MCP isn't magic. It's just **super-glue**.
**Think of it as Claude Code _for_ Claude Code.** This MCP isn't magic. It's just **super-glue**.
## Quick Navigation
@@ -63,12 +63,13 @@ Claude is brilliant, but sometimes you need:
- **Multiple AI perspectives** - Let Claude orchestrate between different models to get the best analysis
- **Automatic model selection** - Claude picks the right model for each task (or you can specify)
- **A senior developer partner** to validate and extend ideas ([`chat`](#1-chat---general-development-chat--collaborative-thinking))
- **A second opinion** on complex architectural decisions - augment Claude's thinking with perspectives from Gemini Pro, O3, or [dozens of other models via OpenRouter](docs/openrouter.md) ([`thinkdeep`](#2-thinkdeep---extended-reasoning-partner))
- **A second opinion** on complex architectural decisions - augment Claude's thinking with perspectives from Gemini Pro, O3, or [dozens of other models via custom endpoints](docs/custom_models.md) ([`thinkdeep`](#2-thinkdeep---extended-reasoning-partner))
- **Professional code reviews** with actionable feedback across entire repositories ([`codereview`](#3-codereview---professional-code-review))
- **Pre-commit validation** with deep analysis using the best model for the job ([`precommit`](#4-precommit---pre-commit-validation))
- **Expert debugging** - O3 for logical issues, Gemini for architectural problems ([`debug`](#5-debug---expert-debugging-assistant))
- **Extended context windows beyond Claude's limits** - Delegate analysis to Gemini (1M tokens) or O3 (200K tokens) for entire codebases, large datasets, or comprehensive documentation
- **Model-specific strengths** - Extended thinking with Gemini Pro, fast iteration with Flash, strong reasoning with O3
- **Model-specific strengths** - Extended thinking with Gemini Pro, fast iteration with Flash, strong reasoning with O3, local privacy with Ollama
- **Local model support** - Run models like Llama 3.2 locally via Ollama, vLLM, or LM Studio for privacy and cost control
- **Dynamic collaboration** - Models can request additional context and follow-up replies from Claude mid-analysis
- **Smart file handling** - Automatically expands directories, manages token limits based on model capacity
- **[Bypass MCP's token limits](#working-with-large-prompts)** - Work around MCP's 25K limit automatically
@@ -100,16 +101,25 @@ The final implementation resulted in a 26% improvement in JSON parsing performan
### 1. Get API Keys (at least one required)
**Option A: OpenRouter (Access multiple models with one API)**
- **OpenRouter**: Visit [OpenRouter](https://openrouter.ai/) for access to multiple models through one API. [Setup Guide](docs/openrouter.md)
- **OpenRouter**: Visit [OpenRouter](https://openrouter.ai/) for access to multiple models through one API. [Setup Guide](docs/custom_models.md)
- Control model access and spending limits directly in your OpenRouter dashboard
- Configure model aliases in `conf/openrouter_models.json`
- Configure model aliases in [`conf/custom_models.json`](conf/custom_models.json)
**Option B: Native APIs**
- **Gemini**: Visit [Google AI Studio](https://makersuite.google.com/app/apikey) and generate an API key. For best results with Gemini 2.5 Pro, use a paid API key as the free tier has limited access to the latest models.
- **OpenAI**: Visit [OpenAI Platform](https://platform.openai.com/api-keys) to get an API key for O3 model access.
> **Note:** Using both OpenRouter and native APIs creates ambiguity about which provider serves each model.
> If both are configured, native APIs will take priority for `gemini` and `o3`.
**Option C: Custom API Endpoints (Local models like Ollama, vLLM)**
[Please see the setup guide](docs/custom_models.md#custom-api-setup-ollama-vllm-etc). With a custom API you can use:
- **Ollama**: Run models like Llama 3.2 locally for free inference
- **vLLM**: Self-hosted inference server for high-throughput inference
- **LM Studio**: Local model hosting with OpenAI-compatible API interface
- **Text Generation WebUI**: Popular local interface for running models
- **Any OpenAI-compatible API**: Custom endpoints for your own infrastructure
> **Note:** Using all three options may create ambiguity about which provider / model to use if there is an overlap.
> If all APIs are configured, native APIs will take priority when there is a clash in model name, such as for `gemini` and `o3`.
> Configure your model aliases and give them unique names in [`conf/custom_models.json`](conf/custom_models.json)
### 2. Clone and Set Up
@@ -138,10 +148,16 @@ nano .env
# The file will contain, at least one should be set:
# GEMINI_API_KEY=your-gemini-api-key-here # For Gemini models
# OPENAI_API_KEY=your-openai-api-key-here # For O3 model
# OPENROUTER_API_KEY=your-openrouter-key # For OpenRouter (see docs/openrouter.md)
# OPENROUTER_API_KEY=your-openrouter-key # For OpenRouter (see docs/custom_models.md)
# For local models (Ollama, vLLM, etc.) - Note: Use host.docker.internal for Docker networking:
# CUSTOM_API_URL=http://host.docker.internal:11434/v1 # Ollama example (NOT localhost!)
# CUSTOM_API_KEY= # Empty for Ollama
# CUSTOM_MODEL_NAME=llama3.2 # Default model
# WORKSPACE_ROOT=/Users/your-username (automatically configured)
# Note: At least one API key is required
# Note: At least one API key OR custom URL is required
```
### 4. Configure Claude
@@ -222,6 +238,8 @@ Just ask Claude naturally:
- "Use flash to suggest how to format this code based on the specs mentioned in policy.md" → Uses Gemini Flash specifically
- "Think deeply about this and get o3 to debug this logic error I found in the checkOrders() function" → Uses O3 specifically
- "Brainstorm scaling strategies with pro. Study the code, pick your preferred strategy and debate with pro to settle on two best approaches" → Uses Gemini Pro specifically
- "Use local-llama to localize and add missing translations to this project" → Uses local Llama 3.2 via custom URL
- "First use local-llama for a quick local analysis, then use opus for a thorough security review" → Uses both providers in sequence
> **Remember:** Claude remains in control — but **you** are the true orchestrator.
> You're the prompter, the guide, the puppeteer.
@@ -245,6 +263,7 @@ Just ask Claude naturally:
- Quick formatting check → Claude picks Flash
- Logical debugging → Claude picks O3
- General explanations → Claude picks Flash for speed
- Local analysis → Claude picks your Ollama model
**Pro Tip:** Thinking modes (for Gemini models) control depth vs token cost. Use "minimal" or "low" for quick tasks, "high" or "max" for complex problems. [Learn more](#thinking-modes---managing-token-costs--quality)
@@ -753,8 +772,12 @@ OPENAI_API_KEY=your-openai-key # Enables O3, O3-mini
| **`flash`** (Gemini 2.0 Flash) | Google | 1M tokens | Ultra-fast responses | Quick checks, formatting, simple analysis |
| **`o3`** | OpenAI | 200K tokens | Strong logical reasoning | Debugging logic errors, systematic analysis |
| **`o3-mini`** | OpenAI | 200K tokens | Balanced speed/quality | Moderate complexity tasks |
| **`llama`** (Llama 3.2) | Custom/Local | 128K tokens | Local inference, privacy | On-device analysis, cost-free processing |
| **Any model** | OpenRouter | Varies | Access to GPT-4, Claude, Llama, etc. | User-specified or based on task requirements |
**Mix & Match Providers:** Use multiple providers simultaneously! Set both `OPENROUTER_API_KEY` and `CUSTOM_API_URL` to access
cloud models (expensive/powerful) AND local models (free/private) in the same conversation.
**Manual Model Selection:**
You can specify a default model instead of auto mode: