Support for allowed model restrictions per provider

Tool escalation added to `analyze` so that a graceful switch over to `codereview` is made when absolutely necessary
Fahad
2025-06-14 10:56:53 +04:00
parent ac9c58ce61
commit 23353734cd
14 changed files with 1037 additions and 79 deletions

@@ -5,6 +5,7 @@ This guide covers advanced features, configuration options, and workflows for po
## Table of Contents
- [Model Configuration](#model-configuration)
- [Model Usage Restrictions](#model-usage-restrictions)
- [Thinking Modes](#thinking-modes)
- [Tool Parameters](#tool-parameters)
- [Collaborative Workflows](#collaborative-workflows)
@@ -39,6 +40,8 @@ OPENAI_API_KEY=your-openai-key # Enables O3, O3-mini
| **`flash`** (Gemini 2.0 Flash) | Google | 1M tokens | Ultra-fast responses | Quick checks, formatting, simple analysis |
| **`o3`** | OpenAI | 200K tokens | Strong logical reasoning | Debugging logic errors, systematic analysis |
| **`o3-mini`** | OpenAI | 200K tokens | Balanced speed/quality | Moderate complexity tasks |
| **`o4-mini`** | OpenAI | 200K tokens | Latest reasoning model | Optimized for shorter contexts |
| **`o4-mini-high`** | OpenAI | 200K tokens | Enhanced reasoning | Complex tasks requiring deeper analysis |
| **`llama`** (Llama 3.2) | Custom/Local | 128K tokens | Local inference, privacy | On-device analysis, cost-free processing |
| **Any model** | OpenRouter | Varies | Access to GPT-4, Claude, Llama, etc. | User-specified or based on task requirements |
@@ -62,12 +65,52 @@ Regardless of your default setting, you can specify models per request:
- "Use **pro** for deep security analysis of auth.py"
- "Use **flash** to quickly format this code"
- "Use **o3** to debug this logic error"
- "Review with **o3-mini** for balanced analysis"
- "Review with **o4-mini** for balanced analysis"
**Model Capabilities:**
- **Gemini Models**: Support thinking modes (minimal to max), web search, 1M context
- **O3 Models**: Excellent reasoning, systematic analysis, 200K context
## Model Usage Restrictions
**Limit which models can be used from each provider**
Set environment variables to control model usage:
```env
# Only allow specific OpenAI models
OPENAI_ALLOWED_MODELS=o4-mini,o3-mini
# Only allow specific Gemini models
GOOGLE_ALLOWED_MODELS=flash
# Use shorthand names or full model names
OPENAI_ALLOWED_MODELS=mini,o3-mini # mini = o4-mini
```
**How it works:**
- **Not set or empty**: All models allowed (default)
- **Comma-separated list**: Only those models allowed
- **To disable a provider**: Don't set its API key
**Examples:**
```env
# Cost control - only cheap models
OPENAI_ALLOWED_MODELS=o4-mini
GOOGLE_ALLOWED_MODELS=flash
# Single model per provider
OPENAI_ALLOWED_MODELS=o4-mini
GOOGLE_ALLOWED_MODELS=pro
```
**Notes:**
- Applies to all usage including auto mode
- Case-insensitive, whitespace tolerant
- Server warns about typos at startup
- Only affects native providers (not OpenRouter/Custom)
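The rules above (unset or empty means unrestricted, comma-separated names, case-insensitive, whitespace tolerant, shorthand names accepted) can be illustrated with a minimal sketch. This is not the server's actual implementation: the function names `parse_allowed_models` and `is_model_allowed` and the `ALIASES` table are hypothetical, and the only shorthand taken from this guide is `mini` for `o4-mini`.

```python
import os

# Hypothetical shorthand table; the guide only states that "mini" means "o4-mini".
ALIASES = {"mini": "o4-mini"}


def parse_allowed_models(raw):
    """Return a set of allowed model names, or None when all models are allowed."""
    if raw is None or not raw.strip():
        return None  # variable not set or empty: no restriction
    # Split on commas, ignore surrounding whitespace and letter case.
    names = {part.strip().lower() for part in raw.split(",") if part.strip()}
    # Expand shorthand names to full model names.
    return {ALIASES.get(name, name) for name in names}


def is_model_allowed(model, allowed):
    """Check a requested model name against the parsed restriction set."""
    if allowed is None:
        return True
    name = model.strip().lower()
    return ALIASES.get(name, name) in allowed


# Example: read the restriction for one provider from the environment.
openai_allowed = parse_allowed_models(os.environ.get("OPENAI_ALLOWED_MODELS"))
```

With `OPENAI_ALLOWED_MODELS=mini,o3-mini`, this sketch would accept `o4-mini` and `o3-mini` and reject `o3`. Returning `None` for "no restriction" keeps the unrestricted default distinct from an explicit list.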
## Thinking Modes
**Claude automatically manages thinking modes based on task complexity**, but you can also manually control Gemini's reasoning depth to balance response quality against token consumption. Each thinking mode uses a different number of tokens, directly affecting API costs and response time.
@@ -135,7 +178,7 @@ All tools that work with files support **both individual files and entire direct
**`analyze`** - Analyze files or directories
- `files`: List of file paths or directories (required)
- `question`: What to analyze (required)
- `model`: auto|pro|flash|o3|o3-mini (default: server default)
- `model`: auto|pro|flash|o3|o3-mini|o4-mini|o4-mini-high (default: server default)
- `analysis_type`: architecture|performance|security|quality|general
- `output_format`: summary|detailed|actionable
- `thinking_mode`: minimal|low|medium|high|max (default: medium, Gemini only)
@@ -150,7 +193,7 @@ All tools that work with files support **both individual files and entire direct
**`codereview`** - Review code files or directories
- `files`: List of file paths or directories (required)
- `model`: auto|pro|flash|o3|o3-mini (default: server default)
- `model`: auto|pro|flash|o3|o3-mini|o4-mini|o4-mini-high (default: server default)
- `review_type`: full|security|performance|quick
- `focus_on`: Specific aspects to focus on
- `standards`: Coding standards to enforce
@@ -166,7 +209,7 @@ All tools that work with files support **both individual files and entire direct
**`debug`** - Debug with file context
- `error_description`: Description of the issue (required)
- `model`: auto|pro|flash|o3|o3-mini (default: server default)
- `model`: auto|pro|flash|o3|o3-mini|o4-mini|o4-mini-high (default: server default)
- `error_context`: Stack trace or logs
- `files`: Files or directories related to the issue
- `runtime_info`: Environment details
@@ -182,7 +225,7 @@ All tools that work with files support **both individual files and entire direct
**`thinkdeep`** - Extended analysis with file context
- `current_analysis`: Your current thinking (required)
- `model`: auto|pro|flash|o3|o3-mini (default: server default)
- `model`: auto|pro|flash|o3|o3-mini|o4-mini|o4-mini-high (default: server default)
- `problem_context`: Additional context
- `focus_areas`: Specific aspects to focus on
- `files`: Files or directories for context