Support for allowed model restrictions per provider
Tool escalation added to `analyze`: a graceful switch over to `codereview` is made when absolutely necessary
@@ -5,6 +5,7 @@ This guide covers advanced features, configuration options, and workflows for po
## Table of Contents

- [Model Configuration](#model-configuration)
- [Model Usage Restrictions](#model-usage-restrictions)
- [Thinking Modes](#thinking-modes)
- [Tool Parameters](#tool-parameters)
- [Collaborative Workflows](#collaborative-workflows)
@@ -39,6 +40,8 @@ OPENAI_API_KEY=your-openai-key # Enables O3, O3-mini
| **`flash`** (Gemini 2.0 Flash) | Google | 1M tokens | Ultra-fast responses | Quick checks, formatting, simple analysis |
| **`o3`** | OpenAI | 200K tokens | Strong logical reasoning | Debugging logic errors, systematic analysis |
| **`o3-mini`** | OpenAI | 200K tokens | Balanced speed/quality | Moderate complexity tasks |
| **`o4-mini`** | OpenAI | 200K tokens | Latest reasoning model | Optimized for shorter contexts |
| **`o4-mini-high`** | OpenAI | 200K tokens | Enhanced reasoning | Complex tasks requiring deeper analysis |
| **`llama`** (Llama 3.2) | Custom/Local | 128K tokens | Local inference, privacy | On-device analysis, cost-free processing |
| **Any model** | OpenRouter | Varies | Access to GPT-4, Claude, Llama, etc. | User-specified or based on task requirements |
@@ -62,12 +65,52 @@ Regardless of your default setting, you can specify models per request:
- "Use **pro** for deep security analysis of auth.py"
- "Use **flash** to quickly format this code"
- "Use **o3** to debug this logic error"
- "Review with **o4-mini** for balanced analysis"

**Model Capabilities:**

- **Gemini Models**: Support thinking modes (minimal to max), web search, 1M context
- **O3 Models**: Excellent reasoning, systematic analysis, 200K context

## Model Usage Restrictions

**Limit which models can be used from each provider**

Set environment variables to control model usage:
```env
# Only allow specific OpenAI models
OPENAI_ALLOWED_MODELS=o4-mini,o3-mini

# Only allow specific Gemini models
GOOGLE_ALLOWED_MODELS=flash

# Use shorthand names or full model names
OPENAI_ALLOWED_MODELS=mini,o3-mini # mini = o4-mini
```
**How it works:**

- **Not set or empty**: All models allowed (default)
- **Comma-separated list**: Only those models allowed
- **To disable a provider**: Don't set its API key

**Examples:**
```env
# Cost control - only cheap models
OPENAI_ALLOWED_MODELS=o4-mini
GOOGLE_ALLOWED_MODELS=flash

# Single model per provider
OPENAI_ALLOWED_MODELS=o4-mini
GOOGLE_ALLOWED_MODELS=pro
```
**Notes:**

- Applies to all usage including auto mode
- Case-insensitive, whitespace tolerant
- Server warns about typos at startup
- Only affects native providers (not OpenRouter/Custom)
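
To make this concrete, here is a minimal Python sketch of how such a restriction list might be parsed and enforced. It is an illustration under assumptions, not the server's actual internals; the function names are invented:

```python
# Hypothetical sketch of allowed-model parsing and checking.
# Function names are illustrative, not the server's real API.
import os


def parse_allowed_models(var_name: str) -> set[str] | None:
    """Return the allowed set, or None when unset/empty (all models allowed)."""
    raw = os.environ.get(var_name, "")
    if not raw.strip():
        return None  # not set or empty: no restriction
    # Comma-separated, case-insensitive, whitespace tolerant
    return {name.strip().lower() for name in raw.split(",") if name.strip()}


def is_model_allowed(model: str, allowed: set[str] | None) -> bool:
    return allowed is None or model.lower() in allowed


# With OPENAI_ALLOWED_MODELS=" O4-mini , o3-mini ", both checks pass:
allowed = parse_allowed_models("OPENAI_ALLOWED_MODELS")
print(is_model_allowed("o4-mini", allowed))
print(is_model_allowed("O3-MINI", allowed))
```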
## Thinking Modes
**Claude automatically manages thinking modes based on task complexity**, but you can also manually control Gemini's reasoning depth to balance response quality against token consumption. Each thinking mode uses a different number of tokens, directly affecting API costs and response time.
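
For example (illustrative prompts):

- "Use **pro** with max thinking mode for a deep architectural review"
- "Use **pro** with minimal thinking mode for a quick sanity check"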
@@ -135,7 +178,7 @@ All tools that work with files support **both individual files and entire direct
**`analyze`** - Analyze files or directories
- `files`: List of file paths or directories (required)
- `question`: What to analyze (required)
- `model`: auto|pro|flash|o3|o3-mini|o4-mini|o4-mini-high (default: server default)
- `analysis_type`: architecture|performance|security|quality|general
- `output_format`: summary|detailed|actionable
- `thinking_mode`: minimal|low|medium|high|max (default: medium, Gemini only)
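
As an illustration, an `analyze` request might carry arguments shaped like this (a hypothetical Python payload; paths and values are invented, and the exact wire format depends on your MCP client):

```python
# Hypothetical analyze arguments; paths and values are invented.
analyze_args = {
    "files": ["src/auth.py", "src/session/"],  # files or directories
    "question": "Are there race conditions in session handling?",
    "model": "pro",
    "analysis_type": "security",
    "output_format": "actionable",
    "thinking_mode": "high",  # Gemini only
}
```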
@@ -150,7 +193,7 @@ All tools that work with files support **both individual files and entire direct
**`codereview`** - Review code files or directories
- `files`: List of file paths or directories (required)
- `model`: auto|pro|flash|o3|o3-mini|o4-mini|o4-mini-high (default: server default)
- `review_type`: full|security|performance|quick
- `focus_on`: Specific aspects to focus on
- `standards`: Coding standards to enforce
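
A comparable sketch for `codereview` (hypothetical values):

```python
# Hypothetical codereview arguments; values are invented.
codereview_args = {
    "files": ["src/"],
    "model": "o4-mini-high",
    "review_type": "security",
    "focus_on": "input validation and authentication flows",
    "standards": "PEP 8",
}
```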
@@ -166,7 +209,7 @@ All tools that work with files support **both individual files and entire direct
**`debug`** - Debug with file context
- `error_description`: Description of the issue (required)
- `model`: auto|pro|flash|o3|o3-mini|o4-mini|o4-mini-high (default: server default)
- `error_context`: Stack trace or logs
- `files`: Files or directories related to the issue
- `runtime_info`: Environment details
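
And for `debug` (hypothetical values):

```python
# Hypothetical debug arguments; values are invented.
debug_args = {
    "error_description": "Login intermittently fails with a KeyError",
    "model": "o3",
    "error_context": "Traceback (most recent call last): ...",
    "files": ["src/auth.py"],
    "runtime_info": "Python 3.11 on Ubuntu 22.04",
}
```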
@@ -182,7 +225,7 @@ All tools that work with files support **both individual files and entire direct
**`thinkdeep`** - Extended analysis with file context
- `current_analysis`: Your current thinking (required)
- `model`: auto|pro|flash|o3|o3-mini|o4-mini|o4-mini-high (default: server default)
- `problem_context`: Additional context
- `focus_areas`: Specific aspects to focus on
- `files`: Files or directories for context
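
And for `thinkdeep` (hypothetical values):

```python
# Hypothetical thinkdeep arguments; values are invented.
thinkdeep_args = {
    "current_analysis": "The cache layer appears to be the bottleneck",
    "model": "pro",
    "problem_context": "p95 latency doubled after the last deploy",
    "focus_areas": "cache invalidation, connection pooling",
    "files": ["src/cache/"],
}
```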