Support for allowed model restrictions per provider

Tool escalation added to `analyze` so that a graceful switch over to `codereview` is made when absolutely necessary
2025-06-14 10:56:53 +04:00
parent ac9c58ce61
commit 23353734cd
14 changed files with 1037 additions and 79 deletions
--- a/docs/advanced-usage.md
+++ b/docs/advanced-usage.md
@@ -5,6 +5,7 @@ This guide covers advanced features, configuration options, and workflows for po
 ## Table of Contents

 - [Model Configuration](#model-configuration)
+- [Model Usage Restrictions](#model-usage-restrictions)
 - [Thinking Modes](#thinking-modes)
 - [Tool Parameters](#tool-parameters)
 - [Collaborative Workflows](#collaborative-workflows)
@@ -39,6 +40,8 @@ OPENAI_API_KEY=your-openai-key    # Enables O3, O3-mini
 | **`flash`** (Gemini 2.0 Flash) | Google | 1M tokens | Ultra-fast responses | Quick checks, formatting, simple analysis |
 | **`o3`** | OpenAI | 200K tokens | Strong logical reasoning | Debugging logic errors, systematic analysis |
 | **`o3-mini`** | OpenAI | 200K tokens | Balanced speed/quality | Moderate complexity tasks |
+| **`o4-mini`** | OpenAI | 200K tokens | Latest reasoning model | Optimized for shorter contexts |
+| **`o4-mini-high`** | OpenAI | 200K tokens | Enhanced reasoning | Complex tasks requiring deeper analysis |
 | **`llama`** (Llama 3.2) | Custom/Local | 128K tokens | Local inference, privacy | On-device analysis, cost-free processing |
 | **Any model** | OpenRouter | Varies | Access to GPT-4, Claude, Llama, etc. | User-specified or based on task requirements |

@@ -62,12 +65,52 @@ Regardless of your default setting, you can specify models per request:
 - "Use **pro** for deep security analysis of auth.py"
 - "Use **flash** to quickly format this code"
 - "Use **o3** to debug this logic error"
- "Review with **o3-mini** for balanced analysis"
+- "Review with **o4-mini** for balanced analysis"

 **Model Capabilities:**
 - **Gemini Models**: Support thinking modes (minimal to max), web search, 1M context
 - **O3 Models**: Excellent reasoning, systematic analysis, 200K context

+## Model Usage Restrictions
+
+**Limit which models can be used from each provider**
+
+Set environment variables to control model usage:
+
+```env
+# Only allow specific OpenAI models
+OPENAI_ALLOWED_MODELS=o4-mini,o3-mini
+
+# Only allow specific Gemini models
+GOOGLE_ALLOWED_MODELS=flash
+
+# Use shorthand names or full model names
+OPENAI_ALLOWED_MODELS=mini,o3-mini  # mini = o4-mini
+```
+
+**How it works:**
+- **Not set or empty**: All models allowed (default)
+- **Comma-separated list**: Only those models allowed
+- **To disable a provider**: Don't set its API key
+
+**Examples:**
+
+```env
+# Cost control - only cheap models
+OPENAI_ALLOWED_MODELS=o4-mini
+GOOGLE_ALLOWED_MODELS=flash
+
+# Single model per provider
+OPENAI_ALLOWED_MODELS=o4-mini
+GOOGLE_ALLOWED_MODELS=pro
+```
+
+**Notes:**
+- Applies to all usage including auto mode
+- Case-insensitive, whitespace tolerant
+- Server warns at startup about listed names that match no known model (likely typos)
+- Only affects native providers (not OpenRouter/Custom)
+
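The allow-list semantics described in the added section (unset or empty means no restriction, comma-separated names, case-insensitive, whitespace tolerant, shorthand such as `mini` for `o4-mini`) can be sketched roughly as follows. This is an illustration only, not the server's actual implementation: `parse_allowed_models`, `is_allowed`, and the `ALIASES` table are hypothetical names, and treating a value made up only of commas and spaces as "empty" is an assumption.

```python
import os

# Hypothetical shorthand table; the documentation above only confirms
# that "mini" stands for "o4-mini".
ALIASES = {"mini": "o4-mini"}


def parse_allowed_models(raw):
    """Return the set of allowed model names, or None if all are allowed."""
    if raw is None or not raw.strip():
        return None  # not set or empty: no restriction
    names = set()
    for item in raw.split(","):
        name = item.strip().lower()  # whitespace tolerant, case-insensitive
        if name:
            names.add(ALIASES.get(name, name))
    # Assumption: a value with no usable names counts as "empty".
    return names or None


def is_allowed(model, allowed):
    """Check a requested model name against a parsed allow-list."""
    if allowed is None:
        return True
    name = model.strip().lower()
    return ALIASES.get(name, name) in allowed


if __name__ == "__main__":
    allowed = parse_allowed_models(os.environ.get("OPENAI_ALLOWED_MODELS"))
    print(is_allowed("o4-mini", allowed))
```

Under these assumptions, `OPENAI_ALLOWED_MODELS=" O4-Mini , o3-mini "` and `OPENAI_ALLOWED_MODELS=mini,o3-mini` both yield the same two-model allow-list, and an unset variable leaves every model available.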
 ## Thinking Modes

 **Claude automatically manages thinking modes based on task complexity**, but you can also manually control Gemini's reasoning depth to balance between response quality and token consumption. Each thinking mode uses a different amount of tokens, directly affecting API costs and response time.
@@ -135,7 +178,7 @@ All tools that work with files support **both individual files and entire direct
 **`analyze`** - Analyze files or directories
 - `files`: List of file paths or directories (required)
 - `question`: What to analyze (required)  
- `model`: auto|pro|flash|o3|o3-mini (default: server default)
+- `model`: auto|pro|flash|o3|o3-mini|o4-mini|o4-mini-high (default: server default)
 - `analysis_type`: architecture|performance|security|quality|general
 - `output_format`: summary|detailed|actionable
 - `thinking_mode`: minimal|low|medium|high|max (default: medium, Gemini only)
@@ -150,7 +193,7 @@ All tools that work with files support **both individual files and entire direct

 **`codereview`** - Review code files or directories
 - `files`: List of file paths or directories (required)
- `model`: auto|pro|flash|o3|o3-mini (default: server default)
+- `model`: auto|pro|flash|o3|o3-mini|o4-mini|o4-mini-high (default: server default)
 - `review_type`: full|security|performance|quick
 - `focus_on`: Specific aspects to focus on
 - `standards`: Coding standards to enforce
@@ -166,7 +209,7 @@ All tools that work with files support **both individual files and entire direct

 **`debug`** - Debug with file context
 - `error_description`: Description of the issue (required)
- `model`: auto|pro|flash|o3|o3-mini (default: server default)
+- `model`: auto|pro|flash|o3|o3-mini|o4-mini|o4-mini-high (default: server default)
 - `error_context`: Stack trace or logs
 - `files`: Files or directories related to the issue
 - `runtime_info`: Environment details
@@ -182,7 +225,7 @@ All tools that work with files support **both individual files and entire direct

 **`thinkdeep`** - Extended analysis with file context
 - `current_analysis`: Your current thinking (required)
- `model`: auto|pro|flash|o3|o3-mini (default: server default)
+- `model`: auto|pro|flash|o3|o3-mini|o4-mini|o4-mini-high (default: server default)
 - `problem_context`: Additional context
 - `focus_areas`: Specific aspects to focus on
 - `files`: Files or directories for context