Quick test mode for simulation tests

Fixed o4-mini name, OpenAI removed o4-mini-high
Add max_output_tokens property to ModelCapabilities
Fahad
2025-06-23 18:33:47 +04:00
parent 8c1814d4eb
commit ce6c1fd7ea
35 changed files with 137 additions and 110 deletions
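
The commit message mentions a new `max_output_tokens` property on `ModelCapabilities`, but the hunks shown here only cover the documentation changes. As a rough sketch of what such a property could look like (the dataclass layout, the other field names, the `clamp_output` helper, and the numbers are all assumptions for illustration, not the project's actual code):

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ModelCapabilities:
    """Hypothetical shape: only the max_output_tokens name comes from the commit message."""

    model_name: str
    context_window: int     # total tokens the model can take as context
    max_output_tokens: int  # upper bound on tokens generated per response

    def clamp_output(self, requested: int) -> int:
        """Limit a requested output budget to what the model allows (hypothetical helper)."""
        return max(0, min(requested, self.max_output_tokens))


# Illustrative numbers only; check the provider's documentation for real limits.
o4_mini = ModelCapabilities("o4-mini", context_window=200_000, max_output_tokens=100_000)
print(o4_mini.clamp_output(250_000))  # 100000
```

Keeping the output limit separate from the context window lets a caller budget a response without assuming the two values are equal.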

@@ -38,7 +38,6 @@ Regardless of your default configuration, you can specify models per request:
| **`o3`** | OpenAI | 200K tokens | Strong logical reasoning | Debugging logic errors, systematic analysis |
| **`o3-mini`** | OpenAI | 200K tokens | Balanced speed/quality | Moderate complexity tasks |
| **`o4-mini`** | OpenAI | 200K tokens | Latest reasoning model | Optimized for shorter contexts |
-| **`o4-mini-high`** | OpenAI | 200K tokens | Enhanced reasoning | Complex tasks requiring deeper analysis |
| **`gpt4.1`** | OpenAI | 1M tokens | Latest GPT-4 with extended context | Large codebase analysis, comprehensive reviews |
| **`llama`** (Llama 3.2) | Custom/Local | 128K tokens | Local inference, privacy | On-device analysis, cost-free processing |
| **Any model** | OpenRouter | Varies | Access to GPT-4, Claude, Llama, etc. | User-specified or based on task requirements |
@@ -69,7 +68,7 @@ OPENAI_ALLOWED_MODELS=o4-mini
# High-performance: Quality over cost
GOOGLE_ALLOWED_MODELS=pro
-OPENAI_ALLOWED_MODELS=o3,o4-mini-high
+OPENAI_ALLOWED_MODELS=o3,o4-mini
```
**Important Notes:**
@@ -144,7 +143,7 @@ All tools that work with files support **both individual files and entire direct
**`analyze`** - Analyze files or directories
- `files`: List of file paths or directories (required)
- `question`: What to analyze (required)
-- `model`: auto|pro|flash|o3|o3-mini|o4-mini|o4-mini-high|gpt4.1 (default: server default)
+- `model`: auto|pro|flash|o3|o3-mini|o4-mini|gpt4.1 (default: server default)
- `analysis_type`: architecture|performance|security|quality|general
- `output_format`: summary|detailed|actionable
- `thinking_mode`: minimal|low|medium|high|max (default: medium, Gemini only)
@@ -159,7 +158,7 @@ All tools that work with files support **both individual files and entire direct
**`codereview`** - Review code files or directories
- `files`: List of file paths or directories (required)
-- `model`: auto|pro|flash|o3|o3-mini|o4-mini|o4-mini-high|gpt4.1 (default: server default)
+- `model`: auto|pro|flash|o3|o3-mini|o4-mini|gpt4.1 (default: server default)
- `review_type`: full|security|performance|quick
- `focus_on`: Specific aspects to focus on
- `standards`: Coding standards to enforce
@@ -175,7 +174,7 @@ All tools that work with files support **both individual files and entire direct
**`debug`** - Debug with file context
- `error_description`: Description of the issue (required)
-- `model`: auto|pro|flash|o3|o3-mini|o4-mini|o4-mini-high|gpt4.1 (default: server default)
+- `model`: auto|pro|flash|o3|o3-mini|o4-mini|gpt4.1 (default: server default)
- `error_context`: Stack trace or logs
- `files`: Files or directories related to the issue
- `runtime_info`: Environment details
@@ -191,7 +190,7 @@ All tools that work with files support **both individual files and entire direct
**`thinkdeep`** - Extended analysis with file context
- `current_analysis`: Your current thinking (required)
-- `model`: auto|pro|flash|o3|o3-mini|o4-mini|o4-mini-high|gpt4.1 (default: server default)
+- `model`: auto|pro|flash|o3|o3-mini|o4-mini|gpt4.1 (default: server default)
- `problem_context`: Additional context
- `focus_areas`: Specific aspects to focus on
- `files`: Files or directories for context
@@ -207,7 +206,7 @@ All tools that work with files support **both individual files and entire direct
**`testgen`** - Comprehensive test generation with edge case coverage
- `files`: Code files or directories to generate tests for (required)
- `prompt`: Description of what to test, testing objectives, and scope (required)
-- `model`: auto|pro|flash|o3|o3-mini|o4-mini|o4-mini-high|gpt4.1 (default: server default)
+- `model`: auto|pro|flash|o3|o3-mini|o4-mini|gpt4.1 (default: server default)
- `test_examples`: Optional existing test files as style/pattern reference
- `thinking_mode`: minimal|low|medium|high|max (default: medium, Gemini only)
@@ -222,7 +221,7 @@ All tools that work with files support **both individual files and entire direct
- `files`: Code files or directories to analyze for refactoring opportunities (required)
- `prompt`: Description of refactoring goals, context, and specific areas of focus (required)
- `refactor_type`: codesmells|decompose|modernize|organization (required)
-- `model`: auto|pro|flash|o3|o3-mini|o4-mini|o4-mini-high|gpt4.1 (default: server default)
+- `model`: auto|pro|flash|o3|o3-mini|o4-mini|gpt4.1 (default: server default)
- `focus_areas`: Specific areas to focus on (e.g., 'performance', 'readability', 'maintainability', 'security')
- `style_guide_examples`: Optional existing code files to use as style/pattern reference
- `thinking_mode`: minimal|low|medium|high|max (default: medium, Gemini only)
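
After this change, every tool's `model` parameter accepts the same choices, and `o4-mini-high` is no longer one of them. A minimal sketch of how a caller might check a model name against that list before sending a request; the `validate_model` function and its `"auto"` fallback are hypothetical and not part of the project (the docs only say the default is the server default):

```python
from typing import Optional

# Model choices copied from the updated parameter docs in this diff.
SUPPORTED_MODELS = ("auto", "pro", "flash", "o3", "o3-mini", "o4-mini", "gpt4.1")


def validate_model(name: Optional[str], default: str = "auto") -> str:
    """Return a normalized model name, or raise ValueError if it is not listed.

    The "auto" fallback is an assumption made for this sketch.
    """
    if name is None:
        return default
    normalized = name.strip().lower()
    if normalized not in SUPPORTED_MODELS:
        choices = "|".join(SUPPORTED_MODELS)
        raise ValueError(f"Unknown model {name!r}; expected one of {choices}")
    return normalized


print(validate_model("O4-Mini"))  # o4-mini
try:
    validate_model("o4-mini-high")
except ValueError as exc:
    print(exc)  # the removed name is rejected
```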

@@ -63,7 +63,7 @@ CUSTOM_MODEL_NAME=llama3.2 # Default model
**Default Model Selection:**
```env
-# Options: 'auto', 'pro', 'flash', 'o3', 'o3-mini', 'o4-mini', 'o4-mini-high', etc.
+# Options: 'auto', 'pro', 'flash', 'o3', 'o3-mini', 'o4-mini', etc.
DEFAULT_MODEL=auto # Claude picks best model for each task (recommended)
```
@@ -74,7 +74,6 @@ DEFAULT_MODEL=auto # Claude picks best model for each task (recommended)
- **`o3`**: Strong logical reasoning (200K context)
- **`o3-mini`**: Balanced speed/quality (200K context)
- **`o4-mini`**: Latest reasoning model, optimized for shorter contexts
-- **`o4-mini-high`**: Enhanced O4 with higher reasoning effort
- **`grok`**: GROK-3 advanced reasoning (131K context)
- **Custom models**: via OpenRouter or local APIs
@@ -120,7 +119,6 @@ OPENROUTER_ALLOWED_MODELS=opus,sonnet,mistral
- `o3` (200K context, high reasoning)
- `o3-mini` (200K context, balanced)
- `o4-mini` (200K context, latest balanced)
-- `o4-mini-high` (200K context, enhanced reasoning)
- `mini` (shorthand for o4-mini)
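
The `*_ALLOWED_MODELS` variables take a comma-separated list, and `mini` is documented as shorthand for `o4-mini`. A minimal sketch of how such a restriction could be evaluated, assuming names are case-insensitive and an empty list means no restriction; the function names and those two behaviours are assumptions, not the server's actual logic:

```python
import os

# From the docs above: `mini` is shorthand for o4-mini.
ALIASES = {"mini": "o4-mini"}


def canonical(name: str) -> str:
    """Lower-case, trim, and resolve a shorthand to its full model name."""
    key = name.strip().lower()
    return ALIASES.get(key, key)


def parse_allowed_models(raw: str) -> set:
    """Turn 'o3, o4-mini' into {'o3', 'o4-mini'}, skipping empty entries."""
    return {canonical(part) for part in raw.split(",") if part.strip()}


def is_allowed(model: str, raw_allowed: str) -> bool:
    """Assumption: an empty allow-list means every model is permitted."""
    allowed = parse_allowed_models(raw_allowed)
    return not allowed or canonical(model) in allowed


raw = os.environ.get("OPENAI_ALLOWED_MODELS", "o3,o4-mini")
print(is_allowed("mini", raw))
```

With the updated high-performance example (`o3,o4-mini`), a request for `o4-mini-high` would be refused under this sketch, while `mini` resolves to `o4-mini` and passes.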
**Gemini Models:**

@@ -65,7 +65,7 @@ This workflow ensures methodical analysis before expert insights, resulting in d
**Initial Configuration (used in step 1):**
- `prompt`: What to analyze or look for (required)
-- `model`: auto|pro|flash|o3|o3-mini|o4-mini|o4-mini-high|gpt4.1 (default: server default)
+- `model`: auto|pro|flash|o3|o3-mini|o4-mini|gpt4.1 (default: server default)
- `analysis_type`: architecture|performance|security|quality|general (default: general)
- `output_format`: summary|detailed|actionable (default: detailed)
- `temperature`: Temperature for analysis (0-1, default 0.2)

@@ -33,7 +33,7 @@ and then debate with the other models to give me a final verdict
## Tool Parameters
- `prompt`: Your question or discussion topic (required)
-- `model`: auto|pro|flash|o3|o3-mini|o4-mini|o4-mini-high|gpt4.1 (default: server default)
+- `model`: auto|pro|flash|o3|o3-mini|o4-mini|gpt4.1 (default: server default)
- `files`: Optional files for context (absolute paths)
- `images`: Optional images for visual context (absolute paths)
- `temperature`: Response creativity (0-1, default 0.5)

@@ -80,7 +80,7 @@ The above prompt will simultaneously run two separate `codereview` tools with tw
**Initial Review Configuration (used in step 1):**
- `prompt`: User's summary of what the code does, expected behavior, constraints, and review objectives (required)
-- `model`: auto|pro|flash|o3|o3-mini|o4-mini|o4-mini-high|gpt4.1 (default: server default)
+- `model`: auto|pro|flash|o3|o3-mini|o4-mini|gpt4.1 (default: server default)
- `review_type`: full|security|performance|quick (default: full)
- `focus_on`: Specific aspects to focus on (e.g., "security vulnerabilities", "performance bottlenecks")
- `standards`: Coding standards to enforce (e.g., "PEP8", "ESLint", "Google Style Guide")

@@ -73,7 +73,7 @@ This structured approach ensures Claude performs methodical groundwork before ex
- `images`: Visual debugging materials (error screenshots, logs, etc.)
**Model Selection:**
-- `model`: auto|pro|flash|o3|o3-mini|o4-mini|o4-mini-high (default: server default)
+- `model`: auto|pro|flash|o3|o3-mini|o4-mini (default: server default)
- `thinking_mode`: minimal|low|medium|high|max (default: medium, Gemini only)
- `use_websearch`: Enable web search for documentation and solutions (default: true)
- `use_assistant_model`: Whether to use expert analysis phase (default: true, set to false to use Claude only)

@@ -135,7 +135,7 @@ Use zen and perform a thorough precommit ensuring there aren't any new regressio
**Initial Configuration (used in step 1):**
- `path`: Starting directory to search for repos (default: current directory, absolute path required)
- `prompt`: The original user request description for the changes (required for context)
-- `model`: auto|pro|flash|o3|o3-mini|o4-mini|o4-mini-high|gpt4.1 (default: server default)
+- `model`: auto|pro|flash|o3|o3-mini|o4-mini|gpt4.1 (default: server default)
- `compare_to`: Compare against a branch/tag instead of local changes (optional)
- `severity_filter`: critical|high|medium|low|all (default: all)
- `include_staged`: Include staged changes in the review (default: true)

@@ -103,7 +103,7 @@ This results in Claude first performing its own expert analysis, encouraging it
**Initial Configuration (used in step 1):**
- `prompt`: Description of refactoring goals, context, and specific areas of focus (required)
- `refactor_type`: codesmells|decompose|modernize|organization (default: codesmells)
-- `model`: auto|pro|flash|o3|o3-mini|o4-mini|o4-mini-high|gpt4.1 (default: server default)
+- `model`: auto|pro|flash|o3|o3-mini|o4-mini|gpt4.1 (default: server default)
- `focus_areas`: Specific areas to focus on (e.g., 'performance', 'readability', 'maintainability', 'security')
- `style_guide_examples`: Optional existing code files to use as style/pattern reference (absolute paths)
- `thinking_mode`: minimal|low|medium|high|max (default: medium, Gemini only)

@@ -86,7 +86,7 @@ security remediation plan using planner
- `images`: Architecture diagrams, security documentation, or visual references
**Initial Security Configuration (used in step 1):**
-- `model`: auto|pro|flash|o3|o3-mini|o4-mini|o4-mini-high|gpt4.1 (default: server default)
+- `model`: auto|pro|flash|o3|o3-mini|o4-mini|gpt4.1 (default: server default)
- `security_scope`: Application context, technology stack, and security boundary definition (required)
- `threat_level`: low|medium|high|critical (default: medium) - determines assessment depth and urgency
- `compliance_requirements`: List of compliance frameworks to assess against (e.g., ["PCI DSS", "SOC2"])

@@ -70,7 +70,7 @@ Test generation excels with extended reasoning models like Gemini Pro or O3, whi
**Initial Configuration (used in step 1):**
- `prompt`: Description of what to test, testing objectives, and specific scope/focus areas (required)
-- `model`: auto|pro|flash|o3|o3-mini|o4-mini|o4-mini-high|gpt4.1 (default: server default)
+- `model`: auto|pro|flash|o3|o3-mini|o4-mini|gpt4.1 (default: server default)
- `test_examples`: Optional existing test files or directories to use as style/pattern reference (absolute paths)
- `thinking_mode`: minimal|low|medium|high|max (default: medium, Gemini only)
- `use_assistant_model`: Whether to use expert test generation phase (default: true, set to false to use Claude only)

@@ -30,7 +30,7 @@ with the best architecture for my project
## Tool Parameters
- `prompt`: Your current thinking/analysis to extend and validate (required)
-- `model`: auto|pro|flash|o3|o3-mini|o4-mini|o4-mini-high|gpt4.1 (default: server default)
+- `model`: auto|pro|flash|o3|o3-mini|o4-mini|gpt4.1 (default: server default)
- `problem_context`: Additional context about the problem or goal
- `focus_areas`: Specific aspects to focus on (architecture, performance, security, etc.)
- `files`: Optional file paths or directories for additional context (absolute paths)