feat: added intelligence_score to the model capabilities schema; a 1-20 number that can be specified to influence the sort order of models presented to the CLI in auto selection mode

fix: re-introduced the model definition into the schema, but intelligently: only a summary is generated per tool. Required to ensure the CLI calls and uses the correct model
fix: removed `model` param from some tools where this wasn't needed
fix: enforced adherence to `*_ALLOWED_MODELS` by advertising only the allowed models to the CLI
fix: removed duplicates across providers when passing canonical names back to the CLI; the first enabled provider wins
Fahad
2025-10-02 21:43:44 +04:00
parent e78fe35a1b
commit 6cab9e56fc
22 changed files with 525 additions and 110 deletions


@@ -10,6 +10,13 @@ Each provider:
- Implements the minimal abstract hooks (`get_provider_type()` and `generate_content()`)
- Gets registered automatically via environment variables
### Intelligence score cheatsheet
Set `intelligence_score` (1–20) when you want deterministic ordering in auto
mode or the `listmodels` output. The runtime rank starts from this human score
and adds smaller bonuses for context window, extended thinking, and other
features ([details here](model_ranking.md)).
## Choose Your Implementation Path
**Option A: Full Provider (`ModelProvider`)**
@@ -68,6 +75,7 @@ class ExampleModelProvider(ModelProvider):
provider=ProviderType.EXAMPLE,
model_name="example-large",
friendly_name="Example Large",
intelligence_score=18,
context_window=100_000,
max_output_tokens=50_000,
supports_extended_thinking=False,
@@ -79,6 +87,7 @@ class ExampleModelProvider(ModelProvider):
provider=ProviderType.EXAMPLE,
model_name="example-small",
friendly_name="Example Small",
intelligence_score=14,
context_window=32_000,
max_output_tokens=16_000,
temperature_constraint=RangeTemperatureConstraint(0.0, 2.0, 0.7),


@@ -60,6 +60,10 @@ The server uses `conf/custom_models.json` to map convenient aliases to both Open
View the full list in [`conf/custom_models.json`](conf/custom_models.json).
To control ordering in auto mode or the `listmodels` summary, adjust the
[`intelligence_score`](model_ranking.md) for each entry (or rely on the automatic
heuristic described there).
**Note:** While you can use any OpenRouter model by its full name, models not in the config file will use generic capabilities (32K context window, no extended thinking, etc.) which may not match the model's actual capabilities. For best results, add new models to the config file with their proper specifications.
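If you want a quick sanity check on the ordering your scores will produce, the sketch below sorts the configured entries by score. It is illustrative only and assumes the file exposes a top-level `models` list whose entries carry `model_name` and `intelligence_score` fields; verify those names against the actual file.
```python
import json

# Illustrative only: list configured models from highest to lowest human score.
# The "models" key and per-entry field names are assumptions for this sketch.
with open("conf/custom_models.json") as fh:
    config = json.load(fh)

entries = config.get("models", [])
for entry in sorted(entries, key=lambda m: m.get("intelligence_score", 0), reverse=True):
    print(entry.get("model_name", "<unnamed>"), entry.get("intelligence_score", "unscored"))
```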
## Quick Start

docs/index.md (new file, 15 lines)

@@ -0,0 +1,15 @@
# Zen MCP Server Documentation
| Document | Description |
|----------|-------------|
| [Getting Started](getting-started.md) | Installation paths, prerequisite setup, and first-run guidance. |
| [Adding Providers](adding_providers.md) | How to register new AI providers and advertise capabilities. |
| [Model Ranking](model_ranking.md) | How intelligence scores translate into auto-mode ordering. |
| [Custom Models](custom_models.md) | Configure OpenRouter/custom models and aliases. |
| [Adding Tools](adding_tools.md) | Create new tools using the shared base classes. |
| [Advanced Usage](advanced-usage.md) | Auto-mode tricks, workflow tools, and collaboration tips. |
| [Configuration](configuration.md) | .env options, restriction policies, logging levels. |
| [Testing](testing.md) | Test strategy, command cheats, and coverage notes. |
| [Troubleshooting](troubleshooting.md) | Common issues and resolutions. |
Additional docs live in this directory; start with the table above to orient yourself.

docs/model_ranking.md (new file, 69 lines)

@@ -0,0 +1,69 @@
# Model Capability Ranking
Auto mode needs a short, trustworthy list of models to suggest. The server
computes a capability rank for every model at runtime using a simple recipe:
1. Start with the human-supplied `intelligence_score` (1–20). This is the
anchor: multiply it by five to map onto the 0–100 scale the server uses.
2. Add a few light bonuses for hard capabilities:
- **Context window:** up to +5 (log-scale bonus when the model exceeds ~1K tokens).
- **Output budget:** +2 for ≥65K tokens, +1 for ≥32K.
- **Extended thinking:** +3 when the provider supports it.
- **Function calling / JSON / images:** +1 each when available.
- **Custom endpoints:** −1 to nudge cloud-hosted defaults ahead unless tuned.
3. Clamp the final score to 0–100 so downstream callers can rely on the range.
In code this looks like:
```python
from math import log10

def clamp(value, lo, hi):
    return max(lo, min(hi, value))

base = clamp(intelligence_score, 1, 20) * 5                    # human anchor, mapped to 5-100
ctx_bonus = min(5, max(0, log10(context_window) - 3))          # up to +5 for a large context window
output_bonus = 2 if max_output_tokens >= 65_000 else (1 if max_output_tokens >= 32_000 else 0)
feature_bonus = (
    (3 if supports_extended_thinking else 0)
    + (1 if supports_function_calling else 0)
    + (1 if supports_json_mode else 0)
    + (1 if supports_images else 0)
)
penalty = 1 if is_custom else 0                                # nudge custom endpoints slightly down
effective_rank = clamp(base + ctx_bonus + output_bonus + feature_bonus - penalty, 0, 100)
```
The bonuses are intentionally small—the human intelligence score does most
of the work so you can enforce organisational preferences easily.
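To put numbers on that, here is a small worked example with the formula above; the capability values are made up purely for illustration.
```python
# A score-18 model with modest bonuses still outranks a score-14 model that
# collects every bonus: the maximum total bonus (5 + 2 + 3 + 1 + 1 + 1 = 13)
# is smaller than the 20 base points a four-point intelligence gap is worth.
strong = min(100, 18 * 5 + 3 + 1)                     # base 90, ctx +3, output +1 -> 94
maxed_out = min(100, 14 * 5 + 5 + 2 + 3 + 1 + 1 + 1)  # base 70 plus every bonus -> 83
assert strong > maxed_out
```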
## Picking an intelligence score
A straightforward rubric that mirrors typical provider tiers:
| Intelligence | Guidance |
|--------------|----------|
| 18–19 | Frontier reasoning models (Gemini 2.5 Pro, GPT-5) |
| 15–17 | Strong general models with large context (O3 Pro, DeepSeek R1) |
| 12–14 | Balanced assistants (Claude Opus/Sonnet, Mistral Large) |
| 9–11 | Fast distillations (Gemini Flash, GPT-5 Mini, Mistral Medium) |
| 6–8 | Local or efficiency-focused models (Llama 3 70B, Claude Haiku) |
| ≤5 | Experimental/lightweight models |
Record the reasoning for your scores so future updates stay consistent.
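One lightweight way to do that is to keep the rationale next to the number. The snippet below is purely illustrative and reuses the example models from the provider walkthrough earlier.
```python
# Hypothetical bookkeeping: pair each score with a one-line rationale so the
# next person tuning the rubric can see why the number was chosen.
INTELLIGENCE_SCORES = {
    "example-large": (18, "frontier-tier reasoning, 100K context"),
    "example-small": (14, "balanced assistant, cheap and fast"),
}
```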
## How the rank is used
The ranked list is cached per provider and consumed by:
- Tool schemas (`model` parameter descriptions) when auto mode is active.
- The `listmodels` tool's “top models” sections.
- Fallback messaging when a requested model is unavailable.
Because the rank is computed after restriction filters, only allowed models
appear in these summaries.
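A hedged sketch of that order of operations (filter first, rank second) is shown below; `OPENROUTER_ALLOWED_MODELS` stands in for the `*_ALLOWED_MODELS` family of environment variables, and the `effective_rank` attribute is assumed for illustration.
```python
import os

# Illustrative sketch, not the server's actual code: apply the restriction
# filter first, then rank the survivors, so disallowed models never reach
# tool schemas, `listmodels`, or fallback messages.
allowed = {
    name.strip()
    for name in os.environ.get("OPENROUTER_ALLOWED_MODELS", "").split(",")
    if name.strip()
}

def visible_models(capabilities):
    # `capabilities` is assumed to be an iterable of objects exposing
    # .model_name and .effective_rank for the purposes of this sketch.
    pool = [c for c in capabilities if not allowed or c.model_name in allowed]
    return sorted(pool, key=lambda c: c.effective_rank, reverse=True)
```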
## Customising further
If you need a different weighting you can:
- Override `intelligence_score` in your provider or custom model config.
- Subclass the provider and override `get_effective_capability_rank()` (see the sketch after this list).
- Post-process the rank via `get_capabilities_by_rank()` before surfacing it.
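For the subclassing route, a minimal sketch of an override is below. The method name comes from the list above, but its exact signature in the provider base class may differ, so treat this as illustrative rather than a drop-in implementation.
```python
class PinnedRankProvider(ExampleModelProvider):
    """Illustrative override: pin selected models to the top regardless of the
    computed rank. Assumes get_effective_capability_rank(model_name) -> int."""

    PINNED_RANKS = {"example-large": 99}

    def get_effective_capability_rank(self, model_name):
        if model_name in self.PINNED_RANKS:
            return self.PINNED_RANKS[model_name]
        return super().get_effective_capability_rank(model_name)
```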
Most teams find that adjusting `intelligence_score` alone is enough to keep
auto mode honest without revisiting code.