feat: DIAL provider implementation (#112)
## Description

This PR implements a new [DIAL](https://dialx.ai/dial_api) (Data & AI Layer) provider for the Zen MCP Server, enabling unified access to multiple AI models through the DIAL API platform. DIAL provides enterprise-grade AI model access with deployment-specific routing similar to Azure OpenAI.

## Changes Made

- [x] Added `atexit` support:
  - Ensures automatic cleanup of provider resources (HTTP clients, connection pools) on server shutdown
  - Fixed a bug by using `ModelProviderRegistry.get_available_providers()` instead of accessing the private `_providers` attribute
  - Works with SIGTERM/Ctrl+C for graceful shutdown in both development and containerized environments
- [x] Added new DIAL provider (`providers/dial.py`) inheriting from `OpenAICompatibleProvider`
- [x] Updated `server.py` to register the DIAL provider during initialization
- [x] Updated the provider registry to include the DIAL provider type
- [x] Implemented deployment-specific routing for DIAL's Azure OpenAI-style endpoints
- [x] Implemented performance optimizations:
  - Connection pooling with httpx for better performance
  - Thread-safe client caching with a double-checked locking pattern
  - Proper resource cleanup with a `close()` method
- [x] Added comprehensive unit tests with 16 test cases (`tests/test_dial_provider.py`)
- [x] Added DIAL configuration to `.env.example` with documentation
- [x] Added support for a configurable API version via the `DIAL_API_VERSION` environment variable
- [x] Added DIAL model restrictions support via the `DIAL_ALLOWED_MODELS` environment variable

### Supported DIAL Models:

- OpenAI models: o3, o4-mini (and their dated versions)
- Google models: gemini-2.5-pro, gemini-2.5-flash (including the search variant)
- Anthropic models: Claude 4 Opus/Sonnet (with and without thinking mode)

### Environment Variables:

- `DIAL_API_KEY`: Required API key for DIAL authentication
- `DIAL_API_HOST`: Optional base URL (defaults to https://core.dialx.ai)
- `DIAL_API_VERSION`: Optional API version header (defaults to 2025-01-01-preview)
- `DIAL_ALLOWED_MODELS`: Optional comma-separated list of allowed models

### Breaking Changes:

- None

### Dependencies:

- No new dependencies added (uses the existing OpenAI SDK with custom routing)
Committed by GitHub · parent 4ae0344b14 · commit 0623ce3546

Changed file: `.env.example` (45 lines changed)
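The deployment-specific routing mentioned in the change list follows Azure OpenAI's URL scheme: the model name becomes part of the request path and the API version is a query parameter, with the `/openai` suffix auto-appended to the configured host. A minimal sketch, assuming the defaults documented in `.env.example` below (the helper name `dial_chat_url` is hypothetical):

```python
# Hypothetical sketch of DIAL's Azure OpenAI-style deployment routing.
import os


def dial_chat_url(model: str) -> str:
    # Base URL is configured without the /openai suffix; it is appended here.
    host = os.getenv("DIAL_API_HOST", "https://core.dialx.ai").rstrip("/")
    api_version = os.getenv("DIAL_API_VERSION", "2025-01-01-preview")
    # The deployment (model) name is routed in the path, not the request body.
    return (
        f"{host}/openai/deployments/{model}/chat/completions"
        f"?api-version={api_version}"
    )
```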
```diff
@@ -3,8 +3,11 @@
 # API Keys - At least one is required
 #
-# IMPORTANT: Use EITHER OpenRouter OR native APIs (Gemini/OpenAI), not both!
-# Having both creates ambiguity about which provider serves each model.
+# IMPORTANT: Choose ONE approach:
+# - Native APIs (Gemini/OpenAI/XAI) for direct access
+# - DIAL for unified enterprise access
+# - OpenRouter for unified cloud access
+# Having multiple unified providers creates ambiguity about which serves each model.
 #
 # Option 1: Use native APIs (recommended for direct access)
 # Get your Gemini API key from: https://makersuite.google.com/app/apikey
@@ -16,6 +19,12 @@ OPENAI_API_KEY=your_openai_api_key_here
 # Get your X.AI API key from: https://console.x.ai/
 XAI_API_KEY=your_xai_api_key_here
 
+# Get your DIAL API key and configure host URL
+# DIAL provides unified access to multiple AI models through a single API
+DIAL_API_KEY=your_dial_api_key_here
+# DIAL_API_HOST=https://core.dialx.ai # Optional: Base URL without /openai suffix (auto-appended)
+# DIAL_API_VERSION=2025-01-01-preview # Optional: API version header for DIAL requests
+
 # Option 2: Use OpenRouter for access to multiple models through one API
 # Get your OpenRouter API key from: https://openrouter.ai/
 # If using OpenRouter, comment out the native API keys above
@@ -27,7 +36,8 @@ OPENROUTER_API_KEY=your_openrouter_api_key_here
 # CUSTOM_MODEL_NAME=llama3.2 # Default model name
 
 # Optional: Default model to use
-# Options: 'auto' (Claude picks best model), 'pro', 'flash', 'o3', 'o3-mini', 'o4-mini', 'o4-mini-high' etc
+# Options: 'auto' (Claude picks best model), 'pro', 'flash', 'o3', 'o3-mini', 'o4-mini', 'o4-mini-high',
+# 'grok', 'opus-4', 'sonnet-4', or any DIAL model if DIAL is configured
 # When set to 'auto', Claude will select the best model for each task
 # Defaults to 'auto' if not specified
 DEFAULT_MODEL=auto
@@ -70,6 +80,26 @@ DEFAULT_THINKING_MODE_THINKDEEP=high
 # - grok3 (shorthand for grok-3)
 # - grokfast (shorthand for grok-3-fast)
 #
+# Supported DIAL models (when available in your DIAL deployment):
+# - o3-2025-04-16 (200K context, latest O3 release)
+# - o4-mini-2025-04-16 (200K context, latest O4 mini)
+# - o3 (shorthand for o3-2025-04-16)
+# - o4-mini (shorthand for o4-mini-2025-04-16)
+# - anthropic.claude-sonnet-4-20250514-v1:0 (200K context, Claude 4 Sonnet)
+# - anthropic.claude-sonnet-4-20250514-v1:0-with-thinking (200K context, Claude 4 Sonnet with thinking mode)
+# - anthropic.claude-opus-4-20250514-v1:0 (200K context, Claude 4 Opus)
+# - anthropic.claude-opus-4-20250514-v1:0-with-thinking (200K context, Claude 4 Opus with thinking mode)
+# - sonnet-4 (shorthand for Claude 4 Sonnet)
+# - sonnet-4-thinking (shorthand for Claude 4 Sonnet with thinking)
+# - opus-4 (shorthand for Claude 4 Opus)
+# - opus-4-thinking (shorthand for Claude 4 Opus with thinking)
+# - gemini-2.5-pro-preview-03-25-google-search (1M context, with Google Search)
+# - gemini-2.5-pro-preview-05-06 (1M context, latest preview)
+# - gemini-2.5-flash-preview-05-20 (1M context, latest flash preview)
+# - gemini-2.5-pro (shorthand for gemini-2.5-pro-preview-05-06)
+# - gemini-2.5-pro-search (shorthand for gemini-2.5-pro-preview-03-25-google-search)
+# - gemini-2.5-flash (shorthand for gemini-2.5-flash-preview-05-20)
+#
 # Examples:
 # OPENAI_ALLOWED_MODELS=o3-mini,o4-mini,mini # Only allow mini models (cost control)
 # GOOGLE_ALLOWED_MODELS=flash # Only allow Flash (fast responses)
@@ -77,21 +107,26 @@ DEFAULT_THINKING_MODE_THINKDEEP=high
 # OPENAI_ALLOWED_MODELS=o4-mini # Single model standardization
 # GOOGLE_ALLOWED_MODELS=flash,pro # Allow both Gemini models
 # XAI_ALLOWED_MODELS=grok,grok-3-fast # Allow both GROK variants
+# DIAL_ALLOWED_MODELS=o3,o4-mini # Only allow O3/O4 models via DIAL
+# DIAL_ALLOWED_MODELS=opus-4,sonnet-4 # Only Claude 4 models (without thinking)
+# DIAL_ALLOWED_MODELS=opus-4-thinking,sonnet-4-thinking # Only Claude 4 with thinking mode
+# DIAL_ALLOWED_MODELS=gemini-2.5-pro,gemini-2.5-flash # Only Gemini 2.5 models via DIAL
 #
 # Note: These restrictions apply even in 'auto' mode - Claude will only pick from allowed models
 # OPENAI_ALLOWED_MODELS=
 # GOOGLE_ALLOWED_MODELS=
 # XAI_ALLOWED_MODELS=
+# DIAL_ALLOWED_MODELS=
 
 # Optional: Custom model configuration file path
 # Override the default location of custom_models.json
 # CUSTOM_MODELS_CONFIG_PATH=/path/to/your/custom_models.json
 
-# Note: Redis is no longer used - conversations are stored in memory
+# Note: Conversations are stored in memory during the session
 
 # Optional: Conversation timeout (hours)
 # How long AI-to-AI conversation threads persist before expiring
-# Longer timeouts use more Redis memory but allow resuming conversations later
+# Longer timeouts use more memory but allow resuming conversations later
 # Defaults to 3 hours if not specified
 CONVERSATION_TIMEOUT_HOURS=3
 
```