Commit Graph

295 Commits

Author SHA1 Message Date
Fahad
f4c20d2a20 fix: handler for parsing multiple generated code blocks 2025-10-18 00:28:17 +04:00
Fahad
95e69a7cb2 fix: improved error reporting; codex cli would at times fail to figure out how to handle plain-text / JSON errors
fix: working directory must already exist; raise an error instead of trying to create it
docs: improved API Lookup instructions
* test added to confirm failures
* chat schema more explicit about file paths
2025-10-17 23:42:32 +04:00
Fahad
9ffca53ce5 feat!: Claude Code is now supported as a CLI agent. Mix and match: spawn Claude Code from within Claude Code, or Claude Code from within Codex.
Stay in Codex to plan, review, and fix complicated bugs, then ask it to spawn Claude Code to implement the plan.

This uses your current subscription instead of API tokens.
2025-10-08 11:14:22 +04:00
Beehive Innovations
d80d77bb47 Merge pull request #279 from christopher-buss/fix-windows-clink
fix: resolve executable path for clink cross-platform compatibility in CLI
2025-10-08 08:11:04 +04:00
christopher-buss
4370be33b4 test: fix clink agent tests to mock shutil.which() for executable resolution
The previous commit (f98046c) added shutil.which() to resolve executables,
which broke two tests that only mocked subprocess execution. This commit
adds shutil.which() mocking to both test files to restore test compatibility.

Co-authored-by: Claude <noreply@anthropic.com>
2025-10-07 17:39:41 +01:00
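
A minimal sketch of the testing pattern described in the commit above: once code resolves an executable with `shutil.which()` before calling `subprocess`, a test that only mocks the subprocess call will fail on machines where the binary is absent, so both lookups need patching. The helper `run_cli`, the executable name `fake-cli`, and the path `/usr/bin/fake-cli` are hypothetical names for illustration, not the project's actual code.

```python
from __future__ import annotations

import shutil
import subprocess
from unittest import mock


def run_cli(executable: str, args: list[str]) -> str:
    """Hypothetical helper: resolve the executable on PATH, then run it."""
    resolved = shutil.which(executable)
    if resolved is None:
        raise FileNotFoundError(f"Executable not found on PATH: {executable}")
    completed = subprocess.run(
        [resolved, *args], capture_output=True, text=True, check=False
    )
    return completed.stdout


def demo_mocked_run() -> tuple[str, list[str]]:
    """Patch both the PATH lookup and the subprocess call, so no real binary is needed."""
    fake_result = subprocess.CompletedProcess(
        args=[], returncode=0, stdout="ok\n", stderr=""
    )
    with mock.patch("shutil.which", return_value="/usr/bin/fake-cli") as which_mock, \
            mock.patch("subprocess.run", return_value=fake_result) as run_mock:
        output = run_cli("fake-cli", ["--version"])
        which_mock.assert_called_once_with("fake-cli")
        # First positional argument passed to subprocess.run is the command list.
        invoked_command = run_mock.call_args.args[0]
    return output, invoked_command
```

Patching `shutil.which` alone would still launch a real process, and patching only `subprocess.run` leaves the test dependent on the host's PATH, which is the failure mode the commit message describes.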
Fahad
ece8a5ebed feat!: Full code can now be generated by an external model and shared with the AI tool (Claude Code / Codex etc)!
Model definitions now support a new `allow_code_generation` flag, only to be used with higher reasoning models such as GPT-5-Pro and Gemini 2.5-Pro

When `true`, the `chat` tool can now request the external model to generate a full implementation / update / instructions etc and then share the implementation with the calling agent.

This lets us use more powerful models such as GPT-5-Pro (which are either API-only or part of the $200 Pro plan within the ChatGPT app) to generate code or entire implementations for us.
2025-10-07 18:49:13 +04:00
Fahad
7c36b9255a refactor: moved registries into a separate module and code cleanup
fix: refactored dial provider to follow the same pattern
2025-10-07 12:59:09 +04:00
Fahad
cbe1d79932 fix: handle 429 response https://github.com/BeehiveInnovations/zen-mcp-server/issues/273 2025-10-06 23:32:04 +04:00
Fahad
a33efbde52 fix: use CUSTOM_CONNECT_TIMEOUT for gemini too
feat: add grok-4 to openrouter_models.json
2025-10-06 23:23:24 +04:00
Fahad
a65485a1e5 feat: support for GPT-5-Pro highest reasoning model https://github.com/BeehiveInnovations/zen-mcp-server/issues/275 2025-10-06 22:36:44 +04:00
Fahad
561e4aaaa8 feat: support for codex as external CLI
fix: improved handling of MCP token limits when handling CLI output
2025-10-06 00:39:00 +04:00
Fahad
a150e1c312 fix: intercept non-cli errors and allow agent to continue 2025-10-05 11:38:59 +04:00
Fahad
a2ccb48e9a feat!: Huge update - Link another CLI (such as Gemini) directly from within Claude Code / Codex. https://github.com/BeehiveInnovations/zen-mcp-server/issues/208
Zen now allows you to define `roles` for an external CLI and delegate work to another CLI via the new `clink` tool (short for `CLI + Link`). Gemini, for instance, offers 1000 free requests a day - this means you can save on tokens and your weekly limits within Claude Code by delegating work to another entirely capable CLI agent!

Define your own system prompts as `roles` and make another CLI do anything you'd like. Like the current tool you're connected to, the other CLI has complete access to your files and the current context. This also works incredibly well with Zen's `conversation continuity`.
2025-10-05 10:40:44 +04:00
Fahad
9c99b9b352 refactor: fixed test 2025-10-05 08:55:50 +04:00
Fahad
ff9a07a37a feat!: breaking change - OpenRouter models are now read from conf/openrouter_models.json while Custom / Self-hosted models are read from conf/custom_models.json
feat: Azure OpenAI / Azure AI Foundry support. Models should be defined in conf/azure_models.json (or a custom path). See .env.example for environment variables or see readme. https://github.com/BeehiveInnovations/zen-mcp-server/issues/265

feat: OpenRouter / Custom Models / Azure can separately also use custom config paths now (see .env.example )

refactor: Model registry class made abstract, OpenRouter / Custom Provider / Azure OpenAI now subclass these

refactor: breaking change: `is_custom` property has been removed from model_capabilities.py (and thus custom_models.json) given that each provider's models are now read from separate configuration files
2025-10-04 21:10:56 +04:00
Fahad
bc93b5343b fix: CI test 2025-10-04 14:32:47 +04:00
Fahad
2c534ac06e feat: centralized environment handling, ensures ZEN_MCP_FORCE_ENV_OVERRIDE is honored correctly
fix: updated tests to override env variables they need instead of relying on the current values from .env
2025-10-04 14:28:56 +04:00
Fahad
4015e917ed fix: listmodels to always honor restricted models
fix: restrictions should resolve canonical names for openrouter
fix: tools now correctly return restricted list by presenting model names in schema
fix: tests updated to ensure these manage their expected env vars properly
perf: cache model alias resolution to avoid repeated checks
2025-10-04 13:46:22 +04:00
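
One way to picture the alias caching and canonical-name restriction checks mentioned in the commit above is the sketch below. It assumes a simple in-memory alias table; the table contents, the canonical name `gpt-5-codex`, and the functions `resolve_alias` and `is_allowed` are illustrative assumptions rather than the project's real registry API.

```python
from __future__ import annotations

from functools import lru_cache

# Hypothetical alias table; a real registry would be loaded from configuration.
_ALIASES = {
    "codex": "gpt-5-codex",
    "gpt5-codex": "gpt-5-codex",
    "gpt-5-code": "gpt-5-codex",
}


@lru_cache(maxsize=None)
def resolve_alias(name: str) -> str:
    """Return the canonical model name, caching results for repeated lookups."""
    key = name.strip().lower()
    return _ALIASES.get(key, key)


def is_allowed(name: str, allowed: frozenset[str]) -> bool:
    """Compare canonical names so an alias cannot bypass a restriction list."""
    canonical_allowed = {resolve_alias(entry) for entry in allowed}
    return resolve_alias(name) in canonical_allowed
```

Because both sides of the comparison are resolved first, a restriction list written with an alias still matches a request that uses the canonical name, and the reverse.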
Fahad
06d7701cc3 refactor: removed subclass override when the base class should be resolving the model name
refactor: always disable "stream"
2025-10-04 10:35:32 +04:00
Fahad
f955100f3a refactor: improved retry logic and moved core logic to base class 2025-10-03 23:48:55 +04:00
Fahad
4968e1f7bc refactor: cleanup, remove unused method 2025-10-03 23:26:22 +04:00
Fahad
a8fbafd5a7 feat: minor tweaks to chat system prompt to help with alignment 2025-10-03 23:09:12 +04:00
Fahad
8759edc817 fix: consensus now advertises a short list of models to avoid the CLI getting the names wrong 2025-10-03 22:41:28 +04:00
Beehive Innovations
775ac8ff58 Merge pull request #268 from Coquinate/feat/gpt5-codex-responses-api
feat: add GPT-5-Codex support with Responses API integration
2025-10-03 21:16:13 +04:00
aberemia24
f2653427ca feat: add GPT-5-Codex support with Responses API integration
Adds support for OpenAI's GPT-5-Codex model which uses the new Responses API
endpoint (/v1/responses) instead of the standard Chat Completions API.

Changes:
- Add GPT-5-Codex to MODEL_CAPABILITIES with 400K context, 128K output
- Prioritize GPT-5-Codex for EXTENDED_REASONING tasks
- Add aliases: codex, gpt5-codex, gpt-5-code
- Update tests to expect GPT-5-Codex for extended reasoning

Benefits:
- 40-80% cost savings through Responses API caching
- 3% better performance on coding tasks (SWE-bench)
- Leverages existing dual-API infrastructure
2025-10-03 13:59:44 +03:00
Fahad
88493bd357 test: cross tool memory recall, testing continuation via cassette recording 2025-10-03 13:51:32 +04:00
Fahad
3c4b1368c6 test: http cassettes added for improved testing of consensus 2025-10-03 13:07:42 +04:00
Fahad
d55130a430 fix: external model name now recorded properly in responses
test: http cassettes added for improved integration tests
refactor: generic name for the CLI agent
2025-10-03 11:29:06 +04:00
Fahad
87ccb6b25b test: fixed integration tests, removed magicmock 2025-10-02 23:47:44 +04:00
Fahad
8b3a2867fb fix: https://github.com/BeehiveInnovations/zen-mcp-server/issues/194 2025-10-02 23:12:52 +04:00
Fahad
a199e4a955 refactor: cleanup old use_websearch param 2025-10-02 22:46:58 +04:00
Fahad
6cab9e56fc feat: added intelligence_score to the model capabilities schema; a 1-20 number that can be specified to influence the sort order of models presented to the CLI in auto selection mode
fix: model definitions re-introduced into the schema, but selectively: only a summary is generated per tool. Required to ensure the CLI calls and uses the correct model
fix: removed `model` param from some tools where this wasn't needed
fix: fixed adherence to `*_ALLOWED_MODELS` by advertising only the allowed models to the CLI
fix: removed duplicates across providers when passing canonical names back to the CLI; the first enabled provider wins
2025-10-02 21:43:44 +04:00
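
The ordering and de-duplication rules in the commit above can be sketched roughly as follows, assuming each model carries a provider, a canonical name, and an `intelligence_score` between 1 and 20. The `ModelSummary` class and `rank_models` function are hypothetical names for illustration; the input list is assumed to arrive in provider priority order.

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass(frozen=True)
class ModelSummary:
    provider: str
    name: str
    intelligence_score: int  # 1-20, higher is presented first


def rank_models(
    models: list[ModelSummary], allowed: set[str] | None = None
) -> list[ModelSummary]:
    """Filter to allowed names, keep the first provider per name, sort by score."""
    seen: set[str] = set()
    unique: list[ModelSummary] = []
    for model in models:  # assumed to be in provider priority order
        if allowed is not None and model.name not in allowed:
            continue  # advertise only the allowed models
        if model.name in seen:
            continue  # a higher-priority provider already claimed this name
        seen.add(model.name)
        unique.append(model)
    # sorted() is stable, so equal scores keep provider priority order.
    return sorted(unique, key=lambda m: m.intelligence_score, reverse=True)
```

For example, if two providers both expose a model with the same canonical name, only the entry from the provider listed first survives, and the remaining models are presented from highest to lowest score.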
Fahad
4c288b2385 fix: improved conversation retrieval 2025-10-02 14:47:24 +04:00
Fahad
d285fadf4c fix: custom provider must only accept a model if it's declared explicitly. Upon model rejection (in auto mode) the list of available models is returned up-front to help with selection. 2025-10-02 13:49:23 +04:00
Fahad
693b84db2b refactor: cleanup provider base class; cleanup shared responsibilities; cleanup public contract
docs: document provider base class
refactor: cleanup custom provider, it should only deal with `is_custom` model configurations
fix: make sure openrouter provider does not load `is_custom` models
fix: listmodels tool cleanup
2025-10-02 12:59:45 +04:00
Fahad
14a35afa1d refactor: moved image related code out of base provider into a separate utility 2025-10-02 11:23:15 +04:00
Fahad
a254ff2220 refactor: removed method from provider, should use model capabilities instead
refactor: cleanup temperature factory method
2025-10-02 11:08:56 +04:00
Fahad
6d237d0970 refactor: moved temperature method from base provider to model capabilities
refactor: model listing cleanup, moved logic to model_capabilities.py
docs: added AGENTS.md for onboarding Codex
2025-10-02 10:25:41 +04:00
Fahad
1dc25f6c3d refactor: renaming to reflect underlying type
docs: updated to reflect new modules
2025-10-02 09:07:40 +04:00
Fahad
250545e34f refactor: removed hard coded checks, use model capabilities instead 2025-10-02 08:32:51 +04:00
Fahad
182aa627df refactor: code cleanup 2025-10-02 08:09:44 +04:00
Fahad
cc8a4dfd21 Overall token savings should now be 50%+
perf: tweaks to schema descriptions, aiming to reduce token usage without performance degradation
2025-10-01 22:39:12 +04:00
Fahad
f69ff03c4d refactor: trimmed some prompts
style: version tool output now mentions the default model in use
2025-10-01 21:55:05 +04:00
Fahad
d9449c7bb6 feat: depending on the number of tools in use, this change should save ~50% of overall tokens used. fixes https://github.com/BeehiveInnovations/zen-mcp-server/issues/255 but also refactored individual tools to instead encourage the agent to use the listmodels tool if needed. 2025-10-01 21:40:31 +04:00
Fahad
696b45f25e fix: https://github.com/BeehiveInnovations/zen-mcp-server/issues/258 2025-10-01 20:23:38 +04:00
Fahad
7efb4094d4 test: update tests to match new Claude Sonnet 4.5 alias configuration
- Updated sonnet alias to point to claude-sonnet-4.5 instead of 4.1
- Removed references to deprecated 'claude' alias
- Added sonnet4.1 alias for claude-sonnet-4.1 backwards compatibility
- All 809 tests passing
2025-10-01 19:57:43 +04:00
Fahad
bf9344963f Merge branch 'pr-247-modified' 2025-10-01 19:51:29 +04:00
Fahad
104d09502a test: fixed annotation 2025-10-01 19:28:54 +04:00
Beehive Innovations
77caef6f54 Merge pull request #260 from DragonFSKY/fix/consensus-model-context-issue
fix: resolve consensus tool model_context parameter missing issue
2025-10-01 19:27:47 +04:00
Fahad
70fa088c32 feat: implement semantic cassette matching for o3 models
Adds flexible cassette matching that ignores system prompt changes
for o3 models, preventing CI failures when prompts are updated.

Changes:
- Semantic matching: Only compares model name, user question, and core params
- Ignores: System prompts, conversation memory instructions, metadata
- Prevents cassette breaks when prompts change between code versions
- Added comprehensive tests for semantic matching behavior
- Created maintenance documentation (tests/CASSETTE_MAINTENANCE.md)

This solves the CI failure where o3-pro test cassettes would break
whenever system prompts or conversation memory format changed.

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude <noreply@anthropic.com>
2025-10-01 18:53:30 +04:00
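
The semantic matching described in the commit above can be approximated by reducing each recorded and incoming request to a small comparison key. This is a sketch under stated assumptions: the request body is assumed to be a Chat Completions style dictionary with a `messages` list, and the `CORE_PARAMS` tuple and both function names are hypothetical, not the project's actual matcher.

```python
from __future__ import annotations

# Hypothetical set of parameters treated as "core"; the real list may differ.
CORE_PARAMS = ("temperature", "max_tokens", "reasoning_effort")


def semantic_key(request_body: dict) -> tuple:
    """Build a key from model name, user messages, and core params only."""
    messages = request_body.get("messages", [])
    # System prompts are skipped on purpose so prompt edits do not break replay.
    user_questions = tuple(
        message.get("content", "")
        for message in messages
        if message.get("role") == "user"
    )
    core_params = tuple(
        (name, request_body[name]) for name in CORE_PARAMS if name in request_body
    )
    return (request_body.get("model"), user_questions, core_params)


def requests_match(recorded: dict, incoming: dict) -> bool:
    """Two requests match when their semantic keys are equal."""
    return semantic_key(recorded) == semantic_key(incoming)
```

With this approach, a cassette recorded under an older system prompt still replays after the prompt changes, while a different model, user question, or core parameter forces a mismatch.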