Commit Graph

299 Commits

Author SHA1 Message Date
Fahad
bf9344963f Merge branch 'pr-247-modified' 2025-10-01 19:51:29 +04:00
Fahad
104d09502a test: fixed annotation 2025-10-01 19:28:54 +04:00
Beehive Innovations
77caef6f54 Merge pull request #260 from DragonFSKY/fix/consensus-model-context-issue
fix: resolve consensus tool model_context parameter missing issue
2025-10-01 19:27:47 +04:00
Fahad
70fa088c32 feat: implement semantic cassette matching for o3 models
Adds flexible cassette matching that ignores system prompt changes
for o3 models, preventing CI failures when prompts are updated.

Changes:
- Semantic matching: Only compares model name, user question, and core params
- Ignores: System prompts, conversation memory instructions, metadata
- Prevents cassette breaks when prompts change between code versions
- Added comprehensive tests for semantic matching behavior
- Created maintenance documentation (tests/CASSETTE_MAINTENANCE.md)

This solves the CI failure where o3-pro test cassettes would break
whenever system prompts or conversation memory format changed.

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude <noreply@anthropic.com>
2025-10-01 18:53:30 +04:00
Fahad
cff6d8998f fix: removed use_websearch; this parameter was confusing Codex. It started using this to prompt the external model to perform searches! web-search is enabled by Claude / Codex etc by default and the external agent can ask claude to search on its behalf. 2025-10-01 18:44:11 +04:00
Beehive Innovations
f51da6e5f8 Merge branch 'main' into main 2025-10-01 18:19:02 +04:00
Devon Hillard
d13700c14c test: Update OpenAI provider alias tests to match new format
Updated test_supported_models_aliases.py to reflect the removal of self-referencing aliases:
- Removed assertion for "o4-mini" in its own aliases (no longer self-referencing)
- Updated "o3-pro" alias test to use "o3pro" (normalized alias format)
- Fixed alias resolution test for o3pro -> o3-pro

These changes align with the fix for duplicate model listings in listmodels output.

🤖 Generated with [Claude Code](https://claude.ai/code)

Co-Authored-By: Claude <noreply@anthropic.com>
2025-09-09 19:08:10 -06:00
Sven Lito
525f4598ce refactor: address code review feedback from Gemini
- Extract restriction checking logic into reusable helper method
- Refactor validate_model_name to reduce code duplication
- Fix logging import by using existing module-level logger
- Clean up test file by removing print statement and main block
- All tests continue to pass after refactoring
2025-09-05 11:04:45 +07:00
Sven Lito
2db1323813 fix: respect custom OpenAI model temperature settings (#245)
- OpenAI provider now checks custom models registry for user configurations
- Custom models with supports_temperature=false no longer send temperature to API
- Fixes 400 errors for custom o3/gpt-5 models configured without temperature support
- Added comprehensive tests to verify the fix works correctly
- Maintains backward compatibility with built-in models

Fixes #245
2025-09-05 10:53:28 +07:00
谢栋梁
9044b63809 fix: resolve consensus tool model_context parameter missing issue
Fixed runtime bug where _prepare_file_content_for_prompt was called
without required model_context parameter, causing RuntimeError when
processing requests with relevant_files.

- Create ModelContext instance with model_name in _consult_model method
- Pass model_context parameter to _prepare_file_content_for_prompt call
- Add comprehensive regression test to prevent future occurrences
- Maintain consensus tool's blinded design with independent model contexts
2025-09-03 10:55:22 +08:00
Fahad
4b202f5d1d feat: refactored and tweaked model descriptions / schema to use fewer tokens at launch (average reduction per field description: 60-80%) without sacrificing tool effectiveness
Disabled secondary tools by default (for new installations), updated README.md with instructions on how to enable these in .env
run-server.sh now displays disabled / enabled tools (when DISABLED_TOOLS is set)
2025-08-22 09:23:59 +04:00
David Knedlik
4930824052 feat: Add comprehensive GPT-5 series model support
- Add GPT-5, GPT-5-mini, and GPT-5-nano models to unified configuration
- Implement proper thinking mode support via dynamic capability checking
- Add OpenAI provider model enumeration methods for registry integration
- Update tests to cover all GPT-5 models and their aliases
- Fix critical bug where thinking mode was hardcoded instead of using model capabilities

Breaking Changes:
- None (backward compatible)

New Models Available:
- gpt-5 (400K context, 128K output, reasoning support)
- gpt-5-mini (400K context, 128K output, efficient variant)
- gpt-5-nano (400K context, fastest/cheapest variant)

Aliases:
- gpt5, gpt5-mini, gpt5mini, gpt5-nano, gpt5nano, nano

All models support:
- Extended thinking mode (reasoning tokens)
- Vision capabilities
- JSON mode
- Function calling

🤖 Generated with [Claude Code](https://claude.ai/code)

Co-Authored-By: Claude <noreply@anthropic.com>
2025-08-21 14:27:00 -05:00
Fahad
80d21e57c0 feat: refactored and improved codereview in line with precommit. Reviews are now either external (default) or internal. Takes away anxiety and loss of tokens when Claude incorrectly decides to be 'confident' about its own changes and bungle things up.
fix: Minor tweaks to prompts
fix: Improved support for smaller models that struggle with strict structured JSON output
Rearranged reasons to use the MCP above quick start (collapsed)
2025-08-21 14:04:32 +04:00
Fahad
0af9202012 Precommit updated to take always prefer external analysis (via _other_ model) unless specified not to. This prevents Claude from being overconfident and inadequately performing subpar precommit checks. 2025-08-20 11:55:40 +04:00
google-labs-jules[bot]
0959d6f0fa feat: Update Claude models to Opus 4.1 and Sonnet 4.1
This commit updates all references to Claude Opus 4 and Sonnet 4 to their newer 4.1 versions throughout the codebase.

The changes include:
- Updating model names in `conf/custom_models.json` and `providers/dial.py`.
- Updating aliases and descriptions to match the new model versions.
- Updating `.env.example` to reflect the new model names.
- Updating all relevant test suites to use the new model names and ensure all tests pass.
2025-08-17 16:08:52 +00:00
Beehive Innovations
e6213d4ca1 Merge pull request #227 from svnlto/fix/uvx-resource-packaging
fix: uvx resource packaging issues for OpenRouter functionality
2025-08-11 12:31:49 -07:00
Sven Lito
673d78be6d Fix failing tests and exclude .zen_venv from linting
- Fix test_resource_loading_success by removing outdated mock targeting non-existent 'files' import
- Simplify resource loading test to validate registry functionality directly
- Add .zen_venv exclusion to ruff and black in code_quality_checks.sh
- All tests now passing (793/793) with clean linting
2025-08-10 22:18:08 +07:00
Sven Lito
5e599b9e7d Complete PR review feedback implementation with clean importlib.resources approach
- Remove redundant path checks between Path("conf/custom_models.json") and Path.cwd() variants
- Implement proper importlib.resources.files('conf') approach for robust packaging
- Create conf/__init__.py to make conf a proper Python package
- Update pyproject.toml to include conf* in package discovery
- Clean up verbose comments and simplify resource loading logic
- Fix test mocking to use correct importlib.resources.files target
- All tests passing (8/8) with proper resource and fallback functionality

Addresses all gemini-code-assist bot feedback from PR #227
2025-08-10 22:13:25 +07:00
Sven Lito
84de9b026f Address PR review feedback: Implement proper importlib.resources approach
Improvements based on gemini-code-assist bot feedback:

1. **Proper importlib.resources implementation:**
   - Use files("providers") / "../conf/custom_models.json" for resource loading
   - Prioritize resource loading over file system paths for packaged environments
   - Maintain backward compatibility with explicit config paths and env variables

2. **Remove redundant path checks:**
   - Eliminated duplicate Path("conf/custom_models.json") and Path.cwd() / "conf/custom_models.json"
   - Streamlined fallback logic to development path + working directory only

3. **Enhanced test coverage:**
   - Mock-based testing of actual fallback scenarios with Path.exists
   - Proper resource loading simulation and failure testing
   - Comprehensive coverage of both resource and file system modes

4. **Robust error handling:**
   - Graceful fallback from resources to file system when resource loading fails
   - Clear logging of which loading method is being used
   - Better error messages indicating resource vs file system loading

The implementation now follows Python packaging best practices using importlib.resources
while maintaining full backward compatibility and robust fallback behavior.

Tested: All 8 test cases pass, resource loading works in development,
file system fallback works when resources fail.
2025-08-10 21:36:40 +07:00
Sven Lito
5565f59a1c Fix uvx resource packaging issues for OpenRouter functionality
Resolves issues #203, #186, #206, #185 where OpenRouter model registry
completely failed to load in uvx installations due to inaccessible
conf/custom_models.json file.

Changes:
- Implement multiple path resolution strategy in OpenRouterModelRegistry
  - Development: Path(__file__).parent.parent / "conf" / "custom_models.json"
  - UVX working dir: Path("conf/custom_models.json")
  - Current working dir: Path.cwd() / "conf" / "custom_models.json"
- Add importlib-resources fallback for Python < 3.9 compatibility
- Add comprehensive test suite for path resolution scenarios
- Ensure graceful handling when config files are missing

The fix restores full OpenRouter functionality (15 models, 62+ aliases)
for users installing via uvx while maintaining backward compatibility
for development and explicit config scenarios.

Tested: All path resolution scenarios pass, OpenRouter models load correctly
2025-08-10 21:27:48 +07:00
Sven Lito
ee520825b4 fix: address PR review feedback on test quality
- Remove broken test with unused mock parameter
- Replace placeholder test with actual validation of diagnostic messages
- Remove unused imports (MagicMock, patch)
- Fix whitespace and formatting issues
- Ensure all 6 tests pass with meaningful assertions

Addresses high-priority feedback from PR review comments.
2025-08-08 23:58:19 +07:00
Sven Lito
8c38ef44b5 test: remove empty placeholder test cases
Remove 7 empty test methods that contained only 'pass' statements:
- TestPipDetectionPlatformCompatibility (4 methods)
- TestPipDetectionRegression (3 methods)

Keep working tests that have actual logic and assertions.
2025-08-08 23:53:48 +07:00
Sven Lito
cce6f7106c style: format test file with black 2025-08-08 23:49:24 +07:00
Sven Lito
7c6ec4a928 fix: resolve pip detection inconsistency in non-interactive shells
- Convert virtual environment Python paths to absolute paths to ensure
  consistency across different shell environments (Git Bash, WSL, etc.)
- Add enhanced diagnostic information when pip detection fails to help
  users troubleshoot path and environment issues
- Improve error messages with specific guidance for different platforms
- Fix black configuration to exclude .zen_venv directory from formatting
- Add comprehensive test suite for pip detection edge cases

Fixes #188
2025-08-08 23:49:24 +07:00
Fahad
e29deb23db Improvements to consensus 2025-08-08 12:59:41 +05:00
Beehive Innovations
f7a079bc35 Merge branch 'main' into refactor-image-validation 2025-08-07 23:12:00 -07:00
Fahad
19ae3c5e9c Fixed tests 2025-08-08 11:11:22 +05:00
Beehive Innovations
912cde42d1 Update test_xai_provider.py 2025-08-08 10:06:38 +04:00
Beehive Innovations
8a884c57d6 Merge branch 'main' into grok4-support 2025-08-07 23:04:15 -07:00
Fahad
fcb0fe3ef2 Fix o3-pro model resolution to use o3-pro consistently
- Use o3-pro throughout the codebase instead of o3-pro-2025-06-10
- Update test expectations to match o3-pro model name
- Update cassette to use o3-pro for consistency
- Ensure responses endpoint routing works correctly with o3-pro

🤖 Generated with [Claude Code](https://claude.ai/code)

Co-Authored-By: Claude <noreply@anthropic.com>
2025-08-08 10:52:23 +05:00
Fahad
2fdc8fad72 Resolve merge conflicts in o3-pro response parsing fix
- Use new output_text field format for o3-pro responses
- Update test expectations to use resolved model name o3-pro-2025-06-10
- Keep HTTP transport recorder and PII sanitization improvements
- Preserve both bug fix and recent GPT-5 updates

🤖 Generated with [Claude Code](https://claude.ai/code)

Co-Authored-By: Claude <noreply@anthropic.com>
2025-08-08 10:48:56 +05:00
Fahad
7f37efcbfe Grok-4 support 2025-08-08 09:39:07 +05:00
Fahad
1a8ec2e12f GPT-5, GPT-5-mini support
Improvements to model name resolution
Improved instructions for multi-step workflows when continuation is available
Improved instructions for chat tool
Improved preferred model resolution, moved code from registry -> each provider
Updated tests
2025-08-08 08:51:34 +05:00
Fahad
9a4791cb06 Updated description 2025-08-08 05:26:45 +05:00
Josh Vera
7003ae60e0 lint 2025-07-13 12:13:43 -06:00
Josh Vera
780d4ef207 fix: Clear restriction service in o3-pro test setup for proper isolation
The o3-pro test now clears the restriction service singleton in its
setup_method to ensure it re-reads environment variables set by the
@patch.dict decorator. This prevents cached restrictions from previous
tests (like test_fallback_with_shorthand_restrictions) from blocking
the o3-pro model.

This is a minimal, targeted fix that only affects the specific test
that needs it, without breaking other tests that may depend on the
restriction service state.

🤖 Generated with [Claude Code](https://claude.ai/code)

Co-Authored-By: Claude <noreply@anthropic.com>
2025-07-13 11:59:54 -06:00
Josh Vera
68866ba95b formatting 2025-07-13 11:48:37 -06:00
Josh Vera
3d24226446 fix: Use monkeypatch for proper test isolation in model restrictions
Replace @patch.dict decorator with pytest monkeypatch fixture in
test_fallback_with_shorthand_restrictions to ensure proper environment
variable cleanup between tests. This prevents OPENAI_ALLOWED_MODELS
from leaking into subsequent tests.

Also remove the manual clearing of _restriction_service singleton
as it's no longer needed with proper environment variable isolation.

This fixes test isolation issues where o3-pro tests would fail when
run after restriction tests due to environment variable persistence.

🤖 Generated with [Claude Code](https://claude.ai/code)

Co-Authored-By: Claude <noreply@anthropic.com>
2025-07-13 11:47:26 -06:00
Josh Vera
6fa7cbcf0d fix: Ensure dummy API keys are set for tests with no_mock_provider marker
The test failures in CI were caused by tests with @pytest.mark.no_mock_provider
that prevented dummy API keys from being set. In CI with no real API keys,
this led to 'Model not available' errors.

Changed pytest_collection_modifyitems to always set dummy keys if missing,
regardless of markers. This ensures tests work in CI while still allowing
real API keys to be used when present.

Fixes test_conversation_field_mapping.py failures in CI across Python 3.10-3.12.
2025-07-13 11:29:02 -06:00
Josh Vera
ac7d489cb4 refactor: Simplify logging and conform to pytest conventions
- Removed excessive debug logging in http_transport_recorder.py
- Consolidated redundant log statements
- Fixed exception logging to use logger.exception()
- Removed emojis from log messages for cleaner output
- Removed __main__ block from test_o3_pro_output_text_fix.py per pytest conventions
- Applied black formatting to comply with CI checks

🤖 Generated with [Claude Code](https://claude.ai/code)

Co-Authored-By: Claude <noreply@anthropic.com>
2025-07-13 10:49:37 -06:00
Josh Vera
9248947e39 fix: Resolve o3-pro test isolation issues and convert print to logging
- Fix test isolation by clearing LOCALE env var in o3-pro test
- Add restriction service cleanup in test_model_restrictions.py
- Fix PII sanitizer phone regex to not match timestamps
- Convert all print statements to logging in test files per PR review
- Re-record o3-pro cassette with correct environment

🤖 Generated with [Claude Code](https://claude.ai/code)

Co-Authored-By: Claude <noreply@anthropic.com>
2025-07-13 10:41:43 -06:00
Josh Vera
3b1c80865b fix: Resolve test isolation issues for o3-pro test
- Fixed test_fallback_with_shorthand_restrictions to clear restriction
  service singleton in finally block, preventing state leakage
- Updated o3-pro test to use @patch.dict for OPENAI_ALLOWED_MODELS,
  following standard pattern and allowing both o3-pro and o3-pro-2025-06-10
- Removed invalid cassette file that had wrong request content

The test now passes in both isolated and full suite runs.

🤖 Generated with [Claude Code](https://claude.ai/code)

Co-Authored-By: Claude <noreply@anthropic.com>
2025-07-13 10:22:03 -06:00
Josh Vera
538ac55880 fix: Clear restriction service singleton in o3-pro test setup
The test was failing when run in the full test suite because the
ModelRestrictionService singleton persisted restrictions from previous
tests. Specifically, test_fallback_with_shorthand_restrictions sets
OPENAI_ALLOWED_MODELS="mini" which blocked o3-pro.

Added utils.model_restrictions._restriction_service = None to ensure
the test starts with clean restriction state.

🤖 Generated with [Claude Code](https://claude.ai/code)

Co-Authored-By: Claude <noreply@anthropic.com>
2025-07-13 10:08:05 -06:00
Josh Vera
1b09238c7a cleanup: Remove redundant o3-pro test files
The bisect and simplified test files were created during investigation
to understand fixture requirements, but they test the same core functionality
as test_o3_pro_output_text_fix.py. Now that we have the final clean
implementation, these files are redundant.

Removed:
• test_o3_pro_fixture_bisect.py - 4 test methods testing fixture combinations
• test_o3_pro_simplified.py - 2 test methods testing minimal requirements

The main test_o3_pro_output_text_fix.py remains and covers all the necessary
o3-pro output_text parsing validation.

🤖 Generated with [Claude Code](https://claude.ai/code)

Co-Authored-By: Claude <noreply@anthropic.com>
2025-07-13 09:56:10 -06:00
Josh Vera
91605bbd98 feat: Implement code review improvements from gemini-2.5-pro analysis
 Key improvements:
• Added public reset_for_testing() method to registry for clean test state management
• Updated test setup/teardown to use new public API instead of private attributes
• Enhanced inject_transport helper to ensure OpenAI provider registration
• Migrated additional test files to use inject_transport pattern
• Reduced code duplication by ~30 lines across test files

🔧 Technical details:
• transport_helpers.py: Always register OpenAI provider for transport tests
• test_o3_pro_output_text_fix.py: Use reset_for_testing() API, remove redundant registration
• test_o3_pro_fixture_bisect.py: Migrate all 4 test methods to inject_transport
• test_o3_pro_simplified.py: Migrate both test methods to inject_transport
• providers/registry.py: Add reset_for_testing() public method

 Quality assurance:
• All 7 o3-pro tests pass with new helper pattern
• No regression in test isolation or provider state management
• Improved maintainability through centralized transport injection
• Follows single responsibility principle with focused helper function

🤖 Generated with [Claude Code](https://claude.ai/code)

Co-Authored-By: Claude <noreply@anthropic.com>
2025-07-13 09:53:49 -06:00
Josh Vera
17b97751ab refactor: Simplify o3-pro test by removing fixture and monkey patching boilerplate
- Remove over-engineered allow_all_models fixture (6 operations → 1 line API key setting)
- Replace 10 lines of monkey patching boilerplate with 1-line inject_transport helper
- Remove cargo-cult error handling that allowed test to pass with API failures
- Create reusable transport_helpers.py for HTTP transport injection patterns
- Fix provider registration state pollution between batch test runs
- Test now works reliably in both individual and batch execution modes

The test is significantly cleaner and addresses root cause (provider registration timing)
rather than symptoms (cache clearing).

🤖 Generated with [Claude Code](https://claude.ai/code)

Co-Authored-By: Claude <noreply@anthropic.com>
2025-07-13 08:12:20 -06:00
Josh Vera
83e8b67234 test: Enhance o3-pro test to verify model metadata and response parsing
- Add verification that o3-pro model was actually used (not just requested)
- Verify model_used and provider_used metadata fields are populated
- Add graceful handling for error responses in test
- Improve test documentation explaining what's being verified
- Confirm response parsing uses output_text field correctly

This ensures the test properly validates both that:
1. The o3-pro model was selected and used via the /v1/responses endpoint
2. The response metadata correctly identifies the model and provider

🤖 Generated with [Claude Code](https://claude.ai/code)

Co-Authored-By: Claude <noreply@anthropic.com>
2025-07-13 06:09:31 -06:00
Nate Parsons
70d6cf8b54 refactor: Extract image validation to provider base class
Consolidates duplicated image validation logic from individual providers
into a reusable base class method. This improves maintainability and
ensures consistent validation across all providers.

- Added validate_image() method to ModelProvider base class
- Supports both file paths and data URLs
- Validates image format, size, and MIME types
- Added DEFAULT_MAX_IMAGE_SIZE_MB class constant (20MB)
- Refactored Gemini and OpenAI providers to use base validation
- Added comprehensive test suite with 19 tests
- Used minimal mocking approach with concrete test provider class
2025-07-12 21:51:24 -07:00
Josh Vera
3db49413ff fix: Resolve o3-pro response parsing and test execution issues
- Fix lint errors: trailing whitespace and deprecated typing imports
- Update test mock for o3-pro response format (output.content[] → output_text)
- Implement robust test isolation with monkeypatch fixture
- Clear provider registry cache to prevent test interference
- Ensure o3-pro tests pass in both individual and full suite execution

🤖 Generated with [Claude Code](https://claude.ai/code)

Co-Authored-By: Claude <noreply@anthropic.com>
2025-07-12 20:24:34 -06:00
Josh Vera
ae5e43b792 test: Add o3-pro test cassette and remove unused cassette
- Add o3_pro_basic_math.json cassette for test_o3_pro_output_text_fix.py
- Remove unused o3_pro_content_capture.json cassette
- This allows tests to run without API keys in CI/CD

🤖 Generated with [Claude Code](https://claude.ai/code)

Co-Authored-By: Claude <noreply@anthropic.com>
2025-07-12 19:45:49 -06:00