326 Commits

Author SHA1 Message Date
Sven Lito
7c6ec4a928 fix: resolve pip detection inconsistency in non-interactive shells
- Convert virtual environment Python paths to absolute paths to ensure
  consistency across different shell environments (Git Bash, WSL, etc.)
- Add enhanced diagnostic information when pip detection fails to help
  users troubleshoot path and environment issues
- Improve error messages with specific guidance for different platforms
- Fix black configuration to exclude .zen_venv directory from formatting
- Add comprehensive test suite for pip detection edge cases

Fixes #188
2025-08-08 23:49:24 +07:00
Fahad
e29deb23db Improvements to consensus 2025-08-08 12:59:41 +05:00
Beehive Innovations
f7a079bc35 Merge branch 'main' into refactor-image-validation 2025-08-07 23:12:00 -07:00
Fahad
19ae3c5e9c Fixed tests 2025-08-08 11:11:22 +05:00
Beehive Innovations
912cde42d1 Update test_xai_provider.py 2025-08-08 10:06:38 +04:00
Beehive Innovations
8a884c57d6 Merge branch 'main' into grok4-support 2025-08-07 23:04:15 -07:00
Fahad
fcb0fe3ef2 Fix o3-pro model resolution to use o3-pro consistently
- Use o3-pro throughout the codebase instead of o3-pro-2025-06-10
- Update test expectations to match o3-pro model name
- Update cassette to use o3-pro for consistency
- Ensure responses endpoint routing works correctly with o3-pro

🤖 Generated with [Claude Code](https://claude.ai/code)

Co-Authored-By: Claude <noreply@anthropic.com>
2025-08-08 10:52:23 +05:00
Fahad
2fdc8fad72 Resolve merge conflicts in o3-pro response parsing fix
- Use new output_text field format for o3-pro responses
- Update test expectations to use resolved model name o3-pro-2025-06-10
- Keep HTTP transport recorder and PII sanitization improvements
- Preserve both bug fix and recent GPT-5 updates

🤖 Generated with [Claude Code](https://claude.ai/code)

Co-Authored-By: Claude <noreply@anthropic.com>
2025-08-08 10:48:56 +05:00
Fahad
7f37efcbfe Grok-4 support 2025-08-08 09:39:07 +05:00
Fahad
1a8ec2e12f GPT-5, GPT-5-mini support
Improvements to model name resolution
Improved instructions for multi-step workflows when continuation is available
Improved instructions for chat tool
Improved preferred model resolution, moved code from registry -> each provider
Updated tests
2025-08-08 08:51:34 +05:00
Fahad
9a4791cb06 Updated description 2025-08-08 05:26:45 +05:00
Josh Vera
7003ae60e0 lint 2025-07-13 12:13:43 -06:00
Josh Vera
780d4ef207 fix: Clear restriction service in o3-pro test setup for proper isolation
The o3-pro test now clears the restriction service singleton in its
setup_method to ensure it re-reads environment variables set by the
@patch.dict decorator. This prevents cached restrictions from previous
tests (like test_fallback_with_shorthand_restrictions) from blocking
the o3-pro model.

This is a minimal, targeted fix that only affects the specific test
that needs it, without breaking other tests that may depend on the
restriction service state.

🤖 Generated with [Claude Code](https://claude.ai/code)

Co-Authored-By: Claude <noreply@anthropic.com>
2025-07-13 11:59:54 -06:00
Josh Vera
68866ba95b formatting 2025-07-13 11:48:37 -06:00
Josh Vera
3d24226446 fix: Use monkeypatch for proper test isolation in model restrictions
Replace @patch.dict decorator with pytest monkeypatch fixture in
test_fallback_with_shorthand_restrictions to ensure proper environment
variable cleanup between tests. This prevents OPENAI_ALLOWED_MODELS
from leaking into subsequent tests.

Also remove the manual clearing of _restriction_service singleton
as it's no longer needed with proper environment variable isolation.

This fixes test isolation issues where o3-pro tests would fail when
run after restriction tests due to environment variable persistence.

🤖 Generated with [Claude Code](https://claude.ai/code)

Co-Authored-By: Claude <noreply@anthropic.com>
2025-07-13 11:47:26 -06:00
Josh Vera
6fa7cbcf0d fix: Ensure dummy API keys are set for tests with no_mock_provider marker
The test failures in CI were caused by tests with @pytest.mark.no_mock_provider
that prevented dummy API keys from being set. In CI with no real API keys,
this led to 'Model not available' errors.

Changed pytest_collection_modifyitems to always set dummy keys if missing,
regardless of markers. This ensures tests work in CI while still allowing
real API keys to be used when present.

Fixes test_conversation_field_mapping.py failures in CI across Python 3.10-3.12.
2025-07-13 11:29:02 -06:00
Josh Vera
ac7d489cb4 refactor: Simplify logging and conform to pytest conventions
- Removed excessive debug logging in http_transport_recorder.py
- Consolidated redundant log statements
- Fixed exception logging to use logger.exception()
- Removed emojis from log messages for cleaner output
- Removed __main__ block from test_o3_pro_output_text_fix.py per pytest conventions
- Applied black formatting to comply with CI checks

🤖 Generated with [Claude Code](https://claude.ai/code)

Co-Authored-By: Claude <noreply@anthropic.com>
2025-07-13 10:49:37 -06:00
Josh Vera
9248947e39 fix: Resolve o3-pro test isolation issues and convert print to logging
- Fix test isolation by clearing LOCALE env var in o3-pro test
- Add restriction service cleanup in test_model_restrictions.py
- Fix PII sanitizer phone regex to not match timestamps
- Convert all print statements to logging in test files per PR review
- Re-record o3-pro cassette with correct environment

🤖 Generated with [Claude Code](https://claude.ai/code)

Co-Authored-By: Claude <noreply@anthropic.com>
2025-07-13 10:41:43 -06:00
Josh Vera
3b1c80865b fix: Resolve test isolation issues for o3-pro test
- Fixed test_fallback_with_shorthand_restrictions to clear restriction
  service singleton in finally block, preventing state leakage
- Updated o3-pro test to use @patch.dict for OPENAI_ALLOWED_MODELS,
  following standard pattern and allowing both o3-pro and o3-pro-2025-06-10
- Removed invalid cassette file that had wrong request content

The test now passes in both isolated and full suite runs.

🤖 Generated with [Claude Code](https://claude.ai/code)

Co-Authored-By: Claude <noreply@anthropic.com>
2025-07-13 10:22:03 -06:00
Josh Vera
538ac55880 fix: Clear restriction service singleton in o3-pro test setup
The test was failing when run in the full test suite because the
ModelRestrictionService singleton persisted restrictions from previous
tests. Specifically, test_fallback_with_shorthand_restrictions sets
OPENAI_ALLOWED_MODELS="mini" which blocked o3-pro.

Added utils.model_restrictions._restriction_service = None to ensure
the test starts with clean restriction state.

🤖 Generated with [Claude Code](https://claude.ai/code)

Co-Authored-By: Claude <noreply@anthropic.com>
2025-07-13 10:08:05 -06:00
Josh Vera
1b09238c7a cleanup: Remove redundant o3-pro test files
The bisect and simplified test files were created during investigation
to understand fixture requirements, but they test the same core functionality
as test_o3_pro_output_text_fix.py. Now that we have the final clean
implementation, these files are redundant.

Removed:
• test_o3_pro_fixture_bisect.py - 4 test methods testing fixture combinations
• test_o3_pro_simplified.py - 2 test methods testing minimal requirements

The main test_o3_pro_output_text_fix.py remains and covers all the necessary
o3-pro output_text parsing validation.

🤖 Generated with [Claude Code](https://claude.ai/code)

Co-Authored-By: Claude <noreply@anthropic.com>
2025-07-13 09:56:10 -06:00
Josh Vera
91605bbd98 feat: Implement code review improvements from gemini-2.5-pro analysis
 Key improvements:
• Added public reset_for_testing() method to registry for clean test state management
• Updated test setup/teardown to use new public API instead of private attributes
• Enhanced inject_transport helper to ensure OpenAI provider registration
• Migrated additional test files to use inject_transport pattern
• Reduced code duplication by ~30 lines across test files

🔧 Technical details:
• transport_helpers.py: Always register OpenAI provider for transport tests
• test_o3_pro_output_text_fix.py: Use reset_for_testing() API, remove redundant registration
• test_o3_pro_fixture_bisect.py: Migrate all 4 test methods to inject_transport
• test_o3_pro_simplified.py: Migrate both test methods to inject_transport
• providers/registry.py: Add reset_for_testing() public method

 Quality assurance:
• All 7 o3-pro tests pass with new helper pattern
• No regression in test isolation or provider state management
• Improved maintainability through centralized transport injection
• Follows single responsibility principle with focused helper function

🤖 Generated with [Claude Code](https://claude.ai/code)

Co-Authored-By: Claude <noreply@anthropic.com>
2025-07-13 09:53:49 -06:00
Josh Vera
17b97751ab refactor: Simplify o3-pro test by removing fixture and monkey patching boilerplate
- Remove over-engineered allow_all_models fixture (6 operations → 1 line API key setting)
- Replace 10 lines of monkey patching boilerplate with 1-line inject_transport helper
- Remove cargo-cult error handling that allowed test to pass with API failures
- Create reusable transport_helpers.py for HTTP transport injection patterns
- Fix provider registration state pollution between batch test runs
- Test now works reliably in both individual and batch execution modes

The test is significantly cleaner and addresses root cause (provider registration timing)
rather than symptoms (cache clearing).

🤖 Generated with [Claude Code](https://claude.ai/code)

Co-Authored-By: Claude <noreply@anthropic.com>
2025-07-13 08:12:20 -06:00
Josh Vera
83e8b67234 test: Enhance o3-pro test to verify model metadata and response parsing
- Add verification that o3-pro model was actually used (not just requested)
- Verify model_used and provider_used metadata fields are populated
- Add graceful handling for error responses in test
- Improve test documentation explaining what's being verified
- Confirm response parsing uses output_text field correctly

This ensures the test properly validates both that:
1. The o3-pro model was selected and used via the /v1/responses endpoint
2. The response metadata correctly identifies the model and provider

🤖 Generated with [Claude Code](https://claude.ai/code)

Co-Authored-By: Claude <noreply@anthropic.com>
2025-07-13 06:09:31 -06:00
Nate Parsons
70d6cf8b54 refactor: Extract image validation to provider base class
Consolidates duplicated image validation logic from individual providers
into a reusable base class method. This improves maintainability and
ensures consistent validation across all providers.

- Added validate_image() method to ModelProvider base class
- Supports both file paths and data URLs
- Validates image format, size, and MIME types
- Added DEFAULT_MAX_IMAGE_SIZE_MB class constant (20MB)
- Refactored Gemini and OpenAI providers to use base validation
- Added comprehensive test suite with 19 tests
- Used minimal mocking approach with concrete test provider class
2025-07-12 21:51:24 -07:00
Josh Vera
3db49413ff fix: Resolve o3-pro response parsing and test execution issues
- Fix lint errors: trailing whitespace and deprecated typing imports
- Update test mock for o3-pro response format (output.content[] → output_text)
- Implement robust test isolation with monkeypatch fixture
- Clear provider registry cache to prevent test interference
- Ensure o3-pro tests pass in both individual and full suite execution

🤖 Generated with [Claude Code](https://claude.ai/code)

Co-Authored-By: Claude <noreply@anthropic.com>
2025-07-12 20:24:34 -06:00
Josh Vera
ae5e43b792 test: Add o3-pro test cassette and remove unused cassette
- Add o3_pro_basic_math.json cassette for test_o3_pro_output_text_fix.py
- Remove unused o3_pro_content_capture.json cassette
- This allows tests to run without API keys in CI/CD

🤖 Generated with [Claude Code](https://claude.ai/code)

Co-Authored-By: Claude <noreply@anthropic.com>
2025-07-12 19:45:49 -06:00
Josh Vera
8eef4b6722 refactor: Simplify PIISanitizer class by 27%
- Consolidate patterns: GitHub tokens (3→1), phone numbers (2→1)
- Remove duplicate Bearer token patterns (saved 18 lines)
- Simplify sanitize_headers method (30→15 lines)
- Remove unnecessary base64 handling methods
- Clean up unused imports (base64, json, Tuple)
- Reduce total patterns from 24 to 14
- All tests pass, functionality preserved

🤖 Generated with [Claude Code](https://claude.ai/code)

Co-Authored-By: Claude <noreply@anthropic.com>
2025-07-12 19:40:04 -06:00
Josh Vera
69f7a79804 chore: Remove unused test_replay.json cassette file
🤖 Generated with [Claude Code](https://claude.ai/code)

Co-Authored-By: Claude <noreply@anthropic.com>
2025-07-12 19:31:26 -06:00
Josh Vera
a1451befd2 refactor: Clean up test files and simplify documentation
- Remove unused cassette files with incomplete recordings
- Delete broken respx test files (test_o3_pro_respx_simple.py, test_o3_pro_http_recording.py)
- Fix respx references in docstrings to mention HTTP transport recorder
- Simplify vcr-testing.md documentation (60% reduction, more task-oriented)
- Add simplified PR template with better test instructions
- Fix cassette path consistency in examples
- Add security note about reviewing cassettes before committing

🤖 Generated with [Claude Code](https://claude.ai/code)

Co-Authored-By: Claude <noreply@anthropic.com>
2025-07-12 19:24:51 -06:00
Josh Vera
7f92085c70 feat: Fix o3-pro response parsing and implement HTTP transport recorder
- Fix o3-pro response parsing to use output_text convenience field
- Replace respx with custom httpx transport solution for better reliability
- Implement comprehensive PII sanitization to prevent secret exposure
- Add HTTP request/response recording with cassette format for testing
- Sanitize all existing cassettes to remove exposed API keys
- Update documentation to reflect new HTTP transport recorder
- Add test suite for PII sanitization and HTTP recording

This change:
1. Fixes timeout issues with o3-pro API calls (was 2+ minutes, now ~15-22 seconds)
2. Properly captures response content without httpx.ResponseNotRead exceptions
3. Preserves original HTTP response format including gzip compression
4. Prevents future secret exposure with automatic PII sanitization
5. Enables reliable replay testing for o3-pro interactions

Co-Authored-By: Claude <noreply@anthropic.com>
2025-07-12 18:47:17 -06:00
Raymond Lucke
39e2bc61b6 Tests for Grok 4. 2025-07-10 18:15:31 -07:00
Fahad
a70dcbe4d1 Lint
Fixed challenge expectation
2025-06-30 13:35:40 +04:00
Fahad
a54343dc79 Improved challenge prompt and instructions 2025-06-29 17:52:00 +04:00
Fahad
6b495cea0b New tool! "challenge" with confidence and stop Claude from agreeing with you blindly and undoing the _correct_ strategy because you were wrong
Fixed run script to ensure pip is installed
2025-06-29 15:50:45 +04:00
Beehive Innovations
c82026941a Merge pull request #150 from SamDc73/main
Add uvx support
2025-06-29 01:56:53 -07:00
OhMyApps
479f556535 Merge branch 'BeehiveInnovations:main' into feat-dockerisation 2025-06-29 02:07:06 +02:00
OhMyApps
3d12a7cb70 feat: Add comprehensive tests for Docker integration, security, and volume persistence
- Introduced tests for Docker deployment scripts to ensure existence, permissions, and proper command usage.
- Added tests for Docker integration with Claude Desktop, validating MCP configuration and command formats.
- Implemented health check tests for Docker, ensuring script functionality and proper configuration in Docker setup.
- Created tests for Docker MCP validation, focusing on command validation and security configurations.
- Developed security tests for Docker configurations, checking for non-root user setups, privilege restrictions, and sensitive data handling.
- Added volume persistence tests to ensure configuration and logs are correctly managed across container runs.
- Updated .dockerignore to exclude sensitive files and added relevant tests for Docker secrets handling.
2025-06-29 00:01:35 +02:00
Fahad
adbc4af4a9 Update confidence enum values across workflow tools
Added new confidence values (very_high, almost_certain) to all workflow tools
to provide more granular confidence tracking. Updated enum declarations in:
- analyze.py, codereview.py, debug.py, precommit.py, secaudit.py, testgen.py
- Updated debug.py's get_required_actions to handle new confidence values
- All tools now use consistent 7-value confidence scale
- refactor.py kept its unique scale (exploring/incomplete/partial/complete)

Also fixed model thinking configuration:
- Added very_high and almost_certain to MODEL_THINKING_PREFERENCES
- Set medium thinking for very_high, high thinking for almost_certain
- Updated prompts to clarify certain means 100% local confidence

🤖 Generated with [Claude Code](https://claude.ai/code)

Co-Authored-By: Claude <noreply@anthropic.com>
2025-06-28 00:04:35 +04:00
Fahad
bc447d4bcd Generic naming to work with Gemini CLI / Claude Code 2025-06-27 23:41:20 +04:00
Husam Alshehadat
e3e9e4eb55 fix: Handle tomllib import for Python 3.10 compatibility in uvx tests 2025-06-27 10:37:32 -07:00
Husam Alshehadat
36bba89325 style: Fix formatting after sync 2025-06-27 10:35:16 -07:00
OhMyApps
62178aa073 Merge branch 'BeehiveInnovations:main' into feat-dockerisation 2025-06-27 18:45:41 +02:00
Beehive Innovations
7f6a37a7b9 Merge pull request #131 from GiGiDKR/feat-local_support_with_UTF-8_encoding-update
feat: local support with utf 8 encoding
2025-06-27 08:02:14 -07:00
Fahad
090931d7cf Fixed linebreaks
Cleanup
Pass excluded fields to the schema builder directly
2025-06-27 14:29:10 +04:00
OhMyApps
453f921df6 Merge branch 'main' into feat-dockerisation 2025-06-25 18:10:26 +02:00
OhMyApps
8ff8e06bf9 refactor: Update environment and Docker configuration files; remove unused MCP configuration tests 2025-06-25 17:42:58 +02:00
OhMyApps
b181f051ac Merge pull request #2 from GiGiDKR/feat-dockerisation
Feat: Add comprehensive Docker support and documentation for Zen MCP Server
2025-06-25 16:38:52 +02:00
OhMyApps
e4c2b36cb3 refactor: Remove unused mcp.json configuration test from Docker tests 2025-06-25 16:32:33 +02:00
OhMyApps
ec49c8f0c7 refactor: Delete unused validation script test 2025-06-25 16:11:42 +02:00