Commit Graph

949 Commits

Author SHA1 Message Date
Josh Vera
91605bbd98 feat: Implement code review improvements from gemini-2.5-pro analysis
Key improvements:
• Added public reset_for_testing() method to registry for clean test state management
• Updated test setup/teardown to use new public API instead of private attributes
• Enhanced inject_transport helper to ensure OpenAI provider registration
• Migrated additional test files to use inject_transport pattern
• Reduced code duplication by ~30 lines across test files

🔧 Technical details:
• transport_helpers.py: Always register OpenAI provider for transport tests
• test_o3_pro_output_text_fix.py: Use reset_for_testing() API, remove redundant registration
• test_o3_pro_fixture_bisect.py: Migrate all 4 test methods to inject_transport
• test_o3_pro_simplified.py: Migrate both test methods to inject_transport
• providers/registry.py: Add reset_for_testing() public method

Quality assurance:
• All 7 o3-pro tests pass with new helper pattern
• No regression in test isolation or provider state management
• Improved maintainability through centralized transport injection
• Follows single responsibility principle with focused helper function

🤖 Generated with [Claude Code](https://claude.ai/code)

Co-Authored-By: Claude <noreply@anthropic.com>
2025-07-13 09:53:49 -06:00
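The public reset hook described in 91605bbd98 could look roughly like the sketch below. Only the name reset_for_testing() comes from the commit message; the ProviderRegistry class, its singleton layout, and the register_provider()/get_provider() methods are hypothetical stand-ins.

```python
class ProviderRegistry:
    """Hypothetical singleton registry; only reset_for_testing() is named in the commit."""

    _instance = None

    def __new__(cls):
        # Create the shared instance lazily on first use.
        if cls._instance is None:
            cls._instance = super().__new__(cls)
            cls._instance._providers = {}
        return cls._instance

    def register_provider(self, name, provider_cls):
        # Assumed helper: map a provider name to its class.
        self._providers[name] = provider_cls

    def get_provider(self, name):
        # Assumed helper: return the registered class, or None if absent.
        return self._providers.get(name)

    @classmethod
    def reset_for_testing(cls):
        """Drop the singleton so the next test starts from an empty registry."""
        cls._instance = None
```

A test's setup and teardown can then call ProviderRegistry.reset_for_testing() instead of reaching into private attributes.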
Josh Vera
17b97751ab refactor: Simplify o3-pro test by removing fixture and monkey patching boilerplate
- Remove over-engineered allow_all_models fixture (6 operations → 1 line API key setting)
- Replace 10 lines of monkey patching boilerplate with 1-line inject_transport helper
- Remove cargo-cult error handling that allowed test to pass with API failures
- Create reusable transport_helpers.py for HTTP transport injection patterns
- Fix provider registration state pollution between batch test runs
- Test now works reliably in both individual and batch execution modes

The test is significantly cleaner and addresses root cause (provider registration timing)
rather than symptoms (cache clearing).

🤖 Generated with [Claude Code](https://claude.ai/code)

Co-Authored-By: Claude <noreply@anthropic.com>
2025-07-13 08:12:20 -06:00
Josh Vera
83e8b67234 test: Enhance o3-pro test to verify model metadata and response parsing
- Add verification that o3-pro model was actually used (not just requested)
- Verify model_used and provider_used metadata fields are populated
- Add graceful handling for error responses in test
- Improve test documentation explaining what's being verified
- Confirm response parsing uses output_text field correctly

This ensures the test properly validates both that:
1. The o3-pro model was selected and used via the /v1/responses endpoint
2. The response metadata correctly identifies the model and provider

🤖 Generated with [Claude Code](https://claude.ai/code)

Co-Authored-By: Claude <noreply@anthropic.com>
2025-07-13 06:09:31 -06:00
Nate Parsons
48cff76c99 Address PR #192 review comments
- Fix TOCTOU race condition by removing os.path.exists() check before file open
- Move imports (base64, binascii, os, utils.file_types) to top of file
- Replace broad Exception catch with specific binascii.Error for base64 decoding
- Maintain proper error handling and test compatibility
2025-07-12 22:13:03 -07:00
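The two review fixes in 48cff76c99 (no os.path.exists() check before opening, and a specific binascii.Error catch) follow a common pattern that can be sketched as below. The function names and the choice of ValueError are assumptions for illustration, not the project's actual code.

```python
import base64
import binascii


def read_image_bytes(path):
    """Open the file directly instead of checking os.path.exists() first.

    A separate existence check leaves a window in which the file can vanish
    or be replaced before open() runs; handling the exception avoids it.
    """
    try:
        with open(path, "rb") as handle:
            return handle.read()
    except FileNotFoundError as exc:
        raise ValueError(f"Image file not found: {path}") from exc


def decode_base64_payload(payload):
    """Decode base64, catching only the decoder's own error type."""
    try:
        return base64.b64decode(payload, validate=True)
    except binascii.Error as exc:
        raise ValueError("Invalid base64 image data") from exc
```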
Nate Parsons
70d6cf8b54 refactor: Extract image validation to provider base class
Consolidates duplicated image validation logic from individual providers
into a reusable base class method. This improves maintainability and
ensures consistent validation across all providers.

- Added validate_image() method to ModelProvider base class
- Supports both file paths and data URLs
- Validates image format, size, and MIME types
- Added DEFAULT_MAX_IMAGE_SIZE_MB class constant (20MB)
- Refactored Gemini and OpenAI providers to use base validation
- Added comprehensive test suite with 19 tests
- Used minimal mocking approach with concrete test provider class
2025-07-12 21:51:24 -07:00
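A minimal sketch of what a shared validate_image() on the base class might look like, as described in 70d6cf8b54. The names ModelProvider, validate_image(), and DEFAULT_MAX_IMAGE_SIZE_MB (20 MB) come from the commit message. The method signature, the return value, the SUPPORTED_MIME_TYPES allow-list, and the use of ValueError are assumptions.

```python
import base64
import binascii
import mimetypes


class ModelProvider:
    DEFAULT_MAX_IMAGE_SIZE_MB = 20  # value stated in the commit message

    # Assumed allow-list; the commit does not enumerate the accepted types.
    SUPPORTED_MIME_TYPES = {"image/png", "image/jpeg", "image/gif", "image/webp"}

    def validate_image(self, image, max_size_mb=None):
        """Return (raw_bytes, mime_type) or raise ValueError. Signature is assumed."""
        limit_mb = max_size_mb if max_size_mb is not None else self.DEFAULT_MAX_IMAGE_SIZE_MB

        if image.startswith("data:"):
            # Data URL form: data:<mime>;base64,<payload>
            header, sep, payload = image.partition(",")
            if not sep or not header.endswith(";base64"):
                raise ValueError("Unsupported data URL: expected base64 encoding")
            mime_type = header[len("data:"):-len(";base64")]
            try:
                data = base64.b64decode(payload, validate=True)
            except binascii.Error as exc:
                raise ValueError("Invalid base64 data in data URL") from exc
        else:
            # File path form: guess the MIME type from the extension.
            mime_type, _ = mimetypes.guess_type(image)
            try:
                with open(image, "rb") as handle:
                    data = handle.read()
            except FileNotFoundError as exc:
                raise ValueError(f"Image file not found: {image}") from exc

        if mime_type not in self.SUPPORTED_MIME_TYPES:
            raise ValueError(f"Unsupported image type: {mime_type}")
        if len(data) > limit_mb * 1024 * 1024:
            raise ValueError(f"Image exceeds {limit_mb} MB limit")
        return data, mime_type
```

Concrete providers would inherit the method and pass their own max_size_mb only where their limit differs from the default.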
Nate Parsons
2c979058e5 feat: Continue Python environment setup even without API keys
Modified run-server.sh to allow developers to set up the Python development
environment without having API keys configured. This enables:

- Developers to clone and immediately start working on the codebase
- Running tests that don't require API calls
- Browsing and understanding the code structure
- Adding API keys later when ready to test MCP server functionality

Changes:
- Added new check_api_keys() function that warns but doesn't exit
- Changed print_error to print_warning for missing keys
- Updated main() to use check_api_keys instead of validate_api_keys || exit 1
- Kept original validate_api_keys() for backward compatibility

The script now shows a warning when API keys are missing but continues
with the full Python environment setup, dependencies installation, and
Claude configuration.
2025-07-12 20:32:33 -07:00
Josh Vera
3db49413ff fix: Resolve o3-pro response parsing and test execution issues
- Fix lint errors: trailing whitespace and deprecated typing imports
- Update test mock for o3-pro response format (output.content[] → output_text)
- Implement robust test isolation with monkeypatch fixture
- Clear provider registry cache to prevent test interference
- Ensure o3-pro tests pass in both individual and full suite execution

🤖 Generated with [Claude Code](https://claude.ai/code)

Co-Authored-By: Claude <noreply@anthropic.com>
2025-07-12 20:24:34 -06:00
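The format change noted in 3db49413ff (output.content[] → output_text) suggests a parser that prefers the flat field. The sketch below is illustrative only: the function name is hypothetical, and the nested fallback shape is read literally from the commit's shorthand, so it may not match the real API payload.

```python
def extract_output_text(response):
    """Return the response text, preferring the flat output_text field."""
    text = response.get("output_text")
    if isinstance(text, str):
        return text

    # Assumed legacy shape, taken literally from "output.content[]".
    output = response.get("output")
    if not isinstance(output, dict):
        return ""
    parts = output.get("content") or []
    return "".join(p.get("text", "") for p in parts if isinstance(p, dict))
```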
Josh Vera
ae5e43b792 test: Add o3-pro test cassette and remove unused cassette
- Add o3_pro_basic_math.json cassette for test_o3_pro_output_text_fix.py
- Remove unused o3_pro_content_capture.json cassette
- This allows tests to run without API keys in CI/CD

🤖 Generated with [Claude Code](https://claude.ai/code)

Co-Authored-By: Claude <noreply@anthropic.com>
2025-07-12 19:45:49 -06:00
Josh Vera
8eef4b6722 refactor: Simplify PIISanitizer class by 27%
- Consolidate patterns: GitHub tokens (3→1), phone numbers (2→1)
- Remove duplicate Bearer token patterns (saved 18 lines)
- Simplify sanitize_headers method (30→15 lines)
- Remove unnecessary base64 handling methods
- Clean up unused imports (base64, json, Tuple)
- Reduce total patterns from 24 to 14
- All tests pass, functionality preserved

🤖 Generated with [Claude Code](https://claude.ai/code)

Co-Authored-By: Claude <noreply@anthropic.com>
2025-07-12 19:40:04 -06:00
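The pattern consolidation described in 8eef4b6722 can be illustrated with a small sketch. None of the patterns, replacement strings, or header names below are taken from the real PIISanitizer; they are assumptions chosen to show how several near-duplicate rules can collapse into one.

```python
import re

# Hypothetical consolidated rules; the real PIISanitizer patterns are not shown in the log.
PATTERNS = [
    # One pattern covering the ghp_/gho_/ghu_/ghs_/ghr_ GitHub token prefixes.
    (re.compile(r"\bgh[pousr]_[A-Za-z0-9]{36}\b"), "gh_SANITIZED"),
    # One pattern for Bearer tokens in header values or free text.
    (re.compile(r"Bearer\s+[A-Za-z0-9._\-]+"), "Bearer SANITIZED"),
]

# Assumed set of headers whose values are always replaced wholesale.
SENSITIVE_HEADERS = {"authorization", "x-api-key", "cookie"}


def sanitize_string(text):
    """Apply every pattern in turn and return the scrubbed text."""
    for pattern, replacement in PATTERNS:
        text = pattern.sub(replacement, text)
    return text


def sanitize_headers(headers):
    """Replace sensitive header values; pattern-scrub all other values."""
    cleaned = {}
    for name, value in headers.items():
        if name.lower() in SENSITIVE_HEADERS:
            cleaned[name] = "SANITIZED"
        else:
            cleaned[name] = sanitize_string(value)
    return cleaned
```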
Josh Vera
69f7a79804 chore: Remove unused test_replay.json cassette file
🤖 Generated with [Claude Code](https://claude.ai/code)

Co-Authored-By: Claude <noreply@anthropic.com>
2025-07-12 19:31:26 -06:00
Josh Vera
840b3deee4 chore: Remove simplified PR template from version control
Accidentally added it back with git add .

🤖 Generated with [Claude Code](https://claude.ai/code)

Co-Authored-By: Claude <noreply@anthropic.com>
2025-07-12 19:28:43 -06:00
Josh Vera
224d039250 chore: Remove PR template files from version control
PR templates should not be committed to the repository

🤖 Generated with [Claude Code](https://claude.ai/code)

Co-Authored-By: Claude <noreply@anthropic.com>
2025-07-12 19:27:12 -06:00
Josh Vera
a1451befd2 refactor: Clean up test files and simplify documentation
- Remove unused cassette files with incomplete recordings
- Delete broken respx test files (test_o3_pro_respx_simple.py, test_o3_pro_http_recording.py)
- Fix respx references in docstrings to mention HTTP transport recorder
- Simplify vcr-testing.md documentation (60% reduction, more task-oriented)
- Add simplified PR template with better test instructions
- Fix cassette path consistency in examples
- Add security note about reviewing cassettes before committing

🤖 Generated with [Claude Code](https://claude.ai/code)

Co-Authored-By: Claude <noreply@anthropic.com>
2025-07-12 19:24:51 -06:00
Josh Vera
7f92085c70 feat: Fix o3-pro response parsing and implement HTTP transport recorder
- Fix o3-pro response parsing to use output_text convenience field
- Replace respx with custom httpx transport solution for better reliability
- Implement comprehensive PII sanitization to prevent secret exposure
- Add HTTP request/response recording with cassette format for testing
- Sanitize all existing cassettes to remove exposed API keys
- Update documentation to reflect new HTTP transport recorder
- Add test suite for PII sanitization and HTTP recording

This change:
1. Fixes timeout issues with o3-pro API calls (was 2+ minutes, now ~15-22 seconds)
2. Properly captures response content without httpx.ResponseNotRead exceptions
3. Preserves original HTTP response format including gzip compression
4. Prevents future secret exposure with automatic PII sanitization
5. Enables reliable replay testing for o3-pro interactions

Co-Authored-By: Claude <noreply@anthropic.com>
2025-07-12 18:47:17 -06:00
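The record-and-replay idea behind the cassette format in 7f92085c70 can be sketched without any HTTP library. The commit describes a custom httpx transport; the version below is a library-agnostic stand-in, and the class name, the send(method, url, body) callable, and the JSON layout are all assumptions.

```python
import json


class CassetteTransport:
    """Record/replay wrapper around a hypothetical send(method, url, body) callable."""

    def __init__(self, cassette_path, send=None):
        self.cassette_path = cassette_path
        self.send = send  # real network call; None forces replay-only mode
        self.interactions = []
        try:
            with open(cassette_path, encoding="utf-8") as handle:
                self.interactions = json.load(handle)["interactions"]
        except FileNotFoundError:
            pass  # no cassette yet: the first request will be recorded

    def request(self, method, url, body=""):
        key = {"method": method, "url": url, "body": body}

        # Replay: return the stored response for a matching request.
        for entry in self.interactions:
            if entry["request"] == key:
                return entry["response"]["status"], entry["response"]["body"]

        if self.send is None:
            raise LookupError(f"No recorded response for {method} {url}")

        # Record: perform the real call, then persist the interaction.
        status, response_body = self.send(method, url, body)
        self.interactions.append(
            {"request": key, "response": {"status": status, "body": response_body}}
        )
        with open(self.cassette_path, "w", encoding="utf-8") as handle:
            json.dump({"interactions": self.interactions}, handle, indent=2)
        return status, response_body
```

A recorder along the lines the commit describes would also sanitize requests and responses before writing them to disk; that step is omitted here.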
Ted Slesinski
3e2dcc9c78 Update docs for grok 4 2025-07-12 09:42:29 -04:00
Raymond Lucke
0b4167bd6f Update Grok 4 max_output_tokens to 16K to match Grok 3. 2025-07-10 18:41:50 -07:00
Raymond Lucke
346410e34e Use ModelCapabilities for Grok thinking mode support check
Co-authored-by: gemini-code-assist[bot] <176961590+gemini-code-assist[bot]@users.noreply.github.com>
2025-07-10 18:21:44 -07:00
Raymond Lucke
39e2bc61b6 Tests for Grok 4. 2025-07-10 18:15:31 -07:00
Raymond Lucke
3ff6fef086 Add configuration for Grok 4. 2025-07-10 17:44:51 -07:00
OhMyApps
96ff1ea520 feat: Add Dev parameter to install development dependencies
Add the Docker parameter to build the Zen MCP server Docker image

Add environment detection (Python vs Docker) to configure the MCP server command in MCP client settings
2025-07-06 01:31:42 +02:00
OhMyApps
c5bb7fa6ce feat: Add Dev parameter to install development dependencies
Add the Docker parameter to build the Zen MCP server Docker image

Add environment detection (Python vs Docker) to configure the MCP server command in MCP client settings
2025-07-06 01:30:31 +02:00
OhMyApps
9b5d03747e fix: PR#151 - Enhance cross-platform support
- Improved error handling and path resolution in run-server.ps1 for better reliability.
- Implemented conversation tests for Docker mode compatibility in validation_crossplatform.py.
- Updated run-server.ps1 to include detailed help documentation, configuration management, and backup retention for configuration files.
- Added Docker path validation tests in validation_crossplatform.py to ensure correct path handling in Docker mode.
- Enhanced integration test script run_integration_tests.ps1 with comprehensive documentation and parameter support for output customization.
2025-07-05 14:57:27 +02:00
Andrew
fd76f8580d Update run-server.sh
Fix issue where a missing wslu installation causes "variable not set" errors.
2025-07-03 12:01:57 -05:00
Fahad
ad6b216265 Updated description 2025-06-30 13:51:10 +04:00
Fahad
e091c6d40a Updated with screenshots 2025-06-30 13:49:32 +04:00
Beehive Innovations
7d3f43aaf5 Update README.md
Screenshots with and without zen for Challenge tool
2025-06-30 13:47:38 +04:00
Fahad
a70dcbe4d1 Lint
Fixed challenge expectation
2025-06-30 13:35:40 +04:00
Fahad
268df43858 Improved auto-challenge invocation
Automatically determine MCP client's name
2025-06-30 13:31:04 +04:00
Fahad
a1793a6028 Cleanup 2025-06-30 01:41:07 +04:00
Fahad
8bd06efe1f More examples 2025-06-29 20:35:39 +04:00
Fahad
a54343dc79 Improved challenge prompt and instructions 2025-06-29 17:52:00 +04:00
Fahad
26170efa65 Typo 2025-06-29 15:55:55 +04:00
Fahad
6b495cea0b New tool! "challenge" with confidence and stop Claude from agreeing with you blindly and undoing the _correct_ strategy because you were wrong
Fixed run script to ensure pip is installed
2025-06-29 15:50:45 +04:00
Fahad
4972e7c281 Added tips 2025-06-29 14:04:28 +04:00
Fahad
4d0a41b12c Updated instructions for uvx 2025-06-29 13:12:35 +04:00
Fahad
df5e8e6793 Merge remote-tracking branch 'origin/main' 2025-06-29 13:01:17 +04:00
Beehive Innovations
c82026941a Merge pull request #150 from SamDc73/main
Add uvx support
2025-06-29 01:56:53 -07:00
Beehive Innovations
242806080b Merge pull request #151 from GiGiDKR/feat-cross-plateform_compatibility
Feat: Cross-Platform Compatibility
2025-06-29 01:53:12 -07:00
Beehive Innovations
b3d16d581f Merge pull request #142 from GiGiDKR/feat-dockerisation
feat: Add Docker support and documentation
2025-06-29 01:52:48 -07:00
OhMyApps
1793449cf8 fix: enhance Windows path validation and detection in base_tool.py 2025-06-29 04:50:51 +02:00
OhMyApps
479f556535 Merge branch 'BeehiveInnovations:main' into feat-dockerisation 2025-06-29 02:07:06 +02:00
OhMyApps
6ae3892d85 Merge branch 'BeehiveInnovations:main' into feat-cross-plateform_compatibility 2025-06-29 02:06:39 +02:00
OhMyApps
3d12a7cb70 feat: Add comprehensive tests for Docker integration, security, and volume persistence
- Introduced tests for Docker deployment scripts to ensure existence, permissions, and proper command usage.
- Added tests for Docker integration with Claude Desktop, validating MCP configuration and command formats.
- Implemented health check tests for Docker, ensuring script functionality and proper configuration in Docker setup.
- Created tests for Docker MCP validation, focusing on command validation and security configurations.
- Developed security tests for Docker configurations, checking for non-root user setups, privilege restrictions, and sensitive data handling.
- Added volume persistence tests to ensure configuration and logs are correctly managed across container runs.
- Updated .dockerignore to exclude sensitive files and added relevant tests for Docker secrets handling.
2025-06-29 00:01:35 +02:00
OhMyApps
fd2b14028a feat: add timezone synchronization volume and update deployment scripts for health checks
Update the Docker README and create a Docker Deployment guide in the docs folder
2025-06-29 00:00:47 +02:00
Fahad
26169ae827 Disable auto mode for consensus 2025-06-28 22:40:19 +04:00
Fahad
b9c2e4f5e6 Tweaks to prompts to prevent Claude from becoming overconfident 2025-06-28 22:30:58 +04:00
OhMyApps
180a350f6d fix: improve the Python path detection in mock tests and ensure logger initialization order is correct.
Fix run-server.ps1 to handle PowerShell script creation correctly and ensure pip is installed in the uv environment.
2025-06-27 23:58:13 +02:00
Fahad
adbc4af4a9 Update confidence enum values across workflow tools
Added new confidence values (very_high, almost_certain) to all workflow tools
to provide more granular confidence tracking. Updated enum declarations in:
- analyze.py, codereview.py, debug.py, precommit.py, secaudit.py, testgen.py
- Updated debug.py's get_required_actions to handle new confidence values
- All tools now use consistent 7-value confidence scale
- refactor.py kept its unique scale (exploring/incomplete/partial/complete)

Also fixed model thinking configuration:
- Added very_high and almost_certain to MODEL_THINKING_PREFERENCES
- Set medium thinking for very_high, high thinking for almost_certain
- Updated prompts to clarify that "certain" means 100% local confidence

🤖 Generated with [Claude Code](https://claude.ai/code)

Co-Authored-By: Claude <noreply@anthropic.com>
2025-06-28 00:04:35 +04:00
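The seven-value scale from adbc4af4a9 could be modeled as an enum, as in the sketch below. The commit names only very_high, almost_certain, and certain, plus the two thinking-mode mappings. The class name Confidence and the four lower levels are assumptions, and the MODEL_THINKING_PREFERENCES dictionary here holds only the two entries the commit states.

```python
from enum import Enum


class Confidence(str, Enum):
    # Assumed pre-existing levels (not listed in the commit message).
    EXPLORING = "exploring"
    LOW = "low"
    MEDIUM = "medium"
    HIGH = "high"
    # Added by this commit.
    VERY_HIGH = "very_high"
    ALMOST_CERTAIN = "almost_certain"
    # Described in the commit as meaning 100% local confidence.
    CERTAIN = "certain"


# Only the two mappings stated in the commit message; other entries are omitted.
MODEL_THINKING_PREFERENCES = {
    Confidence.VERY_HIGH: "medium",
    Confidence.ALMOST_CERTAIN: "high",
}
```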
OhMyApps
c2c8d3de1e docs: Add PowerShell commands reference to README.md 2025-06-27 21:55:53 +02:00
Fahad
bc447d4bcd Generic naming to work with Gemini CLI / Claude Code 2025-06-27 23:41:20 +04:00