refactor: Clean up test files and simplify documentation
- Remove unused cassette files with incomplete recordings
- Delete broken respx test files (test_o3_pro_respx_simple.py, test_o3_pro_http_recording.py)
- Fix respx references in docstrings to mention HTTP transport recorder
- Simplify vcr-testing.md documentation (60% reduction, more task-oriented)
- Add simplified PR template with better test instructions
- Fix cassette path consistency in examples
- Add security note about reviewing cassettes before committing

🤖 Generated with [Claude Code](https://claude.ai/code)

Co-Authored-By: Claude <noreply@anthropic.com>
docs/vcr-testing.md
@@ -1,216 +1,87 @@
-# HTTP Recording/Replay Testing with HTTP Transport Recorder
+# HTTP Transport Recorder for Testing

-This project uses a custom HTTP Transport Recorder for testing expensive API integrations (like o3-pro) with real recorded responses.
+A custom HTTP recorder for testing expensive API calls (like o3-pro) with real responses.

-## What is HTTP Transport Recorder?
+## Overview

-The HTTP Transport Recorder is a custom httpx transport implementation that intercepts HTTP requests/responses at the transport layer. This approach provides:
+The HTTP Transport Recorder captures and replays HTTP interactions at the transport layer, enabling:
+
+- Cost-efficient testing of expensive APIs (record once, replay forever)
+- Deterministic tests with real API responses
+- Seamless integration with httpx and OpenAI SDK

-- **Real API structure**: Tests use actual API responses, not guessed mocks
-- **Cost efficiency**: Only pay for API calls once during recording
-- **Deterministic tests**: Same response every time, no API variability
-- **Transport-level interception**: Works seamlessly with httpx and OpenAI SDK
-- **Full response capture**: Captures complete HTTP responses including headers and gzipped content
-
-## Directory Structure
-
-```
-tests/
-├── openai_cassettes/            # Recorded HTTP interactions
-│   ├── o3_pro_basic_math.json
-│   └── o3_pro_content_capture.json
-├── http_transport_recorder.py   # Transport recorder implementation
-├── test_content_capture.py      # Example recording test
-└── test_replay.py               # Example replay test
-```
-
-## Key Components
-
-### RecordingTransport
-- Wraps httpx's default transport
-- Makes real HTTP calls and captures responses
-- Handles gzip compression/decompression properly
-- Saves interactions to JSON cassettes
-
-### ReplayTransport
-- Serves saved responses from cassettes
-- No real HTTP calls made
-- Matches requests by method, path, and content hash
-- Re-applies gzip compression when needed
-
-### TransportFactory
-- Auto-selects record vs replay mode based on cassette existence
-- Simplifies test setup
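The deterministic replay matching described above (method + path + content hash) can be pictured with a small key function. This is a hedged sketch only; the exact hash and key layout used in `tests/http_transport_recorder.py` are not shown here, and the names below are assumptions:

```python
# Illustrative only: one way to build the lookup key the recorder is
# described as using (method + path + hash of the request body).
import hashlib

import httpx


def request_key(request: httpx.Request) -> str:
    """Build a stable key for matching a live request to a recorded one."""
    body = request.read()  # request body as bytes; empty for bodiless requests
    return f"{request.method} {request.url.path} {hashlib.sha256(body).hexdigest()}"


if __name__ == "__main__":
    req = httpx.Request(
        "POST",
        "https://api.openai.com/v1/responses",
        json={"model": "o3-pro", "input": "What is 2+2?"},
    )
    print(request_key(req))
```

Any change to the request body yields a different key, which is what keeps replay lookups deterministic.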
-## Workflow
-
-### 1. Use Transport Recorder in Tests
-
-```python
-from tests.http_transport_recorder import TransportFactory
-
-# Create transport based on cassette existence
-cassette_path = "tests/openai_cassettes/my_test.json"
-transport = TransportFactory.create_transport(cassette_path)
-
-# Inject into OpenAI provider
-provider = ModelProviderRegistry.get_provider_for_model("o3-pro")
-provider._test_transport = transport
-
-# Make API calls - will be recorded/replayed automatically
-```
-
-### 2. Initial Recording (Expensive)
-
-```bash
-# With real API key, cassette doesn't exist -> records
-python test_content_capture.py
-
-# ⚠️ This will cost money! O3-Pro is $15-60 per 1K tokens
-# But only needs to be done once
-```
-
-### 3. Subsequent Runs (Free)
-
-```bash
-# Cassette exists -> replays
-python test_replay.py
-
-# Can even use fake API key to prove no real calls
-OPENAI_API_KEY="sk-fake-key" python test_replay.py
-
-# Fast, free, deterministic
-```
-
-### 4. Re-recording (When API Changes)
-
-```bash
-# Delete cassette to force re-recording
-rm tests/openai_cassettes/my_test.json
-
-# Run test again with real API key
-python test_content_capture.py
-```
+## Quick Start
+
+```python
+from tests.http_transport_recorder import TransportFactory
+from providers import ModelProviderRegistry
+
+# Setup transport recorder
+cassette_path = "tests/openai_cassettes/my_test.json"
+transport = TransportFactory.create_transport(cassette_path)
+
+# Inject into provider
+provider = ModelProviderRegistry.get_provider_for_model("o3-pro")
+provider._test_transport = transport
+
+# Make API calls - automatically recorded/replayed
+```

 ## How It Works

-1. **Transport Injection**: Custom transport injected into httpx client
-2. **Request Interception**: All HTTP requests go through custom transport
-3. **Mode Detection**: Checks if cassette exists (replay) or needs creation (record)
-4. **Content Capture**: Properly handles streaming responses and gzip encoding
-5. **Request Matching**: Uses method + path + content hash for deterministic matching
+1. **First run** (cassette doesn't exist): Records real API calls
+2. **Subsequent runs** (cassette exists): Replays saved responses
+3. **Re-record**: Delete cassette file and run again
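To make the record/replay decision above concrete, here is a deliberately simplified, hypothetical sketch of a replay-only transport plus the cassette-existence check. It uses a flattened cassette layout and matches on method and path only; the project's real `ReplayTransport`, `RecordingTransport`, and `TransportFactory` in `tests/http_transport_recorder.py` also hash the request body, handle gzip, and sanitize secrets:

```python
# Simplified illustration of replay mode; not the project's implementation.
import json
from pathlib import Path

import httpx


class SketchReplayTransport(httpx.BaseTransport):
    """Serve recorded responses without touching the network."""

    def __init__(self, cassette_path: Path) -> None:
        self._interactions = json.loads(cassette_path.read_text())["interactions"]

    def handle_request(self, request: httpx.Request) -> httpx.Response:
        for item in self._interactions:
            recorded = item["request"]
            # The real recorder also compares a hash of the request body.
            if recorded["method"] == request.method and recorded["path"] == request.url.path:
                saved = item["response"]
                # This sketch stores decoded bodies, so encoding headers are dropped.
                headers = {
                    k: v
                    for k, v in saved.get("headers", {}).items()
                    if k.lower() not in ("content-encoding", "content-length")
                }
                return httpx.Response(
                    status_code=saved["status_code"],
                    headers=headers,
                    content=saved.get("body", "").encode(),
                    request=request,
                )
        raise RuntimeError(f"No recorded interaction for {request.method} {request.url.path}")


def create_transport(cassette_path: str) -> httpx.BaseTransport:
    """Replay when the cassette exists; otherwise fall through to real HTTP."""
    path = Path(cassette_path)
    if path.exists():
        return SketchReplayTransport(path)
    # Recording mode would wrap httpx.HTTPTransport() and persist each interaction.
    return httpx.HTTPTransport()
```

A client built as `httpx.Client(transport=create_transport(...))` behaves the same whether it replays or hits the network; in this project the transport is injected through `provider._test_transport` instead, as shown in Quick Start.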
-## Cassette Format
-
-```json
-{
-  "interactions": [
-    {
-      "request": {
-        "method": "POST",
-        "url": "https://api.openai.com/v1/responses",
-        "path": "/v1/responses",
-        "headers": {
-          "content-type": "application/json",
-          "accept-encoding": "gzip, deflate"
-        },
-        "content": {
-          "model": "o3-pro-2025-06-10",
-          "input": [...],
-          "reasoning": {"effort": "medium"}
-        }
-      },
-      "response": {
-        "status_code": 200,
-        "headers": {
-          "content-type": "application/json",
-          "content-encoding": "gzip"
-        },
-        "content": {
-          "data": "base64_encoded_response_body",
-          "encoding": "base64",
-          "size": 1413
-        },
-        "reason_phrase": "OK"
-      }
-    }
-  ]
-}
-```
-
-Key features:
-- Complete request/response capture
-- Base64 encoding for binary content
-- Preserves gzip compression
-- Sanitizes sensitive data (API keys removed)
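Working from the example cassette above (field names as shown there; treat them as illustrative rather than a guaranteed schema), a recorded response body can be decoded back into JSON roughly like this:

```python
# Hedged sketch: decode one recorded response body (base64, possibly gzipped).
import base64
import gzip
import json
from pathlib import Path


def decode_response_body(interaction: dict) -> dict:
    response = interaction["response"]
    content = response["content"]
    if content.get("encoding") == "base64":
        raw = base64.b64decode(content["data"])
    else:
        raw = str(content["data"]).encode()
    headers = {k.lower(): v for k, v in response.get("headers", {}).items()}
    if headers.get("content-encoding") == "gzip":
        raw = gzip.decompress(raw)
    return json.loads(raw)


if __name__ == "__main__":
    cassette = json.loads(Path("tests/openai_cassettes/my_test.json").read_text())
    print(sorted(decode_response_body(cassette["interactions"][0])))
```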
-## Benefits Over Previous Approaches
-
-1. **Works with any HTTP client**: Not tied to OpenAI SDK specifically
-2. **Handles compression**: Properly manages gzipped responses
-3. **Full HTTP fidelity**: Captures headers, status codes, etc.
-4. **Simpler than VCR.py**: No sync/async conflicts or monkey patching
-5. **Better than respx**: No streaming response issues
-
-## Example Test
-
-```python
-#!/usr/bin/env python3
-import asyncio
-from pathlib import Path
-from tests.http_transport_recorder import TransportFactory
-from providers import ModelProviderRegistry
-from tools.chat import ChatTool
-
-async def test_with_recording():
-    cassette_path = "tests/openai_cassettes/test_example.json"
-    # Setup transport
-    transport = TransportFactory.create_transport(cassette_path)
-    provider = ModelProviderRegistry.get_provider_for_model("o3-pro")
-    provider._test_transport = transport
-
-    # Use ChatTool normally
-    chat_tool = ChatTool()
-    result = await chat_tool.execute({
-        "prompt": "What is 2+2?",
-        "model": "o3-pro",
-        "temperature": 1.0
-    })
-
-    print(f"Response: {result[0].text}")
-
-if __name__ == "__main__":
-    asyncio.run(test_with_recording())
-```
+## Usage in Tests
+
+See `test_o3_pro_output_text_fix.py` for a complete example:
+
+```python
+async def test_with_recording():
+    # Transport factory auto-detects record vs replay mode
+    transport = TransportFactory.create_transport("tests/openai_cassettes/my_test.json")
+    provider._test_transport = transport
+
+    # Use normally - recording happens transparently
+    result = await chat_tool.execute({"prompt": "2+2?", "model": "o3-pro"})
+```
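The recording side described earlier (wrap httpx's default transport, make the real call, capture the full response, save it to the cassette) can be sketched as below. This is an assumption-laden illustration rather than the project's `RecordingTransport`: it stores decoded bodies and drops encoding headers, whereas the real recorder is described as re-applying gzip and sanitizing API keys:

```python
# Rough sketch of recording at the transport layer; not the real implementation.
import base64
import json
from pathlib import Path

import httpx


class SketchRecordingTransport(httpx.BaseTransport):
    """Make the real call through the default transport and save what came back."""

    def __init__(self, cassette_path: str) -> None:
        self._path = Path(cassette_path)
        self._inner = httpx.HTTPTransport()
        self._interactions = []

    def handle_request(self, request: httpx.Request) -> httpx.Response:
        response = self._inner.handle_request(request)  # real network call
        body = response.read()  # decoded body bytes (httpx has already un-gzipped it)
        headers = {
            k: v
            for k, v in response.headers.items()
            if k.lower() not in ("content-encoding", "content-length")
        }
        self._interactions.append({
            "request": {"method": request.method, "path": request.url.path},
            "response": {
                "status_code": response.status_code,
                "headers": headers,
                "content": {"data": base64.b64encode(body).decode(), "encoding": "base64"},
            },
        })
        self._path.parent.mkdir(parents=True, exist_ok=True)
        self._path.write_text(json.dumps({"interactions": self._interactions}, indent=2))
        return httpx.Response(response.status_code, headers=headers, content=body, request=request)
```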
-## Timeout Protection
-
-Tests can use GNU timeout to prevent hanging:
-
-```bash
-# Install GNU coreutils if needed
-brew install coreutils
-
-# Run with 30 second timeout
-gtimeout 30s python test_content_capture.py
-```
-
-## CI/CD Integration
-
-```yaml
-# In CI, tests use existing cassettes (no API keys needed)
-- name: Run OpenAI tests
-  run: |
-    # Tests will use replay mode with existing cassettes
-    python -m pytest tests/test_o3_pro.py
-```
+## File Structure
+
+```
+tests/
+├── openai_cassettes/                  # Recorded API interactions
+│   └── *.json                         # Cassette files
+├── http_transport_recorder.py         # Transport implementation
+└── test_o3_pro_output_text_fix.py     # Example usage
+```
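Because replay needs no API keys, the cassette-backed setup is convenient to wrap in a fixture. A hypothetical pytest fixture along the lines of the Quick Start snippet (the fixture name and cassette path are placeholders, not part of the project):

```python
# Hypothetical fixture mirroring the manual setup shown earlier in this doc.
import pytest

from providers import ModelProviderRegistry
from tests.http_transport_recorder import TransportFactory


@pytest.fixture
def o3_pro_provider():
    """o3-pro provider wired through a cassette-backed transport."""
    transport = TransportFactory.create_transport("tests/openai_cassettes/my_test.json")
    provider = ModelProviderRegistry.get_provider_for_model("o3-pro")
    provider._test_transport = transport
    return provider
```

Tests that accept this fixture can then exercise ChatTool exactly as in the example above and run in CI against committed cassettes.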
 ## Cost Management

-- **One-time cost**: Initial recording per test scenario
+- **One-time cost**: Initial recording only
 - **Zero ongoing cost**: Replays are free
-- **Controlled re-recording**: Manual cassette deletion required
-- **CI-friendly**: No accidental API calls in automation
+- **CI-friendly**: No API keys needed for replay
+
+## Re-recording
+
+When API changes require new recordings:
+
+```bash
+# Delete specific cassette
+rm tests/openai_cassettes/my_test.json
+
+# Run test with real API key
+python -m pytest tests/test_o3_pro_output_text_fix.py
+```
+
+## Implementation Details
+
+- **RecordingTransport**: Captures real HTTP calls
+- **ReplayTransport**: Serves saved responses
+- **TransportFactory**: Auto-selects mode based on cassette existence
+- **PII Sanitization**: Automatically removes API keys from recordings
+
+**Security Note**: Always review new cassette files before committing to ensure no sensitive data is included.
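The PII sanitization mentioned above amounts to redacting secret-bearing values before a cassette is written; a minimal sketch (the header list is an assumption, and the project's sanitizer is more thorough):

```python
# Illustrative header redaction; not the project's actual PII sanitizer.
SENSITIVE_HEADERS = {"authorization", "x-api-key", "openai-organization"}


def sanitize_headers(headers: dict) -> dict:
    """Replace secret-bearing header values before they reach a cassette file."""
    return {
        name: ("REDACTED" if name.lower() in SENSITIVE_HEADERS else value)
        for name, value in headers.items()
    }


assert sanitize_headers({"Authorization": "Bearer sk-real-key"}) == {"Authorization": "REDACTED"}
```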
+For implementation details, see `tests/http_transport_recorder.py`.

-This HTTP transport recorder approach provides accurate API testing with cost efficiency, specifically optimized for expensive endpoints like o3-pro while being flexible enough for any HTTP-based API.
pr_template_filled.md (new file)
@@ -0,0 +1,99 @@
## PR Title Format

**fix: Fix o3-pro empty response issue by using output_text field**

## Description

This PR fixes a critical bug where o3-pro API calls were returning empty responses. The root cause was incorrect response parsing - the code was trying to manually parse `response.output.content[]` array structure, but o3-pro provides a simpler `output_text` convenience field directly on the response object. This PR also introduces a secure HTTP recording system for testing expensive o3-pro calls.

## Changes Made

- [x] Fixed o3-pro response parsing by using the `output_text` convenience field instead of manual parsing
- [x] Added `_safe_extract_output_text` method with proper validation to handle o3-pro's response format
- [x] Implemented custom HTTP transport recorder to replace respx for more reliable test recordings
- [x] Added comprehensive PII sanitization to prevent accidental API key exposure in test cassettes
- [x] Sanitized all existing test cassettes to remove any exposed secrets
- [x] Updated documentation for the new testing infrastructure
- [x] Added test suite to validate the fix and ensure PII sanitization works correctly

**No breaking changes** - The fix only affects o3-pro model parsing internally.

**Dependencies added:**
- None (uses existing httpx and standard library modules)

## Testing

### Run all linting and tests (required):

```bash
# Activate virtual environment first
source venv/bin/activate

# Run comprehensive code quality checks (recommended)
./code_quality_checks.sh

# If you made tool changes, also run simulator tests
python communication_simulator_test.py
```

- [x] All linting passes (ruff, black, isort)
- [x] All unit tests pass
- [x] **For bug fixes**: Tests added to prevent regression
  - `test_o3_pro_output_text_fix.py` - Validates o3-pro response parsing works correctly
  - `test_o3_pro_http_recording.py` - Tests HTTP recording functionality
  - `test_pii_sanitizer.py` - Ensures PII sanitization works properly
- [x] Manual testing completed with realistic scenarios
  - Verified o3-pro calls return actual content instead of empty responses
  - Validated that recorded cassettes contain no exposed API keys

## Related Issues

Fixes o3-pro API calls returning empty responses on master branch.

## Checklist

- [x] PR title follows the format guidelines above
- [x] **Activated venv and ran code quality checks: `source venv/bin/activate && ./code_quality_checks.sh`**
- [x] Self-review completed
- [x] **Tests added for ALL changes** (see Testing section above)
- [x] Documentation updated as needed
  - Updated `docs/testing.md` with new testing approach
  - Added `docs/vcr-testing.md` for HTTP recording documentation
- [x] All unit tests passing
- [x] Ready for review

## Additional Notes

### The Bug:

On master branch, o3-pro API calls were returning empty responses because the code was trying to parse the response incorrectly:

```python
# Master branch - incorrect parsing
if hasattr(response.output, "content") and response.output.content:
    for content_item in response.output.content:
        if hasattr(content_item, "type") and content_item.type == "output_text":
            content = content_item.text
            break
```

The o3-pro response object actually provides an `output_text` convenience field directly:

```python
# Fixed version - correct parsing
content = response.output_text
```

### The Fix:

1. Added `_safe_extract_output_text` method that properly validates and extracts the `output_text` field
2. Updated the response parsing logic in `_generate_with_responses_endpoint` to use this new method
3. Added proper error handling and validation to catch future response format issues
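For orientation, a helper along the lines described in these fix notes might validate the field before returning it. This is a hedged sketch; the actual `_safe_extract_output_text` in `providers/openai_compatible.py` may differ in detail:

```python
# Sketch of the kind of validation described above; not the real method.
def _safe_extract_output_text(response) -> str:
    """Return response.output_text, failing loudly if it is missing or empty."""
    if not hasattr(response, "output_text"):
        raise ValueError(f"o3-pro response has no output_text field: {type(response)}")
    text = response.output_text
    if not isinstance(text, str):
        raise ValueError(f"output_text is not a string: {type(text)}")
    if not text.strip():
        raise ValueError("o3-pro returned an empty output_text")
    return text
```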
### Additional Improvements:

- **Testing Infrastructure**: Implemented HTTP transport recorder to enable testing without repeated expensive API calls
- **Security**: Added automatic PII sanitization to prevent API keys from being accidentally committed in test recordings

### Development Notes:

- During development, we encountered timeout issues with the initial respx-based approach which led to implementing the custom HTTP transport recorder
- The transport recorder solution properly handles streaming responses and gzip compression

### For Reviewers:

- The core fix is in `providers/openai_compatible.py` lines 307-335 and line 396
- The HTTP transport recorder is test infrastructure only and doesn't affect production code
- All test cassettes have been sanitized and verified to contain no secrets
pr_template_filled_simplified.md (new file)
@@ -0,0 +1,59 @@
## PR Title

**fix: Fix o3-pro empty response issue by using output_text field**

## Summary

Fixes o3-pro API calls returning empty responses due to incorrect response parsing. The code was trying to parse `response.output.content[]` array, but o3-pro provides `output_text` directly.

## Changes

- Fixed o3-pro response parsing to use `output_text` field
- Added `_safe_extract_output_text` method with validation
- Implemented HTTP transport recorder for testing expensive API calls
- Added PII sanitization for test recordings
- Added regression tests

**No breaking changes** - Internal fix only

## Testing

```bash
source venv/bin/activate
./code_quality_checks.sh

# Run the new tests added in this PR
python -m pytest tests/test_o3_pro_output_text_fix.py -v
python -m pytest tests/test_pii_sanitizer.py -v

# Or run all new tests together
python -m pytest tests/test_o3_pro_output_text_fix.py tests/test_pii_sanitizer.py -v
```

- [x] All checks pass
- [x] Regression tests added:
  - `test_o3_pro_output_text_fix.py` - Validates o3-pro response parsing and HTTP transport recording
  - `test_pii_sanitizer.py` - Ensures API key sanitization

## Code Example

**Before:**
```python
# Incorrect - manual parsing
for content_item in response.output.content:
    if content_item.type == "output_text":
        content = content_item.text
```

**After:**
```python
# Correct - direct field access
content = response.output_text
```

## For Reviewers

- Core fix: `providers/openai_compatible.py` - see `_safe_extract_output_text()` method
- Response parsing: `_generate_with_responses_endpoint()` method now uses the direct field
- Test infrastructure changes don't affect production code
- All test recordings sanitized for security
File diff suppressed because one or more lines are too long
test_o3_pro_http_recording.py (deleted)
@@ -1,104 +0,0 @@
"""
Tests for o3-pro output_text parsing fix using HTTP-level recording via respx.

This test validates the fix using real OpenAI SDK objects by recording/replaying
HTTP responses instead of creating mock objects.
"""

import os
import unittest
from pathlib import Path

import pytest
from dotenv import load_dotenv

from tests.test_helpers.http_recorder import HTTPRecorder
from tools.chat import ChatTool

# Load environment variables from .env file
load_dotenv()

# Use absolute path for cassette directory
cassette_dir = Path(__file__).parent / "http_cassettes"
cassette_dir.mkdir(exist_ok=True)


@pytest.mark.no_mock_provider  # Disable provider mocking for this test
class TestO3ProHTTPRecording(unittest.IsolatedAsyncioTestCase):
    """Test o3-pro response parsing using HTTP-level recording with real SDK objects."""

    async def test_o3_pro_real_sdk_objects(self):
        """Test that o3-pro parsing works with real OpenAI SDK objects from HTTP replay."""
        # Skip if no API key available and cassette doesn't exist
        cassette_path = cassette_dir / "o3_pro_real_sdk.json"
        if not cassette_path.exists() and not os.getenv("OPENAI_API_KEY"):
            pytest.skip("Set real OPENAI_API_KEY to record HTTP cassettes")

        # Use HTTPRecorder to record/replay raw HTTP responses
        async with HTTPRecorder(str(cassette_path)):
            # Execute the chat tool test - real SDK objects will be created
            result = await self._execute_chat_tool_test()

            # Verify the response works correctly with real SDK objects
            self._verify_chat_tool_response(result)

        # Verify cassette was created in record mode
        if os.getenv("OPENAI_API_KEY") and not os.getenv("OPENAI_API_KEY").startswith("dummy"):
            self.assertTrue(cassette_path.exists(), f"HTTP cassette not created at {cassette_path}")

    async def _execute_chat_tool_test(self):
        """Execute the ChatTool with o3-pro and return the result."""
        chat_tool = ChatTool()
        arguments = {"prompt": "What is 2 + 2?", "model": "o3-pro", "temperature": 1.0}

        return await chat_tool.execute(arguments)

    def _verify_chat_tool_response(self, result):
        """Verify the ChatTool response contains expected data."""
        # Verify we got a valid response
        self.assertIsNotNone(result, "Should get response from ChatTool")

        # Parse the result content (ChatTool returns MCP TextContent format)
        self.assertIsInstance(result, list, "ChatTool should return list of content")
        self.assertTrue(len(result) > 0, "Should have at least one content item")

        # Get the text content (result is a list of TextContent objects)
        content_item = result[0]
        self.assertEqual(content_item.type, "text", "First item should be text content")

        text_content = content_item.text
        self.assertTrue(len(text_content) > 0, "Should have text content")

        # Parse the JSON response from chat tool
        import json

        try:
            response_data = json.loads(text_content)
        except json.JSONDecodeError:
            self.fail(f"Could not parse chat tool response as JSON: {text_content}")

        # Verify the response makes sense for the math question
        actual_content = response_data.get("content", "")
        self.assertIn("4", actual_content, "Should contain the answer '4'")

        # Verify metadata shows o3-pro was used
        metadata = response_data.get("metadata", {})
        self.assertEqual(metadata.get("model_used"), "o3-pro", "Should use o3-pro model")
        self.assertEqual(metadata.get("provider_used"), "openai", "Should use OpenAI provider")

        # Additional verification that the fix is working
        self.assertTrue(actual_content.strip(), "Content should not be empty")
        self.assertIsInstance(actual_content, str, "Content should be string")

        # Verify successful status
        self.assertEqual(response_data.get("status"), "continuation_available", "Should have successful status")


if __name__ == "__main__":
    print("🌐 HTTP-Level Recording Tests for O3-Pro with Real SDK Objects")
    print("=" * 60)
    print("FIRST RUN: Requires OPENAI_API_KEY - records HTTP responses (EXPENSIVE!)")
    print("SUBSEQUENT RUNS: Uses recorded HTTP responses - free and fast")
    print("RECORDING: Delete .json files in tests/http_cassettes/ to re-record")
    print()

    unittest.main()
tests/test_o3_pro_output_text_fix.py
@@ -1,10 +1,10 @@
 """
-Tests for o3-pro output_text parsing fix using respx response recording.
+Tests for o3-pro output_text parsing fix using HTTP transport recording.

 This test validates the fix that uses `response.output_text` convenience field
 instead of manually parsing `response.output.content[].text`.

-Uses respx to record real o3-pro API responses at the HTTP level while allowing
+Uses HTTP transport recorder to record real o3-pro API responses at the HTTP level while allowing
 the OpenAI SDK to create real response objects that we can test.

 RECORDING: To record new responses, delete the cassette file and run with real API keys.
test_o3_pro_respx_simple.py (deleted)
@@ -1,104 +0,0 @@
"""
Tests for o3-pro output_text parsing fix using respx for HTTP recording/replay.

This test uses respx's built-in recording capabilities to record/replay HTTP responses,
allowing the OpenAI SDK to create real response objects with all convenience methods.
"""

import os
import unittest
from pathlib import Path

import pytest
from dotenv import load_dotenv

from tests.test_helpers.respx_recorder import RespxRecorder
from tools.chat import ChatTool

# Load environment variables from .env file
load_dotenv()

# Use absolute path for cassette directory
cassette_dir = Path(__file__).parent / "respx_cassettes"
cassette_dir.mkdir(exist_ok=True)


@pytest.mark.no_mock_provider  # Disable provider mocking for this test
class TestO3ProRespxSimple(unittest.IsolatedAsyncioTestCase):
    """Test o3-pro response parsing using respx for HTTP recording/replay."""

    async def test_o3_pro_with_respx_recording(self):
        """Test o3-pro parsing with respx HTTP recording - real SDK objects."""
        cassette_path = cassette_dir / "o3_pro_respx.json"

        # Skip if no API key available and no cassette exists
        if not cassette_path.exists() and (not os.getenv("OPENAI_API_KEY") or os.getenv("OPENAI_API_KEY").startswith("dummy")):
            pytest.skip("Set real OPENAI_API_KEY to record HTTP cassettes")

        # Use RespxRecorder for automatic recording/replay
        async with RespxRecorder(str(cassette_path)) as recorder:
            # Execute the chat tool test - recorder handles recording or replay automatically
            result = await self._execute_chat_tool_test()

            # Verify the response works correctly with real SDK objects
            self._verify_chat_tool_response(result)

        # Verify cassette was created in record mode
        if not os.getenv("OPENAI_API_KEY", "").startswith("dummy"):
            self.assertTrue(cassette_path.exists(), f"HTTP cassette not created at {cassette_path}")

    async def _execute_chat_tool_test(self):
        """Execute the ChatTool with o3-pro and return the result."""
        chat_tool = ChatTool()
        arguments = {"prompt": "What is 2 + 2?", "model": "o3-pro", "temperature": 1.0}

        return await chat_tool.execute(arguments)

    def _verify_chat_tool_response(self, result):
        """Verify the ChatTool response contains expected data."""
        # Verify we got a valid response
        self.assertIsNotNone(result, "Should get response from ChatTool")

        # Parse the result content (ChatTool returns MCP TextContent format)
        self.assertIsInstance(result, list, "ChatTool should return list of content")
        self.assertTrue(len(result) > 0, "Should have at least one content item")

        # Get the text content (result is a list of TextContent objects)
        content_item = result[0]
        self.assertEqual(content_item.type, "text", "First item should be text content")

        text_content = content_item.text
        self.assertTrue(len(text_content) > 0, "Should have text content")

        # Parse the JSON response from chat tool
        import json

        try:
            response_data = json.loads(text_content)
        except json.JSONDecodeError:
            self.fail(f"Could not parse chat tool response as JSON: {text_content}")

        # Verify the response makes sense for the math question
        actual_content = response_data.get("content", "")
        self.assertIn("4", actual_content, "Should contain the answer '4'")

        # Verify metadata shows o3-pro was used
        metadata = response_data.get("metadata", {})
        self.assertEqual(metadata.get("model_used"), "o3-pro", "Should use o3-pro model")
        self.assertEqual(metadata.get("provider_used"), "openai", "Should use OpenAI provider")

        # Additional verification
        self.assertTrue(actual_content.strip(), "Content should not be empty")
        self.assertIsInstance(actual_content, str, "Content should be string")

        # Verify successful status
        self.assertEqual(response_data.get("status"), "continuation_available", "Should have successful status")


if __name__ == "__main__":
    print("🔥 Respx HTTP Recording Tests for O3-Pro with Real SDK Objects")
    print("=" * 60)
    print("This tests the concept of using respx for HTTP-level recording")
    print("Currently using pass_through mode to validate the approach")
    print()

    unittest.main()