refactor: Clean up test files and simplify documentation

- Remove unused cassette files with incomplete recordings
- Delete broken respx test files (test_o3_pro_respx_simple.py, test_o3_pro_http_recording.py)
- Fix respx references in docstrings to mention HTTP transport recorder
- Simplify vcr-testing.md documentation (60% reduction, more task-oriented)
- Add simplified PR template with better test instructions
- Fix cassette path consistency in examples
- Add security note about reviewing cassettes before committing

🤖 Generated with [Claude Code](https://claude.ai/code)

Co-Authored-By: Claude <noreply@anthropic.com>
@@ -1,216 +1,87 @@

# HTTP Transport Recorder for Testing

This project uses a custom HTTP Transport Recorder for testing expensive API integrations (like o3-pro) with real recorded responses.

## Overview

The HTTP Transport Recorder is a custom httpx transport implementation that captures and replays HTTP interactions at the transport layer. This approach provides:

- **Real API structure**: Tests use actual API responses, not guessed mocks
- **Cost efficiency**: Only pay for API calls once during recording - record once, replay forever
- **Deterministic tests**: Same response every time, no API variability
- **Transport-level interception**: Works seamlessly with httpx and the OpenAI SDK
- **Full response capture**: Captures complete HTTP responses, including headers and gzipped content

## Directory Structure

```
tests/
├── openai_cassettes/            # Recorded HTTP interactions
│   ├── o3_pro_basic_math.json
│   └── o3_pro_content_capture.json
├── http_transport_recorder.py   # Transport recorder implementation
├── test_content_capture.py      # Example recording test
└── test_replay.py               # Example replay test
```

## Key Components

### RecordingTransport
- Wraps httpx's default transport
- Makes real HTTP calls and captures responses
- Handles gzip compression/decompression properly
- Saves interactions to JSON cassettes

### ReplayTransport
- Serves saved responses from cassettes
- No real HTTP calls made
- Matches requests by method, path, and content hash
- Re-applies gzip compression when needed

### TransportFactory
- Auto-selects record vs replay mode based on cassette existence
- Simplifies test setup

## Workflow

### 1. Use Transport Recorder in Tests

```python
from tests.http_transport_recorder import TransportFactory
from providers import ModelProviderRegistry

# Create transport based on cassette existence
cassette_path = "tests/openai_cassettes/my_test.json"
transport = TransportFactory.create_transport(cassette_path)

# Inject into the OpenAI provider
provider = ModelProviderRegistry.get_provider_for_model("o3-pro")
provider._test_transport = transport

# Make API calls - they will be recorded/replayed automatically
```

### 2. Initial Recording (Expensive)

```bash
# With real API key, cassette doesn't exist -> records
python test_content_capture.py

# ⚠️ This will cost money! O3-Pro is $15-60 per 1K tokens
# But only needs to be done once
```

### 3. Subsequent Runs (Free)

```bash
# Cassette exists -> replays
python test_replay.py

# Can even use fake API key to prove no real calls
OPENAI_API_KEY="sk-fake-key" python test_replay.py

# Fast, free, deterministic
```

### 4. Re-recording (When API Changes)

```bash
# Delete cassette to force re-recording
rm tests/openai_cassettes/my_test.json

# Run test again with real API key
python test_content_capture.py
```

## How It Works

1. **Transport Injection**: Custom transport injected into the httpx client
2. **Request Interception**: All HTTP requests go through the custom transport
3. **Mode Detection**: Checks if cassette exists (replay) or needs creation (record)
4. **Content Capture**: Properly handles streaming responses and gzip encoding
5. **Request Matching**: Uses method + path + content hash for deterministic matching

In short: the first run (no cassette) records real API calls, subsequent runs replay the saved responses, and deleting the cassette file forces a re-record.
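
The matching step can be illustrated with a small sketch. The helper below is hypothetical - the actual key format used by `tests/http_transport_recorder.py` may differ:

```python
import hashlib
import json


def request_key(method: str, path: str, content: bytes) -> str:
    """Build a deterministic cassette lookup key from method + path + body hash."""
    body_hash = hashlib.sha256(content).hexdigest()
    return f"{method.upper()} {path} {body_hash}"


# Identical requests always map to the same key, so replay is deterministic
body = json.dumps({"model": "o3-pro", "input": "What is 2+2?"}).encode()
assert request_key("post", "/v1/responses", body) == request_key("POST", "/v1/responses", body)
```

Hashing the body (rather than storing it verbatim in the key) keeps lookups cheap even for large request payloads.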

## Cassette Format

```json
{
  "interactions": [
    {
      "request": {
        "method": "POST",
        "url": "https://api.openai.com/v1/responses",
        "path": "/v1/responses",
        "headers": {
          "content-type": "application/json",
          "accept-encoding": "gzip, deflate"
        },
        "content": {
          "model": "o3-pro-2025-06-10",
          "input": [...],
          "reasoning": {"effort": "medium"}
        }
      },
      "response": {
        "status_code": 200,
        "headers": {
          "content-type": "application/json",
          "content-encoding": "gzip"
        },
        "content": {
          "data": "base64_encoded_response_body",
          "encoding": "base64",
          "size": 1413
        },
        "reason_phrase": "OK"
      }
    }
  ]
}
```

Key features:

- Complete request/response capture
- Base64 encoding for binary content
- Preserves gzip compression
- Sanitizes sensitive data (API keys removed)
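
The `content` entry above can be produced and consumed along these lines. This sketch reuses the field names from the snippet (`data`, `encoding`, `size`); the real serializer lives in `tests/http_transport_recorder.py`:

```python
import base64
import gzip
import json


def encode_body(raw: bytes) -> dict:
    """Store a (possibly gzipped) response body as JSON-safe base64."""
    return {
        "data": base64.b64encode(raw).decode("ascii"),
        "encoding": "base64",
        "size": len(raw),
    }


def decode_body(entry: dict) -> bytes:
    """Recover the exact original bytes from a cassette entry."""
    return base64.b64decode(entry["data"])


# Round-trip a gzipped JSON body bit for bit
raw = gzip.compress(json.dumps({"output_text": "4"}).encode())
entry = encode_body(raw)
assert decode_body(entry) == raw
assert json.loads(gzip.decompress(decode_body(entry))) == {"output_text": "4"}
```

Base64 keeps the cassette valid JSON while preserving the compressed bytes exactly, so replay can re-serve the response with its original `content-encoding: gzip` header.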

## Benefits Over Previous Approaches

1. **Works with any HTTP client**: Not tied to the OpenAI SDK specifically
2. **Handles compression**: Properly manages gzipped responses
3. **Full HTTP fidelity**: Captures headers, status codes, etc.
4. **Simpler than VCR.py**: No sync/async conflicts or monkey patching
5. **Better than respx**: No streaming response issues

## Example Test

See `test_o3_pro_output_text_fix.py` for a complete example:

```python
#!/usr/bin/env python3
import asyncio

from tests.http_transport_recorder import TransportFactory
from providers import ModelProviderRegistry
from tools.chat import ChatTool


async def test_with_recording():
    cassette_path = "tests/openai_cassettes/test_example.json"

    # Transport factory auto-detects record vs replay mode
    transport = TransportFactory.create_transport(cassette_path)
    provider = ModelProviderRegistry.get_provider_for_model("o3-pro")
    provider._test_transport = transport

    # Use ChatTool normally - recording/replay happens transparently
    chat_tool = ChatTool()
    result = await chat_tool.execute({
        "prompt": "What is 2+2?",
        "model": "o3-pro",
        "temperature": 1.0
    })

    print(f"Response: {result[0].text}")


if __name__ == "__main__":
    asyncio.run(test_with_recording())
```

## Timeout Protection

Tests can use GNU timeout to prevent hanging:

```bash
# Install GNU coreutils if needed
brew install coreutils

# Run with 30 second timeout
gtimeout 30s python test_content_capture.py
```

## CI/CD Integration

```yaml
# In CI, tests use existing cassettes (no API keys needed)
- name: Run OpenAI tests
  run: |
    # Tests will use replay mode with existing cassettes
    python -m pytest tests/test_o3_pro.py
```

## File Structure

```
tests/
├── openai_cassettes/                # Recorded API interactions
│   └── *.json                       # Cassette files
├── http_transport_recorder.py       # Transport implementation
└── test_o3_pro_output_text_fix.py   # Example usage
```

## Cost Management

- **One-time cost**: Initial recording per test scenario
- **Zero ongoing cost**: Replays are free
- **Controlled re-recording**: Manual cassette deletion required
- **CI-friendly**: No API keys needed for replay, so no accidental API calls in automation

## Re-recording

When API changes require new recordings:

```bash
# Delete specific cassette
rm tests/openai_cassettes/my_test.json

# Run test with real API key
python -m pytest tests/test_o3_pro_output_text_fix.py
```

## Implementation Details

- **RecordingTransport**: Captures real HTTP calls
- **ReplayTransport**: Serves saved responses
- **TransportFactory**: Auto-selects mode based on cassette existence
- **PII Sanitization**: Automatically removes API keys from recordings

**Security Note**: Always review new cassette files before committing to ensure no sensitive data is included.
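
The sanitization pass can be pictured like this. Everything here is illustrative: the header names and key pattern are assumptions, not the recorder's actual rules:

```python
import re

# Assumed shape of an OpenAI-style key; the real sanitizer may match more patterns
KEY_PATTERN = re.compile(r"sk-[A-Za-z0-9_\-]{8,}")
SENSITIVE_HEADERS = {"authorization", "x-api-key", "openai-api-key"}


def sanitize_headers(headers: dict) -> dict:
    """Redact credential-bearing headers before a cassette is written."""
    clean = {}
    for name, value in headers.items():
        if name.lower() in SENSITIVE_HEADERS:
            clean[name] = "REDACTED"
        else:
            # Also scrub key-shaped strings that leak into other header values
            clean[name] = KEY_PATTERN.sub("REDACTED", value)
    return clean


sanitized = sanitize_headers({
    "Authorization": "Bearer sk-abc123secret456",
    "content-type": "application/json",
})
assert sanitized["Authorization"] == "REDACTED"
assert sanitized["content-type"] == "application/json"
```

Automated scrubbing is a safety net, not a substitute for the manual review above.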

For implementation details, see `tests/http_transport_recorder.py`.

This HTTP transport recorder approach provides accurate API testing with cost efficiency, specifically optimized for expensive endpoints like o3-pro while being flexible enough for any HTTP-based API.

pr_template_filled.md (new file, 99 lines)
@@ -0,0 +1,99 @@

## PR Title Format

**fix: Fix o3-pro empty response issue by using output_text field**

## Description

This PR fixes a critical bug where o3-pro API calls were returning empty responses. The root cause was incorrect response parsing - the code was trying to manually parse the `response.output.content[]` array structure, but o3-pro provides a simpler `output_text` convenience field directly on the response object. This PR also introduces a secure HTTP recording system for testing expensive o3-pro calls.

## Changes Made

- [x] Fixed o3-pro response parsing by using the `output_text` convenience field instead of manual parsing
- [x] Added `_safe_extract_output_text` method with proper validation to handle o3-pro's response format
- [x] Implemented custom HTTP transport recorder to replace respx for more reliable test recordings
- [x] Added comprehensive PII sanitization to prevent accidental API key exposure in test cassettes
- [x] Sanitized all existing test cassettes to remove any exposed secrets
- [x] Updated documentation for the new testing infrastructure
- [x] Added test suite to validate the fix and ensure PII sanitization works correctly

**No breaking changes** - The fix only affects o3-pro model parsing internally.

**Dependencies added:**
- None (uses existing httpx and standard library modules)

## Testing

### Run all linting and tests (required):

```bash
# Activate virtual environment first
source venv/bin/activate

# Run comprehensive code quality checks (recommended)
./code_quality_checks.sh

# If you made tool changes, also run simulator tests
python communication_simulator_test.py
```

- [x] All linting passes (ruff, black, isort)
- [x] All unit tests pass
- [x] **For bug fixes**: Tests added to prevent regression
  - `test_o3_pro_output_text_fix.py` - Validates o3-pro response parsing works correctly
  - `test_o3_pro_http_recording.py` - Tests HTTP recording functionality
  - `test_pii_sanitizer.py` - Ensures PII sanitization works properly
- [x] Manual testing completed with realistic scenarios
  - Verified o3-pro calls return actual content instead of empty responses
  - Validated that recorded cassettes contain no exposed API keys

## Related Issues

Fixes o3-pro API calls returning empty responses on master branch.

## Checklist

- [x] PR title follows the format guidelines above
- [x] **Activated venv and ran code quality checks: `source venv/bin/activate && ./code_quality_checks.sh`**
- [x] Self-review completed
- [x] **Tests added for ALL changes** (see Testing section above)
- [x] Documentation updated as needed
  - Updated `docs/testing.md` with new testing approach
  - Added `docs/vcr-testing.md` for HTTP recording documentation
- [x] All unit tests passing
- [x] Ready for review

## Additional Notes

### The Bug

On the master branch, o3-pro API calls were returning empty responses because the code was parsing the response incorrectly:

```python
# Master branch - incorrect parsing
if hasattr(response.output, "content") and response.output.content:
    for content_item in response.output.content:
        if hasattr(content_item, "type") and content_item.type == "output_text":
            content = content_item.text
            break
```

The o3-pro response object actually provides an `output_text` convenience field directly:

```python
# Fixed version - correct parsing
content = response.output_text
```

### The Fix

1. Added `_safe_extract_output_text` method that properly validates and extracts the `output_text` field
2. Updated the response parsing logic in `_generate_with_responses_endpoint` to use this new method
3. Added proper error handling and validation to catch future response format issues

### Additional Improvements

- **Testing Infrastructure**: Implemented HTTP transport recorder to enable testing without repeated expensive API calls
- **Security**: Added automatic PII sanitization to prevent API keys from being accidentally committed in test recordings

### Development Notes

- During development, we encountered timeout issues with the initial respx-based approach, which led to implementing the custom HTTP transport recorder
- The transport recorder solution properly handles streaming responses and gzip compression

### For Reviewers

- The core fix is in `providers/openai_compatible.py` lines 307-335 and line 396
- The HTTP transport recorder is test infrastructure only and doesn't affect production code
- All test cassettes have been sanitized and verified to contain no secrets
pr_template_filled_simplified.md (new file, 59 lines)
@@ -0,0 +1,59 @@

## PR Title

**fix: Fix o3-pro empty response issue by using output_text field**

## Summary

Fixes o3-pro API calls returning empty responses due to incorrect response parsing. The code was trying to parse the `response.output.content[]` array, but o3-pro provides `output_text` directly.

## Changes

- Fixed o3-pro response parsing to use `output_text` field
- Added `_safe_extract_output_text` method with validation
- Implemented HTTP transport recorder for testing expensive API calls
- Added PII sanitization for test recordings
- Added regression tests

**No breaking changes** - Internal fix only

## Testing

```bash
source venv/bin/activate
./code_quality_checks.sh

# Run the new tests added in this PR
python -m pytest tests/test_o3_pro_output_text_fix.py -v
python -m pytest tests/test_pii_sanitizer.py -v

# Or run all new tests together
python -m pytest tests/test_o3_pro_output_text_fix.py tests/test_pii_sanitizer.py -v
```

- [x] All checks pass
- [x] Regression tests added:
  - `test_o3_pro_output_text_fix.py` - Validates o3-pro response parsing and HTTP transport recording
  - `test_pii_sanitizer.py` - Ensures API key sanitization

## Code Example

**Before:**
```python
# Incorrect - manual parsing
for content_item in response.output.content:
    if content_item.type == "output_text":
        content = content_item.text
```

**After:**
```python
# Correct - direct field access
content = response.output_text
```

## For Reviewers

- Core fix: `providers/openai_compatible.py` - see `_safe_extract_output_text()` method
- Response parsing: `_generate_with_responses_endpoint()` method now uses the direct field
- Test infrastructure changes don't affect production code
- All test recordings sanitized for security
File diff suppressed because one or more lines are too long
@@ -1,104 +0,0 @@

"""
Tests for o3-pro output_text parsing fix using HTTP-level recording via respx.

This test validates the fix using real OpenAI SDK objects by recording/replaying
HTTP responses instead of creating mock objects.
"""

import json
import os
import unittest
from pathlib import Path

import pytest
from dotenv import load_dotenv

from tests.test_helpers.http_recorder import HTTPRecorder
from tools.chat import ChatTool

# Load environment variables from .env file
load_dotenv()

# Use absolute path for cassette directory
cassette_dir = Path(__file__).parent / "http_cassettes"
cassette_dir.mkdir(exist_ok=True)


@pytest.mark.no_mock_provider  # Disable provider mocking for this test
class TestO3ProHTTPRecording(unittest.IsolatedAsyncioTestCase):
    """Test o3-pro response parsing using HTTP-level recording with real SDK objects."""

    async def test_o3_pro_real_sdk_objects(self):
        """Test that o3-pro parsing works with real OpenAI SDK objects from HTTP replay."""
        # Skip if no API key available and cassette doesn't exist
        cassette_path = cassette_dir / "o3_pro_real_sdk.json"
        if not cassette_path.exists() and not os.getenv("OPENAI_API_KEY"):
            pytest.skip("Set real OPENAI_API_KEY to record HTTP cassettes")

        # Use HTTPRecorder to record/replay raw HTTP responses
        async with HTTPRecorder(str(cassette_path)):
            # Execute the chat tool test - real SDK objects will be created
            result = await self._execute_chat_tool_test()

        # Verify the response works correctly with real SDK objects
        self._verify_chat_tool_response(result)

        # Verify cassette was created in record mode
        if os.getenv("OPENAI_API_KEY") and not os.getenv("OPENAI_API_KEY").startswith("dummy"):
            self.assertTrue(cassette_path.exists(), f"HTTP cassette not created at {cassette_path}")

    async def _execute_chat_tool_test(self):
        """Execute the ChatTool with o3-pro and return the result."""
        chat_tool = ChatTool()
        arguments = {"prompt": "What is 2 + 2?", "model": "o3-pro", "temperature": 1.0}
        return await chat_tool.execute(arguments)

    def _verify_chat_tool_response(self, result):
        """Verify the ChatTool response contains expected data."""
        # Verify we got a valid response
        self.assertIsNotNone(result, "Should get response from ChatTool")

        # Parse the result content (ChatTool returns MCP TextContent format)
        self.assertIsInstance(result, list, "ChatTool should return list of content")
        self.assertTrue(len(result) > 0, "Should have at least one content item")

        # Get the text content (result is a list of TextContent objects)
        content_item = result[0]
        self.assertEqual(content_item.type, "text", "First item should be text content")

        text_content = content_item.text
        self.assertTrue(len(text_content) > 0, "Should have text content")

        # Parse the JSON response from the chat tool
        try:
            response_data = json.loads(text_content)
        except json.JSONDecodeError:
            self.fail(f"Could not parse chat tool response as JSON: {text_content}")

        # Verify the response makes sense for the math question
        actual_content = response_data.get("content", "")
        self.assertIn("4", actual_content, "Should contain the answer '4'")

        # Verify metadata shows o3-pro was used
        metadata = response_data.get("metadata", {})
        self.assertEqual(metadata.get("model_used"), "o3-pro", "Should use o3-pro model")
        self.assertEqual(metadata.get("provider_used"), "openai", "Should use OpenAI provider")

        # Additional verification that the fix is working
        self.assertTrue(actual_content.strip(), "Content should not be empty")
        self.assertIsInstance(actual_content, str, "Content should be string")

        # Verify successful status
        self.assertEqual(response_data.get("status"), "continuation_available", "Should have successful status")


if __name__ == "__main__":
    print("🌐 HTTP-Level Recording Tests for O3-Pro with Real SDK Objects")
    print("=" * 60)
    print("FIRST RUN: Requires OPENAI_API_KEY - records HTTP responses (EXPENSIVE!)")
    print("SUBSEQUENT RUNS: Uses recorded HTTP responses - free and fast")
    print("RECORDING: Delete .json files in tests/http_cassettes/ to re-record")
    print()

    unittest.main()
@@ -1,10 +1,10 @@

 """
-Tests for o3-pro output_text parsing fix using respx response recording.
+Tests for o3-pro output_text parsing fix using HTTP transport recording.
 
 This test validates the fix that uses `response.output_text` convenience field
 instead of manually parsing `response.output.content[].text`.
 
-Uses respx to record real o3-pro API responses at the HTTP level while allowing
+Uses HTTP transport recorder to record real o3-pro API responses at the HTTP level while allowing
 the OpenAI SDK to create real response objects that we can test.
 
 RECORDING: To record new responses, delete the cassette file and run with real API keys.
@@ -1,104 +0,0 @@

"""
Tests for o3-pro output_text parsing fix using respx for HTTP recording/replay.

This test uses respx's built-in recording capabilities to record/replay HTTP responses,
allowing the OpenAI SDK to create real response objects with all convenience methods.
"""

import json
import os
import unittest
from pathlib import Path

import pytest
from dotenv import load_dotenv

from tests.test_helpers.respx_recorder import RespxRecorder
from tools.chat import ChatTool

# Load environment variables from .env file
load_dotenv()

# Use absolute path for cassette directory
cassette_dir = Path(__file__).parent / "respx_cassettes"
cassette_dir.mkdir(exist_ok=True)


@pytest.mark.no_mock_provider  # Disable provider mocking for this test
class TestO3ProRespxSimple(unittest.IsolatedAsyncioTestCase):
    """Test o3-pro response parsing using respx for HTTP recording/replay."""

    async def test_o3_pro_with_respx_recording(self):
        """Test o3-pro parsing with respx HTTP recording - real SDK objects."""
        cassette_path = cassette_dir / "o3_pro_respx.json"

        # Skip if no API key available and no cassette exists
        if not cassette_path.exists() and (
            not os.getenv("OPENAI_API_KEY") or os.getenv("OPENAI_API_KEY").startswith("dummy")
        ):
            pytest.skip("Set real OPENAI_API_KEY to record HTTP cassettes")

        # Use RespxRecorder for automatic recording/replay
        async with RespxRecorder(str(cassette_path)):
            # Execute the chat tool test - recorder handles recording or replay automatically
            result = await self._execute_chat_tool_test()

        # Verify the response works correctly with real SDK objects
        self._verify_chat_tool_response(result)

        # Verify cassette was created in record mode
        if not os.getenv("OPENAI_API_KEY", "").startswith("dummy"):
            self.assertTrue(cassette_path.exists(), f"HTTP cassette not created at {cassette_path}")

    async def _execute_chat_tool_test(self):
        """Execute the ChatTool with o3-pro and return the result."""
        chat_tool = ChatTool()
        arguments = {"prompt": "What is 2 + 2?", "model": "o3-pro", "temperature": 1.0}
        return await chat_tool.execute(arguments)

    def _verify_chat_tool_response(self, result):
        """Verify the ChatTool response contains expected data."""
        # Verify we got a valid response
        self.assertIsNotNone(result, "Should get response from ChatTool")

        # Parse the result content (ChatTool returns MCP TextContent format)
        self.assertIsInstance(result, list, "ChatTool should return list of content")
        self.assertTrue(len(result) > 0, "Should have at least one content item")

        # Get the text content (result is a list of TextContent objects)
        content_item = result[0]
        self.assertEqual(content_item.type, "text", "First item should be text content")

        text_content = content_item.text
        self.assertTrue(len(text_content) > 0, "Should have text content")

        # Parse the JSON response from the chat tool
        try:
            response_data = json.loads(text_content)
        except json.JSONDecodeError:
            self.fail(f"Could not parse chat tool response as JSON: {text_content}")

        # Verify the response makes sense for the math question
        actual_content = response_data.get("content", "")
        self.assertIn("4", actual_content, "Should contain the answer '4'")

        # Verify metadata shows o3-pro was used
        metadata = response_data.get("metadata", {})
        self.assertEqual(metadata.get("model_used"), "o3-pro", "Should use o3-pro model")
        self.assertEqual(metadata.get("provider_used"), "openai", "Should use OpenAI provider")

        # Additional verification
        self.assertTrue(actual_content.strip(), "Content should not be empty")
        self.assertIsInstance(actual_content, str, "Content should be string")

        # Verify successful status
        self.assertEqual(response_data.get("status"), "continuation_available", "Should have successful status")


if __name__ == "__main__":
    print("🔥 Respx HTTP Recording Tests for O3-Pro with Real SDK Objects")
    print("=" * 60)
    print("This tests the concept of using respx for HTTP-level recording")
    print("Currently using pass_through mode to validate the approach")
    print()

    unittest.main()