refactor: Clean up test files and simplify documentation

- Remove unused cassette files with incomplete recordings
- Delete broken respx test files (test_o3_pro_respx_simple.py, test_o3_pro_http_recording.py)
- Fix respx references in docstrings to mention HTTP transport recorder
- Simplify vcr-testing.md documentation (60% reduction, more task-oriented)
- Add simplified PR template with better test instructions
- Fix cassette path consistency in examples
- Add security note about reviewing cassettes before committing

🤖 Generated with [Claude Code](https://claude.ai/code)

Co-Authored-By: Claude <noreply@anthropic.com>
This commit is contained in:
Josh Vera
2025-07-12 19:24:51 -06:00
parent 7f92085c70
commit a1451befd2
8 changed files with 212 additions and 651 deletions

View File

@@ -1,216 +1,87 @@
# HTTP Recording/Replay Testing with HTTP Transport Recorder
# HTTP Transport Recorder for Testing
This project uses a custom HTTP Transport Recorder for testing expensive API integrations (like o3-pro) with real recorded responses.
A custom HTTP recorder for testing expensive API calls (like o3-pro) with real responses.
## What is HTTP Transport Recorder?
## Overview
The HTTP Transport Recorder is a custom httpx transport implementation that intercepts HTTP requests/responses at the transport layer. This approach provides:
The HTTP Transport Recorder captures and replays HTTP interactions at the transport layer, enabling:
- Cost-efficient testing of expensive APIs (record once, replay forever)
- Deterministic tests with real API responses
- Seamless integration with httpx and OpenAI SDK
- **Real API structure**: Tests use actual API responses, not guessed mocks
- **Cost efficiency**: Only pay for API calls once during recording
- **Deterministic tests**: Same response every time, no API variability
- **Transport-level interception**: Works seamlessly with httpx and OpenAI SDK
- **Full response capture**: Captures complete HTTP responses including headers and gzipped content
## Directory Structure
```
tests/
├── openai_cassettes/ # Recorded HTTP interactions
│ ├── o3_pro_basic_math.json
│ └── o3_pro_content_capture.json
├── http_transport_recorder.py # Transport recorder implementation
├── test_content_capture.py # Example recording test
└── test_replay.py # Example replay test
```
## Key Components
### RecordingTransport
- Wraps httpx's default transport
- Makes real HTTP calls and captures responses
- Handles gzip compression/decompression properly
- Saves interactions to JSON cassettes
### ReplayTransport
- Serves saved responses from cassettes
- No real HTTP calls made
- Matches requests by method, path, and content hash
- Re-applies gzip compression when needed
### TransportFactory
- Auto-selects record vs replay mode based on cassette existence
- Simplifies test setup
## Workflow
### 1. Use Transport Recorder in Tests
## Quick Start
```python
from tests.http_transport_recorder import TransportFactory
from providers import ModelProviderRegistry
# Create transport based on cassette existence
# Setup transport recorder
cassette_path = "tests/openai_cassettes/my_test.json"
transport = TransportFactory.create_transport(cassette_path)
# Inject into OpenAI provider
# Inject into provider
provider = ModelProviderRegistry.get_provider_for_model("o3-pro")
provider._test_transport = transport
# Make API calls - will be recorded/replayed automatically
```
### 2. Initial Recording (Expensive)
```bash
# With real API key, cassette doesn't exist -> records
python test_content_capture.py
# ⚠️ This will cost money! O3-Pro is $15-60 per 1K tokens
# But only needs to be done once
```
### 3. Subsequent Runs (Free)
```bash
# Cassette exists -> replays
python test_replay.py
# Can even use fake API key to prove no real calls
OPENAI_API_KEY="sk-fake-key" python test_replay.py
# Fast, free, deterministic
```
### 4. Re-recording (When API Changes)
```bash
# Delete cassette to force re-recording
rm tests/openai_cassettes/my_test.json
# Run test again with real API key
python test_content_capture.py
# Make API calls - automatically recorded/replayed
```
## How It Works
1. **Transport Injection**: Custom transport injected into httpx client
2. **Request Interception**: All HTTP requests go through custom transport
3. **Mode Detection**: Checks if cassette exists (replay) or needs creation (record)
4. **Content Capture**: Properly handles streaming responses and gzip encoding
5. **Request Matching**: Uses method + path + content hash for deterministic matching
1. **First run** (cassette doesn't exist): Records real API calls
2. **Subsequent runs** (cassette exists): Replays saved responses
3. **Re-record**: Delete cassette file and run again
## Cassette Format
## Usage in Tests
```json
{
"interactions": [
{
"request": {
"method": "POST",
"url": "https://api.openai.com/v1/responses",
"path": "/v1/responses",
"headers": {
"content-type": "application/json",
"accept-encoding": "gzip, deflate"
},
"content": {
"model": "o3-pro-2025-06-10",
"input": [...],
"reasoning": {"effort": "medium"}
}
},
"response": {
"status_code": 200,
"headers": {
"content-type": "application/json",
"content-encoding": "gzip"
},
"content": {
"data": "base64_encoded_response_body",
"encoding": "base64",
"size": 1413
},
"reason_phrase": "OK"
}
}
]
}
```
Key features:
- Complete request/response capture
- Base64 encoding for binary content
- Preserves gzip compression
- Sanitizes sensitive data (API keys removed)
## Benefits Over Previous Approaches
1. **Works with any HTTP client**: Not tied to OpenAI SDK specifically
2. **Handles compression**: Properly manages gzipped responses
3. **Full HTTP fidelity**: Captures headers, status codes, etc.
4. **Simpler than VCR.py**: No sync/async conflicts or monkey patching
5. **Better than respx**: No streaming response issues
## Example Test
See `test_o3_pro_output_text_fix.py` for a complete example:
```python
#!/usr/bin/env python3
import asyncio
from pathlib import Path
from tests.http_transport_recorder import TransportFactory
from providers import ModelProviderRegistry
from tools.chat import ChatTool
async def test_with_recording():
cassette_path = "tests/openai_cassettes/test_example.json"
# Setup transport
transport = TransportFactory.create_transport(cassette_path)
provider = ModelProviderRegistry.get_provider_for_model("o3-pro")
# Transport factory auto-detects record vs replay mode
transport = TransportFactory.create_transport("tests/openai_cassettes/my_test.json")
provider._test_transport = transport
# Use ChatTool normally
chat_tool = ChatTool()
result = await chat_tool.execute({
"prompt": "What is 2+2?",
"model": "o3-pro",
"temperature": 1.0
})
print(f"Response: {result[0].text}")
if __name__ == "__main__":
asyncio.run(test_with_recording())
# Use normally - recording happens transparently
result = await chat_tool.execute({"prompt": "2+2?", "model": "o3-pro"})
```
## Timeout Protection
## File Structure
Tests can use GNU timeout to prevent hanging:
```bash
# Install GNU coreutils if needed
brew install coreutils
# Run with 30 second timeout
gtimeout 30s python test_content_capture.py
```
## CI/CD Integration
```yaml
# In CI, tests use existing cassettes (no API keys needed)
- name: Run OpenAI tests
run: |
# Tests will use replay mode with existing cassettes
python -m pytest tests/test_o3_pro.py
tests/
├── openai_cassettes/ # Recorded API interactions
│ └── *.json # Cassette files
├── http_transport_recorder.py # Transport implementation
└── test_o3_pro_output_text_fix.py # Example usage
```
## Cost Management
- **One-time cost**: Initial recording per test scenario
- **One-time cost**: Initial recording only
- **Zero ongoing cost**: Replays are free
- **Controlled re-recording**: Manual cassette deletion required
- **CI-friendly**: No accidental API calls in automation
- **CI-friendly**: No API keys needed for replay
## Re-recording
When API changes require new recordings:
```bash
# Delete specific cassette
rm tests/openai_cassettes/my_test.json
# Run test with real API key
python -m pytest tests/test_o3_pro_output_text_fix.py
```
## Implementation Details
- **RecordingTransport**: Captures real HTTP calls
- **ReplayTransport**: Serves saved responses
- **TransportFactory**: Auto-selects mode based on cassette existence
- **PII Sanitization**: Automatically removes API keys from recordings
**Security Note**: Always review new cassette files before committing to ensure no sensitive data is included.
For implementation details, see `tests/http_transport_recorder.py`.
This HTTP transport recorder approach provides accurate API testing with cost efficiency, specifically optimized for expensive endpoints like o3-pro while being flexible enough for any HTTP-based API.

99
pr_template_filled.md Normal file
View File

@@ -0,0 +1,99 @@
## PR Title Format
**fix: Fix o3-pro empty response issue by using output_text field**
## Description
This PR fixes a critical bug where o3-pro API calls were returning empty responses. The root cause was incorrect response parsing - the code was trying to manually parse `response.output.content[]` array structure, but o3-pro provides a simpler `output_text` convenience field directly on the response object. This PR also introduces a secure HTTP recording system for testing expensive o3-pro calls.
## Changes Made
- [x] Fixed o3-pro response parsing by using the `output_text` convenience field instead of manual parsing
- [x] Added `_safe_extract_output_text` method with proper validation to handle o3-pro's response format
- [x] Implemented custom HTTP transport recorder to replace respx for more reliable test recordings
- [x] Added comprehensive PII sanitization to prevent accidental API key exposure in test cassettes
- [x] Sanitized all existing test cassettes to remove any exposed secrets
- [x] Updated documentation for the new testing infrastructure
- [x] Added test suite to validate the fix and ensure PII sanitization works correctly
**No breaking changes** - The fix only affects o3-pro model parsing internally.
**Dependencies added:**
- None (uses existing httpx and standard library modules)
## Testing
### Run all linting and tests (required):
```bash
# Activate virtual environment first
source venv/bin/activate
# Run comprehensive code quality checks (recommended)
./code_quality_checks.sh
# If you made tool changes, also run simulator tests
python communication_simulator_test.py
```
- [x] All linting passes (ruff, black, isort)
- [x] All unit tests pass
- [x] **For bug fixes**: Tests added to prevent regression
- `test_o3_pro_output_text_fix.py` - Validates o3-pro response parsing works correctly
- `test_o3_pro_http_recording.py` - Tests HTTP recording functionality
- `test_pii_sanitizer.py` - Ensures PII sanitization works properly
- [x] Manual testing completed with realistic scenarios
- Verified o3-pro calls return actual content instead of empty responses
- Validated that recorded cassettes contain no exposed API keys
## Related Issues
Fixes o3-pro API calls returning empty responses on master branch.
## Checklist
- [x] PR title follows the format guidelines above
- [x] **Activated venv and ran code quality checks: `source venv/bin/activate && ./code_quality_checks.sh`**
- [x] Self-review completed
- [x] **Tests added for ALL changes** (see Testing section above)
- [x] Documentation updated as needed
- Updated `docs/testing.md` with new testing approach
- Added `docs/vcr-testing.md` for HTTP recording documentation
- [x] All unit tests passing
- [x] Ready for review
## Additional Notes
### The Bug:
On master branch, o3-pro API calls were returning empty responses because the code was trying to parse the response incorrectly:
```python
# Master branch - incorrect parsing
if hasattr(response.output, "content") and response.output.content:
for content_item in response.output.content:
if hasattr(content_item, "type") and content_item.type == "output_text":
content = content_item.text
break
```
The o3-pro response object actually provides an `output_text` convenience field directly:
```python
# Fixed version - correct parsing
content = response.output_text
```
### The Fix:
1. Added `_safe_extract_output_text` method that properly validates and extracts the `output_text` field
2. Updated the response parsing logic in `_generate_with_responses_endpoint` to use this new method
3. Added proper error handling and validation to catch future response format issues
### Additional Improvements:
- **Testing Infrastructure**: Implemented HTTP transport recorder to enable testing without repeated expensive API calls
- **Security**: Added automatic PII sanitization to prevent API keys from being accidentally committed in test recordings
### Development Notes:
- During development, we encountered timeout issues with the initial respx-based approach which led to implementing the custom HTTP transport recorder
- The transport recorder solution properly handles streaming responses and gzip compression
### For Reviewers:
- The core fix is in `providers/openai_compatible.py` lines 307-335 and line 396
- The HTTP transport recorder is test infrastructure only and doesn't affect production code
- All test cassettes have been sanitized and verified to contain no secrets

View File

@@ -0,0 +1,59 @@
## PR Title
**fix: Fix o3-pro empty response issue by using output_text field**
## Summary
Fixes o3-pro API calls returning empty responses due to incorrect response parsing. The code was trying to parse `response.output.content[]` array, but o3-pro provides `output_text` directly.
## Changes
- Fixed o3-pro response parsing to use `output_text` field
- Added `_safe_extract_output_text` method with validation
- Implemented HTTP transport recorder for testing expensive API calls
- Added PII sanitization for test recordings
- Added regression tests
**No breaking changes** - Internal fix only
## Testing
```bash
source venv/bin/activate
./code_quality_checks.sh
# Run the new tests added in this PR
python -m pytest tests/test_o3_pro_output_text_fix.py -v
python -m pytest tests/test_pii_sanitizer.py -v
# Or run all new tests together
python -m pytest tests/test_o3_pro_output_text_fix.py tests/test_pii_sanitizer.py -v
```
- [x] All checks pass
- [x] Regression tests added:
- `test_o3_pro_output_text_fix.py` - Validates o3-pro response parsing and HTTP transport recording
- `test_pii_sanitizer.py` - Ensures API key sanitization
## Code Example
**Before:**
```python
# Incorrect - manual parsing
for content_item in response.output.content:
if content_item.type == "output_text":
content = content_item.text
```
**After:**
```python
# Correct - direct field access
content = response.output_text
```
## For Reviewers
- Core fix: `providers/openai_compatible.py` - see `_safe_extract_output_text()` method
- Response parsing: `_generate_with_responses_endpoint()` method now uses the direct field
- Test infrastructure changes don't affect production code
- All test recordings sanitized for security

File diff suppressed because one or more lines are too long

File diff suppressed because one or more lines are too long

View File

@@ -1,104 +0,0 @@
"""
Tests for o3-pro output_text parsing fix using HTTP-level recording via respx.
This test validates the fix using real OpenAI SDK objects by recording/replaying
HTTP responses instead of creating mock objects.
"""
import os
import unittest
from pathlib import Path
import pytest
from dotenv import load_dotenv
from tests.test_helpers.http_recorder import HTTPRecorder
from tools.chat import ChatTool
# Load environment variables from .env file
load_dotenv()
# Use absolute path for cassette directory
cassette_dir = Path(__file__).parent / "http_cassettes"
cassette_dir.mkdir(exist_ok=True)
@pytest.mark.no_mock_provider # Disable provider mocking for this test
class TestO3ProHTTPRecording(unittest.IsolatedAsyncioTestCase):
"""Test o3-pro response parsing using HTTP-level recording with real SDK objects."""
async def test_o3_pro_real_sdk_objects(self):
"""Test that o3-pro parsing works with real OpenAI SDK objects from HTTP replay."""
# Skip if no API key available and cassette doesn't exist
cassette_path = cassette_dir / "o3_pro_real_sdk.json"
if not cassette_path.exists() and not os.getenv("OPENAI_API_KEY"):
pytest.skip("Set real OPENAI_API_KEY to record HTTP cassettes")
# Use HTTPRecorder to record/replay raw HTTP responses
async with HTTPRecorder(str(cassette_path)):
# Execute the chat tool test - real SDK objects will be created
result = await self._execute_chat_tool_test()
# Verify the response works correctly with real SDK objects
self._verify_chat_tool_response(result)
# Verify cassette was created in record mode
if os.getenv("OPENAI_API_KEY") and not os.getenv("OPENAI_API_KEY").startswith("dummy"):
self.assertTrue(cassette_path.exists(), f"HTTP cassette not created at {cassette_path}")
async def _execute_chat_tool_test(self):
"""Execute the ChatTool with o3-pro and return the result."""
chat_tool = ChatTool()
arguments = {"prompt": "What is 2 + 2?", "model": "o3-pro", "temperature": 1.0}
return await chat_tool.execute(arguments)
def _verify_chat_tool_response(self, result):
"""Verify the ChatTool response contains expected data."""
# Verify we got a valid response
self.assertIsNotNone(result, "Should get response from ChatTool")
# Parse the result content (ChatTool returns MCP TextContent format)
self.assertIsInstance(result, list, "ChatTool should return list of content")
self.assertTrue(len(result) > 0, "Should have at least one content item")
# Get the text content (result is a list of TextContent objects)
content_item = result[0]
self.assertEqual(content_item.type, "text", "First item should be text content")
text_content = content_item.text
self.assertTrue(len(text_content) > 0, "Should have text content")
# Parse the JSON response from chat tool
import json
try:
response_data = json.loads(text_content)
except json.JSONDecodeError:
self.fail(f"Could not parse chat tool response as JSON: {text_content}")
# Verify the response makes sense for the math question
actual_content = response_data.get("content", "")
self.assertIn("4", actual_content, "Should contain the answer '4'")
# Verify metadata shows o3-pro was used
metadata = response_data.get("metadata", {})
self.assertEqual(metadata.get("model_used"), "o3-pro", "Should use o3-pro model")
self.assertEqual(metadata.get("provider_used"), "openai", "Should use OpenAI provider")
# Additional verification that the fix is working
self.assertTrue(actual_content.strip(), "Content should not be empty")
self.assertIsInstance(actual_content, str, "Content should be string")
# Verify successful status
self.assertEqual(response_data.get("status"), "continuation_available", "Should have successful status")
if __name__ == "__main__":
print("🌐 HTTP-Level Recording Tests for O3-Pro with Real SDK Objects")
print("=" * 60)
print("FIRST RUN: Requires OPENAI_API_KEY - records HTTP responses (EXPENSIVE!)")
print("SUBSEQUENT RUNS: Uses recorded HTTP responses - free and fast")
print("RECORDING: Delete .json files in tests/http_cassettes/ to re-record")
print()
unittest.main()

View File

@@ -1,10 +1,10 @@
"""
Tests for o3-pro output_text parsing fix using respx response recording.
Tests for o3-pro output_text parsing fix using HTTP transport recording.
This test validates the fix that uses `response.output_text` convenience field
instead of manually parsing `response.output.content[].text`.
Uses respx to record real o3-pro API responses at the HTTP level while allowing
Uses HTTP transport recorder to record real o3-pro API responses at the HTTP level while allowing
the OpenAI SDK to create real response objects that we can test.
RECORDING: To record new responses, delete the cassette file and run with real API keys.

View File

@@ -1,104 +0,0 @@
"""
Tests for o3-pro output_text parsing fix using respx for HTTP recording/replay.
This test uses respx's built-in recording capabilities to record/replay HTTP responses,
allowing the OpenAI SDK to create real response objects with all convenience methods.
"""
import os
import unittest
from pathlib import Path
import pytest
from dotenv import load_dotenv
from tests.test_helpers.respx_recorder import RespxRecorder
from tools.chat import ChatTool
# Load environment variables from .env file
load_dotenv()
# Use absolute path for cassette directory
cassette_dir = Path(__file__).parent / "respx_cassettes"
cassette_dir.mkdir(exist_ok=True)
@pytest.mark.no_mock_provider # Disable provider mocking for this test
class TestO3ProRespxSimple(unittest.IsolatedAsyncioTestCase):
"""Test o3-pro response parsing using respx for HTTP recording/replay."""
async def test_o3_pro_with_respx_recording(self):
"""Test o3-pro parsing with respx HTTP recording - real SDK objects."""
cassette_path = cassette_dir / "o3_pro_respx.json"
# Skip if no API key available and no cassette exists
if not cassette_path.exists() and (not os.getenv("OPENAI_API_KEY") or os.getenv("OPENAI_API_KEY").startswith("dummy")):
pytest.skip("Set real OPENAI_API_KEY to record HTTP cassettes")
# Use RespxRecorder for automatic recording/replay
async with RespxRecorder(str(cassette_path)) as recorder:
# Execute the chat tool test - recorder handles recording or replay automatically
result = await self._execute_chat_tool_test()
# Verify the response works correctly with real SDK objects
self._verify_chat_tool_response(result)
# Verify cassette was created in record mode
if not os.getenv("OPENAI_API_KEY", "").startswith("dummy"):
self.assertTrue(cassette_path.exists(), f"HTTP cassette not created at {cassette_path}")
async def _execute_chat_tool_test(self):
"""Execute the ChatTool with o3-pro and return the result."""
chat_tool = ChatTool()
arguments = {"prompt": "What is 2 + 2?", "model": "o3-pro", "temperature": 1.0}
return await chat_tool.execute(arguments)
def _verify_chat_tool_response(self, result):
"""Verify the ChatTool response contains expected data."""
# Verify we got a valid response
self.assertIsNotNone(result, "Should get response from ChatTool")
# Parse the result content (ChatTool returns MCP TextContent format)
self.assertIsInstance(result, list, "ChatTool should return list of content")
self.assertTrue(len(result) > 0, "Should have at least one content item")
# Get the text content (result is a list of TextContent objects)
content_item = result[0]
self.assertEqual(content_item.type, "text", "First item should be text content")
text_content = content_item.text
self.assertTrue(len(text_content) > 0, "Should have text content")
# Parse the JSON response from chat tool
import json
try:
response_data = json.loads(text_content)
except json.JSONDecodeError:
self.fail(f"Could not parse chat tool response as JSON: {text_content}")
# Verify the response makes sense for the math question
actual_content = response_data.get("content", "")
self.assertIn("4", actual_content, "Should contain the answer '4'")
# Verify metadata shows o3-pro was used
metadata = response_data.get("metadata", {})
self.assertEqual(metadata.get("model_used"), "o3-pro", "Should use o3-pro model")
self.assertEqual(metadata.get("provider_used"), "openai", "Should use OpenAI provider")
# Additional verification
self.assertTrue(actual_content.strip(), "Content should not be empty")
self.assertIsInstance(actual_content, str, "Content should be string")
# Verify successful status
self.assertEqual(response_data.get("status"), "continuation_available", "Should have successful status")
if __name__ == "__main__":
print("🔥 Respx HTTP Recording Tests for O3-Pro with Real SDK Objects")
print("=" * 60)
print("This tests the concept of using respx for HTTP-level recording")
print("Currently using pass_through mode to validate the approach")
print()
unittest.main()