refactor: Clean up test files and simplify documentation

- Remove unused cassette files with incomplete recordings
- Delete broken respx test files (test_o3_pro_respx_simple.py, test_o3_pro_http_recording.py)
- Fix respx references in docstrings to mention HTTP transport recorder
- Simplify vcr-testing.md documentation (60% reduction, more task-oriented)
- Add simplified PR template with better test instructions
- Fix cassette path consistency in examples
- Add security note about reviewing cassettes before committing

🤖 Generated with [Claude Code](https://claude.ai/code)

Co-Authored-By: Claude <noreply@anthropic.com>
2025-07-12 19:24:51 -06:00
parent 7f92085c70
commit a1451befd2
8 changed files with 212 additions and 651 deletions
--- a/docs/vcr-testing.md
+++ b/docs/vcr-testing.md
@@ -1,216 +1,87 @@
-# HTTP Recording/Replay Testing with HTTP Transport Recorder
+# HTTP Transport Recorder for Testing

-This project uses a custom HTTP Transport Recorder for testing expensive API integrations (like o3-pro) with real recorded responses.
+A custom HTTP recorder for testing expensive API calls (like o3-pro) with real responses.

-## What is HTTP Transport Recorder?
+## Overview

-The HTTP Transport Recorder is a custom httpx transport implementation that intercepts HTTP requests/responses at the transport layer. This approach provides:
+The HTTP Transport Recorder captures and replays HTTP interactions at the transport layer, enabling:
+- Cost-efficient testing of expensive APIs (record once, replay forever)
+- Deterministic tests with real API responses
+- Seamless integration with httpx and the OpenAI SDK

-- **Real API structure**: Tests use actual API responses, not guessed mocks
-- **Cost efficiency**: Only pay for API calls once during recording
-- **Deterministic tests**: Same response every time, no API variability
-- **Transport-level interception**: Works seamlessly with httpx and OpenAI SDK
-- **Full response capture**: Captures complete HTTP responses including headers and gzipped content
-
-## Directory Structure
-
-```
-tests/
-├── openai_cassettes/         # Recorded HTTP interactions
-│   ├── o3_pro_basic_math.json
-│   └── o3_pro_content_capture.json
-├── http_transport_recorder.py  # Transport recorder implementation
-├── test_content_capture.py     # Example recording test
-└── test_replay.py             # Example replay test
-```
-
-## Key Components
-
-### RecordingTransport
-- Wraps httpx's default transport
-- Makes real HTTP calls and captures responses
-- Handles gzip compression/decompression properly
-- Saves interactions to JSON cassettes
-
-### ReplayTransport
-- Serves saved responses from cassettes
-- No real HTTP calls made
-- Matches requests by method, path, and content hash
-- Re-applies gzip compression when needed
-
-### TransportFactory
-- Auto-selects record vs replay mode based on cassette existence
-- Simplifies test setup
-
-## Workflow
-
-### 1. Use Transport Recorder in Tests
+## Quick Start

 ```python
 from tests.http_transport_recorder import TransportFactory
+from providers import ModelProviderRegistry

-# Create transport based on cassette existence
+# Set up the transport recorder
 cassette_path = "tests/openai_cassettes/my_test.json"
 transport = TransportFactory.create_transport(cassette_path)

-# Inject into OpenAI provider
+# Inject into provider
 provider = ModelProviderRegistry.get_provider_for_model("o3-pro")
 provider._test_transport = transport

-# Make API calls - will be recorded/replayed automatically
-```
-
-### 2. Initial Recording (Expensive)
-
-```bash
-# With real API key, cassette doesn't exist -> records
-python test_content_capture.py
-
-# ⚠️ This will cost money! O3-Pro is $15-60 per 1K tokens
-# But only needs to be done once
-```
-
-### 3. Subsequent Runs (Free)
-
-```bash
-# Cassette exists -> replays
-python test_replay.py
-
-# Can even use fake API key to prove no real calls
-OPENAI_API_KEY="sk-fake-key" python test_replay.py
-
-# Fast, free, deterministic
-```
-
-### 4. Re-recording (When API Changes)
-
-```bash
-# Delete cassette to force re-recording
-rm tests/openai_cassettes/my_test.json
-
-# Run test again with real API key
-python test_content_capture.py
+# Make API calls - automatically recorded/replayed
 ```

 ## How It Works

-1. **Transport Injection**: Custom transport injected into httpx client
-2. **Request Interception**: All HTTP requests go through custom transport
-3. **Mode Detection**: Checks if cassette exists (replay) or needs creation (record)
-4. **Content Capture**: Properly handles streaming responses and gzip encoding
-5. **Request Matching**: Uses method + path + content hash for deterministic matching
+1. **First run** (cassette doesn't exist): Records real API calls
+2. **Subsequent runs** (cassette exists): Replays saved responses
+3. **Re-record**: Delete cassette file and run again

-## Cassette Format
+## Usage in Tests

-```json
-{
-  "interactions": [
-    {
-      "request": {
-        "method": "POST",
-        "url": "https://api.openai.com/v1/responses",
-        "path": "/v1/responses",
-        "headers": {
-          "content-type": "application/json",
-          "accept-encoding": "gzip, deflate"
-        },
-        "content": {
-          "model": "o3-pro-2025-06-10",
-          "input": [...],
-          "reasoning": {"effort": "medium"}
-        }
-      },
-      "response": {
-        "status_code": 200,
-        "headers": {
-          "content-type": "application/json",
-          "content-encoding": "gzip"
-        },
-        "content": {
-          "data": "base64_encoded_response_body",
-          "encoding": "base64",
-          "size": 1413
-        },
-        "reason_phrase": "OK"
-      }
-    }
-  ]
-}
-```
-
-Key features:
-- Complete request/response capture
-- Base64 encoding for binary content
-- Preserves gzip compression
-- Sanitizes sensitive data (API keys removed)
-
-## Benefits Over Previous Approaches
-
-1. **Works with any HTTP client**: Not tied to OpenAI SDK specifically
-2. **Handles compression**: Properly manages gzipped responses
-3. **Full HTTP fidelity**: Captures headers, status codes, etc.
-4. **Simpler than VCR.py**: No sync/async conflicts or monkey patching
-5. **Better than respx**: No streaming response issues
-
-## Example Test
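+See `test_o3_pro_output_text_fix.py` for a complete example. The excerpt below assumes `provider` (set up as in Quick Start) and a `ChatTool` instance named `chat_tool` already exist: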

 ```python
-#!/usr/bin/env python3
-import asyncio
-from pathlib import Path
-from tests.http_transport_recorder import TransportFactory
-from providers import ModelProviderRegistry
-from tools.chat import ChatTool
-
 async def test_with_recording():
-    cassette_path = "tests/openai_cassettes/test_example.json"
-    
-    # Setup transport
-    transport = TransportFactory.create_transport(cassette_path)
-    provider = ModelProviderRegistry.get_provider_for_model("o3-pro")
+    # Transport factory auto-detects record vs replay mode
+    transport = TransportFactory.create_transport("tests/openai_cassettes/my_test.json")
     provider._test_transport = transport
-    
-    # Use ChatTool normally
-    chat_tool = ChatTool()
-    result = await chat_tool.execute({
-        "prompt": "What is 2+2?",
-        "model": "o3-pro",
-        "temperature": 1.0
-    })
-    
-    print(f"Response: {result[0].text}")

-if __name__ == "__main__":
-    asyncio.run(test_with_recording())
+    # Use normally - recording happens transparently
+    result = await chat_tool.execute({"prompt": "2+2?", "model": "o3-pro"})
 ```

-## Timeout Protection
+## File Structure

-Tests can use GNU timeout to prevent hanging:
-
-```bash
-# Install GNU coreutils if needed
-brew install coreutils
-
-# Run with 30 second timeout
-gtimeout 30s python test_content_capture.py
 ```
-
-## CI/CD Integration
-
-```yaml
-# In CI, tests use existing cassettes (no API keys needed)
-- name: Run OpenAI tests
-  run: |
-    # Tests will use replay mode with existing cassettes
-    python -m pytest tests/test_o3_pro.py
+tests/
+├── openai_cassettes/           # Recorded API interactions
+│   └── *.json                  # Cassette files
+├── http_transport_recorder.py  # Transport implementation
+└── test_o3_pro_output_text_fix.py  # Example usage
 ```

 ## Cost Management

-- **One-time cost**: Initial recording per test scenario
+- **One-time cost**: Initial recording only
 - **Zero ongoing cost**: Replays are free
-- **Controlled re-recording**: Manual cassette deletion required
-- **CI-friendly**: No accidental API calls in automation
+- **CI-friendly**: No API keys needed for replay
+
+## Re-recording
+
+When API changes require new recordings:
+
+```bash
+# Delete specific cassette
+rm tests/openai_cassettes/my_test.json
+
+# Run test with real API key
+python -m pytest tests/test_o3_pro_output_text_fix.py
+```
+
+## Implementation Details
+
+- **RecordingTransport**: Captures real HTTP calls
+- **ReplayTransport**: Serves saved responses
+- **TransportFactory**: Auto-selects mode based on cassette existence
+- **Secret sanitization**: Automatically removes API keys from recordings
+
+**Security Note**: Always review new cassette files before committing to ensure no sensitive data is included.
+
+For the full implementation, see `tests/http_transport_recorder.py`.

-This HTTP transport recorder approach provides accurate API testing with cost efficiency, specifically optimized for expensive endpoints like o3-pro while being flexible enough for any HTTP-based API.
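The updated `vcr-testing.md` describes several rules: record when the cassette is missing, replay when it exists, strip API keys from recordings, and review cassettes before committing. The removed text adds one more: match requests by method, path, and content hash. The block below is a minimal, standard-library-only sketch of that bookkeeping under stated assumptions. It is not the code in `tests/http_transport_recorder.py`. The helper names `choose_mode`, `request_key`, `sanitize_headers` and `find_leaked_keys` are hypothetical, as are the SHA-256 hash over sorted-key JSON, the header deny-list, and the `sk-` key pattern.

```python
# Hypothetical helpers illustrating the recorder's bookkeeping.
# They are NOT the functions in tests/http_transport_recorder.py.
import hashlib
import json
import re
from pathlib import Path

# Assumption: header names that may carry credentials and must never be saved.
SENSITIVE_HEADERS = {"authorization", "openai-organization", "x-api-key"}

# Assumption: OpenAI-style secret keys start with "sk-" (a heuristic, not exhaustive).
KEY_PATTERN = re.compile(r"sk-[A-Za-z0-9_-]{20,}")


def choose_mode(cassette_path: str) -> str:
    """Mirror the documented rule: replay if the cassette exists, else record."""
    return "replay" if Path(cassette_path).exists() else "record"


def request_key(method: str, path: str, body: bytes) -> str:
    """Build a deterministic match key from method, path and a body hash.

    Assumption: JSON bodies are re-serialized with sorted keys so that key
    order does not change the hash; non-JSON bodies are hashed as-is.
    """
    try:
        canonical = json.dumps(json.loads(body), sort_keys=True).encode("utf-8")
    except ValueError:  # covers JSONDecodeError and UnicodeDecodeError
        canonical = body
    digest = hashlib.sha256(canonical).hexdigest()
    return f"{method.upper()} {path} {digest}"


def sanitize_headers(headers: dict) -> dict:
    """Return a copy of the headers with credential-bearing entries removed."""
    return {k: v for k, v in headers.items() if k.lower() not in SENSITIVE_HEADERS}


def find_leaked_keys(cassette_text: str) -> list:
    """List substrings that look like API keys, to flag before committing."""
    return KEY_PATTERN.findall(cassette_text)
```

Suppose two requests have the same method and path, and their JSON bodies differ only in key order. Both produce the same key, so a replay still matches. Hashing a canonical form rather than the raw bytes is a design choice of this sketch; the project's recorder may hash differently. `find_leaked_keys` is only a safety net and does not replace reading new cassettes, as the security note asks.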