refactor: Clean up test files and simplify documentation
- Remove unused cassette files with incomplete recordings
- Delete broken respx test files (test_o3_pro_respx_simple.py, test_o3_pro_http_recording.py)
- Fix respx references in docstrings to mention HTTP transport recorder
- Simplify vcr-testing.md documentation (60% reduction, more task-oriented)
- Add simplified PR template with better test instructions
- Fix cassette path consistency in examples
- Add security note about reviewing cassettes before committing

🤖 Generated with [Claude Code](https://claude.ai/code)

Co-Authored-By: Claude <noreply@anthropic.com>
docs/vcr-testing.md
@@ -1,216 +1,87 @@
-# HTTP Recording/Replay Testing with HTTP Transport Recorder
+# HTTP Transport Recorder for Testing

-This project uses a custom HTTP Transport Recorder for testing expensive API integrations (like o3-pro) with real recorded responses.
+A custom HTTP recorder for testing expensive API calls (like o3-pro) with real responses.

-## What is HTTP Transport Recorder?
+## Overview

-The HTTP Transport Recorder is a custom httpx transport implementation that intercepts HTTP requests/responses at the transport layer. This approach provides:
+The HTTP Transport Recorder captures and replays HTTP interactions at the transport layer, enabling:
+
+- Cost-efficient testing of expensive APIs (record once, replay forever)
+- Deterministic tests with real API responses
+- Seamless integration with httpx and OpenAI SDK

-- **Real API structure**: Tests use actual API responses, not guessed mocks
-- **Cost efficiency**: Only pay for API calls once during recording
-- **Deterministic tests**: Same response every time, no API variability
-- **Transport-level interception**: Works seamlessly with httpx and OpenAI SDK
-- **Full response capture**: Captures complete HTTP responses including headers and gzipped content
-
-## Directory Structure
-
-```
-tests/
-├── openai_cassettes/            # Recorded HTTP interactions
-│   ├── o3_pro_basic_math.json
-│   └── o3_pro_content_capture.json
-├── http_transport_recorder.py   # Transport recorder implementation
-├── test_content_capture.py      # Example recording test
-└── test_replay.py               # Example replay test
-```
-
-## Key Components
-
-### RecordingTransport
-- Wraps httpx's default transport
-- Makes real HTTP calls and captures responses
-- Handles gzip compression/decompression properly
-- Saves interactions to JSON cassettes
-
-### ReplayTransport
-- Serves saved responses from cassettes
-- No real HTTP calls made
-- Matches requests by method, path, and content hash
-- Re-applies gzip compression when needed
-
-### TransportFactory
-- Auto-selects record vs replay mode based on cassette existence
-- Simplifies test setup
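The deterministic replay matching described above (method + path + content hash) can be pictured with a small key function. This is a hedged sketch only; the exact hash and key layout used in `tests/http_transport_recorder.py` are not shown here, and the names below are assumptions:

```python
# Illustrative only: one way to build the lookup key the recorder is
# described as using (method + path + hash of the request body).
import hashlib

import httpx


def request_key(request: httpx.Request) -> str:
    """Build a stable key for matching a live request to a recorded one."""
    body = request.read()  # request body as bytes; empty for bodiless requests
    return f"{request.method} {request.url.path} {hashlib.sha256(body).hexdigest()}"


if __name__ == "__main__":
    req = httpx.Request(
        "POST",
        "https://api.openai.com/v1/responses",
        json={"model": "o3-pro", "input": "What is 2+2?"},
    )
    print(request_key(req))
```

Any change to the request body yields a different key, which is what keeps replay lookups deterministic.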
-## Workflow
-
-### 1. Use Transport Recorder in Tests
-
-```python
-from tests.http_transport_recorder import TransportFactory
-
-# Create transport based on cassette existence
-cassette_path = "tests/openai_cassettes/my_test.json"
-transport = TransportFactory.create_transport(cassette_path)
-
-# Inject into OpenAI provider
-provider = ModelProviderRegistry.get_provider_for_model("o3-pro")
-provider._test_transport = transport
-
-# Make API calls - will be recorded/replayed automatically
-```
-
-### 2. Initial Recording (Expensive)
-
-```bash
-# With real API key, cassette doesn't exist -> records
-python test_content_capture.py
-
-# ⚠️ This will cost money! O3-Pro is $15-60 per 1K tokens
-# But only needs to be done once
-```
-
-### 3. Subsequent Runs (Free)
-
-```bash
-# Cassette exists -> replays
-python test_replay.py
-
-# Can even use fake API key to prove no real calls
-OPENAI_API_KEY="sk-fake-key" python test_replay.py
-
-# Fast, free, deterministic
-```
-
-### 4. Re-recording (When API Changes)
-
-```bash
-# Delete cassette to force re-recording
-rm tests/openai_cassettes/my_test.json
-
-# Run test again with real API key
-python test_content_capture.py
-```
+## Quick Start
+
+```python
+from tests.http_transport_recorder import TransportFactory
+from providers import ModelProviderRegistry
+
+# Setup transport recorder
+cassette_path = "tests/openai_cassettes/my_test.json"
+transport = TransportFactory.create_transport(cassette_path)
+
+# Inject into provider
+provider = ModelProviderRegistry.get_provider_for_model("o3-pro")
+provider._test_transport = transport
+
+# Make API calls - automatically recorded/replayed
+```

 ## How It Works

-1. **Transport Injection**: Custom transport injected into httpx client
-2. **Request Interception**: All HTTP requests go through custom transport
-3. **Mode Detection**: Checks if cassette exists (replay) or needs creation (record)
-4. **Content Capture**: Properly handles streaming responses and gzip encoding
-5. **Request Matching**: Uses method + path + content hash for deterministic matching
+1. **First run** (cassette doesn't exist): Records real API calls
+2. **Subsequent runs** (cassette exists): Replays saved responses
+3. **Re-record**: Delete cassette file and run again
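To make the record/replay decision above concrete, here is a deliberately simplified, hypothetical sketch of a replay-only transport plus the cassette-existence check. It uses a flattened cassette layout and matches on method and path only; the project's real `ReplayTransport`, `RecordingTransport`, and `TransportFactory` in `tests/http_transport_recorder.py` also hash the request body, handle gzip, and sanitize secrets:

```python
# Simplified illustration of replay mode; not the project's implementation.
import json
from pathlib import Path

import httpx


class SketchReplayTransport(httpx.BaseTransport):
    """Serve recorded responses without touching the network."""

    def __init__(self, cassette_path: Path) -> None:
        self._interactions = json.loads(cassette_path.read_text())["interactions"]

    def handle_request(self, request: httpx.Request) -> httpx.Response:
        for item in self._interactions:
            recorded = item["request"]
            # The real recorder also compares a hash of the request body.
            if recorded["method"] == request.method and recorded["path"] == request.url.path:
                saved = item["response"]
                # This sketch stores decoded bodies, so encoding headers are dropped.
                headers = {
                    k: v
                    for k, v in saved.get("headers", {}).items()
                    if k.lower() not in ("content-encoding", "content-length")
                }
                return httpx.Response(
                    status_code=saved["status_code"],
                    headers=headers,
                    content=saved.get("body", "").encode(),
                    request=request,
                )
        raise RuntimeError(f"No recorded interaction for {request.method} {request.url.path}")


def create_transport(cassette_path: str) -> httpx.BaseTransport:
    """Replay when the cassette exists; otherwise fall through to real HTTP."""
    path = Path(cassette_path)
    if path.exists():
        return SketchReplayTransport(path)
    # Recording mode would wrap httpx.HTTPTransport() and persist each interaction.
    return httpx.HTTPTransport()
```

A client built as `httpx.Client(transport=create_transport(...))` behaves the same whether it replays or hits the network; in this project the transport is injected through `provider._test_transport` instead, as shown in Quick Start.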
-## Cassette Format
-
-```json
-{
-  "interactions": [
-    {
-      "request": {
-        "method": "POST",
-        "url": "https://api.openai.com/v1/responses",
-        "path": "/v1/responses",
-        "headers": {
-          "content-type": "application/json",
-          "accept-encoding": "gzip, deflate"
-        },
-        "content": {
-          "model": "o3-pro-2025-06-10",
-          "input": [...],
-          "reasoning": {"effort": "medium"}
-        }
-      },
-      "response": {
-        "status_code": 200,
-        "headers": {
-          "content-type": "application/json",
-          "content-encoding": "gzip"
-        },
-        "content": {
-          "data": "base64_encoded_response_body",
-          "encoding": "base64",
-          "size": 1413
-        },
-        "reason_phrase": "OK"
-      }
-    }
-  ]
-}
-```
-
-Key features:
-- Complete request/response capture
-- Base64 encoding for binary content
-- Preserves gzip compression
-- Sanitizes sensitive data (API keys removed)
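Working from the example cassette above (field names as shown there; treat them as illustrative rather than a guaranteed schema), a recorded response body can be decoded back into JSON roughly like this:

```python
# Hedged sketch: decode one recorded response body (base64, possibly gzipped).
import base64
import gzip
import json
from pathlib import Path


def decode_response_body(interaction: dict) -> dict:
    response = interaction["response"]
    content = response["content"]
    if content.get("encoding") == "base64":
        raw = base64.b64decode(content["data"])
    else:
        raw = str(content["data"]).encode()
    headers = {k.lower(): v for k, v in response.get("headers", {}).items()}
    if headers.get("content-encoding") == "gzip":
        raw = gzip.decompress(raw)
    return json.loads(raw)


if __name__ == "__main__":
    cassette = json.loads(Path("tests/openai_cassettes/my_test.json").read_text())
    print(sorted(decode_response_body(cassette["interactions"][0])))
```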
-## Benefits Over Previous Approaches
-
-1. **Works with any HTTP client**: Not tied to OpenAI SDK specifically
-2. **Handles compression**: Properly manages gzipped responses
-3. **Full HTTP fidelity**: Captures headers, status codes, etc.
-4. **Simpler than VCR.py**: No sync/async conflicts or monkey patching
-5. **Better than respx**: No streaming response issues
-
-## Example Test
-
-```python
-#!/usr/bin/env python3
-import asyncio
-from pathlib import Path
-from tests.http_transport_recorder import TransportFactory
-from providers import ModelProviderRegistry
-from tools.chat import ChatTool
-
-async def test_with_recording():
-    cassette_path = "tests/openai_cassettes/test_example.json"
-    # Setup transport
-    transport = TransportFactory.create_transport(cassette_path)
-    provider = ModelProviderRegistry.get_provider_for_model("o3-pro")
-    provider._test_transport = transport
-
-    # Use ChatTool normally
-    chat_tool = ChatTool()
-    result = await chat_tool.execute({
-        "prompt": "What is 2+2?",
-        "model": "o3-pro",
-        "temperature": 1.0
-    })
-
-    print(f"Response: {result[0].text}")
-
-if __name__ == "__main__":
-    asyncio.run(test_with_recording())
-```
+## Usage in Tests
+
+See `test_o3_pro_output_text_fix.py` for a complete example:
+
+```python
+async def test_with_recording():
+    # Transport factory auto-detects record vs replay mode
+    transport = TransportFactory.create_transport("tests/openai_cassettes/my_test.json")
+    provider._test_transport = transport
+
+    # Use normally - recording happens transparently
+    result = await chat_tool.execute({"prompt": "2+2?", "model": "o3-pro"})
+```
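The recording side described earlier (wrap httpx's default transport, make the real call, capture the full response, save it to the cassette) can be sketched as below. This is an assumption-laden illustration rather than the project's `RecordingTransport`: it stores decoded bodies and drops encoding headers, whereas the real recorder is described as re-applying gzip and sanitizing API keys:

```python
# Rough sketch of recording at the transport layer; not the real implementation.
import base64
import json
from pathlib import Path

import httpx


class SketchRecordingTransport(httpx.BaseTransport):
    """Make the real call through the default transport and save what came back."""

    def __init__(self, cassette_path: str) -> None:
        self._path = Path(cassette_path)
        self._inner = httpx.HTTPTransport()
        self._interactions = []

    def handle_request(self, request: httpx.Request) -> httpx.Response:
        response = self._inner.handle_request(request)  # real network call
        body = response.read()  # decoded body bytes (httpx has already un-gzipped it)
        headers = {
            k: v
            for k, v in response.headers.items()
            if k.lower() not in ("content-encoding", "content-length")
        }
        self._interactions.append({
            "request": {"method": request.method, "path": request.url.path},
            "response": {
                "status_code": response.status_code,
                "headers": headers,
                "content": {"data": base64.b64encode(body).decode(), "encoding": "base64"},
            },
        })
        self._path.parent.mkdir(parents=True, exist_ok=True)
        self._path.write_text(json.dumps({"interactions": self._interactions}, indent=2))
        return httpx.Response(response.status_code, headers=headers, content=body, request=request)
```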
-## Timeout Protection
-
-Tests can use GNU timeout to prevent hanging:
-
-```bash
-# Install GNU coreutils if needed
-brew install coreutils
-
-# Run with 30 second timeout
-gtimeout 30s python test_content_capture.py
-```
-
-## CI/CD Integration
-
-```yaml
-# In CI, tests use existing cassettes (no API keys needed)
-- name: Run OpenAI tests
-  run: |
-    # Tests will use replay mode with existing cassettes
-    python -m pytest tests/test_o3_pro.py
-```
+## File Structure
+
+```
+tests/
+├── openai_cassettes/                  # Recorded API interactions
+│   └── *.json                         # Cassette files
+├── http_transport_recorder.py         # Transport implementation
+└── test_o3_pro_output_text_fix.py     # Example usage
+```
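Because replay needs no API keys, the cassette-backed setup is convenient to wrap in a fixture. A hypothetical pytest fixture along the lines of the Quick Start snippet (the fixture name and cassette path are placeholders, not part of the project):

```python
# Hypothetical fixture mirroring the manual setup shown earlier in this doc.
import pytest

from providers import ModelProviderRegistry
from tests.http_transport_recorder import TransportFactory


@pytest.fixture
def o3_pro_provider():
    """o3-pro provider wired through a cassette-backed transport."""
    transport = TransportFactory.create_transport("tests/openai_cassettes/my_test.json")
    provider = ModelProviderRegistry.get_provider_for_model("o3-pro")
    provider._test_transport = transport
    return provider
```

Tests that accept this fixture can then exercise ChatTool exactly as in the example above and run in CI against committed cassettes.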
 ## Cost Management

-- **One-time cost**: Initial recording per test scenario
+- **One-time cost**: Initial recording only
 - **Zero ongoing cost**: Replays are free
-- **Controlled re-recording**: Manual cassette deletion required
-- **CI-friendly**: No accidental API calls in automation
+- **CI-friendly**: No API keys needed for replay
+
+## Re-recording
+
+When API changes require new recordings:
+
+```bash
+# Delete specific cassette
+rm tests/openai_cassettes/my_test.json
+
+# Run test with real API key
+python -m pytest tests/test_o3_pro_output_text_fix.py
+```
+
+## Implementation Details
+
+- **RecordingTransport**: Captures real HTTP calls
+- **ReplayTransport**: Serves saved responses
+- **TransportFactory**: Auto-selects mode based on cassette existence
+- **PII Sanitization**: Automatically removes API keys from recordings
+
+**Security Note**: Always review new cassette files before committing to ensure no sensitive data is included.
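The PII sanitization mentioned above amounts to redacting secret-bearing values before a cassette is written; a minimal sketch (the header list is an assumption, and the project's sanitizer is more thorough):

```python
# Illustrative header redaction; not the project's actual PII sanitizer.
SENSITIVE_HEADERS = {"authorization", "x-api-key", "openai-organization"}


def sanitize_headers(headers: dict) -> dict:
    """Replace secret-bearing header values before they reach a cassette file."""
    return {
        name: ("REDACTED" if name.lower() in SENSITIVE_HEADERS else value)
        for name, value in headers.items()
    }


assert sanitize_headers({"Authorization": "Bearer sk-real-key"}) == {"Authorization": "REDACTED"}
```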
+For implementation details, see `tests/http_transport_recorder.py`.

-This HTTP transport recorder approach provides accurate API testing with cost efficiency, specifically optimized for expensive endpoints like o3-pro while being flexible enough for any HTTP-based API.
pr_template_filled.md (new file)
@@ -0,0 +1,99 @@
## PR Title Format

**fix: Fix o3-pro empty response issue by using output_text field**

## Description

This PR fixes a critical bug where o3-pro API calls were returning empty responses. The root cause was incorrect response parsing - the code was trying to manually parse `response.output.content[]` array structure, but o3-pro provides a simpler `output_text` convenience field directly on the response object. This PR also introduces a secure HTTP recording system for testing expensive o3-pro calls.

## Changes Made

- [x] Fixed o3-pro response parsing by using the `output_text` convenience field instead of manual parsing
- [x] Added `_safe_extract_output_text` method with proper validation to handle o3-pro's response format
- [x] Implemented custom HTTP transport recorder to replace respx for more reliable test recordings
- [x] Added comprehensive PII sanitization to prevent accidental API key exposure in test cassettes
- [x] Sanitized all existing test cassettes to remove any exposed secrets
- [x] Updated documentation for the new testing infrastructure
- [x] Added test suite to validate the fix and ensure PII sanitization works correctly

**No breaking changes** - The fix only affects o3-pro model parsing internally.

**Dependencies added:**
- None (uses existing httpx and standard library modules)

## Testing

### Run all linting and tests (required):

```bash
# Activate virtual environment first
source venv/bin/activate

# Run comprehensive code quality checks (recommended)
./code_quality_checks.sh

# If you made tool changes, also run simulator tests
python communication_simulator_test.py
```

- [x] All linting passes (ruff, black, isort)
- [x] All unit tests pass
- [x] **For bug fixes**: Tests added to prevent regression
  - `test_o3_pro_output_text_fix.py` - Validates o3-pro response parsing works correctly
  - `test_o3_pro_http_recording.py` - Tests HTTP recording functionality
  - `test_pii_sanitizer.py` - Ensures PII sanitization works properly
- [x] Manual testing completed with realistic scenarios
  - Verified o3-pro calls return actual content instead of empty responses
  - Validated that recorded cassettes contain no exposed API keys

## Related Issues

Fixes o3-pro API calls returning empty responses on master branch.

## Checklist

- [x] PR title follows the format guidelines above
- [x] **Activated venv and ran code quality checks: `source venv/bin/activate && ./code_quality_checks.sh`**
- [x] Self-review completed
- [x] **Tests added for ALL changes** (see Testing section above)
- [x] Documentation updated as needed
  - Updated `docs/testing.md` with new testing approach
  - Added `docs/vcr-testing.md` for HTTP recording documentation
- [x] All unit tests passing
- [x] Ready for review

## Additional Notes

### The Bug:

On master branch, o3-pro API calls were returning empty responses because the code was trying to parse the response incorrectly:

```python
# Master branch - incorrect parsing
if hasattr(response.output, "content") and response.output.content:
    for content_item in response.output.content:
        if hasattr(content_item, "type") and content_item.type == "output_text":
            content = content_item.text
            break
```

The o3-pro response object actually provides an `output_text` convenience field directly:

```python
# Fixed version - correct parsing
content = response.output_text
```

### The Fix:

1. Added `_safe_extract_output_text` method that properly validates and extracts the `output_text` field
2. Updated the response parsing logic in `_generate_with_responses_endpoint` to use this new method
3. Added proper error handling and validation to catch future response format issues
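For orientation, a helper along the lines described in these fix notes might validate the field before returning it. This is a hedged sketch; the actual `_safe_extract_output_text` in `providers/openai_compatible.py` may differ in detail:

```python
# Sketch of the kind of validation described above; not the real method.
def _safe_extract_output_text(response) -> str:
    """Return response.output_text, failing loudly if it is missing or empty."""
    if not hasattr(response, "output_text"):
        raise ValueError(f"o3-pro response has no output_text field: {type(response)}")
    text = response.output_text
    if not isinstance(text, str):
        raise ValueError(f"output_text is not a string: {type(text)}")
    if not text.strip():
        raise ValueError("o3-pro returned an empty output_text")
    return text
```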
### Additional Improvements:

- **Testing Infrastructure**: Implemented HTTP transport recorder to enable testing without repeated expensive API calls
- **Security**: Added automatic PII sanitization to prevent API keys from being accidentally committed in test recordings

### Development Notes:

- During development, we encountered timeout issues with the initial respx-based approach which led to implementing the custom HTTP transport recorder
- The transport recorder solution properly handles streaming responses and gzip compression

### For Reviewers:

- The core fix is in `providers/openai_compatible.py` lines 307-335 and line 396
- The HTTP transport recorder is test infrastructure only and doesn't affect production code
- All test cassettes have been sanitized and verified to contain no secrets
pr_template_filled_simplified.md (new file)
@@ -0,0 +1,59 @@
## PR Title

**fix: Fix o3-pro empty response issue by using output_text field**

## Summary

Fixes o3-pro API calls returning empty responses due to incorrect response parsing. The code was trying to parse `response.output.content[]` array, but o3-pro provides `output_text` directly.

## Changes

- Fixed o3-pro response parsing to use `output_text` field
- Added `_safe_extract_output_text` method with validation
- Implemented HTTP transport recorder for testing expensive API calls
- Added PII sanitization for test recordings
- Added regression tests

**No breaking changes** - Internal fix only

## Testing

```bash
source venv/bin/activate
./code_quality_checks.sh

# Run the new tests added in this PR
python -m pytest tests/test_o3_pro_output_text_fix.py -v
python -m pytest tests/test_pii_sanitizer.py -v

# Or run all new tests together
python -m pytest tests/test_o3_pro_output_text_fix.py tests/test_pii_sanitizer.py -v
```

- [x] All checks pass
- [x] Regression tests added:
  - `test_o3_pro_output_text_fix.py` - Validates o3-pro response parsing and HTTP transport recording
  - `test_pii_sanitizer.py` - Ensures API key sanitization

## Code Example

**Before:**
```python
# Incorrect - manual parsing
for content_item in response.output.content:
    if content_item.type == "output_text":
        content = content_item.text
```

**After:**
```python
# Correct - direct field access
content = response.output_text
```

## For Reviewers

- Core fix: `providers/openai_compatible.py` - see `_safe_extract_output_text()` method
- Response parsing: `_generate_with_responses_endpoint()` method now uses the direct field
- Test infrastructure changes don't affect production code
- All test recordings sanitized for security
File diff suppressed because one or more lines are too long
test_o3_pro_http_recording.py (deleted)
@@ -1,104 +0,0 @@
"""
Tests for o3-pro output_text parsing fix using HTTP-level recording via respx.

This test validates the fix using real OpenAI SDK objects by recording/replaying
HTTP responses instead of creating mock objects.
"""

import os
import unittest
from pathlib import Path

import pytest
from dotenv import load_dotenv

from tests.test_helpers.http_recorder import HTTPRecorder
from tools.chat import ChatTool

# Load environment variables from .env file
load_dotenv()

# Use absolute path for cassette directory
cassette_dir = Path(__file__).parent / "http_cassettes"
cassette_dir.mkdir(exist_ok=True)


@pytest.mark.no_mock_provider  # Disable provider mocking for this test
class TestO3ProHTTPRecording(unittest.IsolatedAsyncioTestCase):
    """Test o3-pro response parsing using HTTP-level recording with real SDK objects."""

    async def test_o3_pro_real_sdk_objects(self):
        """Test that o3-pro parsing works with real OpenAI SDK objects from HTTP replay."""
        # Skip if no API key available and cassette doesn't exist
        cassette_path = cassette_dir / "o3_pro_real_sdk.json"
        if not cassette_path.exists() and not os.getenv("OPENAI_API_KEY"):
            pytest.skip("Set real OPENAI_API_KEY to record HTTP cassettes")

        # Use HTTPRecorder to record/replay raw HTTP responses
        async with HTTPRecorder(str(cassette_path)):
            # Execute the chat tool test - real SDK objects will be created
            result = await self._execute_chat_tool_test()

            # Verify the response works correctly with real SDK objects
            self._verify_chat_tool_response(result)

        # Verify cassette was created in record mode
        if os.getenv("OPENAI_API_KEY") and not os.getenv("OPENAI_API_KEY").startswith("dummy"):
            self.assertTrue(cassette_path.exists(), f"HTTP cassette not created at {cassette_path}")

    async def _execute_chat_tool_test(self):
        """Execute the ChatTool with o3-pro and return the result."""
        chat_tool = ChatTool()
        arguments = {"prompt": "What is 2 + 2?", "model": "o3-pro", "temperature": 1.0}

        return await chat_tool.execute(arguments)

    def _verify_chat_tool_response(self, result):
        """Verify the ChatTool response contains expected data."""
        # Verify we got a valid response
        self.assertIsNotNone(result, "Should get response from ChatTool")

        # Parse the result content (ChatTool returns MCP TextContent format)
        self.assertIsInstance(result, list, "ChatTool should return list of content")
        self.assertTrue(len(result) > 0, "Should have at least one content item")

        # Get the text content (result is a list of TextContent objects)
        content_item = result[0]
        self.assertEqual(content_item.type, "text", "First item should be text content")

        text_content = content_item.text
        self.assertTrue(len(text_content) > 0, "Should have text content")

        # Parse the JSON response from chat tool
        import json

        try:
            response_data = json.loads(text_content)
        except json.JSONDecodeError:
            self.fail(f"Could not parse chat tool response as JSON: {text_content}")

        # Verify the response makes sense for the math question
        actual_content = response_data.get("content", "")
        self.assertIn("4", actual_content, "Should contain the answer '4'")

        # Verify metadata shows o3-pro was used
        metadata = response_data.get("metadata", {})
        self.assertEqual(metadata.get("model_used"), "o3-pro", "Should use o3-pro model")
        self.assertEqual(metadata.get("provider_used"), "openai", "Should use OpenAI provider")

        # Additional verification that the fix is working
        self.assertTrue(actual_content.strip(), "Content should not be empty")
        self.assertIsInstance(actual_content, str, "Content should be string")

        # Verify successful status
        self.assertEqual(response_data.get("status"), "continuation_available", "Should have successful status")


if __name__ == "__main__":
    print("🌐 HTTP-Level Recording Tests for O3-Pro with Real SDK Objects")
    print("=" * 60)
    print("FIRST RUN: Requires OPENAI_API_KEY - records HTTP responses (EXPENSIVE!)")
    print("SUBSEQUENT RUNS: Uses recorded HTTP responses - free and fast")
    print("RECORDING: Delete .json files in tests/http_cassettes/ to re-record")
    print()

    unittest.main()
tests/test_o3_pro_output_text_fix.py
@@ -1,10 +1,10 @@
 """
-Tests for o3-pro output_text parsing fix using respx response recording.
+Tests for o3-pro output_text parsing fix using HTTP transport recording.

 This test validates the fix that uses `response.output_text` convenience field
 instead of manually parsing `response.output.content[].text`.

-Uses respx to record real o3-pro API responses at the HTTP level while allowing
+Uses HTTP transport recorder to record real o3-pro API responses at the HTTP level while allowing
 the OpenAI SDK to create real response objects that we can test.

 RECORDING: To record new responses, delete the cassette file and run with real API keys.
test_o3_pro_respx_simple.py (deleted)
@@ -1,104 +0,0 @@
"""
Tests for o3-pro output_text parsing fix using respx for HTTP recording/replay.

This test uses respx's built-in recording capabilities to record/replay HTTP responses,
allowing the OpenAI SDK to create real response objects with all convenience methods.
"""

import os
import unittest
from pathlib import Path

import pytest
from dotenv import load_dotenv

from tests.test_helpers.respx_recorder import RespxRecorder
from tools.chat import ChatTool

# Load environment variables from .env file
load_dotenv()

# Use absolute path for cassette directory
cassette_dir = Path(__file__).parent / "respx_cassettes"
cassette_dir.mkdir(exist_ok=True)


@pytest.mark.no_mock_provider  # Disable provider mocking for this test
class TestO3ProRespxSimple(unittest.IsolatedAsyncioTestCase):
    """Test o3-pro response parsing using respx for HTTP recording/replay."""

    async def test_o3_pro_with_respx_recording(self):
        """Test o3-pro parsing with respx HTTP recording - real SDK objects."""
        cassette_path = cassette_dir / "o3_pro_respx.json"

        # Skip if no API key available and no cassette exists
        if not cassette_path.exists() and (not os.getenv("OPENAI_API_KEY") or os.getenv("OPENAI_API_KEY").startswith("dummy")):
            pytest.skip("Set real OPENAI_API_KEY to record HTTP cassettes")

        # Use RespxRecorder for automatic recording/replay
        async with RespxRecorder(str(cassette_path)) as recorder:
            # Execute the chat tool test - recorder handles recording or replay automatically
            result = await self._execute_chat_tool_test()

            # Verify the response works correctly with real SDK objects
            self._verify_chat_tool_response(result)

        # Verify cassette was created in record mode
        if not os.getenv("OPENAI_API_KEY", "").startswith("dummy"):
            self.assertTrue(cassette_path.exists(), f"HTTP cassette not created at {cassette_path}")

    async def _execute_chat_tool_test(self):
        """Execute the ChatTool with o3-pro and return the result."""
        chat_tool = ChatTool()
        arguments = {"prompt": "What is 2 + 2?", "model": "o3-pro", "temperature": 1.0}

        return await chat_tool.execute(arguments)

    def _verify_chat_tool_response(self, result):
        """Verify the ChatTool response contains expected data."""
        # Verify we got a valid response
        self.assertIsNotNone(result, "Should get response from ChatTool")

        # Parse the result content (ChatTool returns MCP TextContent format)
        self.assertIsInstance(result, list, "ChatTool should return list of content")
        self.assertTrue(len(result) > 0, "Should have at least one content item")

        # Get the text content (result is a list of TextContent objects)
        content_item = result[0]
        self.assertEqual(content_item.type, "text", "First item should be text content")

        text_content = content_item.text
        self.assertTrue(len(text_content) > 0, "Should have text content")

        # Parse the JSON response from chat tool
        import json

        try:
            response_data = json.loads(text_content)
        except json.JSONDecodeError:
            self.fail(f"Could not parse chat tool response as JSON: {text_content}")

        # Verify the response makes sense for the math question
        actual_content = response_data.get("content", "")
        self.assertIn("4", actual_content, "Should contain the answer '4'")

        # Verify metadata shows o3-pro was used
        metadata = response_data.get("metadata", {})
        self.assertEqual(metadata.get("model_used"), "o3-pro", "Should use o3-pro model")
        self.assertEqual(metadata.get("provider_used"), "openai", "Should use OpenAI provider")

        # Additional verification
        self.assertTrue(actual_content.strip(), "Content should not be empty")
        self.assertIsInstance(actual_content, str, "Content should be string")

        # Verify successful status
        self.assertEqual(response_data.get("status"), "continuation_available", "Should have successful status")


if __name__ == "__main__":
    print("🔥 Respx HTTP Recording Tests for O3-Pro with Real SDK Objects")
    print("=" * 60)
    print("This tests the concept of using respx for HTTP-level recording")
    print("Currently using pass_through mode to validate the approach")
    print()

    unittest.main()