fix: Resolve o3-pro test isolation issues and convert print to logging

- Fix test isolation by clearing LOCALE env var in o3-pro test
- Add restriction service cleanup in test_model_restrictions.py
- Fix PII sanitizer phone regex to not match timestamps
- Convert all print statements to logging in test files per PR review
- Re-record o3-pro cassette with correct environment

🤖 Generated with [Claude Code](https://claude.ai/code)

Co-Authored-By: Claude <noreply@anthropic.com>
This commit is contained in:
Josh Vera
2025-07-13 10:41:43 -06:00
parent 3b1c80865b
commit 9248947e39
6 changed files with 152 additions and 135 deletions

View File

@@ -1,94 +0,0 @@
# Registry Bisection Debug Findings
## Final Conclusions
Through systematic bisection testing, I've discovered that **NONE of the 6 registry operations in TestO3ProOutputTextFix are actually necessary**.
## Key Findings
### Bisection Results
1. **Test 1 (no operations)** - ✅ PASSED with full test suite
2. **Test 2 (cache clear only)** - ✅ PASSED with full test suite
3. **Test 3 (instance reset only)** - ❌ FAILED - clears all provider registrations
4. **Test 4 (both ops + re-register)** - ✅ PASSED with full test suite
5. **Original test without setup/teardown** - ✅ PASSED with full test suite
### Critical Discovery
The `allow_all_models` fixture alone is sufficient! It:
- Clears the model restrictions singleton
- Clears the registry cache (which is all that's needed)
- Sets up the dummy API key for transport replay
### Why the Original Has 6 Operations
1. **Historical reasons** - Likely copied from other tests or added defensively
2. **Misunderstanding** - The comment says "Registry reset in setup/teardown is required to ensure fresh provider instance for transport injection" but this is FALSE
3. **Over-engineering** - The singleton reset is unnecessary and actually harmful (Test 3 proved this)
### The Real Requirements
- Only need `ModelProviderRegistry.clear_cache()` in the fixture (already there)
- Transport injection via monkeypatch works fine without instance reset
- The `@pytest.mark.no_mock_provider` ensures conftest auto-mocking doesn't interfere
## Recommendations
### Immediate Action
Remove all 6 registry operations from test_o3_pro_output_text_fix.py:
- Remove `setup_method` entirely
- Remove `teardown_method` entirely
- The fixture already handles everything needed
### Code to Remove
```python
def setup_method(self):
"""Set up clean registry for transport injection."""
# DELETE ALL OF THIS
ModelProviderRegistry._instance = None
ModelProviderRegistry.clear_cache()
ModelProviderRegistry.register_provider(ProviderType.OPENAI, OpenAIModelProvider)
def teardown_method(self):
"""Reset registry to prevent test pollution."""
# DELETE ALL OF THIS
ModelProviderRegistry._instance = None
ModelProviderRegistry.clear_cache()
```
### Long-term Improvements
1. **Document the pattern** - Add comments explaining that transport injection only needs cache clearing
2. **Update other tests** - Many tests likely have unnecessary registry operations
3. **Consider fixture improvements** - Create a `clean_registry_cache` fixture for tests that need it
## Technical Analysis
### Why Cache Clear is Sufficient
- The registry singleton pattern uses `_providers` and `_initialized_providers` caches
- Clearing these caches forces re-initialization of providers
- Transport injection happens during provider initialization
- No need to reset the singleton instance itself
### Why Instance Reset is Harmful
- Resetting `_instance = None` clears ALL provider registrations
- Test 3 proved this - the registry becomes empty
- Requires re-registering all providers (unnecessary complexity)
### Fixture Design
The `allow_all_models` fixture is well-designed:
- Clears model restrictions (for testing all models)
- Clears registry cache (for clean provider state)
- Sets dummy API key (for transport replay)
- Cleans up after itself
## Summary
The 6 registry operations in TestO3ProOutputTextFix are **completely unnecessary**. The test works perfectly with just the `allow_all_models` fixture. This is a clear case of over-engineering and cargo-cult programming - copying patterns without understanding their necessity.
The systematic bisection proved that simpler is better. The fixture provides all needed isolation, and the extra registry manipulations just add complexity and confusion.
## Implementation Complete
✅ Successfully removed all 6 unnecessary registry operations from test_o3_pro_output_text_fix.py
✅ Test passes in isolation and with full test suite
✅ Code quality checks pass 100%
✅ O3-pro validated the findings and approved the simplification
The test is now 22 lines shorter and much clearer without the unnecessary setup/teardown methods.

View File

@@ -16,6 +16,7 @@ Key Features:
import base64 import base64
import hashlib import hashlib
import json import json
import logging
from pathlib import Path from pathlib import Path
from typing import Any, Optional from typing import Any, Optional
@@ -23,6 +24,8 @@ import httpx
from .pii_sanitizer import PIISanitizer from .pii_sanitizer import PIISanitizer
logger = logging.getLogger(__name__)
class RecordingTransport(httpx.HTTPTransport): class RecordingTransport(httpx.HTTPTransport):
"""Transport that wraps default httpx transport and records all interactions.""" """Transport that wraps default httpx transport and records all interactions."""
@@ -36,7 +39,7 @@ class RecordingTransport(httpx.HTTPTransport):
def handle_request(self, request: httpx.Request) -> httpx.Response: def handle_request(self, request: httpx.Request) -> httpx.Response:
"""Handle request by recording interaction and delegating to real transport.""" """Handle request by recording interaction and delegating to real transport."""
print(f"🎬 RecordingTransport: Making request to {request.method} {request.url}") logger.debug(f"🎬 RecordingTransport: Making request to {request.method} {request.url}")
# Record request BEFORE making the call # Record request BEFORE making the call
request_data = self._serialize_request(request) request_data = self._serialize_request(request)
@@ -44,7 +47,7 @@ class RecordingTransport(httpx.HTTPTransport):
# Make real HTTP call using parent transport # Make real HTTP call using parent transport
response = super().handle_request(request) response = super().handle_request(request)
print(f"🎬 RecordingTransport: Got response {response.status_code}") logger.debug(f"🎬 RecordingTransport: Got response {response.status_code}")
# Post-response content capture (proper approach) # Post-response content capture (proper approach)
if self.capture_content: if self.capture_content:
@@ -53,7 +56,7 @@ class RecordingTransport(httpx.HTTPTransport):
# Note: httpx automatically handles gzip decompression # Note: httpx automatically handles gzip decompression
content_bytes = response.read() content_bytes = response.read()
response.close() # Close the original stream response.close() # Close the original stream
print(f"🎬 RecordingTransport: Captured {len(content_bytes)} bytes of decompressed content") logger.debug(f"🎬 RecordingTransport: Captured {len(content_bytes)} bytes of decompressed content")
# Serialize response with captured content # Serialize response with captured content
response_data = self._serialize_response_with_content(response, content_bytes) response_data = self._serialize_response_with_content(response, content_bytes)
@@ -64,9 +67,9 @@ class RecordingTransport(httpx.HTTPTransport):
if response.headers.get("content-encoding") == "gzip": if response.headers.get("content-encoding") == "gzip":
import gzip import gzip
print(f"🗜️ Re-compressing {len(content_bytes)} bytes with gzip...") logger.debug(f"🗜️ Re-compressing {len(content_bytes)} bytes with gzip...")
response_content = gzip.compress(content_bytes) response_content = gzip.compress(content_bytes)
print(f"🗜️ Compressed to {len(response_content)} bytes") logger.debug(f"🗜️ Compressed to {len(response_content)} bytes")
new_response = httpx.Response( new_response = httpx.Response(
status_code=response.status_code, status_code=response.status_code,
@@ -83,10 +86,10 @@ class RecordingTransport(httpx.HTTPTransport):
return new_response return new_response
except Exception as e: except Exception as e:
print(f"⚠️ Content capture failed: {e}, falling back to stub") logger.warning(f"⚠️ Content capture failed: {e}, falling back to stub")
import traceback import traceback
print(f"⚠️ Full exception traceback:\n{traceback.format_exc()}") logger.warning(f"⚠️ Full exception traceback:\n{traceback.format_exc()}")
response_data = self._serialize_response(response) response_data = self._serialize_response(response)
self._record_interaction(request_data, response_data) self._record_interaction(request_data, response_data)
return response return response
@@ -101,7 +104,7 @@ class RecordingTransport(httpx.HTTPTransport):
interaction = {"request": request_data, "response": response_data} interaction = {"request": request_data, "response": response_data}
self.recorded_interactions.append(interaction) self.recorded_interactions.append(interaction)
self._save_cassette() self._save_cassette()
print(f"🎬 RecordingTransport: Saved cassette to {self.cassette_path}") logger.debug(f"🎬 RecordingTransport: Saved cassette to {self.cassette_path}")
def _serialize_request(self, request: httpx.Request) -> dict[str, Any]: def _serialize_request(self, request: httpx.Request) -> dict[str, Any]:
"""Serialize httpx.Request to JSON-compatible format.""" """Serialize httpx.Request to JSON-compatible format."""
@@ -147,21 +150,21 @@ class RecordingTransport(httpx.HTTPTransport):
"""Serialize httpx.Response with captured content.""" """Serialize httpx.Response with captured content."""
try: try:
# Debug: check what we got # Debug: check what we got
print(f"🔍 Content type: {type(content_bytes)}, size: {len(content_bytes)}") logger.debug(f"🔍 Content type: {type(content_bytes)}, size: {len(content_bytes)}")
print(f"🔍 First 100 chars: {content_bytes[:100]}") logger.debug(f"🔍 First 100 chars: {content_bytes[:100]}")
# Ensure we have bytes for base64 encoding # Ensure we have bytes for base64 encoding
if not isinstance(content_bytes, bytes): if not isinstance(content_bytes, bytes):
print(f"⚠️ Content is not bytes, converting from {type(content_bytes)}") logger.warning(f"⚠️ Content is not bytes, converting from {type(content_bytes)}")
if isinstance(content_bytes, str): if isinstance(content_bytes, str):
content_bytes = content_bytes.encode("utf-8") content_bytes = content_bytes.encode("utf-8")
else: else:
content_bytes = str(content_bytes).encode("utf-8") content_bytes = str(content_bytes).encode("utf-8")
# Encode content as base64 for JSON storage # Encode content as base64 for JSON storage
print(f"🔍 Base64 encoding {len(content_bytes)} bytes...") logger.debug(f"🔍 Base64 encoding {len(content_bytes)} bytes...")
content_b64 = base64.b64encode(content_bytes).decode("utf-8") content_b64 = base64.b64encode(content_bytes).decode("utf-8")
print(f"✅ Base64 encoded successfully, result length: {len(content_b64)}") logger.debug(f"✅ Base64 encoded successfully, result length: {len(content_b64)}")
response_data = { response_data = {
"status_code": response.status_code, "status_code": response.status_code,
@@ -176,10 +179,10 @@ class RecordingTransport(httpx.HTTPTransport):
return response_data return response_data
except Exception as e: except Exception as e:
print(f"🔍 Error in _serialize_response_with_content: {e}") logger.error(f"🔍 Error in _serialize_response_with_content: {e}")
import traceback import traceback
print(f"🔍 Full traceback: {traceback.format_exc()}") logger.error(f"🔍 Full traceback: {traceback.format_exc()}")
# Fall back to minimal info # Fall back to minimal info
return { return {
"status_code": response.status_code, "status_code": response.status_code,
@@ -231,11 +234,11 @@ class ReplayTransport(httpx.MockTransport):
def _handle_request(self, request: httpx.Request) -> httpx.Response: def _handle_request(self, request: httpx.Request) -> httpx.Response:
"""Handle request by finding matching interaction and returning saved response.""" """Handle request by finding matching interaction and returning saved response."""
print(f"🔍 ReplayTransport: Looking for {request.method} {request.url}") logger.debug(f"🔍 ReplayTransport: Looking for {request.method} {request.url}")
# Debug: show what we're trying to match # Debug: show what we're trying to match
request_signature = self._get_request_signature(request) request_signature = self._get_request_signature(request)
print(f"🔍 Request signature: {request_signature}") logger.debug(f"🔍 Request signature: {request_signature}")
# Debug: show actual request content # Debug: show actual request content
content = request.content content = request.content
@@ -245,22 +248,22 @@ class ReplayTransport(httpx.MockTransport):
content_str = content.decode("utf-8", errors="ignore") content_str = content.decode("utf-8", errors="ignore")
else: else:
content_str = str(content) if content else "" content_str = str(content) if content else ""
print(f"🔍 Actual request content: {content_str}") logger.debug(f"🔍 Actual request content: {content_str}")
# Debug: show available signatures # Debug: show available signatures
for i, interaction in enumerate(self.interactions): for i, interaction in enumerate(self.interactions):
saved_signature = self._get_saved_request_signature(interaction["request"]) saved_signature = self._get_saved_request_signature(interaction["request"])
saved_content = interaction["request"].get("content", {}) saved_content = interaction["request"].get("content", {})
print(f"🔍 Available signature {i}: {saved_signature}") logger.debug(f"🔍 Available signature {i}: {saved_signature}")
print(f"🔍 Saved content {i}: {saved_content}") logger.debug(f"🔍 Saved content {i}: {saved_content}")
# Find matching interaction # Find matching interaction
interaction = self._find_matching_interaction(request) interaction = self._find_matching_interaction(request)
if not interaction: if not interaction:
print("🚨 MYSTERY SOLVED: No matching interaction found! This should fail...") logger.warning("🚨 MYSTERY SOLVED: No matching interaction found! This should fail...")
raise ValueError(f"No matching interaction found for {request.method} {request.url}") raise ValueError(f"No matching interaction found for {request.method} {request.url}")
print("✅ Found matching interaction from cassette!") logger.debug("✅ Found matching interaction from cassette!")
# Build response from saved data # Build response from saved data
response_data = interaction["response"] response_data = interaction["response"]
@@ -273,9 +276,9 @@ class ReplayTransport(httpx.MockTransport):
# Decode base64 content # Decode base64 content
try: try:
content_bytes = base64.b64decode(content["data"]) content_bytes = base64.b64decode(content["data"])
print(f"🎬 ReplayTransport: Decoded {len(content_bytes)} bytes from base64") logger.debug(f"🎬 ReplayTransport: Decoded {len(content_bytes)} bytes from base64")
except Exception as e: except Exception as e:
print(f"⚠️ Failed to decode base64 content: {e}") logger.warning(f"⚠️ Failed to decode base64 content: {e}")
content_bytes = json.dumps(content).encode("utf-8") content_bytes = json.dumps(content).encode("utf-8")
else: else:
# Legacy format or stub content # Legacy format or stub content
@@ -289,11 +292,11 @@ class ReplayTransport(httpx.MockTransport):
# Re-compress the content for httpx # Re-compress the content for httpx
import gzip import gzip
print(f"🗜️ ReplayTransport: Re-compressing {len(content_bytes)} bytes with gzip...") logger.debug(f"🗜️ ReplayTransport: Re-compressing {len(content_bytes)} bytes with gzip...")
content_bytes = gzip.compress(content_bytes) content_bytes = gzip.compress(content_bytes)
print(f"🗜️ ReplayTransport: Compressed to {len(content_bytes)} bytes") logger.debug(f"🗜️ ReplayTransport: Compressed to {len(content_bytes)} bytes")
print(f"🎬 ReplayTransport: Returning cassette response with content: {content_bytes[:100]}...") logger.debug(f"🎬 ReplayTransport: Returning cassette response with content: {content_bytes[:100]}...")
# Create httpx.Response # Create httpx.Response
return httpx.Response( return httpx.Response(

File diff suppressed because one or more lines are too long

View File

@@ -112,7 +112,7 @@ class PIISanitizer:
), ),
PIIPattern.create( PIIPattern.create(
name="phone_number", name="phone_number",
pattern=r"(?:\+\d{1,3}[\s\-]?)?\(?\d{3}\)?[\s\-]?\d{3}[\s\-]?\d{4}", pattern=r"(?:\+\d{1,3}[\s\-]?)?\(?\d{3}\)?[\s\-]?\d{3}[\s\-]?\d{4}\b(?![\d\.\,\]\}])",
replacement="(XXX) XXX-XXXX", replacement="(XXX) XXX-XXXX",
description="Phone numbers (all formats)", description="Phone numbers (all formats)",
), ),

View File

@@ -694,6 +694,6 @@ class TestAutoModeWithRestrictions:
registry._initialized_providers.clear() registry._initialized_providers.clear()
registry._providers.update(original_providers) registry._providers.update(original_providers)
registry._initialized_providers.update(original_initialized) registry._initialized_providers.update(original_initialized)
# Clear the restriction service to prevent state leakage # Clear the restriction service to prevent state leakage
utils.model_restrictions._restriction_service = None utils.model_restrictions._restriction_service = None

View File

@@ -10,6 +10,7 @@ the OpenAI SDK to create real response objects that we can test.
RECORDING: To record new responses, delete the cassette file and run with real API keys. RECORDING: To record new responses, delete the cassette file and run with real API keys.
""" """
import logging
import os import os
import unittest import unittest
from pathlib import Path from pathlib import Path
@@ -22,6 +23,8 @@ from providers import ModelProviderRegistry
from tests.transport_helpers import inject_transport from tests.transport_helpers import inject_transport
from tools.chat import ChatTool from tools.chat import ChatTool
logger = logging.getLogger(__name__)
# Load environment variables from .env file # Load environment variables from .env file
load_dotenv() load_dotenv()
@@ -46,17 +49,27 @@ class TestO3ProOutputTextFix:
ModelProviderRegistry.reset_for_testing() ModelProviderRegistry.reset_for_testing()
@pytest.mark.no_mock_provider # Disable provider mocking for this test @pytest.mark.no_mock_provider # Disable provider mocking for this test
@patch.dict(os.environ, {"OPENAI_ALLOWED_MODELS": "o3-pro,o3-pro-2025-06-10"}) @patch.dict(os.environ, {"OPENAI_ALLOWED_MODELS": "o3-pro,o3-pro-2025-06-10", "LOCALE": ""})
async def test_o3_pro_uses_output_text_field(self, monkeypatch): async def test_o3_pro_uses_output_text_field(self, monkeypatch):
"""Test that o3-pro parsing uses the output_text convenience field via ChatTool.""" """Test that o3-pro parsing uses the output_text convenience field via ChatTool."""
# Set API key inline - helper will handle provider registration
monkeypatch.setenv("OPENAI_API_KEY", "dummy-key-for-replay")
cassette_path = cassette_dir / "o3_pro_basic_math.json" cassette_path = cassette_dir / "o3_pro_basic_math.json"
# Require cassette for test - no cargo culting # Check if we need to record or replay
if not cassette_path.exists(): if not cassette_path.exists():
pytest.skip("Cassette file required - record with real OPENAI_API_KEY") # Recording mode - check for real API key
real_api_key = os.getenv("OPENAI_API_KEY", "").strip()
if not real_api_key or real_api_key.startswith("dummy"):
pytest.fail(
f"Cassette file not found at {cassette_path}. "
"To record: Set OPENAI_API_KEY environment variable to a valid key and run this test. "
"Note: Recording will make a real API call to OpenAI."
)
# Real API key is available, we'll record the cassette
logger.debug("🎬 Recording mode: Using real API key to record cassette")
else:
# Replay mode - use dummy key
monkeypatch.setenv("OPENAI_API_KEY", "dummy-key-for-replay")
logger.debug("📼 Replay mode: Using recorded cassette")
# Simplified transport injection - just one line! # Simplified transport injection - just one line!
inject_transport(monkeypatch, cassette_path) inject_transport(monkeypatch, cassette_path)
@@ -90,7 +103,12 @@ class TestO3ProOutputTextFix:
response_data = json.loads(result[0].text) response_data = json.loads(result[0].text)
# Debug log the response
logger.debug(f"Response data: {json.dumps(response_data, indent=2)}")
# Verify response structure - no cargo culting # Verify response structure - no cargo culting
if response_data["status"] == "error":
pytest.fail(f"Chat tool returned error: {response_data.get('error', 'Unknown error')}")
assert response_data["status"] in ["success", "continuation_available"] assert response_data["status"] in ["success", "continuation_available"]
assert "4" in response_data["content"] assert "4" in response_data["content"]
@@ -101,11 +119,11 @@ class TestO3ProOutputTextFix:
if __name__ == "__main__": if __name__ == "__main__":
print("🎥 OpenAI Response Recording Tests for O3-Pro Output Text Fix") logging.basicConfig(level=logging.INFO)
print("=" * 50) logger.info("🎥 OpenAI Response Recording Tests for O3-Pro Output Text Fix")
print("RECORD MODE: Requires OPENAI_API_KEY - makes real API calls through ChatTool") logger.info("=" * 50)
print("REPLAY MODE: Uses recorded HTTP responses - free and fast") logger.info("RECORD MODE: Requires OPENAI_API_KEY - makes real API calls through ChatTool")
print("RECORDING: Delete .json files in tests/openai_cassettes/ to re-record") logger.info("REPLAY MODE: Uses recorded HTTP responses - free and fast")
print() logger.info("RECORDING: Delete .json files in tests/openai_cassettes/ to re-record")
unittest.main() unittest.main()