fix: Resolve o3-pro test isolation issues and convert print to logging

- Fix test isolation by clearing LOCALE env var in o3-pro test - Add restriction service cleanup in test_model_restrictions.py - Fix PII sanitizer phone regex to not match timestamps - Convert all print statements to logging in test files per PR review - Re-record o3-pro cassette with correct environment 🤖 Generated with [Claude Code](https://claude.ai/code) Co-Authored-By: Claude <noreply@anthropic.com>
2025-07-13 10:41:43 -06:00
parent 3b1c80865b
commit 9248947e39
6 changed files with 152 additions and 135 deletions
--- a/debug_findings_registry_bisect.md
+++ b/debug_findings_registry_bisect.md
@@ -1,94 +0,0 @@
 # Registry Bisection Debug Findings
 ## Final Conclusions
 Through systematic bisection testing, I've discovered that **NONE of the 6 registry operations in TestO3ProOutputTextFix are actually necessary**.
 ## Key Findings
 ### Bisection Results
 1. **Test 1 (no operations)** - ✅ PASSED with full test suite
 2. **Test 2 (cache clear only)** - ✅ PASSED with full test suite  
 3. **Test 3 (instance reset only)** - ❌ FAILED - clears all provider registrations
 4. **Test 4 (both ops + re-register)** - ✅ PASSED with full test suite
 5. **Original test without setup/teardown** - ✅ PASSED with full test suite
 ### Critical Discovery
 The `allow_all_models` fixture alone is sufficient! It:
 - Clears the model restrictions singleton
 - Clears the registry cache (which is all that's needed)
 - Sets up the dummy API key for transport replay
 ### Why the Original Has 6 Operations
 1. **Historical reasons** - Likely copied from other tests or added defensively
 2. **Misunderstanding** - The comment says "Registry reset in setup/teardown is required to ensure fresh provider instance for transport injection" but this is FALSE
 3. **Over-engineering** - The singleton reset is unnecessary and actually harmful (Test 3 proved this)
 ### The Real Requirements
 - Only need `ModelProviderRegistry.clear_cache()` in the fixture (already there)
 - Transport injection via monkeypatch works fine without instance reset
 - The `@pytest.mark.no_mock_provider` ensures conftest auto-mocking doesn't interfere
 ## Recommendations
 ### Immediate Action
 Remove all 6 registry operations from test_o3_pro_output_text_fix.py:
 - Remove `setup_method` entirely
 - Remove `teardown_method` entirely  
 - The fixture already handles everything needed
 ### Code to Remove
 ```python
 def setup_method(self):
    """Set up clean registry for transport injection."""
    # DELETE ALL OF THIS
    ModelProviderRegistry._instance = None
    ModelProviderRegistry.clear_cache()
    ModelProviderRegistry.register_provider(ProviderType.OPENAI, OpenAIModelProvider)
 def teardown_method(self):
    """Reset registry to prevent test pollution."""
    # DELETE ALL OF THIS
    ModelProviderRegistry._instance = None
    ModelProviderRegistry.clear_cache()
 ```
 ### Long-term Improvements
 1. **Document the pattern** - Add comments explaining that transport injection only needs cache clearing
 2. **Update other tests** - Many tests likely have unnecessary registry operations
 3. **Consider fixture improvements** - Create a `clean_registry_cache` fixture for tests that need it
 ## Technical Analysis
 ### Why Cache Clear is Sufficient
 - The registry singleton pattern uses `_providers` and `_initialized_providers` caches
 - Clearing these caches forces re-initialization of providers
 - Transport injection happens during provider initialization
 - No need to reset the singleton instance itself
 ### Why Instance Reset is Harmful  
 - Resetting `_instance = None` clears ALL provider registrations
 - Test 3 proved this - the registry becomes empty
 - Requires re-registering all providers (unnecessary complexity)
 ### Fixture Design
 The `allow_all_models` fixture is well-designed:
 - Clears model restrictions (for testing all models)
 - Clears registry cache (for clean provider state)
 - Sets dummy API key (for transport replay)
 - Cleans up after itself
 ## Summary
 The 6 registry operations in TestO3ProOutputTextFix are **completely unnecessary**. The test works perfectly with just the `allow_all_models` fixture. This is a clear case of over-engineering and cargo-cult programming - copying patterns without understanding their necessity.
 The systematic bisection proved that simpler is better. The fixture provides all needed isolation, and the extra registry manipulations just add complexity and confusion.
 ## Implementation Complete
 ✅ Successfully removed all 6 unnecessary registry operations from test_o3_pro_output_text_fix.py
 ✅ Test passes in isolation and with full test suite
 ✅ Code quality checks pass 100%
 ✅ O3-pro validated the findings and approved the simplification
 The test is now 22 lines shorter and much clearer without the unnecessary setup/teardown methods.
--- a/tests/http_transport_recorder.py
+++ b/tests/http_transport_recorder.py
@@ -16,6 +16,7 @@ Key Features:
 import base64
 import hashlib
 import json
 import logging
 from pathlib import Path
 from typing import Any, Optional
@@ -23,6 +24,8 @@ import httpx
 from .pii_sanitizer import PIISanitizer
 logger = logging.getLogger(__name__)
 class RecordingTransport(httpx.HTTPTransport):
    """Transport that wraps default httpx transport and records all interactions."""
@@ -36,7 +39,7 @@ class RecordingTransport(httpx.HTTPTransport):
    def handle_request(self, request: httpx.Request) -> httpx.Response:
        """Handle request by recording interaction and delegating to real transport."""
-        print(f"🎬 RecordingTransport: Making request to {request.method} {request.url}")
+        logger.debug(f"🎬 RecordingTransport: Making request to {request.method} {request.url}")
        # Record request BEFORE making the call
        request_data = self._serialize_request(request)
@@ -44,7 +47,7 @@ class RecordingTransport(httpx.HTTPTransport):
        # Make real HTTP call using parent transport
        response = super().handle_request(request)
-        print(f"🎬 RecordingTransport: Got response {response.status_code}")
+        logger.debug(f"🎬 RecordingTransport: Got response {response.status_code}")
        # Post-response content capture (proper approach)
        if self.capture_content:
@@ -53,7 +56,7 @@ class RecordingTransport(httpx.HTTPTransport):
                # Note: httpx automatically handles gzip decompression
                content_bytes = response.read()
                response.close()  # Close the original stream
-                print(f"🎬 RecordingTransport: Captured {len(content_bytes)} bytes of decompressed content")
+                logger.debug(f"🎬 RecordingTransport: Captured {len(content_bytes)} bytes of decompressed content")
                # Serialize response with captured content
                response_data = self._serialize_response_with_content(response, content_bytes)
@@ -64,9 +67,9 @@ class RecordingTransport(httpx.HTTPTransport):
                if response.headers.get("content-encoding") == "gzip":
                    import gzip
-                    print(f"🗜️ Re-compressing {len(content_bytes)} bytes with gzip...")
+                    logger.debug(f"🗜️ Re-compressing {len(content_bytes)} bytes with gzip...")
                    response_content = gzip.compress(content_bytes)
-                    print(f"🗜️ Compressed to {len(response_content)} bytes")
+                    logger.debug(f"🗜️ Compressed to {len(response_content)} bytes")
                new_response = httpx.Response(
                    status_code=response.status_code,
@@ -83,10 +86,10 @@ class RecordingTransport(httpx.HTTPTransport):
                return new_response
            except Exception as e:
-                print(f"⚠️ Content capture failed: {e}, falling back to stub")
+                logger.warning(f"⚠️ Content capture failed: {e}, falling back to stub")
                import traceback
-                print(f"⚠️ Full exception traceback:\n{traceback.format_exc()}")
+                logger.warning(f"⚠️ Full exception traceback:\n{traceback.format_exc()}")
                response_data = self._serialize_response(response)
                self._record_interaction(request_data, response_data)
                return response
@@ -101,7 +104,7 @@ class RecordingTransport(httpx.HTTPTransport):
        interaction = {"request": request_data, "response": response_data}
        self.recorded_interactions.append(interaction)
        self._save_cassette()
-        print(f"🎬 RecordingTransport: Saved cassette to {self.cassette_path}")
+        logger.debug(f"🎬 RecordingTransport: Saved cassette to {self.cassette_path}")
    def _serialize_request(self, request: httpx.Request) -> dict[str, Any]:
        """Serialize httpx.Request to JSON-compatible format."""
@@ -147,21 +150,21 @@ class RecordingTransport(httpx.HTTPTransport):
        """Serialize httpx.Response with captured content."""
        try:
            # Debug: check what we got
-            print(f"🔍 Content type: {type(content_bytes)}, size: {len(content_bytes)}")
+            logger.debug(f"🔍 Content type: {type(content_bytes)}, size: {len(content_bytes)}")
-            print(f"🔍 First 100 chars: {content_bytes[:100]}")
+            logger.debug(f"🔍 First 100 chars: {content_bytes[:100]}")
            # Ensure we have bytes for base64 encoding
            if not isinstance(content_bytes, bytes):
-                print(f"⚠️ Content is not bytes, converting from {type(content_bytes)}")
+                logger.warning(f"⚠️ Content is not bytes, converting from {type(content_bytes)}")
                if isinstance(content_bytes, str):
                    content_bytes = content_bytes.encode("utf-8")
                else:
                    content_bytes = str(content_bytes).encode("utf-8")
            # Encode content as base64 for JSON storage
-            print(f"🔍 Base64 encoding {len(content_bytes)} bytes...")
+            logger.debug(f"🔍 Base64 encoding {len(content_bytes)} bytes...")
            content_b64 = base64.b64encode(content_bytes).decode("utf-8")
-            print(f"✅ Base64 encoded successfully, result length: {len(content_b64)}")
+            logger.debug(f"✅ Base64 encoded successfully, result length: {len(content_b64)}")
            response_data = {
                "status_code": response.status_code,
@@ -176,10 +179,10 @@ class RecordingTransport(httpx.HTTPTransport):
            return response_data
        except Exception as e:
-            print(f"🔍 Error in _serialize_response_with_content: {e}")
+            logger.error(f"🔍 Error in _serialize_response_with_content: {e}")
            import traceback
-            print(f"🔍 Full traceback: {traceback.format_exc()}")
+            logger.error(f"🔍 Full traceback: {traceback.format_exc()}")
            # Fall back to minimal info
            return {
                "status_code": response.status_code,
@@ -231,11 +234,11 @@ class ReplayTransport(httpx.MockTransport):
    def _handle_request(self, request: httpx.Request) -> httpx.Response:
        """Handle request by finding matching interaction and returning saved response."""
-        print(f"🔍 ReplayTransport: Looking for {request.method} {request.url}")
+        logger.debug(f"🔍 ReplayTransport: Looking for {request.method} {request.url}")
        # Debug: show what we're trying to match
        request_signature = self._get_request_signature(request)
-        print(f"🔍 Request signature: {request_signature}")
+        logger.debug(f"🔍 Request signature: {request_signature}")
        # Debug: show actual request content
        content = request.content
@@ -245,22 +248,22 @@ class ReplayTransport(httpx.MockTransport):
            content_str = content.decode("utf-8", errors="ignore")
        else:
            content_str = str(content) if content else ""
-        print(f"🔍 Actual request content: {content_str}")
+        logger.debug(f"🔍 Actual request content: {content_str}")
        # Debug: show available signatures
        for i, interaction in enumerate(self.interactions):
            saved_signature = self._get_saved_request_signature(interaction["request"])
            saved_content = interaction["request"].get("content", {})
-            print(f"🔍 Available signature {i}: {saved_signature}")
+            logger.debug(f"🔍 Available signature {i}: {saved_signature}")
-            print(f"🔍 Saved content {i}: {saved_content}")
+            logger.debug(f"🔍 Saved content {i}: {saved_content}")
        # Find matching interaction
        interaction = self._find_matching_interaction(request)
        if not interaction:
-            print("🚨 MYSTERY SOLVED: No matching interaction found! This should fail...")
+            logger.warning("🚨 MYSTERY SOLVED: No matching interaction found! This should fail...")
            raise ValueError(f"No matching interaction found for {request.method} {request.url}")
-        print("✅ Found matching interaction from cassette!")
+        logger.debug("✅ Found matching interaction from cassette!")
        # Build response from saved data
        response_data = interaction["response"]
@@ -273,9 +276,9 @@ class ReplayTransport(httpx.MockTransport):
                # Decode base64 content
                try:
                    content_bytes = base64.b64decode(content["data"])
-                    print(f"🎬 ReplayTransport: Decoded {len(content_bytes)} bytes from base64")
+                    logger.debug(f"🎬 ReplayTransport: Decoded {len(content_bytes)} bytes from base64")
                except Exception as e:
-                    print(f"⚠️ Failed to decode base64 content: {e}")
+                    logger.warning(f"⚠️ Failed to decode base64 content: {e}")
                    content_bytes = json.dumps(content).encode("utf-8")
            else:
                # Legacy format or stub content
@@ -289,11 +292,11 @@ class ReplayTransport(httpx.MockTransport):
            # Re-compress the content for httpx
            import gzip
-            print(f"🗜️ ReplayTransport: Re-compressing {len(content_bytes)} bytes with gzip...")
+            logger.debug(f"🗜️ ReplayTransport: Re-compressing {len(content_bytes)} bytes with gzip...")
            content_bytes = gzip.compress(content_bytes)
-            print(f"🗜️ ReplayTransport: Compressed to {len(content_bytes)} bytes")
+            logger.debug(f"🗜️ ReplayTransport: Compressed to {len(content_bytes)} bytes")
-        print(f"🎬 ReplayTransport: Returning cassette response with content: {content_bytes[:100]}...")
+        logger.debug(f"🎬 ReplayTransport: Returning cassette response with content: {content_bytes[:100]}...")
        # Create httpx.Response
        return httpx.Response(
--- a/tests/openai_cassettes/o3_pro_basic_math.json
+++ b/tests/openai_cassettes/o3_pro_basic_math.json
--- a/tests/pii_sanitizer.py
+++ b/tests/pii_sanitizer.py
@@ -112,7 +112,7 @@ class PIISanitizer:
            ),
            PIIPattern.create(
                name="phone_number",
-                pattern=r"(?:\+\d{1,3}[\s\-]?)?\(?\d{3}\)?[\s\-]?\d{3}[\s\-]?\d{4}",
+                pattern=r"(?:\+\d{1,3}[\s\-]?)?\(?\d{3}\)?[\s\-]?\d{3}[\s\-]?\d{4}\b(?![\d\.\,\]\}])",
                replacement="(XXX) XXX-XXXX",
                description="Phone numbers (all formats)",
            ),
--- a/tests/test_model_restrictions.py
+++ b/tests/test_model_restrictions.py
@@ -694,6 +694,6 @@ class TestAutoModeWithRestrictions:
            registry._initialized_providers.clear()
            registry._providers.update(original_providers)
            registry._initialized_providers.update(original_initialized)
-            
+
            # Clear the restriction service to prevent state leakage
            utils.model_restrictions._restriction_service = None
--- a/tests/test_o3_pro_output_text_fix.py
+++ b/tests/test_o3_pro_output_text_fix.py
@@ -10,6 +10,7 @@ the OpenAI SDK to create real response objects that we can test.
 RECORDING: To record new responses, delete the cassette file and run with real API keys.
 """
 import logging
 import os
 import unittest
 from pathlib import Path
@@ -22,6 +23,8 @@ from providers import ModelProviderRegistry
 from tests.transport_helpers import inject_transport
 from tools.chat import ChatTool
 logger = logging.getLogger(__name__)
 # Load environment variables from .env file
 load_dotenv()
@@ -46,17 +49,27 @@ class TestO3ProOutputTextFix:
        ModelProviderRegistry.reset_for_testing()
    @pytest.mark.no_mock_provider  # Disable provider mocking for this test
-    @patch.dict(os.environ, {"OPENAI_ALLOWED_MODELS": "o3-pro,o3-pro-2025-06-10"})
+    @patch.dict(os.environ, {"OPENAI_ALLOWED_MODELS": "o3-pro,o3-pro-2025-06-10", "LOCALE": ""})
    async def test_o3_pro_uses_output_text_field(self, monkeypatch):
        """Test that o3-pro parsing uses the output_text convenience field via ChatTool."""
        # Set API key inline - helper will handle provider registration
        monkeypatch.setenv("OPENAI_API_KEY", "dummy-key-for-replay")
        cassette_path = cassette_dir / "o3_pro_basic_math.json"
-        # Require cassette for test - no cargo culting
+        # Check if we need to record or replay
        if not cassette_path.exists():
-            pytest.skip("Cassette file required - record with real OPENAI_API_KEY")
+            # Recording mode - check for real API key
            real_api_key = os.getenv("OPENAI_API_KEY", "").strip()
            if not real_api_key or real_api_key.startswith("dummy"):
                pytest.fail(
                    f"Cassette file not found at {cassette_path}. "
                    "To record: Set OPENAI_API_KEY environment variable to a valid key and run this test. "
                    "Note: Recording will make a real API call to OpenAI."
                )
            # Real API key is available, we'll record the cassette
            logger.debug("🎬 Recording mode: Using real API key to record cassette")
        else:
            # Replay mode - use dummy key
            monkeypatch.setenv("OPENAI_API_KEY", "dummy-key-for-replay")
            logger.debug("📼 Replay mode: Using recorded cassette")
        # Simplified transport injection - just one line!
        inject_transport(monkeypatch, cassette_path)
@@ -90,7 +103,12 @@ class TestO3ProOutputTextFix:
        response_data = json.loads(result[0].text)
        # Debug log the response
        logger.debug(f"Response data: {json.dumps(response_data, indent=2)}")
        # Verify response structure - no cargo culting
        if response_data["status"] == "error":
            pytest.fail(f"Chat tool returned error: {response_data.get('error', 'Unknown error')}")
        assert response_data["status"] in ["success", "continuation_available"]
        assert "4" in response_data["content"]
@@ -101,11 +119,11 @@ class TestO3ProOutputTextFix:
 if __name__ == "__main__":
-    print("🎥 OpenAI Response Recording Tests for O3-Pro Output Text Fix")
+    logging.basicConfig(level=logging.INFO)
-    print("=" * 50)
+    logger.info("🎥 OpenAI Response Recording Tests for O3-Pro Output Text Fix")
-    print("RECORD MODE: Requires OPENAI_API_KEY - makes real API calls through ChatTool")
+    logger.info("=" * 50)
-    print("REPLAY MODE: Uses recorded HTTP responses - free and fast")
+    logger.info("RECORD MODE: Requires OPENAI_API_KEY - makes real API calls through ChatTool")
-    print("RECORDING: Delete .json files in tests/openai_cassettes/ to re-record")
+    logger.info("REPLAY MODE: Uses recorded HTTP responses - free and fast")
-    print()
+    logger.info("RECORDING: Delete .json files in tests/openai_cassettes/ to re-record")
    unittest.main()