Add DocGen tool with comprehensive documentation generation capabilities (#109)
* WIP: new workflow architecture
* WIP: further improvements and cleanup
* WIP: cleanup and docs, replace old tool with new
* WIP: new planner implementation using workflow
* WIP: precommit tool working as a workflow instead of a basic tool. Support for passing False to use_assistant_model to skip external models completely and use Claude only (see the sketch after this list)
* WIP: precommit workflow version swapped with old
* WIP: codereview replaced with workflow version
* WIP: refactor replaced with workflow version
* WIP: workflow for thinkdeep
* WIP: ensure files get embedded correctly
* WIP: thinkdeep replaced with workflow version
* WIP: improved messaging when an external model's response is received
* WIP: analyze tool swapped
* WIP: updated tests
* Extract only the content when building history
* Use "relevant_files" for workflow tools only
* WIP: fixed missing param in get_completion_next_steps_message
* Fixed tests; request files consistently
* New testgen workflow tool; updated docs
* Swap testgen workflow
* Fix CI test failures by excluding API-dependent tests:
  - Update GitHub Actions workflow to exclude simulation tests that require API keys
  - Fix collaboration tests to properly mock workflow tool expert analysis calls
  - Update test assertions to handle the new workflow tool response format
  - Ensure unit tests run without external API dependencies in CI
  🤖 Generated with [Claude Code](https://claude.ai/code)
  Co-Authored-By: Claude <noreply@anthropic.com>
* WIP: update tests to match new tools
* Should help with https://github.com/BeehiveInnovations/zen-mcp-server/issues/97; clear Python cache when running the script (https://github.com/BeehiveInnovations/zen-mcp-server/issues/96); improved retry error logging; cleanup
* WIP: chat tool using new architecture and improved code sharing
* Removed todo
* Cleanup old name
* Tweak wordings; migrate old tests
* Support for Flash 2.0 and Flash Lite 2.0; fixed test
* Improved consensus to use the workflow base class
* Allow images
* Replaced old consensus tool
* Cleanup tests
* Tests for prompt size
* New tool: docgen. Fixes https://github.com/BeehiveInnovations/zen-mcp-server/issues/107; use available token size limits (https://github.com/BeehiveInnovations/zen-mcp-server/issues/105)
* Improved docgen prompt; exclude TestGen from pytest inclusion
* Updated errors
* Lint
* DocGen instructed not to fix bugs but to surface them and stick to documentation
* WIP
* Stop Claude from being lazy and only documenting a small handful
* More style rules

---------

Co-authored-by: Claude <noreply@anthropic.com>
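One item above notes that workflow tools can take `use_assistant_model=False` to skip external models entirely and rely on Claude alone. The snippet below is a minimal, hypothetical sketch of what such a call could look like, reusing the `ChatTool().execute(...)` call shape that appears in this commit's tests; the exact placement and handling of the flag are assumptions, not confirmed API.

import asyncio

from tools.chat import ChatTool  # tool class exercised in this commit's tests


async def main():
    tool = ChatTool()
    # Hypothetical request: assumes "use_assistant_model" is accepted alongside the
    # other tool arguments and that False means "Claude only, no external model".
    result = await tool.execute(
        {
            "prompt": "Summarize the new workflow architecture",
            "use_assistant_model": False,
        }
    )
    print(result[0].text)


asyncio.run(main())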
Committed by GitHub · parent 0655590a51 · commit c960bcb720
@@ -1,163 +1,191 @@
"""
Regression tests to ensure normal prompt handling still works after large prompt changes.
Integration tests to ensure normal prompt handling works with real API calls.

This test module verifies that all tools continue to work correctly with
normal-sized prompts after implementing the large prompt handling feature.
normal-sized prompts using real integration testing instead of mocks.

INTEGRATION TESTS:
These tests are marked with @pytest.mark.integration and make real API calls.
They use the local-llama model which is FREE and runs locally via Ollama.

Prerequisites:
- Ollama installed and running locally
- CUSTOM_API_URL environment variable set to your Ollama endpoint (e.g., http://localhost:11434)
- local-llama model available through custom provider configuration
- No API keys required - completely FREE to run unlimited times!

Running Tests:
- All tests (including integration): pytest tests/test_prompt_regression.py
- Unit tests only: pytest tests/test_prompt_regression.py -m "not integration"
- Integration tests only: pytest tests/test_prompt_regression.py -m "integration"

Note: Integration tests skip gracefully if CUSTOM_API_URL is not set.
They are excluded from CI/CD but run by default locally when Ollama is configured.
"""
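The note above says these integration tests skip gracefully when CUSTOM_API_URL is unset and are excluded from CI/CD. A minimal conftest.py sketch of how that gating could be wired is shown below; the repository's actual pytest configuration is not part of this diff, so the file name and hook choices here are assumptions.

# conftest.py (illustrative sketch, not part of this diff)
import os

import pytest


def pytest_configure(config):
    # Register the marker so `-m integration` / `-m "not integration"` filtering works cleanly.
    config.addinivalue_line("markers", "integration: tests that call a real local model")


def pytest_collection_modifyitems(config, items):
    # When no local endpoint is configured (e.g. in CI), skip integration tests.
    if os.getenv("CUSTOM_API_URL"):
        return
    skip_marker = pytest.mark.skip(reason="CUSTOM_API_URL not set; skipping integration tests")
    for item in items:
        if "integration" in item.keywords:
            item.add_marker(skip_marker)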
import json
from unittest.mock import MagicMock, patch
import os
import tempfile

import pytest

# Load environment variables from .env file
from dotenv import load_dotenv

from tools.analyze import AnalyzeTool
from tools.chat import ChatTool
from tools.codereview import CodeReviewTool

# from tools.debug import DebugIssueTool  # Commented out - debug tool refactored
from tools.thinkdeep import ThinkDeepTool

load_dotenv()


class TestPromptRegression:
    """Regression test suite for normal prompt handling."""

    # Check if CUSTOM_API_URL is available for local-llama
    CUSTOM_API_AVAILABLE = os.getenv("CUSTOM_API_URL") is not None

    @pytest.fixture
    def mock_model_response(self):
        """Create a mock model response."""
        from unittest.mock import Mock

        def _create_response(text="Test response"):
            # Return a Mock that acts like ModelResponse
            return Mock(
                content=text,
                usage={"input_tokens": 10, "output_tokens": 20, "total_tokens": 30},
                model_name="gemini-2.5-flash",
                metadata={"finish_reason": "STOP"},
            )


def skip_if_no_custom_api():
    """Helper to skip integration tests if CUSTOM_API_URL is not available."""
    if not CUSTOM_API_AVAILABLE:
        pytest.skip(
            "CUSTOM_API_URL not set. To run integration tests with local-llama, ensure CUSTOM_API_URL is set in .env file (e.g., http://localhost:11434/v1)"
        )

        return _create_response
class TestPromptIntegration:
    """Integration test suite for normal prompt handling with real API calls."""

    @pytest.mark.integration
    @pytest.mark.asyncio
    async def test_chat_normal_prompt(self, mock_model_response):
        """Test chat tool with normal prompt."""
    async def test_chat_normal_prompt(self):
        """Test chat tool with normal prompt using real API."""
        skip_if_no_custom_api()

        tool = ChatTool()

        with patch.object(tool, "get_model_provider") as mock_get_provider:
            mock_provider = MagicMock()
            mock_provider.get_provider_type.return_value = MagicMock(value="google")
            mock_provider.supports_thinking_mode.return_value = False
            mock_provider.generate_content.return_value = mock_model_response(
                "This is a helpful response about Python."
            )
            mock_get_provider.return_value = mock_provider
        result = await tool.execute(
            {
                "prompt": "Explain Python decorators in one sentence",
                "model": "local-llama",  # Use available model for integration tests
            }
        )

            result = await tool.execute({"prompt": "Explain Python decorators"})
        assert len(result) == 1
        output = json.loads(result[0].text)
        assert output["status"] in ["success", "continuation_available"]
        assert "content" in output
        assert len(output["content"]) > 0

    @pytest.mark.integration
    @pytest.mark.asyncio
    async def test_chat_with_files(self):
        """Test chat tool with files parameter using real API."""
        skip_if_no_custom_api()

        tool = ChatTool()

        # Create a temporary Python file for testing
        with tempfile.NamedTemporaryFile(mode="w", suffix=".py", delete=False) as f:
            f.write(
                """
def hello_world():
    \"\"\"A simple hello world function.\"\"\"
    return "Hello, World!"

if __name__ == "__main__":
    print(hello_world())
"""
            )
            temp_file = f.name

        try:
            result = await tool.execute(
                {"prompt": "What does this Python code do?", "files": [temp_file], "model": "local-llama"}
            )

            assert len(result) == 1
            output = json.loads(result[0].text)
            assert output["status"] == "success"
            assert "helpful response about Python" in output["content"]

            # Verify provider was called
            mock_provider.generate_content.assert_called_once()
            assert output["status"] in ["success", "continuation_available"]
            assert "content" in output
            # Should mention the hello world function
            assert "hello" in output["content"].lower() or "function" in output["content"].lower()
        finally:
            # Clean up temp file
            os.unlink(temp_file)
    @pytest.mark.integration
    @pytest.mark.asyncio
    async def test_chat_with_files(self, mock_model_response):
        """Test chat tool with files parameter."""
        tool = ChatTool()
    async def test_thinkdeep_normal_analysis(self):
        """Test thinkdeep tool with normal analysis using real API."""
        skip_if_no_custom_api()

        with patch.object(tool, "get_model_provider") as mock_get_provider:
            mock_provider = MagicMock()
            mock_provider.get_provider_type.return_value = MagicMock(value="google")
            mock_provider.supports_thinking_mode.return_value = False
            mock_provider.generate_content.return_value = mock_model_response()
            mock_get_provider.return_value = mock_provider

            # Mock file reading through the centralized method
            with patch.object(tool, "_prepare_file_content_for_prompt") as mock_prepare_files:
                mock_prepare_files.return_value = ("File content here", ["/path/to/file.py"])

                result = await tool.execute({"prompt": "Analyze this code", "files": ["/path/to/file.py"]})

                assert len(result) == 1
                output = json.loads(result[0].text)
                assert output["status"] == "success"
                mock_prepare_files.assert_called_once_with(["/path/to/file.py"], None, "Context files")

    @pytest.mark.asyncio
    async def test_thinkdeep_normal_analysis(self, mock_model_response):
        """Test thinkdeep tool with normal analysis."""
        tool = ThinkDeepTool()

        with patch.object(tool, "get_model_provider") as mock_get_provider:
            mock_provider = MagicMock()
            mock_provider.get_provider_type.return_value = MagicMock(value="google")
            mock_provider.supports_thinking_mode.return_value = False
            mock_provider.generate_content.return_value = mock_model_response(
                "Here's a deeper analysis with edge cases..."
            )
            mock_get_provider.return_value = mock_provider
        result = await tool.execute(
            {
                "step": "I think we should use a cache for performance",
                "step_number": 1,
                "total_steps": 1,
                "next_step_required": False,
                "findings": "Building a high-traffic API - considering scalability and reliability",
                "problem_context": "Building a high-traffic API",
                "focus_areas": ["scalability", "reliability"],
                "model": "local-llama",
            }
        )

        assert len(result) == 1
        output = json.loads(result[0].text)
        # ThinkDeep workflow tool should process the analysis
        assert "status" in output
        assert output["status"] in ["calling_expert_analysis", "analysis_complete", "pause_for_investigation"]

    @pytest.mark.integration
    @pytest.mark.asyncio
    async def test_codereview_normal_review(self):
        """Test codereview tool with workflow inputs using real API."""
        skip_if_no_custom_api()

        tool = CodeReviewTool()

        # Create a temporary Python file for testing
        with tempfile.NamedTemporaryFile(mode="w", suffix=".py", delete=False) as f:
            f.write(
                """
def process_user_input(user_input):
    # Potentially unsafe code for demonstration
    query = f"SELECT * FROM users WHERE name = '{user_input}'"
    return query

def main():
    user_name = input("Enter name: ")
    result = process_user_input(user_name)
    print(result)
"""
            )
            temp_file = f.name

        try:
            result = await tool.execute(
                {
                    "step": "I think we should use a cache for performance",
                    "step": "Initial code review investigation - examining security vulnerabilities",
                    "step_number": 1,
                    "total_steps": 1,
                    "next_step_required": False,
                    "findings": "Building a high-traffic API - considering scalability and reliability",
                    "problem_context": "Building a high-traffic API",
                    "focus_areas": ["scalability", "reliability"],
                    "total_steps": 2,
                    "next_step_required": True,
                    "findings": "Found security issues in code",
                    "relevant_files": [temp_file],
                    "review_type": "security",
                    "focus_on": "Look for SQL injection vulnerabilities",
                    "model": "local-llama",
                }
            )

            assert len(result) == 1
            output = json.loads(result[0].text)
            # ThinkDeep workflow tool returns calling_expert_analysis status when complete
            assert output["status"] == "calling_expert_analysis"
            # Check that expert analysis was performed and contains expected content
            if "expert_analysis" in output:
                expert_analysis = output["expert_analysis"]
                analysis_content = str(expert_analysis)
                assert (
                    "Critical Evaluation Required" in analysis_content
                    or "deeper analysis" in analysis_content
                    or "cache" in analysis_content
                )

    @pytest.mark.asyncio
    async def test_codereview_normal_review(self, mock_model_response):
        """Test codereview tool with workflow inputs."""
        tool = CodeReviewTool()

        with patch.object(tool, "get_model_provider") as mock_get_provider:
            mock_provider = MagicMock()
            mock_provider.get_provider_type.return_value = MagicMock(value="google")
            mock_provider.supports_thinking_mode.return_value = False
            mock_provider.generate_content.return_value = mock_model_response(
                "Found 3 issues: 1) Missing error handling..."
            )
            mock_get_provider.return_value = mock_provider

            # Mock file reading
            with patch("tools.base.read_files") as mock_read_files:
                mock_read_files.return_value = "def main(): pass"

                result = await tool.execute(
                    {
                        "step": "Initial code review investigation - examining security vulnerabilities",
                        "step_number": 1,
                        "total_steps": 2,
                        "next_step_required": True,
                        "findings": "Found security issues in code",
                        "relevant_files": ["/path/to/code.py"],
                        "review_type": "security",
                        "focus_on": "Look for SQL injection vulnerabilities",
                    }
                )

                assert len(result) == 1
                output = json.loads(result[0].text)
                assert output["status"] == "pause_for_code_review"
                assert "status" in output
                assert output["status"] in ["pause_for_code_review", "calling_expert_analysis"]
        finally:
            # Clean up temp file
            os.unlink(temp_file)

    # NOTE: Precommit test has been removed because the precommit tool has been
    # refactored to use a workflow-based pattern instead of accepting simple prompt/path fields.
@@ -193,164 +221,196 @@ class TestPromptRegression:
    #
    # assert len(result) == 1
    # output = json.loads(result[0].text)
    # assert output["status"] == "success"
    # assert output["status"] in ["success", "continuation_available"]
    # assert "Next Steps:" in output["content"]
    # assert "Root cause" in output["content"]

    @pytest.mark.integration
    @pytest.mark.asyncio
    async def test_analyze_normal_question(self, mock_model_response):
        """Test analyze tool with normal question."""
    async def test_analyze_normal_question(self):
        """Test analyze tool with normal question using real API."""
        skip_if_no_custom_api()

        tool = AnalyzeTool()

        with patch.object(tool, "get_model_provider") as mock_get_provider:
            mock_provider = MagicMock()
            mock_provider.get_provider_type.return_value = MagicMock(value="google")
            mock_provider.supports_thinking_mode.return_value = False
            mock_provider.generate_content.return_value = mock_model_response(
                "The code follows MVC pattern with clear separation..."
        # Create a temporary Python file demonstrating MVC pattern
        with tempfile.NamedTemporaryFile(mode="w", suffix=".py", delete=False) as f:
            f.write(
                """
# Model
class User:
    def __init__(self, name, email):
        self.name = name
        self.email = email

# View
class UserView:
    def display_user(self, user):
        return f"User: {user.name} ({user.email})"

# Controller
class UserController:
    def __init__(self, model, view):
        self.model = model
        self.view = view

    def get_user_display(self):
        return self.view.display_user(self.model)
"""
            )
            mock_get_provider.return_value = mock_provider
            temp_file = f.name

            # Mock file reading
            with patch("tools.base.read_files") as mock_read_files:
                mock_read_files.return_value = "class UserController: ..."

                result = await tool.execute(
                    {
                        "step": "What design patterns are used in this codebase?",
                        "step_number": 1,
                        "total_steps": 1,
                        "next_step_required": False,
                        "findings": "Initial architectural analysis",
                        "relevant_files": ["/path/to/project"],
                        "analysis_type": "architecture",
                    }
                )

                assert len(result) == 1
                output = json.loads(result[0].text)
                # Workflow analyze tool returns "calling_expert_analysis" for step 1
                assert output["status"] == "calling_expert_analysis"
                assert "step_number" in output

    @pytest.mark.asyncio
    async def test_empty_optional_fields(self, mock_model_response):
        """Test tools work with empty optional fields."""
        tool = ChatTool()

        with patch.object(tool, "get_model_provider") as mock_get_provider:
            mock_provider = MagicMock()
            mock_provider.get_provider_type.return_value = MagicMock(value="google")
            mock_provider.supports_thinking_mode.return_value = False
            mock_provider.generate_content.return_value = mock_model_response()
            mock_get_provider.return_value = mock_provider

            # Test with no files parameter
            result = await tool.execute({"prompt": "Hello"})
        try:
            result = await tool.execute(
                {
                    "step": "What design patterns are used in this codebase?",
                    "step_number": 1,
                    "total_steps": 1,
                    "next_step_required": False,
                    "findings": "Initial architectural analysis",
                    "relevant_files": [temp_file],
                    "analysis_type": "architecture",
                    "model": "local-llama",
                }
            )

            assert len(result) == 1
            output = json.loads(result[0].text)
            assert output["status"] == "success"
            assert "status" in output
            # Workflow analyze tool should process the analysis
            assert output["status"] in ["calling_expert_analysis", "pause_for_investigation"]
        finally:
            # Clean up temp file
            os.unlink(temp_file)
    @pytest.mark.integration
    @pytest.mark.asyncio
    async def test_thinking_modes_work(self, mock_model_response):
        """Test that thinking modes are properly passed through."""
    async def test_empty_optional_fields(self):
        """Test tools work with empty optional fields using real API."""
        skip_if_no_custom_api()

        tool = ChatTool()

        with patch.object(tool, "get_model_provider") as mock_get_provider:
            mock_provider = MagicMock()
            mock_provider.get_provider_type.return_value = MagicMock(value="google")
            mock_provider.supports_thinking_mode.return_value = False
            mock_provider.generate_content.return_value = mock_model_response()
            mock_get_provider.return_value = mock_provider
        # Test with no files parameter
        result = await tool.execute({"prompt": "Hello", "model": "local-llama"})

            result = await tool.execute({"prompt": "Test", "thinking_mode": "high", "temperature": 0.8})

        assert len(result) == 1
        output = json.loads(result[0].text)
        assert output["status"] == "success"

            # Verify generate_content was called with correct parameters
            mock_provider.generate_content.assert_called_once()
            call_kwargs = mock_provider.generate_content.call_args[1]
            assert call_kwargs.get("temperature") == 0.8
            # thinking_mode would be passed if the provider supports it
            # In this test, we set supports_thinking_mode to False, so it won't be passed
        assert len(result) == 1
        output = json.loads(result[0].text)
        assert output["status"] in ["success", "continuation_available"]
        assert "content" in output

    @pytest.mark.integration
    @pytest.mark.asyncio
    async def test_special_characters_in_prompts(self, mock_model_response):
        """Test prompts with special characters work correctly."""
    async def test_thinking_modes_work(self):
        """Test that thinking modes are properly passed through using real API."""
        skip_if_no_custom_api()

        tool = ChatTool()

        with patch.object(tool, "get_model_provider") as mock_get_provider:
            mock_provider = MagicMock()
            mock_provider.get_provider_type.return_value = MagicMock(value="google")
            mock_provider.supports_thinking_mode.return_value = False
            mock_provider.generate_content.return_value = mock_model_response()
            mock_get_provider.return_value = mock_provider
        result = await tool.execute(
            {
                "prompt": "Explain quantum computing briefly",
                "thinking_mode": "low",
                "temperature": 0.8,
                "model": "local-llama",
            }
        )

            special_prompt = 'Test with "quotes" and\nnewlines\tand tabs'
            result = await tool.execute({"prompt": special_prompt})

        assert len(result) == 1
        output = json.loads(result[0].text)
        assert output["status"] == "success"
        assert len(result) == 1
        output = json.loads(result[0].text)
        assert output["status"] in ["success", "continuation_available"]
        assert "content" in output
        # Should contain some quantum-related content
        assert "quantum" in output["content"].lower() or "computing" in output["content"].lower()

    @pytest.mark.integration
    @pytest.mark.asyncio
    async def test_mixed_file_paths(self, mock_model_response):
        """Test handling of various file path formats."""
    async def test_special_characters_in_prompts(self):
        """Test prompts with special characters work correctly using real API."""
        skip_if_no_custom_api()

        tool = ChatTool()

        special_prompt = (
            'Test with "quotes" and\nnewlines\tand tabs. Please just respond with the number that is the answer to 1+1.'
        )
        result = await tool.execute({"prompt": special_prompt, "model": "local-llama"})

        assert len(result) == 1
        output = json.loads(result[0].text)
        assert output["status"] in ["success", "continuation_available"]
        assert "content" in output
        # Should handle the special characters without crashing - the exact content doesn't matter as much as not failing
        assert len(output["content"]) > 0

    @pytest.mark.integration
    @pytest.mark.asyncio
    async def test_mixed_file_paths(self):
        """Test handling of various file path formats using real API."""
        skip_if_no_custom_api()

        tool = AnalyzeTool()

        with patch.object(tool, "get_model_provider") as mock_get_provider:
            mock_provider = MagicMock()
            mock_provider.get_provider_type.return_value = MagicMock(value="google")
            mock_provider.supports_thinking_mode.return_value = False
            mock_provider.generate_content.return_value = mock_model_response()
            mock_get_provider.return_value = mock_provider
        # Create multiple temporary files to test different path formats
        temp_files = []
        try:
            # Create first file
            with tempfile.NamedTemporaryFile(mode="w", suffix=".py", delete=False) as f:
                f.write("def function_one(): pass")
                temp_files.append(f.name)

            with patch("utils.file_utils.read_files") as mock_read_files:
                mock_read_files.return_value = "Content"
            # Create second file
            with tempfile.NamedTemporaryFile(mode="w", suffix=".js", delete=False) as f:
                f.write("function functionTwo() { return 'hello'; }")
                temp_files.append(f.name)

                result = await tool.execute(
                    {
                        "step": "Analyze these files",
                        "step_number": 1,
                        "total_steps": 1,
                        "next_step_required": False,
                        "findings": "Initial file analysis",
                        "relevant_files": [
                            "/absolute/path/file.py",
                            "/Users/name/project/src/",
                            "/home/user/code.js",
                        ],
                    }
                )

                assert len(result) == 1
                output = json.loads(result[0].text)
                # Analyze workflow tool returns calling_expert_analysis status when complete
                assert output["status"] == "calling_expert_analysis"
                mock_read_files.assert_called_once()

    @pytest.mark.asyncio
    async def test_unicode_content(self, mock_model_response):
        """Test handling of unicode content in prompts."""
        tool = ChatTool()

        with patch.object(tool, "get_model_provider") as mock_get_provider:
            mock_provider = MagicMock()
            mock_provider.get_provider_type.return_value = MagicMock(value="google")
            mock_provider.supports_thinking_mode.return_value = False
            mock_provider.generate_content.return_value = mock_model_response()
            mock_get_provider.return_value = mock_provider

            unicode_prompt = "Explain this: 你好世界 مرحبا بالعالم"
            result = await tool.execute({"prompt": unicode_prompt})
            result = await tool.execute(
                {
                    "step": "Analyze these files",
                    "step_number": 1,
                    "total_steps": 1,
                    "next_step_required": False,
                    "findings": "Initial file analysis",
                    "relevant_files": temp_files,
                    "model": "local-llama",
                }
            )

            assert len(result) == 1
            output = json.loads(result[0].text)
            assert output["status"] == "success"
            assert "status" in output
            # Should process the files
            assert output["status"] in [
                "calling_expert_analysis",
                "pause_for_investigation",
                "files_required_to_continue",
            ]
        finally:
            # Clean up temp files
            for temp_file in temp_files:
                if os.path.exists(temp_file):
                    os.unlink(temp_file)
    @pytest.mark.integration
    @pytest.mark.asyncio
    async def test_unicode_content(self):
        """Test handling of unicode content in prompts using real API."""
        skip_if_no_custom_api()

        tool = ChatTool()

        unicode_prompt = "Explain what these mean: 你好世界 (Chinese) and مرحبا بالعالم (Arabic)"
        result = await tool.execute({"prompt": unicode_prompt, "model": "local-llama"})

        assert len(result) == 1
        output = json.loads(result[0].text)
        assert output["status"] in ["success", "continuation_available"]
        assert "content" in output
        # Should mention hello or world or greeting in some form
        content_lower = output["content"].lower()
        assert "hello" in content_lower or "world" in content_lower or "greeting" in content_lower


if __name__ == "__main__":
    pytest.main([__file__, "-v"])
    # Run integration tests by default when called directly
    pytest.main([__file__, "-v", "-m", "integration"])