Debug Tool - Systematic Investigation & Expert Analysis

Step-by-step investigation followed by expert debugging assistance

The debug tool guides Claude through a systematic investigation process where Claude performs methodical code examination, evidence collection, and hypothesis formation across multiple steps. Once the investigation is complete, the tool provides expert analysis from the selected AI model based on all gathered findings.

Thinking Mode

Default is medium (8,192 tokens). Use high for tricky bugs, where the extra tokens are an investment in finding the root cause, or low for simple errors to save tokens.

Example Prompts

Basic Usage:

Get gemini to debug why my API returns 400 errors randomly with the full stack trace: [paste traceback]

How It Works

The debug tool implements a systematic investigation methodology where Claude is guided through structured debugging steps:

Investigation Phase:

  1. Step 1: Claude describes the issue and begins thinking deeply about possible underlying causes, side-effects, and contributing factors
  2. Step 2+: Claude examines relevant code, traces errors, tests hypotheses, and gathers evidence
  3. Throughout: Claude tracks findings, relevant files, methods, and evolving hypotheses with confidence levels
  4. Backtracking: Claude can revise previous steps when new insights emerge
  5. Completion: Once the investigation is thorough, Claude signals completion
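
To make the flow concrete, here is a minimal sketch of what the arguments for an opening investigation step might look like, written as a Python dict. The field names come from the Tool Parameters section below; the values and the surrounding call mechanics are invented for illustration and depend on your MCP client:

  # Hypothetical arguments for the first debug step (illustrative values only)
  step_1_args = {
      "step": "Users report intermittent 401 errors; map the auth flow and list candidate failure points",
      "step_number": 1,
      "total_steps": 3,            # initial estimate; may be revised later
      "next_step_required": True,  # more investigation to follow
      "findings": "Issue reproduces only under concurrent logins; application logs show no errors",
      "hypothesis": "Possible race condition in session handling",
      "confidence": "low",         # exploring/low/medium/high/certain
  }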

Expert Analysis Phase: After Claude completes the investigation, the tool automatically calls the selected AI model (unless confidence is certain, in which case expert analysis is bypassed) with:

  • Complete investigation summary with all steps and findings
  • Relevant files and methods identified during investigation
  • Final hypothesis and confidence assessment
  • Error context and supporting evidence
  • Visual debugging materials if provided

This structured approach ensures Claude performs methodical groundwork before expert analysis, resulting in significantly better debugging outcomes and more efficient token usage.

Special Note: If you want Claude to perform the entire debugging investigation without calling another model, you can include "don't use any other model" in your prompt, and Claude will complete the full workflow independently.
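
In parameter form, the same behavior corresponds to setting use_assistant_model to false (see Tool Parameters below). A minimal, illustrative sketch with invented values:

  # Illustrative: skip the expert analysis phase so Claude completes the workflow alone
  step_args = {
      "step": "Debug the failing import in parser.py",
      "step_number": 1,
      "total_steps": 1,
      "next_step_required": False,
      "findings": "Initial symptoms recorded; beginning code examination",
      "use_assistant_model": False,  # no external model will be called
  }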

Key Features

  • Multi-step investigation process with evidence collection and hypothesis evolution
  • Systematic code examination with file and method tracking throughout investigation
  • Confidence assessment and revision capabilities for investigative steps
  • Backtracking support to revise previous steps when new insights emerge
  • Expert analysis integration that provides final debugging recommendations based on complete investigation
  • Error context support: Stack traces, logs, and runtime information
  • Visual debugging: Include error screenshots, stack traces, console output
  • Conversation threading: Continue investigations across multiple sessions
  • Large context analysis: Handle extensive log files and multiple related code files
  • Multi-language support: Debug issues across Python, JavaScript, Java, C#, Swift, and more
  • Web search integration: Identifies when additional research would help solve problems

Tool Parameters

Investigation Step Parameters:

  • step: Current investigation step description (required)
  • step_number: Current step number in investigation sequence (required)
  • total_steps: Estimated total investigation steps (adjustable as the process evolves)
  • next_step_required: Whether another investigation step is needed
  • findings: Discoveries and evidence collected in this step (required)
  • files_checked: All files examined during investigation (tracks exploration path)
  • relevant_files: Files directly tied to the root cause or its effects
  • relevant_methods: Specific methods/functions involved in the issue
  • hypothesis: Current best guess about the underlying cause
  • confidence: Confidence level in current hypothesis (exploring/low/medium/high/certain)
  • backtrack_from_step: Step number to backtrack from (for revisions)
  • continuation_id: Thread ID for continuing investigations across sessions
  • images: Visual debugging materials (error screenshots, logs, etc.)

Model Selection:

  • model: auto|pro|flash|o3|o3-mini|o4-mini|o4-mini-high (default: server default)
  • thinking_mode: minimal|low|medium|high|max (default: medium, Gemini only)
  • use_websearch: Enable web search for documentation and solutions (default: true)
  • use_assistant_model: Whether to use the expert analysis phase (default: true; set to false to use Claude only)
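
For orientation, here is a hedged sketch of a complete argument set for a mid-investigation step, combining both parameter groups. All values are invented, and the exact call mechanics depend on your MCP client:

  # Hypothetical mid-investigation step (illustrative values only)
  debug_args = {
      # Investigation step parameters
      "step": "Trace token validation in auth.py and check session expiry handling",
      "step_number": 2,
      "total_steps": 4,
      "next_step_required": True,
      "findings": "validate_token() appears to swallow expiry errors silently",
      "files_checked": ["auth.py", "session_manager.py"],
      "relevant_files": ["auth.py"],
      "relevant_methods": ["validate_token"],  # hypothetical function name
      "hypothesis": "Expired tokens are treated as anonymous sessions instead of failing",
      "confidence": "medium",
      # Model selection parameters
      "model": "pro",
      "thinking_mode": "high",  # Gemini only
      "use_websearch": True,
  }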

Usage Examples

Basic Error Debugging:

"Debug this TypeError: 'NoneType' object has no attribute 'split' in my parser.py"

With Stack Trace:

"Use gemini to debug why my API returns 500 errors with this stack trace: [paste full traceback]"

With File Context:

"Debug the authentication failure in auth.py and user_model.py with o3"

Performance Debugging:

"Use pro to debug why my application is consuming excessive memory during bulk operations"

With Visual Context:

"Debug this crash using the error screenshot and the related crash_report.log"

Runtime Environment Issues:

"Debug deployment issues with server startup failures, here's the runtime info: [environment details]"

Investigation Methodology

The debug tool enforces a structured investigation process:

Step-by-Step Investigation (Claude-Led):

  1. Initial Problem Description: Claude describes the issue and begins thinking about possible causes, side-effects, and contributing factors
  2. Code Examination: Claude systematically examines relevant files, traces execution paths, and identifies suspicious patterns
  3. Evidence Collection: Claude gathers findings, tracks files checked, and identifies methods/functions involved
  4. Hypothesis Formation: Claude develops working theories about the root cause with confidence assessments
  5. Iterative Refinement: Claude can backtrack and revise previous steps as understanding evolves
  6. Investigation Completion: Claude signals when sufficient evidence has been gathered
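
As a concrete illustration of step 5, a revision can be expressed with the backtrack_from_step parameter. This is a hedged sketch with invented values:

  # Illustrative backtracking step: new evidence invalidates the step-2 hypothesis
  revision_args = {
      "step": "Re-examine session handling; database connectivity has been ruled out",
      "step_number": 3,
      "total_steps": 5,
      "next_step_required": True,
      "findings": "Connection-pool metrics are healthy; failures correlate with async session writes",
      "backtrack_from_step": 2,  # revise the investigation from step 2 onward
      "hypothesis": "An unguarded async session write drops state under load",
      "confidence": "medium",
  }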

Expert Analysis Phase (AI Model): Once the investigation is complete, the selected AI model performs:

  • Root Cause Analysis: Deep analysis of all investigation findings and evidence
  • Solution Recommendations: Specific fixes with implementation guidance
  • Prevention Strategies: Measures to avoid similar issues in the future
  • Testing Approaches: Validation methods for proposed solutions

Key Benefits:

  • Methodical Evidence Collection: Ensures no critical information is missed
  • Progressive Understanding: Hypotheses evolve as investigation deepens
  • Complete Context: Expert analysis receives full investigation history
  • Efficient Token Usage: Structured approach prevents redundant back-and-forth

Debugging Categories

Runtime Errors:

  • Exceptions and crashes
  • Null pointer/reference errors
  • Type errors and casting issues
  • Memory leaks and resource exhaustion

Logic Errors:

  • Incorrect algorithm implementation
  • Off-by-one errors and boundary conditions
  • State management issues
  • Race conditions and concurrency bugs

Integration Issues:

  • API communication failures
  • Database connection problems
  • Third-party service integration
  • Configuration and environment issues

Performance Problems:

  • Slow response times
  • Memory usage spikes
  • CPU-intensive operations
  • I/O bottlenecks

Best Practices

For Investigation Steps:

  • Be thorough in step descriptions: Explain what you're examining and why
  • Track all files examined: Include even files that don't contain the bug (tracks investigation path)
  • Document findings clearly: Summarize discoveries, suspicious patterns, and evidence
  • Evolve hypotheses: Update theories as investigation progresses
  • Use backtracking wisely: Revise previous steps when new insights emerge
  • Include visual evidence: Screenshots, error dialogs, console output

For Initial Problem Description:

  • Provide complete error context: Full stack traces, error messages, and logs
  • Describe expected vs actual behavior: Clear symptom description
  • Include environment details: Runtime versions, configuration, deployment context
  • Mention previous attempts: What debugging steps have already been tried
  • Be specific about occurrence: When, where, and how the issue manifests

Advanced Features

Large Log Analysis: With models like Gemini Pro (1M context), you can include extensive log files for comprehensive analysis:

"Debug application crashes using these large log files: app.log, error.log, system.log"

Multi-File Investigation: Analyze multiple related files simultaneously to understand complex issues:

"Debug the data processing pipeline issues across processor.py, validator.py, and output_handler.py"

Web Search Integration: The tool can recommend specific searches for error messages, known issues, or documentation:

After analysis: "Recommended searches for Claude: 'Django 4.2 migration error specific_error_code', 'PostgreSQL connection pool exhaustion solutions'"

When to Use Debug vs Other Tools

  • Use debug for: Specific runtime errors, exceptions, crashes, performance issues requiring systematic investigation
  • Use codereview for: Finding potential bugs in code without specific errors or symptoms
  • Use analyze for: Understanding code structure and flow without troubleshooting specific issues
  • Use precommit for: Validating changes before commit to prevent introducing bugs

Investigation Example

Step 1: "The user authentication fails intermittently with no error logs. I need to investigate the auth flow and identify where failures might occur silently."

Step 2: "Examined auth.py and found three potential failure points: token validation, database connectivity, and session management. No obvious bugs yet but need to trace execution flow."

Step 3: "Found a suspicious async/await pattern in session_manager.py lines 45-67. The awaited call appears to lack exception handling, which could explain the silent failures."

Completion: Investigation reveals likely root cause in exception handling, ready for expert analysis with full context.
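
Expressed as tool parameters, the closing step of this investigation might look like the following hedged sketch (field values are invented and the method name is hypothetical):

  # Illustrative final step: signals completion and triggers expert analysis
  final_step_args = {
      "step": "Investigation complete: silent auth failures traced to unhandled exceptions in the async session write",
      "step_number": 4,
      "total_steps": 4,
      "next_step_required": False,  # ends investigation; expert analysis follows
      "findings": "Awaited call in session_manager.py lines 45-67 lacks exception handling",
      "relevant_files": ["session_manager.py"],
      "relevant_methods": ["SessionManager.save_session"],  # hypothetical name
      "hypothesis": "Exceptions raised during the awaited session write are swallowed, causing silent auth failures",
      "confidence": "high",  # below "certain", so expert analysis still runs
  }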