feat: add full directory support and smart file handling

Major improvements to file handling capabilities:

- Add directory traversal support to all file-processing tools
- Tools now accept both individual files and entire directories
- Automatically expand directories and discover code files recursively
- Smart filtering: skip hidden files, __pycache__, and non-code files
- Progressive token loading: read as many files as possible within limits
- Clear file separation markers with full paths for Gemini

Key changes:
- Rewrite file_utils.py with expand_paths() and improved read_files()
- Update all tool descriptions to indicate directory support
- Add comprehensive tests for directory handling and token limits
- Document tool parameters and examples in README
- Bump version to 2.4.2

All tools (analyze, review_code, debug_issue, think_deeper) now support:
- Single files: "analyze main.py"
- Directories: "review src/"
- Mixed paths: "analyze config.py, src/, tests/"

This enables analyzing entire projects or specific subsystems efficiently
while respecting token limits and providing clear file boundaries.

🤖 Generated with [Claude Code](https://claude.ai/code)

Co-Authored-By: Claude <noreply@anthropic.com>
Fahad
2025-06-09 06:00:25 +04:00
parent 024fdd48c9
commit 545338ca23
9 changed files with 384 additions and 46 deletions

View File

@@ -257,6 +257,68 @@ Just ask Claude naturally:
"Get gemini to show server configuration"
```
## Tool Parameters
All tools that work with files now support **both individual files and entire directories**. The server automatically expands directories, filters for relevant code files, and manages token limits.
### File-Processing Tools
**`analyze`** - Analyze files or directories
- `files`: List of file paths or directories (required)
- `question`: What to analyze (required)
- `analysis_type`: architecture|performance|security|quality|general
- `output_format`: summary|detailed|actionable
```
"Use gemini to analyze the src/ directory for architectural patterns"
"Get gemini to analyze main.py and tests/ to understand test coverage"
```
**`review_code`** - Review code files or directories
- `files`: List of file paths or directories (required)
- `review_type`: full|security|performance|quick
- `focus_on`: Specific aspects to focus on
- `standards`: Coding standards to enforce
- `severity_filter`: critical|high|medium|all
```
"Use gemini to review the entire api/ directory for security issues"
"Get gemini to review src/ with focus on performance, only show critical issues"
```
**`debug_issue`** - Debug with file context
- `error_description`: Description of the issue (required)
- `error_context`: Stack trace or logs
- `relevant_files`: Files or directories related to the issue
- `runtime_info`: Environment details
- `previous_attempts`: What you've tried
```
"Use gemini to debug this error with context from the entire backend/ directory"
```
**`think_deeper`** - Extended analysis with file context
- `current_analysis`: Your current thinking (required)
- `problem_context`: Additional context
- `focus_areas`: Specific aspects to focus on
- `reference_files`: Files or directories for context
```
"Use gemini to think deeper about my design with reference to the src/models/ directory"
```
### Directory Support Features
- **Automatic Expansion**: Directories are recursively scanned for code files
- **Smart Filtering**: Hidden files, caches, and non-code files are automatically excluded
- **Token Management**: Loads as many files as possible within token limits
- **Clear Markers**: Each file is wrapped in delimiters that carry its full path, so Gemini can tell files apart (see the sketch below)
Example with mixed paths:
```
"Use gemini to analyze config.py, src/, and tests/unit/ to understand the testing strategy"
```
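Internally, each file is wrapped in path-labeled delimiters (the format produced by `read_file_content` in `file_utils.py`), so file boundaries stay unambiguous even when many files are concatenated:
```
--- BEGIN FILE: src/module.py ---
class Module: pass
--- END FILE: src/module.py ---
```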
## Collaborative Workflows
### Design → Review → Implement

View File

@@ -3,7 +3,7 @@ Configuration and constants for Gemini MCP Server
"""
# Version and metadata
__version__ = "2.4.1"
__version__ = "2.4.2"
__updated__ = "2025-06-09"
__author__ = "Fahad Gilani"

View File

@@ -16,23 +16,26 @@ class TestFileUtils:
"def hello():\n return 'world'", encoding="utf-8"
)
-        content = read_file_content(str(test_file))
+        content, tokens = read_file_content(str(test_file))
assert "--- BEGIN FILE:" in content
assert "--- END FILE:" in content
assert "def hello():" in content
assert "return 'world'" in content
assert tokens > 0 # Should have estimated tokens
def test_read_file_content_not_found(self):
"""Test reading non-existent file"""
content = read_file_content("/nonexistent/file.py")
content, tokens = read_file_content("/nonexistent/file.py")
assert "--- FILE NOT FOUND:" in content
assert "Error: File does not exist" in content
assert tokens > 0
def test_read_file_content_directory(self, tmp_path):
"""Test reading a directory"""
-        content = read_file_content(str(tmp_path))
+        content, tokens = read_file_content(str(tmp_path))
assert "--- NOT A FILE:" in content
assert "Error: Path is not a file" in content
assert tokens > 0
def test_read_files_multiple(self, tmp_path):
"""Test reading multiple files"""
@@ -49,7 +52,7 @@ class TestFileUtils:
assert "print('file1')" in content
assert "print('file2')" in content
assert "Reading 2 file(s)" in summary
assert "Read 2 file(s)" in summary
def test_read_files_with_code(self):
"""Test reading with direct code"""
@@ -62,6 +65,121 @@ class TestFileUtils:
assert "Direct code:" in summary
def test_read_files_directory_support(self, tmp_path):
"""Test reading all files from a directory"""
# Create directory structure
(tmp_path / "file1.py").write_text("print('file1')", encoding="utf-8")
(tmp_path / "file2.js").write_text("console.log('file2')", encoding="utf-8")
(tmp_path / "readme.md").write_text("# README", encoding="utf-8")
# Create subdirectory
subdir = tmp_path / "src"
subdir.mkdir()
(subdir / "module.py").write_text("class Module: pass", encoding="utf-8")
# Create hidden file (should be skipped)
(tmp_path / ".hidden").write_text("secret", encoding="utf-8")
# Read the directory
content, summary = read_files([str(tmp_path)])
# Check files are included
assert "file1.py" in content
assert "file2.js" in content
assert "readme.md" in content
assert "src/module.py" in content
# Check content
assert "print('file1')" in content
assert "console.log('file2')" in content
assert "# README" in content
assert "class Module: pass" in content
# Hidden file should not be included
assert ".hidden" not in content
assert "secret" not in content
# Check summary
assert "Processed 1 dir(s)" in summary
assert "Read 4 file(s)" in summary
def test_read_files_mixed_paths(self, tmp_path):
"""Test reading mix of files and directories"""
# Create files
file1 = tmp_path / "direct.py"
file1.write_text("# Direct file", encoding="utf-8")
# Create directory with files
subdir = tmp_path / "subdir"
subdir.mkdir()
(subdir / "sub1.py").write_text("# Sub file 1", encoding="utf-8")
(subdir / "sub2.py").write_text("# Sub file 2", encoding="utf-8")
# Read mix of direct file and directory
content, summary = read_files([str(file1), str(subdir)])
assert "direct.py" in content
assert "sub1.py" in content
assert "sub2.py" in content
assert "# Direct file" in content
assert "# Sub file 1" in content
assert "# Sub file 2" in content
assert "Processed 1 dir(s)" in summary
assert "Read 3 file(s)" in summary
def test_read_files_token_limit(self, tmp_path):
"""Test token limit handling"""
# Create files with known token counts
# ~250 tokens each (1000 chars)
large_content = "x" * 1000
for i in range(5):
(tmp_path / f"file{i}.txt").write_text(large_content, encoding="utf-8")
# Read with small token limit (should skip some files)
# Reserve 50k tokens, limit to 51k total = 1k available
# Each file ~250 tokens, so should read ~3-4 files
content, summary = read_files([str(tmp_path)], max_tokens=51_000)
assert "Skipped" in summary
assert "token limit" in summary
assert "--- SKIPPED FILES (TOKEN LIMIT) ---" in content
# Count how many files were read
read_count = content.count("--- BEGIN FILE:")
assert 2 <= read_count <= 4 # Should read some but not all
def test_read_files_large_file(self, tmp_path):
"""Test handling of large files"""
# Create a file larger than max_size (1MB)
large_file = tmp_path / "large.txt"
large_file.write_text("x" * 2_000_000, encoding="utf-8") # 2MB
content, summary = read_files([str(large_file)])
assert "--- FILE TOO LARGE:" in content
assert "2,000,000 bytes" in content
assert "Read 1 file(s)" in summary # File is counted but shows error message
def test_read_files_file_extensions(self, tmp_path):
"""Test file extension filtering"""
# Create various file types
(tmp_path / "code.py").write_text("python", encoding="utf-8")
(tmp_path / "style.css").write_text("css", encoding="utf-8")
(tmp_path / "binary.exe").write_text("exe", encoding="utf-8")
(tmp_path / "image.jpg").write_text("jpg", encoding="utf-8")
content, summary = read_files([str(tmp_path)])
# Code files should be included
assert "code.py" in content
assert "style.css" in content
# Binary files should not be included (not in CODE_EXTENSIONS)
assert "binary.exe" not in content
assert "image.jpg" not in content
class TestTokenUtils:
"""Test token counting utilities"""

View File

@@ -16,7 +16,7 @@ from .base import BaseTool, ToolRequest
class AnalyzeRequest(ToolRequest):
"""Request model for analyze tool"""
files: List[str] = Field(..., description="Files to analyze")
files: List[str] = Field(..., description="Files or directories to analyze")
question: str = Field(..., description="What to analyze or look for")
analysis_type: Optional[str] = Field(
None,
@@ -36,6 +36,7 @@ class AnalyzeTool(BaseTool):
def get_description(self) -> str:
return (
"ANALYZE FILES & CODE - General-purpose analysis for understanding code. "
"Supports both individual files and entire directories. "
"Use this for examining files, understanding architecture, or investigating specific aspects. "
"Triggers: 'analyze these files', 'examine this code', 'understand this'. "
"Perfect for: codebase exploration, dependency analysis, pattern detection. "
@@ -49,7 +50,7 @@ class AnalyzeTool(BaseTool):
"files": {
"type": "array",
"items": {"type": "string"},
"description": "Files to analyze",
"description": "Files or directories to analyze",
},
"question": {
"type": "string",

View File

@@ -23,7 +23,7 @@ class DebugIssueRequest(ToolRequest):
None, description="Stack trace, logs, or additional error context"
)
relevant_files: Optional[List[str]] = Field(
None, description="Files that might be related to the issue"
None, description="Files or directories that might be related to the issue"
)
runtime_info: Optional[str] = Field(
None, description="Environment, versions, or runtime information"

View File

@@ -16,7 +16,7 @@ from .base import BaseTool, ToolRequest
class ReviewCodeRequest(ToolRequest):
"""Request model for review_code tool"""
files: List[str] = Field(..., description="Code files to review")
files: List[str] = Field(..., description="Code files or directories to review")
review_type: str = Field(
"full", description="Type of review: full|security|performance|quick"
)
@@ -41,6 +41,7 @@ class ReviewCodeTool(BaseTool):
def get_description(self) -> str:
return (
"PROFESSIONAL CODE REVIEW - Comprehensive analysis for bugs, security, and quality. "
"Supports both individual files and entire directories/projects. "
"Use this for thorough code review with actionable feedback. "
"Triggers: 'review this code', 'check for issues', 'find bugs', 'security audit'. "
"I'll identify issues by severity (Critical→High→Medium→Low) with specific fixes. "
@@ -54,7 +55,7 @@ class ReviewCodeTool(BaseTool):
"files": {
"type": "array",
"items": {"type": "string"},
"description": "Code files to review",
"description": "Code files or directories to review",
},
"review_type": {
"type": "string",

View File

@@ -27,7 +27,7 @@ class ThinkDeeperRequest(ToolRequest):
description="Specific aspects to focus on (architecture, performance, security, etc.)",
)
reference_files: Optional[List[str]] = Field(
None, description="Optional file paths for additional context"
None, description="Optional file paths or directories for additional context"
)

View File

@@ -2,12 +2,14 @@
Utility functions for Gemini MCP Server
"""
-from .file_utils import read_file_content, read_files
+from .file_utils import read_file_content, read_files, expand_paths, CODE_EXTENSIONS
from .token_utils import check_token_limit, estimate_tokens
__all__ = [
"read_files",
"read_file_content",
"expand_paths",
"CODE_EXTENSIONS",
"estimate_tokens",
"check_token_limit",
]

View File

@@ -1,63 +1,217 @@
"""
-File reading utilities
+File reading utilities with directory support and token management
"""
import os
from pathlib import Path
-from typing import List, Optional, Tuple
+from typing import List, Optional, Tuple, Set
+from .token_utils import estimate_tokens, MAX_CONTEXT_TOKENS
-def read_file_content(file_path: str) -> str:
-    """Read a single file and format it for Gemini"""
# Common code file extensions
CODE_EXTENSIONS = {
'.py', '.js', '.ts', '.jsx', '.tsx', '.java', '.cpp', '.c', '.h', '.hpp',
'.cs', '.go', '.rs', '.rb', '.php', '.swift', '.kt', '.scala', '.r', '.m',
'.mm', '.sql', '.sh', '.bash', '.zsh', '.fish', '.ps1', '.bat', '.cmd',
'.yml', '.yaml', '.json', '.xml', '.toml', '.ini', '.cfg', '.conf',
'.txt', '.md', '.rst', '.tex', '.html', '.css', '.scss', '.sass', '.less'
}
def expand_paths(paths: List[str], extensions: Optional[Set[str]] = None) -> List[str]:
"""
Expand paths to individual files, handling both files and directories.
Args:
paths: List of file or directory paths
extensions: Optional set of file extensions to include
Returns:
List of individual file paths
"""
if extensions is None:
extensions = CODE_EXTENSIONS
expanded_files = []
seen = set()
for path in paths:
path_obj = Path(path)
if not path_obj.exists():
continue
if path_obj.is_file():
# Add file directly
if str(path_obj) not in seen:
expanded_files.append(str(path_obj))
seen.add(str(path_obj))
elif path_obj.is_dir():
# Walk directory recursively
for root, dirs, files in os.walk(path_obj):
# Skip hidden directories and __pycache__
dirs[:] = [d for d in dirs if not d.startswith('.') and d != '__pycache__']
for file in files:
# Skip hidden files
if file.startswith('.'):
continue
file_path = Path(root) / file
# Check extension
if not extensions or file_path.suffix.lower() in extensions:
full_path = str(file_path)
if full_path not in seen:
expanded_files.append(full_path)
seen.add(full_path)
# Sort for consistent ordering
expanded_files.sort()
return expanded_files
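# Illustration (hypothetical tree): if src/ contains app.py plus a .cache/
# subdirectory, expand_paths(["src/", "README.md"]) returns
# ["README.md", "src/app.py"]; hidden directories are pruned during the walk
# and the result is sorted for deterministic ordering.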
def read_file_content(file_path: str, max_size: int = 1_000_000) -> Tuple[str, int]:
"""
Read a single file and format it for Gemini.
Args:
file_path: Path to file
max_size: Maximum file size to read
Returns:
(formatted_content, estimated_tokens)
"""
path = Path(file_path)
try:
# Check if path exists and is a file
if not path.exists():
return f"\n--- FILE NOT FOUND: {file_path} ---\nError: File does not exist\n--- END FILE ---\n"
content = f"\n--- FILE NOT FOUND: {file_path} ---\nError: File does not exist\n--- END FILE ---\n"
return content, estimate_tokens(content)
if not path.is_file():
return f"\n--- NOT A FILE: {file_path} ---\nError: Path is not a file\n--- END FILE ---\n"
content = f"\n--- NOT A FILE: {file_path} ---\nError: Path is not a file\n--- END FILE ---\n"
return content, estimate_tokens(content)
# Check file size
file_size = path.stat().st_size
if file_size > max_size:
content = f"\n--- FILE TOO LARGE: {file_path} ---\nFile size: {file_size:,} bytes (max: {max_size:,})\n--- END FILE ---\n"
return content, estimate_tokens(content)
# Read the file
with open(path, "r", encoding="utf-8") as f:
content = f.read()
with open(path, "r", encoding="utf-8", errors="replace") as f:
file_content = f.read()
# Format with clear delimiters for Gemini
return f"\n--- BEGIN FILE: {file_path} ---\n{content}\n--- END FILE: {file_path} ---\n"
formatted = f"\n--- BEGIN FILE: {file_path} ---\n{file_content}\n--- END FILE: {file_path} ---\n"
return formatted, estimate_tokens(formatted)
except Exception as e:
return f"\n--- ERROR READING FILE: {file_path} ---\nError: {str(e)}\n--- END FILE ---\n"
content = f"\n--- ERROR READING FILE: {file_path} ---\nError: {str(e)}\n--- END FILE ---\n"
return content, estimate_tokens(content)
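# Editorial note: every branch above returns a (text, estimated_tokens)
# tuple, including the error cases, so callers can budget tokens without
# re-reading files or handling exceptions.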
def read_files(
-    file_paths: List[str], code: Optional[str] = None
+    file_paths: List[str],
+    code: Optional[str] = None,
+    max_tokens: Optional[int] = None,
+    reserve_tokens: int = 50_000
) -> Tuple[str, str]:
"""
-    Read multiple files and optional direct code.
-    Returns: (full_content, brief_summary)
+    Read multiple files and optional direct code with smart token management.
Args:
file_paths: List of file or directory paths
code: Optional direct code to include
max_tokens: Maximum tokens to use (defaults to MAX_CONTEXT_TOKENS)
reserve_tokens: Tokens to reserve for prompt and response
Returns:
(full_content, brief_summary)
"""
if max_tokens is None:
max_tokens = MAX_CONTEXT_TOKENS
content_parts = []
summary_parts = []
-    # Process files
-    if file_paths:
-        summary_parts.append(f"Reading {len(file_paths)} file(s)")
-        for file_path in file_paths:
-            content = read_file_content(file_path)
-            content_parts.append(content)
-    # Add direct code if provided
total_tokens = 0
available_tokens = max_tokens - reserve_tokens
files_read = []
files_skipped = []
dirs_processed = []
# First, handle direct code if provided
if code:
-        formatted_code = (
-            f"\n--- BEGIN DIRECT CODE ---\n{code}\n--- END DIRECT CODE ---\n"
-        )
-        content_parts.append(formatted_code)
-        code_preview = code[:50] + "..." if len(code) > 50 else code
-        summary_parts.append(f"Direct code: {code_preview}")
-    full_content = "\n\n".join(content_parts)
-    summary = (
-        " | ".join(summary_parts) if summary_parts else "No input provided"
-    )
+        formatted_code = f"\n--- BEGIN DIRECT CODE ---\n{code}\n--- END DIRECT CODE ---\n"
code_tokens = estimate_tokens(formatted_code)
if code_tokens <= available_tokens:
content_parts.append(formatted_code)
total_tokens += code_tokens
available_tokens -= code_tokens
code_preview = code[:50] + "..." if len(code) > 50 else code
summary_parts.append(f"Direct code: {code_preview}")
else:
summary_parts.append("Direct code skipped (too large)")
# Expand all paths to get individual files
if file_paths:
# Track which paths are directories
for path in file_paths:
if Path(path).is_dir():
dirs_processed.append(path)
# Expand to get all files
all_files = expand_paths(file_paths)
if not all_files and file_paths:
# No files found but paths were provided
content_parts.append(f"\n--- NO FILES FOUND ---\nProvided paths: {', '.join(file_paths)}\n--- END ---\n")
else:
# Read files up to token limit
for file_path in all_files:
if total_tokens >= available_tokens:
files_skipped.append(file_path)
continue
file_content, file_tokens = read_file_content(file_path)
# Check if adding this file would exceed limit
if total_tokens + file_tokens <= available_tokens:
content_parts.append(file_content)
total_tokens += file_tokens
files_read.append(file_path)
else:
files_skipped.append(file_path)
# Build summary
if dirs_processed:
summary_parts.append(f"Processed {len(dirs_processed)} dir(s)")
if files_read:
summary_parts.append(f"Read {len(files_read)} file(s)")
if files_skipped:
summary_parts.append(f"Skipped {len(files_skipped)} file(s) (token limit)")
if total_tokens > 0:
summary_parts.append(f"~{total_tokens:,} tokens used")
# Add skipped files note if any were skipped
if files_skipped:
skip_note = f"\n\n--- SKIPPED FILES (TOKEN LIMIT) ---\n"
skip_note += f"Total skipped: {len(files_skipped)}\n"
# Show first 10 skipped files
for i, file_path in enumerate(files_skipped[:10]):
skip_note += f" - {file_path}\n"
if len(files_skipped) > 10:
skip_note += f" ... and {len(files_skipped) - 10} more\n"
skip_note += "--- END SKIPPED FILES ---\n"
content_parts.append(skip_note)
full_content = "\n\n".join(content_parts) if content_parts else ""
summary = " | ".join(summary_parts) if summary_parts else "No input provided"
return full_content, summary
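Taken together, a minimal usage sketch of the new `read_files` API (assuming the `utils` package layout shown above; the paths and the counts in the printed summary are hypothetical):
```
from utils.file_utils import read_files

# Directories are expanded recursively, hidden files and __pycache__ are
# skipped, and files are read until the budget (max_tokens - reserve_tokens)
# is spent; anything left over is listed in a SKIPPED FILES section.
content, summary = read_files(
    ["config.py", "src/", "tests/unit/"],  # hypothetical project paths
    max_tokens=200_000,  # defaults to MAX_CONTEXT_TOKENS when omitted
)
print(summary)  # e.g. "Processed 2 dir(s) | Read 14 file(s) | ~120,000 tokens used"
```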