my-pal-mcp-server/utils/token_utils.py
Fahad 27add4d05d feat: Major refactoring and improvements v2.11.0
## 🚀 Major Improvements

### Docker Environment Simplification
- **BREAKING**: Simplified Docker configuration by auto-detecting sandbox from WORKSPACE_ROOT
- Removed redundant MCP_PROJECT_ROOT requirement for Docker setups
- Updated all Docker config examples and setup scripts
- Added security validation for dangerous WORKSPACE_ROOT paths

### Security Enhancements
- **CRITICAL**: Fixed insecure PROJECT_ROOT fallback to use current directory instead of home
- Enhanced path validation with proper Docker environment detection
- Removed information disclosure in error messages
- Strengthened symlink and path traversal protection

### File Handling Optimization
- **PERFORMANCE**: Optimized read_files() to return content only (removed summary)
- Unified file reading across all tools using standardized file_utils routines
- Fixed review_changes tool to use consistent file loading patterns
- Improved token management and reduced unnecessary processing

### Tool Improvements
- **UX**: Enhanced ReviewCodeTool to require user context for targeted reviews
- Removed deprecated _get_secure_container_path function and _sanitize_filename
- Standardized file access patterns across analyze, review_changes, and other tools
- Added contextual prompting to align reviews with user expectations

### Code Quality & Testing
- Updated all tests for new function signatures and requirements
- Added comprehensive Docker path integration tests
- Achieved 100% test coverage (95 tests passing)
- Full compliance with ruff, black, and isort linting standards

### Configuration & Deployment
- Added pyproject.toml for modern Python packaging
- Streamlined Docker setup removing redundant environment variables
- Updated setup scripts across all platforms (Windows, macOS, Linux)
- Improved error handling and validation throughout

## 🔧 Technical Changes

- **Removed**: `_get_secure_container_path()`, `_sanitize_filename()`, unused SANDBOX_MODE
- **Enhanced**: Path translation, security validation, token management
- **Standardized**: File reading patterns, error handling, Docker detection
- **Updated**: All tool prompts for better context alignment

## 🛡️ Security Notes

This release significantly improves the security posture by:
- Eliminating broad filesystem access defaults
- Adding validation for Docker environment variables
- Removing information disclosure in error paths
- Strengthening path traversal and symlink protections

🤖 Generated with [Claude Code](https://claude.ai/code)

Co-Authored-By: Claude <noreply@anthropic.com>
2025-06-10 09:50:05 +04:00


"""
Token counting utilities for managing API context limits

This module provides functions for estimating token counts to ensure
requests stay within the Gemini API's context window limits.

Note: The estimation uses a simple character-to-token ratio which is
approximate. For production systems requiring precise token counts,
consider using the actual tokenizer for the specific model.
"""

from config import MAX_CONTEXT_TOKENS


def estimate_tokens(text: str) -> int:
    """
    Estimate token count using a character-based approximation.

    This uses a rough heuristic where 1 token ≈ 4 characters, which is
    a reasonable approximation for English text. The actual token count
    may vary based on:
    - Language (non-English text may have different ratios)
    - Code vs prose (code often has more tokens per character)
    - Special characters and formatting

    Args:
        text: The text to estimate tokens for

    Returns:
        int: Estimated number of tokens
    """
    return len(text) // 4


def check_token_limit(text: str) -> tuple[bool, int]:
    """
    Check if text exceeds the maximum token limit for Gemini models.

    This function is used to validate that prepared prompts will fit
    within the model's context window, preventing API errors and ensuring
    reliable operation.

    Args:
        text: The text to check

    Returns:
        Tuple[bool, int]: (is_within_limit, estimated_tokens)
        - is_within_limit: True if the text fits within MAX_CONTEXT_TOKENS
        - estimated_tokens: The estimated token count
    """
    estimated = estimate_tokens(text)
    return estimated <= MAX_CONTEXT_TOKENS, estimated
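
A minimal usage sketch of the two helpers follows. It restates both functions so that it runs on its own. `MAX_CONTEXT_TOKENS = 1_000` is a hypothetical stand-in, since the real value comes from `config.py` and is not shown in this file. The sample prompts are also invented for illustration.

```python
# Hypothetical stand-in for config.MAX_CONTEXT_TOKENS; the real limit is
# defined in config.py and is likely much larger.
MAX_CONTEXT_TOKENS = 1_000


def estimate_tokens(text: str) -> int:
    """Same heuristic as the module: roughly 4 characters per token."""
    return len(text) // 4


def check_token_limit(text: str) -> tuple[bool, int]:
    """Return (is_within_limit, estimated_tokens) against the stand-in limit."""
    estimated = estimate_tokens(text)
    return estimated <= MAX_CONTEXT_TOKENS, estimated


short_prompt = "Review this function for bugs."  # 30 characters
long_prompt = "x" * 8_000                        # 8,000 characters

print(estimate_tokens(short_prompt))    # 30 // 4 -> 7
print(check_token_limit(short_prompt))  # (True, 7)
print(check_token_limit(long_prompt))   # (False, 2000)

# Floor division rounds down, so inputs shorter than 4 characters estimate to 0.
print(estimate_tokens("abc"))           # 0
```

Because the estimate rounds down and code tends to use more tokens per character than prose, a caller that wants a safety margin could compare against a fraction of the limit rather than the full value. That margin is a design suggestion, not something this module implements.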