# MCP Server Testing Guide
This guide shows how to test-run the MCP server developed in `~/git/lovdata-ai`.
## Server Status ✅
The MCP server is fully operational:
- Database connected (lovdata-test)
- 769 laws, 20,254 provisions loaded
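- All core functionality working, except that no embeddings are loaded yet (`provisions_with_embeddings: 0` in the health check output), so vector search has nothing to match against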
## Testing Methods
### 1. Health Check (Recommended for Verification)
```bash
cd ~/git/lovdata-ai/python/lovdata-mcp
source .venv/bin/activate
PGDATABASE=lovdata-test python -c "
import asyncio
from lovdata_mcp.database import initialize_connection_pool
from lovdata_mcp.server import _health_check

async def test():
    # Initialize the connection pool before running the health check.
    await initialize_connection_pool()
    result = await _health_check()
    if result.get('status') == 'healthy':
        print('✅ Health check passed')
    print('Result:', result)

asyncio.run(test())
"
```
Expected output:
```
✅ Health check passed
Result: {'status': 'healthy', 'database': 'connected', 'laws': 769, 'provisions': 20254, 'provisions_with_embeddings': 0}
```
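To turn the health-check output into a pass/fail signal (for example in a CI smoke test), the returned dictionary can be checked field by field. A minimal sketch, assuming the result is a plain `dict` with the keys shown in the expected output above; `check_health` and its messages are hypothetical and not part of `lovdata_mcp`:

```python
from typing import Any


def check_health(result: dict[str, Any]) -> list[str]:
    """Return the problems found in a health-check result (empty list = OK).

    Hypothetical helper: assumes the keys seen in the expected output
    ('status', 'database', 'laws', 'provisions', 'provisions_with_embeddings').
    """
    problems: list[str] = []
    if result.get("status") != "healthy":
        problems.append(f"status is {result.get('status')!r}, expected 'healthy'")
    if result.get("database") != "connected":
        problems.append("database is not connected")
    if not result.get("laws"):
        problems.append("no laws loaded")
    if not result.get("provisions"):
        problems.append("no provisions loaded")
    if not result.get("provisions_with_embeddings"):
        # Vector search needs embeddings to compare the query against.
        problems.append("no provision embeddings; vector search has nothing to query")
    return problems
```

Applied to the expected output above, `check_health` reports exactly one problem: no embeddings are loaded.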
### 2. Run MCP Server (STDIO Mode)
```bash
cd ~/git/lovdata-ai/python/lovdata-mcp
source .venv/bin/activate
PGDATABASE=lovdata-test python -m lovdata_mcp.server
```
### 3. Run HTTP Server
```bash
cd ~/git/lovdata-ai/python/lovdata-mcp
source .venv/bin/activate
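PGDATABASE=lovdata-test uvicorn lovdata_mcp.http_server:create_app --factory --reload
```
The API will be available at `http://localhost:8000` (uvicorn's default host and port). The `--factory` flag tells uvicorn to call `create_app` to obtain the ASGI app, assuming it is an application factory as its name suggests; without the flag, uvicorn detects the factory but logs a warning.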
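Before pointing a client at the HTTP server, it can help to confirm that something is listening on the port. A minimal sketch using only the standard library; it checks TCP reachability only, not any particular API route (the server's routes are not documented here), and `wait_for_port` is a hypothetical helper name:

```python
import socket
import time


def wait_for_port(host: str = "localhost", port: int = 8000, timeout: float = 10.0) -> bool:
    """Poll until a TCP connection to host:port succeeds or `timeout` seconds pass."""
    deadline = time.monotonic() + timeout
    while time.monotonic() < deadline:
        try:
            # create_connection raises OSError (e.g. ConnectionRefusedError)
            # when nothing is listening on the port yet.
            with socket.create_connection((host, port), timeout=1.0):
                return True
        except OSError:
            time.sleep(0.2)
    return False
```

With no arguments, `wait_for_port()` checks `localhost:8000`, the address given above.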
### 4. Via Quint (configured in `.mcp.json`)
```bash
cd ~/git/lovdata-ai
~/.local/bin/quint-code serve
```
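To see which servers `.mcp.json` defines and with what command line, the file can be inspected directly. A minimal sketch, assuming the common MCP client layout with a top-level `mcpServers` object that maps names to `command`/`args` entries; this repository's actual `.mcp.json` contents are not shown here, and `list_mcp_servers` is hypothetical:

```python
import json
from pathlib import Path
from typing import Union


def list_mcp_servers(config_path: Union[str, Path]) -> dict[str, str]:
    """Map each server name in an .mcp.json file to its full command line.

    Assumes the layout {"mcpServers": {name: {"command": ..., "args": [...]}}}.
    """
    config = json.loads(Path(config_path).read_text(encoding="utf-8"))
    servers = config.get("mcpServers", {})
    return {
        name: " ".join(
            str(part) for part in [spec.get("command", ""), *spec.get("args", [])]
        ).strip()
        for name, spec in servers.items()
    }
```

Assuming the file sits at the repository root, call it as `list_mcp_servers(Path("~/git/lovdata-ai/.mcp.json").expanduser())`.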
### 5. Run Unit Tests
```bash
cd ~/git/lovdata-ai/python/lovdata-mcp
source .venv/bin/activate
python -m pytest tests/ -v
```
## Environment Setup
- **Database**: `PGDATABASE=lovdata-test` (test database available)
- **Virtual Environment**: `source .venv/bin/activate`
- **Working Directory**: `~/git/lovdata-ai/python/lovdata-mcp`
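The three settings above can be verified before running any of the commands. A minimal pre-flight sketch using only the standard library; `preflight` and `in_virtualenv` are hypothetical helpers, and they check only what this section lists (active virtual environment, `PGDATABASE`, working directory):

```python
import os
import sys
from pathlib import Path
from typing import Mapping, Optional

EXPECTED_DB = "lovdata-test"
EXPECTED_DIR = Path("~/git/lovdata-ai/python/lovdata-mcp").expanduser()


def in_virtualenv() -> bool:
    # Inside a venv, sys.prefix points at the venv; sys.base_prefix at the base install.
    return sys.prefix != sys.base_prefix


def preflight(env: Optional[Mapping[str, str]] = None, cwd: Optional[Path] = None) -> list[str]:
    """Return human-readable problems with the test environment (empty list = ready)."""
    env = os.environ if env is None else env
    cwd = Path.cwd() if cwd is None else cwd
    problems = []
    if not in_virtualenv():
        problems.append("virtual environment not active (run: source .venv/bin/activate)")
    if env.get("PGDATABASE") != EXPECTED_DB:
        problems.append(f"PGDATABASE is {env.get('PGDATABASE')!r}, expected {EXPECTED_DB!r}")
    if cwd.resolve() != EXPECTED_DIR.resolve():
        problems.append(f"working directory is {cwd}, expected {EXPECTED_DIR}")
    return problems
```

Calling `preflight()` with no arguments checks the current process environment and working directory.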
## Available MCP Tools
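The server provides these tools for Norwegian legal document retrieval (the findings below also use gazette tools not listed here, such as `search_gazette_provisions_fts`, `search_gazette_provisions_vector`, and the `search_all_provisions_*` family):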
- `get_law` - Retrieve law by slug/doc_id
- `get_provision` - Get single provision by ID
- `get_provisions_batch` - Batch retrieve multiple provisions
- `list_provisions` - List provisions with pagination
- `search_provisions_fts` - Full-text search (Norwegian)
- `search_provisions_vector` - Vector similarity search
- `health_check` - Database connectivity check
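When connecting a new client, a quick sanity check is to compare the tool names the server advertises against this list. A minimal sketch; how the advertised names are obtained depends on the client (for example, an MCP client's tool listing), and `missing_tools`/`extra_tools` are hypothetical helpers:

```python
from typing import Iterable

# Tool names as documented in the list above.
DOCUMENTED_TOOLS = frozenset({
    "get_law",
    "get_provision",
    "get_provisions_batch",
    "list_provisions",
    "search_provisions_fts",
    "search_provisions_vector",
    "health_check",
})


def missing_tools(advertised: Iterable[str]) -> list[str]:
    """Documented tools that the server did not advertise, sorted."""
    return sorted(DOCUMENTED_TOOLS - set(advertised))


def extra_tools(advertised: Iterable[str]) -> list[str]:
    """Advertised tools that this guide does not document, sorted."""
    return sorted(set(advertised) - DOCUMENTED_TOOLS)
```

Some extras are expected: the findings below use gazette tools such as `search_gazette_provisions_fts`.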
## Example Usage
Once running, you can test with MCP clients like Claude Desktop or custom scripts that connect via STDIO or HTTP.
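A custom STDIO client can be sketched with the official MCP Python SDK (the `mcp` package; it is an assumption that it is installed in the venv, since this guide does not list it as a dependency). The sketch launches the server as a subprocess with `PGDATABASE=lovdata-test`, calls `health_check` (assumed to take no arguments), and returns the advertised tool names:

```python
import asyncio
import os
import sys


def server_launch_spec() -> tuple[str, list[str], dict[str, str]]:
    """Command, arguments and environment for starting the server in STDIO mode."""
    env = {**os.environ, "PGDATABASE": "lovdata-test"}
    # sys.executable is the interpreter running this script (the venv's, when active).
    return sys.executable, ["-m", "lovdata_mcp.server"], env


async def list_server_tools() -> list[str]:
    """Start the server as a subprocess, call health_check, and return the tool names."""
    # Imported lazily so this module still loads where the mcp SDK is absent.
    from mcp import ClientSession, StdioServerParameters
    from mcp.client.stdio import stdio_client

    command, args, env = server_launch_spec()
    params = StdioServerParameters(command=command, args=args, env=env)
    async with stdio_client(params) as (read_stream, write_stream):
        async with ClientSession(read_stream, write_stream) as session:
            await session.initialize()
            listing = await session.list_tools()
            # Assumes health_check takes no arguments.
            health = await session.call_tool("health_check", arguments={})
            print("health_check:", health.content)
            return [tool.name for tool in listing.tools]


def main() -> None:
    print("Tools:", asyncio.run(list_server_tools()))
```

Call `main()` from `~/git/lovdata-ai/python/lovdata-mcp` with the virtual environment active so that `lovdata_mcp` is importable by the subprocess.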
---
# Testing Findings: Health Register Research Task
## Task Context
Research task: Identify Norwegian health registers with conflicting purpose clauses that make data consolidation difficult for the Norwegian Institute of Public Health (FHI).
## Key Gaps Identified
### 1. Missing Structured Gazette Provisions for Health Register Regulations
**Problem**: Health register regulations (forskrifter) are indexed in the `gazette_documents` table, but their individual provisions are not parsed into the `gazette_provisions` table.
**Specific cases**:
- Kreftregisterforskriften (SF/forskrift/2001-12-21-1477)
- Medisinsk fødselsregisterforskriften (SF/forskrift/2001-12-21-1483)
- Dødsårsaksregisterforskriften (SF/forskrift/2001-12-21-1476)
- Norsk pasientregisterforskriften (SF/forskrift/2007-12-07-1389)
- MSIS-forskriften (SF/forskrift/2003-06-20-740)
**Impact**:
- Cannot search for specific sections (e.g., § 1-1 on purpose/formål) within these regulations
- `search_gazette_provisions_fts()` returns zero results when filtering by these doc_ids
- Forces workarounds such as:
  - Using WebFetch to scrape Lovdata.no (blocked by the user in this session)
  - Relying only on the parent law (helseregisterloven) rather than the specific regulations
  - Relying on prior knowledge of the regulations' contents
**Expected behavior**:
Should be able to search for provisions like:
```python
search_gazette_provisions_fts(
    query_text="§ 1-1 formål",
    law_doc_ids=["SF/forskrift/2001-12-21-1477"],
)
```
It should then return the structured text of the purpose clause.
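Filters such as `law_doc_ids` use identifiers of the form `SF/forskrift/YYYY-MM-DD-NNNN`, as in the list of specific cases above. Because a mistyped ID and an unparsed regulation both produce zero results, validating the ID first separates the two causes. A sketch assuming exactly that format; `parse_regulation_id` and `RegulationId` are hypothetical:

```python
import re
from datetime import date
from typing import NamedTuple

# Assumed format: SF/forskrift/<issue date>-<number>
REGULATION_ID = re.compile(r"SF/forskrift/(\d{4})-(\d{2})-(\d{2})-(\d+)")


class RegulationId(NamedTuple):
    issued: date
    number: int


def parse_regulation_id(doc_id: str) -> RegulationId:
    """Split an ID like 'SF/forskrift/2001-12-21-1477' into issue date and number."""
    match = REGULATION_ID.fullmatch(doc_id)
    if match is None:
        raise ValueError(f"not a regulation doc_id: {doc_id!r}")
    year, month, day, number = (int(part) for part in match.groups())
    # date() raises ValueError for impossible dates such as 2001-13-40.
    return RegulationId(date(year, month, day), number)
```

All five health register IDs listed above parse with this pattern.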
### 2. Limited Search Capabilities for Regulation Content
**Problem**: The current MCP tools don't provide a way to:
- List all provisions within a specific regulation
- Navigate the hierarchical structure of a regulation (chapters, sections, paragraphs)
- Get a "table of contents" view of a regulation
**Current workaround**:
- Must know exact search terms
- Vector search helps but is imprecise for finding specific structural elements
**Suggested enhancement**:
Add a tool like `list_provisions_by_regulation(doc_id)` that returns the hierarchical structure.
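One possible shape for such a tool's output is a chapter-to-section tree built from flat provision rows. A minimal sketch, assuming each row carries hypothetical `chapter`, `section` and `title` fields and that rows arrive in document order (the real `gazette_provisions` columns are not documented here):

```python
from dataclasses import dataclass, field


@dataclass
class ProvisionRow:
    chapter: str  # e.g. "Kapittel 1" (hypothetical field)
    section: str  # e.g. "§ 1-1" (hypothetical field)
    title: str


@dataclass
class Chapter:
    name: str
    sections: list[tuple[str, str]] = field(default_factory=list)  # (section, title)


def build_toc(rows: list[ProvisionRow]) -> list[Chapter]:
    """Group rows into chapters, keeping the order in which chapters first appear."""
    chapters: dict[str, Chapter] = {}
    for row in rows:
        if row.chapter not in chapters:
            chapters[row.chapter] = Chapter(row.chapter)
        chapters[row.chapter].sections.append((row.section, row.title))
    return list(chapters.values())


def render_toc(chapters: list[Chapter]) -> str:
    """Render the tree as an indented table of contents."""
    lines = []
    for chapter in chapters:
        lines.append(chapter.name)
        lines.extend(f"  {section} {title}" for section, title in chapter.sections)
    return "\n".join(lines)
```

A dictionary keyed by chapter name keeps insertion order, so the table of contents follows the regulation's own ordering without a separate sort.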
### 3. Unclear Distinction Between Provisions and Gazette Provisions
**Observation**:
- `provisions` table: Contains law (lov) provisions - works well
- `gazette_provisions` table: Should contain regulation (forskrift) provisions - often empty/incomplete
- `search_all_provisions_*` functions search both, but it's not always clear which source returned results
**Suggestion**:
Better documentation or response metadata indicating:
- Which table the result came from
- Whether a regulation has been fully parsed or is just a document stub
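Such metadata could be attached when results from the two tables are merged. A minimal sketch; `SearchHit`, `SearchResponse`, `build_response` and the per-document `provision_counts` input are hypothetical and not part of the current tool responses:

```python
from dataclasses import dataclass
from typing import Iterable, Mapping

Hit = tuple[str, str, float]  # (doc_id, text, score) from one table's search


@dataclass(frozen=True)
class SearchHit:
    doc_id: str
    text: str
    score: float
    source_table: str  # "provisions" (laws) or "gazette_provisions" (regulations)


@dataclass
class SearchResponse:
    hits: list[SearchHit]
    # Requested doc_ids with no rows in gazette_provisions (document stubs only).
    unparsed_doc_ids: list[str]


def build_response(law_hits: Iterable[Hit], gazette_hits: Iterable[Hit],
                   requested_doc_ids: Iterable[str],
                   provision_counts: Mapping[str, int]) -> SearchResponse:
    """Merge hits from both tables, tag their origin, and flag unparsed regulations."""
    hits = [SearchHit(d, t, s, "provisions") for d, t, s in law_hits]
    hits += [SearchHit(d, t, s, "gazette_provisions") for d, t, s in gazette_hits]
    hits.sort(key=lambda hit: hit.score, reverse=True)
    unparsed = [d for d in requested_doc_ids if provision_counts.get(d, 0) == 0]
    return SearchResponse(hits, unparsed)
```

With this shape, a query filtered on an unparsed regulation returns an empty `hits` list together with that regulation's ID in `unparsed_doc_ids`, instead of an unexplained zero.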
## Recommendations
### High Priority
1. **Parse health register regulations into gazette_provisions** - These are critical legal documents for health data governance research
2. **Add bulk import for similar regulation families** - Many regulations under the same parent law likely have the same issue
### Medium Priority
3. **Add hierarchical navigation tools** - Help users explore regulation structure
4. **Improve search result metadata** - Clearly indicate source table and completeness
### Low Priority
5. **Add provision counting** - Quick check: "This regulation has 45 structured provisions" vs "This regulation is unparsed"
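The counting check could be a single `LEFT JOIN` from documents to provisions. A sketch over any DB-API 2.0 connection, assuming both tables share a `doc_id` column (the actual column names are not documented here); `provision_counts` and `describe` are hypothetical:

```python
# Assumed schema: gazette_documents(doc_id, ...) and gazette_provisions(doc_id, ...).
COUNT_SQL = """
    SELECT d.doc_id, COUNT(p.doc_id) AS n_provisions
    FROM gazette_documents AS d
    LEFT JOIN gazette_provisions AS p ON p.doc_id = d.doc_id
    GROUP BY d.doc_id
    ORDER BY d.doc_id
"""


def provision_counts(conn) -> dict[str, int]:
    """Count structured provisions per gazette document (0 = document stub only)."""
    cursor = conn.cursor()
    try:
        # COUNT(p.doc_id) skips the NULLs produced by unmatched LEFT JOIN rows.
        cursor.execute(COUNT_SQL)
        return {doc_id: n for doc_id, n in cursor.fetchall()}
    finally:
        cursor.close()


def describe(doc_id: str, n_provisions: int) -> str:
    """Human-readable parse status for one regulation."""
    if n_provisions == 0:
        return f"{doc_id}: unparsed (document stub only)"
    return f"{doc_id}: {n_provisions} structured provisions"
```

The query contains no parameters, so it runs unchanged on PostgreSQL (for example via `psycopg2.connect(dbname="lovdata-test")`). Since direct PostgreSQL access failed in this session, it would in practice run server-side.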
## Test Queries That Failed
```python
# All returned 0 results despite the regulations existing in gazette_documents:
search_gazette_provisions_fts(
    query_text="formål",
    law_doc_ids=["SF/forskrift/2001-12-21-1477"],
)

search_gazette_provisions_fts(
    query_text="Kreftregisteret",
    law_doc_ids=["SF/forskrift/2001-12-21-1477"],
)

search_gazette_provisions_vector(
    query_text="Kreftregisteret har til formål",
    law_doc_ids=["SF/forskrift/2001-12-21-1477"],
)
```
## Successful Workarounds Used
1. ✅ Searched in parent law (helseregisterloven) provisions table instead
2. ✅ Used helseregisterloven § 11 which lists all registers by name
3. ✅ Applied legal reasoning based on statutory framework rather than specific regulation text
4. ❌ Attempted WebFetch to lovdata.no (rejected by user)
5. ❌ Attempted direct PostgreSQL access (authentication failed)
## Conclusion
The Lovdata MCP server works well for **laws (lover)** but has significant gaps for **regulations (forskrifter)**. For legal research requiring detailed regulation analysis, the missing gazette_provisions data is a critical limitation.
**Severity**: High for regulatory compliance and governance research
**Affected domains**: Health law, environmental regulations, sector-specific rules where detailed regulation text is essential