From dc7f9fb048cab7c564d99440fd6cec2e1819b0ae Mon Sep 17 00:00:00 2001
From: =?UTF-8?q?Torbj=C3=B8rn=20Lindahl?=
Date: Mon, 12 Jan 2026 14:52:51 +0100
Subject: [PATCH] updated the skill with the latest rewrite

---
 .../skills/norwegian-legal-research/SKILL.md  | 199 +++++++++--------
 MCP_SERVER_TESTING.md                         | 205 ------------------
 README.md                                     |  21 +-
 3 files changed, 115 insertions(+), 310 deletions(-)
 delete mode 100644 MCP_SERVER_TESTING.md

diff --git a/.claude/skills/norwegian-legal-research/SKILL.md b/.claude/skills/norwegian-legal-research/SKILL.md
index 7a4e8cd..d296fa7 100644
--- a/.claude/skills/norwegian-legal-research/SKILL.md
+++ b/.claude/skills/norwegian-legal-research/SKILL.md
@@ -499,28 +499,40 @@ https://lovdata.no/lov/2005-06-17-64/§2-1
 ## System Integration
 ## System Integration
 
-### Lovdata API Endpoints
-**Available search capabilities:**
+### Lovdata MCP Tools
+**Available MCP tools:**
 
-**Vector Search Endpoints:**
-- `search_provisions_vector` - Semantic search across law provisions
-- `search_gazette_provisions_vector` - Semantic search across gazette provisions
-- `search_all_provisions_vector` - Combined semantic search across all provisions
+**Law-Level Tools:**
+- `get_law(doc_id)` - Retrieve a law or regulation by doc_id or short title
+- `list_laws(document_type, legal_area, limit, offset)` - List all laws/regulations with filtering
+- `search_laws_fulltext(query, limit)` - Full-text search in laws (Norwegian)
+- `search_laws_semantic(query, limit, threshold)` - Semantic search in laws
 
-**Full-Text Search Endpoints:**
-- `search_provisions` - Full-text search in law provisions
-- `search_gazette_provisions` - Full-text search in gazette provisions
-- `search_all_provisions` - Combined full-text search
+**Provision-Level Tools:**
+- `get_provision(provision_id)` - Get a single provision by ID
+- `get_provisions_batch(ids)` - Get multiple provisions by IDs (for RAG)
+- `list_provisions(law_id, limit, offset)` - List all provisions for a law
+- `search_provisions_fulltext(query, law_id, limit)` - Full-text search in provisions
+- `search_provisions_semantic(query, law_id, limit, threshold)` - Semantic search in provisions
 
-**Metadata Endpoints:**
-- `get_law_metadata` - Retrieve law document information
-- `get_provision_details` - Get detailed provision information
-- `get_cross_references` - Find related provisions
+**Cross-Reference Tools:**
+- `get_cross_references(provision_id)` - Get all cross-references from a provision
+- `get_cross_references_by_law(law_id)` - Get all provisions referencing a law
+- `resolve_reference(reference)` - Resolve legal reference (e.g., 'lov/2014-06-20-42/§8') to provision
+
+**Content Retrieval Tools:**
+- `get_law_content(doc_id)` - Get HTML content of a law/regulation
+- `get_law_text(doc_id)` - Get plain text content (without HTML tags)
+
+**System Tools:**
+- `health_check()` - Check database connection and statistics
 
 ### Query Formulation Strategies
 **Optimizing search effectiveness:**
 
 **Semantic Search Best Practices:**
+- Use `search_laws_semantic` for law-level searches across all laws
+- Use `search_provisions_semantic` for provision-level searches (optionally filtered by law_id)
 - Use complete Norwegian phrases rather than single keywords
 - Include contextual terms that describe the legal situation
 - Consider both formal legal terminology and common language
@@ -562,16 +574,17 @@ https://lovdata.no/lov/2005-06-17-64/§2-1
 **Effective system usage workflows:**
 
 **Initial Research Query:**
-1. Start with broad semantic search using `search_all_provisions_vector`
+1. Start with broad semantic search using `search_laws_semantic` for law-level search or `search_provisions_semantic` for provision-level search
 2. Review top 10-15 results for relevance
 3. Identify key provisions and their hierarchical context
-4. Follow cross-references to related provisions
+4. Follow cross-references using `get_cross_references` or `get_cross_references_by_law`
 
 **Focused Legal Analysis:**
-1. Use specific domain searches (laws vs regulations)
-2. Combine semantic and full-text searches for comprehensive coverage
-3. Track amendment history for each relevant provision
-4. Build citation network of related legal sources
+1. Use law-level searches (`search_laws_*`) for finding relevant legislation
+2. Use provision-level searches (`search_provisions_*`) for detailed analysis, optionally filtered by law_id
+3. Combine semantic and full-text searches for comprehensive coverage
+4. Track amendment history by checking the law metadata
+5. Build citation network using cross-reference tools
 
 **Amendment Impact Assessment:**
 1. Search for provisions using original enactment dates
@@ -579,92 +592,73 @@ https://lovdata.no/lov/2005-06-17-64/§2-1
 3. Assess cumulative impact of changes
 4. Verify effective dates and transitional provisions
 
-### Database Schema Integration
-**Understanding data relationships:**
+### Data Model Understanding
+**How the MCP server organizes legal data:**
 
-**Core Tables:**
-- `laws`: Document-level metadata (titles, dates, legal areas)
-- `provisions`: Individual legal provisions with hierarchical context
-- `gazette_provisions`: Gazette-published provisions
-- `cross_references`: Relationships between provisions
+**Two-Level Structure:**
+- **Laws/Regulations**: Document-level metadata (titles, dates, legal areas)
+- **Provisions**: Individual legal provisions within each law/regulation
 
-**Key Fields:**
-- `doc_id`: Unique identifier (e.g., 'NL/lov/2025-06-20-100')
+**Key Concepts:**
+- `doc_id`: Unique identifier (e.g., 'NL/lov/2014-06-20-42')
+- `korttittel`: Short title for easy reference (e.g., 'Pasientjournalloven')
+- `provision_id`: Unique identifier for individual provisions
 - `provision_text`: Full text of the legal provision
-- `embedding`: Vector representation for semantic search
-- `search_vector`: PostgreSQL tsvector for full-text search
 
-**Hierarchical Structure:**
+**Search Capabilities:**
+- **Semantic search**: Uses vector embeddings for conceptual matching
+- **Full-text search**: Uses PostgreSQL for keyword matching
+- **Cross-references**: Links between related provisions
+
+**Hierarchical Provision Structure:**
 - `book_num`, `chapter_num`, `article_num`, `paragraph_num`
-- Text-based numbering (supports "8a", "III", "første ledd")
+- Text-based numbering supports Norwegian conventions ("8a", "III", "første ledd")
 - Parent-child relationships between provisions
 
-### Document Type Routing
-**Choosing the correct MCP tool based on document identifiers:**
+### Document Identification
+**Working with Norwegian legal documents:**
 
-**Document Type Identification by Prefix:**
+**Document Identifiers:**
 
-| Prefix | Document Type | Source Table | Correct MCP Tool |
-|--------|--------------|--------------|------------------|
-| `NL/lov/...` | Current consolidated law | `lover` | `get_lov(identifier)` |
-| `NL/forskrift/...` | Current regulation | `lover` | `get_lov(identifier)` |
-| `SF/forskrift/...` | Central regulation (Sentrale Forskrifter) | `lover` | `get_lov(identifier)` |
-| `LTI/lov/...` | Historical gazette law | `lovtidender` | `get_full_lovtidend(lovtidend_id)` |
-| `LTI/forskrift/...` | Historical gazette regulation | `lovtidender` | `get_full_lovtidend(lovtidend_id)` |
+Norwegian legal documents use standardized identifiers:
+- **Full doc_id format:** `NL/lov/YYYY-MM-DD-NN` (e.g., `NL/lov/2014-06-20-42`)
+- **Short title (korttittel):** Common name (e.g., `Pasientjournalloven`)
 
-**Using Search Result Metadata:**
+**The MCP Tool:**
 
-When search tools return results, they include a `"source"` field:
+Use `get_law(doc_id)` for all law and regulation retrieval:
+- Accepts full doc_id: `get_law("NL/lov/2014-06-20-42")`
+- Accepts short title: `get_law("Pasientjournalloven")`
+- Works for both laws (lov) and regulations (forskrift)
 
-- `"source": "forskrift"` → Use provision tools (`get_forskrift`)
-- `"source": "lovtidendebestemmelse"` → Use gazette provision tools (`get_lovtidendebestemmelse`)
-- `"source": "lovtidend"` → Use `get_full_lovtidend(doc_id)`
-
-**Routing Decision Tree:**
+**Example Workflows:**
 
+**From Search Results:**
 ```
-1. Did you get doc_id from search results?
-   → Check the "source" field:
-     - "lovtidend" → get_full_lovtidend(doc_id)
-     - "forskrift" → get_forskrift(id) or get_lov(law_doc_id)
-     - "lovtidendebestemmelse" → get_lovtidendebestemmelse(id)
-
-2. Do you have a raw doc_id string?
-   → Check prefix:
-     - Starts with "LTI/" → get_full_lovtidend(doc_id)
-     - Starts with "NL/" or "SF/" → get_lov(doc_id)
-     - No prefix (slug) → get_lov(identifier)
-```
-
-**Common Errors to Avoid:**
-
-❌ **Wrong:** Using `get_lov("LTI/lov/2001-05-18-24")`
-✅ **Correct:** Using `get_full_lovtidend("LTI/lov/2001-05-18-24")`
-
-❌ **Wrong:** Ignoring the `"source"` field from search results
-✅ **Correct:** Routing based on `"source"` metadata
-
-**Example Workflow:**
-
-```
-1. Search: search_all_forskrifter_fts("helseregisterloven")
+1. Search: search_laws_fulltext("helseregisterloven")
 2. Results include: {
-     "doc_id": "LTI/lov/2001-05-18-24",
-     "source": "lovtidend",
+     "doc_id": "NL/lov/2014-06-20-42",
+     "korttittel": "Pasientjournalloven",
      ...
    }
-3. Correct tool: get_full_lovtidend("LTI/lov/2001-05-18-24")
+3. Retrieve full law: get_law("NL/lov/2014-06-20-42")
+   OR: get_law("Pasientjournalloven")
 ```
 
-**Note on Multiple Versions:**
+**From Known Reference:**
+```
+User asks about "pasientjournalloven §8"
+1. get_law("Pasientjournalloven") - get the law
+2. resolve_reference("lov/2014-06-20-42/§8") - resolve to provision
+```
 
-Norwegian legal documents may exist in multiple forms:
-- **Current consolidated laws** (`NL/lov/` prefix) - Recommended for current law
-- **Central regulations** (`SF/forskrift/` prefix) - Active regulations (not yet imported)
-- **Original gazette versions** (`LTI/` prefix) - Historical reference
-- **Amendment versions** (`LTI/` prefix) - Track legislative changes
+**Document Types:**
 
-Always verify you're using the appropriate version for your research purpose.
+Norwegian legal documents include:
+- **Lover (Laws)** - Framework legislation passed by Stortinget
+- **Forskrifter (Regulations)** - Implementing regulations from ministries/agencies
+
+Both are accessed through the same `get_law(doc_id)` tool.
 
 ### Search Result Processing
 **Converting system output to legal analysis:**
@@ -697,37 +691,38 @@ Always verify you're using the appropriate version for your research purpose.
 - Confirm hierarchical relationships are accurate
 
 **System Limitation Awareness:**
-- Vector search may miss highly specific legal terms
-- Full-text search requires exact keyword matches
-- Cross-reference data may not be comprehensive
-- Amendment tracking depends on data completeness
+- Semantic search may miss highly specific legal terms (use full-text search as backup)
+- Full-text search requires exact keyword matches (try synonyms)
+- Cross-reference data may not be comprehensive (verify important links)
+- Amendment tracking may have gaps (cross-check with lovdata.no)
 
 **Fallback Strategies:**
-- Combine multiple search approaches
-- Use full-text search for specific citations
-- Manually verify critical provisions
-- Consult official Lovdata website for complex cases
+- Combine `search_laws_semantic` and `search_laws_fulltext` for comprehensive coverage
+- Use `search_provisions_fulltext` for specific citations when semantic search is too broad
+- Use `resolve_reference` to validate section references
+- Consult official lovdata.no website for complex cases or ambiguous results
 
 ### Performance Optimization
 **Efficient system usage:**
 
 **Query Optimization:**
 - Use specific Norwegian legal terminology
-- Prefer semantic search for conceptual queries
-- Use full-text search for known citations
-- Limit result sets to manageable numbers
+- Use `search_laws_semantic` for conceptual queries about laws
+- Use `search_provisions_semantic` for detailed provision searches
+- Use `search_laws_fulltext` or `search_provisions_fulltext` for known citations
+- Limit result sets to manageable numbers (use limit parameter)
 
 **Batch Processing:**
+- Use `get_provisions_batch(ids)` to retrieve multiple provisions at once
 - Process related queries together
-- Cache frequently accessed provisions
 - Reuse search results across related questions
-- Build provision networks incrementally
+- Build provision networks using `get_cross_references` and `get_cross_references_by_law`
 
 **Result Filtering:**
-- Apply relevance thresholds based on similarity scores
-- Filter by amendment dates for currency
-- Prioritize by legal hierarchy (laws over regulations)
-- Focus on specific legal domains when known
+- Apply relevance thresholds using the threshold parameter in semantic searches
+- Filter by document_type ('lov' or 'forskrift') in `list_laws`
+- Filter by legal_area in `list_laws` when domain is known
+- Use law_id parameter in provision searches to focus on specific laws
 
 ## Quality Assurance
 ## Troubleshooting
diff --git a/MCP_SERVER_TESTING.md b/MCP_SERVER_TESTING.md
deleted file mode 100644
index 2585c15..0000000
--- a/MCP_SERVER_TESTING.md
+++ /dev/null
@@ -1,205 +0,0 @@
-# MCP Server Testing Guide
-
-This guide shows how to test-run the MCP server developed in `~/git/lovdata-ai`.
-
-## Server Status ✅
-
-The MCP server is fully operational:
-- Database connected (lovdata-test)
-- 769 laws, 20,254 provisions loaded
-- All core functionality working
-
-## Testing Methods
-
-### 1. Health Check (Recommended for Verification)
-
-```bash
-cd ~/git/lovdata-ai/python/lovdata-mcp
-source .venv/bin/activate
-PGDATABASE=lovdata-test python -c "
-import asyncio
-from lovdata_mcp.database import initialize_connection_pool
-from lovdata_mcp.server import _health_check
-
-async def test():
-    await initialize_connection_pool()
-    result = await _health_check()
-    print('Status:', result)
-
-asyncio.run(test())
-"
-```
-
-Expected output:
-```
-✅ Health check passed
-Result: {'status': 'healthy', 'database': 'connected', 'laws': 769, 'provisions': 20254, 'provisions_with_embeddings': 0}
-```
-
-### 2. Run MCP Server (STDIO Mode)
-
-```bash
-cd ~/git/lovdata-ai/python/lovdata-mcp
-source .venv/bin/activate
-PGDATABASE=lovdata-test python -m lovdata_mcp.server
-```
-
-### 3. Run HTTP Server
-
-```bash
-cd ~/git/lovdata-ai/python/lovdata-mcp
-source .venv/bin/activate
-PGDATABASE=lovdata-test uvicorn lovdata_mcp.http_server:create_app --reload
-```
-
-API will be available at `http://localhost:8000`
-
-### 4. Via Quint (Configured in .mcp.json)
-
-```bash
-cd ~/git/lovdata-ai
-~/.local/bin/quint-code serve
-```
-
-### 5. Run Unit Tests
-
-```bash
-cd ~/git/lovdata-ai/python/lovdata-mcp
-source .venv/bin/activate
-python -m pytest tests/ -v
-```
-
-## Environment Setup
-
-- **Database**: `PGDATABASE=lovdata-test` (test database available)
-- **Virtual Environment**: `source .venv/bin/activate`
-- **Working Directory**: `~/git/lovdata-ai/python/lovdata-mcp`
-
-## Available MCP Tools
-
-The server provides these tools for Norwegian legal document retrieval:
-
-- `get_law` - Retrieve law by slug/doc_id
-- `get_provision` - Get single provision by ID
-- `get_provisions_batch` - Batch retrieve multiple provisions
-- `list_provisions` - List provisions with pagination
-- `search_provisions_fts` - Full-text search (Norwegian)
-- `search_provisions_vector` - Vector similarity search
-- `health_check` - Database connectivity check
-
-## Example Usage
-
-Once running, you can test with MCP clients like Claude Desktop or custom scripts that connect via STDIO or HTTP.
-
----
-
-# Testing Findings: Health Register Research Task
-
-## Task Context
-Research task: Identify Norwegian health registers with conflicting purpose clauses that make data consolidation difficult for the Norwegian Institute of Public Health (FHI).
-
-## Key Gaps Identified
-
-### 1. Missing Structured Gazette Provisions for Health Register Regulations
-
-**Problem**: Health register regulations (forskrifter) are indexed as `gazette_documents` but their individual provisions are not parsed into the `gazette_provisions` table.
-
-**Specific cases**:
-- Kreftregisterforskriften (SF/forskrift/2001-12-21-1477)
-- Medisinsk fødselsregisterforskriften (SF/forskrift/2001-12-21-1483)
-- Dødsårsaksregisterforskriften (SF/forskrift/2001-12-21-1476)
-- Norsk pasientregisterforskriften (SF/forskrift/2007-12-07-1389)
-- MSIS-forskriften (SF/forskrift/2003-06-20-740)
-
-**Impact**:
-- Cannot search for specific sections (e.g., § 1-1 on purpose/formål) within these regulations
-- `search_gazette_provisions_fts()` returns zero results when filtering by these doc_ids
-- Forces workarounds like:
-  - Using WebFetch to scrape Lovdata.no (blocked by user in this session)
-  - Relying only on parent law (helseregisterloven) rather than specific regulations
-  - Manual knowledge of regulation contents
-
-**Expected behavior**:
-Should be able to search for provisions like:
-```python
-search_gazette_provisions_fts(
-    query_text="§ 1-1 formål",
-    law_doc_ids=["SF/forskrift/2001-12-21-1477"]
-)
-```
-
-And retrieve the structured purpose clause text.
-
-### 2. Limited Search Capabilities for Regulation Content
-
-**Problem**: The current MCP tools don't provide a way to:
-- List all provisions within a specific regulation
-- Navigate the hierarchical structure of a regulation (chapters, sections, paragraphs)
-- Get a "table of contents" view of a regulation
-
-**Current workaround**:
-- Must know exact search terms
-- Vector search helps but is imprecise for finding specific structural elements
-
-**Suggested enhancement**:
-Add a tool like `list_provisions_by_regulation(doc_id)` that returns the hierarchical structure.
-
-### 3. Unclear Distinction Between Provisions and Gazette Provisions
-
-**Observation**:
-- `provisions` table: Contains law (lov) provisions - works well
-- `gazette_provisions` table: Should contain regulation (forskrift) provisions - often empty/incomplete
-- `search_all_provisions_*` functions search both, but it's not always clear which source returned results
-
-**Suggestion**:
-Better documentation or response metadata indicating:
-- Which table the result came from
-- Whether a regulation has been fully parsed or is just a document stub
-
-## Recommendations
-
-### High Priority
-1. **Parse health register regulations into gazette_provisions** - These are critical legal documents for health data governance research
-2. **Add bulk import for similar regulation families** - Many regulations under same parent law likely have same issue
-
-### Medium Priority
-3. **Add hierarchical navigation tools** - Help users explore regulation structure
-4. **Improve search result metadata** - Clearly indicate source table and completeness
-
-### Low Priority
-5. **Add provision counting** - Quick check: "This regulation has 45 structured provisions" vs "This regulation is unparsed"
-
-## Test Queries That Failed
-
-```python
-# All returned 0 results despite regulations existing in gazette_documents:
-search_gazette_provisions_fts(
-    query_text="formål",
-    law_doc_ids=["SF/forskrift/2001-12-21-1477"]
-)
-
-search_gazette_provisions_fts(
-    query_text="Kreftregisteret",
-    law_doc_ids=["SF/forskrift/2001-12-21-1477"]
-)
-
-search_gazette_provisions_vector(
-    query_text="Kreftregisteret har til formål",
-    law_doc_ids=["SF/forskrift/2001-12-21-1477"]
-)
-```
-
-## Successful Workarounds Used
-
-1. ✅ Searched in parent law (helseregisterloven) provisions table instead
-2. ✅ Used helseregisterloven § 11 which lists all registers by name
-3. ✅ Applied legal reasoning based on statutory framework rather than specific regulation text
-4. ❌ Attempted WebFetch to lovdata.no (rejected by user)
-5. ❌ Attempted direct PostgreSQL access (authentication failed)
-
-## Conclusion
-
-The Lovdata MCP server works well for **laws (lover)** but has significant gaps for **regulations (forskrifter)**. For legal research requiring detailed regulation analysis, the missing gazette_provisions data is a critical limitation.
-
-**Severity**: High for regulatory compliance and governance research
-**Affected domains**: Health law, environmental regulations, sector-specific rules where detailed regulation text is essential
diff --git a/README.md b/README.md
index 4489c70..811b074 100644
--- a/README.md
+++ b/README.md
@@ -54,13 +54,28 @@ rm -rf .claude
 
 When the server is running, you can query:
 
+**Laws & Provisions:**
 - `get_law` - Retrieve Norwegian law by slug/doc_id
 - `get_provision` - Get single provision by ID
 - `get_provisions_batch` - Batch retrieve multiple provisions
 - `list_provisions` - List provisions with pagination
-- `search_provisions_fts` - Full-text search (Norwegian)
-- `search_provisions_vector` - Vector similarity search
-- `health_check` - Database connectivity check
+- `list_laws` - List available laws
+- `list_legal_areas` - List legal areas/categories
+
+**Search:**
+- `search_laws` - Search laws by keyword
+- `search_provisions_fts` - Full-text search provisions (Norwegian)
+- `search_provisions_vector` - Vector similarity search for provisions
+- `search_all_provisions_fts` - Full-text search across all provisions
+- `search_all_provisions_vector` - Vector search across all provisions
+- `search_lover` - Search "lover" (laws)
+
+**Gazettes:**
+- `search_gazettes_fts` - Full-text search gazettes
+- `get_full_gazette` - Retrieve full gazette document
+
+**Legacy:**
+- `search_all_forskrifter_fts` - Full-text search (deprecated terminology)
 
 ### Database