updated the skill with the latest rewrite

2026-01-12 14:52:51 +01:00
parent d2f4fec561
commit dc7f9fb048
3 changed files with 115 additions and 310 deletions


@@ -499,28 +499,40 @@ https://lovdata.no/lov/2005-06-17-64/§2-1
## System Integration ## System Integration
### Lovdata API Endpoints ### Lovdata MCP Tools
**Available search capabilities:** **Available MCP tools:**
**Vector Search Endpoints:** **Law-Level Tools:**
- `search_provisions_vector` - Semantic search across law provisions - `get_law(doc_id)` - Retrieve a law or regulation by doc_id or short title
- `search_gazette_provisions_vector` - Semantic search across gazette provisions - `list_laws(document_type, legal_area, limit, offset)` - List all laws/regulations with filtering
- `search_all_provisions_vector` - Combined semantic search across all provisions - `search_laws_fulltext(query, limit)` - Full-text search in laws (Norwegian)
- `search_laws_semantic(query, limit, threshold)` - Semantic search in laws
**Full-Text Search Endpoints:** **Provision-Level Tools:**
- `search_provisions` - Full-text search in law provisions - `get_provision(provision_id)` - Get a single provision by ID
- `search_gazette_provisions` - Full-text search in gazette provisions - `get_provisions_batch(ids)` - Get multiple provisions by IDs (for RAG)
- `search_all_provisions` - Combined full-text search - `list_provisions(law_id, limit, offset)` - List all provisions for a law
- `search_provisions_fulltext(query, law_id, limit)` - Full-text search in provisions
- `search_provisions_semantic(query, law_id, limit, threshold)` - Semantic search in provisions
**Metadata Endpoints:** **Cross-Reference Tools:**
- `get_law_metadata` - Retrieve law document information - `get_cross_references(provision_id)` - Get all cross-references from a provision
- `get_provision_details` - Get detailed provision information - `get_cross_references_by_law(law_id)` - Get all provisions referencing a law
- `get_cross_references` - Find related provisions - `resolve_reference(reference)` - Resolve legal reference (e.g., 'lov/2014-06-20-42/§8') to provision
**Content Retrieval Tools:**
- `get_law_content(doc_id)` - Get HTML content of a law/regulation
- `get_law_text(doc_id)` - Get plain text content (without HTML tags)
**System Tools:**
- `health_check()` - Check database connection and statistics
### Query Formulation Strategies ### Query Formulation Strategies
**Optimizing search effectiveness:** **Optimizing search effectiveness:**
**Semantic Search Best Practices:** **Semantic Search Best Practices:**
- Use `search_laws_semantic` for law-level searches across all laws
- Use `search_provisions_semantic` for provision-level searches (optionally filtered by law_id)
- Use complete Norwegian phrases rather than single keywords - Use complete Norwegian phrases rather than single keywords
- Include contextual terms that describe the legal situation - Include contextual terms that describe the legal situation
- Consider both formal legal terminology and common language - Consider both formal legal terminology and common language
@@ -562,16 +574,17 @@ https://lovdata.no/lov/2005-06-17-64/§2-1
**Effective system usage workflows:** **Effective system usage workflows:**
**Initial Research Query:** **Initial Research Query:**
1. Start with broad semantic search using `search_all_provisions_vector` 1. Start with broad semantic search using `search_laws_semantic` for law-level search or `search_provisions_semantic` for provision-level search
2. Review top 10-15 results for relevance 2. Review top 10-15 results for relevance
3. Identify key provisions and their hierarchical context 3. Identify key provisions and their hierarchical context
4. Follow cross-references to related provisions 4. Follow cross-references using `get_cross_references` or `get_cross_references_by_law`
**Focused Legal Analysis:** **Focused Legal Analysis:**
1. Use specific domain searches (laws vs regulations) 1. Use law-level searches (`search_laws_*`) for finding relevant legislation
2. Combine semantic and full-text searches for comprehensive coverage 2. Use provision-level searches (`search_provisions_*`) for detailed analysis, optionally filtered by law_id
3. Track amendment history for each relevant provision 3. Combine semantic and full-text searches for comprehensive coverage
4. Build citation network of related legal sources 4. Track amendment history by checking the law metadata
5. Build citation network using cross-reference tools
**Amendment Impact Assessment:** **Amendment Impact Assessment:**
1. Search for provisions using original enactment dates 1. Search for provisions using original enactment dates
@@ -579,92 +592,73 @@ https://lovdata.no/lov/2005-06-17-64/§2-1
3. Assess cumulative impact of changes 3. Assess cumulative impact of changes
4. Verify effective dates and transitional provisions 4. Verify effective dates and transitional provisions
### Database Schema Integration ### Data Model Understanding
**Understanding data relationships:** **How the MCP server organizes legal data:**
**Core Tables:** **Two-Level Structure:**
- `laws`: Document-level metadata (titles, dates, legal areas) - **Laws/Regulations**: Document-level metadata (titles, dates, legal areas)
- `provisions`: Individual legal provisions with hierarchical context - **Provisions**: Individual legal provisions within each law/regulation
- `gazette_provisions`: Gazette-published provisions
- `cross_references`: Relationships between provisions
**Key Fields:** **Key Concepts:**
- `doc_id`: Unique identifier (e.g., 'NL/lov/2025-06-20-100') - `doc_id`: Unique identifier (e.g., 'NL/lov/2014-06-20-42')
- `korttittel`: Short title for easy reference (e.g., 'Pasientjournalloven')
- `provision_id`: Unique identifier for individual provisions
- `provision_text`: Full text of the legal provision - `provision_text`: Full text of the legal provision
- `embedding`: Vector representation for semantic search
- `search_vector`: PostgreSQL tsvector for full-text search
**Hierarchical Structure:** **Search Capabilities:**
- **Semantic search**: Uses vector embeddings for conceptual matching
- **Full-text search**: Uses PostgreSQL for keyword matching
- **Cross-references**: Links between related provisions
**Hierarchical Provision Structure:**
- `book_num`, `chapter_num`, `article_num`, `paragraph_num` - `book_num`, `chapter_num`, `article_num`, `paragraph_num`
- Text-based numbering (supports "8a", "III", "første ledd") - Text-based numbering supports Norwegian conventions ("8a", "III", "første ledd")
- Parent-child relationships between provisions - Parent-child relationships between provisions
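Because the numbering is text-based, labels such as "8a" do not sort correctly as plain strings ("10" would come before "8"). The following is a minimal sketch of a natural sort key, assuming labels consist of digits with an optional single-letter suffix. `section_sort_key` is a hypothetical helper, not part of the MCP server; Roman numerals and labels such as "første ledd" are deliberately left unparsed and sorted last.

```python
import re

# Hypothetical helper for illustration; not part of the MCP server.
# Assumes labels look like "8" or "8a" (digits plus an optional letter).
_SECTION_RE = re.compile(r"^(\d+)\s*([a-zæøå]?)$", re.IGNORECASE)


def section_sort_key(label: str) -> tuple[int, int, str]:
    """Return a key that orders '8' < '8a' < '10'; unparsed labels go last."""
    match = _SECTION_RE.match(label.strip())
    if match is None:
        # Roman numerals, "første ledd", etc. are not interpreted here.
        return (1, 0, label)
    return (0, int(match.group(1)), match.group(2).lower())


labels = ["10", "8a", "III", "8", "2"]
print(sorted(labels, key=section_sort_key))
```

Grouping unparsed labels at the end keeps the sketch honest about what it does not understand instead of guessing an order for them.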
### Document Type Routing ### Document Identification
**Choosing the correct MCP tool based on document identifiers:** **Working with Norwegian legal documents:**
**Document Type Identification by Prefix:** **Document Identifiers:**
| Prefix | Document Type | Source Table | Correct MCP Tool | Norwegian legal documents use standardized identifiers:
|--------|--------------|--------------|------------------| - **Full doc_id format:** `NL/lov/YYYY-MM-DD-NN` (e.g., `NL/lov/2014-06-20-42`)
| `NL/lov/...` | Current consolidated law | `lover` | `get_lov(identifier)` | - **Short title (korttittel):** Common name (e.g., `Pasientjournalloven`)
| `NL/forskrift/...` | Current regulation | `lover` | `get_lov(identifier)` |
| `SF/forskrift/...` | Central regulation (Sentrale Forskrifter) | `lover` | `get_lov(identifier)` |
| `LTI/lov/...` | Historical gazette law | `lovtidender` | `get_full_lovtidend(lovtidend_id)` |
| `LTI/forskrift/...` | Historical gazette regulation | `lovtidender` | `get_full_lovtidend(lovtidend_id)` |
**Using Search Result Metadata:** **The MCP Tool:**
When search tools return results, they include a `"source"` field: Use `get_law(doc_id)` for all law and regulation retrieval:
- Accepts full doc_id: `get_law("NL/lov/2014-06-20-42")`
- Accepts short title: `get_law("Pasientjournalloven")`
- Works for both laws (lov) and regulations (forskrift)
- `"source": "forskrift"` → Use provision tools (`get_forskrift`) **Example Workflows:**
- `"source": "lovtidendebestemmelse"` → Use gazette provision tools (`get_lovtidendebestemmelse`)
- `"source": "lovtidend"` → Use `get_full_lovtidend(doc_id)`
**Routing Decision Tree:**
**From Search Results:**
``` ```
1. Did you get doc_id from search results? 1. Search: search_laws_fulltext("helseregisterloven")
→ Check the "source" field:
- "lovtidend" → get_full_lovtidend(doc_id)
- "forskrift" → get_forskrift(id) or get_lov(law_doc_id)
- "lovtidendebestemmelse" → get_lovtidendebestemmelse(id)
2. Do you have a raw doc_id string?
→ Check prefix:
- Starts with "LTI/" → get_full_lovtidend(doc_id)
- Starts with "NL/" or "SF/" → get_lov(doc_id)
- No prefix (slug) → get_lov(identifier)
```
**Common Errors to Avoid:**
**Wrong:** Using `get_lov("LTI/lov/2001-05-18-24")`
**Correct:** Using `get_full_lovtidend("LTI/lov/2001-05-18-24")`
**Wrong:** Ignoring the `"source"` field from search results
**Correct:** Routing based on `"source"` metadata
**Example Workflow:**
```
1. Search: search_all_forskrifter_fts("helseregisterloven")
2. Results include: { 2. Results include: {
"doc_id": "LTI/lov/2001-05-18-24", "doc_id": "NL/lov/2014-06-20-42",
"source": "lovtidend", "korttittel": "Pasientjournalloven",
... ...
} }
3. Correct tool: get_full_lovtidend("LTI/lov/2001-05-18-24") 3. Retrieve full law: get_law("NL/lov/2014-06-20-42")
OR: get_law("Pasientjournalloven")
``` ```
**Note on Multiple Versions:** **From Known Reference:**
```
User asks about "pasientjournalloven §8"
1. get_law("Pasientjournalloven") - get the law
2. resolve_reference("lov/2014-06-20-42/§8") - resolve to provision
```
Norwegian legal documents may exist in multiple forms: **Document Types:**
- **Current consolidated laws** (`NL/lov/` prefix) - Recommended for current law
- **Central regulations** (`SF/forskrift/` prefix) - Active regulations (not yet imported)
- **Original gazette versions** (`LTI/` prefix) - Historical reference
- **Amendment versions** (`LTI/` prefix) - Track legislative changes
Always verify you're using the appropriate version for your research purpose. Norwegian legal documents include:
- **Lover (Laws)** - Framework legislation passed by Stortinget
- **Forskrifter (Regulations)** - Implementing regulations from ministries/agencies
Both are accessed through the same `get_law(doc_id)` tool.
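Since `get_law` accepts either a full doc_id or a short title, a client may want to tell the two apart before calling it. The sketch below is one possible way to do that, assuming the doc_id pattern can be inferred from the `NL/lov/2014-06-20-42` example and that regulations use `NL/forskrift/...`. `classify_identifier` and the regular expression are illustrative assumptions, not part of the MCP server.

```python
import re

# Assumed pattern, inferred from the example 'NL/lov/2014-06-20-42'.
# The trailing serial number is matched as one or more digits.
DOC_ID_RE = re.compile(r"^NL/(lov|forskrift)/(\d{4})-(\d{2})-(\d{2})-(\d+)$")


def classify_identifier(identifier: str) -> dict:
    """Classify a string as a full doc_id or a short title (korttittel)."""
    value = identifier.strip()
    match = DOC_ID_RE.match(value)
    if match is None:
        # Anything that is not a doc_id is treated as a short title.
        return {"kind": "short_title", "value": value}
    doc_type, year, month, day, number = match.groups()
    return {
        "kind": "doc_id",
        "document_type": doc_type,
        "date": f"{year}-{month}-{day}",
        "number": int(number),
    }


print(classify_identifier("NL/lov/2014-06-20-42"))
print(classify_identifier("Pasientjournalloven"))
```

Either form can be passed to `get_law` unchanged; the classification is only useful for logging or for validating user input up front.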
### Search Result Processing ### Search Result Processing
**Converting system output to legal analysis:** **Converting system output to legal analysis:**
@@ -697,37 +691,38 @@ Always verify you're using the appropriate version for your research purpose.
- Confirm hierarchical relationships are accurate - Confirm hierarchical relationships are accurate
**System Limitation Awareness:** **System Limitation Awareness:**
- Vector search may miss highly specific legal terms - Semantic search may miss highly specific legal terms (use full-text search as backup)
- Full-text search requires exact keyword matches - Full-text search requires exact keyword matches (try synonyms)
- Cross-reference data may not be comprehensive - Cross-reference data may not be comprehensive (verify important links)
- Amendment tracking depends on data completeness - Amendment tracking may have gaps (cross-check with lovdata.no)
**Fallback Strategies:** **Fallback Strategies:**
- Combine multiple search approaches - Combine `search_laws_semantic` and `search_laws_fulltext` for comprehensive coverage
- Use full-text search for specific citations - Use `search_provisions_fulltext` for specific citations when semantic search is too broad
- Manually verify critical provisions - Use `resolve_reference` to validate section references
- Consult official Lovdata website for complex cases - Consult official lovdata.no website for complex cases or ambiguous results
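Combining the two search modes could look roughly like the sketch below. It assumes each search returns a list of dicts carrying a `doc_id` key, as in the example result shown earlier; `merge_results` and the `found_by` field are hypothetical, and the sample IDs are placeholders rather than real documents.

```python
def merge_results(semantic: list[dict], fulltext: list[dict], key: str = "doc_id") -> list[dict]:
    """Merge two hit lists, deduplicate by key, and rank hits found by both first."""
    merged: dict[str, dict] = {}
    for source, hits in (("semantic", semantic), ("fulltext", fulltext)):
        for hit in hits:
            # Keep the first copy of each hit and record which searches found it.
            entry = merged.setdefault(hit[key], {**hit, "found_by": []})
            entry["found_by"].append(source)
    # sorted() is stable, so hits with equal counts keep their original order.
    return sorted(merged.values(), key=lambda entry: -len(entry["found_by"]))


# Placeholder hits standing in for real tool output.
semantic_hits = [{"doc_id": "doc-a"}, {"doc_id": "doc-b"}]
fulltext_hits = [{"doc_id": "doc-b"}, {"doc_id": "doc-c"}]

for hit in merge_results(semantic_hits, fulltext_hits):
    print(hit["doc_id"], hit["found_by"])
```

Hits returned by both searches are usually the strongest candidates, which is why they are ranked first here; other ranking schemes are equally plausible.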
### Performance Optimization ### Performance Optimization
**Efficient system usage:** **Efficient system usage:**
**Query Optimization:** **Query Optimization:**
- Use specific Norwegian legal terminology - Use specific Norwegian legal terminology
- Prefer semantic search for conceptual queries - Use `search_laws_semantic` for conceptual queries about laws
- Use full-text search for known citations - Use `search_provisions_semantic` for detailed provision searches
- Limit result sets to manageable numbers - Use `search_laws_fulltext` or `search_provisions_fulltext` for known citations
- Limit result sets to manageable numbers (use limit parameter)
**Batch Processing:** **Batch Processing:**
- Use `get_provisions_batch(ids)` to retrieve multiple provisions at once
- Process related queries together - Process related queries together
- Cache frequently accessed provisions
- Reuse search results across related questions - Reuse search results across related questions
- Build provision networks incrementally - Build provision networks using `get_cross_references` and `get_cross_references_by_law`
**Result Filtering:** **Result Filtering:**
- Apply relevance thresholds based on similarity scores - Apply relevance thresholds using the threshold parameter in semantic searches
- Filter by amendment dates for currency - Filter by document_type ('lov' or 'forskrift') in `list_laws`
- Prioritize by legal hierarchy (laws over regulations) - Filter by legal_area in `list_laws` when domain is known
- Focus on specific legal domains when known - Use law_id parameter in provision searches to focus on specific laws
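When many provision IDs have been collected, they can be deduplicated and split into chunks before calling `get_provisions_batch(ids)`. This is a minimal sketch; the chunk size of 50 is an arbitrary assumption (the source states no batch limit), and `chunk_ids` is a hypothetical helper.

```python
from typing import Iterator, Sequence


def chunk_ids(ids: Sequence[str], size: int = 50) -> Iterator[list[str]]:
    """Yield deduplicated IDs in order-preserving chunks of at most `size`."""
    if size < 1:
        raise ValueError("size must be at least 1")
    # dict.fromkeys removes duplicates while keeping first-seen order.
    unique = list(dict.fromkeys(ids))
    for start in range(0, len(unique), size):
        yield unique[start:start + size]


# Placeholder IDs; each chunk would be passed to get_provisions_batch.
provision_ids = ["p1", "p2", "p2", "p3", "p4", "p5"]
for batch in chunk_ids(provision_ids, size=2):
    print(batch)
```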
## Quality Assurance ## Quality Assurance
## Troubleshooting ## Troubleshooting


@@ -1,205 +0,0 @@
# MCP Server Testing Guide
This guide shows how to test-run the MCP server developed in `~/git/lovdata-ai`.
## Server Status ✅
The MCP server is fully operational:
- Database connected (lovdata-test)
- 769 laws, 20,254 provisions loaded
- All core functionality working
## Testing Methods
### 1. Health Check (Recommended for Verification)
```bash
cd ~/git/lovdata-ai/python/lovdata-mcp
source .venv/bin/activate
PGDATABASE=lovdata-test python -c "
import asyncio
from lovdata_mcp.database import initialize_connection_pool
from lovdata_mcp.server import _health_check
async def test():
    await initialize_connection_pool()
    result = await _health_check()
    print('Status:', result)

asyncio.run(test())
"
```
Expected output:
```
✅ Health check passed
Status: {'status': 'healthy', 'database': 'connected', 'laws': 769, 'provisions': 20254, 'provisions_with_embeddings': 0}
```
### 2. Run MCP Server (STDIO Mode)
```bash
cd ~/git/lovdata-ai/python/lovdata-mcp
source .venv/bin/activate
PGDATABASE=lovdata-test python -m lovdata_mcp.server
```
### 3. Run HTTP Server
```bash
cd ~/git/lovdata-ai/python/lovdata-mcp
source .venv/bin/activate
PGDATABASE=lovdata-test uvicorn lovdata_mcp.http_server:create_app --reload
```
API will be available at `http://localhost:8000`
### 4. Via Quint (Configured in .mcp.json)
```bash
cd ~/git/lovdata-ai
~/.local/bin/quint-code serve
```
### 5. Run Unit Tests
```bash
cd ~/git/lovdata-ai/python/lovdata-mcp
source .venv/bin/activate
python -m pytest tests/ -v
```
## Environment Setup
- **Database**: `PGDATABASE=lovdata-test` (test database available)
- **Virtual Environment**: `source .venv/bin/activate`
- **Working Directory**: `~/git/lovdata-ai/python/lovdata-mcp`
## Available MCP Tools
The server provides these tools for Norwegian legal document retrieval:
- `get_law` - Retrieve law by slug/doc_id
- `get_provision` - Get single provision by ID
- `get_provisions_batch` - Batch retrieve multiple provisions
- `list_provisions` - List provisions with pagination
- `search_provisions_fts` - Full-text search (Norwegian)
- `search_provisions_vector` - Vector similarity search
- `health_check` - Database connectivity check
## Example Usage
Once running, you can test with MCP clients like Claude Desktop or custom scripts that connect via STDIO or HTTP.
---
# Testing Findings: Health Register Research Task
## Task Context
Research task: Identify Norwegian health registers with conflicting purpose clauses that make data consolidation difficult for the Norwegian Institute of Public Health (FHI).
## Key Gaps Identified
### 1. Missing Structured Gazette Provisions for Health Register Regulations
**Problem**: Health register regulations (forskrifter) are indexed as `gazette_documents` but their individual provisions are not parsed into the `gazette_provisions` table.
**Specific cases**:
- Kreftregisterforskriften (SF/forskrift/2001-12-21-1477)
- Medisinsk fødselsregisterforskriften (SF/forskrift/2001-12-21-1483)
- Dødsårsaksregisterforskriften (SF/forskrift/2001-12-21-1476)
- Norsk pasientregisterforskriften (SF/forskrift/2007-12-07-1389)
- MSIS-forskriften (SF/forskrift/2003-06-20-740)
**Impact**:
- Cannot search for specific sections (e.g., § 1-1 on purpose/formål) within these regulations
- `search_gazette_provisions_fts()` returns zero results when filtering by these doc_ids
- Forces workarounds like:
- Using WebFetch to scrape Lovdata.no (blocked by user in this session)
- Relying only on parent law (helseregisterloven) rather than specific regulations
- Manual knowledge of regulation contents
**Expected behavior**:
Should be able to search for provisions like:
```python
search_gazette_provisions_fts(
    query_text="§ 1-1 formål",
    law_doc_ids=["SF/forskrift/2001-12-21-1477"]
)
```
And retrieve the structured purpose clause text.
### 2. Limited Search Capabilities for Regulation Content
**Problem**: The current MCP tools don't provide a way to:
- List all provisions within a specific regulation
- Navigate the hierarchical structure of a regulation (chapters, sections, paragraphs)
- Get a "table of contents" view of a regulation
**Current workaround**:
- Must know exact search terms
- Vector search helps but is imprecise for finding specific structural elements
**Suggested enhancement**:
Add a tool like `list_provisions_by_regulation(doc_id)` that returns the hierarchical structure.
### 3. Unclear Distinction Between Provisions and Gazette Provisions
**Observation**:
- `provisions` table: Contains law (lov) provisions - works well
- `gazette_provisions` table: Should contain regulation (forskrift) provisions - often empty/incomplete
- `search_all_provisions_*` functions search both, but it's not always clear which source returned results
**Suggestion**:
Better documentation or response metadata indicating:
- Which table the result came from
- Whether a regulation has been fully parsed or is just a document stub
## Recommendations
### High Priority
1. **Parse health register regulations into gazette_provisions** - These are critical legal documents for health data governance research
2. **Add bulk import for similar regulation families** - Many regulations under same parent law likely have same issue
### Medium Priority
3. **Add hierarchical navigation tools** - Help users explore regulation structure
4. **Improve search result metadata** - Clearly indicate source table and completeness
### Low Priority
5. **Add provision counting** - Quick check: "This regulation has 45 structured provisions" vs "This regulation is unparsed"
## Test Queries That Failed
```python
# All returned 0 results despite regulations existing in gazette_documents:
search_gazette_provisions_fts(
    query_text="formål",
    law_doc_ids=["SF/forskrift/2001-12-21-1477"]
)

search_gazette_provisions_fts(
    query_text="Kreftregisteret",
    law_doc_ids=["SF/forskrift/2001-12-21-1477"]
)

search_gazette_provisions_vector(
    query_text="Kreftregisteret har til formål",
    law_doc_ids=["SF/forskrift/2001-12-21-1477"]
)
```
## Successful Workarounds Used
1. ✅ Searched in parent law (helseregisterloven) provisions table instead
2. ✅ Used helseregisterloven § 11 which lists all registers by name
3. ✅ Applied legal reasoning based on statutory framework rather than specific regulation text
4. ❌ Attempted WebFetch to lovdata.no (rejected by user)
5. ❌ Attempted direct PostgreSQL access (authentication failed)
## Conclusion
The Lovdata MCP server works well for **laws (lover)** but has significant gaps for **regulations (forskrifter)**. For legal research requiring detailed regulation analysis, the missing gazette_provisions data is a critical limitation.
**Severity**: High for regulatory compliance and governance research
**Affected domains**: Health law, environmental regulations, sector-specific rules where detailed regulation text is essential


@@ -54,13 +54,28 @@ rm -rf .claude
When the server is running, you can query: When the server is running, you can query:
**Laws & Provisions:**
- `get_law` - Retrieve Norwegian law by slug/doc_id - `get_law` - Retrieve Norwegian law by slug/doc_id
- `get_provision` - Get single provision by ID - `get_provision` - Get single provision by ID
- `get_provisions_batch` - Batch retrieve multiple provisions - `get_provisions_batch` - Batch retrieve multiple provisions
- `list_provisions` - List provisions with pagination - `list_provisions` - List provisions with pagination
- `search_provisions_fts` - Full-text search (Norwegian) - `list_laws` - List available laws
- `search_provisions_vector` - Vector similarity search - `list_legal_areas` - List legal areas/categories
- `health_check` - Database connectivity check
**Search:**
- `search_laws` - Search laws by keyword
- `search_provisions_fts` - Full-text search provisions (Norwegian)
- `search_provisions_vector` - Vector similarity search for provisions
- `search_all_provisions_fts` - Full-text search across all provisions
- `search_all_provisions_vector` - Vector search across all provisions
- `search_lover` - Search "lover" (laws)
**Gazettes:**
- `search_gazettes_fts` - Full-text search gazettes
- `get_full_gazette` - Retrieve full gazette document
**Legacy:**
- `search_all_forskrifter_fts` - Full-text search (deprecated terminology)
### Database ### Database