diff --git a/mcp-server-plan.md b/mcp-server-plan.md
new file mode 100644
index 0000000..ebd6798
--- /dev/null
+++ b/mcp-server-plan.md
@@ -0,0 +1,627 @@
+# MCP Server for FHI Statistikk Open API
+
+## Overview
+
+An MCP (Model Context Protocol) server that exposes the FHI Statistikk Open API
+as tools optimized for AI agent consumption. The server wraps the REST API at
+`https://statistikk-data.fhi.no/api/open/v1/` and adds intelligent
+summarization, format translation, and convenience features that make the API
+practical for LLM-based agents.
+
+**Base API**: https://statistikk-data.fhi.no/api/open/v1/
+**API docs**: https://statistikk-data.fhi.no/swagger/index.html?urls.primaryName=Allvis%20Open%20API
+**License**: CC BY 4.0 (open data)
+**Auth**: None required
+
+## Problem Statement
+
+The raw API has several characteristics that make it difficult for AI agents to use:
+
+1. **JSON-stat2 format** -- The data endpoint returns a multidimensional sparse
+ array format designed for statistical software, not LLMs.
+2. **Mandatory dimension specification** -- All dimensions must be included in
+ every data query, even single-valued ones like `KJONN=["0"]`.
+3. **Non-obvious value formats** -- Year values use `"2020_2020"` not `"2020"`.
+4. **Massive dimension trees** -- The GEO dimension can have 400+ entries in a
+ hierarchical tree (country > county > municipality > city district).
+5. **Multi-step discovery** -- Finding relevant data requires: list sources >
+ list tables > get dimensions > construct query > fetch data.
+6. **Metadata contains raw HTML** -- Content fields include literal HTML markup
+   (line-break and paragraph tags) that must be stripped before use.
+7. **Swagger spec is incomplete** -- Documents only `"item"` filter, but the API
+ actually supports `"item"`, `"all"`, `"top"`, `"bottom"`.
+
+## API Inventory
+
+### Sources (as of 2026-03-27)
+
+| ID | Title | Publisher |
+|----------|------------------------------------------|----------------------|
+| nokkel | Folkehelsestatistikk | Helsedirektoratet |
+| ngs | Mikrobiologisk genomovervåkning | FHI |
+| mfr | Medisinsk fødselsregister | FHI |
+| abr | Abortregisteret | FHI |
+| sysvak | Nasjonalt vaksinasjonsregister SYSVAK | FHI |
+| daar     | Dødsårsaksregisteret                     | FHI                  |
+| msis | Meldingssystem for smittsomme sykdommer | FHI |
+| lmr | Legemiddelregisteret | FHI |
+| gs | Grossiststatistikk | FHI |
+| npr | Norsk pasientregister | FHI |
+| kpr | Kommunalt pasient- og brukerregister | FHI |
+| hkr | Hjerte- og karsykdommer | FHI |
+| skast | Skadedyrstatistikk | FHI |
+
+### Endpoints
+
+| Method | Path | Purpose |
+|--------|-----------------------------------------------|----------------------------|
+| GET | `/Common/source` | List all sources |
+| GET | `/{sourceId}/Table` | List tables in source |
+| GET | `/{sourceId}/Table/{tableId}` | Table info |
+| GET | `/{sourceId}/Table/{tableId}/query` | Query template |
+| GET | `/{sourceId}/Table/{tableId}/dimension` | Dimensions and categories |
+| POST | `/{sourceId}/Table/{tableId}/data` | Fetch data |
+| GET | `/{sourceId}/Table/{tableId}/flag` | Flag/symbol definitions |
+| GET | `/{sourceId}/Table/{tableId}/metadata` | Table metadata |
+
+### Filter Types
+
+| Filter | Description | Example values |
+|----------|------------------------------------------------|--------------------------|
+| `item` | Exact match on listed values | `["2020_2020","2021_2021"]` |
+| `all` | Wildcard match with `*` | `["*"]` or `["A*","B*"]` |
+| `top` | First N categories | `["5"]` |
+| `bottom` | Last N categories | `["5"]` |
+
+### Response Formats (data endpoint)
+
+| Format | Content-Type | Description |
+|------------|---------------------------------|---------------------------------|
+| json-stat2 | application/json | JSON-Stat 2.0 sparse array |
+| csv2 | text/csv | CSV with human-readable labels |
+| csv3 | text/csv | CSV with dimension/measure codes|
+| parquet | application/vnd.apache.parquet | Apache Parquet columnar format |
+
+## MCP Tool Design
+
+### Tool 1: `list_sources`
+
+**Purpose**: Entry point. List all available data sources.
+
+**Parameters**: None.
+
+**Returns**: Array of `{id, title, description, published_by}`.
+
+**Implementation**: GET `/Common/source`. Pass through with minor field renaming
+(snake_case).
+
+**Caching**: Cache for 24 hours. Source list rarely changes.
+
+---
+
+### Tool 2: `list_tables`
+
+**Purpose**: Find tables within a source, with optional keyword search.
+
+**Parameters**:
+- `source_id` (string, required) -- Source identifier, e.g. `"nokkel"`.
+- `search` (string, optional) -- Case-insensitive keyword filter on table title.
+ Supports multiple words (all must match). Applied client-side.
+- `modified_after` (string, optional) -- ISO-8601 datetime. Only return tables
+ modified after this date. Passed to API server-side.
+
+**Returns**: Array of `{table_id, title, published_at, modified_at}`.
+
+**Implementation**: GET `/{sourceId}/Table?modifiedAfter=...`, then client-side
+filter on `search`. Sort by `modified_at` descending.
+
+**Caching**: Cache per source_id for 1 hour. Table lists update throughout the
+day as data is published.
+
+**Example**:
+```
+list_tables(source_id="nokkel", search="befolkning")
+→ [{table_id: 185, title: "Befolkningsvekst", ...},
+ {table_id: 338, title: "Befolkningssammensetning_antall_andel", ...},
+ {table_id: 171, title: "Befolkningsframskriving", ...}]
+```
+
+---
+
+### Tool 3: `describe_table`
+
+**Purpose**: The primary tool for understanding a table's structure. Gives the
+agent everything it needs to construct a data query.
+
+**Parameters**:
+- `source_id` (string, required)
+- `table_id` (integer, required)
+
+**Returns**: A structured summary combining table info, dimensions, metadata,
+and flags. This is a composite call (4 parallel API requests).
+
+**Response structure**:
+```
+{
+ "title": "Befolkningsvekst",
+ "published_at": "2025-10-21T08:56:39Z",
+ "modified_at": "2025-10-21T08:56:39Z",
+ "is_official_statistics": false,
+ "description": "Differansen mellom befolkningsmengden...",
+ "update_frequency": "Årlig",
+ "keywords": ["Befolkning", "Befolkningsvekst"],
+ "source_institution": "Statistisk sentralbyrå (SSB)",
+ "dimensions": [
+ {
+ "code": "GEO",
+ "label": "Geografi",
+ "total_categories": 356,
+ "is_hierarchical": true,
+ "hierarchy_depth": 4,
+ "top_level_values": [
+ {"value": "0", "label": "Hele landet", "child_count": 15}
+ ],
+ "note": "Use get_dimension_values to drill into sub-levels"
+ },
+ {
+ "code": "AAR",
+ "label": "År",
+ "total_categories": 23,
+ "is_hierarchical": false,
+ "value_format": "YYYY_YYYY (e.g. 2020_2020)",
+ "range": "2002..2024",
+ "values": ["2002_2002", "2003_2003", ..., "2024_2024"]
+ },
+ {
+ "code": "KJONN",
+ "label": "Kjønn",
+ "total_categories": 1,
+ "is_fixed": true,
+ "values": [{"value": "0", "label": "kjønn samlet"}],
+ "note": "Single-valued, auto-included in queries"
+ },
+ {
+ "code": "ALDER",
+ "label": "Alder",
+ "total_categories": 1,
+ "is_fixed": true,
+ "values": [{"value": "0_120", "label": "alle aldre"}],
+ "note": "Single-valued, auto-included in queries"
+ },
+ {
+ "code": "MEASURE_TYPE",
+ "label": "Måltall",
+ "total_categories": 2,
+ "is_fixed": false,
+ "values": [
+ {"value": "TELLER", "label": "antall"},
+ {"value": "RATE", "label": "prosent vekst"}
+ ]
+ }
+ ],
+ "flags": [
+ {"symbol": "", "description": "Verdi finnes i tabellen"}
+ ]
+}
+```
+
+**Key design decisions**:
+
+1. **Summarize large dimensions** -- For dimensions with >20 categories (mainly
+ GEO), show only top-level entries with child counts. The agent uses
+ `get_dimension_values` to drill down.
+
+2. **Mark fixed dimensions** -- Dimensions with exactly 1 category get
+ `is_fixed: true`. The agent knows to ignore these; `query_data` will
+ auto-include them.
+
+3. **Show value format** -- AAR values are `"2020_2020"`, not `"2020"`. Show
+ this explicitly so the agent gets the format right.
+
+4. **Include metadata inline** -- Strip HTML from metadata paragraphs. Extract
+ `description`, `keywords`, `update_frequency`, `source_institution` as
+ top-level fields.
+
+5. **Include flags inline** -- Flag definitions are small and always relevant.
+
+**Implementation**: Parallel fetch of:
+- GET `/{sourceId}/Table/{tableId}` (table info)
+- GET `/{sourceId}/Table/{tableId}/dimension` (dimensions)
+- GET `/{sourceId}/Table/{tableId}/metadata` (metadata)
+- GET `/{sourceId}/Table/{tableId}/flag` (flags)
+
+Then merge and transform.
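The fetch-and-merge step can be sketched with `asyncio.gather`. The `fetch` callable (path in, parsed JSON out) is injected so the httpx transport stays swappable and testable against recorded fixtures; all names here are illustrative, not the final API:

```python
import asyncio

async def describe_table(fetch, source_id: str, table_id: int) -> dict:
    """Fetch the four endpoints concurrently, then merge into one summary."""
    base = f"/{source_id}/Table/{table_id}"
    info, dims, meta, flags = await asyncio.gather(
        fetch(base),                  # table info
        fetch(f"{base}/dimension"),   # dimensions and categories
        fetch(f"{base}/metadata"),    # description, keywords, ...
        fetch(f"{base}/flag"),        # flag/symbol definitions
    )
    # Real implementation would summarize dims and strip HTML from meta here.
    return {**info, "dimensions": dims, "metadata": meta, "flags": flags}
```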
+
+**Caching**: Cache per (source_id, table_id) for 6 hours. Dimension structure
+changes rarely.
+
+---
+
+### Tool 4: `get_dimension_values`
+
+**Purpose**: Drill into large hierarchical dimensions, typically GEO.
+
+**Parameters**:
+- `source_id` (string, required)
+- `table_id` (integer, required)
+- `dimension_code` (string, required) -- e.g. `"GEO"`.
+- `parent_value` (string, optional) -- Return only children of this category.
+ E.g. `"18"` for Nordland county. If omitted, returns top-level categories.
+- `search` (string, optional) -- Case-insensitive search on category labels.
+ E.g. `"tromsø"` to find the municipality.
+
+**Returns**: Array of `{value, label, child_count}`.
+
+**Implementation**: GET `/{sourceId}/Table/{tableId}/dimension`, then navigate
+the category tree client-side. The full tree is fetched and cached; filtering
+is done in the MCP server.
+
+**Examples**:
+```
+# Top level: the country node (its 15 children are the counties)
+get_dimension_values("nokkel", 185, "GEO")
+→ [{value: "0", label: "Hele landet", child_count: 15}]
+
+# Get municipalities in Nordland
+get_dimension_values("nokkel", 185, "GEO", parent_value="18")
+→ [{value: "1804", label: "Bodø", child_count: 0},
+ {value: "1806", label: "Narvik", child_count: 0}, ...]
+
+# Search for a municipality
+get_dimension_values("nokkel", 185, "GEO", search="tromsø")
+→ [{value: "5501", label: "Tromsø", child_count: 0}]
+```
+
+**Caching**: Shares the dimension cache with `describe_table`.
+
+---
+
+### Tool 5: `query_data`
+
+**Purpose**: Fetch actual data from a table. The main data retrieval tool.
+
+**Parameters**:
+- `source_id` (string, required)
+- `table_id` (integer, required)
+- `dimensions` (array, required) -- Each element:
+ - `code` (string) -- Dimension code, e.g. `"GEO"`.
+ - `filter` (string) -- One of `"item"`, `"all"`, `"top"`, `"bottom"`.
+ Default: `"item"`.
+ - `values` (array of strings) -- Filter values.
+- `max_rows` (integer, optional) -- Limit returned rows. Default: 1000.
+ Set to 0 for no limit (be careful).
+
+**Returns**: Structured rows with labeled values.
+
+```
+{
+ "table": "Befolkningsvekst",
+ "total_rows": 4,
+ "rows": [
+ {"GEO": "Oslo", "AAR": "2023", "KJONN": "kjønn samlet",
+ "ALDER": "alle aldre", "TELLER": 3516, "RATE": 0.5},
+ ...
+ ],
+ "truncated": false,
+ "dimensions_used": {
+ "GEO": {"filter": "item", "values": ["0301"]},
+ "AAR": {"filter": "bottom", "values": ["2"]},
+ "KJONN": {"filter": "item", "values": ["0"]},
+ "ALDER": {"filter": "item", "values": ["0_120"]},
+ "MEASURE_TYPE": {"filter": "all", "values": ["*"]}
+ }
+}
+```
+
+**Key design decisions**:
+
+1. **Default to csv2 internally** -- Fetch as csv2 (human-readable labels),
+ parse into rows. CSV is simpler for an agent to reason about than JSON-stat2.
+ The tool internally requests csv2 and structures it.
+
+2. **Auto-include fixed dimensions** -- If the agent omits a dimension that has
+ only 1 category (like KJONN or ALDER), the tool adds it automatically with
+ `filter: "item"` and the single value. This means the agent only needs to
+ specify the dimensions it actually cares about.
+
+3. **Normalize year values** -- If the agent sends `"2020"` for AAR, the tool
+ translates to `"2020_2020"`. The `YYYY_YYYY` format is an internal API
+ convention the agent shouldn't need to know about.
+
+4. **Default MEASURE_TYPE** -- If omitted, default to `filter: "all", values:
+ ["*"]` to get all measures. Most agents want all available metrics.
+
+5. **Row limit with truncation flag** -- Default 1000 rows. Return a
+ `truncated: true` flag and `total_rows` count so the agent knows if there's
+ more data.
+
+6. **Echo back dimensions_used** -- Show what was actually sent to the API
+ (after auto-completion), so the agent can see the full query.
+
+**Implementation**:
+1. Fetch dimension info if not cached (to know fixed dimensions and validate)
+2. Auto-complete missing/fixed dimensions
+3. Normalize year values
+4. POST `/{sourceId}/Table/{tableId}/data` with format=csv2
+5. Parse CSV response into row objects
+6. Apply row limit, compute truncation
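Steps 2, 3, 5 and 6 above can be sketched as pure functions. Function names and the dimension-info shape (`values` as a flat list of category codes) are assumptions for illustration, and the CSV delimiter may need adjusting to whatever the API actually emits:

```python
import csv
import io
import re

def normalize_year(value: str) -> str:
    """'2020' -> '2020_2020'; values already in YYYY_YYYY form pass through."""
    return f"{value}_{value}" if re.fullmatch(r"\d{4}", value) else value

def complete_dimensions(requested: list[dict], table_dims: list[dict]) -> list[dict]:
    """Auto-include fixed dimensions, default MEASURE_TYPE, normalize AAR values."""
    by_code = {d["code"]: d for d in requested}
    completed = []
    for dim in table_dims:
        code = dim["code"]
        if code in by_code:
            spec = {"filter": "item", **by_code[code]}
            if code == "AAR" and spec["filter"] == "item":
                spec["values"] = [normalize_year(v) for v in spec["values"]]
        elif len(dim["values"]) == 1:  # fixed dimension: auto-include
            spec = {"code": code, "filter": "item", "values": [dim["values"][0]]}
        elif code == "MEASURE_TYPE":
            spec = {"code": code, "filter": "all", "values": ["*"]}
        else:
            raise ValueError(f"dimension {code!r} must be specified")
        completed.append(spec)
    return completed

def parse_csv2(text: str, max_rows: int = 1000) -> dict:
    """Parse a csv2 response into labeled rows with a truncation flag."""
    rows = list(csv.DictReader(io.StringIO(text)))
    truncated = 0 < max_rows < len(rows)
    return {"total_rows": len(rows),
            "rows": rows[:max_rows] if max_rows else rows,
            "truncated": truncated}
```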
+
+**Error handling**: The API returns ProblemDetails (RFC 7807) on 400/404/422.
+Transform into clear error messages:
+- "Dimension 'XYZ' is not valid for this table. Available: GEO, AAR, ..."
+- "Value '2025_2025' not found in dimension AAR. Range: 2002..2024"
+- "maxRowCount exceeded. Requested ~50000 rows, limit is 1000. Narrow filters."
+
+---
+
+### Tool 6: `get_query_template`
+
+**Purpose**: Fallback tool returning the raw query template from the API. Useful
+when the agent needs to see exactly what the API expects.
+
+**Parameters**:
+- `source_id` (string, required)
+- `table_id` (integer, required)
+
+**Returns**: The raw DataRequest JSON as returned by the API.
+
+**Implementation**: GET `/{sourceId}/Table/{tableId}/query`. Pass through.
+
+**When to use**: When `query_data` auto-completion isn't behaving as expected,
+or the agent wants to see the complete list of available values for all
+dimensions.
+
+---
+
+## Tools NOT included (and why)
+
+| Considered tool | Decision | Reason |
+|------------------------------|----------|---------------------------------------------|
+| `get_flags` (standalone) | Dropped | Folded into `describe_table` |
+| `get_metadata` (standalone) | Dropped | Folded into `describe_table` |
+| `get_table_info` (standalone)| Dropped | Folded into `describe_table` |
+| `search_across_sources` | Dropped | Too expensive (13 API calls). Agent can call `list_tables` per source |
+| `get_data_jsonstat` | Dropped | Agents don't need raw JSON-stat2 |
+| `get_data_parquet` | Dropped | Binary format, not useful for LLM context |
+
+## Architecture
+
+### Stack
+
+- **Language**: Python 3.12+
+- **MCP framework**: FastMCP (`mcp[cli]`)
+- **HTTP server**: Uvicorn (`uvicorn>=0.30`) for SSE/HTTP transport
+- **HTTP client**: `httpx` (async)
+- **CSV parsing**: stdlib `csv`
+- **HTML stripping**: stdlib `html.parser` or `re` (simple tag removal)
+- **Build system**: Hatchling (matches Fhi.Metadata.MCPserver pattern)
+
+### Transport
+
+The server supports multiple transports via CLI flag, following the same pattern
+as `Fhi.Metadata.MCPserver`:
+
+| Transport | Use case | Endpoint |
+|------------------|---------------------------------------|-------------------|
+| `sse` | Local dev + Skybert deployment | `/sse` |
+| `streamable-http`| Future HTTP-only clients | `/mcp` |
+| `stdio` | Direct pipe (legacy) | stdin/stdout |
+
+**Default**: `sse` on `0.0.0.0:8000`. This means the server works over HTTP
+both locally and when deployed to Skybert, with no transport change needed.
+
+**CLI entry point**:
+```bash
+fhi-statistikk-mcp --transport sse --host 0.0.0.0 --port 8000
+```
+
+### Project Structure
+
+```
+fhi-statistikk-mcp/
+├── .github/
+│ └── workflows/
+│ └── docker-build-push.yaml # CI/CD → crfhiskybert.azurecr.io
+├── .mcp.json.local # Local dev: http://localhost:8000/sse
+├── .mcp.json.public # Production: https:///sse
+├── Dockerfile # Multi-stage, Python 3.12-slim
+├── pyproject.toml # Hatchling build, entry point
+├── README.md
+├── src/
+│ └── fhi_statistikk_mcp/
+│ ├── __init__.py
+│ ├── server.py # MCP server, tool definitions, main()
+│ ├── api_client.py # Async httpx client for FHI API
+│ ├── transformers.py # CSV parsing, dimension summarization
+│ └── cache.py # Simple TTL cache
+└── tests/
+ ├── test_transformers.py
+ ├── test_cache.py
+ └── fixtures/ # Recorded API responses
+ ├── sources.json
+ ├── tables_nokkel.json
+ ├── dimensions_185.json
+ ├── metadata_185.json
+ ├── flags_185.json
+ └── data_185.csv
+```
+
+### MCP Client Configuration
+
+**Local development** (`.mcp.json.local`):
+```json
+{
+ "mcpServers": {
+ "fhi-statistikk": {
+ "type": "sse",
+ "url": "http://localhost:8000/sse"
+ }
+ }
+}
+```
+
+**Production** (`.mcp.json.public`):
+```json
+{
+ "mcpServers": {
+ "fhi-statistikk": {
+ "type": "sse",
+ "url": "https:///sse"
+ }
+ }
+}
+```
+
+### Dockerfile
+
+Following the Fhi.Metadata.MCPserver pattern:
+```dockerfile
+FROM python:3.12-slim AS base
+WORKDIR /app
+COPY pyproject.toml .
+COPY src/ src/
+RUN pip install --no-cache-dir .
+
+FROM base AS prod
+EXPOSE 8000
+CMD ["fhi-statistikk-mcp", "--transport", "sse", "--host", "0.0.0.0", "--port", "8000"]
+```
+
+### CI/CD
+
+Same pipeline pattern as Fhi.Metadata.MCPserver:
+- Trigger on push to `main` touching `src/`, `Dockerfile`, or `pyproject.toml`
+- Azure Federated Identity (OIDC) login
+- Push to `crfhiskybert.azurecr.io/fida/ki/statistikk-mcp`
+- Tag: git short SHA + `latest`
+- Dispatch to GitOps repo for Skybert deployment
+
+### Logging
+
+Force all loggers (uvicorn, mcp, fastmcp) to stderr with simple format.
+Print startup info (API base URL, cache status) to stderr. No persistent log
+files -- container logging handles that on Skybert.
+
+### Caching Strategy
+
+| Data | TTL | Key | Reason |
+|------------------|----------|----------------------------|--------------------------------|
+| Source list | 24h | `"sources"` | Rarely changes |
+| Table list | 1h | `source_id` | New tables published daily |
+| Dimensions | 6h | `(source_id, table_id)` | Dimension structure is stable |
+| Metadata | 6h | `(source_id, table_id)` | Metadata edits are rare |
+| Flags | 6h | `(source_id, table_id)` | Flags rarely change |
+| Query templates | 6h | `(source_id, table_id)` | Follows dimension changes |
+| Data responses | No cache | -- | Queries vary too much to cache |
+
+In-memory dict with TTL. No external dependency needed -- the data volume is
+small and the server is single-process.
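A minimal sketch of what `cache.py` could contain (illustrative, not the final implementation); `time.monotonic` avoids surprises from wall-clock adjustments:

```python
import time

class TTLCache:
    """In-memory cache with per-entry TTL; enough for a single-process server."""

    def __init__(self) -> None:
        self._store: dict = {}  # key -> (expires_at, value)

    def get(self, key):
        entry = self._store.get(key)
        if entry is None:
            return None
        expires_at, value = entry
        if time.monotonic() >= expires_at:
            del self._store[key]  # lazily evict on read
            return None
        return value

    def set(self, key, value, ttl_seconds: float) -> None:
        self._store[key] = (time.monotonic() + ttl_seconds, value)
```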
+
+### Rate Limiting
+
+No documented rate limits, but this is a government API. Be polite:
+- Max 5 concurrent requests
+- 100ms minimum between requests
+- Retry with exponential backoff on 429/503
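One way to sketch this throttling (the `send` callable is injected so it can wrap httpx in production and a stub in tests; class and parameter names are assumptions):

```python
import asyncio
import time

class PoliteClient:
    """At most 5 concurrent requests, >=100 ms between request starts,
    exponential backoff on 429/503."""

    def __init__(self, send, max_concurrent: int = 5, min_interval: float = 0.1):
        self._send = send  # async callable: send(path) -> (status, body)
        self._sem = asyncio.Semaphore(max_concurrent)
        self._gate = asyncio.Lock()
        self._min_interval = min_interval
        self._next_start = 0.0

    async def request(self, path: str, max_retries: int = 3):
        async with self._sem:
            for attempt in range(max_retries + 1):
                async with self._gate:  # space out request starts
                    now = time.monotonic()
                    if now < self._next_start:
                        await asyncio.sleep(self._next_start - now)
                    self._next_start = time.monotonic() + self._min_interval
                status, body = await self._send(path)
                if status not in (429, 503):
                    return status, body
                await asyncio.sleep(0.2 * 2 ** attempt)  # exponential backoff
            return status, body
```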
+
+### Error Mapping
+
+| API Response | MCP Tool Error |
+|-----------------------|------------------------------------------------------|
+| 400 Bad Request | Descriptive message from ProblemDetails.detail |
+| 404 Not Found | "Source/table not found: {id}" |
+| 422 Client Error | "Query validation failed: {detail}" |
+| Network timeout | "API request timed out. Try reducing query scope." |
+| CSV parse error | "Failed to parse response. Try get_query_template." |
+
+### Unicode / Fuzzy Search
+
+Dimension value search (in `get_dimension_values`) normalizes both query and
+labels for accent-insensitive matching:
+- Lowercase, then map `æ` and `ø` explicitly (NFD gives them no decomposition)
+- Normalize with `unicodedata.normalize("NFD")` and strip combining marks
+  (this covers `å`, which decomposes to `a` + combining ring)
+- `"tromso"` matches `"Tromsø"`, `"barum"` matches `"Bærum"`
+- Preserve original labels in output
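A folding function along these lines (mapping `æ` to `a` rather than `ae` is an assumption chosen to satisfy the `"barum"` / `"Bærum"` example; the explicit table is needed because NFD decomposes `å` but not `æ` or `ø`):

```python
import unicodedata

# æ and ø have no NFD decomposition, so map them explicitly; å decomposes fine.
_NORWEGIAN = str.maketrans({"æ": "a", "ø": "o"})

def fold(text: str) -> str:
    """Lowercase and strip accents: 'Tromsø' -> 'tromso', 'Bærum' -> 'barum'."""
    lowered = text.lower().translate(_NORWEGIAN)
    decomposed = unicodedata.normalize("NFD", lowered)
    return "".join(ch for ch in decomposed if not unicodedata.combining(ch))
```

Matching then becomes a substring test on folded strings, while the original labels are returned untouched.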
+
+## Implementation Plan
+
+### Phase 1: Core (MVP)
+
+1. Set up project skeleton: `pyproject.toml` with hatchling, `src/` layout,
+ entry point `fhi-statistikk-mcp`
+2. Set up `server.py` with FastMCP, SSE transport, CLI args (transport, host,
+ port), stderr logging
+3. Implement `api_client.py` with async httpx client, base URL config
+4. Implement `cache.py` with simple TTL dict
+5. Implement `list_sources` tool
+6. Implement `list_tables` tool with client-side keyword search
+7. Implement `describe_table` composite tool
+ - Parallel fetch of 4 endpoints
+ - Dimension summarization (large dim truncation, fixed dim detection)
+ - HTML stripping for metadata
+ - Merge into structured response
+8. Implement `query_data` tool
+ - Auto-completion of fixed dimensions
+ - Year value normalization (`"2020"` → `"2020_2020"`)
+ - Default MEASURE_TYPE to `all`/`["*"]`
+ - CSV parsing and row structuring
+ - Row limit and truncation
+9. Implement `get_dimension_values` with hierarchy navigation and accent-
+ insensitive search
+10. Implement `get_query_template` passthrough
+11. Add `.mcp.json.local` for local dev
+12. Test all tools against live API
+
+### Phase 2: Deployment & Polish
+
+13. Add `Dockerfile` (multi-stage, Python 3.12-slim)
+14. Add `.github/workflows/docker-build-push.yaml` for CI/CD
+15. Add `.mcp.json.public` with Skybert URL
+16. Add comprehensive error handling and error messages
+17. Add rate limiting
+18. Record API fixtures for offline testing
+19. Write unit tests for transformers and cache
+20. Write integration tests against live API
+
+### Phase 3: Optional Enhancements
+
+21. Add a `search_all_tables` convenience tool (if agents frequently need it)
+22. Add MCP resources for static reference data (source descriptions, common
+ dimension codes)
+23. Add MCP prompt templates (e.g. "finn helsedata om [tema]")
+
+## Tool Description Guidelines
+
+MCP tool descriptions are what the agent uses to decide which tool to call. They
+should be written for an LLM audience:
+
+- Lead with the purpose, not the endpoint
+- Include example parameter values
+- Document non-obvious conventions (year format, dimension codes)
+- Mention what `describe_table` returns, since it's the prerequisite for
+ `query_data`
+- Note that Norwegian labels are the default (GEO labels are in Norwegian)
+
+### Example tool description for `query_data`:
+
+> Fetch statistical data from an FHI table. Before calling this, use
+> `describe_table` to understand the table's dimensions and available values.
+>
+> You only need to specify the dimensions you care about. Fixed dimensions
+> (single-valued, like KJONN="kjønn samlet") are auto-included. If you omit
+> MEASURE_TYPE, all measures are returned.
+>
+> Year values: use "2020" (auto-translated to "2020_2020") or the full format.
+>
+> Filters: "item" (exact values), "all" (wildcard, e.g. ["*"]),
+> "top" (first N), "bottom" (last N).
+>
+> Returns labeled rows, max 1000 by default. Check "truncated" field.
+
+## Resolved Decisions
+
+| Question | Decision | Rationale |
+|----------|----------|-----------|
+| Hosting | SSE locally, same for Skybert | Follow Fhi.Metadata.MCPserver pattern. HTTP from day one, no transport change on deploy. |
+| JSON-stat2 output | No | csv2 is sufficient for LLM agents. JSON-stat2 is for statistical software. |
+| Fuzzy dimension search | Yes, accent-insensitive | Norwegian chars (æøå) will trip up agents. NFD + strip combining marks handles å; æ and ø need an explicit character mapping (no NFD decomposition). |
+| Sample data in describe_table | No | Adds latency. Agent calls `query_data` with `max_rows=5` if it wants a preview. |