# MCP Server for FHI Statistikk Open API

## Overview

An MCP (Model Context Protocol) server that exposes the FHI Statistikk Open API
as tools optimized for AI agent consumption. The server wraps the REST API at
`https://statistikk-data.fhi.no/api/open/v1/` and adds intelligent
summarization, format translation, and convenience features that make the API
practical for LLM-based agents.

**Base API**: https://statistikk-data.fhi.no/api/open/v1/
**API docs**: https://statistikk-data.fhi.no/swagger/index.html?urls.primaryName=Allvis%20Open%20API
**License**: CC BY 4.0 (open data)
**Auth**: None required

## Problem Statement

The raw API has several characteristics that make it hard for AI agents to use:

1. **JSON-stat2 format** -- The data endpoint returns a multidimensional sparse
   array format designed for statistical software, not LLMs.
2. **Mandatory dimension specification** -- All dimensions must be included in
   every data query, even single-valued ones like `KJONN=["0"]`.
3. **Non-obvious value formats** -- Year values use `"2020_2020"`, not `"2020"`.
4. **Massive dimension trees** -- The GEO dimension can have 400+ entries in a
   hierarchical tree (country > county > municipality > city district).
5. **Multi-step discovery** -- Finding relevant data requires: list sources >
   list tables > get dimensions > construct query > fetch data.
6. **Metadata contains raw HTML** -- `<p>`, `<a>`, `<ol>` tags in content fields.
7. **Swagger spec is incomplete** -- It documents only the `"item"` filter, but
   the API actually supports `"item"`, `"all"`, `"top"`, `"bottom"`.

## API Inventory

### Sources (as of 2026-03-27)

| ID     | Title                                   | Publisher         |
|--------|-----------------------------------------|-------------------|
| nokkel | Folkehelsestatistikk                    | Helsedirektoratet |
| ngs    | Mikrobiologisk genomovervåkning         | FHI               |
| mfr    | Medisinsk fødselsregister               | FHI               |
| abr    | Abortregisteret                         | FHI               |
| sysvak | Nasjonalt vaksinasjonsregister SYSVAK   | FHI               |
| daar   | Dødsårsakregisteret                     | FHI               |
| msis   | Meldingssystem for smittsomme sykdommer | FHI               |
| lmr    | Legemiddelregisteret                    | FHI               |
| gs     | Grossiststatistikk                      | FHI               |
| npr    | Norsk pasientregister                   | FHI               |
| kpr    | Kommunalt pasient- og brukerregister    | FHI               |
| hkr    | Hjerte- og karsykdommer                 | FHI               |
| skast  | Skadedyrstatistikk                      | FHI               |

### Endpoints

| Method | Path                                    | Purpose                   |
|--------|-----------------------------------------|---------------------------|
| GET    | `/Common/source`                        | List all sources          |
| GET    | `/{sourceId}/Table`                     | List tables in source     |
| GET    | `/{sourceId}/Table/{tableId}`           | Table info                |
| GET    | `/{sourceId}/Table/{tableId}/query`     | Query template            |
| GET    | `/{sourceId}/Table/{tableId}/dimension` | Dimensions and categories |
| POST   | `/{sourceId}/Table/{tableId}/data`      | Fetch data                |
| GET    | `/{sourceId}/Table/{tableId}/flag`      | Flag/symbol definitions   |
| GET    | `/{sourceId}/Table/{tableId}/metadata`  | Table metadata            |

### Filter Types

| Filter   | Description                  | Example values              |
|----------|------------------------------|-----------------------------|
| `item`   | Exact match on listed values | `["2020_2020","2021_2021"]` |
| `all`    | Wildcard match with `*`      | `["*"]` or `["A*","B*"]`    |
| `top`    | First N categories           | `["5"]`                     |
| `bottom` | Last N categories            | `["5"]`                     |
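The four filter types can be illustrated with a small client-side sketch. This is an assumption-laden illustration of the semantics, not API code: the API applies these filters server-side, `apply_filter` is a hypothetical helper, and shell-style `*` matching is assumed for `all`.

```python
from fnmatch import fnmatch

def apply_filter(categories: list[str], filter_type: str, values: list[str]) -> list[str]:
    """Illustrate the four filter semantics on an ordered category-code list."""
    if filter_type == "item":
        return [c for c in categories if c in values]
    if filter_type == "all":
        # Shell-style wildcard matching, e.g. "202*" or "*"
        return [c for c in categories if any(fnmatch(c, pat) for pat in values)]
    if filter_type == "top":
        return categories[: int(values[0])]
    if filter_type == "bottom":
        return categories[-int(values[0]):]
    raise ValueError(f"unknown filter type: {filter_type}")

years = ["2020_2020", "2021_2021", "2022_2022", "2023_2023"]
apply_filter(years, "item", ["2021_2021"])   # → ["2021_2021"]
apply_filter(years, "bottom", ["2"])         # → ["2022_2022", "2023_2023"]
```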

### Response Formats (data endpoint)

| Format     | Content-Type                   | Description                      |
|------------|--------------------------------|----------------------------------|
| json-stat2 | application/json               | JSON-Stat 2.0 sparse array       |
| csv2       | text/csv                       | CSV with human-readable labels   |
| csv3       | text/csv                       | CSV with dimension/measure codes |
| parquet    | application/vnd.apache.parquet | Apache Parquet columnar format   |

## MCP Tool Design

### Tool 1: `list_sources`

**Purpose**: Entry point. List all available data sources.

**Parameters**: None.

**Returns**: Array of `{id, title, description, published_by}`.

**Implementation**: GET `/Common/source`. Pass through with minor field renaming
(snake_case).

**Caching**: Cache for 24 hours. The source list rarely changes.

---

### Tool 2: `list_tables`

**Purpose**: Find tables within a source, with optional keyword search.

**Parameters**:
- `source_id` (string, required) -- Source identifier, e.g. `"nokkel"`.
- `search` (string, optional) -- Case-insensitive keyword filter on table title.
  Supports multiple words (all must match). Applied client-side.
- `modified_after` (string, optional) -- ISO-8601 datetime. Only return tables
  modified after this date. Passed to the API server-side.

**Returns**: Array of `{table_id, title, published_at, modified_at}`.

**Implementation**: GET `/{sourceId}/Table?modifiedAfter=...`, then client-side
filter on `search`. Sort by `modified_at` descending.

**Caching**: Cache per source_id for 1 hour. Table lists update throughout the
day as data is published.

**Example**:
```
list_tables(source_id="nokkel", search="befolkning")
→ [{table_id: 185, title: "Befolkningsvekst", ...},
   {table_id: 338, title: "Befolkningssammensetning_antall_andel", ...},
   {table_id: 171, title: "Befolkningsframskriving", ...}]
```

---

### Tool 3: `describe_table`

**Purpose**: The primary tool for understanding a table's structure. Gives the
agent everything it needs to construct a data query.

**Parameters**:
- `source_id` (string, required)
- `table_id` (integer, required)

**Returns**: A structured summary combining table info, dimensions, metadata,
and flags. This is a composite call (4 parallel API requests).

**Response structure**:
```
{
  "title": "Befolkningsvekst",
  "published_at": "2025-10-21T08:56:39Z",
  "modified_at": "2025-10-21T08:56:39Z",
  "is_official_statistics": false,
  "description": "Differansen mellom befolkningsmengden...",
  "update_frequency": "Årlig",
  "keywords": ["Befolkning", "Befolkningsvekst"],
  "source_institution": "Statistisk sentralbyrå (SSB)",
  "dimensions": [
    {
      "code": "GEO",
      "label": "Geografi",
      "total_categories": 356,
      "is_hierarchical": true,
      "hierarchy_depth": 4,
      "top_level_values": [
        {"value": "0", "label": "Hele landet", "child_count": 15}
      ],
      "note": "Use get_dimension_values to drill into sub-levels"
    },
    {
      "code": "AAR",
      "label": "År",
      "total_categories": 23,
      "is_hierarchical": false,
      "value_format": "YYYY_YYYY (e.g. 2020_2020)",
      "range": "2002..2024",
      "values": ["2002_2002", "2003_2003", ..., "2024_2024"]
    },
    {
      "code": "KJONN",
      "label": "Kjønn",
      "total_categories": 1,
      "is_fixed": true,
      "values": [{"value": "0", "label": "kjønn samlet"}],
      "note": "Single-valued, auto-included in queries"
    },
    {
      "code": "ALDER",
      "label": "Alder",
      "total_categories": 1,
      "is_fixed": true,
      "values": [{"value": "0_120", "label": "alle aldre"}],
      "note": "Single-valued, auto-included in queries"
    },
    {
      "code": "MEASURE_TYPE",
      "label": "Måltall",
      "total_categories": 2,
      "is_fixed": false,
      "values": [
        {"value": "TELLER", "label": "antall"},
        {"value": "RATE", "label": "prosent vekst"}
      ]
    }
  ],
  "flags": [
    {"symbol": "", "description": "Verdi finnes i tabellen"}
  ]
}
```

**Key design decisions**:

1. **Summarize large dimensions** -- For dimensions with >20 categories (mainly
   GEO), show only top-level entries with child counts. The agent uses
   `get_dimension_values` to drill down.

2. **Mark fixed dimensions** -- Dimensions with exactly 1 category get
   `is_fixed: true`. The agent knows to ignore these; `query_data` will
   auto-include them.

3. **Show value format** -- AAR values are `"2020_2020"`, not `"2020"`. Show
   this explicitly so the agent gets the format right.

4. **Include metadata inline** -- Strip HTML from metadata paragraphs. Extract
   `description`, `keywords`, `update_frequency`, `source_institution` as
   top-level fields.

5. **Include flags inline** -- Flag definitions are small and always relevant.

**Implementation**: Parallel fetch of:
- GET `/{sourceId}/Table/{tableId}` (table info)
- GET `/{sourceId}/Table/{tableId}/dimension` (dimensions)
- GET `/{sourceId}/Table/{tableId}/metadata` (metadata)
- GET `/{sourceId}/Table/{tableId}/flag` (flags)

Then merge and transform.
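The parallel fetch can be sketched with `asyncio.gather`. Here `fetch_json` is a stand-in for the real httpx calls that would live in `api_client.py`:

```python
import asyncio

async def fetch_json(path: str) -> dict:
    """Stand-in for an async httpx GET; returns a dummy payload."""
    await asyncio.sleep(0)
    return {"path": path}

async def describe_table(source_id: str, table_id: int) -> dict:
    base = f"/{source_id}/Table/{table_id}"
    # Fire all four requests concurrently and wait for them together
    info, dims, meta, flags = await asyncio.gather(
        fetch_json(base),
        fetch_json(f"{base}/dimension"),
        fetch_json(f"{base}/metadata"),
        fetch_json(f"{base}/flag"),
    )
    return {"info": info, "dimensions": dims, "metadata": meta, "flags": flags}

result = asyncio.run(describe_table("nokkel", 185))
```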

**Caching**: Cache per (source_id, table_id) for 6 hours. Dimension structure
changes rarely.

---

### Tool 4: `get_dimension_values`

**Purpose**: Drill into large hierarchical dimensions, typically GEO.

**Parameters**:
- `source_id` (string, required)
- `table_id` (integer, required)
- `dimension_code` (string, required) -- e.g. `"GEO"`.
- `parent_value` (string, optional) -- Return only children of this category.
  E.g. `"18"` for Nordland county. If omitted, returns top-level categories.
- `search` (string, optional) -- Case-insensitive search on category labels.
  E.g. `"tromsø"` to find the municipality.

**Returns**: Array of `{value, label, child_count}`.

**Implementation**: GET `/{sourceId}/Table/{tableId}/dimension`, then navigate
the category tree client-side. The full tree is fetched and cached; filtering
is done in the MCP server.

**Examples**:
```
# Get the top level
get_dimension_values("nokkel", 185, "GEO")
→ [{value: "0", label: "Hele landet", child_count: 15}]

# Get municipalities in Nordland
get_dimension_values("nokkel", 185, "GEO", parent_value="18")
→ [{value: "1804", label: "Bodø", child_count: 0},
   {value: "1806", label: "Narvik", child_count: 0}, ...]

# Search for a municipality
get_dimension_values("nokkel", 185, "GEO", search="tromsø")
→ [{value: "5501", label: "Tromsø", child_count: 0}]
```

**Caching**: Shares the dimension cache with `describe_table`.

---

### Tool 5: `query_data`

**Purpose**: Fetch actual data from a table. The main data retrieval tool.

**Parameters**:
- `source_id` (string, required)
- `table_id` (integer, required)
- `dimensions` (array, required) -- Each element:
  - `code` (string) -- Dimension code, e.g. `"GEO"`.
  - `filter` (string) -- One of `"item"`, `"all"`, `"top"`, `"bottom"`.
    Default: `"item"`.
  - `values` (array of strings) -- Filter values.
- `max_rows` (integer, optional) -- Limit returned rows. Default: 1000.
  Set to 0 for no limit (be careful).

**Returns**: Structured rows with labeled values.

```
{
  "table": "Befolkningsvekst",
  "total_rows": 4,
  "rows": [
    {"GEO": "Oslo", "AAR": "2023", "KJONN": "kjønn samlet",
     "ALDER": "alle aldre", "TELLER": 3516, "RATE": 0.5},
    ...
  ],
  "truncated": false,
  "dimensions_used": {
    "GEO": {"filter": "item", "values": ["0301"]},
    "AAR": {"filter": "bottom", "values": ["2"]},
    "KJONN": {"filter": "item", "values": ["0"]},
    "ALDER": {"filter": "item", "values": ["0_120"]},
    "MEASURE_TYPE": {"filter": "all", "values": ["*"]}
  }
}
```

**Key design decisions**:

1. **Default to csv2 internally** -- Fetch as csv2 (human-readable labels) and
   parse it into rows. CSV is simpler for an agent to reason about than
   JSON-stat2.

2. **Auto-include fixed dimensions** -- If the agent omits a dimension that has
   only 1 category (like KJONN or ALDER), the tool adds it automatically with
   `filter: "item"` and the single value. The agent only needs to specify the
   dimensions it actually cares about.

3. **Normalize year values** -- If the agent sends `"2020"` for AAR, the tool
   translates it to `"2020_2020"`. The `YYYY_YYYY` format is an internal API
   convention the agent shouldn't need to know about.

4. **Default MEASURE_TYPE** -- If omitted, default to `filter: "all", values:
   ["*"]` to get all measures. Most agents want all available metrics.

5. **Row limit with truncation flag** -- Default 1000 rows. Return a
   `truncated: true` flag and a `total_rows` count so the agent knows if
   there's more data.

6. **Echo back dimensions_used** -- Show what was actually sent to the API
   (after auto-completion), so the agent can see the full query.

**Implementation**:
1. Fetch dimension info if not cached (to know fixed dimensions and validate)
2. Auto-complete omitted fixed dimensions
3. Normalize year values
4. POST `/{sourceId}/Table/{tableId}/data` with format=csv2
5. Parse the CSV response into row objects
6. Apply the row limit, compute truncation
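Steps 2 and 3 can be sketched as pure functions. The helper names are hypothetical, and the `AAR`/`MEASURE_TYPE` conventions are the ones described above:

```python
import re

def normalize_year(dim_code: str, value: str) -> str:
    """Translate a bare year like "2020" to the API's "2020_2020" convention."""
    if dim_code == "AAR" and re.fullmatch(r"\d{4}", value):
        return f"{value}_{value}"
    return value

def auto_complete(requested: dict[str, dict],
                  all_dims: dict[str, list[str]]) -> dict[str, dict]:
    """Add omitted single-valued dimensions; default MEASURE_TYPE to all/*."""
    query = dict(requested)
    for code, categories in all_dims.items():
        if code in query:
            continue
        if len(categories) == 1:
            query[code] = {"filter": "item", "values": categories}
        elif code == "MEASURE_TYPE":
            query[code] = {"filter": "all", "values": ["*"]}
    return query

dims = {"GEO": ["0", "0301"], "AAR": ["2020_2020"],
        "KJONN": ["0"], "MEASURE_TYPE": ["TELLER", "RATE"]}
q = auto_complete({"GEO": {"filter": "item", "values": ["0301"]},
                   "AAR": {"filter": "item",
                           "values": [normalize_year("AAR", "2020")]}}, dims)
# q now also contains KJONN (fixed) and MEASURE_TYPE (all/*)
```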

**Error handling**: The API returns ProblemDetails (RFC 7807) on 400/404/422.
Transform these into clear error messages:
- "Dimension 'XYZ' is not valid for this table. Available: GEO, AAR, ..."
- "Value '2025_2025' not found in dimension AAR. Range: 2002..2024"
- "maxRowCount exceeded. Requested ~50000 rows, limit is 1000. Narrow filters."

---

### Tool 6: `get_query_template`

**Purpose**: Fallback tool returning the raw query template from the API. Useful
when the agent needs to see exactly what the API expects.

**Parameters**:
- `source_id` (string, required)
- `table_id` (integer, required)

**Returns**: The raw DataRequest JSON as returned by the API.

**Implementation**: GET `/{sourceId}/Table/{tableId}/query`. Pass through.

**When to use**: When `query_data` auto-completion isn't behaving as expected,
or when the agent wants to see the complete list of available values for all
dimensions.

---

## Tools NOT included (and why)

| Considered tool               | Decision | Reason                                       |
|-------------------------------|----------|----------------------------------------------|
| `get_flags` (standalone)      | Dropped  | Folded into `describe_table`                 |
| `get_metadata` (standalone)   | Dropped  | Folded into `describe_table`                 |
| `get_table_info` (standalone) | Dropped  | Folded into `describe_table`                 |
| `search_across_sources`       | Dropped  | Too expensive (13 API calls). Agent can call `list_tables` per source |
| `get_data_jsonstat`           | Dropped  | Agents don't need raw JSON-stat2             |
| `get_data_parquet`            | Dropped  | Binary format, not useful for LLM context    |

## Architecture

### Stack

- **Language**: Python 3.12+
- **MCP framework**: FastMCP (`mcp[cli]`)
- **HTTP server**: Uvicorn (`uvicorn>=0.30`) for SSE/HTTP transport
- **HTTP client**: `httpx` (async)
- **CSV parsing**: stdlib `csv`
- **HTML stripping**: stdlib `html.parser` or `re` (simple tag removal)
- **Build system**: Hatchling (matches the Fhi.Metadata.MCPserver pattern)
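Stdlib `html.parser` is enough for stripping the HTML out of metadata fields; a sketch:

```python
from html.parser import HTMLParser

class TextExtractor(HTMLParser):
    """Collect text content, dropping tags like <p>, <a>, <ol>."""
    def __init__(self) -> None:
        super().__init__()
        self.parts: list[str] = []

    def handle_data(self, data: str) -> None:
        self.parts.append(data)

def strip_html(fragment: str) -> str:
    extractor = TextExtractor()
    extractor.feed(fragment)
    # Join fragments and collapse runs of whitespace
    return " ".join(" ".join(extractor.parts).split())

strip_html("<p>Oppdateres <b>årlig</b></p>")   # → "Oppdateres årlig"
```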

### Transport

The server supports multiple transports via CLI flag, following the same pattern
as `Fhi.Metadata.MCPserver`:

| Transport         | Use case                       | Endpoint     |
|-------------------|--------------------------------|--------------|
| `sse`             | Local dev + Skybert deployment | `/sse`       |
| `streamable-http` | Future HTTP-only clients       | `/mcp`       |
| `stdio`           | Direct pipe (legacy)           | stdin/stdout |

**Default**: `sse` on `0.0.0.0:8000`. This means the server works over HTTP
both locally and when deployed to Skybert, with no transport change needed.

**CLI entry point**:
```bash
fhi-statistikk-mcp --transport sse --host 0.0.0.0 --port 8000
```

### Project Structure

```
fhi-statistikk-mcp/
├── .github/
│   └── workflows/
│       └── docker-build-push.yaml   # CI/CD → crfhiskybert.azurecr.io
├── .mcp.json.local                  # Local dev: http://localhost:8000/sse
├── .mcp.json.public                 # Production: https://<skybert-url>/sse
├── Dockerfile                       # Multi-stage, Python 3.12-slim
├── pyproject.toml                   # Hatchling build, entry point
├── README.md
├── src/
│   └── fhi_statistikk_mcp/
│       ├── __init__.py
│       ├── server.py                # MCP server, tool definitions, main()
│       ├── api_client.py            # Async httpx client for FHI API
│       ├── transformers.py          # CSV parsing, dimension summarization
│       └── cache.py                 # Simple TTL cache
└── tests/
    ├── test_transformers.py
    ├── test_cache.py
    └── fixtures/                    # Recorded API responses
        ├── sources.json
        ├── tables_nokkel.json
        ├── dimensions_185.json
        ├── metadata_185.json
        ├── flags_185.json
        └── data_185.csv
```

### MCP Client Configuration

**Local development** (`.mcp.json.local`):
```json
{
  "mcpServers": {
    "fhi-statistikk": {
      "type": "sse",
      "url": "http://localhost:8000/sse"
    }
  }
}
```

**Production** (`.mcp.json.public`):
```json
{
  "mcpServers": {
    "fhi-statistikk": {
      "type": "sse",
      "url": "https://<skybert-url>/sse"
    }
  }
}
```

### Dockerfile

Following the Fhi.Metadata.MCPserver pattern:
```dockerfile
FROM python:3.12-slim AS base
WORKDIR /app
COPY pyproject.toml .
COPY src/ src/
RUN pip install --no-cache-dir .

FROM base AS prod
EXPOSE 8000
CMD ["fhi-statistikk-mcp", "--transport", "sse", "--host", "0.0.0.0", "--port", "8000"]
```

### CI/CD

Same pipeline pattern as Fhi.Metadata.MCPserver:
- Trigger on push to `main` touching `src/`, `Dockerfile`, or `pyproject.toml`
- Azure Federated Identity (OIDC) login
- Push to `crfhiskybert.azurecr.io/fida/ki/statistikk-mcp`
- Tag: git short SHA + `latest`
- Dispatch to the GitOps repo for Skybert deployment

### Logging

Force all loggers (uvicorn, mcp, fastmcp) to stderr with a simple format.
Print startup info (API base URL, cache status) to stderr. No persistent log
files -- container logging handles that on Skybert.

### Caching Strategy

| Data            | TTL      | Key                     | Reason                         |
|-----------------|----------|-------------------------|--------------------------------|
| Source list     | 24h      | `"sources"`             | Rarely changes                 |
| Table list      | 1h       | `source_id`             | New tables published daily     |
| Dimensions      | 6h       | `(source_id, table_id)` | Dimension structure is stable  |
| Metadata        | 6h       | `(source_id, table_id)` | Metadata edits are rare        |
| Flags           | 6h       | `(source_id, table_id)` | Flags rarely change            |
| Query templates | 6h       | `(source_id, table_id)` | Follows dimension changes      |
| Data responses  | No cache | --                      | Queries vary too much to cache |

An in-memory dict with TTL is enough. No external dependency needed -- the data
volume is small and the server is single-process.
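A minimal sketch of what `cache.py` could contain (the class name is an assumption):

```python
import time

class TTLCache:
    """Minimal in-memory TTL cache; enough for a single-process server."""
    def __init__(self) -> None:
        self._store: dict = {}

    def get(self, key):
        entry = self._store.get(key)
        if entry is None:
            return None
        expires_at, value = entry
        if time.monotonic() >= expires_at:
            del self._store[key]       # lazily evict on read
            return None
        return value

    def set(self, key, value, ttl_seconds: float) -> None:
        self._store[key] = (time.monotonic() + ttl_seconds, value)

cache = TTLCache()
cache.set(("nokkel", 185), {"dims": []}, ttl_seconds=6 * 3600)
cache.get(("nokkel", 185))   # → {"dims": []}
```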

### Rate Limiting

No documented rate limits, but this is a government API. Be polite:
- Max 5 concurrent requests
- 100ms minimum between requests
- Retry with exponential backoff on 429/503
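The first two constraints can be sketched as an async context manager (a sketch; the class name is hypothetical and retry/backoff is omitted):

```python
import asyncio
import time

class PoliteLimiter:
    """Cap concurrency and enforce a minimum gap between request starts."""
    def __init__(self, max_concurrent: int = 5, min_interval: float = 0.1) -> None:
        self._sem = asyncio.Semaphore(max_concurrent)
        self._min_interval = min_interval
        self._last_start = 0.0
        self._lock = asyncio.Lock()

    async def __aenter__(self):
        await self._sem.acquire()
        async with self._lock:         # serialize the spacing bookkeeping
            wait = self._last_start + self._min_interval - time.monotonic()
            if wait > 0:
                await asyncio.sleep(wait)
            self._last_start = time.monotonic()
        return self

    async def __aexit__(self, *exc) -> bool:
        self._sem.release()
        return False

async def demo() -> int:
    limiter = PoliteLimiter(min_interval=0.01)
    async def call(i: int) -> int:
        async with limiter:            # each request goes through the gate
            return i
    results = await asyncio.gather(*(call(i) for i in range(3)))
    return sum(results)

asyncio.run(demo())   # → 3
```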

### Error Mapping

| API Response             | MCP Tool Error                                      |
|--------------------------|-----------------------------------------------------|
| 400 Bad Request          | Descriptive message from ProblemDetails.detail      |
| 404 Not Found            | "Source/table not found: {id}"                      |
| 422 Unprocessable Entity | "Query validation failed: {detail}"                 |
| Network timeout          | "API request timed out. Try reducing query scope."  |
| CSV parse error          | "Failed to parse response. Try get_query_template." |
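The mapping can be sketched as a small function (`detail` and `title` are standard RFC 7807 fields; the exact message strings are illustrative):

```python
def to_tool_error(status: int, problem: dict) -> str:
    """Map an RFC 7807 ProblemDetails body to an agent-friendly message."""
    detail = problem.get("detail", "")
    if status == 404:
        return f"Source/table not found: {detail or problem.get('title', 'unknown')}"
    if status == 422:
        return f"Query validation failed: {detail}"
    if status == 400:
        return detail or "Bad request"
    return f"API error {status}: {detail}"

to_tool_error(422, {"title": "Unprocessable Entity",
                    "detail": "Value '2025_2025' not found in dimension AAR"})
# → "Query validation failed: Value '2025_2025' not found in dimension AAR"
```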

### Unicode / Fuzzy Search

Dimension value search (in `get_dimension_values`) normalizes both the query and
the labels for accent-insensitive matching:
- Normalize with `unicodedata.normalize("NFD")` and strip combining marks
  (handles `å` → `a`)
- Explicitly map `æ` → `a` and `ø` → `o`, since these have no NFD decomposition
- Case-insensitive comparison
- `"tromso"` matches `"Tromsø"`, `"barum"` matches `"Bærum"`
- Preserve original labels in output
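A sketch of the folding function (the explicit `æ`/`ø` replacement table is this sketch's assumption; `å` decomposes under NFD and needs no special case):

```python
import unicodedata

# æ and ø have no canonical decomposition, so strip-combining-marks
# alone would leave them intact; map them explicitly.
_EXTRA = str.maketrans({"æ": "a", "ø": "o"})

def fold(text: str) -> str:
    """Lowercase, drop Norwegian specials, then strip combining marks."""
    text = text.lower().translate(_EXTRA)
    decomposed = unicodedata.normalize("NFD", text)
    return "".join(ch for ch in decomposed if not unicodedata.combining(ch))

fold("Tromsø") == fold("tromso")   # → True
fold("Bærum") == fold("barum")     # → True
```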

## Implementation Plan

### Phase 1: Core (MVP)

1. Set up the project skeleton: `pyproject.toml` with hatchling, `src/` layout,
   entry point `fhi-statistikk-mcp`
2. Set up `server.py` with FastMCP, SSE transport, CLI args (transport, host,
   port), stderr logging
3. Implement `api_client.py` with an async httpx client and base URL config
4. Implement `cache.py` with a simple TTL dict
5. Implement the `list_sources` tool
6. Implement the `list_tables` tool with client-side keyword search
7. Implement the `describe_table` composite tool
   - Parallel fetch of 4 endpoints
   - Dimension summarization (large dimension truncation, fixed dimension
     detection)
   - HTML stripping for metadata
   - Merge into a structured response
8. Implement the `query_data` tool
   - Auto-completion of fixed dimensions
   - Year value normalization (`"2020"` → `"2020_2020"`)
   - Default MEASURE_TYPE to `all`/`["*"]`
   - CSV parsing and row structuring
   - Row limit and truncation
9. Implement `get_dimension_values` with hierarchy navigation and
   accent-insensitive search
10. Implement the `get_query_template` passthrough
11. Add `.mcp.json.local` for local dev
12. Test all tools against the live API

### Phase 2: Deployment & Polish

13. Add `Dockerfile` (multi-stage, Python 3.12-slim)
14. Add `.github/workflows/docker-build-push.yaml` for CI/CD
15. Add `.mcp.json.public` with the Skybert URL
16. Add comprehensive error handling and error messages
17. Add rate limiting
18. Record API fixtures for offline testing
19. Write unit tests for the transformers and cache
20. Write integration tests against the live API

### Phase 3: Optional Enhancements

21. Add a `search_all_tables` convenience tool (if agents frequently need it)
22. Add MCP resources for static reference data (source descriptions, common
    dimension codes)
23. Add MCP prompt templates (e.g. "finn helsedata om <topic>" -- "find health
    data about <topic>")

## Tool Description Guidelines

MCP tool descriptions are what the agent uses to decide which tool to call. They
should be written for an LLM audience:

- Lead with the purpose, not the endpoint
- Include example parameter values
- Document non-obvious conventions (year format, dimension codes)
- Mention what `describe_table` returns, since it's the prerequisite for
  `query_data`
- Note that Norwegian labels are the default (GEO labels are in Norwegian)

### Example tool description for `query_data`

> Fetch statistical data from an FHI table. Before calling this, use
> `describe_table` to understand the table's dimensions and available values.
>
> You only need to specify the dimensions you care about. Fixed dimensions
> (single-valued, like KJONN="kjønn samlet") are auto-included. If you omit
> MEASURE_TYPE, all measures are returned.
>
> Year values: use "2020" (auto-translated to "2020_2020") or the full format.
>
> Filters: "item" (exact values), "all" (wildcard, e.g. ["*"]),
> "top" (first N), "bottom" (last N).
>
> Returns labeled rows, max 1000 by default. Check the "truncated" field.

## Resolved Decisions

| Question | Decision | Rationale |
|----------|----------|-----------|
| Hosting | SSE locally, same for Skybert | Follow the Fhi.Metadata.MCPserver pattern. HTTP from day one, no transport change on deploy. |
| JSON-stat2 output | No | csv2 is sufficient for LLM agents. JSON-stat2 is for statistical software. |
| Fuzzy dimension search | Yes, accent-insensitive | Norwegian chars (æøå) will trip up agents. Normalize NFD + strip combining marks, with explicit æ/ø mappings. |
| Sample data in describe_table | No | Adds latency. Agent calls `query_data` with `max_rows=5` if it wants a preview. |