Stack
This section describes the Kair production stack. Every layer is open-source and
self-hostable. No proprietary lock-in. No commercial AI APIs in production.
The choices below reflect three constraints. Raw audio must never leave the room. Inference
must run locally on workshop hardware without internet access. The contracting authority
must retain full operational sovereignty over their data environment.
Platform
Backend: Rust (multi-crate workspace). Poem / Poem-OpenAPI, Tokio async runtime, SeaORM 2.
Frontend: SvelteKit 2 + Svelte 5. TypeScript 5, Tailwind CSS, Xyflow, Mermaid.
Primary database: PostgreSQL. Bundle state, session records, discourse schema.
Vector store: Qdrant. HiRAG embeddings, bundle Layer 3 retrieval.
Cache / job queue: Redis. Worker coordination, session state.
Audio storage: S3-compatible object storage. Deleted per the agreed retention window after transcription.
Hosting: Hetzner Online GmbH. Falkenstein, ISO/IEC 27001, EU jurisdiction.
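The audio-retention rule above can be sketched as a small policy check. This is a minimal std-only sketch; the struct name and the 72-hour window are illustrative assumptions, not Kair's actual values:

```rust
use std::time::{Duration, SystemTime};

/// Hypothetical retention policy: raw audio becomes eligible for deletion
/// once the agreed window has elapsed after transcription completed.
/// (The 72-hour default below is illustrative, not Kair's actual window.)
struct RetentionPolicy {
    window: Duration,
}

impl RetentionPolicy {
    fn is_expired(&self, transcribed_at: SystemTime, now: SystemTime) -> bool {
        match now.duration_since(transcribed_at) {
            Ok(elapsed) => elapsed >= self.window,
            // Transcription timestamp lies in the future (clock skew): keep the object.
            Err(_) => false,
        }
    }
}

fn main() {
    let policy = RetentionPolicy { window: Duration::from_secs(72 * 3600) };
    let transcribed = SystemTime::now() - Duration::from_secs(100 * 3600);
    println!("eligible for deletion: {}", policy.is_expired(transcribed, SystemTime::now()));
}
```

A deletion job in the object store would run this check per audio object rather than relying on bucket-level lifecycle rules alone.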
Inference
No data is transmitted to third-party AI APIs at any stage. Transcription uses OpenAI's
Whisper model weights run locally via whisper-rs. LLM inference uses Meta's Llama 3.1
weights run locally via Ollama. No API call, no external transmission.
Transcription: Whisper (ggml-large-v3). Local workshop hardware, CUDA, whisper-rs.
In-session sensemaking: Llama 3.1 8B. Local workshop hardware via Ollama.
Post-session indexing: Llama 3.1 70B. Hetzner EU infrastructure via Ollama, Embers Engine.
De-identification: spaCy + Presidio. NER models, run locally before any transmission.
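For illustration, this is roughly the request shape the in-session sensemaking step would send to a local Ollama daemon on its default port (11434). The helper name, prompt, and hand-built JSON string are assumptions for the sketch; a real implementation would use a JSON library with proper escaping:

```rust
/// Illustrative only: the body of a request to a *local* Ollama daemon's
/// /api/generate endpoint. No third-party endpoint is involved; the model
/// name and prompt are placeholders. `stream: false` asks Ollama for a
/// single response object instead of a token stream.
fn ollama_generate_body(model: &str, prompt: &str) -> String {
    // NOTE: no JSON escaping of `prompt` here; this is a sketch, not
    // production code.
    format!(r#"{{"model":"{}","prompt":"{}","stream":false}}"#, model, prompt)
}

fn main() {
    let body = ollama_generate_body("llama3.1:8b", "Summarise the last exchange.");
    // Would be POSTed to http://localhost:11434/api/generate on the workshop machine.
    println!("{body}");
}
```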
Workers
Post-session processing runs across three standalone worker types. Each compiles to a single
binary. All three communicate via shared PostgreSQL and Redis. They are deployable on any
standard Linux machine with CUDA drivers, independently of the central Kair pipeline.
Transcription Worker
Runs natively on Linux with CUDA. Processes raw audio with whisper-rs and writes the de-identified transcript to PostgreSQL.
Summarisation Worker
Processes de-identified transcripts. Produces structured summary output mapped to the discourse schema.
Sensemaking Worker
Requires Ollama and CUDA. Runs Llama 3.1 70B for knowledge graph construction, entity extraction, community detection, and Temporal Deliberation Graph synthesis. Indexes output into bundle Layer 3 via HiRAG.
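The hand-off between the three workers can be sketched as a linear stage machine. The stage names are illustrative assumptions; in production the coordination runs through shared PostgreSQL and Redis rather than in-process:

```rust
/// Sketch of the post-session pipeline implied above, assuming a simple
/// linear hand-off between the three worker types.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
enum Stage {
    AwaitingTranscription, // raw audio still in object storage
    AwaitingSummary,       // de-identified transcript written to PostgreSQL
    AwaitingSensemaking,   // structured summary ready for the 70B pass
    Indexed,               // knowledge graph written into bundle Layer 3
}

/// Each worker claims jobs in its input stage and advances them by one step.
fn advance(stage: Stage) -> Stage {
    match stage {
        Stage::AwaitingTranscription => Stage::AwaitingSummary,
        Stage::AwaitingSummary => Stage::AwaitingSensemaking,
        Stage::AwaitingSensemaking => Stage::Indexed,
        Stage::Indexed => Stage::Indexed, // terminal state
    }
}

fn main() {
    let mut stage = Stage::AwaitingTranscription;
    while stage != Stage::Indexed {
        stage = advance(stage);
    }
    println!("{stage:?}");
}
```

Because each worker only reads its input stage and writes the next one, any of the three binaries can be restarted or scaled independently without losing pipeline position.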
Discourse Ontology + Embers Engine
The discourse ontology is the layer that structures what sessions write into bundles. It is
not a fixed taxonomy. It is co-designed with the
contracting authority before sessions begin, and it evolves as new terms surface in
participant speech.
The Embers Engine is the post-session sensemaking pipeline. It runs a Hierarchical RAG
(HiRAG) pipeline over de-identified session transcripts, constructing the ontology-structured
knowledge graph that becomes the bundle's Layer 3. Across sessions, it assembles the
Temporal Deliberation Graph: a longitudinal record of how positions, concerns, and
participant coalitions have shifted over time.
HiRAG retrieval operates at three levels:
Local: specific entities, quotes, individual positions — exact facts from a single session.
Global: high-level themes, dominant narratives, community summaries — the overall shape of a session or bundle.
Bridge: reasoning paths connecting specific concerns to thematic patterns — how a detail relates to the whole.
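The three levels can be sketched as a retrieval-routing enum. The keyword heuristic below is a hypothetical stand-in for illustration only; the Embers Engine routes queries via the knowledge graph, not string matching:

```rust
/// The three HiRAG retrieval layers described above.
#[derive(Debug, PartialEq, Eq)]
enum HiragLayer {
    Local,  // exact facts from a single session
    Global, // themes and community summaries
    Bridge, // reasoning paths from detail to theme
}

/// Toy router: picks a layer from surface features of the query.
/// (Stand-in heuristic, not the Embers Engine's actual logic.)
fn route(query: &str) -> HiragLayer {
    let q = query.to_lowercase();
    if q.contains("who said") || q.contains("quote") {
        HiragLayer::Local
    } else if q.contains("theme") || q.contains("overall") {
        HiragLayer::Global
    } else {
        HiragLayer::Bridge
    }
}

fn main() {
    println!("{:?}", route("What is the overall theme of session 4?"));
}
```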