Stack
This section describes the Kair production stack. Every layer is open-source and
self-hostable. No proprietary lock-in. No commercial AI APIs in production.
The choices below reflect three constraints. Raw audio must never leave the room. Inference
must run locally on workshop hardware without internet access. The contracting authority
must retain full operational sovereignty over their data environment.
Platform
Backend: Rust (multi-crate workspace). Poem / Poem-OpenAPI, Tokio async runtime, SeaORM 2.
Frontend: SvelteKit 2 + Svelte 5. TypeScript 5, Tailwind CSS, Xyflow, Mermaid.
Primary database: PostgreSQL. Bundle state, session records, discourse schema.
Vector store: Qdrant. HiRAG embeddings, bundle Layer 3 retrieval.
Cache / job queue: Redis. Worker coordination, session state.
Audio storage: S3-compatible object storage. Deleted per the agreed retention window after transcription.
Hosting: Hetzner Online GmbH. Falkenstein, ISO/IEC 27001, EU jurisdiction.
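The audio-retention rule above can be sketched as a small policy check. This is a minimal std-only sketch; the struct name and the 72-hour window are illustrative assumptions, not Kair's actual values:

```rust
use std::time::{Duration, SystemTime};

/// Hypothetical retention policy: raw audio becomes eligible for deletion
/// once the agreed window has elapsed after transcription completed.
/// (The 72-hour default below is illustrative, not Kair's actual window.)
struct RetentionPolicy {
    window: Duration,
}

impl RetentionPolicy {
    fn is_expired(&self, transcribed_at: SystemTime, now: SystemTime) -> bool {
        match now.duration_since(transcribed_at) {
            Ok(elapsed) => elapsed >= self.window,
            // Transcription timestamp lies in the future (clock skew): keep the object.
            Err(_) => false,
        }
    }
}

fn main() {
    let policy = RetentionPolicy { window: Duration::from_secs(72 * 3600) };
    let transcribed = SystemTime::now() - Duration::from_secs(100 * 3600);
    println!("eligible for deletion: {}", policy.is_expired(transcribed, SystemTime::now()));
}
```

A deletion job in the object store would run this check per audio object rather than relying on bucket-level lifecycle rules alone.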
Inference
No data is transmitted to third-party AI APIs at any stage. Transcription uses OpenAI's
Whisper model weights run locally via whisper-rs. LLM inference uses Meta's Llama 3.1
weights run locally via Ollama. No API call, no external transmission.
Transcription: Whisper (ggml-large-v3). Local workshop hardware, CUDA, whisper-rs.
In-session sensemaking: Llama 3.1 8B. Local workshop hardware via Ollama.
Post-session indexing: Llama 3.1 70B. Hetzner EU infrastructure via Ollama, Embers Engine.
De-identification: spaCy + Presidio. NER models, run locally before any transmission.
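For illustration, this is roughly the request shape the in-session sensemaking step would send to a local Ollama daemon on its default port (11434). The helper name, prompt, and hand-built JSON string are assumptions for the sketch; a real implementation would use a JSON library with proper escaping:

```rust
/// Illustrative only: the body of a request to a *local* Ollama daemon's
/// /api/generate endpoint. No third-party endpoint is involved; the model
/// name and prompt are placeholders. `stream: false` asks Ollama for a
/// single response object instead of a token stream.
fn ollama_generate_body(model: &str, prompt: &str) -> String {
    // NOTE: no JSON escaping of `prompt` here; this is a sketch, not
    // production code.
    format!(r#"{{"model":"{}","prompt":"{}","stream":false}}"#, model, prompt)
}

fn main() {
    let body = ollama_generate_body("llama3.1:8b", "Summarise the last exchange.");
    // Would be POSTed to http://localhost:11434/api/generate on the workshop machine.
    println!("{body}");
}
```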
Workers
Post-session processing runs across three standalone worker types. Each compiles to a single
binary. All three communicate via shared PostgreSQL and Redis. They are deployable on any
standard Linux machine with CUDA drivers, independently of the central Kair pipeline.
Transcription Worker
Runs natively on Linux with CUDA. Processes raw audio with whisper-rs and writes the de-identified transcript to PostgreSQL.
Summarisation Worker
Processes de-identified transcripts. Produces structured summary output mapped to the discourse schema.
Sensemaking Worker
Requires Ollama and CUDA. Runs Llama 3.1 70B for knowledge graph construction, entity extraction, community detection, and Temporal Deliberation Graph synthesis. Indexes output into bundle Layer 3 via HiRAG.
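The hand-off between the three workers can be sketched as a linear stage machine. The stage names are illustrative assumptions; in production the coordination runs through shared PostgreSQL and Redis rather than in-process:

```rust
/// Sketch of the post-session pipeline implied above, assuming a simple
/// linear hand-off between the three worker types.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
enum Stage {
    AwaitingTranscription, // raw audio still in object storage
    AwaitingSummary,       // de-identified transcript written to PostgreSQL
    AwaitingSensemaking,   // structured summary ready for the 70B pass
    Indexed,               // knowledge graph written into bundle Layer 3
}

/// Each worker claims jobs in its input stage and advances them by one step.
fn advance(stage: Stage) -> Stage {
    match stage {
        Stage::AwaitingTranscription => Stage::AwaitingSummary,
        Stage::AwaitingSummary => Stage::AwaitingSensemaking,
        Stage::AwaitingSensemaking => Stage::Indexed,
        Stage::Indexed => Stage::Indexed, // terminal state
    }
}

fn main() {
    let mut stage = Stage::AwaitingTranscription;
    while stage != Stage::Indexed {
        stage = advance(stage);
    }
    println!("{stage:?}");
}
```

Because each worker only reads its input stage and writes the next one, any of the three binaries can be restarted or scaled independently without losing pipeline position.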
Discourse Ontology + Embers Engine
The discourse ontology is the layer that structures what sessions write into bundles. It is
not a fixed taxonomy. It is co-designed with the
contracting authority before sessions begin, and it evolves as new terms surface in
participant speech.
The Embers Engine is the post-session sensemaking pipeline. It runs a Hierarchical RAG
(HiRAG) pipeline over de-identified session transcripts, constructing the ontology-structured
knowledge graph that becomes the bundle's Layer 3. Across sessions, it assembles the
Temporal Deliberation Graph: a longitudinal record of how positions, concerns, and
participant coalitions have shifted over time.
HiRAG retrieval operates at three levels:
Local: specific entities, quotes, individual positions — exact facts from a single session.
Global: high-level themes, dominant narratives, community summaries — the overall shape of a session or bundle.
Bridge: reasoning paths connecting specific concerns to thematic patterns — how a detail relates to the whole.
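The three levels can be sketched as a retrieval-routing enum. The keyword heuristic below is a hypothetical stand-in for illustration only; the Embers Engine routes queries via the knowledge graph, not string matching:

```rust
/// The three HiRAG retrieval layers described above.
#[derive(Debug, PartialEq, Eq)]
enum HiragLayer {
    Local,  // exact facts from a single session
    Global, // themes and community summaries
    Bridge, // reasoning paths from detail to theme
}

/// Toy router: picks a layer from surface features of the query.
/// (Stand-in heuristic, not the Embers Engine's actual logic.)
fn route(query: &str) -> HiragLayer {
    let q = query.to_lowercase();
    if q.contains("who said") || q.contains("quote") {
        HiragLayer::Local
    } else if q.contains("theme") || q.contains("overall") {
        HiragLayer::Global
    } else {
        HiragLayer::Bridge
    }
}

fn main() {
    println!("{:?}", route("What is the overall theme of session 4?"));
}
```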