Stack

This section describes the Kair production stack. Every layer is open-source and self-hostable. No proprietary lock-in. No commercial AI APIs in production.

The choices below reflect three constraints. Raw audio must never leave the room. Inference must run locally on workshop hardware without internet access. The contracting authority must retain full operational sovereignty over their data environment.

Platform

| LAYER | CHOICE | NOTE |
| --- | --- | --- |
| Backend | Rust (multi-crate workspace) | Poem / Poem-OpenAPI, Tokio async runtime, SeaORM 2 |
| Frontend | SvelteKit 2 + Svelte 5 | TypeScript 5, Tailwind CSS, Xyflow, Mermaid |
| Primary database | PostgreSQL | Bundle state, session records, discourse schema |
| Vector store | Qdrant | HiRAG embeddings, bundle Layer 3 retrieval |
| Cache / job queue | Redis | Worker coordination, session state |
| Audio storage | S3-compatible object storage | Deleted per agreed retention window after transcription |
| Hosting | Hetzner Online GmbH | Falkenstein, ISO/IEC 27001, EU jurisdiction |

Inference

No data is transmitted to third-party AI APIs at any stage. Transcription uses OpenAI's Whisper model weights, run locally via whisper-rs. LLM inference uses Meta's Llama 3.1 weights, served via Ollama on self-hosted hardware. No API calls, no external transmission.

| STAGE | MODEL | WHERE IT RUNS |
| --- | --- | --- |
| Transcription | Whisper (ggml-large-v3) | Local workshop hardware, CUDA, whisper-rs |
| In-session sensemaking | Llama 3.1 8B | Local workshop hardware via Ollama |
| Post-session indexing | Llama 3.1 70B | Hetzner EU infrastructure via Ollama, Embers Engine |
| De-identification | spaCy + Presidio | NER model, local, before any transmission |
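The de-identification stage replaces every detected entity span before a transcript is stored or transmitted. A minimal Rust sketch of that replacement step, assuming the entity spans have already been produced upstream (the detection itself belongs to spaCy + Presidio; the span type, offsets, and labels here are illustrative):

```rust
/// A detected PII span: byte offsets into the transcript plus a label.
/// Illustrative type — the real span format comes from the NER stage.
struct EntitySpan {
    start: usize,
    end: usize,
    label: &'static str, // e.g. "PERSON", "LOCATION"
}

/// Replace each span with a "<LABEL>" placeholder, working right to
/// left so earlier byte offsets stay valid as the string changes length.
fn redact(transcript: &str, mut spans: Vec<EntitySpan>) -> String {
    spans.sort_by(|a, b| b.start.cmp(&a.start));
    let mut out = transcript.to_string();
    for s in spans {
        out.replace_range(s.start..s.end, &format!("<{}>", s.label));
    }
    out
}
```

Only the redacted string ever reaches PostgreSQL or the downstream workers; the raw transcript and audio stay on workshop hardware.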

Workers

Post-session processing runs across three standalone worker types. Each compiles to a single binary. All three communicate via shared PostgreSQL and Redis. They are deployable on any standard Linux machine with CUDA drivers, independently of the central Kair pipeline.

| WORKER | WHAT IT DOES |
| --- | --- |
| Transcription Worker | Runs natively on Linux with CUDA. Processes raw audio with whisper-rs. Writes de-identified transcript to PostgreSQL. |
| Summarisation Worker | Processes de-identified transcripts. Produces structured summary output mapped to the discourse schema. |
| Sensemaking Worker | Requires Ollama and CUDA. Runs Llama 3.1 70B for knowledge graph construction, entity extraction, community detection, and Temporal Deliberation Graph synthesis. Indexes output into bundle Layer 3 via HiRAG. |
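All three workers share the same claim-process-complete loop. A minimal sketch of that pattern, with an in-memory queue standing in for the shared PostgreSQL/Redis coordination (the job variants and names are hypothetical):

```rust
use std::collections::VecDeque;
use std::sync::{Arc, Mutex};

/// A pending post-session job. In production this record lives in
/// PostgreSQL and is coordinated via Redis; the in-memory queue below
/// is a stand-in to show the loop structure, not the real transport.
#[derive(Debug, Clone, PartialEq)]
enum Job {
    Transcribe { session_id: u64 },
    Summarise { session_id: u64 },
    Sensemake { session_id: u64 },
}

type Queue = Arc<Mutex<VecDeque<Job>>>;

/// Atomically claim one job. The Mutex plays the role that row locking
/// plays against the shared database in the real pipeline.
fn claim_next(queue: &Queue) -> Option<Job> {
    queue.lock().unwrap().pop_front()
}

/// Each worker binary runs a loop like this until the queue drains,
/// handing each claimed job to its stage-specific handler.
fn run_worker(queue: &Queue, handle: impl Fn(&Job)) -> usize {
    let mut processed = 0;
    while let Some(job) = claim_next(queue) {
        handle(&job);
        processed += 1;
    }
    processed
}
```

Because the coordination state lives entirely in the shared stores, any worker binary can be stopped, moved to another Linux machine, and restarted without touching the central pipeline.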

Discourse Ontology + Embers Engine

The discourse ontology is the layer that structures what sessions write into bundles. It is not a fixed taxonomy. It is co-designed with the contracting authority before sessions begin, and it evolves as new terms surface in participant speech.

The Embers Engine is the post-session sensemaking pipeline. It runs a Hierarchical RAG (HiRAG) pipeline over de-identified session transcripts, constructing the ontology-structured knowledge graph that becomes the bundle's Layer 3. Across sessions, it assembles the Temporal Deliberation Graph: a longitudinal record of how positions, concerns, and participant coalitions have shifted over time.
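One way to picture the Temporal Deliberation Graph's longitudinal record is as support for each ontology term tracked per session. This is a sketch with hypothetical field names, not the production schema, which follows the co-designed ontology:

```rust
use std::collections::BTreeMap;

/// Illustrative shape only: for each ontology term (a position or
/// concern), who supported it in each session, keyed by session number
/// so shifts over time can be read off directly.
#[derive(Default)]
struct TemporalDeliberationGraph {
    // term -> session number -> de-identified participant handles
    support: BTreeMap<String, BTreeMap<u32, Vec<String>>>,
}

impl TemporalDeliberationGraph {
    fn record(&mut self, term: &str, session: u32, participant: &str) {
        self.support
            .entry(term.to_string())
            .or_default()
            .entry(session)
            .or_default()
            .push(participant.to_string());
    }

    /// Net change in support for a term between two sessions.
    fn shift(&self, term: &str, from: u32, to: u32) -> i64 {
        let count = |s: u32| {
            self.support
                .get(term)
                .and_then(|m| m.get(&s))
                .map_or(0, |v| v.len() as i64)
        };
        count(to) - count(from)
    }
}
```

A query like `shift("traffic calming", 1, 4)` would then answer "did this concern gain or lose support across the workshop series?" from de-identified data alone.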

| HIRAG MODE | WHAT IT RETURNS |
| --- | --- |
| Local | Specific entities, quotes, individual positions — exact facts from a single session |
| Global | High-level themes, dominant narratives, community summaries — the overall shape of a session or bundle |
| Bridge | Reasoning paths connecting specific concerns to thematic patterns — how a detail relates to the whole |
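The three modes can be thought of as a query router. The sketch below is a toy: the enum mirrors the modes above, but the keyword heuristics are invented for illustration and are not the production retrieval logic:

```rust
/// The three HiRAG retrieval modes.
#[derive(Debug, PartialEq)]
enum HiRagMode {
    Local,  // exact facts from a single session
    Global, // themes and summaries across a session or bundle
    Bridge, // reasoning paths connecting the two levels
}

/// Hypothetical routing heuristic: relational questions go to Bridge,
/// theme-level questions to Global, everything fact-shaped to Local.
fn route(query: &str) -> HiRagMode {
    let q = query.to_lowercase();
    if q.contains("relate") || q.contains("connect") {
        HiRagMode::Bridge
    } else if q.contains("theme") || q.contains("overall") {
        HiRagMode::Global
    } else {
        HiRagMode::Local
    }
}
```

In practice the useful property is the contract, not the routing: Local answers cite exact transcript facts, Global answers summarise communities, and Bridge answers must return a path through the knowledge graph linking the two.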