Membria CE Client Runtime - Full Specification (RFC)
Status: Draft
Audience: Engineering, Product, Security
Scope: Membria CE Client runtime architecture and execution model (Cloud first, Self-Hosted later)
Related:
client_updated.md (high-level overview), diagrams.md (reference diagrams)
0. Executive summary
Membria CE Client is a runtime that sits between the user and a fleet of AI capabilities. It provides:
- A normal chat experience (the expected baseline for power users).
- A persistent intelligence layer that compounds over time: memory, GraphRAG, and expert skill adapters.
- A Decision Black Box (DBB) that extracts durable decision artifacts from noisy work streams.
- A Decision Surface (DS) that reads DBB + memory (not raw chat) to show what matters.
- A local-first execution model (when self-hosted is available), with cloud escalation and a decentralized knowledge cache to prevent intelligence reset.
1. Goals
1.1 Product goals
- Compounding intelligence: user knowledge and reasoning should accumulate instead of resetting per chat/tool/model.
- Explainability by default: answers should be traceable to sources and reasoning artifacts (citations, link chains, provenance).
- Local-first posture: run as much as possible locally (SLM + memory + GraphRAG) when self-hosted is enabled.
- Selective escalation: use expensive models only when local capability + cache + graph are insufficient.
- Decision capture at scale: extract turning points and decisions across dozens of chats/tools without forcing rituals.
- Power user throughput: minimize friction; support terminal-like visibility without turning the UI into a console.
1.2 Engineering goals
- Modular components with clear contracts.
- Deterministic tool calling (schema-validated).
- Measurable reliability: latency budgets, cache hit rates, citation coverage, escalation rate.
- Secure-by-design permissions for sources, tools, and export.
2. Non-goals
- Replace every external tool UI (Slack, Gmail, GitHub, IDEs) in v1.
- Train a frontier foundation model from scratch.
- “Perfect truth” or guaranteed correctness; instead: confidence + provenance + escalation.
- A single monolithic “one prompt to rule them all” architecture.
3. Definitions and terminology
- Client: the user-facing app + local runtime services (desktop/mobile/web, depending on platform).
- SLM: small local model used for fast, cheap reasoning (open weights).
- Council: a set of stronger LLMs used during escalation (cloud).
- KCG / Global Knowledge Cache: decentralized layer storing verified knowledge artifacts and their provenance.
- CAG: cache-augmented generation; reusing verified answers and intermediate artifacts.
- GraphRAG: retrieval using a knowledge graph + vector similarity + graph-aware ranking with citations.
- ThoughtUnit: a normalized atomic fragment extracted from sources (message/doc chunk) with metadata.
- DBB: Decision Black Box; agent/process that detects decisions and turning points and logs them structurally.
- DS: Decision Surface; the “home screen” built from DBB + memory graph.
4. System context and product lines
Membria is shipped as:
- Membria Cloud (managed service) - first release path.
- Membria Self-Hosted / Local - later release path for local/on-device execution.
- Decentralized knowledge backend - remains in the cloud and is shared across products (personal/SMB/enterprise), subject to permissions and tenancy.
5. Architecture overview
5.1 Core modules
- UI Shell: Chat, Decision Surface, Search, Sources/Files tray.
- Orchestrator (Router): chooses model path, retrieval depth, tool calls, caching strategy.
- Local stores: SQLite (transactional), DuckDB (analytics), embedding store, graph store.
- GraphRAG Engine: indexing, retrieval, ranking, citation assembly.
- DBB Engine: event stream ingestion -> extraction -> decision objects.
- Escalation Gateway: Council calls, KCG lookup, cost controls.
- Curator: verification, fusion, provenance, write-back policy.
- SkillForge: LoRA management and optional training workflow.
5.2 Execution paths
- Path A (local): UI -> Orchestrator -> SLM + GraphRAG + local memory -> answer + citations.
- Path B (cache): UI -> Orchestrator -> local cache/KCG -> answer + citations.
- Path C (escalation): UI -> Orchestrator -> Council -> Curator -> write back -> answer.
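A minimal sketch of how the three paths could be dispatched. All names here (check_cache, run_local, escalate_to_council, write_back, confidence_threshold) are illustrative placeholders, not the actual Orchestrator API; the cache-first ordering is one reasonable policy, not a mandated one.

```python
def handle_query(query: str, orchestrator) -> dict:
    """Dispatch a query along Path B (cache), Path A (local), or Path C (escalation)."""
    # Path B: try the local cache / KCG first, since a hit is the cheapest outcome.
    cached = orchestrator.check_cache(query)
    if cached is not None:
        return {"answer": cached["answer"], "citations": cached["citations"], "path": "B"}

    # Path A: local SLM + GraphRAG + local memory.
    local = orchestrator.run_local(query)
    if local["confidence"] >= orchestrator.confidence_threshold:
        return {"answer": local["answer"], "citations": local["citations"], "path": "A"}

    # Path C: escalate to the Council, curate, write back, then answer.
    curated = orchestrator.escalate_to_council(query, context=local)
    orchestrator.write_back(curated)
    return {"answer": curated["answer"], "citations": curated["citations"], "path": "C"}
```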
6. Data sources and ingestion
6.1 Supported sources (initial)
- Google Drive (docs, pdfs)
- Slack
- WhatsApp exports (where feasible)
- Chat tools: Claude Code, Codex logs (CLI traces), ChatGPT exports
- Forums/comments (import-based)
6.2 Access & scope control (must-have)
The onboarding must ask:
- which sources to connect,
- scope per source: full history / last 12 months / custom range,
- estimated volume preview (“~18k messages · ~1.2k documents · ~3 years”),
- explicit permission grants and revocation.
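A sketch of what the onboarding output could look like per source. The field names and values are illustrative assumptions, not a fixed contract.

```python
# Hypothetical per-source scope records produced by onboarding.
source_scopes = [
    {
        "source": "google_drive",
        "scope": "custom_range",            # full_history | last_12_months | custom_range
        "range": {"from": "2023-01-01", "to": "2025-06-30"},
        "estimated_volume": {"documents": 1200},
        "granted": True,                    # explicit permission; revocable at any time
    },
    {
        "source": "slack",
        "scope": "last_12_months",
        "estimated_volume": {"messages": 18000},
        "granted": True,
    },
]
```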
6.3 Ingestion stages (cost-aware)
- Structural parsing (no heavy reasoning)
  - normalize to ThoughtUnits
  - extract: timestamps, author, thread id, source URI, message type
- Cheap embeddings
  - compute embeddings per ThoughtUnit
- Semantic clustering
  - group by topic/entity/thread
- Reasoning skeleton (form, not conclusions)
  - detect: decisions, alternatives, unresolved loops, revisions, return patterns
- Conservative heuristics
  - stability flags, drift flags, missing evidence flags
Important: These stages must be observable and UI-friendly (progress, estimates), but not “terminal cosplay”.
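A minimal sketch of the first stage (structural parsing into ThoughtUnits). Field names mirror the thought_units table in section 7.1; everything else (the message dict shape, the 16-character id) is an illustrative assumption.

```python
import hashlib
from dataclasses import dataclass

@dataclass
class ThoughtUnit:
    id: str
    source_id: str
    uri: str
    type: str          # e.g. "message" or "doc_chunk"
    ts: str            # ISO-8601 timestamp
    author: str
    text: str
    hash: str          # content-addressed hash used for dedupe

def normalize_message(source_id: str, uri: str, msg: dict) -> ThoughtUnit:
    """Stage 1 (structural parsing): no reasoning, just normalization + metadata extraction."""
    text = msg["text"].strip()
    content_hash = hashlib.sha256(text.encode("utf-8")).hexdigest()
    return ThoughtUnit(
        id=content_hash[:16],
        source_id=source_id,
        uri=uri,
        type=msg.get("type", "message"),
        ts=msg["ts"],
        author=msg.get("author", "unknown"),
        text=text,
        hash=content_hash,
    )
```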
7. Storage and data model
7.1 SQLite (transactional core)
Purpose: fast local writes, sync state, metadata, event logs, permissions.
Minimum tables (suggested):
- sources(id, type, auth_ref, scope, status)
- thought_units(id, source_id, uri, type, ts, author, text_ref, hash, metadata_json)
- threads(id, source_id, external_thread_id, topic_hint)
- events(id, ts, kind, payload_json, correlation_id)
- decisions(id, ts, title, status, confidence, provenance_json)
- decision_links(decision_id, thought_unit_id, relation)
- cache_entries(key, value_json, citations_json, ttl, created_at, provenance)
- skills(id, name, domain, version, enabled, checksum)
- WAL mode
- JSON1 for flexible artifacts
- content-addressed hashes to dedupe
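A sketch of a subset of this schema bootstrapped with WAL mode. Column types and constraints are illustrative suggestions, not the final DDL.

```python
import sqlite3

conn = sqlite3.connect("membria.db")
conn.execute("PRAGMA journal_mode=WAL;")   # WAL mode for fast local writes

# Subset of the suggested schema; types/constraints are illustrative.
conn.executescript("""
CREATE TABLE IF NOT EXISTS sources (
    id TEXT PRIMARY KEY, type TEXT, auth_ref TEXT, scope TEXT, status TEXT
);
CREATE TABLE IF NOT EXISTS thought_units (
    id TEXT PRIMARY KEY, source_id TEXT REFERENCES sources(id),
    uri TEXT, type TEXT, ts TEXT, author TEXT,
    text_ref TEXT, hash TEXT UNIQUE,       -- UNIQUE hash gives content-addressed dedupe
    metadata_json TEXT                     -- flexible artifacts via JSON1
);
CREATE TABLE IF NOT EXISTS decisions (
    id TEXT PRIMARY KEY, ts TEXT, title TEXT, status TEXT,
    confidence REAL, provenance_json TEXT
);
CREATE TABLE IF NOT EXISTS decision_links (
    decision_id TEXT REFERENCES decisions(id),
    thought_unit_id TEXT REFERENCES thought_units(id),
    relation TEXT
);
""")
conn.commit()
```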
7.2 DuckDB (analytics + batch)
Purpose: fast scans, clustering analytics, retention metrics, DS metrics.
7.3 Embeddings store
Options:
- local vector store (Qdrant embedded / sqlite-vss / pgvector in self-hosted)
- store embeddings in DuckDB extensions (if acceptable)
7.4 Graph store
GraphRAG needs:
- entity nodes, relation edges, timestamps, source provenance
- ability to run explainable queries (“show path from A to B through evidence”)
- local: TuGraph (if an embedded mode exists) or an alternative graph DB
- accelerated: GPU-based graph analytics (cuGraph) for ranking/centrality, where feasible
Note: GPU acceleration should be treated as an optimization layer, not a hard dependency.
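A sketch of the "explainable query" requirement, using networkx purely for illustration; it is not tied to the eventual graph backend, and the node/edge names are invented.

```python
import networkx as nx

# Toy graph: each edge carries a relation and the provenance of the evidence behind it.
g = nx.DiGraph()
g.add_edge("ServiceA", "PostgreSQL", relation="depends_on", provenance="thought_unit:abc123")
g.add_edge("PostgreSQL", "Migration-42", relation="changed_by", provenance="thought_unit:def456")

def explain_path(graph: nx.DiGraph, a: str, b: str) -> list[dict]:
    """Show the path from A to B through evidence, edge by edge."""
    path = nx.shortest_path(graph, a, b)
    return [{"from": u, "to": v, **graph.edges[u, v]} for u, v in zip(path, path[1:])]

print(explain_path(g, "ServiceA", "Migration-42"))
```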
8. Orchestrator: routing policy
8.1 Inputs
- user message + mode (Chat / DS / Search)
- active context (selected card, selected sources, selected timeframe)
- cost policy (user tier, budget)
- latency policy (interactive vs background)
- safety policy (tool permissions)
8.2 Outputs
- selected execution plan: local/cache/escalate
- retrieval depth: shallow/standard/deep
- tool call plan (schemas)
- caching plan (keys, TTL)
- provenance requirements (citations mandatory or optional)
8.3 Self-knowledge checkpoint (minimum viable)
The local model must answer:
- "Do I have grounded evidence?"
- “Do I have enough context from GraphRAG?”
- “Is my uncertainty too high?”
- “Is the user asking for expert-level domain correctness?”
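A minimal sketch of this checkpoint as a boolean gate. The field names, thresholds, and the "high-stakes domain always escalates" rule are illustrative assumptions, not the orchestrator contract.

```python
def self_knowledge_checkpoint(local_result: dict, policy: dict) -> bool:
    """Return True if the local answer is good enough; False means escalate."""
    has_evidence = len(local_result.get("citations", [])) > 0
    enough_context = local_result.get("retrieved_chunks", 0) >= policy.get("min_chunks", 3)
    confident = local_result.get("confidence", 0.0) >= policy.get("confidence_threshold", 0.7)
    expert_domain = local_result.get("domain") in policy.get("high_stakes_domains", set())
    # Escalate when evidence, context, or confidence is lacking, or when the domain is high-stakes.
    return has_evidence and enough_context and confident and not expert_domain
```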
9. Council escalation and knowledge cache
9.1 Escalation triggers
- confidence below threshold
- missing citations for required domains
- conflicts detected in local graph
- user explicitly requests “verify / deep check”
- high-stakes domains (configurable)
9.2 Council workflow
- KCG lookup first (avoid recompute)
- If miss: query Council models with task decomposition
- Curator fuses responses (agreement, evidence coverage, conflict detection)
- Produce:
  - final answer
  - citations + provenance bundle
  - reusable artifacts (snippets, entity/relation updates)
- Write back to:
  - local cache (CAG)
  - local graph updates
  - optional KCG write (tenancy rules)
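A sketch of this loop end to end. The kcg, council, curator, local_cache, and graph objects are placeholder interfaces (their methods are invented for illustration), and task decomposition is collapsed into a single call per Council model.

```python
def escalate(query: str, kcg, council, curator, local_cache, graph):
    # 1. KCG lookup first: reuse verified knowledge instead of recomputing.
    hit = kcg.lookup(query)
    if hit is not None:
        return hit

    # 2. On a miss, query the Council models (task decomposition elided here).
    responses = [model.answer(query) for model in council.models]

    # 3. Curator fuses responses: agreement, evidence coverage, conflict detection.
    fused = curator.fuse(query, responses)

    # 4. Write back: local cache (CAG), local graph updates, optional KCG write.
    local_cache.put(query, fused["answer"], citations=fused["citations"], ttl=fused["ttl"])
    graph.apply_updates(fused["entity_relation_updates"])
    if fused["tenancy_allows_kcg_write"]:
        kcg.write(query, fused)

    return fused
```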
9.3 Cost controls
- mode-based compute budgets (e.g., escalation/verification modes cost more than interactive chat)
- per-user daily cap + burst allowance
- “verify only if needed” toggles
- caching aggressively on common queries
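A minimal sketch of the daily cap plus burst allowance. The default numbers and the simple counter reset are illustrative, not shipped policy.

```python
from datetime import date

class EscalationBudget:
    def __init__(self, daily_cap: int = 20, burst_allowance: int = 5):
        self.daily_cap = daily_cap
        self.burst_allowance = burst_allowance
        self._day = date.today()
        self._used = 0

    def allow(self) -> bool:
        today = date.today()
        if today != self._day:              # reset the counter at the start of a new day
            self._day, self._used = today, 0
        if self._used < self.daily_cap + self.burst_allowance:
            self._used += 1
            return True
        return False                        # over budget: fall back to local/cache paths
```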
10. Skill system (LoRA patches)
10.1 What LoRA is in Membria
LoRA patches are small expert adapters that improve performance on:
- domain vocab
- task formatting
- reasoning style constraints
- codebase conventions (for coding use cases)
10.2 Critical caution (do not oversell)
LoRA does not magically create intelligence. It can:
- reduce hallucinations in narrow domains (by shaping priors),
- improve adherence to conventions,
- improve recall over a domain distribution.
It can also:
- overfit to low-quality artifacts,
- entrench wrong patterns,
- degrade general performance.
Therefore LoRA updates must be:
- curated (only from verified fused artifacts),
- scoped (per domain/task),
- reversible (easy disable/rollback),
- measurable (A/B eval on held-out tasks).
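A sketch of the "reversible + measurable" gate. The evaluate interface, the skill dict shape, and the 0.02 improvement margin are illustrative assumptions.

```python
def maybe_enable_skill(skill: dict, baseline_score: float, evaluate) -> bool:
    """Enable a LoRA adapter only if it beats the base model on a held-out eval; otherwise roll back."""
    candidate_score = evaluate(skill["checksum"])     # A/B eval on held-out domain tasks
    if candidate_score >= baseline_score + 0.02:      # require a measurable improvement margin
        skill["enabled"] = True
        return True
    skill["enabled"] = False                          # easy disable / rollback path
    return False
```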
10.3 How LoRAs “accumulate”
Correct framing:
- Membria accumulates verified training artifacts from the Council + cache.
- Only recurring gaps with stable evaluation are candidates for LoRA updates.
11. Decision Black Box (DBB)
11.1 DBB mission
Across dozens of chats and tools, DBB extracts:
- "what changed"
- “what got decided”
- “why it was decided”
- “what to check later” (outcomes)
11.2 DBB detection heuristics (minimal)
- commit-like language: “we decided”, “let’s ship”, “final”, “go with”
- change events: PR merged, config changed, dependency swapped
- repeated unresolved topics
- confidence markers: % confidence, risk statements, tradeoffs
- escalation events and Council outputs
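A sketch of the cheapest of these heuristics (commit-like language and confidence markers) as a first-pass gate. The phrase list and regexes are illustrative and would be tuned, not hard-coded like this.

```python
import re

COMMIT_PATTERNS = re.compile(
    r"\b(we decided|let's ship|final|go with|ship it)\b", re.IGNORECASE
)
CONFIDENCE_PATTERNS = re.compile(r"\b\d{1,3}%\s*(confidence|sure)\b", re.IGNORECASE)

def looks_like_decision(text: str) -> bool:
    """Cheap heuristic gate that runs before any heavier extraction."""
    return bool(COMMIT_PATTERNS.search(text) or CONFIDENCE_PATTERNS.search(text))
```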
11.3 DBB output object (example)
A Decision object should include:
- title (human readable)
- timestamp (first commit and closure)
- status: proposed / pending / decided / reviewed / invalidated
- reasoning summary (structured)
- assumptions list (each with evidence)
- evidence links (thought units, PRs, docs)
- outcome hooks (what to check, when)
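An example payload with these fields. The field names follow the list above; the values are invented purely for illustration.

```python
decision = {
    "title": "Adopt PostgreSQL for the billing service",
    "timestamp": {"first_commit": "2025-03-02T10:14:00Z", "closure": "2025-03-05T16:40:00Z"},
    "status": "decided",            # proposed | pending | decided | reviewed | invalidated
    "reasoning_summary": "Chosen over MySQL for JSONB support and existing team expertise.",
    "assumptions": [
        {"claim": "Write volume stays under 5k TPS", "evidence": ["thought_unit:abc123"]},
    ],
    "evidence_links": ["thought_unit:abc123", "pr:341", "doc:billing-design-v2"],
    "outcome_hooks": [
        {"check": "p95 write latency after launch", "when": "2025-06-01"},
    ],
}
```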
12. Decision Surface (DS)
12.1 Why DS is the home screen
Power users drown in context. DS answers:
- "What is open?"
- “Where did I drift?”
- “What must be decided?”
- "What did we decide before, and how did it turn out?"
12.2 DS reads DBB, not chat
DS is built from:
- decisions table + links
- stability flags
- unresolved loops
- precedent links
- outcome tracking state
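A sketch of one DS card backed by the decisions table from section 7.1; DS reads decision records, never raw chat. The status filter for "open loops" is an illustrative choice.

```python
import sqlite3

def open_loops(conn: sqlite3.Connection) -> list[tuple]:
    """Return decisions that are still open, newest first, for the 'Open loops' card."""
    return conn.execute(
        """
        SELECT d.id, d.title, d.ts
        FROM decisions d
        WHERE d.status IN ('proposed', 'pending')
        ORDER BY d.ts DESC
        """
    ).fetchall()
```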
12.3 DS cards (safer, less scary semantics)
Avoid jargon as defaults. Prefer "normal people" labels:
- Open loops
- Decisions awaiting sign-off
- Assumptions that changed
- Similar past choices (precedents)
- Missing evidence / unclear
13. UI behaviors relevant to runtime
13.1 Modes
- Chat: primary interaction surface
- Decision Surface: daily dashboard derived from DBB
- Search: explainable retrieval with citations and link chains
13.2 Source handling (files, links, snippets)
Files are never "silently dumped into context". Instead:
- the user adds sources into a Sources Tray
- each source has scope: include/ignore, time range, permissions
- retrieval uses citations; the model sees only retrieved chunks
13.3 Speed UX (non-negotiable)
- show partial progress (“retrieving”, “verifying”, “caching”)
- allow user to continue interacting while deep retrieval runs
- persist results into DS artifacts when relevant
14. Observability and metrics
14.1 Required telemetry (privacy-safe)
- latency by stage (route, retrieval, rank, generate, verify)
- cache hit ratio (local + KCG)
- escalation rate
- citation coverage rate
- DBB extraction rate + false positive rate
- DS engagement (cards opened, decisions reviewed)
14.2 Debug views for power users
Optional "verbose panel":
- shows the plan and stages
- never exposes sensitive tokens/keys
- can be toggled per workspace
15. Security model
- explicit source authorization and revocation
- encryption at rest for local stores (optional but recommended)
- key management: OS keychain where possible
- tool permission prompts: first-use + per-scope
- sandboxed skill execution and export controls
16. Deployment modes
16.1 Cloud first
- client UI + thin local cache
- orchestrator mostly server-side
- GraphRAG and DBB can run as managed services
16.2 Self-hosted later
- local orchestrator + local stores + local SLM
- optional offline mode
- cloud used for KCG sync + Council escalation (optional)
17. Roadmap (runtime-facing)
- Cloud MVP: Chat + Sources + explainable retrieval + caching.
- DBB MVP: decision extraction from chat + minimal DS.
- Council + Curator + KCG caching loop.
- Self-hosted runtime: local SLM + local stores + local GraphRAG.
- SkillForge: curated LoRA updates + evaluation harness.
- GPU acceleration: optional graph ranking and embedding speedups.
18. Open questions
- Best graph store for embedded self-hosted mode?
- KCG tenancy: what is shared vs private by default?
- DBB false positives: best UI pattern for corrections without friction?
- LoRA lifecycle: distribution, rollback, evaluation dataset governance.