
Architecture

LiteJoin packs an entire stream processing engine into a single Go binary. There are no external dependencies — no message broker, no database server, no coordinator. Everything runs in-process, backed by SQLite.

Engine Overview

┌──────────────────────────────────────────────────────────────┐
│                        LiteJoin Engine                       │
│                                                              │
│  ┌─────────┐ ┌──────────┐ ┌──────────┐                       │
│  │   API   │ │   HTTP   │ │  Kafka   │    Sources            │
│  │ Source  │ │  Source  │ │  Source  │                       │
│  └────┬────┘ └────┬─────┘ └────┬─────┘                       │
│       │           │            │                             │
│       └───────────┴──────┬─────┘                             │
│                          ▼                                   │
│                     ┌─────────┐                              │
│                     │  msgCh  │    Message Channel           │
│                     └────┬────┘                              │
│                          ▼                                   │
│                     ┌─────────┐                              │
│                     │ Writer  │    Batched Writes            │
│                     └────┬────┘                              │
│                          ▼                                   │
│              ┌───────────────────────┐                       │
│              │  Sharded SQLite (WAL) │    Storage            │
│              │  shard_0 ... shard_N  │                       │
│              └───────────┬───────────┘                       │
│                          │                                   │
│              ┌───────────┼───────────┐                       │
│              ▼           ▼           ▼                       │
│         ┌─────────┐ ┌─────────┐ ┌──────────┐                 │
│         │ Joiner  │ │Windower │ │Retention │                 │
│         └────┬────┘ └────┬────┘ │ Cleaner  │                 │
│              │           │      └──────────┘                 │
│              └─────┬─────┘                                   │
│                    ▼                                         │
│         ┌──────────────────────┐                             │
│         │       Sinks          │                             │
│         │  HTTP │ Kafka │ SSE  │                             │
│         └──────────────────────┘                             │
└──────────────────────────────────────────────────────────────┘

SQLite Storage

LiteJoin uses SQLite in WAL (Write-Ahead Logging) mode as its primary data store. Every topic maps to a SQLite table with key, payload, and timestamp columns.

Sharding

Data is distributed across multiple SQLite databases by hashing the message key with FNV and taking the result modulo the shard count:
shard_index = fnv32(message.key) % shard_count
The default shard count is 8. Each shard is a separate .db file:
data/
├── shard_0.db
├── shard_1.db
├── ...
└── shard_7.db

Write Path

Each shard has a single writer connection, so writers never contend with one another for a shard's write lock. The Writer component batches messages and flushes a batch every flush_interval (default: 10ms) or as soon as it holds batch_size messages (default: 1000), whichever comes first. Writes use SQLite’s INSERT OR REPLACE, so a row whose key already exists is replaced and the most recently written value for a key wins.

Read Path

A configurable pool of reader connections (default: 4 per shard) handles queries from the joiner, windower, and snapshot API. SQLite’s WAL mode provides snapshot isolation — readers see a consistent state even while writes are in progress.

Retention

A background goroutine periodically deletes data older than the configured retention duration:
retention:
  duration: 24h
  clean_interval: 1m
This keeps SQLite databases small and queries fast. For longer-term data, see Tiered Storage.

Tiered Storage (Optional)

When enabled, LiteJoin compacts expired data into Parquet files before deleting it from SQLite. These files are queryable via an embedded DuckDB instance and can optionally be uploaded to cloud object storage (S3, GCS, Azure Blob):
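| Tier | Storage              | Latency         | Retention                      |
|------|----------------------|-----------------|--------------------------------|
| Hot  | SQLite (sharded)     | Sub-millisecond | Configured TTL (e.g., 1h–24h)  |
| Warm | Local Parquet files  | Milliseconds    | Configurable (default: 7 days) |
| Cold | Cloud object storage | Seconds         | Indefinite                     |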
The hot path has zero overhead from tiered storage — DuckDB is only initialized when the feature is enabled and only used for historical queries.

Replication (Optional)

LiteJoin integrates with Litestream for continuous backup of SQLite shards to cloud object storage. This enables:
  • Disaster recovery — restore shards from S3/GCS on startup
  • Ephemeral deployments — containers can boot, restore state, and resume processing
  • Point-in-time recovery — roll back to any point within the retention window
See Replication for setup details.

At-Least-Once Delivery

When enabled, failed sink deliveries are captured in a dead-letter queue (DLQ) backed by a dedicated SQLite database. A background retry worker redelivers messages with exponential backoff:
Sink fails → Enqueue to DLQ → Retry with backoff → Ack on success
See Delivery Guarantees for configuration.

LiteJoin Studio

LiteJoin Studio is a Wails desktop application (Go backend + React/Vite frontend) that embeds the LiteJoin engine directly. It provides:
  • Visual pipeline builder — add sources, configure joins and windows via UI
  • Live SQL editor — write queries with autocomplete and see results stream in real-time
  • One-click deploy — export a production-ready litejoin.yaml configuration
Studio communicates with the embedded engine via Wails bindings — no network overhead, no separate server process.