Mu TSDB PDD: Gorilla-inspired Features (Design Intake)
Date: 2026-02-25
Context
Mu is a lean TSDB built for three hard goals:
- Blazing writes
- Blazing queries
- Never drop a metric
This document captures implementation ideas worth "ripping" from Meta/Facebook's Gorilla TSDB design and paper, adapted to Mu's goals and constraints.
Primary reference: "Gorilla: A Fast, Scalable, In-Memory Time Series Database" (Teller et al.). https://www.vldb.org/pvldb/vol8/p1816-teller.pdf
Non-goals (for this PDD)
- Prometheus/Influx/OTEL wire compatibility
- Full distributed control-plane design (Paxos shard assignment, etc.) in the near term
- Exact reproduction of Gorilla's architecture (Mu can diverge where it helps correctness or simplicity)
Key decision: "never drop a metric" vs Gorilla availability tradeoffs
Gorilla explicitly tolerates dropping a small amount of recent data under some failure/overload situations to preserve availability. Mu's "never drop a metric" goal implies different defaults:
- The default write contract should be "accepted only after durable enough to survive process crash."
- Under overload, Mu should prefer backpressure (block) over dropping or rejecting with 5xx.
- If we ever add a "best-effort" write mode, it must be explicit and opt-in.
Design ideas to adopt (with Mu adaptation and phase)
1) Delta-of-delta timestamp bitpacking (per-series)
What Gorilla did:
- Encode timestamps as delta-of-delta with variable-length bitpacking inside a compressed block.
Why we want it:
- Large reduction in bytes per point, which cuts write amplification and memory footprint.
- Faster query scans due to less memory bandwidth.
Mu adaptation:
- Implement a per-series encoder for `ts_ms` using Gorilla-style delta-of-delta.
- Keep the decoder streaming-friendly so aggregates can consume without materializing points.
Phase:
- v1 storage format.
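To make the encoding concrete, here is a minimal Rust sketch of the delta-of-delta bucketing decision (function names like `dod_bits` are illustrative, not Mu's actual API; full bitstream writing is omitted, and the paper's special handling of the block's first timestamp is folded into an initial delta of 0):

```rust
/// Given a sorted millisecond timestamp stream, yield
/// (delta_of_delta, bits_needed) pairs. Gorilla stores the block's header
/// timestamp and first delta separately; this sketch simplifies that away.
fn delta_of_deltas(ts_ms: &[i64]) -> Vec<(i64, u32)> {
    let mut out = Vec::new();
    let mut prev_delta = 0i64;
    for w in ts_ms.windows(2) {
        let delta = w[1] - w[0];
        let dod = delta - prev_delta;
        out.push((dod, dod_bits(dod)));
        prev_delta = delta;
    }
    out
}

/// Bits used for one delta-of-delta, following Gorilla's variable-length
/// buckets: a single '0' control bit for 0, then 7/9/12/32-bit value ranges
/// behind 2/3/4/4 control bits.
fn dod_bits(dod: i64) -> u32 {
    match dod {
        0 => 1,
        -63..=64 => 2 + 7,
        -255..=256 => 3 + 9,
        -2047..=2048 => 4 + 12,
        _ => 4 + 32,
    }
}
```

The payoff shows in the common case: a series sampled on a fixed interval has a delta-of-delta of 0 for almost every point, so each timestamp costs a single bit.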
2) XOR float value compression (per-series)
What Gorilla did:
- Encode values as XOR vs previous value, with leading/trailing zero compression and a "reuse previous window" fast path.
Why we want it:
- This is the core "2 bytes/point-ish" win for float64-ish telemetry series.
Mu adaptation:
- Store values as f64 (or f32 if explicitly configured) and apply XOR encoding in block format.
- Add support for integer value types later if Mu adds typed metrics.
Phase:
- v1 storage format.
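A sketch of the per-point cost decision in Gorilla's XOR scheme, in Rust (the signature is hypothetical; actual bit emission is omitted, and `window` carries the previous leading/trailing-zero counts):

```rust
/// Returns (bits_used, new_window) for encoding `cur` against `prev`.
/// `window` is the previous (leading, trailing) zero counts, None at
/// block start.
fn xor_bits(prev: f64, cur: f64, window: Option<(u32, u32)>) -> (u32, (u32, u32)) {
    let x = prev.to_bits() ^ cur.to_bits();
    if x == 0 {
        // Identical value: a single '0' control bit, window unchanged.
        return (1, window.unwrap_or((0, 0)));
    }
    let lead = x.leading_zeros().min(31); // leading count is stored in 5 bits
    let trail = x.trailing_zeros();
    if let Some((pl, pt)) = window {
        if lead >= pl && trail >= pt {
            // Fast path: meaningful bits fit the previous window ('10' + payload).
            return (2 + (64 - pl - pt), (pl, pt));
        }
    }
    // New window: '11' + 5 bits leading count + 6 bits length + payload.
    (2 + 5 + 6 + (64 - lead - trail), (lead, trail))
}
```

Slowly varying float telemetry hits the 1-bit and fast-path cases most of the time, which is where the ~2 bytes/point average comes from.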
3) Time-partitioned compressed blocks (2h as a starting point)
What Gorilla did:
- Use fixed-ish time blocks (they mention 2 hours) as a good compression vs decode tradeoff.
Why we want it:
- Bounded memory overhead per series.
- Efficient eviction/retention boundaries.
Mu adaptation:
- Use time-partitioned blocks for each series.
- The block duration should be configurable (start with 2h) and can be shorter for high-cardinality series.
Phase:
- v1 storage format.
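The block-boundary math is trivial but worth pinning down; a sketch assuming `block_len_ms` is the configurable duration (2h default):

```rust
/// Map a timestamp to the start of its block. `rem_euclid` keeps the
/// boundary correct for pre-epoch (negative) timestamps as well.
fn block_start(ts_ms: i64, block_len_ms: i64) -> i64 {
    ts_ms - ts_ms.rem_euclid(block_len_ms)
}
```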
4) Open/closed block lifecycle (append-only open, immutable closed)
What Gorilla did:
- Keep one open block per series for appends.
- When full, close it and treat closed blocks as immutable.
Why we want it:
- Writes are fast and localized.
- Closed blocks become cache-friendly and can be shared across threads safely.
Mu adaptation:
- Model `OpenBlock` with an encoder and small mutable buffers.
- Model `ClosedBlock` as immutable byte slices (or slab-allocated arenas).
Phase:
- v1.
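A minimal Rust sketch of the open/closed split (type and field layout are illustrative, not Mu's actual structures):

```rust
/// Append target: one per series, holds mutable encoder state.
struct OpenBlock {
    start_ms: i64,
    buf: Vec<u8>, // encoder output so far (last ts, delta, XOR window omitted)
    count: usize,
}

/// Immutable once closed; safe to share across threads without locking.
struct ClosedBlock {
    start_ms: i64,
    bytes: Box<[u8]>,
    count: usize,
}

impl OpenBlock {
    /// Consume the open block, freezing its bytes.
    fn close(self) -> ClosedBlock {
        ClosedBlock {
            start_ms: self.start_ms,
            bytes: self.buf.into_boxed_slice(),
            count: self.count,
        }
    }
}
```

Taking `self` by value in `close` makes the lifecycle one-way at the type level: once closed, there is no path back to mutation.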
5) Serve queries by copying compressed blocks (avoid server-side decompression)
What Gorilla did:
- Return compressed blocks for the query range; decompression happens in a downstream tier.
Why we want it:
- Dramatically reduces CPU and p99 under large fanout queries.
- Shifts work toward a caller or a dedicated query tier.
Mu adaptation:
- Introduce a query mode that returns compressed blocks or block summaries.
- Keep current JSON results for MVP; add "binary blocks" response behind a new endpoint or content-type.
Phase:
- v1 for block format and transport.
- v2 for a proper "query tier" split if desired.
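One way to model the dual response modes, as a Rust sketch (variant and function names are hypothetical; the JSON path stays the MVP default):

```rust
/// A query response that carries either decoded points or raw copied
/// ClosedBlock bytes, selected by the caller.
enum QueryResult {
    /// Decoded points, as the current JSON endpoint returns.
    Points(Vec<(i64, f64)>),
    /// Compressed block bytes for the range; the caller decompresses.
    CompressedBlocks(Vec<Vec<u8>>),
}

/// Rough server-side payload size: copying blocks is memcpy-bound,
/// decoding points is not.
fn response_bytes(r: &QueryResult) -> usize {
    match r {
        QueryResult::Points(p) => p.len() * 16, // 8B ts + 8B value per point
        QueryResult::CompressedBlocks(bs) => bs.iter().map(|b| b.len()).sum(),
    }
}
```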
6) Slab allocation to control fragmentation
What Gorilla did:
- Copy closed blocks into large slab allocations to avoid heap fragmentation.
Why we want it:
- High-cardinality series plus lots of small allocations will crush allocator performance and RSS stability.
Mu adaptation:
- Use arena/slab allocation for `ClosedBlock` bytes.
- Keep `OpenBlock` on the regular heap (or small arenas) and copy-on-close.
Phase:
- v1.
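A trivial bump-arena sketch of the copy-on-close idea (a production slab allocator would also recycle whole slabs at retention eviction; names are illustrative):

```rust
/// Bump arena: closed-block bytes are copied in and addressed by
/// (offset, len) handles instead of individual heap allocations.
struct Slab {
    buf: Vec<u8>,
}

impl Slab {
    fn with_capacity(cap: usize) -> Self {
        Slab { buf: Vec::with_capacity(cap) }
    }

    /// Copy `bytes` into the slab and return its (offset, len) handle.
    fn push(&mut self, bytes: &[u8]) -> (usize, usize) {
        let off = self.buf.len();
        self.buf.extend_from_slice(bytes);
        (off, bytes.len())
    }

    fn get(&self, off: usize, len: usize) -> &[u8] {
        &self.buf[off..off + len]
    }
}
```

Because closed blocks are immutable and evicted in whole time-ordered slabs, the arena never needs per-block free/compaction, which is what keeps RSS stable.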
7) Hybrid in-memory index (paged scan + O(1) lookup) with tombstones/free-pool
What Gorilla did:
- Maintain an array/vector for scan-friendly iteration over series pointers.
- Maintain a hash map for name->series pointer lookup.
- Use tombstones and a free-pool to reuse slots on delete.
Why we want it:
- Query planning and "scan lots of series" workloads need linear, cache-friendly iteration.
- Direct ingest lookups need constant time.
Mu adaptation:
- Split "series catalog" from "series data."
- Maintain:
  - `Vec<SeriesRef>` for scans
  - `HashMap<SeriesKey, SeriesId>` for lookup
  - A free-list for recycling `SeriesId`s
Phase:
- MVP-next for the catalog split.
- v1 for tombstones/free-pool and tighter memory behavior.
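A Rust sketch of the catalog with tombstones and id recycling (keys simplified to `String`; `SeriesRef` reduced to the stored key for brevity):

```rust
use std::collections::HashMap;

type SeriesId = u32;

/// Series catalog: Vec for cache-friendly scans, HashMap for O(1) ingest
/// lookup, free-list for slot reuse after deletes.
struct Catalog {
    slots: Vec<Option<String>>, // scan path; None = tombstone
    by_key: HashMap<String, SeriesId>,
    free: Vec<SeriesId>,
}

impl Catalog {
    fn new() -> Self {
        Catalog { slots: Vec::new(), by_key: HashMap::new(), free: Vec::new() }
    }

    fn get_or_create(&mut self, key: &str) -> SeriesId {
        if let Some(&id) = self.by_key.get(key) {
            return id;
        }
        let id = match self.free.pop() {
            // Reuse a tombstoned slot if one is available.
            Some(id) => { self.slots[id as usize] = Some(key.to_string()); id }
            None => {
                self.slots.push(Some(key.to_string()));
                (self.slots.len() - 1) as SeriesId
            }
        };
        self.by_key.insert(key.to_string(), id);
        id
    }

    fn delete(&mut self, key: &str) {
        if let Some(id) = self.by_key.remove(key) {
            self.slots[id as usize] = None; // tombstone; slot stays scannable
            self.free.push(id);
        }
    }
}
```

Scans iterate `slots` linearly and skip `None`; ids stay dense, so `SeriesId` can double as a direct index into per-series data arrays.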
8) Locking strategy optimized for write-heavy workloads
What Gorilla did:
- Use short-lived RW spinlocks for the index structures.
- Use tiny per-series locks so writes don't contend globally.
- Copy pointers / page lists out of critical sections.
Why we want it:
- Mu currently risks query/write contention if a query holds a large read lock while aggregating.
Mu adaptation:
- Introduce sharded locks (N-way) for the series catalog and block stores.
- Store per-series block lists behind a small lock, or use append-only + atomic swap techniques.
- Ensure the query path can snapshot references quickly and release locks before aggregating.
Phase:
- MVP-next (this is the highest leverage for "never drop" under query load).
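A sketch of the sharded-lock plus snapshot-then-release pattern in Rust (shard count, type names, and the `Arc<[u8]>` block representation are assumptions; std `RwLock` stands in for whatever lock Mu settles on):

```rust
use std::sync::{Arc, RwLock};

const SHARDS: usize = 16;

/// N-way sharded locks: writers on different series rarely contend.
struct Sharded<T> {
    shards: Vec<RwLock<T>>,
}

impl<T: Default> Sharded<T> {
    fn new() -> Self {
        Sharded { shards: (0..SHARDS).map(|_| RwLock::new(T::default())).collect() }
    }

    fn shard_for(&self, hash: u64) -> &RwLock<T> {
        &self.shards[(hash as usize) % SHARDS]
    }
}

/// Per-series data: closed blocks behind Arc so queries snapshot cheaply.
#[derive(Default)]
struct SeriesData {
    blocks: Vec<Arc<[u8]>>,
}

/// Clone the Arc pointers under the read lock, then release it before any
/// decoding/aggregation happens.
fn snapshot(lock: &RwLock<SeriesData>) -> Vec<Arc<[u8]>> {
    lock.read().unwrap().blocks.clone()
}
```

The key property: the critical section is a handful of pointer clones, so a heavy aggregation can never hold up writers, which is exactly the "never drop under query load" lever.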
9) Per-shard persistence format (key dictionary + id-tagged append log + checkpoints)
What Gorilla did:
- Per-shard directory.
- A key list mapping string->int id.
- An append-only log interleaving series updates tagged by 32-bit id.
- Periodic complete-block files and checkpoint files for crash recovery validation.
Why we want it:
- Better replay time than scanning a monolithic WAL of JSON lines.
- Smaller on-disk footprint and faster warm start.
Mu adaptation:
- Evolve from JSONL WAL to a binary format:
  - `keys.bin` or `keys.json` mapping series key -> id
  - `wal.bin` id-tagged record stream
  - `checkpoint.bin` with last-good offsets and CRCs
  - Optional `blocks/` files for closed blocks and compaction outputs
- Keep "idempotent request_id" semantics independent of the storage encoding.
Phase:
- v1 for binary WAL + checkpointing.
- v2 for compaction and block files.
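To illustrate the id-tagged record stream, a Rust sketch of one fixed-width `wal.bin` record (field order and widths are an assumption for illustration, not Mu's final format; real records would also carry a type tag and batch framing):

```rust
// Record layout: [series_id: u32 LE][ts_ms: i64 LE][value: f64 LE] = 20 bytes.

fn encode_record(id: u32, ts_ms: i64, value: f64) -> [u8; 20] {
    let mut buf = [0u8; 20];
    buf[0..4].copy_from_slice(&id.to_le_bytes());
    buf[4..12].copy_from_slice(&ts_ms.to_le_bytes());
    buf[12..20].copy_from_slice(&value.to_le_bytes());
    buf
}

fn decode_record(buf: &[u8; 20]) -> (u32, i64, f64) {
    (
        u32::from_le_bytes(buf[0..4].try_into().unwrap()),
        i64::from_le_bytes(buf[4..12].try_into().unwrap()),
        f64::from_le_bytes(buf[12..20].try_into().unwrap()),
    )
}
```

Replay becomes a fixed-stride scan dispatching on a 4-byte id instead of JSON parsing and string key lookups per line.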
10) Ops model: dual region redundancy, shard assignment, client buffering, partial results
What Gorilla did:
- Two independent regional instances and write replication to both.
- Paxos-based shard assignment.
- Clients buffer during shard moves.
- Partial results semantics in queries.
Why we want it:
- This is the path to "never drop a metric" at infrastructure scale.
Mu adaptation:
- Define the contracts now even if we do not implement the full system:
- Replication protocol between peers.
- Shard ownership and reassignment model.
- Query response includes completeness metadata.
Phase:
- v2+ (design first, implement later).
Additional Gorilla-derived features to consider
11) Hot-window cache with long-term source of truth
What Gorilla did:
- Keep the most recent ~26 hours in memory.
- Store long-term data elsewhere (HBase in their case).