Commit Graph

15 Commits

Author SHA1 Message Date
66707bbf15 Update metricstore documentation
Entire-Checkpoint: 99f20c1edd90
2026-03-29 21:38:04 +02:00
fc47b12fed fix: Pause WAL writes during binary checkpoint to prevent message drops
WAL writes during checkpoint are redundant since the binary snapshot
captures all in-memory data. Pausing eliminates channel saturation
(1.4M+ dropped messages) caused by disk I/O contention between
checkpoint writes and WAL staging. Also removes direct WAL file
deletion in checkpoint workers that raced with the staging goroutine.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Entire-Checkpoint: 34d698f40bac
2026-03-29 11:13:39 +02:00
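Below is a minimal Go sketch of the pause in fc47b12fed. It assumes the checkpoint routine raises a flag for its whole duration and that the ingest path checks the flag before staging a WAL message. The `walStager` type and its fields are hypothetical. The drop-on-full `select` is also an assumption, inferred from the commit's description of dropped messages on a saturated channel. None of this is the metricstore code.

```go
package main

import (
	"fmt"
	"sync/atomic"
)

// walStager is a hypothetical stand-in for the WAL staging path.
type walStager struct {
	ch      chan string  // consumed by the WAL staging goroutine
	paused  atomic.Bool  // true while a binary checkpoint is being written
	skipped atomic.Int64 // messages not staged because of the pause
	dropped atomic.Int64 // messages lost because the channel was full
}

// enqueue is called on the ingest path for every incoming message.
func (w *walStager) enqueue(msg string) {
	if w.paused.Load() {
		// Per the commit's rationale, the binary snapshot captures the
		// in-memory data, so a WAL record written now would be redundant.
		w.skipped.Add(1)
		return
	}
	select {
	case w.ch <- msg:
	default:
		w.dropped.Add(1) // assumed drop-on-full behaviour
	}
}

// checkpoint pauses WAL staging for the duration of write.
func (w *walStager) checkpoint(write func()) {
	w.paused.Store(true)
	defer w.paused.Store(false)
	write()
}

func main() {
	w := &walStager{ch: make(chan string, 4)}
	w.checkpoint(func() {
		for i := 0; i < 10; i++ {
			w.enqueue("m") // arrives while the snapshot is being written
		}
	})
	for i := 0; i < 10; i++ {
		w.enqueue("m") // no consumer is running, so the buffer fills
	}
	fmt.Println(w.skipped.Load(), w.dropped.Load(), len(w.ch))
}
```

Running `main` prints `10 6 4`. All ten messages that arrive during the checkpoint are skipped. Of the ten that arrive afterwards, four are queued and six are dropped, because nothing is draining the channel. Discarding the skipped messages is only safe under the commit's premise that the binary snapshot already holds that data in memory.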
937984d11f fix: WAL rotation skipped for all nodes due to non-blocking send on small channel
RotateWALFiles used a non-blocking send (select/default) on rotation
channels buffered at 64. With thousands of nodes and few shards, the
channel fills instantly and nearly all hosts are skipped, leaving WAL
files unrotated indefinitely.

Replace with a blocking send using a shared 2-minute deadline so the
checkpoint goroutine waits for the staging goroutine to drain the
channel instead of immediately giving up.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Entire-Checkpoint: a1ec897216fa
2026-03-28 06:55:45 +01:00
cc3d03bb5b fix: Unbound growth of wal files in case of checkpointing error
Entire-Checkpoint: 95a89a7127c5
2026-03-28 06:26:21 +01:00
ac0a4cc39a Increase shutdown timeouts and WAL flush interval
Entire-Checkpoint: 94ee2fb97830
2026-03-27 09:56:34 +01:00
97d65a9e5c Fix bugs in WAL journal pipeline
Entire-Checkpoint: 8fe0de4e6ac2
2026-03-26 07:25:36 +01:00
e759810051 Add shutdown timings. Do not drain WAL buffers on shutdown
Entire-Checkpoint: d4b497002f54
2026-03-26 07:02:37 +01:00
6f7dda53ee Cleanup
Entire-Checkpoint: ed68d32218ac
2026-03-24 07:03:46 +01:00
0325d9e866 fix: Increase throughput for WAL writers
Entire-Checkpoint: ddd40d290c56
2026-03-24 06:53:12 +01:00
bf1a8a174e fix: Shard WAL consumer for higher throughput
Entire-Checkpoint: e583b7b11439
2026-03-18 06:32:14 +01:00
50aed595cf fix: metricstore NATS contention
Entire-Checkpoint: 7e68050cab59
2026-03-18 06:14:15 +01:00
b214e1755a Add buffered I/O to WAL writes and fix MemoryCap comment
WAL writes now go through bufio.Writer instead of raw syscalls per record,
reducing I/O overhead. Buffers are flushed on rotate, drain, and shutdown.
Fixed misleading MemoryCap comment ("Max bytes" → "Max memory in GB").

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Entire-Checkpoint: b38dc35e5334
2026-03-13 09:05:24 +01:00
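The buffered-write change in b214e1755a can be sketched in Go with `bufio.Writer`. Records are copied into an in-memory buffer and reach the underlying file only when the buffer fills or is flushed. The `walWriter` type, the 4-byte length prefix, and the 64 KiB buffer size are assumptions for illustration. `countingWriter` stands in for an `*os.File` and counts `Write` calls, each of which would be one write syscall on a real file.

```go
package main

import (
	"bufio"
	"bytes"
	"encoding/binary"
	"fmt"
	"io"
)

// countingWriter stands in for a WAL segment file and counts Write calls.
type countingWriter struct {
	bytes.Buffer
	writes int
}

func (c *countingWriter) Write(p []byte) (int, error) {
	c.writes++
	return c.Buffer.Write(p)
}

// walWriter (hypothetical) buffers records in front of the current segment.
type walWriter struct {
	buf *bufio.Writer
}

func newWALWriter(dst io.Writer) *walWriter {
	return &walWriter{buf: bufio.NewWriterSize(dst, 64*1024)}
}

// writeRecord appends a length-prefixed record to the buffer only.
func (w *walWriter) writeRecord(payload []byte) error {
	var hdr [4]byte
	binary.LittleEndian.PutUint32(hdr[:], uint32(len(payload)))
	if _, err := w.buf.Write(hdr[:]); err != nil {
		return err
	}
	_, err := w.buf.Write(payload)
	return err
}

// rotate flushes pending records to the old segment, then switches to next.
func (w *walWriter) rotate(next io.Writer) error {
	if err := w.buf.Flush(); err != nil {
		return err
	}
	w.buf.Reset(next)
	return nil
}

// close flushes pending records on shutdown.
func (w *walWriter) close() error { return w.buf.Flush() }

func main() {
	seg1, seg2 := &countingWriter{}, &countingWriter{}
	w := newWALWriter(seg1)
	for i := 0; i < 1000; i++ {
		if err := w.writeRecord([]byte("cpu_load=0.5")); err != nil {
			panic(err)
		}
	}
	fmt.Println("writes before rotate:", seg1.writes)
	if err := w.rotate(seg2); err != nil { // flush, then switch segments
		panic(err)
	}
	if err := w.writeRecord([]byte("cpu_load=0.7")); err != nil {
		panic(err)
	}
	if err := w.close(); err != nil { // shutdown flush
		panic(err)
	}
	fmt.Println("segment 1:", seg1.writes, "write(s),", seg1.Len(), "bytes")
	fmt.Println("segment 2:", seg2.writes, "write(s),", seg2.Len(), "bytes")
}
```

The 1000 records, 16 bytes each, reach the first segment in a single `Write` of 16000 bytes rather than 2000 small ones. Flushing before `Reset` matters, because `bufio.Writer.Reset` discards any unflushed data. Rotating without a flush would silently lose the tail of the old segment. A drain step would call the same `Flush`.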
Aditya Ujeniya
a243e17499 Update to shutdown worker for WAL checkpointing mode
2026-03-02 15:27:06 +01:00
a418abc7d5 Run go fix
2026-02-27 14:40:26 +01:00
ca0f9a42c7 Introduce metric store binary checkpoints with write-ahead log
2026-02-26 10:08:40 +01:00