A second-pass audit surfaced three severe issues missed by the previous
review, each a sibling code path of a bug class that was only partially
fixed before:
- auth: JWT session login (jwtSession.go) registered its authenticator
even when CROSS_LOGIN_JWT_HS512_KEY was unset, leaving an empty HMAC
key. golang-jwt verifies any HS256/HS512 signature against an empty
key, allowing unauthenticated admin token forgery. Init() now refuses
to register without a key, with a defense-in-depth empty-key guard in
the keyfunc.
- repository: metric names from GraphQL ([String!]) were interpolated
raw into json_extract(footprint, "$.<name>") SQL. SQLite parses
double-quoted strings as literals, enabling SQL injection by any
authenticated user. Validate metric names against ^[a-zA-Z0-9_]+$ in
jobsMetricStatisticsHistogram and buildFloatJSONCondition.
- metricstore: cluster/host line-protocol tags flowed unvalidated into
path.Join(RootDir, cluster, host) for checkpoint/WAL files, allowing
arbitrary file write outside the checkpoint root via NATS
(unauthenticated) or POST /api/write. Reject path-traversal sequences
in DecodeLine before the tags become path components.
Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
Entire-Checkpoint: b57246993ec1
WAL writes during checkpoint are redundant since the binary snapshot
captures all in-memory data. Pausing eliminates channel saturation
(1.4M+ dropped messages) caused by disk I/O contention between
checkpoint writes and WAL staging. Also removes direct WAL file
deletion in checkpoint workers that raced with the staging goroutine.
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Entire-Checkpoint: 34d698f40bac
RotateWALFiles used a non-blocking send (select/default) on rotation
channels buffered at 64. With thousands of nodes and few shards, the
channel fills instantly and nearly all hosts are skipped, leaving WAL
files unrotated indefinitely.
Replace with a blocking send using a shared 2-minute deadline so the
checkpoint goroutine waits for the staging goroutine to drain the
channel instead of immediately giving up.
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Entire-Checkpoint: a1ec897216fa