fix: Fix metricstore memory explosion from broken emergency free and batch aborts

- Fix MemoryUsageTracker: remove the premature bufferPool.Clear() that prevented
  mem.Alloc from decreasing; replace the broken ForceFree loop (100 iterations
  with no GC) with a progressive time-based Free at 75%/50%/25% retention; run
  bufferPool.Clear() and a GC between steps so memory stats update correctly
- Enable debug.FreeOSMemory() after emergency freeing to return memory to OS
- Add adaptive ticker: 30s checks when memory >80% of cap, normal otherwise
- Reduce default memory check interval from 1h to 5min
- Don't abort the entire NATS batch on a single write error (out-of-order
  timestamp); log a warning and continue processing the remaining lines
- Prune empty levels from tree after free() to reduce overhead
- Include buffer struct overhead in sizeInBytes() for more accurate reporting

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Entire-Checkpoint: 7ce28627fc1d
2026-03-13 07:57:35 +01:00
parent 126f65879a
commit 8234ad3126
5 changed files with 71 additions and 31 deletions


@@ -359,7 +359,8 @@ func DecodeLine(dec *lineprotocol.Decoder,
 }
 if err := ms.WriteToLevel(lvl, st.selector, time, []Metric{metric}); err != nil {
-return err
+cclog.Warnf("write error for host %s metric %s at ts %d: %s", host, string(st.metricBuf), time, err.Error())
+continue
 }
 }
 return nil