WIP: Save agent roles integration work before CHORUS rebrand

- Agent roles and coordination features
- Chat API integration testing
- New configuration and workspace management

🤖 Generated with [Claude Code](https://claude.ai/code)

Co-Authored-By: Claude <noreply@anthropic.com>
anthonyrawlins
2025-08-01 02:21:11 +10:00
parent 81b473d48f
commit 5978a0b8f5
3713 changed files with 1103925 additions and 59 deletions

vendor/github.com/raulk/go-watchdog/.dockerignore generated vendored Normal file

@@ -0,0 +1 @@
Makefile

vendor/github.com/raulk/go-watchdog/Dockerfile.dlv generated vendored Normal file

@@ -0,0 +1,22 @@
## This Dockerfile compiles the watchdog with delve support. It enables the tests
## to be debugged inside a container.
##
## Run with:
## docker run --memory=64MiB --memory-swap=64MiB -p 2345:2345 <image> \
## --listen=:2345 --headless=true --log=true \
## --log-output=debugger,debuglineerr,gdbwire,lldbout,rpc \
## --accept-multiclient --api-version=2 exec /root/watchdog.test
##
FROM golang:1.15.5
WORKDIR /watchdog
COPY . .
RUN CGO_ENABLED=0 go get -ldflags "-s -w -extldflags '-static'" github.com/go-delve/delve/cmd/dlv
RUN CGO_ENABLED=0 go test -gcflags "all=-N -l" -c -o ./watchdog.test
FROM alpine:latest
RUN apk --no-cache add ca-certificates
WORKDIR /root/
COPY --from=0 /go/bin/dlv /dlv
COPY --from=0 /watchdog/watchdog.test .
ENTRYPOINT [ "/dlv" ]
EXPOSE 2345

vendor/github.com/raulk/go-watchdog/LICENSE-APACHE generated vendored Normal file

@@ -0,0 +1,5 @@
Licensed under the Apache License, Version 2.0 (the "License"); you may not use this file except in compliance with the License. You may obtain a copy of the License at
http://www.apache.org/licenses/LICENSE-2.0
Unless required by applicable law or agreed to in writing, software distributed under the License is distributed on an "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. See the License for the specific language governing permissions and limitations under the License.

vendor/github.com/raulk/go-watchdog/LICENSE-MIT generated vendored Normal file

@@ -0,0 +1,19 @@
The MIT License (MIT)
Permission is hereby granted, free of charge, to any person obtaining a copy
of this software and associated documentation files (the "Software"), to deal
in the Software without restriction, including without limitation the rights
to use, copy, modify, merge, publish, distribute, sublicense, and/or sell
copies of the Software, and to permit persons to whom the Software is
furnished to do so, subject to the following conditions:
The above copyright notice and this permission notice shall be included in
all copies or substantial portions of the Software.
THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE
AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER
LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM,
OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN
THE SOFTWARE.

vendor/github.com/raulk/go-watchdog/Makefile generated vendored Normal file

@@ -0,0 +1,36 @@
SHELL = /bin/bash
.PHONY: test
# these tests run in isolation by calling go test -run=... or the equivalent.
ISOLATED_TESTS +=
ifdef CI
ISOLATED_TESTS = TestControl_Isolated \
TestSystemDriven_Isolated \
TestHeapDriven_Isolated
else
ISOLATED_TESTS = TestControl_Isolated \
TestSystemDriven_Isolated \
TestHeapDriven_Isolated \
TestCgroupsDriven_Create_Isolated \
TestCgroupsDriven_Docker_Isolated
endif
test: test-binary test-docker
test-binary:
go test -v ./... # run all the non-isolated tests.
# foreach does not actually execute each iteration; it expands the text, and it's executed all at once.
# that's why we use && true, to short-circuit if a test fails.
$(foreach name,$(ISOLATED_TESTS),TEST_ISOLATED=1 go test -v -test.run=$(name) ./... && ) true
test-docker: docker
docker run --memory=32MiB --memory-swap=32MiB -e TEST_DOCKER_MEMLIMIT=33554432 raulk/watchdog:latest
$(foreach name,$(ISOLATED_TESTS),docker run \
--memory=32MiB --memory-swap=32MiB \
-e TEST_ISOLATED=1 \
-e TEST_DOCKER_MEMLIMIT=33554432 \
raulk/watchdog:latest /root/watchdog.test -test.v -test.run=$(name) ./... && ) true
docker:
docker build -f ./Dockerfile.test -t raulk/watchdog:latest .

vendor/github.com/raulk/go-watchdog/README.md generated vendored Normal file

@@ -0,0 +1,88 @@
# Go memory watchdog
> 🐺 A library to curb OOMs by running Go GC according to a user-defined policy.
[![godocs](https://img.shields.io/badge/godoc-reference-5272B4.svg?style=flat-square)](https://godoc.org/github.com/raulk/go-watchdog)
[![build status](https://circleci.com/gh/raulk/go-watchdog.svg?style=svg)](https://circleci.com/gh/raulk/go-watchdog)
Package watchdog runs a singleton memory watchdog in the process, which
watches memory utilization and forces Go GC in accordance with a
user-defined policy.
There are three kinds of watchdogs:
1. heap-driven (`watchdog.HeapDriven()`): applies a heap limit, adjusting GOGC
dynamically in accordance with the policy.
2. system-driven (`watchdog.SystemDriven()`): applies a limit to the total
system memory used, obtaining the current usage through elastic/go-sigar.
3. cgroups-driven (`watchdog.CgroupDriven()`): discovers the memory limit from
the cgroup of the process (derived from /proc/self/cgroup), or from the
root cgroup path if the PID == 1 (which indicates that the process is
running in a container). It uses the cgroup stats to obtain the
current usage.
The watchdog's behaviour is controlled by the policy, a pluggable function
that determines when to trigger GC based on the current utilization. This
library ships with two policies:
1. watermarks policy (`watchdog.NewWatermarkPolicy()`): runs GC at configured
watermarks of memory utilisation.
2. adaptive policy (`watchdog.NewAdaptivePolicy()`): runs GC when the current
usage surpasses a dynamically-set threshold.
You can easily write a custom policy tailored to the allocation patterns of
your program.
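As an illustrative sketch (the `stepPolicy` and `NewStepPolicy` names are hypothetical, not part of this library), a custom policy only needs a `PolicyCtor` that returns a type implementing `Policy`:

```go
package main

import "github.com/raulk/go-watchdog"

// stepPolicy is a hypothetical custom policy: it asks the watchdog to force
// GC every `step` additional bytes of usage, clamping the target to the limit.
type stepPolicy struct {
	step, limit uint64
}

// NewStepPolicy returns a watchdog.PolicyCtor that constructs a stepPolicy.
func NewStepPolicy(step uint64) watchdog.PolicyCtor {
	return func(limit uint64) (watchdog.Policy, error) {
		return &stepPolicy{step: step, limit: limit}, nil
	}
}

// Evaluate returns the next usage at which the watchdog should force GC.
func (p *stepPolicy) Evaluate(_ watchdog.UtilizationType, used uint64) (next uint64) {
	next = used + p.step
	if next > p.limit {
		next = p.limit
	}
	return next
}

func main() {
	// e.g. force GC every 64 MiB of growth; pass this ctor to a watchdog constructor.
	_ = NewStepPolicy(64 << 20)
}
```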
## Recommended way to set up the watchdog
The recommended way to set up the watchdog is as follows, in descending order
of precedence; a sketch of this chain follows the list. This logic assumes that
the library supports setting a heap limit through an environment variable
(e.g. MYAPP_HEAP_MAX) or config key.
1. If heap limit is set and legal, initialize a heap-driven watchdog.
2. Otherwise, try to use the cgroup-driven watchdog. If it succeeds, return.
3. Otherwise, try to initialize a system-driven watchdog. If it succeeds, return.
4. Watchdog initialization failed. Log a warning to inform the user that
they're flying solo.
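A minimal sketch of that chain, assuming `heapMax` was already parsed from a hypothetical MYAPP_HEAP_MAX variable (the 0.5 adaptive factor, minimum GOGC of 20, and 10s polling interval are illustrative values, not recommendations):

```go
package main

import (
	"log"
	"time"

	"github.com/raulk/go-watchdog"
)

func initWatchdog(heapMax uint64) (stop func()) {
	policy := watchdog.NewAdaptivePolicy(0.5)
	// 1. Heap-driven, if the operator configured a heap limit.
	if heapMax > 0 {
		if err, stopFn := watchdog.HeapDriven(heapMax, 20, policy); err == nil {
			return stopFn
		}
	}
	// 2. Cgroup-driven (Linux only); on error, fall through to the next mode.
	if err, stopFn := watchdog.CgroupDriven(10*time.Second, policy); err == nil {
		return stopFn
	}
	// 3. System-driven; a zero limit means "use total system memory".
	if err, stopFn := watchdog.SystemDriven(0, 10*time.Second, policy); err == nil {
		return stopFn
	}
	// 4. Initialization failed everywhere; warn that we're flying solo.
	log.Println("watchdog disabled: no supported run mode")
	return func() {}
}

func main() {
	stop := initWatchdog(0) // no heap limit configured in this sketch.
	defer stop()
	// ... application work ...
}
```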
## Running the tests
Given the low-level nature of this component, some tests need to run in
isolation, so that they don't carry over Go runtime metrics. For completeness,
this module uses a Docker image for testing, so we can simulate cgroup memory
limits.
The test execution and docker builds have been conveniently packaged in a
Makefile. Run with:
```shell
$ make
```
## Why is this even needed?
The garbage collector that ships with the Go runtime is pretty good in some
regards (low latency, negligible stop-the-world pauses), but it's unsatisfactory
in a number of situations that yield ill-fated outcomes:
1. it is incapable of dealing with bursty/spiky allocations efficiently;
depending on the workload, the program may OOM as a consequence of not
scheduling GC in a timely manner.
2. part of the above is due to the fact that go doesn't concern itself with any
limits. To date, it is not possible to set a maximum heap size.
3. its default policy of scheduling GC when the heap doubles, coupled with its
ignorance of system or process limits, can easily cause it to OOM.
For more information, check out these GitHub issues:
* https://github.com/golang/go/issues/42805
* https://github.com/golang/go/issues/42430
* https://github.com/golang/go/issues/14735
* https://github.com/golang/go/issues/16843
* https://github.com/golang/go/issues/10064
* https://github.com/golang/go/issues/9849
## License
Dual-licensed: [MIT](./LICENSE-MIT), [Apache Software License v2](./LICENSE-APACHE), by way of the
[Permissive License Stack](https://protocol.ai/blog/announcing-the-permissive-license-stack/).

vendor/github.com/raulk/go-watchdog/adaptive.go generated vendored Normal file

@@ -0,0 +1,31 @@
package watchdog
// NewAdaptivePolicy creates a policy that forces GC when the usage surpasses a
// user-configured percentage (factor) of the available memory.
//
// This policy recalculates the next target as usage+(limit-usage)*factor.
func NewAdaptivePolicy(factor float64) PolicyCtor {
return func(limit uint64) (Policy, error) {
return &adaptivePolicy{
factor: factor,
limit: limit,
}, nil
}
}
type adaptivePolicy struct {
factor float64
limit uint64
}
var _ Policy = (*adaptivePolicy)(nil)
func (p *adaptivePolicy) Evaluate(_ UtilizationType, used uint64) (next uint64) {
if used >= p.limit {
return used
}
available := float64(p.limit) - float64(used)
next = used + uint64(available*p.factor)
return next
}
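A worked example with assumed numbers, shown as a standalone snippet: with factor 0.5 and a 1 GiB limit, a usage reading of 600 MiB yields a next trigger of 600 + (1024 - 600) * 0.5 = 812 MiB.

```go
package main

import (
	"fmt"

	"github.com/raulk/go-watchdog"
)

func main() {
	ctor := watchdog.NewAdaptivePolicy(0.5) // recalculate halfway to the limit.
	p, _ := ctor(1024 << 20)                // limit: 1 GiB.
	// 600 MiB used: next = 600 MiB + (1024 MiB - 600 MiB) * 0.5.
	next := p.Evaluate(watchdog.UtilizationHeap, 600<<20)
	fmt.Println(next >> 20) // prints 812
}
```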

vendor/github.com/raulk/go-watchdog/doc.go generated vendored Normal file

@@ -0,0 +1,40 @@
// Package watchdog runs a singleton memory watchdog in the process, which
// watches memory utilization and forces Go GC in accordance with a
// user-defined policy.
//
// There are three kinds of watchdogs:
//
// 1. heap-driven (watchdog.HeapDriven()): applies a heap limit, adjusting GOGC
// dynamically in accordance with the policy.
// 2. system-driven (watchdog.SystemDriven()): applies a limit to the total
// system memory used, obtaining the current usage through elastic/go-sigar.
// 3. cgroups-driven (watchdog.CgroupDriven()): discovers the memory limit from
// the cgroup of the process (derived from /proc/self/cgroup), or from the
// root cgroup path if the PID == 1 (which indicates that the process is
// running in a container). It uses the cgroup stats to obtain the
// current usage.
//
// The watchdog's behaviour is controlled by the policy, a pluggable function
// that determines when to trigger GC based on the current utilization. This
// library ships with two policies:
//
// 1. watermarks policy (watchdog.NewWatermarkPolicy()): runs GC at configured
// watermarks of memory utilisation.
// 2. adaptive policy (watchdog.NewAdaptivePolicy()): runs GC when the current
// usage surpasses a dynamically-set threshold.
//
// You can easily write a custom policy tailored to the allocation patterns of
// your program.
//
// Recommended way to set up the watchdog
//
// The recommended way to set up the watchdog is as follows, in descending order
// of precedence. This logic assumes that the library supports setting a heap
// limit through an environment variable (e.g. MYAPP_HEAP_MAX) or config key.
//
// 1. If heap limit is set and legal, initialize a heap-driven watchdog.
// 2. Otherwise, try to use the cgroup-driven watchdog. If it succeeds, return.
// 3. Otherwise, try to initialize a system-driven watchdog. If it succeeds, return.
// 4. Watchdog initialization failed. Log a warning to inform the user that
// they're flying solo.
package watchdog

vendor/github.com/raulk/go-watchdog/log.go generated vendored Normal file

@@ -0,0 +1,38 @@
package watchdog
import "log"
// logger is an interface to be implemented by custom loggers.
type logger interface {
Debugf(template string, args ...interface{})
Infof(template string, args ...interface{})
Warnf(template string, args ...interface{})
Errorf(template string, args ...interface{})
}
var _ logger = (*stdlog)(nil)
// stdlog is a logger that proxies to a standard log.Logger.
type stdlog struct {
log *log.Logger
debug bool
}
func (s *stdlog) Debugf(template string, args ...interface{}) {
if !s.debug {
return
}
s.log.Printf(template, args...)
}
func (s *stdlog) Infof(template string, args ...interface{}) {
s.log.Printf(template, args...)
}
func (s *stdlog) Warnf(template string, args ...interface{}) {
s.log.Printf(template, args...)
}
func (s *stdlog) Errorf(template string, args ...interface{}) {
s.log.Printf(template, args...)
}
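Note that `Logger` is an exported package variable (declared in watchdog.go below), so any value providing these four methods can be swapped in from outside the package; a minimal sketch (the `verboseLogger` adapter is hypothetical):

```go
package main

import (
	"log"
	"os"

	"github.com/raulk/go-watchdog"
)

// verboseLogger is a hypothetical adapter that routes everything, including
// Debugf, to a standard *log.Logger with a level prefix.
type verboseLogger struct{ l *log.Logger }

func (v verboseLogger) Debugf(t string, args ...interface{}) { v.l.Printf("DEBUG "+t, args...) }
func (v verboseLogger) Infof(t string, args ...interface{})  { v.l.Printf("INFO "+t, args...) }
func (v verboseLogger) Warnf(t string, args ...interface{})  { v.l.Printf("WARN "+t, args...) }
func (v verboseLogger) Errorf(t string, args ...interface{}) { v.l.Printf("ERROR "+t, args...) }

func main() {
	watchdog.Logger = verboseLogger{l: log.New(os.Stderr, "[watchdog] ", log.LstdFlags)}
}
```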

vendor/github.com/raulk/go-watchdog/notification.go generated vendored Normal file

@@ -0,0 +1,80 @@
package watchdog
import "sync"
var (
gcNotifeeMutex sync.Mutex
gcNotifees []notifeeEntry
forcedGCNotifeeMutex sync.Mutex
forcedGCNotifees []notifeeEntry
)
// RegisterPostGCNotifee registers a function that is called every time a GC has happened,
// whether the run was triggered by the Go runtime or forced by the watchdog.
// The unregister function returned can be used to unregister this notifee.
func RegisterPostGCNotifee(f func()) (unregister func()) {
gcNotifeeMutex.Lock()
defer gcNotifeeMutex.Unlock()
var id int
if len(gcNotifees) > 0 {
id = gcNotifees[len(gcNotifees)-1].id + 1
}
gcNotifees = append(gcNotifees, notifeeEntry{id: id, f: f})
return func() {
gcNotifeeMutex.Lock()
defer gcNotifeeMutex.Unlock()
for i, entry := range gcNotifees {
if entry.id == id {
gcNotifees = append(gcNotifees[:i], gcNotifees[i+1:]...)
}
}
}
}
func notifyGC() {
if NotifyGC != nil {
NotifyGC()
}
gcNotifeeMutex.Lock()
defer gcNotifeeMutex.Unlock()
for _, entry := range gcNotifees {
entry.f()
}
}
// RegisterPreGCNotifee registers a function that is called before watchdog triggers a GC run.
// It is ONLY called when watchdog triggers a GC run, not when the Go runtime triggers it.
// The unregister function returned can be used to unregister this notifee.
func RegisterPreGCNotifee(f func()) (unregister func()) {
forcedGCNotifeeMutex.Lock()
defer forcedGCNotifeeMutex.Unlock()
var id int
if len(forcedGCNotifees) > 0 {
id = forcedGCNotifees[len(forcedGCNotifees)-1].id + 1
}
forcedGCNotifees = append(forcedGCNotifees, notifeeEntry{id: id, f: f})
return func() {
forcedGCNotifeeMutex.Lock()
defer forcedGCNotifeeMutex.Unlock()
for i, entry := range forcedGCNotifees {
if entry.id == id {
forcedGCNotifees = append(forcedGCNotifees[:i], forcedGCNotifees[i+1:]...)
}
}
}
}
func notifyForcedGC() {
forcedGCNotifeeMutex.Lock()
defer forcedGCNotifeeMutex.Unlock()
for _, entry := range forcedGCNotifees {
entry.f()
}
}
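A short usage sketch: registering a post-GC hook and unregistering it on shutdown. The counter is illustrative; notifees run on the watchdog's goroutine, hence the atomic.

```go
package main

import (
	"fmt"
	"sync/atomic"

	"github.com/raulk/go-watchdog"
)

func main() {
	var gcRuns int64
	unregister := watchdog.RegisterPostGCNotifee(func() {
		fmt.Println("GC completed; runs observed:", atomic.AddInt64(&gcRuns, 1))
	})
	defer unregister() // stop observing on shutdown.
	// ... start a watchdog and run the application ...
}
```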

vendor/github.com/raulk/go-watchdog/watchdog.go generated vendored Normal file

@@ -0,0 +1,518 @@
package watchdog
import (
"errors"
"fmt"
"log"
"math"
"os"
"path/filepath"
"runtime"
"runtime/debug"
"runtime/pprof"
"sync"
"time"
"github.com/elastic/gosigar"
"github.com/benbjohnson/clock"
)
// ErrNotSupported is returned when the watchdog does not support the requested
// run mode in the current OS/arch.
var ErrNotSupported = errors.New("watchdog run mode not supported")
// PolicyTempDisabled is a marker value for policies to signal that the policy
// is temporarily disabled. Use it when all hope is lost to turn around from
// significant memory pressure (such as when above an "extreme" watermark).
const PolicyTempDisabled uint64 = math.MaxUint64
// The watchdog is designed to be used as a singleton; global vars are OK for
// that reason.
var (
// Logger is the logger to use. If nil, it will default to a logger that
// proxies to a standard logger using the "[watchdog]" prefix.
Logger logger = &stdlog{log: log.New(log.Writer(), "[watchdog] ", log.LstdFlags|log.Lmsgprefix)}
// Clock can be used to inject a mock clock for testing.
Clock = clock.New()
// ForcedGCFunc specifies the function to call when forced GC is necessary.
// Its default value is runtime.GC, but it can be set to debug.FreeOSMemory
// to force the release of memory to the OS.
ForcedGCFunc = runtime.GC
// NotifyGC, if non-nil, will be called when a GC has happened.
// Deprecated: use RegisterPostGCNotifee instead.
NotifyGC func() = func() {}
// HeapProfileThreshold sets the utilization threshold that will trigger a
// heap profile to be taken automatically. A zero value disables this feature.
// By default, it is disabled.
HeapProfileThreshold float64
// HeapProfileMaxCaptures sets the maximum amount of heap profiles a process will generate.
// This limits the amount of episodes that will be captured, in case the
// utilization climbs repeatedly over the threshold. By default, it is 10.
HeapProfileMaxCaptures = uint(10)
// HeapProfileDir is the directory where the watchdog will write the heap profile.
// It will be created if it doesn't exist upon initialization. An error when
// creating the dir will not prevent watchdog initialization; it will just
// disable the heap profile capture feature. If zero-valued, the feature is
// disabled.
//
// HeapProfiles will be written to path <HeapProfileDir>/<RFC3339Nano formatted timestamp>.heap.
HeapProfileDir string
)
var (
// ReadMemStats stops the world. But as of go1.9, it should only
// take ~25µs to complete.
//
// Before go1.15, calls to ReadMemStats during an ongoing GC would
// block due to the worldsema lock. As of go1.15, this was optimized
// and the runtime holds on to worldsema less during GC (only during
// sweep termination and mark termination).
//
// For users using go1.14 and earlier, if this call happens during
// GC, it will just block for longer until serviced, but it will not
// take longer in itself. No harm done.
//
// Actual benchmarks
// -----------------
//
// In Go 1.15.5, ReadMemStats with no ongoing GC takes ~27µs on an MBP 16
// i9 busy with another million things. During GC, it takes an
// average of less than 175µs per op.
//
// goos: darwin
// goarch: amd64
// pkg: github.com/filecoin-project/lotus/api
// BenchmarkReadMemStats-16 44530 27523 ns/op
// BenchmarkReadMemStats-16 43743 26879 ns/op
// BenchmarkReadMemStats-16 45627 26791 ns/op
// BenchmarkReadMemStats-16 44538 26219 ns/op
// BenchmarkReadMemStats-16 44958 26757 ns/op
// BenchmarkReadMemStatsWithGCContention-16 10 183733 p50-ns 211859 p90-ns 211859 p99-ns
// BenchmarkReadMemStatsWithGCContention-16 7 198765 p50-ns 314873 p90-ns 314873 p99-ns
// BenchmarkReadMemStatsWithGCContention-16 10 195151 p50-ns 311408 p90-ns 311408 p99-ns
// BenchmarkReadMemStatsWithGCContention-16 10 217279 p50-ns 295308 p90-ns 295308 p99-ns
// BenchmarkReadMemStatsWithGCContention-16 10 167054 p50-ns 327072 p90-ns 327072 p99-ns
// PASS
//
// See: https://github.com/golang/go/issues/19812
// See: https://github.com/prometheus/client_golang/issues/403
memstatsFn = runtime.ReadMemStats
sysmemFn = (*gosigar.Mem).Get
)
type notifeeEntry struct {
id int
f func()
}
var (
// ErrAlreadyStarted is returned when the user tries to start the watchdog more than once.
ErrAlreadyStarted = fmt.Errorf("singleton memory watchdog was already started")
)
const (
// stateUnstarted represents an unstarted state.
stateUnstarted int32 = iota
// stateRunning represents an operational state.
stateRunning
)
// _watchdog is a global singleton watchdog.
var _watchdog struct {
lk sync.Mutex
state int32
scope UtilizationType
hpleft uint // tracks the amount of heap profiles left.
hpcurr bool // tracks whether a heap profile has already been taken for this episode.
closing chan struct{}
wg sync.WaitGroup
}
// UtilizationType is the utilization metric in use.
type UtilizationType int
const (
// UtilizationSystem specifies that the policy compares against actual used
// system memory.
UtilizationSystem UtilizationType = iota
// UtilizationProcess specifies that the watchdog is using process limits.
UtilizationProcess
// UtilizationHeap specifies that the policy compares against heap used.
UtilizationHeap
)
// PolicyCtor is a policy constructor.
type PolicyCtor func(limit uint64) (Policy, error)
// Policy is polled by the watchdog to determine the next utilisation at which
// a GC should be forced.
type Policy interface {
// Evaluate determines when the next GC should take place. It receives the
// current usage, and it returns the next usage at which to trigger GC.
Evaluate(scope UtilizationType, used uint64) (next uint64)
}
// HeapDriven starts a singleton heap-driven watchdog, which adjusts GOGC
// dynamically after every GC, to honour the policy requirements.
//
// Providing a zero-valued limit will error. A minimum GOGC value is required,
// so as to avoid overscheduling GC, and overfitting to a specific target.
func HeapDriven(limit uint64, minGOGC int, policyCtor PolicyCtor) (err error, stopFn func()) {
if limit == 0 {
return fmt.Errorf("cannot use zero limit for heap-driven watchdog"), nil
}
policy, err := policyCtor(limit)
if err != nil {
return fmt.Errorf("failed to construct policy with limit %d: %w", limit, err), nil
}
if err := start(UtilizationHeap); err != nil {
return err, nil
}
gcTriggered := make(chan struct{}, 16)
setupGCSentinel(gcTriggered)
_watchdog.wg.Add(1)
go func() {
defer _watchdog.wg.Done()
defer wdrecover() // recover from panics.
// get the initial effective GOGC; guess it's 100 (default), and restore
// it to whatever it actually was. This works because SetGCPercent
// returns the previous value.
originalGOGC := debug.SetGCPercent(100)
debug.SetGCPercent(originalGOGC)
currGOGC := originalGOGC
var memstats runtime.MemStats
for {
select {
case <-gcTriggered:
notifyGC()
case <-_watchdog.closing:
return
}
// recompute the next trigger.
memstatsFn(&memstats)
maybeCaptureHeapProfile(memstats.HeapAlloc, limit)
// heapMarked is the amount of heap that was marked as live by GC.
// it is inferred from our current GOGC and the new target picked.
//
// this accurately represents the live heap that survived the last collection.
heapMarked := uint64(float64(memstats.NextGC) / (1 + float64(currGOGC)/100))
if heapMarked == 0 {
// this shouldn't happen, but just in case; avoiding a div by 0.
Logger.Warnf("heap-driven watchdog: inferred zero heap marked; skipping evaluation")
continue
}
// evaluate the policy.
next := policy.Evaluate(UtilizationHeap, memstats.HeapAlloc)
// calculate how much to set GOGC to honour the next trigger point.
// next=PolicyTempDisabled value would make currGOGC extremely high,
// greater than originalGOGC, and therefore we'd restore originalGOGC.
currGOGC = int(((float64(next) / float64(heapMarked)) - float64(1)) * 100)
if currGOGC >= originalGOGC {
Logger.Debugf("heap watchdog: requested GOGC percent higher than default; capping at default; requested: %d; default: %d", currGOGC, originalGOGC)
currGOGC = originalGOGC
} else {
if currGOGC < minGOGC {
currGOGC = minGOGC // floor GOGC at the minimum to avoid overscheduling GC.
}
Logger.Debugf("heap watchdog: setting GOGC percent: %d", currGOGC)
}
debug.SetGCPercent(currGOGC)
memstatsFn(&memstats)
Logger.Infof("gc finished; heap watchdog stats: heap_alloc: %d, heap_marked: %d, next_gc: %d, policy_next_gc: %d, gogc: %d",
memstats.HeapAlloc, heapMarked, memstats.NextGC, next, currGOGC)
}
}()
return nil, stop
}
// SystemDriven starts a singleton system-driven watchdog.
//
// The system-driven watchdog keeps a threshold, above which GC will be forced.
// The watchdog polls the system utilization at the specified frequency. When
// the actual utilization exceeds the threshold, a GC is forced.
//
// This threshold is calculated by querying the policy every time that GC runs,
// either triggered by the runtime, or forced by us.
func SystemDriven(limit uint64, frequency time.Duration, policyCtor PolicyCtor) (err error, stopFn func()) {
if limit == 0 {
var sysmem gosigar.Mem
if err := sysmemFn(&sysmem); err != nil {
return fmt.Errorf("failed to get system memory stats: %w", err), nil
}
limit = sysmem.Total
}
policy, err := policyCtor(limit)
if err != nil {
return fmt.Errorf("failed to construct policy with limit %d: %w", limit, err), nil
}
if err := start(UtilizationSystem); err != nil {
return err, nil
}
_watchdog.wg.Add(1)
var sysmem gosigar.Mem
go pollingWatchdog(policy, frequency, limit, func() (uint64, error) {
if err := sysmemFn(&sysmem); err != nil {
return 0, err
}
return sysmem.ActualUsed, nil
})
return nil, stop
}
// pollingWatchdog starts a polling watchdog with the provided policy, using
// the supplied polling frequency. On every tick, it calls usageFn and, if the
// usage is greater or equal to the threshold at the time, it forces GC.
// usageFn is guaranteed to be called serially, so no locking should be
// necessary.
func pollingWatchdog(policy Policy, frequency time.Duration, limit uint64, usageFn func() (uint64, error)) {
defer _watchdog.wg.Done()
defer wdrecover() // recover from panics.
gcTriggered := make(chan struct{}, 16)
setupGCSentinel(gcTriggered)
var (
memstats runtime.MemStats
threshold uint64
)
renewThreshold := func() {
// get the current usage.
usage, err := usageFn()
if err != nil {
Logger.Warnf("failed to obtain memory utilization stats; err: %s", err)
return
}
// calculate the threshold.
threshold = policy.Evaluate(_watchdog.scope, usage)
}
// initialize the threshold.
renewThreshold()
// initialize an empty timer.
timer := Clock.Timer(0)
stopTimer := func() {
if !timer.Stop() {
<-timer.C
}
}
for {
timer.Reset(frequency)
select {
case <-timer.C:
// get the current usage.
usage, err := usageFn()
if err != nil {
Logger.Warnf("failed to obtain memory utilizationstats; err: %s", err)
continue
}
// evaluate if a heap profile needs to be captured.
maybeCaptureHeapProfile(usage, limit)
if usage < threshold {
// nothing to do.
continue
}
// trigger GC; this will emit a gcTriggered event which we'll
// consume next to readjust the threshold.
Logger.Warnf("system-driven watchdog triggering GC; %d/%d bytes (used/threshold)", usage, threshold)
forceGC(&memstats)
case <-gcTriggered:
notifyGC()
renewThreshold()
stopTimer()
case <-_watchdog.closing:
stopTimer()
return
}
}
}
// forceGC forces a manual GC.
func forceGC(memstats *runtime.MemStats) {
Logger.Infof("watchdog is forcing GC")
startNotify := time.Now()
notifyForcedGC()
// it's safe to assume that the finalizer will attempt to run before
// runtime.GC() returns because runtime.GC() waits for the sweep phase to
// finish before returning.
// finalizers are run in the sweep phase.
start := time.Now()
notificationsTook := start.Sub(startNotify)
ForcedGCFunc()
took := time.Since(start)
memstatsFn(memstats)
Logger.Infof("watchdog-triggered GC finished; notifications took: %s, took: %s; current heap allocated: %d bytes", notificationsTook, took, memstats.HeapAlloc)
}
func setupGCSentinel(gcTriggered chan struct{}) {
logger := Logger
// this non-zero sized struct is used as a sentinel to detect when a GC
// run has finished, by setting and resetting a finalizer on it.
// it essentially creates a GC notification "flywheel"; every GC will
// trigger this finalizer, which will reset itself so it gets notified
// of the next GC, breaking the cycle when the watchdog is stopped.
type sentinel struct{ a *int }
var finalizer func(o *sentinel)
finalizer = func(o *sentinel) {
_watchdog.lk.Lock()
defer _watchdog.lk.Unlock()
if _watchdog.state != stateRunning {
// this GC triggered after the watchdog was stopped; ignore
// and do not reset the finalizer.
return
}
// reset so it triggers on the next GC.
runtime.SetFinalizer(o, finalizer)
select {
case gcTriggered <- struct{}{}:
default:
logger.Warnf("failed to queue gc trigger; channel backlogged")
}
}
runtime.SetFinalizer(&sentinel{}, finalizer) // start the flywheel.
}
func start(scope UtilizationType) error {
_watchdog.lk.Lock()
defer _watchdog.lk.Unlock()
if _watchdog.state != stateUnstarted {
return ErrAlreadyStarted
}
_watchdog.state = stateRunning
_watchdog.scope = scope
_watchdog.closing = make(chan struct{})
initHeapProfileCapture()
return nil
}
func stop() {
_watchdog.lk.Lock()
defer _watchdog.lk.Unlock()
if _watchdog.state != stateRunning {
return
}
close(_watchdog.closing)
_watchdog.wg.Wait()
_watchdog.state = stateUnstarted
}
func initHeapProfileCapture() {
if HeapProfileDir == "" || HeapProfileThreshold <= 0 {
Logger.Debugf("heap profile capture disabled")
return
}
if HeapProfileThreshold >= 1 {
Logger.Warnf("failed to initialize heap profile capture: threshold must be 0 < t < 1")
return
}
if fi, err := os.Stat(HeapProfileDir); os.IsNotExist(err) {
if err := os.MkdirAll(HeapProfileDir, 0777); err != nil {
Logger.Warnf("failed to initialize heap profile capture: failed to create dir: %s; err: %s", HeapProfileDir, err)
return
}
} else if err != nil {
Logger.Warnf("failed to initialize heap profile capture: failed to stat path: %s; err: %s", HeapProfileDir, err)
return
} else if !fi.IsDir() {
Logger.Warnf("failed to initialize heap profile capture: path exists but is not a directory: %s", HeapProfileDir)
return
}
// all good, set the amount of heap profile captures left.
_watchdog.hpleft = HeapProfileMaxCaptures
Logger.Infof("initialized heap profile capture; threshold: %f; max captures: %d; dir: %s", HeapProfileThreshold, HeapProfileMaxCaptures, HeapProfileDir)
}
func maybeCaptureHeapProfile(usage, limit uint64) {
if _watchdog.hpleft <= 0 {
// nothing to do; no captures remaining, or the capture feature is
// disabled (hpcurr below tracks the per-episode state).
return
}
if float64(usage)/float64(limit) < HeapProfileThreshold {
// we are below the threshold, reset the hpcurr flag.
_watchdog.hpcurr = false
return
}
// we are above the threshold.
if _watchdog.hpcurr {
return // we've already captured this episode, skip.
}
path := filepath.Join(HeapProfileDir, time.Now().Format(time.RFC3339Nano)+".heap")
file, err := os.Create(path)
if err != nil {
Logger.Warnf("failed to create heap profile file; path: %s; err: %s", path, err)
return
}
defer file.Close()
if err = pprof.WriteHeapProfile(file); err != nil {
Logger.Warnf("failed to write heap profile; path: %s; err: %s", path, err)
return
}
Logger.Infof("heap profile captured; path: %s", path)
_watchdog.hpcurr = true
_watchdog.hpleft--
}
func wdrecover() {
if r := recover(); r != nil {
msg := fmt.Sprintf("WATCHDOG PANICKED; recovered but watchdog is disarmed: %s", r)
if Logger != nil {
Logger.Errorf(msg)
} else {
_, _ = fmt.Fprintln(os.Stderr, msg)
}
}
}

vendor/github.com/raulk/go-watchdog/watchdog_linux.go generated vendored Normal file

@@ -0,0 +1,73 @@
package watchdog
import (
"fmt"
"os"
"time"
"github.com/containerd/cgroups"
)
var (
pid = os.Getpid()
memSubsystem = cgroups.SingleSubsystem(cgroups.V1, cgroups.Memory)
)
// CgroupDriven initializes a cgroups-driven watchdog. It will try to discover
// the memory limit from the cgroup of the process (derived from /proc/self/cgroup),
// or from the root cgroup path if the PID == 1 (which indicates that the process
// is running in a container).
//
// Memory usage is calculated by querying the cgroup stats.
//
// This function will return an error immediately if the OS does not support cgroups,
// or if another error occurs during initialization. The caller can then safely fall
// back to the system driven watchdog.
func CgroupDriven(frequency time.Duration, policyCtor PolicyCtor) (err error, stopFn func()) {
// use self path unless our PID is 1, in which case we're running inside
// a container and our limits are in the root path.
path := cgroups.NestedPath("")
if pid := os.Getpid(); pid == 1 {
path = cgroups.RootPath
}
cgroup, err := cgroups.Load(memSubsystem, path)
if err != nil {
return fmt.Errorf("failed to load cgroup for process: %w", err), nil
}
var limit uint64
if stat, err := cgroup.Stat(); err != nil {
return fmt.Errorf("failed to load memory cgroup stats: %w", err), nil
} else if stat.Memory == nil || stat.Memory.Usage == nil {
return fmt.Errorf("cgroup memory stats are nil; aborting"), nil
} else {
limit = stat.Memory.Usage.Limit
}
if limit == 0 {
return fmt.Errorf("cgroup limit is 0; refusing to start memory watchdog"), nil
}
policy, err := policyCtor(limit)
if err != nil {
return fmt.Errorf("failed to construct policy with limit %d: %w", limit, err), nil
}
if err := start(UtilizationProcess); err != nil {
return err, nil
}
_watchdog.wg.Add(1)
go pollingWatchdog(policy, frequency, limit, func() (uint64, error) {
stat, err := cgroup.Stat()
if err != nil {
return 0, err
} else if stat.Memory == nil || stat.Memory.Usage == nil {
return 0, fmt.Errorf("cgroup memory stats are nil; aborting")
}
return stat.Memory.Usage.Usage, nil
})
return nil, stop
}

vendor/github.com/raulk/go-watchdog/watchdog_other.go generated vendored Normal file

@@ -0,0 +1,13 @@
// +build !linux
package watchdog
import (
"fmt"
"time"
)
// CgroupDriven is only available on Linux. This method will error.
func CgroupDriven(frequency time.Duration, policyCtor PolicyCtor) (err error, stopFn func()) {
return fmt.Errorf("cgroups-driven watchdog: %w", ErrNotSupported), nil
}

vendor/github.com/raulk/go-watchdog/watermarks.go generated vendored Normal file

@@ -0,0 +1,42 @@
package watchdog
// NewWatermarkPolicy creates a watchdog policy that schedules GC at concrete
// watermarks. When queried, it will determine the next trigger point based
// on the current utilisation. If the last watermark is surpassed,
// the policy will be disarmed. It is recommended to set an extreme watermark
// as the last element (e.g. 0.99) to prevent the policy from disarming too soon.
func NewWatermarkPolicy(watermarks ...float64) PolicyCtor {
return func(limit uint64) (Policy, error) {
p := new(watermarkPolicy)
p.limit = limit
p.watermarks = watermarks
p.thresholds = make([]uint64, 0, len(watermarks))
for _, m := range watermarks {
p.thresholds = append(p.thresholds, uint64(float64(limit)*m))
}
Logger.Infof("initialized watermark watchdog policy; watermarks: %v; thresholds: %v", p.watermarks, p.thresholds)
return p, nil
}
}
type watermarkPolicy struct {
// watermarks are the fractions of the limit at which GC is triggered.
watermarks []float64
// thresholds are the absolute trigger points of this policy.
thresholds []uint64
limit uint64
}
var _ Policy = (*watermarkPolicy)(nil)
func (w *watermarkPolicy) Evaluate(_ UtilizationType, used uint64) (next uint64) {
Logger.Debugf("watermark policy: evaluating; utilization: %d/%d (used/limit)", used, w.limit)
var i int
for ; i < len(w.thresholds); i++ {
t := w.thresholds[i]
if used < t {
return t
}
}
// we reached the maximum threshold, so we disable this policy.
return PolicyTempDisabled
}
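For completeness, a usage sketch with assumed values: a system-driven watchdog that forces GC at 50%, 75%, 90% and 99% of total system memory, polling every 5 seconds.

```go
package main

import (
	"log"
	"time"

	"github.com/raulk/go-watchdog"
)

func main() {
	// Limit 0 asks SystemDriven to use total system memory as the limit.
	err, stop := watchdog.SystemDriven(0, 5*time.Second,
		watchdog.NewWatermarkPolicy(0.50, 0.75, 0.90, 0.99))
	if err != nil {
		log.Printf("watchdog disabled: %s", err)
		return
	}
	defer stop()
	// ... application work ...
}
```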