All you need is sync.Pool
I ignored sync.Pool for a long time, but that has come to an end. One of Coraza's most memory-hungry operations was creating transactions and collections: it required tons of IO operations and was not GC friendly. The garbage collector could take a few seconds to remove an old transaction, so high-traffic systems accumulated a long tail of unused transactions.
A sync.Pool is a "cache" used to recycle objects instead of freeing their resources and reallocating them. It is really useful here because each web request may easily generate 30 transactions and thousands to millions of allocations. With a sync.Pool we pull objects from the pool instead of creating new ones.
// First we create a global synchronized pool of *Transaction
var transactionPool = sync.Pool{
	New: func() interface{} { return new(Transaction) },
}
// Now we update NewTransaction with the pool system
func (w *Waf) NewTransaction() *Transaction {
	tx := transactionPool.Get().(*Transaction)
	// ...
}
// We add an additional Clean function to clean up transaction data
func (tx *Transaction) Clean() error {
	// Return the transaction to the pool when Clean exits, even on error
	defer transactionPool.Put(tx)
	if err := tx.RequestBodyBuffer.Close(); err != nil {
		return err
	}
	if err := tx.ResponseBodyBuffer.Close(); err != nil {
		return err
	}
	tx.Waf.Logger.Debug("Transaction finished",
		zap.String("event", "CLEAN_TRANSACTION"),
		zap.String("txid", tx.ID),
		zap.Bool("interrupted", tx.Interrupted()))
	return nil
}
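An object taken from a pool may still carry the state of its previous user, so it has to be reset somewhere between Put and the next use. The following is a minimal, self-contained sketch of that Get, reset, Put cycle. It does not reproduce Coraza's real code: the trimmed-down `Transaction` struct and the `acquire`, `release` and `reset` helpers are hypothetical names chosen for illustration.

```go
package main

import (
	"fmt"
	"sync"
)

// Transaction is a hypothetical, stripped-down stand-in for Coraza's type.
type Transaction struct {
	ID          string
	Interrupted bool
	Collections map[string][]string
}

var transactionPool = sync.Pool{
	New: func() interface{} {
		return &Transaction{Collections: make(map[string][]string)}
	},
}

// reset clears per-request state but keeps the already allocated map.
func (tx *Transaction) reset() {
	tx.ID = ""
	tx.Interrupted = false
	for k := range tx.Collections {
		delete(tx.Collections, k)
	}
}

// acquire takes a transaction from the pool and assigns it a fresh ID.
func acquire(id string) *Transaction {
	tx := transactionPool.Get().(*Transaction)
	tx.ID = id
	return tx
}

// release wipes the transaction and hands it back to the pool.
func release(tx *Transaction) {
	tx.reset()
	transactionPool.Put(tx)
}

func main() {
	first := acquire("tx-1")
	first.Collections["ARGS"] = []string{"id=1"}
	release(first)

	// Whether or not the pool returns the same object, no old data is visible.
	second := acquire("tx-2")
	fmt.Println(second.ID, len(second.Collections)) // prints "tx-2 0"
	release(second)
}
```

Resetting on release rather than on acquire is a design choice in this sketch: it drops references to request data as early as possible, so that data is not kept alive while the object sits in the pool.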
Benchmarking code:
func BenchmarkNewTxWithoutPool(b *testing.B) {
	var p *Transaction
	waf := NewWaf()
	b.ReportAllocs()
	b.ResetTimer()
	for i := 0; i < b.N; i++ {
		for j := 0; j < 10000; j++ {
			p = new(Transaction)
			p.Waf = waf
		}
	}
}
func BenchmarkNewTxWithPool(b *testing.B) {
	var p *Transaction
	// Create the WAF before resetting the timer so setup is not measured
	waf := NewWaf()
	b.ReportAllocs()
	b.ResetTimer()
	for i := 0; i < b.N; i++ {
		for j := 0; j < 10000; j++ {
			p = transactionPool.Get().(*Transaction)
			p.Waf = waf
			transactionPool.Put(p)
		}
	}
}
On a machine with 16 CPU cores and 32 GB of memory, the results are as follows:
# Benchmarks:
BenchmarkWithoutPool-16 426 2878390 ns/op 10240140 B/op 10001 allocs/op
BenchmarkWithPool-16 9152 113863 ns/op 1 B/op 0 allocs/op
# Memory consumption at idle:
PID USER PR NI VIRT RES SHR S %CPU %MEM TIME+ COMMAND
167819 65532 20 0 3610724 359808 28336 S 0,0 1,1 19:20.67 caddy
# Memory consumption for 100,000 transactions using ab -c 1000 -n 100000 http://localhost:80/:
PID USER PR NI VIRT RES SHR S %CPU %MEM TIME+ COMMAND
167819 65532 20 0 3678508 1,2g 28336 S 1243 3,9 20:53.46 caddy
The trade-off is higher CPU consumption in exchange for a huge improvement in IO operations.
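The allocs/op difference reported by the benchmarks can also be checked in a small standalone program. The sketch below uses `testing.AllocsPerRun` from the standard library. The `payload` type, the `payloadPool` and `sink` variables and the two helper functions are hypothetical stand-ins, not Coraza code, and the 1 KiB size is only an assumption loosely based on the B/op figure above.

```go
package main

import (
	"fmt"
	"sync"
	"testing"
)

// payload is a hypothetical 1 KiB object standing in for Transaction.
type payload struct {
	buf [1024]byte
}

var payloadPool = sync.Pool{
	New: func() interface{} { return new(payload) },
}

// sink is package-level so the allocation escapes to the heap.
var sink *payload

// allocsWithoutPool reports the average allocations per call of new(payload).
func allocsWithoutPool() float64 {
	return testing.AllocsPerRun(1000, func() {
		sink = new(payload)
	})
}

// allocsWithPool reports the average allocations per Get/Put cycle.
func allocsWithPool() float64 {
	return testing.AllocsPerRun(1000, func() {
		p := payloadPool.Get().(*payload)
		p.buf[0] = 1 // touch the object so the work is not trivially empty
		payloadPool.Put(p)
	})
}

func main() {
	fmt.Println("allocs/run without pool:", allocsWithoutPool())
	fmt.Println("allocs/run with pool:   ", allocsWithPool())
}
```

The pooled variant should report fewer allocations per run than the plain one. The pool can still allocate occasionally, because the runtime is free to drop pooled objects during garbage collection.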