How @with_pool Works
This page explains the internal mechanics of the @with_pool macro for advanced users and contributors who want to understand the optimization strategies.
Overview
The @with_pool macro provides automatic lifecycle management with three key optimizations:
- Try-Finally Safety — Guarantees cleanup even on exceptions
- Typed Checkpoint/Rewind — Only saves/restores used types (~77% faster)
- Untracked Acquire Detection — Safely handles
acquire!calls outside macro visibility
Basic Lifecycle Flow
┌─────────────────────────────────────────────────────────────┐
│ @with_pool pool function foo(x) │
│ A = acquire!(pool, Float64, 100) │
│ B = similar!(pool, A) │
│ return sum(A) + sum(B) │
│ end │
└─────────────────────────────────────────────────────────────┘
↓
┌───────────────────────────────────┐
│ Macro Transformation │
└───────────────────────────────────┘
↓
┌─────────────────────────────────────────────────────────────┐
│ function foo(x) │
│ pool = get_task_local_pool() │
│ checkpoint!(pool, Float64) # ← Type-specific │
│ try │
│ A = _acquire_impl!(pool, Float64, 100) │
│ B = _similar_impl!(pool, A) │
│ return sum(A) + sum(B) │
│ finally │
│ rewind!(pool, Float64) # ← Type-specific │
│ end │
│ end │
└─────────────────────────────────────────────────────────────┘Key Points
try-finallyensuresrewind!executes even if an exception occursacquire!→_acquire_impl!transformation bypasses untracked marking overhead- Type-specific
checkpoint!(pool, Float64)is ~77% faster than full checkpoint
Type Extraction: Static Analysis at Compile Time
The macro analyzes the AST to extract types used in acquire! calls:
# Macro sees these acquire! calls:
@with_pool pool begin
A = acquire!(pool, Float64, 10, 10) # → extracts Float64
B = zeros!(pool, ComplexF64, 100) # → extracts ComplexF64
C = similar!(pool, A) # → extracts eltype(A) → Float64
end
# Generated code uses typed checkpoint/rewind:
checkpoint!(pool, Float64, ComplexF64)
try
...
finally
rewind!(pool, Float64, ComplexF64)
endType Extraction Rules
| Call Pattern | Extracted Type |
|---|---|
acquire!(pool, Float64, dims...) | Float64 |
acquire!(pool, x) | eltype(x) (if x is external) |
zeros!(pool, dims...) | default_eltype(pool) |
zeros!(pool, Float32, dims...) | Float32 |
similar!(pool, x) | eltype(x) |
similar!(pool, x, Int64, ...) | Int64 |
When Type Extraction Fails → Full Checkpoint
The macro falls back to full checkpoint!(pool) when:
@with_pool pool begin
T = eltype(data) # T defined locally AFTER checkpoint
A = acquire!(pool, T, 100) # Can't use T at checkpoint time!
end
# → Falls back to checkpoint!(pool) / rewind!(pool)
@with_pool pool begin
local_arr = compute() # local_arr defined AFTER checkpoint
B = similar!(pool, local_arr) # eltype(local_arr) unavailable
end
# → Falls back to checkpoint!(pool) / rewind!(pool)Untracked Acquire Detection
The Problem
The macro can only see acquire! calls directly in its AST. Calls inside helper functions are invisible:
function helper!(pool)
return zeros!(pool, Float64, 100) # Macro can't see this!
end
@with_pool pool begin
A = acquire!(pool, Int64, 10) # ← Macro sees this (Int64)
B = helper!(pool) # ← Macro can't see Float64 inside!
end
# If only checkpoint!(pool, Int64), Float64 arrays won't be rewound!The Solution: Bitmask-Based Type Touch Tracking
Every acquire! call (and convenience functions) records the type touch with type-specific bitmask information:
# Public API (called from user code outside macro)
@inline function acquire!(pool, ::Type{T}, n::Int) where {T}
_record_type_touch!(pool, T) # ← Records type-specific bitmask!
_acquire_impl!(pool, T, n)
end
# Macro-transformed calls skip the recording
# (because macro already knows about them)
_acquire_impl!(pool, T, n) # ← No recordingEach fixed-slot type maps to a bit in a UInt16 bitmask via _fixed_slot_bit(T). Non-fixed-slot types set a separate _touched_has_others flag.
Flow Diagram
@with_pool pool begin Bitmask state at depth 2
│ ─────────────────────────────
├─► checkpoint!(pool, Int64) masks[2]=0x0000, others[2]=false
│
│ A = _acquire_impl!(...) (macro-transformed, no mark)
│ B = helper!(pool)
│ └─► zeros!(pool, Float64, N)
│ └─► _record_type_touch!(pool, Float64)
│ masks[2] |= 0x0001 (Float64 bit) ←───┐
│ │
│ ... more code ... │
│ │
└─► rewind! check: │
tracked_mask = _tracked_mask_for_types(Int64) │
if _can_use_typed_path(pool, tracked_mask) ────────┘
rewind!(pool, Int64) # Typed rewind (fast)
else # Float64 not in {Int64} → full
rewind!(pool) # Full rewind (safe)
end
endWhy This Works
- Macro-tracked calls: Transformed to
_acquire_impl!→ no bitmask touch → typed path - External calls: Use public API → records type-specific bitmask → subset check at rewind
- Subset optimization: If touched types are a subset of tracked types, the typed path is still safe
- Result: Always safe, with finer-grained optimization than a single boolean flag
Nested @with_pool Handling
Each @with_pool maintains its own checkpoint depth:
@with_pool p1 begin depth: 1 → 2
v1 = acquire!(p1, Float64, 10)
│
├─► @with_pool p2 begin depth: 2 → 3
│ v2 = acquire!(p2, Int64, 5)
│ helper!(p2) # marks bitmask at depth 3
│ sum(v2)
│ end depth: 3 → 2, bitmask checked
│
│ # v1 still valid here!
sum(v1)
end depth: 2 → 1, bitmask checkedDepth Tracking Data Structures
struct AdaptiveArrayPool
# ... type pools ...
_current_depth::Int # Current scope depth (1 = global)
_touched_type_masks::Vector{UInt16} # Per-depth: which fixed slots were touched
_touched_has_others::Vector{Bool} # Per-depth: any non-fixed-slot type touched
end
# Initialized with sentinel:
_current_depth = 1 # Global scope
_touched_type_masks = [UInt16(0)] # Sentinel for depth=1
_touched_has_others = [false] # Sentinel for depth=1Performance Impact
| Scenario | Checkpoint Method | Relative Speed |
|---|---|---|
| 1 type, no untracked | checkpoint!(pool, T) | ~77% faster |
| Multiple types, no untracked | checkpoint!(pool, T1, T2, ...) | ~50% faster |
| Untracked subset of tracked | checkpoint!(pool, T...) | ~77% faster |
| Unknown untracked types | checkpoint!(pool) | Baseline |
The optimization matters most in tight loops with many iterations. The bitmask subset check allows the typed path even when untracked acquires occur, as long as those types are already covered by the macro's tracked set.
Code Generation Summary
# INPUT
@with_pool pool function compute(data)
A = acquire!(pool, Float64, length(data))
result = helper!(pool, A) # May have untracked acquires
return result
end
# OUTPUT (simplified)
function compute(data)
pool = get_task_local_pool()
# Bitmask subset check: can typed path handle any untracked acquires?
if _can_use_typed_path(pool, _tracked_mask_for_types(Float64))
checkpoint!(pool, Float64) # Typed checkpoint (fast)
else
checkpoint!(pool) # Full checkpoint (safe)
end
try
A = _acquire_impl!(pool, Float64, length(data))
result = helper!(pool, A)
return result
finally
if _can_use_typed_path(pool, _tracked_mask_for_types(Float64))
rewind!(pool, Float64) # Typed rewind (fast)
else
rewind!(pool) # Full rewind (safe)
end
end
endKey Internal Functions
| Function | Purpose |
|---|---|
_extract_acquire_types(expr, pool_name) | AST walk to find types |
_filter_static_types(types, local_vars) | Filter out locally-defined types |
_transform_acquire_calls(expr, pool_name) | Replace acquire! → _acquire_impl! |
_record_type_touch!(pool, T) | Record type touch in bitmask for current depth |
_can_use_typed_path(pool, mask) | Bitmask subset check for typed vs full path |
_tracked_mask_for_types(T...) | Compile-time bitmask for tracked types |
_generate_typed_checkpoint_call(pool, types) | Generate bitmask-aware checkpoint |
See Also
- How It Works — Overview of pool architecture
- Safety Rules — Scope rules and best practices
- Configuration — Performance tuning options