API Reference
API Summary
Macros
| Macro | Description |
|---|---|
| @with_pool name expr | Recommended. Binds the task-local pool to name. Automatically checkpoints on entry and rewinds on exit. |
| @maybe_with_pool name expr | Same as @with_pool, but can be toggled on/off at runtime via MAYBE_POOLING_ENABLED[]. |
Functions
| Function | Description |
|---|---|
| acquire!(pool, T, dims...) | Returns a view for most T: SubArray{T,1} for 1D, ReshapedArray{T,N} for N-D. For T === Bit, returns a native BitVector/BitArray{N}. 0 bytes allocated on a cache hit. |
| acquire!(pool, T, dims::Tuple) | Tuple overload of acquire! (e.g., acquire!(pool, T, size(x))). |
| acquire!(pool, x::AbstractArray) | similar-style: acquires an array matching eltype(x) and size(x). |
| unsafe_acquire!(pool, T, dims...) | Returns a native Array/CuArray (CPU: Vector{T} for 1D, Array{T,N} for N-D). For T === Bit, returns a native BitVector/BitArray{N} (equivalent to acquire!). Use only for FFI or when a native array type is required. |
| unsafe_acquire!(pool, x::AbstractArray) | similar-style: acquires a raw array matching eltype(x) and size(x). |
| checkpoint!(pool) | Saves the current pool state (stack pointer). |
| rewind!(pool) | Restores the pool to the last checkpoint, releasing all arrays acquired since then for reuse (the memory stays allocated). |
| pool_stats(pool) | Prints detailed statistics about pool usage. |
| get_task_local_pool() | Returns the task-local pool instance. |
| empty!(pool) | Clears all internal storage, releasing all memory. |
Convenience Functions
Default element type is Float64 (CPU) or Float32 (CUDA).
| Function | Description |
|---|---|
| zeros!(pool, [T,] dims...) | Zero-initialized view. Equivalent to acquire! followed by fill!(arr, zero(T)). |
| ones!(pool, [T,] dims...) | One-initialized view. Equivalent to acquire! followed by fill!(arr, one(T)). |
| trues!(pool, dims...) | Bit-packed BitVector / BitArray{N} filled with true. |
| falses!(pool, dims...) | Bit-packed BitVector / BitArray{N} filled with false. |
| similar!(pool, A) | View matching eltype(A) and size(A). |
Types
| Type | Description |
|---|---|
| AdaptiveArrayPool | The main pool type. Create with AdaptiveArrayPool(). |
| Bit | Sentinel type to request packed BitVector storage (1 bit/element). |
| DisabledPool{Backend} | Sentinel type when pooling is disabled. |
Configuration & Utilities
| Symbol | Description |
|---|---|
| USE_POOLING | Compile-time constant to disable all pooling. |
| MAYBE_POOLING_ENABLED | Runtime Ref{Bool} for @maybe_with_pool. |
| POOL_DEBUG | Runtime Ref{Bool} to enable safety validation. |
| set_cache_ways!(n) | Sets the number of ways in the N-way cache for N-D arrays (takes effect after a Julia restart). |
Detailed Reference
AdaptiveArrayPools.@maybe_with_pool — Macro
@maybe_with_pool pool_name expr
@maybe_with_pool expr
Conditionally enables pooling based on MAYBE_POOLING_ENABLED[]. If pooling is disabled, pool_name is bound to a DisabledPool sentinel instead of an active pool, and acquire! falls back to standard allocation.
Useful for libraries that want to let users control pooling behavior at runtime.
Function Definition
Like @with_pool, wrap function definitions:
@maybe_with_pool pool function process_data(data)
tmp = acquire!(pool, Float64, length(data)) # Conditionally pooled
tmp .= data
sum(tmp)
end
Block Usage
MAYBE_POOLING_ENABLED[] = false
@maybe_with_pool pool begin
v = acquire!(pool, Float64, 100) # Falls back to Vector{Float64}(undef, 100)
end
AdaptiveArrayPools.@with_pool — Macro
@with_pool pool_name expr
@with_pool expr
@with_pool :backend pool_name expr
@with_pool :backend expr
Executes code within a pooling scope with automatic lifecycle management. Calls checkpoint! on entry and rewind! on exit (even if an error occurs).
If pool_name is omitted, a hidden variable is used (useful when you don't need to reference the pool directly).
Backend Selection
Use a symbol to specify the pool backend:
- :cpu - CPU pools (default)
- :cuda - GPU pools (requires using CUDA)
# CPU (default)
@with_pool pool begin ... end
# GPU via CUDA
@with_pool :cuda pool begin ... end
Function Definition
Wrap function definitions to inject pool lifecycle into the body:
using Statistics  # for mean and std
# Long form function
@with_pool pool function compute_stats(data)
tmp = acquire!(pool, Float64, length(data))
tmp .= data
mean(tmp), std(tmp)
end
# Short form function
@with_pool pool fast_sum(data) = begin
tmp = acquire!(pool, eltype(data), length(data))
tmp .= data
sum(tmp)
end
Block Usage
# With explicit pool name
@with_pool pool begin
v = acquire!(pool, Float64, 100)
v .= 1.0
sum(v)
end
# Without pool name (for simple blocks)
@with_pool begin
inner_function() # inner function can use get_task_local_pool()
end
Nesting
Nested @with_pool blocks work correctly - each maintains its own checkpoint.
@with_pool p1 begin
v1 = acquire!(p1, Float64, 10)
inner = @with_pool p2 begin
v2 = acquire!(p2, Float64, 5)
sum(v2)
end
# v1 is still valid here
sum(v1) + inner
end
AdaptiveArrayPools._acquire_impl! — Method
_acquire_impl!(pool, Type{T}, n) -> SubArray{T,1,Vector{T},...}
_acquire_impl!(pool, Type{T}, dims...) -> ReshapedArray{T,N,...}
Internal implementation of acquire!. Called directly by macro-transformed code (no type touch recording). User code calls acquire!, which adds the recording.
AdaptiveArrayPools._can_use_typed_path — Method
_can_use_typed_path(pool::AbstractArrayPool, tracked_mask::UInt16) -> Bool
Check whether the typed (fast) checkpoint/rewind path is safe to use.
Returns true when all touched types at the current depth are a subset of the tracked types (bitmask subset check) AND no non-fixed-slot types were touched.
The subset check: (touched_mask & ~tracked_mask) == 0 means every bit set in touched_mask is also set in tracked_mask.
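A minimal sketch of the bitmask arithmetic in plain Julia. The slot order in SLOT_TYPES and the helper names slot_bit, tracked_mask, and is_subset are hypothetical stand-ins, not the package's internals; the real _can_use_typed_path additionally requires that no non-fixed-slot types were touched.

```julia
# Hypothetical fixed-slot order (SLOT_TYPES[i] <-> bit i-1); the package's
# real slot assignment may differ.
const SLOT_TYPES = (Float64, Float32, Int64, Int32, ComplexF64, ComplexF32, Bool, UInt8)

# Bit for one type; types without a fixed slot contribute 0.
function slot_bit(T::Type)
    for (i, S) in enumerate(SLOT_TYPES)
        S === T && return UInt16(1) << (i - 1)
    end
    return UInt16(0)
end

# OR together the bits of all statically known types (0 for no arguments).
tracked_mask(types::Type...) = reduce(|, map(slot_bit, types); init = UInt16(0))

# The typed fast path is safe only if every touched bit is also tracked.
is_subset(touched::UInt16, tracked::UInt16) = (touched & ~tracked) == 0
```

Under the assumed order, tracked_mask(Float64, Int64) sets bits 0 and 2, so a helper that also touched Float32 would fail is_subset and would be routed to the lazy fallback path.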
AdaptiveArrayPools._disabled_pool_expr — Method
_disabled_pool_expr(backend::Symbol) -> Expr
Generate the expression for the DisabledPool singleton of the given backend. Used when pooling is disabled, to preserve the backend context.
AdaptiveArrayPools._ensure_body_has_toplevel_lnn — Method
_ensure_body_has_toplevel_lnn(body, source)
Ensure body has a LineNumberNode pointing to user source at the top level.
- Scans first few args to handle Expr(:meta, ...) from @inline etc.
- If first LNN points to user file (same as source.file), preserve it
- If first LNN points elsewhere (e.g., macros.jl), replace with source LNN
- If no LNN exists, prepend source LNN
- If source.file === :none (REPL/eval), don't clobber valid file LNNs
Returns a new Expr to avoid mutating the original AST.
AdaptiveArrayPools._extract_acquire_types — Function
_extract_acquire_types(expr, target_pool) -> Set{Any}
Extract type arguments from acquire/convenience function calls in an expression. Only extracts types from calls whose first argument matches target_pool. This prevents AST pollution when multiple pools are used in the same block.
Supported functions:
- acquire! and its alias acquire_view!
- unsafe_acquire! and its alias acquire_array!
- zeros!, ones!, similar!
- unsafe_zeros!, unsafe_ones!, unsafe_similar!
Handles various forms:
- [unsafe_]acquire!(pool, Type, dims...): extracts Type directly
- acquire!(pool, x): generates an eltype(x) expression
- zeros!(pool, dims...) / ones!(pool, dims...): Float64 (default)
- zeros!(pool, Type, dims...) / ones!(pool, Type, dims...): extracts Type
- similar!(pool, x): generates an eltype(x) expression
- similar!(pool, x, Type, ...): extracts Type
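The core of this extraction can be sketched as a recursive walk over the expression tree. The hypothetical extract_acquire_types below handles only the acquire!/unsafe_acquire! forms listed above and ignores the convenience functions; it illustrates the idea and is not the package's implementation.

```julia
# Collect type arguments of acquire!/unsafe_acquire! calls on `target` only.
function extract_acquire_types(ex, target::Symbol, found = Set{Any}())
    ex isa Expr || return found
    if ex.head === :call && ex.args[1] in (:acquire!, :unsafe_acquire!) &&
       length(ex.args) >= 3 && ex.args[2] === target
        if length(ex.args) == 3
            # acquire!(pool, x): element type comes from the template array
            push!(found, :(eltype($(ex.args[3]))))
        else
            # acquire!(pool, T, dims...): T is taken verbatim
            push!(found, ex.args[3])
        end
    end
    # Recurse into sub-expressions (assignments, blocks, nested calls, ...)
    foreach(a -> extract_acquire_types(a, target, found), ex.args)
    return found
end
```

Calls on a different pool (for example acquire!(other, Int32, 5)) are skipped because their first argument does not match target.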
AdaptiveArrayPools._extract_local_assignments — Function
_extract_local_assignments(expr, locals=Set{Symbol}()) -> Set{Symbol}
Find all symbols that are assigned locally in the expression body. These cannot be used for typed checkpoint since they're defined after checkpoint!.
Detects patterns like: T = eltype(x), local T = ..., etc.
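A rough sketch of the detection, assuming only plain `name = value` and `local name` forms matter; destructuring assignments and any other patterns the real function may cover are ignored. The name extract_local_assignments is hypothetical here.

```julia
# Collect symbols assigned with `x = ...` or declared with `local x`.
function extract_local_assignments(ex, found = Set{Symbol}())
    ex isa Expr || return found
    if ex.head === :(=) && ex.args[1] isa Symbol
        push!(found, ex.args[1])                 # T = eltype(x)
    elseif ex.head === :local
        for a in ex.args
            a isa Symbol && push!(found, a)      # local U (no value)
        end
    end
    # `local S = ...` nests an :(=) node, which the recursion picks up.
    foreach(a -> extract_local_assignments(a, found), ex.args)
    return found
end
```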
AdaptiveArrayPools._filter_static_types — Function
_filter_static_types(types, local_vars=Set{Symbol}()) -> (static_types, has_dynamic)
Filter types for typed checkpoint/rewind generation.
- Symbols NOT in local_vars are passed through (type parameters, global types)
- Symbols IN local_vars trigger fallback (defined after checkpoint!)
- Parametric types like Vector{T} trigger fallback
- eltype(x) expressions: usable if x does NOT reference a local variable
Type parameters (T, S from where clause) resolve to concrete types at runtime. Local variables (T = eltype(x)) are defined after checkpoint! and cannot be used.
AdaptiveArrayPools._find_first_lnn_index — Method
_find_first_lnn_index(args) -> Union{Int, Nothing}
Find the index of the first LineNumberNode in the leading prefix of args.
Scans sequentially, skipping Expr(:meta, ...) nodes (inserted by @inline, @inbounds, etc.). Returns nothing as soon as a non-meta, non-LNN expression is encountered—this prevents matching LNNs deeper in the AST.
Example AST prefix patterns
- [Expr(:meta,:inline), LNN, ...] → returns 2
- [LNN, ...] → returns 1
- [Expr(:meta,:inline), Expr(:call,...), LNN, ...] → returns nothing (stopped at call)
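The scanning rule above fits in a few lines of Base Julia. This find_first_lnn_index is a hypothetical re-implementation that reproduces the three example patterns; it is not taken from the package source.

```julia
function find_first_lnn_index(args)
    for (i, a) in enumerate(args)
        a isa LineNumberNode && return i
        # Skip metadata such as Expr(:meta, :inline); stop at anything else.
        (a isa Expr && a.head === :meta) || return nothing
    end
    return nothing
end
```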
AdaptiveArrayPools._fix_try_body_lnn! — Method
_fix_try_body_lnn!(expr, source)
Fix LineNumberNodes inside try blocks to point to user source. Julia's stack traces use the LAST LNN before the error location for line numbers. Replacing the first LNN in the try body with the source LNN ensures correct line numbers in stack traces.
Scans first few args to handle Expr(:meta, ...) from @inline etc. If source.file === :none (REPL/eval), don't clobber valid file LNNs. Modifies expr in-place and returns it.
AdaptiveArrayPools._generate_function_pool_code_with_backend — Method
_generate_function_pool_code_with_backend(backend, pool_name, func_def, disable_pooling)
Generate function code for a specific backend (e.g., :cuda). Wraps the function body with pool getter, checkpoint, try-finally, rewind.
AdaptiveArrayPools._generate_lazy_checkpoint_call — Method
_generate_lazy_checkpoint_call(pool_expr)
Generate a depth-only checkpoint call for dynamic-selective mode (use_typed=false). Much lighter than a full checkpoint!: it only increments the depth and pushes bitmask sentinels.
AdaptiveArrayPools._generate_lazy_rewind_call — Method
_generate_lazy_rewind_call(pool_expr)
Generate selective rewind code for dynamic-selective mode (use_typed=false). Delegates to _lazy_rewind!, a single function call symmetric with _lazy_checkpoint! on the checkpoint side. This avoids let-block overhead in finally clauses (which can impair Julia's type inference and cause boxing).
AdaptiveArrayPools._generate_pool_code_with_backend — Method
_generate_pool_code_with_backend(backend, pool_name, expr, force_enable)
Generate pool code for a specific backend (e.g., :cuda, :cpu). Uses _get_pool_for_backend(Val{backend}()) for zero-overhead dispatch.
Includes type-specific checkpoint/rewind optimization (same as regular @with_pool).
AdaptiveArrayPools._generate_typed_checkpoint_call — Method
_generate_typed_checkpoint_call(pool_expr, types)
Generate a bitmask-aware checkpoint call. When types are known at compile time, emits a conditional:
- if touched types ⊆ tracked types → typed checkpoint (fast path)
- otherwise → _typed_lazy_checkpoint! (typed checkpoint + set bit 14 for lazy first-touch checkpointing of extra types touched by helpers)
AdaptiveArrayPools._generate_typed_rewind_call — Method
_generate_typed_rewind_call(pool_expr, types)
Generate a bitmask-aware rewind call. When types are known at compile time, emits a conditional:
- if touched types ⊆ tracked types → typed rewind (fast path)
- otherwise → _typed_lazy_rewind! (rewinds tracked | touched mask; all touched types have Case A checkpoints via bit 14 lazy mode)
AdaptiveArrayPools._get_pool_for_backend — Method
_get_pool_for_backend(::Val{:cpu}) -> AdaptiveArrayPool
Get the task-local pool for the specified backend.
Extensions add methods for their backends (e.g., Val{:cuda}). Using Val{Symbol} enables compile-time dispatch and full inlining, achieving zero overhead compared to Dict-based registry.
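The dispatch pattern can be illustrated with a toy function. toy_pool_for and the :gpu backend below are hypothetical; in the package, the corresponding function is _get_pool_for_backend and the backend methods come from extensions.

```julia
# Hypothetical backends; an extension package would add methods like the :gpu one.
toy_pool_for(::Val{:cpu}) = "cpu pool"
toy_pool_for(::Val{:gpu}) = "gpu pool"

# With a literal symbol, Val(:cpu) is a constant, so the method is chosen at
# compile time and can be inlined; no runtime Dict lookup is involved.
use_cpu_pool() = toy_pool_for(Val(:cpu))
```

A backend without a method (for example Val(:tpu)) raises a MethodError rather than silently falling back.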
Example (in CUDA extension)
@inline AdaptiveArrayPools._get_pool_for_backend(::Val{:cuda}) = get_task_local_cuda_pool()
AdaptiveArrayPools._lazy_checkpoint! — Method
_lazy_checkpoint!(pool::AdaptiveArrayPool)
Lightweight checkpoint for lazy mode (use_typed=false macro path).
Increments _current_depth and pushes bitmask sentinels — but does not save n_active for any fixed-slot typed pool. The _LAZY_MODE_BIT (bit 15) in _touched_type_masks marks this depth as lazy mode so that _record_type_touch! can trigger lazy first-touch checkpoints.
Existing others entries are eagerly checkpointed since there is no per-type tracking for non-fixed-slot pools; Case B in _rewind_typed_pool! handles any new others entries created during the scope (n_active starts at 0 = sentinel).
Performance: ~2ns vs ~540ns for full checkpoint!.
AdaptiveArrayPools._lazy_rewind! — Method
_lazy_rewind!(pool::AdaptiveArrayPool)
Complete rewind for lazy mode (use_typed=false macro path).
Reads the combined mask at the current depth, rewinds only the fixed-slot pools whose bits are set, handles any others entries, then pops the depth metadata.
Called directly from the macro-generated finally clause as a single function call (matching the structure of _lazy_checkpoint! for symmetry and performance).
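A toy model of the per-depth mask bookkeeping shared by _lazy_checkpoint!, _record_type_touch!, and _lazy_rewind!. It assumes fixed-slot types occupy bits 0-7 and the lazy-mode flag is bit 15, as described above, but it tracks only the masks: n_active saving, lazy first-touch checkpoints, and others entries are not modeled. All names are hypothetical.

```julia
# Assumed layout, following the text: bits 0-7 = fixed-slot types, bit 15 = lazy mode.
const TYPE_BITS_MASK = UInt16(0x00ff)
const LAZY_MODE_BIT = UInt16(1) << 15

# One mask per checkpoint depth; masks[1] is the global-scope sentinel.
struct ToyMaskStack
    masks::Vector{UInt16}
end
ToyMaskStack() = ToyMaskStack(UInt16[0])

toy_depth(s::ToyMaskStack) = length(s.masks)

# Lazy checkpoint: push a new depth flagged as lazy; no counters are saved.
toy_lazy_checkpoint!(s::ToyMaskStack) = (push!(s.masks, LAZY_MODE_BIT); s)

# Touch recording: set the fixed-slot bit (0-7) at the current depth.
function toy_record_touch!(s::ToyMaskStack, bit::Integer)
    s.masks[end] |= UInt16(1) << bit
    return s
end

# Lazy rewind: pop the depth and return only the slot bits that need rewinding.
function toy_lazy_rewind!(s::ToyMaskStack)
    toy_depth(s) > 1 || error("no pending checkpoint")
    return pop!(s.masks) & TYPE_BITS_MASK
end
```

Because the mode flag is stripped before the slot bits are returned, only the two touched slots would be rewound in the usage below.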
AdaptiveArrayPools._looks_like_type — Method
_looks_like_type(expr) -> Bool
Heuristic to check whether an expression looks like a type. Returns true for: uppercase Symbols (Float64, Int), curly expressions (Vector{T}), and GlobalRefs to types.
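One way to write such a heuristic, as a hypothetical sketch; the package's version may treat edge cases differently. In particular, this sketch returns false for a qualified name like Base.Float64, which parses as a field-access expression rather than a Symbol or GlobalRef.

```julia
function looks_like_type(ex)
    if ex isa Symbol
        s = String(ex)
        return !isempty(s) && isuppercase(s[1])   # Float64, Int, T
    elseif ex isa Expr
        return ex.head === :curly                  # Vector{T}, Dict{K,V}
    elseif ex isa GlobalRef
        # A GlobalRef counts only if it resolves to a type.
        return isdefined(ex.mod, ex.name) && getfield(ex.mod, ex.name) isa Type
    end
    return false
end
```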
AdaptiveArrayPools._record_type_touch! — Method
_record_type_touch!(pool::AbstractArrayPool, ::Type{T})
Record that type T was touched (acquired) at the current checkpoint depth. Called by acquire! and convenience wrappers; macro-transformed calls use _acquire_impl! directly (bypassing this for zero overhead).
For fixed-slot types, sets the corresponding bit in _touched_type_masks. For non-fixed-slot types, sets _touched_has_others flag.
AdaptiveArrayPools._selective_rewind_fixed_slots! — Method
_selective_rewind_fixed_slots!(pool::AdaptiveArrayPool, mask::UInt16)
Rewind only the fixed-slot typed pools whose bits are set in mask.
Each of the 8 fixed-slot pools maps to bits 0–7 (same encoding as _fixed_slot_bit). Bits 8–15 (mode flags) are not checked here — callers must strip them before passing the mask (e.g. mask & _TYPE_BITS_MASK).
Unset bits are skipped entirely: for pools that were acquired without a matching checkpoint, _rewind_typed_pool! Case B safely restores from the parent checkpoint.
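The bit-to-slot mapping can be sketched as a loop over the eight fixed-slot bits. slots_to_rewind is a hypothetical helper that only reports which slots would be rewound; because it examines bits 0-7 alone, mode flags in bits 8-15 have no effect on its result.

```julia
# Hypothetical helper: list the 1-based fixed-slot indices whose bits are set.
function slots_to_rewind(mask::UInt16)
    slots = Int[]
    for bit in 0:7                            # only the 8 fixed-slot bits
        if ((mask >> bit) & 0x0001) != 0
            push!(slots, bit + 1)             # bit 0 -> slot 1, ..., bit 7 -> slot 8
        end
    end
    return slots                              # unset bits are simply skipped
end
```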
AdaptiveArrayPools._tracked_mask_for_types — Method
_tracked_mask_for_types(types::Type...) -> UInt16
Compute the compile-time bitmask for the types tracked by a typed checkpoint/rewind. Uses @generated for zero-overhead constant folding.
Returns UInt16(0) when called with no arguments. Non-fixed-slot types contribute UInt16(0) (their bit is 0).
AdaptiveArrayPools._typed_lazy_checkpoint! — Method
_typed_lazy_checkpoint!(pool::AdaptiveArrayPool, types::Type...)
Typed checkpoint that enables lazy first-touch checkpointing for extra types touched by helpers (use_typed=true, _can_use_typed_path=false path).
Calls checkpoint!(pool, types...) (checkpoints only the statically-known types), then sets _TYPED_LAZY_BIT (bit 14) in _touched_type_masks[depth] to signal typed lazy mode.
_record_type_touch! checks (mask & _MODE_BITS_MASK) != 0 (bit 14 OR bit 15) to trigger a lazy first-touch checkpoint for each extra type on first acquire, ensuring Case A (not Case B) applies at rewind and parent n_active is preserved correctly.
AdaptiveArrayPools._typed_lazy_rewind! — Method
_typed_lazy_rewind!(pool::AdaptiveArrayPool, tracked_mask::UInt16)
Selective rewind for the typed mode (use_typed=true) fallback path.
Called when _can_use_typed_path returns false (helpers touched types beyond the statically-tracked set). Rewinds only pools whose bits are set in tracked_mask | touched_mask. All touched types have Case A checkpoints, guaranteed by the _TYPED_LAZY_BIT mode set in _typed_lazy_checkpoint!.
AdaptiveArrayPools._unsafe_acquire_impl! — Method
_unsafe_acquire_impl!(pool, Type{T}, dims...) -> Array{T,N}
Internal implementation of unsafe_acquire!. Called directly by macro-transformed code.
AdaptiveArrayPools._uses_local_var — Method
_uses_local_var(expr, local_vars) -> Bool
Check whether an expression uses any local variable (recursively). Handles field access (x.y.z) and indexing (x[i]) by checking the base variable.
This is used to detect cases like acquire!(pool, cp1d.t_i_average) where cp1d is defined locally - the eltype expression can't be evaluated at checkpoint time since cp1d doesn't exist yet.
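A hypothetical sketch of the recursive check, following the rule above that field access and indexing are judged by their base variable only. The name uses_local_var and the example expressions are illustrative.

```julia
function uses_local_var(ex, local_vars::Set{Symbol})
    ex isa Symbol && return ex in local_vars
    ex isa Expr || return false
    if ex.head === :. || ex.head === :ref
        # x.y.z or x[i]: only the base variable decides
        return uses_local_var(ex.args[1], local_vars)
    end
    return any(a -> uses_local_var(a, local_vars), ex.args)
end
```

With cp1d in local_vars, eltype(cp1d.t_i_average) is flagged, while x.cp1d is not, because cp1d there is a field name rather than a variable.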
AdaptiveArrayPools.acquire! — Method
acquire!(pool, x::AbstractArray) -> view type
Acquire an array with the same element type and size as x (similar to similar(x)).
Example
A = rand(10, 10)
@with_pool pool begin
B = acquire!(pool, A) # Same type and size as A
B .= A .* 2
end
AdaptiveArrayPools.acquire! — Method
acquire!(pool, Type{T}, n) -> view type
acquire!(pool, Type{T}, dims...) -> view type
acquire!(pool, Type{T}, dims::NTuple{N,Int}) -> view type
Acquire a pooled array with element type T and size n or dimensions dims.
Returns a pooled array (backend-dependent type):
- CPU 1D: SubArray{T,1,Vector{T},...} (parent is Vector{T})
- CPU N-D: ReshapedArray{T,N,...} (zero creation cost)
- Bit (T === Bit): BitVector/BitArray{N} (chunks-sharing, SIMD optimized)
- CUDA: CuArray{T,N} (unified N-way cache)
For CPU numeric arrays, the return types are StridedArray, compatible with BLAS and broadcasting.
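The CPU view types can be reproduced with Base Julia alone. This sketch assumes a single contiguous backing vector standing in for pooled storage (the variable backing is hypothetical); it shows that both the 1D SubArray and the N-D reshape alias that buffer without copying.

```julia
# Hypothetical backing buffer standing in for one typed pool's storage.
backing = zeros(Float64, 16)

# 1D request: a SubArray over the front of the buffer.
v = view(backing, 1:6)

# N-D request: reshape a 1D view; Julia wraps it in a ReshapedArray (no copy).
m = reshape(view(backing, 1:6), 2, 3)

m[2, 1] = 42.0   # column-major: m[2, 1] is linear element 2 of the buffer
```

The write through m is visible through v and in backing, which is why such views stay valid only while the pool has not handed that region out again.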
For type-unspecified paths (struct fields without concrete type parameters), use unsafe_acquire! instead - cached native array instances can be reused.
Example
@with_pool pool begin
v = acquire!(pool, Float64, 100) # 1D view
m = acquire!(pool, Float64, 10, 10) # 2D view
v .= 1.0
m .= 2.0
sum(v) + sum(m)
end
See also: unsafe_acquire! for native array access.
AdaptiveArrayPools.acquire_array! — Function
acquire_array!(pool, Type{T}, dims...)
Alias for unsafe_acquire!.
Explicit name emphasizing the return type is a raw Array. Use when you prefer symmetric naming with acquire_view!.
AdaptiveArrayPools.acquire_view! — Function
acquire_view!(pool, Type{T}, dims...)
Alias for acquire!.
Explicit name emphasizing the return type is a view (SubArray/ReshapedArray), not a raw Array. Use when you prefer symmetric naming with acquire_array!.
AdaptiveArrayPools.checkpoint! — Method
checkpoint!(pool::AdaptiveArrayPool, types::Type...)
Save state for multiple specific types. Uses @generated for zero-overhead compile-time unrolling. Increments _current_depth once for all types.
AdaptiveArrayPools.checkpoint! — Method
checkpoint!(pool::AdaptiveArrayPool)
Save the current pool state (n_active counters) to internal stacks.
This is called automatically by @with_pool and related macros. After warmup, this function has zero allocation.
See also: rewind!, @with_pool
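A toy model of the counter-stack idea behind checkpoint!/rewind!: a sentinel-seeded stack of saved n_active values, where a rewind at global scope falls back to a reset. ToyPool and its functions are hypothetical and model a single typed pool only; the real pool tracks several typed pools plus depth and bitmask state.

```julia
mutable struct ToyPool
    n_active::Int          # number of slots currently handed out
    saved::Vector{Int}     # saved n_active values; saved[1] is the sentinel
end
ToyPool() = ToyPool(0, [0])

toy_acquire!(p::ToyPool) = (p.n_active += 1)          # hand out the next cached slot
toy_checkpoint!(p::ToyPool) = (push!(p.saved, p.n_active); p)

function toy_rewind!(p::ToyPool)
    if length(p.saved) == 1
        p.n_active = 0                 # global scope: behave like reset!
    else
        p.n_active = pop!(p.saved)     # restore the counter; storage is untouched
    end
    return p
end
```

Nested scopes restore counters in reverse order, so an inner rewind leaves the outer scope's acquisitions intact.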
AdaptiveArrayPools.checkpoint! — Method
checkpoint!(pool::AdaptiveArrayPool, ::Type{T})
Save state for a specific type only. Used by optimized macros that know at compile time which types will be used.
Also updates _current_depth and the bitmask state used for type touch tracking.
~77% faster than full checkpoint! when only one type is used.
AdaptiveArrayPools.default_eltype — Method
default_eltype(pool) -> Type
Default element type for convenience functions when the type is not specified. CPU pools default to Float64, CUDA pools to Float32.
Backends can override this to provide appropriate defaults.
AdaptiveArrayPools.default_eltype — Method
default_eltype(::DisabledPool{:cpu}) -> Float64
Default element type for disabled CPU pools (matches Julia's zeros() default).
AdaptiveArrayPools.falses! — Method
falses!(pool, dims...) -> BitArray
falses!(pool, dims::Tuple) -> BitArray
Acquire a bit-packed boolean array filled with false from the pool.
Equivalent to Julia's falses(dims...) but using pooled memory. Uses ~8x less memory than zeros!(pool, Bool, dims...).
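The ~8x figure follows from storage layout: a BitVector packs elements into UInt64 chunks, while a Vector{Bool} uses one byte per element. This sketch uses Base's falses and reads the chunks field of BitArray, which is an internal implementation detail of Julia rather than a public API.

```julia
n = 1000
bits = falses(n)          # BitVector: 1 bit per element, packed into UInt64 chunks
bytes = zeros(Bool, n)    # Vector{Bool}: 1 byte per element

bit_storage = sizeof(bits.chunks)   # cld(1000, 64) = 16 chunks * 8 bytes = 128
byte_storage = sizeof(bytes)        # 1000
```

The ratio here is 1000 / 128, about 7.8, approaching 8 as n grows.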
Example
@with_pool pool begin
bv = falses!(pool, 100) # BitVector, all false
bm = falses!(pool, 10, 10) # BitMatrix, all false
end
AdaptiveArrayPools.foreach_fixed_slot — Method
foreach_fixed_slot(f, pool::AdaptiveArrayPool)
Apply f to each fixed-slot TypedPool. Zero allocation via compile-time unrolling.
AdaptiveArrayPools.get_bitarray! — Method
get_bitarray!(tp::BitTypedPool, dims::NTuple{N,Int}) -> BitArray{N}
Get a BitArray{N} that shares chunks with the pooled BitVector.
Uses N-way cache for BitArray reuse. Unlike Array which requires unsafe_wrap for each shape, BitArray can reuse cached entries by modifying dims/len fields when ndims matches (0 bytes allocation).
Cache Strategy
- Exact match: Return cached BitArray directly (0 bytes)
- Same ndims: Modify dims/len/chunks of cached entry (0 bytes)
- Different ndims: Create new BitArray{N} and cache it (~944 bytes)
Implementation Notes
- BitVector (N=1): size() uses the len field; dims is ignored
- BitArray{N>1}: size() uses the dims field
- All BitArrays share chunks with the pool's backing BitVector
Safety
The returned BitArray is only valid within the @with_pool scope. Do NOT use after the scope ends (use-after-free risk).
AdaptiveArrayPools.get_nd_array! — Method
get_nd_array!(tp::AbstractTypedPool{T}, dims::NTuple{N,Int}) -> Array{T,N}
Get an N-dimensional Array from the pool with N-way caching.
AdaptiveArrayPools.get_nd_view! — Method
get_nd_view!(tp::AbstractTypedPool{T}, dims::NTuple{N,Int})
Get an N-dimensional view via reshape (zero creation cost).
AdaptiveArrayPools.get_task_local_cuda_pool — Function
get_task_local_cuda_pool() -> CuAdaptiveArrayPool
Retrieves (or creates) the CUDA pool for the current Task and current GPU device.
Requires CUDA.jl to be loaded. Throws an error if CUDA extension is not available.
See also: get_task_local_pool for CPU pools.
AdaptiveArrayPools.get_task_local_cuda_pools — Function
get_task_local_cuda_pools() -> Dict{Int, CuAdaptiveArrayPool}
Returns the dictionary of all CUDA pools for the current task (one per device).
Requires CUDA.jl to be loaded. Throws an error if CUDA extension is not available.
AdaptiveArrayPools.get_task_local_pool — Method
get_task_local_pool() -> AdaptiveArrayPool
Retrieves (or creates) the AdaptiveArrayPool for the current Task.
Each Task gets its own pool instance via task_local_storage(), ensuring thread safety without locks.
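The per-Task pattern can be sketched with Base's task_local_storage. The key :toy_pool_key and the Vector{Vector{Float64}} payload are hypothetical placeholders for the package's actual key and pool type.

```julia
const POOL_KEY = :toy_pool_key   # hypothetical storage key

# get! runs the constructor only on the first call within a Task,
# then returns the same object on every later call from that Task.
toy_task_local_pool() = get!(() -> Vector{Float64}[], task_local_storage(), POOL_KEY)
```

A different Task (for example one started with @async) gets its own, separate object, so no locking is needed between tasks.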
AdaptiveArrayPools.get_view! — Method
get_view!(tp::AbstractTypedPool{T}, n::Int)
Get a 1D vector view of size n from the typed pool. Returns the cached view on a hit (zero allocation) and creates a new one on a miss.
AdaptiveArrayPools.ones! — Method
ones!(pool, dims...) -> view
ones!(pool, T, dims...) -> view
ones!(pool, dims::Tuple) -> view
ones!(pool, T, dims::Tuple) -> view
Acquire a one-initialized array from the pool.
Equivalent to acquire!(pool, T, dims...) followed by fill!(arr, one(T)). Default element type depends on pool backend (CPU: Float64, CUDA: Float32). See default_eltype.
Example
@with_pool pool begin
v = ones!(pool, 100) # Uses default_eltype(pool)
m = ones!(pool, Float32, 10, 10) # Matrix{Float32} view, all ones
end
AdaptiveArrayPools.pool_stats — Method
pool_stats(tp::AbstractTypedPool; io::IO=stdout, indent::Int=0, name::String="")
Print statistics for a TypedPool or BitTypedPool.
AdaptiveArrayPools.pool_stats — Method
pool_stats(pool::AdaptiveArrayPool; io::IO=stdout)
Print detailed statistics about pool usage with colored output.
Example
@with_pool pool begin # pool is bound to the task-local AdaptiveArrayPool
v = acquire!(pool, Float64, 100)
pool_stats(pool)
end
AdaptiveArrayPools.pool_stats — Method
pool_stats(:cpu; io::IO=stdout)
Print statistics for the CPU task-local pool only.
AdaptiveArrayPools.pool_stats — Method
pool_stats(:cuda; io::IO=stdout)
Print statistics for CUDA task-local pools. Requires CUDA.jl to be loaded.
AdaptiveArrayPools.pool_stats — Method
pool_stats(; io::IO=stdout)
Print statistics for all task-local pools (CPU and CUDA if loaded).
Example
@with_pool pool begin
v = acquire!(pool, Float64, 100)
pool_stats() # Shows all pool stats
end
AdaptiveArrayPools.pooling_enabled — Method
pooling_enabled(pool) -> Bool
Returns true if pool is an active pool, false if pooling is disabled.
Examples
@maybe_with_pool pool begin
if pooling_enabled(pool)
# Using pooled memory
else
# Using standard allocation
end
end
See also: DisabledPool
AdaptiveArrayPools.reset! — Method
reset!(tp::AbstractTypedPool)
Reset state without clearing allocated storage. Sets n_active = 0 and restores checkpoint stacks to sentinel state.
AdaptiveArrayPools.reset! — Method
reset!(pool::AdaptiveArrayPool, types::Type...)
Reset state for multiple specific types. Uses @generated for zero-overhead compile-time unrolling.
See also: reset!(::AdaptiveArrayPool), rewind!
AdaptiveArrayPools.reset! — Method
reset!(pool::AdaptiveArrayPool)
Reset pool state without clearing allocated storage.
This function:
- Resets all n_active counters to 0
- Restores all checkpoint stacks to sentinel state
- Resets _current_depth and type touch tracking state
Unlike empty!, this preserves all allocated vectors, views, and N-D arrays for reuse, avoiding reallocation costs.
Use Case
When functions that acquire from the pool are called without proper checkpoint!/rewind! management, n_active can grow indefinitely. Use reset! to cleanly restore the pool to its initial state while keeping allocated memory available.
Example
pool = AdaptiveArrayPool()
# Some function that acquires without checkpoint management
function compute!(pool)
v = acquire!(pool, Float64, 100)
# ... use v ...
# No rewind! called
end
for _ in 1:1000
compute!(pool) # n_active grows each iteration
end
reset!(pool) # Restore state, keep allocated memory
# Now all n_active counters are 0, but the vectors remain available for reuse
AdaptiveArrayPools.reset! — Method
reset!(pool::AdaptiveArrayPool, ::Type{T})
Reset state for a specific type only. Clears n_active and restores checkpoint stacks to sentinel state while preserving allocated vectors.
See also: reset!(::AdaptiveArrayPool), rewind!
AdaptiveArrayPools.rewind! — Method
rewind!(pool::AdaptiveArrayPool, types::Type...)
Restore state for multiple specific types in reverse order. Decrements _current_depth once after all types are rewound.
AdaptiveArrayPools.rewind! — Method
rewind!(pool::AdaptiveArrayPool)
Restore the pool state (n_active counters) from internal stacks. Uses _checkpoint_depths to determine accurately which entries to pop and which to restore.
Only the counters are restored; allocated memory remains for reuse. Touched types are restored accurately by checking _checkpoint_depths.
Safety: If called at global scope (depth=1, no pending checkpoints), automatically delegates to reset! to safely clear all n_active counters.
See also: checkpoint!, reset!, @with_pool
AdaptiveArrayPools.rewind! — Method
rewind!(pool::AdaptiveArrayPool, ::Type{T})
Restore state for a specific type only. Also updates _current_depth and bitmask state.
AdaptiveArrayPools.safe_prod — Method
safe_prod(dims::NTuple{N, Int}) -> Int
Compute the product of dimensions with overflow checking.
Throws OverflowError if the product exceeds typemax(Int), preventing memory corruption from integer overflow in unsafe_wrap operations.
Rationale
Without overflow checking, large dimensions like (10^10, 10^10) would wrap around to a small value, causing unsafe_wrap to create an array view that indexes beyond allocated memory.
Performance
Adds ~0.3-1.2 ns overhead (<1%) compared to unchecked prod(), which is negligible relative to the 100-200 ns cost of the full allocation path.
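An overflow-checked product can be sketched with Base.Checked.checked_mul, which throws OverflowError instead of wrapping around. safe_prod_sketch is a hypothetical illustration, not the package's safe_prod.

```julia
function safe_prod_sketch(dims::NTuple{N,Int}) where {N}
    p = 1
    for d in dims
        p = Base.Checked.checked_mul(p, d)   # throws OverflowError on overflow
    end
    return p
end
```

On a 64-bit system, (10^10, 10^10) overflows Int64, so the sketch raises instead of returning a wrapped value that unsafe_wrap could misuse.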
AdaptiveArrayPools.set_cache_ways! — Method
set_cache_ways!(n::Int)
Set the number of cache ways for N-D array caching. Requires a Julia restart to take effect.
Higher values reduce cache eviction but increase memory usage per slot.
Arguments
n::Int: Number of cache ways (valid range: 1-16)
Example
using AdaptiveArrayPools
AdaptiveArrayPools.set_cache_ways!(8) # Double the default
# Restart Julia to apply the change
AdaptiveArrayPools.similar! — Method
similar!(pool, array) -> view
similar!(pool, array, T) -> view
similar!(pool, array, dims...) -> view
similar!(pool, array, T, dims...) -> view
Acquire an uninitialized array from the pool, using a template array for defaults.
- similar!(pool, A): same element type and size as A
- similar!(pool, A, T): element type T, same size as A
- similar!(pool, A, dims...): same element type as A, specified dimensions
- similar!(pool, A, T, dims...): element type T, specified dimensions
Example
A = rand(10, 10)
@with_pool pool begin
B = similar!(pool, A) # Same type and size
C = similar!(pool, A, Float32) # Float32, same size
D = similar!(pool, A, 5, 5) # Same type, different size
E = similar!(pool, A, Int, 20) # Int, 1D
end
AdaptiveArrayPools.trues! — Method
trues!(pool, dims...) -> BitArray
trues!(pool, dims::Tuple) -> BitArray
Acquire a bit-packed boolean array filled with true from the pool.
Equivalent to Julia's trues(dims...) but using pooled memory. Uses ~8x less memory than ones!(pool, Bool, dims...).
Example
@with_pool pool begin
bv = trues!(pool, 100) # BitVector, all true
bm = trues!(pool, 10, 10) # BitMatrix, all true
end
AdaptiveArrayPools.unsafe_acquire! — Method
unsafe_acquire!(pool, x::AbstractArray) -> Array
Acquire a raw array with the same element type and size as x (similar to similar(x)).
Example
A = rand(10, 10)
@with_pool pool begin
B = unsafe_acquire!(pool, A) # Matrix{Float64}, same size as A
B .= A .* 2
end
AdaptiveArrayPools.unsafe_acquire! — Method
unsafe_acquire!(pool, Type{T}, n) -> backend's native array type
unsafe_acquire!(pool, Type{T}, dims...) -> backend's native array type
unsafe_acquire!(pool, Type{T}, dims::NTuple{N,Int}) -> backend's native array type
Acquire a native array backed by pool memory.
Returns the backend's native array type:
- CPU: Array{T,N} (via unsafe_wrap)
- Bit (T === Bit): BitVector/BitArray{N} (chunks-sharing; equivalent to acquire!)
- CUDA: CuArray{T,N} (via unified view cache)
For CPU pools, since Array instances are mutable references, cached instances can be returned directly without creating new wrapper objects—ideal for type-unspecified paths. For CUDA pools, this delegates to the same unified N-way cache as acquire!.
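The CPU mechanism can be sketched with Base's unsafe_wrap: an Array header is created over memory owned by another vector, so the wrapper is valid only while that vector is alive and unresized. The buffer and the function name fill_through_wrap! are hypothetical; this is not the package's code path.

```julia
function fill_through_wrap!(backing::Vector{Float64}, dims::Tuple{Int,Int}, x::Float64)
    prod(dims) <= length(backing) || throw(DimensionMismatch("buffer too small"))
    s = GC.@preserve backing begin
        # View the leading prod(dims) elements as a Matrix without copying.
        # own = false: Julia never frees this memory through A.
        A = unsafe_wrap(Array, pointer(backing), dims; own = false)
        A .= x            # writes go straight into backing
        sum(A)
    end
    return s
end
```

Resizing or releasing backing while A is still in use would leave A pointing at invalid memory, which is the hazard the Safety Warning describes.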
Safety Warning
The returned array is only valid within the @with_pool scope. Using it after the scope ends leads to undefined behavior (use-after-free, data corruption).
Do NOT call resize!, push!, or append! on returned arrays - this causes undefined behavior as the memory is owned by the pool.
When to Use
- Type-unspecified paths: struct fields without concrete type parameters (e.g., _pooled_chain::PooledChain instead of _pooled_chain::PooledChain{M})
- FFI calls expecting raw pointers
- APIs that strictly require native array types
Allocation Behavior
- CPU: cache hit 0 bytes, cache miss ~112 bytes (Array header via unsafe_wrap)
- CUDA: cache hit ~0 bytes, cache miss ~80 bytes (CuArray wrapper creation)
Example
using LinearAlgebra  # for mul!
@with_pool pool begin
A = unsafe_acquire!(pool, Float64, 100, 100) # Matrix{Float64} (CPU) or CuMatrix{Float64} (CUDA)
B = unsafe_acquire!(pool, Float64, 100, 100)
fill!(A, 1.0); fill!(B, 2.0) # Pooled memory is uninitialized until written
C = similar(A) # Regular allocation for result
mul!(C, A, B) # BLAS uses A, B directly
end
# A and B are INVALID after this point!
See also: acquire! for view-based access.
AdaptiveArrayPools.unsafe_ones! — Method
unsafe_ones!(pool, dims...) -> Array
unsafe_ones!(pool, T, dims...) -> Array
unsafe_ones!(pool, dims::Tuple) -> Array
unsafe_ones!(pool, T, dims::Tuple) -> Array
Acquire a one-initialized raw array (not a view) from the pool.
Equivalent to unsafe_acquire!(pool, T, dims...) followed by fill!(arr, one(T)). Default element type depends on pool backend (CPU: Float64, CUDA: Float32). See default_eltype.
Example
@with_pool pool begin
v = unsafe_ones!(pool, 100) # Uses default_eltype(pool)
m = unsafe_ones!(pool, Float32, 10, 10) # Array{Float32}, all ones
end
See also: unsafe_zeros!, ones!, unsafe_acquire!
AdaptiveArrayPools.unsafe_similar! — Method
unsafe_similar!(pool, array) -> Array
unsafe_similar!(pool, array, T) -> Array
unsafe_similar!(pool, array, dims...) -> Array
unsafe_similar!(pool, array, T, dims...) -> Array
Acquire an uninitialized raw array (not a view) from the pool, using a template array for defaults.
- unsafe_similar!(pool, A): same element type and size as A
- unsafe_similar!(pool, A, T): element type T, same size as A
- unsafe_similar!(pool, A, dims...): same element type as A, specified dimensions
- unsafe_similar!(pool, A, T, dims...): element type T, specified dimensions
Example
A = rand(10, 10)
@with_pool pool begin
B = unsafe_similar!(pool, A) # Same type and size, raw array
C = unsafe_similar!(pool, A, Float32) # Float32, same size
D = unsafe_similar!(pool, A, 5, 5) # Same type, different size
end
See also: similar!, unsafe_acquire!
AdaptiveArrayPools.unsafe_zeros! — Method
unsafe_zeros!(pool, dims...) -> Array
unsafe_zeros!(pool, T, dims...) -> Array
unsafe_zeros!(pool, dims::Tuple) -> Array
unsafe_zeros!(pool, T, dims::Tuple) -> Array
Acquire a zero-initialized raw array (not a view) from the pool.
Equivalent to unsafe_acquire!(pool, T, dims...) followed by fill!(arr, zero(T)). Default element type depends on pool backend (CPU: Float64, CUDA: Float32). See default_eltype.
Example
@with_pool pool begin
v = unsafe_zeros!(pool, 100) # Uses default_eltype(pool)
m = unsafe_zeros!(pool, Float32, 10, 10) # Array{Float32}, all zeros
end
See also: unsafe_ones!, zeros!, unsafe_acquire!
AdaptiveArrayPools.zeros! — Method
zeros!(pool, dims...) -> view
zeros!(pool, T, dims...) -> view
zeros!(pool, dims::Tuple) -> view
zeros!(pool, T, dims::Tuple) -> view
Acquire a zero-initialized array from the pool.
Equivalent to acquire!(pool, T, dims...) followed by fill!(arr, zero(T)). Default element type depends on pool backend (CPU: Float64, CUDA: Float32). See default_eltype.
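The sketch below spells out that equivalence with plain Base types, assuming (as the API summary states) that an N-D acquire! result is a reshaped view over pooled backing storage. `backing` is a hypothetical stand-in for the pool's memory.

```julia
# Hypothetical pooled storage; Vector{T}(undef, n) may hold stale values.
backing = Vector{Float64}(undef, 256)

# acquire!-style N-D result: a view of the first 100 elements, reshaped to 10×10
# (a Base.ReshapedArray wrapping a SubArray; no data is copied).
arr = reshape(view(backing, 1:100), 10, 10)

# The zeros! step: initialize in place with the element type's zero.
fill!(arr, zero(eltype(arr)))
```

Only the first 100 elements of `backing` are touched; the rest of the buffer keeps whatever it held before.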
Example
@with_pool pool begin
v = zeros!(pool, 100) # Uses default_eltype(pool)
m = zeros!(pool, Float32, 10, 10) # Matrix{Float32} view, all zeros
end
Base.empty! — Method
empty!(tp::BitTypedPool)
Clear all internal storage for BitTypedPool, releasing all memory. Restores the sentinel values used by the 1-based sentinel pattern.
Base.empty! — Method
empty!(tp::TypedPool)
Clear all internal storage for TypedPool, releasing all memory. Restores the sentinel values used by the 1-based sentinel pattern.
Base.empty! — Method
empty!(pool::AdaptiveArrayPool)
Completely clear the pool, releasing all stored vectors and resetting all state.
This is useful when you want to free memory or start fresh without creating a new pool instance.
Example
using AdaptiveArrayPools
pool = AdaptiveArrayPool()
v = acquire!(pool, Float64, 1000)
# ... use v ...
empty!(pool) # Release all memory
Warning
Any arrays previously acquired from this pool (views or raw arrays) become invalid after empty!.
AdaptiveArrayPools.AbstractArrayPool — Type
AbstractArrayPool
Abstract base for multi-type array pools.
AdaptiveArrayPools.AbstractTypedPool — Type
AbstractTypedPool{T, V<:AbstractVector{T}}
Abstract base for type-specific memory pools.
AdaptiveArrayPools.AdaptiveArrayPool — Type
AdaptiveArrayPool
Multi-type memory pool with fixed slots for common types and an IdDict fallback for others. Zero allocation after warmup. NOT thread-safe; use one pool per Task.
AdaptiveArrayPools.BackendNotLoadedError — Type
BackendNotLoadedError <: Exception
Error thrown when a backend-specific operation is attempted but the backend package is not loaded.
Example
@maybe_with_pool :cuda pool begin
zeros!(pool, 10) # Throws if CUDA.jl not loaded
end
AdaptiveArrayPools.Bit — Type
Bit
Sentinel type for bit-packed boolean storage via BitVector.
Use Bit instead of Bool in pool operations to get memory-efficient bit-packed arrays (1 bit per element vs 1 byte for Vector{Bool}).
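The ~8x figure follows from the storage layout: a BitVector packs 64 elements into each UInt64 chunk. This sketch compares the two layouts in plain Base Julia; `bv.chunks` is an internal BitArray field, read here only to show the packing.

```julia
n = 1000
bv = falses(n)               # BitVector: 64 elements per UInt64 chunk
vb = Vector{Bool}(undef, n)  # Vector{Bool}: one byte per element

# bv.chunks is an internal field of BitArray, read here only to show the layout.
packed_bytes = sizeof(bv.chunks)   # cld(1000, 64) = 16 chunks × 8 bytes = 128
byte_bytes   = sizeof(vb)          # 1000
byte_bytes / packed_bytes          # ≈ 7.8
```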
Usage
@with_pool pool begin
# BitVector (1 bit per element, ~8x memory savings)
bv = acquire!(pool, Bit, 1000)
# vs Vector{Bool} (1 byte per element)
vb = acquire!(pool, Bool, 1000)
# Convenience functions work too
mask = falses!(pool, 100) # BitVector filled with false
flags = trues!(pool, 100) # BitVector filled with true
end
Return Types (Unified for Performance)
Unlike other types, Bit always returns native BitVector/BitArray:
- 1D: BitVector (both acquire! and unsafe_acquire!)
- N-D: BitArray{N} (reshaped, preserves SIMD optimization)
This design ensures users always get SIMD-optimized performance without needing to remember which API to use.
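The N-D case relies on the fact that reshaping a BitVector in Base Julia yields another BitArray sharing the same chunks, rather than a generic ReshapedArray wrapper. A minimal check of that behavior:

```julia
bv = falses(12)          # 1D BitVector
m = reshape(bv, 3, 4)    # a BitMatrix (BitArray{2}) sharing bv's chunks

m[2, 1] = true           # column-major index 2: the same bit as bv[2]
bv[2]                    # true
```

Since `m` is itself a BitArray, operations such as count(m) take the same packed-chunk path as on a BitVector.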
Performance
BitVector operations such as count(), sum(), and bitwise logic are roughly 10x to 100x faster than equivalent operations on SubArray{Bool} because they use SIMD-optimized algorithms on packed 64-bit chunks.
@with_pool pool begin
bv = acquire!(pool, Bit, 10000)
fill!(bv, true)
count(bv) # Uses fast SIMD path automatically
end
Memory Safety
The returned BitVector shares its internal chunks array with the pool. It is only valid within the @with_pool scope; using it after the scope ends leads to undefined behavior (use-after-free risk).
See also: trues!, falses!, BitTypedPool
AdaptiveArrayPools.BitTypedPool — Type
BitTypedPool <: AbstractTypedPool{Bool, BitVector}
Specialized pool for BitVector arrays with memory reuse.
Unlike TypedPool{Bool} which stores Vector{Bool} (1 byte per element), this pool stores BitVector (1 bit per element, ~8x memory efficiency).
Unified API (Always Returns BitVector)
Unlike other types, both acquire! and unsafe_acquire! return BitVector for the Bit type. This design ensures users always get SIMD-optimized performance without needing to choose between APIs.
- acquire!(pool, Bit, n) → BitVector (SIMD optimized)
- unsafe_acquire!(pool, Bit, n) → BitVector (same behavior)
- trues!(pool, n) → BitVector filled with true
- falses!(pool, n) → BitVector filled with false
Fields
- vectors: Backing BitVector storage
- nd_arrays: Cached wrapper BitVectors (sharing chunks with the backing storage)
- nd_dims: Cached lengths for wrapper cache validation
- nd_ptrs: Cached chunk pointers for invalidation detection
- nd_next_way: Round-robin counter for the N-way cache
- n_active: Count of currently active arrays
- _checkpoint_*: State management stacks (1-based sentinel pattern)
Usage
@with_pool pool begin
# All return BitVector with SIMD performance
bv = acquire!(pool, Bit, 100) # BitVector
count(bv) # Fast SIMD path
# Convenience functions
t = trues!(pool, 50) # BitVector filled with true
f = falses!(pool, 50) # BitVector filled with false
end
Performance
Operations such as count(), sum(), and bitwise logic are roughly 10x to 100x faster than equivalent operations on SubArray{Bool} because BitVector uses SIMD-optimized algorithms on packed 64-bit chunks.
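The speedup comes from processing 64 bits at a time. As an illustration (not the library's code), counting set bits one chunk at a time with count_ones gives the same answer as count; `bv.chunks` is an internal BitArray field used here only for demonstration.

```julia
bv = falses(10_000)
bv[1:3:end] .= true          # set every third bit: 3334 bits in total

# One count_ones (popcount) per 64-bit chunk instead of one check per element.
# BitArray keeps the unused tail bits of its last chunk at zero, so this is exact.
# (bv.chunks is an internal field, used here only for illustration.)
per_chunk = sum(count_ones, bv.chunks)
per_chunk == count(bv)       # true
```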
AdaptiveArrayPools.DisabledPool — Type
DisabledPool{Backend}
Sentinel type for disabled pooling that preserves backend context. When USE_POOLING=false (compile-time) or MAYBE_POOLING_ENABLED[]=false (runtime), macros return DisabledPool{backend}() instead of nothing.
Backend symbols:
- :cpu - Standard Julia arrays
- :cuda - CUDA.jl CuArrays (defined in extension)
This enables @with_pool :cuda to return correct array types even when pooling is off.
Example
# When USE_POOLING=false:
@with_pool :cuda pool begin
v = zeros!(pool, 10) # Returns CuArray{Float32}, not Array{Float64}!
end
See also: pooling_enabled, DISABLED_CPU
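One plausible way to get this behavior is to dispatch on the Backend type parameter, with backend-specific methods supplied by package extensions. The sketch below assumes that design; ToyDisabledPool, ToyBackendNotLoaded, and toy_zeros! are hypothetical names, and only the :cpu method is defined because CUDA.jl is not loaded here.

```julia
# Hypothetical stand-ins for DisabledPool and BackendNotLoadedError.
struct ToyDisabledPool{Backend} end

struct ToyBackendNotLoaded <: Exception
    backend::Symbol
end

# Fallback: no method has been added for this backend (e.g. extension not loaded).
toy_zeros!(::ToyDisabledPool{B}, n::Integer) where {B} = throw(ToyBackendNotLoaded(B))

# CPU method: pooling is off, so allocate a normal Array with the CPU default eltype.
# A CUDA extension would add a more specific method for ToyDisabledPool{:cuda}.
toy_zeros!(::ToyDisabledPool{:cpu}, n::Integer) = zeros(Float64, n)

v = toy_zeros!(ToyDisabledPool{:cpu}(), 3)   # Vector{Float64}
```

Because the backend is encoded in the type, the choice of array type is resolved by dispatch without any runtime flag checks.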
AdaptiveArrayPools.TypedPool — Type
TypedPool{T} <: AbstractTypedPool{T, Vector{T}}
Internal structure managing pooled vectors for a specific element type T.
Fields
Storage
- vectors: Backing Vector{T} storage (actual memory allocation)
1D Cache (for acquire!(pool, T, n))
- views: Cached SubArray views for zero-allocation 1D access
- view_lengths: Cached lengths for fast Int comparison (SoA pattern)
N-D Array Cache (for unsafe_acquire! only, N-way set associative)
- nd_arrays: Cached N-D Array objects (length = slots × CACHE_WAYS)
- nd_dims: Cached dimension tuples for cache hit validation
- nd_ptrs: Cached pointer values to detect backing vector resize
- nd_next_way: Round-robin counter per slot (length = slots)
State Management (1-based sentinel pattern)
- n_active: Count of currently active (checked-out) arrays
- _checkpoint_n_active: Saved n_active values at each checkpoint (sentinel: [0])
- _checkpoint_depths: Depth of each checkpoint entry (sentinel: [0])
Note
acquire! for N-D returns a ReshapedArray (zero creation cost), so no caching is needed. Only unsafe_acquire! benefits from N-D caching, since unsafe_wrap allocates 112 bytes.
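To make the checkpoint/rewind bookkeeping and the 1-based sentinel pattern concrete, here is a heavily simplified model. ToyTypedPool and the toy_* functions are hypothetical; they keep only the backing vectors, n_active, and one checkpoint stack, and omit the depth stack and the 1D/N-D caches of the real TypedPool.

```julia
# Hypothetical, simplified stand-in for TypedPool{T}.
mutable struct ToyTypedPool{T}
    vectors::Vector{Vector{T}}        # backing storage, reused across scopes
    n_active::Int                     # arrays currently checked out
    checkpoint_n_active::Vector{Int}  # saved n_active values; index 1 is the sentinel 0
end
ToyTypedPool{T}() where {T} = ToyTypedPool{T}(Vector{T}[], 0, [0])

function toy_acquire!(p::ToyTypedPool{T}, n::Integer) where {T}
    p.n_active += 1
    if p.n_active > length(p.vectors)
        push!(p.vectors, Vector{T}(undef, n))   # miss: allocate a new backing vector
    elseif length(p.vectors[p.n_active]) < n
        resize!(p.vectors[p.n_active], n)       # too small: grow it in place
    end
    return view(p.vectors[p.n_active], 1:n)     # view over the first n elements
end

toy_checkpoint!(p::ToyTypedPool) = (push!(p.checkpoint_n_active, p.n_active); p)

function toy_rewind!(p::ToyTypedPool)
    stack = p.checkpoint_n_active
    # The sentinel is never popped, so the stack is never empty.
    p.n_active = length(stack) > 1 ? pop!(stack) : stack[1]
    return p
end

function toy_empty!(p::ToyTypedPool)
    empty!(p.vectors)             # drop all backing memory
    p.n_active = 0
    p.checkpoint_n_active = [0]   # restore the sentinel
    return p
end

p = ToyTypedPool{Float64}()
toy_checkpoint!(p)
a = toy_acquire!(p, 10)
b = toy_acquire!(p, 20)
toy_rewind!(p)              # a and b are now logically released
c = toy_acquire!(p, 5)      # reuses the first backing vector, no new allocation
```

The sentinel lets rewind! and empty! avoid special-casing an empty stack.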
AdaptiveArrayPools.CACHE_WAYS — Constant
Number of cache ways per slot for N-way set associative cache. Supports up to CACHE_WAYS different dimension patterns per slot without thrashing.
Default: 4 (handles most use cases well)
Configuration
using AdaptiveArrayPools
AdaptiveArrayPools.set_cache_ways!(8) # Restart Julia to take effect
Or manually in LocalPreferences.toml:
[AdaptiveArrayPools]
cache_ways = 8
Valid range: 1-16 (higher values use more memory but reduce evictions)
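A rough model of one slot's lookup, assuming round-robin eviction driven by a counter like nd_next_way: a lookup that matches a cached dimension tuple reuses that way, otherwise the next way is overwritten. ToySlotCache, lookup!, and WAYS are hypothetical names; the real cache also validates backing pointers.

```julia
const WAYS = 4   # hypothetical stand-in for CACHE_WAYS

mutable struct ToySlotCache
    dims::Vector{Any}   # cached dimension tuple per way (`nothing` = empty way)
    next_way::Int       # round-robin eviction pointer, like nd_next_way
    misses::Int
end
ToySlotCache() = ToySlotCache(Any[nothing for _ in 1:WAYS], 1, 0)

function lookup!(c::ToySlotCache, dims::Tuple)
    for w in 1:WAYS
        c.dims[w] == dims && return w        # hit: reuse the cached wrapper
    end
    w = c.next_way                           # miss: overwrite the next way
    c.dims[w] = dims
    c.next_way = mod1(w + 1, WAYS)
    c.misses += 1
    return w
end

# Four shapes cycled twice fit in four ways: only the first pass misses.
c4 = ToySlotCache()
for _ in 1:2, d in ((10,), (20,), (5, 5), (2, 3, 4))
    lookup!(c4, d)
end

# Five shapes cycled twice thrash under round-robin: every lookup misses.
c5 = ToySlotCache()
for _ in 1:2, d in ((10,), (20,), (5, 5), (2, 3, 4), (7, 7))
    lookup!(c5, d)
end
```

This is the thrashing case that a larger CACHE_WAYS avoids, at the cost of more cached wrappers per slot.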
AdaptiveArrayPools.DISABLED_CPU — Constant
DISABLED_CPU
Singleton instance for disabled CPU pooling. Used by macros when USE_POOLING=false without backend specification.
AdaptiveArrayPools.FIXED_SLOT_FIELDS — Constant
FIXED_SLOT_FIELDS
Field names for fixed slot TypedPools. Single source of truth for foreach_fixed_slot.
When modifying, also update: struct definition, get_typed_pool! dispatches, constructor. Tests verify synchronization automatically.
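The sketch below illustrates the general pattern under stated assumptions: a few element types get dedicated fields, all other types fall back to an IdDict, and a single tuple of field names drives iteration over the fixed slots. ToyPool, toy_slot, TOY_FIXED_SLOT_FIELDS, and toy_foreach_fixed_slot are hypothetical stand-ins, not the package's definitions.

```julia
# Hypothetical miniature of the fixed-slot layout.
struct ToyPool
    float64::Vector{Float64}
    int64::Vector{Int64}
    bool::Vector{Bool}
    others::IdDict{Type,Any}   # fallback for every other element type
end
ToyPool() = ToyPool(Float64[], Int64[], Bool[], IdDict{Type,Any}())

# Single source of truth for the fixed slots (plays the role of FIXED_SLOT_FIELDS).
const TOY_FIXED_SLOT_FIELDS = (:float64, :int64, :bool)

# Dispatch: fixed slots resolve to a field; other types go through the IdDict.
toy_slot(p::ToyPool, ::Type{Float64}) = p.float64
toy_slot(p::ToyPool, ::Type{Int64})   = p.int64
toy_slot(p::ToyPool, ::Type{Bool})    = p.bool
toy_slot(p::ToyPool, ::Type{T}) where {T} = get!(() -> T[], p.others, T)::Vector{T}

# Visit every fixed slot via the shared field list.
function toy_foreach_fixed_slot(f, p::ToyPool)
    for name in TOY_FIXED_SLOT_FIELDS
        f(getfield(p, name))
    end
    return nothing
end

p = ToyPool()
push!(toy_slot(p, Float64), 1.0)            # fixed slot
push!(toy_slot(p, ComplexF32), 1f0 + 2f0im) # IdDict fallback

# A simple synchronization check: every listed name is a real field.
all(f -> f in fieldnames(ToyPool), TOY_FIXED_SLOT_FIELDS)
```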
AdaptiveArrayPools.MAYBE_POOLING_ENABLED — Constant
MAYBE_POOLING_ENABLED
Runtime flag for the @maybe_with_pool macro only. When false, @maybe_with_pool uses nothing as the pool, causing acquire! to allocate normally.
Note: This only affects @maybe_with_pool. @with_pool ignores this flag (always uses pooling).
For complete removal of pooling overhead at compile time, use USE_POOLING instead.
Default: true
AdaptiveArrayPools.POOL_DEBUG — Constant
POOL_DEBUG
When true, @with_pool macros validate that returned values don't reference pool memory (which would be unsafe).
Default: false
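A check of this kind must recognize both views and raw wrappers over pool memory. The hypothetical helper below is one way to do it: unwrap parents down to the root array, then compare identity (views) or address ranges (arrays created with unsafe_wrap). It is an illustration only, not the validation POOL_DEBUG actually performs.

```julia
# Hypothetical helper: does `x` share memory with any of the pool's backing vectors?
function toy_references_pool(x, backings::Vector{<:Vector})
    x isa AbstractArray || return false
    # Unwrap views/reshapes (SubArray, ReshapedArray, ...) down to the root array.
    p = x
    while parent(p) !== p
        p = parent(p)
    end
    for b in backings
        p === b && return true
        # Raw arrays from unsafe_wrap are distinct objects, so compare addresses.
        if p isa Array && !isempty(b)
            lo = UInt(pointer(b))
            hi = lo + UInt(sizeof(b))
            lo <= UInt(pointer(p)) < hi && return true
        end
    end
    return false
end

backing = zeros(100)                                 # hypothetical pool storage
v   = view(backing, 1:10)                            # 1D acquire!-style view
r   = reshape(view(backing, 1:20), 4, 5)             # N-D acquire!-style view
raw = unsafe_wrap(Array, pointer(backing, 11), 10)   # unsafe_acquire!-style wrapper
```

A copy such as copy(v) owns fresh memory, so it is safe to return and the helper reports false for it.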
AdaptiveArrayPools.USE_POOLING — Constant
USE_POOLING::Bool
Compile-time constant (master switch) to completely disable pooling. When false, all macros (@with_pool, @maybe_with_pool) generate code that uses nothing as the pool, causing acquire! to fall back to normal allocation.
This enables zero-overhead when pooling is disabled, as the compiler can eliminate all pool-related code paths.
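The difference between a compile-time switch and a runtime flag is visible in type inference. In this sketch, TOY_USE_POOLING and TOY_RUNTIME_FLAG are hypothetical stand-ins for USE_POOLING and MAYBE_POOLING_ENABLED[]; Base.return_types reports what the compiler infers for each function.

```julia
const TOY_USE_POOLING = false         # compile-time switch, in the spirit of USE_POOLING
const TOY_RUNTIME_FLAG = Ref(false)   # runtime switch, in the spirit of MAYBE_POOLING_ENABLED[]

# The "pooled" branch is faked with a view so the two branches have different types.
buffer_const(n)   = TOY_USE_POOLING    ? view(zeros(2n), 1:n) : zeros(n)
buffer_runtime(n) = TOY_RUNTIME_FLAG[] ? view(zeros(2n), 1:n) : zeros(n)

# With a constant switch, inference sees only the live branch (Vector{Float64})...
Base.return_types(buffer_const, (Int,))
# ...while a flag read at run time leaves a Union of both branch types.
Base.return_types(buffer_runtime, (Int,))
```

This is the sense in which a compile-time constant lets the compiler drop the pooled code path entirely, while a runtime flag still costs a branch.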
Configuration via Preferences.jl
Set in your project's LocalPreferences.toml:
[AdaptiveArrayPools]
use_pooling = false
Or programmatically (requires restart):
using Preferences, AdaptiveArrayPools
Preferences.set_preferences!(AdaptiveArrayPools, "use_pooling" => false)
Default: true