Metal Backend

AdaptiveArrayPools provides native Apple Silicon GPU support through a package extension that loads automatically when Metal.jl is available. Requires Julia 1.11+.

Quick Start

using AdaptiveArrayPools, Metal

# Use :metal backend for Apple Silicon GPU arrays
@with_pool :metal pool function gpu_computation(n)
    A = acquire!(pool, Float32, n, n)  # MtlArray
    B = acquire!(pool, Float32, n, n)  # MtlArray

    fill!(A, 1.0f0)
    fill!(B, 2.0f0)

    return sum(A .+ B)
end

# After the first call warms up the pool, the loop allocates no GPU memory
for i in 1:1000
    gpu_computation(100)  # GPU memory reused from pool
end

API

The Metal backend uses the same API as the CPU and CUDA backends, with :metal as the backend specifier:

Macro/Function                     Description
@with_pool :metal pool expr        GPU pool with automatic checkpoint/rewind
acquire!(pool, T, dims...)         Returns MtlArray (0 bytes GPU allocation after warmup)
acquire_view!(pool, T, dims...)    Returns MtlArray (same as acquire! on Metal)
get_task_local_metal_pool()        Returns the task-local Metal pool
pool_stats(:metal)                 Prints Metal pool statistics

Return Types

Function         1D Return        N-D Return
acquire!         MtlArray{T,1}    MtlArray{T,N}
acquire_view!    MtlArray{T,1}    MtlArray{T,N}

Allocation Behavior

GPU Memory: 0 bytes allocated after warmup. The underlying MtlVector is reused across calls and resized as needed.

CPU-side Wrapper Memory (for acquire! N-D on Metal):

  • The Metal backend uses arr_wrappers-based direct-index caching for MtlArray wrapper reuse
  • Each dimensionality N has one cached wrapper per slot, reused via setfield!(:dims)
  • After warmup: zero CPU-side allocation for any number of dimension patterns (same N)
  • Different N values each get their own cached wrapper (also zero-alloc after first use)
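
As a rough illustration of the wrapper reuse described above, the sketch below requests several 2-D shapes through a single acquire! call site. The function name sum_of_ones is hypothetical, and the zero-allocation behavior is the one stated in this section, not something the snippet measures.

```julia
using AdaptiveArrayPools, Metal

# Hypothetical helper: every call takes the first Float32 slot as a 2-D array.
@with_pool :metal pool function sum_of_ones(m, n)
    A = acquire!(pool, Float32, m, n)  # MtlArray{Float32,2}; do not assume it is zeroed
    fill!(A, 1.0f0)
    return sum(A)                      # sum of m * n ones
end

sum_of_ones(64, 64)    # warmup: backing storage and the N = 2 wrapper are set up
sum_of_ones(32, 128)   # same N, different dims
sum_of_ones(128, 32)   # same N, different dims again
```

All three calls use N = 2, so per the list above they should share one cached wrapper for that slot.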

Fixed Slot Types

Metal hardware does not support Float64 or ComplexF64. The following types have optimized pre-allocated slots:

Type          Field
Float32       .float32
Float16       .float16
Int64         .int64
Int32         .int32
ComplexF32    .complexf32
Bool          .bool

Other types use the fallback dictionary (.others).

No Float64/ComplexF64

Apple Silicon GPUs do not natively support 64-bit floating point. Use Float32 or Float16 instead.
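
When host data arrives as Float64, one option is to narrow it on the CPU before uploading into a pooled Float32 buffer. This is a minimal sketch: gpu_sum_of_squares is a hypothetical name, and copyto! from an Array into a pooled MtlArray is assumed to behave as it does for regular Metal.jl arrays.

```julia
using AdaptiveArrayPools, Metal

# Hypothetical helper: accepts Float64 host data, computes on the GPU in Float32.
@with_pool :metal pool function gpu_sum_of_squares(host)
    x = acquire!(pool, Float32, length(host))  # request Float32, not Float64
    copyto!(x, Float32.(host))                 # narrow on the CPU, then upload
    return sum(abs2, x)
end

gpu_sum_of_squares([1.0, 2.0, 3.0])
```

Note that Float32.(host) allocates a temporary CPU array on each call; only the GPU buffer comes from the pool.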

Limitations

  • No Float64/ComplexF64: Apple Silicon GPUs do not natively support 64-bit floating point
  • No @maybe_with_pool :metal: Runtime toggle not supported for Metal backend
  • Single-device only: Tested on single Apple GPU (multi-device untested)
  • Julia 1.11+: Required for setfield!-based Array internals used by GPU extensions
  • Task-local only: Each Task gets its own Metal pool, same as CPU
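
Since @maybe_with_pool is unavailable for :metal, a plain branch can stand in for a runtime toggle. The sketch below is one possible workaround and not part of the package: pooled_sum, unpooled_sum, ones_sum, and the use_pool keyword are all hypothetical names.

```julia
using AdaptiveArrayPools, Metal

# Pooled path: storage comes from the task-local Metal pool.
@with_pool :metal pool function pooled_sum(n)
    A = acquire!(pool, Float32, n, n)
    fill!(A, 1.0f0)
    return sum(A)
end

# Unpooled path: a fresh MtlArray is allocated on every call.
function unpooled_sum(n)
    A = MtlArray{Float32}(undef, n, n)
    fill!(A, 1.0f0)
    return sum(A)
end

# Hypothetical runtime switch between the two paths.
ones_sum(n; use_pool::Bool = true) = use_pool ? pooled_sum(n) : unpooled_sum(n)

ones_sum(16)                    # pooled
ones_sum(16; use_pool = false)  # unpooled
```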

Example: Matrix Computation

using AdaptiveArrayPools, Metal

@with_pool :metal pool function gpu_compute(n)
    A = acquire!(pool, Float32, n, n)
    B = acquire!(pool, Float32, n, n)
    C = acquire!(pool, Float32, n, n)

    fill!(A, 1.0f0); fill!(B, 2.0f0)
    C .= A .+ B

    return sum(C)
end

# Warmup
gpu_compute(100)

# Benchmark - zero GPU allocation
using BenchmarkTools
@benchmark gpu_compute(1000)

Debugging

# Check pool state
pool_stats(:metal)

# Output:
# MetalAdaptiveArrayPool
#   Float32 (fixed) [Metal]
#     slots: 3 (active: 0)
#     elements: 30000 (117.188 KiB)
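
The figures in the sample output are consistent with three 100×100 Float32 arrays, which is what a single gpu_compute(100) call acquires. Assuming the statistics were captured right after that warmup call, the arithmetic can be checked in plain Julia:

```julia
slots = 3                                  # A, B, and C in gpu_compute
elements = slots * 100 * 100               # 30000 elements in total
kib = elements * sizeof(Float32) / 1024    # 117.1875, shown rounded as 117.188 KiB
```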

CUDA vs Metal

Feature               CUDA       Metal
Backend symbol        :cuda      :metal
Array type            CuArray    MtlArray
Float64 support       Yes        No
ComplexF64 support    Yes        No
Julia requirement     1.11+      1.11+
Safety features       Full       Full
Lazy mode             Yes        Yes