Cubic Adjoint (ND)

The CubicAdjointND operator computes $\bar{\mathbf{f}} = W^\top \bar{\mathbf{y}}$ for N-dimensional cubic spline interpolation — mapping query-space sensitivities back to grid space.

It is query-baked and data-free: constructed from grids and query points only. The same operator can be applied to any $\bar{\mathbf{y}}$ vector.


Quick Start

using FastInterpolations
using LinearAlgebra: dot

x = range(0.0, 1.0, 20)
y = range(0.0, 2.0, 15)
xq = range(0.1, 0.9, 10)
yq = range(0.2, 1.8, 10)
data = [sin(xi) * cos(yj) for xi in x, yj in y]

# Build the adjoint operator (grids + queries only, no data)
adj = cubic_adjoint((x, y), (xq, yq))
CubicAdjointND{Float64, 2}
├─ Grids: 2D, 20×15 points
│  ├─ x₁: Range ∈ [0, 1]
│  └─ x₂: Range ∈ [0, 2]
├─ Query:  10 points
└─ BC: CubicFit (all axes)
# Apply to a sensitivity vector
ȳ = randn(10)
f̄ = adj(ȳ)
size(f̄)    # (20, 15) — same shape as data grid
(20, 15)

Verify correctness via the dot-product identity $\langle W f, \bar{y} \rangle = \langle f, W^\top \bar{y} \rangle$:

itp = cubic_interp((x, y), data)
Wf = itp((xq, yq))               # forward: W·f
WTy = adj(ȳ)                      # adjoint: Wᵀ·ȳ

@assert dot(Wf, ȳ) ≈ dot(vec(data), vec(WTy))
println("⟨Wf, ȳ⟩ = ", dot(Wf, ȳ))
println("⟨f, Wᵀȳ⟩ = ", dot(vec(data), vec(WTy)))
⟨Wf, ȳ⟩ = 0.040093889887293586
⟨f, Wᵀȳ⟩ = 0.04009388988729363
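
The identity holds for any linear map, not just interpolation weights. A self-contained sketch with a dense random stand-in matrix W (not the package's actual weight matrix) makes this concrete:

```julia
using LinearAlgebra: dot
using Random: seed!

seed!(1)
W = randn(10, 300)      # dense stand-in for the 10×(20·15) weight matrix
f = randn(300)          # flattened grid data
ȳ = randn(10)           # query-space sensitivity

# ⟨W·f, ȳ⟩ == ⟨f, Wᵀ·ȳ⟩ holds for any W, f, ȳ
@assert dot(W * f, ȳ) ≈ dot(f, W' * ȳ)
```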

Constructor

cubic_adjoint(grids, queries; bc=CubicFit(), extrap=NoExtrap()) -> CubicAdjointND

The constructor accepts the same kwargs as cubic_interp. bc and extrap accept a single value (broadcast to all axes) or a per-axis N-tuple.

See Adjoint API Reference for full docstrings.
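
For example, the broadcast and per-axis forms are interchangeable. A sketch using the Quick Start's grids and queries, and only the CubicFit and NoExtrap types shown above (other BC/extrapolation types follow the same pattern):

```julia
# Single values are broadcast to both axes…
adj_a = cubic_adjoint((x, y), (xq, yq); bc=CubicFit(), extrap=NoExtrap())

# …equivalent to spelling out a per-axis 2-tuple explicitly
adj_b = cubic_adjoint((x, y), (xq, yq);
                      bc=(CubicFit(), CubicFit()),
                      extrap=(NoExtrap(), NoExtrap()))
```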


Native API vs AD

There are two ways to compute $\partial L / \partial \mathbf{f}$ through an ND cubic interpolation. The native API is faster because it avoids AD overhead; AD is convenient when the interpolation is embedded in a larger differentiable pipeline.

Native Adjoint (Recommended for Performance)

Construct once, apply many times with different $\bar{\mathbf{y}}$:

adj = cubic_adjoint((x, y), (xq, yq))

# Gradient of L2 loss: ∂/∂f ||W·f - y_obs||²  =  2·Wᵀ·(W·f - y_obs)
y_obs = randn(10)
residual = itp((xq, yq)) .- y_obs
∇f = adj(2.0 .* residual)
size(∇f)
(20, 15)
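
The gradient formula itself can be sanity-checked without the package: with a dense stand-in W, the analytic gradient 2·Wᵀ·(W·f − y_obs) should match a central finite difference of the loss (illustrative only — W here is random, not an interpolation matrix):

```julia
using Random: seed!

seed!(2)
W = randn(10, 30)                    # dense stand-in weight matrix
f = randn(30)
y_obs = randn(10)

loss(f) = sum(abs2, W * f .- y_obs)
∇f = 2.0 .* (W' * (W * f .- y_obs))  # analytic gradient, same formula as above

# central finite difference along coordinate 7
h = 1e-6
e7 = zeros(30); e7[7] = 1.0
fd = (loss(f .+ h .* e7) - loss(f .- h .* e7)) / (2h)
@assert isapprox(fd, ∇f[7]; rtol=1e-5)
```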

In-place version for zero allocation in hot loops:

∇f_buf = zeros(20, 15)
adj(∇f_buf, 2.0 .* residual)    # zeros + accumulates into ∇f_buf
∇f_buf[1:3, 1:3]
3×3 Matrix{Float64}:
 -0.00165808   0.0197517   0.0189784
  0.0129691   -0.154469   -0.148172
  0.178113    -2.09182    -1.67839

Derivative adjoint — adjoint of $\partial^k / \partial x_i^k$ instead of value interpolation:

f̄ = adj(ȳ; deriv=DerivOp(1))                      # ∂/∂x broadcast to all axes
f̄ = adj(ȳ; deriv=(DerivOp(1), EvalValue()))        # ∂/∂x₁ only (plain value along x₂)

AD Integration (Zygote / Enzyme)

Zygote and Enzyme dispatch to CubicAdjointND internally via registered custom AD rules. You get native adjoint performance through the standard AD interface:

using Zygote

# ∂loss/∂data — uses CubicAdjointND internally
∇data = Zygote.gradient(
    d -> sum(abs2, cubic_interp((x, y), d, (xq, yq)) .- y_obs), data
)[1]

# Derivative adjoint also works through AD
∇data_d1 = Zygote.gradient(
    d -> sum(cubic_interp((x, y), d, (xq, yq); deriv=DerivOp(1))), data
)[1]
using Enzyme

loss(d, g, q, y_obs) = sum(abs2, cubic_interp(g, d, q) .- y_obs)
df = zeros(20, 15)
Enzyme.autodiff(Enzyme.Reverse, loss, Enzyme.Active,
    Enzyme.Duplicated(copy(data), df),
    Enzyme.Const((x, y)), Enzyme.Const((xq, yq)), Enzyme.Const(y_obs))
# df now contains ∂loss/∂data

For full backend compatibility details, see Adjoint via AD.


Matrix Materialization

For verification and debugging, you can materialize the full weight matrix. This is $O(n \times m)$ and not intended for production use.

adj_small = cubic_adjoint(
    (range(0.0, 1.0, 5), range(0.0, 1.0, 4)),
    (rand(3), rand(3))
)
Wᵀ = Matrix(adj_small)
size(Wᵀ)          # (20, 3) — 5×4 grid flattened
(20, 3)
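
Conceptually, materialization is just the operator applied to standard basis vectors: column j of Wᵀ is the adjoint applied to eⱼ. A self-contained sketch with a dense stand-in and a hypothetical `apply` closure standing in for `adj(ȳ)`:

```julia
W = randn(3, 20)                 # stand-in: 3 queries × 20 grid points
apply = ȳ -> W' * ȳ              # stand-in for adj(ȳ)

# column j of Wᵀ is apply(eⱼ); hcat the columns to materialize it
Wt = reduce(hcat, (apply(Float64.(1:3 .== j)) for j in 1:3))
@assert Wt == W'
```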

Performance

  • Construction: $O\left(\sum_d (n_d + m \log n_d)\right)$ — per-axis Thomas factorization + anchor search
  • Apply (per call): $O(\sum_d n_d + m \cdot 4^N)$ — per-axis transpose solve + tensor-product scatter
  • In-place: Zero allocation after construction
  • Cache sharing: Reuses the same CubicSplineCache per axis as forward interpolation

Dual Cache Architecture

CubicAdjointND maintains two sets of caches: user-BC caches (for pure derivatives) and mixed-partial caches (CubicFit, for cross-axis derivatives). When the user's BC is already CubicFit, both cache sets are identical and Julia optimizes the branch away.


See Also