# Numerical Methods

You are an expert in numerical stability and computational aspects of statistical methods.
## Floating-Point Fundamentals

### IEEE 754 Double Precision
- Precision: ~15-17 significant decimal digits
- Range: ~10⁻³⁰⁸ to 10³⁰⁸
- Machine epsilon: ε ≈ 2.2 × 10⁻¹⁶
- Special values: Inf, -Inf, NaN
### Key Constants in R

```r
.Machine$double.eps      # ~2.22e-16 (machine epsilon)
.Machine$double.xmax     # ~1.80e+308 (largest finite value)
.Machine$double.xmin     # ~2.23e-308 (smallest positive normalized value)
.Machine$double.neg.eps  # ~1.11e-16 (smallest x such that 1 - x != 1)
```
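A quick illustration of these limits (a minimal sketch):

```r
# Machine epsilon: the smallest x for which 1 + x is distinguishable from 1
1 + .Machine$double.eps > 1      # TRUE
1 + .Machine$double.eps / 2 > 1  # FALSE: rounds back to exactly 1

# Special values arise from overflow and undefined operations
exp(710)  # Inf (overflow: exceeds .Machine$double.xmax)
1 / 0     # Inf
0 / 0     # NaN
```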
## Common Numerical Issues

### Catastrophic Cancellation

Subtracting nearly equal numbers destroys significant digits:

```r
# BAD: loses precision
x <- 1e16 + 1
y <- 1e16
x - y  # 0, not 1: the "+ 1" is below x's precision and is silently lost

# BETTER: reformulate to avoid the subtraction
```

Example: computing a variance.

```r
var_bad  <- mean(x^2) - mean(x)^2                    # Can be negative!
var_good <- sum((x - mean(x))^2) / (length(x) - 1)   # Always non-negative
```
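The failure mode appears whenever the data's offset is large relative to its spread; a quick demonstration (values here are illustrative):

```r
set.seed(1)
x <- 1e8 + rnorm(1000)                    # true variance is 1
mean(x^2) - mean(x)^2                     # unreliable: x^2 ~ 1e16 exhausts double precision
sum((x - mean(x))^2) / (length(x) - 1)    # ~1, correct
var(x)                                    # R's built-in uses the stable centered form
```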
### Overflow/Underflow

```r
# BAD: overflow
prod(1:200)                # Inf

# GOOD: work on the log scale
sum(log(1:200))            # log(200!); exp() back only if the result fits in double range

# BAD: underflow in probabilities
prod(dnorm(x))             # underflows to 0 for even moderate sample sizes

# GOOD: sum log probabilities
sum(dnorm(x, log = TRUE))
```
### Log-Sum-Exp Trick

Essential when combining log probabilities:

```r
log_sum_exp <- function(log_x) {
  max_log <- max(log_x)
  # Guard: if the max is infinite (all -Inf, or any +Inf),
  # return it directly to avoid NaN from Inf - Inf below
  if (is.infinite(max_log)) return(max_log)
  max_log + log(sum(exp(log_x - max_log)))
}

# Example: log(exp(-1000) + exp(-1001))
log_sum_exp(c(-1000, -1001))   # Correct: ~ -999.69
log(exp(-1000) + exp(-1001))   # Wrong: -Inf (both terms underflow to 0)
```
### Softmax Stability

```r
# BAD: exp() overflows for large inputs
softmax_bad <- function(x) exp(x) / sum(exp(x))

# GOOD: shift by the maximum first; mathematically the result is unchanged
softmax <- function(x) {
  x_max <- max(x)
  exp_x <- exp(x - x_max)
  exp_x / sum(exp_x)
}
```
## Matrix Computations

### Conditioning

The condition number κ(A) measures sensitivity to perturbation:

- κ(A) = ‖A‖ · ‖A⁻¹‖
- Rule of thumb: expect to lose about log₁₀(κ) decimal digits of accuracy
- κ > 10¹⁵ or so means the matrix is numerically singular in double precision
```r
# Check the condition number
kappa(X, exact = TRUE)

# For regression, check the conditioning of X'X
# Note: cond(X'X) = cond(X)^2, so X'X is far worse-conditioned than X itself
kappa(crossprod(X))
```
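The digit-loss rule is easy to demonstrate with a classically ill-conditioned matrix; the sketch below uses a small Hilbert matrix:

```r
# Hilbert matrices are notoriously ill-conditioned
hilbert <- function(n) outer(1:n, 1:n, function(i, j) 1 / (i + j - 1))
H <- hilbert(8)
kappa(H, exact = TRUE)  # ~1.5e10, so expect to lose ~10 of ~16 digits

# Solving H x = b for known x = rep(1, 8) recovers it only to ~6 digits
b <- H %*% rep(1, 8)
solve(H, b)             # entries deviate from 1 around the 6th decimal place
```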
### Solving Linear Systems

Prefer decomposition methods over explicit inversion:

```r
# BAD: explicit inverse (slower and less accurate)
beta <- solve(t(X) %*% X) %*% t(X) %*% y

# GOOD: QR decomposition
beta <- qr.coef(qr(X), y)

# BETTER for positive definite systems: Cholesky
R <- chol(crossprod(X))
beta <- backsolve(R, forwardsolve(t(R), crossprod(X, y)))

# For ill-conditioned problems: SVD / pseudoinverse
beta <- MASS::ginv(X) %*% y
```
### Symmetric Positive Definite Matrices

Always use specialized methods:

```r
# Cholesky for SPD (note: chol() returns the upper triangular factor)
U <- chol(Sigma)  # Sigma == t(U) %*% U

# Eigendecomposition
eig <- eigen(Sigma, symmetric = TRUE)

# Check positive definiteness
all(eigen(Sigma, symmetric = TRUE, only.values = TRUE)$values > 0)
```
## Optimization Stability

### Gradient Computation

```r
# Numerical gradient via central differences (for verifying analytic gradients)
# (a step nearer .Machine$double.eps^(1/3) is often recommended for central differences)
numerical_grad <- function(f, x, h = sqrt(.Machine$double.eps)) {
  sapply(seq_along(x), function(i) {
    x_plus <- x_minus <- x
    x_plus[i]  <- x[i] + h
    x_minus[i] <- x[i] - h
    (f(x_plus) - f(x_minus)) / (2 * h)
  })
}
# Central difference is O(h²) accurate; forward difference is only O(h)
```
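A quick sanity check against a function with a known gradient:

```r
f  <- function(x) sum(x^2)   # analytic gradient: 2 * x
x0 <- c(1, -2, 3)
numerical_grad(f, x0)                      # ~ (2, -4, 6)
max(abs(numerical_grad(f, x0) - 2 * x0))   # tiny: near sqrt(eps) scale
```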
### Hessian Stability

```r
# Check that the Hessian is positive definite at the optimum
check_hessian <- function(H, tol = 1e-8) {
  eigs <- eigen(H, symmetric = TRUE, only.values = TRUE)$values
  min_eig <- min(eigs)
  list(
    positive_definite = min_eig > tol,
    min_eigenvalue    = min_eig,
    condition_number  = max(eigs) / min_eig
  )
}
```
### Line Search

For gradient descent stability, backtrack until the Armijo (sufficient decrease) condition holds:

```r
backtracking_line_search <- function(f, x, d, grad, alpha = 1, rho = 0.5, c = 1e-4) {
  # Armijo condition: f(x + alpha*d) <= f(x) + c * alpha * grad'd
  while (f(x + alpha * d) > f(x) + c * alpha * sum(grad * d)) {
    alpha <- rho * alpha
    if (alpha < 1e-10) break  # give up rather than loop forever
  }
  alpha
}
```
## Integration and Quadrature

### Numerical Integration Guidelines

```r
# Adaptive quadrature (default choice)
integrate(f, lower, upper)

# Infinite limits are handled directly
integrate(f, -Inf, Inf)

# For highly oscillatory or sharply peaked integrands, increase subdivisions
integrate(f, lower, upper, subdivisions = 1000)
```

For known singularities, split the domain at the singular points.
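For example, an integrand with an interior singularity can be split at the singular point; `integrate()` copes with integrable endpoint singularities on each piece:

```r
# f(x) = log(|x|) is singular (but integrable) at x = 0: split (-1, 1) there
f <- function(x) log(abs(x))
integrate(f, -1, 0)$value + integrate(f, 0, 1)$value  # -2 (the exact value)
```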
### Monte Carlo Integration

```r
mc_integrate <- function(f, n, lower, upper) {
  x  <- runif(n, lower, upper)
  fx <- sapply(x, f)
  # Average of f scaled by the interval width;
  # the standard error shrinks at the usual 1/sqrt(n) Monte Carlo rate
  estimate <- (upper - lower) * mean(fx)
  se       <- (upper - lower) * sd(fx) / sqrt(n)
  list(value = estimate, se = se)
}
```
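Usage example, estimating the standard normal mass on (-3, 3):

```r
set.seed(1)
mc_integrate(dnorm, n = 1e5, lower = -3, upper = 3)
# $value ~0.997 (true: pnorm(3) - pnorm(-3) ~0.9973), $se of a few thousandths
```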
## Root Finding

### Newton-Raphson Stability

```r
newton_raphson <- function(f, df, x0, tol = 1e-8, max_iter = 100) {
  x <- x0
  for (i in 1:max_iter) {
    fx  <- f(x)
    dfx <- df(x)
    # Guard: stepping with a near-zero derivative would blow up
    if (abs(dfx) < .Machine$double.eps * 100) {
      warning("Near-zero derivative")
      break
    }
    x_new <- x - fx / dfx
    converged <- abs(x_new - x) < tol
    x <- x_new  # keep the latest iterate, even on the final step
    if (converged) break
  }
  x
}
```
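Usage example, computing √2 as the root of f(x) = x² − 2:

```r
newton_raphson(f = function(x) x^2 - 2, df = function(x) 2 * x, x0 = 1)
# 1.414214: quadratic convergence near a simple root
```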
### Brent's Method

For robust root finding without derivatives (the root must be bracketed by a sign change):

```r
uniroot(f, interval = c(lower, upper), tol = .Machine$double.eps^0.5)
```
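Usage example, the same root as above:

```r
uniroot(function(x) x^2 - 2, interval = c(0, 2))$root  # ~1.414214
```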
## Statistical Computing Patterns

### Safe Likelihood Computation

```r
# Always work with the log-likelihood, never the raw likelihood
log_lik <- function(theta, data) {
  # log = TRUE returns log densities, which are summed rather than multiplied
  sum(dnorm(data, mean = theta[1], sd = theta[2], log = TRUE))
}
```
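This plugs directly into an optimizer. A minimal sketch, not part of the original pattern: the standard deviation is optimized on the log scale so it stays positive without constraints:

```r
set.seed(1)
data <- rnorm(100, mean = 5, sd = 2)

# Minimize the negative log-likelihood; theta = (mean, log(sd))
neg_log_lik <- function(theta) {
  -sum(dnorm(data, mean = theta[1], sd = exp(theta[2]), log = TRUE))
}
fit <- optim(c(0, 0), neg_log_lik)
c(mean = fit$par[1], sd = exp(fit$par[2]))  # close to (5, 2)
```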
### Robust Standard Errors

```r
# Sandwich estimator with numerical safeguards
sandwich_se <- function(score, hessian) {
  # Fall back to the pseudoinverse if the Hessian is numerically singular
  H_inv <- tryCatch(
    solve(hessian),
    error = function(e) MASS::ginv(hessian)
  )
  meat <- crossprod(score)
  V    <- H_inv %*% meat %*% H_inv
  sqrt(diag(V))
}
```
### Bootstrap with Error Handling

```r
safe_bootstrap <- function(data, statistic, R = 1000) {
  results  <- numeric(R)
  failures <- 0
  for (i in 1:R) {
    # drop = FALSE keeps single-column data frames from collapsing to vectors
    boot_data <- data[sample(nrow(data), replace = TRUE), , drop = FALSE]
    result <- tryCatch(statistic(boot_data), error = function(e) NA)
    results[i] <- result
    if (is.na(result)) failures <- failures + 1
  }
  # Warn if more than 10% of replicates failed
  if (failures > 0.1 * R) {
    warning(sprintf("%.1f%% bootstrap failures", 100 * failures / R))
  }
  list(
    estimate = mean(results, na.rm = TRUE),
    se       = sd(results, na.rm = TRUE),
    failures = failures
  )
}
```
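Usage example with a small simulated data frame and a simple statistic:

```r
set.seed(1)
d <- data.frame(x = rnorm(50, mean = 10))
safe_bootstrap(d, statistic = function(b) mean(b$x), R = 500)
# $estimate near 10, $se near sd(d$x) / sqrt(50), $failures = 0
```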
## Debugging Numerical Issues

### Diagnostic Checklist

- Check for NaN/Inf: `any(is.nan(x))`, `any(is.infinite(x))`
- Check conditioning: `kappa(matrix)`
- Check eigenvalues: for matrices that should be positive definite
- Check gradients: compare numerical and analytical versions
- Check scale: are variables on similar scales?
### Debugging Functions

```r
# Trace where NaN/Inf values enter a computation
debug_numeric <- function(x, name = "x") {
  cat(sprintf("%s: range [%.3g, %.3g], ", name,
              min(x, na.rm = TRUE), max(x, na.rm = TRUE)))
  cat(sprintf("NaN: %d, Inf: %d, -Inf: %d\n",
              sum(is.nan(x)),
              sum(x == Inf, na.rm = TRUE),    # na.rm: NaN comparisons yield NA
              sum(x == -Inf, na.rm = TRUE)))
}

# Relative error, guarding against division by a tiny true value
rel_error <- function(computed, true) {
  abs(computed - true) / max(abs(true), 1)
}
```
## Best Practices Summary

- Always work on the log scale for products of probabilities
- Use QR or Cholesky decompositions instead of explicit matrix inversion
- Check conditioning before solving linear systems
- Center and scale predictors in regression (see the sketch after this list)
- Handle edge cases (empty data, singular matrices)
- Use existing implementations (LAPACK, BLAS) when possible
- Test with extreme values (very small, very large, near-zero)
- Compare analytical and numerical gradients
- Monitor convergence in iterative algorithms
- Document numerical assumptions and limitations
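A small demonstration of the centering-and-scaling point above (illustrative values):

```r
# A predictor with a large offset makes X'X nearly singular
set.seed(1)
x <- rnorm(100, mean = 1000, sd = 1)
X_raw <- cbind(1, x, x^2)
X_std <- cbind(1, scale(x), scale(x)^2)
kappa(crossprod(X_raw))  # enormous: effectively numerically singular
kappa(crossprod(X_std))  # modest: safe to solve
```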
## Key References

- Higham, *Accuracy and Stability of Numerical Algorithms* (SIAM)
- Golub & Van Loan, *Matrix Computations* (Johns Hopkins University Press)