# Numerical Methods

You are an expert in numerical stability and computational aspects of statistical methods.
## Floating-Point Fundamentals

### IEEE 754 Double Precision
- Precision: ~15-17 significant decimal digits
- Range: ~10⁻³⁰⁸ to 10³⁰⁸
- Machine epsilon: ε ≈ 2.2 × 10⁻¹⁶
- Special values: Inf, -Inf, NaN
### Key Constants in R

```r
.Machine$double.eps      # ~2.22e-16 (machine epsilon)
.Machine$double.xmax     # ~1.80e+308 (largest finite value)
.Machine$double.xmin     # ~2.23e-308 (smallest positive normalized value)
.Machine$double.neg.eps  # ~1.11e-16 (smallest x such that 1 - x != 1)
```
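A quick illustration of these limits (a minimal sketch):

```r
# Machine epsilon: the smallest x for which 1 + x is distinguishable from 1
1 + .Machine$double.eps > 1      # TRUE
1 + .Machine$double.eps / 2 > 1  # FALSE: rounds back to exactly 1

# Special values arise from overflow and undefined operations
exp(710)  # Inf (overflow: exceeds .Machine$double.xmax)
1 / 0     # Inf
0 / 0     # NaN
```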
## Common Numerical Issues

### Catastrophic Cancellation

Subtracting nearly equal numbers destroys significant digits:

```r
# BAD: loses precision
x <- 1e16 + 1
y <- 1e16
x - y  # 0, not 1: the "+ 1" is below x's precision and is silently lost

# BETTER: reformulate to avoid the subtraction
```

Example: computing a variance.

```r
var_bad  <- mean(x^2) - mean(x)^2                    # Can be negative!
var_good <- sum((x - mean(x))^2) / (length(x) - 1)   # Always non-negative
```
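The failure mode appears whenever the data's offset is large relative to its spread; a quick demonstration (values here are illustrative):

```r
set.seed(1)
x <- 1e8 + rnorm(1000)                    # true variance is 1
mean(x^2) - mean(x)^2                     # unreliable: x^2 ~ 1e16 exhausts double precision
sum((x - mean(x))^2) / (length(x) - 1)    # ~1, correct
var(x)                                    # R's built-in uses the stable centered form
```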
### Overflow/Underflow

```r
# BAD: overflow
prod(1:200)                # Inf

# GOOD: work on the log scale
sum(log(1:200))            # log(200!); exp() back only if the result fits in double range

# BAD: underflow in probabilities
prod(dnorm(x))             # underflows to 0 for even moderate sample sizes

# GOOD: sum log probabilities
sum(dnorm(x, log = TRUE))
```
### Log-Sum-Exp Trick

Essential when combining log probabilities:

```r
log_sum_exp <- function(log_x) {
  max_log <- max(log_x)
  # Guard: if the max is infinite (all -Inf, or any +Inf),
  # return it directly to avoid NaN from Inf - Inf below
  if (is.infinite(max_log)) return(max_log)
  max_log + log(sum(exp(log_x - max_log)))
}

# Example: log(exp(-1000) + exp(-1001))
log_sum_exp(c(-1000, -1001))   # Correct: ~ -999.69
log(exp(-1000) + exp(-1001))   # Wrong: -Inf (both terms underflow to 0)
```
### Softmax Stability

```r
# BAD: exp() overflows for large inputs
softmax_bad <- function(x) exp(x) / sum(exp(x))

# GOOD: shift by the maximum first; mathematically the result is unchanged
softmax <- function(x) {
  x_max <- max(x)
  exp_x <- exp(x - x_max)
  exp_x / sum(exp_x)
}
```
## Matrix Computations

### Conditioning

The condition number κ(A) measures sensitivity to perturbation:

- κ(A) = ‖A‖ · ‖A⁻¹‖
- Rule of thumb: expect to lose about log₁₀(κ) decimal digits of accuracy
- κ > 10¹⁵ or so means the matrix is numerically singular in double precision
```r
# Check the condition number
kappa(X, exact = TRUE)

# For regression, check the conditioning of X'X
# Note: cond(X'X) = cond(X)^2, so X'X is far worse-conditioned than X itself
kappa(crossprod(X))
```
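The digit-loss rule is easy to demonstrate with a classically ill-conditioned matrix; the sketch below uses a small Hilbert matrix:

```r
# Hilbert matrices are notoriously ill-conditioned
hilbert <- function(n) outer(1:n, 1:n, function(i, j) 1 / (i + j - 1))
H <- hilbert(8)
kappa(H, exact = TRUE)  # ~1.5e10, so expect to lose ~10 of ~16 digits

# Solving H x = b for known x = rep(1, 8) recovers it only to ~6 digits
b <- H %*% rep(1, 8)
solve(H, b)             # entries deviate from 1 around the 6th decimal place
```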
### Solving Linear Systems

Prefer decomposition methods over explicit inversion:

```r
# BAD: explicit inverse (slower and less accurate)
beta <- solve(t(X) %*% X) %*% t(X) %*% y

# GOOD: QR decomposition
beta <- qr.coef(qr(X), y)

# BETTER for positive definite systems: Cholesky
R <- chol(crossprod(X))
beta <- backsolve(R, forwardsolve(t(R), crossprod(X, y)))

# For ill-conditioned problems: SVD / pseudoinverse
beta <- MASS::ginv(X) %*% y
```
### Symmetric Positive Definite Matrices

Always use specialized methods:

```r
# Cholesky for SPD (note: chol() returns the upper triangular factor)
U <- chol(Sigma)  # Sigma == t(U) %*% U

# Eigendecomposition
eig <- eigen(Sigma, symmetric = TRUE)

# Check positive definiteness
all(eigen(Sigma, symmetric = TRUE, only.values = TRUE)$values > 0)
```
## Optimization Stability

### Gradient Computation

```r
# Numerical gradient via central differences (for verifying analytic gradients)
# (a step nearer .Machine$double.eps^(1/3) is often recommended for central differences)
numerical_grad <- function(f, x, h = sqrt(.Machine$double.eps)) {
  sapply(seq_along(x), function(i) {
    x_plus <- x_minus <- x
    x_plus[i]  <- x[i] + h
    x_minus[i] <- x[i] - h
    (f(x_plus) - f(x_minus)) / (2 * h)
  })
}
# Central difference is O(h²) accurate; forward difference is only O(h)
```
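A quick sanity check against a function with a known gradient:

```r
f  <- function(x) sum(x^2)   # analytic gradient: 2 * x
x0 <- c(1, -2, 3)
numerical_grad(f, x0)                      # ~ (2, -4, 6)
max(abs(numerical_grad(f, x0) - 2 * x0))   # tiny: near sqrt(eps) scale
```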
### Hessian Stability

```r
# Check that the Hessian is positive definite at the optimum
check_hessian <- function(H, tol = 1e-8) {
  eigs <- eigen(H, symmetric = TRUE, only.values = TRUE)$values
  min_eig <- min(eigs)
  list(
    positive_definite = min_eig > tol,
    min_eigenvalue    = min_eig,
    condition_number  = max(eigs) / min_eig
  )
}
```
### Line Search

For gradient descent stability, backtrack until the Armijo (sufficient decrease) condition holds:

```r
backtracking_line_search <- function(f, x, d, grad, alpha = 1, rho = 0.5, c = 1e-4) {
  # Armijo condition: f(x + alpha*d) <= f(x) + c * alpha * grad'd
  while (f(x + alpha * d) > f(x) + c * alpha * sum(grad * d)) {
    alpha <- rho * alpha
    if (alpha < 1e-10) break  # give up rather than loop forever
  }
  alpha
}
```
## Integration and Quadrature

### Numerical Integration Guidelines

```r
# Adaptive quadrature (default choice)
integrate(f, lower, upper)

# Infinite limits are handled directly
integrate(f, -Inf, Inf)

# For highly oscillatory or sharply peaked integrands, increase subdivisions
integrate(f, lower, upper, subdivisions = 1000)
```

For known singularities, split the domain at the singular points.
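For example, an integrand with an interior singularity can be split at the singular point; `integrate()` copes with integrable endpoint singularities on each piece:

```r
# f(x) = log(|x|) is singular (but integrable) at x = 0: split (-1, 1) there
f <- function(x) log(abs(x))
integrate(f, -1, 0)$value + integrate(f, 0, 1)$value  # -2 (the exact value)
```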
### Monte Carlo Integration

```r
mc_integrate <- function(f, n, lower, upper) {
  x  <- runif(n, lower, upper)
  fx <- sapply(x, f)
  # Average of f scaled by the interval width;
  # the standard error shrinks at the usual 1/sqrt(n) Monte Carlo rate
  estimate <- (upper - lower) * mean(fx)
  se       <- (upper - lower) * sd(fx) / sqrt(n)
  list(value = estimate, se = se)
}
```
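Usage example, estimating the standard normal mass on (-3, 3):

```r
set.seed(1)
mc_integrate(dnorm, n = 1e5, lower = -3, upper = 3)
# $value ~0.997 (true: pnorm(3) - pnorm(-3) ~0.9973), $se of a few thousandths
```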
## Root Finding

### Newton-Raphson Stability

```r
newton_raphson <- function(f, df, x0, tol = 1e-8, max_iter = 100) {
  x <- x0
  for (i in 1:max_iter) {
    fx  <- f(x)
    dfx <- df(x)
    # Guard: stepping with a near-zero derivative would blow up
    if (abs(dfx) < .Machine$double.eps * 100) {
      warning("Near-zero derivative")
      break
    }
    x_new <- x - fx / dfx
    converged <- abs(x_new - x) < tol
    x <- x_new  # keep the latest iterate, even on the final step
    if (converged) break
  }
  x
}
```
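Usage example, computing √2 as the root of f(x) = x² − 2:

```r
newton_raphson(f = function(x) x^2 - 2, df = function(x) 2 * x, x0 = 1)
# 1.414214: quadratic convergence near a simple root
```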
### Brent's Method

For robust root finding without derivatives (the root must be bracketed by a sign change):

```r
uniroot(f, interval = c(lower, upper), tol = .Machine$double.eps^0.5)
```
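Usage example, the same root as above:

```r
uniroot(function(x) x^2 - 2, interval = c(0, 2))$root  # ~1.414214
```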
## Statistical Computing Patterns

### Safe Likelihood Computation

```r
# Always work with the log-likelihood, never the raw likelihood
log_lik <- function(theta, data) {
  # log = TRUE returns log densities, which are summed rather than multiplied
  sum(dnorm(data, mean = theta[1], sd = theta[2], log = TRUE))
}
```
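This plugs directly into an optimizer. A minimal sketch, not part of the original pattern: the standard deviation is optimized on the log scale so it stays positive without constraints:

```r
set.seed(1)
data <- rnorm(100, mean = 5, sd = 2)

# Minimize the negative log-likelihood; theta = (mean, log(sd))
neg_log_lik <- function(theta) {
  -sum(dnorm(data, mean = theta[1], sd = exp(theta[2]), log = TRUE))
}
fit <- optim(c(0, 0), neg_log_lik)
c(mean = fit$par[1], sd = exp(fit$par[2]))  # close to (5, 2)
```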
### Robust Standard Errors

```r
# Sandwich estimator with numerical safeguards
sandwich_se <- function(score, hessian) {
  # Fall back to the pseudoinverse if the Hessian is numerically singular
  H_inv <- tryCatch(
    solve(hessian),
    error = function(e) MASS::ginv(hessian)
  )
  meat <- crossprod(score)
  V    <- H_inv %*% meat %*% H_inv
  sqrt(diag(V))
}
```
### Bootstrap with Error Handling

```r
safe_bootstrap <- function(data, statistic, R = 1000) {
  results  <- numeric(R)
  failures <- 0
  for (i in 1:R) {
    # drop = FALSE keeps single-column data frames from collapsing to vectors
    boot_data <- data[sample(nrow(data), replace = TRUE), , drop = FALSE]
    result <- tryCatch(statistic(boot_data), error = function(e) NA)
    results[i] <- result
    if (is.na(result)) failures <- failures + 1
  }
  # Warn if more than 10% of replicates failed
  if (failures > 0.1 * R) {
    warning(sprintf("%.1f%% bootstrap failures", 100 * failures / R))
  }
  list(
    estimate = mean(results, na.rm = TRUE),
    se       = sd(results, na.rm = TRUE),
    failures = failures
  )
}
```
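Usage example with a small simulated data frame and a simple statistic:

```r
set.seed(1)
d <- data.frame(x = rnorm(50, mean = 10))
safe_bootstrap(d, statistic = function(b) mean(b$x), R = 500)
# $estimate near 10, $se near sd(d$x) / sqrt(50), $failures = 0
```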
## Debugging Numerical Issues

### Diagnostic Checklist

- Check for NaN/Inf: `any(is.nan(x))`, `any(is.infinite(x))`
- Check conditioning: `kappa(matrix)`
- Check eigenvalues: for matrices that should be positive definite
- Check gradients: compare numerical and analytical versions
- Check scale: are variables on similar scales?
### Debugging Functions

```r
# Trace where NaN/Inf values enter a computation
debug_numeric <- function(x, name = "x") {
  cat(sprintf("%s: range [%.3g, %.3g], ", name,
              min(x, na.rm = TRUE), max(x, na.rm = TRUE)))
  cat(sprintf("NaN: %d, Inf: %d, -Inf: %d\n",
              sum(is.nan(x)),
              sum(x == Inf, na.rm = TRUE),    # na.rm: NaN comparisons yield NA
              sum(x == -Inf, na.rm = TRUE)))
}

# Relative error, guarding against division by a tiny true value
rel_error <- function(computed, true) {
  abs(computed - true) / max(abs(true), 1)
}
```
## Best Practices Summary

- Always work on the log scale for products of probabilities
- Use QR or Cholesky decompositions instead of explicit matrix inversion
- Check conditioning before solving linear systems
- Center and scale predictors in regression (see the sketch after this list)
- Handle edge cases (empty data, singular matrices)
- Use existing implementations (LAPACK, BLAS) when possible
- Test with extreme values (very small, very large, near-zero)
- Compare analytical and numerical gradients
- Monitor convergence in iterative algorithms
- Document numerical assumptions and limitations
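A small demonstration of the centering-and-scaling point above (illustrative values):

```r
# A predictor with a large offset makes X'X nearly singular
set.seed(1)
x <- rnorm(100, mean = 1000, sd = 1)
X_raw <- cbind(1, x, x^2)
X_std <- cbind(1, scale(x), scale(x)^2)
kappa(crossprod(X_raw))  # enormous: effectively numerically singular
kappa(crossprod(X_std))  # modest: safe to solve
```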
## Key References

- Higham, *Accuracy and Stability of Numerical Algorithms* (SIAM)
- Golub & Van Loan, *Matrix Computations* (Johns Hopkins University Press)