NumPy - Numerical Python

The fundamental package for numerical computing in Python, providing multi-dimensional arrays and fast operations.

When to Use

Working with multi-dimensional arrays and matrices
Performing element-wise operations on arrays
Linear algebra computations (matrix multiplication, eigenvalues, SVD)
Random number generation and statistical distributions
Fourier transforms and signal processing basics
Mathematical operations (trigonometric, exponential, logarithmic)
Broadcasting operations across different array shapes
Vectorizing Python loops for performance
Reading and writing numerical data to files
Building numerical algorithms and simulations
Serving as foundation for pandas, scikit-learn, SciPy

Reference Documentation

Official docs: https://numpy.org/doc/
Search patterns: np.array, np.zeros, np.dot, np.linalg, np.random, np.broadcast

Core Principles

Use NumPy For

Task	Function	Example
Create arrays	`array`, `zeros`, `ones`	`np.array([1, 2, 3])`
Mathematical ops	`+`, `*`, `sin`, `exp`	`np.sin(arr)`
Linear algebra	`dot`, `linalg.inv`	`np.dot(A, B)`
Statistics	`mean`, `std`, `percentile`	`np.mean(arr)`
Random numbers	`random.rand`, `random.normal`	`np.random.rand(10)`
Indexing	`[]`, boolean, fancy	`arr[arr > 0]`
Broadcasting	Automatic	`arr + scalar`
Reshaping	`reshape`, `flatten`	`arr.reshape(2, 3)`

Do NOT Use For

String manipulation (use built-in str or pandas)
Complex data structures (use pandas DataFrame)
Symbolic mathematics (use SymPy)
Deep learning (use PyTorch, TensorFlow)
Sparse matrices (use scipy.sparse)

Quick Reference

Installation

# pip
pip install numpy

# conda
conda install numpy

# Specific version
pip install numpy==1.26.0

Standard Imports

import numpy as np

# Common submodules
from numpy import linalg as la
from numpy import random as rand
from numpy import fft

# Never import *
# from numpy import *  # DON'T DO THIS!

Basic Pattern - Array Creation

import numpy as np

# From list
arr = np.array([1, 2, 3, 4, 5])

# Zeros and ones
zeros = np.zeros((3, 4))
ones = np.ones((2, 3))

# Range
range_arr = np.arange(0, 10, 2)  # [0, 2, 4, 6, 8]

# Linspace
linspace_arr = np.linspace(0, 1, 5)  # [0, 0.25, 0.5, 0.75, 1]

print(f"Array: {arr}")
print(f"Shape: {arr.shape}")
print(f"Dtype: {arr.dtype}")

Basic Pattern - Array Operations

import numpy as np

a = np.array([1, 2, 3])
b = np.array([4, 5, 6])

# Element-wise operations
c = a + b      # [5, 7, 9]
d = a * b      # [4, 10, 18]
e = a ** 2     # [1, 4, 9]

# Mathematical functions
f = np.sin(a)
g = np.exp(a)

print(f"Sum: {c}")
print(f"Product: {d}")

Basic Pattern - Linear Algebra

import numpy as np

# Matrix multiplication
A = np.array([[1, 2], [3, 4]])
B = np.array([[5, 6], [7, 8]])

# Dot product
C = np.dot(A, B)  # or A @ B

# Matrix inverse
A_inv = np.linalg.inv(A)

# Eigenvalues
eigenvalues, eigenvectors = np.linalg.eig(A)

print(f"Matrix product:\n{C}")
print(f"Eigenvalues: {eigenvalues}")

Critical Rules

✅ DO

Use vectorization - Avoid Python loops, use array operations
Specify dtype explicitly - For memory efficiency and precision control
Use views when possible - Avoid unnecessary copies
Broadcast properly - Understand broadcasting rules
Check array shapes - Use .shape frequently
Use axis parameter - For operations along specific dimensions
Pre-allocate arrays - Don't grow arrays in loops
Use appropriate dtypes - int32, float64, complex128, etc.
Copy when needed - Use .copy() for independent arrays
Use built-in functions - They're optimized in C

❌ DON'T

Loop over arrays - Use vectorization instead
Grow arrays dynamically - Pre-allocate instead
Use Python lists for math - Convert to arrays first
Ignore memory layout - C-contiguous vs Fortran-contiguous matters
Mix dtypes carelessly - Know implicit type promotion rules
Modify arrays during iteration - Can cause undefined behavior
Use == for array comparison - Use np.array_equal() or np.allclose()
Assume views vs copies - Check with .base attribute
Ignore NaN handling - Use np.nanmean(), np.nanstd(), etc.
Use outdated APIs - Check for deprecated functions

Anti-Patterns (NEVER)

import numpy as np

# ❌ BAD: Python loops
result = []
for i in range(len(arr)):
    result.append(arr[i] * 2)
result = np.array(result)

# ✅ GOOD: Vectorization
result = arr * 2

# ❌ BAD: Growing arrays
result = np.array([])
for i in range(1000):
    result = np.append(result, i)  # Very slow!

# ✅ GOOD: Pre-allocate
result = np.zeros(1000)
for i in range(1000):
    result[i] = i

# Even better: Use arange
result = np.arange(1000)

# ❌ BAD: Comparing arrays with ==
if arr1 == arr2:  # This is ambiguous!
    print("Equal")

# ✅ GOOD: Use appropriate comparison
if np.array_equal(arr1, arr2):
    print("Equal")

# Or for floating point
if np.allclose(arr1, arr2, rtol=1e-5):
    print("Close enough")

# ❌ BAD: Ignoring dtypes
arr = np.array([1, 2, 3])
arr[0] = 1.5  # Silently truncates to 1!

# ✅ GOOD: Explicit dtype
arr = np.array([1, 2, 3], dtype=float)
arr[0] = 1.5  # Now works correctly

# ❌ BAD: Unintentional modification
a = np.array([1, 2, 3])
b = a  # b is just a reference!
b[0] = 999  # Also modifies a!

# ✅ GOOD: Explicit copy
a = np.array([1, 2, 3])
b = a.copy()  # b is independent
b[0] = 999  # a is unchanged

Array Creation

Basic Array Creation

import numpy as np

# From Python list
arr1 = np.array([1, 2, 3, 4, 5])

# From nested list (2D)
arr2 = np.array([[1, 2, 3], [4, 5, 6]])

# Specify dtype
arr3 = np.array([1, 2, 3], dtype=np.float64)
arr4 = np.array([1, 2, 3], dtype=np.int32)

# From tuple
arr5 = np.array((1, 2, 3))

# Complex numbers
arr6 = np.array([1+2j, 3+4j])

print(f"1D array: {arr1}")
print(f"2D array:\n{arr2}")
print(f"Float array: {arr3}")

Special Array Creation

import numpy as np

# Zeros
zeros = np.zeros((3, 4))  # 3x4 array of zeros

# Ones
ones = np.ones((2, 3, 4))  # 2x3x4 array of ones

# Empty (uninitialized)
empty = np.empty((2, 2))  # Faster but values are garbage

# Full (constant value)
full = np.full((3, 3), 7)  # 3x3 array filled with 7

# Identity matrix
identity = np.eye(4)  # 4x4 identity matrix

# Diagonal matrix
diag = np.diag([1, 2, 3, 4])

print(f"Zeros shape: {zeros.shape}")
print(f"Identity:\n{identity}")

Range-Based Creation

import numpy as np

# Arange (like Python range)
a = np.arange(10)           # [0, 1, 2, ..., 9]
b = np.arange(2, 10)        # [2, 3, 4, ..., 9]
c = np.arange(0, 10, 2)     # [0, 2, 4, 6, 8]
d = np.arange(0, 1, 0.1)    # [0, 0.1, 0.2, ..., 0.9]

# Linspace (linearly spaced)
e = np.linspace(0, 1, 5)    # [0, 0.25, 0.5, 0.75, 1]
f = np.linspace(0, 10, 100) # 100 points from 0 to 10

# Logspace (logarithmically spaced)
g = np.logspace(0, 2, 5)    # [1, 10^0.5, 10, 10^1.5, 100]

# Geomspace (geometrically spaced)
h = np.geomspace(1, 1000, 4) # [1, 10, 100, 1000]

print(f"Arange: {a}")
print(f"Linspace: {e}")

Array Copies and Views

import numpy as np

original = np.array([1, 2, 3, 4, 5])

# View (shares memory)
view = original[:]
view[0] = 999  # Modifies original!

# Copy (independent)
copy = original.copy()
copy[0] = 777  # Doesn't affect original

# Check if array is a view
print(f"Is view? {view.base is original}")
print(f"Is copy? {copy.base is None}")

# Some operations create views, some create copies
slice_view = original[1:3]  # View
boolean_copy = original[original > 2]  # Copy!

Array Indexing and Slicing

Basic Indexing

import numpy as np

arr = np.array([10, 20, 30, 40, 50])

# Single element
print(arr[0])   # 10
print(arr[-1])  # 50 (last element)

# Slicing
print(arr[1:4])    # [20, 30, 40]
print(arr[:3])     # [10, 20, 30]
print(arr[2:])     # [30, 40, 50]
print(arr[::2])    # [10, 30, 50] (every 2nd element)

# Negative indices
print(arr[-3:-1])  # [30, 40]

# Reverse
print(arr[::-1])   # [50, 40, 30, 20, 10]

Multi-Dimensional Indexing

import numpy as np

arr = np.array([[1, 2, 3],
                [4, 5, 6],
                [7, 8, 9]])

# Single element
print(arr[0, 0])   # 1
print(arr[1, 2])   # 6
print(arr[-1, -1]) # 9

# Row slicing
print(arr[0])      # [1, 2, 3] (first row)
print(arr[1, :])   # [4, 5, 6] (second row)

# Column slicing
print(arr[:, 0])   # [1, 4, 7] (first column)
print(arr[:, 1])   # [2, 5, 8] (second column)

# Sub-array
print(arr[0:2, 1:3])  # [[2, 3], [5, 6]]

# Every other element
print(arr[::2, ::2])  # [[1, 3], [7, 9]]

Boolean Indexing

import numpy as np

arr = np.array([1, 2, 3, 4, 5, 6, 7, 8, 9, 10])

# Boolean condition
mask = arr > 5
print(mask)  # [False, False, False, False, False, True, True, True, True, True]

# Boolean indexing
filtered = arr[arr > 5]
print(filtered)  # [6, 7, 8, 9, 10]

# Multiple conditions (use & and |, not 'and' and 'or')
result = arr[(arr > 3) & (arr < 8)]
print(result)  # [4, 5, 6, 7]

# Or condition
result = arr[(arr < 3) | (arr > 8)]
print(result)  # [1, 2, 9, 10]

# Negation
result = arr[~(arr > 5)]
print(result)  # [1, 2, 3, 4, 5]

Fancy Indexing

import numpy as np

arr = np.array([10, 20, 30, 40, 50])

# Index with array of integers
indices = np.array([0, 2, 4])
result = arr[indices]
print(result)  # [10, 30, 50]

# 2D fancy indexing
arr2d = np.array([[1, 2, 3],
                  [4, 5, 6],
                  [7, 8, 9]])

rows = np.array([0, 2])
cols = np.array([1, 2])
result = arr2d[rows, cols]  # Elements at (0,1) and (2,2)
print(result)  # [2, 9]

# Combining boolean and fancy indexing
mask = arr > 25
indices_of_large = np.where(mask)[0]
print(indices_of_large)  # [2, 3, 4]

Array Operations

Element-wise Operations

import numpy as np

a = np.array([1, 2, 3, 4])
b = np.array([5, 6, 7, 8])

# Arithmetic operations
print(a + b)    # [6, 8, 10, 12]
print(a - b)    # [-4, -4, -4, -4]
print(a * b)    # [5, 12, 21, 32]
print(a / b)    # [0.2, 0.333..., 0.428..., 0.5]
print(a ** 2)   # [1, 4, 9, 16]
print(a // b)   # [0, 0, 0, 0] (floor division)
print(a % b)    # [1, 2, 3, 4] (modulo)

# With scalars
print(a + 10)   # [11, 12, 13, 14]
print(a * 2)    # [2, 4, 6, 8]

Mathematical Functions

import numpy as np

x = np.array([0, np.pi/6, np.pi/4, np.pi/3, np.pi/2])

# Trigonometric
sin_x = np.sin(x)
cos_x = np.cos(x)
tan_x = np.tan(x)

# Inverse trig
arcsin_x = np.arcsin([0, 0.5, 1])

# Exponential and logarithm
arr = np.array([1, 2, 3, 4])
exp_arr = np.exp(arr)
log_arr = np.log(arr)
log10_arr = np.log10(arr)

# Rounding
floats = np.array([1.2, 2.7, 3.5, 4.9])
print(np.round(floats))   # [1, 3, 4, 5]
print(np.floor(floats))   # [1, 2, 3, 4]
print(np.ceil(floats))    # [2, 3, 4, 5]

# Absolute value
print(np.abs([-1, -2, 3, -4]))  # [1, 2, 3, 4]

Aggregation Functions

import numpy as np

arr = np.array([[1, 2, 3],
                [4, 5, 6],
                [7, 8, 9]])

# Sum
print(np.sum(arr))         # 45 (all elements)
print(np.sum(arr, axis=0)) # [12, 15, 18] (column sums)
print(np.sum(arr, axis=1)) # [6, 15, 24] (row sums)

# Mean
print(np.mean(arr))        # 5.0

# Standard deviation
print(np.std(arr))         # ~2.58

# Min and max
print(np.min(arr))         # 1
print(np.max(arr))         # 9
print(np.argmin(arr))      # 0 (index of min)
print(np.argmax(arr))      # 8 (index of max)

# Median and percentiles
print(np.median(arr))      # 5.0
print(np.percentile(arr, 25))  # 3.0 (25th percentile)

Broadcasting

Broadcasting Rules

import numpy as np

# Scalar and array
arr = np.array([1, 2, 3, 4])
result = arr + 10  # Broadcast scalar to array shape
print(result)  # [11, 12, 13, 14]

# 1D and 2D
arr1d = np.array([1, 2, 3])
arr2d = np.array([[10], [20], [30]])

result = arr1d + arr2d
print(result)
# [[11, 12, 13],
#  [21, 22, 23],
#  [31, 32, 33]]

# Broadcasting example: standardization
data = np.random.randn(100, 3)  # 100 samples, 3 features
mean = np.mean(data, axis=0)    # Mean of each column
std = np.std(data, axis=0)      # Std of each column
standardized = (data - mean) / std  # Broadcasting!

Explicit Broadcasting

import numpy as np

# Using broadcast_to
arr = np.array([1, 2, 3])
broadcasted = np.broadcast_to(arr, (4, 3))
print(broadcasted)
# [[1, 2, 3],
#  [1, 2, 3],
#  [1, 2, 3],
#  [1, 2, 3]]

# Using newaxis
arr1d = np.array([1, 2, 3])
col_vector = arr1d[:, np.newaxis]  # Shape (3, 1)
row_vector = arr1d[np.newaxis, :]  # Shape (1, 3)

# Outer product using broadcasting
outer = col_vector * row_vector
print(outer)
# [[1, 2, 3],
#  [2, 4, 6],
#  [3, 6, 9]]

Linear Algebra

Matrix Operations

import numpy as np

A = np.array([[1, 2], [3, 4]])
B = np.array([[5, 6], [7, 8]])

# Matrix multiplication
C = np.dot(A, B)  # Traditional
C = A @ B         # Modern syntax (Python 3.5+)

# Element-wise multiplication
D = A * B  # Not matrix multiplication!

# Matrix transpose
A_T = A.T

# Trace (sum of diagonal)
trace = np.trace(A)

# Matrix power
A_squared = np.linalg.matrix_power(A, 2)

print(f"Matrix product:\n{C}")
print(f"Transpose:\n{A_T}")
print(f"Trace: {trace}")

Solving Linear Systems

import numpy as np

# Solve Ax = b
A = np.array([[3, 1], [1, 2]])
b = np.array([9, 8])

# Solve for x
x = np.linalg.solve(A, b)
print(f"Solution: {x}")  # [2, 3]

# Verify solution
print(f"Verification: {np.allclose(A @ x, b)}")  # True

# Matrix inverse
A_inv = np.linalg.inv(A)
print(f"Inverse:\n{A_inv}")

# Determinant
det = np.linalg.det(A)
print(f"Determinant: {det}")

Eigenvalues and Eigenvectors

import numpy as np

# Square matrix
A = np.array([[1, 2], [2, 1]])

# Eigenvalue decomposition
eigenvalues, eigenvectors = np.linalg.eig(A)

print(f"Eigenvalues: {eigenvalues}")
print(f"Eigenvectors:\n{eigenvectors}")

# Verify: A * v = λ * v
for i in range(len(eigenvalues)):
    lam = eigenvalues[i]
    v = eigenvectors[:, i]
    
    left = A @ v
    right = lam * v
    
    print(f"Eigenvalue {i}: {np.allclose(left, right)}")

Singular Value Decomposition (SVD)

import numpy as np

# Any matrix
A = np.array([[1, 2, 3],
              [4, 5, 6]])

# SVD: A = U @ S @ Vt
U, s, Vt = np.linalg.svd(A)

# Reconstruct original matrix
S = np.zeros((2, 3))
S[:2, :2] = np.diag(s)
A_reconstructed = U @ S @ Vt

print(f"Original:\n{A}")
print(f"Reconstructed:\n{A_reconstructed}")
print(f"Close? {np.allclose(A, A_reconstructed)}")

# Singular values
print(f"Singular values: {s}")

Matrix Norms

import numpy as np

A = np.array([[1, 2], [3, 4]])

# Frobenius norm (default)
norm_fro = np.linalg.norm(A)

# 1-norm (max column sum)
norm_1 = np.linalg.norm(A, ord=1)

# Infinity norm (max row sum)
norm_inf = np.linalg.norm(A, ord=np.inf)

# 2-norm (spectral norm)
norm_2 = np.linalg.norm(A, ord=2)

print(f"Frobenius: {norm_fro:.4f}")
print(f"1-norm: {norm_1:.4f}")
print(f"2-norm: {norm_2:.4f}")
print(f"inf-norm: {norm_inf:.4f}")

Random Number Generation

Basic Random Generation

import numpy as np

# Set seed for reproducibility
np.random.seed(42)

# Random floats [0, 1)
rand_uniform = np.random.rand(5)  # 1D array of 5 elements
rand_2d = np.random.rand(3, 4)    # 3x4 array

# Random integers
rand_int = np.random.randint(0, 10, size=5)  # [0, 10)
rand_int_2d = np.random.randint(0, 100, size=(3, 3))

# Random normal distribution
rand_normal = np.random.randn(1000)  # Mean=0, std=1
rand_normal_custom = np.random.normal(loc=5, scale=2, size=1000)

# Random choice
choices = np.random.choice(['a', 'b', 'c'], size=10)
weighted_choices = np.random.choice([1, 2, 3], size=100, p=[0.1, 0.3, 0.6])

Statistical Distributions

import numpy as np

# Uniform distribution [low, high)
uniform = np.random.uniform(low=0, high=10, size=1000)

# Normal (Gaussian) distribution
normal = np.random.normal(loc=0, scale=1, size=1000)

# Exponential distribution
exponential = np.random.exponential(scale=2, size=1000)

# Binomial distribution
binomial = np.random.binomial(n=10, p=0.5, size=1000)

# Poisson distribution
poisson = np.random.poisson(lam=3, size=1000)

# Beta distribution
beta = np.random.beta(a=2, b=5, size=1000)

# Chi-squared distribution
chisq = np.random.chisquare(df=2, size=1000)

Modern Random Generator (numpy.random.Generator)

import numpy as np

# Create generator
rng = np.random.default_rng(seed=42)

# Generate random numbers
rand = rng.random(size=10)
ints = rng.integers(low=0, high=100, size=10)
normal = rng.normal(loc=0, scale=1, size=10)

# Shuffle array in-place
arr = np.arange(10)
rng.shuffle(arr)

# Sample without replacement
sample = rng.choice(100, size=10, replace=False)

print(f"Random: {rand}")
print(f"Shuffled: {arr}")

Reshaping and Manipulation

Reshaping Arrays

import numpy as np

# Original array
arr = np.arange(12)  # [0, 1, 2, ..., 11]

# Reshape
arr_2d = arr.reshape(3, 4)
arr_3d = arr.reshape(2, 2, 3)

# Automatic dimension calculation with -1
arr_auto = arr.reshape(3, -1)  # Automatically calculates 4

# Flatten to 1D
flat = arr_2d.flatten()  # Returns copy
flat = arr_2d.ravel()    # Returns view if possible

# Transpose
arr_t = arr_2d.T

print(f"Original shape: {arr.shape}")
print(f"2D shape: {arr_2d.shape}")
print(f"3D shape: {arr_3d.shape}")

Stacking and Splitting

import numpy as np

a = np.array([1, 2, 3])
b = np.array([4, 5, 6])
c = np.array([7, 8, 9])

# Vertical stacking (vstack)
vstacked = np.vstack([a, b, c])
print(vstacked)
# [[1, 2, 3],
#  [4, 5, 6],
#  [7, 8, 9]]

# Horizontal stacking (hstack)
hstacked = np.hstack([a, b, c])
print(hstacked)  # [1, 2, 3, 4, 5, 6, 7, 8, 9]

# Column stack
col_stacked = np.column_stack([a, b, c])

# Concatenate (more general)
arr1 = np.array([[1, 2], [3, 4]])
arr2 = np.array([[5, 6], [7, 8]])
concat_axis0 = np.concatenate([arr1, arr2], axis=0)
concat_axis1 = np.concatenate([arr1, arr2], axis=1)

# Splitting
arr = np.arange(12)
split = np.split(arr, 3)  # Split into 3 equal parts
print(split)  # [array([0, 1, 2, 3]), array([4, 5, 6, 7]), array([8, 9, 10, 11])]

File I/O

Text Files

import numpy as np

# Save to text file
data = np.array([[1, 2, 3],
                 [4, 5, 6],
                 [7, 8, 9]])

np.savetxt('data.txt', data)
np.savetxt('data.csv', data, delimiter=',')
np.savetxt('data_formatted.txt', data, fmt='%.2f')

# Load from text file
loaded = np.loadtxt('data.txt')
loaded_csv = np.loadtxt('data.csv', delimiter=',')

# Skip header rows
loaded_skip = np.loadtxt('data.txt', skiprows=1)

# Load specific columns
loaded_cols = np.loadtxt('data.csv', delimiter=',', usecols=(0, 2))

Binary Files (.npy, .npz)

import numpy as np

# Save single array
arr = np.random.rand(100, 100)
np.save('array.npy', arr)

# Load single array
loaded = np.load('array.npy')

# Save multiple arrays (compressed)
arr1 = np.random.rand(10, 10)
arr2 = np.random.rand(20, 20)
np.savez('arrays.npz', first=arr1, second=arr2)

# Load multiple arrays
loaded = np.load('arrays.npz')
loaded_arr1 = loaded['first']
loaded_arr2 = loaded['second']

# Compressed save
np.savez_compressed('arrays_compressed.npz', arr1=arr1, arr2=arr2)

Advanced Techniques

Universal Functions (ufuncs)

import numpy as np

# Ufuncs operate element-wise
arr = np.array([1, 2, 3, 4, 5])

# Built-in ufuncs
result = np.sqrt(arr)
result = np.exp(arr)
result = np.log(arr)

# Custom ufunc
def my_func(x):
    return x**2 + 2*x + 1

vectorized = np.vectorize(my_func)
result = vectorized(arr)

# More efficient: define true ufunc
@np.vectorize
def better_func(x):
    return x**2 + 2*x + 1

Structured Arrays

import numpy as np

# Define dtype
dt = np.dtype([('name', 'U20'), ('age', 'i4'), ('weight', 'f8')])

# Create structured array
data = np.array([
    ('Alice', 25, 55.5),
    ('Bob', 30, 70.2),
    ('Charlie', 35, 82.1)
], dtype=dt)

# Access by field name
names = data['name']
ages = data['age']

# Sort by field
sorted_data = np.sort(data, order='age')

print(f"Names: {names}")
print(f"Sorted by age:\n{sorted_data}")

Memory Layout and Performance

import numpy as np

# C-contiguous (row-major, default)
arr_c = np.array([[1, 2, 3], [4, 5, 6]], order='C')

# Fortran-contiguous (column-major)
arr_f = np.array([[1, 2, 3], [4, 5, 6]], order='F')

# Check memory layout
print(f"C-contiguous? {arr_c.flags['C_CONTIGUOUS']}")
print(f"F-contiguous? {arr_c.flags['F_CONTIGUOUS']}")

# Make contiguous
arr_made_c = np.ascontiguousarray(arr_f)
arr_made_f = np.asfortranarray(arr_c)

# Memory usage
print(f"Memory (bytes): {arr_c.nbytes}")
print(f"Item size: {arr_c.itemsize}")

Advanced Indexing with ix_

import numpy as np

arr = np.arange(20).reshape(4, 5)

# Select specific rows and columns
rows = np.array([0, 2])
cols = np.array([1, 3, 4])

# ix_ creates open mesh
result = arr[np.ix_(rows, cols)]
print(result)
# [[1, 3, 4],
#  [11, 13, 14]]

# Equivalent to
# result = arr[[0, 2]][:, [1, 3, 4]]

Practical Workflows

Statistical Analysis

import numpy as np

# Generate sample data
np.random.seed(42)
data = np.random.normal(loc=100, scale=15, size=1000)

# Descriptive statistics
mean = np.mean(data)
median = np.median(data)
std = np.std(data)
var = np.var(data)

# Percentiles
q25, q50, q75 = np.percentile(data, [25, 50, 75])

# Histogram
counts, bins = np.histogram(data, bins=20)

# Correlation coefficient
data2 = data + np.random.normal(0, 5, size=1000)
corr = np.corrcoef(data, data2)[0, 1]

print(f"Mean: {mean:.2f}")
print(f"Median: {median:.2f}")
print(f"Std: {std:.2f}")
print(f"IQR: [{q25:.2f}, {q75:.2f}]")
print(f"Correlation: {corr:.3f}")

Monte Carlo Simulation

import numpy as np

def estimate_pi(n_samples=1000000):
    """Estimate π using Monte Carlo method."""
    # Random points in [0, 1] x [0, 1]
    x = np.random.rand(n_samples)
    y = np.random.rand(n_samples)
    
    # Check if inside quarter circle
    inside = (x**2 + y**2) <= 1
    
    # Estimate π
    pi_estimate = 4 * np.sum(inside) / n_samples
    
    return pi_estimate

# Estimate π
pi_est = estimate_pi(10000000)
print(f"π estimate: {pi_est:.6f}")
print(f"Error: {abs(pi_est - np.pi):.6f}")

Polynomial Fitting

import numpy as np

# Generate noisy data
x = np.linspace(0, 10, 50)
y_true = 2*x**2 + 3*x + 1
y_noisy = y_true + np.random.normal(0, 10, size=50)

# Fit polynomial (degree 2)
coeffs = np.polyfit(x, y_noisy, deg=2)
print(f"Coefficients: {coeffs}")  # Should be close to [2, 3, 1]

# Predict
y_pred = np.polyval(coeffs, x)

# Evaluate fit quality
residuals = y_noisy - y_pred
rmse = np.sqrt(np.mean(residuals**2))
print(f"RMSE: {rmse:.2f}")

# Create polynomial object
poly = np.poly1d(coeffs)
print(f"Polynomial: {poly}")

Image Processing Basics

import numpy as np

# Create synthetic image (grayscale)
image = np.random.rand(100, 100)

# Apply transformations
# Rotate 90 degrees
rotated = np.rot90(image)

# Flip vertically
flipped_v = np.flipud(image)

# Flip horizontally
flipped_h = np.fliplr(image)

# Transpose
transposed = image.T

# Normalize to [0, 255]
normalized = ((image - image.min()) / (image.max() - image.min()) * 255).astype(np.uint8)

print(f"Original shape: {image.shape}")
print(f"Value range: [{image.min():.2f}, {image.max():.2f}]")

Distance Matrices

import numpy as np

# Points in 2D
points = np.random.rand(100, 2)

# Pairwise distances (broadcasting)
diff = points[:, np.newaxis, :] - points[np.newaxis, :, :]
distances = np.sqrt(np.sum(diff**2, axis=2))

print(f"Distance matrix shape: {distances.shape}")
print(f"Max distance: {distances.max():.4f}")

# Find nearest neighbors
for i in range(5):
    # Exclude self (distance = 0)
    dists = distances[i].copy()
    dists[i] = np.inf
    nearest = np.argmin(dists)
    print(f"Point {i} nearest to point {nearest}, distance: {distances[i, nearest]:.4f}")

Sliding Window Operations

import numpy as np

def sliding_window_view(arr, window_size):
    """Create sliding window views of array."""
    shape = (arr.shape[0] - window_size + 1, window_size)
    strides = (arr.strides[0], arr.strides[0])
    return np.lib.stride_tricks.as_strided(arr, shape=shape, strides=strides)

# Time series data
data = np.random.rand(100)

# Create sliding windows
windows = sliding_window_view(data, window_size=10)

# Compute statistics for each window
window_means = np.mean(windows, axis=1)
window_stds = np.std(windows, axis=1)

print(f"Number of windows: {len(windows)}")
print(f"First window mean: {window_means[0]:.4f}")

Performance Optimization

Vectorization Examples

import numpy as np
import time

# Bad: Python loop
def sum_python_loop(arr):
    total = 0
    for x in arr:
        total += x**2
    return total

# Good: Vectorized
def sum_vectorized(arr):
    return np.sum(arr**2)

# Benchmark
arr = np.random.rand(1000000)

start = time.time()
result1 = sum_python_loop(arr)
time_loop = time.time() - start

start = time.time()
result2 = sum_vectorized(arr)
time_vec = time.time() - start

print(f"Loop time: {time_loop:.4f}s")
print(f"Vectorized time: {time_vec:.4f}s")
print(f"Speedup: {time_loop/time_vec:.1f}x")

Memory-Efficient Operations

import numpy as np

# Bad: Creates intermediate arrays
def inefficient(arr):
    temp1 = arr * 2
    temp2 = temp1 + 5
    temp3 = temp2 ** 2
    return temp3

# Good: In-place operations
def efficient(arr):
    result = arr.copy()
    result *= 2
    result += 5
    result **= 2
    return result

# Even better: Single expression (optimized by NumPy)
def most_efficient(arr):
    return (arr * 2 + 5) ** 2

Using numexpr for Complex Expressions

import numpy as np

# For very large arrays and complex expressions,
# numexpr can be faster (requires installation)

# Without numexpr
a = np.random.rand(10000000)
b = np.random.rand(10000000)
result = 2*a + 3*b**2 - np.sqrt(a)

# With numexpr (if installed)
# import numexpr as ne
# result = ne.evaluate('2*a + 3*b**2 - sqrt(a)')

Common Pitfalls and Solutions

NaN Handling

import numpy as np

arr = np.array([1, 2, np.nan, 4, 5, np.nan])

# Problem: Regular functions return NaN
mean = np.mean(arr)  # Returns nan

# Solution: Use nan-safe functions
mean = np.nanmean(arr)  # Returns 3.0
std = np.nanstd(arr)
sum_val = np.nansum(arr)

# Check for NaN
has_nan = np.isnan(arr).any()
where_nan = np.where(np.isnan(arr))[0]

# Remove NaN
arr_clean = arr[~np.isnan(arr)]

print(f"Mean (nan-safe): {mean}")
print(f"NaN positions: {where_nan}")

Integer Division Pitfall

import numpy as np

# Problem: Integer division with integers
a = np.array([1, 2, 3])
b = np.array([2, 2, 2])
result = a / b  # With Python 3, this is fine

# But be careful with older code or explicit int types
a_int = np.array([1, 2, 3], dtype=np.int32)
b_int = np.array([2, 2, 2], dtype=np.int32)

# In NumPy, / always gives float result
result_float = a_int / b_int  # [0.5, 1, 1.5]

# Use // for integer division
result_int = a_int // b_int  # [0, 1, 1]

print(f"Float division: {result_float}")
print(f"Integer division: {result_int}")

Array Equality

import numpy as np

a = np.array([1.0, 2.0, 3.0])
b = np.array([1.0, 2.0, 3.0])

# Problem: Can't use == directly for array comparison
# if a == b:  # ValueError!

# Solution 1: Element-wise comparison
equal_elements = a == b  # Boolean array

# Solution 2: Check if all elements equal
all_equal = np.all(a == b)

# Solution 3: array_equal
array_equal = np.array_equal(a, b)

# Solution 4: For floating point, use allclose
c = a + 1e-10
close_enough = np.allclose(a, c, rtol=1e-5, atol=1e-8)

print(f"All equal: {all_equal}")
print(f"Arrays equal: {array_equal}")
print(f"Close enough: {close_enough}")

Memory Leaks with Views

import numpy as np

# Problem: Large array kept in memory
large_array = np.random.rand(1000000, 100)
small_view = large_array[0:10]  # Just 10 rows

# large_array is kept in memory because small_view references it!
del large_array  # Doesn't free memory!

# Solution: Make a copy
large_array = np.random.rand(1000000, 100)
small_copy = large_array[0:10].copy()
del large_array  # Now memory is freed

# Check if it's a view
print(f"Is view? {small_view.base is not None}")
print(f"Is copy? {small_copy.base is None}")

This comprehensive NumPy guide covers 50+ examples across all major array operations and numerical computing workflows!