Seaborn Statistical Visualization
Overview
Seaborn is a Python visualization library for creating publication-quality statistical graphics. Use this skill for dataset-oriented plotting, multivariate analysis, automatic statistical estimation, and complex multi-panel figures with minimal code.
Design Philosophy
Seaborn follows these core principles:
- Dataset-oriented: Work directly with DataFrames and named variables rather than abstract coordinates
- Semantic mapping: Automatically translate data values into visual properties (colors, sizes, styles)
- Statistical awareness: Built-in aggregation, error estimation, and confidence intervals
- Aesthetic defaults: Publication-ready themes and color palettes out of the box
- Matplotlib integration: Full compatibility with matplotlib customization when needed
Quick Start
import seaborn as sns
import matplotlib.pyplot as plt
import pandas as pd
# Load example dataset
df = sns.load_dataset('tips')
# Create a simple visualization
sns.scatterplot(data=df, x='total_bill', y='tip', hue='day')
plt.show()
Core Plotting Interfaces
Function Interface (Traditional)
The function interface provides specialized plotting functions organized by visualization type. Each category has axes-level functions (plot to single axes) and figure-level functions (manage entire figure with faceting).
When to use:
- Quick exploratory analysis
- Single-purpose visualizations
- When you need a specific plot type
Objects Interface (Modern)
The seaborn.objects interface provides a declarative, composable API similar to ggplot2. Build visualizations by chaining methods to specify data mappings, marks, transformations, and scales.
When to use:
- Complex layered visualizations
- When you need fine-grained control over transformations
- Building custom plot types
- Programmatic plot generation
from seaborn import objects as so
# Declarative syntax
(
    so.Plot(data=df, x='total_bill', y='tip')
    .add(so.Dot(), color='day')
    .add(so.Line(), so.PolyFit())
)
Plotting Functions by Category
Relational Plots (Relationships Between Variables)
Use for: Exploring how two or more variables relate to each other
- scatterplot(): Display individual observations as points
- lineplot(): Show trends and changes (automatically aggregates and computes CI)
- relplot(): Figure-level interface with automatic faceting
Key parameters:
- x, y: Primary variables
- hue: Color encoding for an additional categorical or continuous variable
- size: Point/line size encoding
- style: Marker/line style encoding
- col, row: Facet into multiple subplots (figure-level only)
# Scatter with multiple semantic mappings
sns.scatterplot(data=df, x='total_bill', y='tip', hue='time', size='size', style='sex')
# Line plot with confidence intervals
sns.lineplot(data=timeseries, x='date', y='value', hue='category')
# Faceted relational plot
sns.relplot(data=df, x='total_bill', y='tip', col='time', row='sex', hue='smoker', kind='scatter')
Distribution Plots (Single and Bivariate Distributions)
Use for: Understanding data spread, shape, and probability density
- histplot(): Bar-based frequency distributions with flexible binning
- kdeplot(): Smooth density estimates using Gaussian kernels
- ecdfplot(): Empirical cumulative distribution (no parameters to tune)
- rugplot(): Individual observation tick marks
- displot(): Figure-level interface for univariate and bivariate distributions
- jointplot(): Bivariate plot with marginal distributions
- pairplot(): Matrix of pairwise relationships across dataset
Key parameters:
- x, y: Variables (y optional for univariate)
- hue: Separate distributions by category
- stat: Normalization: "count", "frequency", "probability", "density"
- bins / binwidth: Histogram binning control
- bw_adjust: KDE bandwidth multiplier (higher = smoother)
- fill: Fill area under curve
- multiple: How to handle hue: "layer", "stack", "dodge", "fill"
# Histogram with density normalization
sns.histplot(data=df, x='total_bill', hue='time', stat='density', multiple='stack')
# Bivariate KDE with contours
sns.kdeplot(data=df, x='total_bill', y='tip', fill=True, levels=5, thresh=0.1)
# Joint plot with marginals
sns.jointplot(data=df, x='total_bill', y='tip', kind='scatter', hue='time')
# Pairwise relationships ('species' is a column of the penguins example dataset, not tips)
penguins = sns.load_dataset('penguins')
sns.pairplot(data=penguins, hue='species', corner=True)
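The stat options listed under Key parameters differ only in how raw bin counts are rescaled. The sketch below reproduces that arithmetic with NumPy on synthetic data. It illustrates the documented meaning of each option and is not seaborn's internal code; the variable names and the synthetic sample are assumptions made for the example.

```python
import numpy as np

rng = np.random.default_rng(0)
# Synthetic stand-in for a numeric column such as total_bill
values = rng.normal(loc=20.0, scale=5.0, size=200)

counts, edges = np.histogram(values, bins=10)   # stat="count"
widths = np.diff(edges)

frequency = counts / widths                     # stat="frequency": count per unit of x
probability = counts / counts.sum()             # stat="probability": bar heights sum to 1
density = counts / (counts.sum() * widths)      # stat="density": bar areas sum to 1

print(probability.sum())
print((density * widths).sum())
```

"probability" is usually the easier choice when comparing groups of different sizes, while "density" is the one to use when overlaying a KDE curve on the histogram.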
Categorical Plots (Comparisons Across Categories)
Use for: Comparing distributions or statistics across discrete categories
Categorical scatterplots:
- stripplot(): Points with jitter to show all observations
- swarmplot(): Non-overlapping points (beeswarm algorithm)
Distribution comparisons:
- boxplot(): Quartiles and outliers
- violinplot(): KDE + quartile information
- boxenplot(): Enhanced boxplot for larger datasets
Statistical estimates:
- barplot(): Mean/aggregate with confidence intervals
- pointplot(): Point estimates with connecting lines
- countplot(): Count of observations per category
Figure-level:
- catplot(): Faceted categorical plots (set the kind parameter)
Key parameters:
- x, y: Variables (one typically categorical)
- hue: Additional categorical grouping
- order, hue_order: Control category ordering
- dodge: Separate hue levels side-by-side
- orient: "v" (vertical) or "h" (horizontal)
- kind: Plot type for catplot: "strip", "swarm", "box", "violin", "boxen", "bar", "point", "count"
# Swarm plot showing all points
sns.swarmplot(data=df, x='day', y='total_bill', hue='sex')
# Violin plot with split for comparison
sns.violinplot(data=df, x='day', y='total_bill', hue='sex', split=True)
# Bar plot with error bars
sns.barplot(data=df, x='day', y='total_bill', hue='sex', estimator='mean', errorbar='ci')
# Faceted categorical plot
sns.catplot(data=df, x='day', y='total_bill', col='time', kind='box')
Regression Plots (Linear Relationships)
Use for: Visualizing linear regressions and residuals
- regplot(): Axes-level regression plot with scatter + fit line
- lmplot(): Figure-level with faceting support
- residplot(): Residual plot for assessing model fit
Key parameters:
- x, y: Variables to regress
- order: Polynomial regression order
- logistic: Fit logistic regression
- robust: Use robust regression (less sensitive to outliers)
- ci: Confidence interval size in percent (default 95)
- scatter_kws, line_kws: Customize scatter and line properties
# Simple linear regression
sns.regplot(data=df, x='total_bill', y='tip')
# Polynomial regression with faceting
sns.lmplot(data=df, x='total_bill', y='tip', col='time', order=2, ci=95)
# Check residuals
sns.residplot(data=df, x='total_bill', y='tip')
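To make the order parameter concrete, here is a rough NumPy sketch of the kind of least-squares polynomial fit that order=2 requests, together with the residuals a residual plot would display. The data are a made-up noise-free quadratic, and the confidence band that seaborn derives by bootstrapping is not reproduced here.

```python
import numpy as np

x = np.linspace(0.0, 10.0, 50)
y = 0.5 * x**2 - 2.0 * x + 3.0          # noise-free quadratic, invented for illustration

coeffs = np.polyfit(x, y, deg=2)         # coefficients, highest power first
grid = np.linspace(x.min(), x.max(), 100)
fitted = np.polyval(coeffs, grid)        # the curve a fit line would trace

residuals = y - np.polyval(coeffs, x)    # what a residual plot shows on its y axis
print(coeffs)
```

With real, noisy data the residuals should scatter around zero without visible structure; a curved pattern suggests the chosen order is too low.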
Matrix Plots (Rectangular Data)
Use for: Visualizing matrices, correlations, and grid-structured data
- heatmap(): Color-encoded matrix with annotations
- clustermap(): Hierarchically-clustered heatmap
Key parameters:
- data: 2D rectangular dataset (DataFrame or array)
- annot: Display values in cells
- fmt: Format string for annotations (e.g., ".2f")
- cmap: Colormap name
- center: Value at colormap center (for diverging colormaps)
- vmin, vmax: Color scale limits
- square: Force square cells
- linewidths: Gap between cells
# Correlation heatmap (numeric_only=True skips non-numeric columns such as 'day' or 'sex')
corr = df.corr(numeric_only=True)
sns.heatmap(corr, annot=True, fmt='.2f', cmap='coolwarm', center=0, square=True)
# Clustered heatmap
sns.clustermap(data, cmap='viridis', standard_scale=1, figsize=(10, 10))
Multi-Plot Grids
Seaborn provides grid objects for creating complex multi-panel figures:
FacetGrid
Create subplots based on categorical variables. Most useful when called through figure-level functions (relplot, displot, catplot), but can be used directly for custom plots.
g = sns.FacetGrid(df, col='time', row='sex', hue='smoker')
g.map(sns.scatterplot, 'total_bill', 'tip')
g.add_legend()
PairGrid
Show pairwise relationships between all variables in a dataset.
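# 'species' is a column of the penguins example dataset, not tips
penguins = sns.load_dataset('penguins')
g = sns.PairGrid(penguins, hue='species')
g.map_upper(sns.scatterplot)
g.map_lower(sns.kdeplot)
g.map_diag(sns.histplot)
g.add_legend()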
JointGrid
Combine bivariate plot with marginal distributions.
g = sns.JointGrid(data=df, x='total_bill', y='tip')
g.plot_joint(sns.scatterplot)
g.plot_marginals(sns.histplot)
Figure-Level vs Axes-Level Functions
Understanding this distinction is crucial for effective seaborn usage:
Axes-Level Functions
- Plot to a single matplotlib Axes object
- Integrate easily into complex matplotlib figures
- Accept ax= parameter for precise placement
- Return Axes object
- Examples: scatterplot, histplot, boxplot, regplot, heatmap
When to use:
- Building custom multi-plot layouts
- Combining different plot types
- Need matplotlib-level control
- Integrating with existing matplotlib code
fig, axes = plt.subplots(2, 2, figsize=(10, 10))
sns.scatterplot(data=df, x='x', y='y', ax=axes[0, 0])
sns.histplot(data=df, x='x', ax=axes[0, 1])
sns.boxplot(data=df, x='cat', y='y', ax=axes[1, 0])
sns.kdeplot(data=df, x='x', y='y', ax=axes[1, 1])
Figure-Level Functions
- Manage entire figure including all subplots
- Built-in faceting via col and row parameters
- Return FacetGrid, JointGrid, or PairGrid objects
- Use height and aspect for sizing (per subplot)
- Cannot be placed in existing figure
- Examples: relplot, displot, catplot, lmplot, jointplot, pairplot
When to use:
- Faceted visualizations (small multiples)
- Quick exploratory analysis
- Consistent multi-panel layouts
- Don't need to combine with other plot types
# Automatic faceting
sns.relplot(data=df, x='x', y='y', col='category', row='group', hue='type', height=3, aspect=1.2)
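The practical difference between the two families shows up in what each call returns. The following minimal sketch uses a small synthetic DataFrame (the column names and values are made up) and the non-interactive Agg backend, so it should run without a display.

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend so the sketch runs headless
import matplotlib.pyplot as plt
import numpy as np
import pandas as pd
import seaborn as sns

rng = np.random.default_rng(1)
demo = pd.DataFrame({
    "x": rng.normal(size=40),
    "y": rng.normal(size=40),
    "category": np.repeat(["a", "b"], 20),
})

# Axes-level: draws into the Axes we pass in and returns that same Axes
fig, ax = plt.subplots(figsize=(4, 3))
returned = sns.scatterplot(data=demo, x="x", y="y", ax=ax)

# Figure-level: creates its own figure and returns a FacetGrid
g = sns.relplot(data=demo, x="x", y="y", col="category", height=3, aspect=1.2)

print(type(returned).__name__, type(g).__name__, g.axes.shape)
plt.close("all")
```

Because relplot owns its figure, further customization goes through the returned grid (for example g.axes or g.set_axis_labels) rather than through an ax= argument.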
Data Structure Requirements
Long-Form Data (Preferred)
Each variable is a column, each observation is a row. This "tidy" format provides maximum flexibility:
# Long-form structure
   subject  condition  measurement
0        1    control         10.5
1        1  treatment         12.3
2        2    control          9.8
3        2  treatment         13.1
Advantages:
- Works with all seaborn functions
- Easy to remap variables to visual properties
- Supports arbitrary complexity
- Natural for DataFrame operations
Wide-Form Data
Variables are spread across columns. Useful for simple rectangular data:
# Wide-form structure
   control  treatment
0     10.5       12.3
1      9.8       13.1
Use cases:
- Simple time series
- Correlation matrices
- Heatmaps
- Quick plots of array data
Converting wide to long:
df_long = df.melt(var_name='condition', value_name='measurement')
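As a worked version of the conversion, the sketch below rebuilds the wide-form table from this section and melts it into the long-form layout shown earlier. It assumes the subject identifiers (1 and 2) live in the index; passing id_vars keeps them as a column, which a bare melt() would drop.

```python
import pandas as pd

wide = pd.DataFrame(
    {"control": [10.5, 9.8], "treatment": [12.3, 13.1]},
    index=pd.Index([1, 2], name="subject"),  # assumed subject identifiers
)

long = (
    wide.reset_index()
        .melt(id_vars="subject", var_name="condition", value_name="measurement")
        .sort_values(["subject", "condition"])
        .reset_index(drop=True)
)
print(long)
```

The resulting frame has one row per subject and condition, so it can be passed straight to calls such as sns.barplot(data=long, x='condition', y='measurement').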
Color Palettes
Seaborn provides carefully designed color palettes for different data types:
Qualitative Palettes (Categorical Data)
Distinguish categories through hue variation:
- "deep": Default, vivid colors
- "muted": Softer, less saturated
- "pastel": Light, desaturated
- "bright": Highly saturated
- "dark": Dark values
- "colorblind": Safe for color vision deficiency
sns.set_palette("colorblind")
sns.color_palette("Set2")
Sequential Palettes (Ordered Data)
Show progression from low to high values:
- "rocket", "mako": Wide luminance range (good for heatmaps)
- "flare", "crest": Restricted luminance (good for points/lines)
- "viridis", "magma", "plasma": Matplotlib perceptually uniform
sns.heatmap(data, cmap='rocket')
sns.kdeplot(data=df, x='x', y='y', cmap='mako', fill=True)
Diverging Palettes (Centered Data)
Emphasize deviations from a midpoint:
- "vlag": Blue to red
- "icefire": Blue to orange
- "coolwarm": Cool to warm
- "Spectral": Rainbow diverging
sns.heatmap(correlation_matrix, cmap='vlag', center=0)
Custom Palettes
# Create custom palette
custom = sns.color_palette("husl", 8)
# Light to dark gradient
palette = sns.light_palette("seagreen", as_cmap=True)
# Diverging palette from hues
palette = sns.diverging_palette(250, 10, as_cmap=True)
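The palette helpers above return two different kinds of object, which matters when deciding where to pass them. A short sketch of the distinction (the variable names are illustrative only):

```python
import matplotlib
matplotlib.use("Agg")  # headless backend; no figure is actually drawn here
import matplotlib.colors as mcolors
import seaborn as sns

discrete = sns.color_palette("husl", 8)              # list-like of 8 RGB tuples
hex_codes = discrete.as_hex()                         # same colors as '#rrggbb' strings
cmap = sns.light_palette("seagreen", as_cmap=True)    # continuous matplotlib colormap

print(len(discrete), hex_codes[0])
print(cmap(0.0), cmap(1.0))                           # RGBA at both ends of the gradient
```

Discrete palettes suit palette= arguments for categorical hue levels, while colormap objects suit cmap= arguments such as the one heatmap accepts.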
Theming and Aesthetics
Set Theme
set_theme() controls overall appearance:
# Set complete theme
sns.set_theme(style='whitegrid', palette='pastel', font='sans-serif')
# Reset to defaults
sns.set_theme()
Styles
Control background and grid appearance:
- "darkgrid": Gray background with white grid (default)
- "whitegrid": White background with gray grid
- "dark": Gray background, no grid
- "white": White background, no grid
- "ticks": White background with axis ticks
sns.set_style("whitegrid")
# Remove spines
sns.despine(left=False, bottom=False, offset=10, trim=True)
# Temporary style
with sns.axes_style("white"):
    sns.scatterplot(data=df, x='x', y='y')
Contexts
Scale elements for different use cases:
- "paper": Smallest
- "notebook": Slightly larger (default)
- "talk": Presentation slides
- "poster": Large format
sns.set_context("talk", font_scale=1.2)
# Temporary context
with sns.plotting_context("poster"):
    sns.barplot(data=df, x='category', y='value')
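To see how the contexts scale plot elements, one can inspect the parameter dictionaries that plotting_context returns. This small sketch compares only the base font size; other keys such as line widths scale in a similar way.

```python
import seaborn as sns

# Look up the base font size that each named context would apply
sizes = {
    name: sns.plotting_context(name)["font.size"]
    for name in ["paper", "notebook", "talk", "poster"]
}
print(sizes)
```

The values increase from "paper" to "poster", with "notebook" as the unscaled baseline.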
Best Practices
- Data Preparation
Always use well-structured DataFrames with meaningful column names:
# Good: Named columns in DataFrame
df = pd.DataFrame({'bill': bills, 'tip': tips, 'day': days})
sns.scatterplot(data=df, x='bill', y='tip', hue='day')
# Avoid: Unnamed arrays
sns.scatterplot(x=x_array, y=y_array)  # Loses axis labels
- Choose the Right Plot Type
Continuous x, continuous y: scatterplot, lineplot, kdeplot, regplot
Categorical x, continuous y: violinplot, boxplot, stripplot, swarmplot
One continuous variable: histplot, kdeplot, ecdfplot
Correlations/matrices: heatmap, clustermap
Pairwise relationships: pairplot, jointplot
- Use Figure-Level Functions for Faceting
# Instead of manual subplot creation
sns.relplot(data=df, x='x', y='y', col='category', col_wrap=3)
# Not: Creating subplots manually for simple faceting
- Leverage Semantic Mappings
Use hue, size, and style to encode additional dimensions:
sns.scatterplot(data=df, x='x', y='y',
                hue='category',      # Color by category
                size='importance',   # Size by continuous variable
                style='type')        # Marker style by type
- Control Statistical Estimation
Many functions compute statistics automatically. Understand and customize:
# Lineplot computes mean and 95% CI by default
sns.lineplot(data=df, x='time', y='value', errorbar='sd')  # Use standard deviation instead
# Barplot computes mean by default
sns.barplot(data=df, x='category', y='value',
            estimator='median',      # Use median instead
            errorbar=('ci', 95))     # Bootstrapped CI
- Combine with Matplotlib
Seaborn integrates seamlessly with matplotlib for fine-tuning:
ax = sns.scatterplot(data=df, x='x', y='y')
ax.set(xlabel='Custom X Label', ylabel='Custom Y Label', title='Custom Title')
ax.axhline(y=0, color='r', linestyle='--')
plt.tight_layout()
- Save High-Quality Figures
g = sns.relplot(data=df, x='x', y='y', col='group')  # relplot returns a FacetGrid, not a Figure
g.savefig('figure.png', dpi=300, bbox_inches='tight')
g.savefig('figure.pdf')  # Vector format for publications
Common Patterns
Exploratory Data Analysis
# Quick overview of all relationships
sns.pairplot(data=df, hue='target', corner=True)
# Distribution exploration
sns.displot(data=df, x='variable', hue='group', kind='kde', fill=True, col='category')
# Correlation analysis (numeric_only=True skips non-numeric columns)
corr = df.corr(numeric_only=True)
sns.heatmap(corr, annot=True, cmap='coolwarm', center=0)
Publication-Quality Figures
sns.set_theme(style='ticks', context='paper', font_scale=1.1)
g = sns.catplot(data=df, x='treatment', y='response', col='cell_line', kind='box', height=3, aspect=1.2)
g.set_axis_labels('Treatment Condition', 'Response (μM)')
g.set_titles('{col_name}')
sns.despine(trim=True)
g.savefig('figure.pdf', dpi=300, bbox_inches='tight')
Complex Multi-Panel Figures
# Using matplotlib subplots with seaborn
fig, axes = plt.subplots(2, 2, figsize=(12, 10))
sns.scatterplot(data=df, x='x1', y='y', hue='group', ax=axes[0, 0])
sns.histplot(data=df, x='x1', hue='group', ax=axes[0, 1])
sns.violinplot(data=df, x='group', y='y', ax=axes[1, 0])
sns.heatmap(df.pivot_table(values='y', index='x1', columns='x2'), ax=axes[1, 1], cmap='viridis')
plt.tight_layout()
Time Series with Confidence Bands
# Lineplot automatically aggregates and shows CI
sns.lineplot(data=timeseries, x='date', y='measurement', hue='sensor', style='location', errorbar='sd')
# For more control
g = sns.relplot(data=timeseries, x='date', y='measurement', col='location', hue='sensor', kind='line', height=4, aspect=1.5, errorbar=('ci', 95))
g.set_axis_labels('Date', 'Measurement (units)')
Troubleshooting
Issue: Legend Outside Plot Area
Figure-level functions place the legend outside the plot area by default. To move it inside:
g = sns.relplot(data=df, x='x', y='y', hue='category')
sns.move_legend(g, 'center right', bbox_to_anchor=(0.9, 0.5))  # Adjust position; public alternative to g._legend (seaborn >= 0.11.2)
Issue: Overlapping Labels
plt.xticks(rotation=45, ha='right')
plt.tight_layout()
Issue: Figure Too Small
# For figure-level functions:
sns.relplot(data=df, x='x', y='y', height=6, aspect=1.5)
# For axes-level functions:
fig, ax = plt.subplots(figsize=(10, 6))
sns.scatterplot(data=df, x='x', y='y', ax=ax)
Issue: Colors Not Distinct Enough
# Use a different palette
sns.set_palette("bright")
# Or specify number of colors
palette = sns.color_palette("husl", n_colors=df['category'].nunique())
sns.scatterplot(data=df, x='x', y='y', hue='category', palette=palette)
Issue: KDE Too Smooth or Jagged
# Adjust bandwidth
sns.kdeplot(data=df, x='x', bw_adjust=0.5)  # Less smooth
sns.kdeplot(data=df, x='x', bw_adjust=2)    # More smooth
Resources
This skill includes reference materials for deeper exploration:
references/
- function_reference.md: Comprehensive listing of all seaborn functions with parameters and examples
- objects_interface.md: Detailed guide to the modern seaborn.objects API
- examples.md: Common use cases and code patterns for different analysis scenarios
Load reference files as needed for detailed function signatures, advanced parameters, or specific examples.