Stock Correlation Analysis Skill
Finds and analyzes correlated stocks using historical price data from Yahoo Finance via yfinance. Routes to specialized sub-skills based on user intent.
Important: This is for research and educational purposes only. Not financial advice. yfinance is not affiliated with Yahoo, Inc.
Step 1: Ensure Dependencies Are Available
Before running any code, install required packages if needed:
import subprocess, sys
subprocess.check_call([sys.executable, "-m", "pip", "install", "-q", "yfinance", "pandas", "numpy"])
Always include this at the top of your script.
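If reinstalling on every run is too slow, one option is to check which packages are already importable and install only the rest. The sketch below assumes the pip package name matches the import name, which holds for yfinance, pandas, and numpy; `missing_packages` and `ensure_installed` are hypothetical helper names, not part of any library.

```python
import importlib.util
import subprocess
import sys


def missing_packages(names):
    """Return the subset of `names` that cannot currently be imported."""
    return [n for n in names if importlib.util.find_spec(n) is None]


def ensure_installed(names):
    """Install only the packages that are not importable yet (hypothetical helper)."""
    missing = missing_packages(names)
    if missing:
        subprocess.check_call(
            [sys.executable, "-m", "pip", "install", "-q", *missing]
        )
    return missing
```

Calling `ensure_installed(["yfinance", "pandas", "numpy"])` at the top of a script would then skip pip when everything is already present.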
Step 2: Route to the Correct Sub-Skill
Classify the user's request and jump to the matching sub-skill section below.
| User Request | Route To | Examples |
|---|---|---|
| Single ticker, wants to find related stocks | Sub-Skill A: Co-movement Discovery | "what correlates with NVDA", "find stocks related to AMD", "sympathy plays for TSLA" |
| Two or more specific tickers, wants relationship details | Sub-Skill B: Return Correlation | "correlation between AMD and NVDA", "how do LITE and COHR move together", "compare AAPL vs MSFT" |
| Group of tickers, wants structure/grouping | Sub-Skill C: Sector Clustering | "correlation matrix for FAANG", "cluster these semiconductor stocks", "sector peers for AMD" |
| Wants time-varying or conditional correlation | Sub-Skill D: Realized Correlation | "rolling correlation AMD NVDA", "when NVDA drops what else drops", "how has correlation changed" |
If ambiguous, default to Sub-Skill A (Co-movement Discovery) for single tickers, or Sub-Skill B (Return Correlation) for two tickers.
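The routing table can be approximated with a small keyword-and-count check. This is only a sketch: `route_request` is a hypothetical helper, the keyword lists are illustrative guesses drawn from the example phrases, and sending three or more tickers without structure wording to Sub-Skill C is an assumption (the Sub-Skill B code is pairwise).

```python
def route_request(tickers, text=""):
    """Pick a sub-skill letter (A-D) from the ticker count and request wording."""
    lowered = text.lower()
    # Time-varying or conditional wording wins regardless of ticker count.
    if any(k in lowered for k in ("rolling", "over time", "changed", "when ", "regime")):
        return "D"
    # Structure or grouping wording routes to clustering.
    if any(k in lowered for k in ("matrix", "cluster", "peers")):
        return "C"
    if len(tickers) <= 1:
        return "A"  # default for a single ticker
    if len(tickers) == 2:
        return "B"  # default for two tickers
    return "C"      # assumption: larger groups go to clustering


print(route_request(["NVDA"], "what correlates with NVDA"))
print(route_request(["AMD", "NVDA"], "rolling correlation AMD NVDA"))
```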
Defaults for all sub-skills
| Parameter | Default |
|---|---|
| Lookback period | 1y (1 year) |
| Data interval | 1d (daily) |
| Correlation method | Pearson |
| Minimum correlation threshold | 0.60 |
| Number of results | Top 10 |
| Return type | Daily log returns |
| Rolling window | 60 trading days |
Sub-Skill A: Co-movement Discovery
Goal: Given a single ticker, find stocks that move with it.
A1: Build the peer universe
You need 15-30 candidates. Do not use hardcoded ticker lists — build the universe dynamically at runtime. See references/sector_universes.md for the full implementation. The approach:
- Screen same-industry stocks using yf.screen() + yf.EquityQuery to find stocks in the same industry as the target
- Broaden to sector if the industry screen returns fewer than 10 peers
- Add thematic/adjacent industries: read the target's longBusinessSummary and screen 1-2 related industries (e.g., a semiconductor company → also screen semiconductor equipment)
- Combine, deduplicate, remove target ticker
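The final combine step is independent of how the screens are run, so it can be sketched offline. `combine_universe` is a hypothetical helper, the 30-ticker cap follows the "15-30 candidates" guidance above, and the input lists are illustrative stand-ins for screen output rather than real results.

```python
def combine_universe(target, *candidate_lists, max_size=30):
    """Merge candidate lists in priority order, dropping duplicates and the target."""
    target = target.upper()
    seen = set()
    universe = []
    for candidates in candidate_lists:
        for ticker in candidates:
            symbol = ticker.strip().upper()
            if symbol and symbol != target and symbol not in seen:
                seen.add(symbol)
                universe.append(symbol)
    return universe[:max_size]


industry = ["AMD", "NVDA", "AVGO"]   # illustrative same-industry screen output
adjacent = ["avgo", "AMAT", "LRCX"]  # illustrative adjacent-industry screen output
print(combine_universe("NVDA", industry, adjacent))
# → ['AMD', 'AVGO', 'AMAT', 'LRCX']
```

Passing the industry list first keeps the closest peers when the cap truncates the result.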
A2: Compute correlations
import yfinance as yf
import pandas as pd
import numpy as np
def discover_comovement(target_ticker, peer_tickers, period="1y"):
    all_tickers = [target_ticker] + [t for t in peer_tickers if t != target_ticker]
    data = yf.download(all_tickers, period=period, auto_adjust=True, progress=False)
    # Extract close prices: yf.download returns MultiIndex (Price, Ticker) columns.
    # Keep only tickers with at least max(60, half the rows) non-missing closes.
    closes = data["Close"].dropna(axis=1, thresh=max(60, len(data) // 2))
    # The target itself can be dropped by the filter above; fail with a clear message
    if target_ticker not in closes.columns:
        raise ValueError(f"Not enough price history for {target_ticker}")
    # Log returns
    returns = np.log(closes / closes.shift(1)).dropna()
    corr_series = returns.corr()[target_ticker].drop(target_ticker, errors="ignore")
    # Rank by absolute correlation
    ranked = corr_series.abs().sort_values(ascending=False)
    result = pd.DataFrame({
        "Ticker": ranked.index,
        "Correlation": [round(corr_series[t], 4) for t in ranked.index],
    })
    return result, returns
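The function above ranks every peer but does not apply the default 0.60 threshold or the top-10 cutoff. One way to do that on the returned (Ticker, Correlation) frame is sketched below; `split_by_threshold` is a hypothetical helper and the demo values are made up to show the shape of the data.

```python
import pandas as pd


def split_by_threshold(result, threshold=0.60, top_n=10):
    """Split a (Ticker, Correlation) frame into top positive rows and negative rows."""
    positive = (
        result[result["Correlation"] >= threshold]
        .sort_values("Correlation", ascending=False)
        .head(top_n)
        .reset_index(drop=True)
    )
    negative = (
        result[result["Correlation"] < 0]
        .sort_values("Correlation")
        .reset_index(drop=True)
    )
    return positive, negative


# Made-up values, only to exercise the function.
demo = pd.DataFrame({
    "Ticker": ["AAA", "BBB", "CCC", "DDD"],
    "Correlation": [0.82, -0.35, 0.41, 0.66],
})
top, hedges = split_by_threshold(demo)
print(top)
print(hedges)
```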
A3: Present results
Show a ranked table with company names and sectors (fetch via yf.Ticker(t).info.get("shortName") and yf.Ticker(t).info.get("sector")):
| Rank | Ticker | Company | Correlation | Why linked |
|---|---|---|---|---|
| 1 | AMD | Advanced Micro Devices | 0.82 | Same industry — GPU/CPU |
| 2 | AVGO | Broadcom | 0.78 | AI infrastructure peer |
Include:
- Top 10 positively correlated stocks
- Any notable negatively correlated stocks (potential hedges)
- Brief explanation of why each might be linked (sector, supply chain, customer overlap)
Sub-Skill B: Return Correlation
Goal: Deep-dive into the relationship between two (or a few) specific tickers.
B1: Download and compute
import yfinance as yf
import pandas as pd
import numpy as np
def return_correlation(ticker_a, ticker_b, period="1y"):
    data = yf.download([ticker_a, ticker_b], period=period, auto_adjust=True, progress=False)
    closes = data["Close"][[ticker_a, ticker_b]].dropna()
    returns = np.log(closes / closes.shift(1)).dropna()
    corr = returns[ticker_a].corr(returns[ticker_b])
    # Beta: how much does B move per unit move of A
    cov_matrix = returns.cov()
    beta = cov_matrix.loc[ticker_b, ticker_a] / cov_matrix.loc[ticker_a, ticker_a]
    # R-squared
    r_squared = corr ** 2
    # Rolling 60-day correlation for stability
    rolling_corr = returns[ticker_a].rolling(60).corr(returns[ticker_b])
    # Spread (log price ratio) for mean-reversion
    spread = np.log(closes[ticker_a] / closes[ticker_b])
    spread_z = (spread - spread.mean()) / spread.std()
    return {
        "correlation": round(corr, 4),
        "beta": round(beta, 4),
        "r_squared": round(r_squared, 4),
        "rolling_corr_mean": round(rolling_corr.mean(), 4),
        "rolling_corr_std": round(rolling_corr.std(), 4),
        "rolling_corr_min": round(rolling_corr.min(), 4),
        "rolling_corr_max": round(rolling_corr.max(), 4),
        "spread_z_current": round(spread_z.iloc[-1], 4),
        "observations": len(returns),
    }
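The beta computed above is the least-squares slope of B's returns on A's returns, which also equals the correlation times the ratio of B's volatility to A's. The sketch below checks that identity on synthetic returns; the 1.2 slope and the noise levels are arbitrary choices for illustration, not market data.

```python
import numpy as np
import pandas as pd

rng = np.random.default_rng(0)
a = rng.normal(0, 0.02, 500)            # synthetic "A" log returns
b = 1.2 * a + rng.normal(0, 0.01, 500)  # synthetic "B": 1.2x A plus independent noise
returns = pd.DataFrame({"A": a, "B": b})

corr = returns["A"].corr(returns["B"])
cov = returns.cov()
beta = cov.loc["B", "A"] / cov.loc["A", "A"]

# Same slope expressed through correlation and the ratio of volatilities.
beta_from_corr = corr * returns["B"].std() / returns["A"].std()
# Ordinary least-squares slope of B on A, as an independent cross-check.
slope = np.polyfit(returns["A"].to_numpy(), returns["B"].to_numpy(), 1)[0]

print(round(beta, 4), round(beta_from_corr, 4), round(slope, 4))
```

Because beta scales with relative volatility, two pairs with the same correlation can have very different betas.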
B2: Present results
Show a summary card:
| Metric | Value |
|---|---|
| Pearson Correlation | 0.82 |
| Beta (B vs A) | 1.15 |
| R-squared | 0.67 |
| Rolling Corr (60d avg) | 0.80 |
| Rolling Corr Range | [0.55, 0.94] |
| Rolling Corr Std Dev | 0.08 |
| Spread Z-Score (current) | +1.2 |
| Observations | 250 |
Interpretation guide:
- Correlation > 0.80: Strong co-movement — these stocks are tightly linked
- Correlation 0.50–0.80: Moderate — shared sector drivers but independent factors too
- Correlation < 0.50: Weak — limited co-movement despite possible sector overlap
- High rolling std: Unstable relationship — correlation varies significantly over time
- |Spread Z| > 2: Unusual divergence from the historical relationship
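The guide above can be turned into a small labelling helper. This is a sketch with assumptions: `interpret` is a hypothetical function, the correlation cut-offs are applied literally to the signed value as written in the guide, and the 0.15 cut-off for a "high" rolling standard deviation is an illustrative choice because the guide gives no number.

```python
def interpret(correlation, rolling_std, spread_z, unstable_std=0.15):
    """Translate summary metrics into the guide's qualitative labels."""
    if correlation > 0.80:
        notes = ["strong co-movement"]
    elif correlation >= 0.50:
        notes = ["moderate co-movement"]
    else:
        notes = ["weak co-movement"]
    # Assumption: 0.15 is an arbitrary cut-off for an unstable rolling correlation.
    if rolling_std > unstable_std:
        notes.append("unstable relationship")
    if abs(spread_z) > 2:
        notes.append("unusual spread divergence")
    return notes


print(interpret(0.82, 0.08, 1.2))
# → ['strong co-movement']
```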
Sub-Skill C: Sector Clustering
Goal: Given a group of tickers, show the full correlation structure and identify clusters.
C1: Build the correlation matrix
import yfinance as yf
import pandas as pd
import numpy as np
def sector_clustering(tickers, period="1y"):
    # Imported inside the function so a missing scipy only affects this sub-skill
    from scipy.cluster.hierarchy import linkage, leaves_list
    from scipy.spatial.distance import squareform
    data = yf.download(tickers, period=period, auto_adjust=True, progress=False)
    # yf.download returns MultiIndex (Price, Ticker) columns
    closes = data["Close"].dropna(axis=1, thresh=max(60, len(data) // 2))
    returns = np.log(closes / closes.shift(1)).dropna()
    corr_matrix = returns.corr()
    # Distance 1 - |corr| as a fresh NumPy array, so the diagonal can be edited in place
    dist = 1 - corr_matrix.abs().to_numpy()
    np.fill_diagonal(dist, 0)
    condensed = squareform(dist, checks=False)
    # Average linkage: Ward assumes Euclidean distances, which 1 - |corr| is not
    linkage_matrix = linkage(condensed, method="average")
    # Hierarchical clustering order
    order = leaves_list(linkage_matrix)
    ordered_tickers = [corr_matrix.columns[i] for i in order]
    # Reorder matrix
    clustered = corr_matrix.loc[ordered_tickers, ordered_tickers]
    return clustered, returns
Note: if scipy is not available, fall back to sorting by average correlation instead of hierarchical clustering.
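A minimal sketch of that fallback, assuming a square correlation matrix with identical row and column labels. `order_by_average_correlation` is a hypothetical helper and the 3x3 matrix holds made-up values.

```python
import numpy as np
import pandas as pd


def order_by_average_correlation(corr_matrix):
    """Reorder a correlation matrix by each ticker's mean correlation to the others."""
    values = corr_matrix.to_numpy(copy=True)
    np.fill_diagonal(values, np.nan)  # ignore each ticker's self-correlation of 1.0
    avg = pd.Series(np.nanmean(values, axis=1), index=corr_matrix.index)
    ordered = avg.sort_values(ascending=False).index
    return corr_matrix.loc[ordered, ordered]


# Made-up correlations, only to exercise the function.
labels = ["A", "B", "C"]
corr = pd.DataFrame(
    [[1.0, 0.9, 0.2], [0.9, 1.0, 0.4], [0.2, 0.4, 1.0]],
    index=labels, columns=labels,
)
reordered = order_by_average_correlation(corr)
print(list(reordered.index))
# → ['B', 'A', 'C']
```

Unlike hierarchical clustering, this ordering does not place similar tickers next to each other; it only moves the most connected names to the top.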
C2: Present results
- Full correlation matrix: formatted as a table. For more than 8 tickers, show as a heatmap description or highlight only the strongest/weakest pairs.
- Identified clusters: group tickers that have high intra-group correlation:
  - Cluster 1: [NVDA, AMD, AVGO], avg intra-correlation 0.82
  - Cluster 2: [AAPL, MSFT], avg intra-correlation 0.75
- Outliers: tickers with low average correlation to the group (potential diversifiers).
- Strongest pairs: top 5 highest-correlation pairs in the matrix.
- Weakest pairs: top 5 lowest/negative-correlation pairs (hedging candidates).
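The strongest and weakest pairs can be read off the upper triangle of the correlation matrix. The sketch below assumes rows and columns share the same order; `rank_pairs` is a hypothetical helper and the demo matrix holds made-up values.

```python
import numpy as np
import pandas as pd


def rank_pairs(corr_matrix, n=5):
    """Return the n highest and n lowest off-diagonal pairs of a correlation matrix."""
    tickers = list(corr_matrix.columns)
    rows, cols = np.triu_indices(len(tickers), k=1)  # upper triangle, diagonal excluded
    pairs = pd.DataFrame({
        "First": [tickers[i] for i in rows],
        "Second": [tickers[j] for j in cols],
        "Correlation": corr_matrix.to_numpy()[rows, cols],
    })
    ranked = pairs.sort_values("Correlation", ascending=False).reset_index(drop=True)
    strongest = ranked.head(n)
    weakest = ranked.tail(n).iloc[::-1].reset_index(drop=True)  # lowest first
    return strongest, weakest


# Made-up correlations, only to exercise the function.
labels = ["A", "B", "C"]
corr = pd.DataFrame(
    [[1.0, 0.9, 0.2], [0.9, 1.0, 0.4], [0.2, 0.4, 1.0]],
    index=labels, columns=labels,
)
strongest, weakest = rank_pairs(corr, n=1)
print(strongest)
print(weakest)
```

With fewer than 2n distinct pairs the two outputs overlap, so small groups may need a smaller n.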
Sub-Skill D: Realized Correlation
Goal: Show how correlation changes over time and under different market conditions.
D1: Rolling correlation
import yfinance as yf
import pandas as pd
import numpy as np
def realized_correlation(ticker_a, ticker_b, period="2y", windows=(20, 60, 120)):
    # windows is a tuple to avoid a shared mutable default argument
    data = yf.download([ticker_a, ticker_b], period=period, auto_adjust=True, progress=False)
    closes = data["Close"][[ticker_a, ticker_b]].dropna()
    returns = np.log(closes / closes.shift(1)).dropna()
    rolling = {}
    for w in windows:
        # Keys look like "20d", "60d", "120d"
        rolling[f"{w}d"] = returns[ticker_a].rolling(w).corr(returns[ticker_b])
    return rolling, returns
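The dictionary of rolling series returned above can be condensed into one summary row per window. A minimal sketch, assuming the same `"{w}d"` keys; `summarize_rolling` is a hypothetical helper and the returns are synthetic.

```python
import numpy as np
import pandas as pd


def summarize_rolling(rolling):
    """Build one summary row per window from a dict of rolling-correlation Series."""
    rows = []
    for label, series in rolling.items():
        valid = series.dropna()  # the first (window - 1) values are NaN
        if valid.empty:
            continue
        rows.append({
            "Window": label,
            "Current": round(float(valid.iloc[-1]), 2),
            "Mean": round(float(valid.mean()), 2),
            "Min": round(float(valid.min()), 2),
            "Max": round(float(valid.max()), 2),
            "Std": round(float(valid.std()), 2),
        })
    return pd.DataFrame(rows)


# Synthetic returns, used only to exercise the function.
rng = np.random.default_rng(1)
a = pd.Series(rng.normal(0, 0.02, 300))
b = 0.8 * a + pd.Series(rng.normal(0, 0.01, 300))
rolling = {f"{w}d": a.rolling(w).corr(b) for w in (20, 60, 120)}
summary = summarize_rolling(rolling)
print(summary.to_string(index=False))
```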
D2: Regime-conditional correlation
def regime_correlation(returns, ticker_a, ticker_b, condition_ticker=None):
    """Compare correlation across up/down/volatile regimes."""
    if condition_ticker is None:
        condition_ticker = ticker_a
    ret = returns[condition_ticker]
    regimes = {
        "All Days": pd.Series(True, index=returns.index),
        "Up Days (target > 0)": ret > 0,
        "Down Days (target < 0)": ret < 0,
        "High Vol (top 25%)": ret.abs() > ret.abs().quantile(0.75),
        "Low Vol (bottom 25%)": ret.abs() < ret.abs().quantile(0.25),
        "Large Drawdown (< -2%)": ret < -0.02,
    }
    results = {}
    for name, mask in regimes.items():
        subset = returns[mask]
        if len(subset) >= 20:
            results[name] = {
                "correlation": round(subset[ticker_a].corr(subset[ticker_b]), 4),
                "days": int(mask.sum()),
            }
    return results
D3: Present results
- Rolling correlation summary table:
| Window | Current | Mean | Min | Max | Std |
|---|---|---|---|---|---|
| 20-day | 0.88 | 0.76 | 0.32 | 0.95 | 0.12 |
| 60-day | 0.82 | 0.78 | 0.55 | 0.92 | 0.08 |
| 120-day | 0.80 | 0.79 | 0.68 | 0.88 | 0.05 |
- Regime correlation table:
| Regime | Correlation | Days |
|---|---|---|
| All Days | 0.82 | 250 |
| Up Days | 0.75 | 132 |
| Down Days | 0.87 | 118 |
| High Vol (top 25%) | 0.90 | 63 |
| Large Drawdown (< -2%) | 0.93 | 28 |
- Key insight: Highlight whether correlation increases during sell-offs (very common: "correlations go to 1 in a crisis"). This is critical for risk management.
- Trend: Is correlation trending higher or lower recently vs. its historical average?
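The sell-off check can be read directly from the regime results. The sketch below assumes the key names used by regime_correlation in D2; `selloff_effect` is a hypothetical helper, and the demo reuses the illustrative up-day and down-day values from the sample table.

```python
def selloff_effect(results,
                   up_key="Up Days (target > 0)",
                   down_key="Down Days (target < 0)"):
    """Return down-day minus up-day correlation, or None if a regime is missing."""
    if up_key not in results or down_key not in results:
        return None  # a regime with fewer than 20 days is omitted from the results
    return round(results[down_key]["correlation"] - results[up_key]["correlation"], 4)


# Illustrative values taken from the sample regime table above.
demo_results = {
    "Up Days (target > 0)": {"correlation": 0.75, "days": 132},
    "Down Days (target < 0)": {"correlation": 0.87, "days": 118},
}
print(selloff_effect(demo_results))
# → 0.12
```

A positive value means the pair moved together more tightly on down days than on up days.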
Step 3: Respond to the User
After running the appropriate sub-skill, present results clearly:
Always include
- The lookback period and data interval used
- The number of observations (trading days)
- Any tickers dropped due to insufficient data
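These three items can be assembled into a single footer line. A minimal sketch: `run_summary` is a hypothetical helper that takes the requested tickers and the columns that survived the missing-data filter (for example `closes.columns`), and the tickers in the demo are placeholders.

```python
def run_summary(requested, kept_columns, observations, period="1y", interval="1d"):
    """Format the lookback, interval, observation count, and any dropped tickers."""
    kept = set(kept_columns)
    dropped = [t for t in requested if t not in kept]
    line = f"Lookback: {period}, interval: {interval}, observations: {observations}"
    if dropped:
        line += f", dropped for insufficient data: {', '.join(dropped)}"
    return line


print(run_summary(["AAA", "BBB", "CCC"], ["AAA", "CCC"], 250))
# → Lookback: 1y, interval: 1d, observations: 250, dropped for insufficient data: BBB
```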
Always caveat
- Correlation is not causation — co-movement does not imply a causal link
- Past correlation does not guarantee future correlation — regimes shift
- Short lookback windows produce noisy estimates; longer windows smooth but may miss regime changes
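The noise in short windows can be made concrete with an approximate confidence interval from the Fisher z-transform. This is a sketch under strong assumptions (independent, roughly bivariate-normal returns); daily returns have fat tails and volatility clustering, so real uncertainty is likely wider. `correlation_interval` is a hypothetical helper.

```python
import math


def correlation_interval(r, n, z_crit=1.96):
    """Approximate 95% confidence interval for a Pearson correlation via Fisher's z."""
    if n <= 3:
        raise ValueError("need more than 3 observations")
    z = math.atanh(r)                        # Fisher transform of the correlation
    half_width = z_crit / math.sqrt(n - 3)   # standard error of z is 1 / sqrt(n - 3)
    return math.tanh(z - half_width), math.tanh(z + half_width)


for n in (20, 60, 250):
    low, high = correlation_interval(0.80, n)
    print(n, round(low, 2), round(high, 2))
```

For a measured correlation of 0.80, a 20-day window gives an interval of roughly 0.55 to 0.92, while 250 days narrows it to roughly 0.75 to 0.84.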
Practical applications (mention when relevant)
- Sympathy plays: Stocks likely to follow a peer's earnings/news move
- Pair trading: High-correlation pairs where the spread has diverged from its mean
- Portfolio diversification: Finding low-correlation assets to reduce risk
- Hedging: Identifying inversely correlated instruments
- Sector rotation: Understanding which sectors move together
- Risk management: Correlation spikes during stress — diversification may fail when needed most
Important: Never recommend specific trades. Present data and let the user draw conclusions.
Reference Files
references/sector_universes.md: Dynamic peer universe construction using the yfinance Screener API
Read the reference file when you need to build a peer universe for a given ticker.