stock-correlation

Stock Correlation Analysis Skill

Finds and analyzes correlated stocks using historical price data from Yahoo Finance via yfinance. Routes to specialized sub-skills based on user intent.

Important: This is for research and educational purposes only. Not financial advice. yfinance is not affiliated with Yahoo, Inc.

Step 1: Ensure Dependencies Are Available

Current environment status:

!python3 -c "import yfinance, pandas, numpy; print(f'yfinance={yfinance.__version__} pandas={pandas.__version__} numpy={numpy.__version__}')" 2>/dev/null || echo "DEPS_MISSING"

If the output is DEPS_MISSING, install the required packages before running any code:

import subprocess, sys
subprocess.check_call([sys.executable, "-m", "pip", "install", "-q", "yfinance", "pandas", "numpy"])

If all dependencies are already installed, skip the install step and proceed directly.
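The same check can be done from inside Python rather than through the shell. The sketch below is one possible approach, not part of the skill itself: the helper name `missing_packages` is hypothetical, and it relies only on the standard library's `importlib.util.find_spec`.

```python
import importlib.util

# Packages the skill expects, taken from the install command above.
REQUIRED = ("yfinance", "pandas", "numpy")

def missing_packages(names=REQUIRED):
    """Return the names that cannot be found on the import path (hypothetical helper)."""
    return [name for name in names if importlib.util.find_spec(name) is None]
```

An empty list means the install step can be skipped; otherwise the returned names can be passed to the pip command.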

Step 2: Route to the Correct Sub-Skill

Classify the user's request and jump to the matching sub-skill section below.

| User Request | Route To | Examples |
|---|---|---|
| Single ticker, wants to find related stocks | Sub-Skill A: Co-movement Discovery | "what correlates with NVDA", "find stocks related to AMD", "sympathy plays for TSLA" |
| Two or more specific tickers, wants relationship details | Sub-Skill B: Return Correlation | "correlation between AMD and NVDA", "how do LITE and COHR move together", "compare AAPL vs MSFT" |
| Group of tickers, wants structure/grouping | Sub-Skill C: Sector Clustering | "correlation matrix for FAANG", "cluster these semiconductor stocks", "sector peers for AMD" |
| Wants time-varying or conditional correlation | Sub-Skill D: Realized Correlation | "rolling correlation AMD NVDA", "when NVDA drops what else drops", "how has correlation changed" |

If ambiguous, default to Sub-Skill A (Co-movement Discovery) for single tickers, or Sub-Skill B (Return Correlation) for two tickers.
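The routing rules can be sketched as a small function. This is illustrative only: `route_request` and its flags are hypothetical names, and it assumes the request has already been classified (ticker count, whether the user wants time-varying output, whether they want grouping).

```python
def route_request(n_tickers, wants_time_varying=False, wants_grouping=False):
    """Map a classified request to a sub-skill letter (illustrative sketch)."""
    if wants_time_varying:
        return "D"  # rolling or regime-conditional correlation
    if wants_grouping:
        return "C"  # correlation matrix or clustering for a group
    if n_tickers <= 1:
        return "A"  # single ticker: discover related stocks (default when ambiguous)
    return "B"      # named tickers: relationship details (default for two tickers)
```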

Defaults for all sub-skills

| Parameter | Default |
|---|---|
| Lookback period | 1y (1 year) |
| Data interval | 1d (daily) |
| Correlation method | Pearson |
| Minimum correlation threshold | 0.60 |
| Number of results | Top 10 |
| Return type | Daily log returns |
| Rolling window | 60 trading days |
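One way to keep these defaults in a single place is a small frozen dataclass. The class and field names below are hypothetical; only the values come from the table.

```python
from dataclasses import dataclass, replace

@dataclass(frozen=True)
class CorrelationDefaults:
    """Shared defaults for all sub-skills (field names are illustrative)."""
    period: str = "1y"             # lookback period
    interval: str = "1d"           # data interval
    method: str = "pearson"        # correlation method
    min_correlation: float = 0.60  # minimum correlation threshold
    top_n: int = 10                # number of results
    return_type: str = "log"       # daily log returns
    rolling_window: int = 60       # trading days

DEFAULTS = CorrelationDefaults()

# Override a single value without mutating the shared defaults.
two_year = replace(DEFAULTS, period="2y")
```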

Sub-Skill A: Co-movement Discovery

Goal: Given a single ticker, find stocks that move with it.

A1: Build the peer universe

You need 15-30 candidates. Do not use hardcoded ticker lists — build the universe dynamically at runtime. See references/sector_universes.md for the full implementation. The approach:

Screen for stocks in the same industry as the target using yf.screen() with a yf.EquityQuery

Broaden to sector if the industry screen returns fewer than 10 peers
Add thematic/adjacent industries — read the target's longBusinessSummary and screen 1-2 related industries (e.g., a semiconductor company → also screen semiconductor equipment)
Combine, deduplicate, remove target ticker
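The final combine step can be sketched without any network calls. The helper below is hypothetical (`merge_universe` is not a yfinance function). It assumes the screening steps have already produced plain lists of ticker symbols, and it normalizes symbols to upper case as a simplifying assumption.

```python
def merge_universe(target, *candidate_lists, max_size=30):
    """Combine candidate lists, dedupe in first-seen order, drop the target."""
    target = target.strip().upper()
    seen = set()
    merged = []
    for candidates in candidate_lists:
        for ticker in candidates:
            symbol = ticker.strip().upper()
            if symbol and symbol != target and symbol not in seen:
                seen.add(symbol)
                merged.append(symbol)
    # Cap at the upper end of the 15-30 candidate range.
    return merged[:max_size]
```

Passing the industry list first keeps the closest peers at the front if the cap truncates the result.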

A2: Compute correlations

import yfinance as yf
import pandas as pd
import numpy as np

def discover_comovement(target_ticker, peer_tickers, period="1y"):
    all_tickers = [target_ticker] + [t for t in peer_tickers if t != target_ticker]
    data = yf.download(all_tickers, period=period, auto_adjust=True, progress=False)

    # Extract close prices: yf.download returns MultiIndex (Price, Ticker) columns
    closes = data["Close"].dropna(axis=1, thresh=max(60, len(data) // 2))

    # Log returns
    returns = np.log(closes / closes.shift(1)).dropna()
    corr_series = returns.corr()[target_ticker].drop(target_ticker, errors="ignore")

    # Rank by absolute correlation
    ranked = corr_series.abs().sort_values(ascending=False)

    result = pd.DataFrame({
        "Ticker": ranked.index,
        "Correlation": [round(corr_series[t], 4) for t in ranked.index],
    })
    return result, returns

A3: Present results

Show a ranked table with company names and sectors (fetch the name via yf.Ticker(t).info.get("shortName")):

| Rank | Ticker | Company | Correlation | Why linked |
|---|---|---|---|---|
| 1 | AMD | Advanced Micro Devices | 0.82 | Same industry (GPU/CPU) |
| 2 | AVGO | Broadcom | 0.78 | AI infrastructure peer |

Include:

Top 10 positively correlated stocks
Any notable negatively correlated stocks (potential hedges)
Brief explanation of why each might be linked (sector, supply chain, customer overlap)
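The ranking from A2 is not filtered, so the 0.60 threshold and top-10 cut from the defaults still need to be applied before presenting. A minimal sketch, assuming a pandas Series of signed correlations indexed by ticker; `select_results` is a hypothetical helper, and `XYZ` and `HEDGE` are placeholder symbols with made-up values.

```python
import pandas as pd

def select_results(corr_series, threshold=0.60, top_n=10):
    """Split signed correlations into top positive peers and notable negatives."""
    positives = corr_series[corr_series >= threshold].sort_values(ascending=False).head(top_n)
    negatives = corr_series[corr_series <= -threshold].sort_values().head(top_n)
    return positives, negatives

# Illustrative values only, not market data.
corr = pd.Series({"AMD": 0.82, "AVGO": 0.78, "XYZ": 0.31, "HEDGE": -0.65})
top, hedges = select_results(corr)
```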

Sub-Skill B: Return Correlation

Goal: Deep-dive into the relationship between two (or a few) specific tickers.

B1: Download and compute

import yfinance as yf
import pandas as pd
import numpy as np

def return_correlation(ticker_a, ticker_b, period="1y"):
    data = yf.download([ticker_a, ticker_b], period=period, auto_adjust=True, progress=False)
    closes = data["Close"][[ticker_a, ticker_b]].dropna()

    returns = np.log(closes / closes.shift(1)).dropna()
    corr = returns[ticker_a].corr(returns[ticker_b])

    # Beta: how much does B move per unit move of A
    cov_matrix = returns.cov()
    beta = cov_matrix.loc[ticker_b, ticker_a] / cov_matrix.loc[ticker_a, ticker_a]

    # R-squared
    r_squared = corr ** 2

    # Rolling 60-day correlation for stability
    rolling_corr = returns[ticker_a].rolling(60).corr(returns[ticker_b])

    # Spread (log price ratio) for mean-reversion
    spread = np.log(closes[ticker_a] / closes[ticker_b])
    spread_z = (spread - spread.mean()) / spread.std()

    return {
        "correlation": round(corr, 4),
        "beta": round(beta, 4),
        "r_squared": round(r_squared, 4),
        "rolling_corr_mean": round(rolling_corr.mean(), 4),
        "rolling_corr_std": round(rolling_corr.std(), 4),
        "rolling_corr_min": round(rolling_corr.min(), 4),
        "rolling_corr_max": round(rolling_corr.max(), 4),
        "spread_z_current": round(spread_z.iloc[-1], 4),
        "observations": len(returns),
    }
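To see how the beta and R-squared calculations behave, the sketch below runs the same formulas on synthetic returns instead of downloaded prices. Everything here is an assumption for illustration: the series are random draws, and B is constructed with a true beta of 1.5 relative to A.

```python
import numpy as np

rng = np.random.default_rng(0)
ret_a = rng.normal(0.0, 0.02, size=500)                 # synthetic daily log returns for A
ret_b = 1.5 * ret_a + rng.normal(0.0, 0.005, size=500)  # B built with a true beta of 1.5

corr = np.corrcoef(ret_a, ret_b)[0, 1]

# Beta of B on A: cov(A, B) / var(A), both with ddof=1
beta = np.cov(ret_a, ret_b)[0, 1] / np.var(ret_a, ddof=1)
r_squared = corr ** 2

# Equivalent form: beta = corr * std(B) / std(A)
beta_from_corr = corr * ret_b.std(ddof=1) / ret_a.std(ddof=1)
```

The estimated beta lands near 1.5, and the two forms agree, which shows that beta carries the relative volatility of the two series while correlation does not.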

B2: Present results

Show a summary card:

| Metric | Value |
|---|---|
| Pearson Correlation | 0.82 |
| Beta (B vs A) | 1.15 |
| R-squared | 0.67 |
| Rolling Corr (60d avg) | 0.80 |
| Rolling Corr Range | [0.55, 0.94] |
| Rolling Corr Std Dev | 0.08 |
| Spread Z-Score (current) | +1.2 |
| Observations | 250 |

Interpretation guide:

Correlation > 0.80: Strong co-movement — these stocks are tightly linked
Correlation 0.50–0.80: Moderate — shared sector drivers but independent factors too
Correlation < 0.50: Weak — limited co-movement despite possible sector overlap
High rolling std: Unstable relationship — correlation varies significantly over time
|Spread Z| > 2: Unusual divergence from the historical relationship
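The thresholds in this guide can be applied mechanically. The sketch below is a hypothetical helper (`interpret` is not part of the skill). It treats a correlation of exactly 0.80 or 0.50 as moderate, which is an assumption, since the guide does not say which band the boundaries belong to.

```python
def interpret(correlation, spread_z):
    """Return (strength label, whether the spread has diverged unusually)."""
    if correlation > 0.80:
        strength = "strong"
    elif correlation >= 0.50:
        strength = "moderate"
    else:
        strength = "weak"
    # Divergence is two-sided: a z-score beyond +2 or below -2.
    diverged = abs(spread_z) > 2
    return strength, diverged
```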

Sub-Skill C: Sector Clustering

Goal: Given a group of tickers, show the full correlation structure and identify clusters.

C1: Build the correlation matrix

import yfinance as yf
import pandas as pd
import numpy as np

def sector_clustering(tickers, period="1y"):
    data = yf.download(tickers, period=period, auto_adjust=True, progress=False)

    # yf.download returns MultiIndex (Price, Ticker) columns
    closes = data["Close"].dropna(axis=1, thresh=max(60, len(data) // 2))
    returns = np.log(closes / closes.shift(1)).dropna()
    corr_matrix = returns.corr()

    # Hierarchical clustering order
    from scipy.cluster.hierarchy import linkage, leaves_list
    from scipy.spatial.distance import squareform

    # Distance = 1 - |correlation|. Work on a writable NumPy copy, because
    # DataFrame.values can be read-only when pandas copy-on-write is enabled.
    dist_matrix = (1 - corr_matrix.abs()).to_numpy(copy=True)
    np.fill_diagonal(dist_matrix, 0)
    condensed = squareform(dist_matrix, checks=False)
    linkage_matrix = linkage(condensed, method="ward")
    order = leaves_list(linkage_matrix)
    ordered_tickers = [corr_matrix.columns[i] for i in order]

    # Reorder matrix
    clustered = corr_matrix.loc[ordered_tickers, ordered_tickers]

    return clustered, returns

Note: if scipy is not available, fall back to sorting by average correlation instead of hierarchical clustering.
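A minimal sketch of that fallback, assuming a square correlation matrix with ones on the diagonal and at least two tickers. The function name `order_by_average_correlation` is hypothetical, and the sample matrix uses made-up values.

```python
import pandas as pd

def order_by_average_correlation(corr_matrix):
    """Reorder a correlation matrix by each ticker's mean correlation to the others."""
    n = len(corr_matrix)
    # Remove the self-correlation of 1.0 on the diagonal before averaging.
    avg = (corr_matrix.sum(axis=1) - 1.0) / (n - 1)
    order = avg.sort_values(ascending=False).index
    return corr_matrix.loc[order, order]

# Illustrative values only.
sample = pd.DataFrame(
    [[1.0, 0.9, 0.2], [0.9, 1.0, 0.4], [0.2, 0.4, 1.0]],
    index=["A", "B", "C"], columns=["A", "B", "C"],
)
ordered = order_by_average_correlation(sample)
```

This puts the most connected tickers first and likely outliers last, but unlike hierarchical clustering it does not group tickers into blocks.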

C2: Present results

Full correlation matrix — formatted as a table. For more than 8 tickers, show as a heatmap description or highlight only the strongest/weakest pairs.

Identified clusters — group tickers that have high intra-group correlation:

Cluster 1: [NVDA, AMD, AVGO] — avg intra-correlation 0.82
Cluster 2: [AAPL, MSFT] — avg intra-correlation 0.75

Outliers — tickers with low average correlation to the group (potential diversifiers).

Strongest pairs — top 5 highest-correlation pairs in the matrix.

Weakest pairs — top 5 lowest/negative-correlation pairs (hedging candidates).
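The strongest and weakest pairs can be read from the upper triangle of the matrix so that each pair appears once. This is a sketch under the assumption that the matrix is square and symmetric. `rank_pairs` is a hypothetical helper, and the sample values are made up.

```python
import numpy as np
import pandas as pd

def rank_pairs(corr_matrix, n=5):
    """Return the n strongest and n weakest unique pairs from a correlation matrix."""
    rows, cols = np.triu_indices(len(corr_matrix), k=1)  # upper triangle, no diagonal
    labels = corr_matrix.index.to_numpy()
    pairs = pd.DataFrame({
        "a": labels[rows],
        "b": labels[cols],
        "correlation": corr_matrix.to_numpy()[rows, cols],
    })
    ranked = pairs.sort_values("correlation", ascending=False, ignore_index=True)
    strongest = ranked.head(n)
    weakest = ranked.tail(n).iloc[::-1].reset_index(drop=True)  # lowest first
    return strongest, weakest

# Illustrative values only.
sample = pd.DataFrame(
    [[1.0, 0.9, 0.2], [0.9, 1.0, 0.4], [0.2, 0.4, 1.0]],
    index=["A", "B", "C"], columns=["A", "B", "C"],
)
strongest, weakest = rank_pairs(sample, n=1)
```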

Sub-Skill D: Realized Correlation

Goal: Show how correlation changes over time and under different market conditions.

D1: Rolling correlation

import yfinance as yf
import pandas as pd
import numpy as np

def realized_correlation(ticker_a, ticker_b, period="2y", windows=(20, 60, 120)):
    data = yf.download([ticker_a, ticker_b], period=period, auto_adjust=True, progress=False)
    closes = data["Close"][[ticker_a, ticker_b]].dropna()

    returns = np.log(closes / closes.shift(1)).dropna()

    rolling = {}
    for w in windows:
        rolling[f"{w}d"] = returns[ticker_a].rolling(w).corr(returns[ticker_b])

    return rolling, returns

D2: Regime-conditional correlation

import pandas as pd

def regime_correlation(returns, ticker_a, ticker_b, condition_ticker=None):
    """Compare correlation across up/down/volatile regimes."""
    if condition_ticker is None:
        condition_ticker = ticker_a

    ret = returns[condition_ticker]

    regimes = {
        "All Days": pd.Series(True, index=returns.index),
        "Up Days (target > 0)": ret > 0,
        "Down Days (target < 0)": ret < 0,
        "High Vol (top 25%)": ret.abs() > ret.abs().quantile(0.75),
        "Low Vol (bottom 25%)": ret.abs() < ret.abs().quantile(0.25),
        "Large Drawdown (< -2%)": ret < -0.02,
    }

    results = {}
    for name, mask in regimes.items():
        subset = returns[mask]
        # Skip regimes with too few days for a meaningful estimate
        if len(subset) >= 20:
            results[name] = {
                "correlation": round(subset[ticker_a].corr(subset[ticker_b]), 4),
                "days": int(mask.sum()),
            }

    return results

D3: Present results

Rolling correlation summary table:

| Window | Current | Mean | Min | Max | Std |
|---|---|---|---|---|---|
| 20-day | 0.88 | 0.76 | 0.32 | 0.95 | 0.12 |
| 60-day | 0.82 | 0.78 | 0.55 | 0.92 | 0.08 |
| 120-day | 0.80 | 0.79 | 0.68 | 0.88 | 0.05 |
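A table like this can be built from the dict of rolling series returned in D1. The sketch below assumes that shape (label mapped to a pandas Series with leading NaN values). `summarize_rolling` is a hypothetical helper, and the sample series is made up.

```python
import pandas as pd

def summarize_rolling(rolling):
    """Build a Current/Mean/Min/Max/Std table from a dict of rolling-correlation Series."""
    rows = {}
    for label, series in rolling.items():
        valid = series.dropna()
        if valid.empty:
            continue  # window is longer than the available history
        rows[label] = {
            "Current": valid.iloc[-1],
            "Mean": valid.mean(),
            "Min": valid.min(),
            "Max": valid.max(),
            "Std": valid.std(),
        }
    return pd.DataFrame.from_dict(rows, orient="index").round(2)

# Illustrative values only; the leading NaN mimics the rolling warm-up period.
table = summarize_rolling({"20d": pd.Series([float("nan"), 0.5, 0.7, 0.9])})
```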

Regime correlation table:

| Regime | Correlation | Days |
|---|---|---|
| All Days | 0.82 | 250 |
| Up Days | 0.75 | 132 |
| Down Days | 0.87 | 118 |
| High Vol (top 25%) | 0.90 | 63 |
| Large Drawdown (< -2%) | 0.93 | 28 |

Key insight: Highlight whether correlation increases during sell-offs (very common — "correlations go to 1 in a crisis"). This is critical for risk management.

Trend: Is correlation trending higher or lower recently vs. its historical average?
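One simple way to answer the trend question is to compare the mean of the most recent rolling values with the mean over the full history. The sketch below is an assumption-laden illustration: `correlation_trend` is a hypothetical helper, and the 20-observation recent window and 0.05 tolerance are arbitrary choices, not values from the skill.

```python
import pandas as pd

def correlation_trend(rolling_corr, recent=20, tolerance=0.05):
    """Label recent rolling correlation as higher, lower, or in line vs. its history."""
    valid = rolling_corr.dropna()
    diff = float(valid.tail(recent).mean() - valid.mean())
    if diff > tolerance:
        label = "higher"
    elif diff < -tolerance:
        label = "lower"
    else:
        label = "in line"
    return label, round(diff, 4)

# Illustrative series: a long stretch at 0.5 followed by a recent stretch at 0.9.
label, diff = correlation_trend(pd.Series([0.5] * 100 + [0.9] * 20))
```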

Step 3: Respond to the User

After running the appropriate sub-skill, present results clearly:

Always include

The lookback period and data interval used
The number of observations (trading days)
Any tickers dropped due to insufficient data
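These items can be collected alongside the analysis. A minimal sketch, assuming the caller knows which tickers were requested and which survived the data filter (for example, the columns of the `closes` frame). `build_run_summary` is a hypothetical helper.

```python
def build_run_summary(requested, kept, period, interval, observations):
    """Collect the metadata that should accompany every result."""
    kept_set = set(kept)
    return {
        "period": period,
        "interval": interval,
        "observations": observations,
        # Preserve the order in which the tickers were requested.
        "dropped_tickers": [t for t in requested if t not in kept_set],
    }
```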

Always caveat

Correlation is not causation — co-movement does not imply a causal link
Past correlation does not guarantee future correlation — regimes shift
Short lookback windows produce noisy estimates; longer windows smooth but may miss regime changes

Practical applications (mention when relevant)

Sympathy plays: Stocks likely to follow a peer's earnings/news move
Pair trading: High-correlation pairs where the spread has diverged from its mean
Portfolio diversification: Finding low-correlation assets to reduce risk
Hedging: Identifying inversely correlated instruments
Sector rotation: Understanding which sectors move together
Risk management: Correlation spikes during stress — diversification may fail when needed most

Important: Never recommend specific trades. Present data and let the user draw conclusions.

Reference Files

references/sector_universes.md — Dynamic peer universe construction using yfinance Screener API

Read the reference file when you need to build a peer universe for a given ticker.
