PoliBERT Sentiment Analysis
Political sentiment analysis skill powered by PoliBERTweet, a transformer model trained on 83 million political tweets (Georgetown University, LREC 2022).
Overview
This skill provides political sentiment analysis capabilities using a specialized NLP model trained on political content. It can analyze sentiment toward political candidates, issues, and events from various data sources including Reddit, local files, or direct text input.
Features
- Sentiment Classification: Support / Oppose / Neutral toward political targets
- Stance Detection: Issue-specific stance analysis (e.g., pro/anti immigration)
- Entity Targeting: Analyze sentiment toward specific politicians
- Confidence Scoring: Probability scores for each classification
- Reddit Data Integration: Auto-fetch political discussions from Reddit (free, read-only)
- Batch Processing: Analyze multiple texts from files or stdin
- JSON Output: Machine-readable results for integration with other tools
When to Use
Use this skill when you need to:
- Analyze public sentiment toward political candidates or figures
- Track political opinion trends on social media
- Complement prediction market data with social sentiment
- Monitor political discourse around specific issues
- Aggregate opinions from Reddit political communities
Model Information
- Model: PoliBERTweet
- Architecture: RoBERTa (Robustly Optimized BERT)
- Training Data: 83 million political tweets (2016-2020 US elections)
- HuggingFace Hub: kornosk/polibertweet-political-twitter-roberta-mlm
- Model Size: ~500MB
- Academic Paper: LREC 2022
- Institution: Georgetown University DataLab
Installation
Prerequisites
# Python 3.9 or higher
python --version
# Install core dependencies
pip install "transformers>=4.18.0" "torch>=1.10.2"
# Optional: Reddit data fetching
pip install "praw>=7.8.1"
First Run
On first execution, the model will be automatically downloaded from HuggingFace Hub (~500MB):
python polibert_sentiment.py --text "Test"
Data Sources
| Source | Method | Cost | Data Quality | Use Case |
|---|---|---|---|---|
| Reddit | --reddit | Free | High | Real-time political discussions |
| Local File | --file | - | User-dependent | Batch analysis of collected data |
| Stdin | --stdin | - | User-dependent | Pipeline integration |
| Direct Text | --text | - | User-dependent | Quick testing and single analysis |
Reddit Data
Default Subreddits: r/politics, r/Conservative, r/democrats, r/Republican, r/PoliticalDiscussion
Note: Reddit data fetching uses read-only mode (no API credentials required). Rate limits apply.
Usage Examples
1. Single Text Analysis
python polibert_sentiment.py --text "J.D. Vance is the future of the Republican party"
Output:
Text: J.D. Vance is the future of the Republican party
Sentiment: SUPPORT (78.3% confidence)
2. Reddit Sentiment Analysis
# Analyze J.D. Vance sentiment from Reddit
python polibert_sentiment.py --candidate "J.D. Vance" --reddit --limit 50
# Analyze specific query
python polibert_sentiment.py --query "2028 election" --reddit --limit 100
# Custom subreddits
python polibert_sentiment.py --query "climate policy" --reddit --subreddits politics,environment
3. Batch File Analysis
# File with one text per line
python polibert_sentiment.py --candidate "Trump" --file tweets.txt
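Since batch mode reads one text per line, a text that contains its own line breaks would be split into several inputs. The following is a minimal sketch of preparing such a file. The helper name `write_batch_file` is hypothetical and not part of the skill, and collapsing internal whitespace is an assumption about what the `--file` reader expects.

```python
from pathlib import Path


def write_batch_file(texts, path):
    """Write one text per line, skipping blanks.

    Hypothetical helper: the skill only documents that --file expects
    one text per line; the whitespace handling here is an assumption.
    """
    lines = []
    for text in texts:
        flat = " ".join(text.split())  # collapse newlines, tabs, repeated spaces
        if flat:
            lines.append(flat)
    # Terminate every line with a newline; an empty input yields an empty file.
    Path(path).write_text("".join(line + "\n" for line in lines), encoding="utf-8")
    return len(lines)
```

The resulting file can then be passed to the command above through `--file`.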
4. JSON Output (for integration)
python polibert_sentiment.py --candidate "Biden" --reddit --json
Output:
{
"candidate": "Biden",
"total_analyzed": 47,
"sentiment_breakdown": {
"support": {"count": 15, "percentage": 31.9},
"oppose": {"count": 22, "percentage": 46.8},
"neutral": {"count": 10, "percentage": 21.3}
},
"net_sentiment": -14.9,
"average_confidence": 72.4
}
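In the sample above, `net_sentiment` equals the support percentage minus the oppose percentage (31.9 - 46.8 = -14.9). The sketch below parses that sample and rechecks the figure. The formula is inferred from the sample numbers, not taken from the skill's source code, so treat it as an assumption.

```python
import json

# The sample report shown above, embedded so the sketch is self-contained.
SAMPLE = """
{
  "candidate": "Biden",
  "total_analyzed": 47,
  "sentiment_breakdown": {
    "support": {"count": 15, "percentage": 31.9},
    "oppose": {"count": 22, "percentage": 46.8},
    "neutral": {"count": 10, "percentage": 21.3}
  },
  "net_sentiment": -14.9,
  "average_confidence": 72.4
}
"""


def net_sentiment(report):
    """Support % minus oppose %, rounded to one decimal (assumed formula)."""
    breakdown = report["sentiment_breakdown"]
    diff = breakdown["support"]["percentage"] - breakdown["oppose"]["percentage"]
    return round(diff, 1)


report = json.loads(SAMPLE)
print(net_sentiment(report) == report["net_sentiment"])
```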
Integration with Other Skills
With Polymarket
Polymarket (market odds) → PoliBERT (social sentiment) → Prediction synthesis
18.6% (Vance)              35% Support                   Combined signal
With Prediction Skill
Use PoliBERT sentiment as an input factor in the BRACE forecasting framework:
- Base rate: Historical election patterns
- Sentiment: Social media trends (via PoliBERT)
- Market: Prediction market odds (via Polymarket)
Example Workflow
# 1. Get market data
python polymarket.py search "presidential election winner 2028" --json
# 2. Get social sentiment
python polibert_sentiment.py --candidate "J.D. Vance" --reddit --limit 100 --json
# 3. Synthesize in prediction framework
# (Use prediction skill to combine signals)
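Step 2 of the workflow can also be driven from Python. This is a minimal sketch that only uses the flags documented above. The function names are hypothetical, and it assumes that with `--json` the script writes nothing but the JSON report to stdout.

```python
import json
import subprocess
import sys


def build_sentiment_command(candidate, limit=100):
    """Assemble the documented CLI call as an argument list (no shell quoting needed)."""
    return [
        sys.executable, "polibert_sentiment.py",
        "--candidate", candidate,
        "--reddit",
        "--limit", str(limit),
        "--json",
    ]


def fetch_sentiment(candidate, limit=100):
    """Run the script and parse its output.

    Assumption: stdout contains only the JSON report when --json is set.
    """
    result = subprocess.run(
        build_sentiment_command(candidate, limit),
        capture_output=True,
        text=True,
        check=True,  # raise CalledProcessError on a non-zero exit code
    )
    return json.loads(result.stdout)
```

Calling `fetch_sentiment("J.D. Vance")` from the directory containing `polibert_sentiment.py` would return the report as a dictionary that can be handed to the prediction skill.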
Output Format
Human-Readable Output
📊 Sentiment Analysis: J.D. Vance
Source: Reddit | Total analyzed: 47
Support: 31.9% (15)
Oppose: 46.8% (22)
Neutral: 21.3% (10)
Net Sentiment: -14.9%
Avg Confidence: 72.4%
JSON Output Structure
{
"candidate": "string",
"total_analyzed": "integer",
"sentiment_breakdown": {
"support": {"count": "integer", "percentage": "float"},
"oppose": {"count": "integer", "percentage": "float"},
"neutral": {"count": "integer", "percentage": "float"}
},
"average_confidence": "float",
"net_sentiment": "float",
"sample_results": [
{"text": "string", "sentiment": "string", "confidence": "float"}
]
}
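A consumer of this output may want to check a report against the structure above before using it. The sketch below is one way to do that with plain Python. `validate_report` is a hypothetical helper, and it treats `sample_results` as optional because the earlier sample output omits it (an assumption).

```python
NUMBER = (int, float)  # a float field may arrive as an integer such as 0


def validate_report(report):
    """Return a list of problems; an empty list means the shape matches."""
    problems = []

    def expect(container, key, types, prefix=""):
        if key not in container:
            problems.append(f"missing {prefix}{key}")
        elif not isinstance(container[key], types):
            problems.append(f"wrong type for {prefix}{key}")

    expect(report, "candidate", str)
    expect(report, "total_analyzed", int)
    expect(report, "average_confidence", NUMBER)
    expect(report, "net_sentiment", NUMBER)

    breakdown = report.get("sentiment_breakdown")
    if not isinstance(breakdown, dict):
        problems.append("missing sentiment_breakdown")
        breakdown = {}
    for label in ("support", "oppose", "neutral"):
        bucket = breakdown.get(label)
        if not isinstance(bucket, dict):
            problems.append(f"missing sentiment_breakdown.{label}")
            continue
        expect(bucket, "count", int, f"sentiment_breakdown.{label}.")
        expect(bucket, "percentage", NUMBER, f"sentiment_breakdown.{label}.")

    # Assumption: sample_results is optional.
    for i, item in enumerate(report.get("sample_results", [])):
        expect(item, "text", str, f"sample_results[{i}].")
        expect(item, "sentiment", str, f"sample_results[{i}].")
        expect(item, "confidence", NUMBER, f"sample_results[{i}].")
    return problems
```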
Limitations and Considerations
Model Limitations
- Training Data: The model was trained on 2016-2020 tweets and may not capture 2024-2028 linguistic patterns
- Context Sensitivity: May miss sarcasm, irony, or cultural references
- Temporal Drift: Political language evolves; model accuracy may degrade over time
- Confidence Calibration: Confidence scores are model outputs, not calibrated probabilities
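Because the confidence scores are uncalibrated, one cautious way to use them is as a relative filter rather than as probabilities. The sketch below recounts labels using only the entries of `sample_results` at or above a cutoff. The helper name, the cutoff of 60, and the 0-100 scale (taken from the sample outputs in this document) are all assumptions.

```python
from collections import Counter

LABELS = ("support", "oppose", "neutral")


def confident_breakdown(sample_results, min_confidence=60.0):
    """Count sentiment labels among results at or above the cutoff.

    Hypothetical helper; the cutoff is arbitrary and should be tuned
    on labeled data rather than read as a probability.
    """
    kept = [r for r in sample_results if r["confidence"] >= min_confidence]
    # Lower-case the label because its casing in the JSON output is not documented.
    counts = Counter(r["sentiment"].lower() for r in kept)
    return {label: counts.get(label, 0) for label in LABELS}, len(kept)
```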
Data Limitations
- Reddit Sample Bias: Reddit users skew younger, more educated, more liberal than general population
- Selection Bias: Active Reddit users are not representative of voters
- Timing: Social sentiment can shift rapidly; snapshot may not represent election day mood
- Volume: Topics behind low-liquidity prediction markets may generate too few social media discussions for a reliable sample
Best Practices
- Use as one input among many, not sole prediction basis
- Combine with prediction markets, polling data, economic indicators
- Track sentiment trends over time, not single snapshots
- Adjust for platform demographics (Reddit ≠ Twitter ≠ general population)
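Tracking a trend rather than a single snapshot can be as simple as smoothing a series of `net_sentiment` values collected on successive runs. The sketch below uses a trailing moving average. The function is illustrative and not part of the skill, and the snapshot values are made-up placeholders, not real measurements.

```python
def moving_average(values, window=3):
    """Trailing moving average; early points average over what is available."""
    if window < 1:
        raise ValueError("window must be at least 1")
    smoothed = []
    for i in range(len(values)):
        chunk = values[max(0, i - window + 1): i + 1]
        smoothed.append(sum(chunk) / len(chunk))
    return smoothed


# Placeholder net_sentiment readings from four hypothetical runs.
snapshots = [-14.9, -12.0, -9.5, -11.0]
print(moving_average(snapshots, window=2))
```

A larger window gives a smoother line at the cost of reacting more slowly to real shifts in opinion.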
Citation
If you use this skill or PoliBERTweet model in research, please cite:
@inproceedings{kawintiranon2022polibertweet,
title={{P}oli{BERT}weet: A Pre-trained Language Model for Analyzing Political Content on {T}witter},
author={Kawintiranon, Kornraphop and Singh, Lisa},
booktitle={Proceedings of the Language Resources and Evaluation Conference (LREC)},
year={2022},
pages={7360--7367},
publisher={European Language Resources Association}
}
License
- Skill Code: MIT License
- PoliBERTweet Model: Subject to HuggingFace Hub and original paper terms
Feedback and Contributions
- Report issues: Create GitHub issue
- Model questions: See PoliBERTweet repository
Related Skills
- polymarket-unified - Prediction market data for political forecasting
- prediction - BRACE framework for calibrated forecasting
- ai-model-team - Multi-model prediction system for financial markets
Version History
- v1.0.0 (2026-04-17): Initial release
  - PoliBERTweet model integration
  - Reddit data source support
  - Sentiment analysis pipeline
  - JSON and human-readable output formats
  - Batch processing capabilities