# Bilibili Video Summary Tool

Extract the full content of a Bilibili video — transcript/subtitles, danmaku, comments, and description — then use your own LLM capabilities to produce a deep summary. No external AI API is required (no OpenAI / Gemini key needed).
## Capabilities

| Data Source | Method | Notes |
|---|---|---|
| CC Subtitles | Bilibili API | Fastest, used if available |
| Audio Transcription | whisper.cpp + Vulkan GPU | Automatic fallback when no subtitles |
| Video Description | yt-dlp | Always captured |
| Danmaku (scrolling comments) | yt-dlp | Parsed, analyzed for frequent content |
| Comments | Bilibili Comment API | Hot-sorted, deduplicated, top liked extracted |
## Workflow

When you receive a Bilibili video link and are asked to summarize it, follow these steps:

### Step 1: Extract all data

```bash
python bili-transcript.py "<video_url>"
```
The script automatically:
- Gets video title, uploader, duration, description
- Attempts Bilibili CC subtitles (fastest, used if available)
- Falls back to GPU transcription: download audio → convert to wav → whisper.cpp with Vulkan
- Downloads and analyzes danmaku (scrolling comments)
- Fetches video comments, sorted by likes
Output files are saved to `./bili-output/`:

- `transcript.txt` — full transcript/subtitle text
- `danmaku.json` — danmaku data with statistics
- `comments.json` — comment data with top-liked comments

The JSON output includes preview text, a danmaku summary, and top comments.
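Once Step 1 finishes, the three artifacts can also be loaded programmatically. A minimal sketch — the file names follow the layout above, but the JSON schemas inside are otherwise assumptions:

```python
import json
from pathlib import Path

def load_outputs(out_dir="./bili-output"):
    """Load the three files written by bili-transcript.py."""
    out = Path(out_dir)
    transcript = (out / "transcript.txt").read_text(encoding="utf-8")
    danmaku = json.loads((out / "danmaku.json").read_text(encoding="utf-8"))
    comments = json.loads((out / "comments.json").read_text(encoding="utf-8"))
    return transcript, danmaku, comments
```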
### Step 2: Read the full transcript

The JSON preview truncates at 2000 characters. Read the full file:

```bash
cat ./bili-output/transcript.txt
```
### Step 3: Read danmaku and comments

Review the community response data:

```bash
cat ./bili-output/danmaku.json
cat ./bili-output/comments.json
```
### Step 4: Compose your summary

Use your own LLM capabilities to produce a comprehensive summary. Suggested structure:
- **Video Overview** — title, uploader, duration, transcription source (subtitle / GPU), and key info from the description (project links, update notes, etc.).
- **Core Content** — what the video is about: a fluent paragraph summary of the main narrative.
- **Key Points** — notable arguments, data points, or information worth highlighting.
- **Community Response** (optional) — reactions from danmaku and comments. Skip if the content is insubstantial (spam, trolling, no valuable discussion).
  - Danmaku analysis: look for frequently repeated phrases (community memes/reactions), informative questions, technical discussions, and points of controversy.
  - Comment analysis: look for top-liked opinions, creator interactions, user-reported issues, and technical insights.
- **Assessment** (optional) — content quality, information density, notable strengths or weaknesses.
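The danmaku analysis above largely comes down to frequency counting. A hedged sketch using only the standard library — the input is assumed to be a list of danmaku text strings pulled out of `danmaku.json`:

```python
from collections import Counter

def top_danmaku_phrases(texts, min_count=3, top_n=10):
    """Return the most repeated danmaku lines; heavy repetition
    usually marks community memes or shared reactions worth noting."""
    counts = Counter(t.strip() for t in texts if t.strip())
    return [(text, n) for text, n in counts.most_common(top_n) if n >= min_count]
```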
## Available Actions

```bash
# Video metadata only
python bili-transcript.py "<URL>" --action info

# CC subtitles only (if available)
python bili-transcript.py "<URL>" --action subtitle

# Force GPU transcription (skip subtitle check)
python bili-transcript.py "<URL>" --action transcribe

# Danmaku only
python bili-transcript.py "<URL>" --action danmaku

# Comments only
python bili-transcript.py "<URL>" --action comments

# Custom output directory
python bili-transcript.py "<URL>" --output ./my-output
```
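If you drive the tool from Python rather than a shell, the invocations above can be assembled like this — a sketch, using only the script path and flags documented here:

```python
import subprocess
import sys

def build_command(url, action=None, output=None, script="bili-transcript.py"):
    """Assemble the argv for one bili-transcript.py invocation."""
    cmd = [sys.executable, script, url]
    if action:
        cmd += ["--action", action]
    if output:
        cmd += ["--output", output]
    return cmd

def run_action(url, **kwargs):
    """Run the script, raising if it exits non-zero."""
    return subprocess.run(build_command(url, **kwargs), check=True)
```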
## Environment Variables

| Variable | Purpose |
|---|---|
| `WHISPER_CPP_DIR` | Path to the whisper.cpp directory (containing `whisper-cli`) |
| `WHISPER_MODEL` | Path to the whisper model file (e.g., `ggml-large-v3-turbo.bin`) |
| `BILI_OUTPUT_DIR` | Default output directory (default: `./bili-output`) |
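A sketch of how a caller might resolve these variables — only `BILI_OUTPUT_DIR` has a documented default; leaving the unset whisper paths as `None` is an assumption, not the script's confirmed behavior:

```python
import os

def resolve_config(env=None):
    """Read the tool's environment variables, applying documented defaults."""
    env = os.environ if env is None else env
    return {
        "whisper_cpp_dir": env.get("WHISPER_CPP_DIR"),
        "whisper_model": env.get("WHISPER_MODEL"),
        "output_dir": env.get("BILI_OUTPUT_DIR", "./bili-output"),
    }
```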
## Performance Reference

| Workload | Total Time | Notes |
|---|---|---|
| 5 minutes | ~15s | GPU transcription is fast |
| 12 minutes | ~22s | Download + convert + transcribe |
| 1 hour | ~2-3 min | Depends on audio density |
| Danmaku/Comments | ~5-10s | Depends on comment volume |
## Dependencies

- Python packages: `yt-dlp`, `av` (PyAV)
- Transcription engine: whisper.cpp with Vulkan support (optional, only needed if no CC subtitles)
- Model: ggml-large-v3-turbo.bin (~1.6GB, download separately)
- GPU: Any Vulkan-compatible GPU (NVIDIA, AMD, Intel) — auto-detected
- No external AI API keys required
## Limitations
- Requires internet access to Bilibili
- Some content requires login (paid courses, restricted videos) — may fail
- Danmaku and comment APIs may be rate-limited
- whisper.cpp does not support m4a; script auto-converts via PyAV
- Very long videos (>2 hours) take significant transcription time; try `--action subtitle` first
- Comments are fetched from the first 3 pages (~60 comments); this may not fully cover very hot videos