File to Markdown Converter


Safety Notice

This listing is from the official public ClawHub registry. Review SKILL.md and referenced scripts before running.


Install skill "File to Markdown Converter" with this command: npx skills add alaminrifat/file-to-markdown

File to Markdown — Skill

Overview

Convert files into clean, structured, AI-ready Markdown using the markdown.new API powered by Cloudflare Workers AI toMarkdown().

Supports 20+ formats including documents, spreadsheets, images, and structured data.

No authentication required (500 requests/day per IP).

When to Use This Skill

Use this skill whenever you need to:

Extract text from files for LLM processing
Convert PDFs or Office files into Markdown
Normalize data into structured text
Process uploaded user files
Scrape webpage content into Markdown
Convert images into AI-generated descriptions + content

Common AI workflows:

RAG ingestion pipelines
Knowledge base creation
Document summarization
Dataset extraction
Spreadsheet analysis
OCR-like extraction from images

Supported Formats

Documents

.pdf
.docx
.odt

Spreadsheets

.xlsx
.xls
.xlsm
.xlsb
.et
.ods
.numbers

Images

.jpg
.jpeg
.png
.webp
.svg

Text & Structured Data

.txt
.md
.csv
.json
.xml
.html
.htm

Notes:

Image conversion uses AI object detection + summarization.
HTML URL conversion uses a web page pipeline.
Uploaded HTML uses Workers AI conversion.
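Before spending a request on a file, an agent can check its extension against the lists above. The following is a minimal sketch; `classify_format` and the category names are hypothetical helpers, not part of the markdown.new API, and the check looks only at the file extension.

```python
from pathlib import PurePosixPath
from urllib.parse import urlparse

# Extension lists copied from the "Supported Formats" section above.
SUPPORTED_FORMATS = {
    "document": {".pdf", ".docx", ".odt"},
    "spreadsheet": {".xlsx", ".xls", ".xlsm", ".xlsb", ".et", ".ods", ".numbers"},
    "image": {".jpg", ".jpeg", ".png", ".webp", ".svg"},
    "text": {".txt", ".md", ".csv", ".json", ".xml", ".html", ".htm"},
}


def classify_format(source: str):
    """Return the format category for a path or URL, or None if unsupported.

    Hypothetical helper: it inspects the extension only and never calls the API.
    """
    # urlparse separates out any query string, so "a.pdf?x=1" resolves to ".pdf".
    path = urlparse(source).path or source
    suffix = PurePosixPath(path).suffix.lower()
    for category, extensions in SUPPORTED_FORMATS.items():
        if suffix in extensions:
            return category
    return None
```

For example, `classify_format("report.PDF")` yields `"document"`, while an unlisted extension such as `.zip` yields `None`, which an agent could treat as a signal to skip the conversion.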

API Base URL

https://markdown.new

Endpoints

1️⃣ Convert Remote File (Simple GET)

Returns plain Markdown text.

GET /:file-url

Example:

curl -s "https://markdown.new/https://example.com/report.pdf"

2️⃣ Convert Remote File (JSON Response)

Returns metadata + Markdown.

GET /:file-url?format=json

Example:

curl -s "https://markdown.new/https://example.com/report.pdf?format=json"
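The two GET variants differ only in the `?format=json` suffix, so the request URL can be built by one small function. This is a sketch using only the Python standard library; `build_get_url` and `fetch_markdown` are illustrative names, and how the service treats a target URL that already carries its own query string is not stated here, so that case is left unhandled.

```python
from urllib.request import urlopen

BASE_URL = "https://markdown.new"


def build_get_url(file_url: str, as_json: bool = False) -> str:
    """Build the GET conversion URL by appending the target URL to the base.

    Mirrors the two curl examples above. Target URLs with their own query
    string are an open question and are not treated specially.
    """
    url = f"{BASE_URL}/{file_url}"
    if as_json:
        url += "?format=json"
    return url


def fetch_markdown(file_url: str, timeout: float = 30.0) -> str:
    """Fetch plain Markdown for a public file URL (performs a network call)."""
    with urlopen(build_get_url(file_url), timeout=timeout) as response:
        return response.read().decode("utf-8")
```

Calling `fetch_markdown("https://example.com/report.pdf")` would issue the same request as the first curl example.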

3️⃣ Convert Remote File via POST

Use this when you want a structured JSON response.

POST /
Content-Type: application/json

Body:

{
  "url": "https://example.com/report.pdf"
}

Example:

curl -s https://markdown.new/ \
  -H "Content-Type: application/json" \
  -d '{"url": "https://example.com/report.pdf"}'

4️⃣ Upload Local File

Use this when the file is not publicly accessible.

POST /convert
multipart/form-data

Example:

curl -s https://markdown.new/convert \
  -F "file=@document.pdf"
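The same upload can be sketched in Python with the standard library alone. The field name `file` comes from the curl example above; `build_multipart` and `upload_file` are hypothetical helper names, and the sketch assumes the filename contains no double quotes or line breaks.

```python
import json
import mimetypes
import uuid
from pathlib import Path
from urllib.request import Request, urlopen

CONVERT_URL = "https://markdown.new/convert"


def build_multipart(field: str, filename: str, data: bytes):
    """Encode one file as a multipart/form-data body.

    Returns a (body, content_type) pair ready to send in a POST request.
    """
    boundary = uuid.uuid4().hex
    mime = mimetypes.guess_type(filename)[0] or "application/octet-stream"
    head = (
        f"--{boundary}\r\n"
        f'Content-Disposition: form-data; name="{field}"; filename="{filename}"\r\n'
        f"Content-Type: {mime}\r\n\r\n"
    ).encode("utf-8")
    tail = f"\r\n--{boundary}--\r\n".encode("utf-8")
    return head + data + tail, f"multipart/form-data; boundary={boundary}"


def upload_file(path: str, timeout: float = 60.0) -> dict:
    """Upload a local file to /convert and return the parsed JSON response."""
    file_path = Path(path)
    body, content_type = build_multipart("file", file_path.name, file_path.read_bytes())
    request = Request(
        CONVERT_URL,
        data=body,
        method="POST",
        headers={"Content-Type": content_type},
    )
    with urlopen(request, timeout=timeout) as response:
        return json.load(response)
```

Keeping the body encoding separate from the network call makes the encoding step easy to check without touching the daily request quota.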

Response Formats

URL Conversion Response

{
  "success": true,
  "url": "https://example.com/report.pdf",
  "title": "Quarterly Report",
  "content": "# Quarterly Report\n\n...",
  "method": "Workers AI (file)",
  "duration_ms": 1200,
  "tokens": 850
}

Upload Conversion Response

{
  "success": true,
  "data": {
    "title": "Q4 Report",
    "content": "# Q4 Report\n\n...",
    "filename": "report.xlsx",
    "file_type": ".xlsx",
    "tokens": 1250,
    "processing_time_ms": 320
  }
}
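The two response shapes differ in nesting (`data` wrapper) and in the timing field name (`duration_ms` versus `processing_time_ms`). A pipeline that handles both may want to flatten them into one record. The sketch below assumes only the fields shown in the samples above; `normalize_response` and the output keys `elapsed_ms` and `source` are hypothetical names.

```python
def normalize_response(payload: dict) -> dict:
    """Flatten either response shape into one common record.

    Upload responses nest their fields under "data" and report
    "processing_time_ms"; URL responses are flat and report "duration_ms".
    """
    if not payload.get("success"):
        raise ValueError("conversion did not report success")
    body = payload.get("data", payload)
    return {
        "title": body.get("title"),
        "content": body.get("content", ""),
        "tokens": body.get("tokens"),
        "elapsed_ms": body.get("duration_ms", body.get("processing_time_ms")),
        "source": body.get("url") or body.get("filename"),
    }
```

Downstream code can then read `record["content"]` and `record["tokens"]` without caring which endpoint produced the result.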

Best Practices for AI Agents

Prefer GET for Simple Workflows

Use:

GET /:url

When:

You only need Markdown text
Speed is important
No metadata required

Prefer POST for Structured Pipelines

Use POST when:

Metadata is needed
Token counts are required
Monitoring or logging is in place
You are building automation workflows

File Upload Strategy

Use /convert only if:

File is local
File is private
File requires authentication to access

Otherwise always prefer URL conversion.

Error Handling Strategy

Agents should:

1. Check that the response contains "success": true.
2. Retry once on network failure.
3. Validate that the content length is greater than 0.
4. Fall back to an alternate extraction method if needed.
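The four checks above can be wrapped around any conversion call. This is a minimal sketch under stated assumptions: `convert` and `fallback` are caller-supplied callables (hypothetical, not part of the API), and a network failure is assumed to surface as `OSError`, which is the base class of `urllib.error.URLError`.

```python
import time


def convert_with_checks(convert, fallback=None, retry_delay=1.0):
    """Apply the success, retry, content-length, and fallback checks.

    `convert` is a zero-argument callable returning the parsed JSON response
    and raising OSError on network failure. `fallback` is an optional
    zero-argument callable returning Markdown from another extractor.
    """
    payload = None
    for attempt in range(2):  # one initial try plus a single retry
        try:
            payload = convert()
            break
        except OSError:
            if attempt == 0:
                time.sleep(retry_delay)
    if payload and payload.get("success"):
        # Upload responses nest fields under "data"; URL responses are flat.
        body = payload.get("data", payload)
        content = body.get("content") or ""
        if len(content) > 0:
            return content
    if fallback is not None:
        return fallback()
    raise RuntimeError("conversion failed and no fallback was provided")
```

An agent might pass a closure over its HTTP call as `convert` and a local text extractor as `fallback`.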

Rate Limits

500 requests/day per IP without an API key
No signup required

Agents should:

Cache results when possible
Avoid duplicate conversions
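One way to follow both recommendations is a small in-memory cache keyed by a hash of the source. This is a sketch only; `ConversionCache` is a hypothetical class, the `convert` callable is supplied by the caller, and a real agent would likely persist results to disk so they survive restarts.

```python
import hashlib


class ConversionCache:
    """In-memory cache keyed by a hash of the source URL or file bytes."""

    def __init__(self, convert):
        self._convert = convert  # callable taking the source, returning Markdown
        self._store = {}
        self.api_calls = 0  # conversions actually performed

    @staticmethod
    def _key(source) -> str:
        raw = source if isinstance(source, bytes) else source.encode("utf-8")
        return hashlib.sha256(raw).hexdigest()

    def get(self, source):
        """Return cached Markdown, converting only on the first request."""
        key = self._key(source)
        if key not in self._store:
            self.api_calls += 1
            self._store[key] = self._convert(source)
        return self._store[key]
```

Comparing `api_calls` against the 500 requests/day limit gives a rough view of how much of the quota a session has used.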

Integration Examples

JavaScript (Node.js)

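// Requires Node.js 18+ (built-in fetch); run as an ES module for top-level await.
const res = await fetch("https://markdown.new/", {
  method: "POST",
  headers: { "Content-Type": "application/json" },
  body: JSON.stringify({
    url: "https://example.com/file.pdf"
  })
});

// Fail early on HTTP errors before parsing the body.
if (!res.ok) throw new Error(`Conversion failed: HTTP ${res.status}`);

const data = await res.json();
console.log(data.content);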

Python

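import requests

res = requests.post(
    "https://markdown.new/",
    json={"url": "https://example.com/file.pdf"},
    timeout=30,  # avoid hanging indefinitely on a slow conversion
)
res.raise_for_status()  # surface HTTP errors before parsing the body

data = res.json()
print(data["content"])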

Agent Decision Tree

If user provides:

| Input Type      | Action                      |
| --------------- | --------------------------- |
| Public file URL | Use GET or POST             |
| Local file      | Use POST /convert           |
| Image           | Convert then summarize      |
| Spreadsheet     | Convert then analyze        |
| Webpage         | Convert the page URL (HTML) |
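The table above can be expressed as a small routing function. This is a sketch under stated assumptions: `plan` is a hypothetical helper, inputs with an `http` or `https` scheme are treated as public URLs, everything else is treated as a local path, and the follow-up step is chosen from the file extension alone.

```python
from pathlib import PurePosixPath
from urllib.parse import urlparse

# Extension sets copied from the "Supported Formats" section.
IMAGE_EXTS = {".jpg", ".jpeg", ".png", ".webp", ".svg"}
SHEET_EXTS = {".xlsx", ".xls", ".xlsm", ".xlsb", ".et", ".ods", ".numbers"}


def plan(source: str) -> dict:
    """Pick the request type and follow-up step for one input."""
    parsed = urlparse(source)
    is_url = parsed.scheme in ("http", "https") and bool(parsed.netloc)
    suffix = PurePosixPath(parsed.path if is_url else source).suffix.lower()
    if suffix in IMAGE_EXTS:
        follow_up = "summarize"
    elif suffix in SHEET_EXTS:
        follow_up = "analyze"
    else:
        follow_up = None
    return {
        "request": "GET /:file-url" if is_url else "POST /convert",
        "follow_up": follow_up,
    }
```

A URL that points at a private or authenticated resource would still need to be downloaded and sent through `/convert`, which this sketch does not detect.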

Output Expectations

The Markdown should be:

Clean
Structured
AI-friendly
Minimal noise
Ready for LLM ingestion

Limitations

Complex PDF layouts may lose formatting
Large spreadsheets may be truncated
Images rely on AI interpretation accuracy
Token limits may apply

Summary

This skill provides a universal file-to-Markdown conversion layer for AI systems with:

No authentication
Simple HTTP interface
Multi-format support
Structured output
Fast processing

Ideal for document ingestion, RAG pipelines, and automation agents.

Source Transparency

This detail page is rendered from real SKILL.md content. Trust labels are metadata-based hints, not a safety guarantee.
