mcp-server-evaluations

Test MCP servers for quality and reliability. Verify tool functionality, test error handling, generate tests, and assess response quality with curl as the only required dependency (the quick checklist also uses jq to parse responses). Use this when validating MCP server implementations, testing OpenAPI-to-MCP conversions, or assessing API tool quality.

Install skill "mcp-server-evaluations" with this command: npx skills add mcp-com-ai/mcp-server-evaluations-skills/mcp-com-ai-mcp-server-evaluations-skills-mcp-server-evaluations

MCP Server Evaluations Skill

Systematically evaluate MCP servers to ensure they function correctly, handle errors gracefully, and meet quality standards.

Workflow

Phase 1: Environment Verification

  1. Verify MCP server is running
    curl -s -o /dev/null -w "%{http_code}\n" http://localhost:3030/health
    # Expected: 200
    
    curl -s -X POST http://localhost:3030/mcp \
      -H "Content-Type: application/json" \
      -d '{"jsonrpc":"2.0","id":1,"method":"ping"}'
    # Expected: {"jsonrpc":"2.0","id":1,"result":{}}
    

Phase 2: Tool Discovery

  1. List all available tools

    curl -X POST http://localhost:3030/mcp \
      -H "Content-Type: application/json" \
      -d '{"jsonrpc":"2.0","method":"tools/list","id":1}'
    
  2. Verify tool completeness

    • All OpenAPI operations exposed as tools
    • Tool names follow consistent convention (e.g., getUsers, createOrder)
    • Descriptions are clear and actionable
    • Required vs optional parameters clearly marked
    • Parameter types match OpenAPI schema
  3. Document discovered tools: create an inventory of tools for systematic testing.
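
The inventory from step 3 can be built directly from the tools/list response. This is a minimal sketch, assuming jq is installed and that each tool entry carries a name, an optional description, and an inputSchema with a required array. The function name tool_inventory is illustrative and not part of the skill.

```shell
#!/usr/bin/env bash
# Read a tools/list JSON-RPC response on stdin and print one
# tab-separated line per tool: name, required parameters, description status.
tool_inventory() {
  jq -r '
    .result.tools[]
    | [ .name,
        ((.inputSchema.required // []) | join(",")),
        (if ((.description // "") | length) > 0
           then "described" else "NO DESCRIPTION" end)
      ]
    | @tsv'
}

# Usage (assumes the server from Phase 1 is running):
#   curl -s -X POST http://localhost:3030/mcp \
#     -H "Content-Type: application/json" \
#     -d '{"jsonrpc":"2.0","method":"tools/list","id":1}' | tool_inventory
```

Tools flagged NO DESCRIPTION, or with an empty required column where the OpenAPI schema lists required parameters, are the first candidates for closer review.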

Phase 3: Functional Testing

For each discovered tool:

  1. Basic functionality test

    curl -X POST http://localhost:3030/mcp \
      -H "Content-Type: application/json" \
      -d '{
        "jsonrpc": "2.0",
        "method": "tools/call",
        "params": {
          "name": "<tool_name>",
          "arguments": { <valid_arguments> }
        },
        "id": 2
      }'
    
  2. Verify response structure

    • Response contains expected data
    • Data types match schema
    • No unexpected null values
    • Pagination works (if applicable)
  3. Error handling test (call with invalid or missing arguments):

    curl -X POST http://localhost:3030/mcp \
      -H "Content-Type: application/json" \
      -d '{
        "jsonrpc": "2.0",
        "method": "tools/call",
        "params": {
          "name": "<tool_name>",
          "arguments": {}
        },
        "id": 3
      }'
    
  4. Verify error response quality

    • Error message is actionable
    • Missing required parameters identified
    • HTTP status codes propagated correctly
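
The error checks above can be partly automated by classifying each response. This is a sketch, assuming the server follows the usual MCP convention in which protocol-level failures arrive as a JSON-RPC error object and tool execution failures arrive as a result with isError set to true. The function names classify_response and error_message are illustrative.

```shell
#!/usr/bin/env bash
# Classify a tools/call response read from stdin.
# Prints one of: ok, tool_error, protocol_error, malformed.
classify_response() {
  local verdict
  verdict="$(jq -r '
    if type != "object" then "malformed"
    elif has("error") then "protocol_error"
    elif (.result | type) == "object" and .result.isError == true then "tool_error"
    elif has("result") then "ok"
    else "malformed" end' 2>/dev/null)" || verdict=""
  # Empty output means jq could not parse the input at all.
  echo "${verdict:-malformed}"
}

# Print the human-readable error text, if any, so it can be reviewed
# for actionability (for example, whether it names the missing parameter).
error_message() {
  jq -r '.error.message // (.result.content[0].text // "")'
}
```

A response classified as ok for a call with an empty arguments object is itself a finding when the tool has required parameters.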

Phase 4: Question-Based Evaluation

Generate and test with realistic user questions:

  1. Generate 10+ test questions covering:

    • Simple single-tool queries
    • Multi-step workflows requiring multiple tools
    • Edge cases (empty results, large datasets)
    • Error scenarios (invalid IDs, unauthorized access)
  2. Execute each question through MCP client or Inspector

  3. Score responses using evaluation criteria:

    • Correctness: Does the answer match expected result?
    • Completeness: Is all relevant information included?
    • Clarity: Is the response well-structured?
    • Performance: Is the response time within acceptable limits (under 5s for standard operations, per Phase 5)?
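
The performance criterion can be measured with curl itself. This is a sketch, assuming curl's time_total write-out variable is an acceptable measure of response time. The helper names within_limit and time_request are illustrative, and the 5-second default mirrors the limit in the Phase 5 table.

```shell
#!/usr/bin/env bash
# Succeed when an elapsed time in seconds is strictly under the limit
# (default 5). awk does the decimal comparison that [ ] cannot.
within_limit() {
  awk -v t="$1" -v limit="${2:-5}" 'BEGIN { exit !(t + 0 < limit + 0) }'
}

# Print the total request time in seconds for one JSON-RPC payload
# sent to the endpoint used throughout this document.
time_request() {
  curl -s -o /dev/null -w '%{time_total}' -X POST http://localhost:3030/mcp \
    -H "Content-Type: application/json" -d "$1"
}

# Usage:
#   elapsed="$(time_request '{"jsonrpc":"2.0","method":"tools/list","id":1}')"
#   within_limit "$elapsed" && echo "✓ ${elapsed}s" || echo "✗ ${elapsed}s"
```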

Phase 5: Quality Scoring

Calculate overall quality score:

Category              Weight   Criteria
Tool Discovery        20%      All operations exposed, proper naming
Basic Functionality   30%      Valid inputs return correct responses
Error Handling        20%      Graceful errors with actionable messages
Question Accuracy     20%      Test questions answered correctly
Performance           10%      Response times < 5s for standard ops

Pass threshold: 80% overall score
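
As a worked example of the weighting, the sketch below combines five category scores using the weights from the table. Scoring each category on a 0-100 scale is an assumption, since the skill does not say how individual categories are scored. The function names overall_score and passes are illustrative.

```shell
#!/usr/bin/env bash
# Weighted overall score from five category scores (each 0-100), in table
# order: discovery, functionality, error handling, question accuracy, performance.
overall_score() {
  awk -v d="$1" -v f="$2" -v e="$3" -v q="$4" -v p="$5" \
    'BEGIN { printf "%.1f\n", d*0.20 + f*0.30 + e*0.20 + q*0.20 + p*0.10 }'
}

# Succeed when a score meets the 80% pass threshold.
passes() {
  awk -v s="$1" 'BEGIN { exit !(s + 0 >= 80) }'
}

# Usage:
#   score="$(overall_score 100 90 70 80 60)"   # 20 + 27 + 14 + 16 + 6 = 83.0
#   passes "$score" && echo "PASS ($score)" || echo "FAIL ($score)"
```

Because Basic Functionality carries the largest weight, a server with perfect discovery and performance can still fail: scores of 100, 70, 60, 80, 100 give 79.0.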

Quick Evaluation Checklist

Run this minimal check for fast validation:

# 1. Health check
curl -sf -o /dev/null http://localhost:3030/health && echo "✓ Health OK" || echo "✗ Health FAILED"

# 2. MCP ping
curl -s -X POST http://localhost:3030/mcp \
  -H "Content-Type: application/json" \
  -d '{"jsonrpc":"2.0","id":1,"method":"ping"}' | jq -e '.jsonrpc == "2.0" and .result' > /dev/null && echo "✓ Ping OK" || echo "✗ Ping FAILED"

# 3. Tools list
curl -s -X POST http://localhost:3030/mcp \
  -H "Content-Type: application/json" \
  -d '{"jsonrpc":"2.0","method":"tools/list","id":1}' | jq '.result.tools | length' | xargs -I {} echo "✓ {} tools discovered"

# 4. Sample tool call (adjust tool name and args)
curl -s -X POST http://localhost:3030/mcp \
  -H "Content-Type: application/json" \
  -d '{"jsonrpc":"2.0","method":"tools/call","params":{"name":"listPets","arguments":{}},"id":2}' | jq -e '.result != null and .result.isError != true' > /dev/null && echo "✓ Tool call OK" || echo "✗ Tool call FAILED"

Test Question Templates

Use these patterns to generate effective test questions:

  1. List/Query: "Show me all [resources] that match [criteria]"
  2. Get Details: "What are the details of [resource] with ID [id]?"
  3. Create: "Create a new [resource] with [properties]"
  4. Update: "Update [resource] [id] to change [field] to [value]"
  5. Delete: "Remove [resource] with ID [id]"
  6. Aggregate: "How many [resources] exist with [status]?"
  7. Search: "Find [resources] where [field] contains [term]"
  8. Workflow: "Create a [resource], then update it, then list all"
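
The templates can be filled in mechanically to seed a question set for one resource. This is a small sketch covering a subset of the patterns above. The function name generate_questions and its four arguments (singular name, plural name, sample ID, sample status) are illustrative, and using a status value as the filter criterion is an assumption.

```shell
#!/usr/bin/env bash
# Fill a subset of the question templates for one resource.
#   $1 singular name, $2 plural name, $3 sample ID, $4 sample status
generate_questions() {
  local one="$1" many="$2" id="$3" status="$4"
  cat <<EOF
Show me all ${many} with status ${status}
What are the details of ${one} with ID ${id}?
Create a new ${one}
Remove ${one} with ID ${id}
How many ${many} exist with status ${status}?
Create a ${one}, then update it, then list all ${many}
EOF
}

# Usage:
#   generate_questions pet pets 1 available
```

Generated questions still need a human pass to add the edge cases and error scenarios from Phase 4, which depend on the API under test.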

Example: Petstore API Evaluation

# 1. Run health checks
curl -s http://localhost:3030/health
curl -s -X POST http://localhost:3030/mcp \
  -H "Content-Type: application/json" \
  -d '{"jsonrpc":"2.0","id":1,"method":"ping"}' | jq -e '.jsonrpc == "2.0" and .result' > /dev/null && echo "✓ Ping OK" || echo "✗ Ping FAILED"

# 2. Tool discovery
curl -s -X POST http://localhost:3030/mcp \
  -H "Content-Type: application/json" \
  -d '{"jsonrpc":"2.0","method":"tools/list","id":1}' | jq '.result.tools'

# 3. Test questions:
# - "List all available pets"
# - "Show details of pet with ID 1"
# - "Find pets with status 'available'"
# - "Create a new pet named 'Fluffy'"
