spring-ai

Spring AI - Quick Reference

Install skill "spring-ai" with this command: npx skills add claude-dev-suite/claude-dev-suite/claude-dev-suite-claude-dev-suite-spring-ai

Full Reference: See advanced.md for image generation, multi-modal/vision, advisors/middleware, testing patterns, and prompt templates.

Deep Knowledge: Use mcp__documentation__fetch_docs with technology: spring-ai for comprehensive documentation.

Dependencies

<!-- OpenAI -->
<dependency>
    <groupId>org.springframework.ai</groupId>
    <artifactId>spring-ai-openai-spring-boot-starter</artifactId>
</dependency>

<!-- Azure OpenAI -->
<dependency>
    <groupId>org.springframework.ai</groupId>
    <artifactId>spring-ai-azure-openai-spring-boot-starter</artifactId>
</dependency>

<!-- Ollama (local) -->
<dependency>
    <groupId>org.springframework.ai</groupId>
    <artifactId>spring-ai-ollama-spring-boot-starter</artifactId>
</dependency>

<!-- Vector Store - PGVector -->
<dependency>
    <groupId>org.springframework.ai</groupId>
    <artifactId>spring-ai-pgvector-store-spring-boot-starter</artifactId>
</dependency>

Configuration

OpenAI

spring:
  ai:
    openai:
      api-key: ${OPENAI_API_KEY}
      chat:
        options:
          model: gpt-4o
          temperature: 0.7
          max-tokens: 1000
      embedding:
        options:
          model: text-embedding-3-small

Azure OpenAI

spring:
  ai:
    azure:
      openai:
        api-key: ${AZURE_OPENAI_KEY}
        endpoint: ${AZURE_OPENAI_ENDPOINT}
        chat:
          options:
            deployment-name: gpt-4o
            temperature: 0.7

Ollama (Local)

spring:
  ai:
    ollama:
      base-url: http://localhost:11434
      chat:
        options:
          model: llama3
          temperature: 0.7

Basic Chat

@Service
@RequiredArgsConstructor
public class ChatService {

private final ChatClient chatClient;

public String chat(String message) {
    return chatClient.prompt()
        .user(message)
        .call()
        .content();
}

// With system prompt
public String chatWithContext(String message) {
    return chatClient.prompt()
        .system("You are a helpful assistant specialized in Spring Boot.")
        .user(message)
        .call()
        .content();
}

// With parameters
public String chatWithParams(String message, String topic) {
    return chatClient.prompt()
        .system(s -> s.text("You are an expert in {topic}.")
            .param("topic", topic))
        .user(message)
        .call()
        .content();
}

}
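
The `{topic}` placeholder in `chatWithParams` is filled from `.param("topic", topic)` before the prompt is sent. The sketch below imitates that substitution with plain string replacement, so the effect can be seen without calling a model. `PromptTemplateSketch` and `render` are hypothetical names introduced here for illustration; Spring AI's own template rendering is more capable and is not implemented this way.

```java
import java.util.Map;

// Hypothetical helper, not part of Spring AI: illustrates {name} placeholder substitution.
final class PromptTemplateSketch {

    private PromptTemplateSketch() {}

    // Replaces every {key} in the template with the matching value from params.
    // Placeholders without a matching key are left untouched.
    static String render(String template, Map<String, String> params) {
        String result = template;
        for (Map.Entry<String, String> entry : params.entrySet()) {
            result = result.replace("{" + entry.getKey() + "}", entry.getValue());
        }
        return result;
    }
}
```

Calling `PromptTemplateSketch.render("You are an expert in {topic}.", Map.of("topic", "Kafka"))` yields the system text with `Kafka` in place of the placeholder.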

ChatClient Builder

@Configuration
public class ChatClientConfig {

@Bean
public ChatClient chatClient(ChatClient.Builder builder) {
    return builder
        .defaultSystem("You are a helpful AI assistant.")
        .defaultOptions(ChatOptionsBuilder.builder()
            .withTemperature(0.7)
            .withMaxTokens(1000)
            .build())
        .build();
}

}

Structured Output

public record BookRecommendation(
    String title,
    String author,
    String genre,
    String summary,
    int rating
) {}

@Service
@RequiredArgsConstructor
public class BookService {

private final ChatClient chatClient;

public BookRecommendation getRecommendation(String preferences) {
    return chatClient.prompt()
        .user("Recommend a book based on: " + preferences)
        .call()
        .entity(BookRecommendation.class);
}

public List<BookRecommendation> getRecommendations(String preferences, int count) {
    return chatClient.prompt()
        .user("Recommend " + count + " books based on: " + preferences)
        .call()
        .entity(new ParameterizedTypeReference<List<BookRecommendation>>() {});
}

}

Streaming

// Annotated as a controller so the @GetMapping below is registered
@RestController
@RequiredArgsConstructor
public class StreamingChatService {

private final ChatClient chatClient;

public Flux<String> streamChat(String message) {
    return chatClient.prompt()
        .user(message)
        .stream()
        .content();
}

// WebFlux controller
@GetMapping(value = "/chat/stream", produces = MediaType.TEXT_EVENT_STREAM_VALUE)
public Flux<String> streamResponse(@RequestParam String message) {
    return streamChat(message);
}

}

Function Calling

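@Configuration
@RequiredArgsConstructor
public class FunctionConfig {

// Application services assumed to exist elsewhere in the project
private final WeatherService weatherService;
private final ProductService productService;

@Bean
@Description("Get current weather for a location")
public Function<WeatherRequest, WeatherResponse> currentWeather() {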
    return request -> weatherService.getWeather(request.location());
}

@Bean
@Description("Search for products by name")
public Function<ProductSearchRequest, List<Product>> searchProducts() {
    return request -> productService.search(request.query(), request.maxResults());
}

}

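public record WeatherRequest(String location) {}
public record WeatherResponse(String location, double temperature, String conditions) {}
// Shape inferred from the request.query() and request.maxResults() calls above
public record ProductSearchRequest(String query, int maxResults) {}

@Service
@RequiredArgsConstructor
public class AssistantService {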

private final ChatClient chatClient;

public String assistWithFunctions(String message) {
    return chatClient.prompt()
        .user(message)
        .functions("currentWeather", "searchProducts")
        .call()
        .content();
}

}

Embeddings

@Service
@RequiredArgsConstructor
public class EmbeddingService {

private final EmbeddingModel embeddingModel;

public float[] getEmbedding(String text) {
    EmbeddingResponse response = embeddingModel.embedForResponse(List.of(text));
    return response.getResult().getOutput();
}

public List<float[]> getEmbeddings(List<String> texts) {
    EmbeddingResponse response = embeddingModel.embedForResponse(texts);
    return response.getResults().stream()
        .map(e -> e.getOutput())
        .toList();
}

}

Vector Store (RAG)

Configuration

spring:
  ai:
    vectorstore:
      pgvector:
        dimensions: 1536
        index-type: HNSW
        distance-type: COSINE_DISTANCE
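
With `distance-type: COSINE_DISTANCE`, similarity search ranks documents by cosine distance, that is, 1 minus the cosine similarity of the two vectors. The sketch below shows that calculation in plain Java to make the metric concrete. `CosineDistanceSketch` is a hypothetical helper for illustration only; in practice the database computes the distance.

```java
// Hypothetical helper for illustration; the vector store computes this inside the database.
final class CosineDistanceSketch {

    private CosineDistanceSketch() {}

    // Returns 1 - cosine similarity: 0 for vectors pointing the same way, 1 for orthogonal ones.
    static double cosineDistance(float[] a, float[] b) {
        if (a.length != b.length) {
            throw new IllegalArgumentException(
                "Dimension mismatch: " + a.length + " vs " + b.length);
        }
        double dot = 0.0;
        double normA = 0.0;
        double normB = 0.0;
        for (int i = 0; i < a.length; i++) {
            dot += (double) a[i] * b[i];
            normA += (double) a[i] * a[i];
            normB += (double) b[i] * b[i];
        }
        if (normA == 0.0 || normB == 0.0) {
            throw new IllegalArgumentException("Cosine distance is undefined for a zero vector");
        }
        return 1.0 - dot / (Math.sqrt(normA) * Math.sqrt(normB));
    }
}
```

The length check also shows why the configured `dimensions` value has to match the embedding model: vectors of different sizes cannot be compared at all.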

RAG Query

@Service
@RequiredArgsConstructor
public class RagService {

private final VectorStore vectorStore;
private final ChatClient chatClient;

public String queryWithContext(String question) {
    // Retrieve relevant documents
    List<Document> relevantDocs = vectorStore.similaritySearch(
        SearchRequest.query(question)
            .withTopK(5)
            .withSimilarityThreshold(0.7)
    );

    // Build context
    String context = relevantDocs.stream()
        .map(Document::getContent)
        .collect(Collectors.joining("\n\n"));

    // Generate response with context
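    // Generate response with context; the {context} placeholder is filled via param()
    return chatClient.prompt()
        .system(s -> s.text("""
            You are a helpful assistant. Answer questions based on the provided context.
            If the answer is not in the context, say "I don't have information about that."

            Context:
            {context}
            """)
            .param("context", context))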
        .user(question)
        .call()
        .content();
}

}

QuestionAnswerAdvisor

@Configuration
public class RagConfig {

@Bean
public ChatClient ragChatClient(ChatClient.Builder builder, VectorStore vectorStore) {
    return builder
        .defaultAdvisors(new QuestionAnswerAdvisor(vectorStore))
        .build();
}

}

// Usage is simple - advisor handles RAG automatically
@Service
@RequiredArgsConstructor
public class SimpleRagService {

private final ChatClient ragChatClient;

public String answer(String question) {
    return ragChatClient.prompt()
        .user(question)
        .call()
        .content();
}

}

Best Practices

| Do | Don't |
| --- | --- |
| Use structured output for predictable results | Parse free-form text manually |
| Implement proper error handling | Ignore API failures |
| Use streaming for long responses | Block on large generations |
| Cache embeddings when possible | Regenerate embeddings repeatedly |
| Set appropriate token limits | Use unlimited tokens |
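
One way to follow the "cache embeddings" advice is to memoize results by input text. The sketch below wraps any text-to-vector function in an in-memory cache. `CachingEmbedder` is a hypothetical class, not a Spring AI type, and the unbounded map is an assumption that only suits small, stable inputs; a production cache would likely need eviction or an external store.

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.function.Function;

// Hypothetical wrapper, not a Spring AI class: memoizes embeddings by input text.
final class CachingEmbedder {

    private final Function<String, float[]> delegate;
    private final Map<String, float[]> cache = new ConcurrentHashMap<>();

    CachingEmbedder(Function<String, float[]> delegate) {
        this.delegate = delegate;
    }

    // Calls the delegate only on a cache miss.
    // Returns a copy so callers cannot mutate the cached array.
    float[] embed(String text) {
        return cache.computeIfAbsent(text, delegate).clone();
    }
}
```

In a service like the `EmbeddingService` shown earlier, the delegate could be `embeddingService::getEmbedding`, so repeated texts reach the embedding API only once.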

Production Checklist

  • API keys secured (environment variables)

  • Rate limiting implemented

  • Error handling and retries

  • Token usage monitoring

  • Response caching where appropriate

  • Vector store properly indexed

  • Embedding dimension consistency

  • Prompt injection protection

  • Cost monitoring and alerts

  • Fallback models configured
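
For the rate limiting item, a fixed-window counter is one of the simplest approaches: allow at most N calls per time window and reject the rest. The sketch below is a minimal, in-process version under that assumption. `FixedWindowRateLimiter` is a hypothetical class, and the injectable clock exists only to make the behaviour easy to test; a deployment with several instances would need a shared limiter instead.

```java
import java.util.function.LongSupplier;

// Hypothetical in-process limiter: at most maxRequests calls per window.
final class FixedWindowRateLimiter {

    private final int maxRequests;
    private final long windowMillis;
    private final LongSupplier clock;
    private long windowStart;
    private int used;

    FixedWindowRateLimiter(int maxRequests, long windowMillis, LongSupplier clock) {
        this.maxRequests = maxRequests;
        this.windowMillis = windowMillis;
        this.clock = clock;
        this.windowStart = clock.getAsLong();
    }

    // Returns true if the caller may proceed, false if the current window is exhausted.
    synchronized boolean tryAcquire() {
        long now = clock.getAsLong();
        if (now - windowStart >= windowMillis) {
            // A new window has begun: reset the counter.
            windowStart = now;
            used = 0;
        }
        if (used < maxRequests) {
            used++;
            return true;
        }
        return false;
    }
}
```

A limiter for 60 model calls per minute could be created with `new FixedWindowRateLimiter(60, 60_000, System::currentTimeMillis)` and checked before each `chatClient` call.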

When NOT to Use This Skill

  • Raw OpenAI/Anthropic API - Use respective SDKs directly

  • ML model training - Use Python frameworks (PyTorch, TensorFlow)

  • Non-Spring applications - Use LangChain or native SDKs

  • Simple text generation - May be overkill for trivial use cases

Anti-Patterns

| Anti-Pattern | Problem | Solution |
| --- | --- | --- |
| Hardcoded API keys | Security risk | Use environment variables |
| No token limit | Cost explosion | Set max-tokens appropriately |
| Synchronous for long requests | Thread blocking | Use streaming |
| Ignoring rate limits | API errors, bans | Implement retry with backoff |
| No caching for embeddings | High costs | Cache embeddings locally |
| Prompt injection vulnerability | Security risk | Sanitize user input |
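
"Retry with backoff" means waiting longer after each failed attempt instead of retrying immediately. The sketch below shows the idea with a doubling delay. `RetrySketch` and `withBackoff` are hypothetical names, and retrying on every `RuntimeException` is a simplifying assumption; real code would normally retry only transient failures such as HTTP 429 responses or timeouts.

```java
import java.util.function.Supplier;

// Hypothetical helper: retries a call, doubling the wait after each failure.
final class RetrySketch {

    private RetrySketch() {}

    static <T> T withBackoff(Supplier<T> call, int maxAttempts, long initialDelayMillis) {
        if (maxAttempts < 1) {
            throw new IllegalArgumentException("maxAttempts must be at least 1");
        }
        long delay = initialDelayMillis;
        RuntimeException lastFailure = null;
        for (int attempt = 1; attempt <= maxAttempts; attempt++) {
            try {
                return call.get();
            } catch (RuntimeException e) {
                lastFailure = e;
                if (attempt < maxAttempts) {
                    sleep(delay);
                    delay *= 2; // exponential backoff
                }
            }
        }
        // All attempts failed: surface the most recent error.
        throw lastFailure;
    }

    private static void sleep(long millis) {
        try {
            Thread.sleep(millis);
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException("Interrupted while waiting to retry", e);
        }
    }
}
```

A call such as `RetrySketch.withBackoff(() -> chatService.chat(message), 3, 500)` would try up to three times, waiting 500 ms and then 1000 ms between attempts.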

Quick Troubleshooting

| Problem | Diagnostic | Fix |
| --- | --- | --- |
| API key invalid | Check error message | Verify OPENAI_API_KEY env var |
| Rate limit exceeded | 429 error | Add retry logic, reduce requests |
| Timeout on large prompts | Connection timeout | Use streaming, increase timeout |
| Embeddings dimension mismatch | Vector store error | Match embedding model dimensions |
| Structured output fails | JSON parse error | Simplify schema, add examples |
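
A dimension mismatch is easier to diagnose when it fails fast in application code rather than surfacing as a vector store error. The sketch below checks an embedding's length against the configured size before it is stored. `EmbeddingDimensionCheck` is a hypothetical helper, and the expected value is assumed to mirror the `dimensions` setting in the vector store configuration (1536 in the example above).

```java
// Hypothetical guard: rejects embeddings whose size differs from the vector store's setting.
final class EmbeddingDimensionCheck {

    private EmbeddingDimensionCheck() {}

    // Returns the embedding unchanged when its length matches, otherwise throws.
    static float[] requireDimensions(float[] embedding, int expected) {
        if (embedding.length != expected) {
            throw new IllegalStateException(
                "Embedding has " + embedding.length
                    + " dimensions but the vector store expects " + expected);
        }
        return embedding;
    }
}
```

Running this check once at startup against a sample embedding can catch a model or configuration change before any documents are indexed.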

Reference Documentation

  • Spring AI Reference
