spring-ai

Spring AI - Quick Reference

Full Reference: See advanced.md for image generation, multi-modal/vision, advisors/middleware, testing patterns, and prompt templates.

Deep Knowledge: Use mcp__documentation__fetch_docs with technology: spring-ai for comprehensive documentation.

Dependencies

<!-- OpenAI -->
<dependency>
    <groupId>org.springframework.ai</groupId>
    <artifactId>spring-ai-openai-spring-boot-starter</artifactId>
</dependency>

<!-- Azure OpenAI -->
<dependency>
    <groupId>org.springframework.ai</groupId>
    <artifactId>spring-ai-azure-openai-spring-boot-starter</artifactId>
</dependency>

<!-- Ollama (local) -->
<dependency>
    <groupId>org.springframework.ai</groupId>
    <artifactId>spring-ai-ollama-spring-boot-starter</artifactId>
</dependency>

<!-- PGVector vector store -->
<dependency>
    <groupId>org.springframework.ai</groupId>
    <artifactId>spring-ai-pgvector-store-spring-boot-starter</artifactId>
</dependency>

Configuration

OpenAI

spring:
  ai:
    openai:
      api-key: ${OPENAI_API_KEY}
      chat:
        options:
          model: gpt-4o
          temperature: 0.7
          max-tokens: 1000
      embedding:
        options:
          model: text-embedding-3-small

Azure OpenAI

spring:
  ai:
    azure:
      openai:
        api-key: ${AZURE_OPENAI_KEY}
        endpoint: ${AZURE_OPENAI_ENDPOINT}
        chat:
          options:
            deployment-name: gpt-4o
            temperature: 0.7

Ollama (Local)

spring:
  ai:
    ollama:
      base-url: http://localhost:11434
      chat:
        options:
          model: llama3
          temperature: 0.7

Basic Chat

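import lombok.RequiredArgsConstructor;
import org.springframework.ai.chat.client.ChatClient;
import org.springframework.stereotype.Service;

@Service
@RequiredArgsConstructor
public class ChatService {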

private final ChatClient chatClient;

public String chat(String message) {
    return chatClient.prompt()
        .user(message)
        .call()
        .content();
}

// With system prompt
public String chatWithContext(String message) {
    return chatClient.prompt()
        .system("You are a helpful assistant specialized in Spring Boot.")
        .user(message)
        .call()
        .content();
}

// With parameters
public String chatWithParams(String message, String topic) {
    return chatClient.prompt()
        .system(s -> s.text("You are an expert in {topic}.")
            .param("topic", topic))
        .user(message)
        .call()
        .content();
}

}
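
The {topic} placeholder in chatWithParams is filled in from the value passed to param before the prompt is sent. The plain-Java sketch below illustrates that substitution step only. PromptPlaceholders and render are hypothetical names introduced for illustration; this is not Spring AI's own template engine, which may treat escaping and missing values differently.

```java
import java.util.Map;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

/** Hypothetical helper (not part of Spring AI): fills {name} placeholders in a template. */
final class PromptPlaceholders {

    // Matches placeholders such as {topic}; group 1 captures the name.
    private static final Pattern PLACEHOLDER = Pattern.compile("\\{(\\w+)\\}");

    private PromptPlaceholders() {}

    static String render(String template, Map<String, String> params) {
        Matcher matcher = PLACEHOLDER.matcher(template);
        StringBuilder out = new StringBuilder();
        while (matcher.find()) {
            String name = matcher.group(1);
            String value = params.get(name);
            if (value == null) {
                // Fail fast instead of sending a prompt with an unfilled placeholder.
                throw new IllegalArgumentException("Missing value for placeholder: " + name);
            }
            // quoteReplacement keeps '$' and '\' in the value literal.
            matcher.appendReplacement(out, Matcher.quoteReplacement(value));
        }
        matcher.appendTail(out);
        return out.toString();
    }
}
```

Calling render("You are an expert in {topic}.", Map.of("topic", "Spring Boot")) yields the fully resolved system text. Failing on a missing value is a design choice of this sketch: it surfaces a forgotten param call early rather than letting the literal braces reach the model.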

ChatClient Builder

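import org.springframework.ai.chat.client.ChatClient;
import org.springframework.ai.chat.prompt.ChatOptionsBuilder;
import org.springframework.context.annotation.Bean;
import org.springframework.context.annotation.Configuration;

@Configuration
public class ChatClientConfig {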

@Bean
public ChatClient chatClient(ChatClient.Builder builder) {
    return builder
        .defaultSystem("You are a helpful AI assistant.")
        .defaultOptions(ChatOptionsBuilder.builder()
            .withTemperature(0.7)
            .withMaxTokens(1000)
            .build())
        .build();
}

}

Structured Output

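import java.util.List;
import lombok.RequiredArgsConstructor;
import org.springframework.ai.chat.client.ChatClient;
import org.springframework.core.ParameterizedTypeReference;
import org.springframework.stereotype.Service;

public record BookRecommendation(
    String title,
    String author,
    String genre,
    String summary,
    int rating
) {}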

@Service
@RequiredArgsConstructor  // generates the constructor that initializes the final chatClient field
public class BookService {

private final ChatClient chatClient;

public BookRecommendation getRecommendation(String preferences) {
    return chatClient.prompt()
        .user("Recommend a book based on: " + preferences)
        .call()
        .entity(BookRecommendation.class);
}

public List<BookRecommendation> getRecommendations(String preferences, int count) {
    return chatClient.prompt()
        .user("Recommend " + count + " books based on: " + preferences)
        .call()
        .entity(new ParameterizedTypeReference<List<BookRecommendation>>() {});
}

}

Streaming

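import lombok.RequiredArgsConstructor;
import org.springframework.ai.chat.client.ChatClient;
import org.springframework.http.MediaType;
import org.springframework.web.bind.annotation.GetMapping;
import org.springframework.web.bind.annotation.RequestParam;
import org.springframework.web.bind.annotation.RestController;
import reactor.core.publisher.Flux;

// @RestController (rather than @Service) so the @GetMapping in this class is registered
@RestController
@RequiredArgsConstructor
public class StreamingChatService {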

private final ChatClient chatClient;

public Flux<String> streamChat(String message) {
    return chatClient.prompt()
        .user(message)
        .stream()
        .content();
}

// WebFlux endpoint (server-sent events); the enclosing class must be a
// @RestController for this mapping to be registered
@GetMapping(value = "/chat/stream", produces = MediaType.TEXT_EVENT_STREAM_VALUE)
public Flux<String> streamResponse(@RequestParam String message) {
    return streamChat(message);
}

}

Function Calling

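import java.util.List;
import java.util.function.Function;
import org.springframework.context.annotation.Bean;
import org.springframework.context.annotation.Configuration;
import org.springframework.context.annotation.Description;

@Configuration
public class FunctionConfig {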

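// WeatherService is an application-defined bean (type name assumed), injected as a method parameter
@Bean
@Description("Get current weather for a location")
public Function<WeatherRequest, WeatherResponse> currentWeather(WeatherService weatherService) {
    return request -> weatherService.getWeather(request.location());
}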

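// ProductService is an application-defined bean (type name assumed), injected as a method parameter
@Bean
@Description("Search for products by name")
public Function<ProductSearchRequest, List<Product>> searchProducts(ProductService productService) {
    return request -> productService.search(request.query(), request.maxResults());
}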

}

public record WeatherRequest(String location) {}

public record WeatherResponse(String location, double temperature, String conditions) {}

@Service
@RequiredArgsConstructor  // generates the constructor that initializes the final chatClient field
public class AssistantService {

private final ChatClient chatClient;

public String assistWithFunctions(String message) {
    return chatClient.prompt()
        .user(message)
        .functions("currentWeather", "searchProducts")
        .call()
        .content();
}

}

Embeddings

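import java.util.List;
import lombok.RequiredArgsConstructor;
import org.springframework.ai.embedding.EmbeddingModel;
import org.springframework.ai.embedding.EmbeddingResponse;
import org.springframework.stereotype.Service;

@Service
@RequiredArgsConstructor
public class EmbeddingService {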

private final EmbeddingModel embeddingModel;

public float[] getEmbedding(String text) {
    EmbeddingResponse response = embeddingModel.embedForResponse(List.of(text));
    return response.getResult().getOutput();
}

public List<float[]> getEmbeddings(List<String> texts) {
    EmbeddingResponse response = embeddingModel.embedForResponse(texts);
    return response.getResults().stream()
        .map(e -> e.getOutput())
        .toList();
}

}
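
Embeddings returned by a service like the one above are usually compared by cosine similarity, which is also what a COSINE_DISTANCE vector index is based on. A minimal plain-Java sketch of that comparison follows; EmbeddingMath is a hypothetical helper name, not a Spring AI class.

```java
/** Hypothetical helper (not part of Spring AI): compares two embedding vectors. */
final class EmbeddingMath {

    private EmbeddingMath() {}

    /** Returns the cosine similarity in [-1, 1]; 1 means the vectors point the same way. */
    static double cosineSimilarity(float[] a, float[] b) {
        if (a.length != b.length) {
            // Vectors produced by different embedding models are not comparable.
            throw new IllegalArgumentException(
                "Dimension mismatch: " + a.length + " vs " + b.length);
        }
        double dot = 0.0;
        double normA = 0.0;
        double normB = 0.0;
        for (int i = 0; i < a.length; i++) {
            dot += a[i] * b[i];
            normA += a[i] * a[i];
            normB += b[i] * b[i];
        }
        if (normA == 0.0 || normB == 0.0) {
            // A zero vector has no direction; this sketch treats it as unrelated.
            return 0.0;
        }
        return dot / (Math.sqrt(normA) * Math.sqrt(normB));
    }
}
```

The length check mirrors the dimension consistency requirement of a vector store: a 1536-dimension index cannot hold vectors from a model that emits a different size.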

Vector Store (RAG)

Configuration

spring:
  ai:
    vectorstore:
      pgvector:
        dimensions: 1536
        index-type: HNSW
        distance-type: COSINE_DISTANCE

RAG Query

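import java.util.List;
import java.util.stream.Collectors;
import lombok.RequiredArgsConstructor;
import org.springframework.ai.chat.client.ChatClient;
import org.springframework.ai.document.Document;
import org.springframework.ai.vectorstore.SearchRequest;
import org.springframework.ai.vectorstore.VectorStore;
import org.springframework.stereotype.Service;

@Service
@RequiredArgsConstructor
public class RagService {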

private final VectorStore vectorStore;
private final ChatClient chatClient;

public String queryWithContext(String question) {
    // Retrieve relevant documents
    List<Document> relevantDocs = vectorStore.similaritySearch(
        SearchRequest.query(question)
            .withTopK(5)
            .withSimilarityThreshold(0.7)
    );

    // Build context
    String context = relevantDocs.stream()
        .map(Document::getContent)
        .collect(Collectors.joining("\n\n"));

    // Generate response with context
    return chatClient.prompt()
        .system(s -> s.text("""
            You are a helpful assistant. Answer questions based on the provided context.
            If the answer is not in the context, say "I don't have information about that."

            Context:
            {context}
            """)
            .param("context", context))  // bind the retrieved text to the {context} placeholder
        .user(question)
        .call()
        .content();
}

}
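
The topK and similarity threshold settings in the search request narrow the candidate set before it is joined into the prompt context. The sketch below shows that selection logic on an in-memory list, assuming a document is kept when its score is at or above the threshold. ScoredDoc and TopKFilter are hypothetical names for illustration; a real vector store performs this step inside the database.

```java
import java.util.Comparator;
import java.util.List;

/** Hypothetical stand-in for a retrieved document and its similarity score. */
record ScoredDoc(String content, double score) {}

/** Hypothetical helper (not part of Spring AI): threshold filter followed by top-K. */
final class TopKFilter {

    private TopKFilter() {}

    static List<ScoredDoc> select(List<ScoredDoc> candidates, int topK, double threshold) {
        return candidates.stream()
            // Drop weak matches first so they can never fill a top-K slot.
            .filter(doc -> doc.score() >= threshold)
            // Highest similarity first.
            .sorted(Comparator.comparingDouble(ScoredDoc::score).reversed())
            .limit(topK)
            .toList();
    }
}
```

With topK = 5 and a threshold of 0.7, a query that matches only two documents well returns two documents, not five. A stricter threshold therefore trades recall for a shorter, more relevant context.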

QuestionAnswerAdvisor

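import org.springframework.ai.chat.client.ChatClient;
import org.springframework.ai.chat.client.advisor.QuestionAnswerAdvisor;
import org.springframework.ai.vectorstore.VectorStore;
import org.springframework.context.annotation.Bean;
import org.springframework.context.annotation.Configuration;

@Configuration
public class RagConfig {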

@Bean
public ChatClient ragChatClient(ChatClient.Builder builder, VectorStore vectorStore) {
    return builder
        .defaultAdvisors(new QuestionAnswerAdvisor(vectorStore))
        .build();
}

}

// Usage is simple: the advisor handles RAG automatically
@Service
@RequiredArgsConstructor
public class SimpleRagService {

private final ChatClient ragChatClient;

public String answer(String question) {
    return ragChatClient.prompt()
        .user(question)
        .call()
        .content();
}

}

Best Practices

| Do | Don't |
| --- | --- |
| Use structured output for predictable results | Parse free-form text manually |
| Implement proper error handling | Ignore API failures |
| Use streaming for long responses | Block on large generations |
| Cache embeddings when possible | Regenerate embeddings repeatedly |
| Set appropriate token limits | Use unlimited tokens |
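
The advice to cache embeddings can be sketched as a small in-memory cache keyed by the input text. EmbeddingCache is a hypothetical class, and the embedder function stands in for whatever call actually produces the vector (for example a method on the injected embedding model, assuming it returns float[]). The sketch has no eviction or persistence, which a production cache would likely need.

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.function.Function;

/** Hypothetical in-memory cache (not part of Spring AI): one embedding call per distinct text. */
final class EmbeddingCache {

    private final Map<String, float[]> cache = new ConcurrentHashMap<>();
    private final Function<String, float[]> embedder;

    EmbeddingCache(Function<String, float[]> embedder) {
        this.embedder = embedder;
    }

    float[] embed(String text) {
        // computeIfAbsent invokes the embedder only when the text is not cached yet.
        float[] vector = cache.computeIfAbsent(text, embedder);
        // Return a copy so callers cannot modify the cached array.
        return vector.clone();
    }

    int size() {
        return cache.size();
    }
}
```

Keying by the raw text is the simplest choice. If several embedding models are in use, the model name would need to be part of the key, since vectors from different models are not interchangeable.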

Production Checklist

API keys secured (environment variables)
Rate limiting implemented
Error handling and retries
Token usage monitoring
Response caching where appropriate
Vector store properly indexed
Embedding dimension consistency
Prompt injection protection
Cost monitoring and alerts
Fallback models configured
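
For the fallback item in the checklist, one possible shape is an ordered list of model calls that are tried in turn. In this sketch each model is represented as a plain Function from prompt to response; FallbackChat is a hypothetical name, and in a Spring AI application each function might wrap a differently configured ChatClient.

```java
import java.util.List;
import java.util.function.Function;

/** Hypothetical helper (not part of Spring AI): tries each model in order until one succeeds. */
final class FallbackChat {

    private FallbackChat() {}

    static String callWithFallback(String prompt, List<Function<String, String>> models) {
        RuntimeException lastFailure = null;
        for (Function<String, String> model : models) {
            try {
                return model.apply(prompt);
            } catch (RuntimeException e) {
                // Remember the failure and move on to the next model in the list.
                lastFailure = e;
            }
        }
        throw new IllegalStateException("All models failed", lastFailure);
    }
}
```

The order of the list encodes preference: the primary model first, cheaper or local models after it. Catching every RuntimeException is deliberately coarse here; a real setup would probably fall back only on transient or availability errors.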

When NOT to Use This Skill

Raw OpenAI/Anthropic API - Use respective SDKs directly
ML model training - Use Python frameworks (PyTorch, TensorFlow)
Non-Spring applications - Use LangChain or native SDKs
Simple text generation - May be overkill for trivial use cases

Anti-Patterns

| Anti-Pattern | Problem | Solution |
| --- | --- | --- |
| Hardcoded API keys | Security risk | Use environment variables |
| No token limit | Cost explosion | Set max-tokens appropriately |
| Synchronous for long requests | Thread blocking | Use streaming |
| Ignoring rate limits | API errors, bans | Implement retry with backoff |
| No caching for embeddings | High costs | Cache embeddings locally |
| Prompt injection vulnerability | Security risk | Sanitize user input |
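
Retry with backoff, named above as the remedy for ignored rate limits, can be sketched in plain Java as follows. RetryWithBackoff and its parameters are hypothetical; the framework's own retry support, where configured, may make a hand-written loop unnecessary.

```java
import java.util.function.Supplier;

/** Hypothetical helper (not part of Spring AI): retries a call with capped exponential backoff. */
final class RetryWithBackoff {

    private RetryWithBackoff() {}

    /** Delay before the next try: base, 2*base, 4*base, ... capped at maxMillis. Attempt is 1-based. */
    static long delayMillis(int attempt, long baseMillis, long maxMillis) {
        int shift = Math.max(0, Math.min(attempt - 1, 20)); // bound the shift to avoid overflow
        return Math.min(baseMillis << shift, maxMillis);
    }

    static <T> T call(Supplier<T> action, int maxAttempts, long baseMillis, long maxMillis) {
        if (maxAttempts < 1) {
            throw new IllegalArgumentException("maxAttempts must be >= 1");
        }
        RuntimeException lastFailure = null;
        for (int attempt = 1; attempt <= maxAttempts; attempt++) {
            try {
                return action.get();
            } catch (RuntimeException e) {
                lastFailure = e;
                if (attempt < maxAttempts) {
                    sleep(delayMillis(attempt, baseMillis, maxMillis));
                }
            }
        }
        // Every attempt failed: surface the most recent failure.
        throw lastFailure;
    }

    private static void sleep(long millis) {
        try {
            Thread.sleep(millis);
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException("Interrupted while backing off", e);
        }
    }
}
```

A call such as RetryWithBackoff.call(() -> chatService.chat(message), 4, 500, 8000) would wait 500 ms, 1 s, and 2 s between its four attempts. The sketch retries on any RuntimeException; narrowing that to rate-limit and transient errors, and adding jitter, are reasonable refinements.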

Quick Troubleshooting

| Problem | Diagnostic | Fix |
| --- | --- | --- |
| API key invalid | Check error message | Verify OPENAI_API_KEY env var |
| Rate limit exceeded | 429 error | Add retry logic, reduce requests |
| Timeout on large prompts | Connection timeout | Use streaming, increase timeout |
| Embeddings dimension mismatch | Vector store error | Match embedding model dimensions |
| Structured output fails | JSON parse error | Simplify schema, add examples |

Reference Documentation

Spring AI Reference

