Overview

This module adds support for selected Ollama models.

Maven Coordinates

In addition to the core dependencies of the Helidon integration with LangChain4j, you must add the following dependency:

<dependency>
    <groupId>io.helidon.integrations.langchain4j.providers</groupId>
    <artifactId>helidon-integrations-langchain4j-providers-ollama</artifactId>
</dependency>

Components

OllamaChatModel

To automatically create OllamaChatModel and add it to the service registry, add the following lines to application.yaml:

langchain4j:
  providers:
    ollama:
      base-url: "http://localhost:11434"

  models:
    ollama-chat-model:
      provider: ollama
      model-name: "llama3.1"

If enabled is set to false, the configuration is ignored, and the component is not created.
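
As a minimal sketch of the behavior described above, the following fragment keeps the chat model configuration in place but switches the component off. It assumes that enabled is set on the model entry next to model-name; the placement is an assumption and is not confirmed by the example above.

```yaml
langchain4j:
  providers:
    ollama:
      base-url: "http://localhost:11434"

  models:
    ollama-chat-model:
      provider: ollama
      model-name: "llama3.1"
      # Assumed placement: with enabled set to false, this configuration
      # is ignored and no OllamaChatModel is created.
      enabled: false
```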

Full list of configuration properties:

| Key | Type | Description |
|---|---|---|
| base-url | string | The base URL for the Ollama API. If not present, the default value supplied from LangChain4j is used. |
| enabled | boolean | If set to false, the component will not be available even if configured. |
| format | string | Specifies the structure or style of the text produced by the model, such as plain text, JSON, or a custom format. |
| log-requests | boolean | Whether to log API requests. |
| log-responses | boolean | Whether to log API responses. |
| max-retries | integer | The maximum number of retries for failed API requests. |
| model-name | string | The model name to use. |
| num-predict | int | Length of the output generated by the model. |
| repeat-penalty | double | The penalty applied to repeated tokens during text generation. Higher values discourage the model from generating the same token multiple times, promoting more varied and natural output. A value of 1.0 applies no penalty (default behavior), while values greater than 1.0 reduce the likelihood of repetition. Excessively high values may overly penalize common phrases, leading to unnatural results. |
| seed | int | The seed for the random number generator used by the model. |
| stop | string[] | List of sequences where the API will stop generating further tokens. |
| temperature | double | Sampling temperature to use, between 0 and 2. Higher values make the output more random, while lower values make it more focused and deterministic. |
| timeout | duration | The timeout setting for API requests. See here for the format. |
| top-k | int | Limits the token pool to the topK highest-probability tokens, controlling the balance between deterministic and diverse outputs. A smaller topK (e.g., 1) results in deterministic output, while a larger value (e.g., 50) allows for more variability and creativity. |
| top-p | double | Nucleus sampling value, where the model considers the results of the tokens with top_p probability mass. |
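
To illustrate how several of the sampling properties might be combined, the following sketch configures a chat model that leans toward focused, repeatable output. It assumes the properties listed above can be set on the model entry next to model-name. All values are arbitrary examples rather than recommendations, and the stop sequence "User:" is hypothetical.

```yaml
langchain4j:
  providers:
    ollama:
      base-url: "http://localhost:11434"

  models:
    ollama-chat-model:
      provider: ollama
      model-name: "llama3.1"
      # Illustrative values only; placement under the model entry is an assumption.
      temperature: 0.2      # lower values make output more focused and deterministic
      top-k: 10             # limit the token pool to the 10 highest-probability tokens
      top-p: 0.9            # nucleus sampling probability mass
      repeat-penalty: 1.1   # values greater than 1.0 reduce the likelihood of repetition
      seed: 42              # fixed seed for the model's random number generator
      num-predict: 256      # length of the generated output
      stop:                 # generation stops when one of these sequences appears
        - "User:"
```

Lowering temperature and top-k together narrows the choice of tokens, while a fixed seed makes repeated runs easier to compare.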

OllamaEmbeddingModel

To automatically create OllamaEmbeddingModel and add it to the service registry, add the following lines to application.yaml:

langchain4j:
  providers:
    ollama:
      base-url: "http://localhost:11434"

  models:
    ollama-embedding-model:
      provider: ollama
      model-name: "nomic-embed-text"

If enabled is set to false, the configuration is ignored, and the component is not created.

Full list of configuration properties:

| Key | Type | Description |
|---|---|---|
| base-url | string | The base URL for the Ollama API. If not present, the default value supplied from LangChain4j is used. |
| enabled | boolean | If set to false, the component will not be available even if configured. |
| log-requests | boolean | Whether to log API requests. |
| log-responses | boolean | Whether to log API responses. |
| max-retries | integer | The maximum number of retries for failed API requests. |
| model-name | string | The model name to use. |
| timeout | duration | The timeout setting for API requests. See here for the format. |
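
The request-handling properties can be sketched in the same way. The fragment below assumes they are set on the model entry, and that the timeout uses ISO-8601 duration syntax (PT30S meaning 30 seconds). Both points are assumptions and should be confirmed against the format reference mentioned in the table.

```yaml
langchain4j:
  providers:
    ollama:
      base-url: "http://localhost:11434"

  models:
    ollama-embedding-model:
      provider: ollama
      model-name: "nomic-embed-text"
      # Illustrative values only; placement and duration syntax are assumptions.
      timeout: "PT30S"      # give up on a single API request after 30 seconds
      max-retries: 3        # retry a failed API request up to 3 times
      log-requests: true    # log outgoing API requests
      log-responses: false  # do not log API responses
```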

OllamaLanguageModel

To automatically create OllamaLanguageModel and add it to the service registry, add the following lines to application.yaml:

langchain4j:
  providers:
    ollama:
      base-url: "http://localhost:11434"

  models:
    ollama-language-model:
      provider: ollama
      model-name: "llama3.1"

If enabled is set to false, the configuration is ignored, and the component is not created.

Full list of configuration properties:

| Key | Type | Description |
|---|---|---|
| base-url | string | The base URL for the Ollama API. If not present, the default value supplied from LangChain4j is used. |
| enabled | boolean | If set to false, the component will not be available even if configured. |
| format | string | Specifies the structure or style of the text produced by the model, such as plain text, JSON, or a custom format. |
| log-requests | boolean | Whether to log API requests. |
| log-responses | boolean | Whether to log API responses. |
| max-retries | integer | The maximum number of retries for failed API requests. |
| model-name | string | The model name to use. |
| num-predict | int | Length of the output generated by the model. |
| repeat-penalty | double | The penalty applied to repeated tokens during text generation. Higher values discourage the model from generating the same token multiple times, promoting more varied and natural output. A value of 1.0 applies no penalty (default behavior), while values greater than 1.0 reduce the likelihood of repetition. Excessively high values may overly penalize common phrases, leading to unnatural results. |
| seed | int | The seed for the random number generator used by the model. |
| stop | string[] | List of sequences where the API will stop generating further tokens. |
| temperature | double | Sampling temperature to use, between 0 and 2. Higher values make the output more random, while lower values make it more focused and deterministic. |
| timeout | duration | The timeout setting for API requests. See here for the format. |
| top-k | int | Limits the token pool to the topK highest-probability tokens, controlling the balance between deterministic and diverse outputs. A smaller topK (e.g., 1) results in deterministic output, while a larger value (e.g., 50) allows for more variability and creativity. |
| top-p | double | Nucleus sampling value, where the model considers the results of the tokens with top_p probability mass. |
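
For output that another program will parse, the format property may be the most relevant one. The sketch below requests JSON output and caps the output length. It assumes that "json" is an accepted format value and that the properties are set on the model entry; neither assumption is confirmed by the table above.

```yaml
langchain4j:
  providers:
    ollama:
      base-url: "http://localhost:11434"

  models:
    ollama-language-model:
      provider: ollama
      model-name: "llama3.1"
      # Illustrative values only; the "json" value and the placement are assumptions.
      format: "json"        # ask the model to produce JSON rather than plain text
      num-predict: 512      # length of the generated output
      temperature: 0.0      # most focused and deterministic sampling
```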

OllamaStreamingChatModel

To automatically create OllamaStreamingChatModel and add it to the service registry, add the following lines to application.yaml:

langchain4j:
  providers:
    ollama:
      base-url: "http://localhost:11434"

  models:
    ollama-streaming-chat-model:
      provider: ollama
      model-name: "llama3.1"

If enabled is set to false, the configuration is ignored, and the component is not created.

Full list of configuration properties:

| Key | Type | Description |
|---|---|---|
| base-url | string | The base URL for the Ollama API. If not present, the default value supplied from LangChain4j is used. |
| enabled | boolean | If set to false, the component will not be available even if configured. |
| format | string | Specifies the structure or style of the text produced by the model, such as plain text, JSON, or a custom format. |
| log-requests | boolean | Whether to log API requests. |
| log-responses | boolean | Whether to log API responses. |
| max-retries | integer | The maximum number of retries for failed API requests. |
| model-name | string | The model name to use. |
| num-predict | int | Length of the output generated by the model. |
| repeat-penalty | double | The penalty applied to repeated tokens during text generation. Higher values discourage the model from generating the same token multiple times, promoting more varied and natural output. A value of 1.0 applies no penalty (default behavior), while values greater than 1.0 reduce the likelihood of repetition. Excessively high values may overly penalize common phrases, leading to unnatural results. |
| seed | int | The seed for the random number generator used by the model. |
| stop | string[] | List of sequences where the API will stop generating further tokens. |
| temperature | double | Sampling temperature to use, between 0 and 2. Higher values make the output more random, while lower values make it more focused and deterministic. |
| timeout | duration | The timeout setting for API requests. See here for the format. |
| top-k | int | Limits the token pool to the topK highest-probability tokens, controlling the balance between deterministic and diverse outputs. A smaller topK (e.g., 1) results in deterministic output, while a larger value (e.g., 50) allows for more variability and creativity. |
| top-p | double | Nucleus sampling value, where the model considers the results of the tokens with top_p probability mass. |

Additional Information