Ollama

Overview

This module adds support for selected Ollama models.

Maven Coordinates

In addition to the core dependencies of the Helidon integration with LangChain4j, add the following dependency:

<dependency>
    <groupId>io.helidon.integrations.langchain4j.providers</groupId>
    <artifactId>helidon-integrations-langchain4j-providers-ollama</artifactId>
</dependency>

Components

OllamaChatModel

To create OllamaChatModel automatically and add it to the service registry, add the following lines to application.yaml:

langchain4j:
  providers:
    ollama:
      base-url: "http://localhost:11434"

  models:
    ollama-chat-model:
      provider: ollama
      model-name: "llama3.1"

If `enabled` is set to `false`, the configuration is ignored and the component is not created.

Full list of configuration properties:

Key	Type	Description
`base-url`	string	The base URL for the Ollama API. If not present, the default value supplied from LangChain4j is used.
`enabled`	boolean	If set to false, the component will not be available even if configured.
`format`	string	Specifies the structure or style of the text produced by the model, such as plain text, JSON, or a custom format.
`log-requests`	boolean	Whether to log API requests.
`log-responses`	boolean	Whether to log API responses.
`max-retries`	integer	The maximum number of retries for failed API requests.
`model-name`	string	The model name to use.
`num-predict`	int	Maximum number of tokens the model generates in its output.
`repeat-penalty`	double	The penalty applied to repeated tokens during text generation. Higher values discourage the model from generating the same token multiple times, promoting more varied and natural output. A value of `1.0` applies no penalty (default behavior), while values greater than `1.0` reduce the likelihood of repetition. Excessively high values may overly penalize common phrases, leading to unnatural results.
`seed`	int	The seed for the random number generator used by the model.
`stop`	string[]	List of sequences where the API will stop generating further tokens.
`temperature`	double	Sampling temperature to use, between 0 and 2. Higher values make the output more random, while lower values make it more focused and deterministic.
`timeout`	duration	The timeout setting for API requests, written as an ISO-8601 duration as accepted by `java.time.Duration.parse` (for example, `PT30S`).
`top-k`	int	Limits the token pool to the `topK` highest-probability tokens, controlling the balance between deterministic and diverse outputs. A smaller `topK` (e.g., 1) results in deterministic output, while a larger value (e.g., 50) allows for more variability and creativity.
`top-p`	double	Nucleus sampling threshold. The model samples only from the smallest set of highest-probability tokens whose cumulative probability reaches `top_p`.
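
To make the interplay of `temperature`, `top-k`, and `top-p` concrete, the following is a minimal, self-contained sketch of how such settings are conventionally applied to a toy list of token scores. It illustrates the general technique only and is not the Ollama or LangChain4j implementation; the class `SamplingSketch`, its method names, and the sample scores are hypothetical.

```java
import java.util.Arrays;
import java.util.Comparator;

final class SamplingSketch {

    /** Softmax with temperature (must be > 0): lower values sharpen the distribution. */
    static double[] softmax(double[] logits, double temperature) {
        double max = Arrays.stream(logits).max().orElseThrow();
        double[] probs = new double[logits.length];
        double sum = 0.0;
        for (int i = 0; i < logits.length; i++) {
            // Subtracting the maximum keeps the exponentials numerically stable.
            probs[i] = Math.exp((logits[i] - max) / temperature);
            sum += probs[i];
        }
        for (int i = 0; i < probs.length; i++) {
            probs[i] /= sum;
        }
        return probs;
    }

    /** Token indices that survive top-k and then top-p filtering, most probable first. */
    static int[] candidates(double[] probs, int topK, double topP) {
        Integer[] order = new Integer[probs.length];
        for (int i = 0; i < order.length; i++) {
            order[i] = i;
        }
        // Sort indices by descending probability.
        Arrays.sort(order, Comparator.comparingDouble((Integer i) -> -probs[i]));

        int limit = Math.min(topK, order.length); // top-k: at most k tokens
        int kept = 0;
        double mass = 0.0;
        while (kept < limit) {
            mass += probs[order[kept]];
            kept++;
            if (mass >= topP) {                   // top-p: stop once the mass is covered
                break;
            }
        }
        int[] result = new int[kept];
        for (int i = 0; i < kept; i++) {
            result[i] = order[i];
        }
        return result;
    }

    public static void main(String[] args) {
        double[] logits = {2.0, 1.0, 0.5, -1.0};  // hypothetical token scores
        double[] probs = softmax(logits, 0.8);
        System.out.println(Arrays.toString(candidates(probs, 1, 1.0)));  // [0]
        System.out.println(Arrays.toString(candidates(probs, 50, 0.9))); // [0, 1, 2]
    }
}
```

With `top-k` set to 1 only the single most likely token remains, which is why the output becomes deterministic. A lower `top-p` shrinks the candidate set in a similar way, but in terms of probability mass rather than a fixed count.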

OllamaEmbeddingModel

To create OllamaEmbeddingModel automatically and add it to the service registry, add the following lines to application.yaml:

langchain4j:
  providers:
    ollama:
      base-url: "http://localhost:11434"

  models:
    ollama-embedding-model:
      provider: ollama
      model-name: "nomic-embed-text"

If `enabled` is set to `false`, the configuration is ignored and the component is not created.

Full list of configuration properties:

Key	Type	Description
`base-url`	string	The base URL for the Ollama API. If not present, the default value supplied from LangChain4j is used.
`enabled`	boolean	If set to false, the component will not be available even if configured.
`log-requests`	boolean	Whether to log API requests.
`log-responses`	boolean	Whether to log API responses.
`max-retries`	integer	The maximum number of retries for failed API requests.
`model-name`	string	The model name to use.
`timeout`	duration	The timeout setting for API requests, written as an ISO-8601 duration as accepted by `java.time.Duration.parse` (for example, `PT30S`).
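
A quick way to check a candidate `timeout` value before putting it into application.yaml, assuming the value is parsed in the ISO-8601 form that `java.time.Duration.parse` accepts. The class `TimeoutFormatCheck` and the method `parseTimeout` are hypothetical helpers used only for this sketch.

```java
import java.time.Duration;
import java.time.format.DateTimeParseException;
import java.util.Optional;

final class TimeoutFormatCheck {

    /** Parses an ISO-8601 duration such as "PT30S"; empty if the text is not in that format. */
    static Optional<Duration> parseTimeout(String text) {
        try {
            return Optional.of(Duration.parse(text));
        } catch (DateTimeParseException e) {
            return Optional.empty();
        }
    }

    public static void main(String[] args) {
        // "30s" is a common shorthand, but it is not valid ISO-8601.
        for (String candidate : new String[] {"PT30S", "PT1M30S", "30s"}) {
            System.out.println(candidate + " -> " + parseTimeout(candidate));
        }
    }
}
```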

OllamaLanguageModel

To create OllamaLanguageModel automatically and add it to the service registry, add the following lines to application.yaml:

langchain4j:
  providers:
    ollama:
      base-url: "http://localhost:11434"

  models:
    ollama-language-model:
      provider: ollama
      model-name: "llama3.1"

If `enabled` is set to `false`, the configuration is ignored and the component is not created.

Full list of configuration properties:

Key	Type	Description
`base-url`	string	The base URL for the Ollama API. If not present, the default value supplied from LangChain4j is used.
`enabled`	boolean	If set to false, the component will not be available even if configured.
`format`	string	Specifies the structure or style of the text produced by the model, such as plain text, JSON, or a custom format.
`log-requests`	boolean	Whether to log API requests.
`log-responses`	boolean	Whether to log API responses.
`max-retries`	integer	The maximum number of retries for failed API requests.
`model-name`	string	The model name to use.
`num-predict`	int	Maximum number of tokens the model generates in its output.
`repeat-penalty`	double	The penalty applied to repeated tokens during text generation. Higher values discourage the model from generating the same token multiple times, promoting more varied and natural output. A value of `1.0` applies no penalty (default behavior), while values greater than `1.0` reduce the likelihood of repetition. Excessively high values may overly penalize common phrases, leading to unnatural results.
`seed`	int	The seed for the random number generator used by the model.
`stop`	string[]	List of sequences where the API will stop generating further tokens.
`temperature`	double	Sampling temperature to use, between 0 and 2. Higher values make the output more random, while lower values make it more focused and deterministic.
`timeout`	duration	The timeout setting for API requests, written as an ISO-8601 duration as accepted by `java.time.Duration.parse` (for example, `PT30S`).
`top-k`	int	Limits the token pool to the `topK` highest-probability tokens, controlling the balance between deterministic and diverse outputs. A smaller `topK` (e.g., 1) results in deterministic output, while a larger value (e.g., 50) allows for more variability and creativity.
`top-p`	double	Nucleus sampling threshold. The model samples only from the smallest set of highest-probability tokens whose cumulative probability reaches `top_p`.
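
The effect of `repeat-penalty` can be pictured with one common formulation of the technique: the score of every token that has already appeared is divided by the penalty when positive and multiplied by it when negative, so the token becomes less likely either way. The sketch below is an assumption about the general approach, not a description of how Ollama implements it; the class `RepeatPenaltySketch` and the sample scores are hypothetical.

```java
import java.util.Set;

final class RepeatPenaltySketch {

    /** Returns a copy of the scores with already-seen tokens made less likely. */
    static double[] apply(double[] logits, Set<Integer> seenTokenIds, double penalty) {
        double[] adjusted = logits.clone();
        for (int id : seenTokenIds) {
            if (id < 0 || id >= adjusted.length) {
                continue; // ignore ids outside the toy vocabulary
            }
            // Positive scores shrink, negative scores grow more negative.
            adjusted[id] = adjusted[id] > 0 ? adjusted[id] / penalty : adjusted[id] * penalty;
        }
        return adjusted;
    }

    public static void main(String[] args) {
        double[] logits = {2.0, -1.0, 0.5};      // hypothetical token scores
        double[] penalized = apply(logits, Set.of(0, 1), 1.5);
        System.out.println(java.util.Arrays.toString(penalized));
    }
}
```

With a penalty of `1.0` the scores are returned unchanged, which matches the "no penalty" behavior described in the table.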

OllamaStreamingChatModel

To create OllamaStreamingChatModel automatically and add it to the service registry, add the following lines to application.yaml:

langchain4j:
  providers:
    ollama:
      base-url: "http://localhost:11434"

  models:
    ollama-streaming-chat-model:
      provider: ollama
      model-name: "llama3.1"

If `enabled` is set to `false`, the configuration is ignored and the component is not created.

Full list of configuration properties:

Key	Type	Description
`base-url`	string	The base URL for the Ollama API. If not present, the default value supplied from LangChain4j is used.
`enabled`	boolean	If set to false, the component will not be available even if configured.
`format`	string	Specifies the structure or style of the text produced by the model, such as plain text, JSON, or a custom format.
`log-requests`	boolean	Whether to log API requests.
`log-responses`	boolean	Whether to log API responses.
`max-retries`	integer	The maximum number of retries for failed API requests.
`model-name`	string	The model name to use.
`num-predict`	int	Maximum number of tokens the model generates in its output.
`repeat-penalty`	double	The penalty applied to repeated tokens during text generation. Higher values discourage the model from generating the same token multiple times, promoting more varied and natural output. A value of `1.0` applies no penalty (default behavior), while values greater than `1.0` reduce the likelihood of repetition. Excessively high values may overly penalize common phrases, leading to unnatural results.
`seed`	int	The seed for the random number generator used by the model.
`stop`	string[]	List of sequences where the API will stop generating further tokens.
`temperature`	double	Sampling temperature to use, between 0 and 2. Higher values make the output more random, while lower values make it more focused and deterministic.
`timeout`	duration	The timeout setting for API requests, written as an ISO-8601 duration as accepted by `java.time.Duration.parse` (for example, `PT30S`).
`top-k`	int	Limits the token pool to the `topK` highest-probability tokens, controlling the balance between deterministic and diverse outputs. A smaller `topK` (e.g., 1) results in deterministic output, while a larger value (e.g., 50) allows for more variability and creativity.
`top-p`	double	Nucleus sampling threshold. The model samples only from the smallest set of highest-probability tokens whose cumulative probability reaches `top_p`.
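
For a streamed response, the visible effect of `stop` is that the text ends just before the first stop sequence, even when that sequence is split across several chunks. The stopping itself happens on the API side; the sketch below only imitates the outcome on a list of chunks to show why matching has to run over the accumulated text rather than over each chunk. The class `StopSequenceSketch`, its methods, and the sample chunks are hypothetical.

```java
import java.util.List;

final class StopSequenceSketch {

    /** Cuts the text at the earliest occurrence of any stop sequence, if there is one. */
    static String truncateAtStop(String text, List<String> stops) {
        int cut = -1;
        for (String stop : stops) {
            if (stop.isEmpty()) {
                continue; // an empty stop sequence would match everywhere
            }
            int idx = text.indexOf(stop);
            if (idx >= 0 && (cut < 0 || idx < cut)) {
                cut = idx;
            }
        }
        return cut < 0 ? text : text.substring(0, cut);
    }

    /** Accumulates chunks and stops as soon as a stop sequence shows up in the buffer. */
    static String consume(List<String> chunks, List<String> stops) {
        StringBuilder buffer = new StringBuilder();
        for (String chunk : chunks) {
            buffer.append(chunk);
            String truncated = truncateAtStop(buffer.toString(), stops);
            if (truncated.length() < buffer.length()) {
                return truncated; // stop sequence found, discard the rest
            }
        }
        return buffer.toString();
    }

    public static void main(String[] args) {
        // The stop sequence "\nUser:" is split across the last two chunks.
        List<String> chunks = List.of("Hello", " wor", "ld\nUs", "er: hi");
        System.out.println(consume(chunks, List.of("\nUser:"))); // Hello world
    }
}
```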
