Overview
This module adds support for selected Oracle Cloud Infrastructure GenAI models.
Maven Coordinates
In addition to the Helidon integration with LangChain4j core dependencies, you must add the following:
<dependency>
    <groupId>io.helidon.integrations.langchain4j.providers</groupId>
    <artifactId>helidon-integrations-langchain4j-providers-oci-genai</artifactId>
</dependency>
Authentication
The integration uses the OCI SDK authentication provider bean from the service registry. The simplest way to configure it is to add the Helidon OCI integration:
<dependency>
    <groupId>io.helidon.integrations.oci</groupId>
    <artifactId>helidon-integrations-oci</artifactId>
</dependency>
<!-- Jakartified OCI SDK HTTP client -->
<dependency>
    <groupId>com.oracle.oci.sdk</groupId>
    <artifactId>oci-java-sdk-common-httpclient-jersey3</artifactId>
    <scope>runtime</scope>
</dependency>
Helidon OCI integration makes OCI Authentication provider available as a Helidon service registry bean so LangChain4j OCI GenAI can automatically discover it.
Example of the OCI-specific configuration file oci-config.yaml:
helidon.oci:
  # "config-file" value can instruct integration to
  # load values from `~/.oci/config` file
  authentication-method: "config"
  authentication:
    config:
      region: eu-frankfurt-1
      fingerprint: "b7:a7:9c:7f:57:a7:74:ad:c2:fa:d4:31:06:b5:02:f5"
      tenant-id: "ocid1.tenancy.oc1...."
      user-id: "ocid1.user.oc1....."
      private-key:
        path: "/secrets/oci_ai_api_key.pem"
More authentication methods are available, such as oke-workload-identity or resource-principal. For example, the config-file authentication method instructs the integration to use the ~/.oci/config file:
helidon.oci:
  authentication-method: "config-file"
All possible OCI configuration properties are documented at OCI Configuration.
More general information about Helidon OCI authentication integration can be found in Helidon OCI integration.
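As a sketch of switching to one of the other methods named above, the following assumes that resource-principal is selected the same way as config-file, through authentication-method alone. Whether it needs further properties is not stated here, so check the OCI Configuration reference before relying on it.

```yaml
# Assumption: "resource-principal" needs no extra keys, mirroring the
# "config-file" example above.
helidon.oci:
  authentication-method: "resource-principal"
```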
Components
OciGenAiChatModel
To automatically create and add OciGenAiChatModel to the service registry add the following lines to application.yaml:
langchain4j:
  providers:
    oci-gen-ai:
      compartment-id: "ocid1.tenancy.oc1...."
      region: EU_FRANKFURT_1
  models:
    oci-genai-chat-model:
      provider: oci-gen-ai
      model-name: meta.llama-3.3-70b-instruct
If enabled is set to false, the configuration is ignored, and the component is not created.
Full list of configuration properties:
| Key | Type | Description |
|---|---|---|
enabled | boolean | If set to false, the component will not be available even if configured. |
model-name | string | The model name or model’s OCID to use. |
compartment-id | OCID | OCI Compartment OCID |
region | enum | Explicit region. If not configured, the current one is resolved. |
auth-provider | injected bean | The default bean is injected if it exists; a named bean can be configured with auth-provider.service-registry.named: beanName |
gen-ai-client | injected bean | Manually configured OCI SDK GenAi client. When set, values provided with region and auth-provider are ignored. The default bean is injected if it exists; a named bean can be configured with gen-ai-client.service-registry.named: beanName |
serving-type | enum | The model’s serving mode, which is either on-demand serving or dedicated serving. |
top-k | int | The maximum number of top-probability tokens to consider when generating text. |
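top-p | double between 0 and 1 | If set to a probability 0.0 < p < 1.0, it ensures that only the most likely tokens, with total probability mass of p, are considered for generation at each step. To eliminate tokens with low likelihood, assign p a minimum percentage for the next token’s likelihood. For example, when p is set to 0.75, the model eliminates the bottom 25 percent for the next token. Set to 1 to consider all tokens and set to 0 to disable. If both k and p are enabled, p acts after k. |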
seed | int | The seed for the random number generator used by the model. |
temperature | double > 0 | A number that sets the randomness of the generated output. A lower temperature means less random generations. Use lower numbers for tasks with a correct answer, such as question answering or summarizing. High temperatures can generate hallucinations or factually incorrect information. Start with temperatures lower than 1.0 and increase the temperature for more creative outputs as you regenerate the prompts to refine the outputs. Default is 1. |
presence-penalty | double between -2 and 2 | To reduce repetitiveness of generated tokens, this number penalizes new tokens based on whether they’ve appeared in the generated text so far. Values > 0 encourage the model to use new tokens and values < 0 encourage the model to repeat tokens. Similar to frequency penalty, a penalty is applied to previously present tokens, except that this penalty is applied equally to all tokens that have already appeared, regardless of how many times they’ve appeared. Set to 0 to disable. Default is 0. |
stop | list of strings | List of strings that stop the generation if they are generated for the response text. The returned output will not contain the stop strings. |
max-tokens | integer > 1 | The maximum number of tokens that can be generated per output sequence. The token count of your prompt plus max-tokens must not exceed the model’s context length. If max-tokens is not set, the model’s full context length may be used. |
frequency-penalty | double between -2 and 2 | To reduce repetitiveness of generated tokens, this number penalizes new tokens based on their frequency in the generated text so far. Values > 0 encourage the model to use new tokens and values < 0 encourage the model to repeat tokens. Set to 0 to disable. Default is 0. |
num-generations | int between 1 and 5 | The number of generated texts that will be returned. |
log-probs | int > 0 | Includes the logarithmic probabilities for the most likely output tokens and the chosen tokens. For example, if the log probability is 5, the API returns a list of the 5 most likely tokens. The API returns the log probability of the sampled token, so there might be up to logprobs+1 elements in the response. |
logit-bias | json | Modifies the likelihood of specified tokens that appear in the completion. Example: {"6395": 2, "8134": 1, "21943": 0.5, "5923": -100} |
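To illustrate how the properties in the table might be combined, here is a sketch that extends the example above with a few sampling settings. Placing them under the model entry rather than under the provider is an assumption, and the values are illustrative, not recommendations.

```yaml
langchain4j:
  providers:
    oci-gen-ai:
      compartment-id: "ocid1.tenancy.oc1...."
      region: EU_FRANKFURT_1
  models:
    oci-genai-chat-model:
      provider: oci-gen-ai
      model-name: meta.llama-3.3-70b-instruct
      # Illustrative values only; the keys are taken from the table above.
      temperature: 0.3        # lower value, less random output
      top-p: 0.75
      max-tokens: 512
      frequency-penalty: 0.5  # values > 0 discourage repeated tokens
      stop:
        - "###"
```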
OciGenAiStreamingChatModel
To automatically create and add OciGenAiStreamingChatModel to the service registry add the following lines to application.yaml:
langchain4j:
  providers:
    oci-gen-ai:
      compartment-id: "ocid1.tenancy.oc1...."
      region: EU_FRANKFURT_1
  models:
    oci-genai-streaming-chat-model:
      provider: oci-gen-ai
      model-name: meta.llama-3.3-70b-instruct
If enabled is set to false, the configuration is ignored, and the component is not created.
Full list of configuration properties:
| Key | Type | Description |
|---|---|---|
enabled | boolean | If set to false, the component will not be available even if configured. |
model-name | string | The model name or model’s OCID to use. |
compartment-id | OCID | OCI Compartment OCID |
region | enum | Explicit region. If not configured, the current one is resolved. |
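auth-provider | injected bean | The default bean is injected if it exists; a named bean can be configured with auth-provider.service-registry.named: beanName |
gen-ai-client | injected bean | Manually configured OCI SDK GenAi client. When set, values provided with region and auth-provider are ignored. The default bean is injected if it exists; a named bean can be configured with gen-ai-client.service-registry.named: beanName |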
serving-type | enum | The model’s serving mode, which is either on-demand serving or dedicated serving. |
top-k | int | The maximum number of top-probability tokens to consider when generating text. |
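top-p | double between 0 and 1 | If set to a probability 0.0 < p < 1.0, it ensures that only the most likely tokens, with total probability mass of p, are considered for generation at each step. To eliminate tokens with low likelihood, assign p a minimum percentage for the next token’s likelihood. For example, when p is set to 0.75, the model eliminates the bottom 25 percent for the next token. Set to 1 to consider all tokens and set to 0 to disable. If both k and p are enabled, p acts after k. |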
seed | int | The seed for the random number generator used by the model. |
temperature | double > 0 | A number that sets the randomness of the generated output. A lower temperature means less random generations. Use lower numbers for tasks with a correct answer, such as question answering or summarizing. High temperatures can generate hallucinations or factually incorrect information. Start with temperatures lower than 1.0 and increase the temperature for more creative outputs as you regenerate the prompts to refine the outputs. Default is 1. |
presence-penalty | double between -2 and 2 | To reduce repetitiveness of generated tokens, this number penalizes new tokens based on whether they’ve appeared in the generated text so far. Values > 0 encourage the model to use new tokens and values < 0 encourage the model to repeat tokens. Similar to frequency penalty, a penalty is applied to previously present tokens, except that this penalty is applied equally to all tokens that have already appeared, regardless of how many times they’ve appeared. Set to 0 to disable. Default is 0. |
stop | list of strings | List of strings that stop the generation if they are generated for the response text. The returned output will not contain the stop strings. |
max-tokens | integer > 1 | The maximum number of tokens that can be generated per output sequence. The token count of your prompt plus max-tokens must not exceed the model’s context length. If max-tokens is not set, the model’s full context length may be used. |
frequency-penalty | double between -2 and 2 | To reduce repetitiveness of generated tokens, this number penalizes new tokens based on their frequency in the generated text so far. Values > 0 encourage the model to use new tokens and values < 0 encourage the model to repeat tokens. Set to 0 to disable. Default is 0. |
num-generations | int between 1 and 5 | The number of generated texts that will be returned. |
log-probs | int > 0 | Includes the logarithmic probabilities for the most likely output tokens and the chosen tokens. For example, if the log probability is 5, the API returns a list of the 5 most likely tokens. The API returns the log probability of the sampled token, so there might be up to logprobs+1 elements in the response. |
logit-bias | json | Modifies the likelihood of specified tokens that appear in the completion. Example: {"6395": 2, "8134": 1, "21943": 0.5, "5923": -100} |
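The auth-provider row of the OciGenAiChatModel table describes selecting a named bean with auth-provider.service-registry.named. A sketch of what that could look like for the streaming model follows. The bean name my-oci-auth is hypothetical, and both the placement under the provider and the expansion of the dotted key into nested YAML are assumptions about how the configuration is read.

```yaml
langchain4j:
  providers:
    oci-gen-ai:
      compartment-id: "ocid1.tenancy.oc1...."
      region: EU_FRANKFURT_1
      # Hypothetical bean name; use the name your authentication
      # provider is actually registered under.
      auth-provider:
        service-registry:
          named: "my-oci-auth"
  models:
    oci-genai-streaming-chat-model:
      provider: oci-gen-ai
      model-name: meta.llama-3.3-70b-instruct
      # Set to false to keep this configuration but skip creating the component.
      enabled: true
```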
OciGenAiCohereChatModel
To automatically create and add OciGenAiCohereChatModel to the service registry add the following lines to application.yaml:
langchain4j:
  providers:
    oci-gen-ai-cohere:
      compartment-id: "ocid1.tenancy.oc1...."
      region: EU_FRANKFURT_1
  models:
    oci-genai-cohere-chat-model:
      provider: oci-gen-ai-cohere
      model-name: meta.llama-3.3-70b-instruct
If enabled is set to false, the configuration is ignored, and the component is not created.
Full list of configuration properties:
| Key | Type | Description |
|---|---|---|
enabled | boolean | If set to false, the component will not be available even if configured. |
model-name | string | The model name or model’s OCID to use. |
compartment-id | OCID | OCI Compartment OCID |
region | enum | Explicit region. If not configured, the current one is resolved. |
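auth-provider | injected bean | The default bean is injected if it exists; a named bean can be configured with auth-provider.service-registry.named: beanName |
gen-ai-client | injected bean | Manually configured OCI SDK GenAi client. When set, values provided with region and auth-provider are ignored. The default bean is injected if it exists; a named bean can be configured with gen-ai-client.service-registry.named: beanName |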
serving-type | enum | The model’s serving mode, which is either on-demand serving or dedicated serving. |
top-k | int | The maximum number of top-probability tokens to consider when generating text. |
top-p | double between 0 and 1 | If set to a probability 0.0 < p < 1.0, it ensures that only the most likely tokens, with total probability mass of p, are considered for generation at each step. |
seed | int | The seed for the random number generator used by the model. |
temperature | double > 0 | A number that sets the randomness of the generated output. A lower temperature means less random generations. Use lower numbers for tasks with a correct answer, such as question answering or summarizing. High temperatures can generate hallucinations or factually incorrect information. Start with temperatures lower than 1.0 and increase the temperature for more creative outputs as you regenerate the prompts to refine the outputs. Default is 1. |
presence-penalty | double between -2 and 2 | To reduce repetitiveness of generated tokens, this number penalizes new tokens based on whether they’ve appeared in the generated text so far. Values > 0 encourage the model to use new tokens and values < 0 encourage the model to repeat tokens. Similar to frequency penalty, a penalty is applied to previously present tokens, except that this penalty is applied equally to all tokens that have already appeared, regardless of how many times they’ve appeared. Set to 0 to disable. Default is 0. |
stop | list of strings | List of strings that stop the generation if they are generated for the response text. The returned output will not contain the stop strings. |
max-tokens | integer > 1 | The maximum number of tokens that can be generated per output sequence. The token count of your prompt plus max-tokens must not exceed the model’s context length. If max-tokens is not set, the model’s full context length may be used. |
frequency-penalty | double between -2 and 2 | To reduce repetitiveness of generated tokens, this number penalizes new tokens based on their frequency in the generated text so far. Values > 0 encourage the model to use new tokens and values < 0 encourage the model to repeat tokens. Set to 0 to disable. Default is 0. |
is-raw-prompting | boolean | When enabled, the user’s message will be sent to the model without any preprocessing. Default is false. |
citation-quality | enum | When FAST is selected, citations are generated at the same time as the text output and the request will be completed sooner. May result in less accurate citations. Default is ACCURATE. |
preamble-override | string | If specified, the default Cohere preamble is replaced with the provided preamble. A preamble is an initial guideline message that can change the model’s overall chat behavior and conversation style. Default preambles vary for different models. Example: You are a travel advisor. Answer with a pirate tone. |
max-input-tokens | int | The maximum number of input tokens to send to the model. If not specified, max_input_tokens is the model’s context length limit minus a small buffer. |
prompt-truncation | enum | Defaults to OFF. Dictates how the prompt will be constructed. With promptTruncation set to AUTO_PRESERVE_ORDER, some elements from chatHistory and documents will be dropped to construct a prompt that fits within the model’s context length limit. During this process the order of the documents and chat history will be preserved. With prompt_truncation set to OFF, no elements will be dropped. |
is-search-queries-only | boolean | When set to true, the response contains only a list of generated search queries without the search results and the model will not respond to the user’s message. |
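The Cohere-specific properties in the table can be sketched in the same way. The enum values FAST and AUTO_PRESERVE_ORDER and the preamble text come from the descriptions above; placing the keys under the model entry and the max-input-tokens value are assumptions made for illustration.

```yaml
langchain4j:
  providers:
    oci-gen-ai-cohere:
      compartment-id: "ocid1.tenancy.oc1...."
      region: EU_FRANKFURT_1
  models:
    oci-genai-cohere-chat-model:
      provider: oci-gen-ai-cohere
      # Model name copied from the example above; substitute the model you actually use.
      model-name: meta.llama-3.3-70b-instruct
      preamble-override: "You are a travel advisor. Answer with a pirate tone."
      citation-quality: FAST                  # faster, possibly less accurate citations
      prompt-truncation: AUTO_PRESERVE_ORDER  # drop history/documents to fit the context
      max-input-tokens: 4000                  # illustrative value
```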
OciGenAiCohereStreamingChatModel
To automatically create and add OciGenAiCohereStreamingChatModel to the service registry add the following lines to application.yaml:
langchain4j:
  providers:
    oci-gen-ai-cohere:
      compartment-id: "ocid1.tenancy.oc1...."
      region: EU_FRANKFURT_1
  models:
    oci-genai-cohere-streaming-chat-model:
      provider: oci-gen-ai-cohere
      model-name: meta.llama-3.3-70b-instruct
If enabled is set to false, the configuration is ignored, and the component is not created.
Full list of configuration properties:
| Key | Type | Description |
|---|---|---|
enabled | boolean | If set to false, the component will not be available even if configured. |
model-name | string | The model name or model’s OCID to use. |
compartment-id | OCID | OCI Compartment OCID |
region | enum | Explicit region. If not configured, the current one is resolved. |
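auth-provider | injected bean | The default bean is injected if it exists; a named bean can be configured with auth-provider.service-registry.named: beanName |
gen-ai-client | injected bean | Manually configured OCI SDK GenAi client. When set, values provided with region and auth-provider are ignored. The default bean is injected if it exists; a named bean can be configured with gen-ai-client.service-registry.named: beanName |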
serving-type | enum | The model’s serving mode, which is either on-demand serving or dedicated serving. |
top-k | int | The maximum number of top-probability tokens to consider when generating text. |
top-p | double between 0 and 1 | If set to a probability 0.0 < p < 1.0, it ensures that only the most likely tokens, with total probability mass of p, are considered for generation at each step. |
seed | int | The seed for the random number generator used by the model. |
temperature | double > 0 | A number that sets the randomness of the generated output. A lower temperature means less random generations. Use lower numbers for tasks with a correct answer, such as question answering or summarizing. High temperatures can generate hallucinations or factually incorrect information. Start with temperatures lower than 1.0 and increase the temperature for more creative outputs as you regenerate the prompts to refine the outputs. Default is 1. |
presence-penalty | double between -2 and 2 | To reduce repetitiveness of generated tokens, this number penalizes new tokens based on whether they’ve appeared in the generated text so far. Values > 0 encourage the model to use new tokens and values < 0 encourage the model to repeat tokens. Similar to frequency penalty, a penalty is applied to previously present tokens, except that this penalty is applied equally to all tokens that have already appeared, regardless of how many times they’ve appeared. Set to 0 to disable. Default is 0. |
stop | list of strings | List of strings that stop the generation if they are generated for the response text. The returned output will not contain the stop strings. |
max-tokens | integer > 1 | The maximum number of tokens that can be generated per output sequence. The token count of your prompt plus max-tokens must not exceed the model’s context length. If max-tokens is not set, the model’s full context length may be used. |
frequency-penalty | double between -2 and 2 | To reduce repetitiveness of generated tokens, this number penalizes new tokens based on their frequency in the generated text so far. Values > 0 encourage the model to use new tokens, and values < 0 encourage the model to repeat tokens. Set to 0 to disable. Default is 0. |
is-raw-prompting | boolean | When enabled, the user’s message will be sent to the model without any preprocessing. Default is false. |
citation-quality | enum | When FAST is selected, citations are generated at the same time as the text output and the request will be completed sooner. May result in less accurate citations. Default is ACCURATE. |
preamble-override | string | If specified, the default Cohere preamble is replaced with the provided preamble. A preamble is an initial guideline message that can change the model’s overall chat behavior and conversation style. Default preambles vary for different models. Example: You are a travel advisor. Answer with a pirate tone. |
max-input-tokens | int | The maximum number of input tokens to send to the model. If not specified, max_input_tokens is the model’s context length limit minus a small buffer. |
prompt-truncation | enum | Defaults to OFF. Dictates how the prompt will be constructed. With promptTruncation set to AUTO_PRESERVE_ORDER, some elements from chatHistory and documents will be dropped to construct a prompt that fits within the model’s context length limit. During this process the order of the documents and chat history will be preserved. With prompt_truncation set to OFF, no elements will be dropped. |
is-search-queries-only | boolean | When set to true, the response contains only a list of generated search queries without the search results and the model will not respond to the user’s message. |
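As a final sketch, the two boolean switches from the table could be set on the streaming model as shown here. As before, placing them under the model entry is an assumption, and the values are only illustrative.

```yaml
langchain4j:
  providers:
    oci-gen-ai-cohere:
      compartment-id: "ocid1.tenancy.oc1...."
      region: EU_FRANKFURT_1
  models:
    oci-genai-cohere-streaming-chat-model:
      provider: oci-gen-ai-cohere
      # Model name copied from the example above; substitute the model you actually use.
      model-name: meta.llama-3.3-70b-instruct
      # Return only the generated search queries; the model does not answer the user.
      is-search-queries-only: true
      # Keep the default preprocessing of the user's message.
      is-raw-prompting: false
```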