AI setup

GoRules AI requires an LLM provider to be configured on your BRMS instance. Your administrator sets this up via environment variables on the server.

Supported LLM providers

| Provider | `LLM_PROVIDER` value | Supported models |
| --- | --- | --- |
| OpenAI | `openai` | OpenAI models |
| Anthropic (Claude) | `anthropic` | Anthropic models |
| Google (Gemini) | `google` | Gemini models |
| Amazon Bedrock | `amazon-bedrock` | Anthropic models |
| Google Vertex AI | `google-vertex` | Gemini models |
| Azure OpenAI | `azure-openai` | OpenAI models |

Google Vertex AI and Azure OpenAI currently support only their native model families (Gemini and OpenAI models, respectively). If you need cross-provider model support, such as Anthropic models on Vertex AI or Azure, please contact us; we are happy to add it based on customer demand.

Environment variables

| Variable | Description | Default |
| --- | --- | --- |
| `LLM_PROVIDER` | LLM provider to use | Required |
| `LLM_MODEL` | Model name (e.g., `gpt-5.4`, `claude-sonnet-4-6`, `claude-opus-4-6`, `gemini-3.1-pro-preview`, `eu.anthropic.claude-opus-4-6-v1`) | Required |
| `LLM_API_KEY` | API key for the provider (not required for Amazon Bedrock and Vertex providers) | Required |
| `LLM_BASE_URL` | Custom base URL for OpenAI-compatible endpoints | (none) |
| `LLM_TEMPERATURE` | Sampling temperature (applies to Gemini/Google providers only) | `0.4` |
| `LLM_CONTEXT_WINDOW` | Context window size in tokens | Provider default |
| `LLM_MAX_OUTPUT_TOKENS` | Maximum tokens per response | `32000` |
| `LLM_THINKING_LEVEL` | Extended thinking level: `high` or `medium` | `medium` |
| `LLM_AZURE_RESOURCE_NAME` | Azure OpenAI resource name (required for `azure-openai`) | (none) |
| `LLM_GCP_PROJECT` | GCP project ID (required for `google-vertex`) | (none) |
| `LLM_GCP_LOCATION` | GCP region for Vertex AI (required for `google-vertex`, e.g. `global`) | (none) |
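
Before starting the server, it can help to check that the variables are consistent with each other. The sketch below is not part of GoRules; it is a hypothetical pre-flight check that encodes only the rules stated in the tables on this page (supported provider values, which variables are required, and which providers need extra settings). The function name `check_llm_env` and the wording of the messages are assumptions made for illustration.

```python
from typing import Mapping

# Provider identifiers listed in the supported providers table on this page.
SUPPORTED_PROVIDERS = {
    "openai", "anthropic", "google",
    "amazon-bedrock", "google-vertex", "azure-openai",
}

# Providers documented as not requiring LLM_API_KEY.
NO_API_KEY_PROVIDERS = {"amazon-bedrock", "google-vertex"}

# Extra variables documented as required for specific providers.
PROVIDER_REQUIRED = {
    "azure-openai": ["LLM_AZURE_RESOURCE_NAME"],
    "google-vertex": ["LLM_GCP_PROJECT", "LLM_GCP_LOCATION"],
}


def check_llm_env(env: Mapping[str, str]) -> list[str]:
    """Hypothetical helper: return a list of problems (empty if none found)."""
    problems: list[str] = []

    provider = env.get("LLM_PROVIDER", "").strip()
    if not provider:
        problems.append("LLM_PROVIDER is not set")
    elif provider not in SUPPORTED_PROVIDERS:
        problems.append(f"LLM_PROVIDER {provider!r} is not a supported value")

    if not env.get("LLM_MODEL", "").strip():
        problems.append("LLM_MODEL is not set")

    # Provider-specific checks only make sense for a recognized provider.
    if provider in SUPPORTED_PROVIDERS:
        needs_key = provider not in NO_API_KEY_PROVIDERS
        if needs_key and not env.get("LLM_API_KEY", "").strip():
            problems.append(f"LLM_API_KEY is required for {provider}")
        for name in PROVIDER_REQUIRED.get(provider, []):
            if not env.get(name, "").strip():
                problems.append(f"{name} is required for {provider}")

    # LLM_THINKING_LEVEL is optional, but only two values are documented.
    level = env.get("LLM_THINKING_LEVEL")
    if level is not None and level not in {"high", "medium"}:
        problems.append("LLM_THINKING_LEVEL must be 'high' or 'medium'")

    return problems
```

Calling `check_llm_env(os.environ)` (after `import os`) in a deployment script would surface missing settings early. The check is deliberately conservative: it does not validate model names or credentials, which only the provider can confirm.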

Prompt caching

GoRules AI uses prompt caching to reduce token usage and improve response times. Caching behavior depends on the provider:

| Provider | Caching |
| --- | --- |
| Anthropic (direct) | `cacheControl: ephemeral` |
| Amazon Bedrock (Anthropic models) | `cachePoint` on messages |
| OpenAI | Automatic (prefix caching) |
| Azure OpenAI | Automatic (prefix caching) |
| Gemini/Google (direct & Vertex AI) | Automatic (implicit caching) |

No additional configuration is required — caching is handled automatically for all supported providers. Prompt caching can reduce token costs by up to 90% in some cases, though actual savings depend on the provider, model, and usage patterns.
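
To see how the savings depend on usage, the following rough sketch estimates input-token cost when part of a prompt is served from cache. It is an illustration only: the function name, the price, and the token counts are hypothetical, the 0.9 discount simply mirrors the "up to 90%" figure above, and real provider billing (including any charge for writing to the cache) may differ.

```python
def estimated_input_cost(
    total_tokens: int,
    cached_tokens: int,
    price_per_million: float,
    cache_discount: float = 0.9,
) -> float:
    """Estimate input cost when `cached_tokens` are billed at a discount.

    All parameters are illustrative assumptions, not GoRules or provider values.
    """
    if not 0 <= cached_tokens <= total_tokens:
        raise ValueError("cached_tokens must be between 0 and total_tokens")
    if not 0.0 <= cache_discount <= 1.0:
        raise ValueError("cache_discount must be between 0 and 1")

    uncached_tokens = total_tokens - cached_tokens
    # Cached tokens are billed at (1 - discount) of the normal rate.
    billed_tokens = uncached_tokens + cached_tokens * (1.0 - cache_discount)
    return billed_tokens * price_per_million / 1_000_000
```

With a hypothetical price of 2.0 per million input tokens, a one-million-token workload costs 2.0 with no cache hits and roughly 0.56 when 800,000 of those tokens are cache hits, a reduction of about 72%. The full 90% is approached only when nearly the whole prompt is cached.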

For self-hosted deployments, ensure that your load balancer has response buffering disabled or streaming enabled for the best AI assistant experience.

Next steps

Once configured, the AI assistant is available to all users on a plan with AI enabled. See AI assistant for usage details.
