GoRules AI requires an LLM provider to be configured on your BRMS instance. Your administrator sets this up via environment variables on the server.

Supported LLM providers

| Provider | `LLM_PROVIDER` value | Supported models |
| --- | --- | --- |
| OpenAI | `openai` | OpenAI models |
| Anthropic (Claude) | `anthropic` | Anthropic models |
| Google (Gemini) | `google` | Gemini models |
| Amazon Bedrock | `amazon-bedrock` | Anthropic models |
| Google Vertex AI | `google-vertex` | Gemini models |
| Azure OpenAI | `azure-openai` | OpenAI models |
Vertex AI and Azure currently support only their native model families. If you need cross-provider model support (e.g., Anthropic models on Vertex AI or Azure), please contact us — we are happy to add it based on customer demand.

Environment variables

| Variable | Description | Default |
| --- | --- | --- |
| `LLM_PROVIDER` | LLM provider to use | Required |
| `LLM_MODEL` | Model name (e.g., `gpt-5.4`, `claude-sonnet-4-6`, `claude-opus-4-6`, `gemini-3.1-pro-preview`, `eu.anthropic.claude-opus-4-6-v1`) | Required |
| `LLM_API_KEY` | API key for the provider (not required for the Amazon Bedrock and Google Vertex providers) | Required |
| `LLM_TEMPERATURE` | Sampling temperature (applies to Gemini/Google providers only) | `0.4` |
| `LLM_CONTEXT_WINDOW` | Context window size in tokens | Provider default |
| `LLM_MAX_OUTPUT_TOKENS` | Maximum tokens per response | `32000` |
| `LLM_THINKING_LEVEL` | Extended thinking level: `high` or `medium` | `medium` |
| `LLM_AZURE_RESOURCE_NAME` | Azure OpenAI resource name (required for `azure-openai`) | – |
| `LLM_GCP_PROJECT` | GCP project ID (required for `google-vertex`) | – |
| `LLM_GCP_LOCATION` | GCP region for Vertex AI (required for `google-vertex`, e.g. `global`) | – |
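
As an illustration, a minimal configuration for the Anthropic provider might look like the fragment below. The model name and API key are placeholders, not recommendations; use the values appropriate for your instance.

```shell
# Hypothetical environment fragment for a self-hosted BRMS instance.
# All values shown are placeholders.
LLM_PROVIDER=anthropic
LLM_MODEL=claude-sonnet-4-6
LLM_API_KEY=sk-ant-...        # omit for amazon-bedrock / google-vertex
LLM_MAX_OUTPUT_TOKENS=32000   # optional; this is the default
LLM_THINKING_LEVEL=medium     # optional; or "high"
```

For `amazon-bedrock` and `google-vertex`, authentication comes from the cloud environment instead of `LLM_API_KEY`, and the Vertex provider additionally needs `LLM_GCP_PROJECT` and `LLM_GCP_LOCATION`.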

Prompt caching

GoRules AI uses prompt caching to reduce token usage and improve response times. Caching behavior depends on the provider:
| Provider | Caching |
| --- | --- |
| Anthropic (direct) | `cacheControl: ephemeral` |
| Amazon Bedrock (Anthropic models) | `cachePoint` on messages |
| OpenAI | Automatic (prefix caching) |
| Azure OpenAI | Automatic (prefix caching) |
| Gemini/Google (direct & Vertex AI) | Automatic (implicit caching) |
No additional configuration is required — caching is handled automatically for all supported providers. Prompt caching can reduce token costs by up to 90% in some cases, though actual savings depend on the provider, model, and usage patterns.
For self-hosted deployments, ensure your load balancer has response buffering disabled or streaming enabled for optimal AI assistant experience.
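
For example, if nginx sits in front of the BRMS, disabling response buffering on the proxied location lets AI responses stream token by token. This is only a sketch; the upstream name and path are placeholders for your deployment.

```nginx
# Sketch of an nginx reverse-proxy location for streaming responses.
# "brms_upstream" and the location path are deployment-specific placeholders.
location / {
    proxy_pass http://brms_upstream;
    proxy_http_version 1.1;
    proxy_buffering off;      # stream the response instead of buffering it
    proxy_read_timeout 300s;  # allow long-running AI responses
}
```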

Next steps

Once configured, the AI assistant is available to all users on a plan with AI enabled. See AI assistant for usage details.