GoRules AI requires an LLM provider to be configured on your BRMS instance. Your administrator sets this up via environment variables on the server.
## Supported LLM providers

| Provider | `LLM_PROVIDER` value |
|---|---|
| OpenAI | `openai` |
| Anthropic (Claude) | `anthropic` |
| Google (Gemini) | `google` |
| Amazon Bedrock | `amazon-bedrock` |
| Azure OpenAI | `azure-openai` |
## Environment variables

| Variable | Description | Default |
|---|---|---|
| `LLM_PROVIDER` | LLM provider to use | Required |
| `LLM_MODEL` | Model name (e.g., `gpt-5.4`, `claude-sonnet-4-6`, `claude-opus-4-6`, `gemini-3.1-pro-preview`, `eu.anthropic.claude-opus-4-6-v1`) | Required |
| `LLM_API_KEY` | API key for the provider (not required for `amazon-bedrock`) | Required |
| `LLM_TEMPERATURE` | Sampling temperature | `0.4` |
| `LLM_CONTEXT_WINDOW` | Context window size in tokens | Provider default |
| `LLM_MAX_OUTPUT_TOKENS` | Maximum tokens per response | `32000` |
| `LLM_THINKING_LEVEL` | Extended thinking level: `high`, `medium`, or `low` | `medium` |
| `LLM_AZURE_RESOURCE_NAME` | Azure OpenAI resource name (required for `azure-openai`) | — |
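Putting the variables together, a minimal server configuration for the Anthropic provider might look like the following sketch. The values are illustrative only; substitute your own model name and API key, and note that the optional variables shown simply restate their defaults:

```shell
# Required settings (model name taken from the examples above)
export LLM_PROVIDER=anthropic
export LLM_MODEL=claude-sonnet-4-6
export LLM_API_KEY=your-api-key-here   # omit for amazon-bedrock

# Optional settings (values shown are the defaults)
export LLM_TEMPERATURE=0.4
export LLM_MAX_OUTPUT_TOKENS=32000
export LLM_THINKING_LEVEL=medium       # high | medium | low
```

For `azure-openai`, you would additionally set `LLM_AZURE_RESOURCE_NAME`; for `amazon-bedrock`, credentials come from your AWS environment rather than `LLM_API_KEY`.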
## Prompt caching
GoRules AI uses prompt caching to reduce token usage and improve response times. Caching behavior depends on the provider:
| Provider | Caching |
|---|---|
| Anthropic (direct) | `cacheControl: ephemeral` |
| Amazon Bedrock (Anthropic models) | `cachePoint` on messages |
| OpenAI | Automatic (prefix caching) |
| Azure OpenAI | Automatic (prefix caching) |
| Gemini/Google | Automatic (implicit caching) |
No additional configuration is required; caching is handled automatically for all supported providers. Prompt caching can reduce token costs by up to 90% in some cases, though actual savings depend on the provider, model, and usage patterns.
For self-hosted deployments, ensure your load balancer has response buffering disabled or streaming enabled for optimal AI assistant experience.
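As an example, if nginx fronts a self-hosted deployment, disabling proxy response buffering lets AI responses stream to the browser as they are generated. This is a sketch only; the upstream name and location path are assumptions to adapt to your setup:

```nginx
# Illustrative nginx location block for proxying the BRMS instance
location / {
    proxy_pass http://your-brms-upstream;
    proxy_http_version 1.1;
    proxy_buffering off;       # pass streamed AI responses through unbuffered
    proxy_read_timeout 300s;   # allow long-running generations to complete
}
```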
## Next steps
Once configured, the AI assistant is available to all users on a plan with AI enabled. See AI assistant for usage details.