Supported LLM providers
| Provider | `LLM_PROVIDER` value | Supported models |
|---|---|---|
| OpenAI | `openai` | OpenAI models |
| Anthropic (Claude) | `anthropic` | Anthropic models |
| Google (Gemini) | `google` | Gemini models |
| Amazon Bedrock | `amazon-bedrock` | Anthropic models |
| Google Vertex AI | `google-vertex` | Gemini models |
| Azure OpenAI | `azure-openai` | OpenAI models |
The `google-vertex` and `azure-openai` providers currently support only their native model families (Gemini and OpenAI models, respectively). If you need cross-provider model support (for example, Anthropic models on Vertex AI or Azure), please contact us; we are happy to add it based on customer demand.
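As a rough illustration of the compatibility rule above, the TypeScript sketch below maps each `LLM_PROVIDER` value to the model family listed in the table and flags an obviously mismatched `LLM_MODEL`. The names `SUPPORTED_FAMILIES`, `inferFamily`, and `checkProviderModel` are hypothetical and not part of GoRules AI. Inferring the family from substrings such as `claude`, `gemini`, or `gpt` is a simplifying assumption; model names that match none of them are accepted rather than guessed.

```typescript
// Hypothetical pre-flight check; not part of GoRules AI.
type ModelFamily = "openai" | "anthropic" | "gemini";

// Provider -> model family, mirroring the "Supported LLM providers" table.
// A Map avoids accidental hits on Object.prototype keys such as "toString".
const SUPPORTED_FAMILIES = new Map<string, ModelFamily>([
  ["openai", "openai"],
  ["anthropic", "anthropic"],
  ["google", "gemini"],
  ["amazon-bedrock", "anthropic"],
  ["google-vertex", "gemini"],
  ["azure-openai", "openai"],
]);

// Assumption: guess the family from well-known substrings in the model name.
// Returns undefined when nothing matches, so unknown names are never rejected.
function inferFamily(model: string): ModelFamily | undefined {
  const name = model.toLowerCase();
  if (name.includes("claude")) return "anthropic";
  if (name.includes("gemini")) return "gemini";
  if (name.includes("gpt")) return "openai";
  return undefined;
}

// Returns an error message for a known-bad combination, or null if it looks fine.
function checkProviderModel(provider: string, model: string): string | null {
  const expected = SUPPORTED_FAMILIES.get(provider);
  if (expected === undefined) {
    return `Unknown LLM_PROVIDER value: ${provider}`;
  }
  const actual = inferFamily(model);
  if (actual !== undefined && actual !== expected) {
    return `${provider} supports only ${expected} models, but "${model}" appears to be from the ${actual} family`;
  }
  return null;
}

// Example: Anthropic models on Vertex AI are not supported yet, so this reports an error.
console.log(checkProviderModel("google-vertex", "claude-sonnet-4-6"));
```

Because the check only rejects combinations it can positively identify, a new model name that follows a different naming scheme still passes through to the provider.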
Environment variables
| Variable | Description | Default |
|---|---|---|
| `LLM_PROVIDER` | LLM provider to use | Required |
| `LLM_MODEL` | Model name (e.g., `gpt-5.4`, `claude-sonnet-4-6`, `claude-opus-4-6`, `gemini-3.1-pro-preview`, `eu.anthropic.claude-opus-4-6-v1`) | Required |
| `LLM_API_KEY` | API key for the provider | Required, except for `amazon-bedrock` and `google-vertex` |
| `LLM_BASE_URL` | Custom base URL for OpenAI-compatible endpoints | None |
| `LLM_TEMPERATURE` | Sampling temperature (applies only to the `google` and `google-vertex` providers) | `0.4` |
| `LLM_CONTEXT_WINDOW` | Context window size in tokens | Provider default |
| `LLM_MAX_OUTPUT_TOKENS` | Maximum number of output tokens per response | `32000` |
| `LLM_THINKING_LEVEL` | Extended thinking level: `high` or `medium` | `medium` |
| `LLM_AZURE_RESOURCE_NAME` | Azure OpenAI resource name (required for `azure-openai`) | None |
| `LLM_GCP_PROJECT` | GCP project ID (required for `google-vertex`) | None |
| `LLM_GCP_LOCATION` | GCP region for Vertex AI (required for `google-vertex`; e.g., `global`) | None |
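The TypeScript sketch below shows one way a deployment script could validate these variables before starting the service: it applies the documented defaults and enforces the provider-specific requirements from the table. `loadLlmConfig` and the `LlmConfig` shape are hypothetical, and GoRules AI's own validation may differ. The function takes the environment as a plain object, so it can be called with `process.env` in Node.js or with a literal in tests.

```typescript
// Hypothetical validation sketch; GoRules AI's own checks may differ.
type Env = Record<string, string | undefined>;

interface LlmConfig {
  provider: string;
  model: string;
  apiKey?: string;
  baseUrl?: string;
  temperature: number; // only used by google / google-vertex
  contextWindow?: number; // undefined means "use the provider default"
  maxOutputTokens: number;
  thinkingLevel: "high" | "medium";
  azureResourceName?: string;
  gcpProject?: string;
  gcpLocation?: string;
}

// Providers that authenticate without LLM_API_KEY.
const KEYLESS_PROVIDERS = new Set(["amazon-bedrock", "google-vertex"]);

// Reads a variable that must be present and non-empty.
function required(env: Env, name: string): string {
  const value = env[name];
  if (value === undefined || value === "") {
    throw new Error(`${name} is required`);
  }
  return value;
}

// Reads an optional numeric variable; undefined when unset or empty.
function optionalNumber(env: Env, name: string): number | undefined {
  const raw = env[name];
  if (raw === undefined || raw === "") return undefined;
  const value = Number(raw);
  if (!Number.isFinite(value)) {
    throw new Error(`${name} must be a number, got "${raw}"`);
  }
  return value;
}

function loadLlmConfig(env: Env): LlmConfig {
  const provider = required(env, "LLM_PROVIDER");

  // Empty or unset falls back to the documented default.
  const thinking = env["LLM_THINKING_LEVEL"] || "medium";
  if (thinking !== "high" && thinking !== "medium") {
    throw new Error(`LLM_THINKING_LEVEL must be "high" or "medium", got "${thinking}"`);
  }

  return {
    provider,
    model: required(env, "LLM_MODEL"),
    apiKey: KEYLESS_PROVIDERS.has(provider) ? env["LLM_API_KEY"] : required(env, "LLM_API_KEY"),
    baseUrl: env["LLM_BASE_URL"],
    temperature: optionalNumber(env, "LLM_TEMPERATURE") ?? 0.4,
    contextWindow: optionalNumber(env, "LLM_CONTEXT_WINDOW"),
    maxOutputTokens: optionalNumber(env, "LLM_MAX_OUTPUT_TOKENS") ?? 32000,
    thinkingLevel: thinking,
    azureResourceName:
      provider === "azure-openai" ? required(env, "LLM_AZURE_RESOURCE_NAME") : env["LLM_AZURE_RESOURCE_NAME"],
    gcpProject: provider === "google-vertex" ? required(env, "LLM_GCP_PROJECT") : env["LLM_GCP_PROJECT"],
    gcpLocation: provider === "google-vertex" ? required(env, "LLM_GCP_LOCATION") : env["LLM_GCP_LOCATION"],
  };
}
```

Called as `loadLlmConfig(process.env)` at startup, a missing variable such as `LLM_GCP_PROJECT` for `google-vertex` is reported immediately with the variable's name.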
Prompt caching
GoRules AI uses prompt caching to reduce token usage and improve response times. Caching behavior depends on the provider:

| Provider | Caching |
|---|---|
| Anthropic (direct) | `cacheControl: ephemeral` |
| Amazon Bedrock (Anthropic models) | `cachePoint` on messages |
| OpenAI | Automatic (prefix caching) |
| Azure OpenAI | Automatic (prefix caching) |
| Gemini/Google (direct and Vertex AI) | Automatic (implicit caching) |
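For the two explicit mechanisms in the table, the TypeScript sketch below builds illustrative request bodies. The Anthropic Messages API marks a content block with `cache_control: { type: "ephemeral" }` (the raw API's snake_case spelling of what the table lists as `cacheControl: ephemeral`), while the Amazon Bedrock Converse API takes a separate `cachePoint` block placed after the content to cache. These are plain objects for illustration, not GoRules AI's internal requests; the function names and the choice to cache the system prompt are assumptions. OpenAI, Azure OpenAI, and Gemini need no request changes because their caching is automatic.

```typescript
// Illustrative request bodies only; not GoRules AI's internal requests.

// Anthropic Messages API (direct): a content block carrying
// cache_control marks the end of the prefix to cache.
interface AnthropicTextBlock {
  type: "text";
  text: string;
  cache_control?: { type: "ephemeral" };
}

function anthropicRequest(model: string, systemPrompt: string, userMessage: string) {
  const system: AnthropicTextBlock[] = [
    // Assumption: the system prompt is the stable prefix worth caching.
    { type: "text", text: systemPrompt, cache_control: { type: "ephemeral" } },
  ];
  return {
    model,
    max_tokens: 32000, // mirrors the LLM_MAX_OUTPUT_TOKENS default
    system,
    messages: [{ role: "user", content: userMessage }],
  };
}

// Amazon Bedrock Converse API: caching is requested with a separate
// cachePoint block placed after the content to cache.
type ConverseBlock = { text: string } | { cachePoint: { type: "default" } };

function bedrockConverseRequest(modelId: string, systemPrompt: string, userMessage: string) {
  const cachePoint: ConverseBlock = { cachePoint: { type: "default" } };
  const system: ConverseBlock[] = [{ text: systemPrompt }, cachePoint];
  const content: ConverseBlock[] = [{ text: userMessage }, cachePoint];
  return {
    modelId,
    system,
    messages: [{ role: "user", content }],
  };
}
```

In both cases the cache marker sits at the end of the content that stays identical across requests, so later requests that share that prefix can reuse it.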
For self-hosted deployments, make sure your load balancer has response buffering disabled (or streaming enabled). Otherwise, AI assistant responses may be held back until they are complete instead of streaming to users as they are generated.