Deploy the operator
Prerequisites
- A Kubernetes cluster (current and two previous minor versions are supported)
- Permissions to create resources in the cluster
- kubectl configured to communicate with your cluster
- Helm (v3.10 minimum, v3.14+ recommended)
Install the CRDs
The ToolHive operator requires Custom Resource Definitions (CRDs) to manage MCPServer resources. The CRDs define the structure and behavior of MCPServers in your cluster.
Choose an installation method based on your needs:
- Helm (recommended): Provides customization options and manages the full lifecycle of the operator. CRDs are installed and upgraded automatically as part of the Helm chart.
- kubectl: Uses static manifests for a simple installation. Useful for environments where Helm isn't available or for GitOps workflows.
This command installs the latest version of the ToolHive operator CRDs Helm chart:
helm upgrade --install toolhive-operator-crds oci://ghcr.io/stacklok/toolhive/toolhive-operator-crds \
-n toolhive-system --create-namespace
When you install this chart, Helm stamps all CRDs with a
meta.helm.sh/release-namespace annotation set to the namespace used at install
time, and that value is fixed for the release. You must use the same namespace
in every future helm upgrade command for the CRDs. If you specify a different
namespace, the command fails with an ownership metadata error.
If you need to migrate to a different namespace, see the CRD namespace mismatch troubleshooting section.
To install a specific version, append --version <VERSION> to the command, for
example:
helm upgrade --install toolhive-operator-crds oci://ghcr.io/stacklok/toolhive/toolhive-operator-crds \
-n toolhive-system --create-namespace --version 0.12.1
CRD configuration options
The Helm chart installs all CRDs by default. You can control CRD installation and uninstall behavior using these values:
| Value | Description | Default |
|---|---|---|
| crds.install | Install the ToolHive CRDs | true |
| crds.keep | Preserve CRDs when uninstalling the chart | true |
The crds.keep option adds the helm.sh/resource-policy: keep annotation to
CRDs, which prevents Helm from deleting them during helm uninstall. This
protects your custom resources from accidental deletion. If you want to remove
CRDs during uninstall, set crds.keep=false.
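If you prefer a values file to --set flags, the following Bash sketch writes one for the CRDs chart. It assumes the dotted names in the table map to nested keys in the usual Helm way; the file name crds-values.yaml is an arbitrary choice. Setting keep to false means that uninstalling the chart deletes the CRDs and every custom resource of those types.

```bash
# Write a values file for the toolhive-operator-crds chart.
# crds.install and crds.keep come from the table above; the file name is arbitrary.
cat > crds-values.yaml <<'EOF'
crds:
  install: true
  keep: false   # let `helm uninstall` remove the CRDs
EOF

# Pass the file with -f (requires cluster access):
#   helm upgrade --install toolhive-operator-crds \
#     oci://ghcr.io/stacklok/toolhive/toolhive-operator-crds \
#     -n toolhive-system --create-namespace -f crds-values.yaml
```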
To install the CRDs using kubectl, run the following commands. The operator
registers a controller for every ToolHive resource type, so apply the complete
set of CRDs:
kubectl apply -f https://raw.githubusercontent.com/stacklok/toolhive/refs/tags/v0.21.0/deploy/charts/operator-crds/files/crds/toolhive.stacklok.dev_embeddingservers.yaml
kubectl apply -f https://raw.githubusercontent.com/stacklok/toolhive/refs/tags/v0.21.0/deploy/charts/operator-crds/files/crds/toolhive.stacklok.dev_mcpexternalauthconfigs.yaml
kubectl apply -f https://raw.githubusercontent.com/stacklok/toolhive/refs/tags/v0.21.0/deploy/charts/operator-crds/files/crds/toolhive.stacklok.dev_mcpgroups.yaml
kubectl apply -f https://raw.githubusercontent.com/stacklok/toolhive/refs/tags/v0.21.0/deploy/charts/operator-crds/files/crds/toolhive.stacklok.dev_mcpoidcconfigs.yaml
kubectl apply -f https://raw.githubusercontent.com/stacklok/toolhive/refs/tags/v0.21.0/deploy/charts/operator-crds/files/crds/toolhive.stacklok.dev_mcpregistries.yaml
kubectl apply -f https://raw.githubusercontent.com/stacklok/toolhive/refs/tags/v0.21.0/deploy/charts/operator-crds/files/crds/toolhive.stacklok.dev_mcpremoteproxies.yaml
kubectl apply -f https://raw.githubusercontent.com/stacklok/toolhive/refs/tags/v0.21.0/deploy/charts/operator-crds/files/crds/toolhive.stacklok.dev_mcpserverentries.yaml
kubectl apply -f https://raw.githubusercontent.com/stacklok/toolhive/refs/tags/v0.21.0/deploy/charts/operator-crds/files/crds/toolhive.stacklok.dev_mcpservers.yaml
kubectl apply -f https://raw.githubusercontent.com/stacklok/toolhive/refs/tags/v0.21.0/deploy/charts/operator-crds/files/crds/toolhive.stacklok.dev_mcptelemetryconfigs.yaml
kubectl apply -f https://raw.githubusercontent.com/stacklok/toolhive/refs/tags/v0.21.0/deploy/charts/operator-crds/files/crds/toolhive.stacklok.dev_mcptoolconfigs.yaml
kubectl apply -f https://raw.githubusercontent.com/stacklok/toolhive/refs/tags/v0.21.0/deploy/charts/operator-crds/files/crds/toolhive.stacklok.dev_virtualmcpcompositetooldefinitions.yaml
kubectl apply -f https://raw.githubusercontent.com/stacklok/toolhive/refs/tags/v0.21.0/deploy/charts/operator-crds/files/crds/toolhive.stacklok.dev_virtualmcpservers.yaml
Replace v0.21.0 in the commands above with your target CRD version.
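Rather than editing twelve commands, you can generate the URLs from a single version value. The following Bash sketch assumes the file naming pattern shown above stays the same for your target version; the function name crd_urls is hypothetical.

```bash
# Hypothetical helper: print the raw URL of every ToolHive CRD manifest for a
# given release tag. The list matches the v0.21.0 commands above; other
# releases may add or rename CRDs.
crd_urls() {
  local version=$1 kind
  local base="https://raw.githubusercontent.com/stacklok/toolhive/refs/tags/${version}/deploy/charts/operator-crds/files/crds"
  for kind in embeddingservers mcpexternalauthconfigs mcpgroups mcpoidcconfigs \
              mcpregistries mcpremoteproxies mcpserverentries mcpservers \
              mcptelemetryconfigs mcptoolconfigs \
              virtualmcpcompositetooldefinitions virtualmcpservers; do
    printf '%s/toolhive.stacklok.dev_%s.yaml\n' "$base" "$kind"
  done
}

# Usage (requires cluster access):
#   crd_urls v0.21.0 | xargs -n 1 kubectl apply -f
```

Piping the output to xargs -n 1 kubectl apply -f applies each manifest in turn, so changing the version means changing a single argument.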
Install the operator
To install the ToolHive operator using default settings, run the following command:
helm upgrade --install toolhive-operator oci://ghcr.io/stacklok/toolhive/toolhive-operator -n toolhive-system --create-namespace
This command installs the latest version of the ToolHive operator Helm chart.
To install a specific version, append --version <VERSION> to the command, for
example:
helm upgrade --install toolhive-operator oci://ghcr.io/stacklok/toolhive/toolhive-operator -n toolhive-system --create-namespace --version 0.12.1
Verify the installation:
kubectl get pods -n toolhive-system
After about 30 seconds, you should see the toolhive-operator pod running.
Check the logs of the operator pod:
kubectl logs -f -n toolhive-system <TOOLHIVE_OPERATOR_POD_NAME>
This shows you the logs of the operator pod, which can help you debug any issues. For comprehensive logging and audit capabilities, see the Logging infrastructure guide.
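To avoid copying the pod name by hand, the following Bash sketch picks it out of kubectl get pods output. It assumes the operator pods are named after the toolhive-operator deployment (toolhive-operator-<hash>-<suffix>); the helper name is hypothetical. Alternatively, kubectl logs -f -n toolhive-system deployment/toolhive-operator lets kubectl choose a pod from the deployment for you.

```bash
# Hypothetical helper: read `kubectl get pods` output on stdin and print the
# first pod whose name starts with "toolhive-operator-" (an assumed prefix).
operator_pod_name() {
  awk '$1 ~ /^toolhive-operator-/ { print $1; exit }'
}

# Usage (requires cluster access):
#   POD=$(kubectl get pods -n toolhive-system --no-headers | operator_pod_name)
#   kubectl logs -f -n toolhive-system "$POD"
```

With more than one replica, this returns the first pod listed, which is not necessarily the current leader.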
Customize the operator
You can customize the operator installation by providing a values.yaml file
with your configuration settings. For example, to change the number of replicas
and set a specific ToolHive version, create a values.yaml file:
operator:
  replicaCount: 2
  toolhiveRunnerImage: ghcr.io/stacklok/toolhive:v0.2.17 # or `latest`
Install the operator with your custom values:
helm upgrade --install toolhive-operator oci://ghcr.io/stacklok/toolhive/toolhive-operator \
-n toolhive-system --create-namespace \
-f values.yaml
To see all available configuration options, run:
helm show values oci://ghcr.io/stacklok/toolhive/toolhive-operator
Pull workload images from a private registry
If your MCP server, proxy runner, vMCP, registry API, or embedding server images
live in a private container registry, use operator.defaultImagePullSecrets to
apply pull credentials to every workload the operator spawns:
operator:
  defaultImagePullSecrets:
    - regcred
    - name: backup-regcred
Each entry is the name of a Kubernetes Secret of type
kubernetes.io/dockerconfigjson (or another image-pull-secret type) that must
exist in the namespace where each workload is created. Plain string and object
forms are equivalent.
The secrets propagate to the pod spec and the operator-managed ServiceAccount of
every workload-spawning controller: MCPServer, MCPRemoteProxy,
MCPRegistry, VirtualMCPServer, and EmbeddingServer. Chart-level entries
are appended to any per-CR imagePullSecrets already configured on the
resource. Common per-CR fields are:
- spec.imagePullSecrets on MCPRegistry and VirtualMCPServer
- spec.resourceOverrides.proxyDeployment.imagePullSecrets on MCPServer and MCPRemoteProxy
Per-CR entries take precedence when a secret name appears in both places.
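The following Bash sketch models the merge rules described above: per-CR secrets come first, chart-level entries are appended, and a name that appears in both places is listed once, in its per-CR position. The function name and its comma-separated inputs are hypothetical; this illustrates the documented behavior rather than reproducing the operator's code.

```bash
# Hypothetical helper: merge per-CR and chart-level pull secret names.
# Arguments: comma-separated per-CR names, comma-separated chart-level names.
effective_pull_secrets() {
  local per_cr=$1 chart=$2 seen="" name
  local IFS=','                      # split the unquoted lists on commas
  for name in $per_cr $chart; do
    [ -n "$name" ] || continue
    case ",$seen," in
      *",$name,"*) ;;                # already listed: the earlier (per-CR) entry wins
      *) seen="$seen,$name"; printf '%s\n' "$name" ;;
    esac
  done
}

# Usage:
#   effective_pull_secrets "regcred" "regcred,backup-regcred"
```

For example, a VirtualMCPServer whose spec.imagePullSecrets lists regcred, combined with the chart-level list above, ends up with regcred followed by backup-regcred.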
The chart also exposes operator.imagePullSecrets, which controls only the
operator's own pod. Use it when the operator image itself is in a private
registry; use defaultImagePullSecrets for the workloads the operator manages.
Scale the operator with autoscaling
The operator runs a single replica by default. For high availability, run more
than one replica: set a fixed count with operator.replicaCount, or enable a
HorizontalPodAutoscaler (HPA) to adjust the count automatically based on
resource utilization.
Autoscaling is disabled by default. To enable it, configure the
operator.autoscaling values:
operator:
  autoscaling:
    enabled: true
    minReplicas: 1
    maxReplicas: 100
    targetCPUUtilizationPercentage: 80
    # targetMemoryUtilizationPercentage: 80
When autoscaling.enabled is true, the chart creates a
HorizontalPodAutoscaler and stops setting a static replica count, so the HPA
takes full control of the replica range. Memory-based scaling is off unless you
set targetMemoryUtilizationPercentage.
The operator uses leader election, so only one replica is active at a time; the
others run as warm standbys that take over if the leader fails. Extra replicas
improve failover, not reconciliation throughput. For high availability, keep
minReplicas (or operator.replicaCount) at 2 or more so a standby is always
ready.
The HPA requires the Kubernetes metrics server to be installed in your cluster so it can read CPU and memory usage. Many managed Kubernetes distributions include it by default.
Tune operator resources
The operator ships with conservative resource requests and limits suitable for most clusters. The defaults are:
operator:
  resources:
    limits:
      cpu: 500m
      memory: 128Mi
    requests:
      cpu: 10m
      memory: 64Mi
If the operator manages a large number of resources, you can raise these values.
The chart also exposes Go runtime tuning through the operator.gc values, which
set the GOMEMLIMIT and GOGC environment variables on the operator container:
operator:
  gc:
    gomemlimit: 110MiB
    gogc: 75
gomemlimit is a soft memory ceiling that helps the Go runtime avoid hitting
the container memory limit, and gogc controls how aggressively garbage
collection runs (a lower value collects more often and uses less memory). Keep
gomemlimit below the container memory limit, and raise it if you raise the
memory limit. The defaults work well for most deployments.
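To sanity-check gomemlimit against the container memory limit, the following Bash sketch converts both to bytes and confirms that gomemlimit is lower. The helper names are hypothetical, and they only understand plain bytes and the binary suffixes used in this guide (Ki/KiB, Mi/MiB, Gi/GiB).

```bash
# Hypothetical helpers: convert a quantity such as 110MiB or 128Mi to bytes and
# check that GOMEMLIMIT stays below the container memory limit.
to_bytes() {
  local value=$1 number unit
  number=${value%%[!0-9]*}   # leading digits
  unit=${value#"$number"}    # whatever follows the digits
  case "$unit" in
    ""|B)   echo $(( number )) ;;
    Ki|KiB) echo $(( number * 1024 )) ;;
    Mi|MiB) echo $(( number * 1024 * 1024 )) ;;
    Gi|GiB) echo $(( number * 1024 * 1024 * 1024 )) ;;
    *) echo "unsupported unit: $unit" >&2; return 1 ;;
  esac
}

check_gomemlimit() {
  local gomem_bytes limit_bytes
  gomem_bytes=$(to_bytes "$1") || return 1
  limit_bytes=$(to_bytes "$2") || return 1
  if (( gomem_bytes < limit_bytes )); then
    echo "ok: GOMEMLIMIT is $(( gomem_bytes * 100 / limit_bytes ))% of the container memory limit"
  else
    echo "warning: GOMEMLIMIT ($1) is not below the memory limit ($2)" >&2
    return 1
  fi
}

# Usage with the values shown above:
#   check_gomemlimit 110MiB 128Mi
```

With the values shown above, it prints "ok: GOMEMLIMIT is 85% of the container memory limit", which leaves the Go runtime some headroom before the container limit.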
Run on OpenShift
The operator runs on OpenShift with no special configuration. At startup, it
detects whether it's running on OpenShift (by looking for the
route.openshift.io API), and the workloads it creates use security contexts
that satisfy the default restricted Security Context Constraints (SCCs).
On OpenShift, the operator omits hardcoded user and group IDs from the workloads it creates so the platform can assign them dynamically from the namespace's allowed range, and it applies the required seccomp profile and drops all Linux capabilities. Install the operator using the standard commands above; no OpenShift-specific values are required.
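To confirm what the operator should detect in your cluster, the following Bash sketch applies the same signal described above, the presence of the route.openshift.io API group, to the output of kubectl api-versions. The helper name is hypothetical, and the check illustrates the documented detection rather than reproducing the operator's code.

```bash
# Hypothetical helper: read `kubectl api-versions` output on stdin and succeed
# if the route.openshift.io API group is served.
is_openshift() {
  grep -q '^route\.openshift\.io/'
}

# Usage (requires cluster access):
#   kubectl api-versions | is_openshift && echo "OpenShift detected"
```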
Operator deployment modes
The ToolHive operator supports two distinct deployment modes to accommodate different security requirements and organizational structures.
Cluster mode (default)
Cluster mode provides the operator with cluster-wide access to manage MCPServer resources in any namespace. This is the default mode and is suitable for platform teams managing MCPServers across the entire cluster.
Characteristics:
- Full cluster-wide access to manage MCPServers in any namespace
- Uses ClusterRole and ClusterRoleBinding for broad permissions
- Simplest configuration and management
- Best for single-tenant clusters or trusted environments
To explicitly configure cluster mode, include the following property in your
Helm values.yaml file:
operator:
  rbac:
    scope: 'cluster'
Reference the values.yaml file when you install the operator using Helm:
helm upgrade --install toolhive-operator oci://ghcr.io/stacklok/toolhive/toolhive-operator \
-n toolhive-system --create-namespace \
-f values.yaml
This is the default configuration used in the standard installation commands.
Namespace mode
Namespace mode restricts the operator's access to the namespaces you specify. This mode suits multi-tenant environments and organizations that follow the principle of least privilege.
Characteristics:
- Restricted access to only specified namespaces
- Uses ClusterRole with namespace-specific RoleBindings for precise access control
- Enhanced security through reduced blast radius
- Ideal for multi-tenant environments and compliance requirements
To configure namespace mode, include the following in your Helm values.yaml:
operator:
  rbac:
    scope: 'namespace'
    allowedNamespaces:
      - 'team-frontend'
      - 'team-backend'
      - 'staging'
      - 'production'
This example lets the operator manage MCPServer resources in the four namespaces
listed in the allowedNamespaces property. Adjust the list to match your
environment.
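For longer namespace lists, a small Bash sketch can generate the values file instead of editing YAML by hand. The function name is hypothetical; it emits the same keys and layout as the example above.

```bash
# Hypothetical helper: print a namespace-mode values.yaml for the namespaces
# passed as arguments.
namespace_mode_values() {
  local ns
  printf '%s\n' 'operator:' '  rbac:' "    scope: 'namespace'" '    allowedNamespaces:'
  for ns in "$@"; do
    printf "      - '%s'\n" "$ns"
  done
}

# Usage:
#   namespace_mode_values team-frontend team-backend staging production > values.yaml
```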
Reference the values.yaml file when you install the operator using Helm:
helm upgrade --install toolhive-operator oci://ghcr.io/stacklok/toolhive/toolhive-operator \
-n toolhive-system --create-namespace \
-f values.yaml
Verify the RoleBindings are created:
kubectl get rolebinding --all-namespaces | grep toolhive
You should see RoleBindings in the specified namespaces, granting the operator access to manage MCPServers. Example output:
NAMESPACE         NAME                                           ROLE
team-frontend     toolhive-operator-manager-rolebinding          ClusterRole/toolhive-operator-manager-role
team-backend      toolhive-operator-manager-rolebinding          ClusterRole/toolhive-operator-manager-role
staging           toolhive-operator-manager-rolebinding          ClusterRole/toolhive-operator-manager-role
production        toolhive-operator-manager-rolebinding          ClusterRole/toolhive-operator-manager-role
toolhive-system   toolhive-operator-leader-election-rolebinding  Role/toolhive-operator-leader-election-role
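To check a longer list automatically, the following Bash sketch reads the kubectl output above on stdin and prints every allowed namespace that lacks the manager RoleBinding. The RoleBinding name toolhive-operator-manager-rolebinding is an assumption taken from the example output, and the helper name is hypothetical.

```bash
# Hypothetical helper: read `kubectl get rolebinding --all-namespaces` output on
# stdin and print each namespace argument that has no manager RoleBinding.
# Returns 1 if any namespace is missing one.
missing_rolebindings() {
  local bound ns missing=0
  bound=$(awk '$2 == "toolhive-operator-manager-rolebinding" { print $1 }')
  for ns in "$@"; do
    if ! grep -qxF "$ns" <<<"$bound"; then
      echo "$ns"
      missing=1
    fi
  done
  return "$missing"
}

# Usage (requires cluster access):
#   kubectl get rolebinding --all-namespaces \
#     | missing_rolebindings team-frontend team-backend staging production
```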
Migrate between modes
You can switch between cluster mode and namespace mode by updating the
values.yaml file and reapplying the Helm chart as shown above. Migration in
both directions is supported.
Check operator status
To verify the operator is working correctly:
# Verify CRDs are installed
kubectl get crd | grep toolhive
# Check operator deployment status
kubectl get deployment -n toolhive-system toolhive-operator
# Check operator service account and RBAC
kubectl get serviceaccount -n toolhive-system
kubectl get clusterrole | grep toolhive
kubectl get clusterrolebinding | grep toolhive
# Check operator pod status
kubectl get pods -n toolhive-system
# Check operator pod logs
kubectl logs -n toolhive-system <TOOLHIVE_OPERATOR_POD_NAME>
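The grep above shows which CRDs exist but not whether any are missing. The following Bash sketch compares kubectl get crd -o name output against the twelve CRDs from this guide's v0.21.0 manifests and prints any that are absent. The helper name is hypothetical; adjust the list for other CRD versions.

```bash
# Hypothetical helper: read `kubectl get crd -o name` output on stdin and print
# every expected ToolHive CRD that is not installed. Returns 1 if any are missing.
missing_crds() {
  local installed crd missing=0
  # Strip the resource-type prefix that `-o name` adds to each line.
  installed=$(sed 's|^customresourcedefinition\.apiextensions\.k8s\.io/||')
  for crd in embeddingservers mcpexternalauthconfigs mcpgroups mcpoidcconfigs \
             mcpregistries mcpremoteproxies mcpserverentries mcpservers \
             mcptelemetryconfigs mcptoolconfigs \
             virtualmcpcompositetooldefinitions virtualmcpservers; do
    if ! grep -qxF "${crd}.toolhive.stacklok.dev" <<<"$installed"; then
      echo "${crd}.toolhive.stacklok.dev"
      missing=1
    fi
  done
  return "$missing"
}

# Usage (requires cluster access):
#   kubectl get crd -o name | missing_crds && echo "all ToolHive CRDs present"
```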
Upgrade the operator
To upgrade the ToolHive operator to a new version, you need to upgrade both the CRDs and the operator installation.
Upgrade the CRDs
Choose an upgrade method based on your needs:
- Helm (recommended): Provides customization options and manages the full lifecycle of the operator. CRDs are installed and upgraded automatically as part of the Helm chart.
- kubectl: Uses static manifests for a simple installation. Useful for environments where Helm isn't available or for GitOps workflows.
To upgrade the ToolHive operator to a new version, upgrade the CRDs first by
upgrading the CRDs chart to the desired version. Use the same namespace you used
at install time:
helm upgrade -i toolhive-operator-crds oci://ghcr.io/stacklok/toolhive/toolhive-operator-crds -n toolhive-system --version 0.12.1
To upgrade the CRDs using kubectl, run the following commands. The operator
registers a controller for every ToolHive resource type, so apply the complete
set of CRDs:
kubectl apply -f https://raw.githubusercontent.com/stacklok/toolhive/refs/tags/v0.21.0/deploy/charts/operator-crds/files/crds/toolhive.stacklok.dev_embeddingservers.yaml
kubectl apply -f https://raw.githubusercontent.com/stacklok/toolhive/refs/tags/v0.21.0/deploy/charts/operator-crds/files/crds/toolhive.stacklok.dev_mcpexternalauthconfigs.yaml
kubectl apply -f https://raw.githubusercontent.com/stacklok/toolhive/refs/tags/v0.21.0/deploy/charts/operator-crds/files/crds/toolhive.stacklok.dev_mcpgroups.yaml
kubectl apply -f https://raw.githubusercontent.com/stacklok/toolhive/refs/tags/v0.21.0/deploy/charts/operator-crds/files/crds/toolhive.stacklok.dev_mcpoidcconfigs.yaml
kubectl apply -f https://raw.githubusercontent.com/stacklok/toolhive/refs/tags/v0.21.0/deploy/charts/operator-crds/files/crds/toolhive.stacklok.dev_mcpregistries.yaml
kubectl apply -f https://raw.githubusercontent.com/stacklok/toolhive/refs/tags/v0.21.0/deploy/charts/operator-crds/files/crds/toolhive.stacklok.dev_mcpremoteproxies.yaml
kubectl apply -f https://raw.githubusercontent.com/stacklok/toolhive/refs/tags/v0.21.0/deploy/charts/operator-crds/files/crds/toolhive.stacklok.dev_mcpserverentries.yaml
kubectl apply -f https://raw.githubusercontent.com/stacklok/toolhive/refs/tags/v0.21.0/deploy/charts/operator-crds/files/crds/toolhive.stacklok.dev_mcpservers.yaml
kubectl apply -f https://raw.githubusercontent.com/stacklok/toolhive/refs/tags/v0.21.0/deploy/charts/operator-crds/files/crds/toolhive.stacklok.dev_mcptelemetryconfigs.yaml
kubectl apply -f https://raw.githubusercontent.com/stacklok/toolhive/refs/tags/v0.21.0/deploy/charts/operator-crds/files/crds/toolhive.stacklok.dev_mcptoolconfigs.yaml
kubectl apply -f https://raw.githubusercontent.com/stacklok/toolhive/refs/tags/v0.21.0/deploy/charts/operator-crds/files/crds/toolhive.stacklok.dev_virtualmcpcompositetooldefinitions.yaml
kubectl apply -f https://raw.githubusercontent.com/stacklok/toolhive/refs/tags/v0.21.0/deploy/charts/operator-crds/files/crds/toolhive.stacklok.dev_virtualmcpservers.yaml
Replace v0.21.0 in the commands above with your target CRD version.
Upgrade the operator Helm release
Then, upgrade the operator installation using Helm.
helm upgrade -i toolhive-operator oci://ghcr.io/stacklok/toolhive/toolhive-operator -n toolhive-system
This upgrades the operator to the latest version available in the OCI registry.
To upgrade to a specific version, add the --version flag:
helm upgrade -i toolhive-operator oci://ghcr.io/stacklok/toolhive/toolhive-operator -n toolhive-system --version 0.12.1
If you have a custom values.yaml file, include it with the -f flag:
helm upgrade -i toolhive-operator oci://ghcr.io/stacklok/toolhive/toolhive-operator -n toolhive-system -f values.yaml
Uninstall the operator
To uninstall the operator and CRDs:
First, uninstall the operator:
helm uninstall toolhive-operator -n toolhive-system
Then, if you want to completely remove ToolHive including all CRDs and related resources, delete the CRDs.
This will delete all MCPServer and related resources in your cluster!
To remove the CRDs with Helm: if you installed them with Helm and crds.keep is
still set to true, first upgrade the CRDs chart with --set crds.keep=false.
Otherwise Helm leaves the CRDs in place when you uninstall the chart:
helm upgrade toolhive-operator-crds oci://ghcr.io/stacklok/toolhive/toolhive-operator-crds \
-n toolhive-system --set crds.keep=false
Then uninstall the CRDs chart:
helm uninstall toolhive-operator-crds -n toolhive-system
To remove the CRDs using kubectl, run the following:
kubectl delete crd embeddingservers.toolhive.stacklok.dev
kubectl delete crd mcpexternalauthconfigs.toolhive.stacklok.dev
kubectl delete crd mcpgroups.toolhive.stacklok.dev
kubectl delete crd mcpoidcconfigs.toolhive.stacklok.dev
kubectl delete crd mcpregistries.toolhive.stacklok.dev
kubectl delete crd mcpremoteproxies.toolhive.stacklok.dev
kubectl delete crd mcpserverentries.toolhive.stacklok.dev
kubectl delete crd mcpservers.toolhive.stacklok.dev
kubectl delete crd mcptelemetryconfigs.toolhive.stacklok.dev
kubectl delete crd mcptoolconfigs.toolhive.stacklok.dev
kubectl delete crd virtualmcpcompositetooldefinitions.toolhive.stacklok.dev
kubectl delete crd virtualmcpservers.toolhive.stacklok.dev
If you created the toolhive-system namespace with Helm's --create-namespace
flag, delete it manually:
kubectl delete namespace toolhive-system
Next steps
- Run MCP servers in Kubernetes to create and manage MCP servers using the ToolHive operator
- Configure authentication before exposing servers externally
Related information
- Kubernetes introduction - Overview of ToolHive's Kubernetes integration
- ToolHive operator tutorial - Step-by-step tutorial for getting started using a local kind cluster
Troubleshooting
Authentication error with ghcr.io
If you encounter an authentication error when pulling the Helm chart, it might
indicate a problem with your access to the GitHub Container Registry
(ghcr.io).
ToolHive's charts and images are public, but if you've previously logged into
ghcr.io using a personal access token, you might need to re-authenticate if
your token has expired or been revoked.
See the GitHub documentation to re-authenticate to the registry.
Operator pod fails to start
If the operator pod is not starting or is in a CrashLoopBackOff state, check
the pod logs for error messages:
kubectl get pods -n toolhive-system
# Note the name of the toolhive-operator pod
kubectl describe pod -n toolhive-system <TOOLHIVE_OPERATOR_POD_NAME>
kubectl logs -n toolhive-system <TOOLHIVE_OPERATOR_POD_NAME>
Common causes:
- Missing CRDs: The operator fails to start if the CRDs aren't installed. Confirm they're present:
kubectl api-resources --api-group=toolhive.stacklok.dev
If the list is empty, install the CRDs as described in Install the CRDs.
- Image pull failure: kubectl describe pod shows ImagePullBackOff or ErrImagePull. Verify that the cluster has egress to ghcr.io and that any custom operator.image value in your values.yaml is correct.
- Invalid values.yaml: The pod logs show a startup error referencing a specific field. Compare your file against the output of helm show values oci://ghcr.io/stacklok/toolhive/toolhive-operator.
CRD upgrade fails with namespace mismatch
If you see an error like the following when upgrading the CRD chart:
Error: invalid ownership metadata; annotation validation error:
key "meta.helm.sh/release-namespace" must equal "toolhive-system":
current value is "default"
This means the CRD chart was originally installed in a different namespace than
the one you're now targeting. To fix this, patch the
meta.helm.sh/release-namespace annotation on all CRDs to match your desired
namespace:
for crd in $(kubectl get crd -o name | grep toolhive.stacklok.dev); do
kubectl annotate "$crd" \
meta.helm.sh/release-namespace=<TARGET_NAMESPACE> --overwrite
done
Replace <TARGET_NAMESPACE> with the namespace you want to use going forward
(for example, toolhive-system). This is a one-time operation. After patching,
future upgrades work as long as you use the same namespace consistently.
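To find which CRDs still carry a different release namespace, before or after patching, the following Bash sketch filters name and namespace pairs. The helper name is hypothetical, and the jsonpath expression in the usage comment (which escapes the dots in the annotation key) is an assumption to verify against your kubectl version.

```bash
# Hypothetical helper: read "<crd-name> <release-namespace>" lines on stdin and
# print every ToolHive CRD whose release namespace differs from the target
# (including CRDs with no annotation at all).
crds_not_in_namespace() {
  local target=$1
  awk -v target="$target" '$1 ~ /\.toolhive\.stacklok\.dev$/ && $2 != target { print $1 }'
}

# Usage (requires cluster access; jsonpath escaping is an assumption):
#   kubectl get crd -o jsonpath='{range .items[*]}{.metadata.name}{" "}{.metadata.annotations.meta\.helm\.sh/release-namespace}{"\n"}{end}' \
#     | crds_not_in_namespace toolhive-system
```

An empty result means every ToolHive CRD already points at the target namespace.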
CRDs installation fails
If the CRDs installation fails, you might see errors about existing resources or permission issues:
# Check if CRDs already exist
kubectl get crd | grep toolhive
# Remove existing CRDs if needed (this will delete all related resources)
kubectl delete crd <CRD_NAME>
To reinstall the CRDs:
helm uninstall toolhive-operator-crds -n toolhive-system
helm upgrade -i toolhive-operator-crds oci://ghcr.io/stacklok/toolhive/toolhive-operator-crds -n toolhive-system
Namespace creation issues
If you encounter permission errors when creating the toolhive-system
namespace, create it manually first:
kubectl create namespace toolhive-system
Then install the operator without the --create-namespace flag:
helm upgrade -i toolhive-operator oci://ghcr.io/stacklok/toolhive/toolhive-operator -n toolhive-system
Helm chart not found
If Helm cannot find the chart, ensure you're using the correct OCI registry URL and that your Helm version supports OCI registries (v3.8.0 or later; this guide requires v3.10 or later):
# Check Helm version
helm version
# Try pulling the chart explicitly
helm pull oci://ghcr.io/stacklok/toolhive/toolhive-operator
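The Prerequisites section asks for Helm v3.10 or later. The following Bash sketch checks the version string that helm version --short prints against a minimum; the helper name is hypothetical, and it compares only the numeric major.minor.patch part.

```bash
# Hypothetical helper: succeed if a Helm version string such as
# v3.14.2+g1234abc is at least the given minimum major.minor.patch version.
helm_version_ok() {
  local actual=${1#v} minimum=${2#v}
  local a1 a2 a3 m1 m2 m3
  actual=${actual%%[+-]*}   # drop build metadata or pre-release suffixes
  IFS=. read -r a1 a2 a3 <<<"$actual"
  IFS=. read -r m1 m2 m3 <<<"$minimum"
  # Compare major, then minor, then patch numerically.
  if (( a1 != m1 )); then (( a1 > m1 )); return; fi
  if (( ${a2:-0} != ${m2:-0} )); then (( ${a2:-0} > ${m2:-0} )); return; fi
  (( ${a3:-0} >= ${m3:-0} ))
}

# Usage:
#   helm_version_ok "$(helm version --short)" 3.10.0 || echo "Helm v3.10 or later is required"
```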
Network connectivity issues
If you're experiencing network timeouts or connection issues:
- Verify your cluster has internet access to reach ghcr.io
- Check whether your organization uses a proxy or firewall that might block access
- Consider using a private registry mirror if direct access is restricted