Deploy the operator

Prerequisites

A Kubernetes cluster (current and two previous minor versions are supported)
Permissions to create resources in the cluster
kubectl configured to communicate with your cluster
Helm (v3.10 minimum, v3.14+ recommended)
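
One way to confirm the Helm requirement before you start is to compare the installed version against the 3.10 minimum. The sketch below is illustrative only: version_ge and check_helm are hypothetical helper names, and the comparison assumes a sort that supports the -V (version sort) option, as GNU coreutils does.

```shell
# version_ge A B: succeeds when version A is greater than or equal to version B.
# Assumes `sort -V` (version sort) is available on this machine.
version_ge() {
  [ "$(printf '%s\n%s\n' "$2" "$1" | sort -V | head -n1)" = "$2" ]
}

# check_helm: compares the installed Helm version with the documented minimum.
check_helm() {
  installed="$(helm version --template '{{.Version}}')"   # for example v3.14.0
  installed="${installed#v}"                               # drop the leading "v"
  if version_ge "$installed" "3.10.0"; then
    echo "Helm $installed meets the 3.10 minimum"
  else
    echo "Helm $installed is older than 3.10; upgrade before continuing" >&2
    return 1
  fi
}
```

Run check_helm on the machine where you plan to run the install commands.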

Install the CRDs

The ToolHive operator requires Custom Resource Definitions (CRDs) to manage MCPServer resources. The CRDs define the structure and behavior of MCPServers in your cluster.

Choose an installation method based on your needs:

Helm (recommended): Provides customization options and manages the full lifecycle of the operator. CRDs are installed and upgraded automatically as part of the Helm chart.
kubectl: Uses static manifests for a simple installation. Useful for environments where Helm isn't available or for GitOps workflows.

This command installs the latest version of the ToolHive operator CRDs Helm chart:

helm upgrade --install toolhive-operator-crds oci://ghcr.io/stacklok/toolhive/toolhive-operator-crds \
  -n toolhive-system --create-namespace

Namespace consistency

When you install this chart, Helm stamps every CRD with a meta.helm.sh/release-namespace annotation set to the namespace used at install time. That value is fixed for the release, so you must use the same namespace on all future helm upgrade commands for the CRDs. Specifying a different namespace fails with an ownership error.

If you need to migrate to a different namespace, see the CRD namespace mismatch troubleshooting section.
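
To see which namespace Helm recorded before you run an upgrade, you can read the annotation from one of the CRDs. This is a minimal sketch: the function names and the KUBECTL override variable are hypothetical conveniences, and it inspects only the mcpservers CRD on the assumption that all CRDs in the release carry the same value.

```shell
# crd_release_namespace: prints the release namespace Helm recorded on one CRD.
# KUBECTL can be overridden (for example in tests); it defaults to kubectl.
crd_release_namespace() {
  "${KUBECTL:-kubectl}" get crd mcpservers.toolhive.stacklok.dev \
    -o jsonpath='{.metadata.annotations.meta\.helm\.sh/release-namespace}'
}

# check_release_namespace NS: succeeds when NS matches the recorded namespace.
check_release_namespace() {
  recorded="$(crd_release_namespace)"
  if [ "$recorded" = "$1" ]; then
    echo "CRDs are owned by a release in namespace '$recorded'"
  else
    echo "recorded namespace is '$recorded', not '$1'" >&2
    return 1
  fi
}
```

For example, check_release_namespace toolhive-system returns a non-zero status if the chart was originally installed into another namespace.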

To install a specific version, append --version <VERSION> to the command, for example:

helm upgrade --install toolhive-operator-crds oci://ghcr.io/stacklok/toolhive/toolhive-operator-crds \
  -n toolhive-system --create-namespace --version 0.12.1

CRD configuration options

The Helm chart installs all CRDs by default. You can control CRD installation and uninstall behavior using these values:

Value	Description	Default
`crds.install`	Install the ToolHive CRDs	`true`
`crds.keep`	Preserve CRDs when uninstalling the chart	`true`

note

The crds.keep option adds the helm.sh/resource-policy: keep annotation to CRDs, which prevents Helm from deleting them during helm uninstall. This protects your custom resources from accidental deletion. If you want to remove CRDs during uninstall, set crds.keep=false.

To install the CRDs using kubectl, run the following. The operator registers all controllers, so apply the complete set of CRDs:

kubectl apply -f https://raw.githubusercontent.com/stacklok/toolhive/refs/tags/v0.21.0/deploy/charts/operator-crds/files/crds/toolhive.stacklok.dev_embeddingservers.yaml
kubectl apply -f https://raw.githubusercontent.com/stacklok/toolhive/refs/tags/v0.21.0/deploy/charts/operator-crds/files/crds/toolhive.stacklok.dev_mcpexternalauthconfigs.yaml
kubectl apply -f https://raw.githubusercontent.com/stacklok/toolhive/refs/tags/v0.21.0/deploy/charts/operator-crds/files/crds/toolhive.stacklok.dev_mcpgroups.yaml
kubectl apply -f https://raw.githubusercontent.com/stacklok/toolhive/refs/tags/v0.21.0/deploy/charts/operator-crds/files/crds/toolhive.stacklok.dev_mcpoidcconfigs.yaml
kubectl apply -f https://raw.githubusercontent.com/stacklok/toolhive/refs/tags/v0.21.0/deploy/charts/operator-crds/files/crds/toolhive.stacklok.dev_mcpregistries.yaml
kubectl apply -f https://raw.githubusercontent.com/stacklok/toolhive/refs/tags/v0.21.0/deploy/charts/operator-crds/files/crds/toolhive.stacklok.dev_mcpremoteproxies.yaml
kubectl apply -f https://raw.githubusercontent.com/stacklok/toolhive/refs/tags/v0.21.0/deploy/charts/operator-crds/files/crds/toolhive.stacklok.dev_mcpserverentries.yaml
kubectl apply -f https://raw.githubusercontent.com/stacklok/toolhive/refs/tags/v0.21.0/deploy/charts/operator-crds/files/crds/toolhive.stacklok.dev_mcpservers.yaml
kubectl apply -f https://raw.githubusercontent.com/stacklok/toolhive/refs/tags/v0.21.0/deploy/charts/operator-crds/files/crds/toolhive.stacklok.dev_mcptelemetryconfigs.yaml
kubectl apply -f https://raw.githubusercontent.com/stacklok/toolhive/refs/tags/v0.21.0/deploy/charts/operator-crds/files/crds/toolhive.stacklok.dev_mcptoolconfigs.yaml
kubectl apply -f https://raw.githubusercontent.com/stacklok/toolhive/refs/tags/v0.21.0/deploy/charts/operator-crds/files/crds/toolhive.stacklok.dev_virtualmcpcompositetooldefinitions.yaml
kubectl apply -f https://raw.githubusercontent.com/stacklok/toolhive/refs/tags/v0.21.0/deploy/charts/operator-crds/files/crds/toolhive.stacklok.dev_virtualmcpservers.yaml

Replace v0.21.0 in the commands above with your target CRD version.
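
Because the twelve commands differ only in the CRD name and the version tag, you could generate the URLs from a single version argument instead of editing each line. The following is a sketch, not an official script: crd_urls and the CRDS variable are hypothetical names, and the list is copied from the commands above.

```shell
# CRD names copied from the kubectl commands above.
CRDS="embeddingservers mcpexternalauthconfigs mcpgroups mcpoidcconfigs
mcpregistries mcpremoteproxies mcpserverentries mcpservers
mcptelemetryconfigs mcptoolconfigs virtualmcpcompositetooldefinitions
virtualmcpservers"

# crd_urls VERSION: prints one manifest URL per CRD for the given release tag.
crd_urls() {
  base="https://raw.githubusercontent.com/stacklok/toolhive/refs/tags/$1/deploy/charts/operator-crds/files/crds"
  for crd in $CRDS; do
    echo "$base/toolhive.stacklok.dev_${crd}.yaml"
  done
}

# Usage: apply every CRD for one version.
# crd_urls v0.21.0 | xargs -n 1 kubectl apply -f
```

If a later release adds or removes a CRD, the CRDS list needs the same change.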

Install the operator

To install the ToolHive operator using default settings, run the following command:

helm upgrade --install toolhive-operator oci://ghcr.io/stacklok/toolhive/toolhive-operator -n toolhive-system --create-namespace

This command installs the latest version of the ToolHive operator Helm chart. To install a specific version, append --version <VERSION> to the command, for example:

helm upgrade --install toolhive-operator oci://ghcr.io/stacklok/toolhive/toolhive-operator -n toolhive-system --create-namespace --version 0.12.1

Verify the installation:

kubectl get pods -n toolhive-system

After about 30 seconds, you should see the toolhive-operator pod running.

Check the logs of the operator pod:

kubectl logs -f -n toolhive-system <TOOLHIVE_OPERATOR_POD_NAME>

This shows you the logs of the operator pod, which can help you debug any issues. For comprehensive logging and audit capabilities, see the Logging infrastructure guide.

Customize the operator

You can customize the operator installation by providing a values.yaml file with your configuration settings. For example, to change the number of replicas and set a specific ToolHive version, create a values.yaml file:

values.yaml
operator:
  replicaCount: 2
  toolhiveRunnerImage: ghcr.io/stacklok/toolhive:v0.2.17 # or `latest`

Install the operator with your custom values:

helm upgrade --install toolhive-operator oci://ghcr.io/stacklok/toolhive/toolhive-operator \
  -n toolhive-system --create-namespace \
  -f values.yaml

To see all available configuration options, run:

helm show values oci://ghcr.io/stacklok/toolhive/toolhive-operator

Pull workload images from a private registry

If your MCP server, proxy runner, vMCP, registry API, or embedding server images live in a private container registry, use operator.defaultImagePullSecrets to apply pull credentials to every workload the operator spawns:

values.yaml
operator:
  defaultImagePullSecrets:
    - regcred
    - name: backup-regcred

Each entry is the name of a Kubernetes Secret of type kubernetes.io/dockerconfigjson (or another image-pull-secret type) that must exist in the namespace where each workload is created. Plain string and object forms are equivalent.

The secrets propagate to the pod spec and the operator-managed ServiceAccount of every workload-spawning controller: MCPServer, MCPRemoteProxy, MCPRegistry, VirtualMCPServer, and EmbeddingServer. Chart-level entries are appended to any per-CR imagePullSecrets already configured on the resource. Common per-CR fields are:

spec.imagePullSecrets on MCPRegistry and VirtualMCPServer
spec.resourceOverrides.proxyDeployment.imagePullSecrets on MCPServer and MCPRemoteProxy

Per-CR entries take precedence when a secret name appears in both places.

The chart also exposes operator.imagePullSecrets, which controls only the operator's own pod. Use it when the operator image itself is in a private registry; use defaultImagePullSecrets for the workloads the operator manages.
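
Because the pull secret must exist in every namespace where a workload is created, it can help to create it in a loop. The sketch below uses the standard kubectl create secret docker-registry command. The function name, the KUBECTL override variable, and the registry host, user, and namespaces in the usage line are placeholders, not values from ToolHive.

```shell
# create_pull_secret NAME SERVER USER PASSWORD NAMESPACE...
# Creates the same dockerconfigjson Secret in every listed namespace.
create_pull_secret() {
  name="$1"; server="$2"; user="$3"; password="$4"
  shift 4
  for ns in "$@"; do
    "${KUBECTL:-kubectl}" create secret docker-registry "$name" \
      --namespace "$ns" \
      --docker-server="$server" \
      --docker-username="$user" \
      --docker-password="$password"
  done
}

# Usage (registry host, user, and namespaces are placeholders):
# create_pull_secret regcred registry.example.com ci-user "$REGISTRY_TOKEN" team-frontend team-backend
```

The Secret name you pass here is the name you list under operator.defaultImagePullSecrets.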

Scale the operator with autoscaling

The operator runs a single replica by default. For high availability, run more than one replica: set a fixed count with operator.replicaCount, or enable a HorizontalPodAutoscaler (HPA) to adjust the count automatically based on resource utilization.

Autoscaling is disabled by default. To enable it, configure the operator.autoscaling values:

values.yaml
operator:
  autoscaling:
    enabled: true
    minReplicas: 1
    maxReplicas: 100
    targetCPUUtilizationPercentage: 80
    # targetMemoryUtilizationPercentage: 80

When autoscaling.enabled is true, the chart creates a HorizontalPodAutoscaler and stops setting a static replica count, so the HPA takes full control of the replica range. Memory-based scaling is off unless you set targetMemoryUtilizationPercentage.

note

The operator uses leader election, so only one replica is active at a time; the others run as warm standbys that take over if the leader fails. Extra replicas improve failover, not reconciliation throughput. For high availability, keep minReplicas (or operator.replicaCount) at 2 or more so a standby is always ready.

note

The HPA requires the Kubernetes metrics server to be installed in your cluster so it can read CPU and memory usage. Many managed Kubernetes distributions include it by default.
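
Putting the two notes together, a high-availability setup might keep at least two replicas under the HPA. The sketch below writes such a values file. The file name values-ha.yaml and the maxReplicas value of 4 are arbitrary choices for illustration, picked on the assumption that extra standbys add little once failover is covered.

```shell
# Write a values file that enables the HPA and always keeps one standby replica.
# The file name and the maxReplicas value are illustrative assumptions.
cat > values-ha.yaml <<'EOF'
operator:
  autoscaling:
    enabled: true
    minReplicas: 2
    maxReplicas: 4
    targetCPUUtilizationPercentage: 80
EOF

# Apply it with the install command used elsewhere on this page:
# helm upgrade --install toolhive-operator oci://ghcr.io/stacklok/toolhive/toolhive-operator \
#   -n toolhive-system --create-namespace -f values-ha.yaml
```

To check that the metrics API is reachable, kubectl top pods -n toolhive-system should return usage figures rather than an error.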

Tune operator resources

The operator ships with conservative resource requests and limits suitable for most clusters. The defaults are:

values.yaml
operator:
  resources:
    limits:
      cpu: 500m
      memory: 128Mi
    requests:
      cpu: 10m
      memory: 64Mi

If the operator manages a large number of resources, you can raise these values. The chart also exposes Go runtime tuning through the operator.gc values, which set the GOMEMLIMIT and GOGC environment variables on the operator container:

values.yaml
operator:
  gc:
    gomemlimit: 110MiB
    gogc: 75

gomemlimit is a soft memory ceiling that helps the Go runtime avoid hitting the container memory limit, and gogc controls how aggressively garbage collection runs (a lower value collects more often and uses less memory). Keep gomemlimit below the container memory limit, and raise it if you raise the memory limit. The defaults work well for most deployments.
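
If you raise the memory limit, you need a matching gomemlimit. The helper below is one possible rule of thumb, not a documented formula: it keeps roughly the same ratio as the default pair shown above (128Mi limit, 110MiB gomemlimit, about 86%). The function name and the ratio are assumptions.

```shell
# gomemlimit_for LIMIT_MIB: prints a gomemlimit value for a container memory
# limit given in MiB. The 86% ratio is an assumption taken from the default
# pair above (128Mi limit, 110MiB gomemlimit), not a documented rule.
gomemlimit_for() {
  echo "$(( $1 * 86 / 100 ))MiB"
}

gomemlimit_for 128   # prints 110MiB
gomemlimit_for 256   # prints 220MiB
```

Treat the result as a starting point and adjust it based on the operator's observed memory usage.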

Run on OpenShift

The operator runs on OpenShift with no special configuration. At startup, it detects whether it's running on OpenShift (by looking for the route.openshift.io API), and the workloads it creates use security contexts that satisfy the default restricted Security Context Constraints (SCCs).

On OpenShift, the operator omits hardcoded user and group IDs from the workloads it creates so the platform can assign them dynamically from the namespace's allowed range, and it applies the required seccomp profile and drops all Linux capabilities. Install the operator using the standard commands above; no OpenShift-specific values are required.

Operator deployment modes

The ToolHive operator supports two distinct deployment modes to accommodate different security requirements and organizational structures.

Cluster mode (default)

Cluster mode provides the operator with cluster-wide access to manage MCPServer resources in any namespace. This is the default mode and is suitable for platform teams managing MCPServers across the entire cluster.

Characteristics:

Full cluster-wide access to manage MCPServers in any namespace
Uses ClusterRole and ClusterRoleBinding for broad permissions
Simplest configuration and management
Best for single-tenant clusters or trusted environments

To explicitly configure cluster mode, include the following property in your Helm values.yaml file:

values.yaml
operator:
  rbac:
    scope: 'cluster'

Reference the values.yaml file when you install the operator using Helm:

helm upgrade --install toolhive-operator oci://ghcr.io/stacklok/toolhive/toolhive-operator \
  -n toolhive-system --create-namespace \
  -f values.yaml

This is the default configuration used in the standard installation commands.

Namespace mode

Namespace mode restricts the operator's access to the namespaces you specify. This mode is well suited to multi-tenant environments and organizations following the principle of least privilege.

Characteristics:

Restricted access to only specified namespaces
Uses ClusterRole with namespace-specific RoleBindings for precise access control
Enhanced security through reduced blast radius
Ideal for multi-tenant environments and compliance requirements

To configure namespace mode, include the following in your Helm values.yaml:

values.yaml
operator:
  rbac:
    scope: 'namespace'
    allowedNamespaces:
      - 'team-frontend'
      - 'team-backend'
      - 'staging'
      - 'production'

This example lets the operator manage MCPServer resources in the four namespaces listed in the allowedNamespaces property. Adjust the list to match your environment.
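
If the namespace list comes from a script or a CI variable, you could generate the values rather than maintain them by hand. This is a small sketch: namespace_mode_values is a hypothetical helper that prints the same structure as the example above for whatever namespaces you pass in.

```shell
# namespace_mode_values NS...: prints Helm values that restrict the operator
# to the given namespaces.
namespace_mode_values() {
  printf 'operator:\n  rbac:\n    scope: namespace\n    allowedNamespaces:\n'
  for ns in "$@"; do
    printf '      - %s\n' "$ns"
  done
}

# Usage: generate the file, then pass it to Helm with -f.
# namespace_mode_values team-frontend team-backend > values.yaml
```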

Reference the values.yaml file when you install the operator using Helm:

helm upgrade --install toolhive-operator oci://ghcr.io/stacklok/toolhive/toolhive-operator \
  -n toolhive-system --create-namespace \
  -f values.yaml

Verify the RoleBindings are created:

kubectl get rolebinding --all-namespaces | grep toolhive

You should see RoleBindings in the specified namespaces, granting the operator access to manage MCPServers. Example output:

NAMESPACE        NAME                                           ROLE
team-frontend    toolhive-operator-manager-rolebinding          ClusterRole/toolhive-operator-manager-role
team-backend     toolhive-operator-manager-rolebinding          ClusterRole/toolhive-operator-manager-role
staging          toolhive-operator-manager-rolebinding          ClusterRole/toolhive-operator-manager-role
production       toolhive-operator-manager-rolebinding          ClusterRole/toolhive-operator-manager-role
toolhive-system  toolhive-operator-leader-election-rolebinding  Role/toolhive-operator-leader-election-role

Migrate between modes

You can switch between cluster mode and namespace mode by updating the values.yaml file and reapplying the Helm chart as shown above. Migration in both directions is supported.

Check operator status

To verify the operator is working correctly:

# Verify CRDs are installed
kubectl get crd | grep toolhive

# Check operator deployment status
kubectl get deployment -n toolhive-system toolhive-operator

# Check operator service account and RBAC
kubectl get serviceaccount -n toolhive-system
kubectl get clusterrole | grep toolhive
kubectl get clusterrolebinding | grep toolhive

# Check operator pod status
kubectl get pods -n toolhive-system

# Check operator pod logs
kubectl logs -n toolhive-system <TOOLHIVE_OPERATOR_POD_NAME>
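
In scripts, you may prefer a command that blocks until the operator is ready instead of polling pod status by hand. The sketch below wraps kubectl rollout status for the toolhive-operator Deployment named above. The function name, the KUBECTL override variable, and the 120s default timeout are assumptions.

```shell
# wait_for_operator [TIMEOUT]: blocks until the operator Deployment finishes
# rolling out, or fails after the timeout (120s is an arbitrary default).
wait_for_operator() {
  "${KUBECTL:-kubectl}" rollout status deployment/toolhive-operator \
    -n toolhive-system --timeout="${1:-120s}"
}

# Usage:
# wait_for_operator        # wait up to 120s
# wait_for_operator 30s    # wait up to 30s
```

A non-zero exit status means the rollout did not complete in time, which is a cue to inspect the pod logs.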

Upgrade the operator

To upgrade the ToolHive operator to a new version, you need to upgrade both the CRDs and the operator installation.

Upgrade the CRDs

Choose an upgrade method based on your needs:

Helm (recommended): Provides customization options and manages the full lifecycle of the operator. CRDs are installed and upgraded automatically as part of the Helm chart.
kubectl: Uses static manifests for a simple installation. Useful for environments where Helm isn't available or for GitOps workflows.

Upgrade the CRDs first by installing the desired version of the CRDs chart. Use the same namespace you used when you first installed the chart:

helm upgrade -i toolhive-operator-crds oci://ghcr.io/stacklok/toolhive/toolhive-operator-crds \
  -n toolhive-system --version 0.12.1

To upgrade the CRDs using kubectl, run the following. The operator registers all controllers, so apply the complete set of CRDs:

kubectl apply -f https://raw.githubusercontent.com/stacklok/toolhive/refs/tags/v0.21.0/deploy/charts/operator-crds/files/crds/toolhive.stacklok.dev_embeddingservers.yaml
kubectl apply -f https://raw.githubusercontent.com/stacklok/toolhive/refs/tags/v0.21.0/deploy/charts/operator-crds/files/crds/toolhive.stacklok.dev_mcpexternalauthconfigs.yaml
kubectl apply -f https://raw.githubusercontent.com/stacklok/toolhive/refs/tags/v0.21.0/deploy/charts/operator-crds/files/crds/toolhive.stacklok.dev_mcpgroups.yaml
kubectl apply -f https://raw.githubusercontent.com/stacklok/toolhive/refs/tags/v0.21.0/deploy/charts/operator-crds/files/crds/toolhive.stacklok.dev_mcpoidcconfigs.yaml
kubectl apply -f https://raw.githubusercontent.com/stacklok/toolhive/refs/tags/v0.21.0/deploy/charts/operator-crds/files/crds/toolhive.stacklok.dev_mcpregistries.yaml
kubectl apply -f https://raw.githubusercontent.com/stacklok/toolhive/refs/tags/v0.21.0/deploy/charts/operator-crds/files/crds/toolhive.stacklok.dev_mcpremoteproxies.yaml
kubectl apply -f https://raw.githubusercontent.com/stacklok/toolhive/refs/tags/v0.21.0/deploy/charts/operator-crds/files/crds/toolhive.stacklok.dev_mcpserverentries.yaml
kubectl apply -f https://raw.githubusercontent.com/stacklok/toolhive/refs/tags/v0.21.0/deploy/charts/operator-crds/files/crds/toolhive.stacklok.dev_mcpservers.yaml
kubectl apply -f https://raw.githubusercontent.com/stacklok/toolhive/refs/tags/v0.21.0/deploy/charts/operator-crds/files/crds/toolhive.stacklok.dev_mcptelemetryconfigs.yaml
kubectl apply -f https://raw.githubusercontent.com/stacklok/toolhive/refs/tags/v0.21.0/deploy/charts/operator-crds/files/crds/toolhive.stacklok.dev_mcptoolconfigs.yaml
kubectl apply -f https://raw.githubusercontent.com/stacklok/toolhive/refs/tags/v0.21.0/deploy/charts/operator-crds/files/crds/toolhive.stacklok.dev_virtualmcpcompositetooldefinitions.yaml
kubectl apply -f https://raw.githubusercontent.com/stacklok/toolhive/refs/tags/v0.21.0/deploy/charts/operator-crds/files/crds/toolhive.stacklok.dev_virtualmcpservers.yaml

Replace v0.21.0 in the commands above with your target CRD version.

Upgrade the operator Helm release

Then, upgrade the operator installation using Helm.

helm upgrade -i toolhive-operator oci://ghcr.io/stacklok/toolhive/toolhive-operator -n toolhive-system

This upgrades the operator to the latest version available in the OCI registry. To upgrade to a specific version, add the --version flag:

helm upgrade -i toolhive-operator oci://ghcr.io/stacklok/toolhive/toolhive-operator -n toolhive-system --version 0.12.1

If you have a custom values.yaml file, include it with the -f flag:

helm upgrade -i toolhive-operator oci://ghcr.io/stacklok/toolhive/toolhive-operator -n toolhive-system -f values.yaml

Uninstall the operator

To uninstall the operator and CRDs, first uninstall the operator:

helm uninstall toolhive-operator -n toolhive-system

Then, if you want to completely remove ToolHive including all CRDs and related resources, delete the CRDs.

warning

This will delete all MCPServer and related resources in your cluster!

To remove the CRDs using Helm, run the following:

helm uninstall toolhive-operator-crds -n toolhive-system

note

If you installed the CRDs with Helm and crds.keep is still set to true, first upgrade the chart with --set crds.keep=false so that uninstalling the CRDs chart also removes the CRDs:

helm upgrade toolhive-operator-crds oci://ghcr.io/stacklok/toolhive/toolhive-operator-crds \
  -n toolhive-system --set crds.keep=false

To remove the CRDs using kubectl, run the following:

kubectl delete crd embeddingservers.toolhive.stacklok.dev
kubectl delete crd mcpexternalauthconfigs.toolhive.stacklok.dev
kubectl delete crd mcpgroups.toolhive.stacklok.dev
kubectl delete crd mcpoidcconfigs.toolhive.stacklok.dev
kubectl delete crd mcpregistries.toolhive.stacklok.dev
kubectl delete crd mcpremoteproxies.toolhive.stacklok.dev
kubectl delete crd mcpserverentries.toolhive.stacklok.dev
kubectl delete crd mcpservers.toolhive.stacklok.dev
kubectl delete crd mcptelemetryconfigs.toolhive.stacklok.dev
kubectl delete crd mcptoolconfigs.toolhive.stacklok.dev
kubectl delete crd virtualmcpcompositetooldefinitions.toolhive.stacklok.dev
kubectl delete crd virtualmcpservers.toolhive.stacklok.dev

If you created the toolhive-system namespace with Helm's --create-namespace flag, delete it manually:

kubectl delete namespace toolhive-system

Next steps

Run MCP servers in Kubernetes to create and manage MCP servers using the ToolHive operator
Configure authentication before exposing servers externally

Related information

Kubernetes introduction - Overview of ToolHive's Kubernetes integration
ToolHive operator tutorial - Step-by-step tutorial for getting started using a local kind cluster

Troubleshooting

Authentication error with ghcr.io

If you encounter an authentication error when pulling the Helm chart, it might indicate a problem with your access to the GitHub Container Registry (ghcr.io).

ToolHive's charts and images are public, but if you've previously logged into ghcr.io using a personal access token, you might need to re-authenticate if your token has expired or been revoked.

See the GitHub documentation to re-authenticate to the registry.

Operator pod fails to start

If the operator pod is not starting or is in a CrashLoopBackOff state, check the pod logs for error messages:

kubectl get pods -n toolhive-system
# Note the name of the toolhive-operator pod

kubectl describe pod -n toolhive-system <TOOLHIVE_OPERATOR_POD_NAME>
kubectl logs -n toolhive-system <TOOLHIVE_OPERATOR_POD_NAME>

Common causes:

Missing CRDs: The operator fails to start if the CRDs aren't installed. Confirm they're present:
```
kubectl api-resources --api-group=toolhive.stacklok.dev
```
If the list is empty, install the CRDs as described in Install the CRDs.
Image pull failure: kubectl describe pod shows ImagePullBackOff or ErrImagePull. Verify the cluster has egress to ghcr.io and that any custom operator.image value in your values.yaml is correct.
Invalid values.yaml: The pod logs show a startup error referencing a specific field. Compare your file against helm show values oci://ghcr.io/stacklok/toolhive/toolhive-operator.

CRD upgrade fails with namespace mismatch

If you see an error like the following when upgrading the CRD chart:

Error: invalid ownership metadata; annotation validation error:
key "meta.helm.sh/release-namespace" must equal "toolhive-system":
current value is "default"

This means the CRD chart was originally installed in a different namespace than the one you're now targeting. To fix this, patch the meta.helm.sh/release-namespace annotation on all CRDs to match your desired namespace:

for crd in $(kubectl get crd -o name | grep toolhive.stacklok.dev); do
  kubectl annotate "$crd" \
    meta.helm.sh/release-namespace=<TARGET_NAMESPACE> --overwrite
done

Replace <TARGET_NAMESPACE> with the namespace you want to use going forward (for example, toolhive-system). This is a one-time operation. After patching, future upgrades work as long as you use the same namespace consistently.

CRDs installation fails

If the CRDs installation fails, you might see errors about existing resources or permission issues:

# Check if CRDs already exist
kubectl get crd | grep toolhive

# Remove existing CRDs if needed (this will delete all related resources)
kubectl delete crd <CRD_NAME>

To reinstall the CRDs:

helm uninstall toolhive-operator-crds -n toolhive-system
helm upgrade -i toolhive-operator-crds oci://ghcr.io/stacklok/toolhive/toolhive-operator-crds -n toolhive-system

Namespace creation issues

If you encounter permission errors when creating the toolhive-system namespace, create it manually first:

kubectl create namespace toolhive-system

Then install the operator without the --create-namespace flag:

helm upgrade -i toolhive-operator oci://ghcr.io/stacklok/toolhive/toolhive-operator -n toolhive-system

Helm chart not found

If Helm cannot find the chart, ensure you're using the correct OCI registry URL and that your Helm version supports OCI registries (v3.8.0+):

# Check Helm version
helm version

# Try pulling the chart explicitly
helm pull oci://ghcr.io/stacklok/toolhive/toolhive-operator

Network connectivity issues

If you're experiencing network timeouts or connection issues:

Verify your cluster has internet access to reach ghcr.io
Check if your organization uses a proxy or firewall that might block access
Consider using a private registry mirror if direct access is restricted
