Some best practices for running Istio in production

Gokhan Karadas
5 min read · Jul 21, 2021

Istio isn’t easy. There are many options and best practices for managing Istio in production. This article shares some lessons learned and useful information for system maintainers and developers. Let’s start with data plane and control plane configuration.

1- Control configuration sharing across namespaces

You can define virtual services and destination rules in one namespace and reuse them in other namespaces. By default, Istio exports all traffic management rules to all namespaces, but you can restrict their visibility with the exportTo field. In the example below, "." exports the virtual service only to its own namespace.

apiVersion: networking.istio.io/v1alpha3
kind: VirtualService
metadata:
  name: transaction-log
  namespace: seller
spec:
  gateways:
  - istio-system/secure-gateway
  hosts:
  - selleradstransactionlogapi.trendyol.com
  http:
  - match:
    - uri:
        prefix: /
    route:
    - destination:
        host: transaction-log
    timeout: 1.000s
  exportTo:
  - "."
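The same field exists on destination rules. As a minimal sketch, a companion DestinationRule for the service above could be kept private to its namespace in the same way. The load balancer policy here is only an illustrative assumption, not a setting taken from our configuration:

```yaml
apiVersion: networking.istio.io/v1alpha3
kind: DestinationRule
metadata:
  name: transaction-log
  namespace: seller
spec:
  host: transaction-log
  trafficPolicy:
    loadBalancer:
      simple: ROUND_ROBIN   # illustrative policy, pick what fits your service
  exportTo:
  - "."                     # visible only inside the seller namespace
```

Besides ".", exportTo also accepts "*" (all namespaces), which matches the default behavior described above.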

2- Manage EnvoyFilter workloadSelector scope

EnvoyFilter CRD provides a patching mechanism for customizing Envoy configuration. EnvoyFilter has a workloadSelector field that is used to select the specific set of pods/VMs on which this patch configuration should be applied.

If workloadSelector is omitted, the patches are applied to all workload instances in the same namespace. I suggest selecting specific pods/VMs to reduce overhead: Envoy holds its entire configuration in memory, so every unnecessary patch adds to each proxy's footprint.

apiVersion: networking.istio.io/v1alpha3
kind: EnvoyFilter
metadata:
  name: cluster-idletimeout
  namespace: istio-system
spec:
  configPatches:
  - applyTo: NETWORK_FILTER
    match:
      context: SIDECAR_OUTBOUND
      listener:
        filterChain:
          filter:
            name: envoy.filters.network.http_connection_manager
    patch:
      operation: MERGE
      value:
        typed_config:
          '@type': >-
            type.googleapis.com/envoy.extensions.filters.network.http_connection_manager.v3.HttpConnectionManager
          common_http_protocol_options:
            idle_timeout: 10s

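The configuration above has no workloadSelector, so it is applied to the outbound listeners of every sidecar it reaches. Because it lives in istio-system, which is the root namespace in a default installation, that means every sidecar in the mesh.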
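To narrow the scope, the same patch can be moved into the workload's namespace and given a workloadSelector. The sketch below assumes a hypothetical workload labeled app: api in a namespace called browsing; both names, and the resource name, are placeholders for illustration:

```yaml
apiVersion: networking.istio.io/v1alpha3
kind: EnvoyFilter
metadata:
  name: api-idletimeout        # hypothetical name
  namespace: browsing          # hypothetical namespace of the workload
spec:
  workloadSelector:
    labels:
      app: api                 # only pods carrying this label receive the patch
  configPatches:
  - applyTo: NETWORK_FILTER
    match:
      context: SIDECAR_OUTBOUND
      listener:
        filterChain:
          filter:
            name: envoy.filters.network.http_connection_manager
    patch:
      operation: MERGE
      value:
        typed_config:
          '@type': >-
            type.googleapis.com/envoy.extensions.filters.network.http_connection_manager.v3.HttpConnectionManager
          common_http_protocol_options:
            idle_timeout: 10s
```

With this shape, other workloads in the mesh keep their default idle timeout and do not carry the extra patch.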

3- Always use Sidecar Resource

As I said, Istio isn’t easy. Without any tuning, the Istio data plane (the sidecar proxy) uses nearly 200 MB of memory, although this depends on your mesh size and configuration. Fighting this kind of problem can be tedious. Many people would suggest switching to another mesh tool to save budget. For us that is not an easy option, because we have a lot of custom mesh configuration such as rate limiting and service-to-service authentication. This is where the Istio Sidecar resource comes to the rescue: it alleviates some memory and startup time problems.

The Istio documentation says the following about it:

Sidecar describes the configuration of the sidecar proxy that mediates inbound and outbound communication to the workload instance it is attached to. By default, Istio will program all sidecar proxies in the mesh with the necessary configuration required to reach every workload instance in the mesh, as well as accept traffic on all the ports associated with the workload. The Sidecar configuration provides a way to fine tune the set of ports, protocols that the proxy will accept when forwarding traffic to and from the workload. In addition, it is possible to restrict the set of services that the proxy can reach when forwarding outbound traffic from workload instances.

The Sidecar resource can be used to fine-tune the Envoy config for a set of workloads.

With this configuration, you can save about 40 to 50% of the sidecar proxy's memory consumption and reduce istio-proxy startup time by 30 to 50%.

Default namespace sidecar configuration:

apiVersion: networking.istio.io/v1alpha3
kind: Sidecar
metadata:
  name: default
  namespace: ratelimit
spec:
  egress:
  - hosts:
    - ratelimit/*

Specific workload sidecar configuration:

apiVersion: networking.istio.io/v1alpha3
kind: Sidecar
metadata:
  name: api
  namespace: browsing
spec:
  egress:
  - hosts:
    - '*/servicename.namespace.svc.cluster.local'
    - ./service.browsing.svc.cluster.local
  workloadSelector:
    labels:
      app: api
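A third variant, following the pattern described in the Istio Sidecar reference, is a mesh-wide default placed in the root namespace. This is only a sketch: it assumes istio-system is your root namespace, and it restricts every sidecar to services in its own namespace plus istio-system unless a more specific Sidecar resource overrides it:

```yaml
apiVersion: networking.istio.io/v1alpha3
kind: Sidecar
metadata:
  name: default
  namespace: istio-system   # assumed to be the mesh root namespace
spec:
  egress:
  - hosts:
    - "./*"                 # services in the workload's own namespace
    - "istio-system/*"      # control plane and shared services
```

Starting from a restrictive default like this and widening it per namespace or per workload tends to be easier to reason about than trimming an open mesh after the fact, but roll it out carefully because any host not listed becomes unreachable by name from the sidecar's point of view.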

4- Activate compression on Istio instead of an application container

Compression reduces the number of bits needed to represent data. Before we adopted a mesh or API gateway, we enabled compression in application code. We use many different programming languages to develop and maintain our business code, and while some application frameworks provide compression out of the box, others do not, so compression settings are hard to control at the application level. With a simple configuration, we can move that responsibility to the mesh. Keep in mind that if you disable Istio for a workload, it loses this network-level compression. You can define alert rules based on Istio metrics (for example, istio_response_bytes_bucket).

apiVersion: networking.istio.io/v1alpha3
kind: EnvoyFilter
metadata:
  name: application-gzip
  namespace: istio-system
spec:
  workloadSelector:
    labels:
      app: workload-app
  configPatches:
  - applyTo: HTTP_FILTER
    match:
      context: SIDECAR_INBOUND
      listener:
        filterChain:
          filter:
            name: envoy.http_connection_manager
            subFilter:
              name: envoy.router
    patch:
      operation: INSERT_BEFORE
      value:
        name: envoy.filters.http.compressor
        typed_config:
          '@type': type.googleapis.com/envoy.extensions.filters.http.compressor.v3.Compressor
          compressor_library:
            name: text_optimized
            typed_config:
              '@type': type.googleapis.com/envoy.extensions.compression.gzip.compressor.v3.Gzip
          remove_accept_encoding_header: true
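For the alerting mentioned above, a Prometheus rule along the following lines could flag workloads whose responses suddenly grow, which is one symptom of compression being lost. Treat it as a sketch under assumptions: the group and alert names, the 1 MiB threshold, and the time windows are hypothetical and need tuning for your traffic.

```yaml
groups:
- name: istio-response-size        # hypothetical rule group name
  rules:
  - alert: LargeResponseBodies     # hypothetical alert name
    # p95 response size per destination workload over the last 5 minutes
    expr: |
      histogram_quantile(0.95,
        sum(rate(istio_response_bytes_bucket{reporter="destination"}[5m]))
          by (le, destination_workload)
      ) > 1048576
    for: 10m
    labels:
      severity: warning
    annotations:
      summary: "p95 response size above 1 MiB for {{ $labels.destination_workload }}"
```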

5- Hide some configuration from the application developer

Istio exposes a lot of configuration for the data plane. It’s good practice to hide some of it, such as traffic management, security, and logging policies, from application developers. You can provide a declarative user interface so that application developers can customize their workloads without touching any configuration rules directly. We started building an all-in-one portal where application developers can enable or disable mutual TLS with a single button, or change an application’s timeout and retry policy without side effects. If you don’t have the resources to build a custom portal, you can use the Kiali dashboard, which provides a high-level configuration view. Note, however, that in a multi-cluster setup Kiali only works with Istio’s replicated control planes scenario: https://kiali.io/documentation/v1.18/features/#_multi_cluster_support

You can also take a look at the custom dashboard at https://github.com/XiaoMi/naftis.
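To make the idea concrete, here is a sketch of the kind of resources such a portal could generate behind an "enable mutual TLS" button and a timeout/retry form. It is not the output of our portal; the workload name, namespace, and values are hypothetical:

```yaml
# Generated when the developer switches mutual TLS on for the workload
apiVersion: security.istio.io/v1beta1
kind: PeerAuthentication
metadata:
  name: api-mtls             # hypothetical name
  namespace: browsing        # hypothetical namespace
spec:
  selector:
    matchLabels:
      app: api
  mtls:
    mode: STRICT
---
# Generated from the timeout and retry fields of the form
apiVersion: networking.istio.io/v1alpha3
kind: VirtualService
metadata:
  name: api
  namespace: browsing
spec:
  hosts:
  - api
  http:
  - route:
    - destination:
        host: api
    timeout: 2s              # illustrative value
    retries:
      attempts: 3
      perTryTimeout: 500ms
      retryOn: 5xx,connect-failure
```

The developer only sees a toggle and a few numeric fields, while the platform team keeps control over how they map to Istio resources.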

6- Remove some Envoy response headers

By default, Envoy adds some tracing-related response headers. For security reasons, you might want to remove these headers in your ingress configuration:

  • x-envoy-peer-metadata
  • x-envoy-peer-metadata-id
  • x-envoy-decorator-operation

x-envoy-peer-metadata contains base64-encoded metadata about the Envoy instance. For example, at the time of writing you can see these response headers on https://www.wantedly.com/. Once decoded, the header exposes the following data in clear text:


INSTANCE_IPS: 100.96.50.107
LABELS:
  app: wantedly
  pod-template-hash: dfc5f9699
  role: web
  security.istio.io/tlsMode: istio
  service.istio.io/canonical-name: wantedly
  service.istio.io/canonical-revision: latest
MESH_ID: cluster.local
NAME: wantedly-dfc5f9699-fmbjj
NAMESPACE: wantedly
OWNER: kubernetes://apis/apps/v1/namespaces/wantedly/deployments/wantedly
SERVICE_ACCOUNT: default
WORKLOAD_NAME: wantedly

This is sensitive information about your workload and infrastructure.
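One commonly used way to strip these headers is a small Lua filter on the ingress gateway. The sketch below is an assumption-laden example rather than a verified recipe: it assumes the gateway pods carry the default istio: ingressgateway label, the resource name is hypothetical, and whether the headers are actually removed depends on filter ordering in your Istio version, so check the result with a real request.

```yaml
apiVersion: networking.istio.io/v1alpha3
kind: EnvoyFilter
metadata:
  name: remove-envoy-headers      # hypothetical name
  namespace: istio-system
spec:
  workloadSelector:
    labels:
      istio: ingressgateway       # assumed default gateway label
  configPatches:
  - applyTo: HTTP_FILTER
    match:
      context: GATEWAY
      listener:
        filterChain:
          filter:
            name: envoy.filters.network.http_connection_manager
    patch:
      # First in the chain means last on the response path,
      # so the filter runs after other filters have added their headers.
      operation: INSERT_FIRST
      value:
        name: envoy.filters.http.lua
        typed_config:
          '@type': type.googleapis.com/envoy.extensions.filters.http.lua.v3.Lua
          inlineCode: |
            function envoy_on_response(response_handle)
              response_handle:headers():remove("x-envoy-peer-metadata")
              response_handle:headers():remove("x-envoy-peer-metadata-id")
              response_handle:headers():remove("x-envoy-decorator-operation")
            end
```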

Conclusion

Be careful about resource consumption in production. Move some network-level concerns from the application layer to Istio. Use Istio metrics to define alert rules. We will continue to share our production experience with mesh tools.
