There are several things to look at when running coreDNS in your kubernetes cluster
- Memory
- AutoPath
- Number of Replicas
- Autoscaler
- Other Plugins
- Prometheus metrics
- Separate Server blocks
Memory
CoreDNS recommended amount of memory for replicas is
MB required (default settings) = (Pods + Services) / 1000 + 54

Autopath
Autopath is a feature in Coredns that helps increase the response time for external queries
Normally a DNS query goes through
- ..svc.cluster.local
- .svc.cluster.local
- cluster.local
- Then the configured forward, usually host search path (/etc/resolv.conf
Trying "example.com.default.svc.cluster.local"
Trying "example.com.svc.cluster.local"
Trying "example.com.cluster.local"
Trying "example.com"
Trying "example.com"
;; ->>HEADER<<- opcode: QUERY, status: NOERROR, id: 55265
;; flags: qr rd ra; QUERY: 1, ANSWER: 1, AUTHORITY: 0, ADDITIONAL: 0
;; QUESTION SECTION:
;example.com. IN A
;; ANSWER SECTION:
example.com. 30 IN A 93.184.216.34
This requires more memory so the calculation now becomes
MB required (w/ autopath) = (Number of Pods + Services) / 250 + 56
Number of replicas
Defaults to 2 but enabling the Autoscaler should help with load issues.
Autoscaler
apiVersion: autoscaling/v1
kind: HorizontalPodAutoscaler
metadata:
name: coredns
namespace: default
spec:
maxReplicas: 20
minReplicas: 2
scaleTargetRef:
apiVersion: apps/v1
kind: Deployment
name: coredns
targetCPUUtilizationPercentage: 50
Node local cache
Beta in Kubernetes 1.15
NodeLocal DNSCache improves Cluster DNS performance by running a dns caching agent on cluster nodes as a DaemonSet. In today’s architecture, Pods in ClusterFirst DNS mode reach out to a kube-dns serviceIP for DNS queries. This is translated to a kube-dns/CoreDNS endpoint via iptables rules added by kube-proxy. With this new architecture, Pods will reach out to the dns caching agent running on the same node, thereby avoiding iptables DNAT rules and connection tracking. The local caching agent will query kube-dns service for cache misses of cluster hostnames(cluster.local suffix by default).
https://kubernetes.io/docs/tasks/administer-cluster/nodelocaldns/
Other Plugins
These will also help see what is going on inside CoreDNS
Error - Any errors encountered during the query processing will be printed to standard output.
Trace - enable OpenTracing of how a request flows through CoreDNS
Log - query logging
health - CoreDNS is up and running this returns a 200 OK HTTP status code
ready - By enabling ready an HTTP endpoint on port 8181 will return 200 OK when all plugins that are able to signal readiness have done so.
Ready and Health should be used in the deployment
livenessProbe:
httpGet:
path: /health
port: 8080
scheme: HTTP
initialDelaySeconds: 60
timeoutSeconds: 5
successThreshold: 1
failureThreshold: 5
readinessProbe:
httpGet:
path: /ready
port: 8181
scheme: HTTP
Prometheus Metrics
Prometheus Plugin
coredns_health_request_duration_seconds{} - duration to process a HTTP query to the local /health endpoint. As this a local operation, it should be fast. A (large) increase in this duration indicates the CoreDNS process is having trouble keeping up with its query load.
https://github.com/coredns/deployment/blob/master/kubernetes/Scaling_CoreDNS.md
Separate Server blocks
One last bit of advice is to separate the Cluster DNS server block to external block
CLUSTER_DOMAIN REVERSE_CIDRS {
errors
health
kubernetes
ready
prometheus :9153
loop
reload
loadbalance
}
. {
errors
autopath @kubernetes
forward . UPSTREAMNAMESERVER
cache
loop
}
More information about the k8 plugin and other options here
https://github.com/coredns/coredns/blob/master/plugin/kubernetes/README.md