Kubernetes CustomResourceDefinition upgrade failed - "flinkdeployments.flink.apache.org"

Question

I am trying to upgrade a CRD in Kubernetes for Apache Flink.

Below are the Events in the HelmRelease. I am trying to upgrade the flink-operator from v0.1.0 to v1.0.0. I am also using FluxCD, which has been configured to create and/or replace CRDs. Flux appears to be attempting this correctly, but, if I understand the events correctly, the Kubernetes API is rejecting the new CRD from the Flink operator.

Any further guidance would be appreciated. Thank you.

mbalassi · Accepted Answer · 2022-06-10 14:04:38Z

You are correct. Unfortunately, we do not support a graceful upgrade path between 0.1.0 and 1.0.0.

The process is documented here: https://nightlies.apache.org/flink/flink-kubernetes-operator-docs-main/docs/operations/upgrade/#upgrading-from-v1alpha1---v1beta1

We explicitly marked the 0.1.0 version as beta; sorry for any inconvenience this causes. Going forward, we aim to support the upgrade path you are suggesting, and we are happy to hear your feedback.


5 Comments

Koman Over a year ago

Thank you, mbalassi. Would you happen to know why the FlinkDeployments were not recreated after the Flink Operator was upgraded? I used FluxCD to bump the operator version and it was deployed. My previously running jobs have been stopped (which does make sense), but now I cannot get anything to run. The cluster is not "reconciling". I know Flux is not necessarily something you can advise on, but is there anything I can do to check why the FlinkDeployments are no longer listed? Why is the Flink Operator not resubmitting jobs? Thank you again.

mbalassi Over a year ago

Sure, this sounds strange, and is certainly not expected. :-) What do you see in the operator logs? Could you try submitting the basic example: kubectl create -f https://raw.githubusercontent.com/apache/flink-kubernetes-operator/release-1.0/examples/basic.yaml In the worst case, given that your jobs are now suspended (hopefully with a savepoint), you can helm delete flink-kubernetes-operator and then reinstall the new version. Your jobs will not be affected; the k8s resources they need will be kept.

Koman Over a year ago

So I just tried your basic-example, and these are the logs I am seeing in the operator:

o.a.f.k.o.c.FlinkDeploymentController [ERROR][default/basic-example] Flink Deployment failed o.a.f.k.o.e.DeploymentFailedException: pods "basic-example-5bfd55dc79-" is forbidden: error looking up service account default/flink: serviceaccount "flink" not found

Before I submitted the example, all I saw in the operator was this:

WatchConnectionManager [WARN ] Exec Failure: HTTP 404, Status: 404 - Not Found

mbalassi Over a year ago

Hi @Koman, this means that the flink role that we add for the job (github.com/apache/flink-kubernetes-operator/blob/main/helm/…) is not available in the default namespace. Did you set watchNamespaces for the Helm chart when installing, by any chance?

Koman Over a year ago

Uh, interesting, I'll check that. Thank you. I managed to get my main Flink jobs working now. I had to update the FlinkDeployment apiVersion in my Helm chart from v1alpha1 to v1beta1. After Flux reconciled, the FlinkDeployments came to life :-) Thank you very much.
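
For readers with the same problem, the change described in the last comment probably looks something like the fragment below. This is only a sketch: the API group is taken from the CRD name in the question title, the resource name is borrowed from the basic-example log line above, and the spec is left out because nothing in this thread says whether other fields also need changes between versions.

```yaml
# Hypothetical FlinkDeployment manifest header, for illustration only.
# Before (operator 0.1.0):
#   apiVersion: flink.apache.org/v1alpha1
# After (operator 1.0.0):
apiVersion: flink.apache.org/v1beta1
kind: FlinkDeployment
metadata:
  name: basic-example   # assumed name, taken from the log output in the comments
  namespace: default    # assumed; use the namespace the operator watches
# spec: (omitted) carry over your existing spec and check the linked
# upgrade documentation for any field changes.
```

If the manifests live in a Helm chart managed by Flux, as in the question, the apiVersion line would be updated in the chart template and picked up on the next reconciliation.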
