
I am trying to upgrade a CRD in Kubernetes for Apache Flink.

Below are the events from the HelmRelease. I am trying to upgrade the flink-operator from v0.1.0 to v1.0.0. I am also using FluxCD, which has been configured to create and/or replace CRDs. Flux appears to be attempting this correctly, but, if I understand the events correctly, the Kubernetes API is rejecting the new CRD from the Flink operator.

[Image: HelmRelease events]

Any further guidance would be appreciated. Thank you

1 Answer

You are correct. Unfortunately, we simply do not support a graceful upgrade path between 0.1.0 and 1.0.0.

The process is documented here: https://nightlies.apache.org/flink/flink-kubernetes-operator-docs-main/docs/operations/upgrade/#upgrading-from-v1alpha1---v1beta1

We explicitly marked the 0.1.0 version as beta; sorry for any inconvenience this causes. We aim to support the upgrade path you are suggesting going forward, and we are happy to hear your feedback.


5 Comments

Thank you, mbalassi. Would you happen to know why the FlinkDeployments were not recreated after the Flink Operator was upgraded? I used FluxCD to bump the operator version and it was deployed. My previously running jobs have been stopped (which does make sense), but now I cannot get anything to run. The cluster is not "reconciling". I know Flux is not necessarily something you can advise on, but is there anything I can do to check why the FlinkDeployments are no longer listed? Why is the Flink Operator not resubmitting jobs? Thank you again.
Sure, this sounds strange, and it is certainly not expected. :-) What do you see in the operator logs? Could you try submitting: kubectl create -f https://raw.githubusercontent.com/apache/flink-kubernetes-operator/release-1.0/examples/basic.yaml Worst case scenario, given that your jobs are now suspended (hopefully with a savepoint), you can helm delete flink-kubernetes-operator and then reinstall the new version. Your jobs will not be affected; the k8s resources they need will be kept.
So I just tried your basic-example, and these are the logs I am seeing in the operator: o.a.f.k.o.c.FlinkDeploymentController [ERROR][default/basic-example] Flink Deployment failed o.a.f.k.o.e.DeploymentFailedException: pods "basic-example-5bfd55dc79-" is forbidden: error looking up service account default/flink: serviceaccount "flink" not found. Before I submitted the example, all I saw in the operator was this: WatchConnectionManager [WARN ] Exec Failure: HTTP 404, Status: 404 - Not Found
Hi @Koman, this means that the flink role that we add for the job (github.com/apache/flink-kubernetes-operator/blob/main/helm/…) is not available in the default namespace. Did you set watchNamespaces for the Helm chart when installing, by any chance?
Uh, interesting, I'll check that. Thank you. I managed to get my main Flink jobs working now. I had to update the FlinkDeployment apiVersion in my Helm chart from v1alpha1 to v1beta1. After Flux reconciled, the FlinkDeployments came to life :-) Thank you very much.
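
For anyone hitting the same issue, the fix described in the last comment (bumping the FlinkDeployment apiVersion from v1alpha1 to v1beta1) can be sketched roughly as below. This is only an illustration under assumptions: the file name flinkdeployment.yaml and the stub manifest are hypothetical stand-ins for whatever template your own Helm chart or Git repository contains, and the rest of the spec is left out.

```shell
#!/bin/sh
set -eu

# Hypothetical stand-in for the manifest in your chart; only the
# apiVersion line matters for this sketch.
workdir=$(mktemp -d)
manifest="$workdir/flinkdeployment.yaml"
cat > "$manifest" <<'EOF'
apiVersion: flink.apache.org/v1alpha1
kind: FlinkDeployment
metadata:
  name: basic-example
EOF

# Rewrite the old API version to the one served by operator 1.0.0.
# -i.bak keeps a backup copy and works with both GNU and BSD sed.
sed -i.bak 's#flink\.apache\.org/v1alpha1#flink.apache.org/v1beta1#' "$manifest"

# Show the result so the change can be reviewed before committing.
grep '^apiVersion:' "$manifest"
```

Once the updated manifest is committed, Flux should pick it up on its next reconciliation, which matches what was reported above.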
