I have built a custom Python image for a streaming Dataflow job and tested locally that it works. When deployed to GCP, the image is pulled, the job logs show the minimum of 1 worker being initialized, and the job is in state RUNNING. I also see the log lines I print when the pipeline is initialized. However, when I publish Pub/Sub messages to my stream, nothing happens: the data freshness metric just keeps increasing, and the workers show strange errors (marked with INFO severity) about flags that I do not control:
flag provided but not defined: -logging_endpoint
Usage of /opt/google/dataflow/python_template_launcher:
...
When looking into the dataflow_step logs in GCP (these are not shown in the Dataflow worker console), I constantly see the following (not sure whether the two are related):
"Error syncing pod, skipping" err="failed to \"StartContainer\" for \"sdk-0-0\" with CrashLoopBackOff: \"back-off 10s restarting failed container=sdk-0-0 pod=df-myjob-she-09170618-0m8p-harness-vmzx_default(ce159539391472095ac26571c8a6ceb2)\"" pod="default/df-myjob-she-09170618-0m8p-harness-vmzx" podUID=ce159539391472095ac26571c8a6ceb2
Here is the Dockerfile (the commented-out lines were there before, taken from the Google docs, and I tried them as well):
FROM gcr.io/dataflow-templates-base/python311-template-launcher-base:latest
WORKDIR /template
COPY . /template/
ENV FLEX_TEMPLATE_PYTHON_PY_FILE="src/dataflow.py"
RUN apt-get update \
&& apt-get install -y libffi-dev git \
&& rm -rf /var/lib/apt/lists/*
# Upgrade pip and install the requirements.
# && pip install --upgrade pip \
# && pip install -e .
# && pip install --no-cache-dir -r "/template/requirements-pypi.txt" \
# Download the requirements to speed up launching the Dataflow job.
# && pip download --no-cache-dir --dest /tmp/dataflow-requirements-cache -r "/template/requirements-pypi.txt"
ENTRYPOINT ["/opt/google/dataflow/python_template_launcher"]
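To give an idea of what the launcher runs, src/dataflow.py boils down to a streaming pipeline along these lines (a simplified sketch rather than the exact file; the transform names and the logging step are illustrative):

import argparse
import logging

import apache_beam as beam
from apache_beam.options.pipeline_options import PipelineOptions, StandardOptions


def run(argv=None):
    parser = argparse.ArgumentParser()
    # pubsub_topic is the template parameter passed via --parameters at launch.
    parser.add_argument("--pubsub_topic", required=True)
    known_args, pipeline_args = parser.parse_known_args(argv)

    options = PipelineOptions(pipeline_args)
    options.view_as(StandardOptions).streaming = True

    with beam.Pipeline(options=options) as p:
        (
            p
            | "ReadFromPubSub" >> beam.io.ReadFromPubSub(topic=known_args.pubsub_topic)
            | "Decode" >> beam.Map(lambda msg: msg.decode("utf-8"))
            | "Log" >> beam.Map(lambda element: logging.info("element: %s", element))
        )


if __name__ == "__main__":
    logging.getLogger().setLevel(logging.INFO)
    run()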
Here is how I launch the job:
gcloud dataflow flex-template run "myjob" \
--project="xxx" \
--region="europe-north1" \
--template-file-gcs-location="gs://xxx/templates/myjob-template.json" \
--parameters=^::^pubsub_topic=projects/xxx/topics/senior-cash-transactions::staging_location=gs://xxx/temp::sdk_container_image=europe-north1-docker.pkg.dev/xxx/xxx/xxx:latest \
--temp-location="gs://xxx/temp" \
--additional-experiments="enable_secure_boot" \
--disable-public-ips \
--network="xxx" \
--subnetwork="https://www.googleapis.com/compute/v1/projects/xxx/regions/europe-north1/subnetworks/xxx-europe-north1" \
--worker-machine-type="e2-standard-8" \
--service-account-email="[email protected]"
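For completeness, publishing a test message to the topic looks roughly like this (a minimal sketch; the payload is a placeholder, and the project and topic match the pubsub_topic parameter above):

from google.cloud import pubsub_v1

publisher = pubsub_v1.PublisherClient()
topic_path = publisher.topic_path("xxx", "senior-cash-transactions")

# Publish a dummy payload and block until Pub/Sub returns the message id.
future = publisher.publish(topic_path, data=b'{"test": true}')
print("published message id:", future.result())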
The service account has Pub/Sub Admin, Dataflow Worker, Dataflow Admin, Storage (GCS) Admin, Network User, and Logs Writer, plus anything else I could think of while debugging.
(The pod-sync error above is what I see when filtering the logs by dataflow_step and the job_id.) I have reviewed several related topics, such as "Why did I encounter an 'Error syncing pod' with Dataflow pipeline?" and the entry in the Google Cloud docs for this error message (https://cloud.google.com/dataflow/docs/guides/common-errors#error-syncing-pod). So far I have tried:
- Changing Apache Beam versions
- As suggested in another thread, removing requirements.txt, declaring the dependencies in a setup.py file instead, and rebuilding the image (a sketch of that setup.py is shown after this list)
- Removing the requirements and setup steps from the Dockerfile entirely and rebuilding the image, since those dependencies appear to be in the base image already and I wanted to avoid conflicts
- Checking the firewall settings in my VPC and the GCP firewall rules; nothing suspicious there
- Checking all IAM permissions for the service account
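For reference, the setup.py from the second attempt looked roughly like this (the package name and the pinned Beam version are placeholders, not the exact values from my project):

import setuptools

# Dependencies moved here from requirements.txt; the version pin is illustrative.
setuptools.setup(
    name="myjob",
    version="0.1.0",
    packages=setuptools.find_packages(),
    install_requires=[
        "apache-beam[gcp]==2.50.0",
    ],
)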