I am running Python Dataflow jobs and I deploy the Dataflow template to GCS from GitLab, passing --requirements_file=requirement.txt when I stage the template. Cloud NAT is disabled in my project, which prevents the workers from downloading packages from PyPI.
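For reference, the template is staged roughly like the command below (the project, bucket and script names are placeholders, not my real values):

  python my_pipeline.py \
    --runner=DataflowRunner \
    --project=my-project \
    --region=us-central1 \
    --staging_location=gs://my-bucket/staging \
    --temp_location=gs://my-bucket/temp \
    --template_location=gs://my-bucket/templates/my_template \
    --requirements_file=requirement.txt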

The initial requirement.txt that was used:

  • gcloud
  • google-cloud-logging==1.15.0
  • google-cloud-core==1.4.1
  • google-cloud-datastore==1.8.0
  • httplib2
  • google-resumable-media==2.1.0
  • google-cloud-storage
  • google-cloud-bigquery
  • google-cloud
  • apache-beam[gcp]==2.39.0
  • google-api-python-client

My Dataflow job failed because it was trying to download some packages from the internet.

Then requirement.txt was modified like this:

  • gcloud
  • google-cloud-logging==3.1.2
  • google-cloud-core==1.7.2
  • google-cloud-datastore==1.8.0
  • httplib2==0.19.1
  • google-resumable-media==2.3.3
  • google-cloud-storage==1.44.0
  • google-cloud-bigquery==2.34.4
  • google-cloud==0.34.0
  • apache-beam[gcp]==2.39.0
  • google-api-python-client==2.51.0
  • google-cloud-appengine-logging==0.1.0
  • google-cloud-audit-log==0.1.0
  • pyyaml

After this change there were no more downloads during Dataflow runtime. What was the reason for my initial error? How can I ensure that the correct dependency versions are provided so that no packages are downloaded at runtime? I am not able to use the custom container option due to some restrictions. A sketch of the kind of pinning I have in mind follows below.
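For context, my assumption is that a fully pinned requirement.txt (including transitive dependencies) could be produced in a clean virtual environment, something along these lines (a sketch, not what I currently run; package versions are just the ones from my file above):

  python -m venv /tmp/beam-env
  source /tmp/beam-env/bin/activate
  pip install 'apache-beam[gcp]==2.39.0' google-cloud-logging==3.1.2 google-cloud-bigquery==2.34.4
  # pip freeze writes out every installed package with an exact version,
  # so the workers should have nothing left to resolve from PyPI
  pip freeze > requirement.txt

Is this the right way to guarantee that nothing is downloaded at runtime, or is there a recommended approach?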
