
I have a Dataflow job that writes to a BigQuery table. Every run of the Dataflow job creates a new table.

I realize the write operation to the BigQuery table is asynchronous, i.e. the write may continue for several minutes after the Dataflow job itself finishes.

Now I want to query the status of the newly created table and get something like "Being written to" or "Write completed". Is this something we can do through the GCP Java client library?

If not, I have an alternative idea. During the Dataflow execution I know the number of rows that will be written to the table. I can compare the number of rows currently in the table against this number; if they match, the write operation is complete. However, I don't know where I can store this row count, since it is generated dynamically during the Dataflow execution.
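For what it's worth, the table-side half of that comparison is straightforward with the BigQuery Java client: `Table.getNumRows()` returns the current row count. This is only a sketch; the project, dataset, and table names are placeholders, and it assumes the expected count has been made available somehow (e.g. passed in as an argument), which is exactly the open part of the question.

```java
import com.google.cloud.bigquery.BigQuery;
import com.google.cloud.bigquery.BigQueryOptions;
import com.google.cloud.bigquery.Table;
import com.google.cloud.bigquery.TableId;
import java.math.BigInteger;

public class RowCountCheck {

  // Pure comparison, kept separate from the GCP calls.
  // Treats a missing count as "not complete".
  static boolean writeComplete(BigInteger actualRows, long expectedRows) {
    return actualRows != null && actualRows.longValueExact() >= expectedRows;
  }

  public static void main(String[] args) {
    // Hypothetical: the expected count is passed in from the pipeline somehow.
    long expectedRows = Long.parseLong(args[0]);

    BigQuery bigquery = BigQueryOptions.getDefaultInstance().getService();
    // "my-project", "my_dataset", "my_table" are placeholders.
    Table table = bigquery.getTable(TableId.of("my-project", "my_dataset", "my_table"));

    if (table != null && writeComplete(table.getNumRows(), expectedRows)) {
      System.out.println("Write completed");
    } else {
      System.out.println("Being written to");
    }
  }
}
```

One caveat: for a batch Dataflow job the write typically happens via load jobs, so the row count only jumps when a load job commits; polling it tells you "done or not done" rather than giving fine-grained progress.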

Please help.

1 Answer


Maybe you could use the BigQuery JOBS view? You would have to store the job_id of your job, then look it up in the JOBS view with something like:

SELECT
  *
FROM
  `region-us`.INFORMATION_SCHEMA.JOBS
WHERE
  job_id = "JOB_ID"

This gives you information on the status of your job: the top-level state column moves through PENDING, RUNNING, and DONE, and per-stage detail is available in the job_stages.status STRUCT, where a finished stage is marked as COMPLETE.
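Since the question asks about the Java client library: you can also check a stored job_id directly with `BigQuery.getJob` instead of querying the JOBS view. A minimal sketch, assuming default credentials and a placeholder "JOB_ID" you saved when the job was created:

```java
import com.google.cloud.bigquery.BigQuery;
import com.google.cloud.bigquery.BigQueryOptions;
import com.google.cloud.bigquery.Job;
import com.google.cloud.bigquery.JobId;

public class JobStateCheck {

  // Pure mapping from the "done" flag to the status strings the question asks for.
  static String statusLabel(boolean done) {
    return done ? "Write completed" : "Being written to";
  }

  public static void main(String[] args) {
    BigQuery bigquery = BigQueryOptions.getDefaultInstance().getService();
    // "JOB_ID" is a placeholder for the id you stored earlier.
    Job job = bigquery.getJob(JobId.of("JOB_ID"));

    if (job == null) {
      System.out.println("Job not found");
    } else {
      // getStatus().getError() is null when the job succeeded.
      System.out.println(statusLabel(job.isDone())
          + " (state: " + job.getStatus().getState() + ")");
    }
  }
}
```

Note that a Dataflow pipeline may issue several BigQuery load jobs for one table, so you would need to track (and check) all of their ids.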

You can also find this kind of job-status information in Cloud Logging.
