
I get this error only when reading, then writing (to a different table). If I only read from the table, no error occurs. For example, the code below produces no error.

    Pipeline p = Pipeline.create(
        PipelineOptionsFactory.fromArgs(args).withValidation().create());

    PCollection<TableRow> BigQueryTableRow = p
        .apply(BigQueryIO.Read.named("ReadTable")
            .from("project:dataset.data_table"));

    p.run();

But if I do the following, I get a 'BigQuery job Backend error'.

    Pipeline p = Pipeline.create(
        PipelineOptionsFactory.fromArgs(args).withValidation().create());

    PCollection<TableRow> BigQueryTableRow = p
        .apply(BigQueryIO.Read.named("ReadTable")
            .from("project:dataset.data_table"));

    // 'fields' is a List<TableFieldSchema> defined elsewhere
    TableSchema tableSchema = new TableSchema().setFields(fields);

    BigQueryTableRow.apply(BigQueryIO.Write
        .named("Write Members to BigQuery")
        .to("project:dataset.data_table_two")
        .withSchema(tableSchema)
        .withWriteDisposition(BigQueryIO.Write.WriteDisposition.WRITE_TRUNCATE)
        .withCreateDisposition(BigQueryIO.Write.CreateDisposition.CREATE_IF_NEEDED));

    p.run();

Some more details on the error:

BigQuery job "dataflow_job" in project "project-name" 
finished with error(s): errorResult: Backend error. 
Job aborted.

  • Please show us where you are setting the PipelineOptions. Commented Jul 20, 2016 at 13:06
  • I am setting the pipeline options from arguments, as is done by the 'StarterPipeline' class that is created by default in Eclipse. I will add this to my post.
    – user5564900
    Commented Jul 21, 2016 at 6:43

1 Answer


I managed to figure out the problem on my own. The backend error message is produced because I have two repeated fields in my table.

If I try to output the entire table using BigQuery's web service, it displays a more helpful error message.

Error: Cannot output multiple independently repeated fields
at the same time. Found memberships_is_coach and actions_type

It is unfortunate that the 'Backend error' message provides no real insight into the problem. Also, when only reading the data and not performing any write, no error is given, which further exacerbates the problem.
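One workaround (a hypothetical sketch, not from the original post) is to project the rows before writing, so that each output row carries at most one independently repeated field. The helper below illustrates the idea on plain `Map`s for the sake of a self-contained example; in the real pipeline the same logic would live inside a `DoFn` over `TableRow`, and each projection would be written to its own table. The field names `memberships_is_coach` and `actions_type` are taken from the error message above.

```java
import java.util.Arrays;
import java.util.HashMap;
import java.util.Map;

public class RepeatedFieldSplitter {
    // Return a copy of the row with the given field removed, so each
    // output row contains at most one independently repeated field.
    public static Map<String, Object> dropField(Map<String, Object> row, String field) {
        Map<String, Object> copy = new HashMap<>(row);
        copy.remove(field);
        return copy;
    }

    public static void main(String[] args) {
        Map<String, Object> row = new HashMap<>();
        row.put("id", 1);
        row.put("memberships_is_coach", Arrays.asList(true, false));
        row.put("actions_type", Arrays.asList("click", "view"));

        // Two projections, one per output table, so no single row mixes
        // both repeated fields.
        Map<String, Object> membershipsRow = dropField(row, "actions_type");
        Map<String, Object> actionsRow = dropField(row, "memberships_is_coach");

        System.out.println(membershipsRow.containsKey("actions_type"));    // prints "false"
        System.out.println(actionsRow.containsKey("memberships_is_coach")); // prints "false"
    }
}
```

Alternatively, reading via a query that selects only the fields needed downstream (rather than reading the whole table) avoids pulling both repeated fields into the same row in the first place.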