TensorFlow has tf.data.Dataset.repeat(x), which iterates through the data x times. It also has iterator.initializer, which can be used to restart the iteration once iterator.get_next() has exhausted the dataset. My question is: is there a difference between using the tf.data.Dataset.repeat(x) technique and using iterator.initializer?
1 Answer
As we know, each epoch in the training process of a model takes in the whole dataset and breaks it into batches; this happens on every epoch. Suppose we have a dataset with 100 samples. On every epoch, the 100 samples are broken into 5 batches (of 20 each) for feeding to the model. But if I have to train the model for, say, 5 epochs, then I need to repeat the dataset 5 times, meaning the repeated dataset will contain 500 samples in total (100 samples repeated 5 times).
Now, this job is done by the tf.data.Dataset.repeat() method. Usually we pass the number of epochs as its count argument.
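As a minimal sketch of this (assuming TensorFlow 2.x with eager execution, where a tf.data.Dataset can be iterated directly), the 100-sample example above looks like:

```python
import tensorflow as tf

# A toy dataset of 100 samples, repeated 5 times and split into
# batches of 20 -- mirroring the 100-sample example above.
dataset = tf.data.Dataset.range(100).repeat(5).batch(20)

# 500 total elements / 20 per batch = 25 batches.
num_batches = sum(1 for _ in dataset)
print(num_batches)  # 25
```

A single loop over this dataset therefore already covers all 5 epochs, because the repetition is part of the input pipeline itself.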
iterator.get_next() is just a way of fetching the next batch of data from the tf.data.Dataset: you are iterating over the dataset batch by batch.
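The iterator.initializer alternative keeps the epoch loop in your own code: you build the dataset without repeat() and re-run the initializer whenever get_next() raises OutOfRangeError. A sketch, assuming the TF1-style graph API accessed through tf.compat.v1 (the API the question refers to):

```python
import tensorflow as tf

tf.compat.v1.disable_eager_execution()

# Same 100-sample dataset, batched into 5 batches of 20 -- note there is
# no repeat() here; we make multiple passes by re-initializing the iterator.
dataset = tf.compat.v1.data.Dataset.range(100).batch(20)
iterator = tf.compat.v1.data.make_initializable_iterator(dataset)
next_batch = iterator.get_next()

batches_per_epoch = []
with tf.compat.v1.Session() as sess:
    for epoch in range(5):
        sess.run(iterator.initializer)  # restart iteration for this epoch
        count = 0
        while True:
            try:
                sess.run(next_batch)
                count += 1
            except tf.errors.OutOfRangeError:  # dataset exhausted
                break
        batches_per_epoch.append(count)

print(batches_per_epoch)  # [5, 5, 5, 5, 5]
```

Each pass sees the same 5 batches; the total work is the same as repeat(5), but the epoch boundary is explicit in your training loop.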
That's the difference. tf.data.Dataset.repeat() repeats the samples in the dataset itself, so one pass over the repeated dataset covers all the epochs, whereas iterator.get_next() fetches the data one batch at a time and iterator.initializer lets you restart that iteration manually at the start of each epoch. Either way you make multiple passes over the data; repeat() bakes the repetition into the input pipeline, while iterator.initializer leaves the epoch loop (and the epoch boundary) explicit in your training code.