
TensorFlow has tf.data.Dataset.repeat(x), which iterates through the data x times. It also has iterator.initializer: when iterator.get_next() is exhausted, iterator.initializer can be used to restart the iteration. My question is: is there a difference between using the tf.data.Dataset.repeat(x) technique and using iterator.initializer?

1 Answer


As we know, each epoch in the training process of a model takes in the whole dataset and breaks it into batches; this happens on every epoch. Suppose we have a dataset with 100 samples. On every epoch, the 100 samples are broken into 5 batches (of 20 each) for feeding to the model. But if I have to train the model for, say, 5 epochs, then I need to repeat the dataset 5 times. That means the repeated dataset will contain 500 samples in total (the 100 samples repeated 5 times).
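The arithmetic above can be checked with a quick plain-Python sketch (no TensorFlow involved; the numbers match the example of 100 samples, batch size 20, 5 epochs):

```python
# Epoch/batch arithmetic for the example above.
num_samples = 100
batch_size = 20
num_epochs = 5

batches_per_epoch = num_samples // batch_size   # 5 batches of 20 each
total_samples = num_samples * num_epochs        # 500 samples after repeating
total_batches = batches_per_epoch * num_epochs  # 25 batches over all training

print(batches_per_epoch, total_samples, total_batches)  # 5 500 25
```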

Now, this job is done by the tf.data.Dataset.repeat() method. Usually we pass the number of epochs as its count argument.
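As a rough illustration (plain Python rather than TensorFlow, since repeat() is easiest to see this way), repeat(count) behaves like re-iterating the dataset from the start count times, or indefinitely when no count is given:

```python
def repeat(dataset, count=None):
    """Mimics tf.data.Dataset.repeat: yields the dataset `count` times
    (indefinitely when count is None), restarting from the beginning each pass."""
    if count is None:
        while True:
            yield from dataset
    else:
        for _ in range(count):
            yield from dataset

samples = list(range(100))           # stand-in for a 100-sample dataset
repeated = list(repeat(samples, 5))  # 500 elements, as in the example above
print(len(repeated))                 # 500
```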

iterator.get_next() is just a way of getting the next batch of data from the tf.data.Dataset; with it, you are iterating through the dataset batch by batch.

That's the difference: tf.data.Dataset.repeat() repeats the samples in the dataset, whereas iterator.get_next() fetches the data one batch at a time.
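A self-contained plain-Python sketch of the two approaches (not TensorFlow code; `batches` here stands in for the batched iterator and its get_next() calls) shows they yield the same batches either way:

```python
def batches(dataset, batch_size):
    # Yields the dataset batch by batch, like repeated iterator.get_next() calls.
    for i in range(0, len(dataset), batch_size):
        yield dataset[i:i + batch_size]

samples = list(range(100))
num_epochs = 5

# Approach 1: repeat()-style -- one long repeated stream, a single iterator.
repeated = samples * num_epochs
count_repeat = sum(1 for _ in batches(repeated, 20))

# Approach 2: initializer-style -- restart a fresh iterator on each epoch.
count_reinit = 0
for _ in range(num_epochs):
    count_reinit += sum(1 for _ in batches(samples, 20))

print(count_repeat, count_reinit)  # 25 25 -- the same batches either way
```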


3 Comments

@Shubham Panchal, so is there a need to initialize the iterator once you have gone through the whole data if you are using repeat?
Initializing the iterator once could help you iterate through the whole (repeated) dataset.
Is repeat() just for performance optimization then?
