I have processed and saved a large dataset of video and audio file (about 8 to 9 GB of data) The data is saved as 2 numpy arrays, one for each modality Shapes of the files are (number_of_examples, maximum_time_length, feature_length)
I want to use this data for training my Neural Network for a classification task I am using the TensorFlow 2.0 Beta version I am running all the codes on Google Colab (after installing tf-2.0 beta) Each time I loading the data in tf.data the entire RAM of the Virtual Machine is used and the session is forced to restart.
Previous Approaches:
I tried 2 approaches
1) Loading both the variables entirely into the RAM and converting it to tensors
2) Loading the data as memory mapped array(from disk) and load that to tf.data
However both approaches loaded the RAM and forced the VM to restart
Code:
# Access the Audio memory from disk without loading
X_audio = np.memmap('gdrive/My Drive/Codes/audio_data.npy', dtype='float32', mode='r').reshape(2198,3860,74)
# Access the Video memory from disk without loading
X_video = np.memmap('gdrive/My Drive/Codes/video_data.npy', dtype='float32', mode='r').reshape(2198,1158,711)
# Load labels
with open('gdrive/My Drive/Codes/label_data_3','rb') as f:
Y = pkl.load(f)
dataset = tf.data.Dataset.from_tensor_slices((X_audio, X_video, Y)).shuffle(2198).batch(32)
Error : Your session crashed after using all available RAM
dask