I noticed that the output from TensorFlow's image_dataset_from_directory differs from loading the images directly (with PIL, Keras' load_img, etc.). I set up an experiment: I have a single RGB image with dimensions 2400x1800x3, and I compared the NumPy arrays produced by the different methods:
import numpy as np
from PIL import Image
from tensorflow.keras.utils import image_dataset_from_directory, load_img, img_to_array

# img_path is the path to the single image stored under '../data/'

train_set = image_dataset_from_directory(
    '../data/',
    image_size=(2400, 1800),  # (height, width); I'm using the original image size
    label_mode=None,
    batch_size=1
)

# The dataset holds one image, so this loop runs once
for batch in train_set:
    img_from_dataset = np.squeeze(batch.numpy())  # remove batch dimension

img_from_keras = img_to_array(load_img(img_path))
img_from_pil = img_to_array(Image.open(img_path))

print(np.all(img_from_dataset == img_from_keras))  # False
print(np.all(img_from_dataset == img_from_pil))    # False
print(np.all(img_from_keras == img_from_pil))      # True
So, even though all methods return a NumPy array of the same shape, the values from image_dataset_from_directory are different. Why is this? And what can/should I do about it?
This is a particular problem at prediction time, when I'm loading a single image (i.e. not using image_dataset_from_directory to load it).
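To tell whether the mismatch is a tiny numerical difference or something larger, the arrays could be compared numerically instead of with np.all alone. Below is a minimal sketch, assuming a hypothetical helper describe_difference (my own name, not part of TensorFlow or Keras) and small synthetic arrays standing in for img_from_dataset and img_from_keras:

```python
import numpy as np


def describe_difference(a, b):
    """Summarize how two image arrays differ (hypothetical helper)."""
    a = np.asarray(a, dtype=np.float64)
    b = np.asarray(b, dtype=np.float64)
    if a.shape != b.shape:
        # Element-wise comparison is meaningless if the shapes disagree
        return {"same_shape": False, "shape_a": a.shape, "shape_b": b.shape}
    diff = np.abs(a - b)
    return {
        "same_shape": True,
        "max_abs_diff": float(diff.max()),            # largest per-value gap
        "mean_abs_diff": float(diff.mean()),          # average per-value gap
        "fraction_differing": float(np.mean(diff > 0)),  # share of unequal values
    }


# Synthetic stand-ins for the real arrays (assumption: not real image data)
rng = np.random.default_rng(0)
reference = rng.integers(0, 256, size=(4, 6, 3)).astype(np.float32)
shifted = reference.copy()
shifted[0, 0, 0] += 2.0  # perturb a single value

report = describe_difference(reference, shifted)
print(report)
```

With the real arrays, describe_difference(img_from_dataset, img_from_keras) would show whether the values are off by a fraction of an intensity level or by much more.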