
I have a dataset of audio files in 11 classes, and I am trying to classify them with a convolutional neural network.

My model:

train_data = np.array(X)
train_labels = np.array(y)
model = Sequential()
model.add(layers.Conv2D(32, (3,3), activation='relu', input_shape=train_data.shape))
model.add(layers.MaxPool2D(2,2))
model.add(layers.Conv2D(32, (3,3), activation='relu'))
model.add(layers.MaxPool2D(2,2))
model.add(layers.Flatten())
model.add(layers.Dense(128, activation="relu"))
model.add(layers.Dense(34, activation="relu"))
model.add(layers.Dense(NUM_LABELS))
model.summary()

train_data is the audio loaded with librosa and has shape (6705, 20, 130).

train_labels is an array of one-hot vectors with shape (6705, 11).

I tried both expanding the dimensions with reshaped_train_data = np.expand_dims(train_data, axis=3) and reshaping with reshaped_train_data = train_data.reshape(-1, train_data.shape[1], train_data.shape[2], 1).

Then I tried to train the model: history = model.fit(reshaped_train_data, train_labels, epochs=50, validation_split=0.1)

In both cases I get the following error: ValueError: Error when checking input: expected conv2d_5_input to have a shape (6705, 20, 130) but got an array with shape (20, 130, 1)

How should I reshape or expand the data so that I can train my model?

1 Answer

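There are two things to get right:

  1. the shape of the training data
  2. the input_shape argument of the first Conv2D layer

Conv2D expects 4-dimensional training data, (batch, rows, cols, channels), so add a channel axis with train_data = np.expand_dims(train_data, axis=3). The reshape call in the question gives the same result.

input_shape is a tuple of integers that does not include the sample (batch) axis, so pass the shape of a single sample: model.add(layers.Conv2D(32, (3,3), activation='relu', input_shape=train_data.shape[1:])). Passing the full shape of the 3-D array, as in the question, is why the error reports an expected shape of (6705, 20, 130).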
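As a quick check of both points, here is a NumPy-only sketch of the shapes involved. It uses a small stand-in array (8 samples instead of 6705, an assumption made only to keep it light), and the variable names expanded, reshaped and input_shape are illustrative.

```python
import numpy as np

# Stand-in for the real data: 8 samples instead of 6705 (assumption, to keep it small).
train_data = np.zeros((8, 20, 130), dtype=np.float32)

# Two equivalent ways to add a trailing channel axis.
expanded = np.expand_dims(train_data, axis=3)
reshaped = train_data.reshape(-1, train_data.shape[1], train_data.shape[2], 1)

print(expanded.shape)  # (8, 20, 130, 1)
print(reshaped.shape)  # (8, 20, 130, 1)

# input_shape describes ONE sample, so the batch axis is dropped.
input_shape = expanded.shape[1:]
print(input_shape)     # (20, 130, 1)
```

The array passed to model.fit keeps all four axes, while the tuple given to input_shape has only the last three.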

Here is sample code using random input:

import numpy as np
import tensorflow.keras.layers as layers
from tensorflow import keras

NUM_LABELS = 11
train_data = np.random.random(size=(6705, 20, 130))

# add a trailing channel axis: (6705, 20, 130) -> (6705, 20, 130, 1)
train_data = np.expand_dims(train_data, axis=3)

# generate random one-hot label vectors, one of NUM_LABELS classes per sample
train_labels = np.eye(NUM_LABELS)[np.random.choice(NUM_LABELS, 6705)]

model = keras.Sequential()

# input_shape describes one sample, so drop the batch axis: (20, 130, 1)
model.add(layers.Conv2D(32, (3,3), activation='relu', input_shape=train_data.shape[1:]))

model.add(layers.MaxPool2D(2,2))
model.add(layers.Conv2D(32, (3,3), activation='relu'))
model.add(layers.MaxPool2D(2,2))
model.add(layers.Flatten())
model.add(layers.Dense(128, activation="relu"))
model.add(layers.Dense(34, activation="relu"))
model.add(layers.Dense(NUM_LABELS))
model.summary()

model.compile(
    loss='categorical_crossentropy', optimizer='sgd', metrics=['accuracy']
)

history = model.fit(train_data, train_labels, epochs=1, validation_split=0.1)
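One caveat about the compile step above: 'categorical_crossentropy' by default treats the model output as probabilities, while the final Dense(NUM_LABELS) layer has no activation and so emits unbounded values. The NumPy sketch below is a simplified imitation of that loss, not the Keras implementation. The clipping constant 1e-7 and the helper names are assumptions for illustration. It shows how a negative raw output for the true class gets clipped and produces a loss near 16.1, whereas applying softmax first gives a usable value.

```python
import numpy as np

EPS = 1e-7  # assumed clipping constant, chosen to mirror a typical small epsilon


def categorical_crossentropy(y_true, y_pred):
    """Cross-entropy that treats y_pred as probabilities, clipped to avoid log(0)."""
    y_pred = np.clip(y_pred, EPS, 1.0 - EPS)
    return -np.sum(y_true * np.log(y_pred), axis=-1)


def softmax(logits):
    """Numerically stable softmax over the last axis."""
    shifted = logits - np.max(logits, axis=-1, keepdims=True)
    exp = np.exp(shifted)
    return exp / np.sum(exp, axis=-1, keepdims=True)


y_true = np.eye(11)[[3]]      # one sample whose true class is 3
logits = np.zeros((1, 11))
logits[0, 3] = -2.0           # the raw output for the true class happens to be negative

loss_raw = float(categorical_crossentropy(y_true, logits)[0])
loss_softmax = float(categorical_crossentropy(y_true, softmax(logits))[0])

print(loss_raw)      # about 16.118, i.e. -log(1e-7)
print(loss_softmax)  # about 4.316
```

In Keras the usual remedies are to give the last layer activation="softmax", or to leave the layer as it is and compile with keras.losses.CategoricalCrossentropy(from_logits=True).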

Results:

Model: "sequential"
_________________________________________________________________
Layer (type)                 Output Shape              Param #
=================================================================
conv2d (Conv2D)              (None, 18, 128, 32)       320
_________________________________________________________________
max_pooling2d (MaxPooling2D) (None, 9, 64, 32)         0
_________________________________________________________________
conv2d_1 (Conv2D)            (None, 7, 62, 32)         9248
_________________________________________________________________
max_pooling2d_1 (MaxPooling2 (None, 3, 31, 32)         0
_________________________________________________________________
flatten (Flatten)            (None, 2976)              0
_________________________________________________________________
dense (Dense)                (None, 128)               381056
_________________________________________________________________
dense_1 (Dense)              (None, 34)                4386
_________________________________________________________________
dense_2 (Dense)              (None, 11)                385
=================================================================
Total params: 395,395
Trainable params: 395,395
Non-trainable params: 0
_________________________________________________________________
189/189 [==============================] - 8s 42ms/step - loss: 16.0358 - accuracy: 0.0000e+00 - val_loss: 16.1181 - val_accuracy: 0.0000e+00
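
The output shapes and parameter counts in the summary above can be reproduced by hand. The following plain-Python sketch assumes the Keras defaults used in the sample code ('valid' padding, stride 1 for the convolutions, non-overlapping 2x2 pooling); the helper names conv_out and pool_out are made up for illustration.

```python
def conv_out(size, kernel=3):
    """Output length of a 'valid' convolution with stride 1."""
    return size - kernel + 1


def pool_out(size, pool=2):
    """Output length of non-overlapping max pooling (floor division)."""
    return size // pool


rows, cols, channels = 20, 130, 1
filters = 32
params = []

# First Conv2D: 3x3 kernel weights per input channel and filter, plus one bias per filter.
params.append(3 * 3 * channels * filters + filters)
rows, cols = conv_out(rows), conv_out(cols)   # (18, 128)
rows, cols = pool_out(rows), pool_out(cols)   # (9, 64)

# Second Conv2D: its input now has 32 channels.
params.append(3 * 3 * filters * filters + filters)
rows, cols = conv_out(rows), conv_out(cols)   # (7, 62)
rows, cols = pool_out(rows), pool_out(cols)   # (3, 31)

# Flatten, then three Dense layers: weights plus one bias per output unit.
flat = rows * cols * filters
for units_in, units_out in [(flat, 128), (128, 34), (34, 11)]:
    params.append(units_in * units_out + units_out)

print(flat)         # 2976
print(params)       # [320, 9248, 381056, 4386, 385]
print(sum(params))  # 395395
```

Most of the parameters sit in the first Dense layer, because the flattened feature map has 2976 values.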