How can I convert a video into multiple numpy arrays, or a single one, to use for machine learning? I have only found ways to do it for images.
1 Answer
A regular image is represented as a 3D tensor with the following shape: (height, width, channels). The channels value is 3 if the image is RGB and 1 if it is grayscale.
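For instance, you can verify this shape by loading an image with OpenCV (a minimal sketch; 'photo.jpg' is just a placeholder filename):

import cv2

img = cv2.imread('photo.jpg')  # OpenCV loads images as numpy arrays (in BGR channel order)
print(img.shape)               # e.g. (480, 640, 3) -> (height, width, channels)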
A video is a collection of N frames, where each frame is an image. You'd want to represent this data as a 4D tensor: (frames, height, width, channels).
So, for example, if you have 1 minute of video at 30 fps, where each frame is RGB with a resolution of 256x256, then your tensor would have the shape (1800, 256, 256, 3), where 1800 is the number of frames in the video: 30 (fps) * 60 (seconds).
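As a quick sanity check, you can preallocate an empty tensor of that shape with NumPy (just a sketch to illustrate the layout, not something you need in practice):

import numpy as np

fps, seconds = 30, 60
n_frames = fps * seconds  # 1800
video = np.zeros((n_frames, 256, 256, 3), dtype=np.uint8)
print(video.shape)  # (1800, 256, 256, 3)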
To achieve this, you can open each individual frame of the video, store them all in a list, and stack them along a new axis (i.e. the "frames" dimension). You can do this with OpenCV:
import cv2
import numpy as np

# Open the video and cut it into frames.
vid = cv2.VideoCapture('path/to/video/file')
frames = []
check = True
i = 0
while check:
    check, arr = vid.read()
    if check and not i % 20:  # Subsample the video: keep one frame every 20.
        frames.append(arr)    # Only append when the read succeeded, so no None frames sneak in.
    i += 1
vid.release()
frames = np.array(frames)  # Convert the list of frames to a 4D numpy array.
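One caveat: OpenCV reads frames in BGR channel order. If your model expects RGB (most do), you can flip the last axis and normalize the pixel values. A small follow-up sketch, assuming the frames array from above:

frames = frames[..., ::-1]                  # reorder channels BGR -> RGB
frames = frames.astype(np.float32) / 255.0  # scale pixel values to [0, 1]
print(frames.shape)                         # (num_kept_frames, height, width, 3)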
Thank you for your answer! How can I write that in code though, sir? – Saso, commented May 22, 2021 at 0:09