
Problem

I have the following problem:

I want to use PyTorch's DataLoader (in a similar way as here), but my setup varies a bit:

In my data folder I have images (let's call them image_total) of different street situations, and I want to use cropped images (called image_crop_[idx]) around persons that are close enough to the camera. So it can happen that some images give me one or more cropped images, while others give me zero crops because they show no person or the persons are too far away.

As I have a lot of images I want to make the implementation as efficient as possible.

My hope is that it is possible to use something like this:

I want to load image_total and check whether useful crops are in it. If so, I extract the cropped images and get a list like [image_crop_0, image_crop_1, image_crop_2, ...].

Now my question: is this compatible with PyTorch's DataLoader? The problem I see is that the `__getitem__` method of my class would return anywhere from zero to an arbitrary number of instances, while I want to use a constant batch size for training.

Considerations

  • maybe DataLoader supports this (and I did not find it)
  • I have to work with a buffer or something similar
  • the fallback would be to pre-process the data, but this would not be the most efficient solution

1 Answer


the fallback would be to pre process the data, but this would not be the most efficient solution

Indeed, this could be the simplest and most efficient solution. Your dataset currently has a dynamic size, which is incompatible with DataLoader: each index should yield a fixed number of items so that batches have a constant size for training.

An alternative solution may be to pre-process the data in your PyTorch Dataset's __init__ to create a list of all persons along with their corresponding image:

[("img1", p1), ("img1", p2), ..., ("imgn", pk)]

where pi is the person's bounding box in the image. Then, in your __getitem__ method, you can read the image and crop out the corresponding person:

class PersonDataset(Dataset):

  def __init__(self):
    self.images = ["img1", "img2", ..., "imgn"]
    self.persons = [("img1", p1), ("img1", p2), ..., ("imgn", pk)]

  def __getitem__(self, index):
    image_path, box = self.persons[index]  # one person per index
    image = read_image(image_path)
    return crop(image, box)

  def __len__(self):
    return len(self.persons)  # number of persons, not number of images

This is not the most efficient method, as it may lead to an image being read multiple times, but this should not be a bottleneck when using a DataLoader with multiple workers.
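Since each __getitem__ call now returns exactly one crop, a standard DataLoader gives the constant batch size the question asks for. A minimal sketch, using a dummy stand-in dataset (DummyCropDataset and the 64×64 crop size are illustrative assumptions, not from the answer); note that all crops must share a common size, e.g. via a Resize transform, for the default collate function to stack them:

```python
import torch
from torch.utils.data import Dataset, DataLoader

class DummyCropDataset(Dataset):
    """Stand-in for PersonDataset that returns fake, fixed-size crops."""

    def __init__(self, num_persons):
        self.num_persons = num_persons

    def __getitem__(self, index):
        # All crops must have the same shape (e.g. after a Resize transform)
        # so the default collate function can stack them into one tensor.
        return torch.zeros(3, 64, 64)

    def __len__(self):
        return self.num_persons

# In real use, num_workers > 0 spreads the image reads over several worker
# processes, which hides the cost of reading an image more than once.
loader = DataLoader(DummyCropDataset(100), batch_size=32, shuffle=True)

batch_shapes = [batch.shape for batch in loader]
```

With 100 persons and a batch size of 32, this yields three full batches of shape (32, 3, 64, 64) and a final partial batch of 4 (pass drop_last=True to discard it).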

You must implement how to create self.persons yourself. Basically, you have to read all your annotation files and extract the list of person bounding boxes for each image.
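How self.persons gets built depends entirely on your annotation format. As a sketch, assuming a hypothetical dict mapping image files to lists of (left, upper, right, lower) boxes (the file names and box format here are illustrative, not from the question):

```python
# Hypothetical annotation data: one list of bounding boxes per image file.
annotations = {
    "img1.jpg": [(10, 20, 50, 120), (60, 30, 110, 140)],
    "img2.jpg": [],                 # no person close enough: contributes nothing
    "img3.jpg": [(5, 5, 80, 100)],
}

def build_person_index(annotations):
    """Flatten per-image annotations into one (image, box) entry per person."""
    persons = []
    for image_path, boxes in annotations.items():
        for box in boxes:
            persons.append((image_path, box))
    return persons

persons = build_person_index(annotations)
```

Images with zero boxes simply contribute no entries, so the "zero to arbitrary crops per image" problem disappears: the dataset length is the total number of persons.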

  • Thank you for the answer. This is what I would implement if there is no other solution, so I think I'll try this approach. The only thing that bothers me is that I have to keep all the bounding boxes in memory the whole time (but I think this is not a problem in the end).
    – nckstr15
    Commented Apr 22, 2021 at 9:28
  • It is possible not to store the bounding boxes in memory, though it is probably not a problem if your dataset is not gigantic.
    – Louis Lac
    Commented Apr 22, 2021 at 9:31
