I am trying to figure out a good architecture for a neural network that takes projections (2D images) from different angles and produces a volume consisting of 2D slices (CT-like).

So for example:

  • Input [180, 100, 100] -> 180 projections, each a 100x100-pixel image.
  • Output [100, 100, 100] -> a volume of size 100x100x100 (100 slices of 2D images).

I have ground truth volumes.

I came up with the idea of using a ResNet as the encoder, but I'm not sure how to implement the decoder, or which model would be a good choice for this kind of problem. I did consider a U-Net architecture, but the output dimensionality is different, so I abandoned that idea.

I am using PyTorch.

  • Have you read about Structure from Motion (SfM)? If not, I recommend you do so before delving into a deep-learning-based approach, as this is not a straightforward problem. Commented Mar 2, 2024 at 9:28

1 Answer


Specifying the whole network is out of scope for a single answer, but generally you want something like this:

  1. Use a ResNet or a vision transformer as the encoder.
  2. Use the encoder to map the input down to a latent tensor.
  3. Reshape the latent tensor as needed.
  4. Use ConvTranspose3d layers to upsample the latent tensor to the desired output size.
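The steps above can be sketched roughly as follows. This is a minimal, untuned sketch, assuming the [180, 100, 100] input is treated as a 180-channel 2D image; the `Proj2Vol` class, its channel counts, and the latent size 13 are all illustrative choices, not a reference implementation. A pretrained ResNet (with its first conv adapted to 180 input channels) could replace the hand-rolled encoder stack.

```python
import torch
import torch.nn as nn

class Proj2Vol(nn.Module):
    """Illustrative encoder -> 3D latent -> ConvTranspose3d decoder."""
    def __init__(self, n_proj=180, latent_ch=32, latent_size=13):
        super().__init__()
        # 2D encoder: the 180 projections are the input channels.
        self.encoder = nn.Sequential(
            nn.Conv2d(n_proj, 64, 3, stride=2, padding=1),   # 100 -> 50
            nn.ReLU(inplace=True),
            nn.Conv2d(64, 128, 3, stride=2, padding=1),      # 50 -> 25
            nn.ReLU(inplace=True),
            nn.Conv2d(128, 256, 3, stride=2, padding=1),     # 25 -> 13
            nn.ReLU(inplace=True),
        )
        # 1x1 conv expands channels so the 2D feature map can be
        # reshaped into a (latent_ch, 13, 13, 13) 3D latent tensor.
        self.to_3d = nn.Conv2d(256, latent_ch * latent_size, 1)
        self.latent_ch = latent_ch
        self.latent_size = latent_size
        # 3D decoder: upsample 13 -> 25 -> 50 -> 100.
        self.decoder = nn.Sequential(
            nn.ConvTranspose3d(latent_ch, 16, 3, stride=2, padding=1),            # 13 -> 25
            nn.ReLU(inplace=True),
            nn.ConvTranspose3d(16, 8, 3, stride=2, padding=1, output_padding=1),  # 25 -> 50
            nn.ReLU(inplace=True),
            nn.ConvTranspose3d(8, 1, 3, stride=2, padding=1, output_padding=1),   # 50 -> 100
        )

    def forward(self, x):                       # x: (B, 180, 100, 100)
        feat = self.encoder(x)                  # (B, 256, 13, 13)
        lat = self.to_3d(feat)                  # (B, latent_ch * 13, 13, 13)
        b = lat.shape[0]
        lat = lat.view(b, self.latent_ch, self.latent_size,
                       self.latent_size, self.latent_size)  # (B, 32, 13, 13, 13)
        return self.decoder(lat)                # (B, 1, 100, 100, 100)

model = Proj2Vol()
vol = model(torch.randn(2, 180, 100, 100))
print(vol.shape)  # torch.Size([2, 1, 100, 100, 100])
```

The singleton channel dimension of the output can be squeezed away to match a [100, 100, 100] ground-truth volume before computing, say, an MSE loss.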

You can also do a U-Net-like setup with skip connections between encoder layers and decoder layers; you would just need a projection layer to map the encoder activations into a shape compatible with the decoder activations.
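One way to sketch such a projection layer, with hypothetical shapes: a 1x1 conv expands the channels of a 2D encoder activation by a depth factor, the result is reshaped into a 3D tensor, and that tensor is concatenated with the decoder activation at the same spatial resolution.

```python
import torch
import torch.nn as nn

B, D = 2, 25
enc_feat = torch.randn(B, 128, 25, 25)        # 2D encoder activation (hypothetical shape)
dec_feat = torch.randn(B, 16, D, 25, 25)      # 3D decoder activation at matching resolution

proj = nn.Conv2d(128, 16 * D, kernel_size=1)  # channels -> channels * depth
skip = proj(enc_feat).view(B, 16, D, 25, 25)  # reshape the 2D map into a 3D tensor
merged = torch.cat([dec_feat, skip], dim=1)   # channel-wise concat, as in a U-Net
print(merged.shape)  # torch.Size([2, 32, 25, 25, 25])
```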
