I am trying to figure out a good architecture for a neural network that takes projections (2D images) from different angles and produces a volume consisting of 2D slices (CT-like).

So for example:

  • Input [180, 100, 100] -> 180 projections, each a 100x100-pixel image.
  • Output [100, 100, 100] -> a volume of size 100x100x100 (100 slices of 2D images).

I have ground truth volumes.

I came up with the idea of using a ResNet as the encoder, but I'm not sure how to implement the decoder, or which model would be a good choice for this kind of problem. I did consider a U-Net architecture, but the output dimensionality is different, so I abandoned that idea.

I am using PyTorch.

  • Have you read about Structure from Motion (SfM)? If not, I recommend you do so before delving into a deep-learning-based approach, as this is not a straightforward problem. Commented Mar 2, 2024 at 9:28

1 Answer


Specifying the whole network is out of scope for a single answer, but generally you want something like this:

  1. Use a ResNet or a vision transformer as the encoder.
  2. Use the encoder to map the input down to a latent tensor.
  3. Reshape the latent tensor as needed.
  4. Use ConvTranspose3d layers to upsample the latent tensor to the desired output size.
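The steps above can be sketched roughly as follows. This is a minimal, untuned sketch, assuming the [180, 100, 100] input is treated as a 180-channel 2D image; the `Proj2Vol` class, its channel counts, and the latent size 13 are all illustrative choices, not a reference implementation. A pretrained ResNet (with its first conv adapted to 180 input channels) could replace the hand-rolled encoder stack.

```python
import torch
import torch.nn as nn

class Proj2Vol(nn.Module):
    """Illustrative encoder -> 3D latent -> ConvTranspose3d decoder."""
    def __init__(self, n_proj=180, latent_ch=32, latent_size=13):
        super().__init__()
        # 2D encoder: the 180 projections are the input channels.
        self.encoder = nn.Sequential(
            nn.Conv2d(n_proj, 64, 3, stride=2, padding=1),   # 100 -> 50
            nn.ReLU(inplace=True),
            nn.Conv2d(64, 128, 3, stride=2, padding=1),      # 50 -> 25
            nn.ReLU(inplace=True),
            nn.Conv2d(128, 256, 3, stride=2, padding=1),     # 25 -> 13
            nn.ReLU(inplace=True),
        )
        # 1x1 conv expands channels so the 2D feature map can be
        # reshaped into a (latent_ch, 13, 13, 13) 3D latent tensor.
        self.to_3d = nn.Conv2d(256, latent_ch * latent_size, 1)
        self.latent_ch = latent_ch
        self.latent_size = latent_size
        # 3D decoder: upsample 13 -> 25 -> 50 -> 100.
        self.decoder = nn.Sequential(
            nn.ConvTranspose3d(latent_ch, 16, 3, stride=2, padding=1),            # 13 -> 25
            nn.ReLU(inplace=True),
            nn.ConvTranspose3d(16, 8, 3, stride=2, padding=1, output_padding=1),  # 25 -> 50
            nn.ReLU(inplace=True),
            nn.ConvTranspose3d(8, 1, 3, stride=2, padding=1, output_padding=1),   # 50 -> 100
        )

    def forward(self, x):                       # x: (B, 180, 100, 100)
        feat = self.encoder(x)                  # (B, 256, 13, 13)
        lat = self.to_3d(feat)                  # (B, latent_ch * 13, 13, 13)
        b = lat.shape[0]
        lat = lat.view(b, self.latent_ch, self.latent_size,
                       self.latent_size, self.latent_size)  # (B, 32, 13, 13, 13)
        return self.decoder(lat)                # (B, 1, 100, 100, 100)

model = Proj2Vol()
vol = model(torch.randn(2, 180, 100, 100))
print(vol.shape)  # torch.Size([2, 1, 100, 100, 100])
```

The singleton channel dimension of the output can be squeezed away to match a [100, 100, 100] ground-truth volume before computing, say, an MSE loss.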

You can also do a U-Net-like setup with skip connections between encoder layers and decoder layers; you would just need a projection layer to map the encoder activations into a shape compatible with the decoder activations.
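One way to sketch such a projection layer, with hypothetical shapes: a 1x1 conv expands the channels of a 2D encoder activation by a depth factor, the result is reshaped into a 3D tensor, and that tensor is concatenated with the decoder activation at the same spatial resolution.

```python
import torch
import torch.nn as nn

B, D = 2, 25
enc_feat = torch.randn(B, 128, 25, 25)        # 2D encoder activation (hypothetical shape)
dec_feat = torch.randn(B, 16, D, 25, 25)      # 3D decoder activation at matching resolution

proj = nn.Conv2d(128, 16 * D, kernel_size=1)  # channels -> channels * depth
skip = proj(enc_feat).view(B, 16, D, 25, 25)  # reshape the 2D map into a 3D tensor
merged = torch.cat([dec_feat, skip], dim=1)   # channel-wise concat, as in a U-Net
print(merged.shape)  # torch.Size([2, 32, 25, 25, 25])
```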
