I am comparing several deep learning models, including UNETR, on the BTCV dataset, and I noticed a discrepancy in the reported number of parameters.
Table 5 of the paper "UNETR: Transformers for 3D Medical Image Segmentation" lists UNETR at 92.58M parameters.
In addition, the input size used is described as follows:

> For multi-organ and spleen segmentation tasks, we randomly sample the input images with volume sizes of [96, 96, 96].
The number of input channels is 1:

> The multi-organ segmentation problem is formulated as a 13 class segmentation task with 1-channel input.
The reference implementation is provided by MONAI: https://monai.io/research/unetr
Now, if I instantiate the model with the settings from the paper:

```python
from monai.networks.nets import UNETR

model = UNETR(
    in_channels=1,
    out_channels=13,
    img_size=(96, 96, 96),
    feature_size=16,
    hidden_size=768,
    mlp_dim=3072,
    num_heads=12,
    proj_type="perceptron",
    norm_name="instance",
    res_block=True,
    dropout_rate=0.0,
)

# Total parameter count, in millions
params = sum(p.numel() for p in model.parameters())
print(params / 1e6)
```
I get:

```
121.079693
```

That is approximately 121.1M parameters, significantly more than the 92.58M reported in the paper.
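To narrow down where the extra parameters might live, I did a back-of-the-envelope count of just the transformer encoder, assuming it is a standard ViT-B/16-style backbone (hidden size 768, MLP dim 3072, 12 layers, 16³ patches of a 1-channel 96³ volume, learned positional embeddings). These numbers are my own arithmetic, not taken from the paper or from MONAI's code:

```python
# Rough parameter count for a ViT-B/16-style 3D encoder
# (hidden 768, MLP 3072, 12 layers), assuming a 1-channel 96^3 input
# split into 16^3 patches -> (96/16)^3 = 216 tokens.
hidden, mlp, layers = 768, 3072, 12
patch_voxels = 16 ** 3          # 4096 voxels per patch, 1 channel
num_patches = (96 // 16) ** 3   # 216 tokens

patch_embed = patch_voxels * hidden + hidden   # linear ("perceptron") patch projection
pos_embed = num_patches * hidden               # learned positional embeddings

qkv = 3 * (hidden * hidden + hidden)           # Q, K, V projections
attn_out = hidden * hidden + hidden            # attention output projection
mlp_block = (hidden * mlp + mlp) + (mlp * hidden + hidden)
norms = 2 * 2 * hidden                         # two LayerNorms (weight + bias)
per_layer = qkv + attn_out + mlp_block + norms

encoder = patch_embed + pos_embed + layers * per_layer + 2 * hidden  # + final norm
print(f"encoder ≈ {encoder / 1e6:.2f}M")  # prints: encoder ≈ 88.37M
```

Under these assumptions the encoder alone is roughly 88M parameters, which would put about 33M in the convolutional decoder of the 121M MONAI model; a differently configured decoder (or a different accounting of it) seems like a plausible place for the gap with the paper's 92.58M.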
My questions are:
What causes the difference between the parameter count reported in the UNETR paper and the MONAI implementation?
When writing my own paper, is it better to report the parameter count from the actual implementation used, even if it differs from the original paper?
