Let's suppose I have a generic model:
Variable A | Variable B | Variable C | Variable D
Variable D is a categorical variable (for example, models of cars), and the dataset on which I trained my model only has models up to the year 2020.
I know for sure that Variable A, Variable B and Variable C are always present; however, Variable D can be missing (for example, if I am predicting on models of cars from 2021).
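To make the setup concrete, here is a minimal sketch of the situation in plain Python. The column names, the car model labels and the helper `split_by_known_category` are all made up for illustration; my real data and model are different.

```python
# Hypothetical data: the values and category labels are invented for illustration.
train_rows = [
    {"A": 1.0, "B": 0.5, "C": 3, "D": "model_2019"},
    {"A": 2.0, "B": 0.1, "C": 7, "D": "model_2020"},
]
new_rows = [
    {"A": 1.5, "B": 0.3, "C": 4, "D": "model_2020"},  # category seen in training
    {"A": 0.7, "B": 0.9, "C": 2, "D": "model_2021"},  # category not seen in training
    {"A": 0.4, "B": 0.2, "C": 5, "D": None},          # category missing entirely
]

# Every value of Variable D that the model saw during training.
known_categories = {row["D"] for row in train_rows}


def split_by_known_category(rows, known):
    """Separate rows whose D value was seen in training from all other rows."""
    seen, unseen = [], []
    for row in rows:
        (seen if row["D"] in known else unseen).append(row)
    return seen, unseen


seen_rows, unseen_rows = split_by_known_category(new_rows, known_categories)
print(len(seen_rows), "rows with a known D;", len(unseen_rows), "rows with a missing or unseen D")
```

The rows that end up in `unseen_rows` are the ones my questions are about.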
My questions are:
If I cannot use data from 2021, how safe is it to use Variable D in my predictions?
Could I just randomly assign a value to Variable D when it is missing?
Is it possible that the model may become too reliant on Variable D, and that by randomly assigning values I might introduce bias?
Should I just drop Variable D, or just the rows without an associated category in the data on which my model has been trained?
Thank you for your time.

