I am working on a person recognition system for learning purposes.

My goal is:

  • Maintain a small gallery of known people (multiple images per person)

  • Given a new query image, return the most similar person with a confidence score

  • The system should work even if clothing or accessories change

  • The query image may show front, back, or partial body views

I am currently experimenting with person re-identification models, but matching accuracy drops sharply whenever a person's clothing changes between the gallery and query images.
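For context, my matching logic looks roughly like the sketch below: each person keeps several embeddings, a query is compared against all of them by cosine similarity, and the per-person maxima are turned into a softmax-style confidence. The embeddings here are random placeholders standing in for the re-ID model's output, and the temperature value is an arbitrary choice, not a tuned one.

```python
import numpy as np

# Placeholder gallery: person -> list of embeddings. In my real setup
# these come from a re-ID backbone; random vectors here just demonstrate
# the matching logic.
rng = np.random.default_rng(0)
gallery = {
    "alice": [rng.normal(size=128) for _ in range(3)],
    "bob":   [rng.normal(size=128) for _ in range(3)],
}

def l2_normalize(v):
    return v / np.linalg.norm(v)

def match(query_emb, gallery, temperature=0.1):
    """Return (best_person, confidence).

    Per person, take the max cosine similarity over that person's
    gallery embeddings, then softmax the per-person scores so the
    best match comes with a pseudo-probability confidence.
    """
    q = l2_normalize(query_emb)
    people = list(gallery)
    scores = np.array([
        max(float(q @ l2_normalize(e)) for e in gallery[p])
        for p in people
    ])
    exp = np.exp(scores / temperature)
    probs = exp / exp.sum()
    best = int(np.argmax(probs))
    return people[best], float(probs[best])

person, conf = match(rng.normal(size=128), gallery)
```

This works well when appearance is stable, but the cosine scores collapse once clothing changes, which is what prompted the questions below.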

This makes me question the feasibility of my objective.

From a technical perspective, I would like to understand:

  1. Are most image-based person re-identification models inherently appearance-driven (i.e., heavily dependent on clothing)?

  2. Without video (no gait information), is clothing-invariant person recognition realistically achievable?

  3. Is combining multiple modalities (e.g., face + body embeddings) the correct direction?

  4. Or is this problem fundamentally limited when relying only on still RGB images?
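To make question 3 concrete, the direction I am imagining is simple score-level fusion: weight face similarity heavily when a face is detected, and fall back to the body embedding alone for back or partial views. This is only a sketch of the idea; the weight is a hypothetical value I would tune on validation data, not a recommended setting.

```python
def fused_similarity(face_sim: float, body_sim: float,
                     face_detected: bool, w_face: float = 0.7) -> float:
    """Score-level fusion sketch (not an established recipe).

    face_sim / body_sim: cosine similarities from separate face and
    body embedding models. When no face is visible (back or partial
    view), only the body score is usable.
    """
    if not face_detected:
        return body_sim
    return w_face * face_sim + (1.0 - w_face) * body_sim
```

Is this kind of late fusion a sensible baseline, or is feature-level fusion (concatenating or jointly training the embeddings) the more standard approach?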

I am looking for library recommendations, along with guidance on whether this objective is technically realistic and, if so, what general approach is appropriate.

Any architectural insights would be appreciated.