I am working with the ODIR-5K (Ocular Disease Intelligent Recognition) dataset. The goal is multi-label classification of 8 ocular diseases (Normal, Diabetes, Glaucoma, Cataract, etc.).
The Data Structure Problem
The dataset provides two images per patient (Left Eye and Right Eye) but only one set of patient-level labels for the diseases. It does, however, include two additional columns, one per eye, containing that eye's diagnostic keywords.
Approach 1: Single-Input (Splitting the Data)
I restructure the dataframe to treat every image as an independent sample.
Concern: This introduces "Label Noise." If a patient has a Cataract only in the Left eye, splitting the data forces the model to treat the healthy Right eye as "Cataract Positive."
I trained a CNN on this split data, but it gave poor results.
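For reference, the restructuring in Approach 1 can be sketched with pandas. This is a minimal sketch using a toy dataframe; the column names (Left-Fundus, Right-Fundus, and one-hot disease columns) are assumptions modeled on the ODIR-5K annotation sheet and may need adjusting:

```python
import pandas as pd

# Toy frame standing in for the ODIR-5K sheet; column names are assumed.
labels = ["N", "D", "G", "C"]  # subset of the 8 disease columns
df = pd.DataFrame({
    "Left-Fundus":  ["1_left.jpg", "2_left.jpg"],
    "Right-Fundus": ["1_right.jpg", "2_right.jpg"],
    "N": [0, 1], "D": [1, 0], "G": [0, 0], "C": [1, 0],
})

# One row per image; the patient-level labels are copied to both eyes,
# which is exactly the "label noise" concern described above.
left = df[["Left-Fundus"] + labels].rename(columns={"Left-Fundus": "image"})
right = df[["Right-Fundus"] + labels].rename(columns={"Right-Fundus": "image"})
per_image = pd.concat([left, right], ignore_index=True)

print(len(per_image))  # twice the number of patients
```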
Approach 2: Dual-Input
I keep the patient grouped and feed both eyes simultaneously.
Inputs:
[Left_Input, Right_Input]
Architecture: Two CNN branches (sharing weights) → Concatenate → Dense Layers → Output.
Concern: Higher computational cost and a more complex data-generation pipeline.
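The dual-input architecture described above can be sketched in Keras. This is a minimal sketch, not my actual model; the backbone, input size, and layer sizes are placeholder assumptions. Weight sharing comes from calling the same backbone instance on both inputs:

```python
import tensorflow as tf
from tensorflow.keras import layers, Model

IMG = (224, 224, 3)  # assumed input size
N_CLASSES = 8

# One shared backbone; applying the same instance to both eyes shares weights.
backbone = tf.keras.Sequential([
    layers.Input(shape=IMG),
    layers.Conv2D(32, 3, activation="relu"),
    layers.MaxPooling2D(),
    layers.Conv2D(64, 3, activation="relu"),
    layers.GlobalAveragePooling2D(),
])

left_in = layers.Input(shape=IMG, name="left_eye")
right_in = layers.Input(shape=IMG, name="right_eye")
merged = layers.Concatenate()([backbone(left_in), backbone(right_in)])
x = layers.Dense(128, activation="relu")(merged)
# Sigmoid (not softmax) + binary cross-entropy for multi-label output.
out = layers.Dense(N_CLASSES, activation="sigmoid")(x)

model = Model([left_in, right_in], out)
model.compile(optimizer="adam", loss="binary_crossentropy")
```

The key design choice is sigmoid with binary cross-entropy, since a patient can have several diseases at once.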
Approach 3: Using the diagnostic keywords to derive per-image disease labels.
Concern: not every disease is covered by the keywords, and they are hard to parse reliably because of subtle wording differences between entries.
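Approach 3 amounts to a keyword-to-label mapping per eye. Below is a minimal sketch with a hypothetical (far from complete) mapping; the second example shows the exact fragility mentioned above, where a wording variant slips past a naive substring match:

```python
# Hypothetical keyword→label mapping; the real ODIR keyword strings are
# far more varied, which is the concern raised above.
KEYWORD_TO_LABEL = {
    "cataract": "C",
    "glaucoma": "G",
    "diabetic retinopathy": "D",
    "normal fundus": "N",
}

def labels_from_keywords(keyword_field: str) -> set:
    """Map one eye's diagnostic-keyword string to per-eye disease labels."""
    text = keyword_field.lower()
    return {label for phrase, label in KEYWORD_TO_LABEL.items()
            if phrase in text}

print(labels_from_keywords("Cataract"))  # {'C'}
# A wording variant misses the "diabetic retinopathy" phrase entirely:
print(labels_from_keywords("moderate non proliferative retinopathy"))  # set()
```

Any eye whose keywords map to nothing would need manual review, which is why this approach is labor-intensive on its own.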
My Questions:
What is the standard practice for handling "Patient-Level Labels" with "Organ-Level Images"?
Which approach should I use? Are there other options or tricks I should consider?
Also, if I train with Approach 2, how should I augment the data?
