I'm trying to fine-tune a Llama 3.1 Instruct model to adapt it to a specific industrial domain. The goal is to have the model extract the direct dependencies between tasks from a list of operational steps for a given equipment type.
Given a list of tasks, the model should output strict JSON with pairs of dependencies, like this:
{"dependencies": [["T1", "T2"], ["T3", "T4"], ...]}
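To make "strict JSON" concrete, here is a minimal sketch of a checker for a generated answer. The function name `parse_dependencies` and the specific rules it enforces (a single top-level key, two-element string pairs, known task IDs, no self-dependencies or duplicate pairs) are assumptions for illustration, not part of the actual pipeline.

```python
import json
from typing import List, Set, Tuple


def parse_dependencies(raw: str, task_ids: Set[str]) -> List[Tuple[str, str]]:
    """Parse a model answer; raise ValueError if it breaks the expected format."""
    # json.JSONDecodeError is a subclass of ValueError, so invalid JSON
    # surfaces as a ValueError as well.
    data = json.loads(raw)
    if not isinstance(data, dict) or set(data) != {"dependencies"}:
        raise ValueError('expected exactly one top-level key "dependencies"')
    deps = data["dependencies"]
    if not isinstance(deps, list):
        raise ValueError('"dependencies" must be a list of pairs')

    pairs = []
    for item in deps:
        is_pair = (
            isinstance(item, list)
            and len(item) == 2
            and all(isinstance(x, str) for x in item)
        )
        if not is_pair:
            raise ValueError(f"malformed pair: {item!r}")
        first, second = item
        if first not in task_ids or second not in task_ids:
            raise ValueError(f"unknown task id in pair: {item!r}")
        if first == second:
            raise ValueError(f"task depends on itself: {item!r}")
        pairs.append((first, second))

    if len(set(pairs)) != len(pairs):
        raise ValueError("duplicate dependency pair")
    return pairs


if __name__ == "__main__":
    answer = '{"dependencies": [["T1", "T2"], ["T1", "T3"]]}'
    print(parse_dependencies(answer, {"T1", "T2", "T3"}))
```

Running a check like this over generated answers separates formatting failures from genuinely wrong dependency pairs.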
I created a custom dataset with 1,000 examples. Each example includes:
- An instruction (always the same for now)
- A prompt
- A list of tasks
- The correct dependency output
- Metadata: order, equipment type
Here’s an example of the input-output pair seen during fine-tuning:
<|start_header_id|>system<|end_header_id|>
Analyze the following tasks and return the direct dependencies in strict JSON format.
Additional rules:
Generate only valid JSON, without comments.
DO NOT repeat task names.
Expected format: {"dependencies": [["T1", "T2"], ["T3", "T4"], ...]}
The provided order is: ordered.
<|eot_id|>
<|start_header_id|>user<|end_header_id|>
Equipment type: heat exchanger
Task list:
T1: valve work
T2: removal of the safety valve
T3: cleaning in the wash area
T4: cleaning inspection
T5: workshop safety valve calibration
T6: delivery of calibration report
T7: reinstallation of the safety valve
T8: sealing inspection
T9: project completion milestone
<|eot_id|>
<|start_header_id|>assistant<|end_header_id|>
{"dependencies": [["T1", "T2"], ["T2", "T3"], ["T3", "T4"], ["T4", "T5"], ["T5", "T6"], ["T6", "T7"], ["T7", "T8"], ["T8", "T9"]]}
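For reference, a training string with this layout could be rendered from a dataset record roughly as follows. This is a sketch only: the record keys (`order`, `equipment_type`, `tasks`, `dependencies`) and the function name `render_example` are hypothetical, and the special tokens are copied from the example above as-is. With Hugging Face Transformers, the tokenizer's `apply_chat_template` method would normally produce the model's official template instead of a hand-built string.

```python
import json

# System text copied from the example above; only the order line varies.
SYSTEM_RULES = "\n".join([
    "Analyze the following tasks and return the direct dependencies in strict JSON format.",
    "Additional rules:",
    "Generate only valid JSON, without comments.",
    "DO NOT repeat task names.",
    'Expected format: {"dependencies": [["T1", "T2"], ["T3", "T4"], ...]}',
])


def render_example(record: dict) -> str:
    """Build one training string in the layout shown above (hypothetical record keys)."""
    system = SYSTEM_RULES + f"\nThe provided order is: {record['order']}."
    task_lines = "\n".join(f"{tid}: {name}" for tid, name in record["tasks"])
    user = f"Equipment type: {record['equipment_type']}\nTask list:\n{task_lines}"
    # json.dumps guarantees the target is valid JSON in the expected shape.
    target = json.dumps({"dependencies": record["dependencies"]})
    return (
        "<|start_header_id|>system<|end_header_id|>\n"
        f"{system}\n<|eot_id|>\n"
        "<|start_header_id|>user<|end_header_id|>\n"
        f"{user}\n<|eot_id|>\n"
        "<|start_header_id|>assistant<|end_header_id|>\n"
        f"{target}"
    )


if __name__ == "__main__":
    example = {
        "order": "ordered",
        "equipment_type": "heat exchanger",
        "tasks": [("T1", "valve work"), ("T2", "removal of the safety valve")],
        "dependencies": [["T1", "T2"]],
    }
    print(render_example(example))
```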
Fine-Tuning Setup
- LoRA with target_modules=["q_proj", "v_proj", "o_proj", "down_proj"]
- Alpha: 16
- r: 8
- LR: 2e-4
- 2 epochs
- 85/15 train/val split
- Data augmentation: shuffling task lists to reduce overfitting to linear order
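The shuffling step can be sketched as below. The post does not say whether task IDs are kept or renumbered after shuffling, so the `relabel` option is an assumption added for illustration: if only the lines are reordered, the numeric IDs (T1, T2, ...) still reveal the original sequence, whereas renumbering in the new order removes that cue. The function name `shuffle_tasks` and the data shapes are hypothetical.

```python
import random


def shuffle_tasks(tasks, dependencies, rng, relabel=False):
    """Return (tasks, dependencies) with the task list in random order.

    tasks: list of (task_id, name) tuples.
    dependencies: list of [first_id, second_id] pairs.
    relabel: if True, renumber tasks T1..Tn in the shuffled order and
             remap the pairs, so the IDs no longer encode the old order.
    """
    shuffled = list(tasks)
    rng.shuffle(shuffled)
    if not relabel:
        # Only the presentation order changes; the pairs stay as they were.
        return shuffled, [list(pair) for pair in dependencies]

    mapping = {old_id: f"T{i}" for i, (old_id, _) in enumerate(shuffled, start=1)}
    new_tasks = [(mapping[old_id], name) for old_id, name in shuffled]
    new_deps = [[mapping[a], mapping[b]] for a, b in dependencies]
    return new_tasks, new_deps


if __name__ == "__main__":
    tasks = [("T1", "valve work"), ("T2", "cleaning"), ("T3", "inspection")]
    deps = [["T1", "T2"], ["T1", "T3"]]  # T2 and T3 both follow T1
    print(shuffle_tasks(tasks, deps, random.Random(0), relabel=True))
```

Passing an explicit `random.Random` instance keeps the augmentation reproducible across runs.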
Training and validation loss decreased smoothly, which seemed promising.
After fine-tuning, I evaluated the model:
- It performs well on training/validation data.
- On unseen examples, it performs badly, especially with parallel dependencies (multiple tasks depending on the same one or running concurrently).
- It seems to overfit to sequential dependencies, even though parallel ones are present throughout the dataset.
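One way to put numbers on this behaviour is to score predictions at the level of individual dependency pairs and to count how many pairs belong to a parallel structure. The sketch below is an assumption about how such an evaluation could look, not the evaluation actually used here. The names `edge_scores` and `parallel_edges` are hypothetical, and "parallel" is taken to mean that the first task of a pair appears as the first element of several pairs, or the second task appears as the second element of several pairs.

```python
from collections import Counter


def edge_scores(predicted, expected):
    """Precision, recall and F1 over dependency pairs (order within a pair matters)."""
    pred = {tuple(p) for p in predicted}
    gold = {tuple(p) for p in expected}
    hits = len(pred & gold)
    precision = hits / len(pred) if pred else 0.0
    recall = hits / len(gold) if gold else 0.0
    total = precision + recall
    f1 = 2 * precision * recall / total if total else 0.0
    return precision, recall, f1


def parallel_edges(dependencies):
    """Pairs that are part of a branch or a merge rather than a simple chain."""
    first_count = Counter(a for a, _ in dependencies)
    second_count = Counter(b for _, b in dependencies)
    return [
        (a, b)
        for a, b in dependencies
        if first_count[a] > 1 or second_count[b] > 1
    ]


if __name__ == "__main__":
    # Reference: T2 and T3 run in parallel between T1 and T4.
    gold = [["T1", "T2"], ["T1", "T3"], ["T2", "T4"], ["T3", "T4"]]
    # A purely sequential prediction for the same tasks.
    chain = [["T1", "T2"], ["T2", "T3"], ["T3", "T4"]]
    print(edge_scores(chain, gold))
    print(len(parallel_edges(gold)), len(parallel_edges(chain)))
```

Comparing the share of parallel pairs in the references with the share in the model's predictions, and reporting recall separately on the parallel pairs, shows how strongly the model collapses toward chains.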
How can I get the model to better generalize to unseen examples?
How can I encourage learning of parallel dependencies during fine-tuning?