In a pilot study, you collect data to inform the design of a later, more definitive study. With a small study your estimates will necessarily be somewhat imprecise, but you want to do what you can with what you have.
With that in mind, the answer to one of your questions
... would collapsing the three tasks into a composite measure be more defensible for a pilot study?
would be no. You presumably want to get as much information as possible about each task/arousal combination. Collapsing would lose much of that information.
With respect to your other concerns:
Each participant contributes only one observation per task (i.e., 3 repeated measures total).
Quoting from this answer (which includes literature references): "The minimum sample size per cluster in a mixed-effects model is 1, provided that the number of clusters is adequate, and the proportion of singleton cluster is not 'too high.'"
You do need to have enough clusters (subjects, in your case); you have 39, each with 3 observations. That should be OK. See this page.
There are no trial-level repetitions within each task, so within-person variance cannot be reliably estimated.
In your context, you use a mixed model to take within-person correlations among responses into account. Such a mixed model wouldn't estimate within-person variance. A simple mixed model with a random intercept would estimate the among-person variance in intercept values.
The sample size per arousal group is very small (n = 13), which may lead to unstable fixed-effects estimates.
The design seems extremely underpowered for detecting an Arousal × Task interaction.
A model without that interaction, under treatment coding of factor variables, would already estimate 5 parameter values:
an intercept (estimated execution time at reference levels of both factors)
two coefficients for Task (differences from execution time at the reference level for the other 2 levels of Task)
two "main effects" of Arousal (differences from execution time at the reference level for the other 2 levels of Arousal)
The Arousal × Task interaction would seem to be of interest, unless you already know that the association between Arousal and outcome doesn't differ among the Tasks. Including the interaction would require estimating only 4 more fixed-effect coefficients (2 non-reference Arousal levels × 2 non-reference Task levels), for 9 total.
You have 39 subjects with 3 observations each, for a total of 117 observations. Even with the interaction term, that gives you 117/9 = 13 observations per fixed-effect coefficient. If the observations were independent, that would be a bit less than ideal, but not too far from the usual rule of thumb of about 15 observations per parameter estimate for a continuous outcome.
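The coefficient counts above can be checked with a short sketch. It only enumerates the fixed-effect terms implied by treatment coding of two 3-level factors; the level names are hypothetical placeholders, since the question doesn't give them here.

```python
from itertools import product

# Hypothetical level names; the first entry of each list is the reference level.
task_levels = ["task1", "task2", "task3"]
arousal_levels = ["low", "medium", "high"]

# Treatment coding: one coefficient per non-reference level of each factor.
task_terms = [f"Task[{t}]" for t in task_levels[1:]]
arousal_terms = [f"Arousal[{a}]" for a in arousal_levels[1:]]

# Additive model: intercept + Task + Arousal.
additive = ["Intercept"] + task_terms + arousal_terms

# Interaction: one coefficient per pair of non-reference levels.
interaction_terms = [f"{a}:{t}" for a, t in product(arousal_terms, task_terms)]
full = additive + interaction_terms

n_subjects = 39
n_obs = n_subjects * len(task_levels)  # one observation per task per subject

print("additive model coefficients:", len(additive))
print("with interaction:", len(full))
print("observations per coefficient:", n_obs / len(full))
```

This counts fixed-effect coefficients only; the variance parameters of the mixed model (random-intercept and residual variances) are estimated in addition.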
For a pilot study it would seem to be important to include the interaction, even given the limited sample size and the repeated measures. You might have large standard errors for the coefficient estimates, but those can be incorporated into later simulations to establish sample sizes adequate for a later, more definitive study.
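As a rough illustration of how pilot estimates might feed such simulations, here is a minimal sketch. It is a deliberately simplified stand-in, not a mixed-model fit: it simulates a random-intercept data-generating process for two arousal groups and two tasks, and tracks how the sampling variability of one interaction contrast shrinks as the number of subjects per group grows. All numeric values (`sd_subject`, `sd_resid`, `true_interaction`) are hypothetical placeholders for what you would take from the pilot fit.

```python
import random
import statistics

def simulate_interaction_estimate(n_per_group, sd_subject, sd_resid,
                                  true_interaction, rng):
    """Simulate one study and return one estimated interaction contrast.

    Contrast: (task 2 - task 1) in a non-reference arousal group minus the
    same difference in the reference group, from per-subject differences.
    """
    group_means = []
    for group in (0, 1):  # 0 = reference arousal group
        diffs = []
        for _ in range(n_per_group):
            u = rng.gauss(0.0, sd_subject)  # subject-level random intercept
            y_task1 = u + rng.gauss(0.0, sd_resid)
            y_task2 = u + group * true_interaction + rng.gauss(0.0, sd_resid)
            diffs.append(y_task2 - y_task1)
        group_means.append(statistics.fmean(diffs))
    return group_means[1] - group_means[0]

def empirical_se(n_per_group, n_sims=2000, seed=1, **params):
    """Standard deviation of the contrast estimate across simulated studies."""
    rng = random.Random(seed)
    estimates = [
        simulate_interaction_estimate(n_per_group, rng=rng, **params)
        for _ in range(n_sims)
    ]
    return statistics.stdev(estimates)

# Hypothetical values; replace with estimates from the pilot model.
params = dict(sd_subject=1.0, sd_resid=1.0, true_interaction=0.5)

for n in (13, 26, 52):
    print(n, "per group -> empirical SE of contrast:",
          round(empirical_se(n, **params), 3))
```

In this simplified setup the random intercept cancels out of each within-subject difference, so the precision of the contrast depends on the residual standard deviation and the number of subjects per group. A fuller simulation would refit the intended mixed model to each simulated data set and could also propagate the uncertainty in the pilot estimates themselves.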