$\begingroup$

I would like to compare several groups to a reference group, with the main idea being to show that the other groups are not inferior to the reference. Ideally, I would also like to test for superiority where non-inferiority is shown.

The sample size is very small: around 20 participants per group. The study was designed without any sample size calculation, and no non-inferiority margin was pre-specified. The investigators, who had no prior experience, concluded non-inferiority simply because the superiority test p-value was >0.05. I believe the analysis needs to be redone, as the study is not publishable in its current form.

The context is a study in medically assisted procreation (MAP), comparing the number of oocytes retrieved across 5 groups (corresponding to 5 phases of the menstrual cycle).

I have several questions:

  1. Could such a paper be publishable, even though non-inferiority margins were only defined a posteriori?

  2. The working hypothesis is that treatment could begin at any menstrual phase (not necessarily phase 1) without losing efficacy. Therefore, comparisons are only needed versus the first group. Should I run four separate tests? A global test?

  3. Should I correct p-values for multiple comparisons? (These should technically be independent tests, right? So no correction required?)

  4. For curiosity: how should I calculate the sample size needed for such a hypothesis? Should I compute the required N for each comparison independently and then retain the largest?

    From the observed confidence intervals of the differences between groups (which are too wide), it is already clear that the current sample size is insufficient. I am considering a Bayesian analysis, but I only have theoretical knowledge and have never elicited priors in a real study. What would you recommend regarding priors? Should I set 4 priors for the mean differences in the 4 comparisons? Would that also require 4 priors for the standard deviations? Would non-informative priors be more appropriate? What is the best way to elicit priors without being biased by the current data?

  5. Could such an analysis be implemented in R with JAGS?

The overall idea is to show that the other phases (2, 3, 4, and 5) are not inferior to phase 1, with a non-inferiority margin of 2 oocytes. I would also like to test for superiority where non-inferiority holds (again with a delta of 2 oocytes).

$\endgroup$
  • $\begingroup$ I fixed the formatting; the way you had it, the questions looked cut off. I think we need more information about the design. How did it work? Were different women tracked for different amounts of time (menstrual cycles)? Were the same women assessed multiple times? $\endgroup$ Commented Oct 3 at 11:33
  • $\begingroup$ The investigators did not have a proper study design; they only had a hypothesis, which initially was: could we start treatment at any phase of the menstrual cycle while maintaining the same efficacy? In that case, an equivalence study without a reference group might have been appropriate. However, looking more closely at their actual research question, it is more about: could we start treatment at any phase of the cycle rather than only at the beginning? That led me to think we could frame this as an equivalence/non-inferiority study, comparing each phase to phase 1. The groups are independent. $\endgroup$ Commented Oct 3 at 12:06
  • $\begingroup$ But we would rather go for a non-inferiority study. $\endgroup$ Commented Oct 3 at 12:07

1 Answer

$\begingroup$

To answer your questions in order:

  1. Yes, this could be a publishable paper. The fact that the non-inferiority margins were defined post-hoc (or not) is not really relevant. What is relevant is that these margins are defensible. Usually, they come from domain-expert consensus. So, can you find papers which used/defined a similar non-inferiority criterion? Or can you convene a panel of domain experts and get them to agree on your criterion? Or can you at least provide reasoning based on sound medical judgment? If the non-inferiority margin was pulled out of a hat (or an even darker place), then it does not matter whether that was done pre- or post-hoc. It will be challenged, and it may not fly.
  2. I do not know of an omnibus non-inferiority test (and I cannot even conceive of how one would work). Say you ran an ANOVA: the best you could achieve is a failure to reject the null hypothesis, which proves nothing (perhaps only that your test was underpowered); it does not "prove" your research hypothesis. You will need to run 4 tests, and for each you will need to reject the null, which will be $\mu_i - \mu_0 \le -\text{margin}$ (where $\mu_0$ is the mean of the first group, and $\mu_i$, $i = 1, \dots, 4$, are the means of the other 4 groups). If you fail to reject for even 1, then you cannot accept your study hypothesis.
  3. Yes, you should correct for multiple comparisons, since you need all 4 rejections. Any false positive would wrongly push you toward accepting the study hypothesis, so you need to correct for this if you want to achieve the stated confidence level.
  4. Yes, you should pick the largest required N ($N$ can depend on both the difference in means relative to the first group and the variance of each of the other groups).
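To make points 2 and 3 concrete, here is a minimal sketch (in Python rather than R; the data are simulated placeholders, not the study's oocyte counts): each non-inferiority test is a one-sided Welch t-test of phase $i$ shifted by the margin against phase 1, and the four p-values then get a Holm step-down correction.

```python
# Sketch of the 4 one-sided non-inferiority tests with a Holm correction.
# Data are simulated placeholders -- substitute the real oocyte counts.
import numpy as np
from scipy import stats

rng = np.random.default_rng(42)
margin = 2.0  # non-inferiority margin of 2 oocytes, from the question

# Placeholder data: reference (phase 1) and phases 2-5, n = 20 each.
ref = rng.normal(10, 4, 20)
groups = [rng.normal(10, 4, 20) for _ in range(4)]

pvals = []
for g in groups:
    # H0: mu_i - mu_ref <= -margin  vs  H1: mu_i - mu_ref > -margin.
    # Shift group i by +margin, then run a one-sided Welch t-test.
    t, p = stats.ttest_ind(g + margin, ref, equal_var=False,
                           alternative="greater")
    pvals.append(p)

# Holm step-down correction: non-inferiority is claimed only if
# every one of the four adjusted p-values falls below alpha.
order = np.argsort(pvals)
adj = np.empty(4)
running_max = 0.0
for rank, idx in enumerate(order):
    running_max = max(running_max, (4 - rank) * pvals[idx])
    adj[idx] = min(1.0, running_max)
print(adj)
```

With real data you would replace `ref` and `groups` by the observed counts per phase; `statsmodels.stats.multitest.multipletests(pvals, method="holm")` gives the same adjustment.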
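For point 4, a rough per-group sample-size sketch under the usual normal approximation. The SD of 4 oocytes and the assumption that the phases are truly equal are illustrative placeholders, and a Bonferroni-adjusted one-sided alpha of 0.05/4 is assumed; you would repeat this per comparison and keep the largest N.

```python
# Rough per-group sample size for one one-sided non-inferiority
# comparison (two independent groups, equal n), normal approximation.
import math
from scipy import stats

def n_noninferiority(sigma, margin, true_diff=0.0, alpha=0.025, power=0.80):
    """Per-group n to reject H0: mu_i - mu_ref <= -margin."""
    z_a = stats.norm.ppf(1 - alpha)
    z_b = stats.norm.ppf(power)
    effect = margin + true_diff  # distance of the truth from the null boundary
    return 2 * (sigma * (z_a + z_b) / effect) ** 2

# Placeholder example: sd of 4 oocytes, margin of 2 oocytes,
# Bonferroni alpha of 0.05/4, phases assumed truly equal.
n = math.ceil(n_noninferiority(sigma=4.0, margin=2.0, alpha=0.05 / 4))
print(n)  # far more than the 20 per group actually enrolled
```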

You then state that the observed CIs are wider than they should be (i.e. they extend beyond $-\text{margin}$). Did you use one-sided intervals? For non-inferiority, you can and you should; this will increase your power (the same applies, by the way, to superiority testing). Now, if even with one-sided tests the CIs still extend below $-\text{margin}$, then yes, you will need a larger sample size.
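As an illustration of the one-sided interval (again with simulated placeholder data): a one-sided 95% lower confidence bound on $\mu_i - \mu_0$ using the Welch standard error. Non-inferiority is claimed if that bound exceeds $-\text{margin}$, and because only the lower limit is needed, it sits higher than the lower limit of a two-sided 95% interval.

```python
# One-sided 95% lower confidence bound for mu_i - mu_ref (Welch).
# Data are simulated placeholders -- substitute the real counts.
import numpy as np
from scipy import stats

rng = np.random.default_rng(1)
margin = 2.0
ref = rng.normal(10, 4, 20)  # phase 1
grp = rng.normal(10, 4, 20)  # one of phases 2-5

diff = grp.mean() - ref.mean()
# Welch standard error and Welch-Satterthwaite degrees of freedom
v1, v2 = grp.var(ddof=1) / 20, ref.var(ddof=1) / 20
se = np.sqrt(v1 + v2)
df = (v1 + v2) ** 2 / (v1 ** 2 / 19 + v2 ** 2 / 19)
lower = diff - stats.t.ppf(0.95, df) * se
print(lower)  # claim non-inferiority if lower > -margin
```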

  5. I am not a Bayesian statistician, so I cannot comment properly on this. But I would be quite surprised if credible intervals were substantially narrower (they tend to give similar answers). And if they were, you would have to defend your subjective prior (in addition to the non-inferiority margin). But I will let “true” Bayesians comment.
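To illustrate why vague priors tend to give similar answers: with a flat prior on the mean difference, the posterior is approximately normal around the observed difference with the Welch standard error, so the posterior probability of non-inferiority can be read off directly. This is only a sketch with simulated placeholder data; a full model (e.g. in JAGS or Stan, as asked in question 5) would also place priors on the group SDs.

```python
# Minimal Bayesian sketch for one comparison under a vague (flat)
# prior on the mean difference: posterior ~ Normal(diff, se^2).
# Data are simulated placeholders.
import numpy as np
from scipy import stats

rng = np.random.default_rng(7)
margin = 2.0
ref = rng.normal(10, 4, 20)
grp = rng.normal(10, 4, 20)

diff = grp.mean() - ref.mean()
se = np.sqrt(grp.var(ddof=1) / 20 + ref.var(ddof=1) / 20)

# Posterior probability of non-inferiority, P(mu_i - mu_ref > -margin)
p_noninf = stats.norm.sf(-margin, loc=diff, scale=se)
print(round(p_noninf, 3))
```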
$\endgroup$
