TL;DR: The factor of 2 comes from the fact that there are 2 Fabry-Pérot cavities; where you used $K_-^2 P_{\mathrm{arm}}/h\nu$, you should have used $G_- P_{\mathrm{arm}}/h\nu$ because that's the number of photons hitting the mirrors; and the factor of $K_-$ is a transfer function between coupled oscillators (the fluctuating pressure noise and the fluctuating antisymmetric output).
You're right that this was not very well explained at all. In fairness to the Martynov et al. paper, its purpose was just a status report. But more broadly, it is very difficult to track down any good explanation of how LIGO works. The full calculation that's usually cited was done by Buonanno and Chen in a full quantum framework. It's pretty complicated, with a view towards modifications that are only now being incorporated into LIGO (e.g., squeezing), so it's not very intuitive for the simpler case that's relevant to the particular equation you cite. The closest thing I've found to an explanation of how Advanced LIGO works is in this paper, which has lots of details that might be enlightening. The following expresses the idea, but is a bit hand-wavy. (And I'm not at all a detector person; I just happened to be surrounded by postdocs including Buonanno and Chen during grad school, so I got a lot of exposure.)
Remember that we only directly care about the differential channel — the difference in lengths of the two Fabry-Pérot cavities. But the signal-recycling mirror couples them together in an antisymmetric way, so that the differential signal bounces around a few times before leaking out. Let's say we look at the behavior over some period of time $\Delta t \sim 1/f$. And suppose we have $N$ photons circulating in the combined signal-recycling and arm-cavity system, with shot noise giving us uncertainty in that number by about $\sqrt{N}$. But those same photons will typically bounce around quite a few times inside the system before leaving to hit the photodetector. So to get the number of times the mirrors are hit by photons in a beam of power $P_{\mathrm{arm}}$, you have to multiply by the number of times those photons bounce around — which is precisely what this $G_-$ factor is meant to represent. ($G$ is meant to evoke "gain".) So if you're looking for the shot noise in the number of photons hitting these mirrors, you need the number of photons times the number of times each photon hits: $G_- P_{\mathrm{arm}} / h\nu$. Since the shot noise in the number of hits is the square root of the number of hits, we get the factor $(G_- P_{\mathrm{arm}} / h\nu)^{1/2}$
The $K_-$ factor is something different. Remember that we're decomposing the signal into frequencies, and just looking at a particular $f$. It might be easier to imagine there's no such thing as shot noise, but you're pumping in fluctuations of the field at that frequency. Because the field is bouncing around so much while gradually leaking out to the photodetector, the number of photons leaking out doesn't instantly respond to your input fluctuations; that number wants to change at its own pace. So what you have is a set of coupled oscillators, driving one while measuring the response of another. You get a transfer function with poles near the natural period of the cavities. If you were to drive the fluctuations at a frequency that the cavities "liked" responding to, you would get large output at the same frequency. So if you want to know how the output field responds to the input — which is actually just natural fluctuations due to shot noise decomposed into frequencies — then you need to multiply by $K_-$.
Updating your derivation with these corrections gives the same result as Martynov et al.'s equation 9.