I want to simulate the following model in a Monte Carlo study, and I am not sure whether my R code generates the data correctly.
Could somebody check it?
The model:
$$Y = \sum_{j=1}^{100} (1+(-1)^{j}A_j X_j + B_j \sin(6X_j)) \sum_{j=1}^{50} (1+X_j/50) + \epsilon$$
where
- \$A_1, \dots, A_{100}\$ are i.i.d. \$\sim \text{Unif}([0.6,1])\$
- \$B_1, \dots, B_{100}\$ are i.i.d. \$\sim \text{Unif}([0.8,1.2])\$ and independent of the \$A_j\$
- \$X \sim \text{Unif}([0,1])\$, where all components of \$X\$ are i.i.d. \$\sim \text{Unif}([0,1])\$
- \$\epsilon \sim N(0,2)\$
- \$X_j\$ represents the \$j\$th column of the design matrix
You can find the model here, p. 14
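This is my attempt:

    n_sim    <- 10        # number of Monte Carlo replications
    n_sample <- 200       # observations per replication
    n_reg    <- 100       # number of regressors
    sd_eps   <- sqrt(2)   # reads N(0, 2) as variance 2

    # Design matrix (n_sample x n_reg) and coefficients, drawn once
    X <- replicate(n_reg, runif(n_sample, 0, 1))
    A <- runif(n_reg, 0.6, 1)
    B <- runif(n_reg, 0.8, 1.2)

    f_1 <- numeric(n_sample)
    f_2 <- numeric(n_sample)

    # First factor: sum over all 100 regressors
    for (d in seq_len(n_reg)) {
      part1 <- 1 + (-1)^d * A[d] * X[, d] + B[d] * sin(6 * X[, d])
      f_1 <- f_1 + part1
    }

    # Second factor: sum over the first 50 regressors
    for (d in seq_len(50)) {
      part2 <- 1 + X[, d] / 50
      f_2 <- f_2 + part2
    }

    # True DGP Train ----
    f_true <- f_1 * f_2

    # One column of y per replication; only the noise changes across columns
    y <- replicate(n_sim, f_true) + replicate(n_sim, rnorm(n_sample, 0, sd_eps))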
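One way to sanity-check the two loops is to compute the same sums with matrix operations and compare. The following is only a sketch: the seed and the names `signs`, `f_1_loop`, `f_1_vec`, `f_2_loop` and `f_2_vec` are my own choices for illustration and are not taken from the paper.

```r
set.seed(1)  # arbitrary seed, only so that the sketch is reproducible

n_sample <- 200
n_reg    <- 100

X <- matrix(runif(n_sample * n_reg), nrow = n_sample, ncol = n_reg)
A <- runif(n_reg, 0.6, 1)
B <- runif(n_reg, 0.8, 1.2)

# Loop version, mirroring the code above
f_1_loop <- numeric(n_sample)
for (d in seq_len(n_reg)) {
  f_1_loop <- f_1_loop + 1 + (-1)^d * A[d] * X[, d] + B[d] * sin(6 * X[, d])
}
f_2_loop <- numeric(n_sample)
for (d in seq_len(50)) {
  f_2_loop <- f_2_loop + 1 + X[, d] / 50
}

# Vectorised version: each sum over j becomes a matrix-vector product
signs   <- (-1)^seq_len(n_reg)
f_1_vec <- n_reg + as.vector(X %*% (signs * A)) + as.vector(sin(6 * X) %*% B)
f_2_vec <- 50 + rowSums(X[, 1:50]) / 50

# Both versions should agree up to floating-point error
stopifnot(isTRUE(all.equal(f_1_loop, f_1_vec)))
stopifnot(isTRUE(all.equal(f_2_loop, f_2_vec)))
```

If both `stopifnot()` calls pass, the loops compute the sums as written in the formula. This does not settle whether `X`, `A` and `B` should be redrawn in every replication.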
`X ~ Unif([0,1])` has an exponent of 100 in the PDF version of the model: `X ~ Unif([0,1]^100)`. I'm not familiar with that notation.

The `[0,1]^100` notation in `X ~ Unif([0, 1]^100)` is just shorthand for a setwise (Cartesian) product. You'll probably have seen `R^3` as shorthand for the set of 3-dimensional real vectors. It means that `X` is a 100-dimensional vector where each component is independently uniformly distributed on the set `[0, 1]`.
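In R terms, a draw from `Unif([0, 1]^100)` is a vector of 100 independent `runif()` values, and several draws stack into a matrix with one row per draw. A small sketch, where `n`, `p` and the seed are illustrative choices of mine:

```r
set.seed(42)  # arbitrary seed, for reproducibility only

n <- 5    # number of draws (rows); illustrative
p <- 100  # dimension of each draw (columns)

# Each row is one draw of X from Unif([0, 1]^100)
X <- matrix(runif(n * p, min = 0, max = 1), nrow = n, ncol = p)

dim(X)    # n rows, p columns
range(X)  # all entries lie in [0, 1]
```

This has the same distribution as the matrix that `replicate(n_reg, runif(n_sample, 0, 1))` builds in the question, so the design matrix there already matches the notation.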