generate.data {abess}  R Documentation 
Generate simulated data under the generalized linear model and the Cox proportional hazards model.
generate.data( n, p, support.size = NULL, rho = 0, family = c("gaussian", "binomial", "poisson", "cox", "mgaussian", "multinomial"), beta = NULL, cortype = 1, snr = 10, sigma = NULL, weibull.shape = 1, uniform.max = 1, y.dim = 3, class.num = 3, seed = 1 )
n 
The number of observations. 
p 
The number of predictors of interest. 
support.size 
The number of nonzero coefficients in the underlying regression
model. Can be omitted if beta is supplied.
rho 
A parameter used to characterize the pairwise correlation in
predictors. Default is rho = 0.
family 
The distribution of the simulated response. 
beta 
The coefficient values in the underlying regression model.
If it is supplied, support.size can be omitted.
cortype 
The correlation structure.

snr 
A numerical value controlling the signal-to-noise ratio (SNR). The SNR is defined
as the variance of xβ divided
by the variance of the Gaussian noise: \frac{Var(xβ)}{σ^2}.
The Gaussian noise ε has mean 0 and variance σ^2.
The noise is added to the linear predictor η = xβ. Default is snr = 10.
sigma 
The variance of the Gaussian noise. Default is sigma = NULL, in which case the noise variance is determined by snr.
weibull.shape 
The shape parameter of the Weibull distribution.
It is used only when family = "cox". Default is weibull.shape = 1.
uniform.max 
A parameter controlling the censoring rate.
A larger value implies a smaller censoring rate,
and a smaller value implies a larger censoring rate.
It is used only when family = "cox".
y.dim 
The dimension of the response. It is used only when family = "mgaussian". Default is y.dim = 3.
class.num 
The number of classes. It is used only when family = "multinomial". Default is class.num = 3.
seed 
Random seed. Default is seed = 1.
For family = "gaussian", the data model is
Y = X β + ε.
The nonzero entries of the regression coefficient β are drawn from the uniform distribution on [m, 100m], where m = 5\sqrt{2\log(p)/n}.
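The Gaussian model can be sketched in base R as follows. This is a minimal illustration under stated assumptions, not the abess implementation: the helper sim_gaussian is hypothetical, the predictors are independent standard normals (the rho = 0 case), the support is a random subset of the p columns, and σ^2 is chosen so that Var(Xβ)/σ^2 equals snr.

```r
# Hypothetical helper (not part of abess): simulate Y = X beta + epsilon.
sim_gaussian <- function(n, p, support.size, snr = 10, seed = 1) {
  set.seed(seed)
  # Assumption: independent N(0, 1) predictors (rho = 0).
  x <- matrix(rnorm(n * p), nrow = n, ncol = p)
  # Coefficient scale m = 5 * sqrt(2 * log(p) / n).
  m <- 5 * sqrt(2 * log(p) / n)
  beta <- numeric(p)
  # Assumption: the support is a random subset of the p columns.
  support <- sample.int(p, support.size)
  beta[support] <- runif(support.size, min = m, max = 100 * m)
  eta <- drop(x %*% beta)
  # Noise variance chosen so that Var(x beta) / sigma^2 equals snr.
  sigma2 <- var(eta) / snr
  y <- eta + rnorm(n, mean = 0, sd = sqrt(sigma2))
  list(x = x, y = y, beta = beta)
}

sim <- sim_gaussian(n = 200, p = 20, support.size = 5)
str(sim)
```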
For family = "binomial", the data model is
Prob(Y = 1) = \exp(X β + ε)/(1 + \exp(X β + ε)).
The nonzero entries of the regression coefficient β are drawn from the uniform distribution on [2m, 10m], where m = 5\sqrt{2\log(p)/n}.
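A minimal base-R sketch of the binomial model, under the same assumptions as a plain illustration (hypothetical helper sim_binomial, independent standard normal predictors, a random support, σ^2 set from snr). It uses stats::plogis, which computes exp(z)/(1 + exp(z)), and then draws each Y as a Bernoulli variable with that probability.

```r
# Hypothetical helper (not part of abess):
# Prob(Y = 1) = exp(eta + eps) / (1 + exp(eta + eps)), with eta = X beta.
sim_binomial <- function(n, p, support.size, snr = 10, seed = 1) {
  set.seed(seed)
  x <- matrix(rnorm(n * p), nrow = n, ncol = p)  # assumption: rho = 0
  m <- 5 * sqrt(2 * log(p) / n)
  beta <- numeric(p)
  support <- sample.int(p, support.size)  # assumption: random support
  beta[support] <- runif(support.size, min = 2 * m, max = 10 * m)
  eta <- drop(x %*% beta)
  eps <- rnorm(n, mean = 0, sd = sqrt(var(eta) / snr))
  prob <- plogis(eta + eps)  # logistic transform of the noisy predictor
  y <- rbinom(n, size = 1, prob = prob)  # one Bernoulli draw per row
  list(x = x, y = y, beta = beta, prob = prob)
}

sim <- sim_binomial(n = 200, p = 20, support.size = 5)
table(sim$y)
```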
For family = "poisson", the data are modeled to have
an exponential distribution:
Y = Exp(\exp(X β + ε)).
The nonzero entries of the regression coefficient β are drawn from the uniform distribution on [2m, 10m], where m = \sqrt{2\log(p)/n}/3.
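The sketch below builds only the parameter \exp(X β + ε) described above, using the smaller coefficient scale m = \sqrt{2\log(p)/n}/3; it deliberately stops before drawing Y. The helper poisson_mean is hypothetical, and the predictor, support, and snr-based noise choices are assumptions carried over for illustration.

```r
# Hypothetical helper (not part of abess): computes exp(X beta + eps),
# the parameter from which the "poisson" response is generated.
poisson_mean <- function(n, p, support.size, snr = 10, seed = 1) {
  set.seed(seed)
  x <- matrix(rnorm(n * p), nrow = n, ncol = p)  # assumption: rho = 0
  m <- sqrt(2 * log(p) / n) / 3  # smaller scale than the other families
  beta <- numeric(p)
  support <- sample.int(p, support.size)  # assumption: random support
  beta[support] <- runif(support.size, min = 2 * m, max = 10 * m)
  eta <- drop(x %*% beta)
  eps <- rnorm(n, mean = 0, sd = sqrt(var(eta) / snr))
  list(x = x, beta = beta, mu = exp(eta + eps))
}

pm <- poisson_mean(n = 200, p = 20, support.size = 5)
summary(pm$mu)
```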
For family = "cox", the model for the failure time T is
T = (-\log(U) / \exp(X β))^{1/weibull.shape},
where U is a uniform random variable on [0, 1]. The censoring time C is drawn from the uniform distribution on [0, uniform.max]; the censoring status is then δ = I(T ≤ C) and the observed time is R = \min\{T, C\}. The nonzero entries of the regression coefficient β are drawn from the uniform distribution on [2m, 10m], where m = 5\sqrt{2\log(p)/n}.
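The Cox simulation can be sketched as below. It uses T = (-\log U / \exp(Xβ))^{1/weibull.shape}, the usual inverse-transform construction for failure times with a Weibull baseline hazard, together with uniform censoring on [0, uniform.max]. The helper sim_cox is hypothetical; the standard normal predictors and random support are assumptions, and the actual generate.data code may differ.

```r
# Hypothetical helper (not part of abess): Weibull failure times with
# uniform censoring, returned as a two-column (time, status) matrix.
sim_cox <- function(n, p, support.size, weibull.shape = 1,
                    uniform.max = 1, seed = 1) {
  set.seed(seed)
  x <- matrix(rnorm(n * p), nrow = n, ncol = p)  # assumption: rho = 0
  m <- 5 * sqrt(2 * log(p) / n)
  beta <- numeric(p)
  support <- sample.int(p, support.size)  # assumption: random support
  beta[support] <- runif(support.size, min = 2 * m, max = 10 * m)
  u <- runif(n)  # U ~ Uniform(0, 1)
  # Failure time T = (-log(U) / exp(X beta))^(1 / weibull.shape).
  fail <- (-log(u) / exp(drop(x %*% beta)))^(1 / weibull.shape)
  cens <- runif(n, min = 0, max = uniform.max)  # censoring time C
  status <- as.integer(fail <= cens)  # delta = I(T <= C)
  time <- pmin(fail, cens)  # observed time R = min(T, C)
  list(x = x, y = cbind(time = time, status = status), beta = beta)
}

sim <- sim_cox(n = 200, p = 20, support.size = 5)
1 - mean(sim$y[, "status"])  # empirical censoring rate
```

Increasing uniform.max pushes the censoring times later, so more failures are observed, which matches the description of that argument.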
For family = "mgaussian", the data model is
Y = X β + E.
The nonzero entries of the regression matrix β are drawn from the uniform distribution on [m, 100m], where m = 5\sqrt{2\log(p)/n}.
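A minimal sketch of the multivariate Gaussian model with q = y.dim response columns. Because E has covariance σ^2 I_{q×q}, its columns can be drawn as independent N(0, σ^2) vectors. The helper sim_mgaussian and its argument noise.sd are hypothetical; treating the support as whole rows of β (the same predictors drive every response) and using a fixed noise level instead of deriving it from snr are simplifying assumptions.

```r
# Hypothetical helper (not part of abess): Y = X B + E, E ~ MVN(0, sigma^2 I_q).
sim_mgaussian <- function(n, p, support.size, y.dim = 3,
                          noise.sd = 1, seed = 1) {
  set.seed(seed)
  x <- matrix(rnorm(n * p), nrow = n, ncol = p)  # assumption: rho = 0
  m <- 5 * sqrt(2 * log(p) / n)
  B <- matrix(0, nrow = p, ncol = y.dim)
  # Assumption: the support is a set of whole rows of B.
  support <- sample.int(p, support.size)
  B[support, ] <- runif(support.size * y.dim, min = m, max = 100 * m)
  # Identity covariance: the y.dim noise columns are independent normals.
  E <- matrix(rnorm(n * y.dim, sd = noise.sd), nrow = n, ncol = y.dim)
  list(x = x, y = x %*% B + E, beta = B)
}

sim <- sim_mgaussian(n = 200, p = 20, support.size = 5)
dim(sim$y)
```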
For family = "multinomial", the data model is
Prob(Y = 1) = \exp(X β + E)/(1 + \exp(X β + E)).
The nonzero entries of the regression coefficient β are drawn from the uniform distribution on [2m, 10m], where m = 5\sqrt{2\log(p)/n}.
In the above models, ε \sim N(0, σ^2) and E \sim MVN(0, σ^2 \times I_{q \times q}),
where σ^2 is determined by snr and q is y.dim.
A list object comprising:
x 
Design matrix of predictors. 
y 
Response variable. 
beta 
The coefficients used in the underlying regression model. 
Jin Zhu
# Generate simulated data
library(abess)
n <- 200
p <- 20
support.size <- 5
dataset <- generate.data(n, p, support.size)
str(dataset)