R mgcv package: gam() function help documentation (.doc), 道客多多


Generalized additive models with integrated smoothness estimation

Description

Fits a generalized additive model (GAM) to data, the term "GAM" being taken to include any quadratically penalized GLM. The degree of smoothness of model terms is estimated as part of fitting. gam can also fit any GLM subject to multiple quadratic penalties (including estimation of degree of penalization). Isotropic or scale invariant smooths of any number of variables are available as model terms, as are linear functionals of such smooths; confidence/credible intervals are readily available for any quantity predicted using a fitted model; gam is extendable: users can add smooths.

Smooth terms are represented using penalized regression splines (or similar smoothers) with smoothing parameters selected by GCV/UBRE/AIC/REML or by regression splines with fixed degrees of freedom (mixtures of the two are permitted). Multi-dimensional smooths are available using penalized thin plate regression splines (isotropic) or tensor product splines (when an isotropic smooth is inappropriate). For an overview of the smooths available see smooth.terms. For more on specifying models see gam.models, random.effects and linear.functional.terms. For more on model selection see gam.selection. Do read gam.check and choose.k.
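As a minimal sketch of the model terms described above, the following fits a few GAMs to simulated data. The data, variable names and basis dimensions are illustrative assumptions and are not taken from the help page itself.

```r
library(mgcv)

set.seed(1)
n <- 300
# Simulated covariates and response (illustrative assumption)
dat <- data.frame(x = runif(n), z = runif(n), v = runif(n))
dat$y <- sin(2 * pi * dat$x) + (dat$z - 0.5)^2 + 0.5 * dat$v + rnorm(n, sd = 0.3)

# Penalized smooths of x and z plus a parametric term in v;
# the degree of smoothness is estimated as part of fitting
m1 <- gam(y ~ s(x) + s(z) + v, data = dat, method = "REML")

# Isotropic (thin plate) smooth of two variables
m2 <- gam(y ~ s(x, z) + v, data = dat, method = "REML")

# Tensor product smooth, for when an isotropic smooth is inappropriate
m3 <- gam(y ~ te(x, z) + v, data = dat, method = "REML")

# Mixture: one penalized smooth and one regression spline with fixed d.f.
m4 <- gam(y ~ s(x) + s(z, k = 6, fx = TRUE) + v, data = dat, method = "REML")

summary(m1)

# Approximate intervals for predicted quantities from the standard errors
p <- predict(m1, newdata = dat[1:5, ], se.fit = TRUE)
cbind(fit = p$fit, lower = p$fit - 2 * p$se.fit, upper = p$fit + 2 * p$se.fit)
```

The `fx = TRUE` term is unpenalized, so its degrees of freedom stay fixed at `k - 1` after the identifiability constraint, while the other smooths have their effective degrees of freedom estimated.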

See gam from package gam, for GAMs via the original Hastie and Tibshirani approach (see details for differences to this implementation).

For very large datasets see bam, for mixed GAM see gamm and random.effects.

Usage

gam(formula, family=gaussian(), data=list(), weights=NULL, subset=NULL, na.action, offset=NULL, method="GCV.Cp", optimizer=c("outer","newton"), control=list(), scale=0, select=FALSE, knots=NULL, sp=NULL, min.sp=NULL, H=NULL, gamma=1, fit=TRUE, paraPen=NULL, G=NULL, in.out, ...)

Arguments

Argument: formula
A GAM formula (see formula.gam and also gam.models). This is exactly like the formula for a GLM except that smooth terms, s and te, can be added to the right hand side to specify that the linear predictor depends on smooth functions of predictors (or linear functionals of these).

Argument: family
This is a family object specifying the distribution and link to use in fitting etc. See glm and family for more details. A negative binomial family is provided: see negbin. quasi families actually result in the use of extended quasi-likelihood if method is set to a RE/ML method (McCullagh and Nelder, 1989, 9.6).

Argument: data
A data frame or list containing the model response variable and covariates required by the formula. By default the variables are taken from environment(formula): typically the environment from which gam is called.

Argument: weights
Prior weights on the data.

Argument: subset
An optional vector specifying a subset of observations to be used in the fitting process.

Argument: na.action
A function which indicates what should happen when the data contain NAs. The default is set by the "na.action" setting of "options", and is "na.fail" if that is unset. The factory-fresh default is "na.omit".

Argument: offset
Can be used to supply a model offset for use in fitting. Note that this offset will always be completely ignored when predicting, unlike an offset included in formula: this conforms to the behaviour of lm and glm.

Argument: control
A list of fit control parameters to replace defaults returned by gam.control. Values not set assume default values.

Argument: method
The smoothing parameter estimation method. "GCV.Cp" to use GCV for unknown scale parameter and Mallows' Cp/UBRE/AIC for known scale. "GACV.Cp" is equivalent, but using GACV in place of GCV. "REML" for REML estimation, including of unknown scale, "P-REML" for REML estimation, but using a Pearson estimate of the scale. "ML" and "P-ML" are similar, but using maximum likelihood in place of REML.

Argument: optimizer
An array specifying the numerical optimization method to use to optimize the smoothing parameter estimation criterion (given by method). "perf" for performance iteration. "outer" for the more stable direct approach. "outer" can use several alternative optimizers, specified in the second element of optimizer: "newton" (default), "bfgs", "optim", "nlm" and "nlm.fd" (the latter is based entirely on finite differenced derivatives and is very slow).

Argument: scale
If this is positive then it is taken as the known scale parameter. Negative signals that the scale parameter is unknown. 0 signals that the scale parameter is 1 for Poisson and binomial and unknown otherwise. Note that (RE)ML methods can only work with scale parameter 1 for the Poisson and binomial cases.
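The difference between the offset argument and an offset term in the formula can be illustrated with a small sketch. The simulated exposure data and all variable names here are assumptions made for illustration only.

```r
library(mgcv)

set.seed(2)
n <- 400
# Poisson counts with a known exposure (illustrative assumption)
d <- data.frame(x = runif(n), expo = runif(n, 0.5, 2))
d$y <- rpois(n, d$expo * exp(0.5 + sin(2 * pi * d$x)))

# Offset inside the formula: used in fitting and in prediction
m_form <- gam(y ~ s(x) + offset(log(expo)), family = poisson,
              data = d, method = "REML")

# Offset via the argument: used in fitting, ignored when predicting
m_arg <- gam(y ~ s(x), offset = log(d$expo), family = poisson,
             data = d, method = "REML")

p_form <- predict(m_form, newdata = d, type = "link")
p_arg  <- predict(m_arg,  newdata = d, type = "link")

# The two sets of predictions should differ by (about) the log exposure
summary(as.numeric(p_form - p_arg) - log(d$expo))
```

Both calls fit the same model; only the treatment of the offset at prediction time differs, which matters when predictions are wanted per unit of exposure.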

Argument: select
If this is TRUE then gam can add an extra penalty to each term so that it can be penalized to zero. This means that the smoothing parameter estimation that is part of fitting can completely remove terms from the model. If the corresponding smoothing parameter is estimated as zero then the extra penalty has no effect.

Argument: knots
This is an optional list containing user specified knot values to be used for basis construction. For most bases the user simply supplies the knots to be used, which must match up with the k value supplied (note that the number of knots is not always just k). See tprs for what happens in the "tp"/"ts" case. Different terms can use different numbers of knots, unless they share a covariate.

Argument: sp
A vector of smoothing parameters can be provided here. Smoothing parameters must be supplied in the order that the smooth terms appear in the model formula. Negative elements indicate that the parameter should be estimated, and hence a mixture of fixed and estimated parameters is possible. If smooths share smoothing parameters then length(sp) must correspond to the number of underlying smoothing parameters.
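The select, sp and knots arguments might be used along the following lines. This is a sketch on simulated data; the variable names, the nuisance covariate `junk`, and the chosen knot positions are assumptions for illustration.

```r
library(mgcv)

set.seed(3)
n <- 300
d <- data.frame(x0 = runif(n), x1 = runif(n), junk = runif(n))
d$y <- sin(2 * pi * d$x0) + exp(2 * d$x1) / 4 + rnorm(n, sd = 0.3)

# select = TRUE: an extra penalty per term allows a term to be shrunk to zero
m_sel <- gam(y ~ s(x0) + s(x1) + s(junk), data = d,
             select = TRUE, method = "REML")
summary(m_sel)$s.table   # compare the edf of s(junk) with the other terms

# sp: estimate the first smoothing parameter (negative entry), fix the second
m_sp <- gam(y ~ s(x0) + s(x1), data = d, sp = c(-1, 10))
m_sp$sp

# knots: user-supplied knots for a cubic regression spline ("cr") with k = 6;
# for this basis the number of knots equals k
kn <- list(x0 = seq(0, 1, length.out = 6))
m_kn <- gam(y ~ s(x0, bs = "cr", k = 6) + s(x1), data = d, knots = kn)
m_kn$smooth[[1]]$xp      # knot locations stored with the smooth
```

The entries of sp follow the order of the smooth terms in the formula, so the fixed value 10 applies to s(x1).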

Argument: min.sp
Lower bounds can be supplied for the smoothing parameters. Note that if this option is used then the smoothing parameters full.sp, in the returned object, will need to be added to what is supplied here to get the smoothing parameters actually multiplying the penalties. length(min.sp) should always be the same as the total number of penalties (so it may be longer than sp, if smooths share smoothing parameters).

Argument: H
A user supplied fixed quadratic penalty on the parameters of the GAM can be supplied, with this as its coefficient matrix. A common use of this term is to add a ridge penalty to the parameters of the GAM in circumstances in which the model is close to un-identifiable on the scale of the linear predictor, but perfectly well defined on the response scale.

Argument: gamma
It is sometimes useful to inflate the model degrees of freedom in the GCV or UBRE/AIC score by a constant multiplier. This allows such a multiplier to be supplied.

Argument: fit
If this argument is TRUE then gam sets up the model and fits it, but if it is FALSE then the model is set up and an object G containing what would be required to fit is returned. See argument G.

Argument: paraPen
Optional list specifying any penalties to be applied to parametric model terms. gam.models explains more.

Argument: G
Usually NULL, but may contain the object returned by a previous call to gam with fit=FALSE, in which case all other arguments are ignored except for gamma, in.out, scale, control, method, optimizer and fit.

Argument: in.out
Optional list for initializing outer iteration. If supplied then this must contain two elements: sp should be an array of initialization values for all smoothing parameters (there must be a value for all smoothing parameters, whether fixed or to be estimated, but those for fixed s.p.s are not used); scale is the typical scale of the GCV/UBRE function, for passing to the outer optimizer, or the initial value of the scale parameter, if this is to be estimated by RE/ML.
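A short sketch of the fit, G and gamma arguments follows, assuming simulated Gaussian data with made-up variable names. It sets a model up without fitting, fits it later from the returned object, and then refits with an inflated degrees of freedom multiplier.

```r
library(mgcv)

set.seed(4)
n <- 200
d <- data.frame(x = runif(n))
d$y <- sin(2 * pi * d$x) + rnorm(n, sd = 0.3)

# Set the model up without fitting; G holds what is needed to fit
G <- gam(y ~ s(x), data = d, fit = FALSE)
dim(G$X)                       # model matrix assembled during set-up

# Fit later from the pre-built object, and compare with a direct fit
m_G      <- gam(G = G)
m_direct <- gam(y ~ s(x), data = d)
all.equal(coef(m_G), coef(m_direct))

# gamma > 1 inflates the degrees of freedom in the GCV score,
# which typically favours smoother fits
m_smooth <- gam(y ~ s(x), data = d, gamma = 1.4)
c(default = sum(m_direct$edf), gamma_1.4 = sum(m_smooth$edf))
```

Separating set-up from fitting can be convenient when the model matrix or penalties need to be inspected before the (possibly slow) fit is run.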

Argument: ...
Further arguments for passing on e.g. to gam.fit (such as mustart).

Details

A generalized additive model (GAM) is a generalized linear model (GLM) in which the linear predictor is given by a user specified sum of smooth functions of the covariates plus a conventional parametric component of the linear predictor. A simple example is a model in which the log of the expected response is a sum of smooth functions of two covariates, where the (independent) response variables y_i ~ Poi, and f_1 and f_2 are smooth functions of covariates x_1 and x_2. The log is an example of a link function.

If absolutely any smooth functions were allowed in model fitting then maximum likelihood estimation of such models would invariably result in complex overfitting estimates of f_1 and f_2. For this reason the models are usually fit by penalized likelihood maximization, in which the model (negative log) likelihood is modified by the addition of a penalty for each smooth function, penalizing its "wiggliness". To control the tradeoff between penalizing wiggliness and penalizing badness of fit each penalty is multiplied by an associated smoothing parameter: how to estimate these parameters, and how to practically represent the smooth functions, are the main statistical questions introduced by moving from GLMs to GAMs.
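The displayed equation for the simple example did not survive in this copy. A form consistent with the surrounding description, offered as a reconstruction under that assumption (any intercept or parametric part is omitted here), would be:

```latex
\log\{E(y_i)\} = f_1(x_{1i}) + f_2(x_{2i}), \qquad y_i \sim \mathrm{Poi}.
```

The penalized likelihood idea can be sketched in the same notation. Here $l$ is the log likelihood, $\lambda_1, \lambda_2$ are the smoothing parameters, and $J$ is a wiggliness measure; the integrated squared second derivative used below is an assumed, common choice rather than something stated in the text:

```latex
(\hat f_1, \hat f_2) = \arg\min_{f_1, f_2}\;
  -l(f_1, f_2) + \lambda_1 J(f_1) + \lambda_2 J(f_2),
\qquad J(f) = \int f''(x)^2 \, dx .
```

Large $\lambda_j$ forces $f_j$ towards a straight line, while $\lambda_j \to 0$ leaves $f_j$ free to overfit.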

The mgcv implementation of gam represents the smooth functions using penalized regression splines, and by default uses basis functions for these splines that are designed to be optimal, given the number of basis functions used. The smooth terms can be functions of any number of covariates and the user has some control over how smoothness of the functions is measured.
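To make the idea of a penalized regression spline concrete, here is a minimal base R sketch for a single Gaussian smooth with a fixed smoothing parameter. It assumes a B-spline basis with a second-difference penalty on the coefficients (a P-spline style construction), which is not the thin plate basis that mgcv uses by default; the data and the helper `fit_pen` are hypothetical.

```r
library(splines)  # ships with R; used here for a B-spline basis

set.seed(5)
n <- 200
x <- sort(runif(n))
y <- sin(2 * pi * x) + rnorm(n, sd = 0.3)

# Basis representation: f(x) = sum_j beta_j * b_j(x)
X <- bs(x, df = 20, intercept = TRUE)

# Quadratic wiggliness penalty beta' S beta from second differences
D <- diff(diag(ncol(X)), differences = 2)
S <- crossprod(D)

# Penalized least squares fit for a given smoothing parameter lambda
fit_pen <- function(lambda) {
  A <- crossprod(X) + lambda * S
  beta <- solve(A, crossprod(X, y))
  # effective degrees of freedom: trace of the influence (hat) matrix
  edf <- sum(diag(solve(A, crossprod(X))))
  list(beta = drop(beta), edf = edf, fitted = drop(X %*% beta))
}

rough_fit  <- fit_pen(1e-6)  # almost unpenalized
smooth_fit <- fit_pen(100)   # heavily penalized
c(edf_small_lambda = rough_fit$edf, edf_large_lambda = smooth_fit$edf)
```

As lambda grows the effective degrees of freedom fall towards 2, the dimension of the straight-line functions that the second-difference penalty leaves unpenalized.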

gam in mgcv solves the smoothing parameter estimation problem by using the Generalized Cross Validation (GCV) criterion

GCV = n D / (n - DoF)^2

or an Un-Biased Risk Estimator (UBRE) criterion

UBRE = D/n + 2 s DoF/n - s

where D is the deviance, n the number of data, s the scale parameter and DoF the effective degrees of freedom of the model. Notice that UBRE is effectively just AIC rescaled, but is only used when s is known.

Alternatives are GACV, or a Laplace approximation to REML. There is some evidence that the latter may actually be the most effective choice.

Smoothing parameters are chosen to minimize the GCV, UBRE/AIC, GACV or REML scores for the model, and the main computational challenge solved by the mgcv package is to do this efficiently and reliably. Various alternative numerical methods are provided which can be set by argument optimizer.

Broadly gam works by first constructing basis functions and one or more quadratic penalty coefficient matrices for each smooth term in the model formula, obtaining a model matrix for the strictly parametric part of the model formula, and combining these to obtain a complete model matrix (or design matrix) and a set of penalty matrices for the smooth terms. Some linear identifiability constraints are also obtained at this point. The model is fit using gam.fit, a modification of glm.fit. The GAM penalized likelihood maximization problem is solved by Penalized Iteratively Reweighted Least Squares (P-IRLS) (see e.g. Wood 2000). Smoothing parameter selection is integrated in one of two ways.

(i) "Performance iteration" uses the fact that at each P-IRLS iteration a penalized weighted least squares problem is solved, and the smoothing parameters of that problem can be estimated by GCV or UBRE. Eventually, in most cases, both model parameter estimates and smoothing parameter estimates converge.

(ii) Alternatively the P-IRLS scheme is iterated to convergence for each trial set of smoothing parameters, and GCV, UBRE or REML scores are only evaluated on convergence: optimization is then "outer" to the P-IRLS loop. In this case the P-IRLS iteration has to be differentiated, to facilitate optimization, and gam.fit3 is used in place of gam.fit.

The default is the second method, outer iteration.
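The GCV and UBRE scores can be computed by hand from a fitted model and set beside the value stored in the fitted object. The sketch below assumes simulated data, the default gamma = 1, and that the reported score is held in the `gcv.ubre` component; the last call shows one way the optimizer argument might be set.

```r
library(mgcv)

set.seed(6)
n <- 300
d <- data.frame(x0 = runif(n), x1 = runif(n))
d$y <- sin(2 * pi * d$x0) + d$x1^2 + rnorm(n, sd = 0.3)
d$count <- rpois(n, exp(1 + sin(2 * pi * d$x0)))

# Gaussian model, unknown scale: GCV = n D / (n - DoF)^2
m <- gam(y ~ s(x0) + s(x1), data = d, method = "GCV.Cp")
D   <- deviance(m)   # residual sum of squares in the Gaussian case
DoF <- sum(m$edf)    # effective degrees of freedom of the whole model
gcv_by_hand <- n * D / (n - DoF)^2
c(by_hand = gcv_by_hand, reported = as.numeric(m$gcv.ubre))

# Poisson model, known scale s = 1: UBRE = D/n + 2 s DoF/n - s
mp <- gam(count ~ s(x0) + s(x1), family = poisson, data = d,
          method = "GCV.Cp")
s <- 1
ubre_by_hand <- deviance(mp) / n + 2 * s * sum(mp$edf) / n - s
c(by_hand = ubre_by_hand, reported = as.numeric(mp$gcv.ubre))

# Outer iteration with an alternative optimizer (second element of optimizer)
m_bfgs <- gam(count ~ s(x0) + s(x1), family = poisson, data = d,
              method = "REML", optimizer = c("outer", "bfgs"))
```

Comparing the hand-computed and reported scores is a quick sanity check that D, n, s and DoF are being read off the fitted object as intended.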
