
Journal of University of Chinese Academy of Sciences ›› 2009, Vol. 26 ›› Issue (2): 173-184. DOI: 10.7523/j.issn.2095-6134.2009.2.005

• Article

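Semiparametric statistical inference for grouped zero-inflated Poisson models

ZHONG Yu-Ke1, XUE Hong-Qi2, ZHANG San-Guo1

  1. School of Mathematical Sciences, Graduate University of the Chinese Academy of Sciences, Beijing 100049, China;
  2. Department of Biostatistics, University of Rochester, U.S.A.
  • Received: 2008-05-15  Revised: 2008-09-03  Published: 2009-03-15
  • Corresponding author: ZHANG San-Guo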

Semiparametric inference of grouped zero-inflated Poisson models

ZHONG Yu-Ke1, XUE Hong-Qi2, ZHANG San-Guo1   

  1. School of Mathematical Sciences, Graduate University of the Chinese Academy of Sciences, Beijing 100049, China;
  2. Department of Biostatistics and Computational Biology, University of Rochester, U.S.A.
  • Received: 2008-05-15  Revised: 2008-09-03  Published: 2009-03-15

Abstract (translated from the Chinese):

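Poisson regression models are widely used to study count data. In real data, however, the proportion of zeros may far exceed the probability of a zero under the Poisson distribution, and these zeros usually carry a special meaning. Count data may also be grouped: either the observed value is not known exactly but only known to fall within some interval, or certain data, such as wages, are deliberately grouped before analysis. We consider a semiparametric zero-inflated Poisson regression model for such grouped count data. In this model, the Poisson mean is related to the covariates through a partially linear link function, and the probability of a zero is related to the covariates through a linear link function. The parameters and the nonparametric function of the regression model are estimated by Sieve maximum likelihood, and a score test is proposed for the presence of zero inflation. Under certain regularity conditions, the asymptotic properties of the Sieve maximum likelihood estimators are obtained: the estimators of the parametric part are shown to be strongly consistent, asymptotically normal, and asymptotically efficient, and the estimator of the nonparametric function attains the optimal convergence rate. Simulation studies show that both the estimation and the testing procedures perform well. Finally, the model and the inference methods are applied to a real data set from the field of public health.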

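Key words (translated from the Chinese): zero-inflated Poisson regression model, partially linear model, Sieve maximum likelihood estimation, strong consistency, asymptotic efficiency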
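
For readers unfamiliar with the model class, the setup described in the abstract can be written out roughly as follows. This is only a sketch under assumptions: the abstract does not give the exact link functions, so the log link for the Poisson mean, the logit link for the zero-inflation probability, and all symbols ($x_i$, $z_i$, $t_i$, $\beta$, $\gamma$, $g$, $a_i$, $b_i$) are illustrative choices rather than the authors' notation.

```latex
% Zero-inflated Poisson (ZIP) distribution with mixing probability p and Poisson mean \lambda
P(Y = 0) = p + (1 - p)\,e^{-\lambda}, \qquad
P(Y = k) = (1 - p)\,\frac{e^{-\lambda}\lambda^{k}}{k!}, \quad k = 1, 2, \dots

% Assumed links (log and logit are illustrative choices, not stated in the abstract):
% a partially linear predictor for the Poisson mean, a linear predictor for p
\log \lambda_i = x_i^{\top}\beta + g(t_i), \qquad
\operatorname{logit}(p_i) = z_i^{\top}\gamma

% Likelihood contribution of an observation known only to satisfy a_i \le Y_i \le b_i
P(a_i \le Y_i \le b_i) = p_i\,\mathbf{1}\{a_i = 0\}
  + (1 - p_i)\sum_{k = a_i}^{b_i} \frac{e^{-\lambda_i}\lambda_i^{k}}{k!}
```

In a sieve approach, the unknown function $g$ would be replaced by a finite-dimensional approximation whose dimension grows with the sample size, and the resulting grouped likelihood maximized over $(\beta, \gamma)$ and the approximation coefficients. Which approximating family the paper uses is not stated in the abstract.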

Abstract:

In count data, the incidence of zero counts is often greater than expected under the Poisson distribution, and the zero counts frequently have a special status. Count data may also be grouped, meaning that for some observations the count is not known exactly but is known to fall in a particular range. This paper considers a semiparametric zero-inflated Poisson (ZIP) model for such grouped data with excess zeros, in which a partially linear link function is used for the mean of the Poisson distribution and a linear link function is used for the probability of a zero. A Sieve maximum likelihood estimator (MLE) is proposed to estimate both the regression parameters and the nonparametric function, and a score test is provided for the presence of excess zeros. Asymptotic properties of the proposed Sieve MLEs are established. Under some mild conditions, the estimators are shown to be strongly consistent. Moreover, the estimators of the unknown parameters are asymptotically efficient and asymptotically normal, and the estimator of the nonparametric function attains the optimal convergence rate. Simulation studies are carried out to investigate the performance of the proposed method. For illustration, the method is applied to a data set from a public health survey.

Key words: zero-inflated Poisson model, partial linear models, Sieve maximum likelihood estimator, strongly consistent, asymptotically efficient
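
As a rough illustration of the grouped ZIP likelihood mentioned in the abstract, the following Python sketch evaluates the log-likelihood when each count is only known to lie in an interval. It is not the authors' implementation: the function names are hypothetical, the Poisson means `lams` and zero-inflation probabilities `ps` are taken as given rather than computed from covariates, and the sieve estimation step is omitted.

```python
import math


def poisson_pmf(k, lam):
    """P(Y = k) for Y ~ Poisson(lam), lam > 0, computed on the log scale."""
    return math.exp(-lam + k * math.log(lam) - math.lgamma(k + 1))


def poisson_interval_prob(lower, upper, lam):
    """P(lower <= Y <= upper) for Y ~ Poisson(lam).

    upper=None denotes an open-ended group (lower or more).
    """
    if upper is None:
        # Complement of P(Y <= lower - 1)
        return 1.0 - sum(poisson_pmf(k, lam) for k in range(lower))
    return sum(poisson_pmf(k, lam) for k in range(lower, upper + 1))


def grouped_zip_interval_prob(lower, upper, lam, p):
    """P(lower <= Y <= upper) under a ZIP model with zero-inflation probability p."""
    prob = (1.0 - p) * poisson_interval_prob(lower, upper, lam)
    if lower == 0:
        # Structural zeros belong to every group that contains 0
        prob += p
    return prob


def grouped_zip_loglik(intervals, lams, ps):
    """Log-likelihood of grouped observations given per-observation lam and p."""
    return sum(
        math.log(grouped_zip_interval_prob(lo, hi, lam, p))
        for (lo, hi), lam, p in zip(intervals, lams, ps)
    )


if __name__ == "__main__":
    # Hypothetical groups: exactly 0, between 1 and 2, and 3 or more
    intervals = [(0, 0), (1, 2), (3, None)]
    print(grouped_zip_loglik(intervals, lams=[2.0, 2.0, 2.0], ps=[0.3, 0.3, 0.3]))
```

In a full analysis, `lams` and `ps` would be functions of the covariates and the unknown parameters, and this log-likelihood would be maximized numerically over those parameters.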
