Songmin Xie

Focus on Bioinformatics and Informatics

[Continued...] Spatial binning methods partition the space into regular blocks and count the number of samples in each block. The population density estimate within each block is the number of samples per unit volume for that block.
Properties: Only discrete density estimates are obtained, but they are normalised. There is an awkward trade-off between bin size and the quality of the density estimate: small bins give noisy counts, while large bins blur detail, which leaves the problem of finding the optimal bin size. Quality is very poor with small samples or when the number of variables is large. The method is really only suitable for purely categorical data, which are already naturally discrete.
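As a rough illustration of the binning idea, here is a minimal sketch in Python. The function name `histogram_density`, the cubic bins of side `bin_width`, and the division by the sample size (so that the estimate integrates to one) are my own assumptions for the example, not something fixed by the description above.

```python
from collections import Counter
from math import floor


def histogram_density(samples, bin_width):
    """Build a spatial-binning density estimate (illustrative sketch).

    samples   : list of equal-length tuples of floats
    bin_width : side length of each cubic block (assumed the same on every axis)
    Returns (density, counts): a lookup function and the per-block counts.
    """
    n = len(samples)
    dim = len(samples[0])
    cell_volume = bin_width ** dim

    def cell_of(point):
        # Integer index of the block that contains the point.
        return tuple(floor(x / bin_width) for x in point)

    counts = Counter(cell_of(p) for p in samples)

    def density(point):
        # Samples per unit volume in the block, divided by n (assumption)
        # so that the estimate integrates to one over the whole space.
        return counts[cell_of(point)] / (n * cell_volume)

    return density, counts


samples = [(0.1, 0.2), (0.3, 0.4), (1.5, 0.5), (2.5, 2.5)]
density, counts = histogram_density(samples, bin_width=1.0)
print(density((0.5, 0.5)))   # block (0, 0) holds two of the four samples
print(density((5.0, 5.0)))   # an empty block gives a zero estimate
```

With only four samples most blocks are empty, which is the small-sample weakness noted above; in more dimensions the number of blocks grows as a power of the dimension, so the problem gets worse quickly.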

In nearest-neighbour methods, the population density estimate for a test point is obtained by measuring the volume V of the ball containing its k nearest points. The associated density estimate is given by the ratio k/V.

Properties: The overall shapes of population distributions are generally modeled well. The density estimate tends to be good inside clusters, where sample points are plentiful, but it overestimates in the tails, so the overall estimate is not integrable. The estimates also show sudden sharp fluctuations. As with kernel methods, small values of k tend to overfit the sample while large values oversmooth, which leaves the problem of selecting the optimal value of k.
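The nearest-neighbour estimate can be sketched in a few lines as well. This is only an illustration under my own assumptions: Euclidean distance, a ball centred on the test point, and a division by the sample size n so that the value is on the usual density scale (the text above gives the plain ratio k/V). The names `ball_volume` and `knn_density` are hypothetical.

```python
from math import dist, gamma, pi


def ball_volume(radius, dim):
    """Volume of a dim-dimensional Euclidean ball of the given radius."""
    return pi ** (dim / 2) / gamma(dim / 2 + 1) * radius ** dim


def knn_density(point, samples, k):
    """k-nearest-neighbour density estimate at `point` (illustrative sketch).

    The radius is the distance to the k-th nearest sample, V is the volume
    of the ball with that radius, and the estimate is k / (n * V).
    Dividing by n is an assumption made here; drop it to get the raw k / V.
    Note: a zero radius (k duplicate points at `point`) is not handled.
    """
    n = len(samples)
    dim = len(point)
    distances = sorted(dist(point, s) for s in samples)
    radius = distances[k - 1]
    return k / (n * ball_volume(radius, dim))


samples = [(0.0,), (1.0,), (2.0,), (4.0,)]
print(knn_density((0.0,), samples, k=2))    # inside the cluster
print(knn_density((10.0,), samples, k=2))   # far out in the tail
```

Far from the data the radius grows roughly in proportion to the distance, so in one dimension the estimate decays only like 1/distance, which is why the tails are overestimated and the estimate is not integrable.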

posted on 2004-11-18 21:10  Songmin Xie