Properties: Only discrete (piecewise-constant) density estimates are obtained, but they are normalised. There is an unpleasant trade-off between bin size and the quality of the density estimate, which leaves the problem of finding the optimal bin size. Quality is very poor with small samples or when the number of variables is large, since the number of bins grows exponentially with the number of variables. The method is really only suitable for purely categorical data that is already naturally discrete.
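A minimal sketch of a one-dimensional histogram estimator, assuming a fixed range [lo, hi) and equal-width bins; the function name `histogram_density` and the toy sample are hypothetical. Dividing each bin count by n times the bin width is what makes the estimate normalised. Running it on the same sample with different bin counts illustrates the trade-off: very few bins wash out the shape, while many bins leave most of them empty.

```python
from typing import List, Sequence


def histogram_density(sample: Sequence[float], lo: float, hi: float,
                      n_bins: int) -> List[float]:
    """Piecewise-constant density estimate on [lo, hi) (hypothetical helper).

    Each bin height is count / (n * bin_width), so the heights times the
    bin width sum to 1 whenever every sample point lies inside [lo, hi).
    """
    if n_bins <= 0 or hi <= lo:
        raise ValueError("need n_bins > 0 and hi > lo")
    width = (hi - lo) / n_bins
    counts = [0] * n_bins
    for x in sample:
        if lo <= x < hi:
            # Clamp guards against floating-point rounding at the upper edge.
            idx = min(int((x - lo) / width), n_bins - 1)
            counts[idx] += 1
    n = len(sample)
    return [c / (n * width) for c in counts]


if __name__ == "__main__":
    data = [0.1, 0.2, 0.25, 0.4, 0.45, 0.5, 0.55, 0.7, 0.9, 0.95]
    for bins in (2, 5, 50):
        heights = histogram_density(data, 0.0, 1.0, bins)
        empty = sum(1 for h in heights if h == 0.0)
        area = sum(h * (1.0 / bins) for h in heights)
        print(f"{bins:>2} bins: area = {area:.6f}, empty bins = {empty}")
```

With 50 bins and only 10 points, at least 40 bins are necessarily empty, which is the small-sample problem described above in miniature.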
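In nearest-neighbour methods, the population density at a test point is estimated by measuring the volume V of the smallest ball centred at that point that contains its k nearest sample points, i.e. the ball whose radius is the distance to the k-th nearest neighbour. The associated density estimate is k/(nV), where n is the sample size; dividing by n puts the count k on the scale of a probability density.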
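A minimal sketch of the estimate at a single test point, assuming Euclidean distance and the standard volume of a d-dimensional ball of radius r, π^(d/2) r^d / Γ(d/2 + 1). The names `knn_density` and `unit_ball_volume` are hypothetical. The brute-force sort is O(n log n) per query; a spatial index such as a k-d tree would normally be used for large samples.

```python
import math
from typing import Sequence


def unit_ball_volume(d: int) -> float:
    """Volume of the unit ball in d dimensions: pi^(d/2) / Gamma(d/2 + 1)."""
    return math.pi ** (d / 2) / math.gamma(d / 2 + 1)


def knn_density(x: Sequence[float], sample: Sequence[Sequence[float]],
                k: int) -> float:
    """k-nearest-neighbour density estimate k / (n * V) at the point x.

    V is the volume of the ball centred at x whose radius is the Euclidean
    distance to the k-th nearest sample point.
    """
    n = len(sample)
    if not 1 <= k <= n:
        raise ValueError("k must satisfy 1 <= k <= len(sample)")
    d = len(x)
    dists = sorted(math.dist(x, p) for p in sample)
    r_k = dists[k - 1]
    if r_k == 0.0:
        # x coincides with at least k sample points: the ball has zero volume.
        return math.inf
    volume = unit_ball_volume(d) * r_k ** d
    return k / (n * volume)


if __name__ == "__main__":
    pts = [(0.0,), (1.0,), (2.0,), (3.0,), (4.0,)]
    for k in (1, 2, 4):
        print(k, knn_density((2.5,), pts, k))
```

In one dimension the "ball" is an interval of length 2r_k, so for the test point 2.5 and k = 2 the radius is 0.5, V = 1 and the estimate is 2/(5 · 1) = 0.4.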
Properties: The overall shapes of population distributions are generally modelled well. The estimate tends to be good inside clusters, where sample points are plentiful, but it overestimates the density in the tails: far from the data the ball radius grows only in proportion to the distance from the sample, so in d dimensions the estimate decays only as 1/|x|^d, which is too slow for the overall estimate to be integrable. The density estimates also show abrupt, sharp fluctuations. As with the bandwidth in kernel methods, small values of k tend to overfit the sample while large values oversmooth it, leaving the problem of selecting the optimal value of k.
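The non-integrability can be checked numerically. Integrating a tail that behaves like 1/r^d over spherical shells gives an integral of r^(d-1) / r^d = 1/r, which diverges logarithmically. The sketch below is one-dimensional and assumes hypothetical helper names (`knn_density_1d`, `mass_on_interval`), an evenly spaced toy sample on [0, 1], k = 3, and a simple midpoint rule. It integrates the estimate over growing intervals [-R, R]; the total exceeds 1 and keeps increasing with R instead of settling at 1.

```python
from typing import Sequence


def knn_density_1d(x: float, sample: Sequence[float], k: int) -> float:
    """1-D k-NN estimate k / (n * V), with V = 2 * r_k the length of the
    interval centred at x that reaches the k-th nearest sample point."""
    r_k = sorted(abs(x - s) for s in sample)[k - 1]
    return k / (len(sample) * 2.0 * r_k)


def mass_on_interval(sample: Sequence[float], k: int, half_width: float,
                     h: float = 0.05) -> float:
    """Midpoint-rule approximation of the integral of the estimate over
    [-half_width, half_width], using a step size of roughly h."""
    steps = round(2.0 * half_width / h)
    step = 2.0 * half_width / steps
    total = 0.0
    for i in range(steps):
        total += knn_density_1d(-half_width + (i + 0.5) * step, sample, k)
    return total * step


if __name__ == "__main__":
    # Toy sample: 20 evenly spaced points on [0, 1] (assumed for illustration).
    sample = [i / 19 for i in range(20)]
    k = 3  # k >= 2 guarantees r_k > 0 for distinct sample points
    for R in (10.0, 100.0, 1000.0):
        mass = mass_on_interval(sample, k, R)
        print(f"R = {R:>6}: integral over [-R, R] ~ {mass:.3f}")
```

Each tenfold increase in R adds roughly the same amount of mass, the signature of the logarithmic divergence. Taking k = 1 would make matters worse in 1-D, since the estimate then also has a non-integrable 1/|x - s| spike at every sample point s.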