  February 8, 2010
Classification
==============

 #1. C4.5

Quinlan, J. R. 1993. C4.5: Programs for Machine Learning.
Morgan Kaufmann Publishers Inc.
	
Google Scholar Count in October 2006: 6907

 #2. CART

L. Breiman, J. Friedman, R. Olshen, and C. Stone. Classification and
Regression Trees. Wadsworth, Belmont, CA, 1984.

Google Scholar Count in October 2006: 6078

 #3. K Nearest Neighbours (kNN)

Hastie, T. and Tibshirani, R. 1996. Discriminant Adaptive Nearest
Neighbor Classification. IEEE Trans. Pattern
Anal. Mach. Intell. (TPAMI). 18, 6 (Jun. 1996), 607-616. 
DOI= http://dx.doi.org/10.1109/34.506411

Google Scholar Count: 183

 #4. Naive Bayes

Hand, D.J., Yu, K., 2001. Idiot's Bayes: Not So Stupid After All?
Internat. Statist. Rev. 69, 385-398.

Google Scholar Count in October 2006: 51


Statistical Learning
====================

 #5. SVM

Vapnik, V. N. 1995. The Nature of Statistical Learning
Theory. Springer-Verlag New York, Inc.
		
Google Scholar Count in October 2006: 6441

 #6. EM

McLachlan, G. and Peel, D. (2000). Finite Mixture Models. 
J. Wiley, New York.

Google Scholar Count in October 2006: 848


Association Analysis
====================

 #7. Apriori

Rakesh Agrawal and Ramakrishnan Srikant. Fast Algorithms for Mining
Association Rules. In Proc. of the 20th Int'l Conference on Very Large
Databases (VLDB '94), Santiago, Chile, September 1994. 
http://citeseer.comp.nus.edu.sg/agrawal94fast.html

Google Scholar Count in October 2006: 3639

 #8. FP-Tree

Han, J., Pei, J., and Yin, Y. 2000. Mining frequent patterns without
candidate generation. In Proceedings of the 2000 ACM SIGMOD
International Conference on Management of Data (Dallas, Texas, United
States, May 15 - 18, 2000). SIGMOD '00. ACM Press, New York, NY, 1-12.
DOI= http://doi.acm.org/10.1145/342009.335372

Google Scholar Count in October 2006: 1258


Link Mining
===========

 #9. PageRank

Brin, S. and Page, L. 1998. The anatomy of a large-scale hypertextual
Web search engine. In Proceedings of the Seventh International
Conference on World Wide Web (WWW-7) (Brisbane,
Australia). P. H. Enslow and A. Ellis, Eds. Elsevier Science
Publishers B. V., Amsterdam, The Netherlands, 107-117. 
DOI= http://dx.doi.org/10.1016/S0169-7552(98)00110-X

Google Scholar Count: 2558

 #10. HITS

Kleinberg, J. M. 1998. Authoritative sources in a hyperlinked
environment. In Proceedings of the Ninth Annual ACM-SIAM Symposium on
Discrete Algorithms (San Francisco, California, United States, January
25 - 27, 1998). Symposium on Discrete Algorithms. Society for
Industrial and Applied Mathematics, Philadelphia, PA, 668-677.

Google Scholar Count: 2240


Clustering
==========

 #11. K-Means

MacQueen, J. B., Some methods for classification and analysis of
multivariate observations, in Proc. 5th Berkeley Symp. Mathematical
Statistics and Probability, 1967, pp. 281-297.

Google Scholar Count in October 2006: 1579

 #12. BIRCH

Zhang, T., Ramakrishnan, R., and Livny, M. 1996. BIRCH: an efficient
data clustering method for very large databases. In Proceedings of the
1996 ACM SIGMOD International Conference on Management of Data
(Montreal, Quebec, Canada, June 04 - 06, 1996). J. Widom, Ed. 
SIGMOD '96. ACM Press, New York, NY, 103-114. 
DOI= http://doi.acm.org/10.1145/233269.233324

Google Scholar Count in October 2006: 853


Bagging and Boosting
====================

 #13. AdaBoost

Freund, Y. and Schapire, R. E. 1997. A decision-theoretic
generalization of on-line learning and an application to
boosting. J. Comput. Syst. Sci. 55, 1 (Aug. 1997), 119-139. 
DOI= http://dx.doi.org/10.1006/jcss.1997.1504

Google Scholar Count in October 2006: 1576


Sequential Patterns
===================

 #14. GSP

Srikant, R. and Agrawal, R. 1996. Mining Sequential Patterns:
Generalizations and Performance Improvements. In Proceedings of the
5th International Conference on Extending Database Technology:
Advances in Database Technology (March 25 - 29, 1996). P. M. Apers,
M. Bouzeghoub, and G. Gardarin, Eds. Lecture Notes In Computer
Science, vol. 1057. Springer-Verlag, London, 3-17.

Google Scholar Count in October 2006: 596

 #15. PrefixSpan

J. Pei, J. Han, B. Mortazavi-Asl, H. Pinto, Q. Chen, U. Dayal and
M-C. Hsu. PrefixSpan: Mining Sequential Patterns Efficiently by
Prefix-Projected Pattern Growth. In Proceedings of the 17th
International Conference on Data Engineering (April 02 - 06,
2001). ICDE '01. IEEE Computer Society, Washington, DC.
		 
Google Scholar Count in October 2006: 248


Integrated Mining
=================

 #16. CBA

Liu, B., Hsu, W. and Ma, Y. M. Integrating classification and
association rule mining. KDD-98, 1998, pp. 80-86. 
http://citeseer.comp.nus.edu.sg/liu98integrating.html

Google Scholar Count in October 2006: 436
		 

Rough Sets
==========

 #17. Finding reduct

Zdzislaw Pawlak, Rough Sets: Theoretical Aspects of Reasoning about
Data, Kluwer Academic Publishers, Norwell, MA, 1992

Google Scholar Count in October 2006: 329

Graph Mining
============

 #18. gSpan

Yan, X. and Han, J. 2002. gSpan: Graph-Based Substructure Pattern
Mining. In Proceedings of the 2002 IEEE International Conference on
Data Mining (ICDM '02) (December 09 - 12, 2002). IEEE Computer
Society, Washington, DC.

Google Scholar Count in October 2006: 155
posted @ 2010-02-08 20:52 Painmoth.Lee
  January 9, 2010

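Today on the photo channel I saw some pictures of Japanese New Year activities. It seems to me that in some aspects of culture Japan has evolved faster than China. The Chinese Spring Festival is all about family reunion, but little thought goes into what to do once everyone is together: usually it is eating, chatting, and watching the Spring Festival Gala, all fairly old-fashioned pastimes. In today's society these no longer satisfy the younger generation, and the festive atmosphere of the Spring Festival has gradually faded.

But judging from Japan's New Year activities, traditional cultural events such as sumo, calligraphy, coming-of-age ceremonies, and cold-water bathing are still well preserved and attractive (interesting). Now that Chinese people have solved the basic problem of food and clothing, the way forward should be to learn a bit from Japan and promote cultural and sporting activities such as traditional calligraphy, couplets, and tai chi. Right, we should learn! maybe...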

 

posted @ 2010-01-09 23:22 Painmoth.Lee
  January 7, 2010

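I remember that back in high school I really loved growing flowers and plants at home, and I seemed to be quite good at it.
But N years later, I find the world has changed. I have tried twice recently at the office. The first was a potted flower given for my wife's birthday, which I promptly killed; afterwards, whenever I saw flowers by the roadside I wanted to make up for that failure.
Recently the company handed out another potted plant. After several rescue attempts, only a single stem is still standing... it is probably not far from, well, you know...
A person... must not be too lazy.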

 

posted @ 2010-01-07 20:46 Painmoth.Lee
  January 2, 2010

Ref:http://cacm.acm.org/magazines/2010/1/55768-native-client-a-sandbox-for-portable-untrusted-x86-native-code/abstract

 

This must be a foundational technology for Google's ChromeOS.

So... virtual-machine-related technology is still very important.

 

posted @ 2010-01-02 17:05 Painmoth.Lee

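1. Fine-grained fault tolerance for large jobs. For a 1-hour job, if intermediate data is corrupted or a task fails, the failure is tolerated and the affected work is redone.

2. Good support for data processing in cluster environments with heterogeneous systems and heterogeneous storage.

3. Compared with SQL, MapReduce provides a framework that supports more complex data operations.

Excerpted from: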

MapReduce: A Flexible Data Processing Tool, Jeffrey Dean, Sanjay Ghemawat

http://cacm.acm.org/magazines/2010/1/55744-mapreduce-a-flexible-data-processing-tool/fulltext
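
To make points 1 and 3 a bit more concrete, here is a minimal single-process sketch in Python. It is not the implementation described in the article; the names `run_with_retry`, `map_reduce`, `word_count_mapper`, and `word_count_reducer` are made up for illustration, and a real system distributes the tasks across machines. The idea shown is only that each map task is an independent unit that can be re-run on failure, and that the mapper and reducer are arbitrary user code rather than SQL expressions.

```python
from collections import defaultdict


def run_with_retry(task, arg, max_attempts=3):
    """Re-run one failed task instead of restarting the whole job (hypothetical helper)."""
    for attempt in range(1, max_attempts + 1):
        try:
            return task(arg)
        except Exception:
            # Give up only after the last allowed attempt.
            if attempt == max_attempts:
                raise


def map_reduce(inputs, mapper, reducer, max_attempts=3):
    """Toy, in-memory map/shuffle/reduce over an iterable of input splits."""
    groups = defaultdict(list)
    # Map phase: every split is its own task, so a failure only redoes that split.
    for split in inputs:
        pairs = run_with_retry(lambda s: list(mapper(s)), split, max_attempts)
        # Shuffle: group the emitted values by key.
        for key, value in pairs:
            groups[key].append(value)
    # Reduce phase: one reducer call per distinct key.
    return {key: reducer(key, values) for key, values in groups.items()}


def word_count_mapper(line):
    """Emit (word, 1) for every word in a line of text."""
    for word in line.split():
        yield word.lower(), 1


def word_count_reducer(word, counts):
    """Sum the counts collected for one word."""
    return sum(counts)


if __name__ == "__main__":
    lines = ["the quick brown fox", "the lazy dog", "The end"]
    print(map_reduce(lines, word_count_mapper, word_count_reducer))
```

The mapper and reducer here are plain functions, so swapping in something more involved than counting (parsing, joining against a lookup table, custom aggregation) does not change the driver.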


A thought: Cloud Computing, like Grid Computing, will be hot for a couple of years, but in the end some core technology needs to settle out of it.

posted @ 2010-01-02 16:48 Painmoth.Lee
  December 15, 2009

Here is a link for now: http://www.lookinto.cn/method/2072/

 

 

 

posted @ 2009-12-15 09:52 Painmoth.Lee
  October 30, 2009
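     Summary: "boost/shared_ptr.hpp": you can google the details of how to use it. I remember that when I was job hunting after graduation, some interviewers even used it to look down on certain candidates.
posted @ 2009-10-30 09:34 Painmoth.Lee
  October 9, 2009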
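     Summary: As the title says, there is homework to catch up on and things to get done. A few words for myself: hold yourself to strict standards, listen humbly to other people's opinions, and plan and act methodically in work and life.
posted @ 2009-10-09 12:44 Painmoth.Lee
  January 5, 2009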
     Summary: good idea; focus on solving problems; open, open your heart, you will get more
posted @ 2009-01-05 20:08 Painmoth.Lee
  December 16, 2008
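     Summary: http://www.techweb.com.cn/people/2008-12-16/380728.shtml Google Vice President Marissa Mayer. Lead: BusinessWeek recently reported that this year marks the 10th anniversary of Google's founding, and on this special occasion BusinessWeek's Silicon Valley bureau chief Robert D. Hof interviewed Marissa Mayer, the Google vice president in charge of search products and user experience. ...
posted @ 2008-12-16 22:23 Painmoth.Lee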