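Hadoop: The Big Data Tool You Need to Know

Summary: Reposting this blog post because its figures are good and this makes it easy to find later: http://cloud.csdn.net/a/20120220/312061.html Apache Hadoop has become the driving force behind the growth of the big data industry. Technologies such as Hive and Pig are also mentioned often, but what do they actually do, and why do they need such odd names (Oozie, ZooKeeper, Flume)? Hadoop brings low-cost processing of big data (big data with data... Read more
posted @ 2012-02-21 11:50 fxjwind Views (401) Comments (0) Recommendations (0)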

Interval Estimation

Summary: Refer to R Tutorial and Exercise Solution. It is a common requirement to efficiently estimate population parameters based on simple random sample data. Because the result is only an estimate, we usually estimate a range rather than a single value, hence the name interval estimation. Point Estimate of Population Mean: For any particular random sample, we can always compute it Read more
posted @ 2012-02-17 16:58 fxjwind Views (649) Comments (0) Recommendations (0)

Probability Distributions

Summary: Refer to R Tutorial and Exercise Solution. A probability distribution describes how the values of a random variable are distributed. Binomial Distribution: The binomial distribution is a discrete probability distribution. It describes the outcome of n independent trials in an experiment. Each trial i Read more
posted @ 2012-02-16 15:32 fxjwind Views (970) Comments (0) Recommendations (0)
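As a rough illustration of the binomial distribution described in the summary above, the following Python sketch computes the probability of exactly k successes in n independent trials, each succeeding with probability p. The post itself works in R; the function name `binomial_pmf` and the quiz numbers are assumptions made only for this example.

```python
from math import comb


def binomial_pmf(k: int, n: int, p: float) -> float:
    """Probability of exactly k successes in n independent trials."""
    return comb(n, k) * p**k * (1 - p) ** (n - k)


# Hypothetical example: 12 multiple-choice questions with 5 options each,
# answered at random, so the success probability per question is 1/5.
probs = [binomial_pmf(k, 12, 0.2) for k in range(13)]

# The probabilities over every possible outcome should add up to 1.
total = sum(probs)
```

Summing the function over k = 0..n is a quick sanity check that the formula has been typed correctly.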

Qualitative and Quantitative

Summary: Refer to R Tutorial and Exercise Solution. In data analysis and statistics, data first of all comes in two kinds. Data is qualitative, also known as categorical, if its values belong to a collection of known, defined, non-overlapping classes. In other words, it is discrete data.... Read more
posted @ 2012-02-16 11:42 fxjwind Views (1351) Comments (0) Recommendations (0)

Statistical Measures with R

Summary: Refer to R Tutorial and Exercise Solution. Mean: The mean of an observation variable is a numerical measure of the central location of the data values. It is the sum of its data values divided by the data count. Hence, for a data sample of size n, its sample mean is defined as follows: > duration = fai Read more
posted @ 2012-02-15 17:00 fxjwind Views (390) Comments (0) Recommendations (0)
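The definition of the mean in the summary above (sum of the data values divided by the data count) can be sketched in a few lines of Python. The post uses R; the function name `sample_mean` and the values in `durations` are made up for illustration and are not taken from the post's data set.

```python
def sample_mean(values):
    """Sum of the data values divided by the data count."""
    if not values:
        raise ValueError("sample_mean needs at least one value")
    return sum(values) / len(values)


# Hypothetical sample of n = 5 observations (illustrative numbers only).
durations = [2.0, 3.5, 4.0, 1.5, 4.0]
mean_duration = sample_mean(durations)
```

The standard library's `statistics.fmean` performs the same calculation and is a reasonable choice outside of a teaching example.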

Facebook API

Summary: Graph API: At Facebook's core is the social graph: people and the connections they have to everything they care about. The Graph API presents a simple, consistent view of the Facebook social graph, uniformly representing objects in the graph (e.g., people, photos, events, and pages) and the connec Read more
posted @ 2012-01-31 11:45 fxjwind Views (618) Comments (0) Recommendations (0)

Mining the Social Web

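Summary: Introduction: Hacking on Twitter Data. Since I am already very familiar with Python, Twitter, and even NLTK, I will get straight to the point. From Twitter data, we can focus on two questions: What are people talking about right now? Extracting relationships from the tweets. As data scientists, we should also pay attention to data visualization. What are people talking about right now? For NLP problems in Python, we can use the nltk package to Read more
posted @ 2012-01-31 10:14 fxjwind Views (577) Comments (0) Recommendations (0)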

NoSQL and Redis

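Summary: First, why do we need NoSQL? I came across a blog post that explains it well, http://robbin.iteye.com/blog/524977, so here is an excerpt. Web 2.0 sites face the "three highs" problem. 1. High performance: the need for highly concurrent database reads and writes. Web 2.0 sites must generate dynamic pages and serve dynamic information in real time based on each user's personalized data, so rendering dynamic pages to static ones is largely unusable. Database concurrency load is therefore very high, often reaching upwards of ten thousand read/write requests per second. A relational database can just about cope with that many SQL queries, but with that many SQL write requests, disk I/O can no longer keep up. In fact, even ordinary BBS sites often need highly concurrent writes, for example JavaEye Read more
posted @ 2011-12-10 15:32 fxjwind Views (861) Comments (0) Recommendations (0)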

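More Thoughts on Twisted

Summary: Let me continue with some casual thoughts on Twisted. First, why do we need Twisted, and why do we need asynchrony? To use the CPU and other resources more efficiently and to improve response time for users. Tasks that take a long time to complete fall into two cases. 1) The computation is heavy, so the CPU needs a long time to produce a result, and the result is only available once the computation finishes; this is called CPU waiting. 2) The task must wait for other data, such as information from a server or query results from a database; in this case the task itself is idle, with nothing to do... Read more
posted @ 2011-11-12 16:44 fxjwind Views (673) Comments (0) Recommendations (0)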
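The second case in the summary above, where a task is idle while it waits on other data, is where asynchrony pays off. The sketch below uses Python's standard `asyncio` module rather than Twisted, purely to keep the example dependency-free; the `fetch` coroutine and its delays are hypothetical stand-ins for a server request and a database query.

```python
import asyncio
import time


async def fetch(name: str, delay: float) -> str:
    # asyncio.sleep stands in for waiting on a server or database reply;
    # while this coroutine waits, the event loop can run other work.
    await asyncio.sleep(delay)
    return f"{name} done"


async def main() -> list:
    # Both waits overlap, so the total time is close to the longest
    # delay rather than the sum of the two delays.
    return await asyncio.gather(
        fetch("server", 0.2),
        fetch("database", 0.2),
    )


start = time.perf_counter()
results = asyncio.run(main())
elapsed = time.perf_counter() - start
```

This only helps with the waiting case. For the first case, heavy computation, the CPU is genuinely busy, so overlapping tasks in a single event loop would not shorten the total time.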

An Introduction to Asynchronous Programming and Twisted (3)

Summary: Part 11: Your Poetry is Served. A Twisted Poetry Server: Now that we’ve learned so much about writing clients with Twisted, let’s turn around and re-implement our poetry server with Twisted too. And t... Read more
posted @ 2011-09-15 10:55 fxjwind Views (465) Comments (0) Recommendations (0)