http://www.donanza.com/jobs/p3315101-crawler_with_data_analysis_hadoop_mapreduce_hbase_phase_i
crawler with data analysis (Hadoop, MapReduce, HBase) - Phase I - Data Modeling
Goal for Phase I: given a topic in English (e.g. "skiing"), crawl the web (sites, blogs, social media) and collect 1 million relevant articles/pages/posts/documents. Perform analysis and generate meaningful reports on the topic, potentially including top keywords, key concepts, and related topics.

Optional task (bonus): add "intelligence" to your analysis by determining rank/reputation, sentiment (negative vs. positive), and type (opinion article vs. advertisement vs. for-sale ad vs. wanted ad). We are flexible and open to ideas.

Development/staging environment: 3-node cluster running CentOS 5.6 and Cloudera CDH3 (Hadoop, MapReduce, Hue, Pig, Flume, HBase), plus one management machine with CDH.

If you bid on this job, please describe your prior experience with Big Data and tell us how you would approach this problem: a high-level overview of the steps you would need to perform. It's important for us to see the way you approach problems. We speak English and Russian fluently. Depending on your approach, we will define milestones and a timeline together. This is Phase I of the project, so do your best!

Desired Skills: Data Modeling, Scripts & Utilities, CentOS, Hadoop, MapReduce
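To make the Phase I focus on data modeling concrete, here is a minimal sketch of one possible HBase schema for the crawled corpus. Everything in it is an assumption, not part of the spec: a table "pages" with a "content" family (raw HTML plus extracted text) and a "meta" family (URL, topic, fetch time), keyed by reversed host plus a URL hash.

    import org.apache.hadoop.conf.Configuration;
    import org.apache.hadoop.hbase.HBaseConfiguration;
    import org.apache.hadoop.hbase.client.HTable;
    import org.apache.hadoop.hbase.client.Put;
    import org.apache.hadoop.hbase.util.Bytes;

    public class PageStore {
        private static final byte[] CONTENT = Bytes.toBytes("content");
        private static final byte[] META = Bytes.toBytes("meta");

        // Writes one crawled page into the hypothetical "pages" table
        // using the CDH3-era (HBase 0.90) client API.
        public static void storePage(String url, String html, String text,
                                     String topic) throws Exception {
            Configuration conf = HBaseConfiguration.create();
            HTable table = new HTable(conf, "pages");
            try {
                Put put = new Put(Bytes.toBytes(rowKey(url)));
                put.add(CONTENT, Bytes.toBytes("html"), Bytes.toBytes(html));
                put.add(CONTENT, Bytes.toBytes("text"), Bytes.toBytes(text));
                put.add(META, Bytes.toBytes("url"), Bytes.toBytes(url));
                put.add(META, Bytes.toBytes("topic"), Bytes.toBytes(topic));
                put.add(META, Bytes.toBytes("fetched"),
                        Bytes.toBytes(System.currentTimeMillis()));
                table.put(put);
            } finally {
                table.close();
            }
        }

        // Row key: reversed host, then an MD5 of the full URL. Reversing
        // the host keeps a site's pages contiguous for scans; the hash
        // gives a unique, fixed-length suffix.
        static String rowKey(String url) throws Exception {
            String host = new java.net.URL(url).getHost();
            String[] parts = host.split("\\.");
            StringBuilder reversed = new StringBuilder();
            for (int i = parts.length - 1; i >= 0; i--) {
                reversed.append(parts[i]);
                if (i > 0) reversed.append('.');
            }
            String hash = org.apache.commons.codec.digest.DigestUtils.md5Hex(url);
            return reversed + "/" + hash;
        }
    }

The row key is the main data-modeling decision here: grouping by reversed host makes per-site scans cheap, at the cost of concentrating a burst of same-site writes on one region.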

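And here is a minimal sketch of the kind of analysis job the reporting step implies: a MapReduce keyword count that scans the same hypothetical "pages" table. Table, family, and qualifier names match the assumed schema above; a real job would add stop-word filtering, stemming, and a top-N pass (e.g. in Pig) over the output.

    import java.io.IOException;
    import org.apache.hadoop.conf.Configuration;
    import org.apache.hadoop.fs.Path;
    import org.apache.hadoop.hbase.HBaseConfiguration;
    import org.apache.hadoop.hbase.client.Result;
    import org.apache.hadoop.hbase.client.Scan;
    import org.apache.hadoop.hbase.io.ImmutableBytesWritable;
    import org.apache.hadoop.hbase.mapreduce.TableMapReduceUtil;
    import org.apache.hadoop.hbase.mapreduce.TableMapper;
    import org.apache.hadoop.hbase.util.Bytes;
    import org.apache.hadoop.io.IntWritable;
    import org.apache.hadoop.io.Text;
    import org.apache.hadoop.mapreduce.Job;
    import org.apache.hadoop.mapreduce.Reducer;
    import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat;

    public class KeywordCount {

        // Emits (token, 1) for every word of the stored page text.
        static class TokenMapper extends TableMapper<Text, IntWritable> {
            private static final IntWritable ONE = new IntWritable(1);
            private final Text word = new Text();

            @Override
            protected void map(ImmutableBytesWritable row, Result value,
                               Context ctx) throws IOException, InterruptedException {
                byte[] text = value.getValue(Bytes.toBytes("content"),
                                             Bytes.toBytes("text"));
                if (text == null) return;
                // Naive tokenization; a real job would stem and drop stop words.
                for (String token : Bytes.toString(text).toLowerCase().split("\\W+")) {
                    if (token.length() > 2) {
                        word.set(token);
                        ctx.write(word, ONE);
                    }
                }
            }
        }

        // Sums the counts for each token.
        static class SumReducer extends Reducer<Text, IntWritable, Text, IntWritable> {
            @Override
            protected void reduce(Text key, Iterable<IntWritable> counts,
                                  Context ctx) throws IOException, InterruptedException {
                int sum = 0;
                for (IntWritable c : counts) sum += c.get();
                ctx.write(key, new IntWritable(sum));
            }
        }

        public static void main(String[] args) throws Exception {
            Configuration conf = HBaseConfiguration.create();
            Job job = new Job(conf, "keyword-count");
            job.setJarByClass(KeywordCount.class);
            Scan scan = new Scan();
            scan.addColumn(Bytes.toBytes("content"), Bytes.toBytes("text"));
            scan.setCaching(500);        // batch rows per RPC for throughput
            scan.setCacheBlocks(false);  // a full scan shouldn't fill the block cache
            TableMapReduceUtil.initTableMapperJob("pages", scan,
                    TokenMapper.class, Text.class, IntWritable.class, job);
            job.setCombinerClass(SumReducer.class);
            job.setReducerClass(SumReducer.class);
            job.setOutputKeyClass(Text.class);
            job.setOutputValueClass(IntWritable.class);
            FileOutputFormat.setOutputPath(job, new Path(args[0]));
            System.exit(job.waitForCompletion(true) ? 0 : 1);
        }
    }

Ranking this output to produce the top-keywords report is then a small Pig script or a second MapReduce pass.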