07 Spark RDD Programming: Comprehensive Example, English Word Frequency Count

>>> s = txt.lower().split()
>>> dd = {}
>>> for word in s:
...     if word not in dd:
...         dd[word] = 1
...     else:
...         dd[word] = dd[word] + 1
...
>>> ss = sorted(dd.items(),key=operator.itemgetter(1),reverse=True)
Traceback (most recent call last):
File "<stdin>", line 1, in <module>
NameError: name 'operator' is not defined
>>> import operator
>>> ss = sorted(dd.items(),key=operator.itemgetter(1),reverse=True)
>>> print(ss)
[('the', 136), ('and', 111), ('of', 82), ('to', 71), ('our', 68), ('we', 59), ('that', 49), ('a', 46), ('is', 36), ('in', 26), ('this', 24), ('for', 23), ('are', 22), ('but', 20), ('--', 17), ('they', 17), ('on', 17), ('it', 17), ('will', 17), ('not', 16), ('have', 15), ('us', 14), ('has', 14), ('can', 13), ('with', 13), ('who', 13), ('be', 12), ('as', 11), ('or', 11), ('(applause.)', 11), ('those', 11), ('nation', 10), ('you', 10), ('their', 10), ('new', 9), ('these', 9), ('us,', 9), ('so', 8), ('by', 8), ('than', 8), ('must', 8), ('because', 8), ('what', 8), ('every', 8), ('all', 8), ('its', 8), ('been', 7), ('at', 7), ('when', 7), ('no', 6), ('less', 6), ('cannot', 6), ('let', 6), ('too', 6), ('common', 6), ('was', 5), ('time', 5), ('people', 5), ('only', 5), ('know', 5), ('nor', 5), ('now', 5), ('from', 5), ('seek', 4), ('work', 4), ('greater', 4), ('whether', 4), ('america', 4), ('more', 4), ('before', 4), ('power', 4), ('which', 4), ('long', 4), ('through', 4), ('men', 4), ('meet', 4), ('women', 4), ('journey', 3), ('up', 3), ('between', 3), ('were', 3), ('say', 3), ('where', 3), ('an', 3), ('god', 3), ('may', 3), ('last', 3), ('economy', 3), ('hard', 3), ('do', 3), ('today', 3), ('there', 3), ('founding', 3), ('hope', 3), ('crisis', 3), ('words', 3), ('carried', 3), ('them', 3), ('future', 3), ('come', 3), ('shall', 3), ('most', 3), ('generation', 3), ('day,', 3), ('you.', 3), ('things', 3), ('upon', 3), ('force', 3), ('i', 3), ('spirit', 3), ('just', 3), ('over', 3), ('father', 3), ('question', 3), ('your', 3), ('once', 3), ('across', 3), ('face', 2), ('better', 2), ('do,', 2), ('why', 2), ('did', 2), ('do.', 2), ('world', 2), ('role', 2), ('nothing', 2), ('oath.', 2), ('america:', 2), ('each', 2), ('faced', 2), ('rather', 2), ('lower', 2), ('peace', 2), ('faith', 2), ('grows', 2), ('ambitions,', 2), ('like', 2), ('service', 2), ('again', 2), ('lines', 2), ('taken', 2), ('thank', 2), ('fail', 2), ('greatness', 2), ('willingness', 2), ('trust', 2), ('work,', 
2), ('life.', 2), ('mutual', 2), ('find', 2), ('day', 2), ('far', 2), ('freedom', 2), ('never', 2), ('moment', 2), ('feed', 2), ('one', 2), ('duties', 2), ('man', 2), ('wealth', 2), ('small', 2), ('earth.', 2), ('new.', 2), ('care', 2), ('prosperity', 2), ('bless', 2), ("we'll", 2), ('(applause)', 2), ('some', 2), ('my', 2), ('understand', 2), ('promise', 2), ('icy', 2), ('child', 2), ('friend', 2), ('schools', 2), ('light', 2), ('many', 2), ('still', 2), ('forth', 2), ('restore', 2), ('workers', 2), ('path', 2), ('enduring', 2), ('age.', 2), ('jobs', 2), ('children', 2), ('way', 2), ('market', 2), ('challenges', 2), ('american', 2), ('part', 2), ('west,', 2), ('something', 2), ('ourselves', 2), ('false', 2), ('planet.', 2), ('ideals', 2), ('even', 2), ('also', 2), ('knowledge', 2), ('cooperation', 2), ('peace.', 2), ('remain', 2), ('courage', 2), ('stronger', 2), ('brave', 2), ('extend', 2), ('today,', 2), ('stand', 2), ('government', 2), ('war', 2), ('out', 2), ('meaning', 2), ('old', 2), ('world,', 2), ('big', 2), ('charter', 2), ('might', 2), ('end', 2), ('calls', 2), ('make', 2), ('take', 2), ('willing', 2), ('generations.', 2), ('confidence', 2), ('back', 2), ('answer', 2), ('americans', 2), ("america's", 2), ('remember', 2), ('throughout', 2), ('begin', 2), ('longer', 2), ('era', 2), ('health', 2), ('success', 2), ('america.', 2), ('waters', 2), ('without', 2), ('quiet', 1), ('packed', 1), ('lose', 1), ('specter', 1), ('dark', 1), ('could', 1), ('afford', 1), ('generosity', 1), ('city', 1), ('then', 1), ('month,', 1), ('nation,', 1), ('hard-earned', 1), ('spoken', 1), ('task', 1), ('irresponsibility', 1), ('see', 1), ('law', 1), ('gettysburg,', 1), ('advance', 1), ('scripture,', 1), ('more.', 1), ('advancing.', 1), ('bind', 1), ('risk-takers,', 1), ('sacred', 1), ('many.', 1), ('decisions', 1), ('birth', 1), ('storms.', 1), ('needed', 1), ('promises,', 1), ('true.', 1), ('yes,', 1), ('broken', 1), ('conflict', 1), ('celebrated,', 1), ('whose', 1), ('perils', 
1), ('recriminations', 1), ('swill', 1), ('politics.', 1), ('faithful', 1), ('done,', 1), ('interest', 1), ('depends', 1), ('world;', 1), ('earth;', 1), ('sum', 1), ('starting', 1), ('presidential', 1), ('prosperous,', 1), ('generation:', 1), ('drawn', 1), ('riches', 1), ('believe', 1), ('campfires', 1), ('effect.', 1), ('jews', 1), ('choose', 1), ('political', 1), ('survive...', 1), ('system', 1), ('birth,', 1), ('themselves.', 1), ('apologize', 1), ('eyes', 1), ('serious', 1), ('peoples', 1), ('muslims,', 1), ('blood.', 1), ('suggest', 1), ('told', 1), ('united,', 1), ('whisper', 1), ('prosper', 1), ('return', 1), ('joined', 1), ('band', 1), ('culture,', 1), ('decides', 1), ('defeat', 1), ('rule', 1), ('blood', 1), ('former', 1), ('revolution', 1), ('began.', 1), ('capital', 1), ('time.', 1), ('tolerate', 1), ('normandy', 1), ('responsibility', 1), ('grandest', 1), ('reform', 1), ("you've", 1), ('yet,', 1), ('works', 1), ('remained', 1), ('inducing', 1), ('protecting', 1), ('innocents,', 1), ('unmatched.', 1), ('recall', 1), ('endure', 1), ('traveled', 1), ('beneath', 1), ('dangers,', 1), ('fuel', 1), ('history,', 1), ('met', 1), ('liberty,', 1), ('inevitable,', 1), ('distant', 1), ('ages.', 1), ('year.', 1), ('narrow', 1), ('hours', 1), ('purpose,', 1), ('charity,', 1), ('reveal', 1), ('safety', 1), ('does', 1), ('served', 1), ('raw', 1), ('hours.', 1), ('reach', 1), ('consider', 1), ('citizenship.', 1), ('arguments', 1), ('cynics', 1), ('heroes', 1), ('hatred.', 1), ('indicators', 1), ('sow', 1), ('relies.', 1), ('endured', 1), ('sees', 1), ('watching', 1), ('inventive,', 1), ("world's", 1), ('some,', 1), ('services', 1), ('build,', 1), ('danger,', 1), ('true', 1), ('ushering', 1), ('friends', 1), ('poor', 1), ('darkest', 1), ('lie', 1), ('alone', 1), ('watchful', 1), ('task.', 1), ('wage,', 1), ('how', 1), ('remembrance', 1), ('short,', 1), ('memories', 1), ('emanates', 1), ('nagging', 1), ('stairway', 1), ('faint-hearted,', 1), ('choice', 1), ('seeks', 1), 
('ordered', 1), ('everywhere', 1), ('inhabit', 1), ('discord.', 1), ('winter,', 1), ('scarcely', 1), ('individual', 1), ('differences', 1), ('woman', 1), ('would', 1), ('factories.', 1), ('met.', 1), ('flourish', 1), ('shown', 1), ('bodies', 1), ('liberty', 1), ('places', 1), ('silencing', 1), ('said', 1), ('unity', 1), ('gratitude', 1), ('lost,', 1), ('size', 1), ('tirelessly', 1), ('understanding', 1), ('surely', 1), ('nations', 1), ('manage', 1), ('digital', 1), ('humbled', 1), ('passed', 1), ('clouds', 1), ('greed', 1), ('stale', 1), ('pat,', 1), ('satisfying', 1), ('cause,', 1), ('entitle', 1), ('plenty,', 1), ('humanity', 1), ('fellow', 1), ('virtue', 1), ('progress', 1), ('keepers', 1), ('traveled.', 1), ('he', 1), ('week,', 1), ('network', 1), ('unpleasant', 1), ('spin', 1), ('words.', 1), ('documents.', 1), ('terror', 1), ('achieve', 1), ('failure', 1), ('use;', 1), ('grids', 1), ('muslim', 1), ('truths.', 1), ('tested', 1), ('smaller,', 1), ('change', 1), ('demanded,', 1), ('dignified.', 1), ('destiny.', 1), ('cling', 1), ('leisure', 1), ('action,', 1), ('span', 1), ('saw', 1), ('loyalty', 1), ('vision', 1), ('brings', 1), ('expedience', 1), ('precious', 1), ('forgotten', 1), ('dignity.', 1), ('putting', 1), ('crisis,', 1), ("society's", 1), ('civil', 1), ('clean', 1), ('starved', 1), ('favors', 1), ('amidst', 1), ('security', 1), ('collective', 1), ('fathers,', 1), ('firm', 1), ('rather,', 1), ('forge', 1), ('abandoned.', 1), ('huddled', 1), ('today.', 1), ('destroy.', 1), ('based', 1), ('raise', 1), ('example,', 1), ('bad', 1), ('fate.', 1), ('far-reaching', 1), ('embody', 1), ('plowed', 1), ('play,', 1), ('reminded', 1), ('help', 1), ('depended', 1), ('hungry', 1), ('transition.', 1), ('restaurant', 1), ('use', 1), ('well', 1), ('[it]."', 1), ('toiled', 1), ('set', 1), ('ill.', 1), ('sake.', 1), ('consequence', 1), ('americans.', 1), ('threaten', 1), ('fathers', 1), ('roll', 1), ('nurture', 1), ('threat,', 1), ('smoke,', 1), ('capitals', 1), 
('corruption', 1), ('strength,', 1), ('precisely', 1), ('dust', 1), ('drafted', 1), ('difficult', 1), ('standing', 1), ('freedom.', 1), ('land;', 1), ('character', 1), ('carry', 1), ('apply.', 1), ('build', 1), ('forward.', 1), ('strangled', 1), ('universities', 1), ('missiles', 1), ('giving', 1), ('always', 1), ('energy', 1), ('worldly', 1), ('now,', 1), ('defining', 1), ('earned.', 1), ('ready', 1), ('restraint.', 1), ('language', 1), ('ground', 1), ('end,', 1), ('tempering', 1), ('guided', 1), ('storms', 1), ('prosperous.', 1), ('generate', 1), ('consumed', 1), ('proclaim', 1), ('side', 1), ('sacrificed', 1), ('reaffirm', 1), ('forty-four', 1), ('principles', 1), ('his', 1), ('falter;', 1), ('fascism', 1), ('alongside', 1), ('statistics.', 1), ('government.', 1), ('decent', 1), ('fame.', 1), ('world...that', 1), ('wield', 1), ('bigger', 1), ('often,', 1), ('gift', 1), ('nuclear', 1), ('few', 1), ('sapping', 1), ('here', 1), ('rising', 1), ('them,', 1), ('outcome', 1), ('productive', 1), ('lessen', 1), ('life,', 1), ('protect', 1), ('fear', 1), ('account,', 1), ('data', 1), ('humility', 1), ('violence', 1), ('people:', 1), ('create', 1), ('ask', 1), ('surest', 1), ('cost.', 1), ('minds.', 1), ('ancestors.', 1), ('live', 1), ('look,', 1), ('jobs,', 1), ('off,', 1), ('patrol', 1), ('hands', 1), ('spend', 1), ('grateful', 1), ('arlington', 1), ('job', 1), ('next', 1), ('obscure', 1), ('against', 1), ('profound,', 1), ('selflessness', 1), ('changed,', 1), ('yet', 1), ('measure', 1), ('transform', 1), ('nations.', 1), ('chapter', 1), ('leaders', 1), ('convictions.', 1), ('itself;', 1), ('indifference', 1), ('nation.', 1), ('courage.', 1), ('business', 1), ('moment,', 1), ('gather', 1), ('patriots', 1), ('prudent', 1), ('legacy.', 1), ('intend', 1), ('free', 1), ('so,', 1), ('humble', 1), ('conflict,', 1), ('waver', 1), ('state', 1), ('honor', 1), ('vital', 1), ('unfolds', 1), ('undiminished.', 1), ('christians', 1), ('instruments', 1), ('families', 1), ('remaking', 
1), ('shuttered.', 1), ('easily', 1), ('winter', 1), ('doers,', 1), ('settling', 1), ('pledge', 1), ('growth.', 1), ('towards', 1), ('stained', 1), ('minds', 1), ('it.', 1), ('simply', 1), ('whip,', 1), ('demand', 1), ('chosen', 1), ('idea', 1), ('done.', 1), ('science', 1), ('doubt,', 1), ('route', 1), ('fair', 1), ('moments,', 1), ('further', 1), ('deserts', 1), ('knew', 1), ('turn', 1), ('respect.', 1), ('ourselves,', 1), ('warming', 1), ('source', 1), ('cars', 1), ('purpose', 1), ('blame', 1), ('levees', 1), ('together.', 1), ('labor', 1), ('gross', 1), ('shed,', 1), ('swift.', 1), ('come.', 1), ('hour', 1), ('worked', 1), ('curiosity,', 1), ('sacrifices', 1), ('seize', 1), ('grudgingly', 1), ('virtue,', 1), ('prefer', 1), ('roads', 1), ('threats', 1), ('afford,', 1), ('deserve', 1), ('tribe', 1), ('strengthen', 1), ('sun', 1), ('raging', 1), ("care's", 1), ('tides', 1), ('non-believers.', 1), ('young', 1), ('shifted', 1), ('nations,', 1), ("public's", 1), ('tasted', 1), ('storm', 1), ('pursue', 1), ('define', 1), ('understood', 1), ('fist.', 1), ('petty', 1), ('sights.', 1), ('bold', 1), ('price', 1), ('generations', 1), ('filled', 1), ('ills', 1), ('months,', 1), ('tell', 1), ('rights', 1), ('programs', 1), ('president', 1), ("children's", 1), ('gladly,', 1), ('powerful', 1), ('rightful', 1), ('justness', 1), ('judge', 1), ('alliances', 1), ('forward,', 1), ('safely', 1), ('prepare', 1), ('end.', 1), ('worn-out', 1), ('values', 1), ('well+++', 1), ('reject', 1), ('soil', 1), ('deceit', 1), ('noble', 1), ('up,', 1), ('subject', 1), ('"let', 1), ('flow;', 1), ('dollars', 1), ('oceans', 1), ('forebears', 1), ('far-off', 1), ('required', 1), ('join', 1), ('short-cuts', 1), ('around', 1), ('eye,', 1), ('consume', 1), ('history.', 1), ('free,', 1), ('chance', 1), ('adversaries', 1), ('winds', 1), ('sturdy', 1), ('determination', 1), ('mindful', 1), ('mountains.', 1), ('states', 1), ('country,', 1), ('down', 1), ('relative', 1), ('play', 1), ('expand', 1), ('scale', 
1), ('faction.', 1), ('creed,', 1), ('communism', 1), ('dying', 1), ('weakness.', 1), ('enemy', 1), ('refused', 1), ('regard', 1), ('magnificent', 1), ('year', 1), ('heritage', 1), ('bestowed,', 1), ('much', 1), ('leave', 1), ('generation,', 1), ('very', 1), ('old.', 1), ('good', 1), ('ago', 1), ('plans.', 1), ('decline', 1), ('borders,', 1), ('god-given', 1), ('lash', 1), ('businesses', 1), ('already', 1), ('tolerance', 1), ('ideals.', 1), ('came', 1), ('segregation,', 1), ('badly', 1), ('spirit;', 1), ('guardians', 1), ('lead', 1), ('during', 1), ('spirit,', 1), ('slaughtering', 1), ('emerged', 1), ('grievances', 1), ('reaffirming', 1), ('citizens:', 1), ('born,', 1), ('dissent,', 1), ('happiness.', 1), ('honesty', 1), ('fought', 1), ('expanded', 1), ('grace', 1), ('great', 1), ('place,', 1), ('necessity', 1), ('patchwork', 1), ('people,', 1), ('demands', 1), ('sahn.', 1), ('costly,', 1), ('electric', 1), ('gift,', 1), ('globe', 1), ("god's", 1), ('wonders', 1), ('prosperity,', 1), ('hindus,', 1), ('assure', 1), ('ultimately', 1), ('often', 1), ('local', 1), ('midst', 1), ('bush', 1), ('equal,', 1), ('search', 1), ('wisely,', 1), ('ways', 1), ('short', 1), ('coldest', 1), ('if', 1), ('continue', 1), ('afghanistan.', 1), ('enjoy', 1), ('heart', 1), ('stranger', 1), ('outside', 1), ('history;', 1), ('nourish', 1), ('aside', 1), ('till', 1), ('office,', 1), ('interests', 1), ('recognition', 1), ('no,', 1), ('dissolve;', 1), ("parent's", 1), ('act,', 1), ('makers', 1), ('harness', 1), ('things.', 1), ('shaped', 1), ('product,', 1), ('foundation', 1), ('kindness', 1), ('been;', 1), ('all.', 1), ('unclench', 1), ('mark', 1), ('move', 1), ('timeless', 1), ('currents,', 1), ('passed.', 1), ('iraq', 1), ('full', 1), ('choices', 1), ('depth', 1), ('fallen', 1), ('wrong', 1), ('ours', 1), ('give', 1), ('concord', 1), ('other', 1), ('hardship,', 1), ('helps', 1), ('khe', 1), ('weakened,', 1), ('man,', 1), ('good.', 1), ('sweatshops,', 1), ('remains', 1), ('earlier', 1), 
('we,', 1), ('accept,', 1), ('suffering', 1), ('skill', 1), ('borne', 1), ('shores', 1), ("firefighter's", 1), ('united', 1), ('patriotism', 1), ('cut', 1), ('opportunity', 1), ('mall;', 1), ('dogmas', 1), ('someday', 1), ('foes,', 1), ('ability', 1), ('fear,', 1), ('forward', 1), ('uncertain', 1), ('instead', 1), ('possessions', 1), ('hatreds', 1), ('shape', 1), ('evidence', 1), ('held', 1), ('small,', 1), ('finally', 1), ('settled', 1), ('governments', 1), ('bridges,', 1), ('effort,', 1), ('habits,', 1), ('years', 1), ('domestic', 1), ('oath', 1), ('read', 1), ('please.', 1), ('fixed', 1), ('60', 1), ('homes', 1), ('died', 1), ('river.', 1), ('responsibly', 1), ('pick', 1), ('capacity', 1), ('horizon', 1), ('country', 1), ('outlast', 1), ('bitter', 1), ('tanks,', 1), ('imagine,', 1), ('celebration', 1), ('hand', 1), ('delivered', 1), ('resources', 1), ('then,', 1), ('qualities', 1), ('goods', 1), ('colleges', 1), ('less.', 1), ('lay', 1), ('given.', 1), ('snow', 1), ('real.', 1), ('break,', 1), ('imagination', 1), ('off', 1), ('race', 1), ('village', 1), ('childish', 1), ('control.', 1), ('gathering', 1), ('commerce', 1), ('defense.', 1), ('struggled', 1), ('rugged', 1), ('alarmed', 1), ('run', 1), ("technology's", 1), ('pass;', 1), ('retirement', 1), ('understood.', 1), ('soon', 1), ('farms', 1), ('pleasures', 1), ('quality', 1), ('defense,', 1), ('high', 1), ('measurable,', 1), ('aims', 1)]
>>>
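The output above counts 'us' and 'us,' as different words, and keeps tokens such as '--' and '(applause.)', because splitting on whitespace leaves punctuation attached. A minimal sketch of one way to tidy this up, assuming it is acceptable to strip punctuation from both ends of each token; the function name `count_words` and the sample sentence are made up for illustration and are not part of the original session:

```python
import string
from collections import Counter

def count_words(text):
    """Count lowercase words after stripping surrounding punctuation."""
    counts = Counter()
    for token in text.lower().split():
        # Strip punctuation from both ends so 'us,' and 'us' share one key.
        word = token.strip(string.punctuation)
        if word:  # skip tokens that were pure punctuation, such as '--'
            counts[word] += 1
    return counts

# Hypothetical sample text, not the speech used in the session above.
sample = "Thank you. Thank you, all of us, for us -- and for the nation."
counts = count_words(sample)
print(counts.most_common(3))
```

`Counter.most_common` also replaces the `sorted(..., key=operator.itemgetter(1), reverse=True)` step, so no separate `import operator` is needed.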

2. Compare the strengths, weaknesses, and suitable scenarios of programming under different computing frameworks.

–Python

–MapReduce

–Hive

–Spark

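MapReduce consists, at its core, of two phases: Map and Reduce. Map applies when each element of the data needs a one-to-one transformation, such as truncating, filtering, or any other conversion; these element-wise transformations are what is called Map. Reduce is mainly about aggregation, combining many elements into one, for example computing a sum.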
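The two phases can be illustrated on a single machine with Python's built-in `map` and `functools.reduce`. This is only a sketch of the idea, not of Hadoop's distributed implementation, and the sample `lines` are invented for the example:

```python
from functools import reduce

# Hypothetical input records.
lines = ["Spark is fast", "Hive is SQL", "MapReduce is batch"]

# Map: element-wise transformations, one output element per input element.
lowered = list(map(str.lower, lines))   # convert each line to lowercase
lengths = list(map(len, lines))         # measure each line

# Reduce: many-to-one aggregation, folding all elements into a single value.
total_chars = reduce(lambda a, b: a + b, lengths)

print(lowered)
print(lengths, total_chars)
```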

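MapReduce is the core of Hadoop 1.0, and Spark has been gradually replacing it since it appeared. So why is MapReduce still in use? Because many existing applications still depend on it: it does not stand alone, and has become an irreplaceable part of other ecosystem tools such as Pig and Hive.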

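Although MapReduce greatly simplified big-data analysis, user demands have grown as big-data needs and usage patterns have expanded:

1. More complex multi-pass processing (for example iterative computation, ML, and graph processing);

2. Low-latency interactive queries (for example ad-hoc queries).

The architecture of the MapReduce computing model makes both kinds of application inherently slow, so users urgently need a faster computing model to make up for MapReduce's built-in shortcomings.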

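Spark emerged to fill these gaps. Some of its advantages:

1. Instead of scheduling every job independently, Spark can arrange all jobs into a single graph; jobs that depend on one another are scheduled together, which is fast.

2. Processing is done in memory wherever possible, so Spark is commonly described as an in-memory iterative computing framework.

3. Spark provides a richer set of operators, which makes operations more convenient.

4. An easier API, with support for Python, Scala, and Java.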

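Spark can in fact express MapReduce as well, but here MapReduce is not an algorithm in itself: Spark simply provides a map stage and a reduce stage, with many operators available in each. Examples include map, flatMap, filter, and keyBy in the map stage, and reduceByKey, sortByKey, mean, groupBy, and sort in the reduce stage.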
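To show how the map-stage and reduce-stage operators fit together for word counting, here is a rough local sketch. It does not use Spark: `flat_map` and `reduce_by_key` are hypothetical plain-Python stand-ins that mimic the semantics of `RDD.flatMap` and `RDD.reduceByKey` on an ordinary list, so the example runs without a cluster. The sample `lines` are also invented.

```python
from itertools import chain

def flat_map(func, items):
    """Apply func to each item and flatten the results (mimics RDD.flatMap)."""
    return list(chain.from_iterable(func(x) for x in items))

def reduce_by_key(func, pairs):
    """Merge the values of each key with func (mimics RDD.reduceByKey)."""
    merged = {}
    for key, value in pairs:
        merged[key] = func(merged[key], value) if key in merged else value
    return list(merged.items())

# Hypothetical input lines.
lines = ["to be or not to be", "to do is to be"]

# Map stage: split lines into words, then pair each word with a count of 1.
words = flat_map(lambda line: line.lower().split(), lines)
pairs = [(w, 1) for w in words]

# Reduce stage: sum the counts per word, then sort by count, descending.
counts = reduce_by_key(lambda a, b: a + b, pairs)
ranked = sorted(counts, key=lambda kv: kv[1], reverse=True)
print(ranked)

# With a SparkContext `sc` available, the corresponding PySpark chain would
# be roughly:
#   sc.parallelize(lines).flatMap(lambda line: line.lower().split()) \
#     .map(lambda w: (w, 1)).reduceByKey(lambda a, b: a + b) \
#     .sortBy(lambda kv: kv[1], ascending=False).collect()
```

The stand-ins process everything in one pass on one machine; in Spark the same chain is evaluated lazily and split across partitions.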

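Hive is arguably the de facto standard for big-data data warehousing. Hive can access data on HDFS and HBase, and Impala, Spark SQL, and Presto can all read data from warehouses built with Hive. In general, Hive is still used for batch jobs, while Impala, Spark SQL, or Presto are widely used for hot queries that feed data displays.

Hive provides three access interfaces: CLI, Web UI, and HiveServer2.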

posted @ 2021-04-22 21:24  sevenven