Task 02: Paper Author Statistics

https://github.com/datawhalechina/team-learning-data-mining/tree/master/AcademicTrends

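Task description

  • Task topic: paper author statistics; find the 10 author names that appear most frequently across all papers.
  • Task content: count paper authors, read the data with Pandas, and apply string operations.
  • Task outcome: learn Pandas string operations.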

Code

Import modules

import seaborn as sns
from bs4 import BeautifulSoup
import re
import requests
import json
import pandas as pd
import matplotlib.pyplot as plt
from tqdm import tqdm
from ast import literal_eval

Read the data from the previously saved CSV file

data=pd.read_csv("data.csv")
data.head()
D:\anaconda3\lib\site-packages\IPython\core\interactiveshell.py:3146: DtypeWarning: Columns (0) have mixed types.Specify dtype option on import or set low_memory=False.
  has_raised = await self.run_ast_nodes(code_ast.body, cell_name,
id submitter authors title comments journal-ref doi report-no categories license abstract versions update_date authors_parsed
0 704 Pavel Nadolsky C. Bal\'azs, E. L. Berger, P. M. Nadolsky, C.-... Calculation of prompt diphoton production cros... 37 pages, 15 figures; published version Phys.Rev.D76:013009,2007 10.1103/PhysRevD.76.013009 ANL-HEP-PR-07-12 hep-ph NaN A fully differential calculation in perturba... [{'version': 'v1', 'created': 'Mon, 2 Apr 2007... 2008-11-26 [['Balázs', 'C.', ''], ['Berger', 'E. L.', '']...
1 704 Louis Theran Ileana Streinu and Louis Theran Sparsity-certifying Graph Decompositions To appear in Graphs and Combinatorics NaN NaN NaN math.CO cs.CG http://arxiv.org/licenses/nonexclusive-distrib... We describe a new algorithm, the $(k,\ell)$-... [{'version': 'v1', 'created': 'Sat, 31 Mar 200... 2008-12-13 [['Streinu', 'Ileana', ''], ['Theran', 'Louis'...
2 704 Hongjun Pan Hongjun Pan The evolution of the Earth-Moon system based o... 23 pages, 3 figures NaN NaN NaN physics.gen-ph NaN The evolution of Earth-Moon system is descri... [{'version': 'v1', 'created': 'Sun, 1 Apr 2007... 2008-01-13 [['Pan', 'Hongjun', '']]
3 704 David Callan David Callan A determinant of Stirling cycle numbers counts... 11 pages NaN NaN NaN math.CO NaN We show that a determinant of Stirling cycle... [{'version': 'v1', 'created': 'Sat, 31 Mar 200... 2007-05-23 [['Callan', 'David', '']]
4 704 Alberto Torchinsky Wael Abu-Shammala and Alberto Torchinsky From dyadic $\Lambda_{\alpha}$ to $\Lambda_{\a... NaN Illinois J. Math. 52 (2008) no.2, 681-689 NaN NaN math.CA math.FA NaN In this paper we show how to compute the $\L... [{'version': 'v1', 'created': 'Mon, 2 Apr 2007... 2013-10-15 [['Abu-Shammala', 'Wael', ''], ['Torchinsky', ...
data.shape
(1796911, 14)
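The `DtypeWarning` printed while loading `data.csv` refers to column 0, which appears to be `id`, and the `id` values displayed as 704 and 704.005 suggest that pandas parsed the identifiers as floats. A minimal sketch of one way to avoid this, assuming the identifiers should stay as text: pass `dtype={"id": str}` to `pd.read_csv`. The two-row sample below is invented for illustration and stands in for the real file.

```python
import io
import pandas as pd

# Hypothetical two-row sample standing in for data.csv (not real data)
sample_csv = io.StringIO(
    "id,title\n"
    "0704.0050,Paper A\n"
    "0704.0300,Paper B\n"
)

# Default type inference reads the identifiers as floating-point numbers,
# which drops the leading and trailing zeros
inferred = pd.read_csv(sample_csv)

# Rewind the buffer and force the column to str to keep the text as written
sample_csv.seek(0)
as_text = pd.read_csv(sample_csv, dtype={"id": str})

print(inferred["id"].tolist())
print(as_text["id"].tolist())
```

With the real file the equivalent call would be `pd.read_csv("data.csv", dtype={"id": str})`; setting `low_memory=False`, as the warning suggests, silences the message but still lets pandas infer the type.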
# Select the papers in the AI category (cs.AI)
data2 = data[data['categories'].apply(lambda x: 'cs.AI' in x)]
data2.head()
id submitter authors title comments journal-ref doi report-no categories license abstract versions update_date authors_parsed
46 704.005 Igor Grabec T. Kosel and I. Grabec Intelligent location of simultaneously active ... 5 pages, 5 eps figures, uses IEEEtran.cls NaN NaN NaN cs.NE cs.AI NaN The intelligent acoustic emission locator is... [{'version': 'v1', 'created': 'Sun, 1 Apr 2007... 2009-09-29 [['Kosel', 'T.', ''], ['Grabec', 'I.', '']]
49 704.005 Igor Grabec T. Kosel and I. Grabec Intelligent location of simultaneously active ... 5 pages, 7 eps figures, uses IEEEtran.cls NaN NaN NaN cs.NE cs.AI NaN Part I describes an intelligent acoustic emi... [{'version': 'v1', 'created': 'Sun, 1 Apr 2007... 2007-05-23 [['Kosel', 'T.', ''], ['Grabec', 'I.', '']]
303 704.03 Carlos Gershenson Carlos Gershenson The World as Evolving Information 16 pages. Extended version, three more laws of... Minai, A., Braha, D., and Bar-Yam, Y., eds. Un... 10.1007/978-3-642-18003-3_10 NaN cs.IT cs.AI math.IT q-bio.PE http://arxiv.org/licenses/nonexclusive-distrib... This paper discusses the benefits of describ... [{'version': 'v1', 'created': 'Tue, 3 Apr 2007... 2013-04-05 [['Gershenson', 'Carlos', '']]
984 704.098 Mohd Abubakr Mohd Abubakr, R.M.Vinay Architecture for Pseudo Acausal Evolvable Embe... 4 pages, 2 figures. Submitted to SASO 2007 NaN NaN NaN cs.NE cs.AI NaN Advances in semiconductor technology are con... [{'version': 'v1', 'created': 'Sat, 7 Apr 2007... 2007-05-23 [['Abubakr', 'Mohd', ''], ['Vinay', 'R. M.', '']]
1027 704.103 Jianlin Cheng Jianlin Cheng A neural network approach to ordinal regression 8 pages NaN NaN NaN cs.LG cs.AI cs.NE NaN Ordinal regression is an important type of l... [{'version': 'v1', 'created': 'Sun, 8 Apr 2007... 2007-05-23 [['Cheng', 'Jianlin', '']]
len(data2)
28061
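Since the task is about Pandas string operations, the same filter can also be sketched with the `.str` accessor. The example below splits each `categories` string on whitespace and tests for an exact category match, which avoids relying on substring matching. The three-row frame and the names `papers` and `has_ai` are made up for illustration.

```python
import pandas as pd

# Hypothetical sample; the real frame has one space-separated string per paper
papers = pd.DataFrame({
    "title": ["A", "B", "C"],
    "categories": ["cs.NE cs.AI", "math.CO", "cs.AI"],
})

# fillna("") guards against missing values before splitting on whitespace;
# the lambda then checks for an exact match against the category list
has_ai = (
    papers["categories"]
    .fillna("")
    .str.split()
    .apply(lambda cats: "cs.AI" in cats)
)
ai_papers = papers[has_ai]

print(ai_papers["title"].tolist())
```

For `cs.AI` the substring test in the original code should give the same rows; the split-based version mainly matters for category names that are prefixes of other names.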
data2.head(20)
id submitter authors title comments journal-ref doi report-no categories license abstract versions update_date authors_parsed
46 704.005 Igor Grabec T. Kosel and I. Grabec Intelligent location of simultaneously active ... 5 pages, 5 eps figures, uses IEEEtran.cls NaN NaN NaN cs.NE cs.AI NaN The intelligent acoustic emission locator is... [{'version': 'v1', 'created': 'Sun, 1 Apr 2007... 2009-09-29 [['Kosel', 'T.', ''], ['Grabec', 'I.', '']]
49 704.005 Igor Grabec T. Kosel and I. Grabec Intelligent location of simultaneously active ... 5 pages, 7 eps figures, uses IEEEtran.cls NaN NaN NaN cs.NE cs.AI NaN Part I describes an intelligent acoustic emi... [{'version': 'v1', 'created': 'Sun, 1 Apr 2007... 2007-05-23 [['Kosel', 'T.', ''], ['Grabec', 'I.', '']]
303 704.03 Carlos Gershenson Carlos Gershenson The World as Evolving Information 16 pages. Extended version, three more laws of... Minai, A., Braha, D., and Bar-Yam, Y., eds. Un... 10.1007/978-3-642-18003-3_10 NaN cs.IT cs.AI math.IT q-bio.PE http://arxiv.org/licenses/nonexclusive-distrib... This paper discusses the benefits of describ... [{'version': 'v1', 'created': 'Tue, 3 Apr 2007... 2013-04-05 [['Gershenson', 'Carlos', '']]
984 704.098 Mohd Abubakr Mohd Abubakr, R.M.Vinay Architecture for Pseudo Acausal Evolvable Embe... 4 pages, 2 figures. Submitted to SASO 2007 NaN NaN NaN cs.NE cs.AI NaN Advances in semiconductor technology are con... [{'version': 'v1', 'created': 'Sat, 7 Apr 2007... 2007-05-23 [['Abubakr', 'Mohd', ''], ['Vinay', 'R. M.', '']]
1027 704.103 Jianlin Cheng Jianlin Cheng A neural network approach to ordinal regression 8 pages NaN NaN NaN cs.LG cs.AI cs.NE NaN Ordinal regression is an important type of l... [{'version': 'v1', 'created': 'Sun, 8 Apr 2007... 2007-05-23 [['Cheng', 'Jianlin', '']]
1393 704.139 Tarik Had\v{z}i\'c Tarik Hadzic, Rune Moller Jensen, Henrik Reif ... Calculating Valid Domains for BDD-Based Intera... NaN NaN NaN NaN cs.AI NaN In these notes we formally describe the func... [{'version': 'v1', 'created': 'Wed, 11 Apr 200... 2007-05-23 [['Hadzic', 'Tarik', ''], ['Jensen', 'Rune Mol...
1408 704.141 Yao Hengshuai Yao HengShuai Preconditioned Temporal Difference Learning This paper has been withdrawn by the author. L... NaN NaN NaN cs.LG cs.AI NaN This paper has been withdrawn by the author.... [{'version': 'v1', 'created': 'Wed, 11 Apr 200... 2012-06-11 [['HengShuai', 'Yao', '']]
1674 704.168 Kristina Lerman Anon Plangprasopchok and Kristina Lerman Exploiting Social Annotation for Automatic Res... 6 pages, submitted to AAAI07 workshop on Infor... NaN NaN NaN cs.AI cs.CY cs.DL NaN Information integration applications, such a... [{'version': 'v1', 'created': 'Thu, 12 Apr 200... 2016-09-08 [['Plangprasopchok', 'Anon', ''], ['Lerman', '...
1675 704.168 Kristina Lerman Kristina Lerman, Anon Plangprasopchok and Chio... Personalizing Image Search Results on Flickr 12 pages, submitted to AAAI07 workshop on Inte... NaN NaN NaN cs.IR cs.AI cs.CY cs.DL cs.HC NaN The social media site Flickr allows users to... [{'version': 'v1', 'created': 'Thu, 12 Apr 200... 2007-05-23 [['Lerman', 'Kristina', ''], ['Plangprasopchok...
1782 704.178 Francesco Santini Stefano Bistarelli, Ugo Montanari, Francesca R... Unicast and Multicast Qos Routing with Soft Co... 45 pages NaN NaN NaN cs.LO cs.AI cs.NI NaN We present a formal model to represent and s... [{'version': 'v1', 'created': 'Fri, 13 Apr 200... 2009-09-29 [['Bistarelli', 'Stefano', ''], ['Montanari', ...
2009 704.201 Juliana Bernardes Juliana S Bernardes, Alberto Davila, Vitor San... A study of structural properties on profiles HMMs 6 pages, 7 figures NaN NaN NaN cs.AI http://arxiv.org/licenses/nonexclusive-distrib... Motivation: Profile hidden Markov Models (pH... [{'version': 'v1', 'created': 'Mon, 16 Apr 200... 2008-12-11 [['Bernardes', 'Juliana S', ''], ['Davila', 'A...
2082 704.208 Hassan Satori H. Satori, M. Harti and N. Chenfour Introduction to Arabic Speech Recognition Usin... 4 pages, 3 figures and 2 tables, was in Inform... NaN NaN NaN cs.CL cs.AI NaN In this paper Arabic was investigated from t... [{'version': 'v1', 'created': 'Tue, 17 Apr 200... 2007-05-23 [['Satori', 'H.', ''], ['Harti', 'M.', ''], ['...
2200 704.22 Hassan Satori H. Satori, M. Harti and N. Chenfour Arabic Speech Recognition System using CMU-Sph... 5 pages, 3 figures and 2 tables, in French NaN NaN NaN cs.CL cs.AI NaN In this paper we present the creation of an ... [{'version': 'v1', 'created': 'Tue, 17 Apr 200... 2007-05-23 [['Satori', 'H.', ''], ['Harti', 'M.', ''], ['...
3156 704.316 Giorgio Terracina Giorgio Terracina, Nicola Leone, Vincenzino Li... Experimenting with recursive queries in databa... To appear in Theory and Practice of Logic Prog... NaN NaN NaN cs.AI cs.DB NaN This paper considers the problem of reasonin... [{'version': 'v1', 'created': 'Tue, 24 Apr 200... 2007-05-23 [['Terracina', 'Giorgio', ''], ['Leone', 'Nico...
3358 704.336 Alex Smola J Quoc Le and Alexander Smola Direct Optimization of Ranking Measures NaN NaN NaN NaN cs.IR cs.AI NaN Web page ranking and collaborative filtering... [{'version': 'v1', 'created': 'Wed, 25 Apr 200... 2007-05-23 [['Le', 'Quoc', ''], ['Smola', 'Alexander', '']]
3394 704.34 Marko A. Rodriguez Marko A. Rodriguez General-Purpose Computing on a Semantic Networ... NaN Emergent Web Intelligence: Advanced Semantic T... NaN LA-UR-07-2885 cs.AI cs.PL http://creativecommons.org/licenses/publicdomain/ This article presents a model of general-pur... [{'version': 'v1', 'created': 'Wed, 25 Apr 200... 2010-06-08 [['Rodriguez', 'Marko A.', '']]
3432 704.343 Tshilidzi Marwala Tshilidzi Marwala and Bodie Crossingham Bayesian approach to rough set 20 pages, 3 figures NaN NaN NaN cs.AI NaN This paper proposes an approach to training ... [{'version': 'v1', 'created': 'Wed, 25 Apr 200... 2007-05-23 [['Marwala', 'Tshilidzi', ''], ['Crossingham',...
3452 704.345 Tshilidzi Marwala S. Mohamed, D. Rubin, and T. Marwala An Adaptive Strategy for the Classification of... 9 pages, 5 tables, 3 figures NaN NaN NaN cs.AI q-bio.QM NaN One of the major problems in computational b... [{'version': 'v1', 'created': 'Wed, 25 Apr 200... 2007-06-25 [['Mohamed', 'S.', ''], ['Rubin', 'D.', ''], [...
3514 704.351 Jegor Uglov Mr J. Uglov, V. Schetinin, C. Maple Comparing Robustness of Pairwise and Multiclas... NaN NaN 10.1155/2008/468693 NaN cs.AI NaN Noise, corruptions and variations in face im... [{'version': 'v1', 'created': 'Thu, 26 Apr 200... 2016-02-17 [['Uglov', 'J.', ''], ['Schetinin', 'V.', ''],...
3885 704.389 W Saba Walid S. Saba A Note on Ontology and Ordinary Language 19 pages, 1 figure NaN NaN NaN cs.AI cs.CL NaN We argue for a compositional semantics groun... [{'version': 'v1', 'created': 'Mon, 30 Apr 200... 2007-05-23 [['Saba', 'Walid S.', '']]
# literal_eval from the ast module converts a list stored as a string back into a list
tmplist=literal_eval(data2.authors_parsed.iloc[5])
tmplist
[['Hadzic', 'Tarik', ''],
 ['Jensen', 'Rune Moller', ''],
 ['Andersen', 'Henrik Reif', '']]
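A short sketch of why `literal_eval` is a reasonable choice here: it parses only Python literals (lists, strings, numbers, and so on) and rejects anything that would execute code. The string `raw` below is a shortened copy of the value shown above; the rejected expression is a made-up example.

```python
from ast import literal_eval

# The CSV stores the nested list as plain text
raw = "[['Hadzic', 'Tarik', ''], ['Jensen', 'Rune Moller', '']]"
parsed = literal_eval(raw)

print(type(raw).__name__, "->", type(parsed).__name__)
print(parsed[1][0])  # last name of the second author

# Anything that is not a literal (here, a function call) is refused
rejected = False
try:
    literal_eval("__import__('os').getcwd()")
except (ValueError, SyntaxError):
    rejected = True
print("expression rejected:", rejected)
```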
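# Concatenate the authors of all papers into one list
all_authors = []
for parsed in data2.authors_parsed:
    all_authors.extend(literal_eval(parsed))
# Join the name parts; strip() removes the trailing space left by an empty
# suffix without cutting a character off a non-empty suffix
authors_names = [' '.join(x).strip() for x in all_authors]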
authors_names[0:5]
['Kosel T.',
 'Grabec I.',
 'Kosel T.',
 'Grabec I.',
 'Gershenson Carlos']
authors_names = pd.DataFrame(authors_names)
authors_names.head()
0
0 Kosel T.
1 Grabec I.
2 Kosel T.
3 Grabec I.
4 Gershenson Carlos
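The loop above can also be written as a chain of pandas operations. The following is a minimal sketch under the assumption that every entry of `authors_parsed` is a string holding a list of `[last, first, suffix]` lists; the three sample rows are copied from the output shown earlier, and the names `sample`, `author_series`, and `top` are introduced only for this example.

```python
from ast import literal_eval
import pandas as pd

# Small sample mimicking the authors_parsed column
sample = pd.DataFrame({
    "authors_parsed": [
        "[['Kosel', 'T.', ''], ['Grabec', 'I.', '']]",
        "[['Kosel', 'T.', ''], ['Grabec', 'I.', '']]",
        "[['Gershenson', 'Carlos', '']]",
    ]
})

author_series = (
    sample["authors_parsed"]
    .apply(literal_eval)   # string -> list of [last, first, suffix]
    .explode()             # one row per author
    .apply(lambda parts: " ".join(p for p in parts if p))  # skip empty parts
)

# Frequency of each full name, most frequent first
top = author_series.value_counts()
print(top.head(10))
```

Skipping empty parts before joining avoids both trailing spaces and the need to slice off the last character.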
# Plot a horizontal bar chart of the 10 most frequent authors
plt.figure(figsize=(10, 6))
authors_names[0].value_counts().head(10).plot(kind='barh')
plt.ylabel('Author')
plt.xlabel('Count')
Text(0.5, 0, 'Count')



# Count last names (surnames)
authors_lastnames = [x[0] for x in all_authors]
authors_lastnames = pd.DataFrame(authors_lastnames)

plt.figure(figsize=(10, 6))
authors_lastnames[0].value_counts().head(10).plot(kind='barh')
plt.ylabel('Last name')
plt.xlabel('Count')
Text(0.5, 0, 'Count')


authors_lastnames.value_counts()
Wang        1591
Zhang       1561
Li          1395
Liu         1220
Chen        1126
            ... 
Merayo         1
Mercat         1
Mercer         1
Merchant       1
'Baya          1
Length: 26326, dtype: int64
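For a single column of names, calling `value_counts` on a Series is usually enough, and `normalize=True` returns relative frequencies instead of raw counts. The sketch below uses a small invented list of surnames rather than the real data.

```python
import pandas as pd

# Invented sample of last names, for illustration only
last_names = pd.Series(
    ["Wang", "Zhang", "Wang", "Li", "Wang", "Zhang"], name="last_name"
)

# Counts are sorted in descending order by default
counts = last_names.value_counts()
top2 = counts.head(2)

# Share of each surname among all author entries
share = last_names.value_counts(normalize=True)

print(top2)
print(share.round(3))
```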

posted @ 2021-01-17 00:24  Zfancy