task01: Paper Data Statistics
https://github.com/datawhalechina/team-learning-data-mining/tree/master/AcademicTrends
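Task description
- Task topic: paper count statistics, i.e. counting the number of papers in each computer science area over the whole of 2019;
- Task content: understanding the competition task, reading the data with Pandas, and computing statistics;
- Task outcome: learning the basic operations of Pandas;
Dataset introduction
- Dataset source: dataset link;
- The dataset format is as follows:
  - id: arXiv ID, which can be used to access the paper;
  - submitter: the person who submitted the paper;
  - authors: the paper's authors;
  - title: the paper's title;
  - comments: page count, figures, and other information about the paper;
  - journal-ref: information about the journal in which the paper was published;
  - doi: digital object identifier, https://www.doi.org;
  - report-no: report number;
  - categories: the categories or tags the paper belongs to in the arXiv system;
  - license: the license of the article;
  - abstract: the paper's abstract;
  - versions: the paper's versions;
  - authors_parsed: parsed author information.
- Dataset example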
"root":{
"id":string"0704.0001"
"submitter":string"Pavel Nadolsky"
"authors":string"C. Bal\'azs, E. L. Berger, P. M. Nadolsky, C.-P. Yuan"
"title":string"Calculation of prompt diphoton production cross sections at Tevatron and LHC energies"
"comments":string"37 pages, 15 figures; published version"
"journal-ref":string"Phys.Rev.D76:013009,2007"
"doi":string"10.1103/PhysRevD.76.013009"
"report-no":string"ANL-HEP-PR-07-12"
"categories":string"hep-ph"
"license":NULL
"abstract":string" A fully differential calculation in perturbative quantum chromodynamics is presented for the production of massive photon pairs at hadron colliders. All next-to-leading order perturbative contributions from quark-antiquark, gluon-(anti)quark, and gluon-gluon subprocesses are included, as well as all-orders resummation of initial-state gluon radiation valid at next-to-next-to leading logarithmic accuracy. The region of phase space is specified in which the calculation is most reliable. Good agreement is demonstrated with data from the Fermilab Tevatron, and predictions are made for more detailed tests with CDF and DO data. Predictions are shown for distributions of diphoton pairs produced at the energy of the Large Hadron Collider (LHC). Distributions of the diphoton pairs from the decay of a Higgs boson are contrasted with those produced from QCD processes at the LHC, showing that enhanced sensitivity to the signal can be obtained with judicious selection of events."
"versions":[
0:{
"version":string"v1"
"created":string"Mon, 2 Apr 2007 19:18:42 GMT"
}
1:{
"version":string"v2"
"created":string"Tue, 24 Jul 2007 20:10:27 GMT"
}]
"update_date":string"2008-11-26"
"authors_parsed":[
0:[
0:string"Balázs"
1:string"C."
2:string""]
1:[
0:string"Berger"
1:string"E. L."
2:string""]
2:[
0:string"Nadolsky"
1:string"P. M."
2:string""]
3:[
0:string"Yuan"
1:string"C. -P."
2:string""]]
}
Introduction to arXiv paper categories
Code
First, import the modules
import seaborn as sns
from bs4 import BeautifulSoup
import re
import requests
import json
import pandas as pd
import matplotlib.pyplot as plt
from tqdm import tqdm
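Load the data from the JSON file and save it as a DataFrame
data = []
with open("arxiv-metadata-oai-snapshot.json",'r') as f:
    first=True
    for line in tqdm(f):
        # Each line of the file is one JSON record
        data.append(json.loads(line))
        # Print only the first record to inspect its structure
        if first:
            print(json.loads(line))
            first=False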
3987it [00:00, 39567.97it/s]
{'id': '0704.0001', 'submitter': 'Pavel Nadolsky', 'authors': "C. Bal\\'azs, E. L. Berger, P. M. Nadolsky, C.-P. Yuan", 'title': 'Calculation of prompt diphoton production cross sections at Tevatron and\n LHC energies', 'comments': '37 pages, 15 figures; published version', 'journal-ref': 'Phys.Rev.D76:013009,2007', 'doi': '10.1103/PhysRevD.76.013009', 'report-no': 'ANL-HEP-PR-07-12', 'categories': 'hep-ph', 'license': None, 'abstract': ' A fully differential calculation in perturbative quantum chromodynamics is\npresented for the production of massive photon pairs at hadron colliders. All\nnext-to-leading order perturbative contributions from quark-antiquark,\ngluon-(anti)quark, and gluon-gluon subprocesses are included, as well as\nall-orders resummation of initial-state gluon radiation valid at\nnext-to-next-to-leading logarithmic accuracy. The region of phase space is\nspecified in which the calculation is most reliable. Good agreement is\ndemonstrated with data from the Fermilab Tevatron, and predictions are made for\nmore detailed tests with CDF and DO data. Predictions are shown for\ndistributions of diphoton pairs produced at the energy of the Large Hadron\nCollider (LHC). Distributions of the diphoton pairs from the decay of a Higgs\nboson are contrasted with those produced from QCD processes at the LHC, showing\nthat enhanced sensitivity to the signal can be obtained with judicious\nselection of events.\n', 'versions': [{'version': 'v1', 'created': 'Mon, 2 Apr 2007 19:18:42 GMT'}, {'version': 'v2', 'created': 'Tue, 24 Jul 2007 20:10:27 GMT'}], 'update_date': '2008-11-26', 'authors_parsed': [['Balázs', 'C.', ''], ['Berger', 'E. L.', ''], ['Nadolsky', 'P. M.', ''], ['Yuan', 'C. -P.', '']]}
1796911it [05:07, 5838.67it/s]
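The load above took about five minutes for 1,796,911 records, and every record carries all 14 fields. A minimal sketch of one possible alternative, assuming only a few fields are needed for the later statistics: keep just those keys while streaming the file. The helper name load_selected, the KEEP tuple, and the two-line demo file are illustrative assumptions, not part of the original notebook.

```python
import json
import os
import tempfile

# Hypothetical subset of fields; adjust to whatever the later analysis needs.
KEEP = ("id", "categories", "update_date")

def load_selected(path, keep=KEEP, max_rows=None):
    """Read a JSON-lines file, keeping only the keys listed in `keep`."""
    rows = []
    with open(path, "r", encoding="utf-8") as f:
        for i, line in enumerate(f):
            if max_rows is not None and i >= max_rows:
                break
            record = json.loads(line)
            rows.append({k: record.get(k) for k in keep})
    return rows

# Tiny demo file standing in for arxiv-metadata-oai-snapshot.json.
demo_records = [
    {"id": "0704.0001", "categories": "hep-ph",
     "update_date": "2008-11-26", "abstract": "..."},
    {"id": "0704.0002", "categories": "math.CO cs.CG",
     "update_date": "2008-12-13", "abstract": "..."},
]
with tempfile.NamedTemporaryFile("w", suffix=".json", delete=False,
                                 encoding="utf-8") as tmp:
    for rec in demo_records:
        tmp.write(json.dumps(rec) + "\n")
    demo_path = tmp.name

rows = load_selected(demo_path)
os.remove(demo_path)
print(rows)
```

The resulting list of dictionaries can be passed to pd.DataFrame in the same way as the full list.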
data=pd.DataFrame(data)
data.to_csv("data.csv",index=False)
print(data.shape)
print(data.columns)
data.head()
(1796911, 14)
Index(['id', 'submitter', 'authors', 'title', 'comments', 'journal-ref', 'doi',
'report-no', 'categories', 'license', 'abstract', 'versions',
'update_date', 'authors_parsed'],
dtype='object')
|  | id | submitter | authors | title | comments | journal-ref | doi | report-no | categories | license | abstract | versions | update_date | authors_parsed |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | 0704.0001 | Pavel Nadolsky | C. Bal\'azs, E. L. Berger, P. M. Nadolsky, C.-... | Calculation of prompt diphoton production cros... | 37 pages, 15 figures; published version | Phys.Rev.D76:013009,2007 | 10.1103/PhysRevD.76.013009 | ANL-HEP-PR-07-12 | hep-ph | None | A fully differential calculation in perturba... | [{'version': 'v1', 'created': 'Mon, 2 Apr 2007... | 2008-11-26 | [[Balázs, C., ], [Berger, E. L., ], [Nadolsky,... |
| 1 | 0704.0002 | Louis Theran | Ileana Streinu and Louis Theran | Sparsity-certifying Graph Decompositions | To appear in Graphs and Combinatorics | None | None | None | math.CO cs.CG | http://arxiv.org/licenses/nonexclusive-distrib... | We describe a new algorithm, the $(k,\ell)$-... | [{'version': 'v1', 'created': 'Sat, 31 Mar 200... | 2008-12-13 | [[Streinu, Ileana, ], [Theran, Louis, ]] |
| 2 | 0704.0003 | Hongjun Pan | Hongjun Pan | The evolution of the Earth-Moon system based o... | 23 pages, 3 figures | None | None | None | physics.gen-ph | None | The evolution of Earth-Moon system is descri... | [{'version': 'v1', 'created': 'Sun, 1 Apr 2007... | 2008-01-13 | [[Pan, Hongjun, ]] |
| 3 | 0704.0004 | David Callan | David Callan | A determinant of Stirling cycle numbers counts... | 11 pages | None | None | None | math.CO | None | We show that a determinant of Stirling cycle... | [{'version': 'v1', 'created': 'Sat, 31 Mar 200... | 2007-05-23 | [[Callan, David, ]] |
| 4 | 0704.0005 | Alberto Torchinsky | Wael Abu-Shammala and Alberto Torchinsky | From dyadic $\Lambda_{\alpha}$ to $\Lambda_{\a... | None | Illinois J. Math. 52 (2008) no.2, 681-689 | None | None | math.CA math.FA | None | In this paper we show how to compute the $\L... | [{'version': 'v1', 'created': 'Mon, 2 Apr 2007... | 2013-10-15 | [[Abu-Shammala, Wael, ], [Torchinsky, Alberto, ]] |
Inspect the categories column of data
data["categories"].describe()
count 395123
unique 28321
top cs.CV
freq 12076
Name: categories, dtype: object
data["categories"].head(20)
0 hep-ph
1 math.CO cs.CG
2 physics.gen-ph
3 math.CO
4 math.CA math.FA
5 cond-mat.mes-hall
6 gr-qc
7 cond-mat.mtrl-sci
8 astro-ph
9 math.CO
10 math.NT math.AG
11 math.NT
12 math.NT
13 math.CA math.AT
14 hep-th
15 hep-ph
16 astro-ph
17 hep-th
18 math.PR math.AG
19 hep-ex
Name: categories, dtype: object
unique_categories = set([i for l in [x.split(' ') for x in data["categories"]] for i in l])
print(len(unique_categories))
unique_categories
176
{'acc-phys',
'adap-org',
'alg-geom',
'ao-sci',
'astro-ph',
'astro-ph.CO',
'astro-ph.EP',
'astro-ph.GA',
'astro-ph.HE',
'astro-ph.IM',
'astro-ph.SR',
'atom-ph',
'bayes-an',
'chao-dyn',
'chem-ph',
'cmp-lg',
'comp-gas',
'cond-mat',
'cond-mat.dis-nn',
'cond-mat.mes-hall',
'cond-mat.mtrl-sci',
'cond-mat.other',
'cond-mat.quant-gas',
'cond-mat.soft',
'cond-mat.stat-mech',
'cond-mat.str-el',
'cond-mat.supr-con',
'cs.AI',
'cs.AR',
'cs.CC',
'cs.CE',
'cs.CG',
'cs.CL',
'cs.CR',
'cs.CV',
'cs.CY',
'cs.DB',
'cs.DC',
'cs.DL',
'cs.DM',
'cs.DS',
'cs.ET',
'cs.FL',
'cs.GL',
'cs.GR',
'cs.GT',
'cs.HC',
'cs.IR',
'cs.IT',
'cs.LG',
'cs.LO',
'cs.MA',
'cs.MM',
'cs.MS',
'cs.NA',
'cs.NE',
'cs.NI',
'cs.OH',
'cs.OS',
'cs.PF',
'cs.PL',
'cs.RO',
'cs.SC',
'cs.SD',
'cs.SE',
'cs.SI',
'cs.SY',
'dg-ga',
'econ.EM',
'econ.GN',
'econ.TH',
'eess.AS',
'eess.IV',
'eess.SP',
'eess.SY',
'funct-an',
'gr-qc',
'hep-ex',
'hep-lat',
'hep-ph',
'hep-th',
'math-ph',
'math.AC',
'math.AG',
'math.AP',
'math.AT',
'math.CA',
'math.CO',
'math.CT',
'math.CV',
'math.DG',
'math.DS',
'math.FA',
'math.GM',
'math.GN',
'math.GR',
'math.GT',
'math.HO',
'math.IT',
'math.KT',
'math.LO',
'math.MG',
'math.MP',
'math.NA',
'math.NT',
'math.OA',
'math.OC',
'math.PR',
'math.QA',
'math.RA',
'math.RT',
'math.SG',
'math.SP',
'math.ST',
'mtrl-th',
'nlin.AO',
'nlin.CD',
'nlin.CG',
'nlin.PS',
'nlin.SI',
'nucl-ex',
'nucl-th',
'patt-sol',
'physics.acc-ph',
'physics.ao-ph',
'physics.app-ph',
'physics.atm-clus',
'physics.atom-ph',
'physics.bio-ph',
'physics.chem-ph',
'physics.class-ph',
'physics.comp-ph',
'physics.data-an',
'physics.ed-ph',
'physics.flu-dyn',
'physics.gen-ph',
'physics.geo-ph',
'physics.hist-ph',
'physics.ins-det',
'physics.med-ph',
'physics.optics',
'physics.plasm-ph',
'physics.pop-ph',
'physics.soc-ph',
'physics.space-ph',
'plasm-ph',
'q-alg',
'q-bio',
'q-bio.BM',
'q-bio.CB',
'q-bio.GN',
'q-bio.MN',
'q-bio.NC',
'q-bio.OT',
'q-bio.PE',
'q-bio.QM',
'q-bio.SC',
'q-bio.TO',
'q-fin.CP',
'q-fin.EC',
'q-fin.GN',
'q-fin.MF',
'q-fin.PM',
'q-fin.PR',
'q-fin.RM',
'q-fin.ST',
'q-fin.TR',
'quant-ph',
'solv-int',
'stat.AP',
'stat.CO',
'stat.ME',
'stat.ML',
'stat.OT',
'stat.TH',
'supr-con'}
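The set above shows which category codes occur, but not how often. A small sketch, on made-up sample strings, of how the same split on spaces could also count occurrences per code; collections.Counter comes from the standard library, and the sample Series is an assumption for illustration only.

```python
from collections import Counter

import pandas as pd

# Illustrative stand-in for data["categories"].
sample = pd.Series(["hep-ph", "math.CO cs.CG", "math.CO", "math.CA math.FA"])

# Split every entry on spaces and count each individual code.
counts = Counter(code for entry in sample for code in entry.split(" "))

print(len(counts))            # number of distinct codes: 5
print(counts.most_common(1))  # [('math.CO', 2)]
```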
Convert the date to a year
data["update_date"].head(10)
0 2008-11-26
1 2008-12-13
2 2008-01-13
3 2007-05-23
4 2013-10-15
5 2015-05-13
6 2008-11-26
7 2009-02-05
8 2010-03-18
9 2007-05-23
Name: update_date, dtype: object
data["year"] = pd.to_datetime(data["update_date"]).dt.year
data["year"].head(20)
0 2008
1 2008
2 2008
3 2007
4 2013
5 2015
6 2008
7 2009
8 2010
9 2007
10 2008
11 2007
12 2008
13 2009
14 2009
15 2008
16 2009
17 2007
18 2007
19 2015
Name: year, dtype: int64
del data["update_date"]
len(data.columns)
14
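The year above comes from update_date, which for the sample record 0704.0001 (first version created in 2007) reads 2008-11-26, so it does not have to equal the submission year. A hedged sketch, assuming one would rather use the date of the first version: the created strings in versions follow the RFC 2822 style, which the standard library's email.utils.parsedate_to_datetime can parse. The helper first_version_year is hypothetical and not part of the original notebook.

```python
from email.utils import parsedate_to_datetime

# The versions list of the sample record 0704.0001 shown earlier.
versions = [
    {"version": "v1", "created": "Mon, 2 Apr 2007 19:18:42 GMT"},
    {"version": "v2", "created": "Tue, 24 Jul 2007 20:10:27 GMT"},
]

def first_version_year(versions):
    """Return the year in which the first listed version was created."""
    return parsedate_to_datetime(versions[0]["created"]).year

print(first_version_year(versions))  # 2007
```

On the in-memory DataFrame this could be applied as data["versions"].apply(first_version_year), provided the column still holds lists of dictionaries rather than strings read back from the CSV.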
Keep the data from 2019 onward
data = data[data["year"] >= 2019]
data.reset_index(drop=True, inplace=True)
data.head()
|  | id | submitter | authors | title | comments | journal-ref | doi | report-no | categories | license | abstract | versions | authors_parsed | year |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | 0704.0297 | Sung-Chul Yoon | Sung-Chul Yoon, Philipp Podsiadlowski and Step... | Remnant evolution after a carbon-oxygen white ... | 15 pages, 15 figures, 3 tables, submitted to M... | None | 10.1111/j.1365-2966.2007.12161.x | None | astro-ph | None | We systematically explore the evolution of t... | [{'version': 'v1', 'created': 'Tue, 3 Apr 2007... | [[Yoon, Sung-Chul, ], [Podsiadlowski, Philipp,... | 2019 |
| 1 | 0704.0342 | Patrice Ntumba Pungu | B. Dugmore and PP. Ntumba | Cofibrations in the Category of Frolicher Spac... | 27 pages | None | None | None | math.AT | None | Cofibrations are defined in the category of ... | [{'version': 'v1', 'created': 'Tue, 3 Apr 2007... | [[Dugmore, B., ], [Ntumba, PP., ]] | 2019 |
| 2 | 0704.0360 | Zaqarashvili | T.V. Zaqarashvili and K Murawski | Torsional oscillations of longitudinally inhom... | 6 pages, 3 figures, accepted in A&A | None | 10.1051/0004-6361:20077246 | None | astro-ph | None | We explore the effect of an inhomogeneous ma... | [{'version': 'v1', 'created': 'Tue, 3 Apr 2007... | [[Zaqarashvili, T. V., ], [Murawski, K, ]] | 2019 |
| 3 | 0704.0525 | Sezgin Ayg\"un | Sezgin Aygun, Ismail Tarhan, Husnu Baysal | On the Energy-Momentum Problem in Static Einst... | This submission has been withdrawn by arXiv ad... | Chin.Phys.Lett.24:355-358,2007 | 10.1088/0256-307X/24/2/015 | None | gr-qc | None | This paper has been removed by arXiv adminis... | [{'version': 'v1', 'created': 'Wed, 4 Apr 2007... | [[Aygun, Sezgin, ], [Tarhan, Ismail, ], [Baysa... | 2019 |
| 4 | 0704.0535 | Antonio Pipino | Antonio Pipino (1,3), Thomas H. Puzia (2,4), a... | The Formation of Globular Cluster Systems in M... | 32 pages (referee format), 9 figures, ApJ acce... | Astrophys.J.665:295-305,2007 | 10.1086/519546 | None | astro-ph | None | The most massive elliptical galaxies show a ... | [{'version': 'v1', 'created': 'Wed, 4 Apr 2007... | [[Pipino, Antonio, ], [Puzia, Thomas H., ], [M... | 2019 |
Extract the category information from the web page
# Extract the category information from the web page
website_url = requests.get('https://arxiv.org/category_taxonomy').text
soup = BeautifulSoup(website_url,'lxml')
root = soup.find('div',{'id':'category_taxonomy_list'})
tags = root.find_all(["h2","h3","h4","p"], recursive=True)
# Containers for the parsed information
level_1_name = ""
level_2_name = ""
level_2_code = ""
level_1_names = []
level_2_codes = []
level_2_names = []
level_3_codes = []
level_3_names = []
level_3_notes = []
for t in tags:
    if t.name == "h2":
        level_1_name = t.text
        level_2_code = t.text
        level_2_name = t.text
    elif t.name == "h3":
        raw = t.text
        # re.sub(pattern, replacement, string): the pattern is (.*)\((.*)\), the replacement is "\2", the string is raw
        level_2_code = re.sub(r"(.*)\((.*)\)",r"\2",raw)
        level_2_name = re.sub(r"(.*)\((.*)\)",r"\1",raw)
    elif t.name == "h4":
        raw = t.text
        level_3_code = re.sub(r"(.*) \((.*)\)",r"\1",raw)
        level_3_name = re.sub(r"(.*) \((.*)\)",r"\2",raw)
    elif t.name == "p":
        # Each <p> closes one category entry, so record the current state of all levels
        notes = t.text
        level_1_names.append(level_1_name)
        level_2_names.append(level_2_name)
        level_2_codes.append(level_2_code)
        level_3_names.append(level_3_name)
        level_3_codes.append(level_3_code)
        level_3_notes.append(notes)
# Build a DataFrame from the information above
df_taxonomy = pd.DataFrame({
    'group_name' : level_1_names,
    'archive_name' : level_2_names,
    'archive_id' : level_2_codes,
    'category_name' : level_3_names,
    'categories' : level_3_codes,
    'category_description': level_3_notes
})
df_taxonomy.to_csv("df_taxonomy.csv",index=False)
# Group by "group_name" and order by "archive_name" within each group
# (the returned GroupBy object is not assigned, so df_taxonomy itself is unchanged)
df_taxonomy.groupby(["group_name","archive_name"])
df_taxonomy.head()
|  | group_name | archive_name | archive_id | category_name | categories | category_description |
|---|---|---|---|---|---|---|
| 0 | Computer Science | Computer Science | Computer Science | Artificial Intelligence | cs.AI | Covers all areas of AI except Vision, Robotics... |
| 1 | Computer Science | Computer Science | Computer Science | Hardware Architecture | cs.AR | Covers systems organization and hardware archi... |
| 2 | Computer Science | Computer Science | Computer Science | Computational Complexity | cs.CC | Covers models of computation, complexity class... |
| 3 | Computer Science | Computer Science | Computer Science | Computational Engineering, Finance, and Science | cs.CE | Covers applications of computer science to the... |
| 4 | Computer Science | Computer Science | Computer Science | Computational Geometry | cs.CG | Roughly includes material in ACM Subject Class... |
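The scraping code relies on re.sub with capture groups to separate a code from a name. A minimal sketch of how those two patterns behave, using sample strings whose shape is assumed from the table above (categories = cs.AI, category_name = Artificial Intelligence); the exact heading text on the arXiv page is not verified here.

```python
import re

# Assumed shape of an <h4> heading: "code (name)".
h4_text = "cs.AI (Artificial Intelligence)"

# \1 keeps what precedes " (", \2 keeps what is inside the parentheses.
code = re.sub(r"(.*) \((.*)\)", r"\1", h4_text)
name = re.sub(r"(.*) \((.*)\)", r"\2", h4_text)
print(code)  # cs.AI
print(name)  # Artificial Intelligence

# When the pattern does not match, re.sub returns the input unchanged.
unchanged = re.sub(r"(.*)\((.*)\)", r"\2", "Computer Science")
print(unchanged)  # Computer Science
```

Because both patterns are greedy, a heading with several parenthesised parts would be split at the last opening parenthesis.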
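Without splitting each paper's (possibly multiple) categories, join data with the taxonomy data. Because the merge key is the full categories string, only papers with a single category find a match in df_taxonomy; papers such as "math.CO cs.CG" get no group_name and drop out of the counts.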
_df = data.merge(df_taxonomy, on="categories", how="left").drop_duplicates(["id","group_name"]).groupby("group_name").agg({"id":"count"}).sort_values(by="id",ascending=False).reset_index()
_df
|  | group_name | id |
|---|---|---|
| 0 | Physics | 79985 |
| 1 | Mathematics | 51567 |
| 2 | Computer Science | 40067 |
| 3 | Statistics | 4054 |
| 4 | Electrical Engineering and Systems Science | 3297 |
| 5 | Quantitative Biology | 1994 |
| 6 | Quantitative Finance | 826 |
| 7 | Economics | 576 |
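To see how many rows fail to match in a left merge like the one above, pandas offers the indicator option of merge. A small sketch on toy frames; the names papers and taxonomy and their contents are made up for illustration.

```python
import pandas as pd

# Toy stand-ins for data and df_taxonomy.
papers = pd.DataFrame({
    "id": ["a", "b", "c"],
    "categories": ["cs.AI", "math.CO cs.CG", "hep-ph"],
})
taxonomy = pd.DataFrame({
    "categories": ["cs.AI", "cs.CG", "math.CO", "hep-ph"],
    "group_name": ["Computer Science", "Computer Science",
                   "Mathematics", "Physics"],
})

# indicator=True adds a _merge column: "both" or "left_only" for a left merge.
merged = papers.merge(taxonomy, on="categories", how="left", indicator=True)
unmatched = merged[merged["_merge"] == "left_only"]
print(unmatched[["id", "categories"]])
```

Only paper b is reported as unmatched, because its unsplit string "math.CO cs.CG" equals no single taxonomy code.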
If a paper is assigned to multiple categories, give it one record per category
newdata=data.copy(deep=True)
tmps=data.categories.str.split(" ")
newdata.categories=tmps
newdata.head(20)
|  | id | submitter | authors | title | comments | journal-ref | doi | report-no | categories | license | abstract | versions | authors_parsed | year |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | 0704.0297 | Sung-Chul Yoon | Sung-Chul Yoon, Philipp Podsiadlowski and Step... | Remnant evolution after a carbon-oxygen white ... | 15 pages, 15 figures, 3 tables, submitted to M... | None | 10.1111/j.1365-2966.2007.12161.x | None | [astro-ph] | None | We systematically explore the evolution of t... | [{'version': 'v1', 'created': 'Tue, 3 Apr 2007... | [[Yoon, Sung-Chul, ], [Podsiadlowski, Philipp,... | 2019 |
| 1 | 0704.0342 | Patrice Ntumba Pungu | B. Dugmore and PP. Ntumba | Cofibrations in the Category of Frolicher Spac... | 27 pages | None | None | None | [math.AT] | None | Cofibrations are defined in the category of ... | [{'version': 'v1', 'created': 'Tue, 3 Apr 2007... | [[Dugmore, B., ], [Ntumba, PP., ]] | 2019 |
| 2 | 0704.0360 | Zaqarashvili | T.V. Zaqarashvili and K Murawski | Torsional oscillations of longitudinally inhom... | 6 pages, 3 figures, accepted in A&A | None | 10.1051/0004-6361:20077246 | None | [astro-ph] | None | We explore the effect of an inhomogeneous ma... | [{'version': 'v1', 'created': 'Tue, 3 Apr 2007... | [[Zaqarashvili, T. V., ], [Murawski, K, ]] | 2019 |
| 3 | 0704.0525 | Sezgin Ayg\"un | Sezgin Aygun, Ismail Tarhan, Husnu Baysal | On the Energy-Momentum Problem in Static Einst... | This submission has been withdrawn by arXiv ad... | Chin.Phys.Lett.24:355-358,2007 | 10.1088/0256-307X/24/2/015 | None | [gr-qc] | None | This paper has been removed by arXiv adminis... | [{'version': 'v1', 'created': 'Wed, 4 Apr 2007... | [[Aygun, Sezgin, ], [Tarhan, Ismail, ], [Baysa... | 2019 |
| 4 | 0704.0535 | Antonio Pipino | Antonio Pipino (1,3), Thomas H. Puzia (2,4), a... | The Formation of Globular Cluster Systems in M... | 32 pages (referee format), 9 figures, ApJ acce... | Astrophys.J.665:295-305,2007 | 10.1086/519546 | None | [astro-ph] | None | The most massive elliptical galaxies show a ... | [{'version': 'v1', 'created': 'Wed, 4 Apr 2007... | [[Pipino, Antonio, ], [Puzia, Thomas H., ], [M... | 2019 |
| 5 | 0704.0710 | Joerg Junkersfeld | J. Junkersfeld (for the CB-ELSA collaboration) | Photoproduction of pi0 omega off protons for E... | 8 pages, 13 figures | Eur.Phys.J.A31:365-372,2007 | 10.1140/epja/i2006-10302-7 | None | [nucl-ex] | None | Differential and total cross-sections for ph... | [{'version': 'v1', 'created': 'Thu, 5 Apr 2007... | [[Junkersfeld, J., , for the CB-ELSA collabora... | 2019 |
| 6 | 0704.0752 | Davoud Kamani | Davoud Kamani | Actions for the Bosonic String with the Curved... | 8 pages, Latex, no figure, Some minor changes ... | Braz. J. Phys. 38, 268-271 (2008) | 10.1590/S0103-97332008000200010 | None | [hep-th] | None | At first we introduce an action for the stri... | [{'version': 'v1', 'created': 'Thu, 5 Apr 2007... | [[Kamani, Davoud, ]] | 2020 |
| 7 | 0704.0803 | Josephine Nanao | Walter A. Simmons and Sandip S. Pakvasa | Geometric Phase and Superconducting Flux Quant... | 5 pages, pdf format | None | None | None | [quant-ph] | None | In a ring of s-wave superconducting material... | [{'version': 'v1', 'created': 'Thu, 5 Apr 2007... | [[Simmons, Walter A., ], [Pakvasa, Sandip S., ]] | 2019 |
| 8 | 0704.0880 | Qiuping A. Wang | Q. A. Wang (ISMANS), F. Tsobnang (ISMANS), S. ... | Stochastic action principle and maximum entropy | This work is a further development of the idea... | Chaos, Solitons and Fractals, 40(2009)2550-2556 | None | None | [cond-mat.stat-mech] | None | A stochastic action principle for stochastic... | [{'version': 'v1', 'created': 'Fri, 6 Apr 2007... | [[Wang, Q. A., , ISMANS], [Tsobnang, F., , ISM... | 2020 |
| 9 | 0704.0981 | Xuan Hien Nguyen | Xuan Hien Nguyen | Construction of Complete Embedded Self-Similar... | 30 pages | Adv. Differential Equations 15 (2010), no. 5-6... | None | None | [math.DG] | None | We study the Dirichlet problem associated to... | [{'version': 'v1', 'created': 'Sat, 7 Apr 2007... | [[Nguyen, Xuan Hien, ]] | 2019 |
| 10 | 0704.1000 | Liming Zhang | L.M. Zhang, et al (for the Belle Collaboration) | Measurement of D0-D0bar mixing in D0->Ks pi+ p... | 6 pages, 4 figures, Submitted to Physical Revi... | Phys.Rev.Lett.99:131803,2007 | 10.1103/PhysRevLett.99.131803 | BELLE-CONF-0702 | [hep-ex] | None | We report a measurement of D0-D0bar mixing i... | [{'version': 'v1', 'created': 'Sat, 7 Apr 2007... | [[Zhang, L. M., ]] | 2019 |
| 11 | 0704.1245 | Pamela Klaassen | P.D. Klaassen and C.D. Wilson | Outflow and Infall in a Sample of Massive Star... | 34 pages, 9 figures, accepted for publication ... | Astrophys.J.663:1092-1102,2007 | 10.1086/518760 | None | [astro-ph] | None | We present single pointing observations of S... | [{'version': 'v1', 'created': 'Tue, 10 Apr 200... | [[Klaassen, P. D., ], [Wilson, C. D., ]] | 2019 |
| 12 | 0704.1369 | Kazuya Aoki | K. Aoki (for the PHENIX Collaboration) | Double Helicity Asymmetry of Inclusive pi0 Pro... | 4 pages, 3 figures, to be published in the Pro... | AIPConf.Proc.915:339-342,2007 | 10.1063/1.2750791 | None | [hep-ex] | None | The proton spin structure is not understood ... | [{'version': 'v1', 'created': 'Wed, 11 Apr 200... | [[Aoki, K., , for the PHENIX Collaboration]] | 2019 |
| 13 | 0704.1403 | Alberto S. Cattaneo | Alberto S. Cattaneo, Florian Schaetz | Equivalences of Higher Derived Brackets | 16 pages; minor changes; corrected typos; to a... | J. Pure Appl. Algebra, 212, 2450-2460 (2008) | 10.1016/j.jpaa.2008.03.013 | None | [math.QA, math.DG, math.SG] | None | This note elaborates on Th. Voronov's constr... | [{'version': 'v1', 'created': 'Wed, 11 Apr 200... | [[Cattaneo, Alberto S., ], [Schaetz, Florian, ]] | 2020 |
| 14 | 0704.1430 | Simone Zaggia R. | Y. Momany, E.V. Held, I. Saviane, S. Zaggia, L... | The blue plume population in dwarf spheroidal ... | Accepted for publication in Astronomy & Astrop... | None | 10.1051/0004-6361:20067024 | None | [astro-ph] | None | Abridged... Blue stragglers (BSS) are though... | [{'version': 'v1', 'created': 'Wed, 11 Apr 200... | [[Momany, Y., ], [Held, E. V., ], [Saviane, I.... | 2019 |
| 15 | 0704.1445 | Yasha Gindikin | Yasha Gindikin and Vladimir A. Sablikov | Deformed Wigner crystal in a one-dimensional q... | 10 pages, 11 figures. Misprints fixed | Phys. Rev. B 76, 045122 (2007) | 10.1103/PhysRevB.76.045122 | None | [cond-mat.str-el, cond-mat.mes-hall] | http://arxiv.org/licenses/nonexclusive-distrib... | The spatial Fourier spectrum of the electron... | [{'version': 'v1', 'created': 'Wed, 11 Apr 200... | [[Gindikin, Yasha, ], [Sablikov, Vladimir A., ]] | 2019 |
| 16 | 0704.1454 | James R. Graham | James R. Graham (1), Bruce Macintosh (2), Rene... | Ground-Based Direct Detection of Exoplanets wi... | White paper submitted to the NSF-NASA-DOE Astr... | None | None | None | [astro-ph] | None | The Gemini Planet (GPI) imager is an "extrem... | [{'version': 'v1', 'created': 'Wed, 11 Apr 200... | [[Graham, James R., ], [Macintosh, Bruce, ], [... | 2019 |
| 17 | 0704.1507 | David Ardila | D.R. Ardila, D.A. Golimowski, J.E. Krist, M. C... | HST/ACS Coronagraphic Observations of the Dust... | Accepted to ApJ | None | None | None | [astro-ph] | None | We present ACS/HST coronagraphic observation... | [{'version': 'v1', 'created': 'Wed, 11 Apr 200... | [[Ardila, D. R., ], [Golimowski, D. A., ], [Kr... | 2019 |
| 18 | 0704.1579 | Jose Alfonso Lopez Aguerri | J. A. L. Aguerri, R. Sanchez-Janssen and C. Mu... | A Study of Catalogued Nearby Galaxy Clusters i... | 19 pages, 11 figures, accepted for publication... | None | 10.1051/0004-6361:20066478 | None | [astro-ph] | None | We have selected a sample of 88 nearby (z<0.... | [{'version': 'v1', 'created': 'Thu, 12 Apr 200... | [[Aguerri, J. A. L., ], [Sanchez-Janssen, R., ... | 2019 |
| 19 | 0704.1776 | Joerg Junkersfeld | H. van Pee, O. Bartholomy, V. Crede (for the C... | Photoproduction of pi0-mesons off protons from... | 17 pages, 17 figures | Eur.Phys.J.A31:61-77,2007 | 10.1140/epja/i2006-10160-3 | None | [nucl-ex] | None | Photoproduction of pi0 mesons was studied wi... | [{'version': 'v1', 'created': 'Fri, 13 Apr 200... | [[van Pee, H., , for the CB-ELSA Collaboration... | 2019 |
explode_data=newdata.explode("categories")
explode_df = explode_data.merge(df_taxonomy, on="categories", how="left").drop_duplicates(["id","group_name"]).groupby("group_name").agg({"id":"count"}).sort_values(by="id",ascending=False).reset_index()
explode_df
|  | group_name | id |
|---|---|---|
| 0 | Physics | 173191 |
| 1 | Computer Science | 134283 |
| 2 | Mathematics | 116930 |
| 3 | Statistics | 39655 |
| 4 | Electrical Engineering and Systems Science | 24834 |
| 5 | Quantitative Biology | 8140 |
| 6 | Quantitative Finance | 3680 |
| 7 | Economics | 2595 |
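The split, explode, merge, and drop_duplicates chain can be traced on a handful of rows. A minimal sketch with made-up toy data (the frames papers and taxonomy are assumptions for illustration): each paper is counted once per top-level group it touches, so the group totals can add up to more than the number of papers.

```python
import pandas as pd

# Toy stand-ins for data and df_taxonomy.
papers = pd.DataFrame({
    "id": ["a", "b", "c"],
    "categories": ["cs.AI cs.LG", "math.CO cs.CG", "hep-ph"],
})
taxonomy = pd.DataFrame({
    "categories": ["cs.AI", "cs.LG", "cs.CG", "math.CO", "hep-ph"],
    "group_name": ["Computer Science", "Computer Science",
                   "Computer Science", "Mathematics", "Physics"],
})

# One row per (paper, category) pair.
exploded = papers.assign(
    categories=papers["categories"].str.split(" ")
).explode("categories")

# Count each paper at most once per group.
counts = (
    exploded.merge(taxonomy, on="categories", how="left")
    .drop_duplicates(["id", "group_name"])
    .groupby("group_name")["id"]
    .count()
    .sort_values(ascending=False)
)
print(counts)
```

Paper a has two Computer Science categories but counts once for that group, while paper b counts once for Mathematics and once for Computer Science.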
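If a paper belongs to multiple categories, keep only the first one
tmplist=[]
for i in range(0,len(tmps)):
    # tmps[i] is the list of categories of paper i; keep its first element
    tmplist.append(tmps[i][0])
first_data=newdata.copy(deep=True)
first_data.categories=tmplist
first_data.head(20)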
|  | id | submitter | authors | title | comments | journal-ref | doi | report-no | categories | license | abstract | versions | authors_parsed | year |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | 0704.0297 | Sung-Chul Yoon | Sung-Chul Yoon, Philipp Podsiadlowski and Step... | Remnant evolution after a carbon-oxygen white ... | 15 pages, 15 figures, 3 tables, submitted to M... | None | 10.1111/j.1365-2966.2007.12161.x | None | astro-ph | None | We systematically explore the evolution of t... | [{'version': 'v1', 'created': 'Tue, 3 Apr 2007... | [[Yoon, Sung-Chul, ], [Podsiadlowski, Philipp,... | 2019 |
| 1 | 0704.0342 | Patrice Ntumba Pungu | B. Dugmore and PP. Ntumba | Cofibrations in the Category of Frolicher Spac... | 27 pages | None | None | None | math.AT | None | Cofibrations are defined in the category of ... | [{'version': 'v1', 'created': 'Tue, 3 Apr 2007... | [[Dugmore, B., ], [Ntumba, PP., ]] | 2019 |
| 2 | 0704.0360 | Zaqarashvili | T.V. Zaqarashvili and K Murawski | Torsional oscillations of longitudinally inhom... | 6 pages, 3 figures, accepted in A&A | None | 10.1051/0004-6361:20077246 | None | astro-ph | None | We explore the effect of an inhomogeneous ma... | [{'version': 'v1', 'created': 'Tue, 3 Apr 2007... | [[Zaqarashvili, T. V., ], [Murawski, K, ]] | 2019 |
| 3 | 0704.0525 | Sezgin Ayg\"un | Sezgin Aygun, Ismail Tarhan, Husnu Baysal | On the Energy-Momentum Problem in Static Einst... | This submission has been withdrawn by arXiv ad... | Chin.Phys.Lett.24:355-358,2007 | 10.1088/0256-307X/24/2/015 | None | gr-qc | None | This paper has been removed by arXiv adminis... | [{'version': 'v1', 'created': 'Wed, 4 Apr 2007... | [[Aygun, Sezgin, ], [Tarhan, Ismail, ], [Baysa... | 2019 |
| 4 | 0704.0535 | Antonio Pipino | Antonio Pipino (1,3), Thomas H. Puzia (2,4), a... | The Formation of Globular Cluster Systems in M... | 32 pages (referee format), 9 figures, ApJ acce... | Astrophys.J.665:295-305,2007 | 10.1086/519546 | None | astro-ph | None | The most massive elliptical galaxies show a ... | [{'version': 'v1', 'created': 'Wed, 4 Apr 2007... | [[Pipino, Antonio, ], [Puzia, Thomas H., ], [M... | 2019 |
| 5 | 0704.0710 | Joerg Junkersfeld | J. Junkersfeld (for the CB-ELSA collaboration) | Photoproduction of pi0 omega off protons for E... | 8 pages, 13 figures | Eur.Phys.J.A31:365-372,2007 | 10.1140/epja/i2006-10302-7 | None | nucl-ex | None | Differential and total cross-sections for ph... | [{'version': 'v1', 'created': 'Thu, 5 Apr 2007... | [[Junkersfeld, J., , for the CB-ELSA collabora... | 2019 |
| 6 | 0704.0752 | Davoud Kamani | Davoud Kamani | Actions for the Bosonic String with the Curved... | 8 pages, Latex, no figure, Some minor changes ... | Braz. J. Phys. 38, 268-271 (2008) | 10.1590/S0103-97332008000200010 | None | hep-th | None | At first we introduce an action for the stri... | [{'version': 'v1', 'created': 'Thu, 5 Apr 2007... | [[Kamani, Davoud, ]] | 2020 |
| 7 | 0704.0803 | Josephine Nanao | Walter A. Simmons and Sandip S. Pakvasa | Geometric Phase and Superconducting Flux Quant... | 5 pages, pdf format | None | None | None | quant-ph | None | In a ring of s-wave superconducting material... | [{'version': 'v1', 'created': 'Thu, 5 Apr 2007... | [[Simmons, Walter A., ], [Pakvasa, Sandip S., ]] | 2019 |
| 8 | 0704.0880 | Qiuping A. Wang | Q. A. Wang (ISMANS), F. Tsobnang (ISMANS), S. ... | Stochastic action principle and maximum entropy | This work is a further development of the idea... | Chaos, Solitons and Fractals, 40(2009)2550-2556 | None | None | cond-mat.stat-mech | None | A stochastic action principle for stochastic... | [{'version': 'v1', 'created': 'Fri, 6 Apr 2007... | [[Wang, Q. A., , ISMANS], [Tsobnang, F., , ISM... | 2020 |
| 9 | 0704.0981 | Xuan Hien Nguyen | Xuan Hien Nguyen | Construction of Complete Embedded Self-Similar... | 30 pages | Adv. Differential Equations 15 (2010), no. 5-6... | None | None | math.DG | None | We study the Dirichlet problem associated to... | [{'version': 'v1', 'created': 'Sat, 7 Apr 2007... | [[Nguyen, Xuan Hien, ]] | 2019 |
| 10 | 0704.1000 | Liming Zhang | L.M. Zhang, et al (for the Belle Collaboration) | Measurement of D0-D0bar mixing in D0->Ks pi+ p... | 6 pages, 4 figures, Submitted to Physical Revi... | Phys.Rev.Lett.99:131803,2007 | 10.1103/PhysRevLett.99.131803 | BELLE-CONF-0702 | hep-ex | None | We report a measurement of D0-D0bar mixing i... | [{'version': 'v1', 'created': 'Sat, 7 Apr 2007... | [[Zhang, L. M., ]] | 2019 |
| 11 | 0704.1245 | Pamela Klaassen | P.D. Klaassen and C.D. Wilson | Outflow and Infall in a Sample of Massive Star... | 34 pages, 9 figures, accepted for publication ... | Astrophys.J.663:1092-1102,2007 | 10.1086/518760 | None | astro-ph | None | We present single pointing observations of S... | [{'version': 'v1', 'created': 'Tue, 10 Apr 200... | [[Klaassen, P. D., ], [Wilson, C. D., ]] | 2019 |
| 12 | 0704.1369 | Kazuya Aoki | K. Aoki (for the PHENIX Collaboration) | Double Helicity Asymmetry of Inclusive pi0 Pro... | 4 pages, 3 figures, to be published in the Pro... | AIPConf.Proc.915:339-342,2007 | 10.1063/1.2750791 | None | hep-ex | None | The proton spin structure is not understood ... | [{'version': 'v1', 'created': 'Wed, 11 Apr 200... | [[Aoki, K., , for the PHENIX Collaboration]] | 2019 |
| 13 | 0704.1403 | Alberto S. Cattaneo | Alberto S. Cattaneo, Florian Schaetz | Equivalences of Higher Derived Brackets | 16 pages; minor changes; corrected typos; to a... | J. Pure Appl. Algebra, 212, 2450-2460 (2008) | 10.1016/j.jpaa.2008.03.013 | None | math.QA | None | This note elaborates on Th. Voronov's constr... | [{'version': 'v1', 'created': 'Wed, 11 Apr 200... | [[Cattaneo, Alberto S., ], [Schaetz, Florian, ]] | 2020 |
| 14 | 0704.1430 | Simone Zaggia R. | Y. Momany, E.V. Held, I. Saviane, S. Zaggia, L... | The blue plume population in dwarf spheroidal ... | Accepted for publication in Astronomy & Astrop... | None | 10.1051/0004-6361:20067024 | None | astro-ph | None | Abridged... Blue stragglers (BSS) are though... | [{'version': 'v1', 'created': 'Wed, 11 Apr 200... | [[Momany, Y., ], [Held, E. V., ], [Saviane, I.... | 2019 |
| 15 | 0704.1445 | Yasha Gindikin | Yasha Gindikin and Vladimir A. Sablikov | Deformed Wigner crystal in a one-dimensional q... | 10 pages, 11 figures. Misprints fixed | Phys. Rev. B 76, 045122 (2007) | 10.1103/PhysRevB.76.045122 | None | cond-mat.str-el | http://arxiv.org/licenses/nonexclusive-distrib... | The spatial Fourier spectrum of the electron... | [{'version': 'v1', 'created': 'Wed, 11 Apr 200... | [[Gindikin, Yasha, ], [Sablikov, Vladimir A., ]] | 2019 |
| 16 | 0704.1454 | James R. Graham | James R. Graham (1), Bruce Macintosh (2), Rene... | Ground-Based Direct Detection of Exoplanets wi... | White paper submitted to the NSF-NASA-DOE Astr... | None | None | None | astro-ph | None | The Gemini Planet (GPI) imager is an "extrem... | [{'version': 'v1', 'created': 'Wed, 11 Apr 200... | [[Graham, James R., ], [Macintosh, Bruce, ], [... | 2019 |
| 17 | 0704.1507 | David Ardila | D.R. Ardila, D.A. Golimowski, J.E. Krist, M. C... | HST/ACS Coronagraphic Observations of the Dust... | Accepted to ApJ | None | None | None | astro-ph | None | We present ACS/HST coronagraphic observation... | [{'version': 'v1', 'created': 'Wed, 11 Apr 200... | [[Ardila, D. R., ], [Golimowski, D. A., ], [Kr... | 2019 |
| 18 | 0704.1579 | Jose Alfonso Lopez Aguerri | J. A. L. Aguerri, R. Sanchez-Janssen and C. Mu... | A Study of Catalogued Nearby Galaxy Clusters i... | 19 pages, 11 figures, accepted for publication... | None | 10.1051/0004-6361:20066478 | None | astro-ph | None | We have selected a sample of 88 nearby (z<0.... | [{'version': 'v1', 'created': 'Thu, 12 Apr 200... | [[Aguerri, J. A. L., ], [Sanchez-Janssen, R., ... | 2019 |
| 19 | 0704.1776 | Joerg Junkersfeld | H. van Pee, O. Bartholomy, V. Crede (for the C... | Photoproduction of pi0-mesons off protons from... | 17 pages, 17 figures | Eur.Phys.J.A31:61-77,2007 | 10.1140/epja/i2006-10160-3 | None | nucl-ex | None | Photoproduction of pi0 mesons was studied wi... | [{'version': 'v1', 'created': 'Fri, 13 Apr 200... | [[van Pee, H., , for the CB-ELSA Collaboration... | 2019 |
first_df=first_data.merge(df_taxonomy, on="categories", how="left").drop_duplicates(["id","group_name"]).groupby("group_name").agg({"id":"count"}).sort_values(by="id",ascending=False).reset_index()
first_df
|  | group_name | id |
|---|---|---|
| 0 | Physics | 162521 |
| 1 | Computer Science | 103933 |
| 2 | Mathematics | 92523 |
| 3 | Electrical Engineering and Systems Science | 14555 |
| 4 | Statistics | 11618 |
| 5 | Quantitative Biology | 5205 |
| 6 | Quantitative Finance | 2151 |
| 7 | Economics | 1776 |
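Taking the first category can also be written without an explicit loop. A short sketch of a vectorized variant using the pandas string accessor, shown on sample strings taken from the tables above; it is an alternative formulation, not the notebook's own code.

```python
import pandas as pd

# Sample category strings as they appear in the categories column.
sample = pd.Series([
    "astro-ph",
    "math.QA math.DG math.SG",
    "cond-mat.str-el cond-mat.mes-hall",
])

# .str.split gives a list per row; .str[0] picks the first element of each list.
first = sample.str.split(" ").str[0]
print(first.tolist())
# ['astro-ph', 'math.QA', 'cond-mat.str-el']
```

Unlike indexing with tmps[i], which looks rows up by index label, this form does not depend on the index running from 0 to n-1.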
Distribution of papers across the top-level groups when only the first category is used
mydf=first_df
fig = plt.figure(figsize=(15,12))
explode = (0, 0, 0, 0.2, 0.3, 0.3, 0.2, 0.1)
plt.pie(mydf["id"], labels=mydf["group_name"], autopct='%1.2f%%',
startangle=160, explode=explode)
plt.tight_layout()
plt.show()
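The percentages that autopct prints on the pie are each group's share of the total. A small sketch that computes the same shares directly from the counts in the first-category table above; the frame name group_counts and the column share_pct are illustrative. It also checks that the explode tuple has one entry per wedge, which plt.pie requires.

```python
import pandas as pd

# Counts copied from the first-category result table.
group_counts = pd.DataFrame({
    "group_name": [
        "Physics", "Computer Science", "Mathematics",
        "Electrical Engineering and Systems Science", "Statistics",
        "Quantitative Biology", "Quantitative Finance", "Economics",
    ],
    "id": [162521, 103933, 92523, 14555, 11618, 5205, 2151, 1776],
})

# Share of each group in percent of the total.
total = group_counts["id"].sum()
group_counts["share_pct"] = group_counts["id"] / total * 100

# The explode offsets used in the plot: one value per group.
explode = (0, 0, 0, 0.2, 0.3, 0.3, 0.2, 0.1)
assert len(explode) == len(group_counts)

print(group_counts.round({"share_pct": 2}))
```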

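Count the papers in each Computer Science subcategory (2019 and 2020). The merge below uses data, whose categories column still holds the unsplit string, so only papers with a single category are counted.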
group_name="Computer Science"
cats = data.merge(df_taxonomy, on="categories").query("group_name == @group_name")
mydf=cats.groupby(["year","category_name"]).count().reset_index().pivot(index="category_name", columns="year",values="id")
mydf["2019+2020"]=mydf[2019]+mydf[2020]
mydf.sort_values("2019+2020",ascending=False)
| category_name | 2019 | 2020 | 2019+2020 |
|---|---|---|---|
| Computer Vision and Pattern Recognition | 5559 | 6517 | 12076 |
| Computation and Language | 2153 | 2906 | 5059 |
| Cryptography and Security | 1067 | 1238 | 2305 |
| Robotics | 917 | 1298 | 2215 |
| Networking and Internet Architecture | 864 | 783 | 1647 |
| Data Structures and Algorithms | 711 | 902 | 1613 |
| Distributed, Parallel, and Cluster Computing | 715 | 774 | 1489 |
| Software Engineering | 659 | 804 | 1463 |
| Artificial Intelligence | 558 | 757 | 1315 |
| Human-Computer Interaction | 420 | 580 | 1000 |
| Logic in Computer Science | 470 | 504 | 974 |
| Computers and Society | 346 | 564 | 910 |
| Machine Learning | 177 | 538 | 715 |
| Databases | 282 | 342 | 624 |
| Computer Science and Game Theory | 281 | 323 | 604 |
| Information Retrieval | 245 | 331 | 576 |
| Programming Languages | 268 | 294 | 562 |
| Systems and Control | 415 | 133 | 548 |
| Social and Information Networks | 202 | 325 | 527 |
| Neural and Evolutionary Computing | 235 | 279 | 514 |
| Computational Geometry | 199 | 216 | 415 |
| Computational Complexity | 131 | 188 | 319 |
| Computational Engineering, Finance, and Science | 108 | 205 | 313 |
| Formal Languages and Automata Theory | 152 | 137 | 289 |
| Digital Libraries | 125 | 157 | 282 |
| Graphics | 116 | 151 | 267 |
| Hardware Architecture | 95 | 159 | 254 |
| Emerging Technologies | 101 | 84 | 185 |
| Multiagent Systems | 85 | 90 | 175 |
| Discrete Mathematics | 84 | 81 | 165 |
| Multimedia | 76 | 66 | 142 |
| Other Computer Science | 67 | 69 | 136 |
| Performance | 45 | 51 | 96 |
| Symbolic Computation | 44 | 36 | 80 |
| Mathematical Software | 27 | 45 | 72 |
| Operating Systems | 36 | 33 | 69 |
| Numerical Analysis | 40 | 11 | 51 |
| Sound | 7 | 4 | 11 |
| General Literature | 5 | 5 | 10 |
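With groupby, count, and pivot, a category that has no papers in one of the years gets NaN in that cell, and NaN plus a number stays NaN in the 2019+2020 column. A minimal sketch of one way to avoid this, assuming pd.crosstab is acceptable as a replacement: it counts co-occurrences directly and fills missing combinations with 0. The toy frame cats_demo is made up for illustration.

```python
import pandas as pd

# Toy stand-in for the merged Computer Science rows.
cats_demo = pd.DataFrame({
    "id": ["a", "b", "c", "d"],
    "year": [2019, 2019, 2020, 2020],
    "category_name": ["Robotics", "Sound", "Robotics", "Robotics"],
})

# Rows are categories, columns are years, cells are paper counts (0 if absent).
table = pd.crosstab(cats_demo["category_name"], cats_demo["year"])
table["2019+2020"] = table[2019] + table[2020]
table = table.sort_values("2019+2020", ascending=False)
print(table)
```

Here Sound has no 2020 paper, yet its total is still 1 rather than NaN.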