Python基本数据统计(三)---- Grouping & Merge

6. Grouping

  6.1 统计近一年每个月的开盘天数:

In [38]: tempdf
Out[38]: 
                 open      close       high        low      volume  month
2016-01-22  57.283819  54.016907  57.774347  53.114334  43783400.0      1
2016-01-25  53.428272  53.977664  54.713455  53.114334  18498300.0      1
...
2017-01-20  75.989998  76.199997  76.910004  75.389999   8382000.0      1

[252 rows x 6 columns]

In [39]: tempdf.groupby('month').count()
Out[39]: 
       open  close  high  low  volume
month                                
1        19     19    19   19      19
2        20     20    20   20      20
...
12       21     21    21   21      21

In [41]: tempdf.groupby('month').count().open
Out[41]: 
month
1     19
2     20
3     22
4     21
5     21
6     22
7     20
8     23
9     21
10    21
11    21
12    21
Name: open, dtype: int64
tempdf.groupby('month').count().open

  6.2 统计近一年每个月的总成交量:(计算总和,再选出需求列)

In [43]: tempdf.groupby('month').sum()
Out[43]: 
              open        close         high          low       volume
month                                                                 
1      1317.435962  1311.872716  1328.947705  1301.086107  190434600.0
2      1056.469042  1058.853017  1066.995761  1046.069857  159711400.0
3      1281.837069  1286.438220  1293.423343  1270.692270  129864200.0
4      1296.296570  1299.964789  1308.367777  1288.042310   99855500.0
5      1332.088683  1333.182926  1342.942408  1323.906527   96741700.0
6      1365.402100  1363.315079  1376.771659  1350.264324  121788800.0
7      1244.562320  1247.524731  1254.559166  1237.280184   90064900.0
8      1484.674795  1484.189312  1492.571208  1475.708341   77514100.0
9      1342.668112  1341.954764  1353.150415  1332.294785   95572800.0
10     1331.611493  1329.281184  1341.236638  1320.451196  116243400.0
11     1459.883671  1463.727278  1471.155603  1449.657290   99527200.0
12     1549.889790  1549.352107  1561.082060  1538.528258   75948200.0

In [44]: tempdf.groupby('month').sum().volume
Out[44]: 
month
1     190434600.0
2     159711400.0
3     129864200.0
4      99855500.0
5      96741700.0
6     121788800.0
7      90064900.0
8      77514100.0
9      95572800.0
10    116243400.0
11     99527200.0
12     75948200.0
Name: volume, dtype: float64
tempdf.groupby('month').sum().volume

  6.3 前两例总结为df_object.groupby('month')后跟  .sum()  .mean()  .min()  .max()

  6.4 更高效地统计近一年每个月的总成交量:(先选出需求列,再计算和。处理大数据时效果明显)

In [46]: tempdf.groupby('month').volume.sum()
Out[46]: 
month
1     190434600.0
2     159711400.0
3     129864200.0
4      99855500.0
5      96741700.0
6     121788800.0
7      90064900.0
8      77514100.0
9      95572800.0
10    116243400.0
11     99527200.0
12     75948200.0
Name: volume, dtype: float64
tempdf.groupby('month').volume.sum()

7. Merge

  7.1 Append

In [49]: quotesdf[:2].append(quotesdf['2016-12-01':'2016-12-05'])
Out[49]: 
                 open      close       high        low      volume
2016-01-22  57.283819  54.016907  57.774347  53.114334  43783400.0
2016-01-25  53.428272  53.977664  54.713455  53.114334  18498300.0
2016-12-01  72.142314  72.221976  72.460955  71.475161   4298200.0
2016-12-02  72.212016  71.554823  72.351429  71.126648   2959100.0
2016-12-05  72.022829  71.724100  72.331509  71.684269   3929100.0
quotesdf[:2].append(quotesdf['2016-12-01':'2016-12-05'])

  7.2 Concat

    7.2.1 两个相同逻辑结构的对象连接

In [52]: pieces = [tempdf[:2],tempdf[len(tempdf)-2:]]

In [53]: pieces
Out[53]: 
[                 open      close       high        low      volume  month
 2016-01-22  57.283819  54.016907  57.774347  53.114334  43783400.0      1
 2016-01-25  53.428272  53.977664  54.713455  53.114334  18498300.0      1,
                  open      close       high        low     volume  month
 2017-01-19  77.610001  76.690002  77.779999  76.610001  8111400.0      1
 2017-01-20  75.989998  76.199997  76.910004  75.389999  8382000.0      1]

In [54]: pd.concat(pieces)
Out[54]: 
                 open      close       high        low      volume  month
2016-01-22  57.283819  54.016907  57.774347  53.114334  43783400.0      1
2016-01-25  53.428272  53.977664  54.713455  53.114334  18498300.0      1
2017-01-19  77.610001  76.690002  77.779999  76.610001   8111400.0      1
2017-01-20  75.989998  76.199997  76.910004  75.389999   8382000.0      1
pd.concat([tempdf[:2],tempdf[len(tempdf)-2:]])

    7.2.2 两个不同逻辑结构的对象连接

In [56]: piece1 = quotesdf[:3]

In [57]: piece2 = tempdf[:3]

In [58]: piece1
Out[58]: 
                 open      close       high        low      volume
2016-01-22  57.283819  54.016907  57.774347  53.114334  43783400.0
2016-01-25  53.428272  53.977664  54.713455  53.114334  18498300.0
2016-01-26  54.154255  54.046338  54.684026  53.918800  12844800.0

In [59]: piece2
Out[59]: 
                 open      close       high        low      volume  month
2016-01-22  57.283819  54.016907  57.774347  53.114334  43783400.0      1
2016-01-25  53.428272  53.977664  54.713455  53.114334  18498300.0      1
2016-01-26  54.154255  54.046338  54.684026  53.918800  12844800.0      1

In [60]: pd.concat([piece1,piece2],ignore_index=True)
Out[60]: 
       close       high        low  month       open      volume
0  54.016907  57.774347  53.114334    NaN  57.283819  43783400.0
1  53.977664  54.713455  53.114334    NaN  53.428272  18498300.0
2  54.046338  54.684026  53.918800    NaN  54.154255  12844800.0
3  54.016907  57.774347  53.114334    1.0  57.283819  43783400.0
4  53.977664  54.713455  53.114334    1.0  53.428272  18498300.0
5  54.046338  54.684026  53.918800    1.0  54.154255  12844800.0

In [61]: pd.concat([piece1,piece2],ignore_index=False)
Out[61]: 
                close       high        low  month       open      volume
2016-01-22  54.016907  57.774347  53.114334    NaN  57.283819  43783400.0
2016-01-25  53.977664  54.713455  53.114334    NaN  53.428272  18498300.0
2016-01-26  54.046338  54.684026  53.918800    NaN  54.154255  12844800.0
2016-01-22  54.016907  57.774347  53.114334    1.0  57.283819  43783400.0
2016-01-25  53.977664  54.713455  53.114334    1.0  53.428272  18498300.0
2016-01-26  54.046338  54.684026  53.918800    1.0  54.154255  12844800.0
pd.concat([piece1,piece2],ignore_index=True)

   7.3 Join  -- 将 [AXP和KO近一年中每个月的交易总量表(包含code)] 与 [30只道琼斯股票信息] 合并

In [190]: AKdf
Out[190]: 
            volume code  month
month                         
1      190434600.0  AXP      1
2      159711400.0  AXP      2
3      129864200.0  AXP      3
4       99855500.0  AXP      4
5       96741700.0  AXP      5
6      121788800.0  AXP      6
7       90064900.0  AXP      7
8       77514100.0  AXP      8
9       95572800.0  AXP      9
10     116243400.0  AXP     10
11      99527200.0  AXP     11
12      75948200.0  AXP     12
1      244121600.0   KO      1
2      304133400.0   KO      2
3      347238100.0   KO      3
4      317282000.0   KO      4
5      230074600.0   KO      5
6      265483400.0   KO      6
7      235959400.0   KO      7
8      235118300.0   KO      8
9      251007200.0   KO      9
10     264839100.0   KO     10
11     316271200.0   KO     11
12     280949400.0   KO     12

In [191]: djidf
Out[191]: 
    code                                         name lasttrade
0   AAPL                                   Apple Inc.   120.000
1    AXP                     American Express Company    76.200
2     BA                           The Boeing Company   159.530
3    CAT                             Caterpillar Inc.    94.580
4   CSCO                          Cisco Systems, Inc.    30.100
5    CVX                          Chevron Corporation   115.600
6     DD         E. I. du Pont de Nemours and Company    73.030
7    DIS                      The Walt Disney Company   107.660
8     GE                     General Electric Company    30.530
9     GS                The Goldman Sachs Group, Inc.   232.200
10    HD                         The Home Depot, Inc.   135.600
11   IBM  International Business Machines Corporation   170.550
12  INTC                            Intel Corporation    36.940
13   JNJ                        Johnson & Johnson   114.150
14   JPM                     JPMorgan Chase & Co.    83.670
15    KO                        The Coca-Cola Company    41.320
16   MCD                       McDonald's Corporation   122.260
17   MMM                                   3M Company   178.490
18   MRK                        Merck & Co., Inc.    62.530
19  MSFT                        Microsoft Corporation    62.740
20   NKE                                   NIKE, Inc.    53.200
21   PFE                                  Pfizer Inc.    31.770
22    PG             The Procter & Gamble Company    87.450
23   TRV                The Travelers Companies, Inc.   118.020
24   UNH              UnitedHealth Group Incorporated   158.660
25   UTX              United Technologies Corporation   110.790
26     V                                    Visa Inc.    81.840
27    VZ                  Verizon Communications Inc.    52.720
28   WMT                        Wal-Mart Stores, Inc.    67.180
29   XOM                      Exxon Mobil Corporation    85.890

In [192]: pd.merge(djidf,AKdf,on='code').drop(['lasttrade'],axis=1)
Out[192]: 
   code                      name       volume  month
0   AXP  American Express Company  190434600.0      1
1   AXP  American Express Company  159711400.0      2
2   AXP  American Express Company  129864200.0      3
3   AXP  American Express Company   99855500.0      4
4   AXP  American Express Company   96741700.0      5
5   AXP  American Express Company  121788800.0      6
6   AXP  American Express Company   90064900.0      7
7   AXP  American Express Company   77514100.0      8
8   AXP  American Express Company   95572800.0      9
9   AXP  American Express Company  116243400.0     10
10  AXP  American Express Company   99527200.0     11
11  AXP  American Express Company   75948200.0     12
12   KO     The Coca-Cola Company  244121600.0      1
13   KO     The Coca-Cola Company  304133400.0      2
14   KO     The Coca-Cola Company  347238100.0      3
15   KO     The Coca-Cola Company  317282000.0      4
16   KO     The Coca-Cola Company  230074600.0      5
17   KO     The Coca-Cola Company  265483400.0      6
18   KO     The Coca-Cola Company  235959400.0      7
19   KO     The Coca-Cola Company  235118300.0      8
20   KO     The Coca-Cola Company  251007200.0      9
21   KO     The Coca-Cola Company  264839100.0     10
22   KO     The Coca-Cola Company  316271200.0     11
23   KO     The Coca-Cola Company  280949400.0     12
pd.merge(djidf,AKdf,on='code').drop(['lasttrade'],axis=1)

  +    =   

 

posted on 2017-01-22 16:13  你的踏板车要滑向哪里  阅读(389)  评论(0编辑  收藏  举报

导航