Tushare's get_k_data
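get_k_data means "get K-line data", hence the simple name. As usual it isn't very standard or conventional, but what counts is the style, and above all the data.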
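The new interface combines the functions of get_hist_data and get_h_data. It conveniently fetches low-frequency daily, weekly and monthly data as well as relatively high-frequency 5-, 15-, 30- and 60-minute data. Forward- and backward-adjusted data going back to the listing date can also be fetched in a single line of code, and of course you can choose unadjusted data instead.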
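Main parameters
code
Security code: supports Shanghai and Shenzhen A and B shares, all indices, and ETF funds
ktype
Data type, default D (daily): D = daily K-line, W = weekly, M = monthly, 5 = 5-minute, 15 = 15-minute, 30 = 30-minute, 60 = 60-minute
autype
Adjustment type: qfq = forward-adjusted, hfq = backward-adjusted, None = unadjusted; default qfq
index
Whether the code is an index: default False; when set to True, code is treated as an index code
start
Start date, format YYYY-MM-DD; defaults to the current date when empty
end
End date, format YYYY-MM-DD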
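Returned columns
date
Date and time: YYYY-MM-DD for low-frequency data, YYYY-MM-DD HH:MM for high-frequency data
open: opening price
close: closing price
high: highest price
low: lowest price
volume: trading volume
code: security code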
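Data source and limitations
After some investigation and analysis, we decided to use data from Tencent (nicknamed 鹅厂, "the Goose Factory") as the source for the new market-data interface.
So far the data quality looks good. Here's hoping Tencent keeps up its stable, efficient service and keeps supplying quality data to all the professional and amateur quants who won't spend money but complain every day. :)
A shortcoming of this interface is that it has no trading amount (turnover value) data yet. It also lacks the moving-average data: time was short when the interface was written, so the moving averages were not added. Compared with get_hist_data, these two kinds of data are therefore missing.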
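Until the moving averages are added, they can be computed from the returned close and volume columns with pandas. A minimal sketch, assuming a DataFrame shaped like get_k_data's output and sorted by date; the column names ma5, v_ma5 and so on are our own choice here, not part of the interface.

```python
import pandas as pd


def add_moving_averages(df, windows=(5, 10, 20)):
    """Return a copy of df with simple moving averages of close and volume.

    Assumes rows are sorted by date ascending. Values are converted with
    pd.to_numeric in case they arrive as strings.
    """
    out = df.copy()
    close = pd.to_numeric(out['close'])
    volume = pd.to_numeric(out['volume'])
    for w in windows:
        # Mean of the last w rows; the first w - 1 rows have no full window and stay NaN.
        out['ma%d' % w] = close.rolling(window=w).mean()
        out['v_ma%d' % w] = volume.rolling(window=w).mean()
    return out


bars = pd.DataFrame({'close': [10.0, 11.0, 12.0, 13.0, 14.0, 15.0],
                     'volume': [100.0, 200.0, 300.0, 400.0, 500.0, 600.0]})
print(add_moving_averages(bars, windows=(5,)))
```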
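Future plans
1. Add support for more instrument types, including futures, options, and US and Hong Kong stocks.
2. Return data formats and attributes that match the characteristics of each instrument type.
3. Provide derived columns or function interfaces such as percentage change, turnover rate and volume ratio.
4. Turn get_k_data into a unified market-data interface, i.e. the most commonly used one.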
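Some of the derived data in item 3 can already be computed from the K-line columns. A sketch, assuming daily bars sorted by date; the "volume ratio" here is a simplified daily version (today's volume over the mean volume of the previous five days), not the usual intraday definition, which compares per-minute volume. Turnover rate is left out because it needs share-count data that get_k_data does not return.

```python
import pandas as pd


def add_derived_columns(df):
    """Add price_change, p_change (%) and a simplified daily volume ratio.

    Assumes daily bars sorted by date ascending with 'close' and 'volume'.
    """
    out = df.copy()
    close = pd.to_numeric(out['close'])
    volume = pd.to_numeric(out['volume'])
    out['price_change'] = close.diff()            # close minus previous close
    out['p_change'] = close.pct_change() * 100    # percentage change vs. previous close
    # Simplified volume ratio: today's volume / mean volume of the previous 5 days.
    # shift(1) keeps today's own volume out of the average.
    out['volume_ratio'] = volume / volume.shift(1).rolling(window=5).mean()
    return out


bars = pd.DataFrame({'close': [10.0, 10.5, 10.0, 11.0, 11.0, 12.1],
                     'volume': [100.0, 100.0, 100.0, 100.0, 100.0, 300.0]})
print(add_derived_columns(bars))
```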
Usage and key points
Upgrade or freshly install tushare
1. Install: pip install tushare
2. Upgrade: pip install tushare --upgrade
Verify and use:
import tushare as ts
print(ts.__version__)
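Key points
1. When index=True, the interface automatically matches the index code. For example, to get the Shanghai Composite Index: ts.get_k_data('000001', index=True)
Quotes for 567 indices are currently supported.
2. When index=True there is no adjusted data, i.e. autype has no effect.
3. The adjusted data in this interface comes directly from the data source, unlike get_h_data, which computes it on the fly from adjustment factors.
4. Some common calls
1) Forward-adjusted daily bars for SPD Bank (浦发银行) over roughly the last year and a half: ts.get_k_data('600000')
2) Backward-adjusted weekly bars for SPD Bank over roughly the last 6 years: ts.get_k_data('600000', ktype='W', autype='hfq')
3) Recent 5-minute bars for SPD Bank: ts.get_k_data('600000', ktype='5')
4) Daily bars for the CSI 300 index in October 2016: ts.get_k_data('399300', index=True, start='2016-10-01', end='2016-10-31')
5) 60-minute bars for Penghua Bank Structured Fund B (鹏华银行分级B): ts.get_k_data('150228', ktype='60')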
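Point 3 contrasts source-provided adjustment with computing adjusted prices from adjustment factors. A common way to do the latter is sketched below. It is a generic illustration, not get_h_data's actual code, and it assumes a hypothetical cumulative factor column named factor that starts at 1.0 and steps up at each split or dividend.

```python
import pandas as pd

PRICE_COLS = ['open', 'high', 'low', 'close']


def adjust_prices(df, autype='qfq'):
    """Return a copy of df with prices adjusted by a cumulative 'factor' column.

    hfq: price * factor (prices on the earliest date stay unchanged).
    qfq: price * factor / latest factor (prices on the latest date stay unchanged).
    Assumes rows are sorted by date ascending; 'factor' is a hypothetical column.
    """
    out = df.copy()
    if autype is None:
        return out
    if autype == 'hfq':
        scale = out['factor']
    elif autype == 'qfq':
        scale = out['factor'] / out['factor'].iloc[-1]
    else:
        raise ValueError("autype must be 'qfq', 'hfq' or None")
    for col in PRICE_COLS:
        out[col] = out[col] * scale
    return out


# A 2-for-1 split between the 2nd and 3rd bar: raw prices halve, the factor doubles.
split = pd.DataFrame({'open': [20.0, 20.0, 10.0, 10.0],
                      'high': [20.0, 20.0, 10.0, 10.0],
                      'low': [20.0, 20.0, 10.0, 10.0],
                      'close': [20.0, 20.0, 10.0, 10.0],
                      'factor': [1.0, 1.0, 2.0, 2.0]})
print(adjust_prices(split, 'hfq')['close'].tolist())  # [20.0, 20.0, 20.0, 20.0]
print(adjust_prices(split, 'qfq')['close'].tolist())  # [10.0, 10.0, 10.0, 10.0]
```

Either way the split no longer shows up as a 50% price drop; the two methods differ only in which end of the series keeps its raw prices.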
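The tushare author (挖地兔) recently released a new version of tushare, whose main addition is the new get_k_data function. Let's analyze this function.
Function header: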
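def get_k_data(code=None, start='', end='',
               ktype='D', autype='qfq',
               index=False,
               retry_count=3,
               pause=0.001):
    """
    Get K-line data
    ---------
    Parameters:
      code:string
                  stock code, e.g. 600848
      start:string
                  start date, format: YYYY-MM-DD; defaults to the current date when empty
      end:string
                  end date, format: YYYY-MM-DD; defaults to the same day last year when empty
      autype:string
                  adjustment type: qfq = forward-adjusted, hfq = backward-adjusted, None = unadjusted; default qfq
      ktype:string
                  data type: D = daily, W = weekly, M = monthly, 5/15/30/60 = 5/15/30/60-minute; default D
      retry_count : int, default 3
                  number of retries on network or similar problems
      pause : int, default 0
                  seconds to pause between repeated requests, to avoid problems caused by requesting too often
      drop_factor : bool, default True
                  whether to drop the adjustment factor; it may matter little during analysis, but keeping it is more flexible if the data is stored in a database first and analyzed later
    """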
Next, let's analyze the body line by line:
symbol = ct.INDEX_SYMBOL[code] if index else _code_to_symbol(code)
url = ''
dataflag = ''
If index is True, the symbol is looked up directly in the predefined dictionary ct.INDEX_SYMBOL; if index is False, the function _code_to_symbol is called:
def _code_to_symbol(code):
    """
        Generate the symbol code
    """
    if code in ct.INDEX_LABELS:
        return ct.INDEX_LIST[code]
    else:
        if len(code) != 6:
            return ''
        else:
            return 'sh%s' % code if code[:1] in ['5', '6', '9'] else 'sz%s' % code
The definitions of INDEX_LABELS and INDEX_LIST:
INDEX_LABELS = ['sh', 'sz', 'hs300', 'sz50', 'cyb', 'zxb', 'zx300', 'zh500']
INDEX_LIST = {'sh': 'sh000001', 'sz': 'sz399001', 'hs300': 'sz399300',
              'sz50': 'sh000016', 'zxb': 'sz399005', 'cyb': 'sz399006',
              'zx300': 'sz399008', 'zh500': 'sh000905'}
If code starts with '5', '6' or '9', 'sh' is prepended; otherwise 'sz' is prepended.
So the main job of symbol is to prefix the code with sh or sz.
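The mapping is easy to check in isolation. The sketch below copies the constants and logic shown above into a standalone function (code_to_symbol is our own name for the copy). Note that '000001' without index=True becomes 'sz000001', which on the Shenzhen exchange is Ping An Bank, not the Shanghai Composite Index; this is why the index=True flag from key point 1 matters.

```python
# Constants copied from the definitions above.
INDEX_LABELS = ['sh', 'sz', 'hs300', 'sz50', 'cyb', 'zxb', 'zx300', 'zh500']
INDEX_LIST = {'sh': 'sh000001', 'sz': 'sz399001', 'hs300': 'sz399300',
              'sz50': 'sh000016', 'zxb': 'sz399005', 'cyb': 'sz399006',
              'zx300': 'sz399008', 'zh500': 'sh000905'}


def code_to_symbol(code):
    """Standalone copy of the _code_to_symbol logic shown above."""
    if code in INDEX_LABELS:
        return INDEX_LIST[code]          # index shortcut such as 'hs300'
    if len(code) != 6:
        return ''                        # not a 6-digit code: no symbol
    # Codes starting with 5, 6 or 9 get the 'sh' prefix, everything else 'sz'.
    return ('sh%s' if code[:1] in ['5', '6', '9'] else 'sz%s') % code


for c in ['600000', '000001', '510050', '150228', 'hs300', '12345']:
    print(c, '->', repr(code_to_symbol(c)))
```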
if ktype.upper() in ct.K_LABELS:  # K_LABELS = ['D', 'W', 'M']
    fq = autype if autype is not None else ''  # whether to adjust, and how
    if code[:1] in ('1', '5') or index:  # code starts with '1' or '5', or it is an index
        fq = ''
    kline = '' if autype is None else 'fq'  # only autype=None means unadjusted
    url = ct.KLINE_TT_URL % (ct.P_TYPE['http'], ct.DOMAINS['tt'],  # P_TYPE = {'http': 'http://', 'ftp': 'ftp://'}; DOMAINS is defined below
                             kline, fq, symbol,  # '' or 'fq'; the adjustment type or ''; the code with its sh/sz prefix
                             ct.TT_K_TYPE[ktype.upper()], start, end,  # TT_K_TYPE = {'D': 'day', 'W': 'week', 'M': 'month'}
                             fq, _random(17))  # adjustment type or ''; a random number between 10**16 and 10**17 - 1
    dataflag = '%s%s' % (fq, ct.TT_K_TYPE[ktype.upper()])  # adjustment type (or '') followed by 'day', 'week' or 'month'
elif ktype in ct.K_MIN_LABELS:  # K_MIN_LABELS = ['5', '15', '30', '60']
    url = ct.KLINE_TT_MIN_URL % (ct.P_TYPE['http'], ct.DOMAINS['tt'],  # essentially the same as above
                                 symbol, ktype, ktype,
                                 _random(16))
    dataflag = 'm%s' % ktype  # 'm' followed by '5', '15', '30' or '60'
else:
    raise TypeError('ktype input error.')
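The branch above reduces to a small pure function of (code, ktype, autype, index). The sketch below re-implements it with the constant values quoted in the comments (tt_dataflag is our own name). It makes one consequence easy to see: for codes starting with '1' or '5', and for indices, fq is blanked, so the JSON key is just 'day', 'week' or 'month' regardless of autype.

```python
K_LABELS = ['D', 'W', 'M']
K_MIN_LABELS = ['5', '15', '30', '60']
TT_K_TYPE = {'D': 'day', 'W': 'week', 'M': 'month'}


def tt_dataflag(code, ktype='D', autype='qfq', index=False):
    """Return the (kline, fq, dataflag) values the branch above would compute."""
    if ktype.upper() in K_LABELS:
        fq = autype if autype is not None else ''
        if code[:1] in ('1', '5') or index:
            fq = ''                       # codes starting with 1/5, and indices: no fq
        kline = '' if autype is None else 'fq'
        return kline, fq, '%s%s' % (fq, TT_K_TYPE[ktype.upper()])
    elif ktype in K_MIN_LABELS:
        return None, None, 'm%s' % ktype  # the minute branch ignores autype
    raise TypeError('ktype input error.')


print(tt_dataflag('600000'))               # ('fq', 'qfq', 'qfqday')
print(tt_dataflag('600000', 'W', 'hfq'))   # ('fq', 'hfq', 'hfqweek')
print(tt_dataflag('150228', '60'))         # (None, None, 'm60')
```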
DOMAINS definition:
Definitions of the two URLs used above:
KLINE_TT_URL = '%sweb.ifzq.%s/appstock/app/%skline/get?_var=kline_day%s&param=%s,%s,%s,%s,320,%s&r=0.%s'
KLINE_TT_MIN_URL = '%sifzq.%s/appstock/app/kline/mkline?param=%s,m%s,,320&_var=m%s_today&r=0.%s'
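Filling the templates by hand shows how the ten (daily) and six (minute) placeholders line up with the tuples built earlier. This is a sketch: DOMAIN_TT is a placeholder, not the real ct.DOMAINS['tt'] value, and random_digits is a hypothetical stand-in for _random that follows the "10**16 to 10**17 - 1" description. The literal 320 in both templates looks like the number of bars requested; 320 weekly bars is about six years, which would fit example 2) in the key points, but that is an inference.

```python
import random

# Templates copied from the definitions above.
KLINE_TT_URL = '%sweb.ifzq.%s/appstock/app/%skline/get?_var=kline_day%s&param=%s,%s,%s,%s,320,%s&r=0.%s'
KLINE_TT_MIN_URL = '%sifzq.%s/appstock/app/kline/mkline?param=%s,m%s,,320&_var=m%s_today&r=0.%s'

DOMAIN_TT = 'example.com'  # placeholder, NOT the real ct.DOMAINS['tt'] value


def random_digits(n):
    """Hypothetical stand-in for _random(n): an n-digit random number as a string."""
    return str(random.randint(10 ** (n - 1), 10 ** n - 1))


# Daily, forward-adjusted bars for sh600000 in October 2016 (kline='fq', fq='qfq').
url = KLINE_TT_URL % ('http://', DOMAIN_TT, 'fq', 'qfq', 'sh600000', 'day',
                      '2016-10-01', '2016-10-31', 'qfq', random_digits(17))
# 5-minute bars for the same stock.
min_url = KLINE_TT_MIN_URL % ('http://', DOMAIN_TT, 'sh600000', '5', '5',
                              random_digits(16))
print(url)
print(min_url)
```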
for _ in range(retry_count):  # retry_count is the number of attempts; _ is just a throwaway loop variable, like i
    time.sleep(pause)  # pause between attempts
    try:
        request = Request(url)  # use the url built above
        lines = urlopen(request, timeout=10).read()  # read the raw response
        if len(lines) < 100:  # no data: a very short response means nothing was returned
            return None
    except Exception as e:
        print(e)
    else:
        lines = lines.decode('utf-8') if ct.PY3 else lines  # PY3 = (sys.version_info[0] >= 3); a sample of the decoded text is shown below
        lines = lines.split('=')[1]  # split on '=' and keep piece 1, i.e. the JSON after 'kline_dayhfq='
        reg = re.compile(r',{"nd.*?}')
        lines = re.subn(reg, '', lines)  # regex-remove every ,{"nd...} fragment
        js = json.loads(lines[0])  # subn returns a tuple (new_string, number_of_substitutions), so take element 0
        df = pd.DataFrame(js['data'][symbol][dataflag], columns=ct.KLINE_TT_COLS)  # KLINE_TT_COLS are the six column names date, open, close, ...
        df['code'] = symbol if index else code  # add a code column: the index symbol or the stock code
        if ktype in ct.K_MIN_LABELS:  # minute bars
            df['date'] = df['date'].map(lambda x: '%s-%s-%s %s:%s' % (x[0:4], x[4:6], x[6:8], x[8:10], x[10:12]))  # reformat as YYYY-MM-DD HH:MM
        return df
raise IOError(ct.NETWORK_URL_ERROR_MSG)
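The control flow of this loop (sleep, try, retry on exception, return on success, raise after the last attempt) can be isolated and tested without a network. A sketch, where fetch is a hypothetical zero-argument callable standing in for urlopen(...).read() and the short-response check is left out. One subtlety it mirrors: parsing in the original sits in the else clause, so an exception raised while parsing is not caught by that try and is not retried.

```python
import time


def fetch_with_retry(fetch, retry_count=3, pause=0.001):
    """Call fetch() up to retry_count times, mirroring the loop above.

    fetch is any zero-argument callable (hypothetical stand-in for
    urlopen(...).read()). Returns its result on the first success and
    raises IOError once all attempts have failed.
    """
    for _ in range(retry_count):
        time.sleep(pause)                 # wait before each attempt
        try:
            data = fetch()
        except Exception as e:
            print(e)                      # log the error and try again
        else:
            return data                   # success: stop retrying
    raise IOError('network error after %d attempts' % retry_count)


# Usage: a fake fetcher that fails twice, then succeeds.
attempts = {'n': 0}


def flaky():
    attempts['n'] += 1
    if attempts['n'] < 3:
        raise ConnectionError('timeout')
    return b'kline_dayqfq={...}'


print(fetch_with_retry(flaky))
```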
A sample of the decoded lines (the response for sz002792, backward-adjusted daily bars):
kline_dayhfq={"code":0,"msg":"","data":{"sz002792":{"hfqday":[["2016-10-26","84.635","82.541","85.268","82.149","27380.000"],["2016-10-27","82.707","82.556","83.038","80.748","22315.000"],["2016-10-28","82.903","82.571","83.731","78.428","22165.000"],["2016-10-31","82.541","81.502","82.556","79.995","16437.000"],["2016-11-01","81.517","84.319","85.072","81.517","30741.000"],["2016-11-02","84.349","82.873","85.268","82.707","30526.000"],["2016-11-03","81.200","81.984","83.611","81.200","24593.000"],["2016-11-04","81.863","85.720","86.729","81.863","57996.000"],["2016-11-07","85.464","85.991","86.383","84.756","31572.000"],["2016-11-08","86.292","84.801","86.322","79.845","29328.000"]],"qt":{"sz002792":["51","\u901a\u5b87\u901a\u8baf","002792","55.91","56.29","56.25","36536","18510","18026","55.91","38","55.90","127","55.89","201","55.85","10","55.83","10","55.99","30","56.00","3","56.10","10","56.12","8","56.15","26","15:00:04\/55.91\/301\/S\/1682891\/15265|14:57:00\/55.89\/1\/B\/5589\/15163|14:56:52\/55.71\/90\/S\/503812\/15154|14:56:45\/55.89\/18\/B\/100602\/15146|14:56:39\/55.82\/8\/S\/44544\/15140|14:56:36\/56.12\/12\/B\/67324\/15136","20161109150137","-0.38","-0.68","56.75","54.46","55.89\/36235\/201929177","36536","20361","8.12","56.40","","56.75","54.46","4.07","25.16","125.80","7.09","61.92","50.66","1.05"],"market":["2016-11-09 20:57:01|HK_close_\u5df2\u6536\u76d8|SH_close_\u5df2\u6536\u76d8|SZ_close_\u5df2\u6536\u76d8|US_close_\u672a\u5f00\u76d8|SQ_close_\u5df2\u4f11\u5e02|DS_close_\u5df2\u4f11\u5e02|ZS_close_\u5df2\u4f11\u5e02"],"zjlx":["sz002792","8206.89","10347.24","-2140.35","-10.51","12154.32","10013.97","2140.35","10.51","20361.21","41080.23","41732.96","\u901a\u5b87\u901a\u8baf","20161109","20161108^5889.20^7540.99","20161107^6888.64^7504.11","20161104^15471.59^10227.30","20161103^4623.91^6113.32"]},"mx_price":{"mx":{"data":[],"timeline":[]},"price":{"data":[]}},"prec":"22.940","version":"5"}}}
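Running the parsing steps on a trimmed copy of this sample shows what ends up in the DataFrame. A sketch with two deviations from get_k_data, both our own choices: split('=', 1) keeps everything after the first '=' even if the JSON itself contains '=', and the numeric columns are converted to float because the response delivers them as strings. The column order (date, open, close, high, low, volume) is inferred from the sample rows, where the 4th value is always the largest and the 5th the smallest. The minute-date helper assumes raw timestamps like '201611081500', which is what the slicing in the loop implies.

```python
import json
import re

import pandas as pd

# A trimmed copy of the sample response above (two bars, other fields dropped).
raw = ('kline_dayhfq={"code":0,"msg":"","data":{"sz002792":{"hfqday":['
       '["2016-10-26","84.635","82.541","85.268","82.149","27380.000"],'
       '["2016-10-27","82.707","82.556","83.038","80.748","22315.000"]],'
       '"prec":"22.940","version":"5"}}}')

KLINE_TT_COLS = ['date', 'open', 'close', 'high', 'low', 'volume']


def parse_tt_kline(text, symbol, dataflag):
    """Turn a kline response into a DataFrame, following the steps above."""
    payload = text.split('=', 1)[1]               # drop the 'kline_dayhfq=' prefix
    payload = re.sub(r',{"nd.*?}', '', payload)    # same cleanup regex as get_k_data
    js = json.loads(payload)
    df = pd.DataFrame(js['data'][symbol][dataflag], columns=KLINE_TT_COLS)
    for col in KLINE_TT_COLS[1:]:
        df[col] = df[col].astype(float)            # prices and volume arrive as strings
    return df


def format_minute_date(x):
    """'201611081500' -> '2016-11-08 15:00', as done for minute bars."""
    return '%s-%s-%s %s:%s' % (x[0:4], x[4:6], x[6:8], x[8:10], x[10:12])


df = parse_tt_kline(raw, 'sz002792', 'hfqday')
print(df)
print(format_minute_date('201611081500'))
```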
http://www.cnblogs.com/yzymickey/p/6048486.html
Generate a time series:
import pandas
dates = pandas.date_range('2013-01-01', periods=6)
Read Excel data with pandas:
import pandas as pd
df = pd.read_excel("mystock.xls")
Sort a DataFrame (DataFrame.sort has been removed from pandas; use sort_values):
df = df.sort_values('data', ascending=False)
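Get the year, month and day separately and combine them:
import datetime
t = datetime.date(datetime.date.today().year,
                  datetime.date.today().month,
                  datetime.date.today().day)
print(t)  # output: 2016-09-16
# start_day and end_day are date objects defined elsewhere
start_day = start_day.strftime("%Y-%m-%d")
end_day = end_day.strftime("%Y-%m-%d")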
import datetime
now = datetime.datetime.now()
y = now.year
m = now.month
d = now.day
http://www.cnblogs.com/wumac/p/5876572.html
ChiNext average P/E ratio
import os
import pandas as pd

stock_code_list = []
for root, dirs, files in os.walk('stock data'):
    if files:
        for f in files:
            if '.csv' in f:
                stock_code_list.append(f.split('.csv')[0])

frames = []
for code in stock_code_list:
    if code[2] != '3':  # keep ChiNext codes only (e.g. sz300xxx)
        continue
    print(code)
    stock_data = pd.read_csv('stock data/' + code + '.csv', parse_dates=[1])
    stock_data = stock_data[stock_data['PE_TTM'].notnull()].copy()  # drop rows where PE_TTM is missing
    # PE_TTM = total market value / net profit (TTM), so net profit (TTM) = market value / PE_TTM
    stock_data['net_profit'] = stock_data['market_value'] / stock_data['PE_TTM']
    # keep only the fields we need
    stock_data = stock_data[['code', 'date', 'market_value', 'net_profit']]
    frames.append(stock_data)
# merge all stocks (DataFrame.append was removed in pandas 2.0, so use pd.concat)
all_stock = pd.concat(frames, ignore_index=True)

# per day: sum of market values, sum of net profits, and the number of stocks traded
output = all_stock.groupby('date')[['market_value', 'net_profit']].sum()
output['stock_count'] = all_stock.groupby('date').size()
# average P/E = sum of market values / sum of net profits
output['chinext_avg_pe'] = output['market_value'] / output['net_profit']
# write out the result
output.to_csv('chinext_avg_pe.csv', encoding='gbk')
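The key step above is that the average P/E is total market value divided by total net profit, not the mean of the individual P/E ratios:

$$\mathrm{PE}_{avg} = \frac{\sum_i MV_i}{\sum_i NP_i}, \qquad NP_i = \frac{MV_i}{\mathrm{PE}_i}$$

A toy example with two made-up stocks on one day (illustrative numbers only) shows how different the two can be: the large, expensive stock no longer dominates.

```python
import pandas as pd

# Two made-up stocks on one day (illustrative numbers only).
data = pd.DataFrame({'date': ['2016-11-08', '2016-11-08'],
                     'market_value': [100.0, 300.0],
                     'PE_TTM': [10.0, 100.0]})
data['net_profit'] = data['market_value'] / data['PE_TTM']   # 10.0 and 3.0

out = data.groupby('date')[['market_value', 'net_profit']].sum()
out['avg_pe'] = out['market_value'] / out['net_profit']       # 400 / 13, about 30.8
simple_mean = data.groupby('date')['PE_TTM'].mean()           # (10 + 100) / 2 = 55
print(out['avg_pe'].iloc[0], simple_mean.iloc[0])
```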
Convert daily data to weekly, monthly or other periods
import pandas as pd

stock_data = pd.read_csv('stock data/sh600898.csv', parse_dates=[1])
# target period: 'W' week, 'M' month, 'Q' quarter, '5min' five minutes, '12D' twelve days
# (newer pandas versions prefer 'ME' and 'QE' for month and quarter ends)
period_type = 'W'
# make date the index; inplace=True modifies stock_data instead of creating a new object
stock_data.set_index('date', inplace=True)
# by default each weekly field takes its value on the last trading day of the week
period_stock_data = stock_data.resample(period_type).last()
# weekly change compounds the daily changes within the week
period_stock_data['change'] = stock_data['change'].resample(period_type).apply(lambda x: (x + 1.0).prod() - 1.0)
# weekly open is the open of the first trading day of the week
period_stock_data['open'] = stock_data['open'].resample(period_type).first()
# weekly high is the maximum high of the week
period_stock_data['high'] = stock_data['high'].resample(period_type).max()
# weekly low is the minimum low of the week
period_stock_data['low'] = stock_data['low'].resample(period_type).min()
# weekly volume and money are the sums over the week
period_stock_data['volume'] = stock_data['volume'].resample(period_type).sum()
period_stock_data['money'] = stock_data['money'].resample(period_type).sum()
# weekly turnover
period_stock_data['turnover'] = period_stock_data['volume'] / \
    (period_stock_data['traded_market_value'] / period_stock_data['close'])
# the stock did not trade at all in some weeks; drop those weeks
period_stock_data = period_stock_data[period_stock_data['code'].notnull()]
period_stock_data.reset_index(inplace=True)
# export
period_stock_data.to_csv('week_stock_data.csv', index=False)
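The same OHLCV rules can be written as a single resample(...).agg(...) call with one rule per column. A self-contained sketch on ten synthetic trading days (made-up prices, two full weeks starting Monday 2016-10-24); the compounded change and turnover columns from the script are left out because they need fields this toy frame does not have.

```python
import numpy as np
import pandas as pd

# Ten synthetic trading days, weekends skipped.
dates = pd.bdate_range('2016-10-24', periods=10)
daily = pd.DataFrame({'open': np.arange(10, 20, dtype=float),
                      'high': np.arange(11, 21, dtype=float),
                      'low': np.arange(9, 19, dtype=float),
                      'close': np.arange(10.5, 20.5, dtype=float),
                      'volume': np.full(10, 100.0)}, index=dates)

# One rule per column: first open, max high, min low, last close, summed volume.
rules = {'open': 'first', 'high': 'max', 'low': 'min', 'close': 'last', 'volume': 'sum'}
weekly = daily.resample('W').agg(rules)    # 'W' = weeks ending on Sunday
weekly = weekly.dropna(subset=['close'])   # drop weeks without any trading
print(weekly)
```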