3.2 DS汇总
# Pandas
## Spark dataframe vs Pandas dataframe
https://blog.csdn.net/weixin_31866177/article/details/120754456
## pandas100个骚操作
https://blog.csdn.net/yuxiaosmd/article/details/114647974
## pandas50道练习题
https://www.jianshu.com/p/4250e35c2d45
https://www.jianshu.com/p/b4338aa7bf55
https://www.jianshu.com/p/3f3c0fb8fa4e
https://www.jianshu.com/p/7ceafbe79ed4
https://www.heywhale.com/mw/project/59e77a636d213335f38daec2
## read_csv读入csv文件报错 'utf-8' codec can't decode byte 0xbe in position 0
https://blog.csdn.net/K_nightWang/article/details/79862206
## ValueError: Cannot mask with non-boolean array containing NA / NaN values
https://blog.csdn.net/K_nightWang/article/details/79862206
## 将含有指定字符串的行筛选出来
https://blog.csdn.net/qq_46240549/article/details/109553344
## 生成sql语句字符串
https://blog.csdn.net/SeafyLiang/article/details/120262833
## Pandas 练习题
参考https://blog.csdn.net/cd_sywe/article/details/103150386
https://www.jianshu.com/p/4ab8720071dd
## pandas按行、按列遍历
https://blog.csdn.net/weixin_43115411/article/details/126030711
## pandas行、列显示不完全
https://blog.csdn.net/m0_46624667/article/details/109773960
https://blog.csdn.net/weekdawn/article/details/81389865
https://blog.csdn.net/wugou2014/article/details/129770484
## pandas中to_dict的用法详解
参考https://www.jb51.net/article/141481.htm
## pandas报错 'Series' object has no attribute 'as_matrix'
参考https://blog.csdn.net/weixin_44550865/article/details/105785876
## pandas报错:index must be monotonic increasing or decreasing
参考https://blog.csdn.net/weixin_44149358/article/details/110874865
## pandas报错:'Series' object has no attribute 'order'
参考https://blog.csdn.net/NextAction/article/details/85097904
## pandas报错:ModuleNotFoundError: No module named 'pandas.io.data'
参考https://blog.csdn.net/qq_23347459/article/details/104966818
## pandas:Timestamp object has no attribute dt
参考https://stackoverflow.com/questions/62803633/timestamp-object-has-no-attribute-dt/62815103
## pandas:unique()函数与nunique()函数区别
参考https://blog.csdn.net/feizxiang3/article/details/93380525
## pandas:groupby详解
参考https://zhuanlan.zhihu.com/p/101284491
## pandas:map、apply、applymap详解
参考https://zhuanlan.zhihu.com/p/100064394
## pandas:数据类型转换
参考https://blog.csdn.net/u010916338/article/details/82385618
## pandas:两列转换成字典的健和值
参考https://blog.csdn.net/mao15827639402/article/details/107832903
## pandas:删除满足条件元素所在的行
参考https://blog.csdn.net/u014636245/article/details/104202889
## pandas:判断值是否在列表中
参考https://blog.csdn.net/qq_38115310/article/details/103290582
## pandas:findall()
https://blog.csdn.net/claroja/article/details/64927541
## pandas中Series对象下的str方法汇总
参考https://blog.csdn.net/weixin_43750377/article/details/107979607
## pandas中groupby的参数:as_index
参考https://www.cnblogs.com/zhangzhixing/p/11074416.html
## pandas报错: xlrd.biffh.XLRDError: Excel xlsx file; not supported
参考https://blog.csdn.net/weixin_44073728/article/details/111054157
## pandas报错: TypeError: read_excel() got an unexpected keyword argument `sheetname`
参考https://www.cnblogs.com/mmjing/p/11935889.html
## pandas.get_dummies的用法
参考https://blog.csdn.net/maymay_/article/details/80198468
## pandas报错:Getting TypeError: reduction operation 'argmax' not allowed for this dtype when trying to use idxmax()
参考https://stackoverflow.com/questions/48719937
## pandas报错:ImportError: No module named 'pandas.tools'
参考https://www.cnblogs.com/zhhy236400/p/11111036.html
## pandas函数:diff
参考https://www.cnblogs.com/anovana/p/10429237.html
## dataframe和list转换
https://blog.csdn.net/liujingwei8610/article/details/125438336
## dataframe的reset_index()
https://blog.csdn.net/longge_number1/article/details/117203253
## dataframe的窗口函数rolling()
https://blog.csdn.net/chinacmt/article/details/104757646
## pandas时间序列
https://www.jianshu.com/p/93558d24509c
https://www.jianshu.com/p/8d3d612afbb2
## rank函数
https://blog.51cto.com/u_15671528/5358933
# Numpy
## numpy.random.randint用法
参考https://blog.csdn.net/u011851421/article/details/83544853
## Print 数组无法完整输出解决方法
参考https://blog.csdn.net/hustwayne/article/details/84393485
## numpy中stack(),hstack(),vstack()函数详解
参考https://blog.csdn.net/csdn15698845876/article/details/73380803
## numpy.random.uniform介绍
参考https://blog.csdn.net/u013920434/article/details/52507173
## 报错 AxisError: axis 0 is out of bounds for array of dimension 0
参考https://www.cnblogs.com/WMT-Azura/p/13632440.html
## numpy,pandas计算均值、方差、标准差
参考https://www.shangmayuan.com/a/88af74b8434f49b0ba255fe2.html
## numpy.reshape(-1,1)
参考https://blog.csdn.net/qq_42804678/article/details/99062431
## numpy数组拼接方法
参考https://blog.csdn.net/zyl1042635242/article/details/43162031
## numpy练习题
https://www.machinelearningplus.com/python/101-numpy-exercises-python/
## numpy:squeeze()函数
https://blog.csdn.net/weixin_44001371/article/details/125008596
## numpy中mat与array在相乘上的区别

# 统计
## 用python做z检验,t检验
https://blog.csdn.net/robert_chen1988/article/details/103378351
https://www.statsmodels.org/stable/generated/statsmodels.stats.weightstats.ztest.html#statsmodels-stats-weightstats-ztest
## 用python求统计功效
https://www.statsmodels.org/dev/generated/statsmodels.stats.power.zt_ind_solve_power.html
https://campus.datacamp.com/courses/practicing-statistics-interview-questions-in-python/statistical-experiments-and-significance-testing?ex=7
## 正态分布上求概率
https://blog.csdn.net/a857553315/article/details/117554389
# Sklearn
## Sklearn的train_test_split用法
参考https://blog.csdn.net/fxlou/article/details/79189106
## No module named 'sklearn.cross_validation'解决方法
参考https://blog.csdn.net/rocling/article/details/89002209
## No module named 'sklearn.grid_search'
参考https://blog.csdn.net/u012852847/article/details/84639213/
## DictVectorizer的使用
参考https://blog.csdn.net/qq_36847641/article/details/78279309
## Sklearn数据预处理函数fit_transform()和transform()的区别
参考https://blog.csdn.net/quiet_girl/article/details/72517053
## Sklearn数据预处理函数StandardScaler
参考https://blog.csdn.net/wzyaiwl/article/details/90549391
## sklearn:文本特征提取方法CountVectorizer
参考https://blog.csdn.net/weixin_38278334/article/details/82320307
## sklearn:predict()与predict_proba()用法区别
参考https://www.cnblogs.com/mrtop/p/10309083.html
## sklearn报错:Solver lbfgs supports only 'l2' or 'none' penalties, got l1 penalty
参考https://blog.csdn.net/qq_22592457/article/details/103504796
## sklearn报错:TypeError: 'KFold' object is not iterable
参考https://stackoverflow.com/questions/48641290
# 理论
## TF-IDF理论及Sklearn的TfidfVectorizer函数
参考http://www.ruanyifeng.com/blog/2013/03/tf-idf.html
https://blog.csdn.net/m0_37991005/article/details/105074754
## 文本向量化实战
参考https://zhuanlan.zhihu.com/p/44917421
## jieba分词模块的基本用法
参考https://www.cnblogs.com/jiayongji/p/7119065.html
## ROC曲线-阈值评价标准
参考https://blog.csdn.net/abcjennifer/article/details/7359370
## 混淆矩阵
参考https://blog.csdn.net/u011734144/article/details/80277225
## 相关性分析
https://blog.csdn.net/chenlei456/article/details/123603257
## 关联规则——Apriori算法
https://zhuanlan.zhihu.com/p/432733354
# 其他
## 打开.data文件的步骤
参考https://blog.csdn.net/ziqingnian/article/details/108013340
## name 'json' is not defined
参考https://blog.csdn.net/qq_38161040/article/details/91410095
## string indices must be integers 错误原因
参考https://blog.csdn.net/weixin_43256057/article/details/83867876
## 解析时间戳并以毫秒为单位计算时间差
参考https://oomake.com/question/1192202
## time与datetime模块如何转换
参考http://www.jquerycn.cn/a_38061
## csv模块csv.writer().writerow()产生空行的问题
参考https://blog.csdn.net/youzhouliu/article/details/53138661
## 生成csv文件时内容中包含逗号的处理方式
参考https://blog.csdn.net/hjp1137/article/details/48656049
## cursor游标讲解
参考https://blog.csdn.net/pdcfighting/article/details/104085622/
## 报错:TypeError: Object of type Decimal is not JSON serializable
参考https://blog.csdn.net/ILovePythonhao/article/details/105347755
posted on 2022-04-07 22:25 Hiteration 阅读(113) 评论(0) 收藏 举报
浙公网安备 33010602011771号