Python Reference in Data Analysis / Mining Tools
If you are already familiar with how Python loads modules and packages, the tables below should be easy to navigate.
Each entry in the tables names a Python module or function. Some of these modules are not part of the standard library; install them with pip install <package>.
Connector & IO
Database
| Category | Python |
|---|---|
| MySQL | mysql-connector-python (official) |
| Oracle | cx_Oracle |
| Redis | redis |
| MongoDB | pymongo |
| Neo4j | py2neo |
| Cassandra | cassandra-driver |
| ODBC | pyodbc |
| JDBC | Unknown (Jython only) |
IO
| Category | Python |
|---|---|
| Excel | XlsxWriter, pandas.read_excel, pandas.DataFrame.to_excel, openpyxl |
| CSV | csv.reader, csv.writer |
| JSON | json |
| Image | PIL (Pillow) |
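As a minimal sketch of the csv and json entries above, the following round-trips a few records through both formats using only the standard library. The sample records and variable names are hypothetical and serve only as illustration; an in-memory buffer stands in for a real file.

```python
import csv
import io
import json

# Hypothetical sample records used only for illustration.
rows = [
    {"name": "alice", "score": 90},
    {"name": "bob", "score": 85},
]

# Write the records as CSV into an in-memory buffer.
# A real file opened with newline="" would be used the same way.
buffer = io.StringIO(newline="")
writer = csv.writer(buffer)
writer.writerow(["name", "score"])
for row in rows:
    writer.writerow([row["name"], row["score"]])
csv_text = buffer.getvalue()

# Read the CSV back. The csv module returns every field as a string.
parsed = list(csv.DictReader(io.StringIO(csv_text, newline="")))

# Serialize the same records to JSON and load them again.
json_text = json.dumps(rows)
restored = json.loads(json_text)
```

Note that the CSV round trip loses type information (the scores come back as strings), while the JSON round trip preserves the integers.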
Statistics
| Category | Python |
|---|---|
| Descriptive statistics summary | scipy.stats.describe |
| Mean | scipy.stats.gmean (geometric mean), scipy.stats.hmean (harmonic mean), numpy.mean, numpy.nanmean, pandas.Series.mean |
| Median | numpy.median, numpy.nanmedian, pandas.Series.median |
| Mode | scipy.stats.mode, pandas.Series.mode |
| Quantile | numpy.percentile, numpy.nanpercentile, pandas.Series.quantile |
| Empirical cumulative distribution function (ECDF) | statsmodels.distributions.empirical_distribution.ECDF |
| Standard deviation | scipy.stats.tstd, numpy.std, numpy.nanstd, pandas.Series.std |
| Variance | numpy.var, pandas.Series.var |
| Coefficient of variation | scipy.stats.variation |
| Covariance | numpy.cov, pandas.Series.cov |
| Pearson correlation coefficient | scipy.stats.pearsonr, numpy.corrcoef, pandas.Series.corr |
| Kurtosis | scipy.stats.kurtosis, pandas.Series.kurt |
| Skewness | scipy.stats.skew, pandas.Series.skew |
| Histogram | numpy.histogram, numpy.histogram2d, numpy.histogramdd |
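A few of the NumPy entries above can be sketched as follows. The sample array is made up for illustration, and the example assumes NumPy is installed. It mainly shows how the nan-aware variants differ from the plain ones and how the ddof parameter switches between population and sample estimates.

```python
import numpy as np

# Hypothetical sample with one missing value.
data = np.array([2.0, 4.0, 4.0, 5.0, np.nan, 9.0])

# The plain functions propagate NaN; the nan-prefixed variants skip it.
mean_with_nan = np.mean(data)
mean_ignoring_nan = np.nanmean(data)
median_ignoring_nan = np.nanmedian(data)
q75 = np.nanpercentile(data, 75)

# Drop the missing value explicitly for the remaining statistics.
clean = data[~np.isnan(data)]

# ddof=0 (the default) gives the population estimate,
# ddof=1 gives the sample estimate.
population_std = np.std(clean)
sample_var = np.var(clean, ddof=1)

# Histogram with two equal-width bins over the observed range.
counts, edges = np.histogram(clean, bins=2)
```

pandas.Series.std and pandas.Series.var default to ddof=1, unlike their NumPy counterparts, so the two libraries give different results on the same data unless ddof is set explicitly.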
Regression (including statistics and machine learning)
| Category | Python |
|---|---|
| Ordinary least squares regression (OLS) | statsmodels.api.OLS, sklearn.linear_model.LinearRegression |
| Generalized least squares (GLS) | statsmodels.api.GLS |
| Quantile regression | statsmodels.api.QuantReg |
| Ridge regression | sklearn.linear_model.Ridge |
| LASSO | sklearn.linear_model.Lasso |
| Least-angle regression | sklearn.linear_model.LassoLars |
| Robust regression | statsmodels.api.RLM |
Hypothesis Testing
| Category | Python |
|---|---|
| t-test | statsmodels.stats.weightstats.ttest_ind, statsmodels.stats.weightstats.ttost_ind, statsmodels.stats.weightstats.ttost_paired; scipy.stats.ttest_1samp, scipy.stats.ttest_ind, scipy.stats.ttest_ind_from_stats, scipy.stats.ttest_rel |
| Kolmogorov-Smirnov test (distribution test) | scipy.stats.kstest, scipy.stats.ks_2samp |
| Wilcoxon (nonparametric test of differences) | scipy.stats.wilcoxon, scipy.stats.mannwhitneyu |
| Shapiro-Wilk normality test | scipy.stats.shapiro |
| Pearson correlation coefficient test | scipy.stats.pearsonr |
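The SciPy tests in this table share a similar calling pattern: each returns a test statistic and a p-value. The sketch below assumes NumPy and SciPy are installed and uses synthetic data generated for illustration only; the group names and parameters are hypothetical.

```python
import numpy as np
from scipy import stats

# Synthetic data: two normal samples whose means differ by 1.
rng = np.random.default_rng(0)
group_a = rng.normal(loc=0.0, scale=1.0, size=200)
group_b = rng.normal(loc=1.0, scale=1.0, size=200)

# Two-sample t-test for a difference in means.
t_stat, t_p = stats.ttest_ind(group_a, group_b)

# Two-sample Kolmogorov-Smirnov test for a difference in distribution.
ks_stat, ks_p = stats.ks_2samp(group_a, group_b)

# Shapiro-Wilk test of normality on a single sample.
w_stat, w_p = stats.shapiro(group_a)

# Pearson correlation coefficient and its p-value.
r, r_p = stats.pearsonr(group_a, group_b)
```

A small p-value from ttest_ind or ks_2samp suggests the two groups differ; for shapiro, a small p-value suggests the sample is not normally distributed.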
Time series
| Category | Python |
|---|---|
| AR | statsmodels.tsa.ar_model.AutoReg |
| ARIMA | statsmodels.tsa.arima.model.ARIMA |
| VAR | statsmodels.tsa.vector_ar.var_model.VAR |