Learning Matminer

 

 

Matminer example

Predicting elastic moduli with machine learning

1. Load the elastic-constant dataset from matminer's built-in datasets into a pandas DataFrame;

from matminer.datasets.convenience_loaders import load_elastic_tensor
df = load_elastic_tensor()  # loads dataset in a pandas DataFrame object

2. Drop the columns we do not need

unwanted_columns = ["volume", "nsites", "compliance_tensor", "elastic_tensor", 
                    "elastic_tensor_original", "K_Voigt", "G_Voigt", "K_Reuss", "G_Reuss"]
df = df.drop(unwanted_columns, axis=1)

3. Descriptors

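Define the outputs (K_VRH, G_VRH, elastic_anisotropy) and the inputs. K_VRH and G_VRH are the Voigt-Reuss-Hill averages of the bulk and shear moduli.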

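1) Composition features: input features that are not numeric (such as chemical formulas) must be converted into numeric descriptors.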

    1. Convert the chemical formula string into a pymatgen Composition object (a dict-like mapping from each element to its amount):

from matminer.featurizers.conversions import StrToComposition
df = StrToComposition().featurize_dataframe(df, "formula")
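To make the idea of a Composition concrete, here is a minimal standard-library sketch that parses a simple formula string into an element-to-amount mapping and normalizes it to atomic fractions. It is an illustration only, not pymatgen's parser: the helpers `parse_simple_formula` and `atomic_fractions` are hypothetical, and the sketch assumes formulas without parentheses, hydrates or charges.

```python
import re
from collections import defaultdict


def parse_simple_formula(formula: str) -> dict:
    """Parse a formula like 'SiO2' or 'Fe2O3' into {element: amount}.

    Hypothetical helper for illustration: it does not handle parentheses,
    hydrates or charges, which pymatgen's Composition does.
    """
    amounts = defaultdict(float)
    # An element symbol is one uppercase letter, optionally followed by one
    # lowercase letter, optionally followed by an integer or decimal count.
    for symbol, count in re.findall(r"([A-Z][a-z]?)(\d+(?:\.\d+)?)?", formula):
        amounts[symbol] += float(count) if count else 1.0
    return dict(amounts)


def atomic_fractions(amounts: dict) -> dict:
    """Normalize element amounts so that they sum to 1."""
    total = sum(amounts.values())
    return {el: amt / total for el, amt in amounts.items()}


comp = parse_simple_formula("Fe2O3")
print(comp)                    # {'Fe': 2.0, 'O': 3.0}
print(atomic_fractions(comp))  # {'Fe': 0.4, 'O': 0.6}
```

The atomic fractions are what composition-based featurizers weight elemental properties by.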

  

    2. Convert the composition and structure information into numeric descriptors such as electronegativity, magnetic moment, etc.;
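The magpie preset of ElementProperty summarizes elemental properties over a composition with statistics such as the minimum, maximum, range, mean and average deviation. The sketch below shows how such fraction-weighted statistics can be computed for a single property; it is a simplified illustration, not matminer's implementation. The helper `composition_stats` is hypothetical; the Pauling electronegativities used (Si 1.90, O 3.44) are standard tabulated values.

```python
def composition_stats(fractions: dict, prop: dict) -> dict:
    """Fraction-weighted statistics of one elemental property.

    fractions: {element: atomic fraction}, summing to 1
    prop:      {element: property value}, e.g. electronegativity
    """
    values = [prop[el] for el in fractions]
    mean = sum(f * prop[el] for el, f in fractions.items())
    # Average deviation: fraction-weighted mean absolute deviation from the mean
    avg_dev = sum(f * abs(prop[el] - mean) for el, f in fractions.items())
    return {
        "minimum": min(values),
        "maximum": max(values),
        "range": max(values) - min(values),
        "mean": mean,
        "avg_dev": avg_dev,
    }


pauling_en = {"Si": 1.90, "O": 3.44}
sio2 = {"Si": 1 / 3, "O": 2 / 3}
stats = composition_stats(sio2, pauling_en)
print({k: round(v, 3) for k, v in stats.items()})
# {'minimum': 1.9, 'maximum': 3.44, 'range': 1.54, 'mean': 2.927, 'avg_dev': 0.684}
```

Repeating this over many elemental properties is what turns one formula into a long row of numeric features.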

# Composition-based features: elemental property statistics (magpie preset)
from matminer.featurizers.composition import ElementProperty

ep_feat = ElementProperty.from_preset(preset_name="magpie")
df = ep_feat.featurize_dataframe(df, col_id="composition")  # input the "composition" column to the featurizer

# Oxidation-state features: decorate each composition with guessed oxidation states
# (new "composition_oxid" column), then compute oxidation-state statistics from it
from matminer.featurizers.conversions import CompositionToOxidComposition
from matminer.featurizers.composition import OxidationStates

df = CompositionToOxidComposition().featurize_dataframe(df, "composition")

os_feat = OxidationStates()
df = os_feat.featurize_dataframe(df, "composition_oxid")
# Structure-based features: density, etc.
from matminer.featurizers.structure import DensityFeatures

df_feat = DensityFeatures()
df = df_feat.featurize_dataframe(df, "structure")  # input the structure column to the featurizer
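DensityFeatures adds structure-level descriptors including the density, the volume per atom (vpa) and the packing fraction. As a rough worked example of the first two (not matminer's code), the sketch below computes them for diamond-cubic silicon from its conventional cell. The lattice constant 5.431 Å and atomic mass 28.0855 u are standard reference values; the helper name `density_and_vpa` is hypothetical.

```python
AMU_TO_G = 1.66053906660e-24   # grams per atomic mass unit
A3_TO_CM3 = 1e-24              # cubic centimetres per cubic angstrom


def density_and_vpa(total_mass_amu: float, volume_a3: float, n_atoms: int):
    """Return (density in g/cm^3, volume per atom in A^3) for one unit cell."""
    density = total_mass_amu * AMU_TO_G / (volume_a3 * A3_TO_CM3)
    vpa = volume_a3 / n_atoms
    return density, vpa


# Diamond-cubic Si: conventional cubic cell containing 8 atoms
a = 5.431                      # lattice constant in angstrom
n = 8
rho, vpa = density_and_vpa(n * 28.0855, a ** 3, n)
print(f"density = {rho:.3f} g/cm^3, vpa = {vpa:.2f} A^3/atom")
```

The result lands close to the familiar experimental density of silicon (about 2.33 g/cm^3), a quick sanity check on the unit conversions.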

4. Predict the bulk modulus with linear regression and random forest models

1) Define the input (X) and the output (y)

y = df['K_VRH'].values
excluded = ["G_VRH", "K_VRH", "elastic_anisotropy", "formula", "material_id", 
            "poisson_ratio", "structure", "composition", "composition_oxid"]
X = df.drop(excluded, axis=1)
print("There are {} possible descriptors:\n\n{}".format(X.shape[1], X.columns.values))

2) Fit a linear regression model;

from sklearn.linear_model import LinearRegression
from sklearn.metrics import mean_squared_error
import numpy as np

lr = LinearRegression()

lr.fit(X, y)

# get fit statistics
print('training R2 = ' + str(round(lr.score(X, y), 3)))
print('training RMSE = %.3f' % np.sqrt(mean_squared_error(y_true=y, y_pred=lr.predict(X))))


training R2 = 0.927
training RMSE = 19.625
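The two training metrics printed above are the coefficient of determination, R² = 1 - SS_res/SS_tot, and the root-mean-square error, RMSE = sqrt(mean((y - ŷ)²)). The numpy sketch below computes both by hand on made-up toy numbers (not the elastic dataset), after an ordinary least-squares fit with an intercept; for a regressor, sklearn's `score` method reports this same R².

```python
import numpy as np


def r2_and_rmse(y_true, y_pred):
    """Coefficient of determination and root-mean-square error."""
    y_true = np.asarray(y_true, dtype=float)
    y_pred = np.asarray(y_pred, dtype=float)
    residuals = y_true - y_pred
    ss_res = np.sum(residuals ** 2)                  # residual sum of squares
    ss_tot = np.sum((y_true - y_true.mean()) ** 2)   # total sum of squares
    r2 = 1.0 - ss_res / ss_tot
    rmse = np.sqrt(np.mean(residuals ** 2))
    return r2, rmse


# Toy data: fit y = w*x + b by ordinary least squares, then score the fit
x = np.array([1.0, 2.0, 3.0, 4.0, 5.0])
y = np.array([2.1, 3.9, 6.2, 7.8, 10.1])
A = np.column_stack([x, np.ones_like(x)])            # design matrix with intercept column
coef, *_ = np.linalg.lstsq(A, y, rcond=None)
r2, rmse = r2_and_rmse(y, A @ coef)
print(f"R2 = {r2:.3f}, RMSE = {rmse:.3f}")
```

Both numbers are computed on the same data the model was fitted on, which is why they tend to look optimistic.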

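3) Cross-validate the linear regression model: training-set scores alone can hide overfitting, while cross-validation estimates how well the model predicts data it was not trained on;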

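The basic idea of k-fold cross-validation: split the dataset into k equal parts; in each of k rounds, k-1 parts are used as the training set and the remaining part as the test set, so that every part serves as the test set exactly once;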
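The k-fold idea can be sketched in a few lines of plain Python: shuffle the sample indices once, cut them into k nearly equal folds, and let each fold serve as the test set exactly once. The function `kfold_indices` is a hypothetical helper for illustration; in practice sklearn's KFold does this for you.

```python
import random


def kfold_indices(n_samples: int, k: int, seed: int = 1):
    """Yield (train_idx, test_idx) lists for k-fold cross-validation."""
    indices = list(range(n_samples))
    random.Random(seed).shuffle(indices)          # shuffle once, reproducibly
    # The first n_samples % k folds get one extra sample each
    fold_sizes = [n_samples // k + (1 if i < n_samples % k else 0) for i in range(k)]
    start = 0
    for size in fold_sizes:
        test_idx = indices[start:start + size]
        train_idx = indices[:start] + indices[start + size:]
        yield train_idx, test_idx
        start += size


for train_idx, test_idx in kfold_indices(n_samples=10, k=3):
    print(len(train_idx), len(test_idx))   # 6 4, then 7 3, then 7 3
```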

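Cross-validation involves two steps: first, splitting the data into training and test sets; second, evaluating the model across those splits;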

Splitting the data with k-fold cross-validation (KFold):

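KFold(n_splits, shuffle, random_state). Parameters: n_splits: the number of folds; shuffle: whether to shuffle the data before splitting; random_state: the random seed, which only takes effect when shuffle=True;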
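A small check of the KFold parameters described above on a toy array of 10 samples: `split` yields arrays of train and test indices, and with shuffle=True the random_state fixes the permutation, so the same seed reproduces the same folds.

```python
import numpy as np
from sklearn.model_selection import KFold

X = np.arange(20).reshape(10, 2)      # 10 toy samples, 2 features each

kf = KFold(n_splits=5, shuffle=True, random_state=1)
for fold, (train_idx, test_idx) in enumerate(kf.split(X)):
    print(f"fold {fold}: train={train_idx}, test={test_idx}")

# The same random_state gives identical test folds on a second run
first = [test for _, test in KFold(5, shuffle=True, random_state=1).split(X)]
second = [test for _, test in KFold(5, shuffle=True, random_state=1).split(X)]
print(all(np.array_equal(a, b) for a, b in zip(first, second)))   # True
```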

Evaluating the model with cross-validation:

cross_val_score, cross_validate, cross_val_predict

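(cross_val_predict is called the same way as cross_val_score, but instead of a score per fold it returns the cross-validated prediction for each sample, i.e. the prediction made by the model that did not see that sample during training.)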
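To see the difference in return values, the sketch below runs both functions on synthetic regression data from sklearn's make_regression (a stand-in, not the elastic dataset): cross_val_score returns one score per fold, while cross_val_predict returns one out-of-fold prediction per sample.

```python
import numpy as np
from sklearn.datasets import make_regression
from sklearn.linear_model import LinearRegression
from sklearn.model_selection import KFold, cross_val_predict, cross_val_score

# Synthetic stand-in data: 100 samples, 5 features, some noise
X, y = make_regression(n_samples=100, n_features=5, noise=10.0, random_state=0)
cv = KFold(n_splits=10, shuffle=True, random_state=1)
lr = LinearRegression()

scores = cross_val_score(lr, X, y, scoring="r2", cv=cv)
preds = cross_val_predict(lr, X, y, cv=cv)

print(scores.shape)   # (10,)  one R2 per fold
print(preds.shape)    # (100,) one prediction per sample
```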

 

from sklearn.model_selection import KFold, cross_val_score

# Use 10-fold cross validation (90% training, 10% test)
crossvalidation = KFold(n_splits=10, shuffle=True, random_state=1)
scores = cross_val_score(lr, X, y, scoring='neg_mean_squared_error', cv=crossvalidation, n_jobs=1)
rmse_scores = [np.sqrt(abs(s)) for s in scores]
r2_scores = cross_val_score(lr, X, y, scoring='r2', cv=crossvalidation, n_jobs=1)

print('Cross-validation results:')
# R2 can be negative for a badly predicted fold, so average it without abs()
print('Folds: %i, mean R2: %.3f' % (len(scores), np.mean(r2_scores)))
print('Folds: %i, mean RMSE: %.3f' % (len(scores), np.mean(rmse_scores)))


Cross-validation results:
Folds: 10, mean R2: 0.902
Folds: 10, mean RMSE: 22.467
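cross_validate, the third function listed above, can compute several metrics in a single pass over the folds instead of calling cross_val_score twice. A sketch on synthetic stand-in data from make_regression; note that the 'neg_mean_squared_error' scorer returns the negated MSE (so that higher is always better), which is why it is negated again before taking the square root, while R2 is averaged as-is.

```python
import numpy as np
from sklearn.datasets import make_regression
from sklearn.linear_model import LinearRegression
from sklearn.model_selection import KFold, cross_validate

X, y = make_regression(n_samples=200, n_features=8, noise=15.0, random_state=0)
cv = KFold(n_splits=10, shuffle=True, random_state=1)

# One call, two metrics: results contain "test_r2" and "test_neg_mean_squared_error"
results = cross_validate(LinearRegression(), X, y, cv=cv,
                         scoring=["r2", "neg_mean_squared_error"])

r2_scores = results["test_r2"]
rmse_scores = np.sqrt(-results["test_neg_mean_squared_error"])
print("Folds: %i, mean R2: %.3f" % (len(r2_scores), r2_scores.mean()))
print("Folds: %i, mean RMSE: %.3f" % (len(rmse_scores), rmse_scores.mean()))
```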

5. Plotting

from figrecipes import PlotlyFig
from sklearn.model_selection import cross_val_predict

pf = PlotlyFig(x_title='DFT (MP) bulk modulus (GPa)',
               y_title='Predicted bulk modulus (GPa)',
               title='Linear regression',
               mode='notebook',
               filename="lr_regression.html")

pf.xy(xy_pairs=[(y, cross_val_predict(lr, X, y, cv=crossvalidation)), ([0, 400], [0, 400])], 
      labels=df['formula'], 
      modes=['markers', 'lines'],
      lines=[{}, {'color': 'black', 'dash': 'dash'}], 
      showlegends=False
     )

  Summary: the machine-learning prediction workflow is

      1. Use matminer to preprocess the data and generate the input features;

      2. Make predictions with machine learning: choose a model and evaluate it with cross-validation.
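Step 4 also mentions a random forest, which the walkthrough does not show. A minimal sketch, assuming the same 10-fold setup, compares LinearRegression with sklearn's RandomForestRegressor on the same folds. Synthetic data from make_regression stands in for the matminer features, so the numbers say nothing about the elastic dataset, and the hyperparameters (n_estimators=100, random_state=1) are arbitrary illustrative choices.

```python
import numpy as np
from sklearn.datasets import make_regression
from sklearn.ensemble import RandomForestRegressor
from sklearn.linear_model import LinearRegression
from sklearn.model_selection import KFold, cross_val_score

X, y = make_regression(n_samples=200, n_features=8, noise=15.0, random_state=0)
cv = KFold(n_splits=10, shuffle=True, random_state=1)

models = {
    "linear regression": LinearRegression(),
    "random forest": RandomForestRegressor(n_estimators=100, random_state=1),
}
mean_r2 = {}
for name, model in models.items():
    # Same folds for every model, so the scores are directly comparable
    scores = cross_val_score(model, X, y, scoring="r2", cv=cv)
    mean_r2[name] = float(np.mean(scores))
    print(f"{name}: mean R2 = {mean_r2[name]:.3f}")
```

Because this synthetic target is linear by construction, the linear model is expected to win here; on real matminer features the ranking can differ, which is what the cross-validated comparison is for.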

 


 

posted @ 2021-10-05 20:12  ZHAO_X