scipy.sparse: excerpts, corrections, and a summary

1. COO_Matrix

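As the example below shows, coo_matrix accumulates data values that share the same row and column indices: duplicate entries are summed when the matrix is converted, for example by toarray().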

>>> import numpy as np
>>> from scipy.sparse import coo_matrix
>>> row  = np.array([0, 0, 1, 3, 1, 0, 0])
>>> col  = np.array([0, 2, 1, 3, 1, 0, 0])
>>> data = np.array([1, 1, 1, 1, 1, 1, 1])
>>> coo_matrix((data, (row, col)), shape=(4, 4)).toarray()
array([[3, 0, 1, 0],
       [0, 2, 0, 0],
       [0, 0, 0, 0],
       [0, 0, 0, 1]])
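
The summing of duplicates can be checked directly. The following is a minimal sketch that reuses the arrays from the example above; the variable names stored_before and stored_after are introduced here only for illustration. It relies on the COO format keeping every triple as given, with the merge happening on conversion or on an explicit call to sum_duplicates().

```python
import numpy as np
from scipy.sparse import coo_matrix

row = np.array([0, 0, 1, 3, 1, 0, 0])
col = np.array([0, 2, 1, 3, 1, 0, 0])
data = np.array([1, 1, 1, 1, 1, 1, 1])

coo = coo_matrix((data, (row, col)), shape=(4, 4))

# COO keeps every (row, col, value) triple as given, duplicates included.
stored_before = coo.nnz

# Converting to CSR merges entries that share the same coordinates.
csr = coo.tocsr()
stored_after = csr.nnz

# sum_duplicates() performs the same merge in place on the COO matrix.
coo.sum_duplicates()

print(stored_before, stored_after, coo.nnz)
```

The three entries at position (0, 0) collapse into a single stored value of 3, and the two entries at (1, 1) into a single 2.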

2. CSC_Matrix and CSR_Matrix

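csr_matrix compresses the matrix by rows, while csc_matrix compresses it by columns. A CSR matrix is defined by three arrays: the row offsets (indptr), the column indices (indices), and data. The column indices and data mean the same as the column indices and values of the COO format, provided the entries are ordered by row; the row offsets mark where each row begins in those two arrays.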

>>> import numpy as np
>>> from scipy.sparse import csr_matrix
>>> indptr = np.array([0, 2, 3, 6])
>>> indices = np.array([0, 2, 2, 0, 1, 2])
>>> data = np.array([1, 2, 3, 4, 5, 6])
>>> csr_matrix((data, indices, indptr), shape=(3, 3)).toarray()
array([[1, 0, 2],
       [0, 0, 3],
       [4, 5, 6]])

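In csr_matrix, indices holds the column index of each stored value and data holds the value itself. indptr delimits the rows: for each i in range(len(indptr) - 1), the values data[indptr[i]:indptr[i+1]] belong to row i, and their columns are indices[indptr[i]:indptr[i+1]]. In the example, data[0:2] (the first two values) belongs to row 0, data[2:3] (the third value) belongs to row 1, and data[3:6] belongs to row 2. csc_matrix works the same way with rows and columns swapped: indices holds row indices and indptr holds column offsets.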
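
The role of indptr can be made concrete by rebuilding the dense matrix by hand. This is a rough sketch under the assumption that no row contains a repeated column index; the names n_rows, dense, reference, and csc_dense are illustrative only. It also passes the same three arrays to csc_matrix to show the row/column swap.

```python
import numpy as np
from scipy.sparse import csr_matrix, csc_matrix

indptr = np.array([0, 2, 3, 6])
indices = np.array([0, 2, 2, 0, 1, 2])
data = np.array([1, 2, 3, 4, 5, 6])

# There is one more offset than there are rows.
n_rows = len(indptr) - 1
dense = np.zeros((n_rows, 3), dtype=data.dtype)

for i in range(n_rows):
    start, end = indptr[i], indptr[i + 1]
    # Row i owns data[start:end]; indices[start:end] gives the columns.
    dense[i, indices[start:end]] = data[start:end]

reference = csr_matrix((data, indices, indptr), shape=(3, 3)).toarray()

# The same arrays read as CSC: indices are rows, indptr are column offsets.
csc_dense = csc_matrix((data, indices, indptr), shape=(3, 3)).toarray()

print(np.array_equal(dense, reference))
print(np.array_equal(csc_dense, reference.T))
```

Because this matrix is square, reading the same arrays as CSC yields the transpose of the CSR result.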

3. Summary

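When loading a data file, build the sparse matrix quickly with coo_matrix, then call tocsr(), tocsc(), or todense() to convert it to CSR, CSC, or a dense matrix (numpy.matrix); toarray() returns a numpy.ndarray instead.
The coo_matrix format is commonly used for reading and writing sparse matrices from files, while the csr_matrix format is commonly used for sparse matrix computation once the data has been read in.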
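
A minimal sketch of that workflow follows. The triples here are made up for illustration and stand in for values parsed from a file; the names vals, as_array, as_matrix, x, and y are likewise hypothetical.

```python
import numpy as np
from scipy.sparse import coo_matrix

# Hypothetical (row, col, value) triples, as if parsed from a data file.
row = np.array([0, 1, 2, 2])
col = np.array([0, 2, 0, 1])
vals = np.array([1.0, 2.0, 3.0, 4.0])

# Build once in COO, which only needs the three parallel arrays.
coo = coo_matrix((vals, (row, col)), shape=(3, 3))

# Convert to a compressed format before doing arithmetic.
csr = coo.tocsr()
csc = coo.tocsc()

# Dense conversions: toarray() gives an ndarray, todense() a numpy.matrix.
as_array = coo.toarray()
as_matrix = coo.todense()

# Matrix-vector product on the CSR form.
x = np.ones(3)
y = csr @ x
print(y)
```

Building in COO and converting once keeps construction simple, while row slicing and products run on the compressed form.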

posted @ 2024-10-13 21:40 CScgy