3-7学习笔记

1.在clone项目时出现了一下错误:

error: RPC failed; curl 18 transfer closed with outstanding read data remaining

尝试了设置缓冲区大小：https://blog.csdn.net/CLOUD_J/article/details/90241544不行。

https://stackoverflow.com/questions/38618885/error-rpc-failed-curl-transfer-closed-with-outstanding-read-data-remaining

这个是ok的：

$ git clone http://github.com/large-repository --depth 1
$ cd large-repository
$ git fetch --unshallow

3-8日——————————————

1.from string import punctuation

# import string library function 
import string 
    
# Storing the sets of punctuation in variable result 
result = string.punctuation 
    
# Printing the punctuation values 
print(result) 
print(len(result))

#输出：
!"#$%&'()*+,-./:;<=>?@[\]^_`{|}~
32

从上面可以看出，它包含了32个标点符号，可以用在文本清洗中，去掉特殊符号~~~

2.from collections import Counter

https://www.hackerrank.com/challenges/collections-counter/problem

>>> from collections import Counter
>>> 
>>> myList = [1,1,2,3,4,5,3,2,3,4,2,1,2,3]
>>> print Counter(myList)
Counter({2: 4, 3: 4, 1: 3, 4: 2, 5: 1})
>>>
>>> print Counter(myList).items()
[(1, 3), (2, 4), (3, 4), (4, 2), (5, 1)]
>>> 
>>> print Counter(myList).keys()
[1, 2, 3, 4, 5]
>>> 
>>> print Counter(myList).values()
[3, 4, 4, 2, 1]

从上面可以看出，它是一个计数器，可以将元素出现次数统计为一个字典。

3.什么是停用词？

https://blog.51cto.com/kinglixing/688063

http://sofasofa.io/forum_main_post.php?postid=1001801

简单来说，就是虚词，没有意义，或者一些太常用的词，去掉它们能够增强搜索能力。

https://github.com/goto456/stopwords

这里就给出了百度等常用停用词表。

4.验证集到底有什么用？有必要吗？在大数据集情况下合适？

在这里训练过程中就在验证集上跑：

保持疑问。。https://www.zhihu.com/question/26588665

说是这么说的，验证集就是用来微调参数的，比如模型结构而不是模型权重，例如调整模型的隐层数，这个就很迷。。

这个先放这了，等到我以后看到有k折交叉验证的代码时再来学习。

7.numpy.hstack与vstack

https://docs.scipy.org/doc/numpy/reference/generated/numpy.hstack.html

对它有了一个全新的认识：

import numpy as np
a = np.array([[1],[2],[3]])
b = np.array([[2],[3],[4]])
c=np.hstack((a,b))
d=np.vstack((a,b))

可以进行一个简单的分析：

a、b的size为(3,1)，hstack是按列合并，那么就是将列相加size变成了(3,2)。

而vstack是按行相加，变为(6,1)，结果分别如下:

>>> c
array([[1, 2],
       [2, 3],
       [3, 4]])
>>> d
array([[1],
       [2],
       [3],
       [2],
       [3],
       [4]])

注意，两者均没有其他参数，只有要合并的这些参数，也可以多个一起合并。

也可以三个或以上:

f=np.array([[2],[3],[7]])
e=np.hstack((a,b,f))

>>> e
array([[1, 2, 2],
       [2, 3, 3],
       [3, 4, 7]])

也可以针对单个进行：

g=np.hstack(e)
h=np.vstack(e)

#结果：
>>> g
array([1, 2, 2, 2, 3, 3, 3, 4, 7])
>>> h
array([[1, 2, 2],
       [2, 3, 3],
       [3, 4, 7]])

posted @ 2020-03-07 16:00 lypbendlf 阅读(223) 评论(0) 编辑收藏举报

会员力量，点亮园子希望

刷新页面返回顶部