2019 daily work

2019/09/07

http://neuralnetworksanddeeplearning.com

paradigms: models or patterns of thought

promising: showing signs of future success

hazy: vague, indistinct

Technologies come and technologies go, but insight is forever.

https://github.com/mnielsen/neural-networks-and-deep-learning

elementary algebra and plots of functions

multivariable calculus and linear algebra

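effortlessly: without any effort

deceptive: misleading

visual cortices: the brain regions that process visual input

tuned: adjusted

stupendously, astoundingly: amazingly, astonishingly

morass: a swamp; a confusing situation

caveats: warnings, points to note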

What is a perceptron? (Example: weighing the factors in a decision about whether to go somewhere.)

payoffs: gains, rewards

subtle: delicate, hard to notice

sophisticated: complex, refined

reassuring: comforting

opaque: not transparent

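Perceptrons: what they do, how they implement NAND, and their weakness (changing a single weight may flip the output completely, because the output can only be 0 or 1). This motivates the sigmoid neuron.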
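A minimal sketch of the point above, assuming the weights -2, -2 and bias 3 used as the NAND example in the book. The function names are made up for illustration. It shows a perceptron computing NAND, and then how a small change to the bias flips the perceptron output while the sigmoid output only shifts slightly.

```python
import math

def perceptron(x1, x2, w1=-2.0, w2=-2.0, b=3.0):
    """Step-activation perceptron: 1 if w.x + b > 0, else 0."""
    return 1 if w1 * x1 + w2 * x2 + b > 0 else 0

def sigmoid_neuron(x1, x2, w1=-2.0, w2=-2.0, b=3.0):
    """Same weighted sum, but passed through a smooth sigmoid."""
    z = w1 * x1 + w2 * x2 + b
    return 1.0 / (1.0 + math.exp(-z))

# Truth table of the perceptron with the assumed NAND weights.
nand_table = {(a, b): perceptron(a, b) for a in (0, 1) for b in (0, 1)}
print(nand_table)

# Nudge the bias from 3.9 to 4.1 at input (1, 1):
# the perceptron jumps from 0 to 1, the sigmoid moves only a little.
step_before = perceptron(1, 1, b=3.9)
step_after = perceptron(1, 1, b=4.1)
smooth_before = sigmoid_neuron(1, 1, b=3.9)
smooth_after = sigmoid_neuron(1, 1, b=4.1)
print(step_before, step_after, smooth_before, smooth_after)
```

The smooth response is what makes it possible to improve a network by small adjustments to weights and biases.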

2019/09/09

panic: sudden fear

legitimate: valid, justified

terminology: technical terms

multilayer perceptrons (MLPs)

despite: in spite of

While the design of the input and output layers of a neural network is often straightforward, there can be quite an art to the design of the hidden layers.

stimulate: to encourage, to excite

The ultimate justification is empirical.

heuristic: a rule of thumb

denote: to stand for, to represent

approximates: comes close to

quantify: to express as a number

mean squared error, or just MSE

In other words, we want to find a set of weights and biases which make the cost as small as possible. We'll do that using an algorithm known as gradient descent
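A small sketch of that idea, assuming a toy one-weight, one-bias linear model instead of a full network. The data, learning rate, and function names are illustrative choices, not taken from the book. It minimizes the quadratic cost C = 1/(2n) * sum (w*x + b - y)^2 by repeatedly stepping against the gradient.

```python
def quadratic_cost(w, b, data):
    """Quadratic (MSE-style) cost with the 1/(2n) convention."""
    return sum((w * x + b - y) ** 2 for x, y in data) / (2 * len(data))

def gradient(w, b, data):
    """Partial derivatives of the quadratic cost w.r.t. w and b."""
    n = len(data)
    dw = sum((w * x + b - y) * x for x, y in data) / n
    db = sum((w * x + b - y) for x, y in data) / n
    return dw, db

def gradient_descent(data, eta=0.1, steps=1000):
    """Start from zero and repeatedly move against the gradient."""
    w, b = 0.0, 0.0
    for _ in range(steps):
        dw, db = gradient(w, b, data)
        w -= eta * dw
        b -= eta * db
    return w, b

# Toy data generated by y = 2x + 1 (an assumption for the demo).
data = [(0.0, 1.0), (1.0, 3.0), (2.0, 5.0), (3.0, 7.0)]
w, b = gradient_descent(data)
print(w, b, quadratic_cost(w, b, data))
```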

Why introduce a cost function instead of directly optimizing the number of correct classifications?

For the most part, making small changes to the weights and biases won't cause any change at all in the number of training images classified correctly. That makes it difficult to figure out how to change the weights and biases to get improved performance. If we instead use a smooth cost function like the quadratic cost it turns out to be easy to figure out how to make small changes in the weights and biases so as to get an improvement in the cost. That's why we focus first on minimizing the quadratic cost, and only after that will we examine the classification accuracy.

analytically: by exact mathematical derivation (in closed form)

constrain: to restrict, to limit

2019/09/10

conceptually: in terms of the idea

carrying out a poll is easier than running a full election

it's much easier to sample a small mini-batch than it is to apply gradient descent to the full batch

statistical fluctuations: random variation between samples
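The poll-versus-election analogy can be sketched as follows. This is an illustrative toy, assuming a one-parameter model y = w*x with synthetic noisy data; all names and constants are made up for the demo. A gradient computed on a small random sample approximates the full-batch gradient up to statistical fluctuations, and stochastic gradient descent built on such estimates still finds a good weight.

```python
import random

def make_data(n, true_w=3.0, noise=0.1, seed=0):
    """Synthetic samples (x, y) with y = true_w * x + Gaussian noise."""
    rng = random.Random(seed)
    data = []
    for _ in range(n):
        x = rng.uniform(-1.0, 1.0)
        data.append((x, true_w * x + rng.gauss(0.0, noise)))
    return data

def gradient(w, samples):
    """Gradient of the quadratic cost 1/(2n) * sum (w*x - y)^2 w.r.t. w."""
    return sum((w * x - y) * x for x, y in samples) / len(samples)

def sgd(data, eta=0.5, epochs=5, batch_size=10, seed=1):
    """Mini-batch SGD: shuffle each epoch, update once per mini-batch."""
    rng = random.Random(seed)
    w = 0.0
    for _ in range(epochs):
        shuffled = data[:]
        rng.shuffle(shuffled)
        for k in range(0, len(shuffled), batch_size):
            w -= eta * gradient(w, shuffled[k:k + batch_size])
    return w

data = make_data(1000)
full = gradient(0.0, data)                                   # the "election"
estimate = gradient(0.0, random.Random(2).sample(data, 50))  # the "poll"
w = sgd(data)
print(full, estimate, w)
```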

2019/09/12

starkly: bluntly, in sharp contrast

2019/09/17

recursive decomposition: breaking a problem into smaller instances of itself

2019/09/19

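Shannon entropy

Entropy measures uncertainty.

In physics, entropy relates to the number of possible states (for example, positions) that the particles of a system can occupy as they move.

Entropy is built from the logarithm of probability: the information content of an outcome with probability p is -log p, so the higher the probability, the smaller the surprise and the lower the uncertainty. Shannon entropy is the expected value of this quantity.

Another way to understand entropy is as the number of questions that must be asked to identify an outcome. The questions can be arranged as a tree, which raises a further question: how do we ask as few questions as possible, so that the entropy is minimized?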
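A short sketch of the definition, using the standard formula H = -sum p * log2(p) measured in bits; the function name is illustrative. With 8 equally likely outcomes the entropy is 3 bits, which matches the 3 yes/no questions a balanced question tree needs.

```python
import math

def entropy(probs):
    """Shannon entropy in bits; outcomes with zero probability contribute 0."""
    return -sum(p * math.log2(p) for p in probs if p > 0)

fair_coin = entropy([0.5, 0.5])      # maximal uncertainty for two outcomes
biased_coin = entropy([0.9, 0.1])    # more predictable, so lower entropy
eight_way = entropy([1 / 8] * 8)     # 3 yes/no questions identify the outcome
print(fair_coin, biased_coin, eight_way)
```

Decision tree algorithms use this quantity to choose the question (split) that reduces entropy the most at each node.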

Decision tree algorithms: pros, cons, and methods

K-Means algorithm: pros and cons

2019/10/11

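Recommendation: long-tail effect. Search: Matthew effect.

Goal of a recommender system: connect users with items and surface long-tail items (retain users and content producers, achieve business goals).

Application scenarios for recommender systems: headline news, short video (Douyin, Kuaishou), product recommendation (Taobao, Amazon), Netflix.

Recommender system evaluation metrics

Users: needs met, enjoyment, broadened horizons

Website: retain users, achieve business goals

Content providers: gain long-tail traffic, earn recognition and revenue

Diversity, novelty, serendipity

Recommender system evaluation methods

Questionnaires (high cost)

Offline evaluation

Online evaluation: A/B testing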

2019/10/12

User-based CF

Item-based CF

Cosine similarity
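A minimal sketch of how cosine similarity drives user-based CF. The rating table and user names are invented for illustration, and 0 is assumed to mean "not rated". Item-based CF applies the same similarity to the columns (items) instead of the rows (users).

```python
import math

def cosine_similarity(u, v):
    """cos(u, v) = u.v / (|u| * |v|); returns 0.0 if either vector is all zeros."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0 or norm_v == 0:
        return 0.0
    return dot / (norm_u * norm_v)

# Hypothetical user -> ratings for four items (0 = not rated).
ratings = {
    "alice": [5, 3, 0, 1],
    "bob":   [4, 0, 0, 1],
    "carol": [1, 1, 0, 5],
}

def most_similar_user(target, ratings):
    """User-based CF step: find the neighbour with the closest rating vector."""
    others = (name for name in ratings if name != target)
    return max(others, key=lambda name: cosine_similarity(ratings[target], ratings[name]))

neighbour = most_similar_user("alice", ratings)
print(neighbour)
```

Items that the nearest neighbours rated highly and the target user has not seen become recommendation candidates.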

2019/10/14

Always keep this in mind: deep learning models tend to be good at fitting to the training data, but the real challenge is generalization, not fitting.

Why is regularization needed? (To counter overfitting.)

Thus a common way to mitigate overfitting is to put constraints on the complexity of a network by forcing its weights only to take small values, which makes the distribution of weight values more "regular". This is called "weight regularization", and it is done by adding to the loss function of the network a cost associated with having large weights. This cost comes in two flavors:
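As a hedged illustration of "adding a cost associated with having large weights", the sketch below assumes the two commonly used penalties, L1 (sum of absolute values) and L2 (sum of squares). The function names and the coefficient `lam` are illustrative, not from any particular library.

```python
def l1_penalty(weights, lam):
    """L1 cost: proportional to the absolute values of the weights."""
    return lam * sum(abs(w) for w in weights)

def l2_penalty(weights, lam):
    """L2 cost: proportional to the squared values of the weights."""
    return lam * sum(w * w for w in weights)

def regularized_loss(data_loss, weights, lam, penalty=l2_penalty):
    """Total loss = loss on the data + cost for having large weights."""
    return data_loss + penalty(weights, lam)

small = regularized_loss(0.30, [0.1, -0.2, 0.1], lam=0.01)
large = regularized_loss(0.30, [3.0, -4.0, 5.0], lam=0.01)
print(small, large)
```

With the same data loss, the network with larger weights pays a higher total loss, so training is pushed toward smaller weights.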

2019/10/15

Recall (candidate retrieval) algorithms

Model-based

What is SVD (singular value decomposition)?

2019/10/16

Embedding from topic

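PLSA, LDA

Advantages of word vectors:

Dimensionality reduction without losing information; linear relationships can be added and subtracted

Choosing an algorithm:

Supervised or unsupervised
Ordered or unordered
Scale
Real-time requirements
Diversity
Business scenario and goals

Learning & serving

What is recall in a recommender system? Recall is the match stage.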

2019/10/18

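Re-ranking algorithms

Multi-objective ranking

Why multi-objective ranking is needed:

Different objectives express different degrees of preference

A single objective is an incomplete measure (for example, clickbait titles)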

Learning to rank

2019/10/26

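Ultimate goal: word vector representations serve as the input and representation space for machine learning, and for deep learning in particular.

In deep learning, the data determines the upper bound of the result; the algorithm only determines how closely that bound can be approached.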

2019/11/20

In general, one of the best ways of reducing overfitting is to increase the size of the training data. With enough training data it is difficult for even a very large network to overfit. Unfortunately, training data can be expensive or difficult to acquire, so this is not always a practical option.

empirically: based on observation or experience

2019/11/21

in a nutshell: in short

In this section I briefly describe three other approaches to reducing overfitting: L1 regularization, dropout, and artificially increasing the training set size

2019/11/24

Why can backpropagation compute the gradient so quickly?

broad structure

It occurs surprisingly often that sophisticated techniques can be implemented with small changes to code.

This refers to L2 regularization: it is very simple to implement, yet very important.
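To make the "small changes to code" concrete, here is a sketch assuming the usual weight-decay form of the L2-regularized update, w -> (1 - eta*lmbda/n) * w - eta * grad. The function names are hypothetical and weights are plain lists rather than the book's matrices.

```python
def sgd_step(weights, grads, eta):
    """Plain gradient descent update."""
    return [w - eta * g for w, g in zip(weights, grads)]

def sgd_step_l2(weights, grads, eta, lmbda, n):
    """Same update with L2 regularization: the only change is the decay factor."""
    decay = 1.0 - eta * lmbda / n   # shrinks every weight slightly on each step
    return [decay * w - eta * g for w, g in zip(weights, grads)]

weights = [1.0, -2.0]
zero_grads = [0.0, 0.0]
plain = sgd_step(weights, zero_grads, eta=0.5)
decayed = sgd_step_l2(weights, zero_grads, eta=0.5, lmbda=0.1, n=1)
print(plain, decayed)
```

Even with a zero gradient, the regularized step pulls the weights toward zero, which is why the technique is also called weight decay.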

just to remind you

It's easy to feel lost in hyper-parameter space. This can be particularly frustrating if your network is very large, or uses a lot of training data, since you may train for hours or days or weeks, only to get no result. If the situation persists, it damages your confidence ... That's a huge subject, and it's not, in any case, a problem that is ever completely solved.

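Hyper-parameter tuning looks like tedious work. I suspect that perhaps nobody is really on the right path here, because I believe that everything has a cause and an effect: if there is no reasonable explanation for a result, then the explanation must be heading in the wrong direction.

debilitating: weakening, draining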

And so I'd like to re-emphasize that during the early stages you should make sure you can get quick feedback from experiments.

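If tuning is not going well, use small batches and settings that give feedback as quickly as possible. This makes it easier to try different hyper-parameters rapidly and find suitable ones.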

Use early stopping to determine the number of training epochs

Early stopping helps avoid wasted training: if several epochs pass without any improvement, consider stopping early.
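A minimal sketch of a "no improvement in the last few epochs" rule. The function name, the `patience` parameter, and the accuracy history are illustrative assumptions; a real training loop would compute the validation accuracy after each epoch instead of reading it from a list.

```python
def early_stopping_epoch(val_accuracies, patience=3):
    """Return (stop_epoch, best_epoch) for a no-improvement-in-`patience` rule."""
    best = float("-inf")
    best_epoch = -1
    for epoch, acc in enumerate(val_accuracies):
        if acc > best:
            best, best_epoch = acc, epoch       # new best: reset the wait
        elif epoch - best_epoch >= patience:
            return epoch, best_epoch            # waited long enough: stop
    return len(val_accuracies) - 1, best_epoch  # never triggered

# Hypothetical per-epoch validation accuracies.
history = [0.80, 0.85, 0.88, 0.87, 0.88, 0.86, 0.85]
stop_epoch, best_epoch = early_stopping_epoch(history, patience=3)
print(stop_epoch, best_epoch)
```

A larger `patience` guards against stopping during a temporary plateau, at the cost of more training time.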

caveats: warnings, points to note

approximation: a value or result that is close but not exact

2019/11/26

A big advantage of sharing weights and biases is that it greatly reduces the number of parameters involved in a convolutional network

Fewer parameters mean less computation.
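A quick count makes the saving concrete. The sketch assumes the sizes used in the book's comparison (20 feature maps with 5x5 local receptive fields on a single-channel image, versus a fully connected layer from 784 inputs to 30 hidden neurons); the function names are illustrative.

```python
def conv_layer_params(num_feature_maps, field_height, field_width):
    """Each feature map shares one set of field weights plus a single bias."""
    return num_feature_maps * (field_height * field_width + 1)

def dense_layer_params(num_inputs, num_neurons):
    """Every neuron has its own weight per input, plus its own bias."""
    return num_inputs * num_neurons + num_neurons

conv_params = conv_layer_params(20, 5, 5)
dense_params = dense_layer_params(28 * 28, 30)
print(conv_params, dense_params)
```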

If you remove an ink cartridge for later use, recap the ink cartridge using the cap that came with it to prevent the ink from drying out and to protect the surrounding area from getting smeared by ink. Store the ink cartridge in the same environment as the product. Do not leave the product with the ink cartridges removed for an extended period of time. Otherwise, ink remaining in the print head nozzles may dry out and you may not be able to print.

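The relationship between gradient descent and backpropagation

In one sentence: the gradient is the derivative of the network's cost function with respect to the network parameters; gradient descent reduces the cost step by step until we find the minimum cost and the parameters that correspond to it; backpropagation gives us a method for computing the gradient.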
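The division of labour can be sketched on a tiny network. This is a toy under stated assumptions: one input, one hidden sigmoid neuron, one output sigmoid neuron, quadratic cost, and made-up starting parameters. `backprop` computes the gradient with the chain rule, a finite-difference check confirms it, and gradient descent then uses that gradient to lower the cost.

```python
import math

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def forward(params, x):
    """Two-layer scalar network: x -> a1 -> a2."""
    w1, b1, w2, b2 = params
    a1 = sigmoid(w1 * x + b1)
    a2 = sigmoid(w2 * a1 + b2)
    return a1, a2

def cost(params, x, y):
    _, a2 = forward(params, x)
    return 0.5 * (a2 - y) ** 2

def backprop(params, x, y):
    """One forward and one backward pass give all four partial derivatives."""
    w1, b1, w2, b2 = params
    a1, a2 = forward(params, x)
    delta2 = (a2 - y) * a2 * (1 - a2)        # dC/dz2 at the output
    delta1 = delta2 * w2 * a1 * (1 - a1)     # error propagated back to z1
    return [delta1 * x, delta1, delta2 * a1, delta2]

def numerical_gradient(params, x, y, eps=1e-6):
    """Slow check: two cost evaluations per parameter."""
    grads = []
    for i in range(len(params)):
        up, down = list(params), list(params)
        up[i] += eps
        down[i] -= eps
        grads.append((cost(up, x, y) - cost(down, x, y)) / (2 * eps))
    return grads

params, x, y = [0.5, -0.3, 0.8, 0.1], 1.0, 0.0
analytic = backprop(params, x, y)
numeric = numerical_gradient(params, x, y)

initial_cost = cost(params, x, y)
for _ in range(100):                          # gradient descent using backprop
    params = [p - 1.0 * g for p, g in zip(params, backprop(params, x, y))]
final_cost = cost(params, x, y)
print(analytic, numeric, initial_cost, final_cost)
```

The numerical check needs a separate pair of forward passes for every parameter, which hints at why backpropagation is so much faster on networks with many weights.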

2019/12/2

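Why vectorize? (Fewer for loops, faster computation.)

Gradient descent uses the gradients computed by backpropagation.

Note how the layers of a neural network are numbered: from left to right, starting from 0, so the input layer is layer 0.

A neural network with one hidden layer is a two-layer neural network. Remember that when counting the layers of a network we do not count the input layer; we count only the hidden layers and the output layer.

The traditional Chinese calendar is lunisolar. It has a lunar component based on the moon's cycle (the 1st and 15th of each month) and a solar component that divides the year by the 24 solar terms (Start of Spring, Start of Summer, Summer Solstice, Winter Solstice, and so on). This is why the dates of the 24 solar terms are also relatively fixed in the internationally used Gregorian calendar.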

2019/12/6

Generally speaking, avoid creating short-term temporary objects if you can. Fewer objects created mean less-frequent garbage collection, which has a direct impact on user experience.

Still, it's good practice to declare constants static final whenever possible.

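So what should you do instead of writing a finalizer or cleaner for a class whose objects encapsulate resources that require termination, such as files or threads? Just have your class implement AutoCloseable, and require its clients to invoke the close method on each instance when it is no longer needed, typically using try-with-resources to ensure termination even in the face of exceptions (see Item 9).

To recap, override Object's toString implementation in every instantiable class you write, unless a superclass has already done so. It makes classes much more pleasant to use and aids in debugging. The toString method should return a concise, useful description of the object, in an aesthetically pleasing format.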

2019/12/7

Ran a systrace capture of Candy Crush Saga.

Guide for installing systrace: https://developer.android.com/topic/performance/tracing/command-line

Requires Android Studio and Python 2.7, plus the six and pywin32 packages:
https://github.com/mhammond/pywin32/releases
https://pypi.org/project/six/#files

Open cmd, change to the directory where the six archive was extracted, and run python setup.py install

C:\Python27\python C:\Users\long8691.he\AppData\Local\Android\Sdk\platform-tools\systrace\systrace.py -o mynewtrace.html sched freq idle am wm gfx view binder_driver hal dalvik camera input res

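"Year-over-year (同比) compares a given month this year with the same month last year. The year-over-year development rate mainly removes the effect of seasonal variation; it expresses the level of the current period relative to the level of the same period last year. Period-over-period (环比) is the change between two consecutive periods. The period-over-period development rate is the ratio of the level in the reporting period to the level in the previous period, and shows how a phenomenon develops from one period to the next."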
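A small worked example of the two ratios, with invented monthly figures. The function names are illustrative; "development rate" is taken here to mean the plain ratio of the two levels, and the growth rate is that ratio minus 1.

```python
def year_over_year_rate(current, same_period_last_year):
    """Ratio of this period's level to the same period one year earlier."""
    return current / same_period_last_year

def period_over_period_rate(current, previous_period):
    """Ratio of this period's level to the immediately preceding period."""
    return current / previous_period

# Hypothetical monthly sales figures.
dec_2018, nov_2019, dec_2019 = 96.0, 100.0, 120.0

yoy = year_over_year_rate(dec_2019, dec_2018)      # December vs last December
pop = period_over_period_rate(dec_2019, nov_2019)  # December vs November
print(yoy, yoy - 1, pop, pop - 1)
```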

sdb shell dumpsys meminfo package-name

2019/12/12

Python underscore naming patterns: a summary
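The common patterns can be sketched in one place. The class and attribute names below are invented for illustration; the behaviour shown (name mangling of double leading underscores, dunder methods, the trailing-underscore and throwaway-variable conventions) is standard Python.

```python
class Account:
    def __init__(self, owner, balance):
        self.owner = owner
        self._balance = balance   # single leading underscore: "internal" by convention only
        self.__pin = "0000"       # double leading underscore: name-mangled to _Account__pin

    def __repr__(self):           # double leading and trailing: special ("dunder") method
        return f"Account(owner={self.owner!r})"

class_ = "demo"                   # trailing underscore: avoids clashing with the keyword 'class'

total = 0
for _ in range(3):                # lone underscore: a value we do not care about
    total += 1

acct = Account("demo", 100)
mangled = acct._Account__pin      # the mangled name is still reachable from outside
print(acct, acct._balance, mangled, hasattr(acct, "__pin"))
```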

2019/12/13

Lists and tuples

t = ('王大锤', 20, True)

person = list(t)

test_tuple = tuple(person)

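This raises a question well worth discussing: we already have the list data structure, so why do we also need a type like the tuple?

The elements of a tuple cannot be modified. In practice, especially in multithreaded environments (covered later), we often prefer immutable objects. On the one hand, because the state of the object cannot change, a whole class of unnecessary bugs is avoided; put simply, an immutable object is easier to maintain than a mutable one. On the other hand, because no thread can modify the internal state of an immutable object, it is automatically thread-safe, which saves the overhead of synchronization, and it can be shared freely. So the conclusion is: if elements do not need to be added, removed, or modified, consider using a tuple. A tuple is also a good choice when a method needs to return multiple values.
Tuples beat lists in both creation time and memory footprint. We can use the getsizeof function of the sys module to check how much memory a tuple and a list holding the same elements each occupy, which is easy to do. We can also use the %timeit magic command in IPython to measure how long it takes to create a tuple and a list with the same contents; the original post showed the results measured on my macOS system.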
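The checks described above can be reproduced with a short script. This is a sketch that uses the standard timeit module in place of the IPython %timeit magic; the exact sizes and timings depend on the Python version and machine, so no specific numbers are claimed.

```python
import sys
import timeit

t = ("王大锤", 20, True)
person = list(t)

# Memory: a tuple and a list holding the same three elements.
tuple_size = sys.getsizeof(t)
list_size = sys.getsizeof(person)
print(tuple_size, list_size)

# Creation time: build the same literal many times.
tuple_time = timeit.timeit("(1, 2, 3, 4, 5)", number=100_000)
list_time = timeit.timeit("[1, 2, 3, 4, 5]", number=100_000)
print(tuple_time, list_time)

# Immutability: assigning to a tuple element raises TypeError.
try:
    t[1] = 21
    modified = True
except TypeError:
    modified = False
print(modified)
```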

2019/12/16

How to change the speed of Google Voice?

To adjust the speaking rate:

Go to Settings and tap Accessibility.

Tap Spoken Content.

Use the slider for Speaking Rate to adjust the speed

2019/12/20

Transfer learning

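There is a technique called conservative training. Suppose you have a large amount of source data (in speech recognition, for example, the voices of many different speakers) and use it to train a neural network. The target data is the voice of one particular speaker; if you simply train on that data directly, the model breaks down. Instead, you can add a constraint (regularization) during training so that the new model does not differ too much from the old one. You want the output of the new model to be as close as possible to the output of the old model on the same input, or alternatively you want the L2 norm of the difference between the new and old model parameters to be as small as possible (to prevent overfitting).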
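The parameter-distance variant of this constraint can be sketched as an extra term in the loss. The names and the coefficient `lam` are illustrative assumptions, and the weights are flattened into plain lists for simplicity.

```python
def l2_distance_sq(new_weights, old_weights):
    """Squared L2 norm of the difference between new and old parameters."""
    return sum((n - o) ** 2 for n, o in zip(new_weights, old_weights))

def conservative_loss(target_loss, new_weights, old_weights, lam):
    """Loss on the target data plus a penalty for drifting from the source model."""
    return target_loss + lam * l2_distance_sq(new_weights, old_weights)

old = [1.0, 2.0]                                           # weights trained on source data
unchanged = conservative_loss(0.5, [1.0, 2.0], old, lam=0.1)
drifted = conservative_loss(0.5, [2.0, 2.0], old, lam=0.1)
print(unchanged, drifted)
```

A larger `lam` keeps the fine-tuned model closer to the source model; `lam = 0` recovers ordinary training on the target data.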

2019/12/21

Multi-objective ranking in recommender systems

What is learning to rank (LTR)? What are the three LTR approaches: pointwise, pairwise, and listwise?

LTR evaluation: MAP, F1 score, AUC, ROC, NDCG
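As one example of these metrics, here is a sketch of NDCG, assuming the linear-gain form DCG = sum rel_i / log2(i + 1) with ranks starting at 1 (some implementations use 2^rel - 1 as the gain instead). The function names are illustrative.

```python
import math

def dcg(relevances, k=None):
    """Discounted cumulative gain of a ranked list of relevance scores."""
    rels = relevances[:k] if k is not None else relevances
    # enumerate starts at 0, so rank + 2 gives log2(2) for the first position
    return sum(rel / math.log2(rank + 2) for rank, rel in enumerate(rels))

def ndcg(relevances, k=None):
    """DCG normalized by the DCG of the ideal (descending) ordering."""
    ideal = dcg(sorted(relevances, reverse=True), k)
    return dcg(relevances, k) / ideal if ideal > 0 else 0.0

perfect = ndcg([3, 2, 1, 0])     # already in the ideal order
reversed_ = ndcg([0, 1, 2, 3])   # best items ranked last
print(perfect, reversed_)
```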

Bayesian Personalized Ranking

MTL: multi-task learning

2019/12/23

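So the reason RNNs are hard to train does not come from the activation function. It comes from the fact that, over a long sequence, the same weights are used again and again at different time steps.

-- In fact, the widely used remedy is the LSTM. An LSTM makes the error surface less rugged: it removes the flat regions, which solves the vanishing gradient problem, but it does not solve the exploding gradient problem.

Why?

RNNs and LSTMs handle memory differently. In a plain RNN, the output of the neuron is written into the memory at every time step, so the value in memory is overwritten at every time step. In an LSTM, the previous memory value is multiplied by a gate value and the input is then added to it before the result is stored in the cell, so memory and input are combined additively. The difference is that if a weight can affect the value in the LSTM memory, that effect persists once it has occurred, whereas in an RNN the memory is overwritten at every time step and the effect disappears with it. In an LSTM, an effect on the memory remains unless the forget gate decides to erase the memory; otherwise a change only adds new content without wiping out the old value, so the LSTM does not suffer from vanishing gradients.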
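The contrast can be sketched with scalars. This is a deliberately simplified toy: one hidden unit, tanh activation for the RNN, and LSTM gate values treated as fixed numbers so that only the direct cell-to-cell path is considered. All names and constants are illustrative.

```python
import math

def rnn_memory_gradient(w, inputs, h0=0.0):
    """d h_T / d h_0 for h_t = tanh(w * h_{t-1} + x_t): a product of per-step factors."""
    h, grad = h0, 1.0
    for x in inputs:
        h = math.tanh(w * h + x)        # memory is overwritten at every step
        grad *= w * (1.0 - h * h)       # the same weight w appears in every factor
    return grad

def lstm_cell_gradient(forget_gates):
    """d c_T / d c_0 for c_t = f_t * c_{t-1} + i_t * g_t, with gates held fixed."""
    grad = 1.0
    for f in forget_gates:
        grad *= f                       # the additive update leaves only the forget gate
    return grad

steps = 50
rnn_grad = rnn_memory_gradient(0.5, [0.1] * steps)
lstm_open = lstm_cell_gradient([1.0] * steps)   # forget gate fully open
lstm_leaky = lstm_cell_gradient([0.9] * steps)  # forget gate partly closed
print(rnn_grad, lstm_open, lstm_leaky)
```

With the forget gate open the influence of the initial memory survives all 50 steps, while in the RNN it shrinks geometrically. The sketch does not address exploding gradients, which the notes say the LSTM does not solve.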

2019/12/26

sophisticated: complex, refined

dichotomy: a division into two opposed parts

ever-growing: continually increasing

immense: huge

make extensive use of: use widely

exhibit: to show, to display

2019/12/31

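Programmers at Work (编程大师访谈录)

Good software comes from small teams of 2 to 4 people who interact frequently; try to design the system so that it never needs 20 people.

Before writing code, design the data structures and think through the whole flow of the system.

Programming is closer to engineering than to science, but good code also has artistic beauty, and that beauty lies in the design.

Build small project teams of 4 or 5 people, one of whom has proven able to keep the whole program under control.

Code review is important.

Do the important work in the morning.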

2020/1/2

ConstraintLayout: using guidelines and barriers; use guidelines to position widgets

posted @ 2020-01-10 20:17 调皮的贝叶斯
