Algorithms: Chapter 1 Assignment

1. Look up coding conventions online, then state which convention your code will follow this semester and give a link to it.

Google's coding style seems to be popular and highly recommended:
https://google.github.io/styleguide/cppguide.html

And here's its Chinese version:
https://zh-google-styleguide.readthedocs.io/en/latest/google-cpp-styleguide/contents/

Following a widely accepted coding style is very important, and teachers should emphasize it often during students' first year of college. However, our teachers didn't stress it much when we were learning the basics of coding, and that led to many problems. When classmates asked me questions about assignments, I ran into a lot of badly organized, hard-to-read code, and it made me suffer a lot. Unfortunately, unlike Python, C++ does not enforce indentation, so code can be badly indented and hard to read.

I followed the coding style in my textbook when I was learning C++, and I don't yet know whether it differs from Google's. It'll take me some time to read through the doc provided by Google~

I'll add more here later as I read the doc.

And here's the coding style of Python~
https://www.python.org/dev/peps/pep-0008/

Reading code written in Python is relaxing, since Python has strict indentation rules~

 


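2. Read 《数学之美》 (The Beauty of Mathematics) and, drawing on other articles online, discuss the role of algorithms in software development and what software you plan to implement with algorithms this semester.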

I didn't know much about the book until I read it. Originally I thought it was full of nothing but equations, formulas and other math, so it was a surprise that the book contains a lot of content about NLP. No wonder my teacher in NLP Lab recommended it to us.

Ch 3 is about statistical language models. To judge whether a sentence is natural, a model is used to calculate the probability of each word occurring given the words before it, rather than checking whether the sentence conforms to grammar rules, which lets computers make the judgement more easily and more accurately.


Later, a simplifying assumption put forward by Markov made the computation much easier: each word is assumed to depend only on the word right before it.
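To make the idea concrete for myself, here is a minimal sketch of a bigram language model, assuming a tiny made-up corpus and add-one smoothing. The function names `train_bigram` and `sentence_log_prob`, the corpus and the smoothing choice are all my own assumptions for illustration, not something taken from the book.

```python
import math
from collections import Counter


def train_bigram(sentences):
    """Count single words and adjacent word pairs over tokenized sentences."""
    unigrams = Counter()
    bigrams = Counter()
    for tokens in sentences:
        # Pad with start/end markers so the first and last words get context.
        padded = ["<s>"] + tokens + ["</s>"]
        unigrams.update(padded)
        bigrams.update(zip(padded, padded[1:]))
    return unigrams, bigrams


def sentence_log_prob(tokens, unigrams, bigrams):
    """Sum of log P(word | previous word), with add-one smoothing (an assumption)."""
    vocab_size = len(unigrams)
    padded = ["<s>"] + tokens + ["</s>"]
    log_p = 0.0
    for prev, cur in zip(padded, padded[1:]):
        log_p += math.log((bigrams[(prev, cur)] + 1) / (unigrams[prev] + vocab_size))
    return log_p


# Toy corpus, invented for this example only.
corpus = [
    "the cat sat on the mat".split(),
    "the dog sat on the rug".split(),
    "the cat saw the dog".split(),
]
unigrams, bigrams = train_bigram(corpus)

natural = sentence_log_prob("the cat sat on the rug".split(), unigrams, bigrams)
scrambled = sentence_log_prob("rug the on sat cat the".split(), unigrams, bigrams)
print(natural > scrambled)
```

The natural word order scores higher than the scrambled one even though both use exactly the same words, and no grammar rule is ever checked.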


Ch 3 describes the importance of models in solving NLP problems. Beyond those, there are plenty of other models in machine learning, such as linear regression, logistic regression and SVM in supervised learning, and clustering in unsupervised learning. Today's ML models are largely statistical, requiring a large amount of data and efficient algorithms. Ch 5 introduces another model that is widely used in NLP: the Hidden Markov Model.

The Hidden Markov Model is said to be a fast and effective way to solve NLP problems. According to the book, many NLP problems, such as machine translation, automatic correction and speech recognition, can be treated as decoding problems in a communication system, which makes them easier to solve than focusing on grammar and syntax. In simple words, the task is to recover the original signal s1, s2, s3, ... from the observed signal o1, o2, o3, ... that it was transformed into. The Hidden Markov Model can be applied to these problems. Markov put forward a simplifying assumption: the probability distribution of each state in a random process depends only on the previous state.
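The decoding step described above is usually done with the Viterbi algorithm. Below is a minimal sketch, assuming a toy part-of-speech example where the hidden states s are tags and the observations o are words. The function name `viterbi` and every probability in the tables are made up for illustration and do not come from the book.

```python
def viterbi(observations, states, start_p, trans_p, emit_p):
    """Return the most likely hidden state sequence for the observations."""
    # Each column maps state -> (probability of the best path ending there,
    # previous state on that path).
    best = [{s: (start_p[s] * emit_p[s][observations[0]], None) for s in states}]
    for obs in observations[1:]:
        column = {}
        for s in states:
            # Markov assumption: only the previous state matters.
            column[s] = max(
                (best[-1][p][0] * trans_p[p][s] * emit_p[s][obs], p) for p in states
            )
        best.append(column)

    # Backtrack from the most probable final state.
    path = [max(states, key=lambda s: best[-1][s][0])]
    for column in reversed(best[1:]):
        path.append(column[path[-1]][1])
    path.reverse()
    return path


# Hypothetical model: all numbers below are invented for this example.
states = ("Noun", "Verb")
start_p = {"Noun": 0.6, "Verb": 0.4}
trans_p = {
    "Noun": {"Noun": 0.7, "Verb": 0.3},
    "Verb": {"Noun": 0.4, "Verb": 0.6},
}
emit_p = {
    "Noun": {"dogs": 0.5, "fish": 0.4, "swim": 0.1},
    "Verb": {"dogs": 0.1, "fish": 0.3, "swim": 0.6},
}

tags = viterbi(("dogs", "fish", "swim"), states, start_p, trans_p, emit_p)
print(tags)  # ['Noun', 'Noun', 'Verb']
```

Because each state only depends on the previous one, the algorithm keeps just one best path per state at every step instead of enumerating every possible sequence.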


Some content in Ch 3 relates to this assumption. The assumption gives us the Markov chain, and the Hidden Markov Model is an extension of the Markov chain.

Another topic I'm interested in is graphs and crawlers. I've written plenty of crawlers for fun or to scrape some useful info. But previously I mostly thought about how to organize the data structures so the data was stored well, and I seldom thought about graphs. Maybe that's because I mostly run my crawlers on a single website. The crawler the book describes is more like a search engine. A search engine is also a crawler, a very, very big crawler. Crawlers for search engines visit each hyperlinked page exactly once, using graph traversal together with a hash table. Web pages here are like nodes, and hyperlinks are like arcs in a graph.
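Here is a minimal sketch of that traversal, assuming breadth-first order and a Python set playing the role of the hash table. The function `crawl`, the callback `get_links` and the in-memory `toy_web` are hypothetical stand-ins; a real crawler would download and parse pages instead of reading a dict.

```python
from collections import deque


def crawl(start_url, get_links):
    """Breadth-first traversal that visits every reachable page exactly once."""
    visited = {start_url}  # hash set: fast "have we seen this URL?" checks
    queue = deque([start_url])
    order = []
    while queue:
        url = queue.popleft()
        order.append(url)
        for link in get_links(url):
            if link not in visited:
                visited.add(link)
                queue.append(link)
    return order


# Hypothetical in-memory "web": page -> outgoing hyperlinks.
toy_web = {
    "a.example/": ["a.example/news", "b.example/"],
    "a.example/news": ["a.example/", "c.example/"],
    "b.example/": ["c.example/", "a.example/"],
    "c.example/": [],
}

order = crawl("a.example/", lambda url: toy_web.get(url, []))
print(order)
```

Marking a URL as visited when it is queued, rather than when it is fetched, keeps the same page from being queued twice when several pages link to it.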

This term I plan to learn the Markov model as well as other popular machine learning models in depth, not only the surface but also the derivations behind them. I also want to write a rough machine translation model for multiple languages, building on my experience in NLP Lab. I want to apply graph theory to crawlers too. Text matching is also fun; maybe I will use it in a search function.

posted @ 2019-08-31 14:07  Sola~