A new field to work on

Now that I am in the US, writing this blog might also help me keep up my Chinese. That's a nice side benefit.

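The first project my lab advisor gave me turned out to be machine translation. This is a field I have always been interested in but never worked in, and one that some colleagues told me is not very interesting = =! Still, to each their own, and since the work involves translating between Chinese and English, my Chinese background can actually be of some help. A while ago I finished watching the online course videos by Dan Jurafsky and Chris Manning. I did not fully master the material and probably retained only about 50% of it, but I did get some sense of what natural language processing looks like. Some notes: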

I made some analogies between speech processing and natural language processing. To me, it seems the role of linguistics in NLP is just like that of signal processing in speech science. Linguistics provides ways to extract features, along with objective or subjective metrics for system evaluation. It's the "heuristic" or "not so automatic" part of NLP, just like signal processing in speech. Linguistics also provides ways to preprocess raw NLP data and post-processing techniques for the final outcome. All the other parts of NLP relate to machine learning.

Problems in NLP seem to have even more flexibility than those in speech processing. In speech recognition or synthesis, there is not that much variability in the output text or sounds, but an NLP output may take several forms or interpretations. Thus there might be more unsupervised or heuristic learning methods applied in NLP than in speech processing.

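Recently I have been reading Peter Brown's early machine translation papers, hoping to get some sense of this specific field. The main tasks ahead are to finish reading these two papers and to survey the conferences across the NLP field (conference quality, paper acceptance rates, annual deadlines, and so on). I also want to do a field survey of machine translation to see which hot topics people are working on, try to sort this subfield into categories, read some survey papers or journal papers in each category, and then choose a direction for myself. Another thing to research is whether there is open-source code available, baseline tools like HTK in speech or OpenCV in image processing.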

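Another important point is that I need to start getting hands-on with our current system. Choosing a direction based on both the papers I read and the system itself is probably more reliable. I will also need regular sync meetings with my advisor about these questions.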

posted on 2012-07-28 10:49  J. V. King
