QA may be an area we should pay attention to.
Below are several CMU CS research areas relevant to us:
http://www.csd.cs.cmu.edu/research/areas/ai/ (AI research group) AI: Planning, Knowledge Representation, and Game Theory
M. Bilotti, L. Zhao, E. Nyberg, and J. Callan. (To appear.) "Focused retrieval over richly-annotated collections." SIGIR 2008 Workshop on Focused Retrieval. Singapore.
http://www.cs.cmu.edu/~callan/Papers/
http://www.cs.cmu.edu/~callan This researcher works on IR and is worth following; he has also done some work on information extraction from blogs.
Pedro, Vasco, Eric Nyberg and Jaime Carbonell. 2006. "Federated Ontology Search". Proceedings of the 1st International Workshop on Semantic Information Integration on Knowledge Discovery (SIIK 2006), Yogyakarta, Indonesia.
Wang, Mengqiu, Kenji Sagae and Teruko Mitamura. 2006. "A Fast, Accurate Deterministic Parser for Chinese". Proceedings of COLING/ACL 2006. Sydney, Australia. (ZXW: worth a read?)
| JAVELIN Open-Domain Question Answering |
Typical IR systems return a set of documents, or perhaps a set of queries. LTI Question Answering software extracts information from documents in large, open-domain corpora to answer questions in subject areas that are not known in advance.
Contact: Eric Nyberg and Teruko Mitamura
| Utility-based Information Distillation |
We study supervised, unsupervised and semi-supervised learning techniques for automatically detecting novel events and tracking new trends in relevant events from temporally-ordered documents, for dynamically updating user profiles in context, and for optimizing the utility of passage selection and summarization based on relevance, novelty, readability and user cost (e.g., time). Collaborative and adaptive information filtering among multiple users is also part of the open challenge.
Contacts: Yiming Yang and Jaime Carbonell
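The relevance/novelty trade-off described above is the kind of objective formalized by Carbonell's Maximal Marginal Relevance (MMR) criterion. A minimal sketch, assuming toy relevance scores and a pairwise similarity function (both invented here for illustration, not the project's actual model):

```python
def mmr_select(candidates, relevance, similarity, lam=0.7, k=2):
    """Greedily pick k passages, trading off relevance against redundancy:
    score(c) = lam * relevance(c) - (1 - lam) * max similarity to already-selected."""
    selected = []
    pool = list(candidates)
    while pool and len(selected) < k:
        def score(c):
            redundancy = max((similarity(c, s) for s in selected), default=0.0)
            return lam * relevance[c] - (1 - lam) * redundancy
        best = max(pool, key=score)
        selected.append(best)
        pool.remove(best)
    return selected
```

With `lam` near 1 the selector behaves like plain relevance ranking; lowering `lam` penalizes near-duplicate passages, which is the "novelty" part of the utility described above.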
Knowledge Acquisition Projects
| Dark Matter Knowledge Acquisition from Text |
LTI is participating in Project Halo, a research effort to design and implement a "Digital Aristotle". Our focus is on the definition of KAL (Knowledge Acquisition Language), a form of controlled language that can be used to acquire domain knowledge from subject matter experts in domains such as Chemistry, Physics and Biology.
Contacts: Eric Nyberg and Teruko Mitamura
| IAMTC Interlingual Annotation of Multilingual Text Corpora |
IAMTC is a multi-site NSF ITR project focusing on the annotation of six sizable bilingual parallel corpora for interlingual content, with the goal of providing a significant data set for improving knowledge-based approaches to machine translation (MT) and a range of other Natural Language Processing (NLP) applications. The central goals of the project are: (1) to produce a practical, commonly-shared system for representing the information conveyed by a text, or interlingua (IL), (2) to develop a methodology for accurately and consistently assigning such representations to texts across languages and across annotators, and (3) to annotate a sizable multilingual parallel corpus of source-language texts and translations for IL content.
Contacts: Lori Levin and Teruko Mitamura
| Scone Symbolic Knowledge Base |
Scone is a symbolic knowledge representation system designed to run well on a standard workstation. Scone's primary design goals are ability to represent "common sense" knowledge, efficiency in performing inference and search, scalability to several million assertions, and ease of use.
Also see: Tutalk and A Shared Resource for Robust Semantic Interpretation for Both Linguists and Non-Linguists.
| A Shared Resource for Robust Semantic Interpretation for Both Linguists and Non-Linguists |
The majority of existing authoring tools for constructing advanced conversational interfaces were designed for use by computational linguists. Our research goal is to explore strategies for supporting the development of language understanding interfaces by non-linguists. In our previous work we developed Carmel-Tools, a behavior-oriented authoring environment for building semantic knowledge sources for the CARMEL core understanding engine. In our recent work, we have begun conducting user studies that aim to better understand how people process large amounts of corpus data when faced with a task comparable to programming a dialogue agent using a data-driven methodology. Our preliminary user study results hint that participants (1) introduce a bias when processing data sequentially (i.e., primacy effects) and (2) naturally represent semantic relatedness using spatial proximity. Based on these observations, we have developed the InfoMagnets interface, which provides a physical metaphor for exploratory data analysis that is consistent with user conceptions of semantic relatedness and helps users avoid primacy bias by giving them a bird's-eye view of their whole inventory of dialogue topics simultaneously.
| WebKB The World Wide Knowledge Base Project |
The World Wide Web is a vast source of information accessible to computers, but understandable only to humans. The goal of this research project is to automatically create a computer-understandable knowledge base whose content mirrors that of the World Wide Web. If successful, this would lead to much more effective retrieval of information from the web, and to the use of this information to support new knowledge-based problem solvers. Our approach is to use machine learning algorithms to train the system to extract information of the desired types. Our web page describes the overall approach, plus several new algorithms we have developed that successfully extract information from the web.
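The "train the system to extract information" idea can be illustrated with the simplest text classifier of that era, multinomial Naive Bayes over page words. The class labels and training pages below are invented for illustration and are not WebKB's actual data or algorithm:

```python
import math
from collections import Counter

class NaiveBayesPageClassifier:
    """Toy multinomial Naive Bayes: label a page (bag of words) with the
    class whose word distribution makes the page most probable."""

    def fit(self, pages, labels):
        self.classes = set(labels)
        self.class_counts = Counter(labels)
        self.word_counts = {c: Counter() for c in self.classes}
        for words, c in zip(pages, labels):
            self.word_counts[c].update(words)
        self.vocab = {w for c in self.classes for w in self.word_counts[c]}
        return self

    def predict(self, words):
        def log_prob(c):
            # Log prior plus add-one-smoothed log likelihood of each word.
            total = sum(self.word_counts[c].values()) + len(self.vocab)
            lp = math.log(self.class_counts[c] / sum(self.class_counts.values()))
            for w in words:
                lp += math.log((self.word_counts[c][w] + 1) / total)
            return lp
        return max(self.classes, key=log_prob)
```

A real WebKB-style system would combine such page classifiers with link structure and relation extractors, but the supervised-learning core is the same shape.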
Research / AREAS
Computer Science Department (See Faculty Research Guide)
Search, planning, and knowledge representation
Another AI focus at Carnegie Mellon CSD is search, planning, and knowledge representation.
This is often intertwined with multiagent systems.
Search algorithms for market clearing
http://www.informatik.uni-trier.de/~ley/db/indices/a-tree/c/Carbonell:Jaime_G=.html
Sandholm pioneered the idea of mediated marketplaces that allow participants to use highly expressive preference-specification languages in order to reach better outcomes. He developed the world's fastest optimal algorithms for clearing such markets. On highly expressive real-world procurement auctions, these algorithms have optimally solved problems with over 2.6 million bids and over 160,000 items to be procured, as well as instances with over 320,000 side constraints. The techniques combine tree search from AI, mixed integer programming from operations research, and dozens of techniques he developed. Since 2002, he has used these techniques to clear over $20 billion of the most combinatorial procurement auctions ever conducted, resulting in value creation (through increased economic efficiency) of over $2 billion. He has applied many of the techniques to other search problems as well.
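The underlying optimization is the winner determination problem: choose a set of non-overlapping bids maximizing total price. A brute-force sketch, only to show the problem structure (real clearing engines use branch-and-bound tree search plus mixed integer programming, not enumeration):

```python
from itertools import combinations

def winner_determination(bids):
    """Exhaustively find the set of item-disjoint bids with maximum total price.
    bids: list of (item_set, price) pairs. Exponential in the number of bids."""
    best_value, best_combo = 0.0, ()
    for r in range(len(bids) + 1):
        for combo in combinations(bids, r):
            items = [i for item_set, _ in combo for i in item_set]
            if len(items) != len(set(items)):  # two accepted bids share an item
                continue
            value = sum(price for _, price in combo)
            if value > best_value:
                best_value, best_combo = value, combo
    return best_value, best_combo
```

Even this toy makes the combinatorial blow-up obvious: accepting one bundle bid changes which other bids remain feasible, which is why scalable solvers need the search and pruning machinery described above.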
Search for homeland security
Carbonell and Fink have studied techniques for fast identification of matches in multi-attribute exchange markets, which allow fast-paced trading of complex non-standardized goods. They have also applied these matching techniques to a homeland-security project, focused on identifying suspicious and unexpected patterns in massive structured databases. For example, the developed techniques may allow the detection of money-laundering patterns in banking transactions.
Distributed planning
Guestrin is working on efficient distributed multiagent coordination, planning and learning. Using the factored Markov decision processes representation, which exploits problem-specific structure using Bayesian networks, he designed efficient approximate planning algorithms, leveraged by a novel linear programming decomposition technique. The decomposition technique yields efficient distributed algorithms for planning and learning in collaborative multiagent settings, where multiple decision makers must coordinate their actions to maximize a common goal. Guestrin also works on wireless sensor networks using efficient inference methods from probabilistic graphical models.
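The coordination problem above can be stated concretely: agents must jointly maximize a team payoff that factors into local terms, each touching only a few agents. The sketch below solves a tiny instance by brute force over joint actions; Guestrin's coordination-graph approach reaches the same answer without enumerating the joint action space, via variable elimination (the payoff functions here are invented for illustration):

```python
from itertools import product

def best_joint_action(action_sets, local_payoffs):
    """Maximize a factored team payoff sum_j f_j(actions in scope_j).
    action_sets: per-agent action tuples.
    local_payoffs: list of (scope, fn), scope = tuple of agent indices."""
    best_value, best_joint = float("-inf"), None
    for joint in product(*action_sets):
        total = sum(fn(*(joint[i] for i in scope)) for scope, fn in local_payoffs)
        if total > best_value:
            best_value, best_joint = total, joint
    return best_joint, best_value
```

The factored structure (each `fn` depending on only a subset of agents) is exactly what the Bayesian-network representation and the LP decomposition exploit to avoid this exponential enumeration.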
Probabilistic replanning
Veloso et al. introduced extended rapidly-exploring random trees (E-RRT) as a novel reuse strategy that addresses the general replan/reuse question, in which a past plan probabilistically guides a new search. The replan algorithm considers an initial state, a path, and a goal to be achieved; from the initial state, it grows a search tree by extending towards the goal with some probability p, towards a point in the past path with probability r, and towards a random exploration target with the remaining probability 1 − p − r. The past (or failed) plan is effectively used as a bias in the new search, thereby solving the general reuse problem in a probabilistic manner.
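The biased target sampling at the heart of this idea fits in a few lines. A minimal sketch for a 2-D workspace, assuming illustrative probability values and a point-tuple representation (the full E-RRT planner also handles tree extension, collision checking, etc.):

```python
import random

def choose_extension_target(goal, old_path, bounds, p_goal=0.1, p_path=0.6,
                            rng=random):
    """Pick the target an RRT extension step grows toward: the goal with
    probability p_goal, a point on the previous (possibly failed) path with
    probability p_path, and a uniformly random point otherwise."""
    r = rng.random()
    if r < p_goal:
        return goal
    if r < p_goal + p_path and old_path:
        return rng.choice(old_path)
    (lo_x, hi_x), (lo_y, hi_y) = bounds
    return (rng.uniform(lo_x, hi_x), rng.uniform(lo_y, hi_y))
```

Setting `p_path` to zero recovers an ordinary goal-biased RRT; raising it makes the new search hug the old plan, which is exactly the probabilistic reuse bias described above.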
Learning domain-specific planners
Instead of hand-writing domain-specific planners to solve large-scale planning problems, Veloso uses example plans to demonstrate how to solve problems in a particular domain, and uses that information to automatically learn domain-specific planners that model the observed behavior. Her group developed the ITERANT algorithm for identifying repeated structures in observed plans and showed how to convert looping plans into domain-specific planners, or dsPlanners. Looping dsPlanners are able to apply experience acquired from the solutions to small problems to solve arbitrarily large ones. The automatically learned dsPlanners can solve large-scale problems much more effectively than state-of-the-art general-purpose planners, and can solve problems many orders of magnitude larger than general-purpose planners can handle.
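A stripped-down version of the "find repeated structure, roll it into a loop" idea: detect when an observed plan is k repetitions of a block and replace it with (block, k). This is only a sketch of the simplest case; the actual ITERANT algorithm handles far more general repeated structure:

```python
def roll_into_loop(plan):
    """Collapse a plan that is k repetitions of one action block into
    (block, k); return (plan, 1) if no such repetition exists."""
    n = len(plan)
    for size in range(1, n + 1):
        if n % size == 0 and plan == plan[:size] * (n // size):
            return plan[:size], n // size
    return plan, 1
```

Once a loop body is identified, the same body generalizes: a dsPlanner can execute it k times for any k, which is how experience from small problems transfers to arbitrarily large ones.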
Knowledge representation
Fahlman is working on Scone, a knowledge representation system. In addition to representing all kinds of knowledge (including "common sense" knowledge), Scone is designed to support efficient inference and search. Compared to some other knowledge-representation efforts, Scone's emphasis is on efficiency, scalability (up to a few million entities and statements about them), and ease of use. Members of the Scone research group are working on a natural-language front-end that will make it possible to add knowledge to Scone and to ask questions using simple English. Scone is intended to be a software component, useful in a large number of other software systems, much as databases are used today. As a longer-term goal, the Scone group is working to develop a flexible declarative representation for episodic memory, i.e., a hierarchical representation of action-types and event-types, along with the entities involved in and affected by each event.
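The flavor of inference such a system supports can be sketched with a toy semantic network: is-a links plus inheritance queries. This is only an illustration of the idea, invented here; Scone itself is far richer (exceptions, roles, contexts, and marker-passing-style search):

```python
class ToyKB:
    """Minimal is-a hierarchy with transitive ancestor queries."""

    def __init__(self):
        self.parents = {}  # node -> set of direct parents

    def add_is_a(self, child, parent):
        self.parents.setdefault(child, set()).add(parent)

    def is_a(self, node, ancestor):
        """True if ancestor is reachable from node via is-a links."""
        if node == ancestor:
            return True
        return any(self.is_a(p, ancestor) for p in self.parents.get(node, ()))
```

Scaling queries like this to millions of entities while keeping them fast is precisely the efficiency/scalability emphasis described above.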
- Computational Molecular Biology
- Computational Neuroscience
- Computer Architecture
- Databases
- Formal Methods
- Graphics
- Human-Computer Interaction
- Large-Scale Distributed Systems
- Machine Learning
- Mobile and Pervasive Computing
- Networking
- Principles of Programming
- Robotics
- Security
- Scientific Computing
- Software Engineering
- Technology and Society
- Vision, Speech, and Natural Languages
Robotics Institute (See Faculty Research Guide)
- Artificial Intelligence
- Graphics & Visualization
- Human-Computer Interaction
- Manipulation
- Manufacturing & Inspection
- Medical Applications
- Mobile Robotics
- Other Areas
- Space Robotics
- Systems & Control
- Vision, Perception & Sensors
Institute for Software Research, International
Human-Computer Interaction Institute (See All HCII Projects )
- Cognitive Modeling
- PACT Center
- Project LISTEN
- Computing Workshop
- Designing Interfaces to Support Human Attention
- Interaction Design
- Interactive Systems Laboratories (ISL)
- Societal Impacts of Computing
- Speech
- Usability Analysis
- User Interface Design
Language Technologies Institute (See All LTI Projects)
- Computer Aided Language Learning
- Computational Linguistics
- Information Retrieval
- Machine Translation
- Speech
Machine Learning Department (See Research Guide)