poj 1795 DNA Laboratory

DNA Laboratory

Time Limit: 5000MS		Memory Limit: 30000K
Total Submissions: 2892		Accepted: 516

Description

Background
Having started to build his own DNA lab just recently, the evil doctor Frankenstein is not quite up to date yet. He wants to extract his DNA, enhance it somewhat and clone himself. He has already figured out how to extract DNA from some of his blood cells, but unfortunately reading off the DNA sequence means breaking the DNA into a number of short pieces and analyzing those first. Frankenstein has not quite understood how to put the pieces together to recover the original sequence.
His pragmatic approach to the problem is to sneak into university and to kidnap a number of smart looking students. Not surprisingly, you are one of them, so you had better come up with a solution pretty fast.
Problem
You are given a list of strings over the alphabet A (for adenine), C (cytosine), G (guanine), and T (thymine), and your task is to find the shortest string (which is typically not listed) that contains all given strings as substrings.
If there are several such strings of shortest length, find the smallest in alphabetical/lexicographical order.

Input

The first line contains the number of scenarios.
For each scenario, the first line contains the number n of strings with 1 <= n <= 15. Then these strings with 1 <= length <= 100 follow, one on each line, and they consist of the letters "A", "C", "G", and "T" only.

Output

The output for every scenario begins with a line containing "Scenario #i:", where i is the number of the scenario starting at 1. Then print a single line containing the shortest (and smallest) string as described above. Terminate the output for the scenario with a blank line.

Sample Input

1
2
TGCACA
CAT

Sample Output

Scenario #1:
TGCACAT

Source

TUD Programming Contest 2004, Darmstadt, Germany

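Problem summary: given n strings over the letters A, G, C, T, find the shortest string that contains all n given strings as substrings. If several strings of that length exist, output the lexicographically smallest one.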

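Approach: some of the n strings may be contained in others, and those contained strings can simply be ignored. Next, precompute how much the total length grows when one string is placed in front of another, and store it in dist[i][j] (the increase in total length when string i is added in front of string j, using the longest possible overlap). The problem can then be solved with a bitmask DP. dp[state][i] is the length of the shortest string that is built from the strings in state and starts with string i.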
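
The two preprocessing steps can be sketched as follows. This is a minimal illustration, not the code from the listing: the helper names removeContained and prependCost are made up for this example, and of several identical input strings the sketch assumes that only the first copy should be kept.

```cpp
#include <algorithm>
#include <cassert>
#include <string>
#include <vector>

// Drop every string that occurs inside another one. Of several identical
// strings only the first copy survives. (Hypothetical helper name.)
std::vector<std::string> removeContained(const std::vector<std::string>& in) {
    std::vector<std::string> out;
    for (std::size_t i = 0; i < in.size(); ++i) {
        bool keep = true;
        for (std::size_t j = 0; j < in.size() && keep; ++j) {
            if (i == j) continue;
            if (in[i] == in[j]) {
                if (j < i) keep = false;  // duplicate: the earlier copy stays
            } else if (in[j].find(in[i]) != std::string::npos) {
                keep = false;             // in[i] is a proper substring of in[j]
            }
        }
        if (keep) out.push_back(in[i]);
    }
    return out;
}

// Number of characters of `a` that stay in front when `a` is placed before
// `b` with the longest suffix/prefix overlap. This plays the role of
// dist[a][b] in the text. (Hypothetical helper name.)
int prependCost(const std::string& a, const std::string& b) {
    assert(!a.empty() && !b.empty());
    std::size_t k = std::min(a.size(), b.size());
    for (; k > 0; --k) {
        // Compare the last k characters of a with the first k characters of b.
        if (a.compare(a.size() - k, k, b, 0, k) == 0) break;
    }
    return static_cast<int>(a.size() - k);
}
```

For the sample input, prependCost("TGCACA", "CAT") is 4 because the overlap "CA" has length 2, while prependCost("CAT", "TGCACA") is 2 because only the single "T" overlaps.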

State transition, for a string j that is not yet in state: dp[state|1<<j][j]=min(dp[state|1<<j][j],dp[state][i]+dist[j][i]);
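
As a small self-contained sketch of this transition, the function below computes only the minimum length. The function name shortestSuperstringLength is an assumption made for this example, and the input is assumed to be already filtered so that no string contains another.

```cpp
#include <algorithm>
#include <cassert>
#include <string>
#include <vector>

// Minimum superstring length via bitmask DP. Assumes no string in v is
// contained in another one. (Hypothetical function name.)
int shortestSuperstringLength(const std::vector<std::string>& v) {
    const int n = static_cast<int>(v.size());
    assert(n >= 1 && n <= 15);
    const int INF = 0x3f3f3f3f;

    // dist[i][j]: characters added when string i is put in front of string j.
    std::vector<std::vector<int>> dist(n, std::vector<int>(n, 0));
    for (int i = 0; i < n; ++i) {
        for (int j = 0; j < n; ++j) {
            if (i == j) continue;
            std::size_t k = std::min(v[i].size(), v[j].size());
            while (k > 0 && v[i].compare(v[i].size() - k, k, v[j], 0, k) != 0) --k;
            dist[i][j] = static_cast<int>(v[i].size() - k);
        }
    }

    // dp[state][i]: shortest length using the strings in state, starting with i.
    std::vector<std::vector<int>> dp(1 << n, std::vector<int>(n, INF));
    for (int i = 0; i < n; ++i) dp[1 << i][i] = static_cast<int>(v[i].size());

    for (int state = 1; state < (1 << n); ++state) {
        for (int i = 0; i < n; ++i) {
            if (dp[state][i] == INF) continue;
            for (int j = 0; j < n; ++j) {
                if (state >> j & 1) continue;  // j is already used
                int next = state | (1 << j);
                dp[next][j] = std::min(dp[next][j], dp[state][i] + dist[j][i]);
            }
        }
    }
    const std::vector<int>& last = dp[(1 << n) - 1];
    return *std::min_element(last.begin(), last.end());
}
```

On the sample strings TGCACA and CAT this yields 7, which matches the length of the expected answer TGCACAT.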

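Using the DP, we first obtain the minimum length of the superstring and also learn which string it must start with. Then, going from head to tail, we recursively determine which strings make up the lexicographically smallest superstring.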
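
For cross-checking results on small inputs, one possible variation (not the method used in the listing) is to store the best string itself in every DP cell instead of only its length, so that no separate reconstruction pass is needed. This costs far more memory, so the sketch below assumes at most 10 strings, none of which contains another. All names in it are made up for this example.

```cpp
#include <algorithm>
#include <cassert>
#include <string>
#include <vector>

// Longest k such that the last k characters of a equal the first k of b.
static std::size_t overlap(const std::string& a, const std::string& b) {
    std::size_t k = std::min(a.size(), b.size());
    while (k > 0 && a.compare(a.size() - k, k, b, 0, k) != 0) --k;
    return k;
}

// True if x beats y: shorter, or equally long and lexicographically smaller.
// An empty y means "no candidate stored yet".
static bool better(const std::string& x, const std::string& y) {
    if (y.empty()) return true;
    if (x.size() != y.size()) return x.size() < y.size();
    return x < y;
}

// Memory-hungry variant for small inputs only (assumption: n <= 10 and no
// string is contained in another). best[state][i] holds the shortest, then
// smallest, string built from the strings in state that starts with string i.
std::string smallestShortestSuperstring(const std::vector<std::string>& v) {
    const int n = static_cast<int>(v.size());
    assert(n >= 1 && n <= 10);
    std::vector<std::vector<std::string>> best(1 << n, std::vector<std::string>(n));
    for (int i = 0; i < n; ++i) best[1 << i][i] = v[i];

    for (int state = 1; state < (1 << n); ++state) {
        for (int i = 0; i < n; ++i) {
            if (best[state][i].empty()) continue;  // i is not the head of state
            for (int j = 0; j < n; ++j) {
                if (state >> j & 1) continue;      // j is already used
                // Put v[j] in front, keeping only its non-overlapping prefix.
                std::string cand =
                    v[j].substr(0, v[j].size() - overlap(v[j], v[i])) + best[state][i];
                std::string& slot = best[state | (1 << j)][j];
                if (better(cand, slot)) slot = cand;
            }
        }
    }

    std::string ans;
    for (int i = 0; i < n; ++i) {
        const std::string& cand = best[(1 << n) - 1][i];
        if (better(cand, ans)) ans = cand;
    }
    return ans;
}
```

For the sample strings TGCACA and CAT the two full candidates are TGCACAT (length 7) and CATGCACA (length 8), so the function returns TGCACAT.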

Accepted (AC) code:

#define _CRT_SECURE_NO_DEPRECATE
#include <iostream>
#include <cstdio>   // scanf, printf
#include <string>
#include <vector>
#include<algorithm>
#include<cstring>
#include<bitset>
#include<set>
#include<map>
#include<cmath>
using namespace std;
#define N_MAX 16
#define MOD 100000000
#define INF 0x3f3f3f3f
typedef long long ll;
string s[N_MAX];
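int dp[1<<N_MAX][N_MAX];// dp[i][j]: minimum total length when the set of used strings is i and the result starts with string j
int dist[N_MAX][N_MAX];// dist[i][j]: length added to the whole string when string i is placed in front of string j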
vector<string>vec;
int t,n;

void init() {
    memset(dp, INF, sizeof(dp));
    memset(dist, 0, sizeof(dist));
    for (int i = 0; i < n; i++) {
        for (int j = 0; j < n;j++) {
            if (i == j)continue;
            int sz = min(vec[i].size(), vec[j].size());
            for (int k = sz; k >= 0;k--) {
                if (vec[i].substr(vec[i].size() - k) == vec[j].substr(0, k)) {// the overlapping suffix/prefix part is not counted
                    dist[i][j] = vec[i].size() - k;
                    break;
                }
            }
        }
    }
}
string res = "";
void dfs(int head,int state) {// state: the set of strings that have not been used yet
    if (state == 0)return;
    string min_s = "Z";int  min_head;
    for (int i = 0; i < n;i++) {
        if ((state >> i & 1)&&dp[state|1<<head][head]==dp[state][i]+dist[head][i]) {
            int Len = vec[head].size() - dist[head][i];
            string s = vec[i].substr(Len);
            if (min_s > s) { min_s = s; min_head = i; }
        }
    }
    res += min_s;
    dfs(min_head, state ^ (1 << min_head));
}

int main() {
    int t; scanf("%d",&t);
    for (int cs = 1; cs <= t;cs++) {
        scanf("%d",&n);
        printf("Scenario #%d:\n",cs);
        for (int i = 0; i < n; i++) cin >> s[i];
        vec.clear();
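        for (int i=0; i < n;i++) {// drop strings that are contained in another string
            bool flag = 1;
            for (int j = 0; j < n;j++) {
                if (i == j || s[i].size() > s[j].size())continue;
                if (s[i] == s[j]) {// identical strings: keep only the first copy
                    if (j < i) { flag = 0; break; }
                    continue;
                }
                if (s[j].find(s[i]) != string::npos) {// s[i] is contained in s[j]
                    flag = 0; break;
                }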
            }
            if (flag)vec.push_back(s[i]);
        }
        if (vec.size() == 0) { cout << s[0] << endl << endl; continue; }
        sort(vec.begin(), vec.end());
        n = vec.size();
        init();
        int allstates = 1 << n;
        for (int i = 0; i < n;i++) {
            dp[1 << i][i] = vec[i].size();
        }

        for (int state = 0; state < allstates; state++) {
            for (int i = 0; i < n;i++) {
                if (dp[state][i] == INF)continue;
                for (int j = 0; j < n; j++) {
                    if (!(state >> j & 1)) {
                        dp[state | 1 << j][j] = min(dp[state | 1 << j][j], dp[state][i] + dist[j][i]);
                    }
                }
            }
        }
        int head=0;
        for(int i=1;i<n;i++)
            if (dp[allstates - 1][i] < dp[allstates-1][head]) {
                head = i;
            }
        res = vec[head];
        dfs(head, (allstates -1)^ (1 << head));//!!!!
        cout << res << endl<<endl;
    }
    return 0;
}

Posted on 2018-04-08 22:06 by ZefengYao