HDU DNA repair (Aho-Corasick automaton + DP)

DNA repair

Time Limit : 5000/2000ms (Java/Other)   Memory Limit : 32768/32768K (Java/Other)

Problem Description

Biologists finally invent techniques of repairing DNA that contains segments causing kinds of inherited diseases. For the sake of simplicity, a DNA is represented as a string containing characters 'A', 'G' , 'C' and 'T'. The repairing techniques are simply to change some characters to eliminate all segments causing diseases. For example, we can repair a DNA "AAGCAG" to "AGGCAC" to eliminate the initial causing disease segments "AAG", "AGC" and "CAG" by changing two characters. Note that the repaired DNA can still contain only characters 'A', 'G', 'C' and 'T'.

You are to help the biologists repair a DNA by changing the least number of characters.

Input

The input consists of multiple test cases. Each test case starts with a line containing one integer N (1 ≤ N ≤ 50), which is the number of DNA segments causing inherited diseases.
The following N lines give N non-empty strings of length not greater than 20 containing only characters in "AGCT", which are the DNA segments causing inherited diseases.
The last line of the test case is a non-empty string of length not greater than 1000 containing only characters in "AGCT", which is the DNA to be repaired.

The last test case is followed by a line containing a single zero.

Output

For each test case, print a line containing the test case number (beginning with 1) followed by the number of characters which need to be changed. If it's impossible to repair the given DNA, print -1.

Sample Input

2
AAA
AAG
AAAG
2
A
TG
TGAATG
4
A
G
C
T
AGT
0

Sample Output

Case 1: 1
Case 2: 4
Case 3: -1

Source

2008 Asia Hefei Regional Contest Online by USTC
 
 
 
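Problem: given a DNA string and a set of virus DNA sequences, find the minimum number of characters of the DNA string that must be changed so that the string no longer contains any virus sequence.
Idea: if the DNA string contains a virus sequence, then feeding it through the AC automaton reaches a matching state; otherwise it never does. To avoid a match, we change characters of the DNA while walking the automaton so that no matching state is ever entered. The states we enumerate are therefore exactly the automaton states that do not match a virus sequence (the AC automaton can be seen as a way of compressing the state space).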
 

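Problem: we are first given m DNA fragments (disease-causing genes), then a DNA sequence of length n. Find the minimum number of modifications needed so that the sequence contains no disease-causing gene. A modification changes one base of the DNA into another, e.g. A into G.

Constraints: 1<=m<=50, 1<=n<=1000

Analysis: build the automaton first, then run a DP over it.

State: dp[i][j] is the minimum number of modifications needed to be in automaton state j after walking i steps from the root.

Transitions (s[i] is the i-th character of the DNA, counted from 1; j must not be a state that matches a disease fragment):

1. dp[i][j]=MIN(dp[i-1][k]) over states k that move to state j on s[i], so no modification is needed;

2. dp[i][j]=MIN(dp[i-1][k])+1 over states k that move to state j only on a character other than s[i], so s[i] must be modified. (Take care to distinguish DP states from automaton states.)

Initialization: dp[0][0]=0, and every other dp[0][i]=INF. The answer is the minimum of dp[n][j] over all states j, or -1 if that minimum is still INF.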
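The recurrence above can be sketched compactly with an array-based automaton and a rolling DP row. This is only an illustrative variant under my own assumptions, not the listing from this post: the names `baseId` and `minRepairs` are hypothetical, and the automaton is stored in vectors instead of a pointer-based trie.

```cpp
#include <algorithm>
#include <array>
#include <cassert>
#include <queue>
#include <string>
#include <vector>

// Hypothetical helper: maps a base to an index 0..3.
static int baseId(char c) {
    switch (c) {
        case 'A': return 0;
        case 'C': return 1;
        case 'G': return 2;
        case 'T': return 3;
    }
    assert(false && "input must contain only A, C, G, T");
    return -1;
}

// Returns the minimum number of substitutions that remove every pattern
// from dna, or -1 if no such repair exists.
int minRepairs(const std::vector<std::string>& patterns, const std::string& dna) {
    const int INF = 0x3f3f3f3f;
    std::array<int, 4> empty;
    empty.fill(-1);
    std::vector<std::array<int, 4>> go(1, empty);  // transition table, state 0 = root
    std::vector<int> fail(1, 0);                   // failure links
    std::vector<char> bad(1, 0);                   // 1 if the state matches a pattern

    // Insert every pattern into the trie.
    for (const std::string& p : patterns) {
        int u = 0;
        for (char c : p) {
            int x = baseId(c);
            if (go[u][x] == -1) {
                go[u][x] = static_cast<int>(go.size());
                go.push_back(empty);
                fail.push_back(0);
                bad.push_back(0);
            }
            u = go[u][x];
        }
        bad[u] = 1;
    }

    // BFS: compute failure links and complete the transition table.
    std::queue<int> q;
    for (int x = 0; x < 4; ++x) {
        if (go[0][x] == -1) go[0][x] = 0;
        else q.push(go[0][x]);
    }
    while (!q.empty()) {
        int u = q.front();
        q.pop();
        bad[u] |= bad[fail[u]];  // a suffix of this state is a pattern
        for (int x = 0; x < 4; ++x) {
            int v = go[u][x];
            if (v == -1) {
                go[u][x] = go[fail[u]][x];
            } else {
                fail[v] = go[fail[u]][x];
                q.push(v);
            }
        }
    }

    // DP over (position, state); only the previous row is kept.
    const int S = static_cast<int>(go.size());
    std::vector<int> cur(S, INF), nxt(S, INF);
    cur[0] = 0;
    for (char c : dna) {
        int want = baseId(c);
        std::fill(nxt.begin(), nxt.end(), INF);
        for (int k = 0; k < S; ++k) {
            if (cur[k] == INF) continue;
            for (int x = 0; x < 4; ++x) {
                int j = go[k][x];
                if (bad[j]) continue;  // never enter a matching state
                nxt[j] = std::min(nxt[j], cur[k] + (x != want ? 1 : 0));
            }
        }
        cur.swap(nxt);
    }
    int best = *std::min_element(cur.begin(), cur.end());
    return best == INF ? -1 : best;
}
```

Calling `minRepairs({"AAA", "AAG"}, "AAAG")` corresponds to the first sample case. Keeping only two rows reduces memory from O(n * states) to O(states), at the cost of not being able to reconstruct the repaired string.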

 

#include<iostream>
#include<cstdio>
#include<cstring>
#include<algorithm>   // std::min

using namespace std;

const int N=1010;              // at most 50*20+1 = 1001 trie nodes; DNA length <= 1000
const int INF=0x3f3f3f3f;

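struct Trie{
    int count;          // nonzero if this state ends, or has a suffix that ends, a disease segment
    Trie *fail;         // failure link
    Trie *next[4];      // transitions on A, C, T, G (see find())
    void init(){
        count=0;
        fail=NULL;
        memset(next,0,sizeof(next));
    }
}*root,*q[N],a[N];      // a[] is the node pool, q[] the BFS queue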

int k,dp[N][N];
char wrd[30];
char str[1010];

int find(char ch){
    switch(ch){
        case 'A':return 0;
        case 'C':return 1;
        case 'T':return 2;
        case 'G':return 3;
    }
    return 0;
}

void Insert(char *str){
    Trie *loc=root;
    int i=0;
    while(str[i]!='\0'){
        int id=find(str[i]);
        if(loc->next[id]==NULL){
            a[k].init();
            loc->next[id]=&a[k++];
        }
        loc=loc->next[id];
        i++;
    }
    loc->count=1;
}

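// Build failure links by BFS and fill in the missing transitions, so that
// next[] becomes a complete transition table. A state inherits count from
// its failure link, which marks states whose suffix is a disease segment.
void AC_automation(){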
    int head=0,tail=0;
    q[tail++]=root;
    Trie *cur,*tmp;
    while(head!=tail){
        cur=q[head++];
        tmp=NULL;
        for(int i=0;i<4;i++){
            if(cur->next[i]==NULL){
                if(cur==root)
                    cur->next[i]=root;
                else
                    cur->next[i]=cur->fail->next[i];
            }else{
                if(cur==root)
                    cur->next[i]->fail=root;
                else{
                    tmp=cur->fail;
                    while(tmp!=NULL){
                        if(tmp->next[i]!=NULL){
                            cur->next[i]->fail=tmp->next[i];
                            cur->next[i]->count |= tmp->next[i]->count;
                            break;
                        }
                        tmp=tmp->fail;
                    }
                    if(tmp==NULL)
                        cur->next[i]->fail=root;
                }
                q[tail++]=cur->next[i];
            }
        }
    }
}

int main(){

    //freopen("input.txt","r",stdin);

    int n,cases=0;
    while(~scanf("%d",&n) && n){
        k=0;
        root=&a[k++];
        root->init();
        for(int i=0;i<n;i++){
            scanf("%s",wrd);
            Insert(wrd);
        }
        AC_automation();
        scanf("%s",str);
        int len=strlen(str);
        memset(dp,0x3f,sizeof(dp));
        dp[0][0]=0;
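        // dp[i][s]: fewest changes so that the first i characters lead to state s
        for(int i=1;i<=len;i++)
            for(int j=0;j<k;j++){
                for(int idx=0;idx<4;idx++){
                    Trie *ptr=a[j].next[idx];
                    if(ptr->count)      // never enter a state that matches a disease segment
                        continue;
                    int tmp=ptr-root;   // index of the target state in the pool a[]
                    // cost 1 if the chosen character differs from the original one
                    dp[i][tmp]=min(dp[i][tmp],dp[i-1][j]+(idx!=find(str[i-1])));
                }
            }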
        int ans=INF;
        for(int i=0;i<k;i++)
            ans=min(ans,dp[len][i]);
        printf("Case %d: %d\n",++cases,ans==INF?-1:ans);
    }
    return 0;
}
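
One way to sanity-check the DP on tiny inputs is an exhaustive search over every candidate string of the same length. The sketch below is my own addition, not part of the original solution; `bruteForceRepairs` is a hypothetical name, and the running time is 4^n, so it is only meant for n up to about 10.

```cpp
#include <cassert>
#include <string>
#include <vector>

// Hypothetical checker: tries every string of the same length over "ACGT"
// and returns the smallest number of differing positions among the strings
// that contain no pattern, or -1 if every candidate contains one.
int bruteForceRepairs(const std::vector<std::string>& patterns, const std::string& dna) {
    const std::string bases = "ACGT";
    const int n = static_cast<int>(dna.size());
    assert(n <= 10 && "brute force is only intended for tiny inputs");

    long long total = 1;
    for (int i = 0; i < n; ++i) total *= 4;

    int best = -1;
    std::string cand(n, 'A');
    for (long long code = 0; code < total; ++code) {
        // Decode the counter into a candidate string, base 4.
        long long t = code;
        int changes = 0;
        for (int i = 0; i < n; ++i) {
            cand[i] = bases[t % 4];
            t /= 4;
            if (cand[i] != dna[i]) ++changes;
        }
        // Reject the candidate if any pattern occurs in it.
        bool clean = true;
        for (const std::string& p : patterns) {
            if (cand.find(p) != std::string::npos) {
                clean = false;
                break;
            }
        }
        if (clean && (best == -1 || changes < best)) best = changes;
    }
    return best;
}
```

Comparing its result with the program's output on random short strings is a cheap way to catch mistakes in the failure links or in the propagation of the match flag.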

 

posted @ 2013-05-03 21:25  Jack Ge