Pattern and Text

1008 Pattern and Text

Time Limit : 3000/1000ms (Java/Other) Memory Limit : 65535/32768K (Java/Other)

Total Submission(s) : 48 Accepted Submission(s) : 4

Problem Description

Given two strings A and B (a pattern and a text) containing only lowercase letters of the English alphabet, your task is to align them contiguously and report the similarity of the pattern and the text.
For example, if the text string B is ‘aababacab’ and the pattern string A is ‘abaab’, you should output 12.

Input

The first line contains an integer T (1 <= T <= 10), indicating the number of test cases. For each test case there are two lines: the first line gives the string A, and the second line gives the string B, with length(B) <= 2,000,000. It is guaranteed that A is not longer than B.

Output

For each test case, output the similarity of the given strings.

Sample Input

abaab

aababacab

Sample Output

12

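I'm embarrassed to say that I was stuck on this problem for the last hour of the contest. My first idea was a plain traversal, even though I knew it was bound to get TLE… and in practice it did. I then added some marking-based optimizations, and it still got TLE…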
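For reference, the plain traversal mentioned above can be sketched roughly as follows. It assumes that "similarity" means the total number of equal character positions summed over every contiguous alignment of A inside B, which is consistent with the sample answer 12 and with the sliding-window code later in this post. The function name similarity_brute is my own, not something from the contest.

```c
#include <assert.h>
#include <string.h>

/* Brute force: slide the pattern over every offset of the text and
   count the positions where the two characters are equal.
   Assumption: "similarity" = total matches over all alignments. */
long long similarity_brute(const char *pattern, const char *text) {
    size_t m = strlen(pattern);
    size_t n = strlen(text);
    assert(m <= n);  /* the problem guarantees A is not longer than B */

    long long sum = 0;
    for (size_t off = 0; off + m <= n; ++off) {
        for (size_t i = 0; i < m; ++i) {
            if (pattern[i] == text[off + i]) {
                ++sum;
            }
        }
    }
    return sum;
}
```

Calling similarity_brute("abaab", "aababacab") gives 12. The two nested loops cost O(m * (n - m + 1)) character comparisons, which is far too slow when length(B) reaches 2,000,000, hence the TLE.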

Then I thought: KMP! But I haven't learned KMP yet… (how weak can I be!)

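Later, after talking with the folks from HUT, I found a very nice way to handle it. My mentor merlininice also helped me look over some of the problems I failed during the contest, and when he saw this one he just said: 1008 is a dumb problem. = = … Instantly crushed. After listening to his analysis, it finally clicked… Both approaches are described below: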

First, merlininice's approach (the data in this contest is very tight on memory, so this one is rather troublesome to implement):

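The cnt[char][len] array records, for each letter, how many times it occurs in the main string pat from the start up to position i. One pass (O(n) time) is enough to build this preprocessed array. Then we walk through the substring sub, and for each i we only need to compute cnt[sub[i]-'a'][lena-lenb+i] - cnt[sub[i]-'a'][i], plus a boundary check (comparing against the characters at position i and at position lena-lenb+i).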

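However, in this problem an int array can hold at most about 8,000,000 elements (32768K of memory at 4 bytes per int), while the main string can be as long as 2,000,000, so declaring cnt[26][2000000] for the 26 letters is impossible. Instead you would have to use 26 arrays of 2,000,000 elements and record each letter separately, and the code would probably end up rather long…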
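One way to keep the prefix-count idea within the memory limit is to handle the letters one at a time, so that a single array of n+1 ints can be reused. The sketch below is my own variant under that assumption, not merlininice's actual code; the names similarity_prefix and pre are hypothetical. Here pre[k] is the number of occurrences of the current letter in text[0..k-1], so the half-open difference already covers both boundary positions and no extra boundary comparison is needed.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Prefix-count sketch, one letter at a time.
   pattern[i] can only ever be aligned with text[i .. pos+i],
   where pos = n - m, so its matches are the occurrences of that
   letter inside that range. Returns -1 if allocation fails. */
long long similarity_prefix(const char *pattern, const char *text) {
    size_t m = strlen(pattern);
    size_t n = strlen(text);
    assert(m <= n);  /* the problem guarantees A is not longer than B */
    size_t pos = n - m;

    int *pre = (int *)malloc((n + 1) * sizeof *pre);
    if (pre == NULL) {
        return -1;
    }

    long long sum = 0;
    for (char c = 'a'; c <= 'z'; ++c) {
        /* pre[k] = occurrences of c in text[0 .. k-1] */
        pre[0] = 0;
        for (size_t k = 0; k < n; ++k) {
            pre[k + 1] = pre[k] + (text[k] == c);
        }
        /* every pattern position holding c contributes its range count */
        for (size_t i = 0; i < m; ++i) {
            if (pattern[i] == c) {
                sum += pre[pos + i + 1] - pre[i];
            }
        }
    }
    free(pre);
    return sum;
}
```

For the sample, similarity_prefix("abaab", "aababacab") also gives 12. The array needs about 8 MB for n = 2,000,000, but the 26 passes over the text make it noticeably more work than a single pass, so it may still be too slow for the 1000 ms limit.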

So the next approach is clearly the better one:

If you understand this figure, you basically understand how the approach works.

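This approach also uses a cnt array, but this time it only needs 26 entries, one for each of the letters ‘a’~‘z’. cnt records how many times each letter occurs in the main string pat within the range that a given character of the substring sub can be matched against. First we find the matchable range of sub[0], which is 0 ~ len(pat)-len(sub). One pass over that range gives the initial cnt values, and then sum += cnt[sub[0]-'a'] adds the number of matches for sub[0].

Next we simulate the matching process described in the problem. Since the range moves forward by one position for each successive character of sub, we only need to iterate i over len(pat)-len(sub)+1 ~ len(pat)-1. Each time, we decrement cnt for the letter at the start of the previous range and increment cnt for the newly added letter, which brings cnt up to date; then sum accumulates the cnt value of the letter at position i-(len(pat)-len(sub)) of sub. The total time complexity is only O(n), and the code is short and sharp. I really admire the HUT student who came up with this~ ^^ ~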

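#include <stdio.h>
#include <string.h>

#define MAXN 2000005   /* length(B) <= 2,000,000, plus room for '\0' */

int cnt[30];                 /* cnt[c]: occurrences of letter c in the current window of pat */
char pat[MAXN], sub[MAXN];   /* pat holds the text B, sub holds the pattern A */

int main(void) {
    int i, icase;
    int lena, lenb, pos;
    long long sum;
    scanf("%d", &icase);
    while (icase--) {
        memset(cnt, 0, sizeof(cnt));
        scanf("%s", sub);    /* first line: A */
        scanf("%s", pat);    /* second line: B */
        lena = (int)strlen(pat);
        lenb = (int)strlen(sub);
        pos = lena - lenb;   /* last offset at which sub fits inside pat */
        sum = 0;
        /* sub[0] can be matched against pat[0..pos] */
        for (i = 0; i <= pos; ++i) cnt[pat[i] - 'a']++;
        sum += cnt[sub[0] - 'a'];

        /* slide the window: sub[i-pos] is matched against pat[i-pos..i] */
        for (i = pos + 1; i < lena; ++i) {
            cnt[pat[i - pos - 1] - 'a']--;
            cnt[pat[i] - 'a']++;
            sum += (long long)cnt[sub[i - pos] - 'a'];
        }
        /* %lld is the standard specifier; some older Windows-based judges need %I64d */
        printf("%lld\n", sum);
    }
    return 0;
}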

One last bow of admiration~~

posted on 2012-07-30 13:27 Yuna_