Multi-Word Match Search

Source: http://www.cnblogs.com/justinzhang/archive/2012/09/17/2689642.html (Huawei 2013 recruitment, on-machine coding test: multi-word match search)

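Search a given source string for a string made up of several words (the pattern string). The requirements are as follows:

1. The source string and the pattern string both have the form word<space>word<space>word...; punctuation need not be considered. The total character length of the source string and of the pattern string is at most 64.

2. Words consist of digits and letters; matching is case-insensitive.

3. Longer words in the pattern string are matched first. For example, with pattern string "he hello hell" and source string "hello world", the "hello" in the source string matches the "hello" in the pattern string. If several pattern words of the same length could match the same source word, the one that appears first in the pattern string wins. For example, with pattern string "he lo" and source string "hello world", the pattern word "he" matches the source word "hello".

4. Order in which source words are matched: an identical word is preferred; if several words can be matched, the first one is preferred. For example, with source string "hell hello world hell he": the pattern "he" matches "he"; the pattern "hel" matches the first "hell"; the pattern "hell" matches the first "hell".

5. Each word in the source string can be matched only once. In the rule 3 example, once "hello" has been matched, "he" and "hell" can no longer partially match the "hello" in "hello world". As another example, with pattern string "he hello hell" and source string "hello world hello world", the pattern word "hello" matches the first "hello" in the source, so the pattern word "hell" can match the second "hello".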
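Rules 1 to 5 can be sketched as a small word-assignment routine. This is only an illustration, not the code from the original post: the names `split_lower` and `assign_matches` are hypothetical, and "partial match" is read here as substring containment, which is an assumption since the statement does not define it precisely.

```cpp
#include <algorithm>
#include <cctype>
#include <cstddef>
#include <sstream>
#include <string>
#include <vector>

// Rules 1 and 2: split on whitespace and lower-case every word.
std::vector<std::string> split_lower(const std::string& text) {
    std::vector<std::string> words;
    std::istringstream in(text);
    std::string w;
    while (in >> w) {
        for (char& c : w)
            c = static_cast<char>(std::tolower(static_cast<unsigned char>(c)));
        words.push_back(w);
    }
    return words;
}

// For each pattern word (in pattern order), return the index of the source
// word it was assigned to, or -1 if no unused source word matches.
std::vector<int> assign_matches(const std::string& src, const std::string& pattern) {
    const std::vector<std::string> s = split_lower(src);
    const std::vector<std::string> p = split_lower(pattern);

    // Rule 3: process longer pattern words first; stable_sort keeps the
    // earlier pattern word first when lengths are equal.
    std::vector<int> order(p.size());
    for (std::size_t i = 0; i < p.size(); ++i) order[i] = static_cast<int>(i);
    std::stable_sort(order.begin(), order.end(),
                     [&p](int a, int b) { return p[a].size() > p[b].size(); });

    std::vector<bool> used(s.size(), false);   // Rule 5: one match per source word
    std::vector<int> result(p.size(), -1);
    for (int pi : order) {
        int hit = -1;
        // Rule 4: prefer the first identical, unused source word...
        for (std::size_t j = 0; j < s.size() && hit < 0; ++j)
            if (!used[j] && s[j] == p[pi]) hit = static_cast<int>(j);
        // ...otherwise the first unused source word containing the pattern word
        // (substring containment is an assumption of this sketch).
        for (std::size_t j = 0; j < s.size() && hit < 0; ++j)
            if (!used[j] && s[j].find(p[pi]) != std::string::npos)
                hit = static_cast<int>(j);
        if (hit >= 0) {
            used[hit] = true;
            result[pi] = hit;
        }
    }
    return result;
}
```

With source "hello world hello world" and pattern "he hello hell", this sketch assigns "hello" to source word 0 and "hell" to source word 2, and leaves "he" unmatched, which agrees with the rule 5 example.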

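1) Function to implement:

int MultiWordMatch(char *Src, char *Match)

2) Parameters:

Input: the source string and the pattern string (word<space>word<space>word...).

3) Return value:

Type: int

Error or exception: -1

No match: 0

Exact match (the pattern string appears verbatim in the source string, in exactly the same order): 1

Match (every word of the pattern string can be found somewhere in the source string, but in a different order): 2

Partial match (only some words of the pattern string can be found in the source string): 3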
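The return codes can be derived from the per-word match positions. The sketch below uses a hypothetical helper `classify` that takes, for each pattern word, the index of the source word it matched (or -1). Two points are assumptions rather than part of the statement: an empty pattern is treated as an error, and "exact match" is read as "the matched source words are adjacent and in the same order".

```cpp
#include <cstddef>
#include <vector>

// matched[i] is the index of the source word matched by pattern word i,
// or -1 when that pattern word found no match.
int classify(const std::vector<int>& matched) {
    if (matched.empty()) return -1;              // assumption: empty pattern is an error

    std::size_t found = 0;
    for (int m : matched)
        if (m >= 0) ++found;

    if (found == 0) return 0;                    // no match
    if (found < matched.size()) return 3;        // partial match

    // All words found: exact only if they occupy consecutive source words
    // in pattern order (assumed reading of "appears verbatim").
    for (std::size_t i = 1; i < matched.size(); ++i)
        if (matched[i] != matched[i - 1] + 1) return 2;
    return 1;
}
```

Under this reading, two pattern words matched at source words 0 and 1 give 1, while words matched at 5 and 8 give 2 even though they are in order, because they are not adjacent.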

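For example:

Text string: "Hello world he is a programmer What the hell"

Searching for the pattern string "test" outputs 0;

Searching for the pattern strings "hello world" and "a prog" outputs 1;

Searching for the pattern strings "what world" and "prog hell" outputs 2;

Searching for the pattern strings "hello hello" and "nand ram" outputs 3.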

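Code:

The input format is assumed to be as follows (contents of in.txt): the first line is the source string, and each following line is one pattern string.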

Hello world he is a programmer What the hell
test
hello world
a prog
what world
prog hell
hello hello
nand ram

#include <cstdio>
#include <cstring>

const int BUF_SIZE = 66;        // 64 characters + '\n' + '\0'
const int MAX_PATTERNS = 10;

char substr[MAX_PATTERNS][BUF_SIZE];
char str[BUF_SIZE];

// fgets keeps the trailing newline; this wrapper strips it.
char * mygets(char *buf, FILE *fp)
{
    char * temp = fgets(buf, BUF_SIZE, fp);
    if(temp)
    {
        size_t len = strlen(buf);
        if(len > 0 && buf[len-1] == '\n')        // guard against an empty buffer
            buf[len-1] = '\0';
    }
    return temp;
}

// Lower-case ASCII letters in place (renamed so it cannot clash with the standard tolower).
void to_lower(char *str, int length)
{
    for(int i = 0; i < length; i++)
        if(str[i] >= 'A' && str[i] <= 'Z')
            str[i] = str[i] + 32;
}

// Note: match is modified in place (lower-cased, then split by strtok).
int MultiStrMatch(char *str, char *match)
{
    if(!str || !match)
        return -1;

    int count[BUF_SIZE], index = 0;

    // Prepare a lower-case working copy of the source and lower-case the pattern
    char temp_str[BUF_SIZE];
    strncpy(temp_str, str, BUF_SIZE - 1);
    temp_str[BUF_SIZE - 1] = '\0';        // strncpy does not terminate on truncation
    to_lower(temp_str, strlen(temp_str));
    to_lower(match, strlen(match));

    // Check whether each pattern word occurs in the source
    char *split = strtok(match, " ");        // strtok splits the pattern on spaces (string handling in C really is tedious...)
    while(split != NULL)
    {
        char * find = strstr(temp_str, split);        // strstr finds the first occurrence, if any
        if(find)
        {
            memset(find, ' ', strlen(split));        // blank the matched characters so they are not reused
            count[index++] = (int)(find - temp_str);    // offset of the match (pointer difference)
        }
        else
            count[index++] = -1;

        split = strtok(NULL, " ");
    }

    // An empty pattern is treated as an error here (an assumption; the problem statement does not say)
    if(index == 0)
        return -1;

    // Decide the result from the count array
    bool has_missing = false;
    bool has_found = false;
    for(int i = 0; i < index; i++)
    {
        if(count[i] == -1)
            has_missing = true;
        else
            has_found = true;
    }
    if(!has_found && has_missing)
        return 0;
    else if(has_missing && has_found)
        return 3;
    else
    {
        // All words found: compare each offset with the previous one
        for(int j = 1; j < index; j++)
        {
            if(count[j] < count[j-1])
                return 2;
        }
        return 1;
    }
}

int main()
{
    FILE * fp = fopen("in.txt", "r");
    if(!fp)
    {
        perror("in.txt");
        return 1;
    }
    if(!mygets(str, fp))        // the first line is the source string
    {
        fclose(fp);
        return 1;
    }
    int number = 0;
    while(number < MAX_PATTERNS && mygets(substr[number], fp))
        number++;
    fclose(fp);

    for(int i = 0; i < number; i++)
        printf("%d\n", MultiStrMatch(str, substr[i]));

    return 0;
}
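One limitation of the listing above: it blanks only the characters that strstr matched, so the rest of a partially matched word can still be hit by a later pattern word, which rule 5 forbids. A minimal sketch of one possible fix is a helper that blanks the whole space-delimited word around the hit. The name `consume_word` is hypothetical and not part of the original code.

```cpp
#include <cstddef>
#include <cstring>

// Find `needle` in `buf`, blank out the entire space-delimited word that
// contains the hit, and return the offset where that word starts.
// Returns -1 if `needle` is empty or does not occur.
int consume_word(char* buf, const char* needle) {
    if (needle[0] == '\0') return -1;
    char* hit = std::strstr(buf, needle);
    if (!hit) return -1;

    char* begin = hit;                              // walk back to the word start
    while (begin > buf && begin[-1] != ' ') --begin;

    char* end = hit;                                // walk forward to the word end
    while (*end != '\0' && *end != ' ') ++end;

    std::memset(begin, ' ', static_cast<std::size_t>(end - begin));
    return static_cast<int>(begin - buf);
}
```

For the buffer "hello world hello", looking up "he" consumes the whole first "hello", so a later lookup of "hell" lands on the second "hello" at offset 12 instead of reusing the first word.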

posted @ 2012-09-27 14:59 dandingyy
