duluu

PAT (Top Level) Practice-1005 Programming Pattern

Programmers often have a preference among program constructs. For example, some may prefer if(0==a), while others may prefer if(!a). Analyzing such patterns can help to narrow down a programmer's identity, which is useful for detecting plagiarism.

Now given some text sampled from someone's program, can you find the person's most commonly used pattern of a specific length?

Input Specification:

Each input file contains one test case. For each case, there is one line consisting of the pattern length N (1 ≤ N ≤ 1048576), followed by one line no less than N and no more than 1048576 characters in length, terminated by a carriage return \n. The entire input is case sensitive.

Output Specification:

For each test case, print in one line the length-N substring that occurs most frequently in the input, followed by a space and the number of times it has occurred in the input. If there are multiple such substrings, print the lexicographically smallest one.

Whitespace characters in the input should be printed as they are. Also note that there may be multiple occurrences of the same substring overlapping each other.
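
To make the counting rule concrete (overlapping windows all count, and ties go to the lexicographically smallest substring), here is a minimal brute-force sketch. The helper name `most_common_pattern` is hypothetical and not part of the solution in this post. It copies every window into an ordered map, so it is only meant as a reference for small inputs, not for the full 1048576-character limit.

```cpp
#include <cassert>
#include <cstddef>
#include <map>
#include <string>
#include <utility>

// Hypothetical reference helper (an assumption, not the post's solution):
// returns the most frequent length-n substring of text together with its count.
std::pair<std::string, int> most_common_pattern(const std::string& text, std::size_t n) {
    assert(n >= 1 && n <= text.size());
    std::map<std::string, int> count;  // ordered map: iteration is lexicographic
    for (std::size_t i = 0; i + n <= text.size(); ++i) {
        ++count[text.substr(i, n)];    // consecutive windows overlap by n-1 characters
    }
    std::pair<std::string, int> best{"", 0};
    for (const auto& kv : count) {
        // Strict '>' keeps the first (smallest) key among equally frequent ones.
        if (kv.second > best.second) best = {kv.first, kv.second};
    }
    return best;
}
```

For instance, with N = 2 and the text `aaaa`, the three overlapping windows are all `aa`, so the helper returns `aa` with a count of 3.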

 

// Find the length-m substring that occurs most often; ties go to the lexicographically smallest.
#include <cstdio>
#include <cstring>
#include <unordered_map>

typedef unsigned long long ull;
const int maxn = 1148576, base = 257;
char s[maxn];
std::unordered_map<ull, int> c; // c[hash(x)] = occurrences of the length-m substring x (collisions are not checked)

int main() {
    int m, n;
    if (scanf("%d", &m) != 1) return 0;
    // Skip only the rest of the first line; "%d\n" would also eat leading whitespace of the text.
    int ch;
    while ((ch = getchar()) != '\n' && ch != EOF) {}
    if (!fgets(s, sizeof(s), stdin)) return 0;
    n = strlen(s);
    if (n > 0 && s[n-1] == '\n') s[--n] = '\0';
    ull hash = 0, t = 1; // t ends up as base^m (modulo 2^64)
    for (int i = 0; i < m; ++i) { hash *= base; hash += s[i]; t *= base; }
    c[hash] = 1;
    int maxi = m - 1, maxc = 1; // maxi: index of the last character of the best substring so far
    for (int i = m; i < n; ++i) { // roll the hash forward to the substring s[i-m+1..i]
        hash *= base;
        hash += s[i] - s[i-m]*t;
        int cnt = ++c[hash];
        if (cnt > maxc || (cnt == maxc && strncmp(s+i-m+1, s+maxi-m+1, m) < 0)) { maxc = cnt; maxi = i; }
    }
    printf("%.*s %d\n", m, s + maxi - m + 1, maxc);
    return 0;
}

 

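Not much to say here: it is a brute-force rolling hash. The hash of each length-N window is derived from the previous window's hash in O(1) and counted in an unordered_map, with no collision check.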
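
The rolling update can be checked in isolation. The sketch below is a standalone illustration, not the post's code: `direct_hash` and `rolling_hashes` are hypothetical names, and characters are cast to `unsigned char` as an assumption to keep the arithmetic easy to follow. It computes every window hash incrementally so that each one can be compared against a hash computed from scratch.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <vector>

typedef unsigned long long ull;
const ull kBase = 257;  // same base as in the post; arithmetic wraps modulo 2^64

// Hash of text[pos, pos+m) computed from scratch, used only for comparison.
ull direct_hash(const std::string& text, std::size_t pos, std::size_t m) {
    ull h = 0;
    for (std::size_t i = 0; i < m; ++i) h = h * kBase + (unsigned char)text[pos + i];
    return h;
}

// Hashes of all length-m windows, each derived from the previous one in O(1).
std::vector<ull> rolling_hashes(const std::string& text, std::size_t m) {
    assert(m >= 1 && m <= text.size());
    ull h = 0, top = 1;  // top ends up as kBase^m
    for (std::size_t i = 0; i < m; ++i) {
        h = h * kBase + (unsigned char)text[i];
        top *= kBase;
    }
    std::vector<ull> out{h};
    for (std::size_t i = m; i < text.size(); ++i) {
        // Shift the window: append text[i], drop text[i-m] (now weighted by kBase^m).
        h = h * kBase + (unsigned char)text[i] - (unsigned char)text[i - m] * top;
        out.push_back(h);
    }
    return out;
}
```

Equal windows always get equal hashes, but the converse is not guaranteed: two different windows could in principle collide, which the solution above accepts as a trade-off.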

A suffix array would take far more code, and I'm not sure how efficient it would be ( •̀ ω •́ )y
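
For comparison, here is a collision-free sketch that is much shorter than a real suffix array. To be clear, it is not a suffix array: it simply sorts the window start positions by their first N characters and then looks for the longest run of equal windows. The function name `most_common_by_sorting` is hypothetical, and each comparison can cost up to N character checks, so this may well be too slow for large N. I have not measured it.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <numeric>
#include <string>
#include <utility>
#include <vector>

// Hypothetical alternative (an assumption, not the post's method): sort window
// start positions by their length-n content, then find the longest equal run.
std::pair<std::string, int> most_common_by_sorting(const std::string& text, std::size_t n) {
    assert(n >= 1 && n <= text.size());
    std::vector<std::size_t> starts(text.size() - n + 1);
    std::iota(starts.begin(), starts.end(), std::size_t{0});
    std::sort(starts.begin(), starts.end(), [&](std::size_t a, std::size_t b) {
        return text.compare(a, n, text, b, n) < 0;  // compare only the first n characters
    });
    std::size_t best_start = starts[0];
    int best_count = 0;
    for (std::size_t i = 0; i < starts.size();) {
        std::size_t j = i;
        while (j < starts.size() && text.compare(starts[i], n, text, starts[j], n) == 0) ++j;
        int run = static_cast<int>(j - i);
        // Runs appear in lexicographic order, so strict '>' keeps the smallest on ties.
        if (run > best_count) { best_count = run; best_start = starts[i]; }
        i = j;
    }
    return {text.substr(best_start, n), best_count};
}
```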

 

posted on 2020-06-27 18:04  duluu
