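Design and Implementation of a Lexical Analyzer
Requirements for the lexical analyzer:
- Scan the character stream that makes up the source program from left to right
- Recognize the lexically meaningful words (lexemes)
- Return a token record for each one (token category, the lexeme itself)
- Filter out whitespace
- Skip comments
- Detect lexical errors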
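Program structure:
Input: a character stream (which input method, and which data structure stores it)
Processing:
–Traversal (which traversal method)
–Lexical rules
Output: a token stream (which output form)
–Two-tuples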
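The two-tuple output form can be pictured with a small value class. This is only a sketch: `Token` and `TokenStreamDemo` are hypothetical names that do not appear in the program in this post, and the category codes are taken from the token table in this post.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper: one (category code, lexeme) pair.
final class Token {
    final int code;       // category code from the token table
    final String lexeme;  // the matched text itself

    Token(int code, String lexeme) {
        this.code = code;
        this.lexeme = lexeme;
    }

    @Override
    public String toString() {
        return "(" + code + ", " + lexeme + ")";
    }
}

class TokenStreamDemo {
    // Builds the token stream for the text "x := 42" by hand,
    // purely to show the shape of the output.
    static List<Token> sample() {
        List<Token> tokens = new ArrayList<>();
        tokens.add(new Token(10, "x"));   // identifier
        tokens.add(new Token(18, ":="));  // operator :=
        tokens.add(new Token(11, "42"));  // unsigned number
        return tokens;
    }

    public static void main(String[] args) {
        for (Token t : sample()) {
            System.out.println(t);
        }
    }
}
```

Running `main` prints `(10, x)`, `(18, :=)` and `(11, 42)`, one per line. Keeping the code and the lexeme together in one object lets a later stage consume the stream without re-parsing printed text.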
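Token categories:
1. Identifiers (10)
2. Unsigned numbers (11)
3. Reserved words (one code per word)
4. Operators (one code per word)
5. Delimiters (one code per word)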
| Lexeme | Code | Lexeme | Code |
|---|---|---|---|
| public | 1 | : | 17 |
| class | 2 | := | 18 |
| static | 3 | < | 20 |
| void | 4 | <= | 21 |
| main | 5 | <> | 22 |
| } | 30 | > | 23 |
| l(l\|d)* | 10 | >= | 24 |
| dd* | 11 | = | 25 |
| + | 13 | ; | 26 |
| - | 14 | ( | 27 |
| * | 15 | ) | 28 |
| / | 16 | { | 29 |
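The two pattern rows in the table, l(l|d)* and dd*, translate directly into regular expressions if l is read as "letter" and d as "digit". The sketch below makes that reading explicit. It assumes ASCII letters only, and `WordClassifier`, `classify` and the `-1` error marker are hypothetical names chosen for illustration.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.regex.Pattern;

class WordClassifier {
    // l(l|d)* : a letter followed by any number of letters or digits
    private static final Pattern IDENTIFIER = Pattern.compile("[A-Za-z][A-Za-z0-9]*");
    // dd* : one or more digits
    private static final Pattern UNSIGNED = Pattern.compile("[0-9]+");

    // Reserved words and their codes, copied from the table
    private static final Map<String, Integer> RESERVED = new HashMap<>();
    static {
        RESERVED.put("public", 1);
        RESERVED.put("class", 2);
        RESERVED.put("static", 3);
        RESERVED.put("void", 4);
        RESERVED.put("main", 5);
    }

    // Returns the category code of a word, or -1 (an assumed error marker)
    // when the word matches none of the rules.
    static int classify(String word) {
        Integer code = RESERVED.get(word);
        if (code != null) {
            return code;          // reserved words win over identifiers
        }
        if (IDENTIFIER.matcher(word).matches()) {
            return 10;
        }
        if (UNSIGNED.matcher(word).matches()) {
            return 11;
        }
        return -1;                // e.g. "1abc" or the empty string
    }

    public static void main(String[] args) {
        String[] samples = {"class", "count1", "2024", "1abc"};
        for (String s : samples) {
            System.out.println(s + "-->" + classify(s));
        }
    }
}
```

Checking the reserved-word map before the identifier pattern matters, because every reserved word also matches l(l|d)*. A word such as `1abc` matches neither pattern, which gives a natural place to report a lexical error.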
    package cn.itcast.day13Collection;

    import java.io.*;
    import java.util.*;

    public class Lexical_Analyzer {
        public static void main(String[] args) throws IOException {
            // Read the source file line by line until the end of the file,
            // dropping // comments along the way
            StringBuilder data = new StringBuilder();
            try (BufferedReader bf = new BufferedReader(new FileReader(
                    "D:\\webquanzhan\\day05-code\\day05_code\\src\\cn\\itcast\\day13Collection\\demo01.txt"))) {
                String tail;
                while ((tail = bf.readLine()) != null) {
                    int comment = tail.indexOf("//");
                    if (comment >= 0) {
                        tail = tail.substring(0, comment);
                    }
                    // Keep a line separator so the last word of one line
                    // does not fuse with the first word of the next
                    data.append(tail).append('\n');
                }
            }

            // Split the text into chunks on whitespace, commas and vertical bars;
            // a chunk may still mix letters and symbols, e.g. a+b
            String[] word = data.toString().split("[\\s,|]+");

            // Copy the chunks into a list for processing
            List<String> wordlist = new ArrayList<>(Arrays.asList(word));

            // Reserved words, operators and delimiters with their category codes
            Map<String, Integer> rewordmap = new HashMap<>();
            rewordmap.put("public", 1);
            rewordmap.put("class", 2);
            rewordmap.put("static", 3);
            rewordmap.put("void", 4);
            rewordmap.put("main", 5);
            rewordmap.put("+", 13);
            rewordmap.put("-", 14);
            rewordmap.put("*", 15);
            rewordmap.put("/", 16);
            rewordmap.put(":", 17);
            rewordmap.put(":=", 18);
            rewordmap.put("<", 20);
            rewordmap.put("<=", 21);
            rewordmap.put("<>", 22);
            rewordmap.put(">", 23);
            rewordmap.put(">=", 24);
            rewordmap.put("=", 25);
            rewordmap.put(";", 26);
            rewordmap.put("(", 27);
            rewordmap.put(")", 28);
            rewordmap.put("{", 29);
            rewordmap.put("}", 30);

            for (String words : wordlist) {
                // First pass over the chunk: pull out operators and delimiters
                for (int i = 0; i < words.length(); i++) {
                    // Try a two-character operator (:=, <=, <>, >=)
                    // before its one-character prefix
                    if (i + 1 < words.length()) {
                        String two = words.substring(i, i + 2);
                        if (rewordmap.containsKey(two)) {
                            System.out.println(two + "-->" + rewordmap.get(two));
                            i++;
                            continue;
                        }
                    }
                    String s = String.valueOf(words.charAt(i));
                    if (rewordmap.containsKey(s)) {
                        System.out.println(s + "-->" + rewordmap.get(s));
                    }
                }

                // Second pass: replace the operators with spaces, then split
                // on spaces and brackets so only the words remain
                String new_words = words.replaceAll("[+\\-*/=;:<>]", " ");
                String[] split_new_words = new_words.split("[ {}()]");

                for (String s : split_new_words) {
                    if (s.isEmpty()) {
                        continue; // empty pieces are left over from splitting
                    }
                    if (rewordmap.containsKey(s)) {
                        // Reserved word
                        System.out.println(s + "-->" + rewordmap.get(s));
                    } else if (checkStrIsNum01(s)) {
                        // Unsigned number
                        System.out.println(s + "-->11");
                    } else {
                        // Everything else is treated as an identifier
                        System.out.println(s + "-->10");
                    }
                }
            }
        }

        // Checks whether a string consists only of digits
        // (approach found through a Baidu search)
        public static boolean checkStrIsNum01(String str) {
            for (int i = 0; i < str.length(); i++) {
                if (!Character.isDigit(str.charAt(i))) {
                    return false;
                }
            }
            return true;
        }
    }
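The program above prints the symbols of each chunk first and its words afterwards, so the output does not follow source order. One possible alternative, shown here only as a sketch, is a single left-to-right pass that always tries the longest operator first. `SinglePassLexer` and `scan` are hypothetical names, the codes come from the token table, letters are assumed to be ASCII, and the `error:` message format is an assumption.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

class SinglePassLexer {
    private static final Map<String, Integer> CODES = new HashMap<>();
    static {
        String[] lexemes = {"public", "class", "static", "void", "main",
                "+", "-", "*", "/", ":", ":=", "<", "<=", "<>", ">", ">=",
                "=", ";", "(", ")", "{", "}"};
        int[] codes = {1, 2, 3, 4, 5,
                13, 14, 15, 16, 17, 18, 20, 21, 22, 23, 24,
                25, 26, 27, 28, 29, 30};
        for (int i = 0; i < lexemes.length; i++) {
            CODES.put(lexemes[i], codes[i]);
        }
    }

    private static boolean isLetter(char c) {
        return (c >= 'a' && c <= 'z') || (c >= 'A' && c <= 'Z');
    }

    private static boolean isDigit(char c) {
        return c >= '0' && c <= '9';
    }

    // Scans src once, left to right, and returns "lexeme-->code" entries in source order.
    static List<String> scan(String src) {
        List<String> out = new ArrayList<>();
        int i = 0;
        while (i < src.length()) {
            char c = src.charAt(i);
            if (Character.isWhitespace(c)) {            // filter out whitespace
                i++;
            } else if (src.startsWith("//", i)) {       // skip a line comment
                while (i < src.length() && src.charAt(i) != '\n') {
                    i++;
                }
            } else if (isLetter(c)) {                   // l(l|d)* : reserved word or identifier
                int start = i;
                while (i < src.length() && (isLetter(src.charAt(i)) || isDigit(src.charAt(i)))) {
                    i++;
                }
                String word = src.substring(start, i);
                out.add(word + "-->" + CODES.getOrDefault(word, 10));
            } else if (isDigit(c)) {                    // dd* : unsigned number
                int start = i;
                while (i < src.length() && isDigit(src.charAt(i))) {
                    i++;
                }
                out.add(src.substring(start, i) + "-->11");
            } else if (i + 1 < src.length() && CODES.containsKey(src.substring(i, i + 2))) {
                String two = src.substring(i, i + 2);   // longest match: := <= <> >=
                out.add(two + "-->" + CODES.get(two));
                i += 2;
            } else if (CODES.containsKey(String.valueOf(c))) {
                out.add(c + "-->" + CODES.get(String.valueOf(c)));
                i++;
            } else {                                    // report a lexical error and move on
                out.add("error: unexpected character '" + c + "' at offset " + i);
                i++;
            }
        }
        return out;
    }

    public static void main(String[] args) {
        for (String line : scan("a:=b+1")) {
            System.out.println(line);
        }
    }
}
```

For the input `a:=b+1`, `scan` returns `a-->10`, `:=-->18`, `b-->10`, `+-->13` and `1-->11` in that order. Because the comment check runs before the operator checks, `//` is never mistaken for two division operators.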
Test file:
Run result: