6.4 Using tokens with references

https://lalrpop.github.io/lalrpop/lexer_tutorial/004_token_references.html

When using a custom lexer, you might want tokens to hold references to the original input. This is useful when the grammar accepts arbitrary symbols such as variable names: each token can borrow its text from the input instead of copying it, which can improve the parser's performance and memory usage.

The lexer

We can now create a new calculator parser that handles symbols the same way an interpreter would handle variables. First we need the corresponding AST:

pub enum ExprSymbol<'input>{
    NumSymbol(&'input str),
    Op(Box<ExprSymbol<'input>>, Opcode, Box<ExprSymbol<'input>>),
    Error,
}

Then, we need to build the tokens:

#[derive(Copy, Clone, Debug)]
pub enum Tok<'input> {
    NumSymbol(&'input str),
    FactorOp(Opcode),
    ExprOp(Opcode),
    ParenOpen,
    ParenClose,
}

Notice that the NumSymbol variant holds a reference to the original input. It represents both numbers and variable names as slices of that input.
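Because NumSymbol does not distinguish numbers from names, a later stage has to decide what each slice means. The following is a minimal sketch of one way to do that, assuming integer literals only; the Symbol enum and the classify function are hypothetical names and are not part of the tutorial.

```rust
/// Hypothetical result of inspecting a `NumSymbol` slice after lexing.
#[derive(Debug, PartialEq)]
pub enum Symbol<'input> {
    /// The slice parsed as an integer literal.
    Number(i64),
    /// Anything else is treated as a variable name, still borrowed from the input.
    Variable(&'input str),
}

/// Classifies a slice without copying it: only the numeric case produces a
/// new value, the variable case keeps pointing into the original input.
pub fn classify(slice: &str) -> Symbol<'_> {
    match slice.parse::<i64>() {
        Ok(n) => Symbol::Number(n),
        Err(_) => Symbol::Variable(slice),
    }
}
```

With this sketch, classify("22") yields Symbol::Number(22) and classify("pi") yields Symbol::Variable("pi").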

Then we can build the lexer itself.

use std::str::CharIndices;

pub struct Lexer<'input> {
    chars: std::iter::Peekable<CharIndices<'input>>,
    input: &'input str,
}

impl<'input> Lexer<'input> {
    pub fn new(input: &'input str) -> Self {
        Lexer {
            chars: input.char_indices().peekable(),
            input,
        }
    }
}

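The lexer needs to hold a reference to the input so that it can put slices of it into the tokens. The iterator implementation yields Spanned values; Spanned is assumed here to be the Result<(Loc, Tok, Loc), Error> alias defined for the custom lexer earlier in this tutorial.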

impl<'input> Iterator for Lexer<'input> {
    type Item = Spanned<Tok<'input>, usize, ()>;

    fn next(&mut self) -> Option<Self::Item> {
        loop {
            match self.chars.next() {
                Some((_, ' '))  | Some((_, '\n')) | Some((_, '\t')) => continue,
                Some((i, ')')) => return Some(Ok((i, Tok::ParenClose, i + 1))),
                Some((i, '(')) => return Some(Ok((i, Tok::ParenOpen, i + 1))),
                Some((i, '+')) => return Some(Ok((i, Tok::ExprOp(Opcode::Add), i + 1))),
                Some((i, '-')) => return Some(Ok((i, Tok::ExprOp(Opcode::Sub), i + 1))),
                Some((i, '*')) => return Some(Ok((i, Tok::FactorOp(Opcode::Mul), i + 1))),
                Some((i, '/')) => return Some(Ok((i, Tok::FactorOp(Opcode::Div), i + 1))),

                None => return None, // End of file
                // Any other character starts a number or a name.
                Some((i, _)) => {
                    loop {
                        // Peek so that the delimiter stays available for the next call.
                        match self.chars.peek() {
                            Some((j, ')' | '(' | '+' | '-' | '*' | '/' | ' ' | '\n' | '\t')) => {
                                return Some(Ok((i, Tok::NumSymbol(&self.input[i..*j]), *j)));
                            }
                            // End of input: the symbol runs to the end of the string.
                            None => {
                                return Some(Ok((i, Tok::NumSymbol(&self.input[i..]), self.input.len())));
                            }
                            _ => {
                                self.chars.next();
                            }
                        }
                    }
                }
            }
        }
    }
}

The lexer is quite simple: it returns each operator or parenthesis as its own token. When it sees any other character, it records the start index, advances to the next operator, parenthesis, or space, and returns the slice in between as a NumSymbol.
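To see which spans the lexer produces, here is a self-contained sketch that can be compiled on its own. It restates the token type and the lexer in a more compact form and is not the tutorial's exact code: the Opcode variants and the Spanned alias are assumed to match the earlier sections, is_delimiter is a hypothetical helper, and any whitespace character (not just a space) is treated as the end of a symbol.

```rust
use std::iter::Peekable;
use std::str::CharIndices;

// Assumed to match the Opcode enum from the earlier calculator sections.
#[derive(Copy, Clone, Debug, PartialEq)]
pub enum Opcode { Mul, Div, Add, Sub }

#[derive(Copy, Clone, Debug, PartialEq)]
pub enum Tok<'input> {
    NumSymbol(&'input str),
    FactorOp(Opcode),
    ExprOp(Opcode),
    ParenOpen,
    ParenClose,
}

// Assumed to match the alias from the custom lexer section: (start, token, end).
pub type Spanned<Tok, Loc, Error> = Result<(Loc, Tok, Loc), Error>;

pub struct Lexer<'input> {
    chars: Peekable<CharIndices<'input>>,
    input: &'input str,
}

impl<'input> Lexer<'input> {
    pub fn new(input: &'input str) -> Self {
        Lexer { chars: input.char_indices().peekable(), input }
    }
}

/// Hypothetical helper: characters that end a number or a name.
fn is_delimiter(c: char) -> bool {
    matches!(c, '(' | ')' | '+' | '-' | '*' | '/') || c.is_whitespace()
}

impl<'input> Iterator for Lexer<'input> {
    type Item = Spanned<Tok<'input>, usize, ()>;

    fn next(&mut self) -> Option<Self::Item> {
        // Skip whitespace and take the first significant character, if any.
        let (start, c) = self.chars.find(|&(_, c)| !c.is_whitespace())?;
        let tok = match c {
            '(' => Tok::ParenOpen,
            ')' => Tok::ParenClose,
            '+' => Tok::ExprOp(Opcode::Add),
            '-' => Tok::ExprOp(Opcode::Sub),
            '*' => Tok::FactorOp(Opcode::Mul),
            '/' => Tok::FactorOp(Opcode::Div),
            _ => {
                // Consume the rest of the symbol; the delimiter is left unconsumed.
                while self.chars.next_if(|&(_, c)| !is_delimiter(c)).is_some() {}
                let end = self.chars.peek().map_or(self.input.len(), |&(j, _)| j);
                return Some(Ok((start, Tok::NumSymbol(&self.input[start..end]), end)));
            }
        };
        // Every operator and parenthesis is a single ASCII byte.
        Some(Ok((start, tok, start + 1)))
    }
}
```

Collecting Lexer::new("22 * pi + 66") gives five tokens, and the NumSymbol tokens cover the byte ranges 0..2, 5..7, and 10..12 of the input. No token owns a copy of its text.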

The parser

We can then look at the corresponding parser and its new grammar:

Term: Box<ExprSymbol<'input>> = {
    "num" => Box::new(ExprSymbol::NumSymbol(<>)),
    "(" <Expr> ")"
};

We need to pass the input to the parser so that the borrow checker knows the input's lifetime when it compiles the generated parser.

grammar<'input>(input: &'input str);
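As a rough analogy in plain Rust, the sketch below shows why the lifetime has to be named: a function that builds an AST node from a slice must tie the node to the input buffer in its signature. This is a hand-written illustration only, not what LALRPOP generates; term_from_slice is a hypothetical function and the Opcode variants are assumed from the earlier sections.

```rust
// Assumed to match the Opcode enum from the earlier calculator sections.
#[derive(Copy, Clone, Debug, PartialEq)]
pub enum Opcode { Mul, Div, Add, Sub }

#[derive(Debug, PartialEq)]
pub enum ExprSymbol<'input> {
    NumSymbol(&'input str),
    Op(Box<ExprSymbol<'input>>, Opcode, Box<ExprSymbol<'input>>),
    Error,
}

/// Hypothetical hand-written counterpart of the "num" arm of the Term rule.
/// The named lifetime `'input` states that the returned node borrows from
/// `input`, so the node cannot outlive the buffer it points into.
pub fn term_from_slice<'input>(
    input: &'input str,
    start: usize,
    end: usize,
) -> Box<ExprSymbol<'input>> {
    Box::new(ExprSymbol::NumSymbol(&input[start..end]))
}
```

If the input String were dropped while a node returned by term_from_slice was still in use, the borrow checker would reject the program.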

Then we just need to define the tokens, the same way as before:

extern {
    type Location = usize;
    type Error = ();

    enum Tok<'input> {
        "num" => Tok::NumSymbol(<&'input str>),
        "FactorOp" => Tok::FactorOp(<Opcode>),
        "ExprOp" => Tok::ExprOp(<Opcode>),
        "(" => Tok::ParenOpen,
        ")" => Tok::ParenClose,
    }
}

Calling the parser

We can finally run the parser we built:

let input = "22 * pi + 66";
let lexer = Lexer::new(input);
let expr = calculator9::ExprParser::new()
    .parse(input, lexer)
    .unwrap();
assert_eq!(&format!("{:?}", expr), "((\"22\" * \"pi\") + \"66\")");
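The assertion above compares against a compact rendering of the tree, which a derived Debug implementation would not produce. The following sketch shows hand-written Debug implementations that would yield that string; they are an assumption about how the AST is formatted, since this section does not show them, and the Opcode variants are assumed from the earlier sections.

```rust
use std::fmt::{self, Debug, Formatter};

// Assumed to match the Opcode enum from the earlier calculator sections.
#[derive(Copy, Clone, PartialEq)]
pub enum Opcode { Mul, Div, Add, Sub }

pub enum ExprSymbol<'input> {
    NumSymbol(&'input str),
    Op(Box<ExprSymbol<'input>>, Opcode, Box<ExprSymbol<'input>>),
    Error,
}

impl Debug for Opcode {
    // Print each operator as its source character.
    fn fmt(&self, f: &mut Formatter<'_>) -> fmt::Result {
        let s = match self {
            Opcode::Mul => "*",
            Opcode::Div => "/",
            Opcode::Add => "+",
            Opcode::Sub => "-",
        };
        write!(f, "{}", s)
    }
}

impl<'input> Debug for ExprSymbol<'input> {
    fn fmt(&self, f: &mut Formatter<'_>) -> fmt::Result {
        match self {
            // `{:?}` on a string slice adds the surrounding quotes.
            ExprSymbol::NumSymbol(s) => write!(f, "{:?}", s),
            // Binary operations are parenthesized: (left op right).
            ExprSymbol::Op(l, op, r) => write!(f, "({:?} {:?} {:?})", l, op, r),
            ExprSymbol::Error => write!(f, "error"),
        }
    }
}
```

With these implementations, the tree for 22 * pi + 66 formats as (("22" * "pi") + "66").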

posted on 2025-01-05 16:13 by 及途又八