Rust 错误

一个简单的需求：读入一个目录内的所有文本文件，每个文件的格式都是每一行定义一个二维点，例如x=1,y=2；获取所有点的一个列表。这里不使用serde或者现成的parser库，这么简单的格式直接手写就行了

没有错误处理

先来一个糙快猛的版本。其中用了一个nightly feature str_split_once，不过这不重要

#![feature(str_split_once)]

#[derive(Debug)]
struct Point {
    x: i32,
    y: i32,
}

#[derive(Debug)]
struct Directory(Vec<(&'static str, &'static str)>);

fn parse_file(text: &str) -> Vec<Point> {
    let mut points = Vec::new();
    for line in text.lines() {
        let mut x = None;
        let mut y = None;
        for field in line.split(',') {
            let (k, v) = field.split_once('=').unwrap(); // todo
            match k.trim() {
                "x" => { x = Some(v.trim().parse().unwrap()); }, // todo
                "y" => { y = Some(v.trim().parse().unwrap()); }, // todo
                _ => {}, // todo
            }
        }

        points.push(Point { x: x.unwrap(), y: y.unwrap() }); // todo
    }

    points
}

fn main() {
    let d = Directory(vec![
        ("file1", "x=3,y=4\nx=1,y=2")
    ]);

    for (filename, text) in &d.0 {
        println!("{} {:?}", filename, parse_file(text));
    }
}

标准的错误处理

上面的代码里留下了5个todo。想成为一段健壮的程序，它们全部都要处理。这里使用了thiserror库，要换成snafu或者手写也是完全没问题的。代码行数一下就翻倍了，不过看起来是那么的正确

#![feature(str_split_once)]

use std::num::ParseIntError;

use thiserror::Error;
use fehler::{throws, throw};

use self::ParseFileError::*;

#[derive(Error, Debug)]
enum ParseFileError {
    #[error("no equal sign found: `{field}` in line `{line}`")]
    NoEqualSign { field: String, line: String },
    #[error("parse x as i32 failed: `{field}` in line `{line}`")]
    XParseFailed { source: ParseIntError, field: String, line: String },
    #[error("parse y as i32 failed: `{field}` in line `{line}`")]
    YParseFailed { source: ParseIntError, field: String, line: String },
    #[error("unknown key: `{field}` in line `{line}`")]
    UnknownField { field: String, line: String},
    #[error("x not found: `{line}`")]
    XNotFound { line: String },
    #[error("y not found: `{line}`")]
    YNotFound { line: String },
}


#[derive(Debug)]
struct Point {
    x: i32,
    y: i32,
}

#[derive(Debug)]
struct Directory(Vec<(&'static str, &'static str)>);

#[throws(ParseFileError)]
fn parse_file(text: &str) -> Vec<Point> {
    let mut points = Vec::new();
    for line in text.lines() {
        let mut x = None;
        let mut y = None;
        for field in line.split(',') {
            let (k, v) = field.split_once('=').ok_or_else(|| NoEqualSign {
                field: field.into(),
                line: line.into(),
            })?;
            match k.trim() {
                "x" => { x = Some(v.trim().parse().map_err(|e| {
                    XParseFailed {
                        source: e,
                        field: field.into(),
                        line: line.into(),
                    }
                })?); },
                "y" => { y = Some(v.trim().parse().map_err(|e| {
                    YParseFailed {
                        source: e,
                        field: field.into(),
                        line: line.into(),
                    }
                })?); },
                _ => {
                    throw!(UnknownField {
                        field: field.into(),
                        line: line.into(),
                    })
                },
            }
        }

        points.push(Point {
            x: x.ok_or_else(|| XNotFound { line: line.into() })?,
            y: y.ok_or_else(|| YNotFound { line: line.into() })?,
        });
    }

    points
}

fn main() {
    let d = Directory(vec![
        ("file1", "x=3,y=4\nx=1,y=2")
    ]);

    for (filename, text) in &d.0 {
        println!("{} {:?}", filename, parse_file(text));
    }
}

同时返回多个错误

需求方说，你的程序的错误报告做得非常好，不过如果能够同时报告所有错误，那就完美了。于是我们又给ParseFileError加上了List的variant：

#![feature(str_split_once)]

use std::num::ParseIntError;

use thiserror::Error;
use fehler::{throws, throw};

use self::ParseFileError::*;

#[derive(Error, Debug)]
enum ParseFileError {
    #[error("no equal sign found: `{field}` in line `{line}`")]
    NoEqualSign { field: String, line: String },
    #[error("parse x as i32 failed: `{field}` in line `{line}`")]
    XParseFailed { source: ParseIntError, field: String, line: String },
    #[error("parse y as i32 failed: `{field}` in line `{line}`")]
    YParseFailed { source: ParseIntError, field: String, line: String },
    #[error("unknown key: `{field}` in line `{line}`")]
    UnknownField { field: String, line: String},
    #[error("x not found: `{line}`")]
    XNotFound { line: String },
    #[error("y not found: `{line}`")]
    YNotFound { line: String },
    #[error("multiple errors")]
    List(Vec<ParseFileError>),
}


#[derive(Debug)]
struct Point {
    x: i32,
    y: i32,
}

#[derive(Debug)]
struct Directory(Vec<(&'static str, &'static str)>);

#[throws(ParseFileError)]
fn parse_file(text: &str) -> Vec<Point> {
    let mut points = Vec::new();
    let mut error_list = Vec::new();
    for line in text.lines() {
        let mut line_error_list = Vec::new();
        let mut x = None;
        let mut y = None;
        for field in line.split(',') {
            let (k, v) = match field.split_once('=') {
                Some((k, v)) => (k, v),
                None => {
                    line_error_list.push(NoEqualSign {
                        field: field.into(),
                        line: line.into(),
                    });
                    continue; // 后面的解析都没有意义了，直接continue。如果parse_line是一个单独的函数，这里相当于无视收集错误可以直接用?返回。懒得展示了……
                }
            };
            match k.trim() {
                "x" => {
                    // 写起来会很麻烦……
                    match v.trim().parse() {
                        Ok(xx) => x = Some(xx),
                        Err(e) => {
                            line_error_list.push(XParseFailed {
                                source: e,
                                field: field.into(),
                                line: line.into(),
                            });
                        },
                    }
                },
                "y" => {
                    // 或者可以这样“简洁”地写
                    y = v.trim().parse().map_err(|e| {
                        line_error_list.push(YParseFailed {
                            source: e,
                            field: field.into(),
                            line: line.into(),
                        });
                        () // 可省略
                    }).ok();
                },
                _ => {
                    line_error_list.push(UnknownField {
                        field: field.into(),
                        line: line.into(),
                    });
                },
            }
        }

        // 可能需求会是如果此行没有找到等号，就不用报告找不到x/y的错误了
        // 或者如果解析x时出错，也不用报告找不到x的错误
        // 这里就不处理那么多了……
        if x.is_none() {
            line_error_list.push(XNotFound {
                line: line.into(),
            });
        }
        if y.is_none() {
            line_error_list.push(YNotFound {
                line: line.into(),
            });
        }

        if line_error_list.is_empty() {
            points.push(Point {
                x: x.unwrap(),
                y: y.unwrap(),
            });
        } else {
            error_list.extend(line_error_list);
        }
    }

    if error_list.is_empty() {
        points
    } else {
        throw!(List(error_list));
    }
}

fn main() {
    let d = Directory(vec![
        ("file1", "x=3,y=4t\nx=1,y=2,z=3")
    ]);

    for (filename, text) in &d.0 {
        println!("{} {:?}", filename, parse_file(text));
    }
}

支持警告

甲方爸爸又来了，先是惯例地夸赞了一番，然后说，某些数据里除了x、y之外，还可能有z、w等等字段，它们直接忽略即可，就不用报错了。不过呢，如果能够输出一句警告，说明哪个文件的哪一行有这些字段，那就能让这个完美的工具更完美了。

@@ -3,7 +3,7 @@
 use std::num::ParseIntError;

 use thiserror::Error;
-use fehler::{throws, throw};
+use w_result::WResult;

 use self::ParseFileError::*;

@@ -35,10 +35,10 @@ struct Point {
 #[derive(Debug)]
 struct Directory(Vec<(&'static str, &'static str)>);

-#[throws(ParseFileError)]
-fn parse_file(text: &str) -> Vec<Point> {
+fn parse_file(text: &str) -> WResult<Vec<Point>, ParseFileError, ParseFileError> {
        let mut points = Vec::new();
        let mut error_list = Vec::new();
+       let mut warning_list = Vec::new();
        for line in text.lines() {
                let mut line_error_list = Vec::new();
                let mut x = None;
@@ -80,7 +80,7 @@ fn parse_file(text: &str) -> Vec<Point> {
                                        }).ok();
                                },
                                _ => {
-                                       line_error_list.push(UnknownField {
+                                       warning_list.push(UnknownField {
                                                field: field.into(),
                                                line: line.into(),
                                        });
@@ -113,18 +113,18 @@ fn parse_file(text: &str) -> Vec<Point> {
        }

        if error_list.is_empty() {
-               points
+               WResult::WOk(points, warning_list)
        } else {
-               throw!(List(error_list));
+               WResult::WErr(List(error_list))
        }
 }

这次的改动比较小，主要就是引入WResult。继放弃?之后，fehler也被放弃了。

MyResult

现在的代码已经非常别扭了。如果一开始就知道有这些需求，应该怎么设计呢？我们已经用过了标准库里的Result，第三方的WResult，试试自己定义的MyResult。下面是经过了几次整理的最终代码，自我感觉思路应该是对了。命名等比较随意就不要追究了。

#![feature(str_split_once)]
#![feature(try_trait)]

#[macro_use] extern crate shrinkwraprs;

use std::num::ParseIntError;
use std::fmt::{self, Debug};
use std::error::Error as _;
use std::ops::Try;

use thiserror::Error;


use self::ParseLineError::*;
use self::MyResult::{MyOk, MyErr};
use self::MyError::{Collected, Fatal};

#[derive(Debug)]
struct Point {
    x: i32,
    y: i32,
}

#[derive(Debug)]
struct Directory(Vec<(&'static str, &'static str)>);

#[derive(Debug, Shrinkwrap)]
#[shrinkwrap(mutable)]
struct Warnings<E>(Vec<E>);

impl<E> Warnings<E> {
    fn new() -> Self {
        Self(Vec::new())
    }
}

impl<E: Debug> From<Warnings<E>> for MyError<E> {
    fn from(warn: Warnings<E>) -> Self {
        Collected(warn.0)
    }
}

#[derive(Debug, Error)]
enum MyError<E: Debug> {
    Collected(Vec<E>),
    Fatal(E),
}

impl<E: Debug> fmt::Display for MyError<E> {
    fn fmt(&self, f: &mut fmt::Formatter<'_>) -> fmt::Result {
        write!(f, "{:?}", self)
    }
}

#[must_use]
#[derive(Debug)]
enum MyResult<T, E: Debug> {
    MyOk {
        ok: T,
        warn: Warnings<E>,
    },
    MyErr(MyError<E>),
}

impl<T, E: Debug> Try for MyResult<T, E> {
    type Ok = T;
    type Error = E;

    fn into_result(self) -> Result<T, E> {
        unimplemented!()
    }

    fn from_ok(v: T) -> Self {
        MyOk {
            ok: v,
            warn: Warnings::new(),
        }
    }

    fn from_error(v: E) -> Self {
        MyErr(Fatal(v))
    }
}

impl<T, E: Debug> MyResult<T, E> {
    fn drain_warning(self, f: &dyn Fn(E)) -> MyResult<T, E> {
        match self {
            MyOk {ok, mut warn} => {
                for w in warn.drain(..) {
                    f(w);
                }
                MyOk {
                    ok,
                    warn: Warnings::new(),
                }
            },
            MyErr(e) => MyErr(e),
        }
    }
}

#[derive(Error, Debug)]
enum ParseLineError {
    #[error("no equal sign found: `{field}`")]
    NoEqualSign { field: String },
    #[error("parse x as i32 failed: `{field}`")]
    XParseFailed { source: ParseIntError, field: String },
    #[error("parse y as i32 failed: `{field}`")]
    YParseFailed { source: ParseIntError, field: String },
    #[error("unknown key: `{field}`")]
    UnknownField { field: String},
    #[error("x not found")]
    XNotFound,
    #[error("y not found")]
    YNotFound,
}

fn ok_wrapping<T, E: Debug>(ok: impl FnOnce() -> T, warning_list: Warnings<E>, error_list: Vec<E>) -> MyResult<T, E> {
    if error_list.is_empty() {
        MyOk {
            ok: ok(),
            warn: warning_list,
        }
    } else {
        MyErr(Collected(error_list))
    }
}

fn parse_line(line: &str) -> MyResult<Point, ParseLineError> {
    let mut warning_list = Warnings::new();
    let mut error_list = Vec::new();
    let mut x = None;
    let mut y = None;
    for field in line.split(',') {
        let (k, v) = match field.split_once('=') {
            Some((k, v)) => (k, v),
            None => {
                return MyErr(Fatal(NoEqualSign {
                    field: field.into(),
                })); // 为MyResult实现了Try，也可以使用?：Err(NoEqualSign { field: field.into(), })?
            }
        };
        match k.trim() {
            "x" => {
                // 写起来会很麻烦……
                match v.trim().parse() {
                    Ok(xx) => x = Some(xx),
                    Err(e) => {
                        error_list.push(XParseFailed {
                            source: e,
                            field: field.into(),
                        });
                    },
                }
            },
            "y" => {
                // 或者可以这样“简洁”地写
                y = v.trim().parse().map_err(|e| {
                    error_list.push(YParseFailed {
                        source: e,
                        field: field.into(),
                    });
                    () // 可省略
                }).ok();
            },
            _ => {
                warning_list.push(UnknownField {
                    field: field.into(),
                });
            },
        }
    }

    // 可能需求会是如果此行没有找到等号，就不用报告找不到x/y的错误了
    // 或者如果解析x时出错，也不用报告找不到x的错误
    // 这里就不处理那么多了……
    if x.is_none() {
        error_list.push(XNotFound);
    }
    if y.is_none() {
        error_list.push(YNotFound);
    }

    ok_wrapping(|| Point {
        x: x.unwrap(),
        y: y.unwrap(),
    }, warning_list, error_list)
}

#[derive(Error, Debug)]
enum ParseFileError {
    #[error("parse line failed: `{line}`")]
    Line {line: String, source: MyError<ParseLineError>},
}

fn parse_file(text: &str) -> MyResult<Vec<Point>, ParseFileError> {
    let mut points = Vec::new();
    let mut warning_list = Warnings::new();
    let mut error_list = Vec::new();
    for line in text.lines() {
        match parse_line(line) {
            MyOk { ok, warn } => {
                points.push(ok);
                if !warn.is_empty() {
                    warning_list.push(ParseFileError::Line {
                        line: line.into(),
                        source: warn.into(),
                    });
                }
            },
            MyErr(e) => {
                error_list.push(ParseFileError::Line {
                    line: line.into(),
                    source: e,
                });
            }
        }
    }

    ok_wrapping(|| points, warning_list, error_list)
}

#[derive(Error, Debug)]
enum ParseDirectoryError {
    #[error("parse file failed: `{file}`")]
    File {file: String, source: MyError<ParseFileError>},
}

fn parse_directory(directory: &Directory) -> MyResult<Vec<Point>, ParseDirectoryError> {
    let mut points = Vec::new();
    let mut warning_list = Warnings::new();
    let mut error_list = Vec::new();

    for (filename, text) in &directory.0 {
        match parse_file(text) {
            MyOk { ok, warn } => {
                points.extend(ok);
                if !warn.is_empty() {
                    warning_list.push(ParseDirectoryError::File {
                        file: (*filename).into(),
                        source: warn.into(),
                    });
                }
            },
            MyErr(e) => {
                error_list.push(ParseDirectoryError::File {
                    file: (*filename).into(),
                    source: e,
                });
            }
        }
    }

    ok_wrapping(|| points, warning_list, error_list)
}

fn main() {
    let d = Directory(vec![
        ("file1", "x=3,y=4\nx=1,y=2,z=3")
    ]);

    let r = parse_directory(&d).drain_warning(&|e| {
        println!("{}", e);
        if let Some(s) = e.source() {
            println!("caused by: {}", s);
        }
    });

    println!("{:?}", r);
}

确实每个函数都应该有一个错误类型，甚至可以有多个，例如ParseLineError可以再抽一个ParseFieldError出来，反复出现的field字段就是征兆
错误的来源可以是多个，最终是树状结构。正如rfc2965里说的那样（RustConf 2020的报告里也提到了）：(Pain points) (TheErrortrait) can only represent singly-linked lists for chains of errors。现在的状态Error Reporter并不能太好地工作
函数独立地决定某个错误是Error还是Warning，但调用者可以把自己对应的错误放到自己的error_list或warning_list里，甚至直接丢弃。例如需求可能是，file1可以有xy之外的字段不用管，其它文件则需要警告。比起把额外的信息传入parse_line函数，在parse_directory里处理UnknownField会更清晰
Warning的类型和Error一致，不要再单独弄个枚举。如果参考log的术语，似乎还应该有info/debug/trace等等，但我暂时认为它们不需要返回给调用者
如果Rust支持匿名枚举，也许不需要特意定义MyError了

错误处理真是太累了……总之暂时先用这个思路试试看效果如何。当然绝大部分情况下我肯定都会偷懒，但至少需要返回错误列表时我会统一使用MyResult而不是一会Vec<Result<()>>一会Vec<Error>了

转自:

作者:

posted @ 2022-08-06 20:43 CrossPython 阅读(85) 评论(0) 收藏举报

刷新页面返回顶部