ASCII字符与非ASCII字符的正则

private static System.Text.RegularExpressions.Regex regex = new Regex("([\u0000-\uffff])");
        private static System.Text.RegularExpressions.Regex cRegex = new Regex("([\u0100-\uffff])");

private static System.Text.RegularExpressions.Regex eRegex = new Regex(([\u0000-\u00ff])");

这三个正则表达式是用来匹配 Unicode 字符的，它们的作用如下：

1️⃣ regex = new Regex("([\u0000-\uffff])");
🔍 解析
\u0000-\uffff：表示所有的 Unicode 字符（U+0000 到 U+FFFF）。
([\u0000-\uffff])：
方括号 []：表示匹配字符范围。
小括号 ()：表示捕获分组（grouping），会将匹配到的字符存入分组。
\u0000：Unicode 编码的起始字符（NULL）。
\uffff：基本多文种平面（BMP，Basic Multilingual Plane）的最后一个字符（特殊字符）。
🎯 作用
这个正则匹配任何单个 Unicode 字符，相当于 .（匹配所有字符，包括中文、英文、特殊符号等）。

✅ 匹配示例

"A" ✅
"汉" ✅
"你" ✅
"🌍" ✅
"😀" ✅
"\n" ✅
⚠️ 注意：

这个正则只能匹配 Unicode BMP 平面（U+0000 ~ U+FFFF），无法匹配 U+10000 及以上的字符（如𠜎、𡃁这样的增补字符）。
如果想匹配完整的 Unicode，应该使用 RegexOptions.Singleline 或者 \p{L} 等 Unicode 字符分类。
2️⃣ cRegex = new Regex("([\u0100-\uffff])");
🔍 解析
\u0100-\uffff：匹配 U+0100 及以上的字符（非 ASCII 但仍在 BMP 内的字符）。
([\u0100-\uffff])：
只匹配 U+0100 ~ U+FFFF 之间的字符（跳过 U+0000 ~ U+00FF）。
也就是跳过了 ASCII 及扩展 ASCII（U+0000 ~ U+00FF），仅匹配非 ASCII 的 Unicode 字符。
🎯 作用
匹配中文、日文、韩文、阿拉伯文等非 ASCII 语言字符。
不会匹配 ASCII 字符（如 A-Z, a-z, 0-9, 标点符号）。
不会匹配扩展 ASCII（西欧特殊符号，如 é, ñ, ü）。
✅ 匹配示例

"汉" ✅
"你" ✅
"😀" ✅
"🌍" ✅
"é" ❌（U+00E9，在 ASCII 扩展范围内）
"A" ❌（U+0041，在 ASCII 范围内）
3️⃣ eRegex = new Regex("([\u0000-\u00ff])");
🔍 解析
\u0000-\u00ff：匹配 U+0000 到 U+00FF 之间的字符（即 ASCII + 扩展 ASCII）。
([\u0000-\u00ff])：
只匹配基本 ASCII + 扩展 ASCII，不会匹配 U+0100 以上的 Unicode 字符。
🎯 作用
匹配 ASCII（U+0000 ~ U+007F）：
A-Z, a-z, 0-9, 标点符号（.,;!?）。
匹配扩展 ASCII（U+0080 ~ U+00FF）：
法语、西班牙语、德语等拉丁字符（é, ñ, ü, ç）。
部分特殊符号。
✅ 匹配示例

"A" ✅
"9" ✅
"," ✅
"é" ✅（U+00E9，在扩展 ASCII 内）
"汉" ❌（U+6C49，不在 U+00FF 内）
"你" ❌（U+4F60，不在 U+00FF 内）
🔥 综述

📌 代码示例

using System;
using System.Text.RegularExpressions;

class Program
{
    private static Regex regex = new Regex("([\u0000-\uffff])"); // 所有 Unicode
    private static Regex cRegex = new Regex("([\u0100-\uffff])"); // 仅非 ASCII
    private static Regex eRegex = new Regex("([\u0000-\u00ff])"); // 仅 ASCII 和扩展 ASCII

    static void Main()
    {
        string test1 = "Hello 世界!";
        string test2 = "École française";
        string test3 = "123456";
        
        Console.WriteLine($"regex  ({test1}): {regex.IsMatch(test1)}");   // ✅ true
        Console.WriteLine($"cRegex ({test1}): {cRegex.IsMatch(test1)}");  // ✅ true （匹配 "世界"）
        Console.WriteLine($"eRegex ({test1}): {eRegex.IsMatch(test1)}");  // ✅ true （匹配 "Hello"）

        Console.WriteLine($"regex  ({test2}): {regex.IsMatch(test2)}");   // ✅ true
        Console.WriteLine($"cRegex ({test2}): {cRegex.IsMatch(test2)}");  // ❌ false （"É" 在 U+00FF 内）
        Console.WriteLine($"eRegex ({test2}): {eRegex.IsMatch(test2)}");  // ✅ true
        
        Console.WriteLine($"regex  ({test3}): {regex.IsMatch(test3)}");   // ✅ true
        Console.WriteLine($"cRegex ({test3}): {cRegex.IsMatch(test3)}");  // ❌ false （全是 ASCII）
        Console.WriteLine($"eRegex ({test3}): {eRegex.IsMatch(test3)}");  // ✅ true
    }
}

📌 结论
regex（[\u0000-\uffff]）：匹配所有 Unicode 字符。
cRegex（[\u0100-\uffff]）：匹配非 ASCII Unicode 字符（如中文、日文、韩文）。
eRegex（[\u0000-\u00ff]）：匹配ASCII 和扩展 ASCII（如英文、数字、西欧字符）。
如果你需要匹配中文字符，应该使用：

Regex chineseRegex = new Regex("([\u4E00-\u9FFF])"); // 匹配中文字符

posted @ 2025-03-15 14:24 JohnYang819 阅读(197) 评论(0) 收藏举报

刷新页面返回顶部

JohnYang

定心者，勇猛精进，终获证果 （知乎：https://www.zhihu.com/people/johnyang819）

ASCII字符与非ASCII字符的正则

定心者，勇猛精进，终获证果（知乎：https://www.zhihu.com/people/johnyang819）