Filtering out 4-byte (utf8mb4) characters in Python

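    def replace_utf8mb4(self, v):
        """Replace 4-byte UTF-8 characters with U+FFFD REPLACEMENT CHARACTER."""
        import re
        # Match anything outside U+0000-U+D7FF and U+E000-U+FFFF, i.e. code
        # points above U+FFFF (4 bytes in UTF-8) and lone surrogates.
        INVALID_UTF8_RE = re.compile(u'[^\u0000-\uD7FF\uE000-\uFFFF]', re.UNICODE)
        # sub() does not modify v in place; return the new string.
        return INVALID_UTF8_RE.sub(u'\uFFFD', v)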
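
A quick way to check the behaviour is a standalone version of the same idea. This is only a sketch, assuming Python 3; the module-level function name `strip_4byte_chars` and its `replacement` parameter are hypothetical and not part of the original snippet.

```python
import re

# Same character class as above: everything that is not in
# U+0000-U+D7FF or U+E000-U+FFFF (code points above U+FFFF, lone surrogates).
_NON_BMP_RE = re.compile('[^\u0000-\uD7FF\uE000-\uFFFF]')


def strip_4byte_chars(text, replacement='\uFFFD'):
    """Return text with every matching character replaced (hypothetical helper)."""
    return _NON_BMP_RE.sub(replacement, text)


sample = 'ok \U0001F600 done'          # contains one emoji above U+FFFF
cleaned = strip_4byte_chars(sample)

# After cleaning, no character needs more than 3 bytes in UTF-8.
print(max(len(ch.encode('utf-8')) for ch in cleaned))
```

Compiling the pattern once at module level avoids recompiling it on every call; text that contains only BMP characters passes through unchanged.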

 

posted @ 2013-12-26 14:51  smallcode