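URL Encoding
The goal of this article is to design a C++ class that performs URL encoding.
In a previous project, I needed to POST data from a VC++ 6.0 application, and that data had to be URL-encoded.
I searched MSDN for a class or API that would produce the URL-encoded form of a given string, but found none, so I had to write my own URLEncode C++ class.
URLEncoder.exe is an MFC dialog application that uses the URLEncode class.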
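How It Works
Some special characters are tricky to transmit over the Internet. URL encoding handles them specially so that every character can be sent safely.
For example, the carriage return has ASCII value 13, and when form data is sent it is interpreted as the end of a line of data.
Applications typically use HTTP or HTTPS to transfer data between client and server. The server can receive data from the client in two basic ways:
1. The data can be sent in the HTTP headers (cookies) or posted as form data.
2. The data can be included in the query part of the URL.
When data is included in a URL, it must be encoded according to URL syntax.
On the web server, the data is decoded automatically. Consider the following URL, in which data is passed as a query parameter.
Example: http://WebSite/ResourceName?Data=Data
WebSite is the site name in the URL.
ResourceName can be the name of an ASP page or a servlet.
Data is the data to be sent. If the MIME type is Content-Type: application/x-www-form-urlencoded, the data must be encoded.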
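To illustrate what the automatic decoding on the server might look like, here is a minimal sketch in standard C++ (std::string rather than MFC). The helper names hexDigitValue and formDecode are hypothetical and are not part of the article's class. Treating '+' as a space follows the application/x-www-form-urlencoded convention.

```cpp
#include <cstddef>
#include <string>

// Hypothetical helper: value of one hex digit, or -1 if c is not a hex digit.
int hexDigitValue(char c) {
    if (c >= '0' && c <= '9') return c - '0';
    if (c >= 'a' && c <= 'f') return c - 'a' + 10;
    if (c >= 'A' && c <= 'F') return c - 'A' + 10;
    return -1;
}

// Hypothetical helper: decodes a form-urlencoded value.
// "%XX" becomes the byte 0xXX and '+' becomes a space.
// Malformed escapes (for example a trailing '%') are copied through unchanged.
std::string formDecode(const std::string& in) {
    std::string out;
    for (std::size_t i = 0; i < in.size(); ++i) {
        if (in[i] == '%' && i + 2 < in.size()) {
            const int hi = hexDigitValue(in[i + 1]);
            const int lo = hexDigitValue(in[i + 2]);
            if (hi >= 0 && lo >= 0) {
                out += static_cast<char>(hi * 16 + lo);
                i += 2;  // skip the two hex digits just consumed
                continue;
            }
        }
        out += (in[i] == '+') ? ' ' : in[i];
    }
    return out;
}
```

With this sketch, a query value such as Data=a%20b would be read back as "a b". A real server framework normally does this step for you.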
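RFC 1738
RFC 1738 specifies that the characters in a Uniform Resource Locator (URL) must be a subset of the US-ASCII character set. HTML, on the other hand, allows the entire ISO-8859-1 (ISO-Latin) character set in documents. This means that in data POSTed from an HTML form (or sent as part of a query string), every character outside the allowed URL subset must be encoded.
ISO-8859-1 (ISO-Latin) Character Set
The table below covers the complete ISO-8859-1 (ISO-Latin) character set. For each character range (in decimal) it gives the type of character, the characters in the range, and whether the range is safe or unsafe.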
Character range (decimal) | Type | Values | Safe/Unsafe |
0-31 | ASCII Control Characters | These characters are not printable | Unsafe |
32-47 | Reserved Characters | (space)!"#$%&'()*+,-./ | Unsafe |
48-57 | ASCII Characters and Numbers | 0-9 | Safe |
58-64 | Reserved Characters | :;<=>?@ | Unsafe |
65-90 | ASCII Characters | A-Z | Safe |
91-96 | Reserved Characters | [\]^_` | Unsafe |
97-122 | ASCII Characters | a-z | Safe |
123-126 | Reserved Characters | {|}~ | Unsafe |
127 | Control Characters | DEL (not printable) | Unsafe |
128-255 | Non-ASCII Characters | Characters outside US-ASCII | Unsafe |
All unsafe ASCII characters must be encoded, for example those in the ranges 32-47, 58-64, 91-96, and 123-126.
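The safe ranges in the table above can be expressed as a small predicate. This is only a sketch in standard C++. The name isSafeByTable is an assumption and does not appear in the CURLEncode class, which uses a different check.

```cpp
// Hypothetical helper: returns true when the table marks the character code
// as safe, i.e. digits (48-57), upper-case letters (65-90), or
// lower-case letters (97-122). Every other code 0-255 is treated as unsafe.
bool isSafeByTable(unsigned char code) {
    return (code >= 48 && code <= 57) ||
           (code >= 65 && code <= 90) ||
           (code >= 97 && code <= 122);
}
```

Under this rule isSafeByTable('A') is true, while a space (32) or '~' (126) is reported as unsafe.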
The following table explains why these characters are unsafe.
Character | Unsafe Reason | Encoding |
< | Delimiters around URLs in free text | %3C |
> | Delimiters around URLs in free text | %3E |
" | Delimits URLs in some systems | %22 |
# | It is used in the World Wide Web and in other systems to delimit a URL from a fragment/anchor identifier that might follow it. | %23 |
{ | Gateways and other transport agents are known to sometimes modify such characters | %7B |
} | Gateways and other transport agents are known to sometimes modify such characters | %7D |
| | Gateways and other transport agents are known to sometimes modify such characters | %7C |
\ | Gateways and other transport agents are known to sometimes modify such characters | %5C |
^ | Gateways and other transport agents are known to sometimes modify such characters | %5E |
~ | Gateways and other transport agents are known to sometimes modify such characters | %7E |
[ | Gateways and other transport agents are known to sometimes modify such characters | %5B |
] | Gateways and other transport agents are known to sometimes modify such characters | %5D |
` | Gateways and other transport agents are known to sometimes modify such characters | %60 |
+ | Indicates a space in form-encoded data (a literal space cannot be used in a URL and is itself encoded as %20) | %2B |
/ | Separates directories and subdirectories | %2F |
? | Separates the actual URL and the parameters | %3F |
& | Separator between parameters specified in the URL | %26 |
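How It's Done
URL-encoding a character means converting its 8-bit value to two hexadecimal digits and prefixing them with '%'. For example, in the US-ASCII character set the space character is decimal 32, or hexadecimal 20, so its URL encoding is %20.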
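The conversion described above can be sketched in a few lines of standard C++. The function name percentEncodeByte is hypothetical and is used only for illustration. The article's own class does this with decToHex and convert on CString.

```cpp
#include <string>

// Hypothetical helper: percent-encodes a single byte as "%XX",
// where XX is the byte's value as two upper-case hexadecimal digits.
std::string percentEncodeByte(unsigned char byte) {
    static const char kHexDigits[] = "0123456789ABCDEF";
    std::string out = "%";
    out += kHexDigits[byte >> 4];    // high four bits
    out += kHexDigits[byte & 0x0F];  // low four bits
    return out;
}
```

For instance, percentEncodeByte(' ') yields "%20" and percentEncodeByte(13) yields "%0D" for the carriage return.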
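URLEncode: CURLEncode is a C++ class that implements URL encoding of strings. The CURLEncode class contains the following functions:
isUnsafe
decToHex
convert
URLEncode
The URLEncode() function carries out the encoding. It examines each character to see whether it is safe. If a character is unsafe, it is converted to '%' followed by its hexadecimal value, and that sequence is added to the result string in place of the original character.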
Code snippet:
class CURLEncode
{
private:
    static CString csUnsafeString;   // characters that are always encoded
    CString decToHex(char num, int radix);
    bool isUnsafe(char compareChar);
    CString convert(char val);

public:
    CURLEncode() { }
    virtual ~CURLEncode() { }
    CString URLEncode(CString vData);
};

// Returns true if compareChar must be percent-encoded.
bool CURLEncode::isUnsafe(char compareChar)
{
    // Characters listed in csUnsafeString are always unsafe.
    if (csUnsafeString.Find(compareChar) != -1)
        return true;

    // Otherwise only codes 33..122 are passed through unchanged.
    int char_ascii_value = (int) compareChar;
    return !(char_ascii_value > 32 && char_ascii_value < 123);
}

// Converts a byte to its digits in the given radix, padded to two digits.
// hexVals is the digit lookup table; its definition is not part of this snippet.
CString CURLEncode::decToHex(char num, int radix)
{
    CString csTmp;
    int num_char = (int) num;
    if (num_char < 0)
        num_char = 256 + num_char;   // map negative (signed) chars to 128..255

    // Collect the digits least-significant first.
    while (num_char >= radix)
    {
        int temp = num_char % radix;
        num_char = num_char / radix;
        csTmp += hexVals[temp];
    }
    csTmp += hexVals[num_char];

    if (csTmp.GetLength() < 2)
        csTmp += '0';                // becomes the leading zero after reversal

    csTmp.MakeReverse();             // digits were collected in reverse order
    return csTmp;
}

// Returns the percent-encoded form of val, for example "%20" for a space.
CString CURLEncode::convert(char val)
{
    CString csRet;
    csRet += "%";
    csRet += decToHex(val, 16);
    return csRet;
}
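The snippet above does not include the body of URLEncode() and depends on MFC's CString. As a rough illustration of the loop it describes, here is a sketch in standard C++ using std::string. The free-function names and the contents of kUnsafeChars are assumptions: the article does not show the value of csUnsafeString, so the list below is illustrative only.

```cpp
#include <string>

// Assumed list of characters that are always escaped (illustrative only;
// the article's real csUnsafeString value is not shown).
static const std::string kUnsafeChars = "\"<>%\\^[]`+$,@:;/!#?=&";

// Mirrors the article's isUnsafe(): a character is unsafe if it is in the
// list above or its code is outside 33..122.
bool isUnsafe(char c) {
    if (kUnsafeChars.find(c) != std::string::npos) return true;
    const int code = static_cast<unsigned char>(c);
    return code <= 32 || code >= 123;
}

// Percent-encodes one byte as "%XX" with upper-case hex digits.
std::string convert(char c) {
    static const char kHexDigits[] = "0123456789ABCDEF";
    const unsigned char byte = static_cast<unsigned char>(c);
    std::string out = "%";
    out += kHexDigits[byte >> 4];
    out += kHexDigits[byte & 0x0F];
    return out;
}

// Walks the input once, copying safe characters and escaping unsafe ones.
std::string urlEncode(const std::string& data) {
    std::string out;
    for (char c : data) {
        if (isUnsafe(c)) {
            out += convert(c);
        } else {
            out += c;
        }
    }
    return out;
}
```

With these assumptions, urlEncode("x=1&y=2") produces "x%3D1%26y%3D2", so it is meant for encoding an individual value rather than a complete URL.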
References:
URL encoding: http://www.blooberry.com/indexdot/html/topics/urlencoding.htm
RFC 1866: the HTML 2.0 specification (plain text). The appendix contains the character table: http://www.rfc-editor.org/rfc/rfc1866.txt
HTML 2.0 web version (RFC 1866): http://www.w3.org/MarkUp/html-spec/html-spec_13.html
The HTML 3.2 (Wilbur) recommendation: http://www.w3.org/MarkUp/Wilbur/
The HTML 4.0 recommendation: http://www.w3.org/TR/REC-html40/
W3C HTML internationalization area: http://www.w3.org/International/O-HTML.html