
C# Regular Expressions

by Brad Merrill
01/18/2001

Introduction

Regular expressions have been used in various programming languages and tools for many years. The .NET Base Class Libraries include a namespace and a set of classes for utilizing the power of regular expressions. They are designed to be compatible with Perl 5 regular expressions whenever possible.

In addition, the regexp classes implement some additional functionality, such as named capture groups, right-to-left pattern matching, and expression compilation.
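As a rough illustration of two of those extras, the sketch below combines right-to-left matching with expression compilation. It assumes the released .NET API, where options are passed as RegexOptions flags; the class and method names (RegexOptionsDemo, LastNumber) are made up for this example.

```csharp
using System;
using System.Text.RegularExpressions;

static class RegexOptionsDemo
{
    // Returns the last run of digits in the input by scanning from the end.
    // RightToLeft starts matching at the end of the string; Compiled asks the
    // runtime to compile the expression instead of interpreting it.
    public static string LastNumber(string input)
    {
        Regex r = new Regex(@"\d+", RegexOptions.RightToLeft | RegexOptions.Compiled);
        return r.Match(input).Value;
    }

    static void Main()
    {
        Console.WriteLine(LastNumber("a1b22c333")); // prints 333
    }
}
```

Compilation trades a higher start-up cost for faster matching, so it tends to pay off only for expressions that are reused many times.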

In this article, I'll provide a quick overview of the classes and methods of the System.Text.RegularExpressions namespace, some examples of matching and replacing strings, a more detailed walk-through of a grouping structure, and finally, a set of cookbook expressions for use in your own applications.

Presumed Knowledge Base

Regular expression knowledge seems to be one of those topics that most programmers have learned and forgotten, more than once. For the purposes of this article, I will presume some previous use of regular expressions, and specifically, some experience with their use within Perl 5, as a reference point. The .NET regexp classes are a superset of Perl 5 functionality, so this will serve as a good conceptual starting point.

I'm also presuming a basic knowledge of C# syntax and the .NET Framework environment.

If you are new to regular expressions, I suggest starting with some of the basic Perl 5 introductions. The perl.com site has some great resource materials and introductory tutorials.

The definitive work on regular expressions is Mastering Regular Expressions, by Jeffrey E. F. Friedl. For those who want to get the most out of working with regular expressions, I highly recommend this book.

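The RegularExpressions Assembly

The regexp classes are contained in the System.Text.RegularExpressions.dll assembly, and you will have to reference the assembly at compile time in order to build your application. For example: csc /r:System.Text.RegularExpressions.dll foo.cs will build the foo.exe assembly, with a reference to the System.Text.RegularExpressions assembly. This reflects the pre-release framework the article was written against; in released versions of the .NET Framework the same classes ship in System.dll, which csc references by default.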

A Brief Overview of the Namespace

There are actually only six classes and one delegate definition in the assembly namespace. These are:

Capture: Contains the result of a single subexpression capture

CaptureCollection: A sequence of Capture objects

Group: The result of a single group capture, inherits from Capture

Match: The result of a single expression match, inherits from Group

MatchCollection: A sequence of Match objects

MatchEvaluator: A delegate for use during replacement operations

Regex: An instance of a compiled regular expression

The Regex class also contains several static methods:

Escape: Escapes regex metacharacters within a string

IsMatch: Returns a Boolean indicating whether the regular expression matches anywhere within the string

Match: Returns a Match instance for the first match

Matches: Returns all matches as a MatchCollection

Replace: Replaces each match of the regular expression with a replacement string

Split: Returns an array of strings produced by splitting the input at each match

Unescape: Unescapes any escaped characters within a string
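A minimal sketch of a few of these static methods in use may help. The patterns, inputs, and helper names (StaticRegexDemo, IsAllDigits, SplitCsv) are assumptions chosen for illustration, not part of the original article.

```csharp
using System;
using System.Text.RegularExpressions;

static class StaticRegexDemo
{
    // True when the whole input is one or more ASCII digits.
    public static bool IsAllDigits(string input)
    {
        return Regex.IsMatch(input, @"\A[0-9]+\z");
    }

    // Splits on commas and discards any whitespace around each comma.
    public static string[] SplitCsv(string input)
    {
        return Regex.Split(input, @"\s*,\s*");
    }

    static void Main()
    {
        Console.WriteLine(IsAllDigits("12345"));                 // True
        Console.WriteLine(string.Join("|", SplitCsv("a , b,c"))); // a|b|c
        Console.WriteLine(Regex.Escape("1+1=2"));                // 1\+1=2
        Console.WriteLine(Regex.Unescape(@"1\+1=2"));            // 1+1=2
    }
}
```

The static forms are convenient for one-off calls; when the same expression is applied repeatedly, constructing a Regex instance once and reusing it is usually the better fit.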

Simple Matches

Let's start with simple expressions using the Regex and the Match class.

    Match m = Regex.Match("abracadabra", "(a|b|r)+");

You now have an instance of a Match that can be tested for success, as in:

    if (m.Success)
...

without even looking at the contents of the matched string.

If you want to use the matched string, you can simply convert it to a string:

    Console.WriteLine("Match="+m.ToString());

This example gives us the output:

Match=abra

which is the portion of the string that has been successfully matched.

Replacing Strings

Simple string replacements are very straightforward. For example, the statement:

  string s = Regex.Replace("abracadabra", "abra", "zzzz");

returns the string zzzzcadzzzz, in which all occurrences of the matching pattern are replaced by the replacement string zzzz.

Now let's look at a more complex expression:

  string s = Regex.Replace("  abra  ", @"^\s*(.*?)\s*$", "$1");

This returns the string abra, with leading and trailing spaces removed.

The above pattern is generally useful for removing leading and trailing spaces from any string. We have also used the verbatim string literal construct in C#. Within a verbatim string, the compiler does not process the \ as an escape character. Consequently, the @"..." form is very useful when working with regular expressions, where you specify escaped metacharacters with a \. Also of note is the use of $1 as the replacement string. A replacement string is not a regular expression: apart from literal text, the only special constructs it recognizes are substitutions such as $1, which refer to capture groups in the regular expression.
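To show a replacement string that mixes literal text with substitutions, here is a small sketch using named capture groups. The date format, group names, and the DateSwapDemo/ToIso names are assumptions made up for this example.

```csharp
using System;
using System.Text.RegularExpressions;

static class DateSwapDemo
{
    // Rewrites each MM/DD/YYYY date as YYYY-MM-DD.
    // ${name} substitutes the text captured by the group called name;
    // the hyphens in the replacement are copied literally.
    public static string ToIso(string input)
    {
        return Regex.Replace(
            input,
            @"(?<month>\d{2})/(?<day>\d{2})/(?<year>\d{4})",
            "${year}-${month}-${day}");
    }

    static void Main()
    {
        Console.WriteLine(ToIso("Released 01/18/2001.")); // Released 2001-01-18.
    }
}
```

Named groups keep the replacement readable when the pattern grows, since the substitutions no longer depend on counting parentheses.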

Engine Details

Now let's try to understand a slightly more complex sample by doing a walk-through of a grouping structure. Given the following sample:

string text = "abracadabra1abracadabra2abracadabra3";
string pat = @"
(       # start the first group
abra    # match the literal 'abra'
(       # start the second (inner) group
cad     # match the literal 'cad'
)?      # end the second (optional) group
)       # end the first group
+       # match one or more occurrences
";
// ignore comments and unescaped whitespace in the pattern (the 'x' option)
Regex r = new Regex(pat, RegexOptions.IgnorePatternWhitespace);
// get the list of group numbers
int[] gnums = r.GetGroupNumbers();
// get first match
Match m = r.Match(text);
while (m.Success)
{
    // start at group 1
    for (int i = 1; i < gnums.Length; i++)
    {
        // get the group for this match
        Group g = m.Groups[gnums[i]];
        Console.WriteLine("Group" + gnums[i] + "=[" + g.ToString() + "]");
        // get caps for this group
        CaptureCollection cc = g.Captures;
        for (int j = 0; j < cc.Count; j++)
        {
            Capture c = cc[j];
            Console.WriteLine("\tCapture" + j + "=[" + c.ToString()
                + "] Index=" + c.Index + " Length=" + c.Length);
        }
    }
    // get next match
    m = m.NextMatch();
}

the output of this sample would be:

Group1=[abra]
    Capture0=[abracad] Index=0 Length=7
    Capture1=[abra] Index=7 Length=4
Group2=[cad]
    Capture0=[cad] Index=4 Length=3
Group1=[abra]
    Capture0=[abracad] Index=12 Length=7
    Capture1=[abra] Index=19 Length=4
Group2=[cad]
    Capture0=[cad] Index=16 Length=3
Group1=[abra]
    Capture0=[abracad] Index=24 Length=7
    Capture1=[abra] Index=31 Length=4
Group2=[cad]
    Capture0=[cad] Index=28 Length=3

Let's start by examining the string pat, which contains the regular expression. The first capture group is marked by the first parenthesis, and then the expression will match an abra, if the regex engine matches the expression to that which is found in the text. Then the second capture group, marked by the second parenthesis, begins, but the definition of the first capture group is still ongoing. What this tells us is that the first group must match abracad and the second group would just match the cad. So, if you decide to make the cad match an optional occurrence with the ? metacharacter, then abra or abracad will be matched. Next, you end the first group, and ask the expression to match 1 or more occurrences by specifying the + metacharacter.

Now let's examine what happens during the matching process. First, create an instance of the expression by calling the Regex constructor, which is also where you specify your options. In this case, I'm using the x option (exposed as RegexOptions.IgnorePatternWhitespace in released versions of .NET), as I have included comments in the regular expression itself, and some whitespace for formatting purposes. With the x option turned on, the engine ignores the comments and all whitespace that I have not explicitly escaped.

Next, get the list of group numbers (gnums) defined in this regular expression. You could also have used these numbers explicitly, but this provides you with a programmatic method. This method is also useful if you have specified named groups, as a way of quickly indexing through the set of groups.

Next, perform the first match. Then enter a loop testing for success of the current match. The next step is to iterate through the list of groups starting at group 1. The reason you do not use group 0 in this sample is that group 0 is the fully captured match string, and what you usually (but not always) want to pick out of a string is a subgroup. You might use group 0 if you wanted to collect the fully matched string as a single string.

Within each group, iterate through the CaptureCollection. There is usually only one capture per match, per group, but in this case, for Group1, two captures show: Capture0 and Capture1. And if you had asked for only the ToString of Group1, you would have received abra, although it also did match the abracad. The group ToString value will be the value of the last Capture in its CaptureCollection. This is the expected behavior, and if you want the match to stop after just the abra, you would remove the + from the expression, telling the regex engine to match on just the expression.

Procedural-Based vs. Expression-Based

Generally, the users of regular expressions will tend to fall into one of two groups.

The first group tends to use minimal regular expressions that provide matching or grouping behaviors, and then write procedural code to perform some iterative behavior.

The second group tries to utilize the maximum power and functionality of the expression-processing engine itself, with as little procedural logic as possible.

For most of us, the best answer is somewhere in between, and I hope this article outlines both the capabilities of the .NET regexp classes and the trade-offs in complexity and performance of each approach.

Procedural-Based Patterns

A common processing need is to match certain parts of a string and perform some processing. So, here's an example that matches words within a string and capitalizes them:

string text = "the quick red fox jumped over the lazy brown dog.";
System.Console.WriteLine("text=[" + text + "]");
string result = "";
string pattern = @"\w+|\W+";
foreach (Match m in Regex.Matches(text, pattern))
{
    // get the matched string
    string x = m.ToString();
    // if the first char is lower case
    if (char.IsLower(x[0]))
        // capitalize it
        x = char.ToUpper(x[0]) + x.Substring(1, x.Length-1);
    // collect all text
    result += x;
}
System.Console.WriteLine("result=[" + result + "]");

As you can see, you use the C# foreach statement to iterate over the set of matches found and process each one, in this case building a new result string.

The output of the sample is:

text=[the quick red fox jumped over the lazy brown dog.]
result=[The Quick Red Fox Jumped Over The Lazy Brown Dog.]

Expression-Based Patterns

Another way to implement the above example is by providing a MatchEvaluator, a delegate that Regex.Replace calls once for each match to compute the replacement text, so the whole input is processed in a single call.

So the new sample looks like:

class Test
{
    static string CapText(Match m)
    {
        // get the matched string
        string x = m.ToString();
        // if the first char is lower case
        if (char.IsLower(x[0]))
            // capitalize it
            return char.ToUpper(x[0]) + x.Substring(1, x.Length-1);
        return x;
    }
    static void Main()
    {
        string text = "the quick red fox jumped over the lazy brown dog.";
        System.Console.WriteLine("text=[" + text + "]");
        string pattern = @"\w+";
        // CapText is invoked once per match and returns the replacement
        string result = Regex.Replace(text, pattern,
            new MatchEvaluator(Test.CapText));
        System.Console.WriteLine("result=[" + result + "]");
    }
}

Also of note is that the pattern was simplified since I only needed to modify the words and not the non-words.

Cookbook Expressions

To wrap up this overview of how regular expressions are used in the C# environment, I'll leave you with a set of useful expressions that have been used in other environments. I got them from a great book, the Perl Cookbook, by Tom Christiansen and Nathan Torkington, and updated them for C# programmers. I hope you find them useful.

Roman Numbers

string p1 = "^m*(d?c{0,3}|c[dm])"
+ "(l?x{0,3}|x[lc])(v?i{0,3}|i[vx])$";
string t1 = "vii";
Match m1 = Regex.Match(t1, p1);

Swapping First Two Words

string t2 = "the quick brown fox";
string p2 = @"(\S+)(\s+)(\S+)";
Regex x2 = new Regex(p2);
string r2 = x2.Replace(t2, "$3$2$1", 1);

Keyword = Value

string t3 = "myval = 3";
string p3 = @"(\w+)\s*=\s*(.*)\s*$";
Match m3 = Regex.Match(t3, p3);

Line of at Least 80 Characters

string t4 = "********************"
+ "******************************"
+ "******************************";
string p4 = ".{80,}";
Match m4 = Regex.Match(t4, p4);

MM/DD/YY HH:MM:SS

string t5 = "01/01/01 16:10:01";
string p5 =
@"(\d+)/(\d+)/(\d+) (\d+):(\d+):(\d+)";
Match m5 = Regex.Match(t5, p5);

Changing Directories (for Windows)

string t6 =
@"C:\Documents and Settings\user1\Desktop\";
string r6 = Regex.Replace(t6,
@"\\user1\\",
@"\user2\"); // replacement text is literal: backslashes are not escaped there

Expanding (%nn) Hex Escapes

string t7 = "%41"; // capital A
string p7 = "%([0-9A-Fa-f][0-9A-Fa-f])";
// uses a MatchEvaluator delegate
string r7 = Regex.Replace(t7, p7,
HexConvert);
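
The article does not show the body of HexConvert. One plausible sketch follows, under the assumption that each %nn escape stands for a single character code (so it does not decode multi-byte UTF-8 sequences); the HexEscapeDemo and Expand names are hypothetical wrappers added for this example.

```csharp
using System;
using System.Text.RegularExpressions;

static class HexEscapeDemo
{
    // Hypothetical evaluator: converts the two hex digits captured in
    // group 1 into the character with that code.
    public static string HexConvert(Match m)
    {
        int code = Convert.ToInt32(m.Groups[1].Value, 16);
        return ((char)code).ToString();
    }

    // Expands every %nn escape in the input using the evaluator above.
    public static string Expand(string input)
    {
        return Regex.Replace(input, "%([0-9A-Fa-f][0-9A-Fa-f])",
            new MatchEvaluator(HexConvert));
    }

    static void Main()
    {
        Console.WriteLine(Expand("%41%42C")); // prints ABC
    }
}
```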

Deleting C Comments (Imperfectly)

string t8 = @"
/*
* this is an old cstyle comment block
*/
";
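string p8 = @"
/\*  # match the opening delimiter
.*?  # match a minimal number of characters
\*/  # match the closing delimiter
";
// 'x' ignores pattern whitespace and comments; 's' lets . match newlines
string r8 = Regex.Replace(t8, p8, "",
RegexOptions.IgnorePatternWhitespace | RegexOptions.Singleline);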

Removing Leading and Trailing Whitespace

string t9a = "   leading";
string p9a = @"^\s+";
string r9a = Regex.Replace(t9a, p9a, "");
string t9b = "trailing  ";
string p9b = @"\s+$";
string r9b = Regex.Replace(t9b, p9b, "");

Turning '\' Followed by 'n' Into a Real Newline

string t10 = @"\ntest\n";
string r10 = Regex.Replace(t10, @"\\n", "\n");

IP Address

string t11 = "55.54.53.52";
string p11 = "^" +
// \d\d? also accepts single-digit octets such as the 1 in 10.0.0.1
@"([01]?\d\d?|2[0-4]\d|25[0-5])\." +
@"([01]?\d\d?|2[0-4]\d|25[0-5])\." +
@"([01]?\d\d?|2[0-4]\d|25[0-5])\." +
@"([01]?\d\d?|2[0-4]\d|25[0-5])" +
"$";
Match m11 = Regex.Match(t11, p11);

Removing Leading Path from Filename

string t12 = @"c:\file.txt";
string p12 = @"^.*\\";
string r12 = Regex.Replace(t12, p12, "");

Joining Lines in Multiline Strings

string t13 = @"this is
a split line";
string p13 = @"\s*\r?\n\s*";
string r13 = Regex.Replace(t13, p13, " ");

Extracting All Numbers from a String

string t14 = @"
test 1
test 2.3
test 47
";
string p14 = @"(\d+\.?\d*|\.\d+)";
MatchCollection mc14 = Regex.Matches(t14, p14);

Finding All Caps Words

string t15 = "This IS a Test OF ALL Caps";
string p15 = @"(\b[^\Wa-z0-9_]+\b)";
MatchCollection mc15 = Regex.Matches(t15, p15);

Finding All Lowercase Words

string t16 = "This is A Test of lowercase";
string p16 = @"(\b[^\WA-Z0-9_]+\b)";
MatchCollection mc16 = Regex.Matches(t16, p16);

Finding All Initial Caps

string t17 = "This is A Test of Initial Caps";
string p17 = @"(\b[^\Wa-z0-9_][^\WA-Z0-9_]*\b)";
MatchCollection mc17 = Regex.Matches(t17, p17);

Finding Links in Simple HTML

string t18 = @"
<html>
<a href=""first.htm"">first tag text</a>
<a href=""next.htm"">next tag text</a>
</html>
";
string p18 = @"<A[^>]*?HREF\s*=\s*[""']?"
+ @"([^'"" >]+?)[ '""]?>";
MatchCollection mc18 = Regex.Matches(t18, p18,
RegexOptions.Singleline | RegexOptions.IgnoreCase);

Finding Middle Initial

string t19 = "Hanley A. Strappman";
string p19 = @"^\S+\s+(\S)\S*\s+\S";
Match m19 = Regex.Match(t19, p19);

Changing Inch Marks to Quotes

string t20 = @"2' 2"" ";
string p20 = "\"([^\"]*)";
string r20 = Regex.Replace(t20, p20, "``$1''");
posted @ 2008-12-26 15:34 Kevan Views (310) Comments (0)
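Sometimes I find JavaScript snippets or libraries online whose formatting is ugly and hard to read, so I looked for a few handy JavaScript formatting tools:

In addition, I found the Prettyprint article on Wikipedia, which lists more references related to source code formatting. Take a look if you are interested.

posted @ 2008-11-19 20:56 Kevan Views (1136) Comments (0)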

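URL rewriting on this site produces addresses such as

http://127.0.0.1/playflash/1_2_3_4_*****.html

or

http://127.0.0.1/playflash/1/2/3/4/5/*****/

Once the URL, counted from "/playflash/", reaches a length of 260 characters, IIS reports the following error:

Bad Request (Invalid URL)

After searching Google, I followed the guidance at

http://support.microsoft.com/kb/820129

and changed the HTTP.SYS registry settings, but after restarting the service the change still had no effect.

I then went through the search results more carefully and found this setting:

Registry value: UrlSegmentMaxLength. Default: 260. Valid range: 0 - 32,766 characters. Description: the maximum number of characters in a URL path segment (the part of the URL between slashes). If zero, the length is bounded by the maximum value of a ULONG.

Even after setting it to the maximum, it has no effect under ASP.NET; the 260 limit still applies.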
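
To tell which limit a given rewritten URL runs into, it can help to measure both the whole path and its longest segment. The following is only a diagnostic sketch: the UrlLengthCheck class and its method names are made up for this example, and the 260 threshold is the default quoted above.

```csharp
using System;

static class UrlLengthCheck
{
    // Length of the path part of the URL, e.g. "/playflash/1_2_3.html".
    public static int PathLength(string url)
    {
        return new Uri(url).AbsolutePath.Length;
    }

    // Length of the longest segment, i.e. the longest run between slashes.
    public static int LongestSegment(string url)
    {
        int longest = 0;
        foreach (string segment in new Uri(url).AbsolutePath.Split('/'))
        {
            if (segment.Length > longest) longest = segment.Length;
        }
        return longest;
    }

    static void Main()
    {
        string url = "http://127.0.0.1/playflash/" + new string('a', 300) + ".html";
        Console.WriteLine("path length = " + PathLength(url));
        Console.WriteLine("longest segment = " + LongestSegment(url));
        Console.WriteLine("over 260 = " + (LongestSegment(url) > 260));
    }
}
```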

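How do you all deal with the length limit on rewritten URLs in ASP.NET?

What I would like to know is whether ASP.NET has a configuration setting for the maximum request URL length.

I searched further and found that this can be configured in IIS 7.

But what about IIS 6? I could not find a solution, so I am posting this on the front page to ask for help.

Solved. HTTP requests appear to follow the Windows directory-path rule: a path may be at most 260 characters.

Windows limits on long file path names

As is well known, Microsoft's file systems have evolved from FAT to FAT32 to NTFS. Leaving aside the improvements in security and file organization, file names alone have moved from the old DOS 8.3 format (at most 8 characters for the name and 3 for the extension) to names of up to 255 characters. For path length, NTFS supports path names of roughly 32,767 characters.

However, the Windows operating system has not fully lifted the restriction on path length. In windef.h you can find the following macro:

#define MAX_PATH 260

In practice, most Windows API functions follow this limit by default. So when we try to rename a file and the full path reaches a certain length, no further input is accepted even though the file name itself is still under the 255-character limit. This is because the operating system does not allow a full file path of 260 characters.

In real applications, this 260-character limit on the full path is a considerable inconvenience for developers. Consider the following case: we want to add a local cache to an application server that keeps a local copy of files from a remote server. A reasonable implementation maps the URL to a file name, so a long URL yields a long cache file name. When the name exceeds 255 characters, we can use the first 255 characters of the mapped name as a directory name, but that still does not get around the 260-character limit on the full path. In addition, if an application's directory structure is too deep, some file names (including the path) can easily exceed 260 characters, causing installation or removal to fail. In short, this limit causes a good deal of trouble in development and testing.

posted @ 2008-10-25 16:22 Kevan Views (976) Comments (4)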

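Responsibilities: develop program modules for the 7K7K.com community

Requirements:

1. Proficient in .NET and SQL Server databases
2. A team player
3. Responsible and careful in your work, with the ability to deliver efficiently
4. Community development experience preferred
5. Two or more years of experience developing large websites preferred
6. Experience with object-oriented system architecture preferred

------Contact details

Address: Room 1017, Yinke Building, 38 Haidian Street, Haidian District
Contact: 刘学
Phone: 13671177255

EMAIL:lvaichun@gmail.com

------Compensation will be settled in an interview. If you have the skills, you are welcome to apply.

posted @ 2008-07-28 12:48 Kevan Views (1180) Comments (17)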

When migrating data, you can copy the identity (auto-number) values too!
SET IDENTITY_INSERT DATA ON
INSERT INTO DATA(ID, A1, A2) SELECT ID, UserName, Email FROM OLDDATA
-- switch it back off once the copy is done
SET IDENTITY_INSERT DATA OFF


A. Renaming a table
  The following example renames the table customers to custs.

  EXEC sp_rename 'customers', 'custs'

B. Renaming a column
  The following example renames the column contact title in the table customers to title.

  EXEC sp_rename 'customers.[contact title]', 'title', 'COLUMN'


C. Dropping a column

ALTER TABLE table_name DROP COLUMN column_name

D. Adding a column

ALTER TABLE table_name ADD column_name VARCHAR(20) NULL

---------The following creates a new column, copies the old column's data into it, and then drops the old column---------

  --add the new column
  alter table table_A add column_C varchar(20)
  go
  --copy the data
  update table_A set column_C = column_B
  go
  --add a default
  alter table table_A add constraint myColumnDef default 'c' for column_C
  go
  --drop the old default (sysname is long enough for generated DF__ names)
  declare @name sysname
  select @name = b.name from syscolumns a, sysobjects b where a.id = object_id('table_A') and b.id = a.cdefault and a.name = 'column_B' and b.name like 'DF%'
  exec('alter table table_A drop constraint ' + @name)
  go
  --drop the old column
  alter table table_A drop column column_B
  go

----------------The following drops a column that has a default value-------------

declare @name sysname

select @name = b.name from sysobjects b join syscolumns a on

b.id = a.cdefault where a.id = object_id('table_name') and a.name = 'column_name'

exec('alter table table_name drop constraint ' + @name)

Once the column's default has been removed (strictly speaking, the column constraint), running alter table table_name drop column column_name drops the column. You can of course run the whole script in one go.

posted @ 2008-06-27 14:21 Kevan Views (294) Comments (0)
posted @ 2008-06-26 13:51 Kevan Views (561) Comments (0)
posted @ 2008-06-24 10:45 Kevan Views (1938) Comments (1)
posted @ 2008-06-20 22:36 Kevan Views (1552) Comments (4)
posted @ 2008-06-18 16:35 Kevan Views (386) Comments (0)
posted @ 2008-06-18 15:39 Kevan Views (152) Comments (0)