String

The String class is one of the classes we use most often.

This article is based on the JDK 1.6 source.

Let's start with what makes String special:

public final class String implements java.io.Serializable, Comparable<String>, CharSequence

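As the declaration shows, String is marked final, so it cannot be subclassed.

Since nothing can extend it, none of its methods can be overridden either, which makes them effectively final.

Note: although no method of a final class can be overridden, its member variables are not automatically final. Many people assume that every field of a final class must be final as well, but that is not the case.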
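To make the note above concrete, here is a minimal sketch. The Counter class is hypothetical and exists only for this illustration; it shows a final class whose field is still mutable.

```java
// Hypothetical class for illustration only; it is not part of the JDK.
final class Counter {
    private int hits; // not final: a final class may still declare mutable fields

    int increment() {
        hits++;
        return hits;
    }
}

class FinalClassDemo {
    public static void main(String[] args) {
        Counter counter = new Counter();
        counter.increment();
        System.out.println(counter.increment()); // 2
        // class SubCounter extends Counter {}   // would not compile: Counter is final
    }
}
```

The final keyword on the class only forbids subclassing. Whether a field can change is decided by the field's own modifiers.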

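Strings defined by String are immutable.

Everyone knows String is the class used to represent strings. But what does a string look like in Java's underlying structure?

Let's look at the 5 constants and 1 variable of the Java 1.6 String class that are discussed below. (These fields are from JDK 1.6. Later releases have fewer of them, because offset and count were removed starting with JDK 7 update 6.)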

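The value array

private final char value[];

The value is used for character storage. This is the comment from the source, and it says plainly that value is where the string's characters are stored.

In this version, every string defined in Java is stored in the form of this char array. For example: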

String str = "bj";

The string in the code above is stored inside String like this:

    index:  0    1

    value: 'b'  'j'

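This char array field is central to String being immutable. Why?

First, it is private, so no other class can access it, and String provides no public getter or setter for the value[] array.

Second, it is final. Note what final does and does not guarantee here: it prevents the field from being reassigned to another array, but it does not freeze the array's elements. The immutability comes from the combination: the array is never exposed, and no String method modifies it after construction.

So you cannot reach the array, and String itself never changes it. It is like a family heirloom: outsiders never get to see it, and the family itself has sworn never to alter it, because altering it would dishonor the ancestors. How, then, could you access or change it?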
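The difference between a final reference and immutable contents is easy to try out. The sketch below uses a hypothetical FinalArrayDemo class with its own array, not String's real internals, and shows that a final array can still be overwritten element by element, which is why String also has to keep its array private.

```java
class FinalArrayDemo {
    // final fixes the reference only; the elements can still be overwritten.
    private static final char[] VALUE = {'b', 'j'};

    static String overwriteFirst(char replacement) {
        VALUE[0] = replacement;   // legal, even though VALUE is final
        // VALUE = new char[2];   // would not compile: cannot reassign a final field
        return new String(VALUE);
    }

    public static void main(String[] args) {
        System.out.println(overwriteFirst('x')); // xj

        String s = "bj";
        char[] copy = s.toCharArray(); // String hands out a copy, never its own array
        copy[0] = 'z';
        System.out.println(s);         // bj
    }
}
```

Changing the copy returned by toCharArray() leaves the original string untouched, which is the behavior the private modifier protects.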

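offset

private final int offset;

The offset is the first index of the storage that is used. In other words, offset holds the index of the first element of value that belongs to this string.

Inspecting a String in a debugger shows this value: it tells you at which index of the value array the string begins.

This constant has no getter or setter either.

The field was removed in later releases (starting with JDK 7 update 6).

In 1.6 offset does less work than you might expect. It mostly appears in expressions of the form "offset + **", and almost every String starts at index 0 (the main exception is a string produced by substring, which shares its parent's value array), so the field is close to dispensable.

In the releases without the field, each "offset + **" is either replaced by plain "**" or the offset is simply treated as 0.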

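count

private final int count;

The count is the number of characters in the String. In other words, count holds how many chars of the value array belong to this string.

In the debugger view of the string "aa", count is 2: it is the number of characters that "aa" occupies in the value array.

This constant has no getter or setter.

The field was removed in later releases (starting with JDK 7 update 6).

In those releases count is replaced by value.length.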
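To see how value, offset and count work together, here is a simplified model. MiniString is a hypothetical class written for this article, not the real String source. Its substring shares the backing array and only adjusts offset and count, which is how substring behaved in JDK 1.6.

```java
// Hypothetical, simplified model of the JDK 1.6 field layout; not the real class.
final class MiniString {
    private final char[] value;
    private final int offset;
    private final int count;

    MiniString(char[] chars) {
        this(chars.clone(), 0, chars.length); // copy so that callers cannot mutate us
    }

    private MiniString(char[] value, int offset, int count) {
        this.value = value;
        this.offset = offset;
        this.count = count;
    }

    int length() {
        return count;
    }

    char charAt(int index) {
        if (index < 0 || index >= count) {
            throw new StringIndexOutOfBoundsException(index);
        }
        return value[offset + index];
    }

    // Shares the backing array; only offset and count differ.
    MiniString substring(int begin, int end) {
        if (begin < 0 || end > count || begin > end) {
            throw new StringIndexOutOfBoundsException(end - begin);
        }
        return new MiniString(value, offset + begin, end - begin);
    }

    @Override
    public String toString() {
        return new String(value, offset, count);
    }
}

class MiniStringDemo {
    public static void main(String[] args) {
        MiniString part = new MiniString("hello".toCharArray()).substring(1, 4);
        System.out.println(part);           // ell
        System.out.println(part.length());  // 3
        System.out.println(part.charAt(0)); // e
    }
}
```

Sharing makes substring cheap, but a short substring keeps the whole parent array alive. That trade-off is the usual explanation for dropping the two fields later.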

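hash

private int hash;    // Default to 0

Cache the hash code for the string. In other words, this field caches the string's hash code.

This one is not final; it is just a private variable. Finally, a field that can change. String is stingy with those.

Its job is to hold the hash code that corresponds to the string's contents. The trailing comment says it defaults to 0, so what use is a field that starts out as 0?

And although it can in principle change, it has no getter or setter, so where does its value come from?

Its value is assigned by the hashCode() method, which we will come back to later.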
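As a preview of that relationship, here is a rough sketch of a lazily cached hash. CachedHashSketch and computeHash are hypothetical names for this illustration. The formula is the one documented for String.hashCode(), but the class is not a copy of the JDK source.

```java
// Hypothetical holder class; it only mirrors how a cached hash field can work.
final class CachedHashSketch {
    private final char[] value;
    private int hash; // 0 means "not computed yet"

    CachedHashSketch(String s) {
        this.value = s.toCharArray();
    }

    int computeHash() {
        int h = hash;
        if (h == 0) {
            // s[0]*31^(n-1) + s[1]*31^(n-2) + ... + s[n-1]
            for (char c : value) {
                h = 31 * h + c;
            }
            hash = h; // later calls return the cached value
        }
        return h;
    }

    public static void main(String[] args) {
        System.out.println(new CachedHashSketch("ab").computeHash()); // 3105
        System.out.println("ab".hashCode());                          // 3105
    }
}
```

The default of 0 doubles as the "not computed yet" marker, so the field needs no setter: the first call fills it in.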

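serialVersionUID

private static final long serialVersionUID = -6849794470754667710L;

use serialVersionUID from JDK 1.0.2 for interoperability. In other words, the serialVersionUID value from JDK 1.0.2 is kept so that serialized strings stay compatible across versions.

This constant is used for version control during serialization and deserialization. How exactly it is used is out of scope here; look it up if you are interested.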

serialPersistentFields

private static final ObjectStreamField[] serialPersistentFields =
    new ObjectStreamField[0];

Class String is special cased within the Serialization Stream Protocol. In other words, the serialization stream protocol treats String as a special case.

A String instance is written initially into an ObjectOutputStream in the following format (that is, the first time a String instance is written to the stream):

TC_STRING (utf String)

The String is written by method DataOutput.writeUTF. In other words, the contents are written with the writeUTF method of the DataOutput interface.

A new handle is generated to refer to all future references to the string instance within the stream. In other words, a handle is created for the string, and any later reference to the same String instance in the stream is written as that handle.

The ObjectStreamField class describes a serializable field of a class, its name and its type, to the serialization mechanism. We will not dig into it here.

This constant is also related to serialization.
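The format quoted above can be observed directly. The following sketch uses a hypothetical StringStreamDemo class and try-with-resources, which needs Java 7 or later. It writes one String to an in-memory ObjectOutputStream and prints the bytes in hex.

```java
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.ObjectOutputStream;

class StringStreamDemo {
    static byte[] serialize(String s) {
        ByteArrayOutputStream buffer = new ByteArrayOutputStream();
        try (ObjectOutputStream out = new ObjectOutputStream(buffer)) {
            out.writeObject(s);
        } catch (IOException e) {
            throw new IllegalStateException(e); // not expected for an in-memory stream
        }
        return buffer.toByteArray();
    }

    static String toHex(byte[] bytes) {
        StringBuilder hex = new StringBuilder();
        for (byte b : bytes) {
            hex.append(String.format("%02x ", b));
        }
        return hex.toString().trim();
    }

    public static void main(String[] args) {
        // stream header (ac ed 00 05), TC_STRING (74), 2-byte length (00 02), "bj" (62 6a)
        System.out.println(toHex(serialize("bj"))); // ac ed 00 05 74 00 02 62 6a
    }
}
```

No class descriptor and no field data appear in the output, only the TC_STRING marker followed by the writeUTF encoding. That matches the empty serialPersistentFields array.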

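Next come String's constructors

There are quite a few of them, and I don't recommend studying each one in depth.

The most important one in the source is the no-argument constructor, because it shows what state the class is in when it is first initialized.

public String() {
    this.offset = 0;
    this.count = 0;
    this.value = new char[0];
}

The no-argument constructor sets the initial state of the three constants offset, count and value.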

checkBounds()

In the String source this method matters a great deal to the constructors.

private static void checkBounds(byte[] bytes, int offset, int length) {
    if (length < 0)
        throw new StringIndexOutOfBoundsException(length);
    if (offset < 0)
        throw new StringIndexOutOfBoundsException(offset);
    if (offset > bytes.length - length)
        throw new StringIndexOutOfBoundsException(offset + length);
}

Common private utility method used to bounds check the byte array and requested offset & length values used by the String(byte[],..) constructors.

In other words, it is a shared private helper that checks the byte array, together with the requested offset and length, on behalf of the String(byte[], ...) constructors.

The method name and the code make its purpose clear even without the comment.

It is only called from String constructors, and not from every one of them. All you need to remember is that it performs bounds checking.
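One detail of the third check is worth a closer look: it is written as offset > bytes.length - length rather than offset + length > bytes.length. The sketch below uses hypothetical helpers, inBounds and naiveInBounds, that restate the checks as booleans. It suggests why: the sum can overflow int, while the subtraction cannot once both values are known to be non-negative.

```java
class BoundsDemo {
    // Same three checks as checkBounds, restated as a boolean for easy testing.
    static boolean inBounds(int arrayLength, int offset, int length) {
        if (length < 0) return false;
        if (offset < 0) return false;
        return offset <= arrayLength - length; // both operands non-negative: no overflow
    }

    // Naive variant: offset + length can wrap around to a negative number.
    static boolean naiveInBounds(int arrayLength, int offset, int length) {
        return length >= 0 && offset >= 0 && offset + length <= arrayLength;
    }

    public static void main(String[] args) {
        System.out.println(inBounds(4, 1, 3));                      // true
        System.out.println(inBounds(4, 1, Integer.MAX_VALUE));      // false
        System.out.println(naiveInBounds(4, 1, Integer.MAX_VALUE)); // true, which is wrong

        try {
            new String(new byte[4], 1, 4); // offset 1 + length 4 exceeds the array
        } catch (IndexOutOfBoundsException e) {
            System.out.println("rejected");
        }
    }
}
```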

length()

public int length() {
    return count;
}

Returns the length of this string.

In 1.6 that length is simply the count field.

Example:

public class Test {
    public static void main(String[] args) {
        String str = "";
        String str1 = "ab";
        System.out.println(str.length());     // 0
        System.out.println(str1.length());    // 2
    }
}

isEmpty()

public boolean isEmpty() {
    return count == 0;
}

This method checks whether the string is empty, that is, whether its length is 0.

Example:

public class Test {
    public static void main(String[] args) {
        String str = "";
        String str1 = "aa";
        System.out.println(str.isEmpty());     // prints true
        System.out.println(str1.isEmpty());    // prints false
    }
}

charAt(int index)

public char charAt(int index) {
    if ((index < 0) || (index >= count)) {
        throw new StringIndexOutOfBoundsException(index);
    }
    return value[index + offset];
}

Returns the char value at the specified index.

Example:

public class Test {
    public static void main(String[] args) {
        String str = "";
        String str1 = "ab";
        System.out.println(str1.charAt(0));    // a
        System.out.println(str1.charAt(1));    // b
        /*
         * The next statement throws StringIndexOutOfBoundsException,
         * because charAt() contains this check:
         *  if ((index < 0) || (index >= count)) {
         *      throw new StringIndexOutOfBoundsException(index);
         *  }
         */
        System.out.println(str.charAt(0));
    }
}

codePointAt(int index)

public int codePointAt(int index) {
    if ((index < 0) || (index >= count)) {
        throw new StringIndexOutOfBoundsException(index);
    }
    return Character.codePointAtImpl(value, offset + index, offset + count);
}

Returns the character (Unicode code point) at the specified index.

Example:

public class Test {
    public static void main(String[] args) {
        String str = "";
        String str1 = "ab";
        System.out.println(str1.codePointAt(0));    // 97
        System.out.println(str1.codePointAt(1));    // 98
        /*
         * The next statement throws StringIndexOutOfBoundsException,
         * because codePointAt() contains this check:
         *  if ((index < 0) || (index >= count)) {
         *      throw new StringIndexOutOfBoundsException(index);
         *  }
         */
        System.out.println(str.codePointAt(0));
    }
}

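codePointBefore(int index)

public int codePointBefore(int index) {
    int i = index - 1;
    if ((i < 0) || (i >= count)) {
        throw new StringIndexOutOfBoundsException(index);
    }
    return Character.codePointBeforeImpl(value, offset + index, offset);
}

Returns the character (Unicode code point) before the specified index.

Because the method returns the code point before the given index, its first line computes int i = index - 1 and applies the bounds check to i. This lets index range from 1 to length(), so every character in the string can be reached.

If that is not clear yet, the simple example below should settle it.

Example:

public class Test {
    public static void main(String[] args) {
        String str = "abc";
        System.out.println(str.codePointBefore(1)); // 97
        System.out.println(str.codePointBefore(2)); // 98
        System.out.println(str.codePointBefore(3)); // 99
        System.out.println(str.codePointBefore(4)); // throws StringIndexOutOfBoundsException
        System.out.println(str.codePointBefore(0)); // throws StringIndexOutOfBoundsException (never reached, the previous line already throws)
    }
}

That should make it clear. Think about what would happen if int i = index - 1 were changed to int i = index: how would you get the "c" in str? codePointBefore(3) would be rejected, which makes no sense, and the people who wrote the JDK would hardly let that happen.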
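The method earns its name once a supplementary character is involved. In the sketch below, the class name and the sample() helper are made up for illustration. The index just after a surrogate pair yields the whole code point, while an index in the middle of the pair yields only the lone high surrogate.

```java
class CodePointBeforeDemo {
    // 'a' followed by U+2F81A, which is stored as the surrogate pair D87E DC1A.
    static String sample() {
        return "a" + new String(Character.toChars(0x2F81A));
    }

    public static void main(String[] args) {
        String s = sample();
        System.out.println(s.length());                                // 3
        System.out.println(s.codePointBefore(1));                      // 97
        System.out.println(Integer.toHexString(s.codePointBefore(3))); // 2f81a
        System.out.println(Integer.toHexString(s.codePointBefore(2))); // d87e
    }
}
```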

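codePointCount(int beginIndex, int endIndex)

public int codePointCount(int beginIndex, int endIndex) {
    if (beginIndex < 0 || endIndex > count || beginIndex > endIndex) {
        throw new IndexOutOfBoundsException();
    }
    return Character.codePointCountImpl(value, offset+beginIndex, endIndex-beginIndex);
}

This method deserves some attention. Since 1.5 Java supports Unicode supplementary characters (code points above U+FFFF). Such a character does not fit into a single 16-bit char and is stored as two char values (a surrogate pair), so the number of elements in the char array no longer tells you how many characters the string contains.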

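Example:

public class Test {
    public static void main(String[] args) {
        String s = String.valueOf(Character.toChars(0x2F81A));
        System.out.println(s);
        char[] chars = s.toCharArray();
        for (char c : chars) {
            System.out.format("%x", (int) c);    // prints d87e, then dc1a
        }
    }
}

The console shows the character itself, followed by the hex values of its two chars: d87edc1a.

As far as the console is concerned, s prints without anything unusual.

But what if we set a breakpoint on the println(s) line and inspect s in the debugger?

count is 2 and the value array holds two chars. That is how Java stores a supplementary character, even though the console output looks no different.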

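Now look at this code:

public class Test {
    public static void main(String[] args) {
        String s = String.valueOf(Character.toChars(0x2F81A));
        System.out.println(s);
        System.out.println(s.length());               // 2
        System.out.println(s.codePointCount(0,1));    // 1 (a lone high surrogate counts as one code point)
        System.out.println(s.codePointCount(0,2));    // 1
    }
}

Looking at the output, does the purpose of codePointCount(int beginIndex, int endIndex) suddenly click?

Exactly: once supplementary characters arrived in 1.5, length() could no longer express the number of characters in a string, and this method was introduced to count code points instead.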
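The same count can be reproduced by hand, which shows what the method is doing. This is a small sketch with a hypothetical countCodePoints helper. It walks the string with codePointAt and advances by Character.charCount, so a surrogate pair is stepped over as one unit.

```java
class CodePointWalk {
    static int countCodePoints(String s) {
        int n = 0;
        for (int i = 0; i < s.length(); ) {
            int cp = s.codePointAt(i);
            n++;
            i += Character.charCount(cp); // 2 for a supplementary code point, otherwise 1
        }
        return n;
    }

    public static void main(String[] args) {
        String s = "a" + new String(Character.toChars(0x2F81A)) + "b";
        System.out.println(s.length());                      // 4
        System.out.println(countCodePoints(s));              // 3
        System.out.println(s.codePointCount(0, s.length())); // 3
    }
}
```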

offsetByCodePoints(int index, int codePointOffset)

public int offsetByCodePoints(int index, int codePointOffset) {
    if (index < 0 || index > count) {
        throw new IndexOutOfBoundsException();
    }
    return Character.offsetByCodePointsImpl(value, offset, count,
                    offset+index, codePointOffset) - offset;
}

Returns the index within this String that is offset from the given index by codePointOffset code points.

In other words, it starts at index, moves codePointOffset code points along the string, and returns the char index it arrives at.

Example:

public class Test {
    public static void main(String[] args) {
        String str = "煜ބ屮";
        System.out.println(str.offsetByCodePoints(0,1));    // 1
        System.out.println(str.offsetByCodePoints(0,2));    // 2
        System.out.println(str.offsetByCodePoints(1,2));    // 3
        String s = String.valueOf(Character.toChars(0x2F81A));
        System.out.println(s.offsetByCodePoints(0,1));    // 2
        System.out.println(s.offsetByCodePoints(0,2));    // throws IndexOutOfBoundsException
    }
}

Again we test with a supplementary character. The call s.offsetByCodePoints(0,1) prints 2, which further confirms that a supplementary character occupies two chars in the value array.
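A typical use of this method is cutting a string after a number of characters without splitting a surrogate pair. The following is a sketch under the assumption that n is not negative. The firstCodePoints helper is hypothetical, not a JDK method.

```java
class CodePointPrefix {
    // Returns the first n code points of s (assumes n >= 0); never splits a surrogate pair.
    static String firstCodePoints(String s, int n) {
        int total = s.codePointCount(0, s.length());
        int end = s.offsetByCodePoints(0, Math.min(n, total));
        return s.substring(0, end);
    }

    public static void main(String[] args) {
        String s = new String(Character.toChars(0x2F81A)) + "ab";
        System.out.println(s.substring(0, 1).length());     // 1, only half of the pair
        System.out.println(firstCodePoints(s, 1).length()); // 2, the complete pair
        System.out.println(firstCodePoints(s, 2).length()); // 3, the pair plus 'a'
    }
}
```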

getChars(char dst[], int dstBegin)

void getChars(char dst[], int dstBegin) {
    System.arraycopy(value, offset, dst, dstBegin, count);
}

Copy characters from this string into dst starting at dstBegin.

This method doesn't perform any range checking.

Look closely and you will notice that the method has no access modifier, so it is package-private (default access) and cannot be called from a package we define ourselves.

Put differently, only classes in the java.lang package can use it. The overload below is the one exposed to callers.

getChars(int srcBegin, int srcEnd, char dst[], int dstBegin)

public void getChars(int srcBegin, int srcEnd, char dst[], int dstBegin) {
    if (srcBegin < 0) {
        throw new StringIndexOutOfBoundsException(srcBegin);
    }
    if (srcEnd > count) {
        throw new StringIndexOutOfBoundsException(srcEnd);
    }
    if (srcBegin > srcEnd) {
        throw new StringIndexOutOfBoundsException(srcEnd - srcBegin);
    }
    System.arraycopy(value, offset + srcBegin, dst, dstBegin,
         srcEnd - srcBegin);
}

Copies characters from this string into the destination character array.

srcBegin: index of the first character in the string to copy

srcEnd: index after the last character in the string to copy (exclusive)

dst[]: the destination character array

dstBegin: the index in the destination array at which the copied characters start

The points to watch are the three if conditions in the source.

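Example:

public class Test {
    public static void main(String[] args) {
        char value[] = {'c', ' ', ' ', ' ', ' ', ' ', ' ', ' ', ' '};
        System.out.println(value.length);    // 9
        String str = "abc";
        str.getChars(0, 3, value, 5);
        System.out.println(value.length);    // 9
    }
}

At the first println, before the getChars call, the value array holds 'c' followed by eight spaces.

At the second println, after the call, indexes 5 to 7 hold 'a', 'b', 'c' and the other elements are unchanged.

Both the length printed on the console and the debugger view show that getChars() does not change the length of the array.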
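Without a debugger at hand, the array contents can also be printed. This sketch uses a hypothetical copyInto helper and replaces spaces with dots so that the result is readable on the console.

```java
class GetCharsDemo {
    static char[] copyInto() {
        char[] dst = {'c', ' ', ' ', ' ', ' ', ' ', ' ', ' ', ' '};
        "abc".getChars(0, 3, dst, 5); // copies 'a', 'b', 'c' to dst[5..7]
        return dst;
    }

    public static void main(String[] args) {
        char[] dst = copyInto();
        System.out.println(dst.length);                        // 9
        System.out.println(new String(dst).replace(' ', '.')); // c....abc.
    }
}
```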

getBytes(int srcBegin, int srcEnd, byte dst[], int dstBegin) && getBytes(String charsetName) && getBytes(Charset charset) && getBytes()

Next are the 4 overloads of getBytes(). Of these, getBytes(int srcBegin, int srcEnd, byte dst[], int dstBegin) is annotated with @Deprecated, so it is not covered here.

public byte[] getBytes(String charsetName) throws UnsupportedEncodingException {
    if (charsetName == null)
        throw new NullPointerException();
    return StringCoding.encode(charsetName, value, offset, count);
}

Encodes this String into a sequence of bytes using the named charset, storing the result into a new byte array.

In other words, the string is converted to bytes with the named character encoding, and the result is returned in a new byte array.

Here "named" refers to encodings such as UTF-8, GBK and ISO8859-1.

Example:

public class Test {
    public static void main(String[] args) {
        try {
            byte[] b_gbk = "中".getBytes("GBK");
            byte[] b_utf8 = "中".getBytes("UTF-8");
            byte[] b_iso88591 = "中".getBytes("ISO8859-1");
            System.out.println(b_gbk.length);         // 2
            System.out.println(b_utf8.length);        // 3
            System.out.println(b_iso88591.length);    // 1 (the unmappable character becomes a single '?')
        } catch (UnsupportedEncodingException e) {
            e.printStackTrace();
        }
    }
}

As this example shows, different encodings produce different byte arrays.

Now look at the next example. What do you notice?

public class Test {
    public static void main(String[] args) {
        try {
            byte[] b_gbk = "中".getBytes("GBK");
            byte[] b_utf8 = "中".getBytes("UTF-8");
            byte[] b_iso88591 = "中".getBytes("ISO8859-1");
            String s_gbk = new String(b_gbk, "GBK");
            String s_utf8 = new String(b_utf8, "UTF-8");
            String s_iso88591 = new String(b_iso88591, "ISO8859-1");
            System.out.println(s_gbk);         // 中
            System.out.println(s_utf8);        // 中
            System.out.println(s_iso88591);    // ?
        } catch (UnsupportedEncodingException e) {
            e.printStackTrace();
        }
    }
}

In the code above, new String(byte[], charsetName) restores the character "中": the constructor decodes the byte[] into a string using the named charset.

The last println prints "?". Why? Because ISO8859-1 contains no Chinese characters. When "中" is encoded with ISO8859-1 the encoder does not recognize it and hands back a "?", as if to ask: what on earth did you give me? I have never seen this one.
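Printing the encoded bytes in hex makes it visible where the "?" comes from. This sketch uses a hypothetical hex helper and writes the character as the escape \u4e2d, so the result does not depend on the encoding of the source file.

```java
import java.io.UnsupportedEncodingException;

class EncodingHexDemo {
    static String hex(String text, String charsetName) {
        try {
            StringBuilder sb = new StringBuilder();
            for (byte b : text.getBytes(charsetName)) {
                sb.append(String.format("%02x ", b));
            }
            return sb.toString().trim();
        } catch (UnsupportedEncodingException e) {
            throw new IllegalArgumentException("unknown charset: " + charsetName, e);
        }
    }

    public static void main(String[] args) {
        String zhong = "\u4e2d"; // U+4E2D
        System.out.println(hex(zhong, "UTF-8"));      // e4 b8 ad
        System.out.println(hex(zhong, "ISO-8859-1")); // 3f
        System.out.println(hex("?", "ISO-8859-1"));   // 3f
    }
}
```

The information is already lost at encoding time: the byte array contains the byte for a literal question mark, so no choice of charset when decoding can bring the original character back.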

public byte[] getBytes(Charset charset) {
    if (charset == null) throw new NullPointerException();
    return StringCoding.encode(charset, value, offset, count);
}

There is just one thing to note about this method: Charset is not a wrapper class for char, it is the class java.nio.charset.Charset.
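A convenient way to obtain a Charset is the StandardCharsets class, which was added in Java 7 and so is not available on 1.6. The sketch below uses hypothetical helper names. It shows that this overload needs no try/catch, and what happens when bytes are decoded with a different charset than the one that produced them.

```java
import java.nio.charset.StandardCharsets;

class CharsetOverloadDemo {
    static String roundTrip(String text) {
        byte[] utf8 = text.getBytes(StandardCharsets.UTF_8); // no checked exception
        return new String(utf8, StandardCharsets.UTF_8);
    }

    static String decodedWithWrongCharset(String text) {
        byte[] utf8 = text.getBytes(StandardCharsets.UTF_8);
        return new String(utf8, StandardCharsets.ISO_8859_1); // each byte becomes one char
    }

    public static void main(String[] args) {
        String zhong = "\u4e2d"; // U+4E2D
        System.out.println(roundTrip(zhong).equals(zhong));          // true
        System.out.println(decodedWithWrongCharset(zhong).length()); // 3
    }
}
```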

public byte[] getBytes() {
    return StringCoding.encode(value, offset, count);
}

Encodes this String into a sequence of bytes using the platform's default charset, storing the result into a new byte array.

In other words, the string is converted to bytes with the platform's default character encoding, and the result is returned in a new byte array.

Example:

public class Test {
    public static void main(String[] args) {
        String s = String.valueOf(Character.toChars(0x2F81A));
        String str = "123a";
        str = str + s;
        System.out.println(str.getBytes());    // prints the array's type and identity hash (for example [B@...), not the encoded bytes
    }
}
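To see the bytes themselves rather than the array's default toString(), Arrays.toString can be used. This is a small sketch with a hypothetical render helper. The exact bytes returned by getBytes() depend on the platform's default charset, so the output shown assumes an ASCII-compatible default.

```java
import java.nio.charset.Charset;
import java.util.Arrays;

class DefaultCharsetDemo {
    static String render(byte[] bytes) {
        return Arrays.toString(bytes);
    }

    public static void main(String[] args) {
        byte[] bytes = "123a".getBytes();             // platform default charset
        System.out.println(Charset.defaultCharset()); // varies by platform
        System.out.println(render(bytes));            // [49, 50, 51, 97] with an ASCII-compatible default
    }
}
```

Because the result varies between machines, passing an explicit charset is usually the safer choice when the bytes leave the process.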

equals(Object anObject)

public boolean equals(Object anObject) {
    if (this == anObject) {
        return true;
    }
    if (anObject instanceof String) {
        String anotherString = (String)anObject;
        int n = count;
        if (n == anotherString.count) {
            char v1[] = value;
            char v2[] = anotherString.value;
            int i = offset;
            int j = anotherString.offset;
            while (n-- != 0) {
                if (v1[i++] != v2[j++])
                    return false;
            }
            return true;
        }
    }
    return false;
}

Compares this string to the specified object.

As the code shows, the method first checks whether the two references are the same. Compare something with itself and it can only be equal, so the result is true.

If the references differ, the contents are compared: the argument must be a String with the same count, and every char must match.

Example:

package com.test.string;

public class Test {
    public static void main(String[] args) {
        String str = "aaa";
        String str2 = "aaa";
        String str3 = "bbb";
        System.out.println(str.equals(str));     // true
        System.out.println(str.equals(str2));    // true
        System.out.println(str.equals(str3));    // false
    }
}
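In the example above str and str2 are the same literal, so they most likely refer to the same pooled object and equals returns on its very first check. To exercise the content comparison, the sketch below forces a distinct object with new String. The compare helper is hypothetical and only there for illustration.

```java
class EqualsDemo {
    // Returns { a == b, a.equals(b) } so both comparisons can be seen side by side.
    static boolean[] compare(String a, String b) {
        return new boolean[] { a == b, a.equals(b) };
    }

    public static void main(String[] args) {
        String literal = "aaa";
        String copy = new String("aaa"); // a distinct object with the same content

        boolean[] result = compare(literal, copy);
        System.out.println(result[0]);                // false: different references
        System.out.println(result[1]);                // true: same characters
        System.out.println(literal == copy.intern()); // true: intern() returns the pooled literal
        System.out.println(literal.equals(null));     // false: null is not an instance of String
    }
}
```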

posted on 2015-04-24 18:33 by 00醉酒00
