Thoughts on String in Android Development (Source Code Analysis)

1. The common way of creating a String

String text = "this is a test text ";

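The statement above actually does three things:
1. Declares the variable String text;
2. Allocates space in memory for the string content (memory block one)
3. Points the reference held by text at that block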

When we later write

text = "this is a change text";

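this statement does two things:
1. Allocates a new block of memory
2. Points the reference held by text at the newly allocated block
Memory block one still exists at this point and its content is unchanged, which is exactly what the immutability of String means.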
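
The point can be checked with a second reference to the first string. This is only a minimal sketch; the class name ImmutabilityDemo and the helper aliasAfterReassign are made up for illustration.

```java
final class ImmutabilityDemo {
    /** Returns what a second reference sees after the first one is reassigned. */
    static String aliasAfterReassign() {
        String text = "this is a test text ";
        String alias = text;             // second reference to the same object
        text = "this is a change text";  // rebinds text; the old object is untouched
        return alias;
    }

    public static void main(String[] args) {
        System.out.println(aliasAfterReassign());
    }
}
```

The reassignment changes which object text refers to, not the content of the original object.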

Test 1:

String text = "oo";
// Concatenate onto the string in a loop
for (int i = 0; i < 90000; i++) {
      text+=i;
}

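In my app this small program took 8 s to run, which is far too long. The reason is simple: each pass through the loop creates a new String object in newly allocated memory and copies the old content into it, and the memory that is no longer used has to be reclaimed again and again.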

Test 2:

String text = "oo";
// Create a character buffer
StringBuilder builder = new StringBuilder(text);
// Append to the buffer in a loop
for (int i = 0; i < 100000; i++) {
      builder.append(i);
}

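This snippet runs more iterations than Test 1, yet it took only 30 ms, a large performance gain. The reason is that StringBuilder keeps one growable buffer and each append only adds the new data to it; the buffer is enlarged occasionally instead of a new object being created on every pass.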
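
To reproduce the comparison, something like the following sketch can be used. The class and method names (ConcatDemo, withPlus, withBuilder) are hypothetical, the loop count is reduced, and the timings will differ from the 8 s and 30 ms quoted above depending on device and runtime.

```java
final class ConcatDemo {
    /** Builds the string with +=, creating a new String on every pass. */
    static String withPlus(int n) {
        String text = "oo";
        for (int i = 0; i < n; i++) {
            text += i;
        }
        return text;
    }

    /** Builds the same string by appending into one StringBuilder. */
    static String withBuilder(int n) {
        StringBuilder builder = new StringBuilder("oo");
        for (int i = 0; i < n; i++) {
            builder.append(i);
        }
        return builder.toString();
    }

    public static void main(String[] args) {
        int n = 20_000;
        long t0 = System.nanoTime();
        String a = withPlus(n);
        long t1 = System.nanoTime();
        String b = withBuilder(n);
        long t2 = System.nanoTime();
        System.out.println("same result: " + a.equals(b));
        System.out.println("+= took " + (t1 - t0) / 1_000_000 + " ms");
        System.out.println("StringBuilder took " + (t2 - t1) / 1_000_000 + " ms");
    }
}
```

Both methods produce the same text; only the amount of allocation and copying differs.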

2. A brief look at the String constructors

2.1 String string = new String();

This creates an empty String: an object whose content is empty. It is of little practical use, and the source code says as much:

/**
     * Initializes a newly created {@code String} object so that it represents
     * an empty character sequence.  Note that use of this constructor is
     * unnecessary since Strings are immutable.
     */
    public String() {
        this.value = "".value;
    }

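Although the String is empty (it is "" rather than null; "" still occupies memory, only its content is empty), the constructor still assigns the value field, taking it from the value field of the empty string literal.
The value field is declared as: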

 /** The value is used for character storage. */
    private final char value[];

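2.2 String string = new String(" test ");

Creating a String this way follows the same steps as the common approach analysed above (declare the variable, allocate space, assign the reference), except that new always creates an additional String object.

The source code does the following: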

public String(String original) {
        this.value = original.value;
        this.hash = original.hash;
    }

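The constructor does not copy any characters: because strings are immutable, the new String simply shares the char array (value) of the original.
It also copies the original's cached hash value: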

 /** Cache the hash code for the string */
    private int hash; // Default to 0

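2.3 Constructing a String from the contents of a char array

 char[] chars = new char[]{'q','e','e','q','e','e'};
        // Build a String from the contents of the char array
        String string = new String(chars);

        // The resulting string is qeeqee

Here the String is built from a char array. Looking at the source code, the constructor stores the characters in the String's own char array: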

 public String(char value[]) {
            this.value = Arrays.copyOf(value, value.length);
        }

Note that the array passed in is not assigned directly. Instead, Arrays.copyOf creates a new char array, which is then assigned to this.value:

public static char[] copyOf(char[] original, int newLength) {
            char[] copy = new char[newLength];
            System.arraycopy(original, 0, copy, 0,
                            Math.min(original.length, newLength));
            return copy;
        }
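
A practical consequence of the copy is that changing the source array afterwards cannot change the String. The sketch below illustrates this; DefensiveCopyDemo and buildThenMutate are illustrative names only.

```java
final class DefensiveCopyDemo {
    /** Builds a String from a char array, then mutates the array. */
    static String buildThenMutate() {
        char[] chars = {'q', 'e', 'e'};
        String s = new String(chars); // the constructor copies the contents
        chars[0] = 'X';               // changing the source afterwards...
        return s;                     // ...does not affect the String
    }

    public static void main(String[] args) {
        System.out.println(buildThenMutate());
    }
}
```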

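2.4 Constructing a String from part of a char array

 char[] chars = new char[]{'q','e','e','q','e','e'};
        // Build the string
        String string = new String(chars,0,2);

        // The resulting string is qe
        // In other words, part of the char array is turned into a String

Looking at the constructor source: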

public String(char value[], int offset, int count) {
            if (offset < 0) {
                throw new StringIndexOutOfBoundsException(offset);
            }
             if (count <= 0) {
                 if (count < 0) {
                    throw new StringIndexOutOfBoundsException(count);
                }
            if (offset <= value.length) {
                    this.value = "".value;
                    return;
            }
        }
        // Note: offset or count might be near -1>>>1.
        if (offset > value.length - count) {
                throw new StringIndexOutOfBoundsException(offset + count);
            }
            this.value = Arrays.copyOfRange(value, offset, offset+count);
        }

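In other words, when a String is built this way and the start position offset is less than zero, an exception is thrown. The same happens when the character count is less than 0.

When count is exactly 0 (and offset does not exceed the array length), the extracted range has length 0, so the result is the "" string, whose char array is reused.

When neither the start position nor the count is negative but the two do not fit the array (for example, the array has length 5,

the start position is 3 and the count is 5, so 5 values would have to be read starting at index 3, which the array cannot supply),

an exception is thrown.

When the arguments are valid, Arrays.copyOfRange creates a new char array, which is assigned to this.value: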

public static char[] copyOfRange(char[] original, int from, int to) {
            int newLength = to - from;
            if (newLength < 0)
                throw new IllegalArgumentException(from + " > " + to);
            char[] copy = new char[newLength];
            System.arraycopy(original, from, copy, 0,
                            Math.min(original.length - from, newLength));
            return copy;
        }
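
The three cases described above (valid range, zero count, invalid range) can be tried out with a small sketch. SubArrayDemo, slice and throwsOutOfBounds are hypothetical helper names, not part of the JDK.

```java
final class SubArrayDemo {
    /** Thin wrapper around the (char[], offset, count) constructor. */
    static String slice(char[] chars, int offset, int count) {
        return new String(chars, offset, count);
    }

    /** Reports whether the constructor rejects the given range. */
    static boolean throwsOutOfBounds(char[] chars, int offset, int count) {
        try {
            new String(chars, offset, count);
            return false;
        } catch (IndexOutOfBoundsException e) {
            // StringIndexOutOfBoundsException is a subclass of this type
            return true;
        }
    }

    public static void main(String[] args) {
        char[] chars = {'q', 'e', 'e', 'q', 'e', 'e'};
        System.out.println(slice(chars, 0, 2));
        System.out.println(slice(chars, 6, 0).isEmpty());
        System.out.println(throwsOutOfBounds(chars, 3, 5));
    }
}
```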

2.6 Constructing a String from an array of Unicode code points

int[] charIntArray = new int[]{67,68,69,70};

String  string = new String (charIntArray,0,charIntArray.length);

// The resulting string is CDEF

Looking at the source code (Unicode itself is covered briefly later in this article):

public String(int[] codePoints, int offset, int count) {
                if (offset < 0) {
                    throw new StringIndexOutOfBoundsException(offset);
                }
                if (count <= 0) {
                    if (count < 0) {
                        throw new StringIndexOutOfBoundsException(count);
                    }
                    if (offset <= codePoints.length) {
                        this.value = "".value;
                        return;
                    }
                }
                // Note: offset or count might be near -1>>>1.
                if (offset > codePoints.length - count) {
                    throw new StringIndexOutOfBoundsException(offset + count);
                }

                final int end = offset + count;

                // Pass 1: Compute precise size of char[]
                int n = count;
                for (int i = offset; i < end; i++) {
                    int c = codePoints[i];
                    if (Character.isBmpCodePoint(c))
                        continue;
                    else if (Character.isValidCodePoint(c))
                        n++;
                    else throw new IllegalArgumentException(Integer.toString(c));
                }

                // Pass 2: Allocate and fill in char[]
                final char[] v = new char[n];

                for (int i = offset, j = 0; i < end; i++, j++) {
                    int c = codePoints[i];
                    if (Character.isBmpCodePoint(c))
                        v[j] = (char)c;
                    else
                        Character.toSurrogates(c, v, j++);
                }

                this.value = v;
            }
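
The two passes in this constructor matter only for code points outside the BMP, which need two chars each. A small sketch (with made-up names CodePointCtorDemo, fromCodePoints and rejects) shows the effect using U+1F600 as an arbitrary supplementary code point:

```java
final class CodePointCtorDemo {
    /** Builds a String from all the given code points. */
    static String fromCodePoints(int... codePoints) {
        return new String(codePoints, 0, codePoints.length);
    }

    /** Reports whether the constructor rejects the value as an invalid code point. */
    static boolean rejects(int codePoint) {
        try {
            fromCodePoints(codePoint);
            return false;
        } catch (IllegalArgumentException e) {
            return true;
        }
    }

    public static void main(String[] args) {
        System.out.println(fromCodePoints(67, 68, 69, 70));
        // One supplementary code point occupies two chars
        System.out.println(fromCodePoints(0x1F600).length());
    }
}
```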

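2.7 Constructing a String from a range of a byte array

byte[] newByte = new byte[]{4,5,6,34,43};
        // Build the string (this constructor declares UnsupportedEncodingException)
        String string = new String (newByte,0,newByte.length,"UTF-8");
        // Argument 1: the source byte array
        // Arguments 2 and 3: the offset and length of the byte range to decode
        // Argument 4: the charset used for decoding, UTF-8 here

Looking at the source code: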

public String(byte bytes[], int offset, int length, String charsetName)
                throws UnsupportedEncodingException {
                    if (charsetName == null)
                        throw new NullPointerException("charsetName");
                    checkBounds(bytes, offset, length);
                    this.value = StringCoding.decode(charsetName, bytes, offset, length);
            }

If the charset name passed in is null, an exception is thrown straight away.

The checkBounds method below only performs bounds checks:

private static void checkBounds(byte[] bytes, int offset, int length) {
                if (length < 0)
                    throw new StringIndexOutOfBoundsException(length);
                if (offset < 0)
                    throw new StringIndexOutOfBoundsException(offset);
                if (offset > bytes.length - length)
                    throw new StringIndexOutOfBoundsException(offset + length);
            }

StringCoding.decode then creates a new char array, which is assigned to this.value.

2.8 Constructing a String from a whole byte array

        byte[] newByte = new byte[]{4,5,6,34,43};
        // Build the string
        String string = new String (newByte,"UTF-8");
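
As a side note, the JDK also offers overloads that take a java.nio.charset.Charset instead of a charset name, which avoids the checked UnsupportedEncodingException. The sketch below uses those overloads rather than the ones analysed above; BytesCtorDemo, decodeAll and decodeRange are illustrative names.

```java
import java.nio.charset.StandardCharsets;

final class BytesCtorDemo {
    /** Decodes the whole array as UTF-8. */
    static String decodeAll(byte[] bytes) {
        return new String(bytes, StandardCharsets.UTF_8);
    }

    /** Decodes only length bytes starting at offset as UTF-8. */
    static String decodeRange(byte[] bytes, int offset, int length) {
        return new String(bytes, offset, length, StandardCharsets.UTF_8);
    }

    public static void main(String[] args) {
        byte[] hello = {72, 101, 108, 108, 111}; // ASCII codes of "Hello"
        System.out.println(decodeAll(hello));
        System.out.println(decodeRange(hello, 1, 3));
    }
}
```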

3. Analysis of common String methods

3.1 Getting the character at a given index

 String text = "thisisapicture";
        // Get the character at index 0
        char firstChar = text.charAt(0);

        // The character obtained here is t

Looking at the source code:

public char charAt(int index) {
            if ((index < 0) || (index >= value.length)) {
                throw new StringIndexOutOfBoundsException(index);
            }
            return value[index];
        }

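As the source shows, the content of a String is stored as a char array when the object is created, and here

the character at the requested index is simply read from that array.

3.2 Splitting a string

3.2.1 Splitting with the split method

 String textString = "q,q,w,e,e,r,f,g3,g";
        // Split the string on ","
        String[] stringArray = textString.split(",");
        // The resulting string array is
        // [q, q, w, e, e, r, f, g3, g]

Looking at the source code: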

public String[] split(String regex) {
            return split(regex, 0);
        }

It simply calls the two-argument overload:

public String[] split(String regex, int limit) {
            /* fastpath if the regex is a
             (1)one-char String and this character is not one of the
                RegEx's meta characters ".$|()[{^?*+\\", or
             (2)two-char String and the first char is the backslash and
                the second is not the ascii digit or ascii letter.
             */
            char ch = 0;
            if (((regex.value.length == 1 &&
                     ".$|()[{^?*+\\".indexOf(ch = regex.charAt(0)) == -1) ||
                     (regex.length() == 2 &&
                      regex.charAt(0) == '\\' &&
                      (((ch = regex.charAt(1))-'0')|('9'-ch)) < 0 &&
                      ((ch-'a')|('z'-ch)) < 0 &&
                      ((ch-'A')|('Z'-ch)) < 0)) &&
                    (ch < Character.MIN_HIGH_SURROGATE ||
                     ch > Character.MAX_LOW_SURROGATE))
                {
                    int off = 0;
                    int next = 0;
                    boolean limited = limit > 0;
                    ArrayList<String> list = new ArrayList<>();
                    while ((next = indexOf(ch, off)) != -1) {
                        if (!limited || list.size() < limit - 1) {
                            list.add(substring(off, next));
                            off = next + 1;
                        } else {    // last one
                            //assert (list.size() == limit - 1);
                            list.add(substring(off, value.length));
                            off = value.length;
                            break;
                        }
                    }
                    // If no match was found, return this
                    if (off == 0)
                        return new String[]{this};

                    // Add remaining segment
                    if (!limited || list.size() < limit)
                        list.add(substring(off, value.length));

                    // Construct result
                    int resultSize = list.size();
                    if (limit == 0) {
                        while (resultSize > 0 && list.get(resultSize - 1).length() == 0) {
                            resultSize--;
                        }
                    }
                    String[] result = new String[resultSize];
                    return list.subList(0, resultSize).toArray(result);
                }
                return Pattern.compile(regex).split(this, limit);
            }

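At first sight this source is a bit of a headache because it is rather long, but on closer inspection it is not that complicated.

The method contains one if. When its condition is true, a loop driven by indexOf() finds each delimiter and substring() cuts out the pieces.
// Part one: true when regex has length 1 and is not one of ".$|()[{^?*+\\"
(regex.value.length == 1 && ".$|()[{^?*+\\".indexOf(ch = regex.charAt(0)) == -1)
// Part two: true when regex has length 2, the first character is the escape character "\", and the second is not one of 0-9, a-z, A-Z; in both cases the character must also not be a UTF-16 surrogate

So if regex is a single character that is not a regex metacharacter, or an escaped special character, indexOf() + substring() is used.

Otherwise the regular expression path Pattern.compile(regex).split(this, limit) is used.

In other words, when splitting a String with split, the delimiter can be an ordinary character,

or it can be a regular expression.

When the delimiter happens to be one of the regex metacharacters, it must be escaped to get the correct result.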
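
The effect of escaping, and of the limit parameter handled in the source above, is easiest to see side by side. The following is a small sketch; SplitDemo and show are names invented for this example.

```java
import java.util.Arrays;

final class SplitDemo {
    /** Splits input on regex with the given limit and renders the array as text. */
    static String show(String input, String regex, int limit) {
        return Arrays.toString(input.split(regex, limit));
    }

    public static void main(String[] args) {
        System.out.println(show("a.b.c", ".", 0));    // "." is a metacharacter: matches every char
        System.out.println(show("a.b.c", "\\.", 0));  // escaped: splits on the literal dot
        System.out.println(show("a,b,,", ",", 0));    // limit 0 drops trailing empty strings
        System.out.println(show("a,b,,", ",", -1));   // negative limit keeps them
        System.out.println(show("a,b,c", ",", 2));    // positive limit caps the number of pieces
    }
}
```

An unescaped "." is the classic pitfall: every character counts as a delimiter, so only empty pieces remain and all of them are dropped.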

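3.3 Getting the Unicode code point of the character at a given index

String text = "ABCD";
    // Get the code point at index 0, i.e. of the character 'A'
    int uniCode = text.codePointAt(0);

Looking at the source code: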

public int codePointAt(int index) {
        if ((index < 0) || (index >= value.length)) {
            throw new StringIndexOutOfBoundsException(index);
        }
        return Character.codePointAtImpl(value, index, value.length);
    }

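The real work is done by the Character class: Character.codePointAtImpl(value, index, value.length)

It is passed the char array of the String text, the index of the character to read, and the length of the char array.

Unicode assigns a number to every character in the world. The numbers range from 0x000000 to 0x10FFFF.

Characters numbered from 0x0000 to 0xFFFF form the commonly used set, called BMP (Basic Multilingual Plane) characters.

Characters numbered from 0x10000 to 0x10FFFF are called supplementary characters.

Unicode mainly defines the numbers, not how a number is mapped to binary. UTF-16 is one encoding, or mapping scheme:

it maps a number to two or four bytes. A BMP character is represented directly with two bytes.

A supplementary character uses four bytes. The first two bytes are called the high surrogate, ranging from 0xD800 to 0xDBFF,

and the last two bytes are called the low surrogate, ranging from 0xDC00 to 0xDFFF. UTF-16 defines a formula for converting between the number and the four-byte form.

Java uses UTF-16 internally. A char represents one character, but only a BMP character; a supplementary character needs two chars, one for the high surrogate and one for the low surrogate.

An int can represent any Unicode character: the low 21 bits hold the Unicode number and the high 11 bits are 0. In Unicode the integer number is usually called a code point and denotes one Unicode character; by contrast, the term code unit denotes one char.

The corresponding static method in the Character class is: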

static int codePointAtImpl(char[] a, int index, int limit) {
        char c1 = a[index];
        if (isHighSurrogate(c1) && ++index < limit) {
            char c2 = a[index];
            if (isLowSurrogate(c2)) {
                return toCodePoint(c1, c2);
            }
        }
        return c1;
    }

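The first step of this method reads the character at the given index from the char array of text. If the if condition is not met, the character itself is returned,

that is, the Unicode code point of a BMP character is simply its char value.
The if condition first checks whether the character falls within the range accepted by isHighSurrogate: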

public static boolean isHighSurrogate(char ch) {
        // Help VM constant-fold; MAX_HIGH_SURROGATE + 1 == MIN_LOW_SURROGATE
        return ch >= MIN_HIGH_SURROGATE && ch < (MAX_HIGH_SURROGATE + 1);
    }

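MIN_HIGH_SURROGATE is the minimum value of a Unicode high-surrogate code unit in UTF-16. A high surrogate is also called a leading surrogate.
MAX_HIGH_SURROGATE is the maximum value of a Unicode high-surrogate code unit in UTF-16.

If it does, and a next char exists, that next char is checked with isLowSurrogate: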

public static boolean isLowSurrogate(char ch) {
        return ch >= MIN_LOW_SURROGATE && ch < (MAX_LOW_SURROGATE + 1);
    }

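This method determines whether the given char value is a Unicode low-surrogate code unit,
that is, whether the character lies in the low-surrogate range 0xDC00 to 0xDFFF.
If it does, toCodePoint computes the corresponding Unicode value, that is, it builds the code point from the high surrogate high and the low surrogate low: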

public static int toCodePoint(char high, char low) {
        // Optimized form of:
        // return ((high - MIN_HIGH_SURROGATE) << 10)
        //         + (low - MIN_LOW_SURROGATE)
        //         + MIN_SUPPLEMENTARY_CODE_POINT;
        return ((high << 10) + low) + (MIN_SUPPLEMENTARY_CODE_POINT
                                       - (MIN_HIGH_SURROGATE << 10)
                                       - MIN_LOW_SURROGATE);
    }
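
To make the formula concrete, the sketch below implements the unoptimized form quoted in the JDK comment and applies it to the surrogate pair D83D DE00, which encodes U+1F600 (chosen here only as an example). SurrogateDemo and decode are illustrative names.

```java
final class SurrogateDemo {
    /** Unoptimized pair-to-code-point formula, as quoted in the JDK comment. */
    static int decode(char high, char low) {
        return ((high - Character.MIN_HIGH_SURROGATE) << 10)
                + (low - Character.MIN_LOW_SURROGATE)
                + Character.MIN_SUPPLEMENTARY_CODE_POINT;
    }

    public static void main(String[] args) {
        String s = "\uD83D\uDE00"; // one supplementary character, two chars
        System.out.println(Integer.toHexString(decode(s.charAt(0), s.charAt(1))));
        System.out.println(Integer.toHexString(s.codePointAt(0))); // whole pair
        System.out.println(Integer.toHexString(s.codePointAt(1))); // lone low surrogate
    }
}
```

Worked by hand: (0xD83D - 0xD800) << 10 = 0xF400, plus (0xDE00 - 0xDC00) = 0x200, plus 0x10000 gives 0x1F600.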

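3.4 Getting the code point before a given index

String textString = "ABCDEF";
        // Get the code point just before index 3, i.e. of the character C at index 2
        int uniCode = textString.codePointBefore(3);

Looking at the source code, the work is again done on the String's char array, by a static method defined in the Character class: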

public int codePointBefore(int index) {
            int i = index - 1;
            if ((i < 0) || (i >= value.length)) {
                throw new StringIndexOutOfBoundsException(index);
            }
            return Character.codePointBeforeImpl(value, index, 0);
        }

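The Character method reads the char just before the index from the array,

then checks whether it and its predecessor form a low/high surrogate pair. If they do, the combined code point is computed and returned: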

static int codePointBeforeImpl(char[] a, int index, int start) {
        char c2 = a[--index];
        if (isLowSurrogate(c2) && index > start) {
            char c1 = a[--index];
            if (isHighSurrogate(c1)) {
                return toCodePoint(c1, c2);
            }
        }
        return c2;
        }
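
A short sketch of both branches, using the same example pair D83D DE00 (U+1F600) as an assumed supplementary character; CodePointBeforeDemo and before are made-up names.

```java
final class CodePointBeforeDemo {
    /** Thin wrapper around String.codePointBefore. */
    static int before(String s, int index) {
        return s.codePointBefore(index);
    }

    public static void main(String[] args) {
        String mixed = "A\uD83D\uDE00B"; // 'A', one surrogate pair, 'B'
        System.out.println(before("ABCDEF", 3));                 // plain BMP char
        System.out.println(Integer.toHexString(before(mixed, 3))); // index 3 follows the full pair
        System.out.println(Integer.toHexString(before(mixed, 2))); // index 2 cuts the pair in half
    }
}
```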

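3.5 Counting the Unicode code points in a range of the string

String textString = "ABCDEF";
        // Count the code points of the whole string
        int uniCodeCount = textString.codePointCount(0, textString.length());

Looking at the source code, the String's char array is again what is operated on, and the Character class does the work: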

public int codePointCount(int beginIndex, int endIndex) {
        if (beginIndex < 0 || endIndex > value.length || beginIndex > endIndex) {
            throw new IndexOutOfBoundsException();
        }
            return Character.codePointCountImpl(value, beginIndex, endIndex - beginIndex);
        }

The String's char array, the start of the range and the length of the range are passed in:

static int codePointCountImpl(char[] a, int offset, int count) {
            int endIndex = offset + count;
            int n = count;
            for (int i = offset; i < endIndex; ) {
                if (isHighSurrogate(a[i++]) && i < endIndex &&
                    isLowSurrogate(a[i])) {
                    n--;
                    i++;
                }
            }
                return n;
        }

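As can be seen, the loop checks whether each char is a high surrogate followed by a low surrogate. The count starts at the number of chars and is reduced by one for every such pair.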
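
The difference between length() and codePointCount() only shows up when the string contains a surrogate pair, as in this sketch (CodePointCountDemo and codePoints are illustrative names, and the pair D83D DE00 is just an example):

```java
final class CodePointCountDemo {
    /** Number of code points in the whole string. */
    static int codePoints(String s) {
        return s.codePointCount(0, s.length());
    }

    public static void main(String[] args) {
        String mixed = "A\uD83D\uDE00B";
        System.out.println(mixed.length());    // counts chars (code units)
        System.out.println(codePoints(mixed)); // counts code points
    }
}
```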

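3.6 Comparing two strings lexicographically

        String text1 = "ABCDEF";
        String text2 = "ABCDEFG";
        // Compare
        int compareNumber = text1.compareTo(text2);

This compares the char values (UTF-16 code units, which equal the ASCII codes for ASCII text) of the two strings one position at a time.

If the two chars are equal, the comparison moves on; otherwise the difference between the two char values is returned immediately.

If one string is a prefix of the other, the difference of the lengths is returned (-1 in the example above). If the two strings are exactly the same, 0 is returned.

Looking at the source code: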

 public int compareTo(String anotherString) {
            int len1 = value.length;
            int len2 = anotherString.value.length;
            int lim = Math.min(len1, len2);
            char v1[] = value;
            char v2[] = anotherString.value;

            int k = 0;
            while (k < lim) {
                char c1 = v1[k];
                char c2 = v2[k];
                if (c1 != c2) {
                    return c1 - c2;
                }
                k++;
            }
            return len1 - len2;
        }

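The method first takes the length of the String's own char array, then the length of the other String through anotherString.value.length. It compares chars up to the shorter of the two lengths and, if no difference is found, returns the difference of the lengths.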
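
The same logic can be rewritten against the public API to check the return values. This is a sketch for illustration, not the JDK implementation; CompareToDemo and compare are hypothetical names.

```java
final class CompareToDemo {
    /** Re-implementation of the compareTo logic shown above, using charAt. */
    static int compare(String a, String b) {
        int lim = Math.min(a.length(), b.length());
        for (int k = 0; k < lim; k++) {
            char c1 = a.charAt(k);
            char c2 = b.charAt(k);
            if (c1 != c2) {
                return c1 - c2; // first differing position decides
            }
        }
        return a.length() - b.length(); // one string is a prefix of the other
    }

    public static void main(String[] args) {
        System.out.println(compare("ABCDEF", "ABCDEFG"));
        System.out.println(compare("apple", "Apple"));
        System.out.println(compare("ABC", "ABC"));
    }
}
```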

3.7 Comparing two strings ignoring case

String text1 = "ABCDEF";
        String text2 = "ABCDEFG";
        // Compare
        int compareNumber = text1.compareToIgnoreCase(text2);

Looking at the source code:

public int compareToIgnoreCase(String str) {
            return CASE_INSENSITIVE_ORDER.compare(this, str);
        }

Here the compare method of CASE_INSENSITIVE_ORDER is used directly to compare the two strings.

CASE_INSENSITIVE_ORDER is a comparator defined in the String class
(available since version 1.2):
public static final Comparator<String> CASE_INSENSITIVE_ORDER
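
Because CASE_INSENSITIVE_ORDER is a public Comparator, it can also be passed to sorting methods. A minimal sketch, with IgnoreCaseDemo and sortIgnoringCase as invented names:

```java
import java.util.Arrays;

final class IgnoreCaseDemo {
    /** Sorts a copy of the words ignoring case and joins them with commas. */
    static String sortIgnoringCase(String... words) {
        String[] copy = words.clone();
        Arrays.sort(copy, String.CASE_INSENSITIVE_ORDER);
        return String.join(",", copy);
    }

    public static void main(String[] args) {
        System.out.println(sortIgnoringCase("banana", "apple", "Cherry"));
        System.out.println("abcdef".compareToIgnoreCase("ABCDEF"));
    }
}
```

With the natural ordering, "Cherry" would sort first because uppercase letters have smaller char values than lowercase ones.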

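3.8 Concatenating two strings into one

 String text1 = "ABCDEF";
        String text2 = "ABCDEFG";
        // Append text2 to the end of text1
        String newString  = text1.concat(text2);

Two strings can also be joined with text1+text2; it is just slightly less suitable in terms of performance.

Looking at the source code: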

public String concat(String str) {
            int otherLen = str.length();
            if (otherLen == 0) {
                return this;
            }
            int len = value.length;
            char buf[] = Arrays.copyOf(value, len + otherLen);
            str.getChars(buf, len);
            return new String(buf, true);
        }

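Step one checks the argument: if the string to append is "", the result is the same as the original, so the method returns this directly.
Step two uses Arrays.copyOf to create a new array of the required size: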

 public static char[] copyOf(char[] original, int newLength) {
                char[] copy = new char[newLength];
                System.arraycopy(original, 0, copy, 0,
                                 Math.min(original.length, newLength));
                return copy;
            }
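            // A new array of the new length is created and the original chars are copied into it

Step three uses String's getChars method to copy the chars of the second string in after those of the first
Step four constructs the new string and returns it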
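
Step one has an observable consequence: concatenating the empty string returns the very same object. The sketch below checks this; ConcatMethodDemo and its helpers are illustrative names.

```java
final class ConcatMethodDemo {
    /** Thin wrapper around String.concat. */
    static String join(String a, String b) {
        return a.concat(b);
    }

    /** True when concat("") hands back the receiver itself. */
    static boolean returnsSameObjectForEmpty(String s) {
        return s.concat("") == s; // reference comparison on purpose
    }

    public static void main(String[] args) {
        System.out.println(join("ABCDEF", "ABCDEFG"));
        System.out.println(returnsSameObjectForEmpty("ABCDEF"));
    }
}
```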

3.9 Finding the first occurrence of a character or substring in a string

 String text = "ABCD ";
        // Find the first occurrence of "A" in the string text
        int index = text.indexOf("A");

Looking at the source code:

 public int indexOf(String str) {
            return indexOf(str, 0);
        }

        // Now look at the indexOf(String str, int fromIndex) method

 public int indexOf(String str, int fromIndex) {
         return indexOf(value, 0, value.length,
                    str.value, 0, str.value.length, fromIndex);
        }

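The indexOf(String str) method above calls indexOf(String str,int fromIndex) and passes 0, so the search starts from the beginning of the String.

indexOf(String str,int fromIndex) in turn calls the many-argument indexOf, passing
            value: the char array of the parent string
            0: the start index of the valid chars in the parent string; 0 means the whole parent string can be searched
            value.length: the length of the parent string's char array
            str.value: the char array of the substring
            0: the start index of the valid chars in the substring
str.value.length: the length of the substring's char array
fromIndex: the position in the parent string at which the search starts

Now the many-argument method: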

static int indexOf(char[] source, int sourceOffset, int sourceCount,
                char[] target, int targetOffset, int targetCount,
                int fromIndex) {
            if (fromIndex >= sourceCount) {
                return (targetCount == 0 ? sourceCount : -1);
            }
            if (fromIndex < 0) {
                fromIndex = 0;
            }
            if (targetCount == 0) {
                return fromIndex;
            }

            char first = target[targetOffset];
            int max = sourceOffset + (sourceCount - targetCount);

            for (int i = sourceOffset + fromIndex; i <= max; i++) {
                /* Look for first character. */
                if (source[i] != first) {
                    while (++i <= max && source[i] != first);
                }

                /* Found first character, now look at the rest of v2 */
                if (i <= max) {
                    int j = i + 1;
                    int end = j + targetCount - 1;
                    for (int k = targetOffset + 1; j < end && source[j]
                            == target[k]; j++, k++);

                    if (j == end) {
                        /* Found whole string. */
                        return i - sourceOffset;
                    }
                }
            }
            return -1;
        }

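This method takes the most parameters, which tells us the real work happens here.
First the parameters:
char[] source, the char array of the parent String
int sourceOffset, the start index of the used part of the parent String
int sourceCount, the length of the parent String's char array
char[] target, the char array of the child String
int targetOffset, the start index of the used part of the child String
int targetCount, the length of the child String's char array
int fromIndex the position at which the search starts

From top to bottom: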
if (fromIndex >= sourceCount) {
return (targetCount == 0 ? sourceCount : -1);
}
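If the start position of the search is greater than or equal to the length of the parent String,
the method stops and returns either sourceCount or -1.
When the length of the child string is 0, that is, the child is the empty string "", sourceCount (the length of the parent) is returned.
When the length of the child string is not 0, -1 is returned, meaning nothing was found.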

if (fromIndex < 0) {
fromIndex = 0;
}
When the search start position is less than 0, it is reset to 0

if (targetCount == 0) {
return fromIndex;
}
When the child String's char array has length 0, the (adjusted) search position is returned

char first = target[targetOffset];
Take the first char of the child String

int max = sourceOffset + (sourceCount - targetCount);
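Start index of the parent String + (length of the parent's char array - length of the child's char array), i.e. the last index at which a match can still begin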

for (int i = sourceOffset + fromIndex; i <= max; i++) {
    ...
}
If nothing inside the loop returns, no match was found and -1 is returned

Inside the loop:
/* Look for first character. */
if (source[i] != first) {
while (++i <= max && source[i] != first);
}
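The snippet above left me a little speechless at first: if the current char is not the first char of the target, the inner while simply skips ahead to the next position where it is (or past max).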
/* Found first character, now look at the rest of v2 */
if (i <= max) {
int j = i + 1;
int end = j + targetCount - 1;
for (int k = targetOffset + 1; j < end && source[j]
== target[k]; j++, k++);

if (j == end) {
/* Found whole string. */
return i - sourceOffset;
}
}
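
The edge cases walked through above can be reproduced with a simplified version of the search written against the public API. This is a sketch only: it drops the offset parameters and uses regionMatches instead of the hand-written inner loops, and IndexOfDemo and naiveIndexOf are made-up names.

```java
final class IndexOfDemo {
    /** Simplified search with the same edge-case handling as the JDK method above. */
    static int naiveIndexOf(String source, String target, int fromIndex) {
        int sourceCount = source.length();
        int targetCount = target.length();
        if (fromIndex >= sourceCount) {
            return targetCount == 0 ? sourceCount : -1;
        }
        if (fromIndex < 0) {
            fromIndex = 0;
        }
        if (targetCount == 0) {
            return fromIndex;
        }
        int max = sourceCount - targetCount; // last index where a match can start
        for (int i = fromIndex; i <= max; i++) {
            if (source.regionMatches(i, target, 0, targetCount)) {
                return i;
            }
        }
        return -1;
    }

    public static void main(String[] args) {
        System.out.println(naiveIndexOf("ABCD ", "A", 0));
        System.out.println(naiveIndexOf("abc", "", 10)); // empty target past the end
        System.out.println(naiveIndexOf("abc", "c", -5)); // negative start is clamped to 0
    }
}
```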

3.10 Checking whether a string contains a given substring

String text = "ABCE";
        // Check whether text contains "A"
        boolean isContains = text.contains("A");

Looking at the source code:

public boolean contains(CharSequence s) {
            return indexOf(s.toString()) > -1;
        }

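As can be seen, String's indexOf does the real work. indexOf(String str) is called here; when the substring is found it returns its index in the string, and since indexes start at 0 the result is always greater than -1, so true is returned.
If it is not found, indexOf returns -1 and contains returns false. If the empty string "" is passed in, indexOf returns 0 and contains returns true.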
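
The three outcomes just described, written out as a small sketch (ContainsDemo and viaIndexOf are illustrative names; viaIndexOf mirrors the one-line JDK body shown above):

```java
final class ContainsDemo {
    /** Same expression as the body of String.contains. */
    static boolean viaIndexOf(String s, CharSequence part) {
        return s.indexOf(part.toString()) > -1;
    }

    public static void main(String[] args) {
        System.out.println(viaIndexOf("ABCE", "A"));
        System.out.println(viaIndexOf("ABCE", "Z"));
        System.out.println(viaIndexOf("ABCE", "")); // the empty string is always contained
    }
}
```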

3.11 Comparing a string with a given CharSequence

(Updated 2016-9-22 18:43)

 String text1 = "ABCDEF";
        // Compare the sequences; false here, since the contents differ
        boolean isSame = text1.contentEquals("AABCDEF");

Looking at the source code:

        public boolean contentEquals(StringBuffer sb) {
            return contentEquals((CharSequence)sb);
        }
        public boolean contentEquals(CharSequence cs) {
            // Argument is a StringBuffer, StringBuilder
            if (cs instanceof AbstractStringBuilder) {
                if (cs instanceof StringBuffer) {
                    synchronized(cs) {
                       return nonSyncContentEquals((AbstractStringBuilder)cs);
                    }
                } else {
                    return nonSyncContentEquals((AbstractStringBuilder)cs);
                }
            }
            // Argument is a String
            if (cs instanceof String) {
                return equals(cs);
            }
            // Argument is a generic CharSequence
            char v1[] = value;
            int n = v1.length;
            if (n != cs.length()) {
                return false;
            }
            for (int i = 0; i < n; i++) {
                if (v1[i] != cs.charAt(i)) {
                    return false;
                }
            }
            return true;
        }

There are two overloads here that differ in parameter type: CharSequence and StringBuffer.

The one that does the actual work is contentEquals(CharSequence cs).

In that method:

if (cs instanceof AbstractStringBuilder) {
                     if (cs instanceof StringBuffer) {
                          synchronized(cs) {
                            return nonSyncContentEquals((AbstractStringBuilder)cs);
                       }
            } else {
                 return nonSyncContentEquals((AbstractStringBuilder)cs);
          }
    }

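As can be seen, when the argument cs is an instance of AbstractStringBuilder the body of the if runs, and if it is additionally a StringBuffer the comparison is done while holding its lock.

In the nonSyncContentEquals method: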

private boolean nonSyncContentEquals(AbstractStringBuilder sb) {
                    char v1[] = value;
                    char v2[] = sb.getValue();
                    int n = v1.length;
                    if (n != sb.length()) {
                        return false;
                    }
                    for (int i = 0; i < n; i++) {
                        if (v1[i] != v2[i]) {
                            return false;
                        }
                    }
                    return true;
                }

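This method is almost disappointingly simple. It first compares the length of its own char array with the length of the argument; if the lengths differ it returns false straight away, because the two character sequences cannot be the same.
If the lengths are equal, a loop compares the chars one by one.

If the first if is not satisfied, execution continues below: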

// Argument is a String
                if (cs instanceof String) {
                    return equals(cs);
                }

If the argument is an instance of String,
String's equals method is called to do the comparison.

If that condition is not met either, execution continues:

 // Argument is a generic CharSequence
                    char v1[] = value;
                    int n = v1.length;
                    if (n != cs.length()) {
                        return false;
                    }

This step compares the lengths directly; if they differ, the character sequences cannot be the same.
If the lengths are equal:

   for (int i = 0; i < n; i++) {
                        if (v1[i] != cs.charAt(i)) {
                            return false;
                        }
                    }

Continuing down, each char is compared, and false is returned on the first difference.
If no difference is found, true is returned at the end.
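
Where contentEquals differs from equals is the argument type: equals is only ever true for another String. A short sketch (ContentEqualsDemo and its helpers are invented names):

```java
final class ContentEqualsDemo {
    /** Compares the characters of s with those of any CharSequence. */
    static boolean sameContent(String s, CharSequence other) {
        return s.contentEquals(other);
    }

    /** equals() with a non-String argument, for contrast. */
    static boolean equalsBuilder(String s, StringBuilder other) {
        Object o = other;
        return s.equals(o);
    }

    public static void main(String[] args) {
        System.out.println(sameContent("ABCDEF", new StringBuilder("ABCDEF")));
        System.out.println(sameContent("ABCDEF", new StringBuffer("ABCDEF")));
        System.out.println(sameContent("ABCDEF", "AABCDEF"));
        System.out.println(equalsBuilder("ABCDEF", new StringBuilder("ABCDEF")));
    }
}
```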

3.12 The underwhelming copyValueOf method

(Updated 2016-9-22 18:43)

            
        String string = "4444";
        
        char[] charTest = new char[]{'d','r','w','q'};
        
        String newString = string.copyValueOf(charTest);

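Surprisingly, the result is: drwq (copyValueOf is a static method, so the content of string, "4444", plays no part)

I could hardly bring myself to read the source, but it has to be done, and it leaves me even more speechless: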

         /**
         * Equivalent to {@link #valueOf(char[])}.
         *
         * @param   data   the character array.
         * @return  a {@code String} that contains the characters of the
         *          character array.
         */
        public static String copyValueOf(char data[]) {
            return new String(data);
        }

Well, it turns out it just calls one of String's constructors to build a new String.
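
Since the method is static, the clearer way to call it is through the class name. A minimal sketch, with CopyValueOfDemo and build as illustrative names:

```java
final class CopyValueOfDemo {
    /** Calls copyValueOf through the class, since no receiver is involved. */
    static String build(char[] data) {
        return String.copyValueOf(data);
    }

    public static void main(String[] args) {
        char[] charTest = {'d', 'r', 'w', 'q'};
        System.out.println(build(charTest));
        System.out.println(build(charTest).equals(String.valueOf(charTest)));
    }
}
```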

3.13 Checking whether a string starts with a given prefix

(Updated 2016-9-23 8:33)

 String string = "andThisIs4";
        // Check whether it starts with a
        boolean flag = string.startsWith("a");
        // The result is true

Looking at the source code:

        public boolean startsWith(String prefix) {
            return startsWith(prefix, 0);
        }

This calls the overload,
which takes two arguments:
String string = "thisIsAndAnd";
// Check whether the string starts with this when read from index 2
boolean flag = string.startsWith("this",2);
// The result is false

public boolean startsWith(String prefix, int toffset) {
            char ta[] = value;
            int to = toffset;
            char pa[] = prefix.value;
            int po = 0;
            int pc = prefix.value.length;
            // Note: toffset might be near -1>>>1.
            if ((toffset < 0) || (toffset > value.length - pc)) {
                return false;
            }
            while (--pc >= 0) {
                if (ta[to++] != pa[po++]) {
                    return false;
                }
            }
            return true;
        }

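From top to bottom:
// The char array of the parent String
char ta[] = value;
// The position in the parent String at which the comparison starts
int to = toffset;
// The char array of the child String
char pa[] = prefix.value;
// Index into the child's array
int po = 0;
// The length of the child String's char array
int pc = prefix.value.length;
// If the start position is less than 0, return false
// If fewer than pc chars remain after the start position, return false as well
if ((toffset < 0) || (toffset > value.length - pc)) {
return false;
}
// Compare char by char
while (--pc >= 0) {
if (ta[to++] != pa[po++]) {
return false;
}
}
return true;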
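
A few calls that exercise each branch of the method, as a sketch (StartsWithDemo and startsAt are made-up names):

```java
final class StartsWithDemo {
    /** Thin wrapper around the two-argument startsWith. */
    static boolean startsAt(String s, String prefix, int offset) {
        return s.startsWith(prefix, offset);
    }

    public static void main(String[] args) {
        String string = "thisIsAndAnd";
        System.out.println(startsAt(string, "this", 2));  // chars at index 2 are "is..."
        System.out.println(startsAt(string, "is", 2));
        System.out.println(startsAt(string, "this", -1)); // negative offset: false, no exception
        System.out.println(startsAt(string, "And", 10));  // not enough chars left
    }
}
```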

3.14 Checking whether a string ends with a given suffix

(Updated 2016-9-23 8:33)

String string = "andrThisIs4";
        // Check whether it ends with 4
        boolean flag = string.endsWith("4");
        // The result is true

Looking at the source code:

public boolean endsWith(String suffix) {
                return startsWith(suffix, value.length - suffix.value.length);
            }

Hard to look at: it simply calls startsWith, with the offset set to where the suffix would have to begin.
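
The delegation also explains why a suffix longer than the string is handled without an exception: the computed offset is negative, and startsWith returns false for negative offsets. A sketch with the invented names EndsWithDemo and viaStartsWith:

```java
final class EndsWithDemo {
    /** Same expression as the body of String.endsWith, via the public API. */
    static boolean viaStartsWith(String s, String suffix) {
        return s.startsWith(suffix, s.length() - suffix.length());
    }

    public static void main(String[] args) {
        System.out.println(viaStartsWith("andrThisIs4", "4"));
        System.out.println(viaStartsWith("4", "andrThisIs4")); // offset is negative here
    }
}
```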

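3.15 Comparing the contents of two strings for equality

(Updated 2016-9-23 8:33)

        String text1 = "ABCD";
        String text2 = "ABCE";
        // Check whether the two strings are equal
        boolean flag = text1.equals(text2);

text1 and text2 point to two separate String objects in memory.
Using text1 == text2 compares the references held by the two variables, that is, whether they point to the same memory address.

Now the equals method in the source code: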

public boolean equals(Object anObject) {
             if (this == anObject) {
                    return true;
                }
            if (anObject instanceof String) {
                String anotherString = (String)anObject;
                int n = value.length;
                if (n == anotherString.value.length) {
                    char v1[] = value;
                    char v2[] = anotherString.value;
                    int i = 0;
                    while (n-- != 0) {
                        if (v1[i] != v2[i])
                            return false;
                        i++;
                    }
                    return true;
                }
            }
            return false;
        }

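The parameter is declared as Object.

The method first checks whether the two variables point to the same object in memory; if so, it returns true immediately.
With String text1 = "ABC"; space is allocated for the literal and its reference is assigned to text1.
With String text2 = "ABC"; the virtual machine first looks in the string constant pool for an entry with the same content and, if there is one, points to it. Because text2 has the same content as text1, the reference of text2 is pointed at the same object that text1 refers to.

In the next step, if the argument is a String instance it is cast to String,
and if the lengths are equal a loop compares the chars one by one; otherwise false is returned.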
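
The difference between == and equals, and the sharing of identical literals, can be observed directly. A minimal sketch; EqualityDemo and sameReference are illustrative names.

```java
final class EqualityDemo {
    /** Reference comparison, i.e. what == does for objects. */
    static boolean sameReference(String a, String b) {
        return a == b;
    }

    public static void main(String[] args) {
        String text1 = "ABC";
        String text2 = "ABC";             // same literal, same pooled object
        String text3 = new String("ABC"); // new always creates a distinct object
        System.out.println(sameReference(text1, text2));
        System.out.println(sameReference(text1, text3));
        System.out.println(text1.equals(text3));
        System.out.println(sameReference(text1, text3.intern())); // intern() returns the pooled object
    }
}
```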

3.16 Getting the bytes of a String in the default charset

(Updated 2016-9-27 14:33)

        String text1 = "ABCDEF";
        // Get the byte array in the default charset
        byte[] bytes = text1.getBytes();

Looking at the source code:

        public byte[] getBytes() {
            return StringCoding.encode(value, 0, value.length);
        }

This delegates to the static encode method of the StringCoding class:

        static byte[] encode(char[] ca, int off, int len) {
            String csn = Charset.defaultCharset().name();
            try {
                // use charset name encode() variant which provides caching.
                return encode(csn, ca, off, len);
            } catch (UnsupportedEncodingException x) {
                warnUnsupportedCharset(csn);
            }
            try {
                return encode("ISO-8859-1", ca, off, len);
            } catch (UnsupportedEncodingException x) {
                // If this code is hit during VM initialization, MessageUtils is
                // the only way we will be able to get any kind of error message.
                MessageUtils.err("ISO-8859-1 charset not available: "
                                 + x.toString());
                // If we can not find ISO-8859-1 (a required encoding) then things
                // are seriously wrong with the installation.
                System.exit(1);
                return null;
            }
        }

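The method first obtains the name of the default charset through String csn = Charset.defaultCharset().name(); (on Android this is "UTF-8"),
then calls the overload to do the encoding.
If the charset is not supported, it falls back to encoding with ISO-8859-1, and if that fails as well the process exits.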

3.17 Getting the bytes of a String in a given charset

(Updated 2016-9-27 14:33)

        String text1 = "abef";
        byte[] bytes = null;
        
        try {
            bytes = text1.getBytes("UTF-8");
        } catch (UnsupportedEncodingException e) {
            e.printStackTrace();
        }

This method is an overload of the one that returns the bytes in the default charset.

Looking at the source code:

        public byte[] getBytes(String charsetName)
            throws UnsupportedEncodingException {
                if (charsetName == null) throw new NullPointerException();
                return StringCoding.encode(charsetName, value, 0, value.length);
        }

Here the four-argument overload of StringCoding.encode is called directly (getBytes() calls the three-argument encode).
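
How many bytes come back depends on the charset. The sketch below uses the getBytes(Charset) overload with StandardCharsets constants, which is a different overload from the one analysed above and needs no try/catch; GetBytesDemo and its helpers are invented names, and the sample characters are arbitrary.

```java
import java.nio.charset.StandardCharsets;

final class GetBytesDemo {
    /** Number of bytes the string occupies in UTF-8. */
    static int utf8Length(String s) {
        return s.getBytes(StandardCharsets.UTF_8).length;
    }

    /** Number of bytes the string occupies in UTF-16 (big-endian, no byte order mark). */
    static int utf16Length(String s) {
        return s.getBytes(StandardCharsets.UTF_16BE).length;
    }

    public static void main(String[] args) {
        System.out.println(utf8Length("ABCDEF"));  // ASCII: one byte per char
        System.out.println(utf8Length("\u4E2D"));  // a CJK character
        System.out.println(utf16Length("\u4E2D"));
    }
}
```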

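3.18 Copying part of a String into a given char array

(Updated 2016-9-27 14:33)

        String text1  ="ABCDEF";
        // Destination char array
        char[] chars = new char[10];
        // Copy
        text1.getChars(0,text1.length(),chars,0);
        // Arguments 1 and 2: the range of chars in the String to copy (begin inclusive, end exclusive)
        // Argument 3: the destination char array
        // Argument 4: the start position in the destination array

Looking at the source code: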

public void getChars(int srcBegin, int srcEnd, char dst[], int dstBegin) {
            if (srcBegin < 0) {
                throw new StringIndexOutOfBoundsException(srcBegin);
            }
            if (srcEnd > value.length) {
                throw new StringIndexOutOfBoundsException(srcEnd);
            }
            if (srcBegin > srcEnd) {
                throw new StringIndexOutOfBoundsException(srcEnd - srcBegin);
            }
            System.arraycopy(value, srcBegin, dst, dstBegin, srcEnd - srcBegin);
        }

It calls System's array copy method directly to copy the data:

        public static native void arraycopy(Object src,  int  srcPos,
                                        Object dest, int destPos,
                                        int length);

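System.arraycopy is a native method, so the actual work is done at the JNI layer.

In the call to System.arraycopy above, the arguments are:
Argument 1 (value): the char array backing the source String
Argument 2 (srcBegin): the position in the source String where copying starts
Argument 3 (dst): the destination array
Argument 4 (dstBegin): the position in the destination array where the copied data is stored
Argument 5 (srcEnd - srcBegin): the number of characters taken from the source String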
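A small sketch of how the four getChars arguments interact, using a partial range and a non-zero destination offset. copyRange is a hypothetical helper written for this example only.

```java
public class GetCharsDemo {

    // Hypothetical helper: copies src[srcBegin, srcEnd) into a new
    // array of dstSize chars, starting at index dstBegin.
    static char[] copyRange(String src, int srcBegin, int srcEnd, int dstSize, int dstBegin) {
        char[] dst = new char[dstSize];
        src.getChars(srcBegin, srcEnd, dst, dstBegin);
        return dst;
    }

    public static void main(String[] args) {
        // Copies 'B', 'C', 'D' (indices 1..3) into chars[2..4].
        char[] chars = copyRange("ABCDEF", 1, 4, 10, 2);
        System.out.println(new String(chars, 2, 3));  // BCD
        // Untouched slots keep the default char value.
        System.out.println(chars[0] == '\u0000');     // true

        try {
            // srcEnd is larger than the string length, so the range check fails.
            copyRange("ABCDEF", 0, 7, 10, 0);
        } catch (IndexOutOfBoundsException e) {
            System.out.println("srcEnd is past the end of the string");
        }
    }
}
```

Note that srcEnd is exclusive, which is why the length passed to arraycopy is simply srcEnd - srcBegin.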

3.19 Converting other data types to String (the static String.valueOf() methods)

(Updated 2016-10-8 14:00)

3.19.1

    // convert a boolean to a String
    String newString = String.valueOf(true);

The source code reads:

    public static String valueOf(boolean b) {
        return b ? "true" : "false";
    }

That is all the source does; I have to admit it made me smile.

3.19.2

    // convert a char to a String

    char textChar = 'd';

    String newString = String.valueOf(textChar);

The source code reads:

    public static String valueOf(char c) {
        char data[] = {c};
        return new String(data, true);
    }

valueOf builds a one-element char array and then uses a String constructor to turn that array into a new String. The constructor it uses here, though, left me a little speechless:

   /*
    * Package private constructor which shares value array for speed.
    * this constructor is always expected to be called with share==true.
    * a separate constructor is needed because we already have a public
    * String(char[]) constructor that makes a copy of the given char[].
    */
    String(char[] value, boolean share) {
        // assert share : "unshared not supported";
        this.value = value;
    }

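Note that this constructor is package-private (not private) and that its second parameter is never used: it exists only to give the constructor a signature different from the public String(char[]), which copies the array. Here the array is shared without copying, which is safe because valueOf created the array itself and nothing else holds a reference to it.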
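The package-private constructor cannot be called from application code, but the copying behaviour of the public String(char[]) constructor is easy to observe. The sketch below (buildThenMutate is a hypothetical helper) changes the source array after constructing the String and shows that the String is unaffected.

```java
public class CharArrayCopyDemo {

    // Hypothetical helper: builds a String with the public constructor,
    // then overwrites the first element of the source array.
    static String buildThenMutate(char[] data) {
        String s = new String(data);
        data[0] = 'x';
        return s;
    }

    public static void main(String[] args) {
        char[] data = {'d', 'e'};
        // The String keeps its own copy, so the later write does not show up.
        System.out.println(buildThenMutate(data));  // de
        System.out.println(data[0]);                // x
        // valueOf(char) yields a one-character String.
        System.out.println(String.valueOf('d'));    // d
    }
}
```

This is why the copying constructor has to be the public one: if callers could share an array with a String, they could break its immutability.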

3.19.3

    // convert an int to a String
    int textInt = 2345;
    String newString = String.valueOf(textInt);

From the source code:

    public static String valueOf(int i) {
        return Integer.toString(i);
    }

So it is really the Integer class that does the work,
and from this we can infer the following.

3.19.4

Converting a float to a String is actually done by Float.toString(f).
The source code reads:

    public static String valueOf(float f) {
        return Float.toString(f);
    }
    // convert a double to a String
    public static String valueOf(double d) {
        return Double.toString(d);
    }
    // convert a long to a String
    public static String valueOf(long l) {
        return Long.toString(l);
    }

In other words, for each primitive type it is the toString method of the corresponding wrapper class that does the actual work.
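A quick way to check the delegation is to compare the results side by side. The joinAll helper below is hypothetical and exists only to keep the example compact.

```java
public class ValueOfDemo {

    // Hypothetical helper: converts one value of each primitive type
    // discussed above and joins the results with commas.
    static String joinAll(boolean b, int i, float f, double d, long l) {
        return String.valueOf(b) + "," + String.valueOf(i) + ","
                + String.valueOf(f) + "," + String.valueOf(d) + ","
                + String.valueOf(l);
    }

    public static void main(String[] args) {
        System.out.println(joinAll(true, 2345, 1.5f, 2.0d, 10L));  // true,2345,1.5,2.0,10
        // valueOf and the wrapper's toString produce equal strings.
        System.out.println(String.valueOf(2345).equals(Integer.toString(2345)));  // true
        System.out.println(String.valueOf(1.5f).equals(Float.toString(1.5f)));    // true
    }
}
```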

3.20 Removing leading and trailing whitespace (the trim() method)

(Updated 2016-10-9 14:00)

    String text = "  abce  ";
    // remove the whitespace
    String newString = text.trim();
    // produces the new string "abce"

In the source code:

    public String trim() {
        int len = value.length;
        int st = 0;
        char[] val = value;    /* avoid getfield opcode */

        while ((st < len) && (val[st] <= ' ')) {
            st++;
        }
        while ((st < len) && (val[len - 1] <= ' ')) {
            len--;
        }
        return ((st > 0) || (len < value.length)) ? substring(st, len) : this;
    }

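The basic idea is to keep two index markers, int st and int len.

Two loops then move st forward past the leading whitespace and pull len back past the trailing whitespace (any character less than or equal to ' ' counts, so tabs and newlines are trimmed too), and finally the return expression decides what to do.

If st is 0 and len equals value.length, the string has no whitespace at its head or tail and the string itself is returned. Otherwise, st > 0 means there is leading whitespace and len < value.length means there is trailing whitespace, so substring is called to cut the string down.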
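Three consequences of this logic can be checked directly: control characters are trimmed along with spaces, a string with nothing to trim comes back as the same object, and characters above ' ' (such as the full-width space U+3000) are left alone. trimReturnsSame is a hypothetical helper for the identity check.

```java
public class TrimDemo {

    // Hypothetical helper: true when trim() returns the very same object.
    static boolean trimReturnsSame(String s) {
        return s.trim() == s;
    }

    public static void main(String[] args) {
        // Tab and newline are both <= ' ', so they are removed as well.
        String padded = "\t  abce \n";
        System.out.println("[" + padded.trim() + "]");  // [abce]

        // Nothing to trim: the original instance is returned, no new String.
        System.out.println(trimReturnsSame("abce"));    // true

        // U+3000 (ideographic space) is greater than ' ', so it stays.
        String wide = "\u3000abce";
        System.out.println(wide.trim().length());       // 5
    }
}
```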

3.21 Extracting the substring in a given index range (the substring() method)

(Updated 2016-10-10 18:00)

    String text = "dfdfabcef";
    // extract "dfdf"
    String newString = text.substring(0,4);

From the source code:

    public String substring(int beginIndex, int endIndex) {
        if (beginIndex < 0) {
            throw new StringIndexOutOfBoundsException(beginIndex);
        }
        if (endIndex > value.length) {
            throw new StringIndexOutOfBoundsException(endIndex);
        }
        int subLen = endIndex - beginIndex;
        if (subLen < 0) {
            throw new StringIndexOutOfBoundsException(subLen);
        }
        return ((beginIndex == 0) && (endIndex == value.length)) ? this
                : new String(value, beginIndex, subLen);
    }

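3.21.1 The method substring(int beginIndex, int endIndex) takes beginIndex (where the substring starts) and endIndex (where the substring ends).

3.21.2 Inside the method, both arguments are range-checked first: beginIndex can be no smaller than 0, since there are no negative positions, and endIndex can be no larger than the length of the source string (value.length, where value is the char array backing the source String), since you cannot cut past the end of the source string.

3.21.3 The length of the substring is then int length = endIndex - beginIndex; in other words, the character at beginIndex is included and the character at endIndex is excluded.

3.21.4 If that length is less than 0, the start and end positions passed in do not make sense, and an exception is thrown.

3.21.5 Finally, unless the range covers the whole string (in which case this is returned), a new String is built with the three-argument String constructor and returned.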

There is also a single-argument overload:

    String text = "abcdefg";
    // extract "bcdefg"
    String newText = text.substring(1);

From the source code:

    public String substring(int beginIndex) {
        if (beginIndex < 0) {
            throw new StringIndexOutOfBoundsException(beginIndex);
        }
        int subLen = value.length - beginIndex;
        if (subLen < 0) {
            throw new StringIndexOutOfBoundsException(subLen);
        }
        return (beginIndex == 0) ? this : new String(value, beginIndex, subLen);
    }

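As you can see, both substring(int beginIndex, int endIndex) and substring(int beginIndex)
ultimately rely on the same three-argument String constructor, String(char value[], int offset, int count),
which simply takes a fixed number of characters from a char array, starting at the given position, and builds a new String from them.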
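The boundary behaviour described in this section can be exercised with a few calls. isValidRange is a hypothetical helper that reports whether substring accepted the range.

```java
public class SubstringDemo {

    // Hypothetical helper: false when substring rejects the range.
    static boolean isValidRange(String s, int begin, int end) {
        try {
            s.substring(begin, end);
            return true;
        } catch (IndexOutOfBoundsException e) {
            return false;
        }
    }

    public static void main(String[] args) {
        String text = "dfdfabcef";
        // beginIndex is included, endIndex is excluded.
        System.out.println(text.substring(0, 4));                      // dfdf
        System.out.println(text.substring(4));                         // abcef
        // A range covering the whole string returns the same object.
        System.out.println(text.substring(0, text.length()) == text);  // true
        // Equal indices give an empty string rather than an error.
        System.out.println(text.substring(4, 4).isEmpty());            // true
        // begin after end, or end past the length, is rejected.
        System.out.println(isValidRange(text, 5, 2));                  // false
        System.out.println(isValidRange(text, 0, 10));                 // false
    }
}
```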

To be continued...

posted @ 2017-08-20 20:18 lytwajue
