Java String, StringBuilder & StringBuffer

Strings are an important part of almost every application we write. We constantly manipulate string data such as text files and structured or unstructured user input. If you have ever learned C or C++, you know that dealing with strings there is complex work.

For example, in C/C++,

  • the coder must deal with encodings: C/C++ provide char and wchar_t, which represent different kinds of encodings
  • C only supports strings as char or wchar_t arrays, while the C++ standard library provides the string and wstring classes

C++ significantly improves string handling compared with C, but it is still not as easy to use as Java.

What is Java String

Java String is an immutable final class that provides many kinds of string operations. Because String is immutable,

  • operations such as trim, split, and concat create a new String instead of changing the existing one, which can have a performance impact
  • String is a thread safe class
  • String instances can be cached and shared
  • encoding handling is easy
  • it contains operations such as,

    • get length
    • trim
    • check whether it is empty
    • find
    • check for a prefix or suffix
    • substring
    • regex
    • replace
    • case conversion
    • conversion from other data types
    • etc...
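
To make the immutability point concrete, here is a minimal sketch. The class name StringImmutabilityDemo and its helper methods are made up for illustration. It shows that the "modifying" methods return new String objects and leave the original untouched, followed by a few of the operations listed above.

```java
class StringImmutabilityDemo {
    // Calls "modifying" methods but ignores what they return.
    static String ignoreResults(String s) {
        s.trim();            // returns a new String; s itself is untouched
        s.replace('l', 'L'); // same here
        return s;
    }

    // Uses the new String objects that each operation returns.
    static String useResults(String s) {
        return s.trim().replace('l', 'L');
    }

    public static void main(String[] args) {
        String text = "  hello world  ";
        System.out.println("[" + ignoreResults(text) + "]"); // [  hello world  ]
        System.out.println("[" + useResults(text) + "]");    // [heLLo worLd]

        // A few of the operations from the list
        String s = "hello world";
        System.out.println(s.length());            // 11
        System.out.println(s.isEmpty());           // false
        System.out.println(s.indexOf("world"));    // 6
        System.out.println(s.startsWith("hello")); // true
        System.out.println(s.substring(0, 5));     // hello
        System.out.println(s.matches("[a-z ]+"));  // true
        System.out.println(String.valueOf(42));    // 42
    }
}
```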

There are some differences in the String class between Java 8 and earlier versions and Java 9 and later versions.

Java 8 and Before

  • a char array holds the String data
  • a char is 16 bits wide, so the String is encoded as UTF-16
  • getting the String length is easy: it is just the length of the char array
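
One consequence of the UTF-16 representation is that length() counts 16-bit code units rather than user-visible characters. The sketch below illustrates this; the class Utf16LengthDemo and its helpers are hypothetical names. A character outside the Basic Multilingual Plane occupies two char values.

```java
class Utf16LengthDemo {
    // Number of UTF-16 code units, which is what String.length() reports.
    static int codeUnits(String s) {
        return s.length();
    }

    // Number of Unicode code points in the string.
    static int codePoints(String s) {
        return s.codePointCount(0, s.length());
    }

    public static void main(String[] args) {
        String ascii = "abc";
        // U+1F600 is outside the Basic Multilingual Plane,
        // so UTF-16 stores it as a surrogate pair (two chars).
        String emoji = new String(Character.toChars(0x1F600));

        System.out.println(codeUnits(ascii));  // 3
        System.out.println(codeUnits(emoji));  // 2
        System.out.println(codePoints(emoji)); // 1
    }
}
```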

Java 9 and Afterwards

  • a byte array holds the String data
  • compact Strings are supported
  • the encoding depends on whether COMPACT_STRINGS is enabled and whether the string can be encoded as Latin-1

    • if COMPACT_STRINGS is not enabled, the string is always encoded as UTF-16
    • if COMPACT_STRINGS is enabled and the string can be encoded as Latin-1, the byte data is encoded as Latin-1
    • otherwise, the string is encoded as UTF-16
  • new methods such as isBlank, strip and repeat were added (in Java 11)
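
The newer methods can be tried with a short sketch like the following, which assumes a JDK 11 or later runtime. The class Java11StringMethodsDemo and the underline helper are invented for this example. Note that strip removes Unicode whitespace such as EM SPACE (U+2003), while trim only removes characters up to U+0020.

```java
class Java11StringMethodsDemo {
    // Hypothetical helper: strips the title and underlines it with dashes.
    static String underline(String title) {
        String clean = title.strip();
        return clean + "\n" + "-".repeat(clean.length());
    }

    public static void main(String[] args) {
        System.out.println("   ".isEmpty()); // false: it contains three spaces
        System.out.println("   ".isBlank()); // true: only whitespace

        String padded = "\u2003hi\u2003";            // EM SPACE on both sides
        System.out.println(padded.trim().length());  // 4: trim keeps U+2003
        System.out.println(padded.strip().length()); // 2: strip removes it

        System.out.println("ab".repeat(3));          // ababab
        System.out.println(underline("  Java  "));
    }
}
```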

String Cache

According to research, about 25% of an application's data are strings, and about half of those strings are duplicates. Based on this, the JVM tries to cache strings to avoid creating duplicates in memory.

For example,

Assume the string I am a String is in memory and two variables reference it. Both variables point to the same object, which saves memory. The cache is a mechanism that lets the JVM reduce duplicate strings: variables that reference the same string share a single copy of the real data.

Java provides a native method, intern(), that lets the user add a string to the cache pool manually. When intern() is called on a string,

  • if an equal string is already in the string pool, the string from the pool is returned
  • if no equal string is in the pool, the string is added to the pool and a reference to it is returned
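
The two cases above can be sketched as follows. The class InternDemo and its method names are made up for illustration. A String created with new is a distinct object, but intern() returns the pooled instance, which here is the same object as the literal.

```java
class InternDemo {
    // Compares a literal with a copy created via new: two different objects.
    static boolean sameBeforeIntern() {
        String literal = "I am a String";
        String copy = new String(literal);
        return literal == copy;
    }

    // intern() returns the pooled instance, which is the literal itself.
    static boolean sameAfterIntern() {
        String literal = "I am a String";
        String copy = new String(literal);
        return literal == copy.intern();
    }

    public static void main(String[] args) {
        System.out.println(sameBeforeIntern()); // false
        System.out.println(sameAfterIntern());  // true
    }
}
```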

Calling intern() manually is not a good mechanism. Since Java 8u20, the G1 GC supports string deduplication (enabled with -XX:+UseStringDeduplication), which points multiple identical strings at one copy of the underlying data.

For example, guess the output of the following code,

Java
var s = new String("aa");
var q = new String("aa");

System.out.println(s == q);


var p = "aa";
var t = "aa";

System.out.println(p == t);

The answer is,

Bash
false
true

From the example, we can see,

  • creating a string with new always returns a new String object
  • a string literal uses the cache, which means an existing string reference is returned if there is one

StringBuffer & StringBuilder

StringBuffer and StringBuilder are two classes that provide similar functionality for modifying strings. Unlike using the + operator on two strings, StringBuffer and StringBuilder aim to reduce the number of String objects created during modification.

Like String, StringBuffer and StringBuilder have an internal array that holds the data,

  • String is an immutable class, so any modification creates a new String object
  • unlike String, any modification of a StringBuffer or StringBuilder updates the internal array instead of creating a new object every time

    • if the array is too small to hold the modified data, it is resized
    • the default capacity of StringBuffer and StringBuilder is 16, so to improve performance you should specify an initial capacity to reduce array resizing
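
As a rough illustration of the capacity advice, the sketch below pre-sizes a StringBuilder before appending in a loop. The class BuilderCapacityDemo, the joinNumbers helper, and the "4 characters per entry" estimate are assumptions chosen for this example, not a recommended formula.

```java
class BuilderCapacityDemo {
    // Joins 0..count-1 with commas using a pre-sized builder.
    static String joinNumbers(int count) {
        // Rough size guess (an assumption for this sketch): 4 chars per entry.
        StringBuilder sb = new StringBuilder(count * 4);
        for (int i = 0; i < count; i++) {
            if (i > 0) {
                sb.append(',');
            }
            sb.append(i); // updates the internal array, no new String per step
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        System.out.println(new StringBuilder().capacity());   // 16 (default)
        System.out.println(new StringBuilder(64).capacity()); // 64 (requested)

        StringBuilder small = new StringBuilder();
        small.append("this text is longer than sixteen characters");
        // The internal array was resized to fit the appended text.
        System.out.println(small.capacity() >= small.length()); // true

        System.out.println(joinNumbers(5)); // 0,1,2,3,4
    }
}
```

If the estimate is too small the builder still works; it simply falls back to resizing.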

The difference between StringBuffer and StringBuilder is that,

  • StringBuffer is thread safe: its methods are synchronized
  • StringBuilder is not thread safe, so the user must handle synchronization manually
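
The difference can be sketched with a few threads appending to a shared object. The class BufferThreadSafetyDemo and its helpers are hypothetical. StringBuffer needs no extra locking, while the StringBuilder variant wraps each append in a synchronized block, which is one possible way to handle it manually.

```java
class BufferThreadSafetyDemo {
    // Starts the given number of threads running the task and waits for them.
    static void runAll(int threads, Runnable task) {
        Thread[] workers = new Thread[threads];
        for (int t = 0; t < threads; t++) {
            workers[t] = new Thread(task);
            workers[t].start();
        }
        for (Thread worker : workers) {
            try {
                worker.join();
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
                throw new IllegalStateException(e);
            }
        }
    }

    // StringBuffer: append is synchronized, so no append is lost.
    static int appendToBuffer(int threads, int perThread) {
        StringBuffer buffer = new StringBuffer();
        runAll(threads, () -> {
            for (int i = 0; i < perThread; i++) {
                buffer.append('x');
            }
        });
        return buffer.length();
    }

    // StringBuilder: the caller must synchronize access itself.
    static int appendToBuilderWithLock(int threads, int perThread) {
        StringBuilder builder = new StringBuilder();
        runAll(threads, () -> {
            for (int i = 0; i < perThread; i++) {
                synchronized (builder) {
                    builder.append('x');
                }
            }
        });
        return builder.length();
    }

    public static void main(String[] args) {
        System.out.println(appendToBuffer(4, 1000));          // 4000
        System.out.println(appendToBuilderWithLock(4, 1000)); // 4000
    }
}
```

Without the synchronized block, the StringBuilder version may lose appends or throw an exception, so it is only a good fit when a single thread owns the builder.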

Consider the following code, compiled with javac and inspected with javap under different JDKs.

Java
public class StringConcatExample {
    public static void main(String[] args) {
        System.out.println(concat1());
        System.out.println(concat2());
    }

    public static String concat1() {
        String a = "aa";
        String b = "bb";
        String str = a + b + "cc";
        return str;
    }

    public static String concat2() {
        String str = "aa" + "bb" + "cc";
        return str;
    }
}
  • JDK 8

    Bash
    javac StringConcatExample.java
    javap -v StringConcatExample.class
    
    Output is:
    Bash
    ...
    public static java.lang.String concat1();
        descriptor: ()Ljava/lang/String;
        flags: ACC_PUBLIC, ACC_STATIC
        Code:
          stack=2, locals=3, args_size=0
             0: ldc           #6                  // String aa
             2: astore_0
             3: ldc           #7                  // String bb
             5: astore_1
             6: new           #8                  // class java/lang/StringBuilder
             9: dup
            10: invokespecial #9                  // Method java/lang/StringBuilder."<init>":()V
            13: aload_0
            14: invokevirtual #10                 // Method java/lang/StringBuilder.append:(Ljava/lang/String;)Ljava/lang/StringBuilder;
            17: aload_1
            18: invokevirtual #10                 // Method java/lang/StringBuilder.append:(Ljava/lang/String;)Ljava/lang/StringBuilder;
            21: ldc           #11                 // String cc
            23: invokevirtual #10                 // Method java/lang/StringBuilder.append:(Ljava/lang/String;)Ljava/lang/StringBuilder;
            26: invokevirtual #12                 // Method java/lang/StringBuilder.toString:()Ljava/lang/String;
            29: astore_2
            30: aload_2
            31: areturn
          LineNumberTable:
            line 8: 0
            line 9: 3
            line 10: 6
            line 11: 30
    
      public static java.lang.String concat2();
        descriptor: ()Ljava/lang/String;
        flags: ACC_PUBLIC, ACC_STATIC
        Code:
          stack=1, locals=1, args_size=0
             0: ldc           #13                 // String aabbcc
             2: astore_0
             3: aload_0
             4: areturn
          LineNumberTable:
            line 15: 0
            line 16: 3
    }
    ...
    

  • JDK 19

    Bash
    javac StringConcatExample.java
    javap -v StringConcatExample.class
    
    Output is:
    Bash
    ...
    public static java.lang.String concat1();
    descriptor: ()Ljava/lang/String;
    flags: (0x0009) ACC_PUBLIC, ACC_STATIC
    Code:
      stack=2, locals=3, args_size=0
         0: ldc           #28                 // String aa
         2: astore_0
         3: ldc           #30                 // String bb
         5: astore_1
         6: aload_0
         7: aload_1
         8: invokedynamic #32,  0             // InvokeDynamic #0:makeConcatWithConstants:(Ljava/lang/String;Ljava/lang/String;)Ljava/lang/String;
        13: astore_2
        14: aload_2
        15: areturn
      LineNumberTable:
        line 10: 0
        line 11: 3
        line 12: 6
        line 13: 14
      LocalVariableTable:
        Start  Length  Slot  Name   Signature
            3      13     0     a   Ljava/lang/String;
            6      10     1     b   Ljava/lang/String;
           14       2     2   str   Ljava/lang/String;
    
    public static java.lang.String concat2();
    descriptor: ()Ljava/lang/String;
    flags: (0x0009) ACC_PUBLIC, ACC_STATIC
    Code:
      stack=1, locals=1, args_size=0
         0: ldc           #36                 // String aabbcc
         2: astore_0
         3: aload_0
         4: areturn
      LineNumberTable:
        line 17: 0
        line 18: 3
      LocalVariableTable:
        Start  Length  Slot  Name   Signature
            3       2     0   str   Ljava/lang/String;
    }
    ...
    

We can see that,

  • with both JDK 8 and JDK 19, javac folds the String concatenation in method concat2 into a single constant String
  • JDK 8 compiles the String concatenation in method concat1 into StringBuilder calls
  • JDK 19 compiles the String concatenation in method concat1 into an invokedynamic call to makeConcatWithConstants, which decouples the concatenation strategy from the Java bytecode
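
The constant folding seen in concat2 can also be observed from Java code by comparing references. In this sketch the class ConcatIdentityDemo and its methods are made up for illustration. A concatenation of compile-time constants yields the pooled literal, while a concatenation involving ordinary variables creates a new String at run time. Declaring the locals final turns them into constants, so the expression is folded again.

```java
class ConcatIdentityDemo {
    // Only literals: folded at compile time into the constant "aabbcc".
    static boolean literalsShareConstant() {
        String str = "aa" + "bb" + "cc";
        return str == "aabbcc";
    }

    // Ordinary variables: concatenated at run time into a new String.
    static boolean variablesShareConstant() {
        String a = "aa";
        String b = "bb";
        String str = a + b + "cc";
        return str == "aabbcc";
    }

    // final locals initialized with literals are constants, so this folds too.
    static boolean finalVariablesShareConstant() {
        final String a = "aa";
        final String b = "bb";
        String str = a + b + "cc";
        return str == "aabbcc";
    }

    public static void main(String[] args) {
        System.out.println(literalsShareConstant());       // true
        System.out.println(variablesShareConstant());      // false
        System.out.println(finalVariablesShareConstant()); // true
    }
}
```

In all three cases equals() would report the same content; only the object identity differs.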