Why Is String Concatenation So Slow?

Beribey
Jan 6, 2021

Why does adding strings affect the memory and performance of a system?

Photo by George Pagan III on Unsplash

String

Back in the day, when we were working in Java, we were often told to use StringBuilder and its append method when concatenating strings, instead of adding Strings together. The reason is that String is immutable: its value never changes, so every concatenation creates a new string in memory. StringBuilder is mutable: when we call append, its contents change in place and no new string is created. Therefore, using StringBuilder saves memory and runs faster, especially when we append many times in a loop.

If you don't believe it, consider one benchmark comparing the two approaches: the code using StringBuilder took only 4 ms to run, while the code using String took 4828 ms (Source).
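C has no StringBuilder, but the principle behind it fits in a few lines. The following is a minimal sketch, not Java's actual implementation: the names `StrBuilder`, `sb_init`, `sb_append` and `sb_free` are hypothetical. The buffer is mutable and remembers its own length, so each append copies only the new bytes, and the capacity doubles whenever the buffer fills up.

```c
#include <stdlib.h>
#include <string.h>

/* Hypothetical StringBuilder-like buffer (illustration only). */
typedef struct {
    char  *data;  /* null-terminated contents */
    size_t len;   /* current length, excluding the terminator */
    size_t cap;   /* bytes allocated for data */
} StrBuilder;

/* Start with a small empty buffer; returns 0 on success, -1 on failure. */
static int sb_init(StrBuilder *sb) {
    sb->cap = 16;
    sb->len = 0;
    sb->data = malloc(sb->cap);
    if (sb->data == NULL) return -1;
    sb->data[0] = '\0';
    return 0;
}

/* Append s in place. Because len is stored, the existing contents are
   never rescanned; the buffer only grows (by doubling) when it is full. */
static int sb_append(StrBuilder *sb, const char *s) {
    size_t n = strlen(s);
    if (sb->len + n + 1 > sb->cap) {
        size_t new_cap = sb->cap;
        while (sb->len + n + 1 > new_cap) new_cap *= 2;
        char *p = realloc(sb->data, new_cap);
        if (p == NULL) return -1;
        sb->data = p;
        sb->cap = new_cap;
    }
    memcpy(sb->data + sb->len, s, n + 1); /* copies s and its '\0' */
    sb->len += n;
    return 0;
}

/* Release the buffer and reset the fields. */
static void sb_free(StrBuilder *sb) {
    free(sb->data);
    sb->data = NULL;
    sb->len = sb->cap = 0;
}
```

Doubling the capacity means the occasional `realloc` copy is amortized over many appends, so building a string of n bytes this way costs O(n) overall instead of re-copying the whole string on every concatenation.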

But have we ever wondered why string concatenation is slow in the first place? Let's read on and find out.

String Concatenation: What Is the Problem?

Let's put aside the majestic new technologies, massive frameworks, and magical languages, and go back to the old days. Once upon a time, I used DevC and worked directly with bytes and memory. And yes, knowledge of bytes and memory is the foundation of your whole knowledge system.

How C stores strings: a string is an array of bytes whose final byte is the null character ('\0'). With this representation, to know a string's length we must run a loop starting from the pointer to the first byte until the null character is encountered. Concatenation in C is done with the strcat function, which appends one string (src) to the end of another (dest).
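Here is a minimal sketch of both ideas, assuming the classic textbook implementation rather than any particular C library's optimized code. The names `my_strlen` and `my_strcat` are hypothetical stand-ins for `strlen` and `strcat`.

```c
#include <stddef.h>

/* Like strlen: count bytes until the terminating '\0'. */
size_t my_strlen(const char *s) {
    size_t n = 0;
    while (s[n] != '\0') n++;
    return n;
}

/* Like strcat: append src to dest. dest must have room for both strings
   plus one terminating '\0'. */
char *my_strcat(char *dest, const char *src) {
    char *p = dest;
    while (*p != '\0') p++;          /* loop 1: walk to the end of dest */
    while ((*p++ = *src++) != '\0')  /* loop 2: copy src, including '\0' */
        ;
    return dest;                     /* strcat returns the start of dest */
}
```

Note that `my_strcat` returns the start of `dest`, just as `strcat` does, so the caller learns nothing about where the new string ends.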

Let's study it together. A typical strcat first runs from the beginning of dest until it reaches dest's terminating null character, then copies each byte of src onto the end of dest, including src's own null character. That is two loops, so a single call is just O(n), nothing terrible. But what happens when we concatenate many times and the string keeps getting longer?

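Each time strcat is called, the first loop scans dest from start to finish all over again, so the longer dest gets, the longer that loop runs. Once the string becomes large, every concatenation is heavy and slow, and the total work of many appends grows roughly with the square of the number of appends.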
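To make the growth concrete, here is a sketch that counts how many bytes of dest the first loop rescans. `counting_strcat` and `rescans_for_k_appends` are hypothetical helpers written for this illustration. Appending a 2-byte piece k times to an empty buffer rescans 2 × (0 + 1 + ... + (k − 1)) = k(k − 1) bytes: 90 for k = 10, 9,900 for k = 100, and 999,000 for k = 1,000. Ten times more appends means roughly a hundred times more scanning.

```c
#include <stddef.h>

/* Same two loops as strcat, but adds the number of dest bytes scanned
   by loop 1 to *scanned (hypothetical helper for illustration). */
char *counting_strcat(char *dest, const char *src, unsigned long *scanned) {
    char *p = dest;
    while (*p != '\0') {             /* loop 1: rescan dest from the start */
        p++;
        (*scanned)++;
    }
    while ((*p++ = *src++) != '\0')  /* loop 2: copy src, including '\0' */
        ;
    return dest;
}

/* Append `piece` k times to an initially empty buf and return the total
   number of bytes rescanned. buf must hold k * strlen(piece) + 1 bytes. */
unsigned long rescans_for_k_appends(char *buf, const char *piece, int k) {
    unsigned long scanned = 0;
    buf[0] = '\0';
    for (int i = 0; i < k; i++)
        counting_strcat(buf, piece, &scanned);
    return scanned;
}
```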

The root of the problem is that strcat has to rediscover where dest ends on every single call. To solve this, another version of the strcat function was created that hands that position back to the caller.

With this version, after appending, the function returns a pointer to the end of the result (its null character). On the next append, the loop starts from that returned pointer and only walks to the end of src, instead of rescanning dest from the beginning as the original algorithm does.
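A minimal sketch of such a variant follows. `append_end` and `join_pieces` are hypothetical names, not a standard API; POSIX's `stpcpy`, which returns a pointer to the terminating null byte it wrote, can be chained the same way.

```c
#include <stddef.h>

/* Hypothetical strcat variant: `end` must point at dest's terminating '\0'.
   Copies src there and returns a pointer to the new terminating '\0'.
   Only the bytes of src are walked; dest is never rescanned. */
char *append_end(char *end, const char *src) {
    while ((*end = *src) != '\0') {
        end++;
        src++;
    }
    return end;  /* points at the '\0' just written */
}

/* Usage: build buf from `count` pieces, feeding each returned end pointer
   into the next call, so the total work is proportional to the final length.
   buf must be large enough for all pieces plus one '\0'. */
void join_pieces(char *buf, const char *const *pieces, size_t count) {
    char *end = buf;
    *end = '\0';
    for (size_t i = 0; i < count; i++)
        end = append_end(end, pieces[i]);
}
```

The design choice is simply to keep the information the first loop of strcat kept throwing away: once the caller holds the end pointer, each append costs time proportional to the piece being added, not to the whole string built so far.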

Conclusion

Spending some time learning about low-level concepts and languages is quite worthwhile. Luckily, we are using C# and Java, whose runtimes free memory for us, and there are plenty of supporting libraries. Many developers still write C and C++ and manage memory by hand, which is very extreme (extreme, but well paid). However, if I were the one writing the code, in 90% of cases I would still use String or String.Format instead of StringBuilder!! Why??

