General Information about Strings

Nate Bross

For starters, it's a good idea to understand the definition of a String in the land of Microsoft and .NET.

"A string is a sequential collection of Unicode characters, typically used to represent text, while a String is a sequential collection of System.Char objects that represents a string. The value of the String is the content of the sequential collection, and the value is immutable.

A String is called immutable because its value cannot be modified once it has been created. Methods that appear to modify a String actually return a new String containing the modification."
- Microsoft Developer Network

What this roughly means is that every time you write code like this:

Code:
    ' I am using VB because it formats the best in this forum.
    Dim MyString As String = ""
    For i As Integer = 0 To 99999
        MyString &= "x"
    Next i

the runtime has to allocate a brand-new string and copy the old contents into it 100,000 times, which is very inefficient.
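
To make the "immutable" part concrete, here is a minimal sketch (the module and variable names are made up for illustration). It shows that a method that appears to modify a String hands back a new one, and that &= points the variable at a new String instead of changing the old one.

Code:
    Imports System

    Module ImmutabilityDemo
        Sub Main()
            Dim original As String = "hello"

            ' ToUpperInvariant does not touch original; it returns a new String.
            Dim shouted As String = original.ToUpperInvariant()
            Console.WriteLine(original) ' hello
            Console.WriteLine(shouted)  ' HELLO

            ' Keep a second reference to the old String, then "append" to it.
            Dim before As String = original
            original &= " world"

            ' The old String is unchanged; original now refers to a different object.
            Console.WriteLine(before) ' hello
            Console.WriteLine(Object.ReferenceEquals(before, original)) ' False
        End Sub
    End Module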

Don't despair, there is hope: we have the System.Text namespace, which includes the StringBuilder class.

"This class represents a string-like object whose value is a mutable sequence of characters. The value is said to be mutable because it can be modified once it has been created by appending, removing, replacing, or inserting characters."
- Microsoft Developer Network

This means the same code as above, slightly altered, looks like this:

Code:
    ' I am using VB because it formats the best in this forum.
    Dim MyStringBuilder As New System.Text.StringBuilder()
    For i As Integer = 0 To 99999
        ' The "x"c literal indicates that we are appending a single System.Char object
        MyStringBuilder.Append("x"c)
    Next i

This code is so much faster than the first sample it is really quite funny, but I digress. The point here is that with the String class the runtime has to allocate a new string 100,000 times, whereas the StringBuilder (which starts out with a capacity of 16 and grows, roughly doubling, whenever it fills up) only has to grow its buffer a handful of times: about a dozen with pure doubling, though the exact growth policy depends on the .NET version.
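
If you want to see how often the buffer really grows on your own machine, a rough sketch like the following counts the changes in the Capacity property while appending. The module name and the counter variables are my own. I have deliberately not quoted a result, since the growth policy is an implementation detail that differs between .NET versions.

Code:
    Imports System
    Imports System.Text

    Module CapacityGrowthDemo
        Sub Main()
            Dim sb As New StringBuilder()
            Dim growths As Integer = 0
            Dim lastCapacity As Integer = sb.Capacity

            Console.WriteLine("Starting capacity: {0}", lastCapacity)

            For i As Integer = 0 To 99999
                sb.Append("x"c)
                ' Capacity only changes when the builder has to make more room.
                If sb.Capacity <> lastCapacity Then
                    growths += 1
                    lastCapacity = sb.Capacity
                End If
            Next i

            Console.WriteLine("Capacity changed {0} times", growths)
            Console.WriteLine("Final capacity: {0}", sb.Capacity)
        End Sub
    End Module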

Finally, to improve efficiency a little bit more, we can do this:

Code:
    ' I am using VB because it formats the best in this forum.
    Dim MyStringBuilder As New System.Text.StringBuilder(100000)
    For i As Integer = 0 To 99999
        ' The "x"c literal indicates that we are appending a single System.Char object
        MyStringBuilder.Append("x"c)
    Next i

Setting the initial capacity of the StringBuilder means its buffer is allocated only once, up front, as long as the final text fits within that capacity.
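
To compare the three approaches yourself, here is a simple timing sketch using System.Diagnostics.Stopwatch. The module name, the Count constant and the output labels are assumptions of mine, and the numbers will vary by machine and .NET version, so treat it as a rough check and not a proper benchmark.

Code:
    Imports System
    Imports System.Diagnostics
    Imports System.Text

    Module ConcatTimingDemo
        Const Count As Integer = 100000

        Sub Main()
            ' 1. Plain String concatenation: a new String on every pass.
            Dim watch As Stopwatch = Stopwatch.StartNew()
            Dim s As String = ""
            For i As Integer = 1 To Count
                s &= "x"
            Next i
            Console.WriteLine("String &=                  : {0} ms", watch.ElapsedMilliseconds)

            ' 2. StringBuilder with the default capacity.
            watch = Stopwatch.StartNew()
            Dim sb As New StringBuilder()
            For i As Integer = 1 To Count
                sb.Append("x"c)
            Next i
            Console.WriteLine("StringBuilder (default)    : {0} ms", watch.ElapsedMilliseconds)

            ' 3. StringBuilder with the capacity set up front.
            watch = Stopwatch.StartNew()
            Dim sized As New StringBuilder(Count)
            For i As Integer = 1 To Count
                sized.Append("x"c)
            Next i
            Console.WriteLine("StringBuilder (pre-sized)  : {0} ms", watch.ElapsedMilliseconds)
        End Sub
    End Module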

Long string processing doesn't come into play unless you are processing a lot of textual data, but using a StringBuilder can greatly reduce the time it takes to process a large flat file, etc.

One last thing to mention: the StringBuilder class, as with all .NET Framework classes, has a .ToString() method to easily return a normal string for use after processing.
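
As a small illustration of that last step, this sketch builds one comma-separated line and then calls ToString() to get an ordinary String back. The field values and names are invented for the example.

Code:
    Imports System
    Imports System.Text

    Module ToStringDemo
        Sub Main()
            Dim fields() As String = {"id", "name", "city"}
            Dim line As New StringBuilder()

            For Each field As String In fields
                ' Put a comma before every field except the first.
                If line.Length > 0 Then line.Append(","c)
                line.Append(field)
            Next

            ' ToString() copies the builder's contents into a normal, immutable String.
            Dim result As String = line.ToString()
            Console.WriteLine(result) ' id,name,city
        End Sub
    End Module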
 
Other interesting notes about String:

1. Unlike other common primitive types such as Integer and Boolean, which are typically stored on the stack, Strings reside on the managed heap. Heap objects are cleaned up only when they are collected by the Garbage Collector, while stack objects are cleaned up as soon as they go out of scope.

2. Like other heap objects, Strings have a default value of Nothing (or null in C#).

3. Using the "&=" Visual Basic concatenation operator will not invoke the Microsoft.VisualBasic.CompilerServices library. It is the equivalent of calling the String.Concat() method.

4. Using the "&" Visual Basic concatenation operator on String operands likewise compiles to String.Concat(). It only calls into the Microsoft.VisualBasic runtime library when an operand is typed as Object.

5. Using the Visual Basic "=" operator on strings WILL invoke the Microsoft.VisualBasic runtime library (e.g. If myString = "" Then). Use of String.Empty and the String.Length property can make the Visual Basic "=" unnecessary.
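
A short sketch of the empty-string checks mentioned in the last note (the module name is made up). One thing to watch for: the Visual Basic "=" treats Nothing and "" as equal, while reading Length on a Nothing reference throws, so String.IsNullOrEmpty() is shown here as one way to cover both cases.

Code:
    Imports System

    Module EmptyCheckDemo
        Sub Main()
            Dim myString As String = Nothing

            ' The VB "=" operator treats Nothing and "" as equal.
            Console.WriteLine(myString = "") ' True

            ' IsNullOrEmpty handles Nothing without touching Length.
            Console.WriteLine(String.IsNullOrEmpty(myString)) ' True

            ' Once the variable holds a real String, Length is safe to read.
            myString = String.Empty
            Console.WriteLine(myString.Length = 0) ' True
        End Sub
    End Module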
 
EDIT - I was misinformed; see the post below.

Good point about the Garbage Collection; when you are working with large strings, each subsequent re-allocation of memory takes longer because as your string gets longer the system has to find a hunk of memory that long.
 
Memory allocation takes place in constant time in .NET (which is why it's possible to write apps that are faster than C++). Garbage collection moves all allocated memory to the front so that new memory can be allocated instantly, since all free space is in one solid block.
 