Fast and efficient CString replacement

Typical applications contain lots of string operations, and MFC includes the CString class for precisely that purpose. Unfortunately, it suffers from major problems. Perhaps the four most important are:

  • CStrings cannot be extended - their header file is buried within MFC
  • CStrings are slow. Catenating a simple value requires copying the string into a new buffer.
  • CStrings internally call malloc/free so often that memory becomes very fragmented, and your application incurs a major performance hit.
  • Reference counting (the ability to quickly assign one string to another without copying the characters) was only introduced in the MFC library accompanying Visual C++ 5, and even there it is not that efficient.

This article describes a class named CStr, which in many respects is similar to CString -- and in most cases can be used interchangeably. However, the class improves considerably on CString in the following areas:

  • The definition and implementation are open - you can easily edit its header file to include much-needed facilities.
  • The class is compatible both with MFC and with simple Win32-based applications.
  • The class includes a much better method for reference counting. It also supports a buffer larger than the number of characters in the string, so catenation (and assignment of longer strings) becomes a super-fast process.
  • The class caches data blocks of commonly used sizes (typically 4, 8, 12, etc. - up to 320; this is configurable). When your program destroys a string object, the data is not returned to the memory manager, but is kept in a cache pool. The next time CStr needs a block of that size (and that happens very often), it gets the block very quickly. Memory fragmentation is also severely reduced in this way.
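
The block cache described in the last bullet can be pictured as a set of free lists, one per block size. The following is only a minimal sketch of that idea and not the actual CStr code: the names BlockPool, kStep and kMaxCached are hypothetical, the step of 4 and the limit of 320 are taken from the text above and assumed here to be byte counts, and the sketch is not thread-safe.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdlib>
#include <vector>

// Hypothetical sketch of a size-class block cache; not the real CStr code.
class BlockPool {
public:
    static constexpr std::size_t kStep = 4;        // block sizes: 4, 8, 12, ...
    static constexpr std::size_t kMaxCached = 320; // larger blocks bypass the cache

    BlockPool() : free_(kMaxCached / kStep) {}
    BlockPool(const BlockPool&) = delete;
    BlockPool& operator=(const BlockPool&) = delete;
    ~BlockPool() { ReleaseAll(); }

    // Round a request up to the next multiple of kStep (minimum kStep).
    static std::size_t RoundUp(std::size_t n) {
        if (n == 0) n = 1;
        return (n + kStep - 1) / kStep * kStep;
    }

    // Return a block of at least n bytes, reusing a cached one when possible.
    void* Alloc(std::size_t n) {
        const std::size_t size = RoundUp(n);
        if (size <= kMaxCached) {
            std::vector<void*>& bucket = free_[size / kStep - 1];
            if (!bucket.empty()) {
                void* p = bucket.back();
                bucket.pop_back();
                return p;
            }
        }
        return std::malloc(size);
    }

    // Give a block back; n must be the size originally requested.
    void Free(void* p, std::size_t n) {
        if (p == nullptr) return;
        const std::size_t size = RoundUp(n);
        if (size <= kMaxCached) free_[size / kStep - 1].push_back(p);
        else std::free(p);
    }

    // Return every cached block to the memory manager.
    void ReleaseAll() {
        for (std::vector<void*>& bucket : free_) {
            for (void* p : bucket) std::free(p);
            bucket.clear();
        }
    }

    // Number of blocks currently waiting in the cache.
    std::size_t CachedCount() const {
        std::size_t total = 0;
        for (const std::vector<void*>& bucket : free_) total += bucket.size();
        return total;
    }

private:
    std::vector<std::vector<void*>> free_; // one free list per size class
};
```

In this sketch a freed 10-byte block lands in the 12-byte free list, and the next request for 9 to 12 bytes gets the same block back without touching malloc.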

CStr features

CStr supports most of the features of CString. The following snippet from CStr.h shows some of the more important features and friend functions:


class CStr
{
// Construction, copying, assignment
public:
    CStr();
    CStr(const CStr& source);
    CStr(const char* s, CPOS prealloc = 0);
    CStr(CPOS prealloc);
    void operator=(const CStr& source);
    void operator=(const char* s);
    ~CStr();
    CStr(const CString& source, CPOS prealloc = 0);
    void operator=(const CString& source);

// Get attributes, get data, compare
    BOOL IsEmpty() const;
    CPOS GetLength() const;
    operator const char* () const;
    const char* GetString() const; // Same as above
    char GetFirstChar() const;
    char GetLastChar() const;
    char operator[](CPOS idx) const;
    char GetAt(CPOS idx) const; // Same as above
    void GetLeft (CPOS chars, CStr& result);
    void GetRight (CPOS chars, CStr& result);
    void GetMiddle (CPOS start, CPOS chars, CStr& result);
    int  Find (char ch, CPOS startat = 0) const;
    int  ReverseFind (char ch, CPOS startat = (CPOS) -1) const;
    int  Compare (const char* match) const; // -1, 0 or 1
    int  CompareNoCase (const char* match) const; // -1, 0 or 1
    // Operators == and != are also predefined

// Global modifications
    void Empty(); // Sets length to 0, but keeps buffer around
    void Reset(); // This also releases the buffer
    void GrowTo(CPOS size);
    void Compact(CPOS only_above = 0);
    static void CompactFree();
    void Format(const char* fmt, ...);
    void FormatRes(UINT resid, ...);
    BOOL LoadString(UINT resid);

// Catenation, truncation
    void operator += (const CStr& obj);
    void operator += (const char* s);
    void AddString(const CStr& obj);         // Same as +=
    void AddString(const char* s);           // Same as +=
    void AddChar(char ch);
    void AddChars(const char* s, CPOS startat, CPOS howmany);
    void AddStringAtLeft(const CStr& obj);
    void AddStringAtLeft(const char* s);
    void AddInt(int value);
    void AddDouble(double value, UINT after_dot);
    void RemoveLeft(CPOS count);
    void RemoveMiddle(CPOS start, CPOS count);
    void RemoveRight(CPOS count);
    void TruncateAt(CPOS idx);
    friend CStr operator+(const CStr& s1, const CStr& s2);
    friend CStr operator+(const CStr& s, const char* lpsz);
    friend CStr operator+(const char* lpsz, const CStr& s);

// Window operations and other utilities
    void GetWindowText (CWnd* wnd);

// Miscellaneous implementation methods
protected:
    // These may be reimplemented by the user
    static void ThrowIfNull(void* p);
    static void ThrowPgmError();
    static void ThrowNoUnicode();
    static void ThrowBadIndex();
#ifndef CSTR_LARGE_STRINGS
    static void ThrowTooLarge();
#endif
};

BOOL operator ==(const CStr& s1, const CStr& s2);
BOOL operator ==(const CStr& s1, LPCTSTR s2);
BOOL operator ==(LPCTSTR s1, const CStr& s2);
BOOL operator !=(const CStr& s1, const CStr& s2);
BOOL operator !=(const CStr& s1, LPCTSTR s2);
BOOL operator !=(LPCTSTR s1, const CStr& s2);


Tech note: the CPOS type and length limitations

Normally, CStr supports strings with up to 65500 characters. This increases the speed a bit, and saves 4 bytes per string. In some cases you might need to work with very large strings. To do this, define the symbol CSTR_LARGE_STRINGS before including CStr.h in your project.

CPOS is a custom type identifying either the character length of a string, or a character position. It is defined as either a 16-bit WORD (with normal strings) or a 32-bit UINT (when supporting strings with up to 2^32 characters).

If you work at compiler warning level 3, you will be able to freely mix UINT with CPOS. If you work at level 4, you may need to typecast, or use the CPOS type to prevent warnings.
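
As a rough illustration of how such a type can be selected at compile time, here is a sketch. It is not copied from CStr.h: WORD16 and UINT32 are hypothetical stand-ins for the Win32 WORD and UINT types (assuming a 16-bit unsigned short), and ToCPOS is a hypothetical helper showing the explicit cast that warning level 4 asks for.

```cpp
#include <cassert>

// Hypothetical stand-ins for the Win32 WORD and UINT types.
typedef unsigned short WORD16;
typedef unsigned int   UINT32;

#ifdef CSTR_LARGE_STRINGS
typedef UINT32 CPOS;   // 32-bit lengths and positions
#else
typedef WORD16 CPOS;   // 16-bit lengths and positions
#endif

// Narrow a wider value to CPOS explicitly instead of relying on an
// implicit conversion. Debug builds check that the value really fits.
inline CPOS ToCPOS(unsigned int value) {
    assert(value <= static_cast<unsigned int>(static_cast<CPOS>(-1)));
    return static_cast<CPOS>(value);
}
```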

Tech note: using CStr in place of LPCSTR (const char*)

Like CString, the class described here also has a predefined operator typecast to LPCSTR. This is why you can use CStr where LPCSTR is expected. You can even use CStr in functions declared as requiring CString, but this is not very efficient, since the compiler will generate a temporary CString instance for you.
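
The mechanism behind this is an ordinary C++ conversion operator. The sketch below uses a hypothetical miniature class (MiniStr, backed by std::string) and a hypothetical function CountChars, purely to show how an object can be passed where a const char* is expected; it is not the CStr implementation.

```cpp
#include <cassert>
#include <cstring>
#include <string>

// Hypothetical miniature string class; only the conversion operator matters here.
class MiniStr {
public:
    explicit MiniStr(const char* s) : text_(s ? s : "") {}
    operator const char*() const { return text_.c_str(); }  // implicit conversion
private:
    std::string text_;
};

// Any function that expects a plain C string ...
inline std::size_t CountChars(const char* s) {
    assert(s != nullptr);
    return std::strlen(s);
}
```

With these definitions, CountChars(MiniStr("hello")) compiles without a cast, because the compiler applies the conversion operator automatically.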

Tech note: managing buffer length

One of the most important advantages of CStr is that it allows you to specify the length of the buffer that will hold the string. If you anticipate a string will soon grow to 80 bytes, you can request a buffer of that size, even if its initial content is only 7 bytes long. This saves a huge amount of reallocation and copy operations if you add to the string later.

To specify a larger buffer when constructing the string, use the following definitions:

CStr();                                     // No preallocation
CStr(const char* s, CPOS prealloc = 0);    // Buffer chars as second param
CStr(CPOS prealloc);                        // Buffer chars as only param

To increase the buffer size for an existing string, call

void GrowTo(CPOS size);         // If the buffer is smaller, increases it

Attempting to grow the buffer to a value smaller than what's currently allocated is harmless.

At some point (particularly if you store many strings in memory) you may decide that a given string won't be changed, and that its originally allocated buffer is too large and wastes memory. On this occasion you may call

void Compact(CPOS only_above = 0)

Passing only_above=4, for example, means "reallocate and copy to smaller buffer only if 4 or more bytes would be saved".
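
To make the preallocation, GrowTo and Compact rules concrete, here is a small self-contained sketch. ToyBuf is a hypothetical class written for this illustration; it follows the semantics described above, but it is not the CStr code and leaves out reference counting and the block cache.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdlib>
#include <cstring>

// Hypothetical toy buffer mirroring the GrowTo/Compact rules described above.
class ToyBuf {
public:
    explicit ToyBuf(const char* s, std::size_t prealloc = 0)
        : len_(std::strlen(s)), cap_(len_ > prealloc ? len_ : prealloc),
          data_(static_cast<char*>(std::malloc(cap_ + 1))) {
        assert(data_ != nullptr);
        std::memcpy(data_, s, len_ + 1);
    }
    ToyBuf(const ToyBuf&) = delete;
    ToyBuf& operator=(const ToyBuf&) = delete;
    ~ToyBuf() { std::free(data_); }

    // Enlarge the buffer only if it is smaller than `size`.
    void GrowTo(std::size_t size) {
        if (size > cap_) Resize(size);
    }

    // Shrink to fit, but only when at least `only_above` bytes would be saved.
    void Compact(std::size_t only_above = 0) {
        if (cap_ > len_ && cap_ - len_ >= only_above) Resize(len_);
    }

    // Append; no reallocation happens while the buffer still has room.
    void Add(const char* s) {
        const std::size_t n = std::strlen(s);
        GrowTo(len_ + n);
        std::memcpy(data_ + len_, s, n + 1);
        len_ += n;
    }

    std::size_t Length() const { return len_; }
    std::size_t Capacity() const { return cap_; }
    const char* Text() const { return data_; }

private:
    void Resize(std::size_t cap) {
        char* p = static_cast<char*>(std::realloc(data_, cap + 1));
        assert(p != nullptr);
        data_ = p;
        cap_ = cap;
    }

    std::size_t len_;  // characters in use (terminator not counted)
    std::size_t cap_;  // characters the buffer can hold
    char* data_;
};
```

Constructing ToyBuf("seven!!", 80) reserves room for 80 characters up front, so later calls to Add reuse the same buffer until the text outgrows it.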

It is important to know that the buffers for freed strings are not really deallocated. Thus, at certain points in your program (for example, after a large memory-consuming operation) you may wish to invoke a manual "garbage collection" that will return all pooled memory to the memory manager. To do this, call

CStr::CompactFree()         // static method

You should always call this method in the ExitInstance() method of your CWinApp class; otherwise, MFC will complain about memory leaks.

Tech note: using Format and FormatRes

CStr::Format and FormatRes are sprintf-like functions. The only difference is that the first takes a pointer to a const character string describing the format parameters (like sprintf does), while the second loads the string from the resource table.

Format specifiers can be looked up in your C++ RTL documentation under "printf".
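
For readers curious how a sprintf-like method can size its own buffer, here is one possible sketch. FormatText is a hypothetical free function returning std::string, and it relies on the C++11 behavior of vsnprintf (which reports the required length); the real CStr::Format may well work differently.

```cpp
#include <cassert>
#include <cstdarg>
#include <cstdio>
#include <string>
#include <vector>

// Hypothetical helper: format into a buffer of exactly the required size.
inline std::string FormatText(const char* fmt, ...) {
    assert(fmt != nullptr);
    va_list args;
    va_start(args, fmt);

    // First pass on a copy of the argument list: measure the output.
    va_list copy;
    va_copy(copy, args);
    const int needed = std::vsnprintf(nullptr, 0, fmt, copy);
    va_end(copy);

    std::string result;
    if (needed > 0) {
        // Second pass: write into a buffer with room for the terminator.
        std::vector<char> buf(static_cast<std::size_t>(needed) + 1);
        std::vsnprintf(buf.data(), buf.size(), fmt, args);
        result.assign(buf.data(), static_cast<std::size_t>(needed));
    }
    va_end(args);
    return result;
}
```

A call such as FormatText("%d items at %.2f", 3, 1.5) follows the same printf rules that Format and FormatRes use for their format strings.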

Tech note: using Empty() or Reset()

Note that when you assign a number of characters to a CStr object, its buffer may be increased if necessary, but it will not be decreased. This holds even if you call Empty() - this leaves the string with zero length, but the allocated buffer stays intact for further use.

If you know that you will not assign to this empty string for a long time, it is better to call Reset() instead of Empty(). This will not only set the length to 0 characters, but also deallocate (or rather, return to the cache pool) the string buffer. This is especially important if you reset a large string (say, 512 bytes or more).

Note the presence of the CSTR_DITEMS constant in CStr.h. This constant identifies the maximum string size for which the "cached buffers" mechanism is in effect. Strings larger than this size are always passed to malloc/free. Thus, if you load a 500-kilobyte text file into a CStr, you need not worry that the memory will not be released when you destroy the object.

Tech note: using the string support in single-threaded applications

The supplied class is designed to be completely safe in a multithreaded application, and uses some critical sections to achieve this. If you have a single-threaded app, or you are sure to use CStr from only one thread (and I really mean sure!) you can define the symbol CSTR_NOT_MT_SAFE.

This will omit any references to critical sections, and may speed up your string operations by between 5% (if you manage the string data itself) and 30% (if you do a lot of string assignments and reassignments).
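
The general technique of compiling a lock away behind a symbol can be sketched as follows. This is an illustration only: it uses std::mutex instead of Win32 critical sections, and PoolLock, PoolMutex, SharedCounter and BumpCounter are hypothetical names that do not appear in CStr.

```cpp
#include <cassert>
#include <mutex>

// Hypothetical sketch: the lock disappears when CSTR_NOT_MT_SAFE is defined.
#ifdef CSTR_NOT_MT_SAFE
struct PoolLock {
    PoolLock() {}            // no-op in single-threaded builds
};
#else
inline std::mutex& PoolMutex() {
    static std::mutex m;     // guards shared state such as a block cache
    return m;
}
struct PoolLock {
    PoolLock() : guard_(PoolMutex()) {}
private:
    std::lock_guard<std::mutex> guard_;  // released when PoolLock goes out of scope
};
#endif

// Shared counter standing in for reference-count or cache bookkeeping.
inline int& SharedCounter() {
    static int counter = 0;
    return counter;
}

// Increment the shared counter under the (possibly compiled-out) lock.
inline int BumpCounter() {
    PoolLock lock;           // held until the function returns
    (void)lock;
    return ++SharedCounter();
}
```

Code that takes the lock looks the same in both builds; only the definition of PoolLock changes, which is why the symbol must be defined consistently across the whole program.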

How to use CStr

Remember that CStr is NOT compatible with UNICODE yet (if enough interest gathers, I will make a UNICODE version). When including the class in an MFC project, I suggest that you put the following references:

In stdafx.h:
#include "CStr.h"    // This, in turn, includes cstrimp.h
In stdafx.cpp:
#include "CStrMgr.cpp"    // Put this include after everything else

Thus, the string support headers will be precompiled, and you won't need to include them everywhere.

Note that CStrMgr.cpp (the implementation file) is designed to be included in another CPP, not added to the project. If you don't like this, just insert a #include "stdafx.h" at its beginning, and add it to the project file.

There are some conditional symbols you may wish to define right before including CStr.h

  • #define CSTR_LARGE_STRINGS: normally, strings can hold up to 65500 characters. Define this conditional to increase the range to 2^32 characters. This incurs a 4 byte penalty per object, and probably some small speed hit.
  • #define CSTR_OWN_ERRORS: you will probably want to define this symbol in larger applications. If you do, you will have to implement a couple of methods and functions that handle critical situations, such as out-of-memory conditions and program errors (e.g. out-of-bounds character reference).
  • If you do NOT use MFC, you will have to provide a body for the get_stringres() function. It should just return an instance handle so that CStr knows where to load string resources from. The sample application shows an implementation of this function.
  • #define CSTR_NOT_MT_SAFE: If you have a single-threaded application, or are completely sure that only one of your threads will use the string support subsystem, this will improve speed significantly. Be very careful - defining this and using CStr from multiple threads might cause very hard-to-detect errors in your application!

Have fun using CStr! Any comments are welcome. Also, I will be glad to add features provided that they seem to be useful to at least 3-4 people, and they do not take too much time. Write to me at kamen@kami.com

Download demo project - 23  KB

Download demo executable - 33 KB

Download source - 11 KB Updated October 17, 1998

Comments

  • How can I change it to make it suitable for pure C++?

    Posted by Legacy on 01/12/2003 12:00am

    Originally posted by: fanhua

    I need to work in a workstation environment such as Sun. I have worked hard to change it, but it does not work on the workstation (pure C++ environment). What should I do?

    Thank YOU guys!

  • Bug in code and article

    Posted by Legacy on 10/02/2002 12:00am

    Originally posted by: CallMeJoe

    It was the negative comments about CString that caught my eye:

    "CStrings cannot be extended - their header file is buried within MFC"

    Buried? It's a class; you can't "bury" the header file! I first subclassed CString with 1.52c. I had to make a minor change for 32-bit MFC and for the new CString classes, but other than that it's worked just fine.

    "CStrings are slow. Catenating a simple value requires copying the string into a new buffer."

    Again, this isn't true. CStrings are actually quite fast and your own benchmarks bear that out.

    "CStrings internally call malloc/free so often that memory becomes very fragmented, and your application incurs a major performance hit."

    Again, completely false. Have you even looked at the source for CString? Again, your own benchmarks contradict this remark.

    "Reference counting (the ability to quickly assign one CStr to another without copying the characters) was first implemented in the MFC library accompanying Visual C++ 5. Besides, it's not that efficient."

    And? It's actually quite efficient, but has been discarded in MFC 7.0 due to a fundamental, albeit rare, problem with most reference counting classes that has no real solution.


    I would have enjoyed benchmarking your class, but it's no longer freely available so there is no way to test your vaunted results. (Making me sign up even if for free does not make this freely available.)

    Note that I did compile the version available here and it was chock full of bugs. When I did get it to run, it was sometimes faster and sometimes slower than CString. The assignment operation was 40% slower, and concatenation degraded rapidly: 100 concatenations totaling about 3k were 300% slower.

    By comparison when CStr was faster than CString (for those tests that worked), CString was always within 5% the speed of CStr.

  • What do you mean by "fast" ?

    Posted by Legacy on 04/15/2002 12:00am

    Originally posted by: ET Tan

    I am looking for a real fast string class as my program operates on large strings - doing search and replace of substrings.

    I downloaded and looked at your source code, but I don't see how it is fast.

  • Converting a CString to a double

    Posted by Legacy on 10/07/2001 12:00am

    Originally posted by: giacomo moro

    I would like to know how to convert a CString to a double. I have written the following simple example: in a dialog I have put three edit boxes, and I would like the third to show the sum of the values entered in the first and second (for instance, if I put 5.8 in the first and 2.4 in the second, 8.2 should appear in the third). After reading your article I know how to do this if the values are integers, but I don't know how to do it if the values are float or double. I would like to put values of type CString in the edit boxes; if the variables associated with the edit boxes are float or double, a zero appears in each edit box, and I don't want the zero to appear when I run the program.
    Could you help me?
    My best regards, Giacomo Moro

  • Extended Find function..

    Posted by Legacy on 12/12/2000 12:00am

    Originally posted by: JongGurl Moon

    I love this source.

    The 'Find' function was inconvenient for me, because it
    can only find a single character.
    I wanted to find words within a string,
    so I added an 'Extended Find function'.
    This function is very useful.

    Source:
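    // Returns the 0-based index of the first occurrence of s, or -1 if not found.
    int CStr::FindExt (const char* s) const
    {
        const char* scan = strstr (data->m_Text, s);
        if (scan == NULL)
            return -1;
        return (int) (scan - data->m_Text);
    }


    Example:
    CStr temp;
    temp = "I love you";
    int love;

    // FindExt returns -1 on failure, so compare against -1 explicitly
    if (temp.FindExt("love") != -1)
    {
        love = 1;
    }
    else
    {
        love = 0;
    }

    if (love == 1)
    {
        MessageBox(NULL, "LOVE find", "LOVE find!!", MB_OK);
    }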

  • Bug to += operator

    Posted by Legacy on 11/29/2000 12:00am

    Originally posted by: Andrei Bozântan

    Just try this

    #include "CStr.h"
    int main()
    {
        CStr s("1");
        for (int i = 0; i < 20; i++)
            s += s;    // append the string to itself
        return 0;
    }

  • Improvement

    Posted by Legacy on 11/08/2000 12:00am

    Originally posted by: Bard

    Hi,
    
    

    In the inline function that defines operator[], you throw an error in debug mode; otherwise you let the user handle errors himself.
    I changed it to:

    inline char CStr::operator[](CPOS idx) const
    {
        if (idx >= GetLength()) {
            return 0;
        } else {
            return data->m_Text[idx];
        }
    }

    So if one tries to access a character past the end of the string, he gets back a 0.

    I think it's better. Isn't it?

    Greetings
    Bernhard "Bard" Doebler

  • Doesn't work with ATL!!!

    Posted by Legacy on 11/04/2000 12:00am

    Originally posted by: Iliya

    I'm writing a shell extension using the ATL library, without MFC. Your code doesn't compile: I get the linker error "unresolved external symbol "void * __cdecl get_stringres(void)"". If I try to put #include "CStrMgr.cpp" in stdafx.cpp, I get too many different "unresolved external" errors.

  • Total efficiency can not be Generic.

    Posted by Legacy on 07/30/1999 12:00am

    Originally posted by: Brad Hochgesang

    I see no problem with reinventing the wheel when it gains you a performance advantage. But, any good programmer realizes that a generally fast routine or class that is made for general applications will not be super-optimized for all applications.

    CString has its place. If you need to do non-intensive routines that require simple string manipulation, by all means use CString. You would have to be crazy to re-write a string class for a simple application.

    On the other hand, I once needed to use a string class search in a database containing fields of undetermined lengths that would be inserted in alphabetical order. Had I used a List or Vector class, which I very well could have, and in fact should have had I not needed extreme speed, it would have taken way too long. Instead I wrote a specialized linked list class that could be searched with a binary search (well, a slightly modified binary search) and came up with a tremendously fast program. In that case, reinventing the wheel was useful.

    There are a lot of people out there who complain about MFC and Microsoft code being too slow. Well, I'll bet your code may be slow for my applications as well. Your code is optimized (I hope) for your programs. Microsoft's code is optimized for general use, where usability and functionality tend to be thought of before lightning speed.

    Re-write a string class if you want to, I say. Better, though, to optimize it for your purpose. General performance gains seem to be getting smaller and smaller these days.

  • It is a very useful object

    Posted by Legacy on 06/28/1999 12:00am

    Originally posted by: Fan Xia

    I think the usefulness of this project lies not in improving the CString class but in inventing a new string class that is similar to CString, so that I can get away from the huge MFC library. It may also make my code more portable to other platforms (e.g. Linux in the future). If someone could reinvent the MFC classes and make the source code available to everyone, I would really appreciate it. Keep up the great work, Kamen.
