Digging Into C++ String Stream Processing
Processing strings is a common operation in any programming language, and almost all programming languages provide some form of library API for it. The difficulty is that a string is basically a collection of characters, so it cannot be treated as a primitive data type like int, float, or char. Yet strings are used so frequently that programmers expect them to behave like one. For example, where a string is merely an array of characters, as in C, a string variable cannot be assigned a string literal after its declaration, and something as simple as concatenating two strings requires explicit logic in code. The idea of providing string-specific APIs in the library is to hide the complexity of string manipulation and let a string be handled much like a primitive data type, although it actually is not one. This article describes the string processing facilities supplied by the C++ standard library.
String Processing
String processing begins with defining a string type along with the various methods of string manipulation, such as searching, inserting, erasing, replacing, comparing, and concatenating strings. To begin with, let's see how a string variable is defined, assigned, and logically represented in memory.
string greetings = "Hello String";
Figure 1: A string variable definition in memory
Note that a string declared in C through a character pointer, such as char *str = "Hello String";, is null terminated. It would be represented as:
'H','e','l','l','o',' ', 'S','t','r','i','n','g','\0'
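The terminating character '\0' is not part of a C++ string's contents. A string object keeps track of its own length, so size() does not count a terminator and the string may even contain '\0' characters of its own. (Since C++11, c_str() and data() still return a null-terminated array for interoperability with C.) This is a very basic and important difference between string handling in C versus C++.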
In C++, the header <string> defines, in namespace std, the basic_string class template for manipulating variable-length sequences of characters. The library is extensive in the sense that it uses templates to create a family of string types, such as:
namespace std {
   typedef basic_string<char> string;
   ...
   typedef basic_string<wchar_t> wstring;
   ...
}
The wchar_t type is commonly used to support wide character sets such as Unicode. Its width is not fixed by the standard: it is typically 16 bits on Windows and 32 bits on most Unix-like systems. The typedef wstring is the string type for wchar_t characters.
A string object can be initialized with the help of its constructors as follows:
string s1 ("Hello String");     // direct initialization,
                                // created from const char *
string s2 = "Hello String";     // copy initialization
string s3 (10, 'z');            // filled with 10 z's
string s4 = string (10, 'z');   // copy initialization,
                                // with 10 z's
string s5 = s4;                 // copy s4 into s5
The String Class
The hallmark of C++ string handling is the string class, as stated earlier. Several operators are overloaded for easy manipulation of strings, such as copying, concatenation, and comparison, and member functions are provided for searching, erasing, inserting, and replacing. While these operations run, memory allocation and resizing take place automatically, so the programmer need not be concerned with the internal details. A string object created without an initializer always starts with size 0. The size changes as text is assigned or appended to it. Let's find that out with the help of a simple program.
#include <iostream>
#include <string>
using namespace std;

string global_str1("Enter a string: ");

int main(int argc, char **argv)
{
   string separator(60, '-');
   string s1;
   cout<<"uninitialized string size, s1 = "<<s1.size()<<endl;
   cout<<"initialized string size, global_str = "
      <<global_str1.size()<<endl;
   cout<<global_str1;
   getline(cin,s1);
   // the size() and length() are equivalent
   cout<<"Text entered is: '"<<s1<<"' of size = "
      <<s1.length()<<endl;
   // copy 10 characters of global_str1, starting at index 2
   string s2(global_str1, 2, 10);
   cout<<"substring of global_str copied to s2 is '"<<s2
      <<"', size = "<<s2.size()<<endl;
   // create more than one paragraph of text
   string text,para;
   cout<<separator<<endl<<"Enter some text: "<<endl;
   while(true){
      getline(cin, para);
      if(para.size()==0)
         break;
      // string concatenated with overloaded +
      // operator
      text += "\n" + para;
   }
   cout<<separator<<endl<<"Text you entered is ..."<<endl;
   cout<<text<<endl;
   cout<<separator<<endl<<"Text you entered in reverse ...\n"<<endl;
   // the last valid subscript is size()-1, so print text[i-1]
   for(string::size_type i=text.size();i>0;i--){
      cout<<text[i-1];
   }
   return 0;
}
Listing 1: Modifying the size of a string object
Output
Figure 2: Output of Listing 1
As with C-style strings, the characters of a C++ string are indexed from subscript 0 to length()-1. Many string member functions take a starting subscript and the number of characters to operate upon. The string class also overloads the stream extraction operator (>>) to support statements that read a string from cin.
string str1;
cin>>str1;
In this case, the input is delimited by white space. This means that the input 'Hello string' will be extracted only as 'Hello', because extraction stops at the first white-space character. This is the reason the getline function is overloaded for the string class.
getline(cin, str1);
This function reads a line from the keyboard (through the cin object) into str1. The input is delimited by the newline character ('\n'), not by white space as with the overloaded extraction operator.
The string class also provides the overloaded member function assign, which can copy either a whole string or a specified number of characters into a string object.
string str1, str2;
string str3="I saw a saw to saw a tree";
str1.assign(str3);
// source string, start index, no. of characters
str2.assign(str3, 2, 3);
String Concatenate, Compare
The string class overloads the + and += operators for concatenation, and the ==, !=, <, >, <=, and >= operators for comparison. Overloading does not change the usual rules of precedence: + binds more tightly than the comparison operators, which in turn bind more tightly than the assignment operators = and +=.
string s1, s2("higher"), s3(" you "), s4(" go"); s1 = s2 + s3 + s4;
There is also an overloaded member function, append, that appends text to the end of a string.
s1.append(" the lighter you feel");   // appended text starts at index 14 of s1
String comparison is done lexicographically, either with the relational operators or with the compare member function.
string s1("ac"), s2("ab");
if(s1==s2)
   cout<<"s1==s2";
else if(s1>s2)
   cout<<"s1>s2";
else
   cout<<"s1<s2";
When two strings, say s1 and s2, are compared with s1.compare(s2), a positive number is returned if s1 is lexicographically greater than s2. If the strings are equal, 0 is returned; otherwise, a negative value is returned.
int k = s1.compare(s2);
if(k==0)
   cout<<"s1==s2";
else if(k>0)
   cout<<"s1>s2";
else
   cout<<"s1<s2";
String comparison can be performed on a substring or part of a string. In such a case, we can use the overloaded version of the compare function.
string s1("synchronize"), s2("sync");
// k is 0: the first four characters of s1 equal s2
int k = s1.compare(0,4,s2);
The first argument, 0, specifies the starting subscript; the second argument, 4, denotes the length; and the third argument is the reference string to compare.
Some More Common String Operations
Some of the other common operations performed by the member functions of string class are as follows.
- The class string provides the swap member function for swapping strings.
string s1("tick"),s2("tock");
cout<<"Before swap "<<s1<<"-"<<s2<<endl;
s1.swap(s2);
cout<<"After swap "<<s1<<"-"<<s2<<endl;
- The member function substr is used to retrieve a substring from a string. The first argument is the subscript at which the substring begins and the second argument is the number of characters to take.
string s1("...tolerant as a tree");
// 8 characters starting at subscript 3: "tolerant"
cout<<s1.substr(3, 8);
- The member functions that provide information about the characteristics of string are as follows:
string s1;
cout<<"Is empty string? "<<(s1.empty()?"yes":"no")<<endl;
cout<<"Capacity: "<<s1.capacity()<<endl;
cout<<"Maximum size: "<<s1.max_size()<<endl;
cout<<"Length: "<<s1.length()<<endl;
cout<<"------------------------"<<endl;
s1="fiddler on the roof";
cout<<"Is empty string? "<<(s1.empty()?"yes":"no")<<endl;
cout<<"Capacity: "<<s1.capacity()<<endl;
cout<<"Maximum size: "<<s1.max_size()<<endl;
cout<<"Length: "<<s1.length()<<endl;
- Searching for a substring in a string, and erasing, replacing, and inserting text:
#include <iostream>
#include <string>
using namespace std;

int main(int argc, char **argv)
{
   string s1("They are allone and the same,"
      "The mind follows matter, and whatever "
      "it thinks of is also material");
   cout<<"--------------------"<<endl;
   cout<<"Original text"<<endl;
   cout<<"--------------------"<<endl;
   cout<<s1<<endl;
   cout<<"--------------------"<<endl;
   cout<<"Replaced ' '(space) with '&nbsp;'"<<endl;
   cout<<"--------------------"<<endl;
   // erased the extra word 'all' from
   // 'allone' in the text
   s1.erase(9,3);
   size_t spc=s1.find(" ");
   while(spc!=string::npos){
      s1.replace(spc,1,"&nbsp;");
      spc=s1.find(" ",spc+1);
   }
   cout<<s1<<endl;
   cout<<"--------------------"<<endl;
   cout<<"Back to original text"<<endl;
   cout<<"--------------------"<<endl;
   spc=s1.find("&nbsp;");
   while(spc!=string::npos){
      // "&nbsp;" is 6 characters long
      s1.replace(spc,6," ");
      spc=s1.find("&nbsp;",spc+1);
   }
   cout<<s1<<endl;
   cout<<"--------------------"<<endl;
   cout<<"Inserting new text into existing text"<<endl;
   cout<<"--------------------"<<endl;
   s1.insert(0,"Matter or Transcendence? ");
   cout<<s1<<endl;
   return 0;
}
Editor's Note: In the preceding code listing, each group of 48 hyphens was reduced to a group of 20 hyphens to fit the available space without breaking the code lines unnecessarily.
C-style String Operations in a String Class
The string class in C++ also provides member functions to convert string objects to C-style pointer-based strings. The following example illustrates how it may be done.
#include <iostream>
#include <string>
using namespace std;

int main(int argc, char **argv)
{
   string s1("ABC");
   // copying characters into allocated memory
   int len=s1.length();
   char *pstr1=new char[len+1];
   s1.copy(pstr1,len,0);
   pstr1[len]='\0';
   cout<<pstr1<<endl;
   // release the buffer allocated with new[]
   delete[] pstr1;
   // string converted to C-style string
   cout<<s1.c_str()<<endl;
   // function data() returns const char *
   // keeping the pointer is not a good idea because pstr3
   // can become invalid if the value of s1 changes
   const char *pstr3=s1.data();
   cout<<pstr3;
   return 0;
}
Using Iterators with Strings
We can use iterators with string objects in the following manner.
string s1("a rolling stone gathers no moss");
for(string::iterator iter=s1.begin();iter!=s1.end();iter++)
   cout<<*iter;
Strings and IO Stream
C++ stream IO can also operate directly on a string in memory. The library provides two supporting classes for that: istringstream for input and ostringstream for output. They are typedefs for specializations of the class templates basic_istringstream and basic_ostringstream, respectively, and are declared in the header <sstream>.
typedef basic_istringstream<char> istringstream;
typedef basic_ostringstream<char> ostringstream;
These template classes provide the same functionality as istream and ostream in addition to their own member functions for in-memory formatting.
An ostringstream object stores its output in a string object. Data is appended to this in-memory string with the stream insertion operator (<<), which accepts strings and numeric values alike. The member function str() returns a copy of that string.
An istringstream object reads data from an in-memory string into program variables. The data is stored as characters, and input works much like input from a file: the istringstream object treats the end of the string as an end-of-file marker.
To get an idea of what these classes can do, let's try them in a simple example.
#include <iostream>
#include <string>
#include <sstream>
using namespace std;

int main(int argc, char **argv)
{
   ostringstream out;
   out<<"Float value = "<<1.3<<endl
      <<"and int value = "<<123<<"\t"<<"tabbed"<<endl;
   cout<<out.str();
   string s1("0 1 2 3 4 5 6 7 8 9");
   istringstream in(s1);
   int ival;
   // test the extraction itself, so the loop stops as soon
   // as no further integer can be read
   while(in>>ival){
      cout<<ival<<endl;
   }
   return 0;
}
Listing 2: Observing the istringstream objects
Output
Figure 3: Output of Listing 2
Conclusion
The standard C++ library class string provides everything required for string processing, along with some convenient out-of-the-box functionality. It is better to stick to the object-oriented way of handling strings than to resort to C-style string handling, although C++ supports both. Following this rule of thumb not only improves the readability of the code but also makes it less prone to bugs.