COMS W4995 C++ Deep Dive for C Programmers

MyString class

In this chapter, we will create our own string implementation, the MyString class. The C++ Standard Library already provides a very powerful and fast std::string class. Why then are we reinventing the wheel?

Writing our own string class will give us an opportunity to learn and practice the anatomy of C++ classes in a familiar setting where we already know how the class is supposed to behave.

Another reason to study our own string class is that it will give us a good mental model of how the std::string class works under the hood. Even though std::string is highly optimized and does a lot of things that our naive implementation will not do, the way the MyString class manages the underlying heap-allocated string still provides a good mental model for understanding how a real string class works and how any class should properly manage its acquired resources.

Source code organization

First, let’s make sure we understand how MyString class source code is organized. We will study the following source files:

Makefile
mystring.h
mystring.cpp
test1.cpp, test2.cpp, test3.cpp, test4.cpp, test5.cpp

`mystring.h` and `mystring.cpp`

Unlike in Java & Python, we do not normally write member function definitions inside the class definition. We usually write member function declarations inside the class definition, and put them in a header file. Here is an excerpt from mystring.h:

class MyString {
public:

    // default constructor
    MyString();

    // constructor
    MyString(const char* p);

    // destructor
    ~MyString();

    ...

    // returns the length of the string
    int length() const { return len; }

    ...

private:

    char* data;

    int len;
};

There are times where we write member function definitions in the class definition, like the length() member function shown above. We’ll circle back to that later.

The header file also has an include guard:

#ifndef __MYSTRING_H__
#define __MYSTRING_H__

...

class MyString { ... };

#endif

These preprocessor directives ensure that this header file is only ever included once in a compilation unit. The first time the preprocessor encounters this header, it sees that the macro __MYSTRING_H__ is not defined yet (#ifndef), so it defines it and goes on to include the class definition. Let’s say that this header is included again in the same compilation unit, perhaps indirectly through another header – the next time the preprocessor encounters this header, it won’t bring in the MyString definition again because __MYSTRING_H__ is already defined.

Alternatively, you could’ve used the #pragma once preprocessor directive to achieve the same purpose. It is non-standard but is supported by virtually all major compilers. C++20 introduced modules, which eliminate the need for these preprocessing directives, but compiler support for modules is still incomplete at the time of this writing.
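To see the guard at work without creating extra files, the sketch below pastes the same guarded "header" twice into one file, which is roughly what the preprocessor sees after two #include lines. The names DEMO_POINT_H, DemoPoint, and demo_include_guard() are hypothetical and chosen only for illustration.

```cpp
#include <cassert>

// First "inclusion": DEMO_POINT_H is not defined yet, so the body is kept.
#ifndef DEMO_POINT_H
#define DEMO_POINT_H
struct DemoPoint {
    int x;
    int y;
};
#endif

// Second "inclusion": DEMO_POINT_H is already defined, so the preprocessor
// skips everything up to #endif and DemoPoint is not defined a second time.
#ifndef DEMO_POINT_H
#define DEMO_POINT_H
struct DemoPoint {
    int x;
    int y;
};
#endif

// This compiles only because DemoPoint ended up being defined exactly once.
int demo_point_sum() {
    DemoPoint p{1, 2};
    return p.x + p.y;
}

void demo_include_guard() {
    assert(demo_point_sum() == 3);
}
```

Removing the #ifndef/#define/#endif lines from both copies should make the compiler report a redefinition of DemoPoint.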

The actual definitions for the member functions will go into mystring.cpp, which will #include "mystring.h":

#include <cstring>
#include <cstdio>

#include "mystring.h"

// default constructor

MyString::MyString() {
    data = new char[1];
    data[0] = '\0';

    len = 0;
}

// constructor

MyString::MyString(const char *p) {
    if (p) {
        len = strlen(p);
        data = new char[len+1];
        strcpy(data, p);
    } else {
        data = new char[1];
        data[0] = '\0';
        len = 0;
    }
}

...

Each member function definition that we write outside of the class definition must be qualified with MyString:: to indicate that it’s a member function of the MyString class.

`Makefile`

Here is the Makefile. The line numbers cited in the discussion below count from its first line and include blank lines:

CC  = g++
CXX = g++

CFLAGS   = -g -Wall
CXXFLAGS = -g -Wall -std=c++14

executables = test1 test2 test3 test4 test5
objects = mystring.o test1.o test2.o test3.o test4.o test5.o

.PHONY: default
default: $(executables)

$(executables): mystring.o

$(objects): mystring.h

.PHONY: clean
clean:
	rm -f *~ a.out core $(objects) $(executables)

.PHONY: all
all: clean default

The CXX variable should contain the C++ compiler we want to use. Once all the C++ source files are compiled into the corresponding object files, the make program will use the CC variable to determine which command to invoke to link the object files. Setting the CC variable the same as CXX variable will ensure that the correct version of the Standard C++ Library gets linked into your executable.

The CFLAGS and CXXFLAGS variables set the compiler flags for C and C++ source files, respectively. We’re using -std=c++14 so that we can disable copy elision and verify our understanding of copy construction that we learned in the previous chapter.

Lines 10-11 define the first target, default, which simply depends on all the executables that we would like to build, so that the make program will build all the executables when we simply type make. (Remember that, when we run make without any argument, it will just build the first target.) The default target is marked as a “phony” target because it’s not a real file in the current directory for which make needs to check the timestamp.

Line 13 specifies that each executable depends on mystring.o, as well as on its own .o file implicitly. We also rely on make to deduce the appropriate command to link the executable out of the dependent object files. In other words, line 13 is a shorthand for the following sequence of rules:

test1: test1.o mystring.o
        g++ test1.o mystring.o -o test1

test2: test2.o mystring.o
        g++ test2.o mystring.o -o test2

test3: test3.o mystring.o
        g++ test3.o mystring.o -o test3

test4: test4.o mystring.o
        g++ test4.o mystring.o -o test4

test5: test5.o mystring.o
        g++ test5.o mystring.o -o test5

Line 15 similarly specifies the compilation dependencies. Line 15 is equivalent to the following verbose set of rules:

mystring.o: mystring.cpp mystring.h
        g++ -g -Wall -std=c++14  -c mystring.cpp

test1.o: test1.cpp mystring.h
        g++ -g -Wall -std=c++14  -c test1.cpp

test2.o: test2.cpp mystring.h
        g++ -g -Wall -std=c++14  -c test2.cpp

test3.o: test3.cpp mystring.h
        g++ -g -Wall -std=c++14  -c test3.cpp

test4.o: test4.cpp mystring.h
        g++ -g -Wall -std=c++14  -c test4.cpp

test5.o: test5.cpp mystring.h
        g++ -g -Wall -std=c++14  -c test5.cpp

Lines 17-19 define the “clean” target to delete all generated files when the user types make clean. Lines 21-22 define make all to be equivalent to make clean followed by make, letting the user rebuild everything by typing make all.

Test drivers

The provided test drivers test various parts of MyString class implementation:

test1.cpp tests Basic 4 of MyString class
test2.cpp tests MyString::operator+()
test3.cpp tests MyString::operator[]()
test4.cpp demonstrates exception handling
test5.cpp demonstrates the invocation of the MyString copy constructor when passing and returning MyString objects by value

Basic 4 of MyString

Constructor

MyString has two constructors: a default constructor that takes no arguments, and a constructor that takes a const char*. Let’s start by looking at the latter.

We can declare a MyString object on the stack like this:

MyString s1("hello");

"hello" is a string literal, which is an array of 6 characters (including the null-terminator). The string literal gets converted into a const char*, a pointer to the first character of the string, when we use it to initialize s1. So where in memory are those characters?

Recall our process memory address space diagram:

         +--------------------------------------------------+
         | operating system code & data                     |
    512G +--------------------------------------------------+
         | stack        (for automatic variables)           |
         +--------------------------------------------------+
         |        |                                         |
         |        |                                         |
         |        v     (stack grows down)                  |
         |                                                  |
         |                                                  |
         |                                                  |
         |        ^     (heap grows up)                     |
         |        |                                         |
         |        |                                         |
         +--------------------------------------------------+
         | heap         (for memory allocated by malloc())  |
         +--------------------------------------------------+
         | data section (for static variables)              |
         +--------------------------------------------------+
         | code section (for the program code)              |
       0 +--------------------------------------------------+

String literals are part of the program code. To be more exact, the characters in a string literal are actually stored in a section called rodata (read-only data), right in-between code and data:

         | ...                                              |
         +--------------------------------------------------+
         | data section                                     |
         +--------------------------------------------------+
         | rodata section                                   |
         |                                                  |
         |  +------------------------+                      |
         |  | h | e | l | l | o | \0 |                      |
         |  +------------------------+                      |
         +--------------------------------------------------+
         | code section                                     |
       0 +--------------------------------------------------+

With that in mind, look at MyString’s private fields:

class MyString {
public:
    ...
    // constructor
    MyString(const char* p);
    ...

private:
    char* data;
    int len;
};

MyString records the length of the string for convenience. In our example, len will be 5. data is supposed to point to the characters that make up the MyString. Is it enough to just store the pointer provided in the constructor? In this case, can we just refer to the characters in the rodata section?

As its name suggests, rodata memory is read-only. That means we can’t mutate a string literal. We want to be able to modify the characters in a MyString, so pointing to the string literal characters won’t do.

The MyString constructor will have to create a copy of the string on the heap and point to that instead. MyString has to own its underlying string. Taking the example of declaring MyString s1("hello") on the stack, its memory layout will look like this:

[Figure: 03-mystring-ctor1]

The MyString object, along with its len and data fields, is on the stack, but data points to the copied string on the heap.

Now that we understand the memory model, let’s look at the code for the constructor:

MyString::MyString(const char *p) {
    if (p) {
        len = strlen(p);
        data = new char[len+1];
        strcpy(data, p);
    } else {
        data = new char[1];
        data[0] = '\0';
        len = 0;
    }
}

Consider the case where p isn’t nullptr. (nullptr is the C++ type-safe version of C’s NULL.) We take the length of the provided string and allocate len+1 characters on the heap. Note that we allocate len+1 characters instead of len to account for the null-terminator.
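The copy-to-the-heap step can be tried in isolation. The sketch below uses a hypothetical free function, heap_copy(), that mirrors the non-null branch of the constructor. It is not part of MyString, and the caller is responsible for calling delete[] on the result.

```cpp
#include <cassert>
#include <cstring>

// Hypothetical helper mirroring the non-null branch of the constructor:
// allocate strlen(p) + 1 chars on the heap and copy the string into them.
// Assumes p is not nullptr.
char* heap_copy(const char* p) {
    std::size_t len = std::strlen(p);
    char* data = new char[len + 1];   // +1 for the null terminator
    std::strcpy(data, p);
    return data;                      // the caller must delete[] this
}

void demo_heap_copy() {
    char* s = heap_copy("hello");
    assert(std::strlen(s) == 5);

    // The copy lives on the heap, so writing to it is fine even though
    // the string literal it was copied from is read-only.
    s[0] = 'j';
    assert(std::strcmp(s, "jello") == 0);

    delete[] s;
}
```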

It may feel odd that we chose to use new[] instead of directly using malloc(). We learned that new[] invokes the constructor for each element in the array, but char is a built-in data type that doesn’t have a constructor, so what’s the point? It’s not like new[] initializes the character array for us either. One benefit of using new over malloc() is how it handles errors. When an allocation fails, new throws a std::bad_alloc exception, whereas malloc() returns a null pointer that the programmer has to remember to check for. If left uncaught, the exception will percolate up the call stack and terminate the program. If we forget to check malloc()’s return value, the program will probably crash shortly afterwards when it dereferences the null pointer. We’ll talk about how we can catch exceptions later.
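As a rough side-by-side, here is how the two failure-reporting styles look in code. try_new() and try_malloc() are hypothetical helpers written for this comparison. The demo only exercises the success path, because reliably forcing an allocation failure is platform-dependent.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdlib>
#include <new>

// Returns true if new[] produced n chars. Failure is reported through an
// exception, so there is no return value to forget to check.
bool try_new(std::size_t n) {
    try {
        char* p = new char[n];
        delete[] p;
        return true;
    } catch (const std::bad_alloc&) {
        return false;
    }
}

// Returns true if malloc() produced n bytes. Failure is reported through
// a null return value, which the caller has to test explicitly.
bool try_malloc(std::size_t n) {
    void* p = std::malloc(n);
    if (p == nullptr) {
        return false;
    }
    std::free(p);
    return true;
}

void demo_allocation() {
    assert(try_new(16));
    assert(try_malloc(16));
}
```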

By the way, do you know what’s wrong with this alternate implementation?

    len = strlen(p);
    char a[len + 1];
    data = a;
    strcpy(data, p);

This code creates an array of characters a on the stack frame of the constructor and assigns it to data, which means data points to the first element in a. (A variable-length array like a is a g++ extension rather than standard C++, but that is not the main problem here.) Remember that a will be invalidated once the stack rolls up after the constructor returns. data will become a dangling pointer. We need the character array to live on after the constructor has returned!

Moving on to the other branch of the constructor, we now handle the case where p is nullptr. The code is the same as the default constructor’s implementation – we simply allocate an empty string. While len is just 0, we allocate 1 character on the heap and write the null-terminator to it, which is the C representation of an empty string. We can declare an empty MyString on the stack by invoking its default constructor: MyString s2;.

[Figure: 03-mystring-ctor2]

Having to heap-allocate 1 byte just for the empty string representation is unfortunate. Heap allocation is definitely not cheap; couldn’t we have done something else instead? Sure, we could’ve chosen data = nullptr to represent an empty string, but that comes at a cost, too. Our chosen design to heap-allocate an empty string ensures we have a nice invariant: data always points to a valid string. We will never have to check if data is nullptr in any class member definition. On the other hand, if we use nullptr for the empty string, our class members may have to check that first to avoid a null pointer dereference. Both are reasonable designs; we simply chose our representation to keep the code simple.

Mainstream C++ standard library implementations of std::string actually avoid heap allocations whenever they can. In fact, they employ a “short string optimization” (SSO), where the std::string has a small buffer embedded directly into it. Short strings (including empty ones) can point to that embedded buffer instead of pointing to some heap-allocated buffer.
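One way to observe this is to check whether a string’s characters sit inside the std::string object itself. The sketch below uses a hypothetical helper, stored_inline(), and assumes that std::uintptr_t is available and that comparing converted addresses is meaningful on your platform. Whether a given short string is stored inline is implementation-specific, so the demo only asserts the case that holds everywhere.

```cpp
#include <cassert>
#include <cstdint>
#include <string>

// Hypothetical probe: returns true if the characters of s are stored
// inside the std::string object itself rather than in a separate buffer.
bool stored_inline(const std::string& s) {
    auto obj   = reinterpret_cast<std::uintptr_t>(&s);
    auto chars = reinterpret_cast<std::uintptr_t>(s.data());
    return chars >= obj && chars < obj + sizeof(std::string);
}

void demo_sso() {
    // 1000 characters cannot fit inside the object, so they must live
    // in a separately allocated buffer.
    std::string long_str(1000, 'x');
    assert(sizeof(std::string) < 1000);
    assert(!stored_inline(long_str));
}
```

On implementations that use SSO, stored_inline(std::string("hi")) would typically return true, but the C++ standard does not guarantee it.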

Destructor

The MyString destructor is very straightforward. It simply invokes delete[] on the heap-allocated character array.

MyString::~MyString() {
    delete[] data;
}

The MyString constructor and destructor pair is a perfect example of the C++ “Resource Acquisition Is Initialization” (RAII) paradigm. The constructor acquires a resource (heap-allocated character array), and the destructor releases it (invokes delete[]).
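The same acquire-in-constructor, release-in-destructor pairing can be shown with a smaller class. Buffer and its live counter below are hypothetical and exist only to make the release visible. Copying is disabled with = delete so the sketch does not need a copy constructor or copy assignment.

```cpp
#include <cassert>

// Hypothetical RAII wrapper around a heap-allocated char array.
class Buffer {
public:
    // Acquire the resource in the constructor.
    explicit Buffer(int n) : data(new char[n]) { ++live; }

    // Release the resource in the destructor.
    ~Buffer() { delete[] data; --live; }

    // Copying is disabled to keep this sketch short.
    Buffer(const Buffer&) = delete;
    Buffer& operator=(const Buffer&) = delete;

    static int live;   // number of Buffer objects currently holding memory

private:
    char* data;
};

int Buffer::live = 0;

void demo_raii() {
    assert(Buffer::live == 0);
    {
        Buffer b(64);
        assert(Buffer::live == 1);
    }   // b goes out of scope here, so its destructor releases the array
    assert(Buffer::live == 0);
}
```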

Copy Constructor

With our constructors and destructor in place, let’s now turn our attention to the copy constructor. Let’s say we don’t define one ourselves and go with the compiler-generated version. Will it do the job?

Let’s check by considering the following example:

void f(MyString s2) {
    std::cout << s2 << std::endl;
}

int main() {
    MyString s1("hello");
    f(s1);
}

The f() function takes a MyString by value and prints it out. Calling f() will invoke MyString’s copy constructor. The compiler generated copy constructor simply performs a member-wise copy of the object. That means simply copying over the values of its len and data fields, as shown below:

[Figure: 03-mystring-copy1]

f() will be able to print out s2 just fine, but there’s a problem when f() returns. s2 goes out of scope and gets its destructor invoked, releasing the "hello" string allocated on the heap.

[Figure: 03-mystring-copy2]

What about s1 though? Its data pointer is now invalid; the data it points to was already freed when s2 was destroyed. When s1’s destructor is invoked, we erroneously double-delete the string. The compiler-generated copy constructor performed a shallow copy of the MyString. This is not the correct thing to do because each MyString has to own its heap-allocated string. Copying the data pointer doesn’t create a new copy of the string; it just copies the memory address where the string is stored. Our MyString copy constructor has to perform a deep copy as follows:

MyString::MyString(const MyString& s) {
    len = s.len;

    data = new char[len+1];
    strcpy(data, s.data);
}

Our copy constructor allocates space for the string on the heap and invokes strcpy() to copy the string over. With this implementation, s1 and s2 in the example above now have their own independent copies of the string "hello", as shown below:

[Figure: 03-mystring-copy3]
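The independence of the two copies can be checked in code. The sketch below uses a hypothetical stripped-down class, Word, with the same deep-copy logic as MyString. Its members are public only so the demo can inspect them, and copy assignment is disabled to keep it short.

```cpp
#include <cassert>
#include <cstring>

// Hypothetical stripped-down string class with a deep-copying
// copy constructor. Assumes p is not nullptr.
class Word {
public:
    Word(const char* p) {
        len = static_cast<int>(std::strlen(p));
        data = new char[len + 1];
        std::strcpy(data, p);
    }

    // Deep copy: allocate a fresh array and copy the characters over.
    Word(const Word& w) {
        len = w.len;
        data = new char[len + 1];
        std::strcpy(data, w.data);
    }

    ~Word() { delete[] data; }

    Word& operator=(const Word&) = delete;   // omitted in this sketch

    char* data;   // public only so the demo can look inside
    int len;
};

void demo_deep_copy() {
    Word s1("hello");
    Word s2(s1);

    // Each object owns its own array.
    assert(s1.data != s2.data);

    // Changing the copy leaves the original untouched.
    s2.data[0] = 'j';
    assert(std::strcmp(s1.data, "hello") == 0);
    assert(std::strcmp(s2.data, "jello") == 0);
}   // both destructors run here, each deleting its own array
```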

Copy Assignment

Finally, let’s take a look at MyString’s copy assignment implementation:

MyString& MyString::operator=(const MyString& rhs) {
    if (this == &rhs) {
        return *this;
    }

    // first, deallocate memory that 'this' used to hold

    delete[] data;

    // now copy from rhs

    len = rhs.len;

    data = new char[len+1];
    strcpy(data, rhs.data);

    return *this;
}

The second half of the copy assignment operator is the same as the copy constructor – we have to create a deep copy of rhs. The key difference is the delete[] statement before we create the deep copy. We first have to delete the old string that the this object was pointing to, make a deep copy of rhs’s string, and then attach the new string to this. Consider the following snippet of code:

MyString s1("hi");
MyString s2("hello");

s1 = s2;

The copy assignment operator deletes the "hi" string that s1 was referring to, makes a deep copy of s2’s "hello" string, and points s1’s data field to it:

[Figure: 03-mystring-assignment]

The final piece of the copy assignment operator is the if-statement at the top. It’s meant to account for the case where a MyString is assigned to itself: s1 = s1. We have to short-circuit the implementation in this case because we start by deleting data. We check if we’re trying to self-assign by comparing the memory addresses this and &rhs.

Could we have alternatively written *this == rhs? Assuming MyString defines an operator==(), it compiles and it may even have the intended effect, but it may not be checking if *this and rhs are the same object: an operator==() typically compares values, so two distinct objects holding equal strings would also compare equal. More generally, operator==() could be arbitrarily implemented, so the safest way to determine if *this and rhs refer to the same object is to compare their memory addresses because pointer comparisons can’t be redefined in C++.
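The difference between "equal" and "the same object" is easy to demonstrate. Tag and same_object() below are hypothetical names used only for this illustration.

```cpp
#include <cassert>

// Hypothetical value type with a user-defined equality operator.
struct Tag {
    int id;
};

// Compares values: two different objects can be "equal".
bool operator==(const Tag& a, const Tag& b) {
    return a.id == b.id;
}

// Compares addresses: true only if a and b are the very same object.
bool same_object(const Tag& a, const Tag& b) {
    return &a == &b;
}

void demo_identity() {
    Tag t1{7};
    Tag t2{7};

    assert(t1 == t2);               // equal values...
    assert(!same_object(t1, t2));   // ...but distinct objects
    assert(same_object(t1, t1));    // the self-assignment case
}
```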

The Rule of 3

“The Rule of 3” is a C++ design guideline pertaining to user-defined destructor, copy constructor, and copy assignment. It states that if you have to implement any of those three special member functions, you most likely need to implement all three. In the case of MyString, we had to implement a custom destructor because our constructors perform a heap allocation. It followed that we had to implement copy construction and copy assignment to ensure they made deep copies of MyString. Whenever you design a C++ class, you must carefully think about whether you have to implement these special member functions or whether the compiler-generated defaults are sufficient.

When the move constructor and move assignment were introduced in C++11, the Rule of 3 evolved into the Rule of 5. We’ll talk about move operations later.

`length()`

We placed our implementation of length() directly into the class definition in mystring.h:

class MyString {
public:
    ...
    // returns the length of the string
    int length() const { return len; }
    ...
};

It’s a simple getter function that doesn’t mutate the object, so it makes sense to mark it const. By defining length() inside the class definition (as opposed to only declaring it), we are hinting to the compiler that it should inline the implementation if possible.

Consider the following for-loop that prints out each character in a MyString:

MyString s1("hello");
for (int i = 0; i < s1.length(); i++) {
    std::cout << s1[i] << std::endl;
}

The condition i < s1.length() is checked for every iteration of the for-loop. That means that length() is called every iteration. Function calls aren’t free though – consider that each function call means setting up a new stack frame, jumping to the function’s code, and then jumping back to the loop. For such a simple function, the compiler could inline the implementation to remove the function call. The compiler could generate machine code as if we wrote the following instead:

for (int i = 0; i < s1.len; i++) { .. }

The compiler may not always inline a function call because it’s not always worth it. We wouldn’t want to duplicate some long function multiple times.

`operator+()`

Let’s now turn our attention over to the operator+() implementation. This function takes two MyString objects by const reference and returns a new MyString, the concatenation of the two strings, by value. Consider the following snippet from test2.cpp:

MyString s1("hello ");
MyString s2("world!");
MyString s3;

s3 = s1 + s2;

The expression s1 + s2 invokes operator+(s1, s2) and the returned temporary object becomes the rhs argument for the copy assignment into s3.

Note that operator+() takes two parameters whereas operator=() takes only one parameter, even though they are both binary operators. We previously discussed that operator=() only takes one parameter because it is implemented as a member function where the left operand is the this object and the right operand is passed in as the rhs argument. This means that operator+() is not a member function of the MyString class. It is simply a global function that is defined as follows:

MyString operator+(const MyString& s1, const MyString& s2) {
    MyString temp;

    delete[] temp.data;

    temp.len = s1.len + s2.len;

    temp.data = new char[temp.len+1];
    strcpy(temp.data, s1.data);
    strcat(temp.data, s2.data);

    return temp;
}

Since operator+() is not a member function, it doesn’t have the MyString:: qualifier like MyString::operator=(). Studying the implementation of operator+() will help us understand why it is defined as a global function instead of a member function.

The function declares an empty MyString object temp on the stack and replaces its underlying empty string with a newly allocated string that is the concatenation of s1 and s2. The stack object temp is then returned by value. As we saw with Pt::expand() in the previous chapter, this will trigger a copy construction from temp to an unnamed temporary object. In the example of s3 = s1 + s2, that unnamed temporary object becomes the rhs argument for the copy assignment into s3. That’s assuming that we disabled copy elision by the compiler using -fno-elide-constructors. Without that flag, the compiler will likely elide the construction of the unnamed temporary object.

Returning to our discussion of global vs. member functions, you should now see a clear difference between how operator=() and operator+() treat their two operands. Since operator=() mutates its left operand, it makes sense that it should be a member function. operator+(), on the other hand, doesn’t mutate either of its two operands. This is a good indication that operator+() should be defined as a global function. We’ll see the practical implication of this shortly.

Friend functions

Did you notice that operator+() is still able to access the private MyString members len and data despite not being defined as a member function? We had to declare operator+() as a friend function of the MyString class as follows:

class MyString {
public:
    ...
    // operator+
    friend MyString operator+(const MyString& s1, const MyString& s2);
    ...
};

A friend function of a class is not a member function, but has the privilege to access the class’s private members. Normally, when we define a global function in a .cpp file, we write its prototype in the .h file. The operator+() prototype is omitted in mystring.h because its friend declaration inside the MyString class definition also serves as its prototype.
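Here is the same mechanism in a smaller setting. Counter and peek() are hypothetical names. peek() is an ordinary global function, yet it can read the private count member because the class declares it a friend.

```cpp
#include <cassert>

// Hypothetical class with one private member.
class Counter {
public:
    explicit Counter(int start) : count(start) {}

    // Grants peek() access to private members. This declaration also
    // stands in for a separate prototype of peek().
    friend int peek(const Counter& c);

private:
    int count;
};

// Not a member function, so there is no Counter:: qualifier here.
int peek(const Counter& c) {
    return c.count;   // allowed only because peek() is a friend
}

void demo_friend() {
    Counter c(3);
    assert(peek(c) == 3);
}
```

Removing the friend declaration should make the compiler reject the access to c.count inside peek().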

Implicit conversions

Consider the following snippet from test2.cpp:

MyString s1("hello ");
MyString s2("world!");

cout << s1 + "world!" << endl;
cout << "hello " + s2 << endl;

It looks like operator+() is being invoked with a const char* and a MyString, but our implementation only accepts two MyString objects. Here, the compiler recognizes that the const char* can be converted to a MyString object in order to invoke our operator+() implementation. It uses the MyString(const char*) constructor to create a temporary MyString object for the sake of invoking operator+(). A constructor that enables implicit conversions like this is called a converting constructor.

C++ defines a number of cases where implicit conversion takes place. Creating a temporary object as an argument to a function, like we see above, is one of the cases. However, creating a temporary this object on which to invoke a member function is not a case that triggers implicit conversion. Had we defined operator+() as a member function instead of a global function, s1 + "world!" would have still worked, but not "hello " + s2.
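The asymmetry between member and global operators can be seen with a small hypothetical class, Meters, that has a converting constructor from double. Its operator+() is a global function and its operator-() is a member function, purely to contrast the two. value is public only to keep the sketch short.

```cpp
#include <cassert>

// Hypothetical class with a converting constructor from double.
class Meters {
public:
    Meters(double v) : value(v) {}   // enables implicit double -> Meters

    // Member operator: the left operand must already be a Meters.
    Meters operator-(const Meters& rhs) const {
        return Meters(value - rhs.value);
    }

    double value;
};

// Global operator: either operand may be implicitly converted.
Meters operator+(const Meters& a, const Meters& b) {
    return Meters(a.value + b.value);
}

void demo_conversions() {
    Meters m(2.0);

    assert((m + 1.0).value == 3.0);   // right operand converted
    assert((1.0 + m).value == 3.0);   // left operand converted
    assert((m - 1.0).value == 1.0);   // right operand converted

    // Meters bad = 1.0 - m;   // does not compile: no temporary Meters
    //                         // is created to serve as the this object
}
```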

`operator<<()`

Up until now, we’ve been printing out MyString objects like this:

MyString s("hello");
std::cout << s << std::endl;

This is made possible by implementing an operator<<() overload, also known as the put-to operator, for MyString as follows:

std::ostream& operator<<(std::ostream& os, const MyString& s) {
    os << s.data;
    return os;
}

The function takes an ostream object os by reference and a const MyString reference s to write to os. std::cout is a global object that is an instance of ostream – we’ll cover it later as part of the C++ I/O library. The function simply writes out the data member of s by delegating to the operator<<() overload for const char*, which has been implemented by the library already. It then returns the same os object by reference so that put-to operations can be chained. For example, the expression cout << s1 << s2 << endl is equivalent to ((cout << s1) << s2) << endl.
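Chaining can be verified without looking at the terminal by writing into a std::ostringstream, which is an ostream that collects its output in a string. Pair and chained() below are hypothetical names. The overload follows the same shape as the one for MyString.

```cpp
#include <cassert>
#include <ostream>
#include <sstream>
#include <string>

// Hypothetical type to print.
struct Pair {
    int a;
    int b;
};

// Same shape as the MyString overload: write to os, then return os
// so that the next << in the chain has a stream to work with.
std::ostream& operator<<(std::ostream& os, const Pair& p) {
    os << "(" << p.a << ", " << p.b << ")";
    return os;
}

// Writes two Pairs in one chained expression and returns the text.
std::string chained() {
    std::ostringstream out;
    Pair p1{1, 2};
    Pair p2{3, 4};
    out << p1 << " " << p2;   // parsed as ((out << p1) << " ") << p2
    return out.str();
}

void demo_put_to() {
    assert(chained() == "(1, 2) (3, 4)");
}
```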

Since we’re operating on the left operand ostream&, you may be wondering why this function isn’t implemented as a member function of ostream. That’s because the ostream class is already defined in the C++ library and we cannot change that definition. However, we can certainly define global functions that use ostream.

We defined operator<<() as a friend function of MyString so that it can access its private data member. We can’t define this operator<<() as a member function of MyString because the MyString is the right operand.

`operator>>()`

The following snippet from test3.cpp shows how to read an input word using operator>>(), also known as the get-from operator:

cout << "Enter a string: ";
MyString s;
cin >> s;

Here’s the implementation for it:

std::istream& operator>>(std::istream& is, MyString& s) {
    // this is kinda cheating, but this is just to illustrate how this
    // function can work.

    std::string temp;
    is >> temp;

    delete[] s.data;

    s.len = strlen(temp.c_str());
    s.data = new char[s.len+1];
    strcpy(s.data, temp.c_str());

    return is;
}

The get-from operator’s function signature and friend designation are analogous to those of put-to as we discussed earlier. std::cin is a global object that is an instance of istream. One difference between the two operators is that the get-from operator takes the MyString as a mutable reference because it is going to write into it.

The implementation needs to read in a sequence of non-whitespace characters from is into a heap-allocated array and put it into the given MyString. Since we don’t know how long the string is going to be, we’ll have to manage a heap-allocated array that grows as we read more characters. We omitted that implementation since our objective is to study the C++ language, and we simply delegate the work to std::string’s get-from implementation. We access std::string’s underlying character array using its c_str() member function.
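The behavior we are borrowing from std::string can be seen by reading from a std::istringstream, which is an istream backed by a string instead of the keyboard. next_word() is a hypothetical helper for this demonstration.

```cpp
#include <cassert>
#include <istream>
#include <sstream>
#include <string>

// Hypothetical helper: reads the next whitespace-delimited word from is.
// If no word can be read, the returned string is empty.
std::string next_word(std::istream& is) {
    std::string word;
    is >> word;   // skips leading whitespace, then reads until whitespace
    return word;
}

void demo_get_from() {
    std::istringstream in("   hello world");

    assert(next_word(in) == "hello");   // leading spaces are skipped
    assert(next_word(in) == "world");   // reading stops at whitespace
    assert(next_word(in).empty());      // nothing left to read
    assert(!in);                        // the stream reports the failure
}
```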

`operator[]()`

test3.cpp showcases how to access the characters in a MyString object using the operator[]() member function:

for (int i = 0; i < s.length(); ++i) {
    if ('a' <= s[i] && s[i] <= 'z') {
        s[i] = s[i] - ('a' - 'A');
    }
}

For each character in the MyString, we check if the character is a lowercase letter. If it is, we capitalize it and write it back into the MyString. The expression s[i] - ('a' - 'A') capitalizes s[i] if it’s a lowercase letter. This works by exploiting the fact that uppercase and lowercase letters have their ASCII codes laid out contiguously. We can offset any lowercase letter s[i]’s ASCII code by the difference 'a' - 'A' to get its uppercase equivalent.
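The arithmetic can be isolated into a small function. to_upper_ascii() is a hypothetical helper and, like the loop above, it assumes an ASCII-compatible character set.

```cpp
#include <cassert>

// Hypothetical helper: converts a lowercase ASCII letter to uppercase
// and returns every other character unchanged.
char to_upper_ascii(char c) {
    if ('a' <= c && c <= 'z') {
        // In ASCII, 'a' - 'A' is 32, the fixed distance between a
        // lowercase letter and its uppercase counterpart.
        return static_cast<char>(c - ('a' - 'A'));
    }
    return c;
}

void demo_to_upper() {
    assert('a' - 'A' == 32);            // holds for ASCII
    assert(to_upper_ascii('a') == 'A');
    assert(to_upper_ascii('z') == 'Z');
    assert(to_upper_ascii('Q') == 'Q'); // already uppercase
    assert(to_upper_ascii('5') == '5'); // not a letter
}
```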

The MyString class defines two versions of operator[]():

// operator[]
char& operator[](int i);

// operator[] const
const char& operator[](int i) const;

Let’s start with the first definition of operator[]():

char& MyString::operator[](int i) {
    if (i < 0 || i >= len) {
        throw std::out_of_range{"MyString::op[]"};
    }
    return data[i];
}

Our implementation performs a bounds-check on the provided index i to prevent an out-of-bounds access on the character array. If i is invalid, the implementation throws an std::out_of_range exception object (more on exceptions in a bit). The real std::string::operator[]() implementation does not perform this bounds-check, but std::string::at() does. Had we not performed the bounds-check here, we would’ve defined the function inside the class definition, letting the compiler inline it.

The function’s return type is char& instead of char because we need to enable writing into the string, like s[i] = 'X'. Here, s[i] is a reference to the actual character in the underlying array. This expression won’t make sense if s[i] returns a copy of the character by value.
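To see the effect of returning a reference, consider the hypothetical fixed-size class Letters below. Its operator[]() has the same bounds check and return type as ours, but it wraps a small built-in array to keep the sketch short.

```cpp
#include <cassert>
#include <stdexcept>

// Hypothetical class holding exactly three characters.
class Letters {
public:
    // Returns a reference to the stored character, so the caller can
    // assign through it.
    char& operator[](int i) {
        if (i < 0 || i >= 3) {
            throw std::out_of_range{"Letters::op[]"};
        }
        return data[i];
    }

private:
    char data[4] = {'a', 'b', 'c', '\0'};
};

void demo_subscript() {
    Letters s;

    s[1] = 'X';            // writes into the underlying array
    assert(s[1] == 'X');

    char copy = s[0];      // copy is a separate char...
    copy = 'Z';            // ...so changing it does not affect s
    assert(copy == 'Z');
    assert(s[0] == 'a');

    bool threw = false;
    try {
        s[3] = '!';        // index 3 is out of bounds
    } catch (const std::out_of_range&) {
        threw = true;
    }
    assert(threw);
}
```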

`operator[]() const`

The second version of operator[]() is a const member function. We need it so we can invoke operator[]() on const MyString objects. While we could’ve implemented it by simply copying the implementation of the non-const version, we employed a casting trick to delegate the call to the non-const version to avoid duplicating code:

const char& MyString::operator[](int i) const {
    // illustration of casting away constness
    return ((MyString&)*this)[i];

    // The C-style casting above works, but the proper way
    // to cast away constness in C++ is to do the following:
    //
    // return const_cast<MyString&>(*this)[i];
}

Note that the return type is now const char& instead of char&. That’s because it would be semantically incorrect to return a mutable reference to the underlying characters when we’re implementing a const accessor.

To break down the casting trick, first consider the type of the expression *this, which is const MyString&. It’s const here because we’re inside a const member function. To invoke the non-const version of operator[](), we need to invoke the operator on a non-const MyString. Hence, we cast *this into MyString&, dropping the const, and then invoke the operator. The non-const version will return a mutable char&, but returning it from the const version will tack on the const.

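By the way, there’s a better way to write the C-style cast. C++ introduced const_cast, which does the same thing here but is more tightly checked: it can only add or remove const (and volatile) qualifiers. In our case, the compiler will reject the cast if *this isn’t a MyString to begin with, whereas a C-style cast would silently attempt an unrelated conversion.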
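The delegation pattern in miniature looks like this. Letters and first_of() are hypothetical, and the bounds check is left out so that only the cast remains. Note that the trick relies on the non-const version not actually modifying the object.

```cpp
#include <cassert>

// Hypothetical class with both versions of operator[]().
class Letters {
public:
    // Non-const version: contains the real implementation.
    char& operator[](int i) {
        return data[i];
    }

    // Const version: casts away the constness of *this, calls the
    // non-const version, and returns its result as a const char&.
    const char& operator[](int i) const {
        return const_cast<Letters&>(*this)[i];
    }

private:
    char data[4] = {'a', 'b', 'c', '\0'};
};

// s is a const reference, so s[0] invokes the const version.
char first_of(const Letters& s) {
    return s[0];
}

void demo_const_subscript() {
    Letters s;
    assert(first_of(s) == 'a');

    s[0] = 'z';                   // non-const version, writable
    assert(first_of(s) == 'z');   // const version sees the same array
}
```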

Exception Handling

test4.cpp triggers the bounds-check in operator[]() and shows you how to catch exceptions:

void f2() {
    MyString s("abc");
    int x = s[-1];
    std::cout << x << std::endl;
}

void f1() {
    MyString s("xyx");
    f2();
    std::cout << s << std::endl;
}

int main() {
    using namespace std;

    try {
        f1();
    }
    catch (const out_of_range& e) {
        cout << e.what() << endl;
    }

    cout << "That's all folks!" << endl;
}

If you run it, you’ll see that neither of the print statements in f1() and f2() executes. The bad access s[-1] in f2() will cause the implementation to throw an std::out_of_range exception. The exception percolates up the call stack, through f2() and f1(), until it is caught in main():

MyString::op[]
That's all folks!

The exception is just an object. To illustrate that, we could’ve been more verbose with how we throw the exception in operator[]():

if (i < 0 || i >= len) {
    std::out_of_range ex {"MyString::op[]"};
    throw ex;
}

Here, we create a named std::out_of_range object on the stack and throw it. throw std::out_of_range{...} is basically the same thing; it just creates an unnamed temporary instead.

The main() function wraps f1() in a try-catch block. If an exception is raised in the try block, you can catch it. Here, we catch the std::out_of_range object by const reference since we’re not going to mutate it and we don’t need to create a copy of it. The exception handler simply prints the what() message of the exception. As you can see, catching an exception allows main() to resume execution, and it goes on to print "That's all folks!".
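The same try-catch shape works in any function, not just main(). In the sketch below, checked_index() and try_index() are hypothetical names. try_index() returns either "ok" or the what() message of the exception it caught.

```cpp
#include <cassert>
#include <stdexcept>
#include <string>

// Hypothetical bounds check that throws just like our operator[]().
int checked_index(int i, int len) {
    if (i < 0 || i >= len) {
        throw std::out_of_range{"checked_index"};
    }
    return i;
}

// Returns "ok" if i is a valid index, or the exception's message if not.
std::string try_index(int i, int len) {
    try {
        checked_index(i, len);
        return "ok";               // skipped if checked_index() throws
    }
    catch (const std::out_of_range& e) {
        return e.what();           // the string given to the constructor
    }
}

void demo_catch() {
    assert(try_index(1, 3) == "ok");
    assert(try_index(-1, 3) == "checked_index");
    assert(try_index(3, 3) == "checked_index");
}
```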

Let’s see how the output changes if main() doesn’t wrap f1() in a try-catch block:

terminate called after throwing an instance of 'std::out_of_range'
  what():  MyString::op[]
Aborted (core dumped)

This time, the exception percolated all the way up the call stack, past main(), and caused the program to be aborted. It looks like the exception was still caught though. What actually happened is that the C++ runtime found no handler for the exception and called std::terminate(). With g++’s standard library, the default terminate handler prints the type and what() message of the exception before aborting the program.

We mentioned earlier that neither of the print statements in f1() and f2() gets to execute because of the exception thrown. So does that mean the MyString objects created in these functions are never destroyed? Running test4 under Valgrind reveals otherwise; there is no memory leak! As the exception propagates out of f2() and f1() toward the handler in main(), their stack objects go out of scope and have their destructors invoked. This process is called stack unwinding.
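Destructor calls during unwinding can also be counted directly. Noisy, inner(), outer(), and count_unwound() below are hypothetical names that mirror the structure of test4.cpp, with a counter in place of Valgrind.

```cpp
#include <cassert>
#include <stdexcept>

int destroyed = 0;   // number of Noisy destructors that have run

// Hypothetical class whose destructor records that it ran.
struct Noisy {
    ~Noisy() { ++destroyed; }
};

void inner() {
    Noisy n;
    throw std::runtime_error{"boom"};   // n is destroyed during unwinding
}

void outer() {
    Noisy n;
    inner();                            // the exception passes through here
}

// Returns how many Noisy objects were destroyed while the exception
// travelled from inner() to the catch block below.
int count_unwound() {
    destroyed = 0;
    try {
        outer();
    }
    catch (const std::runtime_error&) {
        // By the time we get here, both stack frames have been unwound.
    }
    return destroyed;
}

void demo_unwinding() {
    assert(count_unwound() == 2);
}
```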

Last updated: 2025-10-19