In this chapter, we will create our own string implementation, the MyString class.
The C++ Standard Library already provides a very powerful and fast std::string
class. Why then are we reinventing the wheel?
Writing our own string class will give us an opportunity to learn and practice the anatomy of C++ classes in a familiar setting where we already know how the class is supposed to behave.
Another reason to study our own string class is that it will give us a good mental model of how the std::string class works under the hood. Even though std::string is highly optimized and does a lot of things that our naive implementation will not do, the way the MyString class manages its underlying heap-allocated string still shows how a real string class works and how any class should properly manage the resources it acquires.
First, let’s make sure we understand how MyString class source code is organized. We will study the following source files:
Makefile
mystring.h
mystring.cpp
test1.cpp, test2.cpp, test3.cpp, test4.cpp, test5.cpp
mystring.h and mystring.cpp
Unlike in Java & Python, we do not normally write member function definitions
inside the class definition. We usually write member function declarations
inside the class definition, and put them in a header file. Here is an excerpt
from mystring.h
:
class MyString {
public:
// default constructor
MyString();
// constructor
MyString(const char* p);
// destructor
~MyString();
...
// returns the length of the string
int length() const { return len; }
...
private:
char* data;
int len;
};
There are times where we write member function definitions in the class
definition, like the length()
member function shown above. We’ll circle back
to that later.
The header file also has an include guard:
#ifndef __MYSTRING_H__
#define __MYSTRING_H__
...
class MyString { ... };
#endif
These preprocessor directives ensure that the contents of this header file are included at most once in a compilation unit. The first time the preprocessor encounters this header, it sees that the macro __MYSTRING_H__ is not defined yet (#ifndef), so it defines it and goes on to include the class definition. Let’s say that this header is included again later in the same compilation unit, perhaps indirectly through another header. This time, the preprocessor won’t bring in the MyString definition again because __MYSTRING_H__ is already defined.
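To see the guard at work without creating several files, the following sketch pastes the same guarded header text twice into one compilation unit, which is effectively what the preprocessor does when a header is included twice. The names DEMO_POINT_H, DemoPoint, and sum_of are made up for this illustration and do not appear in mystring.h.

```cpp
#include <cassert>

// ---- first "inclusion" of a hypothetical demo_point.h ----
#ifndef DEMO_POINT_H
#define DEMO_POINT_H
struct DemoPoint {
    int x;
    int y;
};
#endif

// ---- second "inclusion" of the same header text ----
// DEMO_POINT_H is already defined, so the preprocessor skips everything up
// to #endif and the compiler never sees a second definition of DemoPoint.
#ifndef DEMO_POINT_H
#define DEMO_POINT_H
struct DemoPoint {
    int x;
    int y;
};
#endif

// Hypothetical helper that uses the single surviving definition.
int sum_of(const DemoPoint& p) { return p.x + p.y; }
```

If the #ifndef/#define/#endif lines were removed, the compiler would reject the second struct as a redefinition of DemoPoint.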
Alternatively, you could’ve used the #pragma once preprocessor directive to achieve the same purpose. It is non-standard but is supported by all major compilers. C++20 introduced modules, which eliminate the need for these preprocessor directives, but modules still aren’t fully supported by most compilers at the time of this writing.
The actual definitions for the member functions will go into mystring.cpp
,
which will #include "mystring.h"
:
#include <cstring>
#include <cstdio>
#include "mystring.h"
// default constructor
MyString::MyString() {
data = new char[1];
data[0] = '\0';
len = 0;
}
// constructor
MyString::MyString(const char *p) {
if (p) {
len = strlen(p);
data = new char[len+1];
strcpy(data, p);
} else {
data = new char[1];
data[0] = '\0';
len = 0;
}
}
...
Each member function definition that we write outside of the class definition
must be qualified with MyString::
to indicate that it’s a member function of
the MyString class.
Makefile
Here is the Makefile, prepended with line numbers:
1 CC = g++
2 CXX = g++
3
4 CFLAGS = -g -Wall
5 CXXFLAGS = -g -Wall -std=c++14
6
7 executables = test1 test2 test3 test4 test5
8 objects = mystring.o test1.o test2.o test3.o test4.o test5.o
9
10 .PHONY: default
11 default: $(executables)
12
13 $(executables): mystring.o
14
15 $(objects): mystring.h
16
17 .PHONY: clean
18 clean:
19 rm -f *~ a.out core $(objects) $(executables)
20
21 .PHONY: all
22 all: clean default
The CXX
variable should contain the C++ compiler we want to use. Once all the
C++ source files are compiled into the corresponding object files, the make
program will use the CC
variable to determine which command to invoke to link
the object files. Setting the CC
variable the same as CXX
variable will
ensure that the correct version of the Standard C++ Library gets linked into
your executable.
The CFLAGS
and CXXFLAGS
variables set the compiler flags for C and C++
source files, respectively. We’re using -std=c++14
so that we can disable copy
elision and verify our understanding of copy construction that we learned in the
previous chapter.
Lines 10-11 define the first target, default
, which simply depends on all the
executables that we would like to build, so that the make
program will build
all the executables when we simply type make
. (Remember that, when we run
make
without any argument, it will just build the first target.) The default
target is marked as a “phony” target because it’s not a real file in the current
directory for which make
needs to check the timestamp.
Line 13 specifies that each executable depends on mystring.o
, as well as on
its own .o
file implicitly. We also rely on make
to deduce the appropriate
command to link the executable out of the dependent object files. In other
words, line 13 is a shorthand for the following sequence of rules:
test1: test1.o mystring.o
g++ test1.o mystring.o -o test1
test2: test2.o mystring.o
g++ test2.o mystring.o -o test2
test3: test3.o mystring.o
g++ test3.o mystring.o -o test3
test4: test4.o mystring.o
g++ test4.o mystring.o -o test4
test5: test5.o mystring.o
g++ test5.o mystring.o -o test5
Line 15 similarly specifies the compilation dependencies. Line 15 is equivalent to the following verbose set of rules:
mystring.o: mystring.cpp mystring.h
g++ -g -Wall -std=c++14 -c mystring.cpp
test1.o: test1.cpp mystring.h
g++ -g -Wall -std=c++14 -c test1.cpp
test2.o: test2.cpp mystring.h
g++ -g -Wall -std=c++14 -c test2.cpp
test3.o: test3.cpp mystring.h
g++ -g -Wall -std=c++14 -c test3.cpp
test4.o: test4.cpp mystring.h
g++ -g -Wall -std=c++14 -c test4.cpp
test5.o: test5.cpp mystring.h
g++ -g -Wall -std=c++14 -c test5.cpp
Lines 17-19 define the “clean” target to delete all generated files when the user types make clean. Lines 21-22 define make all to be equivalent to make clean followed by make, letting the user rebuild everything by typing make all.
The provided test drivers test various parts of MyString class implementation:
test1.cpp tests the Basic 4 of the MyString class
test2.cpp tests MyString::operator+()
test3.cpp tests MyString::operator[]()
test4.cpp demonstrates exception handling
test5.cpp demonstrates the invocation of the MyString copy constructor when passing and returning MyString objects by value
MyString has two constructors: a default constructor that takes no arguments, and a constructor that takes a const char*. Let’s start by looking at the latter.
We can declare a MyString object on the stack like this:
MyString s1("hello");
"hello"
is a string literal, which is an array of 6 characters (including the null-terminator). The string literal gets converted into a const char*, a pointer to the first character of the string, when we use it to initialize s1. So where in memory are those characters?
Recall our process memory address space diagram:
+--------------------------------------------------+
| operating system code & data |
512G +--------------------------------------------------+
| stack (for automatic variables) |
+--------------------------------------------------+
| | |
| | |
| v (stack grows down) |
| |
| |
| |
| ^ (heap grows up) |
| | |
| | |
+--------------------------------------------------+
| heap (for memory allocated by malloc()) |
+--------------------------------------------------+
| data section (for static variables) |
+--------------------------------------------------+
| code section (for the program code) |
0 +--------------------------------------------------+
String literals are part of the program code. To be more exact, the characters
in a string literal are actually stored in a section called rodata
(read-only
data), right in-between code and data:
| ... |
+--------------------------------------------------+
| data section |
+--------------------------------------------------+
| rodata section |
| |
| +------------------------+ |
| | h | e | l | l | o | \0 | |
| +------------------------+ |
+--------------------------------------------------+
| code section |
0 +--------------------------------------------------+
With that in mind, look at MyString’s private fields:
class MyString {
public:
...
// constructor
MyString(const char* p);
...
private:
char* data;
int len;
};
MyString records the length of the string for convenience. In our example, len
will be 5. data
is supposed to point to the characters that make up the
MyString. Is it enough to just store the pointer provided in the constructor? In
this case, can we just refer to the characters in the rodata
section?
As its name suggests, rodata memory is read-only. That means we can’t mutate a string literal. We want to be able to modify the characters in a MyString, so pointing to the string literal characters won’t do.
The MyString constructor will have to create a copy of the string on the heap
and point to that instead. MyString has to own its underlying string. Taking the
example of declaring MyString s1("hello")
on the stack, its memory layout will
look like this:
The MyString object, along with its len
and data
fields are on the stack,
but data
points to the copied string on the heap.
Now that we understand the memory model, let’s look at the code for the constructor:
MyString::MyString(const char *p) {
if (p) {
len = strlen(p);
data = new char[len+1];
strcpy(data, p);
} else {
data = new char[1];
data[0] = '\0';
len = 0;
}
}
Consider the case where p isn’t nullptr. (nullptr is the C++ type-safe version of C’s NULL.) We take the length of the provided string and allocate len+1 characters on the heap. Note that we allocate len+1 characters instead of len to account for the null-terminator.
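The allocate-then-copy step can be isolated into a small free function. This is only a sketch: copy_c_string is a hypothetical helper that is not part of mystring.cpp, and it assumes p is not nullptr.

```cpp
#include <cassert>
#include <cstddef>
#include <cstring>

// Hypothetical helper (not part of mystring.cpp): returns a heap-allocated,
// writable copy of the C string p. The caller must release it with delete[].
// Assumes p is not nullptr.
char* copy_c_string(const char* p) {
    std::size_t len = std::strlen(p);  // excludes the null-terminator
    char* copy = new char[len + 1];    // +1 leaves room for '\0'
    std::strcpy(copy, p);              // copies the characters and the '\0'
    return copy;
}
```

Unlike the string literal it was copied from, the returned array may be modified, for example by writing copy[0] = 'j'.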
It may feel odd that we chose to use new[] instead of directly using malloc(). We learned that new[] invokes the constructor for each element in the array, but char is a built-in data type that doesn’t have a constructor, so what’s the point? It’s not like new[] initializes the character array for us either. One benefit of using new over malloc() is how it handles errors. When an allocation fails, new throws an exception (std::bad_alloc), whereas malloc() returns a null pointer that the programmer has to remember to check. If left uncaught, the exception will percolate up the call stack and terminate the program. If we forget to check malloc()’s return value, the program will probably crash later when it dereferences the null pointer. We’ll talk about how we can catch exceptions later.
By the way, do you know what’s wrong with this alternate implementation?
len = strlen(p);
char a[len + 1];
data = a;
strcpy(data, p);
This code creates an array of characters a on the stack frame of the constructor and points data at its first element. Remember that a ceases to exist once the constructor returns and its stack frame is popped, so data will become a dangling pointer. We need the character array to live on after the constructor has returned! (A variable-length array like a is also not standard C++; g++ accepts it only as an extension.)
Moving on to the other branch of the constructor, we now handle the case where p is nullptr. The code is the same as the default constructor’s implementation: we simply allocate an empty string. While len is just 0, we allocate 1 character on the heap and write the null-terminator to it, which is the C representation of an empty string. We can declare an empty MyString on the stack by invoking its default constructor: MyString s2;
Having to heap-allocate 1 byte just for the empty string representation is
unfortunate. Heap allocation is definitely not cheap; couldn’t we have done
something else instead? Sure, we could’ve chosen data = nullptr
to represent
an empty string, but that comes at a cost, too. Our chosen design to
heap-allocate an empty string ensures we have a nice invariant: data
always points to a valid string. We will never have to check if data
is
nullptr
in any class member definition. On the other hand, if we use nullptr
for the empty string, our class members may have to check that first to avoid a
null pointer dereference. Both are reasonable designs; we simply chose our
representation to keep the code simple.
The C++ standard library’s std::string implementation actually avoids heap allocations whenever it can. In fact, common implementations employ a “short string optimization” (SSO), where the std::string has a small buffer embedded directly in it. Short strings (including empty ones) can be stored in that embedded buffer instead of in a heap-allocated buffer.
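One rough way to observe SSO is to check whether a string’s characters live inside the std::string object itself. The sketch below does that by comparing addresses; stored_inline is a hypothetical helper, and its result for short strings depends on the standard library in use, since the C++ standard does not require SSO or fix a size threshold.

```cpp
#include <cassert>
#include <cstdint>
#include <string>

// Hypothetical helper: reports whether the characters of s are stored inside
// the std::string object itself rather than in a separate heap allocation.
bool stored_inline(const std::string& s) {
    auto obj_begin = reinterpret_cast<std::uintptr_t>(&s);
    auto obj_end = obj_begin + sizeof(std::string);
    auto chars = reinterpret_cast<std::uintptr_t>(s.data());
    return obj_begin <= chars && chars < obj_end;
}
```

A 200-character string cannot fit inside the object, so stored_inline() returns false for it. For a short string such as "hi", libraries that implement SSO are expected to return true.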
The MyString destructor is very straightforward. It simply invokes delete[]
on
the heap-allocated character array.
MyString::~MyString() {
delete[] data;
}
The MyString constructor and destructor pair is a perfect example of the C++ “Resource Acquisition Is Initialization” (RAII) paradigm. The constructor acquires a resource (the heap-allocated character array), and the destructor releases it (by invoking delete[]).
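The pairing of acquisition and release can be made visible with a counter. The following is a minimal sketch, not code from mystring.cpp: CharBuffer, its live counter, and live_inside_scope() are hypothetical names introduced for this illustration.

```cpp
#include <cassert>
#include <cstddef>

// Hypothetical class used only to illustrate RAII.
class CharBuffer {
public:
    static int live;  // number of buffers currently holding an allocation

    explicit CharBuffer(std::size_t n) : data(new char[n]) { ++live; }
    ~CharBuffer() { delete[] data; --live; }

    // Copying is disabled to keep the sketch short.
    CharBuffer(const CharBuffer&) = delete;
    CharBuffer& operator=(const CharBuffer&) = delete;

private:
    char* data;
};

int CharBuffer::live = 0;

// Returns the number of live buffers observed inside the scope.
int live_inside_scope() {
    CharBuffer b(16);         // resource acquired here
    return CharBuffer::live;  // b's destructor runs after this value is read
}
```

The counter is 1 while b is in scope and drops back to 0 once live_inside_scope() returns, without any explicit call to release the memory.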
With our constructors and destructor in place, let’s now turn our attention to the copy constructor. Let’s say we don’t define one ourselves and go with the compiler-generated version. Will it do the job?
Let’s check by considering the following example:
void f(MyString s2) {
std::cout << s2 << std::endl;
}
int main() {
MyString s1("hello");
f(s1);
}
The f() function takes a MyString by value and prints it out. Calling f() will invoke MyString’s copy constructor. The compiler-generated copy constructor simply performs a member-wise copy of the object. That means simply copying over the values of its len and data fields, as shown below:
f() will be able to print out s2 just fine, but there’s a problem when f() returns. s2 goes out of scope and gets its destructor invoked, releasing the "hello" string allocated on the heap.
What about s1 though? Its data pointer is now dangling; the array it points to was already freed when s2 was destroyed. When s1’s destructor is invoked, we erroneously delete the same array a second time, which is undefined behavior. The compiler-generated copy constructor performed a shallow copy of the MyString. This is not the correct thing to do because each MyString has to own its heap-allocated string. Copying the data pointer doesn’t create a new copy of the string; it just copies the memory address where the string is stored. Our MyString copy constructor has to perform a deep copy as follows:
MyString::MyString(const MyString& s) {
len = s.len;
data = new char[len+1];
strcpy(data, s.data);
}
Our copy constructor allocates space for the string on the heap and invokes
strcpy()
to copy the string over. With this implementation, s1
and s2
in
the example above now have their own independent copies of the string "hello"
,
as shown below:
Finally, let’s take a look at MyString’s copy assignment implementation:
MyString& MyString::operator=(const MyString& rhs) {
if (this == &rhs) {
return *this;
}
// first, deallocate memory that 'this' used to hold
delete[] data;
// now copy from rhs
len = rhs.len;
data = new char[len+1];
strcpy(data, rhs.data);
return *this;
}
The second half of the copy assignment operator is the same as the copy
constructor – we have to create a deep copy of rhs
. The key difference is the
delete[]
statement before we create the deep copy. We first have to delete the
old string that the this
object was pointing to, make a deep copy of rhs
’s
string, and then attach the new string to this
. Consider the following snippet
of code:
MyString s1("hi");
MyString s2("hello");
s1 = s2;
The copy assignment operator deletes the "hi"
string that s1
was referring
to, makes a deep copy of s2
’s "hello"
string, and points s1
’s data
field
to it:
The final piece of the copy assignment operator is the if-statement at the top. It’s meant to account for the case where a MyString is assigned to itself: s1 = s1. We have to short-circuit the implementation in this case because we start by deleting data, which would destroy the very string we are about to copy from. We check if we’re trying to self-assign by comparing the memory addresses this and &rhs.
Could we have alternatively written *this == rhs? Provided that MyString defines operator==(), it compiles and may even have the intended effect, but it does not check whether *this and rhs are the same object. operator==() could be arbitrarily implemented, and it would typically compare the strings’ contents, so two distinct objects holding equal strings would also compare equal. The safest way to determine whether this and &rhs point to the same object is to compare the memory addresses, because pointer comparisons can’t be redefined in C++.
“The Rule of 3” is a C++ design guideline pertaining to user-defined destructor, copy constructor, and copy assignment. It states that if you have to implement any of those three special member functions, you most likely need to implement all three. In the case of MyString, we had to implement a custom destructor because our constructors perform a heap allocation. It followed that we had to implement copy construction and copy assignment to ensure they made deep copies of MyString. Whenever you design a C++ class, you must carefully think about if you have to implement these special member functions or if the compiler generated defaults are sufficient.
When the move constructor and move assignment were introduced in C++11, the Rule of 3 evolved into the Rule of 5. We’ll talk about move operations later.
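To see the three special member functions side by side, here is a stripped-down sketch that mirrors the MyString code shown above. Str is a hypothetical class written for this illustration; its c_str() and set() members are assumptions added only so the copies can be inspected from outside.

```cpp
#include <cassert>
#include <cstring>

// Hypothetical stripped-down string class following the Rule of 3.
class Str {
public:
    explicit Str(const char* p) {
        len = static_cast<int>(std::strlen(p));
        data = new char[len + 1];
        std::strcpy(data, p);
    }

    ~Str() { delete[] data; }          // 1. destructor

    Str(const Str& s) {                // 2. copy constructor (deep copy)
        len = s.len;
        data = new char[len + 1];
        std::strcpy(data, s.data);
    }

    Str& operator=(const Str& rhs) {   // 3. copy assignment (deep copy)
        if (this == &rhs) {
            return *this;              // self-assignment: nothing to do
        }
        delete[] data;
        len = rhs.len;
        data = new char[len + 1];
        std::strcpy(data, rhs.data);
        return *this;
    }

    const char* c_str() const { return data; }
    void set(int i, char c) { data[i] = c; }  // no bounds check in this sketch

private:
    char* data;
    int len;
};
```

Because every copy owns its own array, modifying one Str through set() leaves the object it was copied from unchanged, and each destructor deletes a different array.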
length()
We placed our implementation of length()
directly into the class definition in
mystring.h
:
class MyString {
public:
...
// returns the length of the string
int length() const { return len; }
...
};
It’s a simple getter function that doesn’t mutate the object, so it makes sense
to mark it const
. By defining length()
inside the class definition (as
opposed to only declaring it), we are hinting to the compiler that it should
inline the implementation if possible.
Consider the following for-loop that prints out each character in a MyString:
MyString s1("hello");
for (int i = 0; i < s1.length(); i++) {
std::cout << s1[i] << std::endl;
}
The condition i < s1.length() is checked for every iteration of the for-loop. That means that length() is called every iteration. Function calls aren’t free though: each call means setting up a new stack frame, jumping to the function’s code, and then jumping back to the loop. For such a simple function, the compiler could inline the implementation to remove the function call. The compiler could generate machine code as if we wrote the following instead:
for (int i = 0; i < s1.len; i++) { .. }
The compiler may not always inline a function call because it’s not always worth it. We wouldn’t want to duplicate the body of some long function at every call site.
operator+()
Let’s now turn our attention over to the operator+()
implementation. This
function takes two MyString objects by const reference and returns a new
MyString, the concatenation of the two strings, by value. Consider the following
snippet from test2.cpp
:
MyString s1("hello ");
MyString s2("world!");
MyString s3;
s3 = s1 + s2;
The expression s1 + s2
invokes operator+(s1, s2)
and the returned temporary
object becomes the rhs
argument for the copy assignment into s3
.
Note that operator+()
takes two parameters whereas operator=()
takes only
one parameter, even though they are both binary operators. We previously
discussed that operator=()
only takes one parameter because it is implemented
as a member function where the left operand is the this
object and the right
operand is passed in as the rhs
argument. This means that operator+()
is not
a member function of the MyString class. It is simply a global function that is
defined as follows:
MyString operator+(const MyString& s1, const MyString& s2) {
MyString temp;
delete[] temp.data;
temp.len = s1.len + s2.len;
temp.data = new char[temp.len+1];
strcpy(temp.data, s1.data);
strcat(temp.data, s2.data);
return temp;
}
Since operator+()
is not a member function, it doesn’t have the MyString::
qualifier like MyString::operator=()
. Studying the implementation of
operator+()
will help us understand why it is defined as a global function
instead of a member function.
The function declares an empty MyString object temp
on the stack and replaces
its underlying empty string with a newly allocated string that is the
concatenation of s1
and s2
. The stack object temp
is then returned by
value. As we saw with Pt::expand()
in the previous chapter, this will trigger
a copy construction from temp
to an unnamed temporary object. In the example
of s3 = s1 + s2
, that unnamed temporary object becomes the rhs
argument for
the copy assignment into s3
. That’s assuming that we disabled copy elision by
the compiler using -fno-elide-constructors
. Without that flag, the compiler
will likely elide the construction of the unnamed temporary object.
Returning to our discussion of global vs. member functions, you should now see a
clear difference between how operator=()
and operator+()
treat their two
operands. Since operator=()
mutates its left operand, it makes sense that it
should be a member function. operator+()
, on the other hand, doesn’t mutate
either of its two operands. This is a good indication that operator+()
should
be defined as a global function. We’ll see the practical implication of this
shortly.
Did you notice that operator+()
is still able to access the private MyString
members len
and data
despite not being defined as a member function? We had
to declare operator+()
as a friend
function of the MyString class as
follows:
class MyString {
public:
...
// operator+
friend MyString operator+(const MyString& s1, const MyString& s2);
...
};
A friend function of a class is not a member function, but has the privilege to access the class’s private members. Normally, when we define a global function in a .cpp file, we write its prototype in the .h file. A separate operator+() prototype is omitted from mystring.h because its friend declaration inside the MyString class definition also serves as its prototype.
Consider the following snippet from test2.cpp
:
MyString s1("hello ");
MyString s2("world!");
cout << s1 + "world!" << endl;
cout << "hello " + s2 << endl;
It looks like operator+() is being invoked with a const char* and a MyString, but our implementation only accepts two MyString objects. Here, the compiler recognizes that the const char* can be implicitly converted to a MyString object in order to invoke our operator+() implementation. It uses the MyString(const char*) constructor to create a temporary MyString object for the sake of invoking operator+(). A constructor that enables implicit conversions like this is called a converting constructor.
C++ defines a number of cases where implicit conversion takes place. Creating a temporary object as an argument to a function, like we see above, is one of the cases. However, creating a temporary this object on which to invoke a member function is not a case that triggers implicit conversion. Had we defined operator+() as a member function instead of a global function, s1 + "world!" would have still worked, but "hello " + s2 would not have compiled.
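The same rule can be seen with a much smaller class. In the sketch below, Meters is a hypothetical type with a converting constructor from double and a non-member operator+(), so a double is accepted on either side of the operator.

```cpp
#include <cassert>

// Hypothetical class with a converting constructor from double.
struct Meters {
    double value;
    Meters(double v) : value(v) {}  // not explicit, so double converts implicitly
};

// Non-member operator+: both operands are ordinary function arguments, so
// either one may be implicitly converted from double.
Meters operator+(const Meters& a, const Meters& b) {
    return Meters(a.value + b.value);
}
```

Given Meters m(1.5), both m + 2.0 and 2.0 + m compile and yield 3.5 meters. If operator+() were a member function of Meters instead, 2.0 + m would be rejected because the left operand, a double, is never converted in order to serve as the this object.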
operator<<()
Up until now, we’ve been printing out MyString objects like this:
MyString s("hello");
std::cout << s << std::endl;
This is made possible by implementing an operator<<() overload, also known as the put-to operator, for MyString as follows:
std::ostream& operator<<(std::ostream& os, const MyString& s) {
os << s.data;
return os;
}
The function takes an ostream
object os
by reference and a const MyString
reference s
to write to os
. std::cout
is a global object that is an
instance of ostream
– we’ll cover it later as part of the C++ I/O library.
The function simply writes out the data
member of s
by delegating to the
operator<<()
overload for char*
, which has been implemented by the library
already. It then returns the same os
object by reference so that put-to
operations can be chained. For example, the expression
cout << s1 << s2 << endl
is equivalent to ((cout << s1) << s2) << endl
.
Since we’re operating on the left operand ostream&
, you may be wondering why
this function isn’t implemented as a member function of ostream
. That’s
because the ostream
class is already defined in the C++ library and we cannot
change that definition. However, we can certainly define global functions that
use ostream
.
We defined operator<<()
as a friend function of MyString so that it can access
its private data
member. We can’t define this operator<<()
as a member
function of MyString because the MyString is the right operand.
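The same pattern applies to any user-defined type. Here is a minimal sketch using a hypothetical Point struct and a hypothetical format_pair() helper; it writes into a std::ostringstream, which is an ostream that collects its output in a string, so the chained result can be inspected.

```cpp
#include <cassert>
#include <ostream>
#include <sstream>
#include <string>

// Hypothetical type used only for this illustration.
struct Point {
    int x;
    int y;
};

// Non-member put-to operator. Point's members are public, so no friend
// declaration is needed here.
std::ostream& operator<<(std::ostream& os, const Point& p) {
    os << "(" << p.x << ", " << p.y << ")";
    return os;  // returning the stream is what makes chaining possible
}

// Writes two points in one chained expression and returns the text.
std::string format_pair(const Point& a, const Point& b) {
    std::ostringstream out;
    out << a << " " << b;  // parsed as ((out << a) << " ") << b
    return out.str();
}
```

Calling format_pair(Point{1, 2}, Point{3, 4}) returns the string "(1, 2) (3, 4)".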
operator>>()
The following snippet from test3.cpp
shows how to read an input word using
operator>>()
, also known as the get-from operator:
cout << "Enter a string: ";
MyString s;
cin >> s;
Here’s the implementation for it:
std::istream& operator>>(std::istream& is, MyString& s) {
// this is kinda cheating, but this is just to illustrate how this
// function can work.
std::string temp;
is >> temp;
delete[] s.data;
s.len = strlen(temp.c_str());
s.data = new char[s.len+1];
strcpy(s.data, temp.c_str());
return is;
}
The get-from operator’s function signature and friend designation are analogous
to those of put-to as we discussed earlier. std::cin
is a global object that
is an instance of istream
. One difference between the two operators is that
the get-from operator takes the MyString as a mutable reference because it is
going to write into it.
A full implementation needs to read in a sequence of non-whitespace characters from is into a heap-allocated array and put it into the given MyString. Since we don’t know how long the string is going to be, it would have to manage a heap-allocated array that grows as it reads more characters. We omitted that implementation since our objective is to study the C++ language, and we simply delegate the work to std::string’s get-from implementation. We access std::string’s underlying character array using its c_str() member function.
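For the curious, the omitted growing-array approach might look roughly like the sketch below. It is not the code in mystring.cpp: read_word is a hypothetical helper, the initial capacity of 8 and the doubling strategy are arbitrary choices, and the sketch does not try to be exception-safe (the old buffer leaks if a later new[] throws).

```cpp
#include <cassert>
#include <cctype>
#include <cstring>
#include <istream>
#include <sstream>
#include <string>

// Hypothetical helper: reads one whitespace-delimited word from `is` into a
// heap-allocated, null-terminated array and stores its length in `len`.
// The caller must release the returned array with delete[].
char* read_word(std::istream& is, int& len) {
    int cap = 8;  // arbitrary initial capacity
    char* buf = new char[cap];
    len = 0;

    is >> std::ws;  // skip leading whitespace
    int c;
    while ((c = is.peek()) != std::char_traits<char>::eof() &&
           !std::isspace(static_cast<unsigned char>(c))) {
        if (len + 1 >= cap) {  // always keep one slot free for '\0'
            cap *= 2;
            char* bigger = new char[cap];
            std::memcpy(bigger, buf, len);
            delete[] buf;
            buf = bigger;
        }
        buf[len++] = static_cast<char>(is.get());
    }
    buf[len] = '\0';
    return buf;
}
```

With such a helper, operator>>() could delete[] s.data and then store the returned array and length directly into s.data and s.len.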
operator[]()
test3.cpp
showcases how to access the characters in a MyString object using
the operator[]()
member function:
for (int i = 0; i < s.length(); ++i) {
if ('a' <= s[i] && s[i] <= 'z') {
s[i] = s[i] - ('a' - 'A');
}
}
For each character in the MyString, we check if the character is a lowercase letter. If it is, we capitalize it and write it back into the MyString using the expression s[i] - ('a' - 'A'). This works by exploiting the fact that, in ASCII, the uppercase letters and the lowercase letters each occupy a contiguous range in alphabetical order, so every lowercase letter sits at the same distance, 'a' - 'A', from its uppercase equivalent. Subtracting that difference from a lowercase letter’s ASCII code yields the code of the uppercase letter.
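The arithmetic can be checked in isolation with a small function. to_upper_ascii is a hypothetical helper written for this illustration, and it assumes an ASCII-compatible character set.

```cpp
#include <cassert>

// Hypothetical helper: upper-cases c if it is an ASCII lowercase letter and
// returns every other character unchanged.
char to_upper_ascii(char c) {
    if ('a' <= c && c <= 'z') {
        return static_cast<char>(c - ('a' - 'A'));  // 'a' - 'A' is 32 in ASCII
    }
    return c;
}
```

In real code, std::toupper() from <cctype> does the same job without relying on the ASCII layout.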
The MyString class defines two versions of operator[]()
:
// operator[]
char& operator[](int i);
// operator[] const
const char& operator[](int i) const;
Let’s start with the first definition of operator[]()
:
char& MyString::operator[](int i) {
if (i < 0 || i >= len) {
throw std::out_of_range{"MyString::op[]"};
}
return data[i];
}
Our implementation performs a bounds-check on the provided index i
to prevent
an out-of-bounds access on the character array. If i
is invalid, the
implementation throws an std::out_of_range
exception object (more on
exceptions in a bit). The real std::string::operator[]()
implementation does
not perform this bounds-check, but std::string::at()
does. Had we not
performed the bounds-check here, we would’ve defined the function inside the
class definition, letting the compiler inline it.
The function’s return type is char&
instead of char
because we need to
enable writing into the string, like s[i] = 'X'
. Here, s[i]
is a reference
to the actual character in the underlying array. This expression won’t make
sense if s[i]
returns a copy of the character by value.
operator[]() const
The second version of operator[]()
is a const member function. We need it so
we can invoke operator[]()
on const MyString objects. While we could’ve
implemented it by simply copying the implementation of the non-const version, we
employed a casting trick to delegate the call to the non-const version to avoid
duplicating code:
const char& MyString::operator[](int i) const {
// illustration of casting away constness
return ((MyString&)*this)[i];
// The C-style casting above works, but the proper way
// to cast away constness in C++ is to do the following:
//
// return const_cast<MyString&>(*this)[i];
}
Note that the return type is now const char&
instead of char&
. That’s
because it would be semantically incorrect to return a mutable reference to the
underlying characters when we’re implementing a const accessor.
To break down the casting trick, first consider the type of the expression
*this
, which is const MyString&
. It’s const here because we’re inside a
const member function. To invoke the non-const version of operator[]()
, we
need to invoke the operator on a non-const MyString. Hence, we cast *this
into
MyString&
, dropping the const
, and then invoke the operator. The non-const
version will return a mutable char&
, but returning it from the const version
will tack on the const
.
By the way, there’s a better way to write the C-style cast. C++ introduced const_cast, which performs the same conversion but is restricted to adding or removing const (and volatile). In our case, the compiler checks that the only difference between the type of *this and MyString& is the const qualifier, and rejects the cast otherwise.
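Here is the delegation pattern on a smaller scale. Letters and first_of() are hypothetical names introduced for this sketch; the class holds three characters and provides both overloads of operator[](), with the const version written using const_cast.

```cpp
#include <cassert>
#include <stdexcept>

// Hypothetical fixed-size container used only for this illustration.
class Letters {
public:
    Letters() : data{'a', 'b', 'c'} {}

    // Non-const version: bounds-checks and returns a writable reference.
    char& operator[](int i) {
        if (i < 0 || i >= 3) {
            throw std::out_of_range{"Letters::op[]"};
        }
        return data[i];
    }

    // Const version: delegates to the non-const version via const_cast.
    // This is safe only because the non-const version does not modify *this.
    const char& operator[](int i) const {
        return const_cast<Letters&>(*this)[i];
    }

private:
    char data[3];
};

// Reads through a const reference, so the const overload is selected.
char first_of(const Letters& l) { return l[0]; }
```

Writing l[0] = 'X' on a non-const Letters uses the first overload, while first_of(l) goes through the second and still benefits from the same bounds-check.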
test4.cpp triggers the bounds-check in operator[]() and shows you how to catch exceptions:
void f2() {
MyString s("abc");
int x = s[-1];
std::cout << x << std::endl;
}
void f1() {
MyString s("xyx");
f2();
std::cout << s << std::endl;
}
int main() {
using namespace std;
try {
f1();
}
catch (const out_of_range& e) {
cout << e.what() << endl;
}
cout << "That's all folks!" << endl;
}
If you run it, you’ll see that neither of the print statements in f1() and f2() executes. The bad access s[-1] in f2() causes operator[]() to throw an std::out_of_range exception. The exception percolates up the call stack, through f2() and f1(), until it is caught in main():
MyString::op[]
That's all folks!
The exception is just an object. To illustrate that, we could’ve been more
verbose with how we throw the exception in operator[]()
:
if (i < 0 || i >= len) {
std::out_of_range ex {"MyString::op[]"};
throw ex;
}
Here, we create a named std::out_of_range object on the stack and throw it. throw std::out_of_range{...} does basically the same thing; it just creates an unnamed temporary instead.
The main()
function wraps f1()
in a try-catch block. If an exception is
raised in the try block, you can catch it. Here, we catch the
std::out_of_range
object by const reference since we’re not going to mutate it
and we don’t need to create a copy of it. The exception handler simply prints
the what()
message of the exception. As you can see, catching an exception
allows main()
to resume execution, and it goes on to print
"That's all folks!"
.
Let’s see how the output changes if main()
doesn’t wrap f1()
in a try-catch
block:
terminate called after throwing an instance of 'std::out_of_range'
what(): MyString::op[]
Aborted (core dumped)
This time, no handler was found anywhere on the call stack, so the C++ runtime called std::terminate(), which aborted the program. It looks like the exception was still caught though, since its what() message was printed. That’s because the default terminate handler in g++’s standard library prints the type and the what() message of the uncaught exception before calling abort().
We mentioned earlier that neither of the print statements in f1() and f2() gets to execute because of the exception thrown. So does that mean the MyString objects created in these functions are never destroyed? Running test4 under Valgrind reveals otherwise; there is no memory leak! As the exception propagates to the handler in main(), the stack frames of f2() and f1() are unwound, and every stack object in them goes out of scope and has its destructor invoked.
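Destructor calls during unwinding can also be observed without Valgrind by counting them. The sketch below mirrors the f1()/f2() structure of test4.cpp with hypothetical names: Tracer stands in for MyString, and inner() and outer() play the roles of f2() and f1().

```cpp
#include <cassert>
#include <stdexcept>

// Hypothetical class that counts how many of its objects were destroyed.
struct Tracer {
    static int destroyed;
    ~Tracer() { ++destroyed; }
};

int Tracer::destroyed = 0;

void inner() {
    Tracer t;                                  // lives in inner()'s stack frame
    throw std::runtime_error{"inner failed"};  // t is destroyed during unwinding
}

void outer() {
    Tracer t;  // lives in outer()'s stack frame
    inner();   // the exception passes through outer() without being caught
}

// Runs outer(), catches the exception, and reports the destructor count.
int destroyed_after_catch() {
    Tracer::destroyed = 0;
    try {
        outer();
    } catch (const std::runtime_error&) {
        // both Tracer objects have already been destroyed at this point
    }
    return Tracer::destroyed;
}
```

destroyed_after_catch() returns 2, one for each stack object. Objects created with new are not destroyed automatically in this way, which is why tying a heap allocation to a stack object’s destructor, as MyString does, prevents leaks.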
Last updated: 2025-09-01