COMS W4995 C++ Deep Dive for C Programmers

Iterators

Introduction to Iterators

The 11/iterator1 program, shown below, prints out each element of an array using the function template For_Each():

using namespace std;

// Pretty much the same as std::for_each
template <typename I, typename F>
void For_Each(I b, I e, F f) {
    while (b != e) {
        f(*b);
        ++b;
    }
}

void print_int(int x)  { cout << x << ' '; }

int main() {
    int a[5] { 100, 101, 102, 103, 104 };

    For_Each(a, a + 5, &print_int);

    cout << '\n';
}

The For_Each() function template is written using the half-open interval paradigm we introduced when we used std::copy() in our Vec template class. The half-open interval that For_Each() takes is marked by its two parameters b and e, which stand for begin and end. They specify a range of elements starting from the one that b points to and ending with the element before e. From the main() function, we pass a, which decays into a pointer to the beginning of the array, and a + 5, which is a pointer to one past the last element of the array. This is illustrated in the diagram below:

11-begin-end

The while loop in For_Each() iterates as long as b doesn’t reach e. For each iteration, it invokes f(*b) and then moves b forward. In this case, f is &print_int, a pointer to the function print_int().

As such, the function template is instantiated with type parameter I as int* and F as void (*)(int). Note that the type I doesn’t actually have to be a pointer. It could be any class type that implements the three operators that For_Each() invokes: operator!=(), operator*(), and operator++(). STL refers to such class types as iterators.

By the same token, the type F can be any class type that implements the function call operator, operator(). Such class types are known as function objects, or functors, and we’ll revisit them in a few chapters.

`std::vector` iterator

We now change the main() function to use a std::vector instead of an integer array. Since the vector stores its elements contiguously like the array, we can call For_Each() by passing in a pointer to the first element of the vector and a pointer to one past the last element of the vector, as shown in the first call below:

int main() {
    vector<int> v { 100, 101, 102, 103, 104 };
    For_Each(&v[0], &v[0] + 5, &print_int);
    For_Each(v.begin(), v.end(), &print_int);

    cout << '\n';
}

Alternatively, we can use the begin() and end() member functions to achieve the same effect, as shown in the second For_Each() call. These member functions return std::vector::iterator type objects that pretty much act like the two int*s we were using before, namely, &v[0] and &v[0] + 5.

All STL containers define a nested iterator type, which resolves to some implementation-dependent type. For std::vector, iterator could resolve to simply value_type* or to a thin wrapper class that implements all the necessary operators. All STL containers also define the begin() member function that returns an iterator to the first element, and the end() member function that returns an iterator to one past the last element.

STL provides many generic function templates, collectively known as algorithms, that operate on a range of elements marked by two iterators, b and e, forming the half-open interval [b, e). Our For_Each() function is in fact a simplified version of the std::for_each() algorithm.

`std::list` iterator

Next, we change the main() function to use an std::list instead of an std::vector:

int main() {
    list<int> v { 100, 101, 102, 103, 104 };
    For_Each(v.begin(), v.end(), &print_int);

    cout << '\n';
}

We see that passing iterator objects returned by list::begin()/list::end() to For_Each() works just as well. In this case, however, list::iterator certainly cannot be an alias to value_type*, considering that the elements are not laid out contiguously in memory. As such, list::iterator must be a class type that implements the necessary operations. When For_Each() advances the list::iterator object with ++b, for example, list::iterator::operator++() traverses the linked list using the node pointers. We illustrate this in the following psuedocode:

template <typename T>
class list {
    ...
    class iterator {
        ...
        operator!=() ...
        operator*() ...

        iterator& operator++() {
            ...
            curr = curr->next;
            ...
            return *this;
        }
        ...
    }
    ...
}

Range-based for loop

Instead of calling For_Each(), we can write a loop using iterators directly as follows:

int main() {
    list<int> v { 100, 101, 102, 103, 104 };
    for (list<int>::iterator i = v.begin(); i != v.end(); ++i) {
        *i += 200;
        cout << *i << ' ';
    }

    cout << '\n';
}

begin() and end() have both non-const and const overloads. The former returns list::iterator whereas the latter returns list::const_iterator, which is another type alias defined in the class. Dereferencing list::iterator yields a value_type& whereas dereferencing a list::const_iterator yields a const value_type&. The code snippet below demonstrates the use of list::const_iterator:

int main() {
    const list<int> v { 100, 101, 102, 103, 104 };
    for (list<int>::const_iterator i = v.begin(); i != v.end(); ++i) {
        cout << *i << ' ';
    }

    cout << '\n';
}

The iterator type aliases can become unwieldy. The C++ keyword auto comes in handy here; it let’s the compiler deduce the type of the variable from its initializer. We can write auto in place of the iterator type as follows:

for (auto i = v.begin(); i != v.end(); ++i) { ... }

C++11 introduced range-based for loops that vastly simplify writing loops:

int main() {
    const list<int> v { 100, 101, 102, 103, 104 };
    for (auto& x : v) {
        cout << x << ' ';
    }

    cout << '\n';
}

The range-based for loop is syntactic sugar that gets translated into the following iterator-based loop:

for (auto i = v.begin(); i != v.end(); ++i) {
    auto& x = *i;
    ...
}

If you happen to write your own container class, you can enable range-based for loop by providing begin() and end() member functions that return appropriate iterator objects.

Two subtle points in `For_Each()`

Let’s revisit our implementation of For_Each(), presented again below:

template <typename I, typename F>
void For_Each(I b, I e, F f) {
    while (b != e) {
        f(*b);
        ++b;
    }
}

Introduction to Iterator Categories

We intentionally wrote b != e instead of b < e for the loop condition. First, b < e works for the C array because b and e are simply int* and comparing two pointers within a single array is a valid operation in C. Second, b < e is also valid for vector. As we said earlier, vector::iterator could resolve to value_type* or to a thin wrapper class, depending on the STL implementation. If the iterator is simply value_type*, then b < e would just be a pointer comparison, as was the case for array. If the iterator is a thin wrapper on top of the value_type*, then it can easily implement operator<() because the underlying array is contiguous. Third, b < e actually works for deque too. While its elements are not laid out contiguously, deque’s structure lends itself to implementing an iterator class that can calculate comparisons in O(1) run time.

The expression b < e would not compile, however, if b and e are list::iterator objects because the iterator type does not implement operator<(). In a linked list, it would take O(N) run time to compare two iterators. As we discussed previously, STL omits inefficient versions of API functions. The expression b != e, on the other hand, would work for all container types. Our For_Each() function template works fine with b != e, so choosing it over b < e increases its utility.

STL formalizes these distinctions into Iterator Categories, which we’ll talk about in the next section.

Prefix vs. postfix increment

You may have noticed that we wrote ++b instead of b++. In C++, you should use prefix increment unless the logic you’re writing requires postfix increment. In order to understand why, recall the list::iterator psuedocode from earlier. We’ve added the definition for postfix increment to it:

template <typename T>
class list {
    ...
    class iterator {
        ...
        operator!=() ...
        operator*() ...

        // Prefix increment
        iterator& operator++() {
            ...
            curr = curr->next;
            ...
            return *this;
        }

        // Postfix increment
        iterator operator++(int unused) {
            iterator prev(*this);
            ...
            curr = curr->next;
            ...
            return prev;
        }
        ...
    }
    ...
}

First off, to distinguish between the prefix and postfix operators, C++ requires the postfix operator to take an unused int parameter.

Recall that postfix increment semantics requires that b++ will evaluate to b before it gets incremented. This means that the operator implementation must make a copy of b so that it can be returned later. Our psuedocode illustrates this. prev is copy constructed out of *this and gets returned by value. For a standalone increment statement, those copy constructions are completely unnecessary. While the compiler may be able to optimize away unnecessary copies, the prefix increment doesn’t perform any copies to begin with. Unless you’re using the return value of postfix increment, you should use prefix increment instead.

Vector initialization

`std::copy()` revisited

Consider the following program 11/iterator2, which attempts to copy elements from one vector into another. We’ve also replaced the call to our For_Each() with std::for_each().

void print_int(int x)  { cout << x << ' '; }

int main() {
    vector<int> v1 { 100, 101, 102, 103, 104 };
    vector<int> v2;

    copy(v1.begin(), v1.end(), v2.begin());

    for_each(v2.begin(), v2.end(), &print_int);
  
    cout << '\n';
}

We are trying to use the std::copy() function template to copy all the elements of v1, marked by the two iterators v1.begin() and v1.end(), to fill up the empty vector v2. The third parameter of std::copy() is the destination, an iterator representing the beginning of the range that std::copy() should copy the elements into. Recall that we encountered std::copy() a few chapters ago when we implemented IntArray::push_back() and needed to copy over elements when reallocating the underlying array.

Unfortunately, the program crashes when we run it. We can better understand why if we take a look at std::copy()’s code. One possible implementation is shown below:

template <typename SrcIter, typename DestIter>
DestIter copy(SrcIter b_src, SrcIter e_src, DestIter b_dest) {
    while (b_src != e_src) {
        *b_dest = *b_src;
        ++b_src;
        ++b_dest;
    }
    return b_dest;
}

As we can see, copy() simply iterates through the target range using ++b_dest, assuming *b_dest is always valid. It makes sense that our program crashed because v2 was empty to begin with! We’ll first fix the program by initializing the vector with enough elements. Then, we’ll show how to fix the program using an iterator adapter.

Initializers

We can fix the program by having v2.size() == 5 to begin with. We can initialize the vector with five default-constructed elements with the following declaration:

std::vector<int> v2(5);

You may have been surprised to find that the statement above doesn’t initialize v2 with a single element 5. Why is this initialization treated differently than v1’s initialization? It’s not because we only provided one integer instead of a list of integers. We can create a vector with a single integer 5 as follows:

std::vector<int> v2 { 5 };

Clearly then, the difference is the use of parenthesis vs. curly braces. When we use parenthesis, we are invoking a vector constructor that takes a count parameter: explicit vector(size_type count). This will initialize a vector with count default-constructed elements.

When we use curly braces, we are invoking a different vector constructor, vector(std::initializer_list<T> init). The initializer list class template is a lightweight viewer that provides access to an array constructed to represent the comma-separated list of initializers. The constructed array could be a temporary object whose lifetime is the expression in which the initializer list is used. Or it could be an array in read-only memory if the initializers are all constant expressions, as is the case above. std::initializer_list can be implemented with a pointer to the underlying array and its length. Copying an initializer list object does not copy the underlying array. Moreover, STL treats std::initializer_list as a container by providing begin() and end() iterators.

An std::initializer_list object is automatically constructed when a class object is initialized with a brace-enclosed initializer list and has a constructor that takes a std::initializer_list parameter.

Iterator adapters

`std::back_insert_iterator`

Consider the following class template, IteratorThatPushes. We instantiate it in main() with the container type vector<int>, and pass it to std::copy() as the third parameter, which is the destination iterator.

template <typename Container>
struct IteratorThatPushes {
    Container& c;

    explicit IteratorThatPushes(Container& c) : c(c) {}
    IteratorThatPushes& operator*()     { return *this; }
    IteratorThatPushes& operator++()    { return *this; }
    IteratorThatPushes  operator++(int) { return *this; }

    IteratorThatPushes& operator=(const typename Container::value_type& x) {
        c.push_back(x);
        return *this;
    }
};

...

int main() {
    vector<int> v1 { 100, 101, 102, 103, 104 };
    vector<int> v2;
    
    copy(v1.begin(), v1.end(), IteratorThatPushes<vector<int>>{v2});

    // Compiler can deduce the type parameter from the initializer
    // copy(v1.begin(), v1.end(), IteratorThatPushes{v2});
    
    for_each(v2.begin(), v2.end(), &print_int);

    cout << '\n';
}

The vector v2 is initially empty when we invoke std::copy(). Instead of passing in v2.begin(), we now pass a custom iterator that invokes push_back() whenever std::copy() dereferences the iterator and assigns to it. STL provides many such iterators that provide custom functionality and they are collectively known as iterator adapters.

IteratorThatPushes is constructed with a reference to the container to push_back() into, and holds the reference as a member. We haven’t encountered a reference data member so far, but it’s not that different from holding a pointer. Unlike pointer members, however, reference members must be initialized at construction because a reference type cannot exist uninitialized. Note that C++ let’s us use the variable name c for both the data member and the constructor parameter. We’ve also marked the constructor as explicit to avoid unwanted implicit conversion from a container type.

operator*() and prefix/postfix operator++() are essentially noops. These member functions must be defined, as per the usage in std::copy(), but all of the work is done in operator=(). This is made possible by the fact that IteratorThatPushes::operator*() actually returns a reference to the iterator itself instead of Container::value_type&. As such, the expression *b_dest = *b_src will invoke the IteratorThatPushes::operator=() instead of the underlying value type’s operator=(). The implementation simply invokes push_back() on the underlying container with the right-hand side operand.

The type of the right-hand side operand is const Container::value_type&. Note that without the value_type type-alias defined in all STL containers, we wouldn’t be able to refer to the type of the container’s underlying elements from this class template definition. Before C++20, the operator signature had to include the keyword typename as follows:

IteratorThatPushes& operator=(const typename Container::value_type& x) { ... }

typename was necessary because the compiler doesn’t know if value_type is a type name or some other symbol defined in the Container class. Since Container is simply a type parameter, the compiler won’t know what value_type is until the IteratorThatPushes class template is instantiated. C++20 relaxes this by assuming that Container::value_type is a type name because it’s used in a context where a type name is required.

IteratorThatPushes is a simplified version of STL’s std::back_insert_iterator. The back_inserter() helper function template takes a container and returns an instantiated back_insert_iterator object. We could replace IteratorThatPushes with back_insert_iterator as follows:

copy(v1.begin(), v1.end(), back_inserter(v2));

`std::ostream_iterator`

std::ostream_iterator is an iterator adapter that wraps an ostream object. We can write out the elements of v to cout by simply copying the elements into an ostream_iterator as follows:

int main() {
    vector<int> v { 100, 101, 102, 103, 104 };

    ostream_iterator<int> oi {cout, "\n"};
    copy(v.begin(), v.end(), oi);

    cout << '\n';
}

The ostream_iterator constructor takes an ostream object that it will invoke operator<<() on. It also takes an optional delimiter to append after each write. This iterator adapter works similarly to our IteratorThatPushes() – most of its member functions are essentially noops except for operator=(), which will invoke operator<<() on the underlying ostream.

`std::istream_iterator`

std::istream_iterator is an iterator adapter that wraps an istream object. The following code reads in integers from cin into a vector v using istream_iterator and then prints them out using ostream_iterator:

int main() {
    vector<int> v;
    istream_iterator<int> iib {cin};
    istream_iterator<int> iie {};

    copy(iib, iie, back_inserter(v));

    ostream_iterator<int> oi {cout, ","};
    copy(v.begin(), v.end(), oi);

    cout << '\n';
}

We can run the program as follows:

$ echo "100 101 102 103 104" | ./iterator2
100,101,102,103,104,

iib wraps cin. It invokes operator<<() when it is initially constructed, and then every time it is subsequently incremented. operator*() returns the last element read. iie is a default-constructed istream_iterator, which isn’t associated with any particular istream object. iie is known as the end-of-stream iterator. When an istream_iterator reaches the end of its underlying input stream, it becomes equal to the end-of-stream iterator. This is analagous to using the iterators returned by begin() and end() to traverse STL containers.

Iterator Categories

There are six categories of iterators as follows:

LegacyOutputIterator: Supports writing through single-pass increment, which means alternating assignment and increment (as is the case for std::copy()’s destination iterator). Deviating from this usage results in undefined behavior.
- ex) back_insert_iterator and ostream_iterator
LegacyInputIterator: Supports reading through single-pass increment, i.e., reading through the input range exactly once (as is the case for istream_iterator{cin}).
- ex) istream_iterator
LegacyForwardIterator: All capabilities of LegacyInputIterator plus multi-pass increment. The iterator can only be incremented, but the range can be traversed multiple times using multiple iterators.
- ex) std::forward_list::iterator
LegacyBidirectionalIterator: All capabilities of LegacyForwardIterator plus decrement.
- ex) std::list::iterator
LegacyRandomAccessIterator: All capabilities of LegacyBidirectionalIterator plus O(1) element access.
- ex) std::deque::iterator
LegacyContiguousIterator (since C++17): All capabilities of LegacyRandomAccessIterator plus elements laid out contiguously in memory.
- ex) std::vector::iterator, std::array::iterator

Iterator Categories are not concrete types. They are defined conceptually by their capabilities – i.e., by the set of operations that they support.

Iterator Categories were reworked into a similar but slightly different taxonomy called Iterator Concepts in C++20. The old iterator category names were renamed with the “Legacy” prefix. We’ll explore C++20 Concepts later.

Last updated: 2025-10-21