Python's Iterables and Iterators
Python has iterables and iterators, and both are involved whenever you iterate. This is one of those things that is confusing until it isn’t, and there isn’t a great way to narrow that gap.
The two concepts differ in subtle ways, and those differences show how much they depend on each other. Rather than explaining each other, however, they tend to muddy each other, because the two are so closely tied.
The final layer of confusion I’ve had to unravel is that, for my day-to-day use of iteration, Python takes care of these two concepts (iterables and iterators) without requiring me to know what’s going on behind the scenes.
When I use a for-loop to iterate over a list, for example, conceptually I understand that I am iterating over a list…and that’s all I am really required to know.
Specifications: I’m on Ubuntu 14.04 LTS, using Python 2.7.6
When we create a list in Python…
the_list = [1, 2, 3]
We know that we can iterate over the list like this:
for element in the_list:
    print element
That has been all I’ve needed to know for most of the coding I’ve done up until this week.
It turns out that a list is iterable, but that it can’t be stepped through until an iterator has been made from it (for-loops do this for you); the stepping itself happens on that iterator. Iterables have the method __iter__(), which returns an iterator. An iterator has the method next() (renamed __next__() in Python 3), which retrieves and returns the next element of the iterable, beginning with the first element.
A for-loop takes the_list above, gets an iterator from it, and calls next() on that iterator over and over until the iterator is exhausted (in our example above, that happens once you’ve reached the end of the list). However, you can’t call next() on the list itself; you have to first get an iterator from the list, and on that iterator you can call next() to retrieve its elements.
I’ll say that same thing in a slightly different way, to try to clear up the confusion:
A list is an iterable, which means that you can get an iterator from it (an iterable can return an iterator). An iterator has the next() method, which retrieves its elements one at a time, which is how we are accustomed to iterating over containers.
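One rough way to see this split is to check which methods each object carries. The sketch below is my own illustration rather than part of the session in this post. The helper has_next_method and the name list_iterator are made up for the example, and the helper checks for both next (Python 2) and __next__ (Python 3) so that it should run on either version.

```python
the_list = [1, 2, 3]
list_iterator = iter(the_list)  # hypothetical name, used only in this sketch


def has_next_method(obj):
    # Python 2 iterators define next(); Python 3 renamed it to __next__()
    return hasattr(obj, "next") or hasattr(obj, "__next__")


# A list is an iterable: it can hand out an iterator, but cannot be stepped itself
list_is_iterable = hasattr(the_list, "__iter__")
list_is_iterator = has_next_method(the_list)

# The object returned by iter() is an iterator: it can be stepped
iterator_is_iterator = has_next_method(list_iterator)

print(list_is_iterable)      # True
print(list_is_iterator)      # False
print(iterator_is_iterator)  # True
```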
Part of what makes this confusing, for me at least, is that these concepts aren’t that complicated. It feels more complicated than it is. And we are accustomed to for-loops doing this work for us.
Hop into a Python REPL, and I’ll show you the simple aspects of what a for-loop does, to demonstrate the differences between iterables and iterators.
python
Python 2.7.6 (default, Jun 22 2015, 17:58:13)
[GCC 4.8.2] on linux2
Type "help", "copyright", "credits" or "license" for more information.
>>> the_list = [1, 2, 3]
>>> the_list
[1, 2, 3]
>>> the_list.next()
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
AttributeError: 'list' object has no attribute 'next'
>>>
To get an iterator from the_list, on which you can call the next() method, you call the iter() function on the iterable. iter(the_list) will return an iterator of the_list.
Continuing from above:
...
>>> list_that_can_be_iterated_on = iter(the_list)
>>> list_that_can_be_iterated_on.next()
1
>>> list_that_can_be_iterated_on.next()
2
>>> list_that_can_be_iterated_on.next()
3
>>> list_that_can_be_iterated_on.next()
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
StopIteration
>>>
StopIteration is the exception that is raised when next() is called after it has already retrieved the last element from the iterator (which has the same elements as the original iterable).
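If you are stepping through an iterator by hand, there are two common ways to deal with running out of elements. This is a small sketch of both, using the built-in next() function so that it should behave the same on Python 2.6+ and Python 3. The variable names and the "nothing left" placeholder are only for illustration.

```python
the_list = [1, 2, 3]
list_iterator = iter(the_list)

first = next(list_iterator)
second = next(list_iterator)
third = next(list_iterator)

# The iterator is now exhausted. A bare next() call raises StopIteration,
# so we catch it here to record what happened.
try:
    next(list_iterator)
    exhausted = False
except StopIteration:
    exhausted = True

# next() also accepts a default value to return instead of raising
fallback = next(list_iterator, "nothing left")

print(exhausted)  # True
print(fallback)   # nothing left
```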
There are a couple of concepts here that are important for understanding the distinction between iterables and iterators. Forgive me for repeating myself a little. I want to bring back the points I made earlier and try to demonstrate more examples of how these two concepts are different and interdependent.
How they are different:
The iter() function returns an iterator from an iterable.
Above we had a container (a list) that has the ability to return an iterator of itself. We called this iterable the_list. We then set the variable list_that_can_be_iterated_on to be an iterator of the iterable the_list. Then we called next() on the iterator list_that_can_be_iterated_on, and each time it retrieved and returned the next element of the original iterable, the_list.
This article calls iterators lazy, because they don’t do anything more than bookmark their position; when next() is called on them, they return the next element and move the bookmark forward by one.
To demonstrate this, I’ll first call next() on an iterator, and then use a for-loop on the iterator to print out its elements. The first next() call will retrieve its first element, which means that the next time next() is called on it, it will move to the second element, and so on. When I use the for-loop, it will start at the second element, because next() was already called once on the iterator.
>>> the_list = [1, 2, 3]
>>> list_that_can_be_iterated_on = iter(the_list)
>>> list_that_can_be_iterated_on.next()
1
>>> for element in list_that_can_be_iterated_on:
...     print element
...
2
3
>>> the_list
[1, 2, 3]
Notice at the end of this that the_list has not been affected. It still contains [1, 2, 3]. This is what allows us to run for-loops multiple times on the same iterable. To do that here, we could just keep re-entering this…
list_that_can_be_iterated_on = iter(the_list)
And each time we would receive a fresh iterator from the_list that starts at its first element.
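To make the "fresh iterator" point concrete, here is a short sketch with two iterators taken from the same list. The names first_iterator and second_iterator are mine, and the built-in next() function is used so the snippet should run on either Python 2 or 3. Each iterator keeps its own position, and neither one changes the list.

```python
the_list = [1, 2, 3]

first_iterator = iter(the_list)
second_iterator = iter(the_list)

# Advance only the first iterator by two steps
from_first = [next(first_iterator), next(first_iterator)]  # [1, 2]

# The second iterator has its own bookmark and still starts at the beginning
from_second = next(second_iterator)  # 1

# The first iterator carries on from where it left off
rest_of_first = list(first_iterator)  # [3]

# The underlying list is untouched by any of this
print(the_list)  # [1, 2, 3]
```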
Also, I did something above that may be confusing. I took the_list (an iterable) and, using the iter() function, got an iterator of the_list and stored it in list_that_can_be_iterated_on, and then I used that iterator in a for-loop. I did this to show you in a familiar form (the for-loop) that after already having called next() on the iterator, the for-loop would start at the second element (because we already called next() on it once, outside of the for-loop).
I have glossed over the fact that normally, when you use a list (an iterable) in a for-loop, the for-loop gets an iterator from it without showing us that it is doing this, and then steps through that iterator, retrieving each element by calling next() on the (invisible) iterator that it created out of our original iterable.
Apparently, when you use an object that is already an iterator in a for-loop, as I did above, the for-loop does not produce a fresh iterator from it. Otherwise, we would be able to call next() on list_that_can_be_iterated_on after the for-loop and still retrieve an element. But we can’t; it raises a StopIteration exception.
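That behaviour can be checked directly. As far as I understand it, calling iter() on something that is already an iterator simply hands the same object back, which is why a for-loop cannot restart it. The sketch below uses my own variable names and only built-in functions, so it should run on Python 2 or 3.

```python
the_list = [1, 2, 3]
list_iterator = iter(the_list)

# iter() on an iterator returns that very same object, not a new one
same_object = iter(list_iterator) is list_iterator

# iter() on the list builds a separate iterator on every call
fresh_each_time = iter(the_list) is not iter(the_list)

# Consuming the iterator once leaves nothing for a second pass
first_pass = list(list_iterator)   # [1, 2, 3]
second_pass = list(list_iterator)  # []

print(same_object)      # True
print(fresh_each_time)  # True
```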
Reviewing the important concepts with custom classes:
A list is an example of an iterable. An iterable is defined as an object that has the __iter__ method, which returns an iterator over that object.
The custom iterator class:
class MyIterator(object):
    def __init__(self, iterable_that_is_made_iterator):
        # I assume we're already receiving an iterable
        self.iterable_that_is_made_iterator = iterable_that_is_made_iterator
        self.index = 0

    # Below is the next() method we kept using above.
    # I want to make it clear where I am using the concepts
    # we used above.
    def next(self):
        try:
            element = self.iterable_that_is_made_iterator[self.index]
            self.index += 1
            return element
        except IndexError:
            raise StopIteration
Back to the very first example in this post:
the_list = [1, 2, 3]
Remember that the list by itself can’t be stepped through with next() yet.
Let’s use our custom MyIterator class to get an iterator from it:
list_that_can_be_iterated_over = MyIterator(the_list)
This does the same thing as calling iter() on a list. Now we can call the next() method on list_that_can_be_iterated_over and retrieve each element in turn.
Note: This custom MyIterator class does not work with dictionaries, and not necessarily with any types other than lists. Each iterable has its own way of accessing its elements. I think that this is where the __getitem__ method comes in. This MyIterator class does, however, demonstrate how iterables and iterators differ from one another, how they depend on one another, and, in this case, how a list yields an iterator that can then be stepped through with the next() method.
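On the __getitem__ guess: as far as I know, iter() falls back to an older sequence protocol when an object has no __iter__ method but does define __getitem__, calling it with 0, 1, 2 and so on until IndexError is raised. Dictionaries, on the other hand, supply their own iterator, which walks over the keys. The Squares class below is a hypothetical example made up for this sketch.

```python
class Squares(object):
    """Hypothetical container that defines only __getitem__, not __iter__."""

    def __getitem__(self, index):
        # Only indexes 0, 1 and 2 exist; IndexError signals the end
        if index >= 3:
            raise IndexError(index)
        return index * index


# iter() still works, by calling __getitem__ with 0, 1, 2, ...
squares = list(iter(Squares()))  # [0, 1, 4]

# A dict provides its own iterator, which yields keys rather than values.
# sorted() is used here only to get a predictable order.
keys = sorted(iter({"a": 1, "b": 2}))  # ['a', 'b']

print(squares)
print(keys)
```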
The last point that I want to demonstrate is how to reproduce the way a for-loop works to print out (for example) each element in a list.
the_list = [1, 2, 3]
for element in the_list:
    print element
Behind the scenes, this is what the for-loop does:
the_list = [1, 2, 3]

# What the for-loop does:
iterator_list = iter(the_list)  # Generate an iterator from the iterable
while True:
    try:
        print iterator_list.next()
    except StopIteration:
        break
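Because the for-loop starts by calling iter(), a custom class has to answer that call before it can be used in one. The MyIterator class earlier in this post only defines next(), so I would expect a for-loop over it to fail with a TypeError. Below is a minimal sketch of a pair of classes that does plug into a for-loop. Countdown and CountdownIterator are hypothetical names invented for this example, and the next = __next__ line is there only so the same code should run on both Python 2 and Python 3.

```python
class CountdownIterator(object):
    """Hypothetical iterator that counts down from start to 1."""

    def __init__(self, start):
        self.current = start

    def __iter__(self):
        # An iterator returns itself, so iter(iterator) is the iterator
        return self

    def __next__(self):
        if self.current <= 0:
            raise StopIteration
        value = self.current
        self.current -= 1
        return value

    next = __next__  # Python 2 looks for next(); Python 3 for __next__()


class Countdown(object):
    """Hypothetical iterable: each iter() call hands out a fresh iterator."""

    def __init__(self, start):
        self.start = start

    def __iter__(self):
        return CountdownIterator(self.start)


countdown = Countdown(3)

# Each loop calls iter(countdown) behind the scenes, so both passes are complete
first_pass = [number for number in countdown]   # [3, 2, 1]
second_pass = [number for number in countdown]  # [3, 2, 1]

print(first_pass)
print(second_pass)
```

Keeping the iterable and the iterator as separate classes mirrors how a list and its iterator relate: the container holds the data, and each iterator holds only a position.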
Update 5/1/16: I saw a table like this on someone else’s post, and I think it is helpful. These are equivalent, alternate ways to generate iterators from iterables, and fetch the next element from the iterator.
Refer to this code:
>>> the_list = [1, 2, 3]
>>> iterator_of_list = iter(the_list)
>>> next(iterator_of_list)
1
>>> next(iterator_of_list)
2
>>> iterator_of_list.next()
3
| When you write: | Python calls: |
| iter(the_list) | the_list.__iter__() |
| next(iterator_of_list) | iterator_of_list.next() |
I will continue to edit this post and try to make it cleaner and clearer. If this is confusing to you to whatever degree, check out the following explanations from other people, and see if they make things clearer. I have tried my own way of explaining this, but I worry that I’ve made it all feel more muddled than it needs to be.
I hope that my long explanation makes this easier for some to understand. I suspect, however, that what matters more than the perfect explanation is simply spending time with the confusion: flopping around in it, and reading any explanation that helps to whatever degree. From there will come understanding.