As part of my series on Python's syntactic sugar, I am going to cover the
for statement. As usual, I will be diving into CPython's C code, but understanding or even reading those parts of this post won't be required in order to understand how the unravelling works.
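Let's start with a simple for statement: one that iterates over b, assigns each item to a, and runs c as its body.
Passing the code through the
dis module shows two opcodes of interest, GET_ITER and FOR_ITER.
If you look at GET_ITER in the eval loop, it appears to mostly be calling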
PyObject_GetIter(). I happen to know that it is mostly equivalent to the built-in function
iter(). Since I also know that there will probably be something to do with
next(), I'm going to handle defining both functions later in this post.
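To look at the bytecode yourself, something like the following should work. It is only a sketch: the names a, b, and c are placeholders that never need to exist because the code is compiled but not run, and the exact listing printed by dis differs between CPython versions.

```python
import dis

# Placeholder loop; a, b and c are never evaluated because we only compile.
source = "for a in b:\n    c\n"
code = compile(source, "<example>", "exec")

# Collect the opcode names so they can be inspected programmatically.
opnames = [instruction.opname for instruction in dis.get_instructions(code)]

# Print the human-readable disassembly (version-dependent output).
dis.dis(code)
```

Scanning opnames for the iteration-related opcodes is a quick way to confirm which ones your interpreter emits.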
The FOR_ITER opcode is semantically a call to
__next__() on an object's type (as represented by
(*iter->ob_type->tp_iternext)(iter) in the C code). What that means is that understanding how
iter() and next() work is critical to understanding how
for statements work.
Before covering how
iter() and next() work, we should define two important terms. An iterable is a container which can return its contained items one at a time. An iterator is an object which streams those contained items. So while every iterator is conceptually an iterable (and you should always make your iterators iterable as well; it isn't much work), the reverse is not true: not every iterable is an iterator in its own right.
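A quick way to see the distinction is with the abstract base classes in collections.abc. This is a small illustration using a list; the variable names are made up for the example.

```python
from collections.abc import Iterable, Iterator

numbers = [1, 2, 3]            # a list is an iterable...
numbers_iter = iter(numbers)   # ...and iter() hands back its iterator

is_list_iterable = isinstance(numbers, Iterable)
is_list_iterator = isinstance(numbers, Iterator)
is_iter_iterable = isinstance(numbers_iter, Iterable)
is_iter_iterator = isinstance(numbers_iter, Iterator)

# A well-behaved iterator returns itself from __iter__().
iter_returns_self = iter(numbers_iter) is numbers_iter
```

The list counts as an iterable but not as an iterator, while the object returned by iter() counts as both.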
The reason I bothered defining those words is that the purpose of
iter() is to take an iterable and return its iterator. Now, what
iter() considers an iterable depends on whether you give it one or two arguments.
Starting with the semantics of the function when there's a single argument, we see that the implementation calls
PyObject_GetIter() to get the iterator. The pseudocode for this function, which I will explain in a moment, is:
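What follows is a reconstruction in Python rather than CPython's actual code: a minimal sketch of the single-argument lookup, assuming the two steps described in this section. The names get_iter and _seq_iter are hypothetical and chosen only for illustration.

```python
def _seq_iter(seq):
    """Fallback iterator for objects that only define __getitem__()."""
    index = 0
    while True:
        try:
            item = seq[index]
        except IndexError:
            return  # ran off the end of the sequence
        yield item
        index += 1


def get_iter(obj):
    """Approximate the single-argument form of iter()."""
    cls = type(obj)  # special methods are looked up on the type
    if hasattr(cls, "__iter__"):
        iterator = cls.__iter__(obj)
        # The result must itself be an iterator, i.e. define __next__().
        if not hasattr(type(iterator), "__next__"):
            raise TypeError(
                f"iter() returned non-iterator of type {type(iterator).__name__!r}"
            )
        return iterator
    if hasattr(cls, "__getitem__"):
        return _seq_iter(obj)
    raise TypeError(f"{cls.__name__!r} object is not iterable")
```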
The first step is looking for the
__iter__() special method on an iterable. Calling this is meant to return an iterator (which is explicitly checked for on the return value). At the C level, the definition of an "iterator" is an object that defines
__next__(); both the iterable and iterator protocols can also be checked via their requisite
collections.abc classes and
issubclass() (I didn't do it this way in the pseudocode simply because the
hasattr() checks are closer to how the C code is written; in actual Python code I would probably use the abstract base classes).
There is a footnote relating to
__iter__() that says that if the attribute on the class is set to
None, it won't be used (although I have not come across any explicit code doing this check). I think this is implicitly supported due to how the implementation is written, i.e.
None is not callable and lacks the appropriate methods that are being checked for.
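The observable effect can be checked from Python. In this minimal sketch, the hypothetical class Blocked sets __iter__ to None while still defining __getitem__(); the expectation, based on the documented behaviour, is that iter() rejects it with a TypeError instead of falling back to the sequence protocol.

```python
from collections.abc import Iterable


class Blocked:
    """Hypothetical class that opts out of iteration."""

    __iter__ = None  # signals that iteration is not supported

    def __getitem__(self, index):
        # On its own, this would enable the sequence fallback.
        return "abc"[index]


try:
    iter(Blocked())
except TypeError as exc:
    outcome = f"TypeError: {exc}"
else:
    outcome = "iterable"

# The abstract base class check treats the None marker the same way.
counts_as_iterable = issubclass(Blocked, Iterable)
```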
The second step covers what happens if
__iter__() isn't available. In that case there's a check to see if we are dealing with a sequence by looking for the
__getitem__() special method. If the object turns out to be a sequence, then an instance of
PySeqIter_Type is returned whose approximate Python implementation would be:
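A class-based sketch of such an iterator might look like this. SeqIter is a hypothetical name, and treating both IndexError and StopIteration as the end of the sequence reflects my reading of the C code rather than anything guaranteed by the documentation.

```python
class SeqIter:
    """Hypothetical Python stand-in for CPython's sequence iterator."""

    def __init__(self, seq):
        self._seq = seq
        self._index = 0

    def __iter__(self):
        # Iterators are themselves iterable and simply return self.
        return self

    def __next__(self):
        if self._seq is None:
            raise StopIteration  # already exhausted
        try:
            item = self._seq[self._index]
        except (IndexError, StopIteration):
            # Drop the reference so the iterator stays exhausted.
            self._seq = None
            raise StopIteration from None
        self._index += 1
        return item
```

Indexing starts at 0 and increases by one per call, so anything with an integer-indexed __getitem__() works, for example list(SeqIter("abc")).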
I say "approximate" because the CPython version supports pickling and I don't want to be bothered. 😁
If all of the above fails, then
TypeError is ultimately raised.
If you recall, I mentioned earlier that
iter() has a two-argument version. In this case
iter() is rather different from what we have already discussed:
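A minimal sketch of the two-argument form, assuming it does nothing more than a callable check before handing off to a dedicated iterator. Both iter_with_sentinel and CallIter are hypothetical names; CallIter stands in for the C-level PyCallIter_Type.

```python
class CallIter:
    """Hypothetical stand-in for the iterator behind iter(callable, sentinel)."""

    def __init__(self, func, sentinel):
        self._func = func
        self._sentinel = sentinel
        self._done = False

    def __iter__(self):
        return self

    def __next__(self):
        if self._done:
            raise StopIteration
        value = self._func()
        if value == self._sentinel:
            # Remember exhaustion so later calls keep raising StopIteration.
            self._done = True
            raise StopIteration
        return value


def iter_with_sentinel(obj, sentinel):
    """Approximate the two-argument form of iter()."""
    if not callable(obj):
        raise TypeError("iter(v, w): v must be callable")
    return CallIter(obj, sentinel)
```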
As you can see there's a check to see if the first argument is callable, and if it is then an instance of the
PyCallIter_Type iterator is returned. With the function being so short, the important question is what does the
PyCallIter_Type iterator do?
The use-case for this form of
iter() is when you have a zero-argument callable that you call repeatedly until it returns a particular value signalling that nothing more is coming from the callable.
    def _call_iter(callable, sentinel):
        while True:
            val = callable()
            if val == sentinel:
                return
            else:
                yield val
An example of where this could be useful is if you are reading a file a chunk at a time, stopping once
b"" is returned:
    with open(path, "rb") as file:
        for chunk in iter(lambda: file.read(chunk_size), b""):
            ...
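To try the pattern without a real file, an in-memory stream works just as well. In this sketch io.BytesIO stands in for the opened file, and CHUNK_SIZE is an arbitrary, deliberately tiny value picked for illustration.

```python
import io

CHUNK_SIZE = 4  # hypothetical chunk size, kept small on purpose

# BytesIO behaves like a file opened in "rb" mode.
stream = io.BytesIO(b"abcdefghij")

# read() returns b"" at end-of-stream, which is the sentinel that stops iteration.
chunks = list(iter(lambda: stream.read(CHUNK_SIZE), b""))
```

The final chunk is simply shorter than the others, and the empty bytes object that follows it is never yielded.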
Pulling it all together and doing everything appropriately gives us the complete definition of iter(), covering both the one-argument and two-argument forms.
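One way such a combined definition might look is sketched below. This is an approximation under the assumptions made so far, not CPython's code: my_iter, _seq_iter, and _NOTHING are hypothetical names, and the helpers are compact generator versions so the block runs on its own.

```python
_NOTHING = object()  # hypothetical marker meaning "no sentinel supplied"


def _seq_iter(seq):
    """Iterate over an object that only supports integer indexing."""
    index = 0
    while True:
        try:
            item = seq[index]
        except IndexError:
            return
        yield item
        index += 1


def _call_iter(func, sentinel):
    """Call func() until it returns a value equal to sentinel."""
    while True:
        value = func()
        if value == sentinel:
            return
        yield value


def my_iter(obj, sentinel=_NOTHING, /):
    """Approximate both forms of the built-in iter()."""
    if sentinel is not _NOTHING:
        if not callable(obj):
            raise TypeError("iter(v, w): v must be callable")
        return _call_iter(obj, sentinel)
    cls = type(obj)
    if hasattr(cls, "__iter__"):
        iterator = cls.__iter__(obj)
        if not hasattr(type(iterator), "__next__"):
            raise TypeError("iter() returned a non-iterator")
        return iterator
    if hasattr(cls, "__getitem__"):
        return _seq_iter(obj)
    raise TypeError(f"{cls.__name__!r} object is not iterable")
```

A private marker object is used as the default because None is a perfectly valid sentinel value for callers to pass.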
To get the next value from an iterator, we pass it to the
next() built-in function. It takes an iterator and can optionally take a default value. If the iterator has a value to return, then it's returned. If the iterator is exhausted, though, either
StopIteration is raised or the default value is returned if it's been provided.
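The behaviour just described can be approximated in a few lines. This is a sketch, with my_next and _NOTHING as hypothetical names, and it assumes the lookup happens on the iterator's type as FOR_ITER does.

```python
_NOTHING = object()  # hypothetical marker meaning "no default supplied"


def my_next(iterator, default=_NOTHING, /):
    """Approximate the built-in next()."""
    cls = type(iterator)
    if not hasattr(cls, "__next__"):
        raise TypeError(f"{cls.__name__!r} object is not an iterator")
    try:
        return cls.__next__(iterator)
    except StopIteration:
        # Only swallow exhaustion when the caller provided a fallback.
        if default is _NOTHING:
            raise
        return default
```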
Each value returned by the iterable's iterator is assigned to the loop's target (sometimes called the loop variant), the statements in the body are run, and this repeats until the iterator is exhausted. If there is an
else clause for the
for loop, then it is executed if a
break statement isn't encountered. Breaking this all down gives us:
- Get the iterable's iterator
- Assign the iterator's next value to the loop target
- Execute the body
- Repeat until the iterator is done
- If there's an else clause and a break statement isn't hit, execute the else clause
Kind of sounds like a
while loop with an assignment, doesn't it?
- Get the iterator with iter()
- Call next() and assign the result to the loop target
- As long as calling next() didn't raise StopIteration, execute the body
- Repeat as necessary
- Run the else clause as appropriate
for without an else clause
Let's start with the simpler case of not having an
else clause. In that case we can translate:
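As a minimal sketch of that translation, with a hypothetical list bound to b and an append standing in for the body c so the block runs on its own, the loop for a in b: c becomes roughly:

```python
b = [1, 2, 3]  # hypothetical iterable
seen = []      # records what the stand-in body does

_iter = iter(b)
while True:
    try:
        a = next(_iter)
    except StopIteration:
        break  # iterator exhausted, leave the loop
    else:
        seen.append(a)  # stands in for the body "c"
del _iter  # clean up the temporary variable
```

As with a real for loop, the target a keeps the last value it was assigned once the loop finishes.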
If you squint a bit it sort of looks like the
for loop example. It's definitely a lot more verbose and would be a pain to write out every time you wanted to iterate, but it would get the work done.
for with an else clause
Now you might have noticed that we used a
break statement in our unravelling above which would cause an
else clause to always be skipped. How do we change the translation to work when an
else clause is present?
Let's update our example first by giving the loop an else clause whose body is d.
To eliminate our use of
break in our original unravelling, we need to come up with another way to denote when the iterator is out of items. A variable simply tracking whether there are more values should be enough to get us what we are after.
    _iter = iter(b)
    _looping = True
    while _looping:
        try:
            a = next(_iter)
        except StopIteration:
            _looping = False
            continue
        else:
            c
    else:
        d
    del _iter, _looping
This unravelling has a rather convenient side-effect of getting to rely on the
while loop's own
else clause and its semantics to get what we are after for the for loop's else clause.
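To check that the flag-based unravelling really matches for/else, including when the body breaks out early, the two forms can be run side by side. The function names and the stop_at parameter are hypothetical, added only so there is a body that can trigger a break.

```python
def with_for(items, stop_at):
    """Reference behaviour: a plain for/else loop."""
    log = []
    for a in items:
        if a == stop_at:
            break
        log.append(a)
    else:
        log.append("else")
    return log


def with_while(items, stop_at):
    """The same loop unravelled into while/else with a flag."""
    log = []
    _iter = iter(items)
    _looping = True
    while _looping:
        try:
            a = next(_iter)
        except StopIteration:
            _looping = False
            continue  # re-test the condition, which now fails and runs else
        else:
            if a == stop_at:
                break  # a break in the body still skips the else clause
            log.append(a)
    else:
        log.append("else")
    del _iter, _looping
    return log
```

Calling both with the same arguments, once with a stop_at value that appears in items and once with one that doesn't, should give identical logs.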
And that's it! We can leverage a
while loop to implement
for loops with no discernible semantic changes (except for the temp variables).