While at our monthly South Florida Python meetup (which has exactly two members, Philip Schwartz and myself), the topic of iterators and generators came up. These are two core idioms in Python, and are even more ubiquitous in Python 3. However, they are often poorly understood. This introduction will be part one of a series on Python iterators and generators.
An iterator is an object that lets you traverse a collection of data, such as a list, dictionary, or tuple, one element at a time. Iterators also work with files.
Let’s see how iterators work.
numbers = [1, 2, 3, 4, 5]
for number in numbers:
    print(number)
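Behind the scenes, the for loop calls iter() on the list to get an iterator, then calls next() on that iterator to fetch each element. The for loop drives the iterations, while the iterator keeps track of where it is in the list.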
Creating an Iterator
Create a simple list object:
numbers = [1,2,3]
To create an explicit iterator, call iter() on the list and store the result in a variable:
it = iter(numbers)
We now have a variable, it, that refers to an iterator over the numbers list.
Iterators follow a protocol based on two methods: __iter__(), which returns the iterator object itself, and next(), which returns the next item. Calling iter(X) internally calls X.__iter__(). In Python 2, you can get the first element of the list by calling the iterator's next() method:
print(it.next())  # Python 2.x only
Note: In Python 3, this method is named __next__(), so you would call it.__next__() instead of it.next().
Calling a method whose name changes between versions is awkward, so instead we can pass the iterator to the built-in next() function:
print(next(it))
The built-in next(X) function is not only simpler, it is also version-neutral: it works in Python 2.6 and later as well as in Python 3.x.
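To see the protocol in action, here is a small Python 3 sketch (the names `other`, `first` and `second` are just for illustration). It checks that an iterator's `__iter__()` returns the iterator itself, that each `iter()` call on a list builds a separate iterator, and that the built-in `next()` and the `__next__()` method step through the same sequence.

```python
numbers = [1, 2, 3]
it = iter(numbers)

# An iterator's __iter__() hands back the iterator itself...
assert iter(it) is it

# ...whereas a list is only iterable: each iter() call builds a new iterator
other = iter(numbers)
assert other is not it

first = next(it)          # built-in next(): works in Python 2.6+ and 3.x
second = it.__next__()    # the Python 3 method that next() calls under the hood
print(first, second)      # 1 2
```

Because `other` is a separate iterator, it still starts at the first element even after `it` has advanced.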
Move through the list by calling next(it) repeatedly. Once the list is exhausted, the next call raises a StopIteration exception. Implicit iteration, such as a for loop, catches StopIteration and treats it as the signal to end the loop, so you never see the exception there.
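As a rough sketch of what a for loop does (CPython implements this in C, so treat it as an approximation), the hypothetical helper `manual_for` below drives an iterator by hand with `iter()`, `next()` and a `StopIteration` handler. The lines after it show the exception surfacing when you call `next()` yourself, and the optional default argument of `next()` that suppresses it.

```python
def manual_for(iterable):
    """Hypothetical helper: gather items the way a for loop would."""
    items = []
    it = iter(iterable)          # step 1: ask for an iterator
    while True:
        try:
            item = next(it)      # step 2: fetch the next item
        except StopIteration:
            break                # step 3: exhaustion ends the loop quietly
        items.append(item)       # the loop body would run here
    return items


it = iter([1, 2, 3])
print(next(it), next(it), next(it))   # 1 2 3
try:
    next(it)                          # one call too many
except StopIteration:
    print('StopIteration raised')
print(next(it, 'done'))               # a default suppresses the exception: done
```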
Files are handled the same way as lists. In this case, the file object itself is an iterator:
with open('data.txt', 'r') as fit:  # the with block closes the file afterwards
    print(next(fit))
This prints the first line of data.txt, including its trailing newline. Because a file object is its own iterator, we can call next() on it directly, without calling iter() first.
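Since data.txt is not provided, this Python 3 sketch writes its own throwaway file with `tempfile` (the file contents and the names `first` and `rest` are made up for illustration). It shows that a file object is its own iterator and that a for loop picks up exactly where an earlier `next()` call left off.

```python
import os
import tempfile

# Create a small temporary file to stand in for data.txt
with tempfile.NamedTemporaryFile('w', suffix='.txt', delete=False) as tmp:
    tmp.write('first line\nsecond line\nthird line\n')
    path = tmp.name

with open(path) as fit:
    assert iter(fit) is fit                      # the file object is its own iterator
    first = next(fit)                            # 'first line\n', newline included
    rest = [line.rstrip('\n') for line in fit]   # the loop resumes after line one

os.remove(path)                                  # clean up the temporary file
print(repr(first), rest)
```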
Iterators and Dictionaries
If you’re coming from another procedural programming language, you might iterate over a dictionary (or hash) as follows:
ages = {'Mark': 40, 'Phil': 30, 'Bob': 65}
for key in ages.keys():
    print(key, ages[key])
This prints the key (name) and value (age) for each pair in the dictionary. As you will see in a minute, Python has a more elegant and intuitive approach than calling keys() and then indexing into the dictionary with each key.
So, how does this work, since dictionaries are more complex than lists? Let’s create an iterator over the dictionary and see:
it = iter(ages)
print(next(it))
This prints the first key. In Python versions before 3.7, subsequent calls to next() will not necessarily return the keys in the order in which they were defined, because dictionaries were unordered. Since Python 3.7, dictionaries are guaranteed to preserve insertion order.
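A quick check of key order, assuming Python 3.7 or later (on older versions the first printed list may come out in a different order). If your code depends on a particular order, asking for it explicitly with `sorted()` works the same on every version.

```python
ages = {'Mark': 40, 'Phil': 30, 'Bob': 65}

it = iter(ages)
keys = [next(it), next(it), next(it)]
print(keys)           # Python 3.7+: ['Mark', 'Phil', 'Bob'] (insertion order)

# If you need a particular order regardless of version, ask for it explicitly
print(sorted(ages))   # ['Bob', 'Mark', 'Phil']
```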
Because iterating over a dictionary yields its keys, you can simplify the previous for loop:
for key in ages:
    print(key, ages[key])
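When you need both the key and the value, you can also iterate over `ages.items()` and unpack each (key, value) pair directly in the loop header. In Python 3, `items()` returns a view that supports iteration; in Python 2 it returns a list, and `iteritems()` is the iterator-returning variant. The `lines` list here is only for illustration.

```python
ages = {'Mark': 40, 'Phil': 30, 'Bob': 65}

lines = []
for name, age in ages.items():   # each item is a (key, value) tuple
    lines.append('%s is %d' % (name, age))

for line in sorted(lines):
    print(line)
```

This avoids the extra dictionary lookup (`ages[key]`) on every pass through the loop.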
Other Iterators
Iterators work with various Python datatypes. For example, with range:
numbers = range(1, 10)
it = iter(numbers)
print(next(it))
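This is the same pattern we used with lists and dictionaries. In Python 3, range() returns a lazy range object rather than a list, but iter() works on it just the same.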
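One consequence of the iterator/iterable split worth seeing in a Python 3 sketch: an iterator can be consumed only once, while the range object it came from can hand out a fresh iterator every time. The names `first_pass`, `second_pass` and `fresh` are illustrative.

```python
numbers = range(1, 10)
it = iter(numbers)

first_pass = list(it)    # drains the iterator: [1, 2, 3, 4, 5, 6, 7, 8, 9]
second_pass = list(it)   # the same iterator is now exhausted: []
fresh = list(numbers)    # the range itself hands out a new iterator each time

print(first_pass)
print(second_pass)
print(fresh == first_pass)   # True
```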
What about more complicated data sources? Let’s look at the filesystem, for example:
import os
files = os.popen('ls *.py')
fit = iter(files)
print(next(fit).rstrip())  # strip the newline that ls puts after each name
This will print the first filename, ending in .py, in the current directory. Again, this is the same pattern shown previously, and we can apply a for loop to the iterator:
import os
files = os.popen('ls *.py')
for filename in files:
    print(filename.rstrip())
To be even more terse, we can do this:
import os
for filename in os.popen('ls *.py'):
    print(filename.rstrip())
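The os.popen() approach relies on a shell that provides `ls`. As a more portable alternative (a swapped-in technique, not the one used above), the standard library's `glob.iglob()` returns an iterator over matching paths without starting a subprocess. The helper name `py_files` is hypothetical.

```python
import glob
import os


def py_files(directory='.'):
    """Hypothetical helper: iterate over the *.py paths in directory."""
    # glob.iglob() returns an iterator, so matches are produced lazily
    return glob.iglob(os.path.join(directory, '*.py'))


if __name__ == '__main__':
    for path in py_files():
        print(os.path.basename(path))
```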
Iterators also work with tuples. Let's do something a bit more interesting than just a sequence of single values: we can define a square by the Cartesian coordinates of its four corners:
square = ((10, 8), (10, 23), (25, 23), (25, 8))
We can use an iterator on square to retrieve these four coordinate pairs one at a time:
sit = iter(square)
print(next(sit))
print(next(sit))
print(next(sit))
print(next(sit))
You can probably guess by now that this can be done more easily with a for loop:
for point in square:
    print(point)
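Because each point is itself a 2-tuple, the for loop can unpack it into separate `x` and `y` variables as it iterates. This sketch uses that to compute the square's side lengths; the names `xs`, `ys`, `width` and `height` are illustrative.

```python
square = ((10, 8), (10, 23), (25, 23), (25, 8))

xs = []
ys = []
for x, y in square:          # each point (x, y) is unpacked as it is fetched
    xs.append(x)
    ys.append(y)

width = max(xs) - min(xs)    # 25 - 10 = 15
height = max(ys) - min(ys)   # 23 - 8 = 15
print(width, height)         # 15 15
```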
Custom Iterators
Iterators are not limited to native Python datatypes. You can create iterators for your own custom classes as well: simply follow the protocol explained earlier, implementing __iter__() and next() (or __next__() in Python 3.x).
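A minimal sketch of that protocol, using a hypothetical `Countdown` class. Defining `__next__()` and aliasing `next = __next__` is one common way to make the same class work under both Python 2.x and 3.x.

```python
class Countdown(object):
    """Hypothetical example: counts down from start to 1."""

    def __init__(self, start):
        self.current = start

    def __iter__(self):
        return self              # an iterator returns itself

    def __next__(self):
        if self.current <= 0:
            raise StopIteration  # nothing left: tell the caller to stop
        value = self.current
        self.current -= 1
        return value

    next = __next__              # alias so Python 2.x finds next() too


print(list(Countdown(3)))        # [3, 2, 1]
```

Since this `__iter__()` does not reset any state, a `Countdown` object can be consumed only once; the Fib class below instead restarts its series inside `__iter__()`.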
For example, we can write a class that encapsulates the Fibonacci series and acts as its own iterator:
class Fib(object):
    '''iterator that yields numbers in the Fibonacci series'''

    def __init__(self, max):
        self.max = max  # largest value to produce

    def __iter__(self):
        # (Re)start the series each time iteration begins
        self.a = 0
        self.b = 1
        return self

    def next(self):  # Python 2.x name for this method
        fib = self.a
        if fib > self.max:
            raise StopIteration  # past the maximum: end the loop
        self.a, self.b = self.b, self.a + self.b
        return fib

    __next__ = next  # Python 3.x calls __next__() instead
Now we can print every Fibonacci number up to a maximum value. With a maximum of 10, this loop prints 0, 1, 1, 2, 3, 5 and 8:
for fib in Fib(10):
    print(fib)
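Note that Fib(10) stops at the first value above 10, not after ten values. If you want the first n numbers instead, one option (a sketch, not the original design) is an unbounded variant, here the hypothetical `FibForever`, combined with the standard library's `itertools.islice()`, which stops after n items.

```python
from itertools import islice


class FibForever(object):
    """Hypothetical unbounded variant of Fib: never raises StopIteration."""

    def __iter__(self):
        self.a, self.b = 0, 1    # restart the series
        return self

    def __next__(self):
        fib = self.a
        self.a, self.b = self.b, self.a + self.b
        return fib

    next = __next__              # Python 2.x name


first_ten = list(islice(FibForever(), 10))
print(first_ten)                 # [0, 1, 1, 2, 3, 5, 8, 13, 21, 34]
```

Because `FibForever` never raises StopIteration, a plain for loop over it would run forever; always bound it with `islice()` or a `break`.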
Cool, huh?! Now, on to generators…