INTRODUCTION TO PYTHON GENERATORS

Introduction to Python Generators

Introduction to Python Generators

This article is part two of my series on iterators and generators. If you haven’t read part one, or if you aren’t experienced with iterators in Python, then please start here.

What are Generators?

Generators (introduced in PEP 255) extend the concept of iterators. In essence, they are iterators with “intelligence”. A generator function is effectively the next() iterator method, applied to get the next value in a sequence. Unlike regular functions, generators use the yield keyword instead of return and maintain state across multiple calls.

Generator functions also allow you to pause execution, saving state until you resume execution. Let’s look at a simple example:

<span class="lineno"> 1</span> <span class="o">>>></span> <span class="k">def</span> <span class="nf">gen</span><span class="p">():</span>
<span class="lineno"> 2</span> <span class="o">...</span>     <span class="k">yield</span> <span class="mi">1</span>
<span class="lineno"> 3</span> <span class="o">...</span>     <span class="k">yield</span> <span class="mi">2</span>
<span class="lineno"> 4</span> <span class="o">...</span>     <span class="k">yield</span> <span class="mi">3</span>
<span class="lineno"> 5</span> <span class="o">...</span> 
<span class="lineno"> 6</span> <span class="o">>>></span> <span class="k">for</span> <span class="n">i</span> <span class="ow">in</span> <span class="n">gen</span><span class="p">():</span> <span class="k">print</span><span class="p">(</span><span class="n">i</span><span class="p">)</span>
<span class="lineno"> 7</span> <span class="o">...</span> 
<span class="lineno"> 8</span> <span class="mi">1</span>
<span class="lineno"> 9</span> <span class="mi">2</span>
<span class="lineno">10</span> <span class="mi">3</span>

If you instantiate the generator function, you can see that a generator object is returned:

<span class="lineno">1</span> <span class="o">>>></span> <span class="n">g</span> <span class="o">=</span> <span class="n">gen</span><span class="p">()</span>
<span class="lineno">2</span> <span class="o">>>></span> <span class="n">g</span>
<span class="lineno">3</span> <span class="o"><</span><span class="n">generator</span> <span class="nb">object</span> <span class="n">gen</span> <span class="n">at</span> <span class="mh">0x10d345050</span><span class="o">></span>
<span class="lineno">4</span> <span class="o">>>></span> 

Because generators implement the iterator interface, you can repeatedly call next() on g to fetch each value emitted by the successive yield calls:

<span class="lineno"> 1</span> <span class="o">>>></span> <span class="n">g</span><span class="o">.</span><span class="n">next</span><span class="p">()</span>
<span class="lineno"> 2</span> <span class="mi">1</span>
<span class="lineno"> 3</span> <span class="o">>>></span> <span class="n">g</span><span class="o">.</span><span class="n">next</span><span class="p">()</span>
<span class="lineno"> 4</span> <span class="mi">2</span>
<span class="lineno"> 5</span> <span class="o">>>></span> <span class="n">g</span><span class="o">.</span><span class="n">next</span><span class="p">()</span>
<span class="lineno"> 6</span> <span class="mi">3</span>
<span class="lineno"> 7</span> <span class="o">>>></span> <span class="n">g</span><span class="o">.</span><span class="n">next</span><span class="p">()</span>
<span class="lineno"> 8</span> <span class="n">Traceback</span> <span class="p">(</span><span class="n">most</span> <span class="n">recent</span> <span class="n">call</span> <span class="n">last</span><span class="p">):</span>
<span class="lineno"> 9</span>   <span class="n">File</span> <span class="s">"<stdin>"</span><span class="p">,</span> <span class="n">line</span> <span class="mi">1</span><span class="p">,</span> <span class="ow">in</span> <span class="o"><</span><span class="n">module</span><span class="o">></span>
<span class="lineno">10</span> <span class="ne">StopIteration</span>

It is important to note that instantiating a generator does not execute the body of the function. Only upon a call to next() (either explicit or implicit) will the generator execute. Each execution returns the next yield and freezes.

Let’s rewrite the Fibonacci series function from the previous article as a generator:

<span class="lineno"> 1</span> <span class="o">>>></span> <span class="k">def</span> <span class="nf">fib</span><span class="p">():</span>
<span class="lineno"> 2</span> <span class="o">...</span>     <span class="n">x</span> <span class="o">=</span> <span class="mi">0</span>
<span class="lineno"> 3</span> <span class="o">...</span>     <span class="n">y</span> <span class="o">=</span> <span class="mi">1</span>
<span class="lineno"> 4</span> <span class="o">...</span>     <span class="k">while</span> <span class="mi">1</span><span class="p">:</span>
<span class="lineno"> 5</span> <span class="o">...</span>         <span class="n">x</span><span class="p">,</span> <span class="n">y</span> <span class="o">=</span> <span class="n">y</span><span class="p">,</span> <span class="n">x</span> <span class="o">+</span> <span class="n">y</span>
<span class="lineno"> 6</span> <span class="o">...</span>         <span class="k">yield</span> <span class="n">x</span>
<span class="lineno"> 7</span> <span class="o">...</span> 
<span class="lineno"> 8</span> <span class="o">>>></span> <span class="n">g</span> <span class="o">=</span> <span class="n">fib</span><span class="p">()</span>
<span class="lineno"> 9</span> <span class="o">>>></span> <span class="n">g</span><span class="o">.</span><span class="n">next</span><span class="p">()</span>
<span class="lineno">10</span> <span class="mi">1</span>
<span class="lineno">11</span> <span class="o">>>></span> <span class="n">g</span><span class="o">.</span><span class="n">next</span><span class="p">()</span>
<span class="lineno">12</span> <span class="mi">1</span>
<span class="lineno">13</span> <span class="o">>>></span> <span class="n">g</span><span class="o">.</span><span class="n">next</span><span class="p">()</span>
<span class="lineno">14</span> <span class="mi">2</span>
<span class="lineno">15</span> <span class="o">>>></span> <span class="n">g</span><span class="o">.</span><span class="n">next</span><span class="p">()</span>
<span class="lineno">16</span> <span class="mi">3</span>
<span class="lineno">17</span> <span class="o">>>></span> <span class="n">g</span><span class="o">.</span><span class="n">next</span><span class="p">()</span>
<span class="lineno">18</span> <span class="mi">5</span>
<span class="lineno">19</span> <span class="o">>>></span> <span class="n">g</span><span class="o">.</span><span class="n">next</span><span class="p">()</span>
<span class="lineno">20</span> <span class="mi">8</span>
<span class="lineno">21</span> <span class="o">>>></span> <span class="n">g</span><span class="o">.</span><span class="n">next</span><span class="p">()</span>
<span class="lineno">22</span> <span class="mi">13</span>
<span class="lineno">23</span> <span class="o">>>></span> 

You can see that this generator will work for an any size list:

<span class="lineno"> 1</span> <span class="o">>>></span> <span class="n">g</span> <span class="o">=</span> <span class="n">fib</span><span class="p">()</span>
<span class="lineno"> 2</span> <span class="o">>>></span> <span class="k">for</span> <span class="n">i</span> <span class="ow">in</span> <span class="nb">range</span><span class="p">(</span><span class="mi">8</span><span class="p">):</span> <span class="k">print</span> <span class="n">g</span><span class="o">.</span><span class="n">next</span><span class="p">()</span>
<span class="lineno"> 3</span> <span class="o">...</span> 
<span class="lineno"> 4</span> <span class="mi">1</span>
<span class="lineno"> 5</span> <span class="mi">1</span>
<span class="lineno"> 6</span> <span class="mi">2</span>
<span class="lineno"> 7</span> <span class="mi">3</span>
<span class="lineno"> 8</span> <span class="mi">5</span>
<span class="lineno"> 9</span> <span class="mi">8</span>
<span class="lineno">10</span> <span class="mi">13</span>
<span class="lineno">11</span> <span class="mi">21</span>
<span class="lineno">12</span> <span class="o">>>></span> 

Use the Schwartz

Phil challenged me with a generator problem he uses on technical interviews. Here’s the problem:

You have a stream of values that need to be read from a text file. Each line of text contains an integer. The stream is too large to fit in memory all at once. Write a generator to process the stream of values and calculate their sum. Do not use the form of with open(filename) as f: in your solution.

Let’s assume the file data.txt list each integer from 1 through 99,999 as such:

1
2
3
4
5
...
99999

We can define a generator function that accepts an arbitrary filename, and yields each line individually:

<span class="lineno">1</span> <span class="n">In</span> <span class="p">[</span><span class="mi">1</span><span class="p">]:</span> <span class="k">def</span> <span class="nf">values</span><span class="p">(</span><span class="n">filename</span><span class="p">):</span>
<span class="lineno">2</span>    <span class="o">...</span><span class="p">:</span>     <span class="n">f</span> <span class="o">=</span> <span class="nb">open</span><span class="p">(</span><span class="n">filename</span><span class="p">,</span> <span class="s">'r'</span><span class="p">)</span>
<span class="lineno">3</span>    <span class="o">...</span><span class="p">:</span>     <span class="k">for</span> <span class="n">line</span> <span class="ow">in</span> <span class="n">f</span><span class="p">:</span>
<span class="lineno">4</span>    <span class="o">...</span><span class="p">:</span>         <span class="k">yield</span> <span class="nb">int</span><span class="p">(</span><span class="n">line</span><span class="p">)</span>
<span class="lineno">5</span>    <span class="o">...</span><span class="p">:</span>         

Next, we accumulate the sum from the file data.txt:

<span class="lineno">1</span> <span class="n">In</span> <span class="p">[</span><span class="mi">2</span><span class="p">]:</span> <span class="nb">sum</span> <span class="o">=</span> <span class="mi">0</span>
<span class="lineno">2</span> 
<span class="lineno">3</span> <span class="n">In</span> <span class="p">[</span><span class="mi">3</span><span class="p">]:</span> <span class="k">for</span> <span class="n">x</span> <span class="ow">in</span> <span class="n">values</span><span class="p">(</span><span class="s">'data.txt'</span><span class="p">):</span>
<span class="lineno">4</span>    <span class="o">...</span><span class="p">:</span>     <span class="nb">sum</span> <span class="o">+=</span> <span class="n">x</span>
<span class="lineno">5</span>    <span class="o">...</span><span class="p">:</span>     

And finally, print the sum:

<span class="lineno">1</span> <span class="n">In</span> <span class="p">[</span><span class="mi">4</span><span class="p">]:</span> <span class="k">print</span><span class="p">(</span><span class="nb">sum</span><span class="p">)</span>
<span class="lineno">2</span> <span class="mi">4999950000</span>

The power of generators here is that a constant amount of memory is utilized, no matter how large the data set.

Consider a more complex example, where we want to read a binary file of arbitrary size in chunks, and operate on those chunks:

<span class="lineno">1</span> <span class="k">def</span> <span class="nf">read_in_chunks</span><span class="p">(</span><span class="n">file_object</span><span class="p">,</span> <span class="n">chunk_size</span><span class="o">=</span><span class="mi">2048</span><span class="p">):</span>
<span class="lineno">2</span>     <span class="sd">"""Generator to read a file one chunk at a time.</span>
<span class="lineno">3</span> <span class="sd">    Default chunk size: 2KB"""</span>
<span class="lineno">4</span>     <span class="k">while</span> <span class="mi">1</span><span class="p">:</span>
<span class="lineno">5</span>         <span class="n">data</span> <span class="o">=</span> <span class="n">file_object</span><span class="o">.</span><span class="n">read</span><span class="p">(</span><span class="n">chunk_size</span><span class="p">,</span> <span class="s">'rb'</span><span class="p">)</span>
<span class="lineno">6</span>         <span class="k">if</span> <span class="ow">not</span> <span class="n">data</span><span class="p">:</span>
<span class="lineno">7</span>             <span class="k">break</span>
<span class="lineno">8</span>         <span class="k">yield</span> <span class="n">data</span>

The implementation of process_data() is completely arbitrary – it could send bytes over a socket connection, for example. Now we can call process_data() on each of these chunks.

<span class="lineno">1</span> <span class="n">f</span> <span class="o">=</span> <span class="nb">open</span><span class="p">(</span><span class="s">'alottadata.dat'</span><span class="p">)</span>
<span class="lineno">2</span> <span class="k">for</span> <span class="n">chunk</span> <span class="ow">in</span> <span class="n">read_in_chunks</span><span class="p">(</span><span class="n">f</span><span class="p">):</span>
<span class="lineno">3</span>     <span class="n">process_data</span><span class="p">(</span><span class="n">chunk</span><span class="p">)</span>

Generator Expressions

If you’re familiar with list comprehensions, then you already know how to use generator expressions! If not, please take a few minutes to read up. A generator expression represents a sequence of results without turning it into a concrete list.

Here is a familiar list comprehension:

<span class="lineno">1</span> <span class="n">In</span> <span class="p">[</span><span class="mi">1</span><span class="p">]:</span> <span class="p">[</span><span class="n">x</span> <span class="k">for</span> <span class="n">x</span> <span class="ow">in</span> <span class="p">(</span><span class="mi">1</span><span class="p">,</span> <span class="mi">2</span><span class="p">,</span> <span class="mi">3</span><span class="p">)]</span>
<span class="lineno">2</span> <span class="n">Out</span><span class="p">[</span><span class="mi">1</span><span class="p">]:</span> <span class="p">[</span><span class="mi">1</span><span class="p">,</span> <span class="mi">2</span><span class="p">,</span> <span class="mi">3</span><span class="p">]</span>

We can rewrite this as a generator expression with one simple syntactic change:

<span class="lineno">1</span> <span class="n">In</span> <span class="p">[</span><span class="mi">1</span><span class="p">]:</span> <span class="p">(</span><span class="n">x</span> <span class="k">for</span> <span class="n">x</span> <span class="ow">in</span> <span class="p">(</span><span class="mi">1</span><span class="p">,</span> <span class="mi">2</span><span class="p">,</span> <span class="mi">3</span><span class="p">))</span>
<span class="lineno">2</span> <span class="n">Out</span><span class="p">[</span><span class="mi">1</span><span class="p">]:</span> <span class="o"><</span><span class="n">generator</span> <span class="nb">object</span> <span class="o"><</span><span class="n">genexpr</span><span class="o">></span> <span class="n">at</span> <span class="mh">0x10351bf50</span><span class="o">></span>

Now, we can iterate as expected:

<span class="lineno"> 1</span> <span class="n">In</span> <span class="p">[</span><span class="mi">2</span><span class="p">]:</span> <span class="n">g</span><span class="o">.</span><span class="n">next</span><span class="p">()</span>
<span class="lineno"> 2</span> <span class="n">Out</span><span class="p">[</span><span class="mi">2</span><span class="p">]:</span> <span class="mi">1</span>
<span class="lineno"> 3</span> 
<span class="lineno"> 4</span> <span class="n">In</span> <span class="p">[</span><span class="mi">3</span><span class="p">]:</span> <span class="n">g</span><span class="o">.</span><span class="n">next</span><span class="p">()</span>
<span class="lineno"> 5</span> <span class="n">Out</span><span class="p">[</span><span class="mi">3</span><span class="p">]:</span> <span class="mi">2</span>
<span class="lineno"> 6</span> 
<span class="lineno"> 7</span> <span class="n">In</span> <span class="p">[</span><span class="mi">4</span><span class="p">]:</span> <span class="n">g</span><span class="o">.</span><span class="n">next</span><span class="p">()</span>
<span class="lineno"> 8</span> <span class="n">Out</span><span class="p">[</span><span class="mi">4</span><span class="p">]:</span> <span class="mi">3</span>
<span class="lineno"> 9</span> 
<span class="lineno">10</span> <span class="n">In</span> <span class="p">[</span><span class="mi">5</span><span class="p">]:</span> <span class="n">g</span><span class="o">.</span><span class="n">next</span><span class="p">()</span>
<span class="lineno">11</span> <span class="o">---------------------------------------------------------------------------</span>
<span class="lineno">12</span> <span class="ne">StopIteration</span>                             <span class="n">Traceback</span> <span class="p">(</span><span class="n">most</span> <span class="n">recent</span> <span class="n">call</span> <span class="n">last</span><span class="p">)</span>
<span class="lineno">13</span> <span class="o"><</span><span class="n">ipython</span><span class="o">-</span><span class="nb">input</span><span class="o">-</span><span class="mi">16</span><span class="o">-</span><span class="n">d7e53364a9a7</span><span class="o">></span> <span class="ow">in</span> <span class="o"><</span><span class="n">module</span><span class="o">></span><span class="p">()</span>
<span class="lineno">14</span> <span class="o">----></span> <span class="mi">1</span> <span class="n">g</span><span class="o">.</span><span class="n">next</span><span class="p">()</span>
<span class="lineno">15</span> 
<span class="lineno">16</span> <span class="ne">StopIteration</span><span class="p">:</span> 

Generator expressions can be use in place of generator functions. For example:

<span class="lineno">1</span> <span class="nb">sum</span><span class="p">(</span><span class="n">x</span><span class="o">**</span><span class="mi">2</span> <span class="k">for</span> <span class="n">x</span> <span class="ow">in</span> <span class="nb">range</span><span class="p">(</span><span class="mi">1</span><span class="p">,</span> <span class="mi">11</span><span class="p">))</span>

This calls the built-in function sum() with as its argument a generator expression that yields the squares of the numbers from 1 through 10 inclusive. The sum() function adds up the values in its argument resulting in an answer of 385. The advantage over sum([x**2 for x in range(1, 11)]) should be obvious. The latter creates a list containing all the squares, which is then iterated over once before it is thrown away. For large collections, these savings in memory usage are an important consideration.

Conclusion

You can see how generators can be used for lazy evaluation and for calculating large sets of results. Use generators to minimize memory allocation, and allows the caller to get started processing the first few values immediately. In short, a generator looks like a function but behaves like an iterator.

Please read on to Introduction to Python Decorators.


Comments

comments powered by Disqus