This is one of the posts in a series of introductory Python articles. These posts explain programming principles in a simplified way and show very basic Python code that should help folks learning Python to get a better understanding of some concepts. In this post, I’ll share some notes about writing and working with generators in Python that beginners may find useful.

Defining generators

At this point, you must be already with familiar with an iterator concept. Unfortunately, the terminology landscape is really confusing because there are iterables and iterators. Iterators are also iterable (because you can iterate them in a for loop), but iterator is not iterable (they return individual items with the next() call). And the terms iterator and generator are used as synonyms in the documentation most of the time.

Iterable (items):

items = [1,2,3]
for item in items:
    print(item)

Iterator (producer):

producer = iter([1,2,3])  # type: list_iterator
next(producer)
# 1
next(producer)
# 2
next(producer)
# 3
next(producer)
# Traceback (most recent call last):
# ...
#    next(producer)
# StopIteration

The key difference between a “normal” function and a generator function is that a function returns a value whereas a generator function returns a generator. The returned generator can then produce (aka yield) values.

Function:

def produce():
    return [1,2,3]


produce()
# [1, 2, 3] 

Generator function:

def produce_generator():
    for item in [1,2,3]:
        yield item

produce_generator()
# <generator object produce_generator at 0x0000020284925660>

[i for i in produce_generator()]
# [1, 2, 3]

for i in produce_generator():
    print(i)

# 1
# 2
# 3

If you have used a list comprehension before, e.g. [i*i for i in (1,2,3)], then understanding a generator expression would be easy since this is just a lazy version of a list comprehension – the items are generated on demand.

lazy_comp = (i for i in [1,2,3])
lazy_comp
# <generator object <genexpr> at 0x0000020284A4D740>

next(lazy_comp)
# 1
next(lazy_comp)
# 2
next(lazy_comp)
# 3
next(lazy_comp)
# Traceback (most recent call last):
# ...
#    next(lazy_comp)
# StopIteration

Building generators

Generators (or generator functions) will become handy whenever you want to be able to produce values when it’s required. Keep in mind that you can pass generators as function arguments since they “remember” their state and at what value they have stopped after the last request.

Here you can see how we pass a generator to a function after first getting a few values from it. It picks up where it was left!

def produce_generator():
    for item in [1,2,3]:
        yield item


def resume(gen):
    return next(gen)


generator = produce_generator()
next(generator)
# 1
next(generator)
# 2
resume(generator)
# 3

When using the built-in iter() to create an iterator, it’s helpful to remember that iter takes a sentinel value which will tell the iterator to terminate when this value is produced.

In this example, we would like to obtain values from a generator until we get a certain value (sentinel value) and then we shall stop. Because the iter needs a callable for the first argument, we use lambda to delay the execution.

def progress(val):    
    while True:
        val = val * 2
        yield val

producer = progress(2)
for i in iter(lambda: next(producer), 128):
    print(i)

# 4
# 8
# 16
# 32
# 64

Using a lambda in this context is fairly common. Here lambda is used in a generator expression to prevent evaluating too early:

def download(url):
    print(f"Downloading from {url}")


callers = (
    lambda: download(url) for url in ["foo", "bar", "baz"]
)
print(callers)
# <generator object <genexpr> at 0x0000020284A4DDD0>

for caller in callers:
    caller()

# Downloading from foo
# Downloading from bar
# Downloading from baz

Building an infinite generator

The itertools module has a variety of useful generators. Attempting to re-implement some of them is a great exercise that will help you getting a deeper understanding of how generators work. Let’s work on an infinite cycle generator that keeps producing the values from an iterable infinitely:

def cycle(iterable):
    while True:
        for item in iterable:
            yield item

cycler = iter(cycle([1,2,3]))
next(cycler)
# 1
next(cycler)
# 2
next(cycler)
# 3
next(cycler)
# 1
next(cycler)
# 2
next(cycler)
# 3
# ...

This works, but this implementation is built on an assumption that user is going to create an iterator using the built-in iter(). Let’s make this function return an iterator instead for a neater interface. For this, we can wrap the generator function inside another function:

def cycle(iterable):
    def inf(iterable):
        while True:
            for item in iterable:
                yield item
    return iter(inf(iterable))


cycler = cycle([1,2,3])
next(cycler)
# 1
next(cycler)
# 2
next(cycler)
# 3
next(cycler)
# 1
next(cycler)
# 2
next(cycler)
# 3
# ...

Delegating to a subgenerator: yield from

The yield from (PEP 380) provides another way to work with generators.

A standard approach for yielding values from a collection of containers is usually written as below:

def flatten(*items):
    for item in items:
        for value in item:
            yield value

[i for i in flatten([10,20,30], [40,50,60])]
# [10, 20, 30, 40, 50, 60]

It can be rewritten as:

def flatten(*items):
    for item in items:
        yield from item

[i for i in flatten([10,20,30], [40,50,60])]
# [10, 20, 30, 40, 50, 60]

In this example, this is just a bit nicer. The yield from can be used in coroutines which we are not going to talk about in this post. Note, however, that support for generator-based coroutines is deprecated and is removed in Python 3.11.

Another practical example of the delegation is when you need to yield from a generator within a generator itself.

Using yield from to yield from a generator

Let’s write a simple function that will produce paths of files within a given a directory, recursively. The pathlib.Path.rglob() would already do that for us, but we’d like to learn how yield from construct can be applied!

def walk(path: Path):
  for subpath in path.glob("*"):
    if subpath.is_dir():
      yield walk(subpath)
    else:
      yield subpath

walker = walk(Path("/etc/apt"))
next(walker)
# PosixPath('/etc/apt/sources.list')
# ...
next(walker)
<generator object walk at 0x7f2ac323e430>

While iterating through items we find, if this is a file, then it’s all good, but if we hit a directory, there’s a problem. We cannot use yield walk(subpath) because yield would produce a generator (calling walk() returns a generator object). We therefore have to say that we would like to “step into” the generator and get an actual value this generator produces:

def walk(path: Path):
  for subpath in path.glob("*"):
    if subpath.is_dir():
      yield from walk(subpath)
    else:
      yield subpath


walker = walk(Path("/etc/apt"))
next(walker)
# PosixPath('/etc/apt/sources.list')
next(walker)
# PosixPath('/etc/apt/sources.d/sources.list.distUpgrade')

I believe this example is very helpful to understand why you’d need the yield from construct.

Using yield, yield from, and return in the same function

It’s possible to use the return statement within a generator function. When return is hit, the function actually returns a value and stops:

def get(iterable):
    for item in iterable:
        if item > 2:
            return
        yield item


getter = get([1,2,3,4])
for item in getter:
    print(item)

1
2

Now, let’s write a function that will return a flattened version of an iterable up to a certain value.

from typing import List


def search(haystack: List[int], needle: int):
    """Find a needle in a haystack."""
    for item in haystack:
        if isinstance(item, list):
            print(f"Looking at: {item}")
            yield from search(item, needle)
        else:
            if needle == item:
                yield item
                print(f"found {item}")
                return
            else:
                print(f"Looking at: {item}")
                yield item


assert list(
    search(haystack=[1,[2,[3,4,5,6]]], needle=4)
    ) == [1,2,3,4]

# Looking at: 1
# Looking at: [2, [3, 4, 5, 6]]
# Looking at: 2
# Looking at: [3, 4, 5, 6]
# Looking at: 3
# found 4

When iterating over the haystack, if we get an iterable (a list in this particular case), then we want to yield from the iterable, recursively. If the value we found is our needle, then we have to return and terminate immediately. Even though our return statement was called deep inside a recursive yield from call, the program stopped there.


If you’d like to follow up, you may want to learn about the methods of a generator iterator which can be used to control the execution of a generator function and later on proceed to the Asynchronous generator functions.

Happy generating!


Published

Category

python-basics

Tags