03:30 pm, 31 Oct 06
lazy lists as iterators
I think I've posted about this before, but lazy lists make a great iterator pattern. I keep writing code where I wish I could use them.
To be honest, laziness to me always seemed like a sorta unnecessary language feature. As with every language feature, there are always a couple toy examples floating around where it could be cool, but in general it never seemed like something worth designing a practical language around. (Haskell was designed around laziness for a bunch of valid and interesting reasons, but Haskell is also intentionally weird.)
But consider an iterator: you need a way to retrieve the current value, a way to advance, and a way to signal you're done. And then note how similar that is to a lazy list: you have a cons of the current value and a pointer to the code to advance, and the "done" signal is the end of list.
Other languages have a variety of approaches that always felt inelegant:
To make this concrete, imagine I have some tree datatype that supports a preorder traversal. With a lazy list, the traversal is a function from trees to lists of nodes:
Here are some compare-and-contrast patterns that I experience in other languages.
This also shows what you lose, and what I've left out of these examples: the control over when the actual iteration code runs. If, for example, iterating to the next element actually involved a network request, these lazy lists don't make explicit whether you pay all the processing up-front or if you pay as you go. (Since in Haskell you do make "when IO happens" explicit, if these examples involved IO they would make this explicit but also would be more complicated than how I've written them above. But if they were simply computationally expensive, instead of IO-based, then you basically* can't control it.)
* You can, but it's hacky.
To be honest, laziness to me always seemed like a sorta unnecessary language feature. As with every language feature, there are always a couple toy examples floating around where it could be cool, but in general it never seemed like something worth designing a practical language around. (Haskell was designed around laziness for a bunch of valid and interesting reasons, but Haskell is also intentionally weird.)
But consider an iterator: you need a way to retrieve the current value, a way to advance, and a way to signal you're done. And then note how similar that is to a lazy list: you have a cons of the current value and a pointer to the code to advance, and the "done" signal is the end of list.
Other languages have a variety of approaches that always felt inelegant:
- Python and friends use an exception to indicate end-of-iteration, which to me crosses past inelegance to plain "hack".
- Many languages have special syntax to support iteration, via both a
yieldkeyword as well asforeach-style looping, and these loops usually end up not supporting some useful but rare special case. - Ruby's blocks mean that your control flow is inverted -- there's no way to say in the middle of your loop "in this case, skip the next element".
To make this concrete, imagine I have some tree datatype that supports a preorder traversal. With a lazy list, the traversal is a function from trees to lists of nodes:
preorder :: Tree -> [Node]Here are some compare-and-contrast patterns that I experience in other languages.
- Typical iteration operations are done with your existing library of list-processing functions.
For example, to print each element in the preorder (for e in tree.preorder(): print e) you can usemap:map print (preorder tree) - What if I want to get a list back instead of an iterator?
In Python you'll write code like:elems = []; for e in tree.preorder(): elems.append(e)
in Ruby you'll do something similar or need to have preorder support either a block or returning an array. But when your preorder returns a list anyway, this is transparently supported:elems = preorder tree - What if I want each index as I go? This is common enough in Ruby that there's an
each_with_indexmethod, but I always end up wanting amap_with_indexand other variants. With lazy lists, you again can use the existing list-munging functions. Code like:zip [0..] (preorder tree)
gives you a list (or equivalently an iterator) of pairs of(index, element). - For more complicated control flow, like skipping certain elements, you can fall back on referring to the iterator explicitly like you do in C++ or Python:
process (preorder tree) where process [] -> ... process (x:xs) -> if x == 3 then do ...; process (tail xs) -- skip one else do ...; process xs
preorder tree means "tree when viewed in preorder". And since these are functions instead of data types, instead of iteration being this concrete step-by-step process you instead can combine different "views" together: for example, you could write an "everyOther" function that pulls out every other element from a list and then use that as just another piece in a chain of iterators. A reverse iterator over every other element from a tree with the elements numbered in preorder is then justreverse (everyOther (zip [0..] (preorder tree))).This also shows what you lose, and what I've left out of these examples: the control over when the actual iteration code runs. If, for example, iterating to the next element actually involved a network request, these lazy lists don't make explicit whether you pay all the processing up-front or if you pay as you go. (Since in Haskell you do make "when IO happens" explicit, if these examples involved IO they would make this explicit but also would be more complicated than how I've written them above. But if they were simply computationally expensive, instead of IO-based, then you basically* can't control it.)
* You can, but it's hacky.
In Ruby, I think it's pretty common to be able get both, e.g. Hash#each/to_a, Hash#each_key/keys, Hash#each_value/values. The Enumerable module/mixin provides to_a/entries, assuming you provide each.
Python generators and Ruby continuations should allow emulation of lazy lists, right? It needs a library, I suppose... Maybe I'm missing the point, though.
or write a generator expression.
3. python2.3+ has enumerate for that: for idx, e in enumerate(tree.preorder()): dostuff(...)
And as Nik mentions, generators in python are lazy evaluation and are a core language feature since python2.4 (but maybe that's besides the point?).
Lua's approach to iterators is to have a function that returns a function (a closure); it then calls that second function repeatedly until it returns
nil. I quite like this pattern because it bundles the iteration code and the state up together in a nice package:function zero_to(n) local a = -1 return function () a = a + 1 if a > n then return nil end return a end endBut then, I'm well known as being overly fond of closures. I'm often told I use closures too much — especially by people that don't understand them :). If you combine it with Lua's coroutines you can end up with a generator pattern (with all the
yeilding) if you like that sort of thing, though this is a feature of the runtime library and not of the language itself.I love the idea of lazy lists because it always seemed like such a hack to expose to the programmer the distinction between an iterator and a list. Of course, you describe in your last paragraph why it's often desirable to have it explicit.
You can't. Lua's lists are actually associative arrays (though behind the scenes an indexed array is created if the indexes look “list-like”), and assigning
nilto an element deletes it, much like assigningundefto a hashtable element in Perl. But yes, I do think it'd be nicer if they made a distinction between the “end of list” marker and a null element; I guess JavaScript has bothundefinedandnullto handle situations like this, but the distinction is a little blurry.I suppose I should clarify that while you can't actually have actual
nils in your list, any missing element behaves as if it containsnil. (The “array part” of the table type's storage does have gaps for missing indices, but that's an implementation detail not reflected in the language.)This means when you're doing index-based iteration using the
ipairsiterator, you will encounternils for the unused indexes in a non-contiguous array. The cheat here is that this iterator actually returns (key,value) pairs, and the key is never nil (because we're iterating over the numeric indices) until the list ends. Lua functions can return multiple values as in Perl, but theforloop only looks for anilin the first.Now, there may be a cleaner way of doing this, but I couldn't find it after a few hours(!) of looking (I'd be happy if someone told me about it!), and I simply wasn't going to build a copy of the whole structure except the uninteresting pieces. Why? After all, if this were Perl,
@newlist = grep { some_predicate($_) } @oldonewould likely have been what I'd just automatically write, right? Well, sure, but that takes zero effort to program even if it isn't 100% efficient, and in C++ you need to fucking think about copy constructors.I think this is in fact a great example for why functional programming and indeed lazy evaluation rock, and it's easily understandable for non-Haskell programmers. The Perl grep solution was perfectly natural but inefficient, here's a technology that makes it efficient.* True, if they're really concerned about understanding how to ensure stuff is fast everywhere they'll have to spend some time e.g. reading about fusion. But since C++ is "like pulling teeth" to quote someone I know <g>, I've seen people writing things in it that are obviously inefficient like pass a copy of a string to a function because it was inconvenient to pass it by reference.. you end up with code as slow as that of a dynamic programming language only a million times buggier, longer and unpleasant.
Okay, I'm done preaching to the choir :-)
ObPerl6:
You have lazy lists in Perl 6, so it's efficient there. :-)
Also, if you want an index out of an array,
.kvdoes what you want:for @list.kv -> $index, $value { ... }There's also gather/take, which offers another idiom for this.
@new <== gather { for @old.kv -> $i, $v { take $k if $v == 3; } }Also,.kvworks with maps too (and, of course, with hashes, where it gets its name from).* Well, you still have to pay the price of having thunky lists in the first place, but let's say that's one of those things like garbage collection that if someone blankly refuses to pay for, ever, well, I think I'm done talking with them.
(Or its relative remove_if, which takes a function. Which is actually usually useless because you don't have lambda and you can't use a method either.)
I suppose your Perl5 example makes a copy of the list too though, right?
*
@new = grep { .some_predicate } @oldwill of course copy; but@list .= grep { .some_predicate }should not.I don't have enough real world Java experience to compare against it in other iteration uses though.
Python
Nuh-uh, no fucking way, in Python you'll write code like
elems = list(tree.preorder()). I love both Haskell and Python, but for god's sake at least try to know what you're talking about before criticizing itAnd Python has the
enumeratefunction which goes exactly that (give it an enumerable and it returns a generator of(index, element)pairs), and you can recreate it with a combination ofitertools.countanditertools.iziporzipe.g.zip(count(), ["foo","bar","baz"])returns[(0, 'foo'), (1, 'bar'), (2, 'baz')](izip does something similar but it returns a generator, soizip(count(), ["foo","bar","baz"])perfectly replicates the behaviour ofenumerate(["foo","bar","baz"])Uh what? Since when does anyone explicitely refer to iterator objects in Python?
reversed(filter(everyOther, enumerate(tree.preorder()))) # you'll note that my everyOther function here is merely a predicate, it doesn't do the filtering even though you could trivially create one that does it.Or, to look more like the Haskell code,
reversed(filter(everyOther, zip(count(), tree.preorder())))Re: Python
I picked Python because I figured more readers were familiar with it.#2 at least applies to Ruby (which lacks generators).
#4 I just meant C++.
Re: Python
Yes and no, it really depends of your interface here. As you mention it, tree.preorder would either have to return a list or support blocks. The later would probably the path chosen as most Ruby construct can be iterated over because they're Enumerables (http://www.ruby-doc.org/core/classes/Enumerable.html), and it'd be perfectly logic (in Rubyland) to make tree.preorder return an Enumerable.
Therefore getting an Array (http://www.ruby-doc.org/core/classes/Array.html) out of preorder would look pretty much like this:
(note: you could also use
mapwith anidblock e.g.{|a| a}, but sinceto_ais part of the Enumerable mixin (http://www.ruby-doc.org/core/classes/Enumerable.html#method-M003153) we may as well use it)Well ok then, but you still wrote Python (maybe you were thinking of Java pre 1.5?)
Laziness is an evaluation order
You do have control over when code runs under lazy evaluation. All that lazy evaluation means is that code is evaluated outermost-first, and parameters to functions which are duplicated in the body of the function are shared between copies. For instance, suppose we have the following definition:twice x = x + x
Then, under strict (innermost-first) evaluation:
twice (twice 5)
= twice (5 + 5)
= twice 10
= 10 + 10
= 20
Under normal-order (outermost-first) evaluation:
twice (twice 5)
= twice 5 + twice 5
= (5 + 5) + twice 5
= 10 + twice 5
= 10 + (5 + 5)
= 10 + 10
= 20
This wasted work by duplicating the evaluation of twice 5. Lazy evaluation prevents this from happening.
Under lazy evaluation (using let-expressions to identify sharing):
twice (twice 5)
= let x = twice 5 in x + x
= let x = 5 + 5 in x + x
= let x = 10 in x + x
= 20
Or, without let, making the sharing implicit:
twice (twice 5)
= twice 5 + twice 5
= (5 + 5) + (5 + 5)
= 10 + 10
= 20
If evaluation of expressions is allowed to cause I/O to occur, then under lazy evaluation, those effects will naturally occur when the values are actually demanded, and their evaluation happens. So if you were to stick a bunch of side-effecting expressions into a lazy list as elements, the side effects would occur when those values themselves were demanded. If you put them in at the cons-cell level, then simply building the list beyond that point would cause the I/O to occur.
There are some actual applications for this. For instance, in the Haskell Prelude, there are primitives for lazy file input, which create strings (lists of characters) that read the file as they are demanded. The nil at the end of the list has IO attached to close the file handle. This obviously can have some problems if the file on disk changes in the middle of evaluating the string, but generally works fairly well.
Of course, by default, Haskell provides you with no way to create such effects, but it can be done (if you know what you're doing), with unsafePerformIO and unsafeInterleaveIO. It's generally not the best idea, as it can interact in strange ways with compiler optimisations which assume code is pure, and so you have to be careful (usually disabling inlining of any function which uses it is a good idea).
The Haskell specification actually doesn't say that it must be lazily evaluated, but all the implementations use lazy evaluation for the most part. It actually just says that it has to have nonstrict semantics, which means essentially that it has to have the same termination behaviour as lazy evaluation, but otherwise an implementation is permitted to do whatever it wants as far as evaluation order is concerned. (It just can't be strict, because that results in nontermination when it shouldn't.) It would be perfectly okay, though absolutely bizarre, to alternate innermost and outermost evaluation steps. Also, compilers may rearrange code in various ways to improve performance, and end up changing the evaluation order somewhat from what was written (while keeping the results and termination behaviour the same of course). This is a good reason to avoid abusing unsafePerformIO unless you really know that it's okay.