01:20 am, 11 Aug 06
how i/o can work in a purely functional language
Without using fancy jargon or theory, here's an attempt to explain how you can do "impure" operations (like deleting a file, or updating an array) in a purely functional language. It's actually pretty simple; once you get it, it may be a bit of a let-down. But I need to get this out of the way for some later posts to make sense.
Why would you want such a thing?
The promise of a "pure" language is that a given "variable" will never change. (The word "variable" is a little misleading in that case. People sometimes call them "bindings" instead.) So once you write "x = 3", x is fixed, and that's it. This also implies that functions never change: if
There's a few reasons you might like a language like this. For example, some argue it makes programs easier to reason about, as there are fewer moving parts. Others point out that part of the optimization process done by compilers for "normal" languages is to translate code that modifies variables into code where each mutation instead makes a new variable (you can read more about this on the Static single assignment wikipedia article, which says that GCC 4 uses this "extensively").
Another way to look at that is that mutation and history are two sides of the same coin: in a program where things don't change, there is no difference* between "before" or "after" a given statement. This means that the compiler is free to compute things in the order it thinks is best. (C geeks, if you haven't seen it before, you'll be interested to read the argument Threads cannot be implemented as a library, which explains how some natural-seeming compiler optimizations fail in the presence of threading.)
That comment about threading is relevant, too: in a language without mutation, any little subset of code is naturally independent from what the other code is doing, so pure languages naturally parallelize well.
You still need mutation
But in the practical world you do still need to change stuff sometimes, so pure languages still need to support this. So here's what you can do. (First, a disclaimer: this is just roughly how Haskell does it. Other languages have other approaches. But I'll use a more traditional syntax just so the Haskell thing doesn't get in the way.)
Imagine you're given two objects called
Here's my program that prints the letter A:
A real program will want to do more than one thing, of course. So we could modify our language so that main is instead expected to be a list of operations. And we might want to actually use some of our other bindings, too:
This one prints "BBA".
From here you can even build things that feel sorta like subroutines, too. Consider this binding:
This is the place where people get tripped up, so be careful:
Using the same reasoning, we could define a more basic function
The final piece is input. Let's imagine were were given one final primitive operation, called readInt. Again, you could imagine that if "readInt()" were legal it'd read some input from stdin and return an integer, but again that is an effect and is not allowed. Using the same trick as before we can instead talk about the operation
Then we rejigger our language slightly again.
This program prints the letter A, then waits for you to input an integer. If that integer is is 0, it prints another A, otherwise it prints a B. And then, finally it prints a B. But the crazy thing about it is that it is still completely pure: that is,
And that's it. You've seen conditional control flow involving I/O, but the language is still pure. Do you feel a bit cheated? (I sorta did when I first learned this -- it's like they just sidestepped the whole issue.)
Making it more pleasant
You can't pleasantly write programs with the language I outlined above, but it has all the major concepts. Let's define it in a conceptually equivalent way that makes it a bit more useful.
First, make up a syntax for anonymous functions: uh, ruby-style, I guess. With that syntax, here are two equivalent definitions of foo:
(The bit between the pipes are the arguments to the function.) So here's the same
Then let's define a magical operator "=>" that sequences two operations, feeding one into a function that produces the next and giving us back the whole thing as a single operation. So an operation to print "AB" would be
and an operation to print ABAB would be
and that same main again would be
More usefully, to ask for input you might define a function like:
which you could then use in a program like this:
Some interesting consequences
What we've actually done is turned control flow into something that can talked about in the same way we talk about data. In the land of pure data we never needed a "sequence" operator because we never needed to say "do this first, and then this other thing second". But just with the addition of one operator we can reuse all of the other tools we have are still available. To sum a list of integers, you stick a plus sign between each element in the list. To run a list of operations in order, you stick a => between each element. You can use the same code to define both.
And we can even define our own control flow operators. Here's a function that repeats an operation until the user enters zero, and an example use of it:
But through all of this we're still consistent about the difference between pure computation and side effects: once you're pulled in an operation in some code, there's no way for that code to do anything useful with it without returning something that includes that operation. Since we've made the sequencing of dependent operations explicit, it's clear which bits can be rearranged (as before, all of the code) and which can't (you wouldn't move stuff around the => any more than you'd rearrange the elements of a list).
This finally puts us at a place for my next post, which will build from here to monads, which extend this idea in all sorts of fascinating ways.
Thanks for reading
Was that too fast or too slow? I'm never sure which parts are obvious and which aren't.
* Surely there is a difference, as running the code must do something, after all. But the difference is computational time, which isn't observable by the code -- when I write "x = 3" and then "y = 2", there's no mention in the code of how much time or processor is spent between steps one and two.
Why would you want such a thing?
The promise of a "pure" language is that a given "variable" will never change. (The word "variable" is a little misleading in that case. People sometimes call them "bindings" instead.) So once you write "x = 3", x is fixed, and that's it. This also implies that functions never change: if
foo(3) gives you 10, then foo(3) will always be 10. (People sometimes use the verb "mutate" to describe the process of changing variables -- it makes sense in contrast with the word "immutable".)There's a few reasons you might like a language like this. For example, some argue it makes programs easier to reason about, as there are fewer moving parts. Others point out that part of the optimization process done by compilers for "normal" languages is to translate code that modifies variables into code where each mutation instead makes a new variable (you can read more about this on the Static single assignment wikipedia article, which says that GCC 4 uses this "extensively").
Another way to look at that is that mutation and history are two sides of the same coin: in a program where things don't change, there is no difference* between "before" or "after" a given statement. This means that the compiler is free to compute things in the order it thinks is best. (C geeks, if you haven't seen it before, you'll be interested to read the argument Threads cannot be implemented as a library, which explains how some natural-seeming compiler optimizations fail in the presence of threading.)
That comment about threading is relevant, too: in a language without mutation, any little subset of code is naturally independent from what the other code is doing, so pure languages naturally parallelize well.
You still need mutation
But in the practical world you do still need to change stuff sometimes, so pure languages still need to support this. So here's what you can do. (First, a disclaimer: this is just roughly how Haskell does it. Other languages have other approaches. But I'll use a more traditional syntax just so the Haskell thing doesn't get in the way.)
Imagine you're given two objects called
printA and printB and are told that they represent outputting the characters A and B. You can't actually call them as functions in your code -- "printA()" would be an observable effect, after all. But what your code can do is talk about printA itself (note without the parens now), which refers to the the operation of printing the letter A. So you define your language such that the binding called "main" must be at one of these operations, and you're halfway there.Here's my program that prints the letter A:
// myint is an integer myint = 3 // foo is a function from ints to ints foo(x) = x + 7 // main is an operation main = printA(The compiler can drop the myint and foo bits 'cause they're not actually used.)
A real program will want to do more than one thing, of course. So we could modify our language so that main is instead expected to be a list of operations. And we might want to actually use some of our other bindings, too:
main = [printB, (foo(myint) == 5) ? printA : printB, printA]This one prints "BBA".
From here you can even build things that feel sorta like subroutines, too. Consider this binding:
aOrB(x) = (x == 0) ? printA : printBThis is the place where people get tripped up, so be careful:
aOrB is a function. But when you call aOrB(3), it doesn't print anything. Instead, you get back an operation -- either printA or printB -- and you can then stick that operation in your to-do list that is main.Using the same reasoning, we could define a more basic function
print(x) that gives you back an operation that would print out x. Again, be careful here: print("hello") doesn't print anything, it just gives you back the operation that would print "hello". See how it's a pure function? Passing "hello" to this print doesn't change anything and you always get back the exact same thing.The final piece is input. Let's imagine were were given one final primitive operation, called readInt. Again, you could imagine that if "readInt()" were legal it'd read some input from stdin and return an integer, but again that is an effect and is not allowed. Using the same trick as before we can instead talk about the operation
readInt, which, if it were run, would produce an integer.Then we rejigger our language slightly again.
main is still a list of operations, but we declare that if any operation in the list is the sort that produces a value (like readInt), that this value is fed into the next thing in the list to get the next operation. Here's a new program:main = [printA, readInt, aOrB, printB]This program prints the letter A, then waits for you to input an integer. If that integer is is 0, it prints another A, otherwise it prints a B. And then, finally it prints a B. But the crazy thing about it is that it is still completely pure: that is,
main is defined to have a single constant value. (You'll even note that with the same inputs, in the sense of what the operations do, it will always have the same outputs.)And that's it. You've seen conditional control flow involving I/O, but the language is still pure. Do you feel a bit cheated? (I sorta did when I first learned this -- it's like they just sidestepped the whole issue.)
Making it more pleasant
You can't pleasantly write programs with the language I outlined above, but it has all the major concepts. Let's define it in a conceptually equivalent way that makes it a bit more useful.
First, make up a syntax for anonymous functions: uh, ruby-style, I guess. With that syntax, here are two equivalent definitions of foo:
foo(n) = n + 7
foo = {|n| n + 7 }(The bit between the pipes are the arguments to the function.) So here's the same
main program again:main = [printA, readInt, {|x| (x == 0) ? printA : printB }, printB]Then let's define a magical operator "=>" that sequences two operations, feeding one into a function that produces the next and giving us back the whole thing as a single operation. So an operation to print "AB" would be
printAA = (printA => {printB})and an operation to print ABAB would be
printAAAA = (printAB => {printAB})and that same main again would be
main = printA => { readInt => {|x| ((x == 0) ?
printA : printB) => { printB } } }More usefully, to ask for input you might define a function like:
input(prompt) = print(prompt + ": ") => { readInt }which you could then use in a program like this:
main = input("enter a number") => {|num| print(num+num) }Some interesting consequences
What we've actually done is turned control flow into something that can talked about in the same way we talk about data. In the land of pure data we never needed a "sequence" operator because we never needed to say "do this first, and then this other thing second". But just with the addition of one operator we can reuse all of the other tools we have are still available. To sum a list of integers, you stick a plus sign between each element in the list. To run a list of operations in order, you stick a => between each element. You can use the same code to define both.
And we can even define our own control flow operators. Here's a function that repeats an operation until the user enters zero, and an example use of it:
repeatUntilZero(op) = op => { readInt =>
{|x| (x == 0) ? 0 : repeatUntilZero(op) } }
main = repeatUntilZero(print("enter a zero:"))But through all of this we're still consistent about the difference between pure computation and side effects: once you're pulled in an operation in some code, there's no way for that code to do anything useful with it without returning something that includes that operation. Since we've made the sequencing of dependent operations explicit, it's clear which bits can be rearranged (as before, all of the code) and which can't (you wouldn't move stuff around the => any more than you'd rearrange the elements of a list).
This finally puts us at a place for my next post, which will build from here to monads, which extend this idea in all sorts of fascinating ways.
Thanks for reading
Was that too fast or too slow? I'm never sure which parts are obvious and which aren't.
* Surely there is a difference, as running the code must do something, after all. But the difference is computational time, which isn't observable by the code -- when I write "x = 3" and then "y = 2", there's no mention in the code of how much time or processor is spent between steps one and two.
Otherwise, for people who haven't come across CPS before, your syntax is clearer than the standard CPS syntax of just having a function at the end of the argument list.
Naturally independent, but not entirely independent. Many things depend on the sequence of operation in annoying ways. After reading up a bit about Haskell (as a purely functional language) I had wondered if anyone had implemented automatic parallelism or if it was possible. I know there's a couple of Haskell parallel compilers, but they all require sequencing operations and parallel indicators.
I tried finding some information and this paper is interesting, but a bit bare. Haskell mailing lists seemed mostly inconclusive ("I thought people had given up on this years ago!" to "Yes, very much possible..we did it! But you can't see it..") An interesting point that the linked paper pointed out is that automatic paralleism happens a tad in normal imperative languages on modern processors..
I don't know much about automatic parallelism, but my intuition is that it's a bit scary. It would seem hard for the compiler to know which parts benefit from it, and whether the gain offsets the overhead.
observable side-effects seems to be wash.
I would have though it referred to side-effects in the state of your
program. However, printing characters on a screen doesn't really
affect program state does it?
You could make the argument that you're altering the stdout buffer as
a side-effect, but if you want to go that route, then you could argue
that even f(x) = x + 1 will alter the machine's instruction counter
and registers.
So, At what point do you draw the line?
One reason to call printing an observable effect is that if you reorder two print statements (or if you reorder input statements, for example) your program is "different", while if you reorder 1+(2+3) into (1+2)+3 it is the "same". This is mostly an intuitive thing, though.
Another approach is to draw the line with "everything observable by your program and its interaction with the outside world" on one side and "stuff used internally by the system to run your program" on the other. In the footnote of the post I pointed out that spending time is another "effect" of sorts that is caused by effect-free code. But actually reading the system timer falls in the "interacting with the outside world" category. So from the effect-free code's perspective, there is no way for it to discover it is spending time (or modifying registers, etc.), so from its perspective it has no effect.
Regarding affecting the state of your program:
You could imagine a function
storeValue(name,val)
that gives you back an operation that associates name with val. And then a similar fetchValue.
Then you could write a main like
main = storeValue("num", 2) => { fetchValue("num") => { |v| print(v + 3) } }And in this program, that "num" really is changing. Depending on when the readValue operation happens, you can get different values out of it. Since you're explicitly using the sequencing operator you've defined when it ought to happen: before the stuff on the left and after the stuff on the right.
(This is actually how it works in Haskell, though the syntax is nicer. And you rarely need to use it.)
I find the first argument of all the most convincing: pure code is generally easier to reason about. Like, in the formal sense: there are pen-and-paper (and logical-framework) tools for reasoning about programs that don't work for impure values.
I actually find most of the remaining enthusiasm about functional style unconvincing, especially the sort of sentiment expressed in Backus' lecture that you posted a few days ago. Trying to "liberate" computing from von Neumann style sounds to me like trying to "liberate" physics from linear algebra. It's just not appropriate: von Neumann style is a "sweet spot" that is capable of expressing a bit of three different things:
It seems like the FP enthusiasm stems from wanting to emphasize point #3 above all others in language design, which in turn stems from two fallacies:
I don't find these sentiments convincing, given the evidence I see. It seems to me that real costs are borne by humans and tools in order to support proof-friendly languages, creating an increasingly restrictive club as the proof-friendliness increases. Of course, specialists who are abnormally concerned with correctness proofs will continue to prefer more proof-friendly forms, but there are practical constraints. I someday hope to be able to work in Coq, or its descendants, full time. But I recognize that practical constraints -- in tools and personnel -- limit the possibility.
The evidence I see, in fact, is that even the derided Fortran style is too restrictive for many casual programmers, because it has so many rigid type and memory constraints. I think that fact annoys casual programmers, which is why they latch on to dynamic languages, where nearly every primitive is stateful, history sensitive, and impossible to prove anything much about. There's a sense in which, psychologically, the Smalltalk people seem to have been the most prescient: most humans prefer interacting with a tinkerable dynamic system that's partly working and partly not-working. Most people actually prefer debugging to proving, as far as I can tell. At least at first; possibly for their entire computing career, if they never write big programs.
> sounds to me like trying to "liberate" physics from
> linear algebra.
This analogy doesn't sound quite right to me. Instead of thinking of it in terms of the "notation", think of it in terms of the "model."
Functional programming is to imperative programming as quantum theory is to newtonian mechanics. Neither is necessarily wrong, but both are fundamentally different ways of looking at the problem domain -- or, you might say, they are looking at different problem domains.
Physics has as much to do with linear algebra as computation has with transistors. That's why mathematics and computer engineering are their own disciplines.
For those of us on the functional side, we like to think that we should be able to reason about the process of computation without needing a computer to do it on.
But it's not necessarily better -- probably for some problem domains, it is, though -- and sometimes I think we get in trouble when we use nomenclature like "pure". It makes other languages sound "impure".
-2dman-
p.s. --> The Von Neumann architecture talks about storing code in the same memory used for data -- remarkably unrelated to the issue of programming "style" as we are discussing here. The pre-Von-Neumann style was patch cables ;-). Additionally, one could make the argument that functional programming languages tend to be closer to the potential of the Von Neumann architecture, as the interchangeability of data and code (e.g., first-class functions) is usually more fully expressed in this paradigm. Imperative languages seem not to be too interested in such things, and like to keep the code quite separate from the data.
Perhaps I chose a poor analogy, but I am thinking of it in terms of "model". The von Neumann style discussed in that paper is not just about notation, nor about a stored-program memory architecture. It's also a question of whether you think in terms of:
I claim that #1 is more in tune with the way humans tend to think about a lot of problems we use computers for, and also more in tune with the way our most successful computers actually operate. Currently #2 is better supported by formal reasoning tools. Most programming languages have chosen one of these "styles" as dominant and retained a vestigial sublanguage in the "other" style. C has an expression sublanguage; Haskell has a monad sublanguage.
I claim that the dominance of the sequential style is a reflection of a human cognitive preference. I think -- though this is quite informal and not justified by any psych experience, so feel free to debate -- that humans deal with textual information in a very temporal, sequential fashion. We may parallel-process a ton of sensory and physical data; but we have a hard time looking at a large, composite textual expression and mentally reducing its subexpressions correctly. Or even parsing it.
In particular: