I'm an autodidact, so I don't know how the computer science curriculum is usually organized, but I would guess that students learn about Turing machines relatively early in their education, once they have mastered a few programming languages. I've been through a half dozen or so texts on theoretical computer science that cover this ground. They all have one thing in common: *they utterly ignore Turing's fundamental insight*!

Such texts teach the machine; one learns that a Turing machine involves a "tape" that slides about, with a read/write head, etc. So the student memorizes, works exercises, and comes away with a nice, formal notion of computation in terms of an abstract machine. That's the way I learned it, anyway, and it seems to be the way most textbooks teach it.

Now that's a fine thing, and it explains how computers work, and how some things are "computable" and some not, and so forth. It's all very clever, but it's also entirely *formal*. I didn't come away from it thinking that Turing was anything but a very clever mathematical engineer. In fact the lambda calculus seems much more elegant and simple than all this business of tapes, machine states, and so forth. Turing seemed to me to be a little like Henry Ford or Thomas Edison: an inventor of a useful machine. An ingenious inventor, but still just an inventor. Studying Turing machines never gave me the feeling that I was in the presence of genius; I would have ranked many mathematicians far ahead of Turing.

Then I came across Robert Soare's paper *Computability and Recursion* and realized that I was getting only half the story. I'm not competent to judge the paper's technical merits, but I can confidently recommend that it be required reading for anybody studying either Turing machines or the lambda calculus, if only for its excellent account of the history of the concepts of computability and recursion. It's really more an essay in the history of ideas than a technical paper.

The key point Soare makes, although he doesn't express it in quite this way, is this: it's all about *intuition*, and intuition is all about *lived, human experience*. The formalism of the machine comes *after* the leap of imagination that allowed Turing to crystallize intuitions about human experience in formal terms. Soare points out that Gödel was unconvinced by Church's recursive model of computation precisely because it was unconnected to intuitions derived from lived experience.

It turns out that the key section of Turing's landmark 1936 paper, *On computable numbers, with an application to the Entscheidungsproblem*, is section 9, wherein he grounds his formalism in an attempt "to show that the 'computable' numbers include all numbers which would naturally be regarded as computable". The next sentence strikes at the heart of the matter: "All arguments which can be given are bound to be, fundamentally, appeals to intuition, and for this reason rather unsatisfactory mathematically." In other words, we have certain real-world experiences calculating numbers, and based on that experience, we have an intuition that some numbers are just "naturally" computable. Now the previous eight sections of his paper described his Turing machine; his observation here amounts to an admission that there is a fundamental rift between our intuitive notion of naturally computable and the numbers computable by his machine formalism. On the one hand, it's not enough to just have a warm fuzzy feeling that the machine can compute everything we think is naturally computable; on the other hand, we can never *prove* that it can, since intuitions based on experience are not mathematical objects.

Turing's stroke of genius was to do for the experience of calculating what Euclid did for the experience of extension in space: he found a way to think about that experience in terms of an ideal, formal imaginative structure whose faithfulness to intuitive experience is indisputable. Euclid came up with notions like a point with no height or width and a line with length but no width; such ideas can never be *proven* to represent real space, but the intuition that they do is so compelling that it must be accepted. Turing did exactly the same thing for the experience of calculating, literally with pencil and paper.

Most of section 9 of his paper is devoted to describing the work of a human "computer". As Soare points out, in the mid-thirties the term "computer" meant a human engaged in calculation. Turing analyzed the *process* involved in such calculating. Post-Turing, it's easy to underappreciate what an imaginative leap this was. Calculation by a human, with pencil and paper, is *essentially* an analog process. If it doesn't look like that to you today, it's only because of the revolution in thought brought about by Turing. (Evidence: Gödel couldn't come up with it.) He looked at the process of calculating by writing stuff down on a piece of paper divided into squares "like a child's arithmetic book", and essentially realized that we can model that analog process as a finite discrete process, a sequence of "'simple operations' which are so elementary that it is not easy to imagine them further divided." He furthermore realized that we can model the human computer's "state of mind" in finite, discrete terms. With a few other simple constraints we end up with a formal imaginative model of the work of the human computer that can be directly "mathematicalized"; we can "construct a machine to do the work of this computer", and that machine will correspond naturally, intuitively, and directly to the lived experience of performing a calculation. Turing's machine is to the experience of calculation as Euclid's one-dimensional line is to the edge of a straightedge.

The sad thing is that although section 9 is the heart of Turing's argument, and although it is very simple, I've yet to come across a text on theoretical computer science that even mentions it. You don't really even need mathematics to understand it, and yet it is the foundation of a massive revolution in human thinking. If a number is computable by a Turing machine, and a Turing machine is a formalization of intuition, then what does "number" mean? For Euclid, number was synonymous with length (which is why the Greeks could not stomach zero as a number); post-Turing, (computable) number is synonymous with the process of computation. For Euclid, numbers say something about extension in space; post-Turing, numbers say something about *thinking* - Turing's machine models the process of human computation; human computation is a thought process; therefore the Turing machine is synonymous with mind. Etc. etc.

In the popular mind this sort of stuff often comes up as the question of whether or not computers can think, which most people probably think of as whether or not computers can simulate human behavior. But the real question is the other way around: is thinking essentially computational? Not so different from asking whether space is essentially geometric.

by gar

Such comparisons totally miss the point by misunderstanding the notion of performance. Performance is not about time and/or space efficiency, it's about *economic* efficiency, and the speed of compiled code is only one factor in the economic equation. Your compiler can produce the fastest code in the world, but it won't be of much use if it isn't *correct*. So any calculation of return on investment must factor in the "cost of correctness".

Assuming correct code and correct compilation, a compiler that produces faster code offers better return on investment. But every programmer knows you cannot make such assumptions, so an investment decision based solely on speed benchmarks is a poor decision based on incomplete information.

In any real-world project of even modest complexity this means a performance assessment of a toolset like C++ and gcc must account for all the effort involved in "proving" correctness. Since proof is not possible, this means not only the cost of testing, but also the cost of fixing bugs. Testing can never prove the absence of bugs, so all such code, even when published as production quality, is essentially experimental.

Hence functional languages have a major advantage in performance economics, since the cost of correctness drops considerably. We can never prove that a piece of software correctly addresses a customer's problems, or that the problem specification is accurate, but functional languages go a long way towards guaranteeing correctness of the code itself. And if you've proven the correctness of the code, there's not much point in making major investments in testing what needn't be tested. Some testing or validation will always be required - e.g. design of the user experience - but most of the expense involved in establishing code correctness is incurred up front in the effort it takes to prove correctness, which in principle should be a lot less than the expense of testing and fixing code written in imperative languages.

Moore's law is a law of *economics*, not a law of nature. Engineering is essentially about economics. I would bet there is a similar law waiting to be discovered about the economics of programming, since functional languages (along with logic languages and some other idioms) promise to turn programming into something resembling real engineering. Haskell code has certain mathematical properties that the equivalent C/C++/Java/etc. code cannot hope to have, which means it can reliably be analyzed and transformed based on mathematical principles rather than heuristics. New methods of optimization are bound to arise that will be unavailable to imperative language compilers.
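As a small illustration of the kind of law-based transformation this makes possible, consider the map fusion law, a theorem provable by induction on lists. GHC's RULES pragma lets the compiler apply such laws mechanically; the rule and the names twoPasses/onePass below are my own sketch, not GHC's actual library rules (which use a more general foldr/build fusion scheme):

```haskell
module MapFusion where

-- The fusion law: mapping g and then f over a list equals mapping
-- their composition once.  This is a theorem about the pure function
-- 'map':  map f (map g xs) = map (f . g) xs
-- A compiler for a pure language may apply it blindly; GHC exposes
-- this capability via rewrite rules:
{-# RULES "map/map fusion" forall f g xs.
      map f (map g xs) = map (f . g) xs #-}

-- Two traversals of the list...
twoPasses :: [Int] -> [Int]
twoPasses = map (+ 1) . map (* 2)

-- ...and the single traversal the law licenses.
onePass :: [Int] -> [Int]
onePass = map ((+ 1) . (* 2))
```

No C compiler can apply such a rewrite to a pair of loops without first proving the loop bodies have no interfering side effects; in Haskell the law holds unconditionally.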

It's inevitable that some day functional language compilers will produce code that even the best hand-rolled assembler cannot match for speed and space efficiency. When that starts to happen and the economics of functional languages - or rather, languages with some kind of support for provable correctness - become more clear, it wouldn't be surprising to see chip designers revive the idea of machines designed for reduction optimization.

In any case, the point is that people who want to make the case for Haskell or other functional languages should not allow themselves to be suckered into discussions about performance isolated from other economic factors. Benchmark code should not use simple algorithms whose imperative implementations involve minimal side-effects and can thus be reasoned about informally with little effort. Instead they should involve a well-defined problem, as simple as possible, but one that requires the kind of non-trivial reasoning that complicates the life of an imperative programmer. Something that makes cost of correctness as much a part of the benchmark as space/time efficiency. And such benchmarks should always be called "performance economics benchmarks", never just "performance benchmarks".

by gar

Indeed it is a very useful metaphor, but the motivation for exposing it as a "pernicious" myth is not linguistic fascism, but rather pedagogical clarity. The Container Fiction is only pernicious insofar as it interferes with a clear understanding of the underlying abstraction it attempts to capture: the functor as a special kind of relation that enables communication between two different and mutually incomprehensible universes. This is bound to happen, since the container fiction is deeply embedded in the world of set theory, and the key to understanding Haskell is category theory, which is a whole nother thing.

The list is probably not the best example, since it involves a lot of complexity, what with the free monad and so forth. A simpler example would be a pair (a,b). This can obviously be considered a kind of container, but as metaphors go a somewhat less pernicious image is the chemical compound. A chemical compound has different properties from its constituents; carbon monoxide (CO) is quite a different thing than carbon and oxygen. It's not a container of its constituents, it's their compound. The relation of (the denoted value of) "(a,b)" to its "components" a and b is similar. It's a mereological relation, not a containment relation.

But even that obscures the true nature of the functor (,). The value denoted by "(a,b)" is *not* a compound, it is a primitive, whole, opaque value. (Disregard for the sake of argument whether it is dependent on the human mind.) The compound is the expression "(a,b)" itself.

My proposition is that a category theoretic explanation can be clearer and *simpler* than such metaphorical expositions, leading to a solid articulated intuition of the structural nature of functors without resorting to colorful metaphors. Well, maybe one, but it's arguably not a metaphor. A full exposition would take some very careful writing and a bunch of diagrams, but here's a sketch of how it would work.

Start with a, b, and (,). Symbols 'a' and 'b' name things in a home universe; '(,)' is (the name of) an intergalactic "telefunctor" that relates the home universe to an alien universe of pair values. "(a,b)" takes two primitive values from home to a third primitive value in the alien universe. It doesn't *convey* a and b to the alien universe; rather, it *entangles* them. IOW, "(a,b)" is a kind of name or definite description of a value that is just as primitive and indivisible as the value denoted by "a" but that lives in an alien universe. We could say that "(a,b)" represents the quantum entanglement of a and b with the value denoted by "(a,b)". Call that value x. It lives in an alien universe; we know nothing about its internal structure, nor what it looks like in its universe, nor how it relates to other elements of its universe. But we do know something about how it relates to values in our home universe. We obviously know it is related - via (,) - to a and b together. But we also know, because of the way we define (,) - since this is just a brief sketch I omit this definition - that x is related to a and b individually. Our definition of (,) goes with two other definitions: fst (a,b) = a, and snd (a,b) = b.
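Those two accompanying definitions can be checked directly in Haskell; the snippet below (with concrete Int and Char values chosen purely for illustration) states the three relations through which, and only through which, the pair value is known:

```haskell
-- fst (a,b) = a : the first arrow back to the home universe.
law1 :: Bool
law1 = fst (1 :: Int, 'x') == 1

-- snd (a,b) = b : the second arrow back to the home universe.
law2 :: Bool
law2 = snd (1 :: Int, 'x') == 'x'

-- Surjective pairing: the opaque pair value is fully characterized
-- by its two projections - nothing else about it is observable.
law3 :: Bool
law3 = let p = (1 :: Int, 'x') in (fst p, snd p) == p
```

Note that nothing in these three laws mentions the pair's "contents"; everything is stated in terms of how the value relates to a and b.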

Now we have defined the value of (,) purely in terms of how it relates things across intergalactic (ok, couldn't resist a metaphor) boundaries - morphisms. We could go further and show how a and b can also be defined as morphisms - after all, they're just names denoting values, and those values are just as opaque and unapproachable as the value of "(a,b)". Our knowledge of how all these things behave is expressed in terms not of what they *are*, but in how they *relate*. You can judge a man by his friends; you can understand mathematical values by their relations to other values.

A diagram might represent values as blackened circles. I believe this approach leads to a very clear, simple, and reasonably complete exposition. The metaphor of alien universes and intergalactic telefunctors could of course be dispensed with, but it does add color without introducing semantic distortion.

The real virtue of this approach becomes more clear when we complement the above exposition of the object component of a functor with a similar account of the morphism component that takes functions from one realm to functions in the alien realm. The pedagogical payoff is the realization that a functor allows us to use what we know about values in one realm to "manipulate" values in another realm. The best metaphor for the functor is neither containment nor mereology, but quantum entanglement.

by gar

That may or may not be true - in my opinion a well-written CT-oriented intro to Haskell need be no more terrifying than any other - but I take mild exception to the implied assertion that "elite" is somehow bad.

The community of Haskell programmers (and functional language programmers in general) is a natural, self-selecting elite. There's nothing wrong with that; on the contrary, the elite status of the Haskell Master should be celebrated. Haskell is hard, there's no sense in denying it, but so are most things worth doing. It's hard to master the piano, or fly-fishing, or brain surgery, or a foreign language, and nobody sensible would pretend otherwise. Why should the highly sophisticated art of functional programming be any different? Elite does not mean snide, or exclusionary, or condescending, or selfish; it means excellence, and the truly elite are open-minded, inclusive, and generous with their hard-earned knowledge.

I came to Haskell with years of imperative programming under my belt. I became a pretty "advanced" programmer, but I never really felt like I was doing anything a clever monkey couldn't do. A C programmer spends most of his time doing much what an auto mechanic does - diagnosing and fixing problems in a sophisticated machine. There's nothing wrong with being an auto mechanic or a C programmer; but it's not exactly an elite.

On my third try I've finally begun to appreciate the beauties of Haskell. The turning point for me was category theory, which led to a sea change in my thinking. I'm not sure I'll ever become a master Haskell programmer - one of the great things about it is you can acquire a pretty good understanding of how the language works without writing Haskell programs - but it's clear that those who do master it are a different kind of programmer. Show me a great C programmer and I'll show you a Thomas Edison; show me a great Haskell programmer and I'll show you a J. S. Bach. (Idea for a logo: a measure of musical notation with a lambda clef, and notes tied by arrows.)

The message to newcomers should not be "don't worry, it's just like PHP, only different". It would be more honest to state right up front that this is a different game: entrée to the elite is open to anybody, but not for free. You have no right to demand excellence without effort. You have to work; you have to master some mathematical ways of thinking that can in fact be difficult even for those with a good deal of math. You can do this; many others have. But if you're the type to throw a fit the first time you discover that your customary ways of thinking won't work, then please, go away. This is for grown-ups.

If you value excellence, and are motivated by the prospect of rising above the common dross and attaining excellence yourself, then you are in for a great intellectual adventure of deep insights and dazzling discoveries. And you will find lots of help. There is a plethora of material on the web that will help you, and the community of those who have gone before you is open and more than willing to help those in their apprenticeship.

In my experience the best, most imaginative and talented people are stimulated by a challenge to excel. They *want* the hard stuff, and they're the ones who should be evangelized.

The message to business people should be similar. No, Haskell is not PHP on steroids, it's a different beast altogether. It's a major paradigm shift, with major benefits that are well worth the costs. You hire expensive, highly-trained MBAs to professionally manage your company's financial assets; why would you not hire elite programmers to manage its information assets? The importance of husbanding knowledge assets to gain competitive advantage is only going to increase; treat it as mere data processing and your competitors will eat you alive. Yes, it's harder to find a good Haskell programmer than it is to find 100 mediocre PHP hackers, but aside from being more productive, the Haskell programmer comes with a disciplined, highly sophisticated way of thinking about information structures and algorithms. This makes it much more likely that your Haskell staff will do a much better job of figuring out how to turn your raw data into real capital that returns value.

(Fortunately I don't have any money riding on my sales pitch.)

by gar

It's a useful fiction, but it is a fiction. A value of type "[a]" does *not* contain values of type 'a', or anything else, for that matter. The Container Fiction is pernicious insofar as it retards progress toward a deeper understanding of how the notation works and why category theory provides such a powerful semantic framework.

All values are ontologically primitive.

This is obvious if you look at "simple" values like integers. Is 3 primitive? Some mathematicians would disagree. The number three, they might argue, is constructed using {} and a successor function. Or they might argue that it's a sequence of lambda expressions of some kind. There are probably lots of other ways to describe the meaning of '3' using various mathematical constructions.

These are all just different ways of *describing* a mathematical object; but as one of my college professors (expert in comparative religion) was fond of saying, map is not territory. Description of an object does not an object make. If these various and sundry descriptions all describe the same thing, then that thing must be independent of the descriptions.

Constructive mathematics does not construct its objects, it describes them. It would be the height of hubris to suggest otherwise; after all, numbers do not depend on the exertions of mathematicians for their continued existence.

The vitality of the Container Fiction demonstrates the power of the Iconic Fallacy. Values do not contain, but syntactic expressions do. We think that an ordered pair (a,b) contains a and b. It doesn't, but the graphical form, using parentheses, strongly implies that containment is involved. This is the Iconic Fallacy: false inference about the nature of the thing signified drawn from the form of the signifier. An expression like [1,2,3] "looks like" a list of integers; it does not therefore follow that it *is* (or denotes) a list of integers. It would be more accurate to say it is a construction that *uses* a list of integers to denote a value. The expression "[1,2,3]" represents application of a morphism (the [] functor) to an argument; if its denoted value contains 1, 2, and 3, then does the denoted value of $\sqrt 4$ contain $4$?

Part of the problem is that "List" (i.e. []) is misnamed, implying that '[a]' *is* a list. It's named after the target end of the arrow; a proper name would be something like "TypeToList", thus emphasizing that it is a morphism.
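Plain Haskell makes the point about notation concrete: the bracket form is sugar for nested applications of the (:) constructor - a construction that *uses* 1, 2, and 3, rather than a box that holds them. A minimal check (the names sugared/desugared are mine):

```haskell
-- The familiar bracket notation...
sugared :: [Int]
sugared = [1, 2, 3]

-- ...is sugar for nested applications of the (:) constructor to the
-- values 1, 2, 3 and the empty-list constructor [].
desugared :: [Int]
desugared = 1 : (2 : (3 : []))
```

The iconic bracket form suggests a box with things inside; the desugared form makes it visible that the expression is constructor applications all the way down.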

This may seem like so much philosophical splitting of hairs, but it's not. On the contrary, it correlates with one of the fundamental insights of category theory, namely, elements don't matter. I believe it's very important for learners of Haskell (or any FPL with lazy evaluation) not only to learn to think in terms of mathematics, but also to learn to think about how mathematical notation works. I speak from experience; when I first picked up Haskell I used a book that insisted that something like "x = Foo 1 (2,3) 'a'" means that x contains 1, and (2,3), and 'a'. That didn't match up with the ideas I had formed about algebraic data types, so I eventually ended up dropping the whole thing in exasperation.

More specifically: it's important (essential?) to realize that we can only talk about values indirectly. On the one hand that means we have to use a language with syntax, obviously. But it's deeper than that. We can only talk about mathematical objects by "triangulating" using other, related mathematical objects. As a concrete example consider an ordered pair (1,2). Disregarding syntax, lambda expressions, etc., the value (1,2) is a primitive object that is the image of 1, 2 under the mapping (,). The only way we can talk about it is to use these three things. Not the "contents" of (1,2) - it has no contents - but the things we used to construct (describe) it, which are completely distinct from the thing itself.

The most important reason to learn to think like this is that it is the primary idiom of category theory. For example, in CT terms, something like (1,2) is treated as an opaque object together with two projection arrows fst and snd. No contents involved, just an object and two mappings. The internal structure of the object is irrelevant. Saunders Mac Lane says somewhere that CT is about "learning to live without elements"; applied to Haskell, that means learning to live without containers.
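The "opaque object plus two projection arrows, no contents" view can even be written down in Haskell. The sketch below uses the standard Church encoding (the names CPair, mkPair, cfst, csnd are mine): the pair is literally an opaque function, with no components to pattern-match on, known only through its projections:

```haskell
{-# LANGUAGE RankNTypes #-}

-- A pair with no "contents": an opaque value known only through the
-- two arrows that relate it back to a and b.
type CPair a b = forall r. (a -> b -> r) -> r

mkPair :: a -> b -> CPair a b
mkPair a b = \k -> k a b

-- The projection "arrows" - all we will ever know about the pair.
cfst :: CPair a b -> a
cfst p = p (\a _ -> a)

csnd :: CPair a b -> b
csnd p = p (\_ b -> b)
```

There is no constructor to deconstruct here and no internal structure to inspect; the object is characterized entirely by how it relates to other values, which is exactly the categorical idiom.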

by gar

Here are a few typical examples of how IO in Haskell is described:

A value of type IO a is an “action” that, when performed, may do some input/output, before delivering a value of type a. (Tackling the awkward squad: monadic input/output, concurrency, exceptions, and foreign-language calls in Haskell, p. 5)

a type IO a denoting actions that when performed may do some IO and then return a value of type a. (Imperative Functional Programming, p. 3)

The key idea is to treat a value of type IO a as a “computation” that, when performed, might perform input and output before delivering a value of type a. (A History of Haskell: Being Lazy With Class, p. 25)

We can summarize the SDIOH (Standard Definition of IO in Haskell) as "a value of type IO a is a value, that performs, then delivers a value of type a".

Now consider how one might describe a run-of-the-mill non-IO value: "a value of type T a is … a value of type T a". All definitions are tautologies. By definition. If the SDIOH is a true definition, then IO values are not like other values, and we have two classes of value in Haskell. If it is not a tautology, then it is not a definition, and we still don't know what an IO value is.

Since it is clearly undesirable to have two different kinds of value, it must be that the SDIOH is in fact not a definition. Then what is an IO value?

Part of the problem with the SDIOH is its use of ambiguous terms. It's impossible to discern exactly what it means by "value" and "computation". But that's only part of the problem. The deeper problem, which is related to the terminological problem, is that the semantic model (for lack of a better term) behind the SDIOH adopts a particular perspective on the semantic operations involved in evaluation that forces us to mix being and doing.

The current idiom portrays an IO value as an agent that dynamically performs some kind of action. The GHC implementation of the IO operators expresses this concretely by modeling IO as a state transformer that acts on the RealWorld. This "works" well enough; GHC manages to perform IO. But it doesn't fly mathematically. Mathematical objects *never* act, sing, dance, or *do* anything. They just are. A value that acts is an oxymoron.
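Concretely, the state-transformer model looks something like the following runnable sketch. This is a simplification: real GHC (in GHC.Types) wraps an unboxed State# RealWorld token, whereas World, MyIO, and the operation names below are stand-ins of mine:

```haskell
-- A stand-in for GHC's RealWorld token; the Int is purely illustrative.
newtype World = World Int
  deriving (Eq, Show)

-- Simplified model of GHC's  newtype IO a = IO (State# RealWorld -> ...)
newtype MyIO a = MyIO (World -> (World, a))

-- Sequencing threads the world token through each "action":
bindIO :: MyIO a -> (a -> MyIO b) -> MyIO b
bindIO (MyIO f) k = MyIO $ \w ->
  let (w', a) = f w
      MyIO g  = k a
  in  g w'

returnIO :: a -> MyIO a
returnIO a = MyIO $ \w -> (w, a)

-- "Running" a program: apply the transformer to an initial world.
runIO :: MyIO a -> a
runIO (MyIO f) = snd (f (World 0))
```

Note that in this model the IO value itself is just a function - a static mathematical object; the appearance of "acting on the world" comes entirely from how the world token is threaded through it.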

Fixing this requires a shift in perspective and a more refined elaboration of the mental model we use to think about IO in Haskell. The trick is to reverse the perspective. Instead of viewing the program as an agent that acts on the world, we view the world as an agent that provides an interpretation for the program. We refine the model by dividing semantics into two parts: the internal semantics encompass the mapping from Haskell expressions to lambda expressions, along with lambda conversion of expressions; the external semantics cover the assignment of an interpretation to the lambda expressions.

For example, we can account for the semantics of '3' as follows: the internal mapping takes the Haskell expression '3' to the lambda expression '_3' (using a prefixed _ to mark symbols in the metalanguage). The World in turn *interprets* the program text, mapping '_3' to the integer 3. If we wish we can define a World metalanguage. Using a Haskell-like syntax, we say '_3' is interpreted as "Z 3" of type "W Z Int".

This gives us a clear, simple account of the semantics of IO values without any notion of action. An IO a value is just an IO a value; no different from any other kind of value. Semantics comes from interpretation. The internal semantics map non-IO expressions to *constant* (bound) lambda expressions. To account for IO expressions, we use lambda *variables*. For example, "getChar" maps to something like $v : IO Char$ (assuming our metalanguage uses : to indicate type). In other words, the internal semantics takes an IO expression to a free variable in the metalanguage. Now free variables range over all values of their type; so the World's interpretation of such a free variable is the set of mappings from the variable to the World values of the World type corresponding to the argument of the IO type (the a in IO a). That is, the set of all possible bindings of $v : IO Char$ to World values of type "World Char"; for getChar, something like "Char 'c'".

The meaning of the program then becomes the set of all possible World interpretations of it. The notion of actors and actions has been completely eliminated. The notion of "value of type IO a" has been replaced by a description of the way IO expressions are interpreted. The SDIOH is replaced by something like:

An expression of type IO a is bound to a free variable in the metalanguage, which is bound to a (World) value of type a in each interpretation of the program.

Naturally, for this to work we would need to rework other parts of the metalanguage, and we would want to come up with some kind of metalanguage for referring to World types and values. In particular, we need to complement the definition of IO values with a definition of (non-IO) values:

an expression of type T a is bound to an expression in the metalanguage that contains no free variables …

This definition would have to be refined, but the basic idea is to distinguish between lambda expressions whose Head Normal Forms have a constant binding to World values, and those that are free.

- the role of the lambda calculus as a metalanguage is made explicit
- IO expressions bind to free vars in the lambda metalanguage
- a World metalanguage provides a model for interpretations
- executing a program means having the World give it an interpretation

by gar

- a dynamic computing process
- the result of a computational process
- an algorithm that describes a computational process
- a formal description of an algorithm in a specific language
- a formal description of the semantics of a computational process; e.g. a lambda expression
- a dynamic process that converts a lambda expression to normal form
- a formal description of lambda conversion

One could probably come up with additional technical uses for the term. Even worse, "computation" is an ordinary word with well-understood if inexact semantics. Readers come to a technical text with a well-formed idea of what the word means, and thus no reason to think their notion is incorrect.

"Value" is even worse. It too carries multiple technical meanings in the literature, and it is even more common in ordinary language. Same for "evaluation".

These terms are ambiguous. They should be banned from technical discourse.

Usually this ambiguity is tolerable, at least for subject matter experts. But when it becomes a problem, it's a big problem. It cost me many exasperating hours of head-scratching when I decided to finally get to the bottom of monads and IO in Haskell.

The texts that flummoxed me are Moggi's paper on "notions of computation" and the standard idiom used to describe IO in Haskell.

Moggi (Notions of computation and monads, p. 3):

In particular, we identify the type A with the object of values (of type A) and obtain the object of computations (of type A) by applying an unary type-constructor T to A. We call T a notion of computation, since it abstracts away from the type of values computations may produce.

A typical example of how IO is described in Haskell (A History of Haskell: Being Lazy With Class, p. 25):

The key idea is to treat a value of type IO a as a “computation” that, when performed, might perform input and output before delivering a value of type a.

Obviously this language is "good enough", since Moggi's paper was very influential, and Haskell is very successful. But both passages immediately struck me as hazy and somewhat absurd. Try as I might, I could find no way to assign a precise meaning to them. It took a lot of research and a flurry of messages on the haskell-cafe mailing list to figure out why they didn't work for me and discover what the authors (probably) meant.

Needless to say, part of the problem was the ambiguity of the terms "computation" and "value". I came to the texts thinking that a value is either a "datum" like 3 or a function like sqrt. They're just static mathematical objects, so I could not conceive of how a value could "be" a computation, or could "be performed".

The scales fell from my eyes when I came across the following passage buried in the Wikipedia article on Theory of Computation:

A computation consists of an initial lambda expression (or two if you want to separate the function and its input) plus a finite sequence of lambda terms, each deduced from the preceding term by one application of Beta reduction.

I was already familiar with the lambda calculus, but I had never thought of a lambda expression as a "computation". I still don't like it; a lambda expression is not a computation, it's a formal *representation* of a mathematical object (a *value*). Lambda conversion is a purely syntactic calculation that yields a different notational representation of the same value. The lambda calculus is a *formal model* of computation; the same computation could be represented in a different formalism. So it's just wrong to say that a computation *is* a sequence of lambda expressions.
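In the Wikipedia sense, then, a single computation might be written out as follows (my own example, using pure beta steps ending in normal form):

```latex
(\lambda f.\, f\, y)\,(\lambda x.\, x)
\;\to_\beta\; (\lambda x.\, x)\, y
\;\to_\beta\; y
```

On that reading, the "computation" is the whole three-term sequence, not any one of the expressions in it.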

Nonetheless, it's clear why one might think in these terms and use this language, and I think that is what's going on in the passages quoted above. My ideas of computation and value were based on the idea of a direct mapping from expression to mathematical object. Even though I was quite familiar with the notions of language and metalanguage, I thought of a metalanguage as serving mainly to clarify meaning; once you understand the semantics of the language, you can disregard the metalanguage and think directly in terms of abstract semantic objects (e.g. numbers and functions).

But in Haskell that's not good enough; to understand how lazy evaluation works, you need to keep the lambda calculus (or an equivalent) in mind, because denotation has to pass through lambda expressions in various states of reduction before arriving at normal form and then at semantic objects. You cannot disregard the lambda metalanguage, because the different forms lambda expressions may take constitute part of the semantic universe. Make it explicit and things become clearer. The above definition of Haskell IO becomes:

The key idea is to treat a value of type IO a as a “lambda expression” that, when converted to normal form, might perform input and output before delivering a value of type a.

Unfortunately, that's not good enough: how can a lambda expression in normal form "be performed" and "deliver" a value?

That is the subject of my next post, wherein I will discuss weaknesses in Haskell's informal metalanguage and suggest replacements for "computation", "value", and "evaluation".

by gar

We need some standard terminology to distinguish between *evaluating* an expression and *performing* the result of an expression of type IO something.
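A minimal Haskell illustration of the distinction (the names `greet` and `plan` are mine):

```haskell
-- Evaluation vs. performance: 'greet' is an ordinary value of type
-- IO (); binding it performs nothing. Only when the run-time system
-- reaches it from main is the action performed.
main :: IO ()
main = do
  let greet = putStrLn "hello"   -- evaluated to an IO value; nothing printed
      plan  = [greet, greet]     -- IO values are first-class: a list of actions
  putStrLn "nothing performed yet"
  sequence_ plan                 -- now the two actions are performed
```

Running this prints "nothing performed yet" before any "hello", even though `greet` was evaluated first.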

This is a specific example of a more general problem with the standard idioms of programming language semantics. Traditional mathematical approaches - denotational, operational, algebraic, etc. - have one thing in common: they are purely "dyadic", to borrow a term of art from semiotics. They are founded on binary oppositions like language v. semantic domain, term v. denotation. But program texts resist such narrowly focussed conceptualizations. They have too many layers or dimensions of meaning. Terms like "evaluation" tend to be ambiguous, since they are so general.

Consider something as simple as the constant '3'. It's a symbol; it denotes a well-known, cleanly delimited value. But it also "means" a bit pattern in memory (its physical representation); that bit pattern in turn may have multiple meanings. It may be an opcode that "means" addition or some other operation on the architecture. It may be an opcode with a different meaning on another machine. It may have a different meaning for various peripheral interfaces. More importantly, it has a meaning for the author of the text. It may be used with its "standard" (i.e. mathematical) meaning, but it also may be used as an element of some kind of code; it may mean "status complete" or "account suspended" or anything else. It may mean '7', if the author is a perverse obfuscator.

With IO terms the situation is even more complicated. A term like "putChar 'c'", as Richard pointed out, is associated with at least two things: a "value" (meaning, I believe, something like a closure/suspension/thunk) and an event ('c' appears on the user's screen), corresponding to evaluation and performance, respectively, to use Richard's terminology.

The fundamental problem is that the meaning of a program text involves at least three elements, not just two; it always involves an interpreter (the author, reader, execution environment, or other) in addition to the text and its denotation. In other words, we should think of programs not as dyadic denotational systems, but as communication systems involving multiple players and relations. A program is not just a denotational text; it is a sign.

Semiotics is usually defined as the study of signs, signification, communication systems, and the like. Regardless of its academic status, it provides a rich stock of concepts and terminology that should be useful for both the formal and informal descriptions of programming language meanings (I almost wrote "semantics"). In particular, its elaboration of concepts like "expression", "sign", "denotation" and the like is highly sophisticated, and accommodates the communicative aspect of such terms that a "programming semantics" approach has trouble capturing. A sign involves not just the signifier and signified, but also an "interpretant" (meaning the idea the interpreter forms about the signified, if I'm not mistaken), a code, and various other elements. It covers just about everything a semantics approach covers - extension, intension, expression, etc. - but from a different and valuable perspective. For semiotics, a semantic system is only one component of a larger system of meaning(s).

For a simple example, we can use it to make a distinction between terms like '3' and terms like "putChar 'a'". The former has a stable, conventional denotation; that makes it a symbol in Peirce's terminology. The latter is not a symbol by this definition; it would more appropriately be called a sign, or maybe an example of a particular sign class. We could elaborate further by describing the "codes" that construct the meanings of symbols and signs. In an exposition that defines its terminology using the language of semiotics, such distinctions may turn out to help avoid the ambiguities and code-shifts common to traditional semantics-based expositions. I can't yet say for sure, but it seems like a promising approach. It even seems possible that formal modeling (as opposed to notation and exposition) might find useful structuring devices in semiotics.

To sum up, semiotics might provide a clear, structured way to talk about the many elements and relations involved in the "meaning" of a program text, with a discipline that traditional approaches often lack.

A web search on "semiotics" will find many lexicons, tutorials, etc.

by gar

The idea in brief is that we model the world as a category W, whose objects are events and whose sole arrow is a precedence operator (>-, \succ in LaTeX). What the events represent specifically is irrelevant; we only care about an event "input 'c'" as an abstract event. Think of this as the infinite list of events that contains all possible permutations of real world events.

To provide the semantics of the execution of a program P, we establish an isomorphism between it and W. The isomorphism maps "opaque" terms (e.g. IO terms, random values, etc.) to events. Ordering is preserved by some kind of rule mapping textual precedence in P to event precedence in W. In other words, to model a particular execution of P we incorporate a subset of W into the semantic domain. The static semantics of P are expressed only in the semantic domain, excluding W.

Now the meaning of the program is the text of P, its mapping to a semantic domain (the usual denotational mapping for e.g. '3'), plus the set of all possible isomorphisms between opaque terms and W. The actual exposition is a little more complicated than that, but this conveys the general idea.
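A toy Haskell rendering of the idea (entirely my own sketch of the proposal; the names `Event`, `World`, and `run1` are invented for illustration):

```haskell
import Data.List (elemIndex)

-- Events of W, kept abstract: we care only about identity and order.
data Event = Input Char | Output Char
  deriving (Eq, Show)

-- One "possible world": a finite, totally ordered slice of W.
type World = [Event]

-- The >- precedence arrow, read off from position in the world.
precedes :: World -> Event -> Event -> Bool
precedes w a b = case (elemIndex a w, elemIndex b w) of
  (Just i, Just j) -> i < j
  _                -> False

-- One execution of P: its opaque terms (labels here) mapped to events.
run1 :: [(String, Event)]
run1 = [("getChar", Input 'c'), ("putChar", Output 'c')]

main :: IO ()
main = print (precedes (map snd run1) (Input 'c') (Output 'c'))
```

Here a particular execution of P is nothing but a static assignment of events to opaque terms, with ordering carried by the world rather than by any notion of run-time.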

This allows us to express program semantics, including dynamic semantics, in a rigorous, formal model that completely excludes any hint of dynamics, contingency, side-effects, etc. The notion of evaluation is also eliminated.

by gar

- Operational semantics
- Denotational semantics
- Model theoretic semantics

All such methods attempt to bridge the syntax/semantics divide. They differ chiefly in the formalisms they adopt to represent the semantic domain, with attendant differences in the syntax-to-semantics mapping. Moggi's innovation was to use Category Theory to represent the semantic domain. His motivation was the observation that the conventional denotational approach, which identifies type $\alpha \rightarrow \beta$ with a total function of the same signature, “wipes out completely behaviours like non-termination, non-determinism or side-effects, that can be exhibited by real programs.” The key word is "behaviours": a direct mathematical mapping from terms in a language to objects in a semantic domain has no way of accounting for the *behavior* of terms like “print”, which trigger observable external behavior. Whatever mathematical semantics such terms may have, they also carry additional meaning that cannot be captured by a straightforward denotational representation.
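Moggi's observation can be seen in two lines of Haskell (a sketch; the names are mine):

```haskell
-- Both terms are well-typed at Int, but a semantics that identifies
-- Int with the mathematical integers has nothing for 'diverge' to
-- denote: non-termination is a behaviour, not an integer.
three :: Int
three = 3

diverge :: Int
diverge = diverge   -- denotes "bottom", never any integer

main :: IO ()
main = print three  -- forcing 'diverge' instead would loop forever
```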

The key term he uses is *notion of computation*:

…in order to interpret a programming language in a category C, we distinguish the object A of values (of type A) from the object T A of computations (of type A), and take as denotations of programs (of type A) the elements of T A. In particular, we identify the type A with the object of values (of type A) and obtain the object of computations (of type A) by applying an unary type-constructor T to A. We call T a notion of computation, since it abstracts away from the type of values computations may produce. There are many choices for T A corresponding to different notions of computations. (Notions of Computation and Monads)

He gives the following examples of "notion of computation": partiality, non-determinism, side-effects, exceptions, continuations, and interactive input/output. What these "notions" have in common is disruption of the semantic integrity of a program text. They violate the principle of referential transparency, rendering the text literally uninterpretable to the human reader. So "notions of computation" effectively means the class of opaque terms in programming languages.
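In Haskell terms (a sketch; Moggi works with categories, not Haskell types), each of these notions corresponds to a type constructor T, with T a the type of computations of type a:

```haskell
-- Each notion of computation appears as a type constructor T;
-- a value of type T a is a "computation of type a" under that notion.
partiality :: Maybe Int          -- may fail to produce a value
partiality = Nothing

nondeterminism :: [Int]          -- may produce any of several values
nondeterminism = [1, 2, 3]

exceptions :: Either String Int  -- may abort with an exception instead
exceptions = Left "boom"

interactiveIO :: IO Int          -- may do input/output along the way
interactiveIO = pure 4

main :: IO ()
main = do
  print partiality
  print nondeterminism
  print exceptions
  print =<< interactiveIO
```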

In ordinary mathematical notation, as in any formal notation intended for human readers, there are no such opaque terms. There may be terms that are unfamiliar to the reader, but these are unknown, not opaque. They have a definite denotation, whereas opaque terms only acquire a denotation when the program is executed as a process on a computing machine. Their semantics are contingent, in contrast to terms like "2 + 3", which have necessary semantics.

This appears to be an insurmountable and lethal problem for the formal semantics of functional programming languages. One goal for the development of functional languages is the elimination of process considerations from the semantics of the language. It should be possible to reason about the meaning of a program text without reference to the process associated with it at run-time. If the language includes opaque terms whose semantics are contingent on the process, then how can it be possible to eliminate processing logic from the semantics? How could one ever hope to prove two programs equivalent?

Moggi simply restated the problem with a different focus: opaque, contingent terms may be unknowns, but they are known unknowns. We know what and where they are in a text, so instead of asking how we can reason about them, we ask how we can represent and manage them in such a way that the semantic integrity of the text as a whole is retained. We know they denote, but we don't know their denotations; we know they behave, but since formal semantics is atemporal, we don't know the order in which they will acquire their behaviors. So the problem is twofold: we must find a way to represent them as ordinary mathematical values, and we must find a way to ensure that their behavior is properly disciplined.

Moggi's brilliant solution was to use the strong typing of Category Theory as a representational device, and to use the monad as an organizing device. Categories are strongly typed; they provide us with a means to both represent and sequester opaque terms and their contingent values. Sequestration follows naturally from type discipline: objects of a category representing opaque terms with their contingent values cannot naturally mix with objects of any other category. Since they are ordinary objects, they are integrated into the formal semantics of the language as a whole. Functors are the means by which we "construct" such objects; the fact that they serve as proxies for behaviors does not disrupt this integration; they remain objects. In Moggi's formulation, it is the functor that encodes the notion of computation, each constructed object representing a different notion. This is a clever construal of the functor as a mathematical operation with side effects, but it is indisputably a mere metaphor for expository purposes. Functors don't have side effects, but that need not stop us from using them to model executable operations that do.

Behavioral discipline follows from the structural properties of the monad. The Kleisli star of category theory and its Haskell counterpart `>>=` use data dependency to impose an ordering on their arguments. The semantics of the program text remain atemporal, but when the text is represented as a process, the data dependency will ensure the proper ordering of behaviors.
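The data dependency is visible in a desugared do-block (a minimal sketch; `echoOnce` is my name):

```haskell
-- (>>=) sequences by data dependency: the second action is built by a
-- function that needs the first action's result, so it cannot come first.
echoOnce :: IO ()
echoOnce =
  getLine >>= \line ->   -- an action producing a String
  putStrLn line          -- an action consuming that String

main :: IO ()
main = echoOnce
-- do-notation sugar for the same program:
--   echoOnce = do { line <- getLine; putStrLn line }
```

Nothing temporal appears in the text; the ordering falls out of the fact that `line` is bound by the lambda before `putStrLn line` can exist.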

If the language of categories and functors makes you queasy, here is how Wadler et al. used Moggi's insights to engineer a solution in Haskell. The Haskell equivalents of category and functor are the type and the type constructor. A type T A is the image of type A under the mapping of the functor T. What Moggi called notions of computation, and I have called opaque terms and their associated contingent denotations, others have called variously actions, computations, processes, programs, and so forth. Whatever we choose to call them, Haskell represents them as values of a constructed type - IO a and so forth. Haskell's strong typing serves to sequester them in their type, unable to interact with values of other types. They are first-class values in the semantic model of Haskell, with no side-effects within the language. At run-time, they may have observable behaviors; the Haskell bind operator `>>=`, which implements the Kleisli star of CT, imposes an evaluation order on its arguments.

This use of category theory is an absolutely dazzling maneuver. A very distinctive feature of category theory is that it erases the distinction between things and relations; "it is technically possible to dispense altogether with the objects and work only with arrows," in the words of its co-inventor Saunders Mac Lane. And thus it is possible to do this with "notions of computation": we can disregard them. The fact that, for example, values of type IO a have some kind of observable side-effect at run-time is irrelevant to the formal semantics of the language. Since they are represented as objects in purely categorical terms, using functors and natural transformations, we can dispense with them and transfer our attention to the morphisms - constructors, bind, etc. In other words, we can think of these things not as values (with additional behavior) but as a nexus of relationships encoded as arrows. So when asked what a value of type IO is, we can justly respond with an imperious: "Faugh! It means nothing to me! But if *you* must know, go run the program!"

- Gregg Reynolds

by gar
