Do You Get Referential Transparency?

Do you get referential transparency? I mean, really get it? I thought I did, but I recently gained a deeper appreciation for what it means.

Referential transparency is also commonly referred to as “pure” functional programming. I understood this to mean that the code lacked side effects like performing I/O or mutating state, and this seems to be a common understanding. But referential transparency can actually be defined quite precisely and succinctly (although I believe agreement with this definition would be far from universal):

If replacing expression x by its value produces the same behaviour, then x is referentially transparent

Sounds simple enough. But notice that this definition contains no reference to I/O or state or any of the things usually associated with lack of referential transparency; keep that in mind. The above definition is the entire definition: if replacing an expression by its value produces no change in program behaviour, then you have referential transparency, regardless of what the behaviour is.

Now, consider the following example, using Scala (you will need to understand how Scala translates for syntax to map and flatMap method calls to follow this):

final class Writer[A](val value: A, val log: Vector[String]) {

  def map[B](f: A => B) = new Writer(f(value), log)

  def flatMap[B](f: A => Writer[B]) = {
    val b = f(value)
    new Writer(b.value, log ++ b.log)


object Writer {

  /** Log a message without a value */
  def log(msg: String) = new Writer((), Vector(msg))

  /** Log a message with a value */
  def logv[A](msg: String)(a: A) = new Writer(a, Vector(msg))

  def apply[A](a: A) = new Writer(a, Vector.empty)


The above is a very basic writer monad. Let’s put it to use in a silly example:

import Writer.{ log, logv }

 * Add one to a number and log a message.
 * Referentially transparent.
def addOneAndLog(i: Int) =  logv("added 1 to " + i)(i + 1)

val log1: Writer[Int] = for {
  _ <- log("before")
  n1 <- addOneAndLog(41)
  n2 <- addOneAndLog(41)
  _ <- log("after")
} yield (n1 + n2)

All the code so far has been pure functional, that is, referentially transparent. What we have done is made an ordinary value, log1 that produces an integer value, while also accumulating logging messages, which could be thought of as really simple instructions to “log this message”. At some point we decide we want to display these log messages to the user, so we write a method to “run” the accumulated logging instructions, which requires a non-referentially transparent function:

 * Side-effect! Prints log messages to stdout.
 * Not referentially transparent.
def run[A](w: Writer[A]): A = {
  w.log.foreach(printf("*log* %s%n", _))

The output of printf("result1 = %d%n", run(log1)) is:

*log* before
*log* added 1 to 41
*log* added 1 to 41
*log* after
result1 = 84

as expected. Notice that the addOneAndLog(41) expression is repeated twice. What if we replace that with a single value:

val add = addOneAndLog(41)
val log2: Writer[Int] = for {
  _ <- log("before")
  n1 <- add
  n2 <- add
  _ <- log("after")
} yield (n1 + n2)

What log messages would you expect from printf("result2 = %d%n", run(log2))? We get exactly the same log messages as for log1. This is because the expression we factored out was referentially transparent: replacing it by its final value changes absolutely nothing about how the code behaves.

I thought I had a good handle on all this functional stuff, when I made a change like that between log1 and log2 to some production code. I was surprised to see output still showing up twice, and it took me a few minutes to figure out I was seeing referentially transparency in action. This made me realise that so many of my expectations are still conditioned by years of programming with imperative, side-effecting code. Perhaps when I find the lack of referential transparency surprising I can officially call myself a functional programmer.

One very interesting thing to note is that the need for I/O in no way precludes the use of referentially transparent code. Some might say at this point “but you’re cheating, the run function is not referentially transparent!”. Yes, all referentially transparent code will end up being translated to a series of side effects in order to run on a real machine. The fact that some of this translation happens in the same language used to write our pure code is a detail of no great significance.

But if side effects are always unavoidable at some point, why bother worrying about referential transparency at all? Well, my experience has been that after maximising pure code, you’ll find that whatever side effecting code you need tends to be very simple and self-contained, and does not get in the way of composition the way it would in a “traditional” imperative program. The composition of large programs from smaller programs is the power that referential transparency gives you. Of course, we’ve come up with many way to compose bits of code together, but they all suck compared to referential transparency. I think that the relative simplicity of the concept (notwithstanding occasional surprises for those of us new to it) sometimes blinds us to just how powerful it really is.

If my toy logging example hasn’t convinced you that this is a good way to write programs, perhaps a slightly (only slightly) less trivial example will help. Like many ideas, the benefits only really become apparent when you start scaling up. This gist contains a program that interleaves the following: reading from the console, writing to the console, and reading from the file system, all driven from code that is referentially transparent.

Comments !