De-suckifying the java.net.URI API with Asymmetric Lenses

I find I’m using the java.net.URI class quite a bit these days. It has two good features: first, it follows the relevant RFCs to the letter; and second, it’s immutable.

The downside of this class is the API - it’s very, well, Java. Actually, it’s bad even by contemporary Java standards. Let’s say we wanted to add an argument to the query string of an existing URI. Here’s how you might do it in Java (transliterated to Scala):

def addQueryArgJava(u: URI): URI = {
  val origQuery = if (u.getQuery eq null) "" else u.getQuery + '&'
  new URI(u.getScheme, u.getAuthority, u.getPath, origQuery + "foo=bar", u.getFragment)
}

Ugh, that’s going to get old fast. And of course, this primitive string appending probably won’t do what you want if the original URI already had a “foo” argument.
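To make the duplicate-argument problem concrete, here is a small sketch that runs the function above against a URI that already carries a “foo” argument. The object name AppendDemo and the example.com address are made up for illustration.

```scala
import java.net.URI

object AppendDemo {
  // Same string-appending approach as addQueryArgJava above.
  def addQueryArgJava(u: URI): URI = {
    val origQuery = if (u.getQuery eq null) "" else u.getQuery + '&'
    new URI(u.getScheme, u.getAuthority, u.getPath, origQuery + "foo=bar", u.getFragment)
  }

  def main(args: Array[String]): Unit = {
    // Hypothetical input that already has a "foo" argument.
    val result = addQueryArgJava(URI.create("http://example.com/search?foo=1"))
    // The old value is not replaced; the query now contains "foo" twice.
    assert(result.getQuery == "foo=1&foo=bar")
    println(result)
  }
}
```

Which of the two values a server picks up is outside our control, which is rarely what the caller intended.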

If we were to implement URI in Scala, we’d probably use a case class, in which case the compiler would give us a free copy method to help out with this kind of thing. But we can add a copy method to java.net.URI without too much trouble. Let’s do away with all the null and -1 unpleasantness while we’re at it:

import java.net.URI

implicit final class RichURI(val uri: URI) extends AnyVal {
  def scheme = Option(uri.getScheme)
  def userInfo = Option(uri.getUserInfo)
  def host = Option(uri.getHost)
  def port = if (uri.getPort < 0) None else Some(uri.getPort)
  def path = Option(uri.getPath)
  def query = Option(uri.getQuery)
  def fragment = Option(uri.getFragment)
  def copy(scheme: Option[String] = scheme,
           userInfo: Option[String] = userInfo,
           host: Option[String] = host,
           port: Option[Int] = port,
           path: Option[String] = path,
           query: Option[String] = query,
           fragment: Option[String] = fragment): URI = new URI(
    scheme.orNull,
    userInfo.orNull,
    host.orNull,
    port getOrElse -1,
    path.orNull,
    query.orNull,
    fragment.orNull
  )
}

assert(URI.create("http://www.google.com/").copy(scheme = Some("ftp")) == URI.create("ftp://www.google.com/"))

The code to add the query argument is now:

def addQueryArgCopy(u: URI): URI = {
  u.copy(query = Some((u.query map (_ + '&') getOrElse "") + "foo=bar" ))
}

Less boilerplate, but still not great. There are many ways we could improve on this, but I’m going to focus on what I think is the best tool for this job: asymmetric lenses.

A Java programmer could intuitively think of an asymmetric lens as the functional programming version of a bean property, without the mutability nightmares. A Scala programmer could think of it as a scalable version of the copy method. An asymmetric lens is basically a means to work with fields within immutable records. I’m using the terms “field” and “record” in a very general sense: the “field” could be a particular key in a “record” that is a dictionary or map, for example. A lens reifies the mechanics of reading and writing a field into a composable value. As I will attempt to demonstrate, it is the composability of lenses that makes them so powerful.

In this post I’m just going to cover how lenses can help us with a particular example problem. For more details on what they are and how they work, I highly recommend this paper by Tony Morris. It uses Scala and is very easy to follow. All the code from this post, as well as the lens implementation it uses, can be found in this gist.

Defining a lens is pretty straightforward: you provide a function to read the relevant field, and a function to set the field. As we’re talking about immutable records, “setting” a field means returning a new record which has the new value for the field. Unfortunately, this step tends to require a lot of boilerplate in plain Scala. But as we’ll see, you at least only need to do the boilerplate once. There has been some work done to automate much of the boilerplate via compiler plugins or macros, but I haven’t tried these out yet.
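For readers who want something concrete to hold on to, here is a minimal sketch of what such a lens type might look like. This is not the implementation from the gist: the names LensSketch, put, Point and xLens are assumptions made up for this illustration, and only the getter-plus-curried-setter shape is taken from the description above.

```scala
object LensSketch {
  // A hypothetical minimal lens: `get` reads the field, and `put` is a curried
  // setter that takes the new field value, then the record, and returns a new record.
  final case class Lens[A, B](get: A => B, put: B => A => A) {
    // Return a copy of the record `a` with the field replaced by `b`.
    def set(a: A, b: B): A = put(b)(a)
    // Return a copy of the record `a` with the field transformed by `f`.
    def modify(a: A)(f: B => B): A = set(a, f(get(a)))
  }

  // Example record, not from the post.
  final case class Point(x: Int, y: Int)

  // The boilerplate: one getter and one copy-based setter per field.
  val xLens: Lens[Point, Int] = Lens[Point, Int](_.x, v => _.copy(x = v))

  def main(args: Array[String]): Unit = {
    val p = Point(1, 2)
    assert(xLens.get(p) == 1)
    assert(xLens.set(p, 10) == Point(10, 2))
    assert(xLens.modify(p)(_ + 1) == Point(2, 2))
    assert(p == Point(1, 2)) // the original record is untouched
  }
}
```

The setter is curried so that a definition can be written as v => _.copy(x = v), which matches the shape of the lens definitions used in the rest of this post.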

So let’s define some lenses for the fields of the java.net.URI class. I’m calling the type of my asymmetric lens @> as I think it looks quite nice with infix notation. You read it as “record_type @> field_type”:

object lens {
  val scheme: URI @> Option[String] = Lens(_.scheme, v => _.copy(scheme = v))
  val userInfo: URI @> Option[String] = Lens(_.userInfo, v => _.copy(userInfo = v))
  val host: URI @> Option[String] = Lens(_.host, v => _.copy(host = v))
  val port: URI @> Option[Int] = Lens(_.port, v => _.copy(port = v))
  val pathString: URI @> Option[String] = Lens(_.path, v => _.copy(path = v))
  val queryString: URI @> Option[String] = Lens(_.query, v => _.copy(query = v))
  val fragment: URI @> Option[String] = Lens(_.fragment, v => _.copy(fragment = v))
}

This uses the copy method listed above. As I said, this is very boilerplatey, but now we’ll never have to call the copy method again. The Lens constructor takes two arguments: the first is simply a function to return the field value given an instance of the record, and the second is a curried function taking a field value and a record value, and producing a new record value. All of these particular lenses return an optional value, simply because each element of a URI is optional (an empty string is a valid URI!).

So here’s how we’d use a lens to add to the query string:

def addQueryArgLensString(u: URI): URI = {
  lens.queryString.modify(u)(x => Some((x map (_ + '&') getOrElse "") + "foo=bar"))
}

Hmm, not really an improvement over the copy method, is it? But we can do better. Let’s make another lens for the query string, but this time we’ll parse the query string to make a URI @> Map[String, String]:

// note - not production quality parsing
private val regex = """([^=&]+)=([^&]+)&?""".r
def parse(queryString: Option[String]): Map[String, String] = queryString map { q =>
  (regex.findAllMatchIn(q) map (m => m.group(1) -> m.group(2))).toMap
} getOrElse Map.empty
def toQueryString(pairs: Iterable[(String, String)]) = if (pairs.isEmpty) None else Some(pairs map {
  case (k, v) => s"$k=$v"
} mkString "&")

object lens {
  // added to the same lens object as the String-based lenses above
  val query: URI @> Map[String, String] = Lens(
    r => parse(r.query),
    v => _.copy(query = toQueryString(v))
  )
}

How does this help? Well, we can also define a lens that will “focus” on a given key of any Map:

def mapLens[A, B](key: A): Map[A, B] @> Option[B] = Lens(
  _ get key,
  _ map (v => (_: Map[A, B]) + (key -> v)) getOrElse (_ - key)
)

And then we compose these two lenses, to focus not just on the query string, but the “foo” argument within the query string (I have written compose as >=>):

def addQueryArgLens(u: URI): URI = {
  lens.query >=> mapLens("foo") set (u, Some("bar"))
}

This avoids having to manipulate strings or manually copy objects, and if the query string already has a “foo” argument, its value will simply be replaced. Let’s do the comparison for removing a query string argument:

/**
 * Java version of removing query argument "foo", if it is present.
 */
def removeFooArgJava(u: URI): URI = {
  val re = "foo=([^&]+)&?".r.pattern
  val matcher = re.matcher(if (u.getQuery eq null) "" else u.getQuery)
  val newQuery = matcher.replaceFirst("")
  new URI(u.getScheme, u.getAuthority, u.getPath, if (newQuery.isEmpty) null else newQuery, u.getFragment)
}

def removeFooArgLens(u: URI): URI = {
  lens.query >=> mapLens("foo") set (u, None)
}
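The >=> operator comes from the lens implementation this post relies on, which is not reproduced here. As a rough sketch of how such a composition could work, the following assumes a minimal hypothetical Lens type (the names ComposeSketch, put, Request, paramsLens and fooLens are made up for this illustration) and re-creates mapLens against it:

```scala
object ComposeSketch {
  // Hypothetical minimal lens: a getter plus a curried setter.
  final case class Lens[A, B](get: A => B, put: B => A => A) {
    def set(a: A, b: B): A = put(b)(a)
    // Compose with a lens on the field type: read through both lenses, and
    // write by updating the inner value and putting it back into the outer record.
    def >=>[C](that: Lens[B, C]): Lens[A, C] = Lens[A, C](
      a => that.get(get(a)),
      c => a => put(that.put(c)(get(a)))(a)
    )
  }
  type @>[A, B] = Lens[A, B]

  // Focus on one key of a Map: Some(v) inserts or replaces, None removes.
  def mapLens[K, V](key: K): Map[K, V] @> Option[V] = Lens[Map[K, V], Option[V]](
    _.get(key),
    {
      case Some(v) => (m: Map[K, V]) => m + (key -> v)
      case None    => (m: Map[K, V]) => m - key
    }
  )

  // Example record standing in for a URI with a parsed query string.
  final case class Request(params: Map[String, String])
  val paramsLens: Request @> Map[String, String] =
    Lens[Request, Map[String, String]](_.params, v => _.copy(params = v))
  val fooLens: Request @> Option[String] = paramsLens >=> mapLens[String, String]("foo")

  def main(args: Array[String]): Unit = {
    val r = Request(Map("a" -> "1"))
    assert(fooLens.set(r, Some("bar")) == Request(Map("a" -> "1", "foo" -> "bar")))
    assert(fooLens.set(Request(Map("foo" -> "x")), None) == Request(Map.empty))
    assert(fooLens.get(r).isEmpty)
  }
}
```

The composed lens never mentions how the outer record is copied; that knowledge stays inside paramsLens, which is what lets the pieces be mixed and matched.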

And naturally, this new “foo” lens can be composed with others. So if, say, the URI was the field of some other case class, all you have to do is define a lens for that field and then compose it with whatever you need, for example:

final case class Example(href: URI,
  // more fields
)

val hrefLens: Example @> URI = Lens(_.href, v => _.copy(href = v))
val example = Example(new URI("…"))
val exampleWithFoo: Example = hrefLens >=> lens.query >=> mapLens("foo") set (example, Some("bar"))

I really hope the Scala team can one day include lenses in the Scala standard library. Scala’s motto is “a Scalable language”. Well, the copy method definitely doesn’t scale well:

val exampleWithFoo: Example = example.copy(href = example.href.copy(
  query = toQueryString(parse(example.href.query) + ("foo" -> "bar"))
))

And in real code, record/field hierarchies can get a lot more complex than this.

There’s more cool stuff that can be done with lenses. A partial asymmetric lens can be convenient for manipulating values that may be absent. And lenses work really well with the state monad. See the paper referenced above for details.
