February 09, 2018

Choosing between nub and ordNub? In Scala you don't have to!

Haskell’s Data.List module contains nub, a function which removes duplicate elements from a list. However, it doesn’t have very good performance. The culprit is the Eq constraint, limiting the function to comparing elements for equality. There is an improved version called ordNub, but it requires an Ord constraint.

The interesting point I want to highlight in this post doesn’t have much to do with this specific use case. It’s probably quite rare that you specifically need nub where ordNub wouldn’t work. What I do want to highlight is that if this situation ever arose in Scala, we would be able to use implicit prioritization to avoid having to choose at all.

Approach

We are going to create a typeclass containing the function signature for nub. By using implicit prioritization we can prefer the instance which uses Ord over the one with Eq, this is done by putting the Ord-based instance in a subclass. This was explained in the stackoverflow post:

trait LowPriorityImplicits {
  //lower priority conversions
}

object HighPriorityImplicits extends LowPriorityImplicits {
  //higher-order ones here
}

Implementation

First we set up the Eq and Ord typeclasses so we can control the available implicit values for them. We’re not actually going to implement any deduplicator functions here, so let’s just make them return a cool String.

trait Eq[A] {
  def eq: String
}
trait Ord[A] {
  val ord: String
}

Let’s give Int an Eq and Ord instance, but only give String an Eq instance. So, when we nub a list of Ints it should use the Ord version while for a list of Strings it should use the Eq version.

implicit val eqInt: Eq[Int]
 = new Eq[Int] { val eq = "My name is Eq Int." }
implicit val ordInt: Ord[Int]
 = new Ord[Int] { val ord = "Hi! I'm Ord Int." }

implicit val eqString: Eq[String]
 = new Eq[String] { val eq = "Hello there, I am known as Eq String." }

We set up another typeclass called Nubbable. We prioritize the creation of instances for this typeclass with a simple rule: prefer the Ord-based over the Eq-based one. We can do this by putting the Ord-based instance in a subclass of where the Eq-based instance is located. For example:

trait Nubbable[A] {
  def nubbed(l: List[A]): String
}

trait LowPrio {
  implicit def eqNub[A](implicit ev: Eq[A]): Nubbable[A]
    = new Nubbable[A] {
      def nubbed(l: List[A]) = "Message from nubber: " + ev.eq
    }
}

object HighPrio extends LowPrio {
  implicit def ordNub[A](implicit ev: Ord[A]): Nubbable[A]
    = new Nubbable[A] {
      def nubbed(l: List[A]) = "Message from nubber: " + ev.ord
    }
}

Now, when we create our nub function by taking Nubbable as an implicit parameter and run it on some examples we get our expected output. The Ord instance responds for Int, while the Eq instance responds for String.

import HighPrio._

def nub[A](l: List[A])(implicit ev: Nubbable[A]): String
 = ev.nubbed(l)

println(nub(List(1,2,3)))
// Message from nubber: Hi! I'm Ord Int.
println(nub(List("a","b","c")))
// Message from nubber: Hello there, I am known as Eq String.

Conclusion

I think this is a cool showcase of maybe a lesser known feature, used by some Scala libraries. I also find it interesting that I could easily solve this problem in Scala, while I wouldn’t know how to solve it in a same fashion in Haskell (if it is even possible). You can find the code for this example here.