Data manipulation is kind of a big deal - part 1


2019, May 18    

Data manipulation and transformation is kind of a big deal. Now that I have shared that recursion is important, but also mentioned that there are other ways to achieve certain goals, let’s look into those. This is the first post in a mini series on data manipulation and transformation. I will focus mainly on collections for now. There are other important aspects to this topic, but let’s start with the most intuitive stuff first.

With that in mind, let’s quickly recall Scala’s basic collections:

  • List
  • Map
  • Set
  • Tuple
  • Option (a collection that is either empty or holds exactly one item; Alvin Alexander has a fabulous intro to it)

I find there are usually a couple of basic things I want to do with collections (apart from trivial stuff like getting the size, etc.):

  • get items that meet a certain condition
  • do something with every item
  • do something with items that meet a certain condition
  • check if there is or isn’t an item that meets a certain condition

Keeping that in mind let’s start with map, flatMap, collect and filter.

map

You can call map on List, Map, Set and Option. It applies the passed function to each element and returns a new collection of the transformed elements.

def divideByTwo(number: Int) = number / 2

List(10, 20, 30).map(divideByTwo) // List(5, 10, 15)

…the function can also be anonymous; you will very often see code like:

List(10, 20, 30).map(nr => nr / 2) // List(5, 10, 15)

…or, even more often, with the placeholder syntax (syntactic sugar for anonymous functions) like this:

List(10, 20, 30).map(_ / 2) // List(5, 10, 15) 

Of course, the function used in map can go from (in our case) Int to any other type; the sky is the limit. So let’s try a function that goes from Int to String:

List(10, 20, 30).map(_.toString) // List("10", "20", "30")

Summary
You can think of map as “apply this function to each element of this collection and return the result”.
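To back up the claim that map works on the other collections too, here is a small sketch applying it to a Set, a Map and an Option (the example values are made up). Two details follow from each collection’s own rules: a Set can shrink because duplicate results collapse, and map on a Map receives each key-value pair as a tuple.

```scala
// map on a Set: results are deduplicated, so the Set may shrink
val halvedSet = Set(1, 2, 3, 4).map(_ / 2) // Set(0, 1, 2)

// map on a Map: the function receives (key, value) tuples
val ages = Map("ann" -> 30, "betty" -> 25)
val olderAges = ages.map { case (name, age) => name -> (age + 1) }

// map on an Option: the function is applied to the value inside the Some
val maybeNumber: Option[Int] = Some(10)
val halved = maybeNumber.map(_ / 2) // Some(5)
```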

Important note - map for Option

map is especially handy for Option: it only executes the passed function on a Some and leaves a None untouched:

val someString: Option[String] = Some("hello")
val noneString: Option[String] = None
someString.map(_.length) // Some(5)
noneString.map(_.length) // None

Can you already see how handy this could be for error handling? There will be a separate post on error handling, and Option will be one of the stars of the episode.
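A common place where this pays off is a lookup that may find nothing. Map.get returns an Option, so map lets us transform the result without checking for a missing key first. A minimal sketch; the ages map and the names in it are made up for illustration:

```scala
val ages = Map("ann" -> 30, "betty" -> 25)

// Map.get returns Some(value) for a present key and None otherwise
val annNextYear = ages.get("ann").map(_ + 1)   // Some(31)
val caroNextYear = ages.get("caro").map(_ + 1) // None: the function never runs
```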

flatMap

flatMap is nothing more than calling map and then flatten on the result. Why would that be useful?

For collections, flatMap is a lifesaver when the function you map with itself returns a collection, which would otherwise leave you with nested types like List[List[_]] or Option[Option[_]].

Let’s say we get a list of users in the unintuitive type List[List[String]]:

val allUsers = List(List("ann", "betty"), List("caro", "ciara"))

Let’s also say we get a requirement to list all the names as one list, with every name uppercased. To make the final code more readable, we could first create a little function:

def listOfStringUpCase(list: List[String]) = list.map(_.toUpperCase)

So we could now map, right?

val upperCasedListsOfUsers = 
allUsers.map(listOfStringUpCase) // List(List("ANN", "BETTY"), List("CARO", "CIARA"))

…unfortunately that leaves us with a List[List[String]] again. So now it’s time to flatten - to “merge” our nested Lists and end up with a single List:

upperCasedListsOfUsers.flatten // List(ANN, BETTY, CARO, CIARA)

But what about flatMap? Well, instead of all this noise we could just use flatMap for a very intuitive and elegant solution:

allUsers.flatMap(listOfStringUpCase) // List(ANN, BETTY, CARO, CIARA)

Summary
You can think of flatMap as “apply this function to each element of this collection, then flatten the result and return transformed collection”.
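The same idea covers the Option[Option[_]] case mentioned earlier. If the function passed to map itself returns an Option, map leaves us with a nested Option, while flatMap flattens it. In this sketch parseNumber is a hypothetical helper built on scala.util.Try:

```scala
import scala.util.Try

// Hypothetical helper: Some(number) if the string parses as an Int, None otherwise
def parseNumber(text: String): Option[Int] = Try(text.toInt).toOption

val input: Option[String] = Some("42")

val nested = input.map(parseNumber)            // Some(Some(42)), an Option[Option[Int]]
val flat = input.flatMap(parseNumber)          // Some(42), an Option[Int]
val invalid = Some("abc").flatMap(parseNumber) // None
```

As with lists, nested.flatten gives the same Some(42) as the flatMap call.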

Important note - flatMap for List[Option[_]]

flatMap has a special usage for a List[Option[_]]: it “removes” (flattens away) the None values:

List(Some("hello"), None, None, Some("world")).flatMap(_.map(_.toUpperCase)) 
// List(HELLO, WORLD)

…unlike map:

List(Some("hello"), None, None, Some("world")).map(_.map(_.toUpperCase)) 
// List(Some(HELLO), None, None, Some(WORLD))

Of course, if we call flatten after the map above, the result is the same as from the flatMap:

List(Some("hello"), None, None, Some("world")).map(_.map(_.toUpperCase)).flatten
// List(HELLO, WORLD)
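In practice the Option values often come from the function itself rather than sitting in the list already. Here is a sketch, assuming a made-up list of raw strings where some are not valid numbers: flatMap keeps the successfully parsed ones and drops the rest.

```scala
import scala.util.Try

val rawInput = List("1", "two", "3", "")

// Each string becomes Some(number) or None; flatMap keeps only the Some values
val numbers = rawInput.flatMap(text => Try(text.toInt).toOption) // List(1, 3)
```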

filter

This one is quite trivial: filter does exactly what you think it does and returns only the items that satisfy a given predicate:

List(1, 2, 3, 4).filter(_ > 3) // List(4)
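Like map, filter is available on the other collections too. A quick sketch with an Option and a Map (the example values are made up): on an Option, filter turns a Some into None when the predicate fails, and on a Map the predicate sees key-value tuples.

```scala
// Option: keeps the value only if the predicate holds
val bigEnough = Some(5).filter(_ > 3) // Some(5)
val tooSmall = Some(2).filter(_ > 3)  // None

// Map: the predicate receives (key, value) tuples
val ages = Map("ann" -> 30, "betty" -> 25, "caro" -> 17)
val adults = ages.filter { case (_, age) => age >= 18 }
```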

collect

A “mix” of filter and map. The syntax might seem a bit complicated because collect takes a partial function (a post on partial functions is coming soon):

List(1, 2, 3, 4).collect {
  case i if i > 3 => i * 3
}
// List(12)

Summary
You can think of collect as “keep only the values of this collection that match, and apply this function to them”.
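To make the “mix of filter and map” concrete, the sketch below checks that the collect call above gives the same result as filter followed by map. It also shows a second common use: because collect takes a partial function, a type pattern can pick out only the elements of a given type (the mixed list is a made-up example).

```scala
val numbers = List(1, 2, 3, 4)

// collect with a guard: filter and transform in one pass
val viaCollect = numbers.collect { case i if i > 3 => i * 3 } // List(12)

// the same result in two steps
val viaFilterMap = numbers.filter(_ > 3).map(_ * 3) // List(12)

// a type pattern keeps only the Int elements of a mixed list
val mixed: List[Any] = List(1, "two", 3, "four")
val onlyInts = mixed.collect { case i: Int => i } // List(1, 3)
```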

Realistically, how often do you use these instead of loops, etc.?

Every single day, without fail. I think map, flatMap and filter are among the first things you learn. I remember my very first Scala code, and by very first I literally mean the first lines. I had this gorgeous for-loop ready to go, and someone pointed out in code review that my 9 lines of code could be replaced with fewer than 10 characters… Of course, I was sold!

Here are a few ideas on how to solve some easy problems using what I explained today. Compare them in your head to solutions that don’t use map, flatMap, filter or collect. The solutions below are much more compact, aren’t they?

list all odd numbers

val list = List(1, 2, 3, 4)
list.filter(_ % 2 != 0) // List(1, 3)

find the length of the shortest word in a sentence

val sentence = "How long is the shortest word?"
sentence.split(" ").map(_.length).min // 2 

duplicate all elements in a list

val list = List(1, 2, 3, 4)
list.flatMap(element => List(element, element)) // List(1, 1, 2, 2, 3, 3, 4, 4)

change all names that start with “a” to uppercase and return only those

val list = List("anna", "brad", "aga", "steve")
list.collect {
  case name if name.startsWith("a") => name.toUpperCase
} // List(ANNA, AGA)
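For the comparison suggested earlier, here is roughly what the last problem could look like with a mutable buffer and a for loop. It is a sketch for contrast only, not a recommended style; both versions produce List(ANNA, AGA).

```scala
import scala.collection.mutable.ListBuffer

val names = List("anna", "brad", "aga", "steve")

// Imperative version: loop, test each name, append matches to a mutable buffer
val buffer = ListBuffer.empty[String]
for (name <- names) {
  if (name.startsWith("a")) {
    buffer += name.toUpperCase
  }
}
val imperativeResult = buffer.toList // List(ANNA, AGA)

// Collections API version
val functionalResult = names.collect {
  case name if name.startsWith("a") => name.toUpperCase
}
```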

Final note

Of course, you can also solve all of the above problems with for loops and other imperative approaches if you wish. I will never ever encourage this, though. With the incredibly powerful Collections API that Scala offers, it would be like hooking a snowplough up to a Ferrari. Personally, I want to make the most of Scala, and I like how clean the code looks and how expressive you can get. How satisfying to have those one-liner solutions…