Data manipulation is kind of a big deal - part 2
In this post I will continue on data manipulation by showing aggregation/accumulation methods - fold and reduce. There is a chance you’ve heard of “map and reduce” before in the context of big data - you know already what’s map
- now it’s time to find out about reduce
.
In a previous post I showed map
and flatMap
- two methods to apply a function to a given collection and get that collection returned. But what if we want to aggregate or accumulate something without necessarily transforming our collection? In other words, what if we want to, for instance, get a sum of all integers in a given collection? fold
and reduce
come in handy. Let’s dive in.
What’s the difference between reduce and fold?
A simple answer is: reduce
accumulates a single result and fold
accumulates a single result using a start value. If we call reduce
on a list we need to provide a function - that function will be evaluated for the first two elements and this result will be then used as a first argument in the next evaluation.
Calling fold
is identical with the difference that we provide a start value which will be used on the first evaluation with the first element.
It’s a bit complicated to explain with sentences so in practice it would look like this, note the comments that show how the evaluation happens:
Obviously the function _ + _
can be anything that is a valid function on two elements of a given collection, here: any function that is valid for two elements of type Integer
.
Also, just as a reminder, _ + _
is a syntactic sugar for an anonymous function on a tuple (a, b) => a + b
When to reduce and when to fold?
Know when to hold ‘em, know when to fold ‘em.
The first two questions I had when I saw fold
next to reduce
were:
- if they do pretty much the same thing, what’s the real difference between them - how do I know when to use which?
- why would I need a “start value”?
Let me try and shed some light on that.
To answer the first question in a simple term - use reduce
when your result will be of same type as the type of elements in a collection. For example:
- get a sum of all elements in a
List[Int]
- the result will beInt
- get max of all elements in a
List[Int]
- the result will beInt
Use fold
when your result will be of a different type than elements of your collection OR if there is a start value you would like to pass as the first element. For example:
- convert a list of integers to a string. (
reduce
wouldn’t work here because it would be expecting the final result to beInt
, notString
, as mentioned above)
As for the second question about the “start value” - as you can see above, in collections it would be used to tell the compiler what type we expect. If we changed above to:
…the code wouldn’t compile.
foldLeft and foldRight
Without going into much detail there is two variations of fold
:
foldLeft
which “traverses” your collection from left to right and is tail recursive and foldRight
which “traverses” from right to left and is head recursive. (remember tail recursion vs head recursion? now you know that foldRight
might blow up on a large collection!)
foldLeft
in practice:
foldRight
in practice:
You can see the result is the same. Now take a moment to think about the difference in results if our function was not addition but subtraction - the results would surely differ! Also, knowing the difference between head and tail recursion can you see how it would make more sense to try and make the most of folding using foldLeft
?
Realistically, how often do you use those?
As much as you can, it’s a quick a painless way to calculate accumulations and it can be used to solve some well known problems in a smart and quick way. Other than work on collections fold
and reduce
could be used as a form of error handling as it can replace pattern matching (more on error handling in a separate post):
In the examples above you can see that folding a None
or an empty list results in the “start value” returned.
You could also use fold
to transform Either
regardless of whether it’s Left
or Right
. The way fold
is implemented on Either
is a shortcut to pattern matching:
Think about above fold as “if this Either is Left then evaluate first function I passed but if this Either is Right evaluate the second function I passed”.
Here is also a few ideas on how to solve some easy problems using what I explained today, compare them in your head to solutions that wouldn’t use fold
or reduce
.
get maximum from a list of integers
get the shortest word in a sentence
revert a list
Final note
I can’t stress enough how helpful it is to practice on little code tasks like on codewars.com or so to wrap one’s head around fold
and reduce
. It’s another way of thinking about problems - I can guarantee with time you will discover they are quite important concepts in functional programming. That being said, sometimes you will notice you can replace your fold
or reduce
with a simpler looking pattern match - then it’s completely up to you. Myself, if my fold
or reduce
look overly complicated I would probably try to improve readability by choosing a simpler looking solution.
(*) in this case “” is String’s identity element for operation of concatenation.
When an identity element is paired with any element via the operation, it returns that element.
In practise: for addition of integers identity is 0 because 1 + 0 = 1. For multiplication it’s 1 because 4 * 1 = 4. It’s probably a questionable choice to speak of identity for Strings, but it works for our simple example