| Eugene Kirpichov ( @ 2008-07-02 13:30:00 |
| Entry tags: | functional programming, java |
Java: readability and some FP (Functional programming in java)
There are cases where functional style is the most appropriate, and among these cases are those where it can be used in java without sacrificing readability.
These cases are exactly what I'll discuss.
I'll also discuss refactoring of 'unconsciously functionalish' code (like new AndFilter(new FieldMatchesPatternFilter(new FieldReference("name"), ".*John.*"), new BlaBlaBlaFilter())) for readability.
DISCLAIMER:
Most of this article is not at all about FP. It is about increasing readability of certain common patterns which, when used unreadably, render FP unusable.
I'll talk a little about FP in the end of the article.
With respect to FP, java has the following problems:
- The infamous 'kingdom of nouns' (the tradition of considering everything to be an object or noun and thinking that to add two numbers, one has to ask a NumberAdder)
- Lack of a convenient syntax for constructing data, like tuples, lists etc
- Cumbersome syntax for generics and poor type inference and polymorphism implementation
Our weapons agains these problems shall be 'import static', varargs and a little bit of gunpowder.
1. Increasing readability of data construction
Very bad:
private static final Set<String> INTERESTING_TAGS = new HashSet<String>();
Map<String,Integer> WEIGHTS = new HashMap<String,Integer>
static {
INTERESTING_TAGS.add("A");
INTERESTING_TAGS.add("FORM");
INTERESTING_TAGS.add("INPUT");
INTERESTING_TAGS.add("SCRIPT");
INTERESTING_TAGS.add("OBJECT");
WEIGHTS.put("bad", -2);
WEIGHTS.put("poor", -1);
WEIGHTS.put("average", 0);
WEIGHTS.put("nice", 1);
WEIGHTS.put("outstanding", 3);
}
Bad:
private static final Set<String> INTERESTING_TAGS = new HashSet<String>(Arrays.asList(new String[] {
"A","FORM","INPUT","SCRIPT","OBJECT"
}));
for(double guessedScale : new double[] {0.01, 0.1, 1, 10, 100}) {
...
}
Good:
public class CollectionUtils {
public static T[] ar(T... ts) {return ts;}
public static Set<T> set(T... ts) {
return new HashSet<T>(Arrays.asList(ts));
}
public static Map<K,V> zipMap(K[] keys, V[] values) {...}
}
import static CollectionUtils.set;
import static CollectionUtils.ar;
private static final Set<String> INTERESTING_TAGS = set("A","FORM","INPUT","SCRIPT","OBJECT");
for(double guessedScale : ar(0.01, 0.1, 1, 10, 100)) {
...
}
Map<String,Integer> WEIGHTS = zipMap(
ar("bad", "poor", "average", "nice", "outstanding"),
ar(-2, -1, 0, 1, 3));
What can be easier? It's just the good old data abstraction that everyone remembers from the second chapter of SICP!
Its usefulness is particularly noticeable in unittests, where one has to test things on complex data structures.
In those cases the best decision is often to encode these structures in strings and to write a smallish parser for them.
For instance:
assertTrue(GraphAnalyzer.isConnected(graph("1->2 2->3 3->1")));
2. Increasing readability of combinators
Bad:
Filter f = new AndFilter(first, second);
Good:
public abstract class Filters {
public static Filter and(Filter a, Filter b) {return new AndFilter(a,b);}
}
import static Filters.*;
Filter f = and(first,second);
Even better:
public abstract class Filter {
Filter and(Filter other) {return Filters.and(this,other);}
}
Filter f = first.and(second).and(third);
And yet even better:
public abstract class Filters {
public static Filter ALWAYS_TRUE = new AlwaysTrue();
public static Filter and(Filter... filters) {
Filter res = ALWAYS_TRUE;
for(Filter f : filters) res = res.and(f);
return res;
}
}
Another example.
Bad:
enum StringComparisonKind {EXACT, REGEX, GLOB}
enum StringPosition {ANYWHERE, WHOLE_STRING, STARTS_WITH, ENDS_WITH}
public class StringCondition {
...
public StringCondition(String pattern, StringComparisonKind comparisonKind, StringPosition position) {...}
}
conditions.add(new StringCondition("foo", StringComparisonKind.REGEX, StringPosition.ANYWHERE))
Good:
import static StringComparisonKind.*;
import static StringPosition.*;
public class StringConditions {
public static regexWhole(String regex) {return new StringCondition(regex, REGEX, WHOLE_STRING);}
public static regexAnywhere(String regex) {return new StringCondition(regex, REGEX, ANYWHERE);}
public static exactWhole(String pattern) {return new StringCondition(pattern, EXACT, WHOLE_STRING);}
...
}
import static StringConditions.*;
conditions.add(regexAnywhere("foo"));
Abstraction, abstraction, abstraction - it is surprisingly underestimated and underused.
3. Increasing readability of anonymous classes
...by extracting them into constants, local variables, named classes or whichever named entity.
Bad:
List<Order> orders = CollectionUtils.flatten(CollectionUtils.map(customers, new Function<Customer, List<Order>>() {
public List<Order> apply(Customer customer) {
return customer.getOrders();
}
}));
Good:
import static CollectionUtils.*;
private static final Function<Customer, List<Order>> GET_ORDERS = new Function<Customer, List<Order>>() {
public List<Order> apply(Customer customer) {
return customer.getOrders();
}
};
List<Order> orders = flatten(map(customers, GET_ORDERS));
4. Increasing readability of combinator classes
Bad:
CustomerProcessor taxes = new ComputeTaxesCustomerProcessor();
These suffixes are absolutely useless. They could be useful if java didn't have packages or static typing: suffixes would prevent us from polluting the global namespace or from accidentally mixing one And with an other; but java has them, and suffixes are as useless as hungarian notation.
Good:
CustomerProcessor taxes = new ComputeTaxes();
5. Increasing readability of cumbersome generic types - poor man's typedef
Bad:
class FooEverythingDoer {
...
Map<String, String> getProperties(Foo foo) {...}
void putProperties(Foo foo, Map<String, String> properties) {...}
Map<Foo, Map<String, String>> getPropertiesBatch(Iterable<Foo> foos) {...}
Foo findByProperties(Map<String, String> partOfProperties) {...}
...
}
Good:
class Properties extends Map<String,String> {
(constructors matching super here)
}
class FooEverythingDoer {
...
Properties getProperties(Foo foo) {...}
void putProperties(Foo foo, Properties properties) {...}
Map<Foo, Properties> getPropertiesBatch(Iterable<Foo> foos) {...}
Foo findByProperties(Properties partOfProperties) {...}
...
}
Bad:
class MagicBarMerger {
public void mergeIntoDb(List<MagicBar> bars) {
List<MagicBar> existingBars = barDao.getAllBars();
Map<Integer, List<Pair<MagicBar, MagicBar>>> pairsById = joinOnId(bars, existingBars);
List<MagicBar> merged = new ArrayList<MagicBar>();
for(Pair<MagicBar, MagicBar> pair : pairsById) {
merged.add(merge(pair));
}
barDao.removeAllBars():
barDao.putBarsBatch(merged);
}
private Map<Integer, Pair<MagicBar, MagicBar>> joinOnId(List<MagicBar> as, List<MagicBar> bs) {
....
}
private MagicBar merge(Pair<MagicBar, MagicBar> bars) {
....
}
}
Good:
class MagicBarMerger {
private static class Bars extends Pair<MagicBar,MagicBar> {(constructor matching super)}
public void mergeIntoDb(List<MagicBar> bars) {
List<MagicBar> existingBars = barDao.getAllBars();
Map<Integer, Bars> pairsById = joinOnId(bars, existingBars);
List<MagicBar> merged = new ArrayList<MagicBar>();
for(Bars bars : pairsById) {
merged.add(merge(bars));
}
barDao.removeAllBars():
barDao.putBarsBatch(merged);
}
private Map<Integer, Bars> joinOnId(List<MagicBar> as, List<MagicBar> bs) {
....
}
private MagicBar merge(Bars bars) {
....
}
}
6. Increasing readability of generics again, utilizing whatever type inference java has
Bad:
Map<Integer, List<String>> namesById = new HashMap<Integer, List<String>>();
Good:
public class CollectionUtils {
public static <K,V> Map<K,V> newMap() {
return new HashMap<K,V>();
}
}
import static CollectionUtils.newMap;
Map<Integer, List<String>> namesById = newMap();
These are similar to ar(T... ts) and set(T... ts) of rule 1.
7. Increasing debuggability
Sometimes I feel a bit uneasy when, during debugging, I inspect an object just to find that is has class FooProcessor$3$1 and two fields, 'innerProcessor' and 'value' with a class of FooProcessor$4$2 and a value of 1.
Therefore it is useful to give meaningful names to all classes that escape scope of the method or enclosing class (most do, according to rule 3).
Giving them meaningful toString's is even better. They don't take much to write but they give a lot to read during debugging.
Higher-order combinators like AND and OR are often used to glue a statically unknown number of parts. Then one ends up with an object structure like AND(x, AND(y, AND(z, ...)))). Such structures are inconvenient to view in the debugger and are even more inconvenient to step through.
So, it is sometimes a good idea to make the AND combinator and alike glue not 2 but an arbitrary number of parts, and have the static factory method 'Filters.and(first,second)' (you did abstract the constructor, right? ;) ) check whether 'first' or 'second' are already AND's and, if so, glue them into a bigger one.
Then the recursive structure will turn into an nice iterative one, pleasing to be viewed in debugger, serialized to XML and to have its loop-over-parts stepped through.
****
Here goes some heavy FP artillery.
0. Read the J vocabulary ;)
This one: http://www.jsoftware.com/help/dictionar
And get a feeling of what the primitive functions and combinators do. You don't have to start writing in J or even to learn it, just read the descriptions of primitives.
J is an outstanding example of a functional language without statical typing and without even closures. This means that most J combinators can be implemented and used in java without stumbling over java's limitations.
My favourites are "&." (f &. g = f^-1 . g . f), "/." (x f/. y = a vector of f(g) where g ranges over groups of x grouped by y) and "/:" (x /: y - x vector, sorted by corresponding values of the y vector)
8. Currying the last argument
(Remember a rule of thumb: the last argument should be the most frequently varying one, then it will be convenient to curry)
Bad:
public class CollectionUtils {
public static <T> List<T> filter(Filter<T> filter, List<T> ts) {...}
public static <K, V> List<V> map(Function<K, V> fun, List<K> ts) {...}
}
Good:
public class CollectionUtils {
public static <T> Function<List<T>, List<T>> filter(Filter<T> filter) {...}
public static <K, V> Function<List<K>, List<V>> map(Function<K, V> fun) {...}
}
Why:
Because the combinators now are combineable.
public class FP {
public static <A,B,C> Function<A,C> chain(Function<A,B> first, Function<B,C> second) {...}
}
Or like this:
public abstract class Function<K,V> {
public abstract V apply(K argument);
public Function<K,U> then(Function<V,U> second) {return FP.chain(this, second);}
}
Now we can write things like:
Let Order be a tuple (Customer, Product, Time)
public abstract class Aggregate<K,V> extends Function<Collection<K>, V> {}
// select K,V from S group by K
public abstract class Partitioned<S,K,V> extends Function<Collection<S>, Map<K, V>> {}
Function<Order,Product> getProduct() {..}
Function<Product,Price> getPrice() {..}
Filter<Product> categoryEquals(String pat) {..}
Partitioned<Order,Month,T> byMonth(Aggregate<Order, T> inner) {..}
And, with that:
public Partitioned<Order,Month,Customer> mostGenerousCustomerByMonth() {
Function<Order, Price> ORDER_PRICE = getProduct().then(getPrice());
Aggregate<Order, Order> MOST_EXPENSIVE_ORDER =
over(getProduct().then(categoryEquals("Cars")), maximizeBy(ORDER_PRICE));
return byMonth(MOST_EXPENSIVE_ORDER.then(Order.GET_CUSTOMER));
}
{
...
Map<Month, Customer> goodGuysIn2007 = filter(timeBetween(year(2007), year(2008)))
.then(mostGenerousCustomerByMonth())
.apply(ourOrders);
...
}
9. Pairs and binary relations
Nearly every project has a home-brewn Pair class and uses lists like List<Pair<Foo,Bar>>
I personally like collecting values of a function 'foo' on the left parts of pairs, or collecting pairs of foo(left) and bar(right), or retaining only pairs whose right part satisfies predicate qux, or stuff.
God would kill a kitten if we didn't create some combinators for that and bless List<Pair<Foo,Bar>> with the ability of having them as member methods:
class BiRelation<L,R> {
...
List<Pair<L,R>> allEntries() {..}
static BiRelation<L,R> rel(List<Pair<L,R>> pairs) {..}
BiRelation<R,L> flip() {..}
BiRelation<L,R> filterL(Predicate<L> p) {..}
BiRelation<R,L> filterR(Predicate<R> p) {return flip().filterL(p).flip();}
BiRelation<L,List<R>> compressL() {..}
BiRelation<List<L>,R> compressR() {return flip().compressL().flip();}
BiRelation<P,Q> fmap(Function<L,P> f, Function<R,Q> g) {return rel(map(pair(f,g)).apply(allEntries()));}
List<T> map(Function<Pair<L,R>, T> fun) {...}
...
}
This also illustrates another cool thing, the "worker/wrapper transformation" (the flip()/.../flip() thing).
Now we can write a BFS:
private static BiRelation<Node, Node> bfs(BiRelation<Node, Node> graph, Node root) {
Set<Node> roots = singleton(root);
Set<Node> reached = new LinkedHashSet<Node>();
reached.add(root);
List<Pair<Node,Node>> res = newList();
while(true) {
BiRelation<Node,Node> nextLayer = graph.filterL(memberOf(roots)).filterR(not(memberOf(reached)));
if(nextLayer.isEmpty())
break;
res.addAll(nextLayer.allEntries());
reached.addAll(roots = nextLayer.rightSet());
}
return rel(res);
}
Or the Schwartz transform (used when sorting an array on the value of a function, in order to avoid computing the function multiple times for each element. The essence is that we glue a vector of function values to the array, sort the resulting vector of pairs on its second element and then unglue the function values):
public static Function<T,Pair<T,U>> attach(Function<T,U> fun) {..}
public static Function<Pair<T,U>,T> detachR() {..}
public static Function<Pair<T,U>,U> detachL() {..}
public static <U extends Comparable<? super U>>
Function<List<T>,List<T>> schwartzSortBy(Function<T,U> fun)
{
return map(attach(fun)).then(sortBy(second())).then(map(detachR()));
}
Or this one:
BiRelation<Customer, Order> customer_order = ... BiRelation<Product, Integer> product_buyersCount = customer_order.fmap(ID, GET_PRODUCT).compressR().fmap(SIZE, ID).flip();
Or many other things.
flip() is my favourite; many things are symmetric: pairs, relations, invertible functions, Map's etc.
Ternary relations can be rotated: TriRelation<A,B,C>.rotate213() etc. However, their use quickly gets mind-boggling.
10. Coarse combinators
Some processors use the VLIW architecture (Very Large Instruction Word). One instruction contains several actions: for example, multiply-and-add-to-accumulator, divide-and-compute-square-root, send-email-and-switch-screen-resolution :) This is done to improve instruction-level parallelism.
In the case of java FP we can utilize a similar trick to increase readability and decrease parentheses; maybe also for performance: a coarse combinator can be implemented more effectively than through the plain combination of primitive combinators.
Instead of this:
Aggregate<Customer, Order> everyonesOrdersIn2008 = map(GET_ORDERS).then(concat()).then(filter(timeBetween(year(2007), year(2008))))
Use this:
public static <K,V> Aggregate<K,V> concatMapOver(Function<K,V> fun, Filter<K> filter) {...}
Aggregate<Customer, Order> everyonesOrdersIn2008 = concatMapOver(GET_ORDERS, timeBetween(year(2007), year(2008)));
That's it.