The Split-Apply-Combine Strategy

Robin Gower, Infonomics

2015-08-24 Manchester R

Why care about strategies?

Strategies allow you to:

  • Leverage common tools
  • Focus on the unique aspects of your problem
  • Find a formulation that clearly expresses your intent

A brief description of the strategy

  • Split the problem up into manageable pieces
  • Apply a function on each piece independently
  • Combine all of the pieces together again
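The three steps can be sketched in a few lines. This is a minimal illustration in plain Python rather than any particular library's API: the `records` data and the helper names `split`, `apply_to_pieces` and `combine` are invented for this example, and the function applied to each piece (a mean) is just one possible choice.

```python
from collections import defaultdict
from statistics import mean

# Hypothetical (group, value) records, invented for illustration.
records = [("a", 1.0), ("b", 4.0), ("a", 3.0), ("b", 6.0), ("c", 5.0)]


def split(rows):
    """Split: bucket the values by their group key."""
    pieces = defaultdict(list)
    for key, value in rows:
        pieces[key].append(value)
    return dict(pieces)


def apply_to_pieces(pieces, func):
    """Apply: run func on each piece independently of the others."""
    return {key: func(values) for key, values in pieces.items()}


def combine(results):
    """Combine: gather the per-piece results into one sorted table."""
    return sorted(results.items())


group_means = combine(apply_to_pieces(split(records), mean))
print(group_means)  # [('a', 2.0), ('b', 5.0), ('c', 5.0)]
```

Swapping `mean` for another function (for example `max` or `len`) changes only the apply step; the split and combine steps stay the same.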

...or in a thousand words

[Figure: split-apply-combine diagram]

Why not use for loops?

  • Loops are harder to read: the alternative lets you concentrate on what, not how
  • Loops need bookkeeping (indices, placeholders, edge cases)
  • Loops are hard to parallelise
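A small comparison may make these points concrete. The sketch below computes a per-group total twice, once with an index-based loop and once by splitting first. The `rows` data is invented for illustration, and the thread pool is only there to show that independent pieces can be handed to any executor; it is an assumption of this sketch, not a claim about speed.

```python
from concurrent.futures import ThreadPoolExecutor
from itertools import groupby
from operator import itemgetter

# Hypothetical (group, value) rows, invented for illustration.
rows = [("a", 1), ("b", 4), ("a", 3), ("b", 6), ("c", 5)]

# Loop version: the intent (a total per group) is buried in bookkeeping.
loop_totals = {}
for i in range(len(rows)):              # index bookkeeping
    key = rows[i][0]
    if key not in loop_totals:          # edge case: first row of a group
        loop_totals[key] = 0            # placeholder to accumulate into
    loop_totals[key] += rows[i][1]

# Split: groupby only merges adjacent rows, so sort by key first.
ordered = sorted(rows, key=itemgetter(0))
pieces = {
    key: [value for _, value in group]
    for key, group in groupby(ordered, key=itemgetter(0))
}

# Apply and combine: one expression that states what is computed.
grouped_totals = {key: sum(values) for key, values in pieces.items()}

# Because each piece is independent, the apply step can be mapped over
# a pool of workers without changing the split or combine steps.
with ThreadPoolExecutor(max_workers=2) as pool:
    parallel_totals = dict(zip(pieces, pool.map(sum, pieces.values())))

print(loop_totals == grouped_totals == parallel_totals)  # True
```

The loop interleaves splitting, applying and combining in one body, which is why it is awkward to distribute: every iteration updates the same shared dictionary.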

Tools for doing split-apply-combine

  • Excel: pivot tables
  • SQL: group by
  • MapReduce
  • R: plyr & dplyr
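As one example of these tools, SQL's GROUP BY performs the split, an aggregate function such as AVG is the apply, and the result set is the combine. The sketch below runs such a query through Python's built-in `sqlite3` module against an in-memory database; the `measurements` table and its contents are invented for illustration.

```python
import sqlite3

# Hypothetical table and rows, invented for illustration.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE measurements (grp TEXT, value REAL)")
conn.executemany(
    "INSERT INTO measurements VALUES (?, ?)",
    [("a", 1.0), ("b", 4.0), ("a", 3.0), ("b", 6.0), ("c", 5.0)],
)

# GROUP BY splits the rows, AVG is applied to each group,
# and the result set combines one row per group.
sql_means = conn.execute(
    "SELECT grp, AVG(value) FROM measurements GROUP BY grp ORDER BY grp"
).fetchall()
conn.close()

print(sql_means)  # [('a', 2.0), ('b', 5.0), ('c', 5.0)]
```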