By bundling all of the parallel packages (doSNOW, doParallel, doSMP, foreach, and iterators) and embedding the Intel MKL (Math Kernel Library), Revolution has achieved a 3.6x speedup on a matrix-operations benchmark and 2.5x on the Simon Urbanek benchmark. Simon is an R-core developer.
Just to put this in perspective: I have put all of the parallel packages together myself and use doRedis for parameter optimization. The best I got out of that was about a 60% speedup over sequential single-core execution, or 1.6x. That was on a single machine running six local doRedis workers. So they must have done a bang-up job. Definitely going to be looking into this.
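Speedups and time savings are easy to mix up when comparing figures like these. Speedup S and the fraction of wall-clock time saved r are related by S = 1 / (1 - r). The two helper functions below (hypothetical names, written only for illustration) convert in either direction so the 1.6x, 2.5x, and 3.6x figures can be read on the same scale. This is plain arithmetic, not anything from Revolution's or doRedis's tooling.

```python
def speedup_from_reduction(r):
    """Speedup S = T_seq / T_par when wall-clock time is cut by fraction r."""
    if not 0 <= r < 1:
        raise ValueError("reduction must be in [0, 1)")
    return 1.0 / (1.0 - r)


def reduction_from_speedup(s):
    """Fraction of wall-clock time saved for a speedup S (S > 0)."""
    if s <= 0:
        raise ValueError("speedup must be positive")
    return 1.0 - 1.0 / s


# Compare the figures mentioned above on the same scale.
for s in (1.6, 2.5, 3.6):
    print(f"{s}x speedup -> {reduction_from_speedup(s):.1%} less wall-clock time")
print(f"60% less time -> {speedup_from_reduction(0.60):.1f}x speedup")
```

By this arithmetic, a 1.6x speedup saves 37.5% of the wall-clock time, while cutting the time by 60% would correspond to a 2.5x speedup.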