Geoff's hangout on the interwebs about stuff I like and do…

28Mar/13

The apply function in R

As discussed in this post, I will be investigating the different members of the 'apply function family' in R. This post starts with the most basic one, called apply().

The R manual states the following

apply(X, MARGIN, FUN, ...)

With the following arguments

X an array, including a matrix.
MARGIN a vector giving the subscripts which the function will be applied over. E.g., for a matrix 1 indicates rows, 2 indicates columns, c(1, 2) indicates rows and columns. Where X has named dimnames, it can be a character vector selecting dimension names.
FUN the function to be applied: see ‘Details’. In the case of functions like +, %*%, etc., the function name must be backquoted or quoted.
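
To make the remark about quoted function names concrete, here is a minimal sketch. The matrix m and the value 10 are made up for illustration; the 10 is handed to the function through the ... argument.

```r
# A small 2 x 2 matrix, invented purely for this illustration
m <- matrix(1:4, nrow = 2)

# An operator such as + must be quoted or backquoted when passed as FUN.
# The extra argument (10) is forwarded to the operator via '...'.
plus_quoted     <- apply(m, 2, "+", 10)
plus_backquoted <- apply(m, 2, `+`, 10)

# Both calls add 10 to every column, so they match m + 10
all(plus_quoted == m + 10)
identical(plus_quoted, plus_backquoted)
```

Both spellings should give the same result; which one to use is a matter of taste.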

So what does this mean in practice? 

Basically it means that the user can apply a standard function (e.g. mean, sum, etc.) or a user-written function to each row and/or column of the array X, as set in the MARGIN argument. The function receives all elements of that row or column as one vector. For a matrix, the MARGIN argument is:

  • 1 if you want to calculate the FUN across all elements for each row
  • 2 if you want to calculate the FUN across all elements for each column
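
A minimal sketch of the two settings, using a small made-up matrix m (not part of the dataset used later) so the numbers are easy to check by hand:

```r
# A small 2 x 3 matrix, invented purely for this illustration
m <- matrix(1:6, nrow = 2)
#      [,1] [,2] [,3]
# [1,]    1    3    5
# [2,]    2    4    6

# MARGIN = 1: the function sees one row at a time
row_totals <- apply(m, 1, sum)   # 9 12

# MARGIN = 2: the function sees one column at a time
col_totals <- apply(m, 2, sum)   # 3 7 11

# MARGIN = c(1, 2): the function sees one cell at a time,
# so the result keeps the shape of the original matrix
doubled <- apply(m, c(1, 2), function(x) x * 2)
```

The result has one value per row for MARGIN = 1 and one value per column for MARGIN = 2.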

Example

To illustrate the different applications of the apply() function I will make use of the USPersonalExpenditure dataset. So first I am going to load this data by using the data() function.

data(USPersonalExpenditure)

This data set consists of United States personal expenditures (in billions of dollars) in the categories food and tobacco, household operation, medical and health, personal care, and private education for the years 1940, 1945, 1950, 1955 and 1960.

And it looks like this:

                      1940   1945  1950 1955  1960
Food and Tobacco    22.200 44.500 59.60 73.2 86.80
Household Operation 10.500 15.500 29.00 36.5 46.20
Medical and Health   3.530  5.760  9.71 14.0 21.10
Personal Care        1.040  1.980  2.45  3.4  5.40
Private Education    0.341  0.974  1.80  2.6  3.64

Now let's assume we are interested in the total expenditure per year. I can sum the values of a single column by doing

sum(USPersonalExpenditure[, 1])

However, this covers only the first column (1940), and I want the total for every year. This is where apply() comes in. Because we want to apply the sum function across all values in a column, for each column,

apply(USPersonalExpenditure, 2, sum)

is all we need to do. An equivalent written with a for loop would look something like this:

# Preallocate the result, fill it column by column, then name it
a <- numeric(ncol(USPersonalExpenditure))
for (i in seq_len(ncol(USPersonalExpenditure))) {
  a[i] <- sum(USPersonalExpenditure[, i])
}
names(a) <- colnames(USPersonalExpenditure)

As you can see, the loop takes many more lines to get the same result.
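
As a side note, for the specific case of column sums base R also has a dedicated shortcut, colSums() (with rowSums(), colMeans() and rowMeans() as siblings). A small sketch to check that it agrees with the apply() call; the variable names totals_apply and totals_short are just my own:

```r
data(USPersonalExpenditure)

# Total expenditure per year via apply()
totals_apply <- apply(USPersonalExpenditure, 2, sum)

# The same totals via the built-in shortcut
totals_short <- colSums(USPersonalExpenditure)

# Compare the two named vectors (should be TRUE)
all.equal(totals_apply, totals_short)
```

The shortcuts only cover sums and means, so apply() remains the general tool for any other function.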

If we want the average spend per category across the 5 years in the matrix, we get it through

apply(USPersonalExpenditure, 1, mean)
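
The same pattern works with a user-written function. As a rough sketch, the helper growth_factor below is something I made up for this example: it divides the last value of a row by the first, which here gives the growth in spend per category between 1940 and 1960.

```r
data(USPersonalExpenditure)

# Hypothetical helper: ratio of the last to the first element of a vector
growth_factor <- function(x) {
  x[length(x)] / x[1]
}

# MARGIN = 1, so the helper is called once per category (row)
growth <- apply(USPersonalExpenditure, 1, growth_factor)

# The result is a vector named after the categories
names(growth)
```

Any function that takes a vector can be dropped in the same way, including an anonymous one written inline.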

This ends my first tutorial. For questions/remarks/etc. please feel free to leave comments below or contact me through @geoffrey_stoel on twitter or on google+


28Mar/13

Moving up in the ranks: from an R-Rookie to an R-Pro

I have been playing with R for a little over a year now. Not very intensively, but once in a while I start up RStudio and do some coding and analysis. I am still far, far away from becoming an R-Pro. If you talk to more seasoned R users or read some of their posts, it seems that one of the major steps an R-Rookie can make is using the 'apply' family of functions instead of for-loops. It seems to be more efficient and faster. I have been trying out some of these apply functions, with a lot of struggles, and sometimes I jumped back to the for-loop because I could not use them in the right manner.

Ever since, it has been on my 'someday/maybe' list to develop a better understanding of these functions and to document them for myself in such a manner that I understand them and can apply them in the future. So that is the quest that I am on for the next couple of weeks. During this quest I will be posting updates on this blog to share my steps and basically build a set of tutorials about the apply functions.

I will start with the normal apply() function and then move on to lapply(), sapply(), etc. from the base R package (I still have to think about the right order though). Afterwards I will have a look at the plyr package by Hadley Wickham.

I posted a request for input on this subject on google+, Twitter and LinkedIn, and I received interesting and relevant feedback (and confirmation that I am not the only one struggling to understand the apply functions). See you soon in my first post about the apply() function.

