Geoff's hangout on the interwebs about stuff I like and do…

28Mar/13

The apply function in R

As discussed in this post, I will be investigating the different members of the 'apply function family' in R. This post starts with the most basic one, apply().

The R manual states the following

apply(X, MARGIN, FUN, ...)

With the following arguments

X an array, including a matrix.
MARGIN a vector giving the subscripts which the function will be applied over. E.g., for a matrix 1 indicates rows, 2 indicates columns, c(1, 2) indicates rows and columns. Where X has named dimnames, it can be a character vector selecting dimension names.
FUN the function to be applied: see ‘Details’. In the case of functions like +, %*%, etc., the function name must be backquoted or quoted.
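To make the MARGIN and FUN arguments concrete, here is a minimal sketch on a tiny matrix. The matrix m and its row and column names are made up purely for illustration; only apply() itself comes from the manual text above.

```r
# A small 2 x 3 matrix, invented for this illustration
m <- matrix(1:6, nrow = 2,
            dimnames = list(c("r1", "r2"), c("a", "b", "c")))

# MARGIN = 1: FUN receives each row as a vector
row_sums <- apply(m, 1, sum)

# MARGIN = 2: FUN receives each column as a vector
col_sums <- apply(m, 2, sum)

# MARGIN = c(1, 2): FUN receives each individual cell
times_ten <- apply(m, c(1, 2), function(x) x * 10)

# An operator used as FUN has to be quoted (or backquoted);
# the extra argument 1 is handed on to "-", so each column becomes column - 1
minus_one <- apply(m, 2, "-", 1)
```

Printing row_sums and col_sums shows one value per row and per column respectively, while times_ten and minus_one keep the shape of m.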

So what does this mean in practice? 

Basically, it means that you can apply a standard function (e.g. mean, sum) or a user-written function to the elements of each row and/or column of the array X, as selected by the MARGIN argument. For a matrix, MARGIN is:

  • 1 if you want to calculate the FUN across all elements for each row
  • 2 if you want to calculate the FUN across all elements for each column
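The same two MARGIN values work with a user-written function. The sketch below is only an illustration: the scores matrix and the spread() helper are hypothetical names invented here, not part of R or of the dataset used later in this post.

```r
# Made-up scores: 3 students (rows) by 2 tests (columns)
scores <- matrix(c(70, 85, 90,
                   80, 95, 60),
                 nrow = 3,
                 dimnames = list(c("ann", "bob", "cy"), c("test1", "test2")))

# Hypothetical helper: gap between the largest and smallest value
spread <- function(x) max(x) - min(x)

# MARGIN = 1: one result per row (per student)
spread_per_student <- apply(scores, 1, spread)

# MARGIN = 2: one result per column (per test)
spread_per_test <- apply(scores, 2, spread)
```

In both calls the result is a named vector, with the names taken from the rows or columns that were processed.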

Example

To illustrate the different applications of the apply() function I will make use of the USPersonalExpenditure dataset. So first I am going to load this data by using the data() function.

data(USPersonalExpenditure)

This data set consists of United States personal expenditures (in billions of dollars) in five categories: food and tobacco, household operation, medical and health, personal care, and private education, for the years 1940, 1945, 1950, 1955 and 1960.

And it looks like this:

                      1940   1945  1950 1955  1960
Food and Tobacco    22.200 44.500 59.60 73.2 86.80
Household Operation 10.500 15.500 29.00 36.5 46.20
Medical and Health   3.530  5.760  9.71 14.0 21.10
Personal Care        1.040  1.980  2.45  3.4  5.40
Private Education    0.341  0.974  1.80  2.6  3.64

Now let's assume we are interested in the total expenditure per year. I can sum the values for a single column by doing

sum(USPersonalExpenditure[,1])

However, this only covers the first column (1940), and I want it for all years, so this is where apply comes in. Because we want to apply the sum function across all values in a column, for each column,

apply(USPersonalExpenditure,2,sum)

is all we need to do. To produce the same result with a for loop, you would need something like:

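# Preallocate the result, then sum each column in turn
a <- numeric(ncol(USPersonalExpenditure))
for (i in seq_len(ncol(USPersonalExpenditure))) {
  a[i] <- sum(USPersonalExpenditure[, i])
}
# Carry over the year labels so the result matches apply()
names(a) <- colnames(USPersonalExpenditure)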

As you can see, it takes many more lines to get the same result.
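If you want to convince yourself that both approaches really agree, a quick check along these lines should do. The variable names totals_apply and totals_loop are my own; colSums() is a base R shortcut for this particular case that the post does not otherwise cover.

```r
data(USPersonalExpenditure)

# Column totals via apply()
totals_apply <- apply(USPersonalExpenditure, 2, sum)

# Column totals via an explicit loop
totals_loop <- numeric(ncol(USPersonalExpenditure))
for (i in seq_len(ncol(USPersonalExpenditure))) {
  totals_loop[i] <- sum(USPersonalExpenditure[, i])
}
names(totals_loop) <- colnames(USPersonalExpenditure)

# Compare the two results (all.equal allows for tiny rounding differences)
same_as_loop <- isTRUE(all.equal(totals_apply, totals_loop))

# For plain sums over columns, base R also offers colSums()
same_as_colsums <- isTRUE(all.equal(totals_apply, colSums(USPersonalExpenditure)))
```

Both same_as_loop and same_as_colsums should come out TRUE, and the result keeps the years as names.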

If we want the average spend per category across the five years in the matrix, we get it with

apply(USPersonalExpenditure,1,mean)
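The ... in the apply() signature passes extra arguments on to FUN. As a small sketch, assume one value were missing: the copy x and the NA placed in it are invented here for illustration, since the real dataset has no missing values.

```r
data(USPersonalExpenditure)

# Work on a copy and blank out one cell, purely for illustration
x <- USPersonalExpenditure
x["Personal Care", "1950"] <- NA

# Without extra arguments, the row containing the NA averages to NA
means_with_na <- apply(x, 1, mean)

# na.rm = TRUE is handed on to mean(), so the NA is skipped for that row
means_na_removed <- apply(x, 1, mean, na.rm = TRUE)
```

Only the "Personal Care" entry differs between the two results; the other categories are unaffected.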

This ends my first tutorial. For questions, remarks, etc., please feel free to leave a comment below or contact me through @geoffrey_stoel on Twitter or on Google+.

