Switching from Matlab to R: Part 1

Introduction

I was thinking recently about how best to help someone transitioning
from Matlab(TM) to R, and did my best to recall what sorts of things I
struggled with when I made the switch. Though I resisted for quite a
while, when I finally committed to making the change I recall that it
mostly happened in a matter of weeks. It helped that my thesis
supervisor exclusively used R, and we were working on code for a paper
together at the time, but in the end I found that the switch was
easier than I had anticipated.

Tips

  1. Don’t be afraid of the assignment operator, <-. It means exactly the
    same thing as = does in matlab, as in

a <- 1:10 # in matlab a=1:10;

except that it makes more logical sense.

The only places you should use an equals sign are in logical
comparisons, which use the doubled form, as in a == b (as in matlab),
or for specifying argument values in a function call (see number 5).
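
A small sketch of why the distinction matters, using only base R (the variable names are made up for illustration). Inside a function call, = only names an argument, while <- really does create a variable:

```r
# '=' inside a call names the argument x of mean(); no variable x is created
m1 <- mean(x = 1:10)
x_exists_before <- exists("x", inherits = FALSE)

# '<-' inside a call assigns x in the workspace, then passes it to mean()
m2 <- mean(x <- 1:10)
x_exists_after <- exists("x", inherits = FALSE)
```

Both calls return the same mean, but only the second one leaves an x behind.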

  2. Vectors are truly 1 dimensional. This is different from matlab,
    where an Nx1 and a 1xN vector have different shapes and cannot
    simply be added element by element. In R they would be just two
    vectors of length N. The transpose in R is done with t(); note that
    t() of a plain vector (class numeric) returns a 1xN matrix, because
    the vector itself has no row or column orientation.
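
A quick check in base R of how plain vectors and t() behave (the names v, tv, col_vec and s are just for illustration):

```r
v <- 1:5
dim(v)                          # NULL: a plain vector has no dimensions
length(v)                       # 5

tv <- t(v)                      # t() promotes the vector to a 1 x 5 matrix
dim(tv)

col_vec <- matrix(v, ncol = 1)  # an explicit 5 x 1 column, if you need one
dim(col_vec)

s <- v + v                      # two length-5 vectors add element by element
```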

  3. Array indices use square brackets, like

a[1:5] <- 2 # assign the value 2 to the first 5 indices of a

This is one of the things that drove me crazy about matlab: it uses ()
for indices as well as function arguments, which makes code that mixes
array indexing and function calls very confusing to look at and
interpret.
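
As a rough illustration (the values are made up), here is how indexing and function calls stay visually distinct in R:

```r
a <- seq(10, 100, by = 10)   # 10, 20, ..., 100
a[1:5] <- 2                  # square brackets: indexing
total <- sum(a[6:10])        # parentheses: function call; in matlab sum(a(6:10))
big <- a[a > 50]             # logical indexing also uses square brackets
```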

  4. By default arithmetic operations are done element-wise. If you have
    two MxN matrices (say A and B), and you do C <- A*B, every
    element in C is the product of the corresponding elements in A and
    B. No need to do the .* stuff as in matlab. To get matrix
    multiplication, you use the %*% operator.
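
A minimal sketch of the two kinds of product, using small made-up matrices:

```r
A <- matrix(1:4, nrow = 2)   # filled column-wise: rows are (1, 3) and (2, 4)
B <- diag(2)                 # 2 x 2 identity matrix

C_elem <- A * B              # element-wise, like A.*B in matlab
C_mat  <- A %*% B            # matrix product, like A*B in matlab
```

Because B is the identity, C_mat equals A, while C_elem keeps only the diagonal of A.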

  5. Function arguments are named, so the order isn’t super
    important. If you don’t name them, then you have to give them in
    the order they appear (type ? followed by the function name,
    e.g. ?hist, to see the help page). For example if a function took
    arguments like:

foo <- function(a, b, c, type, bar) {
# function code here
}

You could call it with something like:

junk <- foo(1, 2, bar = "whatever")

where a and b are given the values of 1 and 2, and c and type
are left unspecified. This would be equivalent:

junk <- foo(a = 1, b = 2, bar = "whatever")

You could also do:

junk <- foo(bar = "whatever", a = 1, b = 2)
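
To make this concrete, here is a runnable sketch with a hypothetical body for foo() (the body is my own invention, just to show which arguments arrived). The base function missing() reports whether an argument was supplied:

```r
# Hypothetical body: echo a, b and bar, and report whether c and type were given
foo <- function(a, b, c, type, bar) {
  list(a = a, b = b, bar = bar,
       c_missing = missing(c), type_missing = missing(type))
}

r1 <- foo(1, 2, bar = "whatever")          # positional a and b, named bar
r2 <- foo(bar = "whatever", a = 1, b = 2)  # all named, in a different order
same <- identical(r1, r2)
```
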

  6. No semicolons needed (except where you’d like to have more than one
    operation per line, like a <- 1; b <- 2).

  7. In R, the equivalent to a matlab structure is called a
    “list”. Instead of separating the levels with a ., it is
    generally done with a $. So accessing a nested element of a list
    could look something like:

a <- junk$stuff$whatever

Use the str() command to look at the structure of a list object.
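
A small sketch of building and reading a nested list (the field names below are made up to match the example above):

```r
junk <- list(stuff = list(whatever = 1:3, label = "demo"), n = 10)

a  <- junk$stuff$whatever              # like junk.stuff.whatever in matlab
a2 <- junk[["stuff"]][["whatever"]]    # the same element, by name in brackets

junk$stuff$extra <- TRUE               # assigning to a new name adds a field
str(junk)                              # prints the nested structure
```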

  8. Most functions that return more than just a single value will
    return a list. Unlike matlab, there isn’t a simple way of returning
    separate values to separate variables, like [a, b] = foo('bar').
    For example, using the histogram function:

a <- rnorm(1000)
h <- hist(a)

[Figure: histogram of a, as drawn by hist(a)]

str(h)
## List of 6
## $ breaks : num [1:16] -4 -3.5 -3 -2.5 -2 -1.5 -1 -0.5 0 0.5 ...
## $ counts : int [1:15] 1 1 3 24 47 80 147 186 206 134 ...
## $ density : num [1:15] 0.002 0.002 0.006 0.048 0.094 0.16 0.294 0.372 0.412 0.268 ...
## $ mids : num [1:15] -3.75 -3.25 -2.75 -2.25 -1.75 -1.25 -0.75 -0.25 0.25 0.75 ...
## $ xname : chr "a"
## $ equidist: logi TRUE
## - attr(*, "class")= chr "histogram"

If I wanted to extract something from that I could use

b <- h$breaks

If you really only want one thing out of the list, you could do
something like

b <- hist(a, plot = FALSE)$breaks
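
The same pattern works for your own functions. As a sketch, with a hypothetical helper range_stats() standing in for the matlab-style [lo, hi] = foo(x):

```r
# Hypothetical helper: bundle two results into one named list
range_stats <- function(x) {
  list(lo = min(x), hi = max(x))
}

res <- range_stats(c(3, 1, 4, 1, 5))
lo <- res$lo   # pull the pieces out by name
hi <- res$hi
```
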

  9. You can use a . in variable and function names, but I don’t
    recommend that you do. Often a function with a . in its name is a
    method that applies a “generic” operation to a specific class. For
    example, the plot() function is a straightforward way of plotting
    data, much like in matlab. However, there exist lots of variants of
    plot for different classes, which are usually named
    plot.class(). E.g. for the histogram object I created above, if I
    want to plot it, I can just do

h2 <- hist(a, plot = FALSE, breaks = 100)
plot(h2, main = "A plot with more breaks")

[Figure: histogram of a with more breaks, titled "A plot with more breaks"]

and it will plot it as a histogram, using the method
plot.histogram() that the generic plot() function selects for that
class, as well as accept the arguments appropriate to that method.
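
The same mechanism is easy to try with a class of your own. A minimal sketch, assuming a made-up class called "reading" and a print method for it:

```r
# A made-up object: a list tagged with the class "reading"
r <- structure(list(value = 3.2, units = "m"), class = "reading")

# Naming the function generic.class makes it the method for that class
print.reading <- function(x, ...) {
  cat("Reading:", x$value, x$units, "\n")
  invisible(x)
}

print(r)   # the generic print() dispatches to print.reading()
```

Calling print(r) never mentions print.reading() by name; R picks it based on class(r).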

Thoughts on topics for future editions of matlab2R

  • plotting, including:
      • points, lines, styles, etc
      • “image”-style plots, contours, filled contours, colormaps, etc

  • POSIX times vs Matlab datenum

  • … suggestions in comments?

3 thoughts on “Switching from Matlab to R: Part 1”

  1. Alis

    Nice post! I am currently transitioning from Matlab to R and find it difficult to load and manipulate matrices. The whole story of switching back and forth from matrix to list to data frame is very confusing. If you could shed some light on this it would be helpful 🙂

    1. clarkrichards Post author

      Hi Alis! Thanks for the comment. I’d love to help shed some light on working with matrices in R vs Matlab, and it would make a good topic for Part 2 (which obviously I’ve been procrastinating on!). Are there any specific things that have you hung up?

      1. Alis

        Hi Clark!
        OK, I have several cases with increasing difficulty. I am dealing with datasets consisting of spectra.
        Case 1 (easy). Let’s say I have a text file with 100 spectra (rows are number of samples, columns are spectral variables). I import it with read.table. Then I have corresponding Y-values (concentrations), in another text file. I import again with read.table. I attach the two objects, to have them together, now I have a list (without names). The goal is to perform some kind of regression analysis on this array of spectra, and this is quite easy. Then I want to also be able to choose which samples (e.g. rows) to use for the regression. Should I then convert the list to a data frame again?…Should I include a column with spectra numbers?
        Case 2. I want to do precisely the same as above (regression on my 100 spectra and 100 concentrations). What if I have them this time separately in individual files for each sample? Then I have to read them in a loop and append them somehow to make a list or data frame, and create names for each sample number and each variable, I guess. Or is there a more elegant way to do it?
        Case 3. Similar to case 2, but this time let’s say I have 10 separate files with 100 spectra in each. Now I want to import each file, average the values row-wise and use that for the regression analysis. So I will end up with 10 samples, each of them an average of 100. Again, the same problem – should I use data frame, list, should I always provide names for the spectral variable, or factors for the number of samples.

        These are the questions that have me a bit stuck at the moment. I guess these are very basic questions, but probably everybody goes through this in the beginning… After all, matlab is designed to deal very easily with arrays, and that’s the beauty of it.
