Switching from Matlab to R: Part 1

Introduction

I was thinking recently about how best to help someone transitioning
from Matlab(TM) to R, and did my best to recall what sorts of things I
struggled with when I made the switch. Though I resisted for quite a
while, when I finally committed to making the change I recall that it
mostly happened in a matter of weeks. It helped that my thesis
supervisor exclusively used R, and we were working on code for a paper
together at the time, but in the end I found that the switch was
easier than I had anticipated.

Tips

  1. Don’t be afraid of the assignment operator, <-. It does exactly
    what = does in Matlab, as in

a <- 1:10 # in Matlab: a = 1:10;

except that it makes more logical sense.

The only places you should type an = are in logical comparisons such
as a == b (as in Matlab) and when specifying argument values in a
function call (see number 5).
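
To make the distinction concrete, here is a minimal sketch. The function `square()` and the variable names are made up for illustration. Inside a call, `=` only matches an argument by name, whereas `<-` also creates a variable as a side effect.

```r
# Hypothetical helper, used only for this illustration.
square <- function(side_len) side_len^2

# `=` inside a call only matches the argument `side_len`;
# no variable called `side_len` is created in the workspace.
r1 <- square(side_len = 3)

# `<-` inside a call assigns `tmp_input` in the calling environment
# and then passes its value as the first argument.
r2 <- square(tmp_input <- 4)
```

After running this, `tmp_input` exists and holds 4, which is rarely what you want inside a function call.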

  2. Vectors are truly one-dimensional. This differs from Matlab,
    where an Nx1 and a 1xN vector are different objects that could not
    simply be added together. In R they would be just two vectors of
    length N. The transpose in R is done with t(). Note that calling
    t() on a plain vector (class numeric) returns a 1xN matrix rather
    than the same vector, so it is rarely needed for vectors.
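
A short sketch of what this looks like in practice (the variable names are arbitrary). Note that `t()` applied to a plain vector returns a one-row matrix, and a true column has to be built explicitly with `matrix()`.

```r
v <- 1:5

# A plain vector has a length but no dimensions.
v_dim <- dim(v)        # NULL
v_len <- length(v)     # 5

# Two vectors of the same length add element-wise; there is no
# row-versus-column mismatch to worry about.
w <- v + rev(v)

# t() promotes the vector to a matrix with one row.
tv <- t(v)
tv_dim <- dim(tv)

# An explicit column "vector" is really a matrix with one column.
col_mat <- matrix(v, ncol = 1)
```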

  3. Array indices use square brackets, like

a[1:5] <- 2 # assign the value 2 to the first 5 indices of a

This is one of the things that drove me crazy about matlab, that it
used () for indices as well as function arguments. It makes mixed
array indexing and function calls very confusing to look at and
interpret.

  4. By default arithmetic operations are done element-wise. If you have
    two MxN matrices (say A and B), and you do C <- A*B, every
    element in C is the product of the corresponding elements in A and
    B. No need to do the .* stuff as in Matlab. To get matrix
    multiplication, you use the %*% operator.
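
As a small illustration with two made-up 2x2 matrices, the same pair of operands gives different results under `*` and `%*%`:

```r
# matrix() fills column by column, so A is
#   1 3
#   2 4
A <- matrix(1:4, nrow = 2)
B <- matrix(5:8, nrow = 2)

# Element-wise product (what .* does in Matlab).
C_elem <- A * B

# True matrix product (what * does in Matlab).
C_mat <- A %*% B
```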

  5. Function arguments are named, so the order isn’t super
    important. If you don’t name them, then you have to give them in
    the order they appear (do ?functionname to see the help page). For
    example if a function took arguments like:

foo <- function(a, b, c, type, bar) {
# function code here
}

You could call it with something like:

junk <- foo(1, 2, bar = "whatever")

where a and b are given the values of 1 and 2, and c and type
are left unspecified. This would be equivalent:

junk <- foo(a = 1, b = 2, bar = "whatever")

You could also do:

junk <- foo(bar = "whatever", a = 1, b = 2)
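
To check that these three calls really are equivalent, here is a runnable sketch in which `foo()` is given a trivial, made-up body. Because R evaluates arguments lazily, leaving out `c` and `type` causes no error as long as the body never uses them.

```r
# Same signature as above; the body is a placeholder for illustration.
foo <- function(a, b, c, type, bar) {
  # `c` and `type` are never used here, so they may be left unspecified.
  paste(a, b, bar)
}

j1 <- foo(1, 2, bar = "whatever")           # positional a and b
j2 <- foo(a = 1, b = 2, bar = "whatever")   # everything named
j3 <- foo(bar = "whatever", a = 1, b = 2)   # named, in a different order
```

All three results are the same string.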

  6. No semicolons needed (except where you’d like to have more than one
    operation per line, like a <- 1; b <- 2).

  7. In R, the equivalent of a Matlab structure is called a
    “list”. Instead of separating the levels with a ., it is
    generally done with a $. So extracting an element from a nested
    list could look something like:

a <- junk$stuff$whatever

Use the str() command to look at the structure of a list object.
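
For a self-contained version of the above, here is a sketch that builds a small nested list first. The element names and values are invented for illustration.

```r
# A nested list, playing the role of a Matlab struct.
junk <- list(
  stuff = list(whatever = 42, label = "demo"),
  n = 3L
)

# Extract a nested element with $ ...
a <- junk$stuff$whatever

# ... or, equivalently, with [[ ]] and character names.
a_again <- junk[["stuff"]][["whatever"]]

# New elements can be added on the fly, as with struct fields.
junk$stuff$extra <- c(1, 2, 3)

# Print a compact overview of the whole structure.
str(junk)
```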

  8. Most functions that return more than just a single value will
    return a list. Unlike Matlab there isn’t a simple way of returning
    separate values to separate variables, like
    [a, b] = foo('bar'). For example, using the histogram function:

a <- rnorm(1000)
h <- hist(a)

[Figure: histogram of a]

str(h)
## List of 6
## $ breaks : num [1:16] -4 -3.5 -3 -2.5 -2 -1.5 -1 -0.5 0 0.5 ...
## $ counts : int [1:15] 1 1 3 24 47 80 147 186 206 134 ...
## $ density : num [1:15] 0.002 0.002 0.006 0.048 0.094 0.16 0.294 0.372 0.412 0.268 ...
## $ mids : num [1:15] -3.75 -3.25 -2.75 -2.25 -1.75 -1.25 -0.75 -0.25 0.25 0.75 ...
## $ xname : chr "a"
## $ equidist: logi TRUE
## - attr(*, "class")= chr "histogram"

If I wanted to extract something from that I could use

b <- h$breaks

If you really only want one thing out of the list, you could do
something like

b <- hist(a, plot = FALSE)$breaks
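
The same pattern applies to functions you write yourself. Below is a minimal sketch, with a made-up function `summarize_sample()`, of returning several values in a named list and pulling out the pieces afterwards.

```r
# Hypothetical function returning three values bundled in a list.
summarize_sample <- function(x) {
  list(mean = mean(x), sd = sd(x), n = length(x))
}

res <- summarize_sample(c(2, 4, 6, 8))

# Pull individual results out by name.
m <- res$mean
n <- res$n

# with() evaluates an expression using the list elements as variables.
std_err <- with(res, sd / sqrt(n))
```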

  9. You can use dots (.) in variable and function names, but I don’t
    recommend you do. Often a function with a . in its name is a
    version of a “generic” operation written for a specific class. For
    example, the plot() function is a straightforward way of plotting
    data, much like in Matlab. However, there exist lots of variants
    of plot for different classes, which are usually named
    plot.class(). E.g. for the histogram object I created above, if I
    want to plot it, I can just do

h2 <- hist(a, plot = FALSE, breaks = 100)
plot(h2, main = "A plot with more breaks")

[Figure: histogram of a with more breaks, titled "A plot with more breaks"]

and it will plot it as a histogram: the generic function plot()
hands the object to the class-specific plot.histogram(), which also
accepts the arguments appropriate to histograms.
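
The same mechanism works for your own classes. As a rough sketch, with an invented class called "reading" and made-up values, defining a function named `summary.reading()` is enough for the generic `summary()` to pick it up:

```r
# An object with a custom class attribute (class name is hypothetical).
reading <- structure(
  list(depth = 100, temperature = c(4.1, 4.3, 4.2)),
  class = "reading"
)

# A method for the generic summary(), following the generic.class naming.
summary.reading <- function(object, ...) {
  sprintf("reading at %g m: mean T = %.2f",
          object$depth, mean(object$temperature))
}

# Calling the generic dispatches to summary.reading().
s <- summary(reading)
```

This is also why dots in ordinary function names can be confusing: a name like `summary.reading` looks like a method whether or not you meant it to be one.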

Thoughts on topics for future editions of matlab2R

  • plotting, including:

  • points, lines, styles, etc

  • “image”-style plots, contours, filled contours, colormaps, etc
  • POSIX times vs Matlab datenum

  • … suggestions in comments?

Finding system files with Spotlight on OSX

By default Spotlight won’t search through system files (e.g. anything that lives in ~/Library/), which gave me a bit of a headache today when I was looking for where I had put an Emacs mode file that I was initializing in my Preferences.el in Aquamacs.

The trick is to enable “System files” in the “Kind” menu in the Finder spotlight. To do this add a search rule, click on the “Kind” menu, choose “Other” and select “System files” in the list. Wham!

From lifehacker.

R: Working with named objects in a loop

Often I want to load, manipulate, and re-save a bunch of separate objects (e.g. a dozen or so SBE microCATs all strung out along a mooring line). To do this, I make use of the get(), assign(), and eval() functions in R. To start, I often define a vector of variable names, like:

varNames <- c('mc100', 'mc200', 'mc300', 'mc500', 'mc750', 'mc900', 'mc1000', 'mc1500')

where the numbers in the name signify the nominal depth and the names themselves are the object names saved during a previous processing step. Then, I can loop through the instruments by doing:

library(oce)
for (i in seq_along(varNames)) {
  load(paste(varNames[i], '.rda', sep='')) # load the saved object from its .rda file
  d <- get(varNames[i]) # copy the object to an object named `d`
  eval(parse(text=paste('rm(', varNames[i], ')'))) # remove the original object from memory
  ## do some processing here, such as:
  ## * filtering
  ## * despiking, etc ...
  d[['temperature']] <- despike(d[['temperature']])
  assign(varNames[i], d) # copy the processed data back to the original name
}

Note that I assign the named object to an object called d (my default variable name for “data”), remove the original object (only really necessary when the objects are large, such as with ADCP data, for example), perform a series of processing steps, and then assign d back to a named object (and probably save the new version).

Note that another way of doing the loop is to loop directly through the character vector, which would look like:

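for (name in varNames) {
  load(paste(name, '.rda', sep='')) # load the saved object from its .rda file
  d <- get(name) # copy the object to an object named `d`
  eval(parse(text=paste('rm(', name, ')'))) # remove the original object from memory
  d[['temperature']] <- despike(d[['temperature']])
  assign(name, d) # copy the processed data back to the original name
}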

I like the elegance of looping through names, though I often default to the "index" loop for technical reasons (such as filling a matrix with the temperature time series from each microCAT).
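
To illustrate the "filling a matrix" case without needing oce or any .rda files, here is a self-contained sketch. The stand-in objects, the temperature values, and the "processing" step are all invented for illustration. It also uses rm(list = ...), which takes a character vector of names and is an alternative to the eval(parse()) construction.

```r
varNames <- c("mc100", "mc200", "mc300")

# Create stand-in objects (made-up data) instead of loading .rda files.
for (name in varNames) {
  assign(name, list(temperature = seq(10, 8, length.out = 4)))
}

# One column per instrument, one row per sample.
temps <- matrix(NA_real_, nrow = 4, ncol = length(varNames),
                dimnames = list(NULL, varNames))

for (i in seq_along(varNames)) {
  d <- get(varNames[i])        # copy the named object to `d`
  rm(list = varNames[i])       # remove the original by name
  d[["temperature"]] <- d[["temperature"]] - 0.5  # placeholder processing
  temps[, i] <- d[["temperature"]]                # the index picks the column
  assign(varNames[i], d)       # store the processed object under its name
}
```

Here the index `i` does double duty, selecting both the object name and the matrix column, which is the main reason to prefer the index loop in this situation.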

Using Rmarkdown with knitr to compose WordPress posts

What I’d like to be able to do is use knitr on an R Markdown (i.e. .Rmd) document to combine text, code, code output, and images into a nice looking WordPress post that I can compose and edit locally. An advantage of this is that I can use whatever editor I want for doing the composing (like Aquamacs), and that everything is contained in a single source document.

Workflow

I think the flow will be something like:

  1. Create the post in an .Rmd file locally in emacs.

  2. Use knitr with my local version of R to process the Rmd into either a proper md (Markdown) or an HTML that I can just upload (NOTE: the HTML option doesn’t really seem to work … will maybe need to look into this some more).

  3. Upload (or copy/paste) the md or HTML source into WordPress. The big question is what will happen with generated images, etc. With Markdown, I think the images are created in a directory that is dynamically linked to, but for the HTML there is maybe some way that images are embedded (it appears the embedding isn’t compatible with copying and pasting the HTML source produced by knitr — I guess the images will have to be uploaded manually).

Some code

Here is an example:

x <- seq(1, 1000)
y <- rnorm(length(x))
plot(x, y)

[Figure: scatter plot of y against x (plot of chunk unnamed-chunk-1)]

Post script

Copying and pasting the .md source produced by knitr works pretty well, though the image links will be broken. To fix this, each image produced by knitr needs to be added to the media library, and then inserted into the post to get the proper path. E.g. for the test figure above, the path is https://codedocean.files.wordpress.com/2014/01/unnamed-chunk-1.png. The linking to the image can still be done in a “markdown” way by using something like:

![plot of chunk unnamed-chunk-1](https://codedocean.files.wordpress.com/2014/01/unnamed-chunk-1.png)

Enabling MELPA package archive for Aquamacs

Thanks to my buddy Ken, I’ve been geeking out on lots of new Emacs/computer tools that could either drastically increase my productivity, or drastically decrease it through incessant messing around. I recently updated to the latest version of Aquamacs, which since it is built on Emacs v24 now contains the handy package management features.

By default it seems that Aquamacs uses the ELPA archive, which is pretty limited in terms of the number of packages. To use the MELPA archive instead (which seems to be the largest and most popular one right now), add this to ~/Library/Preferences/Aquamacs/Preferences.el

;; Melpa
(require 'package)
(add-to-list 'package-archives
             '("melpa" . "http://melpa.milkbox.net/packages/") t)
(package-initialize)

And Voila!