Turning off AUCTeX fontification so that columns can align

I love Emacs. I use it for everything, and particularly love it for doing tables in LaTeX because I can easily align everything so that it looks sensible, and rectangle mode makes it easy to move columns around if desired.

That being said, AUCTeX by default fontifies math-mode super- and subscripts (raising or lowering them), which throws off the horizontal alignment of characters (essentially the buffer is no longer rendered as a fixed-width grid). To turn this off, do:

M-x customize-variable font-latex-fontify-script

Beauty!
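To make the change permanent, the same variable can be set in your Emacs init file instead (a minimal sketch, assuming AUCTeX is installed and loaded):

```elisp
;; Disable raised/lowered fontification of ^ and _ in math mode,
;; so table columns stay visually aligned (AUCTeX's font-latex feature)
(setq font-latex-fontify-script nil)
```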

Colormap tests

Introduction

Dan Kelley’s oce package has a branch testing some new functions for creating “colormaps” — the design being that there is a way to map levels (say topographic height, velocity, etc.) to a specific set of colors. Development work on this has been ongoing in the colorize branch of the oce repo on GitHub. See Dan’s blog post at http://dankelley.github.io/r/2014/04/30/colormap.html for more information.

Many of the standard plotting commands in oce (such as imagep() and drawPalette()) already mostly take advantage of the idea of a colormap, but recent use cases showed there was much room for improvement. In particular, the connection between a color scheme and the range of values it represents was previously up to the user to get right. This was most commonly done with the rescale() function, which turns out not to be an ideal solution when the number of color levels is small.

Tests

Create a colormap for use in an imagep() plot of the adp dataset:

library(oce)  # I have built this from the `colorize` branch commit 365d7700f5be33e5
data(adp)
t <- adp[["time"]]
z <- adp[["distance"]]
p <- adp[["pressure"]]
u <- adp[["v"]][, , 1]
par(mar = c(3, 3, 1, 1))
pcol <- Colormap(p)
plot(t, p, bg = pcol$zcol, pch = 21)

[plot: pressure time series colored by the colormap]

## now for an imagep
ucol <- Colormap(u, col = oceColors9B)
imagep(t, z, u, colormap = ucol, filledContour = TRUE)

[plot: imagep() of velocity using the colormap]

Converting Latex to Markdown

I’m applying for a job that requires me to submit a plain-text version of my résumé. As I maintain my CV as a LaTeX document, I wanted to find a simple way to convert it to Markdown so that it will look good when cut and pasted into the web browser.

I use pandoc all the time for document conversion, but I found that because of some heavy layout tweaks to make my CV look good (I’m not using a style file), the Markdown produced using

pandoc cv.tex -o cv.md

is pretty gross.

After a bit of googling, I found out about the htlatex utility (included with TeX Live), which does a fantastic job of converting LaTeX to HTML:

htlatex cv.tex "xhtml, mathml, charset=utf-8" " -cunihtf -utf8"

Then, use pandoc to convert from HTML to Markdown with:

pandoc cv.html -o cv.md

This leaves a few small things to clean up with further scripting (such as stray /s), but altogether it produces a nice-looking Markdown file.
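For the final cleanup, a sed pass is usually enough. The pattern below is only an illustration (the actual artifacts depend on your document); it strips a space-padded stray slash:

```shell
# Hypothetical cleanup pass: adjust the pattern to whatever
# artifacts htlatex leaves behind in your particular document.
sed 's| / | |g' cv.md > cv-clean.md
```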

Anti-aliasing and “image” plots

Introduction

Frequently I make plots using the oce1 function imagep(), which at its core uses the base-R function image(). R has several different graphics devices to choose from, and as each has a different scheme for tasks such as anti-aliasing, they can produce different results depending on the type of plot being created and the type of file it gets written to. This can be especially apparent when using the filledContour type of plot. Frequently, I find that the default devices for making such plots in R produce undesirable artifacts, such as white lines in an image plot. The example below illustrates this effect using the adp data set:

library(oce)
data(adp)
imagep(adp[["v"]][, , 1], filledContour = TRUE)

[plot: imagep(..., filledContour = TRUE) showing white-line artifacts]

In this post I’ll explore some options for making plots without such artifacts.

PDF devices

It is common for anti-aliasing effects like the white lines shown above to show up in figures created using the pdf() device. As PDF is essentially a vector graphics format, there is nothing to be done in R to correct the problem: the anti-aliasing is typically applied by the PDF viewer, and is therefore not native to the file. It is often possible to disable anti-aliasing in many of the most popular viewers (e.g. I use Skim and Preview on OS X), but this has the unfortunate side effect of removing anti-aliasing from all aspects of the figure, including the fonts, axis labels, etc.

For this reason, when producing image plots I almost always default to using a PNG device instead of a PDF. PNG works perfectly well with pdflatex, and has no artifacts due to image compression (unlike JPG). The only issue remaining is how to ensure that the image plot itself does not suffer from anti-aliasing effects, while retaining the smoothing of fonts, lines, and points that makes a beautiful plot.

PNG devices

For PNG devices, there are several options for the “type” of device, each of which will produce slightly different output. From the help page for png(), the arguments are:

png(filename = "Rplot%03d.png",
    width = 480, height = 480, units = "px", pointsize = 12,
    bg = "white", res = NA, ...,
    type = c("cairo", "cairo-png", "Xlib", "quartz"), antialias)

where the type argument is described as:

type: character string, one of ‘"Xlib"’ or ‘"quartz"’ (some OS X
builds) or ‘"cairo"’. The latter will only be available if
the system was compiled with support for cairo - otherwise
‘"Xlib"’ will be used. The default is set by
‘getOption("bitmapType")’ - the ‘out of the box’ default is
‘"quartz"’ or ‘"cairo"’ where available, otherwise ‘"Xlib"’.

Let’s try some examples of each of the type options to see the difference.

types <- c("cairo", "cairo-png", "Xlib", "quartz")
for (itype in seq_along(types)) {
    png(paste("typeExample-", types[itype], ".png", sep = ""), type = types[itype],
        width = 300, height = 300)
    imagep(adp[["v"]][, , 1], filledContour = TRUE, main = types[itype])
    dev.off()
}

each of which produces the following:

[plots: output from the cairo, cairo-png, Xlib, and quartz devices]

Note that it is the default quartz type that produces the anti-aliasing issues. These can be turned off by specifying antialias='none' (see the description of the antialias argument in ?png for more details):

png("quartzNoAntialias.png", type = "quartz", antialias = "none")
imagep(adp[["v"]][, , 1], filledContour = TRUE, main = "quartz with antialias=none")
dev.off()
## pdf
## 2

[plot: quartz device with antialias='none']

This “fixes” the problem for the image plot, but leaves the fonts and axis lines un-antialiased.

Summary

Based on the above, the best option for producing image-style plots without anti-aliasing artifacts is to use the type='cairo' option for the png device (note that by default Cairo devices use the Helvetica font family, whereas Quartz devices use Arial).

png("cairoDevice.png", type = "cairo", antialias = "none", family = "Arial")
imagep(adp[["v"]][, , 1], filledContour = TRUE, main = "A Cairo device png")
dev.off()
## pdf
## 2

[plot: a Cairo device png]


  1. For hints on installing the oce package, check out the blog post here 

Switching from Matlab to R: Part 1

Introduction

I was thinking recently about how best to help someone transitioning
from Matlab(TM) to R, and did my best to recall what sorts of things I
struggled with when I made the switch. Though I resisted for quite a
while, when I finally committed to making the change I recall that it
mostly happened in a matter of weeks. It helped that my thesis
supervisor exclusively used R, and we were working on code for a paper
together at the time, but in the end I found that the switch was
easier than I had anticipated.

Tips

  1. Don’t be afraid of the assignment operator <-. It means the same
     thing as = does in matlab, as in

a <- 1:10 # in matlab: a=1:10;

except that it makes more logical sense.

Use == for logical comparisons (as in matlab), and reserve = for
specifying argument values in a function call (see number 5).

  2. Vectors are truly one-dimensional. Unlike matlab, there is no
     distinction between an Nx1 and a 1xN vector: in R both are just
     vectors of length N, with no orientation. Transposing is done
     with t(), which when applied to a plain vector (class numeric)
     treats it as a one-row matrix.
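A quick illustration of the point (note that t() actually promotes a plain vector to a 1 x n matrix, so orientation only appears once you are in matrix-land):

```r
a <- 1:5
dim(a)          # NULL -- a plain vector has no dimensions
length(a)       # 5
dim(t(a))       # 1 5 -- t() promotes the vector to a 1 x 5 matrix
a + a           # element-wise: 2 4 6 8 10
```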

  3. Array indices use square brackets, like

a[1:5] <- 2 # assign the value 2 to the first 5 elements of a

This is one of the things that drove me crazy about matlab: it uses
() for indices as well as function arguments, which makes mixed
array indexing and function calls very confusing to look at and
interpret.

  4. By default, arithmetic operations are done element-wise. If you
     have two MxN matrices (say A and B), and you do C <- A*B, every
     element of C is the product of the corresponding elements of A and
     B. No need for the .* stuff as in matlab. To get matrix
     multiplication, use the %*% operator.
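A small sketch contrasting the two operators:

```r
A <- matrix(1:4, nrow = 2)   # 2 x 2 matrix, filled column-wise
B <- matrix(5:8, nrow = 2)
A * B      # element-wise product (matlab's A.*B)
A %*% B    # matrix product (matlab's A*B)
```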

  5. Function arguments are named, so the order isn’t super
     important. If you don’t name them, then you have to give them in
     the order they appear (do ?foo to see the help page for a
     function foo). For example, if a function took arguments like:

foo <- function(a, b, c, type, bar) {
# function code here
}

You could call it with something like:

junk <- foo(1, 2, bar = "whatever")

where a and b are given the values of 1 and 2, and c and type
are left unspecified. This would be equivalent:

junk <- foo(a = 1, b = 2, bar = "whatever")

You could also do:

junk <- foo(bar = "whatever", a = 1, b = 2)

  6. No semicolons needed (except where you’d like to have more than
     one operation per line, like a <- 1; b <- 2).

  7. In R, the equivalent of a matlab structure is called a
     “list”. Instead of separating the levels with a ., it is
     generally done with a $. So accessing an element of a list could
     look something like:

a <- junk$stuff$whatever

Use the str() command to look at the structure of a list object.
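A minimal sketch of building and accessing such a nested list (the names here are made up):

```r
junk <- list(stuff = list(whatever = 1:3, note = "hello"))
junk$stuff$whatever    # 1 2 3
str(junk)              # prints the nested structure, matlab's "disp" of a struct
```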

  8. Most functions that return more than just a single value will
     return a list. Unlike matlab, there isn’t a simple way of
     returning separate values to separate variables, like
     [a, b] = foo('bar'). For example, using the histogram function:
a <- rnorm(1000)
h <- hist(a)

[plot: default hist(a) output]

str(h)
## List of 6
## $ breaks : num [1:16] -4 -3.5 -3 -2.5 -2 -1.5 -1 -0.5 0 0.5 ...
## $ counts : int [1:15] 1 1 3 24 47 80 147 186 206 134 ...
## $ density : num [1:15] 0.002 0.002 0.006 0.048 0.094 0.16 0.294 0.372 0.412 0.268 ...
## $ mids : num [1:15] -3.75 -3.25 -2.75 -2.25 -1.75 -1.25 -0.75 -0.25 0.25 0.75 ...
## $ xname : chr "a"
## $ equidist: logi TRUE
## - attr(*, "class")= chr "histogram"

If I wanted to extract something from that I could use

b <- h$breaks

If you really only want one thing out of the list, you could do
something like

b <- hist(a, plot = FALSE)$breaks

  9. You can use .’s in variable and function names, but I don’t
     recommend it. Often a function name containing a . indicates a
     method of a “generic” function for a specific class, written as
     generic.class(). For example, the plot() function is a
     straightforward way of plotting data, much like in matlab.
     However, there exist lots of variants of plot for different
     classes. E.g. for the histogram object I created above, if I
     want to plot it, I can just do
h2 <- hist(a, plot = FALSE, breaks = 100)
plot(h2, main = "A plot with more breaks")

[plot: histogram with 100 breaks]

and it will plot it as a histogram, using the plot.histogram()
method, as well as accept the arguments appropriate to that method.
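You can peek at the dispatch machinery yourself: getS3method() retrieves the method that a generic like plot() would call for a given class:

```r
h <- hist(rnorm(100), plot = FALSE)
class(h)                               # "histogram"
m <- getS3method("plot", "histogram")  # the method plot(h) dispatches to
is.function(m)                         # TRUE
```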

Thoughts on topics for future editions of matlab2R

  • plotting, including:
      • points, lines, styles, etc.
      • “image”-style plots, contours, filled contours, colormaps, etc.
  • POSIX times vs Matlab datenum
  • … suggestions in comments?

Finding system files with Spotlight on OSX

By default Spotlight won’t search through system files (e.g. anything that lives in ~/Library/), which gave me a bit of a headache today when I was looking for where I had put an Emacs mode file that I was initializing in my Preferences.el in Aquamacs.

The trick is to enable “System files” in the “Kind” menu of a Finder Spotlight search. To do this, add a search rule, click on the “Kind” menu, choose “Other”, and select “System files” from the list. Wham!

From lifehacker.

R: Working with named objects in a loop

Often I want to load, manipulate, and re-save a bunch of separate objects (e.g. a dozen or so SBE microCATs all strung out along a mooring line). To do this, I make use of the get(), assign(), and eval() functions in R. To start, I often define a vector of variable names, like:

varNames <- c('mc100', 'mc200', 'mc300', 'mc500', 'mc750', 'mc900', 'mc1000', 'mc1500')

where the numbers in the name signify the nominal depth and the names themselves are the object names saved during a previous processing step. Then, I can loop through the instruments by doing:

library(oce)
for (i in seq_along(varNames)) {
  load(paste(varNames[i], '.rda', sep=''))
  d <- get(varNames[i]) # copy the object to an object named `d`
  eval(parse(text=paste('rm(', varNames[i], ')'))) # remove the original object from memory
  ## do some processing here, such as:
  ## * filtering
  ## * despiking, etc ...
  d[['temperature']] <- despike(d[['temperature']])
  assign(varNames[i], d)
}

Note that I assign the named object to an object called d (my default variable name for “data”), remove the original object (only really necessary when the objects are large, such as with ADCP data, for example), perform a series of processing steps, and then assign d back to a named object (and probably save the new version).

Note that another way of doing the loop is to loop directly through the character vector, which would look like:

for (name in varNames) {
  load(paste(name, '.rda', sep=''))
  d <- get(name)
  eval(parse(text=paste('rm(', name, ')')))
  d[['temperature']] <- despike(d[['temperature']])
  assign(name, d)
}

I like the elegance of looping through names, though I often default to the "index" loop for technical reasons (such as filling a matrix with the temperature time series from each microCAT).
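As an aside, the eval(parse(...)) call above can be replaced by rm() with its list argument, which takes a character vector of names and avoids building code up as a string. A minimal sketch with a made-up variable x:

```r
x <- 1:10
name <- "x"
d <- get(name)       # copy the named object
rm(list = name)      # remove it directly -- no parse/eval needed
exists("x")          # FALSE
assign(name, d * 2)  # save the processed version back under the same name
x                    # 2 4 6 8 10 12 14 16 18 20
```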