Following my earlier post on Julia and gsl-shell, I decided to familiarize myself with Julia.

A lot of smart guys have recently blogged about Julia, see e.g. Julia, I love You by John Myles White. Even Doug Bates started to play with it, and in particular bring us with an easy way to call R’s statistical distributions with Julia.

The installation went fine for me. I’m using XCode 4.3.2 on OS X 10.7.3, with command-line tools installed and 64-bit gfortran (first time I installed a different fortran from the one released on R Development Tools and Libraries). Of course, I forgot to ask for a parallel build, so it took far longer than expected. To get a working web REPL, I also needed to fetch lighttpd, which is basically as simple as typing `make -C external install-lighttpd`

in `julia/`

root directory. Although the web REPL looks great, it has very limited plotting facilities at the moment, so I will stick with the console REPL. It is worth noting that during the first install all external dependencies will be downloaded and everything goes into a `root`

folder. In the end, it amounts to about 1.3 Go on your hard drive, so you can safely delete everything but the `external/root`

folder (and `lighttpd.conf`

if you want to use the web server).

Let’s start with some very basic stuff. Note that I am using the REPL and for plotting purpose I choose the recently released Gaston interface to Gnuplot.^{1} There’s an embedded plotting library, `Winston`

, but I got a lot of `Cairo`

-related errors when trying it. To use R’s probability distributions and RNGs, we need to have a working `libRmath`

somewhere. I happened to compile it following instructions in the R Installation and Administration manual, as pointed out by Doug Bates: (We really just need to type `make`

in the `src/nmath/standalone`

subdirectory of R source tree.)

```
_jl_libRmath = dlopen("julia/libRmath")
load("julia/extras/Rmath.jl")
x = -3:.1:3
y = dnorm(x, 0, 1)
load("julia/extensions/gaston-0.1/gaston.jl")
figure(1)
a = Axes_conf()
a.title = "The gaussian density from R"
addcoords(x, y)
addconf(a)
plot()
```

Here is what I got:

Nothing fabulous actually. But wait, we can do other funny things. First, let’s implement a simplified t-test. There’s already a bunch of handy functions in `base/statistics.jl`

. Basically, we just need to compute a difference of means and a pooled estimate for the variance. Note that we assume equal variance and a two-sided test. I also added a switch-statement for the case of paired samples. This has been tested with the famous Student’s sleep data set (student1908.csv).

```
function t_test(x, y, paired)
nx = length(x)
ny = length(y)
dobs = mean(y) - mean(x)
if paired == true && nx == ny
df = nx - 1
se = std(x-y) / sqrt(nx)
else
df = nx + ny - 2
se = sqrt(((nx-1) * std(x)^2 + (ny-1) * std(y)^2) / df)
end
if paired == false
se = se * sqrt(1/nx + 1/ny)
end
tobs = dobs / se
pval = 2*(1 - pt(tobs, df))
res = HashTable()
res["stats"] = tobs; res["df"]=df; res["p_value"]=pval
res
end
dat = dlmread("student1908.csv")
t_test(dat[:,1], dat[:,2], false) # p = 0.079
t_test(dat[:,1], dat[:,2], true) # p = 0.003
```

Nothing really difficult there. Let’s go for a rough PCA, on Fisher-Anderson iris.txt dataset. Julia already has some basic linear algebra functions in `base/linalg.jl`

(plus additional LAPACK bindings in `linalg_lapack.jl`

, including SVD computation):

```
function scale(x)
(x - mean(x)) /std(x)
end
function pca(a, standardize)
if standardize == true
for i = 1:size(a, 2)
a[:,i] = scale(a[:,i])
end
end
res = svd(a)
sd = res[2] / sqrt(size(a, 1) - 1)
rot = transpose(res[3])
sd, rot
end
iris = dlmread("iris.txt")
pca(iris[:,1:4], true)
```

We can also compare results from an SVD and an eigenvalue decomposition:

```
n = 100
x1 = randn(n)
x2 = randn(n)
x3 = .6 * x1 + randn(n)
X = [x1 x2 x3]
for i = 1:size(X, 2)
X[:,i] = scale(X[:,i])
end
amap(mean, X, 2)
r = 1/(n-1) * X' * X
res1 = pca(X, false)[1]
res2 = sortr(real(sqrt(eig(r)[1])))
```

The above code (it probably looks really ugly because these are my very first lines of Julia code!) reminds me of my 4-5 years of Matlab, but overall the documentation points toward interesting features that I just need to learn more.

As a conclusion of this 2-hour session, it is evident that I am missing a lot of things about Julia. There’s a `cor`

method that is supposed to work with one or two vectors, but also with a matrix, but it doesn’t seem to work when I call it like `cor(X)`

. I also had some trouble with `csv`

header which when present gave me an `Any Array`

that I know nothing about. Well, in my defense I haven’t finished reading the documentation set, and just started from scratch with a console REPL early in the morning.

A reply by Harlan Harris to a recent question on Cross Validated, Does Julia have any hope of sticking in the statistical community?, summarizes a lot of missing features compared to R. But it still looks like a great project, and it’s always interesting to follow the development of a new programming language from the start.