I had three primary reasons for producing the XLISP-STAT environment. First, I wanted to provide a vehicle for experimenting with dynamic graphics and for using dynamic graphics in instruction. Second, I wanted to be able to experiment with an environment supporting functional data, such as mean functions in nonlinear regression models and prior density and likelihood functions in Bayesian analyses. Finally, I was interested in exploring the use of object-oriented programming ideas for building and analyzing statistical models. I will discuss each of these points in a little more detail in the following paragraphs.
The development of high-resolution graphical computer displays has made it possible to consider the use of dynamic graphics for understanding higher-dimensional structure. One of the earliest examples is the real-time rotation of a three-dimensional point cloud on a screen -- an effort to use motion to recover a third dimension from a two-dimensional display. Other techniques that have been developed include brushing a scatterplot -- highlighting points in one plot and seeing where the corresponding points fall in other plots. A considerable amount of research has been done in this area; see, for example, the discussion in Becker and Cleveland [4] and the papers reproduced in Cleveland and McGill [8]. However, most of the software developed to date has been developed on specialized hardware, such as the TTY 5620 terminal or Lisp machines. As a result, very few statisticians have had an opportunity to experiment with dynamic graphics firsthand, and still fewer have had access to an environment that would allow them to implement dynamic graphics ideas of their own. Several commercial packages for microcomputers now contain some form of dynamic graphics, but most do not allow users to customize their plots or develop functions for producing specialized plots, such as dynamic residual plots. XLISP-STAT provides at least a partial solution to these problems. It allows the user to modify a scatterplot with Lisp functions and provides means for modifying the way in which a plot responds to mouse actions. It is also possible to add functions written in C to the program. On the Macintosh this has to be done by adding to the source code. On some Unix systems it is also possible to compile and dynamically load code written in C or FORTRAN.
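The brushing idea described above can be sketched in a few lines. The example below is written in Python purely for illustration; it is not XLISP-STAT code, and the function name `brush` and the sample data are hypothetical. It assumes that two plots show different variables for the same observations, so that a selection made in one plot can be carried to the other through shared row indices.

```python
# Hypothetical illustration of linked brushing; not XLISP-STAT code.

def brush(points, x_range, y_range):
    """Return the indices of the points inside the brush rectangle."""
    (x_lo, x_hi), (y_lo, y_hi) = x_range, y_range
    return [i for i, (x, y) in enumerate(points)
            if x_lo <= x <= x_hi and y_lo <= y <= y_hi]

# Two views of the same five observations: row i of plot_a and row i of
# plot_b belong to the same observation, plotted on different variables.
plot_a = [(1.0, 2.0), (2.0, 3.5), (3.0, 1.0), (4.0, 4.0), (5.0, 2.5)]
plot_b = [(10, 7), (20, 3), (30, 9), (40, 1), (50, 5)]

# Brush a rectangle in plot A, then look up the same rows in plot B.
selected = brush(plot_a, x_range=(1.5, 4.5), y_range=(3.0, 5.0))
linked = [plot_b[i] for i in selected]

print(selected)  # indices highlighted in plot A
print(linked)    # corresponding points to highlight in plot B
```

In an interactive system the rectangle would follow the mouse and the highlighting would be redrawn on every move; the index bookkeeping stays the same.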
An integrated environment for statistical calculations and graphics is
essential for developing an understanding of the uses of dynamic
graphics in statistics and for developing new graphical techniques.
Such an environment must essentially be a programming language. Its
basic data types must include types that allow groups of numbers --
data sets -- to be manipulated as entire objects. But in model-based
analyses numerical data are only part of the information being used.
The remainder is the model itself. Sometimes a model is easily
characterized by specifying a set of numbers. A normal linear
regression model with i.i.d. errors might be described by the
number of covariates, the coefficients and the error variance. On the
other hand, in many cases it is easier to specify a model by
specifying a function. To specify a normal nonlinear regression model,
for example, one might specify the mean function. If our language is
to allow us to specify this function within the language itself then
the language must support a functional data type with full rights: it
has to be possible to define functions that manipulate functions,
return functions, apply functions to arguments, etc. The choice I
faced was to define a language from scratch or use an existing
language. Because of the complexity of issues involved in functional
programming I decided to use a dialect of a well understood functional
language, Lisp. The syntax of Lisp is somewhat unfamiliar to most
users of statistical packages, but it is easy to learn and several
good tutorials are available in local book stores. I considered the
possibility of using Lisp to write a top level interface with a more
``natural'' syntax, but I did not see any way of doing this without
complicating access to some of the more powerful features of Lisp or
running into some of the pitfalls of functional programming. I
therefore decided to retain the basic Lisp top level syntax. To make
the manipulation of numerical data sets easier I have redefined the
arithmetic operators and basic numerical functions to work on lists
and arrays of data.
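To make the notion of a functional data type with full rights concrete, here is a minimal sketch. It is written in Python rather than Lisp, purely for illustration, and none of the names (`make_rss`, `saturating_mean`, `vectorize`) come from XLISP-STAT. The first part takes a mean function as an argument and returns a new function, the residual sum of squares as a function of the parameters. The second part lifts scalar addition to work elementwise on lists, which is one possible reading of what redefining the arithmetic operators for data sets involves.

```python
import math
import operator

def make_rss(mean_fn, xs, ys):
    """Take a mean function and data; return the residual sum of
    squares as a function of the parameter vector theta."""
    def rss(theta):
        return sum((y - mean_fn(theta, x)) ** 2 for x, y in zip(xs, ys))
    return rss

def saturating_mean(theta, x):
    """Hypothetical nonlinear mean function, chosen only as an example."""
    return theta[0] * (1.0 - math.exp(-theta[1] * x))

def vectorize(op):
    """Lift a two-argument scalar function to work elementwise on lists.
    Lists of unequal length are truncated to the shorter one."""
    def lifted(a, b):
        a_is_list, b_is_list = isinstance(a, list), isinstance(b, list)
        if a_is_list and b_is_list:
            return [op(x, y) for x, y in zip(a, b)]
        if a_is_list:
            return [op(x, b) for x in a]
        if b_is_list:
            return [op(a, y) for y in b]
        return op(a, b)
    return lifted

# The model is specified by passing a function, not a set of numbers.
xs = [1.0, 2.0, 3.0]
ys = [saturating_mean((2.0, 0.5), x) for x in xs]  # noise-free data
rss = make_rss(saturating_mean, xs, ys)
print(rss((2.0, 0.5)))  # zero at the generating parameters
print(rss((1.0, 0.5)))  # positive elsewhere

add = vectorize(operator.add)
print(add([1, 2, 3], 1))
print(add([1, 2, 3], [10, 20, 30]))
```

Both `make_rss` and `vectorize` accept a function and return a function, which is exactly the capability the paragraph above asks of the language.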
Once I had decided to use Lisp as the basis for my environment, XLISP was a natural choice for several reasons. It has been made available for unrestricted, non-commercial use by its author, David Betz. It is small (for a Lisp system), its source code is available in C, and it is easily extensible. Finally, it includes support for object-oriented programming. Object-oriented programming has received considerable attention in recent years and is particularly natural for use in describing and manipulating graphical objects. It may also be useful for the analysis of statistical data and models. A collection of data and assumptions may be represented as an object. The model object can then be examined and modified by sending it messages. Many different kinds of models will answer similar questions, thus fitting naturally into an inheritance structure. XLISP-STAT's implementation of linear and nonlinear regression models as objects, with nonlinear regression inheriting many of its methods from linear regression, is a first, primitive attempt to exploit this programming technique in statistical analysis.
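The inheritance structure described above can be illustrated with a small sketch. It is written in Python for illustration only and does not reproduce XLISP-STAT's object system or its regression code; the class names, method names, and the grid-search fitter are all assumptions made for the example. Method calls stand in for messages, and the nonlinear model overrides only estimation and the mean function while inheriting the methods that answer questions about residuals.

```python
# Hypothetical sketch of models as objects; not XLISP-STAT's implementation.

class LinearRegression:
    """Simple linear regression with one covariate."""

    def __init__(self, xs, ys):
        self.xs, self.ys = list(xs), list(ys)
        self.coefs = self.estimate()

    def estimate(self):
        # Closed-form least squares for intercept and slope.
        n = len(self.xs)
        xbar, ybar = sum(self.xs) / n, sum(self.ys) / n
        sxx = sum((x - xbar) ** 2 for x in self.xs)
        sxy = sum((x - xbar) * (y - ybar)
                  for x, y in zip(self.xs, self.ys))
        slope = sxy / sxx
        return (ybar - slope * xbar, slope)

    def mean(self, x):
        intercept, slope = self.coefs
        return intercept + slope * x

    # The methods below are shared by every model that defines mean().
    def fitted_values(self):
        return [self.mean(x) for x in self.xs]

    def residuals(self):
        return [y - f for y, f in zip(self.ys, self.fitted_values())]

    def residual_sum_of_squares(self):
        return sum(r * r for r in self.residuals())


class NonlinearRegression(LinearRegression):
    """Inherits the residual methods; overrides estimation and the mean."""

    def __init__(self, mean_fn, xs, ys, theta_grid):
        self.mean_fn = mean_fn
        self.theta_grid = list(theta_grid)
        super().__init__(xs, ys)

    def estimate(self):
        # Crude grid search, a stand-in for a real iterative fitter.
        def rss(theta):
            return sum((y - self.mean_fn(theta, x)) ** 2
                       for x, y in zip(self.xs, self.ys))
        return min(self.theta_grid, key=rss)

    def mean(self, x):
        return self.mean_fn(self.coefs, x)


linear = LinearRegression([0.0, 1.0, 2.0, 3.0], [1.0, 3.0, 5.0, 7.0])
nonlinear = NonlinearRegression(lambda theta, x: theta * x ** 2,
                                [0.0, 1.0, 2.0, 3.0],
                                [0.0, 2.0, 8.0, 18.0],
                                theta_grid=[0.5, 1.0, 1.5, 2.0])

# Both objects answer the same questions through the same interface.
for model in (linear, nonlinear):
    print(model.coefs, model.residual_sum_of_squares())
```

The loop at the end is the point of the design: code that examines a model does not need to know which kind of model it has been handed.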
Anthony Rossini