From: A.J. Rossini (rossini@blindglobe.net)
Date: Thu Sep 19 2002 - 06:41:46 EST
As part of some statistical methods research I'm doing in
visualization as well as in sensitivity analysis, we've been
developing a package for handling flow cytometry data using the
open-source statistics package, R (www.r-project.org), which runs on
Unix machines, Microsoft Windows, and recent versions of MacOS (9 and
higher, though I've heard it might work on the last versions of 8.x).
R is very much like S/S-PLUS, and most of the same non-commercial
developers (including the inventor/originator of S, upon which S-PLUS
is based) are active members of the R core development team.
Currently, we've implemented routines for reading data from FCS files,
gating the resulting data, constructing a number of visualizations,
and testing for differences (i.e. 1- and 2-sample tests).
Before we release a version for general public consumption, however,
the last sticking point is reading FCS files: right now, we've been
successful with files from three groups, but we've had problems with
files from a fourth. So what I'm looking for in terms of assistance is
not programming time, but example FCS files to aid in the debugging,
since if we can't read your data, it'll be really annoying to have to
construct text files just to read the data back in.
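
To make the reading problem concrete, here is a minimal sketch of the
first step any FCS reader has to get right: the fixed-width header and
the delimited TEXT segment. It is written in Python purely for
illustration and is not the code from our R package; the function names
are hypothetical. It assumes the FCS 2.0/3.0 layout (a 6-byte version
string, 4 spaces, then six right-justified 8-byte ASCII offsets), and
it deliberately ignores escaped (doubled) delimiters and vendor quirks,
which are likely where real files go wrong.

```python
def parse_fcs_header(raw: bytes) -> dict:
    """Read the version string and the six segment offsets (hypothetical helper)."""
    version = raw[0:6].decode("ascii")
    if not version.startswith("FCS"):
        raise ValueError("not an FCS file: bad version string %r" % version)
    names = ["text_start", "text_end", "data_start",
             "data_end", "analysis_start", "analysis_end"]
    header = {"version": version}
    for i, name in enumerate(names):
        # Each offset is an 8-byte, right-justified ASCII integer;
        # a blank field is treated here as 0 (an assumption).
        field = raw[10 + 8 * i:18 + 8 * i].decode("ascii").strip()
        header[name] = int(field) if field else 0
    return header


def parse_text_segment(raw: bytes, text_start: int, text_end: int) -> dict:
    """Split the TEXT segment into keyword/value pairs (hypothetical helper)."""
    # The end offset points at the last byte of the segment, hence the +1.
    segment = raw[text_start:text_end + 1].decode("ascii", errors="replace")
    delimiter = segment[0]          # the first character defines the delimiter
    tokens = segment[1:].split(delimiter)
    if tokens and tokens[-1] == "":
        tokens.pop()                # drop the empty piece after the final delimiter
    # Tokens alternate keyword, value, keyword, value, ...
    return dict(zip(tokens[0::2], tokens[1::2]))
```

A file whose header offsets or delimiter handling differ from these
assumptions would fail at exactly this stage, which is why knowing the
generating platform for each example file is useful.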
Why might you be interested in helping out? The end result will be a
platform that builds on R's tools for sophisticated statistical
analysis as well as GUI construction, so that results can be
prototyped quickly. The code will probably be released under either the GNU GPL
or LGPL (there are lots of technical issues with open source licensing
which I'll not go into at this point), and the end product will
probably be suitable as a complementary tool to the common commercial
flow cytometry software packages. With a bit more help, it may
eventually be suitable for production work, as well.
So:
A. If you would be able to "donate" an anonymized (i.e. source of data
removed) FCS file for testing, please send me private email naming the
platform that generated it, so that I don't end up testing the same
thing over and over again, and so that we can arrange for the upload.
B. If you can't send files (many valid reasons for this, I know!) but:
1. would be interested in testing the system,
2. are reasonably sophisticated at being able to download and install
software, and
3. (most critical!) are comfortable with using a command-line interface
(Unix shell, DOS window, or similar),
please contact me and I'll send a quick set of instructions and the
functions to test (a limited version of the R library). We are
hoping to release the initial non-testing version by the end of the
year.
We currently can read (most?) FCS files; do programmatic and
interactive gating; produce low-dimensional visualizations; and run
2-sample tests (KS, truncated distribution curves, and Prob binning
with K. Baggerly's corrections). With a bit of programming in R, any
other routine R offers is available too: hierarchical clustering,
k-means, and clustering of large datasets (Clara, etc.). On summaries
of the individual results, one might consider regression (standard,
ANOVA, mixed effects models, GEE), smoothing (kernel, loess, splines),
and so on.
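
For readers unfamiliar with the terms, programmatic gating and a
2-sample KS comparison can be sketched in a few lines. The sketch is
in Python for illustration only and is not the R package's interface;
the function names and the (x, y) tuple representation of events are
assumptions. It computes only the KS statistic (the largest gap between
the two empirical distribution functions), not a p-value.

```python
from bisect import bisect_right


def rectangle_gate(events, x_range, y_range):
    """Keep the (x, y) events that fall inside a rectangular gate (hypothetical helper)."""
    (x_lo, x_hi), (y_lo, y_hi) = x_range, y_range
    return [(x, y) for x, y in events
            if x_lo <= x <= x_hi and y_lo <= y <= y_hi]


def ks_statistic(sample_a, sample_b):
    """Two-sample Kolmogorov-Smirnov statistic: max |F_a(v) - F_b(v)|."""
    a = sorted(sample_a)
    b = sorted(sample_b)
    largest_gap = 0.0
    # The empirical CDFs only change at observed values, so it is
    # enough to compare them at every point of the pooled sample.
    for v in a + b:
        cdf_a = bisect_right(a, v) / len(a)
        cdf_b = bisect_right(b, v) / len(b)
        largest_gap = max(largest_gap, abs(cdf_a - cdf_b))
    return largest_gap
```

In practice one would gate each sample first and then compare a single
channel of the gated events between the two samples.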
What we are currently working on (research into both
analytics/informatics as well as statistical methods): making some of
those tools easier to work with, designs, novel 2-sample comparisons,
high-dimensional visualizations and structured grand tours, and a few
other topics related to manuscripts in preparation.
What we probably won't work on (but others may): GUIs,
database-specific interfaces (to Oracle, MySQL, or PostgreSQL
relational storage, HDF5 for high-performance flat-file access), and
extending the currently limited annotation system.
(One related non-flow-cytometry project that I'm very much involved
with, and which gives some idea of where I'd like to take this project
eventually, is the Bioconductor project (http://www.bioconductor.org),
an open source system (statistical analysis and visualization, as well
as some annotation) for the analysis of gene expression arrays (affy,
spotted cDNA, SAGE).)
best,
-tony
--
A.J. Rossini Rsrch. Asst. Prof. of Biostatistics
U. of Washington Biostatistics rossini@u.washington.edu
FHCRC/SCHARP/HIV Vaccine Trials Net rossini@scharp.org
-------------- http://software.biostat.washington.edu/ ----------------
FHCRC: M: 206-667-7025 (fax=4812)|Voicemail is pretty sketchy/use Email
UW: Th: 206-543-1044 (fax=3286)|Change last 4 digits of phone to FAX
(my tuesday/wednesday/friday locations are completely unpredictable.)