RE: Parametric statistics for flow

From: A.J. Rossini <rossini@u.washington.edu>
Date: Thu Dec 01 2005 - 01:35:12 EST

It's a bit trickier than that.  Most of the classic 2-sample nonparametric tests assume
a homogeneous population (or a homogeneous population of populations), where the
second population differs from the first by a simple change in one direction in a
particular centrality or variability statistic (mean, median, variance, IQR, etc.).  But
with flow, part of the issue is that you've got populations whose centrality and
fluorescence variability are both changing (compositional data), and rather complex
questions to be answered with it.  The other part of the issue is that you do have large
datasets, and hence any notion of homogeneity in a statistical sense flies out the
window.  This makes it a bit difficult to do an accurate analysis.
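
To make the large-sample point concrete, here is a minimal sketch in plain Python.  It
is an illustration under stated assumptions, not anyone's actual analysis: the two
"samples" are idealized normal populations built from evenly spaced quantiles (not real
flow data), the 0.1 SD shift is an arbitrary choice, and the function names are made up
for this example.  The p-value uses the standard asymptotic Kolmogorov series.  The same
small difference gives a similar KS distance D at both sample sizes, but the p-value
collapses as the number of events grows.

```python
import math
from statistics import NormalDist


def ideal_sample(n, mean=0.0, sd=1.0):
    """Idealized sample: n evenly spaced quantiles of a normal distribution."""
    dist = NormalDist(mean, sd)
    return [dist.inv_cdf((i + 0.5) / n) for i in range(n)]


def ks_two_sample(a, b):
    """Two-sample KS distance D and its asymptotic p-value."""
    a, b = sorted(a), sorted(b)
    n, m = len(a), len(b)
    i = j = 0
    d = 0.0
    # Walk both sorted samples and track the largest gap between the two ECDFs.
    while i < n and j < m:
        x = min(a[i], b[j])
        while i < n and a[i] <= x:
            i += 1
        while j < m and b[j] <= x:
            j += 1
        d = max(d, abs(i / n - j / m))
    lam = math.sqrt(n * m / (n + m)) * d
    if lam < 0.3:
        # The series converges poorly here; the true value is essentially 1.
        return d, 1.0
    p = 2.0 * sum((-1) ** (k - 1) * math.exp(-2.0 * k * k * lam * lam)
                  for k in range(1, 101))
    return d, min(1.0, max(0.0, p))


if __name__ == "__main__":
    for n in (500, 50000):
        d, p = ks_two_sample(ideal_sample(n), ideal_sample(n, mean=0.1))
        print(f"n={n:6d}  D={d:.4f}  p={p:.3g}")
```

With real flow data the samples are noisy, so the small-n result will vary from run to
run; the large-n rejection is the part that is essentially guaranteed.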

Even highly gated populations are very likely mixtures, if other interesting components
were not stained!
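
A small arithmetic sketch of why mixtures matter, with entirely made-up numbers: a "dim"
and a "bright" component whose means and SDs are assumptions chosen only for
illustration, and a hypothetical helper name.  Neither component moves at all; only the
fraction of bright events changes, yet the overall mean shifts substantially and the
overall SD is larger than that of either component.

```python
import math


def mixture_mean_sd(frac_bright, dim=(100.0, 20.0), bright=(1000.0, 200.0)):
    """Mean and SD of a two-component mixture; each component is (mean, sd)."""
    weights = (1.0 - frac_bright, frac_bright)
    comps = (dim, bright)
    mean = sum(w * m for w, (m, _) in zip(weights, comps))
    # E[X^2] of a mixture is the weighted sum of each component's E[X^2].
    second_moment = sum(w * (s * s + m * m) for w, (m, s) in zip(weights, comps))
    return mean, math.sqrt(second_moment - mean * mean)


if __name__ == "__main__":
    for frac in (0.10, 0.20):
        mean, sd = mixture_mean_sd(frac)
        print(f"bright fraction={frac:.2f}  mean={mean:.1f}  sd={sd:.1f}")
```

A test on the overall mean would flag a "shift" here even though the only thing that
changed is composition.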

That isn't to say that crude approximations of the correct analysis won't get you the
"right answer".  They tend to, which is why there hasn't been much urgency (or need,
IMHO) to increase statistical sophistication in the sense of using complex
hierarchical/mixture/random effects models.  But soon, soon, soon...  Adding
parameters to accommodate the heterogeneity would be one frequentist approach to
appropriately adjusting p-values for massive amounts of data.  Perhaps a Bayesian
modeling approach would be more natural, and it lets you get to the point quicker:
is it really a percentage difference in population which is important, or a threshold
(absolute #)?  Depends on the biology of course, but no need to hold biological thought
hostage to a weak understanding of statistics (or vice-versa!).

best,
-tony


A.J. Rossini
rossini@u.washington.edu
http://www.analytics.washington.edu/
ON LEAVE (at least until Sept 2006):
Novartis Pharma AG, Basel Switzerland

On Tue, 29 Nov 2005, David Coder wrote:

> You don't mention if your data are collected/displayed as logarithmically
> scaled histograms. A log transform of the data can 'look' Gaussian, but it's
> not always the case. (See Coder DM, Redelman D, Vogt RF. Cytometry. 1994 Jun
> 15;18(2):75-8. Computing the central location of immunofluorescence
> distributions: logarithmic data transformations are not always appropriate.)
> Non-parametric tests (e.g., the KS test) very often give significant
> differences given the large sample sizes. A trip to a local statistician
> should help sort things out. (The University of Queensland has a Centre for
> Statistics that deals with biostatistics.)
>
> ================
> David M Coder, Ph.D.
> Consultant in Cytometry
> Irvine, CA
> Cell/Msg: 206 499 3446
> Email: d_coder@msn.com
>
>
> -----Original Message-----
> From: Mr Simon Corrie [mailto:s369338@student.uq.edu.au]
> Sent: Monday, November 28, 2005 12:15 PM
> To: cyto-inbox
> Subject: Parametric statistics for flow
>
> Hi folks
>
> I am analysing some histogram data for hit detection assays on beads
> and am looking for comments about the analysis. To perform hypothesis
> or inference tests on the histogram data, I must somehow suggest (in a
> way that is at LEAST semi-quantitative) that my data correlates with a
> parametric distribution - eg normal, exponential, Weibull, etc etc. I
> am well aware that a simple flow histogram is multivariate - ie the
> fluorescence response is probably based on several things including
> size, biomolecule density on the beads, etc etc. However, at large
> sample sizes, such data MUST approach some limiting distribution.
>
> However, I come up against the problem that my data looks great when
> plotted in a Quantile/quantile plot (against theoretical NORMAL
> quantiles) but will perform poorly in statistical tests for "normality"
> such as the Kolmogorov-Smirnov and Shapiro-Wilk tests. I always keep the
> sample size >500. Are such tests necessary for convincing people about
> the data? How about just reporting the p-values for the KS tests -
> certainly not >90%, but always between 10 and 100%.
>
> It seems that most people simply "assume" that the data "should" be
> somewhat normal and then go ahead and do t-tests, which will, I
> suppose, soak up some error, but still not quite sure how to convince
> myself that such methods are viable. If any statistically-minded people
> have some opinions, please let me know. I am keen to do this analysis
> properly and publish the findings - however good or bad they are ;)
>
> Regards
>
> Simon Corrie
>
>
>
Received on Thu Dec 1 15:58:00 2005
