Re: RE : Statistics questions

From: Pietro Bulian <pbulian@cro.it>
Date: Wed Feb 20 2008 - 03:59:39 EST
Just as the median is a better estimator of central tendency than the mean in
the presence of outliers or skewed distributions, the median absolute
deviation ("mad") is a better, or "robust", estimator of variability than the
variance or standard deviation in the same setting. It can easily be computed
in R (open source: http://www.r-project.org/), perhaps using the rflowcyt
package to import FCS data.
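For anyone not using R, here is a minimal Python sketch of the median absolute deviation that uses only the standard library. It applies the scale constant 1.4826, which R's mad() uses by default, so that the result estimates the SD when the data are normal. The function name and sample values are hypothetical.

```python
import statistics

def mad(values, constant=1.4826):
    """Median absolute deviation, multiplied by `constant`.

    With constant=1.4826 (R's default for mad()), the result estimates
    the SD for normally distributed data; constant=1.0 gives the raw MAD.
    """
    center = statistics.median(values)
    deviations = [abs(x - center) for x in values]
    return constant * statistics.median(deviations)

# Hypothetical intensities with a single outlier in a high channel.
data = [1, 2, 3, 4, 100]
print("SD     :", statistics.stdev(data))
print("MAD    :", mad(data))
print("raw MAD:", mad(data, constant=1.0))
```

With these numbers the one outlier inflates the SD to about 43.6, while the scaled MAD stays at about 1.5.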

Pietro Bulian

Servizio di Onco-Ematologia Clinico-Sperimentale
I.R.C.C.S. Centro di Riferimento Oncologico
Via Franco Gallini 2
33081 Aviano (PN) - Italy

phone: +39 0434 659 412
fax: +39 0434 659 409
e-mail: pbulian@cro.it


----- Original Message ----- 
From: "James Wood" <jcswood@mac.com>
To: cyto-inbox
Sent: Tuesday, February 19, 2008 4:24 AM
Subject: Re: RE : Statistics questions


> For normal distributions only, the standard error of the median is about
> 25% larger than the standard error of the mean (for large samples the
> factor is sqrt(pi/2), about 1.25). If the distribution is skewed, then
> the standard error of the median is very hard to calculate. The
> following link is to a useful simulation that can be used to show how
> the standard error of the mean and the standard error of the median
> change with the distribution shape. If you make a custom skewed
> distribution in the lower channels and add some outliers in the upper
> channels, you can see for yourself how the standard error of the mean
> can be much larger than the standard error of the median.
>
> http://onlinestatbook.com/stat_sim/sampling_dist/index.html
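The same experiment can also be run offline. Below is a minimal Python sketch. The first case assumes normally distributed events (mean 100, SD 15). The second case assumes a hypothetical mixture of 95% exponential events in low channels and 5% outliers between channels 900 and 1000. All parameters are illustrative, not real cytometry data. The standard error of each statistic is estimated as its SD across repeated samples.

```python
import random
import statistics

def standard_errors(draw, n, reps, rng):
    """Empirical SEs of the mean and median over `reps` samples of size `n`.

    `draw(rng)` returns one simulated event; each SE is the SD of the
    corresponding sample statistic across the repeated samples.
    """
    means, medians = [], []
    for _ in range(reps):
        sample = [draw(rng) for _ in range(n)]
        means.append(statistics.fmean(sample))
        medians.append(statistics.median(sample))
    return statistics.stdev(means), statistics.stdev(medians)

def normal_event(rng):
    return rng.gauss(100.0, 15.0)

def skewed_with_outliers(rng):
    # Hypothetical mixture: 95% of events in low channels (exponential,
    # mean 10), 5% outliers scattered in high channels.
    if rng.random() < 0.95:
        return rng.expovariate(1 / 10.0)
    return rng.uniform(900.0, 1000.0)

rng = random.Random(2008)
se_mean_n, se_median_n = standard_errors(normal_event, 100, 2000, rng)
se_mean_s, se_median_s = standard_errors(skewed_with_outliers, 100, 2000, rng)
print(f"normal: SE(median)/SE(mean) = {se_median_n / se_mean_n:.2f}")
print(f"skewed + outliers: SE(mean)/SE(median) = {se_mean_s / se_median_s:.1f}")
```

In the normal case the ratio should come out close to 1.25. In the contaminated, skewed case the standard error of the mean should be many times that of the median.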
>
> Jim Wood
>
>
> On 2/17/08 3:44 PM, "Mario Roederer" <roederer@drmr.com> wrote:
>
>> Yes, the statistics on the MFI follow those on the frequency very
>> closely.
>>
>> The standard error of the mean (SEM), which defines how precisely you
>> know the "true" mean of the population, is equal to the standard
>> deviation divided by the square root of the number of events. The SD
>> (or CV) tells you how broad the distribution is, but will not change
>> substantially as you collect more and more events. The SEM, however,
>> decreases as you collect more events: in your example, if you count
>> 10,000 events vs. 1,000 events, the SEM will be about 30% as large for
>> the first sample (sqrt(1,000/10,000), or about 0.32), meaning that you
>> know the mean with about 3x the precision.
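The square-root arithmetic is easy to check directly. Here is a minimal Python sketch that assumes a hypothetical SD of 50 fluorescence units. The ratio between the two SEMs does not depend on that choice.

```python
import math

def sem(sd, n_events):
    """Standard error of the mean: SD divided by the square root of n."""
    return sd / math.sqrt(n_events)

sd = 50.0  # hypothetical SD of the fluorescence distribution
sem_small = sem(sd, 1_000)
sem_large = sem(sd, 10_000)
print(f"SEM with 1,000 events:  {sem_small:.3f}")
print(f"SEM with 10,000 events: {sem_large:.3f}")
print(f"ratio: {sem_large / sem_small:.3f}")  # sqrt(0.1), about 0.316
print(f"precision gain: {sem_small / sem_large:.2f}x")  # sqrt(10), about 3.16
```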
>>
>> Note that while the use of the SD is most appropriate for Gaussian
>> (normal) distributions, the relationship between increased precision
>> of the mean and increasing numbers of events is independent of the
>> actual distribution.
>>
>> By the way, I advocate use of the median fluorescence intensity rather
>> than the mean, since the median is less subject to outliers
>> (particularly when you are dealing with log-distributions). I don't
>> know (nor did a statistician I asked!) how the variance in the median
>> will relate to the number of events used to calculate it.  However, I
>> think the square-root relationship is probably a reasonable one to use
>> as an estimator... i.e., you need 100x the number of events to
>> increase your precision 10x.
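The square-root guess is a reasonable one. Take a continuous distribution whose density f(m) at its median m is nonzero. For large samples, the standard error of its median is approximately 1/(2 f(m) sqrt(n)), so it also shrinks as 1/sqrt(n). Below is a minimal Python sketch that checks this by simulation. It assumes log-normally distributed intensities with arbitrary, hypothetical parameters. A 16-fold increase in events should cut the standard error of the median roughly 4-fold.

```python
import random
import statistics

def se_of_median(n_events, reps, rng):
    """SD of the sample median across `reps` simulated samples.

    Intensities are drawn from a log-normal distribution, an assumption
    standing in for log-distributed fluorescence data.
    """
    medians = []
    for _ in range(reps):
        sample = [rng.lognormvariate(5.0, 1.0) for _ in range(n_events)]
        medians.append(statistics.median(sample))
    return statistics.stdev(medians)

rng = random.Random(42)
se_small = se_of_median(100, 400, rng)
se_large = se_of_median(1_600, 400, rng)
print(f"SE(median), 100 events:   {se_small:.2f}")
print(f"SE(median), 1,600 events: {se_large:.2f}")
print(f"ratio: {se_large / se_small:.3f} (square-root rule predicts 0.25)")
```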
>>
>> Finally, note that once you have more than a few dozen events on which
>> you are computing the MFI (or frequency), the statistical error in the
>> MFI (or %) is probably much lower than experimental error, so it's
>> kind of pointless to collect much more than this if your goal is only
>> to increase precision of the measurement.
>>
>> mr
>>
>> On Feb 15, 2008, at 8:50 AM, Carl Simard wrote:
>>
>>> Since we're on the subject of Poisson statistics, number of events
>>> and CV, there's a question I've been asking myself for some time. Do
>>> all these statistical limitations also apply to MFI?
>>>
>>> Just to give a practical example, let's say I'm not interested in the
>>> proportion of cells positive or negative for a given marker (the
>>> experiment is done on a cultured cell line, so all cells respond
>>> practically homogeneously to a given treatment). Instead, I just want
>>> to look at changes in the relative expression of this marker based on
>>> changes in MFI readings. In this case, will the MFI be more meaningful
>>> if you count, let's say, 10,000 cells versus 1,000?
>>>
>>> Carl
>>>
>>> -----Original Message-----
>>> From: Howard Shapiro [mailto:hms@shapirolab.com]
>>> Sent: February 13, 2008 21:40
>>> To: Cytometry Mailing List
>>> Subject: Re: Statistics questions
>>>
>>>
>>>
>>> Maciej Simm wrote (in response to Petra Disterer)
>>>
>>>>
>>>>> 2. I've read about the coefficient of variation and that one should
>>>>> have more than 400 positive events to have a CV of less than 5%. As I
>>>>> understand it, that means that if I have 400 positive events, the
>>>>> probability that these positive events are due to chance is less than
>>>>> 5%. I'm not sure that I have understood this correctly.
>>>>
>>>> CV=100/sqrt(400) or 5%, so "yes". This was elegantly described on
>>>> this list before - http://www.cyto.purdue.edu/hmarchiv/2001/0261.htm
>>>>
>>> I'm glad Maciej dug up the pointer to my 2001 posting, which saves me
>>> some writing this time around, but Petra seems to be laboring under a
>>> misconception about Poisson statistics. If you count n events, and
>>> there are no other sources of variance in the measurement, the "mean"
>>> of your measurement is n, i.e., the number of events you count, and the
>>> expected standard deviation of a series of counts of events from the
>>> same sample is the square root of n. Since the coefficient of
>>> variation, in per cent, is 100 times the standard deviation divided by
>>> the mean, i.e., 100 divided by the square root of n, you get 5 per cent
>>> as the minimum possible CV for a count of 400 objects, 10 per cent for
>>> a count of 100 objects, 1 per cent for a count of 10,000 objects, etc.
>>> Poisson statistics therefore tell you how many objects you actually
>>> need to count to get the result to a desired level of precision. They
>>> tell you *nothing* about the probability that the events you count are
>>> or are not due to chance!!!
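The arithmetic in the paragraph above reduces to two one-line functions. Here is a minimal Python sketch that assumes pure Poisson counting with no other sources of variance. The function names are hypothetical.

```python
import math

def poisson_cv_percent(n_counted):
    """Minimum achievable CV (%) for a count of n: 100 * sqrt(n) / n."""
    return 100.0 / math.sqrt(n_counted)

def events_needed(target_cv_percent):
    """Smallest count whose Poisson CV is at or below the target."""
    return math.ceil((100.0 / target_cv_percent) ** 2)

for n in (100, 400, 10_000):
    print(f"{n:>6} events -> CV {poisson_cv_percent(n):.1f}%")
print("events for a 5% CV:", events_needed(5))
print("events for a 1% CV:", events_needed(1))
```

Both functions describe precision only. Neither says anything about whether the counted events are due to chance.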
>>>
>>> A major reason those of us who can afford it use cytometry is that it
>>> is usually difficult for even the keenest-eyed and best-trained human
>>> observer to sit at a microscope and count several hundred of anything.
>>> When I was a medical student, one of the hardest tests my classmates and
>>> I had to do in our role as the de facto "clinical laboratory" in the
>>> emergency department of a busy city hospital was the blood reticulocyte
>>> count. Reticulocytes are immature red cells that have not completely
>>> shed what's left of their protein synthetic apparatus (ribosomes and
>>> endoplasmic reticulum). They take between one and two days to do this,
>>> and, since red cells normally last about 120 days in circulation, we
>>> expect that about one per cent of red cells in blood will be
>>> reticulocytes. Reticulocytes can be demonstrated on a blood smear by
>>> staining them with a dye such as new methylene blue, which will
>>> precipitate the ribosomes into a "network" (whence comes the term
>>> reticulocyte), which, if you are sharp-eyed, persistent, and lucky, you
>>> will see as one or a few blue dots within the red cell. The
>>> reticulocyte count goes up if someone has lost blood and is replacing
>>> it, and down if he or she has a condition such as vitamin B12
>>> deficiency, in which the marrow isn't generating new red cells. To do a
>>> reticulocyte count on a blood smear, you look at and count 1,000 red
>>> cells, noting the number of reticulocytes you see while you do this. If
>>> a normal person has about 1 per cent reticulocytes, you can expect to
>>> count 10 of them while you cruise (or bruise) through 1,000 red cells,
>>> meaning the CV of your measurement will be over 30 per cent. If you do
>>> the count the next day and only count 7, or count a whopping 13, it is
>>> not at all unlikely that there has been no real change in the patient's
>>> hematologic status. That's what we learn from Poisson statistics.
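Here is a minimal Python sketch that puts numbers on this. It assumes the day-to-day counts are Poisson with a true mean of 10 reticulocytes per 1,000 cells, that is, with no real change in the patient.

```python
import math

def poisson_pmf(k, lam):
    """P(X = k) for a Poisson count with mean `lam`."""
    return math.exp(-lam) * lam**k / math.factorial(k)

def poisson_cdf(k, lam):
    """P(X <= k) for a Poisson count with mean `lam`."""
    return sum(poisson_pmf(i, lam) for i in range(k + 1))

lam = 10  # expected reticulocytes in 1,000 red cells at ~1 per cent
cv = 100 / math.sqrt(lam)
p_7_or_fewer = poisson_cdf(7, lam)
p_13_or_more = 1 - poisson_cdf(12, lam)
print(f"CV: {cv:.1f}%")
print(f"P(count <= 7)  = {p_7_or_fewer:.3f}")
print(f"P(count >= 13) = {p_13_or_more:.3f}")
```

Under that assumption, a count of 7 or fewer turns up roughly one time in five, and so does a count of 13 or more. Either result is therefore consistent with no change.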
>>>
>>> These days, the Clinical Laboratory Improvement Amendments (CLIA) have
>>> made it illegal for medical students to be used as lab slaves, at least
>>> in the United States, and reticulocytes are typically counted in a
>>> properly certified lab in a flow cytometer, using a dye such as
>>> thiazole orange, which binds to nucleic acid, and analyzing at least a
>>> few tens of thousands of red cells in toto, which yields a measurement
>>> with a respectable CV. Since red cells spit out their nuclei on the way
>>> to becoming reticulocytes, they don't (except in pathologic situations)
>>> contain DNA, so dyes that bind to both DNA and RNA are usually OK for
>>> reticulocyte counting. It only took about five years for the
>>> hematologists to get comfortable with this.
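Here is a minimal Python sketch showing why a flow count of a few tens of thousands of cells gives a respectable CV. It assumes about 1 per cent reticulocytes and pure Poisson counting. The function name and cell totals are hypothetical.

```python
import math

def reticulocyte_cv_percent(total_cells, retic_fraction):
    """Poisson CV (%) of a reticulocyte count.

    Assumes the count is Poisson with mean total_cells * retic_fraction
    and ignores every other source of error.
    """
    expected_retics = total_cells * retic_fraction
    return 100.0 / math.sqrt(expected_retics)

for total in (1_000, 10_000, 30_000, 50_000):
    cv = reticulocyte_cv_percent(total, 0.01)
    print(f"{total:>6} red cells -> CV {cv:.1f}%")
```

Going from 1,000 cells on a smear to 50,000 in a flow cytometer takes the CV from over 30 per cent to under 5 per cent.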
>>>
>>> Reflecting on my career in cytometry, most of it seems to have been
>>> spent automating various parts of the "scut" lab work I was forced to do
>>> as a medical student; as many of you may know, I am now looking at
>>> cytometric diagnosis of TB (which I did do in medical school) and
>>> malaria (which I don't recall ever doing, but might have once or twice).
>>> These diseases were, and are, much bigger problems in resource-poor
>>> countries than in places where laboratories can afford both flow
>>> cytometers and the infrastructure needed to run them. TB is typically
>>> diagnosed by transmitted light microscopy of sputum smears using the
>>> Ziehl-Neelsen stain developed in 1883; malaria is diagnosed by
>>> transmitted light microscopy of blood smears using the Giemsa stain
>>> developed in 1904.
>>>
>>> The vast majority of the people who use these stains don't know how or
>>> why they work; when they try to evaluate modifications of the staining
>>> method, they typically compare slides from clinical samples on which
>>> examination of several hundred high-power microscope fields on a blood
>>> or sputum slide will often turn up fewer than ten pathogens. Since
>>> Poisson statistics have, for the most part, not impinged on the
>>> consciousness of TB and malaria diagnosticians, it is not generally
>>> appreciated that many such comparisons are meaningless.
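One standard way to test such a comparison is the conditional binomial test for two Poisson counts. The post itself does not prescribe this method. The test assumes both slides were examined over the same area. If the true rates are then equal, each count, given the total, is binomial with p = 0.5. Here is a minimal Python sketch with hypothetical counts.

```python
import math

def compare_poisson_counts(count_a, count_b):
    """Two-sided exact test that two Poisson counts share the same rate.

    Under the null hypothesis (equal areas examined, equal true rates),
    count_a given the total is Binomial(total, 0.5). Returns a p-value.
    """
    total = count_a + count_b
    k = min(count_a, count_b)
    tail = sum(math.comb(total, i) for i in range(k + 1)) / 2**total
    return min(1.0, 2 * tail)

# Hypothetical comparison: 6 pathogens with the standard stain vs. 9 with
# a modified stain, on slides examined over the same number of fields.
p = compare_poisson_counts(6, 9)
print(f"p = {p:.2f}")
```

With 6 versus 9 organisms, the p-value is about 0.6. An apparent 50 per cent improvement from the modified stain is therefore indistinguishable from counting noise.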
>>>
>>> Now that LEDs have become cheap, there is a big push toward equipping TB
>>> labs in resource-poor countries with (relatively) inexpensive
>>> fluorescence microscopes, so they can use stains based on auramine O,
>>> which is a blue-excited, green fluorescent dye that stains nucleic acids
>>> (although the texts on TB erroneously describe it as staining the
>>> mycolic acid in the cell wall), instead of the Ziehl-Neelsen stain.
>>> That's going to be a waste of money; true, you can look at a slide at
>>> somewhat lower magnification using fluorescence, but you're still up
>>> against Poisson statistics, and you really need to look at much more of
>>> the slide than is practical even with a fluorescence microscope. That's
>>> what cytometry is for. If it takes the TB diagnosticians as long to
>>> catch on as it took the hematologists, we can chalk up a million or so
>>> preventable deaths to the steep learning curve. And the same problem,
>>> and the same grim numbers, turn up for malaria.
>>>
>>> The foundations of our science were laid by people very much focused on
>>> human disease (OK, so the original paper on Poisson statistics and cell
>>> counting was written by somebody at the Guinness brewery). The synthetic
>>> dyes that got us from empirical microscopy to cytometry originated from
>>> an attempted synthesis of quinine - for malaria treatment - that went
>>> wrong. Paul Ehrlich, who mastered the use of those dyes (and caught TB
>>> in the process), made the inductive leap from selective staining of
>>> different cell types to selective chemotherapy; many of the compounds he
>>> worked with came from Hoechst, still a manufacturer of both dyes and
>>> drugs.
>>>
>>> Whatever else we do with cytometry, we are all ambassadors to our
>>> colleagues. There are undoubtedly people coming through flow labs who
>>> want little more than to run their samples and get back to their
>>> patients or labs. These folks may not realize, as I hope you do, that
>>> cytometry does more than merely save time and labor. Try to see that
>>> they learn something useful while you have their attention.
>>>
>>> -Howard
>>>
>>> (P.S. A lot of this stuff will be in the new book)
>>>
>>>
>>>
>>
>>
>
>
> 
Received on Thu Feb 21 00:38:00 2008
