Defining the type of data you will produce is a key step in
designing an experiment. While categorizing data as discrete or continuous is
one of the most straightforward concepts in statistics, things are not always
as simple as they seem. Giving thought to typical data produced in my lab
revealed that discrete data is not always discrete and continuous data is not
always continuous. It is therefore important to understand your data in the
form actually used for statistical testing and to draw conclusions appropriate to that form.
Discrete data may include whole numbers (5
pups in a litter) or categorical assignments (pregnant vs. not pregnant), while
continuous data is described as being scalar and able to take on any value within
a range when measured (e.g. height, weight, time). Data may start out as
discrete or continuous, but categorization or manipulation of the data can
quickly change its properties. For instance, binning continuous data, such
as sorting mouse weights into underweight, normal, and overweight, produces discrete
data from the same measurements.
A major focus of my lab is the study of conditions
and factors that influence gene reassortment of influenza viruses. To measure reassortment levels, we employ a
two-virus system in which the original virus, designated “wild-type” (Wt), and
a “variant” (Var) virus differ only by engineered silent mutations. These
markers allow differentiation of segments by high-resolution melt analysis,
with the mutations conferring different melt temperatures on segments of Wt or
Var origin. Genotypes in which all segments come from either the Wt or the Var
virus are categorized as “parental,” while a virus isolate containing any
combination of Wt and Var gene segments is “reassortant.”
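As a rough sketch of this classification rule, the snippet below labels an isolate from its per-segment calls. The function name, the "Wt"/"Var" string labels, and the example isolates are illustrative assumptions, not our actual analysis pipeline; the only rule taken from the description above is that a single origin across all segments means parental and any mix means reassortant.

```python
from typing import Sequence


def classify_genotype(segments: Sequence[str]) -> str:
    """Return 'parental' if every segment shares one origin, else 'reassortant'.

    `segments` holds one call per gene segment, each either "Wt" or "Var"
    (hypothetical labels chosen for this sketch).
    """
    origins = set(segments)
    if not origins:
        raise ValueError("genotype has no segments")
    unknown = origins - {"Wt", "Var"}
    if unknown:
        raise ValueError(f"unexpected segment origin(s): {sorted(unknown)}")
    # One distinct origin means all segments came from the same parent virus.
    return "parental" if len(origins) == 1 else "reassortant"


# Made-up isolates with eight segment calls each (influenza A has eight segments).
all_wt = ["Wt"] * 8
mixed = ["Wt", "Var", "Wt", "Wt", "Var", "Wt", "Wt", "Wt"]

print(classify_genotype(all_wt))  # parental
print(classify_genotype(mixed))   # reassortant
```

Note that the output is a category, so at this stage the data is still discrete.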
The reassortment
level arising from a coinfection is determined by calculating percent
reassortment as follows:
% reassortment = (number of reassortant isolates / total number of isolates genotyped) × 100
Much like the
transformation of continuous data by binning into groups, discrete genotypes are
transformed into a value for % reassortment and then treated as continuous data.
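To make the two transformations concrete, here is a minimal sketch of both directions. It assumes percent reassortment is the share of reassortant isolates among all isolates genotyped; the weight cutoffs, function names, and sample values are placeholders invented for illustration, not real criteria or data.

```python
from typing import Sequence


def bin_weight(weight_g: float, low: float = 18.0, high: float = 30.0) -> str:
    """Continuous -> discrete: map a weight in grams to a category.

    The default cutoffs are arbitrary placeholders for this sketch.
    """
    if weight_g < low:
        return "underweight"
    if weight_g > high:
        return "overweight"
    return "normal"


def percent_reassortment(calls: Sequence[str]) -> float:
    """Discrete -> percentage: share of isolates called 'reassortant'."""
    if not calls:
        raise ValueError("no isolates were genotyped")
    n_reassortant = sum(1 for call in calls if call == "reassortant")
    return 100.0 * n_reassortant / len(calls)


# Made-up measurements and genotype calls.
weights = [16.2, 24.8, 31.5]
calls = ["parental", "reassortant", "reassortant", "parental", "reassortant"]

print([bin_weight(w) for w in weights])  # ['underweight', 'normal', 'overweight']
print(percent_reassortment(calls))       # 60.0
```

In the first case information is discarded to create categories; in the second, categories are summarized into a single number describing the whole sample.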
Although the percent value is not true continuous data in the sense discussed here,
it is routinely treated as such to allow meaningful statistical
comparisons between groups. Though the underlying data is discrete, it makes intuitive sense to treat the values this way because we are describing
the features of a population rather than individual data points. For instance,
finding that 75% of your mice are pregnant does not mean
that each mouse is 75% pregnant; it means that 75% of your sample population is
pregnant. Accurately defining the type of data you are using for
analysis enables you to choose suitable statistical tests and describe your results appropriately.
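One way to see why a percentage built from counts is not truly continuous is to list the values it can actually take. The sketch below is only an illustration; the function name and the sample sizes are assumptions chosen for the example. With n individuals, only n + 1 percentages are possible.

```python
from typing import List


def attainable_percentages(n: int) -> List[float]:
    """List every percentage that k out of n individuals can produce (k = 0..n)."""
    if n < 1:
        raise ValueError("sample size must be at least 1")
    return [100.0 * k / n for k in range(n + 1)]


print(attainable_percentages(4))          # [0.0, 25.0, 50.0, 75.0, 100.0]
print(attainable_percentages(2))          # [0.0, 50.0, 100.0]
print(75.0 in attainable_percentages(2))  # False
```

The larger the sample, the finer the steps between possible values, which is part of why treating the percentage as continuous becomes more reasonable as more isolates are genotyped.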
However, while saying 75% of 4 mice are pregnant holds meaning, saying 75% of 2 mice are pregnant does not. Really cool research to read about! I'm wondering, though: are there any other similar restrictions or constraints on your 'continuous' value of % reassortment? And how does that affect the statistical analyses you perform?