Continuous variables can take on any value between a minimum and a maximum. Time, distance, and height are common examples. In contrast, discrete variables are obtained by counting, and categorical variables describe membership in a group, such as gender.
I’ve encountered my share of problems when attempting to distinguish changes in a continuous variable. In one case, I was quantifying fluorescent intensity in ImageJ, where intensity is a continuous variable. When there is a large difference between the fluorescent intensity of the two groups that I am trying to distinguish, it is easy to see the difference on a graph of the raw values. However, when the intensity differences are very subtle (as biological differences often are), they can be difficult to appreciate in the raw values. That is, it is difficult to convince myself and others that the change in fluorescent intensity is biologically relevant. Thresholding, or turning a continuous variable into discrete categories, seems to be a popular choice for addressing this problem. But where do we set a threshold so that it maximally distinguishes between groups? This is an issue we must consider, particularly for data without established positive controls.
|Categorical positive control??!|
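To make the threshold question concrete, one option is to scan every candidate cutoff and score how well each one separates two groups. The sketch below is only an illustration under stated assumptions: the intensity values are invented, the function name `best_threshold` is hypothetical, and Youden's J statistic (sensitivity + specificity - 1) is just one of several possible scores, not the method used in the ImageJ analysis described above.

```python
def best_threshold(control, treated):
    """Return (cutoff, J) maximizing Youden's J = sensitivity + specificity - 1.

    A value counts as "positive" when it is >= cutoff.
    If several cutoffs tie, the lowest one is kept.
    """
    # Every observed value is a candidate cutoff.
    candidates = sorted(set(control) | set(treated))
    best_cut, best_j = None, -1.0
    for cut in candidates:
        # Fraction of treated values called positive.
        sensitivity = sum(v >= cut for v in treated) / len(treated)
        # Fraction of control values called negative.
        specificity = sum(v < cut for v in control) / len(control)
        j = sensitivity + specificity - 1
        if j > best_j:
            best_cut, best_j = cut, j
    return best_cut, best_j


# Hypothetical intensities in arbitrary units, invented for illustration.
control = [10.2, 11.5, 9.8, 11.0, 10.9]
treated = [11.8, 12.6, 13.0, 11.1, 12.9]

cutoff, j = best_threshold(control, treated)
print(cutoff, round(j, 2))
```

A cutoff chosen this way is tuned to the very data it is then used to describe, so it will tend to overstate the separation between groups unless it is checked on independent measurements.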
This becomes more complex when we consider the elimination of bias. Is it possible to transform a continuous variable into a categorical variable in an unbiased fashion? How do we set a protocol for defining categories for continuous variables so that we are not biasing ourselves towards a particular conclusion?
We transform continuous variables into categorical variables all the time. One example that comes to mind is online surveys that categorize responders by age. I always wonder about the rationale for choosing these cutoffs.
|Well reasoned or... random?|
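The survey case is easy to sketch in code. The brackets below are hypothetical, chosen only to resemble the kind of age ranges such surveys tend to use; `EDGES`, `LABELS`, and `age_bracket` are names made up for this example.

```python
from bisect import bisect_right

# Hypothetical cutoffs: each edge is the lowest age in the bracket that follows it.
EDGES = [18, 25, 35, 45, 55, 65]
LABELS = ["under 18", "18-24", "25-34", "35-44", "45-54", "55-64", "65+"]


def age_bracket(age):
    """Map a continuous age onto one of the categorical brackets."""
    # bisect_right counts how many edges are <= age,
    # which is the index of the matching label.
    return LABELS[bisect_right(EDGES, age)]


for age in (25, 34, 35):
    print(age, age_bracket(age))
```

With these cutoffs, a 34-year-old and a 35-year-old land in different categories even though they are a year apart, while a 25-year-old and a 34-year-old are treated as the same. Moving an edge by a year or two reshuffles the groups, which is why the rationale behind the cutoffs matters.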