Monday, April 25, 2016

Bias in Western Blot Quantification with ImageJ

As a molecular scientist, I perform a lot of western blots. As you may know, western blots are pretty useless unless you can quantify the differences between the lanes. ImageJ (or FIJI) is frequently used to quantify western blot results. However, I have always been wary that there are multiple ways to introduce bias into your quantification with this program. I have wondered: if the same person were to analyze the same western blot multiple times, would they get the same values each time? To explore this, I decided to use an old western blot image and see what different values I could get without trying to be biased.
To start, I used the western blot below. Since this was just an exercise of intrigue, I only analyzed the top bands in the last three lanes.

To use Image J, you must select the areas that you want to quantify:
I believe that this first step can cause variability. If a researcher were to make the selected area more narrow or more wide, they may pick up noise surrounding it and make their signal seem greater than it is. To examine this further, I selected those bands three different times and produced the following intensity graphs:
"Attempt 1""Attempt 2"
"Attempt 3"

As of now, there does not appear to be much difference between these results. To quantify your results in ImageJ, you use a selection tool to calculate the area under each curve. After quantification, I performed a two-way ANOVA with Tukey's multiple comparisons. I wanted to see 1) whether the quantification of the same lane changed between "attempts" and 2) whether the relationship between the lanes was the same across the different "attempts". It is important to note that the following quantifications do not take noise from the western blot into account.
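For reference, the "area under the curve" that ImageJ reports can be sketched with the trapezoid rule; the profile values below are invented, not the post's data.

```python
import numpy as np

# Invented lane-profile peak (intensity vs. position along the lane).
peak = np.array([3.0, 8.0, 25.0, 60.0, 28.0, 9.0, 3.0])

# Trapezoid rule with unit spacing between pixels.
auc = 0.5 * (peak[:-1] + peak[1:]).sum()
print(auc)  # area under the curve for this band -> 133.0
```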
I was quite surprised to find that the three attempts differed significantly from one another. However, the relationships between the lanes were preserved.

As mentioned, this method of quantifying did not take noise into account. To correct for noise, ImageJ allows you to draw a line at the level of the noise to act as a threshold. The problem is that you cannot really standardize this. Typically, a researcher has to "eyeball" the average level of the noise. For this section, I attempted to be as unbiased as possible. The plots then look like this:
"Attempt One" "Attempt Two"
"Attempt Three"

I then took the area under the curve for these noise-corrected plots and performed another two-way ANOVA with Tukey's multiple comparisons. 
With these normalized values, there was NOT a significant difference between the values for each attempt. Interestingly, normalizing the values revealed a difference between Lane 2 and Lane 3 that was not significant prior to normalization.
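The post does not share its raw numbers, but the Tukey comparison between lanes can be sketched with SciPy (`scipy.stats.tukey_hsd`, available in SciPy 1.8+); the three "attempt" values per lane below are invented for illustration.

```python
from scipy import stats

# Hypothetical AUC values: three quantification "attempts" per lane
# (numbers invented; not the post's actual data).
lane1 = [410.0, 455.0, 432.0]
lane2 = [365.0, 390.0, 372.0]
lane3 = [250.0, 270.0, 261.0]

# Tukey's HSD compares every pair of lanes while controlling the
# family-wise error rate across the three comparisons.
res = stats.tukey_hsd(lane1, lane2, lane3)
print(res.pvalue)  # 3x3 matrix of pairwise p-values
```

Note this one-factor sketch treats the attempts as replicates; the post's actual analysis was a two-way ANOVA with attempt and lane as separate factors.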

Obviously, we cannot extend these observations to all western blot analyses. It is safe to assume that, to be as accurate and unbiased as possible, one must control for noise in their western blots. However, I was very surprised that the effect between the lanes was preserved across all three "attempts." I would like more examples demonstrating this before I get too comfortable with western blot analysis. The next step in examining bias with ImageJ analysis would be to have multiple people analyze the same western blot. Until then, we must all be careful when analyzing western blots and try to be as consistent as possible.


  1. This is a really interesting analysis that I have always wondered about. It makes sense that there is variation between your attempts because of the area around the band, and I wonder if it would be possible to take that into consideration when doing western blot analysis. Something like a technical replicate of the area selection? As you mentioned, it is important to correct for noise, and that is another added step of variance and potential bias. I am curious whether you did the same analysis with a loading control and then normalized the signal. I realize this was just a quick experiment of intrigue, and it looks like maybe that top band is the control, or a spurious band. Regardless, your analysis shows the consequences of these types of analyses, and it makes me wonder if the loading control would help, or if it would just be another layer of area-selection bias.
    This also brings to light the point that sometimes we rely too heavily on the "statistical significance" of these band intensities. Again, this is all just me observing these bands without knowing the scientific relevance (or really the experiment at all), but when I look at those bands I do not see a difference in intensities. Since your analysis says there is one, it is important to know whether this difference is biologically relevant.

  2. I find this post extremely interesting. My lab is a physiology lab, and we try not to do western blots unless we absolutely have to. We recently had to perform a couple (always trying to please grant reviewers), and while I was analyzing the data using ImageJ I started to wonder the exact same thing. Even though the lanes are parallel, once you take the picture you have to take into account the noise, the amount of protein loaded into your gel, and, as you mentioned, the bias created when you are responsible for selecting the correct area for quantification. Even though differences in the amount of protein can sometimes be detected with the naked eye, when quantifying with the software we must be aware of the bias we introduce.
    Like Amanda mentioned, all of the quantification goes back to significance and the magical p<0.05! However, seeing is believing, so when we see that your lane 1 has a thinner band compared to lane 6, can we take that as experimentally relevant? Or will we always be trapped in the paradigm of significance?

  3. This is a really interesting post for me, as I have wondered about this ever since I first did a western blot and was told I could use ImageJ to analyze the bands. I have done many blots in the past few years, but only a few times have I actually used ImageJ to look at the differences in intensity. This doesn't mean it's not helpful at all, but I would recommend using ImageJ to analyze band intensity with caution.

    There are a couple of factors that can make ImageJ analysis tricky. The first is the sensitivity of your developing reagents and film, or of the imaging system you use. There are multiple options available; what we use in our lab is a traditional film system, and it is not sensitive enough to distinguish slight differences. One member of our lab once tried a loading-control experiment where he loaded the same protein lysate in scaled amounts, say, 2 ug, 4 ug, 8 ug, 20 ug. He then did the blot, analyzed it with ImageJ, and did not see the intensity increase by the same factors as the loading. The trend was right, definitely greater intensity with more loading, but it was not twice the intensity at 4 ug, or four times at 8 ug, compared to the 2 ug loading. So, under our lab's setting, I would doubt amounts inferred from ImageJ intensity analysis, let alone statistical analysis performed on them afterwards.

    The second thing I keep in mind is that even if you have a highly sensitive system, you should still make sure that the things you compare were run in one experiment and analyzed in one shot, to decrease the possibility of error or bias. As you've shown in the blog with setting the line at the bottom to calculate the area under the curve, this can be tricky too, as you might draw quite different lines at different analysis sessions. I think this relates to one lecture in our class: you should gather all the data you need first, and then sit down and analyze it following the plan.

    With all that said, that's why my advisor always says, "don't use ImageJ to analyze the data and show the stats; you need to first persuade me via my eyeball."

  4. Thanks for doing this proof-of-principle work with ImageJ. I think it is essential to establish the validity of your measurements before they are used for experimental analysis and conclusions are drawn from them. These validations are not always available to the reader following publication, and I have wondered about standardization and quantification with other techniques. Our program recently had a seminar speaker give a talk on swarming bacteria, which merge or do not merge their swarm zones on agar plates with each other based on the relatedness of the strains. The "merging" behavior was denoted by whether a "boundary line" formed between the two swarm zones. Some images of boundary lines were unclear, and the distinctions did not seem absolute. The establishment of a reliable and accurate method of determining the population behavior, as you have validated ImageJ here, would really have boosted my confidence in the speaker's data and conclusions. It is my hope that, as we develop as scientists, we will take the approach you have and pursue scientific integrity by validating our methods and becoming more knowledgeable about the conclusions we are qualified to make.

  5. With ImageJ quantification of western blots, the main issue I ran into, as you point out, is where to draw the line to remove the noise. I was taught to draw the line from the plateau (representing the baseline/background signal) on one side of the peak to the plateau on the other, while the other person analyzing the data was taught to draw the line from the local minimum on one side of the peak to the local minimum on the other. We were sometimes getting very different results, so we decided to run a two-sample t-test on scores from a few of our westerns to see whether there was a difference between my scoring and his. We found that there was, and ultimately we had to standardize our scoring methods. We came up with a protocol to try to encompass as many of the possible variations and circumstances we might encounter, and ultimately our scoring was no longer significantly different. However, in the process of creating this protocol, we talked to several postdocs and junior PIs who regularly ran westerns, and each of them had their own variations on how to perform this analysis. I am unaware of a formal protocol for ImageJ analysis of westerns, but since how the analysis was done is rarely (if ever) reported in methods (beyond "quantitative analysis was performed using ImageJ"), and given the degree to which scorers can vary their methods, perhaps a more formal protocol is necessary. If the data themselves are biased, no manner of statistics can completely correct for that.
    The way our lab perhaps accounted for this was the standard we went by for publishable western data: while it is important to do quantitative analysis on westerns, if you cannot see the result qualitatively (i.e., see that band sizes are different with the naked eye), then, especially given the degree of variation present in our methods of analysis, the result is much less believable, ought to be taken with a grain of salt, and thus would not be considered publishable. I would be interested to hear the practice regarding qualitative vs. quantitative analysis in the labs of those of you who run westerns.
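The two-scorer check described in the comment above can be sketched with SciPy; the scores below are invented stand-ins for two people quantifying the same set of bands. Since both people score the same bands, a paired test is arguably a better fit than the unpaired two-sample test the comment mentions, so the sketch shows both.

```python
from scipy import stats

# Invented scores: two people quantifying the same five bands.
scorer_a = [120.0, 95.0, 210.0, 160.0, 140.0]
scorer_b = [150.0, 130.0, 260.0, 200.0, 175.0]

# Unpaired two-sample t-test (what the comment describes).
t_unpaired, p_unpaired = stats.ttest_ind(scorer_a, scorer_b)

# Paired t-test: compares the per-band differences directly, which
# removes band-to-band variation and is more sensitive here.
t_paired, p_paired = stats.ttest_rel(scorer_a, scorer_b)
print(p_unpaired, p_paired)
```

With these invented numbers, the paired test flags the consistent per-band offset between scorers even though the unpaired test, swamped by band-to-band variation, does not.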

  6. Westerns were my bread and butter in undergrad, and even then quantifying them made me nervous in ways that flow plots still don't now. It always feels like an #OverlyHonestMethods post to say that you eyeballed the noise level. Westerns also always worried me because, after a certain point, you lose any higher signal: the film is as developed as it can be, and the exposed portion is large enough to limit how much of the area around it can also react. Westerns are still a great tool, but I do wish we had better quantification methods for interpreting them, or were less reliant on those quantifications versus presence/absence or bloody obvious differences.