Songmin Xie

Focus on Bioinformatics and Informatics


How to design a successful microarray experiment

by Rainer Breitling

Plant Science Group and
Bioinformatics Research Centre
Institute of Biomedical and Life Sciences (IBLS)
University of Glasgow
Glasgow G12 8QQ
United Kingdom

R.Breitling@bio.gla.ac.uk
http://www.brc.dcs.gla.ac.uk/~rb106x


This page tries to answer some of the basic questions to be considered when planning a microarray experiment for the first time. They represent a highly subjective selection of issues that were identified during a meta-analysis of microarray experiments performed at the Sir Henry Wellcome Functional Genomics Facility in collaboration with Pawel Herzyk, and at the Molecular Plant Sciences group in collaboration with Anna Amtmann and Patrick Armengaud.

  1. Are microarray experiments more difficult to design than other studies? No, but... Microarrays are still quite expensive to perform, so you would want to do them properly from the first step on. Also, a single microarray hybridization can generate as many data points as several "classical" Ph.D. theses, so expectations of the results will be particularly high. Failure to interpret the data due to incorrect design can be particularly embarrassing.

  2. What is a good microarray experiment? Just as there are many uses for classical experimental techniques, there are many ways to exploit microarrays. One important consideration is that you should compare samples that are similar. Don't try to maximize the number of differentially expressed genes. Some special cases:
    • Knock-out animal models. If you compare two animals, try to take samples from comparable areas. The same tissue may not be the same anymore, after you knock out a major physiological process. E.g., the testis of a steroid receptor knock-out will have little resemblance to a wild-type testis, because spermatogenesis is abolished. Therefore almost all genes will be changed in expression to some extent - and the results may be close to impossible to interpret. Microarrays are very sensitive in picking out expression changes. As a rule of thumb, it may be good to use microarrays only on samples that are impossible to distinguish by eye.
    • Tissue comparisons. Here the problems described for knock-out models are aggravated. There are probably very few sensible experiments comparing samples from different tissues (except in the context of large-scale comprehensive expression surveys). It is often even advisable to restrict analysis to specific cell types within the same tissue, if the quantitative composition changes between conditions.
    • Drug treatments. If you are interested in a drug effect, try to examine the earliest time point possible. This minimizes secondary effects and focuses on the drug-specific changes. Of course, a time-series can be helpful, but if you know the physiological time-scale in advance, it is often more efficient to increase the number of replicates on one early time point.
    • Stably transfected cell lines. To establish a cell line that has been stably transfected by some DNA, you usually have to go through a rigorous selection process, often even including a single-cell stage. Afterwards the newly established cell line may no longer be comparable to its parent line, especially if the effect of the transfection is rather mild. The observations could be dominated by individual differences between cells that have been "amplified" by the selection process. Therefore, it is important to use a mock transfected cell line as the control, preferably one that expresses an inactive point mutant of the construct of interest. Also, as the transfection process and the subsequent selection and gene integration are not reproducible, it is recommended to use independent transfectants for each replicate, even though this is more laborious.
    • Response studies vs. condition studies. If you want to compare two tissues that are quite different (wild type vs. mutant, healthy vs. diseased) it may be more effective to compare their responses to some stimulus (a drug, a hormone, a stressor), rather than comparing their conditions directly. In this setup, each hybridization could compare a single tissue in the stimulated and unstimulated state, which should be more similar, i.e. comparable, than the two different tissues.

  3. Do I need statistical advice for my study design? Microarray experiments are biological experiments, so the most important considerations will be biological. Especially in simple cases, where you want to use microarrays as a comprehensive Northern blot, microarray-specific statistical issues can be of secondary importance at the early stages. It will, however, be very useful to involve a statistician in the analysis/interpretation process to prevent some common pitfalls, such as underestimating the "multiple testing problem" involved in examining thousands of genes at once (see below: How do I analyze my results?). And of course you should always be aware of the basic statistical and philosophical issues involved in any successful experimental design.
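
The scale of the "multiple testing problem" is easy to underestimate, so a small worked calculation may help. The sketch below is only an illustration: the gene count and the significance threshold are assumed values, not numbers from this article, and the Bonferroni correction is shown as one well-known remedy, not as the method recommended here.

```python
def expected_false_positives(n_genes, alpha):
    """Expected number of genes called significant by chance alone,
    assuming no gene is truly changed and each is tested at level alpha."""
    return n_genes * alpha


def bonferroni_threshold(n_genes, alpha):
    """Per-gene threshold that keeps the chance of even one false
    positive across all genes at or below alpha (Bonferroni correction)."""
    return alpha / n_genes


# Assumed, illustrative numbers: a 20,000-gene array tested at p < 0.05.
n_genes = 20000
alpha = 0.05

print("expected false positives:", expected_false_positives(n_genes, alpha))
print("Bonferroni per-gene threshold:", bonferroni_threshold(n_genes, alpha))
```

With these assumed numbers, about a thousand genes would pass an uncorrected threshold even if nothing had changed at all, which is why a list of "significant" genes needs a correction of some kind before it is interpreted.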

  4. What kind of microarrays should I use?
    • Standard arrays. If possible use arrays that many other people are using. This facilitates data exchange and the comparison of results. Your data will be much more useful and easier to interpret if you can directly compare them to other people's data. Cross-platform comparisons are possible but are presently quite tedious.
    • Whole-genome arrays. Nowadays, there are few good reasons to restrict your studies to an arbitrary selection of genes. Exceptions are experiments with organisms for which whole-genome arrays are unavailable or very focused specialized studies. However, it would be a mistake to choose a partial array just because a more comprehensive genome-wide study might yield too many unexpected (or currently unexplainable) results.
    • Single-color arrays (e.g. Affymetrix Genechip® arrays). If you just compare two conditions (mutant vs. wild type; treated vs. untreated; healthy vs. diseased) two-color arrays are the obvious technique of choice. As soon as the study becomes a bit more complex (time-course of treatment; comparison of several mutants; inter-patient comparison), two-color arrays pose so many experimental design problems that using a one-color technique is usually advisable, even if the individual hybridizations may be more expensive.
    • Annotated arrays. For many biological studies, the interpretation will rely on the available functional annotation of the genes on an array. Although, sometimes, microarray experiments are used to fish for previously unknown genes involved in a certain process, attention will usually focus on those candidates that already have been functionally characterized in some other role. Differential expression of completely novel genes is very hard to interpret. This consideration can be important in the case of Affymetrix Genechip® arrays, which in some cases are provided as chip pairs (A and B chips), where one member of a pair is enriched in all the functionally annotated genes. In these cases, consider using only the well-annotated member of the pair.

  5. How many replicates do I need? As many as possible! For an exploratory analysis 3 replicates are usually sufficient, unless the data are particularly noisy (e.g. samples from very small numbers of cells) or the expected effect is particularly small (e.g. changes occur only in very few, specialized cells in the sample). Using fewer than 3 replicates is not a good idea. Most important is the use of real replicates. Do equal numbers of replicates for each condition/comparison to keep the later analysis simple.

  6. What is a real replicate? A real - or biological - replicate is an independent sample in which every variable that a colleague in another lab couldn't control is allowed to vary. You want to report only observations that are general and reproducible. In an imaginary "ideal" experiment, each replicate would be performed in a different lab - so it may be advisable to approximate that situation as much as possible. That does not mean that you have to vary all the variables, if you are certain that some of them won't have any effect, e.g. the brand of standard chemicals, the phases of the moon, etc. But don't underestimate the sensitivity of microarrays: variables like batch of cells or time of day can very well have an observable effect. Most of all, be careful to prepare a perfectly matched control for every sample, even if you are going to use single-color arrays.

  7. Should I do technical replicates? No, unless you are planning a technical instead of a biological study. Repeated hybridization of the same biological sample is a waste of resources. Microarrays are reliable and it has been shown repeatedly that this kind of replication doesn't provide any biologically useful information.

  8. Should I do dye-swap experiments? No. See "Should I do technical replicates?". But of course you can use reverse labelling for some of your biological replicates if you feel like it.

  9. Should I pool samples? It is tempting to pool samples to save hybridization costs. Unless you do single-cell sampling, every sample is already a pool, as it contains mRNA from many cells. Especially if the number of cells obtained from each individual is very small, pooling is the best way of reducing the noise while keeping the number of hybridizations reasonably small. Unless you expect to find interesting inter-individual variations, e.g. in a medical study, there is little to argue against pooling. However, it is important to pool the biological material (tissue, cells), not the purified RNA or labeled cDNA! In this way, problems are far easier to spot. Don't ever include any sample that looks suspicious.
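
Why pooling reduces noise can be illustrated with a small simulation. This is a minimal sketch under strong assumptions that are not from this article: each individual contributes equally to the pool, inter-individual variation is Gaussian, and technical noise is ignored. The function name and all parameter values are hypothetical.

```python
import random
import statistics


def simulate_hybridizations(n_hybridizations, pool_size, biological_sd=1.0, seed=0):
    """Simulate the expression value of one gene on several arrays.

    Each array is hybridized with a pool of `pool_size` individuals, and
    the pooled value is modelled as the mean of the individual values
    (an assumption: equal contribution from every individual).
    """
    rng = random.Random(seed)
    values = []
    for _ in range(n_hybridizations):
        individuals = [rng.gauss(0.0, biological_sd) for _ in range(pool_size)]
        values.append(statistics.mean(individuals))
    return values


# Spread between arrays when each array holds 1 individual vs. a pool of 10.
single = simulate_hybridizations(2000, pool_size=1)
pooled = simulate_hybridizations(2000, pool_size=10)
print("sd, one individual per array:", round(statistics.stdev(single), 2))
print("sd, ten individuals per array:", round(statistics.stdev(pooled), 2))
```

Under these assumptions the spread between arrays shrinks roughly with the square root of the pool size. The price is that the inter-individual variation itself is no longer visible.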

  10. How do I analyze my results? At the SHWFGF we have recently introduced two simple new statistical techniques (RankProducts [RP] and iterative GroupAnalysis [iGA]) that facilitate and enhance the interpretation of microarray experiments. Both methods provide rigorous significance estimates for your observations and perform considerably better than previous techniques, particularly for the small and noisy data sets that are often produced in biological experiments. The standard (recommended) analysis procedure used at the SHWFGF is described here and software is available for download at the GlaMA website.
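
To give a feeling for the rank product idea, here is a minimal sketch, not the SHWFGF or GlaMA software. It assumes the commonly described form of the statistic: genes are ranked by fold change within each replicate, and the geometric mean of a gene's ranks is its rank product. The permutation step is a simplified significance estimate, and the function names and toy data are hypothetical.

```python
import bisect
import math
import random


def rank_products(log_ratios):
    """log_ratios[g][r] is the log fold change of gene g in replicate r.

    Returns one value per gene: the geometric mean of its ranks, where
    rank 1 is the strongest increase within a replicate. Small values
    mean the gene is consistently near the top of every replicate.
    """
    n_genes = len(log_ratios)
    n_reps = len(log_ratios[0])
    log_rank_sums = [0.0] * n_genes
    for r in range(n_reps):
        order = sorted(range(n_genes), key=lambda g: log_ratios[g][r], reverse=True)
        for rank, g in enumerate(order, start=1):
            log_rank_sums[g] += math.log(rank)
    return [math.exp(s / n_reps) for s in log_rank_sums]


def permutation_p_values(log_ratios, n_perm=200, seed=0):
    """Fraction of rank products from shuffled data that are at least as
    small as each observed one (a simplified significance estimate)."""
    rng = random.Random(seed)
    n_genes = len(log_ratios)
    n_reps = len(log_ratios[0])
    observed = rank_products(log_ratios)
    columns = [[row[r] for row in log_ratios] for r in range(n_reps)]
    hits = [0] * n_genes
    for _ in range(n_perm):
        # Shuffle each replicate independently to break gene identity.
        shuffled = [rng.sample(col, n_genes) for col in columns]
        permuted = [[shuffled[r][g] for r in range(n_reps)] for g in range(n_genes)]
        null_rp = sorted(rank_products(permuted))
        for g in range(n_genes):
            hits[g] += bisect.bisect_right(null_rp, observed[g])
    return [h / (n_perm * n_genes) for h in hits]


# Toy data: 4 genes x 3 biological replicates (made-up log ratios).
toy = [[2.0, 1.8, 2.2], [0.1, -0.2, 0.0], [-1.0, -1.2, -0.9], [0.5, 0.4, 0.6]]
print(rank_products(toy))
print(permutation_p_values(toy))
```

Because only ranks enter the calculation, the statistic does not depend on the absolute scale of each array, which is one plausible reason it copes well with small, noisy data sets. To find down-regulated genes, the same functions can be applied to the negated log ratios.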

  11. What else should I do?
    • Keep things simple (pair-wise comparisons, simple time-courses).
    • Try to define your expectations (hypotheses) in advance in as much detail as possible. Inventing ad hoc explanations later when you get your results is not best practice. As Ernst Wit writes in his Ethics of Chance: A statistical "method that makes use of a retrospective study of the data cannot [ever] reach the same significance level as a prior formulation of the hypothesis." This is a general issue for the performance and interpretation of scientific experiments and is particularly relevant for microarray studies with their large data sets and surprise observations.
posted on 2005-01-05 23:36  Songmin Xie