Points de repère dans l'analyse de la stabilité et de l'interaction génotype-milieu en amélioration des plantes
M. Brancourt-Hulmel (a), V. Biarnès-Dumoulin (a) and J.B. Denis (b)
a Laboratoire de génétique et d'amélioration des plantes, Inra, F-80200 Estrées-Mons
b Laboratoire de biométrie, Inra, route de Saint-Cyr, F-78026 Versailles cedex, France
Abstract - Guiding marks on stability and genotype-environment interaction analyses in plant breeding. In plant breeding studies, differences in statistical stability between genotypes, or genotype-environment interactions (GEI), must often be considered because genotype responses differ from one environment to another. This paper reviews the statistical techniques used in the literature up to 1996 and describes the most recent developments. First, stability concepts are reviewed and genotype-environment interaction is defined with the following notation:
$$E[Y_{ge}] = \mu + \alpha_g + \beta_e + (\alpha\beta)_{ge}$$
where $E[Y_{ge}]$ is the expectation of a given observation $Y_{ge}$ for genotype $g$ and environment $e$, $\mu$ is the grand mean, $\alpha_g$ is the genotype main effect, $\beta_e$ is the environment main effect and $(\alpha\beta)_{ge}$ is the interaction between genotype and environment, defined as the complement from the additive model $(\mu + \alpha_g + \beta_e)$. Then, the main statistical methods are presented and classified, from the point of view of interpretation, into five main approaches:
(1) Uniparametric approaches: stability or GEI is described with a single parameter per genotype. The environmental variance can be allowed to differ for each genotype, an idea first introduced by Roemer (1917, cited from Becker and Léon, 1988) and written as follows:
$$E[Y_{ge}] = \mu + \alpha_g, \qquad \mathrm{Var}[Y_{ge}] = \sigma_g^2$$
where $\mu$ and $\alpha_g$ have the same meaning as in the first model and the $\sigma_g^2$ are variance parameters associated with each genotype. The joint regression model, first proposed by Yates and Cochran (1938), which uses the environment main effect as a pseudo-covariate for modelling the interaction term, also belongs to this category:
$$E[Y_{ge}] = \mu + \alpha_g + \beta_e + \rho_g \beta_e$$
where $\rho_g$ is the genotype slope, or genotype regression coefficient, which describes the response of the genotype to the potential of the environment as estimated by its main effect $\beta_e$. The other terms of the model, $E[Y_{ge}]$, $\mu$ and $\alpha_g$, are defined as in the first model. This family of models is attractive for the simplicity of its interpretation.
Most authors have concluded that these models oversimplify and have added further parameters, such as a measure of goodness of fit. This leads to more sophisticated families of models.
(2) Multiparametric fixed approaches: GEI is modelled by means of several parameters associated with each genotype. There are two basic models: biadditive (or AMMI) models and factorial regression models. They can be extended and combined in several ways; see Gauch (1992) and van Eeuwijk et al (1996). The multiplicative model is written:
$$E[Y_{ge}] = \mu + \alpha_g + \beta_e + \lambda_1 \gamma_{g1} \delta_{e1} + \lambda_2 \gamma_{g2} \delta_{e2} + \cdots$$
where $\lambda_1$ is the singular value that accounts for the part of the interaction explained by the first term, $\gamma_{g1}$ is the normalised genotype vector describing differences between genotypes and $\delta_{e1}$ similarly describes the environments; $\lambda_2$, $\gamma_{g2}$ and $\delta_{e2}$ belong to the second term, which is subject to orthogonality constraints with respect to the first, and so on. As previously, the other terms, $(\mu + \alpha_g + \beta_e)$, form the additive part of the model. The factorial regression model can be written:
$$E[Y_{ge}] = \mu + \alpha_g + \beta_e + \sum_{k=1}^{K} \sum_{h=1}^{H} G_{gk}\, \theta_{kh}\, E_{eh} + \sum_{h=1}^{H} \alpha'_{gh} E_{eh} + \sum_{k=1}^{K} G_{gk}\, \beta'_{ek}$$
where $\theta_{kh}$, $\alpha'_{gh}$ and $\beta'_{ek}$ are regression parameters involving $H$ environment covariates $E_{eh}$ and $K$ genotype covariates $G_{gk}$. Again, $(\mu + \alpha_g + \beta_e)$ is the additive part of the model. A common feature of the AMMI model and factorial regression is that both describe the interaction multiplicatively, as a genotype score times an environment score. However, in the AMMI model both scores are unknown parameters (a model that is bilinear in its parameters), whereas in factorial regression only one of them is unknown, the other being an observed covariate, which gives a linear model. From a practical point of view, regression is thought to be easier to interpret, but it requires that relevant covariates be available.
(3) Mixed (random and fixed) parametric approaches: starting from the pioneering work of Shukla (1972), factorial regression models can also be used when environments are considered as a random factor and heteroscedastic genotype variances are introduced; see Denis et al (1997) for a recent development.
(4) Nonparametric approaches: this family includes various methods whose common feature is that they rely on genotype ranks rather than on the estimation or prediction of genotype performance. This is an attractive aim in many breeding programs, where breeders are interested in rank order for choosing the best genotypes. In such cases, relative comparisons are sufficient and there is no need to assess the levels themselves.
(5) Clustering approaches: here the idea is not to obtain a continuous function modelling the interaction but to identify clusters of similar genotypes and/or clusters of similar environments such that most of the interactive variability is captured by the groups of genotypes and/or environments (defining 'between' effects). From a statistical point of view as well as for interpretation, a crucial distinction has to be made according to whether the clusters are determined a priori (from additional information) or a posteriori (from the data to be explained).
In the last section, most of the previous methods are compared, mainly by means of tables summarising results obtained from the literature (tables II, IV, V, VIII and IX and fig 2). Among them, figure 2 depicts 52 interaction studies using either joint regression, the multiplicative approach or factorial regression. These interaction studies are characterised by the proportion of parameters used by the model relative to the complete interaction (the 'cost' or, conversely, the 'parsimony') and the proportion of interaction explained by the model (the 'efficiency'). As illustrated in this figure, the AMMI model and factorial regression are equally efficient and much more efficient than joint regression. Our advice is to use factorial regression when relevant covariates are available, owing to its easier interpretation.
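The 'cost' and 'efficiency' used to compare joint regression with the AMMI model can be illustrated with a minimal Python sketch. It is not the authors' code: the function names, the synthetic data and the least-squares estimators are assumptions made for illustration, and the degrees of freedom of AMMI term k are counted as G + E - 1 - 2k, a common convention that the abstract itself does not state. The table is assumed complete and balanced (one value per genotype and environment).

```python
import numpy as np


def additive_fit(Y):
    """Fit mu + alpha_g + beta_e by least squares on a complete G x E table.

    Returns the estimates and Z, the complement from the additive model.
    """
    Y = np.asarray(Y, dtype=float)
    mu = Y.mean()
    alpha = Y.mean(axis=1) - mu          # genotype main effects
    beta = Y.mean(axis=0) - mu           # environment main effects
    Z = Y - mu - alpha[:, None] - beta[None, :]
    return mu, alpha, beta, Z


def joint_regression(Y):
    """Regress each genotype's interaction residuals on the environment main effect."""
    _, _, beta, Z = additive_fit(Y)
    rho = Z @ beta / (beta @ beta)       # one slope per genotype (sums to zero)
    fitted = np.outer(rho, beta)
    G, E = Z.shape
    efficiency = (fitted ** 2).sum() / (Z ** 2).sum()
    cost = (G - 1) / ((G - 1) * (E - 1))  # share of interaction df used
    return rho, efficiency, cost


def ammi(Y, n_terms=1):
    """Keep the first n_terms multiplicative terms of the SVD of the residuals."""
    _, _, _, Z = additive_fit(Y)
    U, s, Vt = np.linalg.svd(Z, full_matrices=False)
    G, E = Z.shape
    efficiency = (s[:n_terms] ** 2).sum() / (s ** 2).sum()
    # Assumed df convention: G + E - 1 - 2k for the k-th term.
    df = sum(G + E - 1 - 2 * k for k in range(1, n_terms + 1))
    cost = df / ((G - 1) * (E - 1))
    return s[:n_terms], U[:, :n_terms], Vt[:n_terms].T, efficiency, cost


# Synthetic example: 8 hypothetical genotypes in 6 hypothetical environments.
rng = np.random.default_rng(0)
Y = rng.normal(size=(8, 6))
rho, jr_eff, jr_cost = joint_regression(Y)
lam, gamma, delta, ammi_eff, ammi_cost = ammi(Y, n_terms=1)
print(f"joint regression: efficiency={jr_eff:.3f}, cost={jr_cost:.3f}")
print(f"AMMI (1 term):    efficiency={ammi_eff:.3f}, cost={ammi_cost:.3f}")
```

Because the joint regression fit is a rank-one approximation of the interaction residuals and the first singular term is the best rank-one approximation in the least-squares sense, a one-term AMMI fit in this sketch can never be less efficient than joint regression on the same table, although it uses more degrees of freedom.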
Résumé - In plant breeding, researchers often have to carry out analyses of stability or of genotype-environment interaction. Literature reviews exist on the subject and describe approaches that differ from one author to another. This article proposes a classification of the main methods used over a period extending up to 1996, with emphasis on the most recent ones, in particular the methods that use several parameters to describe genotype stability. In order to compare joint regression, multiplicative modelling of the interaction (or AMMI model) and factorial regression on the basis of efficiency (measured by the percentage of the interaction sum of squares accounted for by the model) and parsimony (assessed by the number of degrees of freedom used by the model), various summaries were compiled. For each method, they draw on the literature and report various characteristics such as the species studied, the variable analysed, the number of genotypes and environments, the efficiency, the parsimony and the ratio between the latter two. Overall, this highlights multiplicative modelling of the interaction and factorial regression. The latter also makes it possible to propose a biological explanation for the interaction.
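As a hedged illustration of how efficiency and parsimony might be computed for factorial regression, the Python sketch below handles the simplest case, in which only environment covariates are used and each genotype receives its own regression coefficient per covariate. The function name, the covariate and the data are hypothetical; the table is assumed complete and balanced, and the degrees of freedom are counted as H(G - 1) for H covariates, which is an assumption of this sketch.

```python
import numpy as np


def factorial_regression_env(Y, X):
    """Regress interaction residuals on centred environment covariates.

    Y: G x E table of means; X: E x H table of environment covariates.
    Returns genotype sensitivities (G x H), efficiency and cost.
    """
    Y = np.asarray(Y, dtype=float)
    X = np.asarray(X, dtype=float)
    G, E = Y.shape
    H = X.shape[1]
    mu = Y.mean()
    alpha = Y.mean(axis=1) - mu                  # genotype main effects
    beta = Y.mean(axis=0) - mu                   # environment main effects
    Z = Y - mu - alpha[:, None] - beta[None, :]  # interaction residuals
    Xc = X - X.mean(axis=0)                      # centre each covariate
    # One least-squares problem per genotype, solved jointly.
    coef, *_ = np.linalg.lstsq(Xc, Z.T, rcond=None)  # shape (H, G)
    fitted = (Xc @ coef).T
    efficiency = (fitted ** 2).sum() / (Z ** 2).sum()
    cost = H * (G - 1) / ((G - 1) * (E - 1))     # assumed df: H(G - 1)
    return coef.T, efficiency, cost


# Synthetic example: 6 hypothetical genotypes, 8 environments, and one
# hypothetical covariate (for instance a water-deficit index).
rng = np.random.default_rng(1)
x = rng.normal(size=(8, 1))
Y = rng.normal(size=(6, 8))
sens, eff, cost = factorial_regression_env(Y, x)
print(f"efficiency={eff:.3f}, cost={cost:.3f}, ratio={eff / cost:.2f}")
```

The last printed value is the ratio of efficiency to cost, in the spirit of the ratio between the two criteria mentioned in the summaries. The estimated sensitivities are what give the interaction a possible biological reading, provided the covariate itself is meaningful.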
Key words: stability / genotype-environment interaction / plant breeding
Mots clés : stabilité / interaction génotype-milieu / amélioration des plantes