Correlation and regression analysis of athletes' complex traits based on their personal data and genetic and biochemical parameters

PhD, Leading Researcher of the Institute of Translational Biomedicine O.S. Glotov1,4
Researcher of the Institute of Translational Biomedicine I.V. Poliakova1,4
PhD, Leading Researcher of the Center for Advanced Research D.V. Leshchev1,3,4
Specialist of the RC "Biobank Centre" (the Research Park of St. Petersburg State University), Russia M.M. Danilova1
PhD, Head of the RC "Biobank Centre" (the Research Park of St. Petersburg State University), Russia A.S. Glotov1,4
PhD, Associate Professor of the Department of Preventive Medicine and Health Basics A.V. Kergaard2
PhD, Professor of the Department of Preventive Medicine and Health Basics R.B. Tsallagova2
Dr.Med., Researcher of the Institute of Translational Biomedicine V.S. Pakin1
PhD, Head of the Organization of Sporting Activities S.Sh. Namozova1
PhD, Deputy Head of the Department of Postgraduate Medical Education A.M. Sarana1,4
PhD, Professor, Head of the Department of Postgraduate Medical Education S.G. Sherbak1,4
1 St. Petersburg State University
2 Lesgaft St. Petersburg National State University of Physical Education, Sports and Health
3 Peter the Great St. Petersburg Polytechnic University
4 City Hospital No. 40, St. Petersburg

ABSTRACT. PCR-RFLP analysis of 26 genes was carried out in 102 St. Petersburg State University students aged 18-19 years who practice aerobic sports. Correlation and regression analyses were used to study the relationship between genotype and vital lung capacity (VLC). The regression analysis included genotype, sex, physical activity, smoking, alcohol consumption and several biochemical blood parameters. Three methods of regressor selection (forward selection, backward elimination and stepwise selection) were used to build significant models. The regression analysis revealed statistically significant associations between VLC and the AGTR2, NOS3, CNB1 and ADRB2 genes, whereas the Kendall rank correlation coefficient was significant only for VLC and the NOS3 gene.

Introduction. One of the important goals of sports genetics is to analyze the alleles that control complex traits [3,8,9]. Allele polymorphism testing enables specialists to distinguish individuals who respond positively to additional physical exercise from those who respond negatively, for whom such exercise may be useless or even harmful [11], causing physical and psychological disturbances and contributing to the development of various multifactorial diseases [8,12,13].

Important criteria for estimating the risk of these diseases are the results of repeated measurements (monitoring) of physiological indicators such as vital lung capacity (VLC), pulse rate, arterial blood pressure and body mass index, which are typical complex polygenic human traits [14,15]. Given the substantial individual and temporal variability of these parameters, it is of interest to determine the extent to which they correlate with individual features of the genome and whether the identified correlations can be used to construct models of health [4,16].

The purpose of the study is to search for correlations and build regression models for the early prediction of physical development parameters using different methods of regressor selection, and to improve the methodological framework for further studies that test many genes in large samples.

Methods and organization of the study. VLC was selected as the object of investigation because it is a highly heritable trait that changes under the influence of specific physical activities. Models of VLC are especially important in preventive and sports medicine, in forecasting sports success and in developing the physical abilities of the body, since the ability to predict VLC will enable specialists to take timely measures to correct this important parameter [8].

The study involved 102 students aged 18-19 years who live in the Northwest region of the Russian Federation and practice aerobic sports. Each subject gave written informed consent and completed an individual questionnaire covering place of birth, sex, age, relatives, education, work, physical activity, eating habits, other habits, anthropometric data and medical history. The following biochemical blood parameters were determined: the levels of fibrinogen, homocysteine and malondialdehyde (MDA); the activities of glutathione peroxidase, glutathione reductase, catalase and superoxide dismutase; and the concentration of reduced glutathione. VLC was measured with an SP-3000 spirometer.

Using PCR-RFLP analysis and microarray hybridization, we studied polymorphic allelic variants of genes of the renin-angiotensin system and of blood coagulation, detoxification and metabolism factors: AGT (rs699), AGTR1 (rs5186), AGTR2 (rs11091046), BKR_BDKRB2 (rs1799722), REN (G/A substitution in intron 8), MTHFR (rs1801133), ADRB2_48AG (rs1042713), ADRB2_81CG (rs1042714), MDR1 (rs1045642), F2 (rs1799963), F5 (rs6025), F1_FRB (rs1800790), GP3A_ITGB3 (rs5918), PAI1 (rs1799768), F7 (rs6046), PPARA (rs4253778), PPARG (rs1801282), PPARD (rs2016520), GSTT1 ("zero" genotype), ("zero" genotype), GSTP1 (rs1695, rs1138272), NOS3 (5>4: 4 and 5 repeats of a 27-bp sequence), ACE (rs4340), TNFA_238GA (rs361525), TNFA_308GA (rs1800629), ACTN3 (rs1815739), AMPD1 (rs17602729), CNB1_PPP3R1 (5I/5D). Alleles (SNPs) were identified according to previously described procedures [1,2,5].

Agreement of allele (genotype) frequencies with Hardy-Weinberg equilibrium was checked with the chi-square (χ2) test. Each genotype was coded as a rank equal to the number of minor alleles it carries.
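The genotype coding and the Hardy-Weinberg check can be sketched in Python as follows. This is an illustrative sketch, not the authors' procedure (they used MS Excel and Deductor Academic); it assumes a polymorphic biallelic autosomal marker, and the function names `genotype_rank` and `hwe_chi_square` and all genotype counts are hypothetical. An X-linked marker such as AGTR2 would need separate handling of hemizygous males. For one degree of freedom the chi-square p-value has the closed form erfc(sqrt(x/2)), so only the standard library is needed.

```python
import math


def genotype_rank(genotype, minor_allele):
    """Code a genotype as the number of copies of the minor allele (0, 1 or 2)."""
    return sum(allele == minor_allele for allele in genotype)


def hwe_chi_square(n_hom_major, n_het, n_hom_minor):
    """Pearson chi-square test of Hardy-Weinberg proportions (1 degree of freedom).

    Assumes a polymorphic biallelic autosomal marker (both alleles observed).
    """
    n = n_hom_major + n_het + n_hom_minor
    p = (2 * n_hom_major + n_het) / (2 * n)  # frequency of the major allele
    q = 1.0 - p                              # frequency of the minor allele
    observed = (n_hom_major, n_het, n_hom_minor)
    expected = (n * p * p, 2 * n * p * q, n * q * q)
    chi2 = sum((o - e) ** 2 / e for o, e in zip(observed, expected))
    # For 1 df, P(X > x) = erfc(sqrt(x / 2)), since X is a squared standard normal.
    p_value = math.erfc(math.sqrt(chi2 / 2))
    return chi2, p_value


# Hypothetical genotypes of a C/G marker with minor allele G.
genotypes = [("C", "C"), ("C", "G"), ("G", "G"), ("C", "G")]
ranks = [genotype_rank(g, "G") for g in genotypes]   # [0, 1, 2, 1]

# Hypothetical genotype counts for n = 102 subjects.
chi2_value, p_value = hwe_chi_square(30, 50, 22)
```

A p-value above 0.05 corresponds to the "in line with the Hardy-Weinberg law" outcome reported in the results.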

The strength of the relationship between genotype and the studied parameters was assessed with the Kendall rank correlation coefficient. Because the genotype ranks contain many ties, the tie-adjusted Kendall rank correlation coefficient was used [6].
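A tie-adjusted Kendall coefficient can be sketched as Kendall's tau-b, which is an assumption about what "adjusted" means here. The pairwise O(n^2) loop below is adequate for about 100 subjects; the function name and the data are hypothetical. SciPy's `scipy.stats.kendalltau` (default variant "b") can be used to cross-check it.

```python
import math
from itertools import combinations


def kendall_tau_b(x, y):
    """Kendall's tau-b: (concordant - discordant) with a tie-corrected denominator."""
    if len(x) != len(y):
        raise ValueError("x and y must have the same length")
    concordant = discordant = tied_x_only = tied_y_only = 0
    for (x1, y1), (x2, y2) in combinations(zip(x, y), 2):
        dx, dy = x1 - x2, y1 - y2
        if dx == 0 and dy == 0:
            continue                      # tied in both variables: ignored
        if dx == 0:
            tied_x_only += 1              # e.g. two subjects with the same genotype rank
        elif dy == 0:
            tied_y_only += 1
        elif (dx > 0) == (dy > 0):
            concordant += 1
        else:
            discordant += 1
    # sqrt((n0 - n1)(n0 - n2)): pairs not tied in x times pairs not tied in y
    denom = math.sqrt((concordant + discordant + tied_y_only)
                      * (concordant + discordant + tied_x_only))
    return (concordant - discordant) / denom


genotype_ranks = [0, 0, 1, 1, 2, 2]            # hypothetical genotype ranks
vlc_litres = [3.1, 2.9, 3.4, 3.3, 3.9, 3.6]    # hypothetical VLC values, l
tau = kendall_tau_b(genotype_ranks, vlc_litres)
```

The many ties among genotype ranks (only three possible values) are exactly what makes the denominator correction matter.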

Multiple linear regression was applied to detect weak associations and to build effective models for predicting quantitative phenotypic traits from data on a large number of genes. This method is well suited to continuous variables [4].

In addition to the genotyping results, other parameters were included in the analysis: sex, physical activity (rank), smoking, alcohol consumption (rank) and biochemical blood parameters.
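A minimal sketch of fitting such a model with NumPy and computing the quantities reported in Table 1 (b, e, β, t, R², adjusted R², residual standard deviation and the overall F) is shown below. The function `ols_table` and the synthetic data are hypothetical and are not the study data; p-values would additionally need the t distribution (for example `scipy.stats.t.sf`).

```python
import numpy as np


def ols_table(X, y):
    """OLS with an intercept; returns the statistics listed in Table 1 (except p)."""
    X = np.asarray(X, dtype=float)
    y = np.asarray(y, dtype=float)
    n, k = X.shape
    A = np.column_stack([np.ones(n), X])             # first column: free term
    b, *_ = np.linalg.lstsq(A, y, rcond=None)        # regression coefficients b
    resid = y - A @ b
    df_resid = n - k - 1
    rss = float(resid @ resid)
    sigma2 = rss / df_resid                           # residual variance
    e = np.sqrt(np.diag(sigma2 * np.linalg.inv(A.T @ A)))  # standard errors
    t = b / e                                         # t-values
    # Standardized coefficients: beta_j = b_j * sd(x_j) / sd(y); none for the free term.
    beta = b[1:] * X.std(axis=0, ddof=1) / y.std(ddof=1)
    r2 = 1.0 - rss / float(((y - y.mean()) ** 2).sum())
    adj_r2 = 1.0 - (1.0 - r2) * (n - 1) / df_resid
    f_stat = (r2 / k) / ((1.0 - r2) / df_resid)
    return {"b": b, "e": e, "beta": beta, "t": t, "r2": r2,
            "adj_r2": adj_r2, "sd": float(np.sqrt(sigma2)), "F": f_stat}


# Synthetic (not study) data: sex coded 0/1 and a catalase-like covariate.
rng = np.random.default_rng(0)
n = 102
sex = rng.integers(0, 2, size=n)
catalase = rng.normal(100.0, 20.0, size=n)
vlc = 3.0 + 0.9 * sex - 0.005 * catalase + rng.normal(0.0, 0.05, size=n)
result = ols_table(np.column_stack([sex, catalase]), vlc)
```

Genotypes enter such a model as the minor-allele ranks described above, so each gene contributes one additive coefficient.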

Including all parameters in the regression model yielded very few significant coefficients. We therefore compared three methods of regressor selection [7]. Primary data processing and statistical analysis were performed in MS Excel and Deductor Academic [10a].

Results of the study and discussion. VLC values ranged from 2.0 to 5.9 l, with a mean of 3.3 l. The genotype distribution of 23 genes in the study group was consistent with Hardy-Weinberg equilibrium (p > 0.05), whereas that of 3 genes (ADRB2, PPARA, ACTN3) deviated significantly from it (p < 0.05), possibly owing to population characteristics of the studied sample.

Kendall rank correlation analysis for VLC revealed only one significant correlation, between VLC and NOS3 genotype (Kendall rank correlation coefficient τ = 0.33; critical value τcr = 0.13).
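The critical value τcr = 0.13 can be reproduced with the usual large-sample normal approximation for Kendall's τ, Var(τ) = 2(2n + 5)/(9n(n - 1)), at α = 0.05 (two-sided) and n = 102. That the authors used this approximation is an inference from the matching number, not something the text states; note that this variance formula ignores the tie correction. The function name is hypothetical.

```python
import math
from statistics import NormalDist


def kendall_tau_critical(n, alpha=0.05):
    """Two-sided critical |tau| from the normal approximation (no tie correction)."""
    z = NormalDist().inv_cdf(1 - alpha / 2)          # about 1.96 for alpha = 0.05
    return z * math.sqrt(2 * (2 * n + 5) / (9 * n * (n - 1)))


tau_cr = kendall_tau_critical(102)   # about 0.132, i.e. the reported 0.13
nos3_significant = 0.33 > tau_cr     # the NOS3 correlation exceeds the threshold
```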

Regression analysis proved much more sensitive. The regression models of the athletes' traits were built using their personal data, biochemical parameters and the genotyping results for 26 genes.

Regressors were selected using the methods available in Deductor Academic, with different probability thresholds for the F-test [10b]. Stepwise selection produced a model with only 2 regressors: sex and catalase activity.

With forward selection and a 5% probability threshold for inclusion of a variable in the F-test, the model contained 4 regressors, including one gene, NOS3. Raising the threshold to 10% gave a model with 8 regressors, including 4 genes; in this model 1 gene and 1 additional parameter (alcohol consumption) had coefficients with p > 0.05. A threshold of 15% yielded a model with 10 regressors, including 5 genes, in which 1 gene and 2 additional parameters had coefficients with p > 0.05. At thresholds of 20% or more, the model did not change.

The models changed in a similar way with backward elimination. At a 15% probability threshold of the F-test for inclusion/exclusion of a variable, forward selection and backward elimination produced the same model. These models are shown in Table 1.
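The F-test-based selection procedures can be sketched as follows. Deductor Academic's exact implementation is not described in the text; this sketch assumes that the "probability of F inclusion/exclusion" is a p-value threshold of a partial F-test that adds or removes one regressor at a time. It uses NumPy and `scipy.stats.f.sf`; the function names and the synthetic data are hypothetical.

```python
import numpy as np
from scipy import stats


def rss(X, y, cols):
    """Residual sum of squares of an OLS fit with intercept on the given columns."""
    A = np.column_stack([np.ones(len(y))] + [X[:, c] for c in cols])
    coef, *_ = np.linalg.lstsq(A, y, rcond=None)
    r = y - A @ coef
    return float(r @ r)


def partial_f_pvalue(X, y, small, big):
    """p-value of the partial F-test for nested models differing by one column."""
    rss_small, rss_big = rss(X, y, small), rss(X, y, big)
    df2 = len(y) - len(big) - 1
    f_value = (rss_small - rss_big) / (rss_big / df2)
    return float(stats.f.sf(f_value, 1, df2))


def forward_selection(X, y, p_enter=0.15):
    """Repeatedly add the candidate with the smallest p-value while p < p_enter."""
    selected, remaining = [], list(range(X.shape[1]))
    while remaining:
        pvals = {c: partial_f_pvalue(X, y, selected, selected + [c]) for c in remaining}
        best = min(pvals, key=pvals.get)
        if pvals[best] >= p_enter:
            break
        selected.append(best)
        remaining.remove(best)
    return selected


def backward_elimination(X, y, p_remove=0.15):
    """Start from all columns; drop the one with the largest p-value while p > p_remove."""
    selected = list(range(X.shape[1]))
    while selected:
        pvals = {c: partial_f_pvalue(X, y, [s for s in selected if s != c], selected)
                 for c in selected}
        worst = max(pvals, key=pvals.get)
        if pvals[worst] <= p_remove:
            break
        selected.remove(worst)
    return selected


# Synthetic data: 6 candidate regressors, only columns 0 and 2 affect y.
rng = np.random.default_rng(1)
X = rng.normal(size=(102, 6))
y = 2.0 * X[:, 0] - 3.0 * X[:, 2] + rng.normal(size=102)
fwd = forward_selection(X, y, p_enter=0.15)
bwd = backward_elimination(X, y, p_remove=0.15)
```

Raising `p_enter` generally admits more regressors, which mirrors the growth from 4 to 10 regressors described above. Stepwise selection combines the two procedures: after each forward step, variables already in the model are re-tested and dropped if their p-value exceeds the exclusion threshold.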

Table 1. Regression models for VLC prediction based on 3 methods of regressor selection (forward selection, backward elimination and stepwise selection)

 

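| | Full inclusion | | | | | Forward selection and backward elimination | | | | | Stepwise selection | | | | |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| Parameter | b | e | β | t | p | b | e | β | t | p | b | e | β | t | p |
| Constant | 4.23 | 1.24 | | 3.4 | 0.0011 | 5.13 | 0.40 | | 12.7 | 0 | 3.63 | 0.12 | | 29.40 | 0 |
| MDA concentration | -0.33 | 0.13 | -0.26 | -2.45 | 0.017 | -0.34 | 0.10 | -0.27 | -3.42 | 9E-04 | | | | | |
| Catalase activity | -0.007 | 0.0022 | -0.45 | -3.11 | 0.003 | -0.005 | 0.0012 | -0.35 | -4.34 | 4E-05 | -0.006 | 0.0012 | -0.37 | -4.79 | 6E-06 |
| Glutathione peroxidase activity | -0.004 | 0.0039 | -0.11 | -0.95 | 0.35 | -0.004 | 0.0027 | -0.13 | -1.61 | 0.11 | | | | | |
| Alcohol consumption | -0.071 | 0.057 | -0.14 | -1.25 | 0.22 | -0.74 | 0.04 | -0.16 | -1.88 | 0.063 | | | | | |
| Sex | 0.88 | 0.12 | 0.52 | 7.3 | 1E-10 | 0.90 | 0.12 | 0.54 | 7.47 | 5E-11 | 0.81 | 0.13 | 0.48 | 6.25 | 1E-08 |
| AGTR2 | -0.18 | 0.078 | -0.21 | -2.28 | 0.026 | -0.12 | 0.059 | -0.15 | -2.12 | 0.037 | | | | | |
| NOS3 | 0.095 | 0.096 | 0.095 | 0.98 | 0.33 | 0.17 | 0.10 | 0.17 | 1.67 | 0.027 | | | | | |
| CNB1 | -0.54 | 0.18 | -0.32 | -3.1 | 0.003 | -0.30 | 0.07 | -0.18 | 2.25 | 0.011 | | | | | |
| ADRB2-2 | -0.35 | 0.12 | -0.32 | -2.98 | 0.004 | -0.20 | 0.078 | -0.18 | -2.50 | 0.014 | | | | | |
| GP3A | 0.117 | 0.14 | 0.088 | 0.85 | 0.39 | 0.17 | 0.10 | 0.13 | 1.67 | 0.098 | | | | | |
| R² | 0.65 | | | | | 0.56 | | | | | 0.42 | | | | |
| Adjusted R² | 0.42 | | | | | 0.51 | | | | | 0.41 | | | | |
| Standard deviation | 0.48 | | | | | 0.43 | | | | | 0.48 | | | | |
| F criterion | 2.77 | | | | | 11.7 | | | | | 35.5 | | | | |
| Significance | 1.6E-04 | | | | | 9.5E-13 | | | | | 2.5E-12 | | | | |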

Note: b is the regression coefficient of the parameter or the free term; e is the standard error of the corresponding coefficient; β is the coefficient of the parameter in the standardized regression equation [11,15]; t is the t-value; p is the p-value. Forward selection and backward elimination results are given for a 15% probability threshold of the F-test for inclusion/exclusion of a variable; for stepwise selection, the thresholds were 15% for inclusion and 15% for exclusion.
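As a consistency check, the adjusted R² and F values in Table 1 can be recomputed from R² with the standard formulas, assuming all n = 102 students entered the models and counting 10 regressors (forward selection/backward elimination) and 2 regressors (stepwise selection). The full-inclusion model is omitted because its number of regressors is not stated. The function names are hypothetical. The recomputed values (about 0.51 and 11.6; 0.41 and 35.8) agree with the table up to what is presumably rounding of R².

```python
def adjusted_r2(r2, n, k):
    """Adjusted R² for a model with k regressors fitted to n observations."""
    return 1 - (1 - r2) * (n - 1) / (n - k - 1)


def overall_f(r2, n, k):
    """Overall F statistic of a regression with an intercept, computed from R²."""
    return (r2 / k) / ((1 - r2) / (n - k - 1))


N = 102  # assumed number of observations in every model
reported = {"forward/backward": (0.56, 10), "stepwise": (0.42, 2)}  # (R², k) from Table 1
checks = {name: (round(adjusted_r2(r2, N, k), 2), round(overall_f(r2, N, k), 1))
          for name, (r2, k) in reported.items()}
```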

The regression model revealed statistically significant relationships between VLC and alleles of the following genes: AGTR2 (negative; minor allele A), NOS3 (positive; minor allele 4), CNB1 (negative; minor allele D) and ADRB2_81CG (negative; minor allele G). The Kendall rank correlation coefficient was significant only for the NOS3 gene. These results indicate that regression analysis is more sensitive than correlation analysis.

Conclusion. The regression model allows an initial assessment of an individual's phenotypic indicators (here, VLC) based on genetic testing results and additional parameters. Because VLC is a highly heritable trait and an indicator of the risk of multifactorial diseases, the developed models can be applied to assess the risk of these diseases, to predict sporting success and to guide the development of the body's physical abilities.

Acknowledgements. This work was supported by Russian Science Foundation (RSF) grant No. 14-50-00069.

References

  1. Akhmetov, I.I., Netreba, A.I., Glotov, A.S. et al. Vyyavlenie geneticheskikh faktorov, determiniruyushchikh individual'nye razlichiya v priroste myshechnoy sily i massy v otvet na silovye uprazhneniya (Identification of genetic factors that determine individual differences in the growth of muscle mass and strength in response to the power exercises) / Molecular biological technology to increase efficiency in the conditions of intense physical activity. Coll. articles. 2007, I. 3. P.13-21.
  2. Glotov, A.S., Vashukova, E.S., Polushkina, L.B. et al. Diagnostika nasledstvenno obuslovlennykh zabolevaniy u detey s pomoshch'u DNK-mikrochipovoy tekhnologii (Diagnostics of hereditary diseases in children using DNA microarray technology) // Voprosy diagnostiki v pediatrii (Questions of diagnostics in pediatrics). 2009, V. 1, № 1, P. 14-17.
  3. Glotov, O.S., Glotov, A.S., Pushkin, V.S., Baranov, V.S. Monitoring zdorov'ya cheloveka - vozmozhnosti sovremennoy genetiki (Monitoring of human health: the possibilities of modern genetics) // Vestnik Sankt-Peterburgskogo universiteta (Bulletin of St. Petersburg State University). 2013, V. 3, № 2. P. 95-107.
  4. Puzyrev, V.P. Fenomo-genomnye otnosheniya i patogenetika mnogofaktornykh zabolevaniy (The phenomo-genomic relationships and pathogenetics of multifactorial diseases) // Vestnik Rossiyskoy akademii meditsinskikh nauk (Bulletin of the Russian Academy of Medical Sciences). 2011, № 9. P. 17-27.
  5. Rebrova, O.Yu. Statisticheskiy analiz meditsinskikh dannykh. Primenenie paketa programm Statistica (Statistical analysis of medical data. Application of Statistica software package). Moscow. MediaSphere. 2013. 312 p.
  6. Tarkovskaya, I.V., Glotov, O.S.,  Ditkin, E.Y. et al. Analiz assotsiatsii polimorfizma genov metabolizma lipidov s indeksom massy tela, obkhvatom talii i parametrami lipidogrammy krovi u zhenshchin. (Analysis of the association of polymorphisms of lipid metabolism genes with body mass index, waist circumference and blood lipid profile parameters in women) // Ekologicheskaya genetika (Ecological Genetics). 2012, V. 10, № 4. P.66-77.
  7. Shmoylova, R.A., Minashkin, V.G., Sadovnikov, N.A., Shuvalov, E.B. Obshchaya teoriya statistiki (General Theory of Statistics) / Ed. R.A. Shmoylova, 4th edition, revised. Moscow: Finance and Statistics. 2004. 656 p.
  8. Allen H.L., Weedon M.N., Wood A.R., et al. Hundreds of variants clustered in genomic loci and biological pathways affect human height // Nature. 2010. V. 467. № 7317. P. 832-838.
  9. Branicki W., Liu F., van Duijn K., et al. Model-based prediction of human hair color using DNA variants. Human Genetics, 2011, V. 129, P. 443-454.
  10. Deductor Academic, date of access 25.08.2014 a) http://www.basegroup.ru/deductor/, b) http://www.basegroup.ru/library/analysis/regression/feature_selection (electronic resource).
  11. Pryce J.E., Hayes B.J., Bolormaa S., Goddard M.E. Polymorphic Regions Affecting Human Height Also Control Stature in Cattle. Genetics, 2011, V. 187, № 3, P. 981-984.
  12. Leung F.P., Yung L.M., Laher I. et al. Exercise, vascular wall and cardiovascular diseases: an update (Part 1). Sports Med., 2008, V. 38, № 12, P.1009-1024. 
  13. Montgomery H.E., Marshall R., Hemingway H. et al. Human gene for physical performance. Nature, 1998, V. 393, P.221-222.
  14. Puthucheary Z., Skipworth J.R., Montgomery H.E. The ACE gene and human performance: 12 years on. Sports Med., 2011, V. 41, № 6, P.433-448.
  15. Wood D., De Backer G., Faergeman O. Prevention of Coronary Heart Disease in Clinical Practice. Recommendations of the Second Joint Task Force of the European and other Societies on Coronary Prevention. Eur Heart J., 1998, V. 19, P. 1434-1503.
  16. Xu S., Hu Z. Generalized Linear Model for Interval Mapping of Quantitative Trait Loci. Theor. Appl. Genet. 2010, V. 121, № 1, P.47–63.

Corresponding author: olglotov@mail.ru