Lack of Predictive Value of Ovarian Reserve Tests for Pregnancy Likelihood. The Huge Difference Between Quantity and Quality

Objective: To evaluate the predictive value of different variables on the number of oocytes retrieved and the likelihood of pregnancy after IVF. Design: Prospective cohort study. Setting: University-associated private assisted reproduction (AR) center. Patients: 718 patients undergoing their first IVF treatment during 2016 and 2017. Interventions: None. Main outcome measures: The study had two objectives. First, to identify factors capable of predicting the quantitative ovarian response by applying multiple linear regression analysis. Evaluated variables included age, BMI, antral follicle count (AFC), anti-Müllerian hormone (AMH), basal FSH, LH and Estradiol, the total dose of rFSH administered during ovarian stimulation, and Estradiol and Progesterone levels on the day of ovulation triggering. Second, to assess, by multivariable logistic regression analysis, whether the same parameters, plus the number of oocytes retrieved, were useful for estimating the chance of pregnancy. Results: Whereas AFC was the most accurate factor predicting the number of oocytes retrieved, with negligible added value from the remaining variables, age was the most important factor influencing successful pregnancy. AMH, AFC and the number of oocytes retrieved after controlled ovarian stimulation (COS) did not predict the probability of pregnancy. Conclusion: Reproductive potential is more closely related to factors associated with oocyte and embryo quality than to the number of oocytes retrieved after COS. Our data show that age is the most important variable defining the probability of ongoing pregnancy and that neither AFC nor AMH can be used as criteria for ART exclusion, but rather as tools for counselling and for proposing different strategies such as oocyte accumulation.


Introduction
Delayed childbearing is common among couples attending fertility clinics. As a consequence, lower pregnancy rates can be expected because of two age-related factors: a declining follicular pool (quantity) and declining oocyte quality [1]. Thus, the success of in vitro fertilization (IVF) depends to a large extent on both the number and the quality of oocytes obtained at oocyte retrieval after controlled ovarian stimulation (COS) [2]. Whereas the number of oocytes may be predicted by ovarian reserve tests (ORT), the chance of a pregnancy resulting in a healthy newborn depends on factors not directly related to the number of oocytes.
Ovarian functional reserve has been identified as a key factor in the success of assisted reproductive technologies (ARTs). As a result, numerous markers of ovarian reserve, including age, menstrual cycle length, basal FSH, antral follicle count (AFC) and anti-Müllerian hormone (AMH) levels [3-7], have become part of the routine diagnostic work-up performed before IVF.
Nevertheless, no ORT is sufficient on its own, in terms of either sensitivity or accuracy [8-10]. Their purpose is to identify women at high risk of either a poor response or hyperstimulation [4,11]. However, in many ART centers these diagnostic tests have been used as "predictors of successful full-term pregnancies" or even as "fertility tests".

Patients
This cohort study was carried out by analyzing data from consecutive patients who underwent their first IVF treatment at a single university-associated private assisted reproduction center. Data were collected prospectively and recorded in a registered database between 2016 and 2017. The study had two objectives. First, we sought to identify factors capable of predicting the quantitative ovarian response in patients undergoing their first treatment cycle with a GnRH antagonist protocol. Second, we assessed whether the same parameters were useful for estimating the chance of a successful pregnancy. Thus, the first part of the study focused solely on predicting the number of metaphase II (MII) oocytes obtained after COS, and the second part sought to identify which factor(s) predicted a successful ongoing pregnancy. Ongoing pregnancy after a completed cycle was defined as transvaginal ultrasound visualization of an intrauterine gestational sac, fetal pole and cardiac activity at 12 weeks of gestation. All other cycle outcomes were classified as not pregnant.
All patients had both ovaries adequately visible on ultrasound, with an AFC between 1 and 32. A total of 718 women with an indication for COS and IVF and/or ICSI undergoing their first treatment were included in the analysis. Age, height and body weight were recorded for every patient, and body mass index (BMI, kg/m²) was calculated before COS. In addition, AFC and basal FSH, LH and Estradiol (E2) levels were assessed on cycle days 3 to 5, and AMH was measured. During COS, the total dose of recombinant FSH (rFSH) administered and the Estradiol and Progesterone levels on the day of ovulation triggering were recorded.
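For illustration only, the BMI calculation mentioned above (weight in kilograms divided by the square of height in metres) can be written as a short Python function; the function name and the example values are hypothetical and are not study data.

```python
def bmi(weight_kg: float, height_m: float) -> float:
    """Body mass index in kg/m2: weight (kg) divided by height (m) squared."""
    if weight_kg <= 0 or height_m <= 0:
        raise ValueError("weight and height must be positive")
    return weight_kg / height_m ** 2


# Hypothetical patient: 65 kg, 1.65 m
print(round(bmi(65, 1.65), 1))  # 23.9
```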
For the primary objective, the number of MII oocytes retrieved was the dependent variable. In the second part of the study, the dependent variable was ongoing pregnancy.
All women gave written informed consent before treatment, both for the procedures and for the digital recording and use of their clinical data. The study was conducted according to the principles of the Declaration of Helsinki.

Ovarian stimulation
Ovarian stimulation was individualized according to each patient's condition using a flexible GnRH antagonist protocol. Corifollitropin alfa and rFSH were used for ovarian stimulation, together with a GnRH antagonist when necessary.
Ovulation was triggered (with either rhCG or a GnRH agonist) when two or more follicles reached 17 mm in diameter, with the leading follicle ≥18 mm. Oocyte retrieval was performed 34-36 h later. Insemination was by either conventional IVF or ICSI.
Embryos were transferred 5 or 6 days after oocyte retrieval. Luteal phase support consisted of 600 mg/day micronized progesterone vaginal suppositories (Utrogestan, Brussels, Belgium).
Serum AMH levels were determined in duplicate at a central laboratory using a second-generation enzyme-linked immunosorbent assay (Access AMH, Beckman Coulter®). Intra- and inter-assay coefficients of variation were <1,5% and <2,8%, respectively; the lower limit of detection was 0.02 ng/mL and linearity extended up to 21 ng/mL. The maximum interval between serum sampling and the start of COS was 3 months. All measurements were performed in the same laboratory.
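The coefficient of variation reported above is the standard deviation of replicate measurements expressed as a percentage of their mean. A minimal Python sketch of that definition follows; the exact procedure used by the central laboratory (number of control samples, pooling of duplicates) is not described here, so the function and the duplicate readings are hypothetical.

```python
import statistics


def coefficient_of_variation(replicates):
    """CV (%) = 100 * sample standard deviation / mean of replicate measurements."""
    if len(replicates) < 2:
        raise ValueError("at least two replicate measurements are needed")
    return 100 * statistics.stdev(replicates) / statistics.mean(replicates)


# Hypothetical duplicate AMH readings (ng/mL) of the same control sample
cv = coefficient_of_variation([2.00, 2.04])
print(f"CV = {cv:.2f}%")  # CV = 1.40%
```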
All subjects underwent transvaginal ultrasound assessment of antral follicle count (AFC) between the second and fifth day of the menstrual cycle. Scans were performed with a 3.7-9.3 MHz multifrequency transvaginal probe by two operators (G.B. and J.A.) who were blinded to the hormone assay results. Our internal data show an excellent correlation between repeated measurements (r2 = 0.98).

Statistics
To meet the independence assumption of the assessed variables, only the first completed treatment cycle of each patient was analyzed. We excluded cycles that involved oocyte or sperm donation or in vitro maturation. Thus, 718 patients with one completed treatment cycle each were included in this study.
A multiple linear regression was performed to identify the variables influencing the number of retrieved MII oocytes. This type of analysis evaluates the influence of several quantitative variables on a quantitative, continuous outcome. Before the multiple regression, simple linear regression analyses were performed to establish a hierarchy of the variables influencing (or not) the number of oocytes obtained at retrieval. A multiple stepwise linear regression was then applied to identify factors that could independently predict the number of MII oocytes retrieved. The evaluated variables included age, BMI, AFC, AMH, basal FSH, LH and Estradiol on days 3 to 5 of a prior cycle, the total dose of rFSH administered during ovarian stimulation and, finally, Estradiol and Progesterone levels on the day of ovulation triggering.
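The preliminary univariable step described above can be illustrated with a small, self-contained Python sketch: each candidate predictor is fitted on its own by ordinary least squares, and predictors are ranked by adjusted R2, 1 - (1 - R2)(n - 1)/(n - p - 1) with p = 1. This is not the SPSS procedure used in the study, and the variable names and data in the example are hypothetical.

```python
def simple_linear_regression(x, y):
    """Ordinary least-squares fit of y = a + b*x.

    Returns the intercept a, slope b, R2 and adjusted R2 (one predictor).
    """
    n = len(x)
    if n != len(y) or n < 3:
        raise ValueError("need at least 3 paired observations")
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    b = sxy / sxx
    a = mean_y - b * mean_x
    ss_res = sum((yi - (a + b * xi)) ** 2 for xi, yi in zip(x, y))
    ss_tot = sum((yi - mean_y) ** 2 for yi in y)
    r2 = 1 - ss_res / ss_tot
    adjusted_r2 = 1 - (1 - r2) * (n - 1) / (n - 2)  # p = 1 predictor
    return a, b, r2, adjusted_r2


def rank_predictors(predictors, y):
    """Rank candidate predictors (name -> values) by univariable adjusted R2."""
    scores = {name: simple_linear_regression(x, y)[3] for name, x in predictors.items()}
    return sorted(scores.items(), key=lambda item: item[1], reverse=True)


# Hypothetical data for five patients
mii_oocytes = [3, 6, 9, 13, 16]
candidates = {
    "AFC": [4, 8, 12, 16, 20],
    "BMI": [22, 25, 21, 24, 23],
}
for name, adj_r2 in rank_predictors(candidates, mii_oocytes):
    print(f"{name}: adjusted R2 = {adj_r2:.3f}")
```

A predictor can rank well in this univariable step yet add little once a stronger predictor is already in the model, which is why this step is followed by the multiple stepwise regression.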
Before evaluating predictors of pregnancy, baseline characteristics were compared between the non-pregnancy and clinical pregnancy groups using one-way analysis of variance for normally distributed continuous variables.
A multivariable logistic (binomial) regression analysis was performed to ascertain the predictive value of the same variables for pregnancy.
In addition to the factors evaluated in the first part of the study, the number of retrieved oocytes was included as a potentially significant independent variable. In this type of analysis, the dependent variable is binary (pregnant or not pregnant).
The significance levels for candidate predictive factors to enter and to remain in the model were set at 0.15 and 0.10, respectively. Multicollinearity among the variables was assessed using the variance inflation factor (VIF), and the Durbin-Watson test was performed beforehand to check the independence of errors.
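Both diagnostics named above have simple closed forms. In the Python sketch below, the Durbin-Watson statistic is the sum of squared differences between successive residuals divided by the residual sum of squares (values near 2 indicate no first-order autocorrelation), and the variance inflation factor of predictor j is 1 / (1 - Rj2), where Rj2 comes from regressing predictor j on the other predictors. The function names and inputs are illustrative; residuals would come from the fitted linear model, and their order (for example, patient order) affects the Durbin-Watson value.

```python
def durbin_watson(residuals):
    """DW = sum_t (e_t - e_{t-1})^2 / sum_t e_t^2 for residuals in observation order."""
    numerator = sum((residuals[t] - residuals[t - 1]) ** 2 for t in range(1, len(residuals)))
    denominator = sum(e ** 2 for e in residuals)
    return numerator / denominator


def variance_inflation_factor(r2_j):
    """VIF for predictor j, given R2 from regressing predictor j on all other predictors."""
    if not 0.0 <= r2_j < 1.0:
        raise ValueError("R2_j must lie in [0, 1)")
    return 1.0 / (1.0 - r2_j)


print(durbin_watson([1.0, -1.0, 1.0, -1.0]))  # 3.0 (strongly alternating residuals)
print(variance_inflation_factor(0.5))         # 2.0
```

Inverting the formula gives Rj2 = 1 - 1/VIF, so the VIF range reported in the Results (1,269-1,747) corresponds to each predictor sharing roughly 21-43% of its variance with the others, well below the conventional VIF threshold of 10 (Rj2 = 0,9).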
After candidate predictive factors were selected stepwise, the final model retained only those prognostic factors with statistical significance. The goodness-of-fit of the logistic regression models was quantified with Nagelkerke's R2, a widely used pseudo-R2 that approximates the proportion of variation in the outcome explained by one or more predictors. The P-values in the multivariable analysis were based on Hosmer and Lemeshow tests. A P-value <0.05 was considered significant.
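Nagelkerke's R2 rescales the Cox and Snell pseudo-R2 so that its maximum is 1. With LL0 the log-likelihood of the intercept-only model, LLm that of the fitted model and n the number of cycles, R2CS = 1 - exp(2(LL0 - LLm)/n) and R2N = R2CS / (1 - exp(2LL0/n)). A minimal Python sketch of these formulas follows; the fitted-model log-likelihoods are not reported in this study, so the example only computes the intercept-only value from the observed pregnancy count.

```python
import math


def nagelkerke_r2(ll_null, ll_model, n):
    """Nagelkerke pseudo-R2 from log-likelihoods of the null and fitted logistic models."""
    cox_snell = 1.0 - math.exp(2.0 * (ll_null - ll_model) / n)
    max_cox_snell = 1.0 - math.exp(2.0 * ll_null / n)  # upper bound of Cox-Snell R2
    return cox_snell / max_cox_snell


def null_log_likelihood(events, n):
    """Log-likelihood of an intercept-only model: every case gets probability events / n."""
    p = events / n
    return events * math.log(p) + (n - events) * math.log(1.0 - p)


# Intercept-only log-likelihood for 299 ongoing pregnancies among 718 cycles
ll0 = null_log_likelihood(299, 718)
print(ll0)
```

A model no better than the intercept (LLm = LL0) gives R2N = 0, and a perfect model (LLm = 0) gives R2N = 1.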
Receiver operating characteristic (ROC) curves were constructed, and the area under the curve (AUC) was used to assess the discriminative power of the logistic regression models. The AUC combines sensitivity and specificity into a single measure of the inherent validity of a diagnostic test.
The standard AUC interpretation was as follows: 0,9-1 indicates an excellent test; 0,8-0,9 a good test with high accuracy; 0,7-0,8 moderate accuracy; 0,6-0,7 a poor test; and values below 0,6 a worthless test.
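The AUC can be computed without plotting the curve: it equals the probability that a randomly chosen pregnant patient has a higher predictor score than a randomly chosen non-pregnant patient, with ties counted as one half (the Mann-Whitney formulation). The Python sketch below implements that definition together with the verbal scale above; assigning boundary values (for example, exactly 0,8) to the higher category is our assumption, since the scale's intervals share their endpoints. For predictors where lower values favour pregnancy, such as age, scores are negated first. The ages are hypothetical.

```python
def auc(scores_pos, scores_neg):
    """AUC = P(score of a positive case > score of a negative case), ties counted as 1/2."""
    wins = 0.0
    for p in scores_pos:
        for q in scores_neg:
            if p > q:
                wins += 1.0
            elif p == q:
                wins += 0.5
    return wins / (len(scores_pos) * len(scores_neg))


def auc_category(value):
    """Verbal interpretation of an AUC on the scale used in this study."""
    if value >= 0.9:
        return "excellent"
    if value >= 0.8:
        return "good"
    if value >= 0.7:
        return "moderate"
    if value >= 0.6:
        return "poor"
    return "worthless"


# Hypothetical ages; age is negated so that higher scores mean better prognosis
pregnant_ages = [29, 31, 33, 35]
not_pregnant_ages = [32, 36, 38, 41]
value = auc([-a for a in pregnant_ages], [-a for a in not_pregnant_ages])
print(value, auc_category(value))  # 0.875 good
```

Applied to the values reported in the Results, this scale classifies age (AUC 0,817) as a good test and AMH (0,622), AFC (0,620) and MII oocytes (0,619) as poor tests.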
Different methods have been proposed to determine the optimal cut-off value on a ROC curve. The Youden index (J = sensitivity + specificity - 1) identifies the optimal cut-off as the point that maximizes the difference between sensitivity and 1 - specificity [12]. Analyses were performed with the Statistical Package for the Social Sciences (SPSS Inc., Chicago, Illinois, USA; version 21.0.0).
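The Youden procedure can be sketched in Python as a search over candidate cut-offs for the one that maximizes J = sensitivity + specificity - 1. In this illustrative version, the function name is hypothetical, every observed score is tried as a cut-off, and a case is called positive when its score is at or above the cut-off; to find an age threshold, ages are negated so that "score >= cut-off" means "age <= threshold". The ages are hypothetical.

```python
def youden_cutoff(scores_pos, scores_neg):
    """Return (cutoff, sensitivity, specificity, J) maximising J = Se + Sp - 1.

    A case is classified as positive when its score is >= cutoff.
    """
    best = None
    for cutoff in sorted(set(scores_pos) | set(scores_neg)):
        sensitivity = sum(s >= cutoff for s in scores_pos) / len(scores_pos)
        specificity = sum(s < cutoff for s in scores_neg) / len(scores_neg)
        j = sensitivity + specificity - 1
        if best is None or j > best[3]:
            best = (cutoff, sensitivity, specificity, j)
    return best


# Hypothetical ages of pregnant and non-pregnant patients, negated so that
# "score >= cutoff" means "age <= -cutoff"
cutoff, se, sp, j = youden_cutoff([-29, -31, -33, -35], [-32, -36, -38, -41])
print(f"age <= {-cutoff}: Se = {se:.2f}, Sp = {sp:.2f}, J = {j:.2f}")
# age <= 35: Se = 1.00, Sp = 0.75, J = 0.75
```

For the age cut-off reported in the Results (sensitivity 48%, specificity 98%), J = 0,48 + 0,98 - 1 = 0,46.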

Results
A total of 718 IVF cycles were evaluated. The mean age of the women was 36,67 ± 3,72 years (mean ± SD) (95% CI 33,56-39,67). The mean numbers of retrieved and MII oocytes were 8,54 ± 6,10 and 7,23 ± 5,43, respectively. A multiple linear regression was performed to identify the variables influencing the number of retrieved MII oocytes. Before the multiple analysis, simple linear regression was used to establish a hierarchy of the variables influencing (or not) the number of mature oocytes obtained at retrieval (Table 1). In our series, the only factor with no influence on the number of retrieved mature oocytes was BMI (adjusted R2 = 0,001; p = 0,989).
The remaining variables were evaluated by multiple stepwise linear regression to identify the factor(s) that could independently predict the number of MII oocytes obtained at retrieval (Tables 2a & 2b). The Durbin-Watson statistic (1,964; values between 1 and 3 are considered acceptable) supported the independence of errors.
Variance inflation factor values close to 1 and well below 10 (1,269-1,747) ruled out multicollinearity among the variables (Table 2b). AFC alone explained over 78% of the variance of the dependent variable (MII oocytes) (Model a: adjusted R2 = 0,786; t = 32,514; p < 0,001). Adding AMH to the predictive model (Model b) improved prediction of the number of oocytes by less than 0,1% (adjusted R2 from 0,786 to 0,787; t = 1,511; p = 0,132). Furthermore, entering all the remaining variables into the multiple linear regression after AFC (models b to i) did not significantly improve the ability to predict the number of oocytes (adjusted R2 from 0,786 in model "a" to 0,794 in model "i"; t = 0,099; p = 0,921) (Tables 2a & 2b).
An ongoing pregnancy was achieved in 299 cycles (41,64%); miscarriages and ectopic pregnancies were not included in this rate. Baseline characteristics of the non-pregnancy and ongoing clinical pregnancy groups are shown in Table 3.
A multivariable logistic regression analysis was performed to ascertain the predictive value of the evaluated variables for ongoing pregnancy. In this setting, the number of retrieved mature oocytes was included as an independent (and potentially valuable) predictor of pregnancy. As shown in Table 4, R2 values indicate that age is the most important factor influencing successful ongoing pregnancy (38,8% of the variance; R2 = 0,388). AFC and AMH (the most important variables predicting the number of oocytes) increased predictability by only 0,6% and 0,7%, respectively (R2 increase of 0,006 and 0,007; p = 0,170 and p = 0,160). Only the number of oocytes obtained (evaluated as an independent variable) and basal Estradiol showed a statistically significant influence on ongoing pregnancy, but these effects do not appear to be clinically relevant (1,5% and 1,4%; R2 increase of 0,015 and 0,014, respectively) (Table 4).
ROC curves were calculated, and the AUC was used to assess the discriminative power of the logistic regression analysis (Table 5). The only variable with good accuracy in predicting ongoing pregnancy was age (AUC = 0,817) (Figure 1). Moreover, the Youden index identified 32,5 years as the optimal age cut-off (sensitivity 48%; specificity 98%). The other variables showed poorer accuracy, even when statistically significant. Thus, AMH, AFC and MII oocytes did not predict pregnancy with good accuracy (AUC 0,622, 0,620 and 0,619, respectively) (Figure 2). The remaining variables showed worthless values (Table 5).

Discussion
Evaluation of ovarian reserve has been a focus of clinical research in recent years, since demand for assisted reproduction is increasing, partly because of postponed motherhood [8,9,13-15].
Several parameters have been proposed as predictors of ovarian reserve, in an attempt to better advise couples, guide physicians in designing individualized stimulation protocols, and reduce the emotional and financial burden of demanding, stressful treatment [13,16,17]. These include age, serum markers (FSH, Estradiol and AMH) and ultrasound variables (ovarian volume, AFC and ovarian stromal blood flow). Even after adjustment for chronological age, AFC and serum AMH correlate with the number of ovarian primordial follicles [18]. However, all available ovarian reserve tests have limitations [19,20]; even the best is associated with 10-20% false-positive results [20]. Indeed, the availability of multiple ovarian reserve markers suggests that none is ideal.
Antral follicles are responsive to FSH and may be considered a predictor of ovarian response to gonadotropins. Results from the literature converge on the importance of AFC as a predictor of ovarian response [11,13,21].
It has been shown that AFC is proportionally related to the size of the primordial follicle pool from which the follicles are recruited [11,22].
A potential advantage of AFC is that it can be measured readily in the clinic, with results available immediately. A drawback is the lack of standardization of ultrasound equipment, technique and antral follicle size, which makes it difficult to compare results across studies and centers [23].
Our data confirm the value of AFC for predicting the number of oocytes obtained after COS. In our series of 718 patients, AFC was the best predictor of oocyte yield (both total and MII). Multiple linear regression showed that 78,6% of the variance in the number of oocytes obtained could be independently predicted by AFC. Since endovaginal ultrasound is usually performed during the investigation of infertile women, we believe that AFC should definitely be included as a routine test of ovarian reserve in the pre-ART evaluation.
It has been suggested that anti-Müllerian hormone (AMH), a dimeric glycoprotein produced by the granulosa cells of pre-antral and small antral follicles, is a better predictor of ovarian response to COS than age, FSH and estradiol [13,24]. Unlike other biochemical markers, AMH has the advantage that it can be measured on any day of the cycle [25] and does not exhibit inter-cycle variability [26], which facilitates its use as a biological marker of ovarian reserve [17,27-29].
Compared with AFC, AMH has been reported to perform very similarly as a predictor of poor ovarian response [19,20]. However, whereas AFC should be assessed during the early follicular phase to minimize the effect of intra-cycle fluctuations, serum AMH levels are generally thought to remain stable throughout the menstrual cycle [7,29]. A significant positive correlation has been described between serum AMH and the number of oocytes retrieved, considerably stronger than the associations found with other ovarian reserve markers such as serum FSH and estradiol [30]. This dose-response relationship between AMH and ovarian response to FSH explains why AMH has proved useful in predicting both poor response and hyper-response.
However, according to our data, AMH measurement added less than 0,1% to the predictive ability of AFC for the number of retrieved oocytes (adjusted R2 from 0,786 to 0,787). Although simple linear regression suggested that AMH could be a good predictor, once the multiple analysis was performed its value was clearly inferior to that of AFC.
Both basal FSH and E2 levels measured on days 3-5 of the menstrual cycle have been used as predictors of ovarian response to COS for several years [31]. Basal FSH rises as the follicle pool is depleted. Basal E2 levels may provide additional useful information for evaluating ovarian reserve; initial studies, using different cut-off values, showed an association between an elevated basal E2 level and a poor ovarian response [32]. In our series, none of basal FSH, E2 or LH showed an independent ability to predict the quantitative ovarian response.
It has long been established that ovarian reserve declines progressively with age [33]. This decline has been observed both in population-based studies and in women undergoing ovulation induction or IVF [1]. Although both the quantity and the quality of ovarian follicles are well known to decrease significantly with advancing age, in our series age was not a good independent predictor of oocyte yield. However, the aim of ART is not to obtain a given number of oocytes but a successful pregnancy, and reproductive potential is more closely related to factors associated with oocyte and embryo quality than to the number of oocytes obtained after COS.
Too often, oocyte yield has been directly equated with the probability of pregnancy. Consequently, ovarian reserve tests have been used both as markers of oocyte yield and as markers of pregnancy chance [9,14]. It has more recently been recognized that these tests are effective in predicting ovarian response to stimulation, but not pregnancy or its outcome [15]. Thus, the influence of each factor must be analyzed independently.
Our data show that age is the most important variable defining the probability of ongoing pregnancy. Both the logistic regression analysis (R2 = 0,388) and the ROC analysis (AUC = 0,817) support the value of age for predicting a successful pregnancy.
Although increasing age is associated with declining female fertility, there is no universal definition of advanced reproductive age, in part because the effects of increasing age occur as a continuum rather than as a threshold, and the decline in fertility differs from one woman to another [34,35]. Among the patients in the present study (mean age 36,67 years), an age of 32,5 years offered the best combination of sensitivity and specificity for predicting an ongoing pregnancy.
The mechanisms postulated to be responsible for the loss of fertility with female ageing include poorer oocyte quality, lower embryo implantation rates and altered hormonal environment resulting in ovulatory dysfunction [35]. Also, there is a higher propensity for acquired conditions such as pelvic infections, endometriosis and fibroids among older women. Lifestyle factors such as obesity and lower frequency of intercourse are also potential risk factors [35,36].
Although AFC showed the highest predictive value for the number of oocytes in our series, its value as an independent predictor of pregnancy was not confirmed. A sensitivity of 66,7% for predicting cycle cancellation, but of only 2,6% for predicting non-pregnancy, has been reported for AFC [37]. Thus, AFC must not be used as a criterion for exclusion from ART, but as a tool for counselling and for proposing different strategies such as oocyte accumulation [38]. Furthermore, AFC determination should not be offered as a "fertility test", since it is not independently related to the chance of pregnancy, either after intercourse or after ART. In fact, a previously published meta-analysis supported good prospects for women with fewer antral follicles [13].
As noted above, many published studies have described the clinical application of AMH measurement in predicting both the quantitative and the qualitative ovarian response in ART [29]. AMH is associated with oocyte yield after ovarian stimulation and has been shown to be a strong predictor of ovarian response to gonadotropins [17,27,39].
However, patients with undetectable AMH levels have been shown to obtain oocytes at retrieval and even to achieve ongoing pregnancies [40,41]. Therefore, no lower AMH limit below which patients should expect no ovarian response has been established [42]. AMH is not independently related to oocyte quality and, thus, to the probability of pregnancy. Different strategies, including oocyte accumulation [38,43], can be applied in this subset of patients, i.e., those with poor ovarian reserve test results (AFC, AMH) but no additional poor prognostic factor (such as advanced age). Available evidence does not suggest that serum AMH can be used as a marker to predict pregnancy.
In the era of oocyte vitrification, the number of oocytes obtained in a given ovarian stimulation is no longer the key factor in predicting the chance of pregnancy. Rather, age (and, consequently, the frequency of aneuploidy among transferred embryos) predicts the odds of a successful pregnancy [44].

Conclusion
Ovarian reserve tests are important for predicting a low or high ovarian response to COS, allowing an appropriate ovarian stimulation and ovulation triggering protocol to be designed. According to our data, only antral follicle count (AFC) showed a significant and independent predictive value for the number of oocytes obtained. Since endovaginal ultrasound evaluation is usually performed during the investigation of infertile women, AFC should definitely be included as a routine test of ovarian reserve in the pre-ART evaluation. AMH can be used to increase predictive ability, although only marginally. Determination of basal FSH, LH or E2 is of no value, since it adds no predictive value. Age, as an independent variable, does not predict ovarian reserve.
However, the ultimate goal of assisted reproduction is a successful pregnancy. Reproductive potential is more closely related to factors associated with oocyte and embryo quality than to the number of oocytes obtained after COS. Our data show that age is the most important variable defining the probability of ongoing pregnancy.
Regarding ovarian reserve tests, although AFC showed the highest predictive value for the number of oocytes in our series, its value as an independent predictor of pregnancy was not confirmed. Our results show that neither AFC nor AMH can be used as a criterion for exclusion from ART; rather, they are tools for counselling and for proposing different strategies such as oocyte accumulation. Furthermore, these determinations should not be offered as "fertility tests", since they are not independently related to the chance of pregnancy, either after intercourse or after ART.
The chance of pregnancy is more closely related to the quality of oocytes (and, thus, of embryos) than to the number of oocytes obtained.