Are Noninvasive Liver Fibrosis Tests Ready for Use? There are lots of questions. Prospective comparison of six non-invasive scores for the diagnosis of liver fibrosis in chronic hepatitis C
Journal of Hepatology
Articles In Press, Uncorrected Proof
Vincent Leroy12, Marie-Noelle Hilleret1, Nathalie Sturm3, Candice Trocme4, Jean-Charles Renversez5, Patrice Faure5, Francoise Morel4, Jean-Pierre Zarski12
ABSTRACT
Background/Aims: Non-invasive markers of liver fibrosis have recently been developed as an alternative to liver biopsy. The aim of this study was to compare the diagnostic performance of 6 scores (MP3, Fibrotest, Fibrometer, Hepascore, Forns' score and APRI).
Methods: We studied 180 chronic hepatitis C patients. Liver fibrosis was staged according to the METAVIR scoring system.
Results: Overall diagnostic performance of scores determined by AUROCs ranged from 0.86 for Fibrometer to 0.78 for Forns' score (NS) for discriminating F0F1 versus F2F3F4. For discriminating F0F1F2 versus F3F4, AUROCs ranged from 0.91 for Fibrometer to 0.78 for Forns' score (p<0.02). Significant or extensive fibrosis was predicted in 10-86% of patients with positive predictive value (PPV) ranging from 55% to 94%. Using logistic regression, statistical independence was demonstrated for MP3, Fibrotest and APRI. Diagnostic performance of paired-combination scores was then evaluated. The best combinations could select one-third of patients for whom either absence of significant fibrosis or presence of extensive fibrosis could be predicted with more than 90% certainty.
Conclusions: Current non-invasive scores give reliable information on liver fibrosis in one-third of chronic hepatitis C patients, especially when used in combination.
Discussion
Several non-invasive tests combining biological parameters have recently been developed with the objective of replacing liver biopsy [7], [8], [9], [10], [11], [12], [13], [14], [15], [16]. We assessed here the diagnostic performance of Fibrotest, Fibrometer, Hepascore, MP3, Forns' score and APRI. Other scores using routine laboratory parameters such as AST/ALT ratio were not studied because they were recently found to be of less value than APRI [18]. The first finding of our study is an independent validation of scores, with overall diagnostic performances very similar to those originally reported. To our knowledge, it is the first independent validation for Fibrometer and Hepascore. The Fibrometer had the best diagnostic performance but its superiority over other scores was significant only for Forns' score. We thus did not confirm the recently suggested superiority of Fibrometer over Fibrotest [15]. It should be pointed out that in that study the superiority of Fibrometer was observed only in the training group of patients. In two studies, the Fibrotest was found to have a greater diagnostic performance than APRI and Forns' score, but study groups again came from the original population that served for Fibrotest calculation [19], [20]. Altogether, these results show that Fibrometer, Fibrotest, MP3, APRI, Forns' score and Hepascore have overall diagnostic performances close to each other.
The most important issue is to know whether such accuracy is sufficient to waive liver biopsy. For each score, cut-offs have been defined in order to discriminate patients according to two relevant thresholds of fibrosis. The first is METAVIR F0F1 versus F2F3F4 because patients with no or mild fibrosis usually do not need to receive antiviral treatment. The second is F0F1F2 versus F3F4 because patients with severe fibrosis or cirrhosis need to be treated and to be screened for portal hypertension, liver failure and hepatocellular carcinoma. Applying the different cut-offs originally described, NPV and PPV for the diagnosis of fibrosis were variable amongst the scores. This can be explained by the fact that some cut-offs were not designed to discriminate either significant or extensive fibrosis. For example, the cut-offs proposed in Forns' study were designed to detect significant and not extensive fibrosis [12]. Interestingly, there was a clear correlation between the number of selected patients for a given cut-off and predictive values. These data support the concept that scores have close diagnostic performance and that the stringency of cut-offs mainly determines sensitivity, specificity and predictive values. The use of higher cut-offs can select one-third of patients for whom the probability of significant fibrosis is 80-90% and the probability of extensive fibrosis is 60-70%. Conversely, the use of lower cut-offs can select 20% of patients for whom the probability of no/mild fibrosis is 80%. Although these predictive values are promising, we should keep in mind that 1 in 5 patients would be misclassified if liver biopsy were not performed.
Another limitation of this strategy is that nearly 50% of patients have intermediate values and cannot be classified. Fibrotest overrides this limitation and uses several cut-offs that allow a conversion to the METAVIR scoring system [21]. However, we observed discordances of at least two fibrosis stages between Fibrotest and biopsy in 23% of cases, a result consistent with that reported in two recent studies [22], [23]. The only parameter associated with discordance was an intermediate Fibrotest value, especially one indicating F2. Similar data were recently reported for the Fibrometer, which has less diagnostic accuracy when its value suggests METAVIR F2 [15]. This could be explained by the fact that fibrosis area in F2 stage is close to that observed in F1 in morphometric studies [3]. Moreover, F2 is the only stage that allows both underestimation and overestimation by two stages. Contrary to Poynard et al. [22], we did not find steatosis to be associated with discordance. We cannot exclude that in some cases discordance between serum markers and histology could be attributable to biopsy examination failure due to the heterogeneity of fibrosis in the liver. In a laparoscopic study, discordances in fibrosis stage were reported in 33% of patients when left and right liver lobes were compared [24]. However, 98% of discordances were of only one stage according to the Scheuer classification. The main factor of biopsy failure appears to be an inadequate size of biopsy samples. Colloredo et al. [25] suggested that biopsies smaller than 20mm long and 1.4mm wide and containing fewer than 11 portal tracts led to an underestimation of fibrosis. In an elegant study comparing virtual biopsies to whole liver resections, Bedossa et al. [3] showed that the rate of correct METAVIR fibrosis staging ranged from 65% when the biopsy measured 15mm long to 90% when biopsy length was 40mm.
The authors concluded that a biopsy length of at least 25mm would be necessary to evaluate fibrosis accurately. A major strength of our study is that biopsies were of greater quality than those reported in previous studies. In the study by Halfon et al. [23], discordances were attributed to liver biopsy in 22% of cases but reasons were not fully given. In our study, the size of biopsies was not associated with discordance, which appears to be attributable to scores' failure rather than biopsy failure. This is strengthened by the fact that diagnostic performance is improved when using scores in combination.
An original aspect of our study was to test the statistical independence of scores, in order to propose a logical algorithm. Interestingly, some combinations including MP3+APRI, Fibrotest+APRI and MP3+Fibrotest gave better results than single scores. The statistical independence can be explained by the fact that these scores do not share the same biological parameters. Different results were reported by Cales et al. [15] who also performed a multivariate analysis and found the Fibrometer to be the only independent score associated with fibrosis. However, this analysis was again done on the training group of patients and results were not given after removing Fibrometer from the analysis. The same concept of combining two non-invasive methods of assessing liver fibrosis was proposed by Castera et al. [26]. Concordance between Fibrotest and transient elastography (Fibroscan) led to a more precise evaluation of liver fibrosis compared to each method used alone. In that study, the combination of Fibrotest and APRI did not perform better than Fibrotest used alone, but the authors did not give any detail about the way they combined both scores. Interestingly, Sebastiani et al. [27] recently proposed an algorithm based on sequential utilization of APRI followed by Fibrotest. Although this algorithm appears to be complex, it highlights the concept of combining non-invasive scores of fibrosis to increase their diagnostic accuracy. Our results strongly suggest that the diagnosis of extensive fibrosis (F3F4) can be made with more than 90% certainty when Fibrotest is greater than 0.59 and APRI is greater than 2, a scenario met in 20% of cases. By contrast, significant fibrosis (F2F3F4) can be ruled out with the same certainty when Fibrotest is lower than 0.22 and APRI is lower than 0.5, a situation observed in 13.2% of patients.
In conclusion, we believe that in these two situations a clinical decision can be made without the need for a liver biopsy and we propose here a simple algorithm (Fig. 4). For patients with intermediate values, important but only partial information on liver fibrosis can be obtained by scores. Clinical consequences of misclassification should always be taken into consideration before deciding to perform a liver biopsy, greater than 25mm, which still remains the gold standard of liver fibrosis evaluation.
Introduction
Clinical management of chronic hepatitis C is dependent on the extent of liver fibrosis. Liver biopsy, the gold standard, is still recommended in the majority of patients [1]. However, it is an invasive procedure responsible for severe complications in about 0.5% of cases [2]. Sample variability is another limitation. The biopsy specimen appears to be poorly reliable when its length is less than 15mm [3]. Moreover, liver fibrosis is evaluated by histological scores, which have inter-observer variability especially among non-expert pathologists [4]. These drawbacks justify intensive research on non-invasive alternatives. Some serum markers either directly involved in fibrosis remodelling or altered by its consequences have been described as correlated with liver fibrosis [5], [6]. More recently, some fibrosis scores calculated from statistical models were described [7], [8], [9], [10], [11], [12], [13], [14], [15], [16]. The Fibrotest combines serum concentrations of α2 macroglobulin, haptoglobin, γGT, bilirubin, and apolipoprotein A1 [11]. Forns et al. [12] published a score combining age, γGT, cholesterol, and platelet count. Hepascore combines bilirubin, γGT, hyaluronic acid, α2 macroglobulin, age and sex [13]. The simplest score, called APRI, is based on AST serum activity and platelet count [14]. Cales et al. [15] proposed another score, called Fibrometer, combining hyaluronate, prothrombin time, platelets, AST, α2 macroglobulin, urea and age, the formula being adjusted for the cause of liver disease. We also proposed a score called MP3, combining PIIINP, a marker of fibrogenesis, and the matrix metalloproteinase MMP-1 that is involved in fibrolysis [16], [17]. All these scores were tested for their ability to discriminate patients according to relevant fibrosis thresholds and their overall diagnostic accuracy estimated by areas under ROC curves (AUROC) ranged from 0.76 to 0.96.
However, these scores have not yet been compared to each other in a single and independent study. Moreover, their exact reliability as an alternative to liver biopsy is not fully known. Therefore, the aims of this study were to compare the diagnostic performance of these 6 scores and to clarify their clinical usefulness.
Patients and methods
Patients
One hundred and eighty consecutive eligible patients with chronic hepatitis C who underwent a liver biopsy between 2002 and 2004 in our centre were included. All patients were anti-HCV positive by ELISA, had detectable serum HCV-RNA by PCR, and had elevated ALT serum levels. Exclusion criteria included co-infection with HIV or HBV, other causes of liver disease, alcohol consumption higher than 30g/day, hepatocellular carcinoma, Gilbert disease, chronic hemolysis, inflammatory syndrome and previous antiviral treatment. The study protocol conformed to ethical guidelines of the 1975 declaration of Helsinki and was approved by our Institutional Review Board. Patients were enrolled after giving their written informed consent.
Liver histology and quantification of fibrosis
Liver biopsies were performed by two senior operators (VL and JPZ) using the Menghini technique with a 1.6mm diameter needle (Hepafix, Braun, Melsungen, Germany). Biopsy specimens were fixed in formalin and embedded in paraffin. All biopsy specimens were analyzed twice by a single senior pathologist (NS). Liver fibrosis and necroinflammatory activity were evaluated according to the METAVIR scoring system. Sinusoidal fibrosis was staged after sirius red staining as follows: 0=no fibrosis, 1=mild fibrosis and 2=moderate to severe fibrosis. Iron load and steatosis were also quantified.
Serum fibrosis markers
The following parameters were assessed on the day of the liver biopsy: AST, ALT, γGT, bilirubin, cholesterol, prothrombin time, α2-macroglobulin, apolipoprotein A1, haptoglobin and platelet count. PIIINP, MMP1 and HA concentrations were determined from serum stored at -80 °C. MMP1 was determined by ELISA (R&D Systems, Abingdon, UK). Serum PIIINP concentrations were assayed using an immunoradiometric assay with monoclonal antibodies (Procollagen Intact PIIINP, Orion Diagnostica, Espoo, Finland). Serum hyaluronic acid levels were measured with a sandwich enzyme binding assay kit (Pharmacia, Uppsala, Sweden). The FT calculation was purchased from the Biopredictive website (http://www.biopredictive.com). Formulas for calculating other scores were taken from original publications and were as follows:
Forns' score = 7.811 - 3.131 × ln platelet count (G/l) + 0.781 × ln γGT (IU/l) + 3.467 × ln age (years) - 0.014 × cholesterol (g/l)
Hepascore = y/(1+y), with y = exp(-4.185818 - (0.0249 × age) + (0.7464 × 1 if male, 0 if female gender) + (1.0039 × α2 macroglobulin) + (0.0302 × hyaluronate) + (0.0691 × bilirubin) - (0.0012 × γGT))
APRI = (AST/ULN) × 100 / platelet count (10^9/L)
Fibrometer = -0.007 × platelets (G/l) - 0.049 × prothrombin time (%) + 0.012 × AST (IU/ml) + 0.005 × α2 macroglobulin (mg/dl) + 0.021 × hyaluronate (mg/l) - 0.270 × urea (mmol/l) + 0.027 × age (years) + 3.718
MP3 = 0.5903 × Log PIIINP (ng/ml) - 0.1749 × Log MMP1 (ng/ml)
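As an illustration, the published formulas above can be transcribed into a few Python functions. This is only a sketch under stated assumptions: the function and parameter names are hypothetical, inputs are assumed to be in the units quoted in the text, and "Log" in the MP3 formula is assumed to mean base-10 logarithm, which the text does not specify. The Fibrotest is omitted because its formula is proprietary and not given here.

```python
import math

def forns_score(platelets_g_per_l, ggt_iu_per_l, age_years, cholesterol):
    """Forns' score as written in the text (natural logarithms)."""
    return (7.811
            - 3.131 * math.log(platelets_g_per_l)
            + 0.781 * math.log(ggt_iu_per_l)
            + 3.467 * math.log(age_years)
            - 0.014 * cholesterol)

def hepascore(age_years, is_male, a2_macroglobulin, hyaluronate, bilirubin, ggt):
    """Hepascore = y / (1 + y), so the result always lies between 0 and 1."""
    y = math.exp(-4.185818
                 - 0.0249 * age_years
                 + 0.7464 * (1 if is_male else 0)
                 + 1.0039 * a2_macroglobulin
                 + 0.0302 * hyaluronate
                 + 0.0691 * bilirubin
                 - 0.0012 * ggt)
    return y / (1 + y)

def apri(ast, ast_upper_limit_of_normal, platelets_1e9_per_l):
    """AST-to-platelet ratio index: (AST / ULN) x 100 / platelet count."""
    return ast / ast_upper_limit_of_normal * 100 / platelets_1e9_per_l

def fibrometer(platelets, prothrombin_pct, ast, a2_macroglobulin,
               hyaluronate, urea, age_years):
    """Linear Fibrometer expression exactly as quoted in the text."""
    return (-0.007 * platelets
            - 0.049 * prothrombin_pct
            + 0.012 * ast
            + 0.005 * a2_macroglobulin
            + 0.021 * hyaluronate
            - 0.270 * urea
            + 0.027 * age_years
            + 3.718)

def mp3(piiinp_ng_per_ml, mmp1_ng_per_ml, log=math.log10):
    """MP3 score. ASSUMPTION: 'Log' is base 10; pass log=math.log for ln."""
    return 0.5903 * log(piiinp_ng_per_ml) - 0.1749 * log(mmp1_ng_per_ml)
```

For instance, under these assumptions a patient with AST at twice the upper limit of normal and a platelet count of 100 × 10^9/L has `apri(80, 40, 100)` equal to 2.0.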
Statistical analysis
The κ test was used to evaluate the intra-observer concordance. Spearman's two-tailed test was used to assess correlations. Receiver operating characteristic (ROC) curves were constructed. Sensitivity, specificity, positive and negative predictive values (PPV and NPV) were calculated using cut-offs previously described for each score. The overall diagnostic performance of scores and single markers was evaluated by area under ROC curves (AUROCs). AUROCs were compared by the roccomp procedure (derived from χ2) with the Dunn-Sidak correction. Statistical analysis was performed using the STATA 8.0 software.
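The quantities named above can be illustrated with a short Python sketch. It is not the authors' STATA code: the function names are hypothetical, and the AUROC is computed with the standard rank-based (Mann-Whitney) definition, i.e. the probability that a randomly chosen patient with fibrosis has a higher score than a randomly chosen patient without it, counting ties as one half.

```python
def diagnostic_metrics(tp, fp, tn, fn):
    """Sensitivity, specificity, PPV and NPV from a 2x2 table of counts.

    tp/fn: patients with the fibrosis stage of interest who test
    positive/negative; fp/tn: patients without it who test positive/negative.
    """
    return {
        "sensitivity": tp / (tp + fn),
        "specificity": tn / (tn + fp),
        "ppv": tp / (tp + fp),
        "npv": tn / (tn + fn),
    }

def auroc(scores_with_fibrosis, scores_without_fibrosis):
    """Area under the ROC curve via pairwise comparison of scores."""
    wins = 0.0
    for pos in scores_with_fibrosis:
        for neg in scores_without_fibrosis:
            if pos > neg:
                wins += 1.0
            elif pos == neg:
                wins += 0.5  # ties contribute half a concordant pair
    return wins / (len(scores_with_fibrosis) * len(scores_without_fibrosis))
```

With hypothetical counts of 40 true positives, 10 false positives, 45 true negatives and 5 false negatives, `diagnostic_metrics(40, 10, 45, 5)` gives a PPV of 0.8 and an NPV of 0.9.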
Results
Patient characteristics
Main features of patients are summarized in Table 1. The median length of liver biopsies was 23mm (range 6-60), and the median number of portal tracts was 17 (range 4-35). Biopsy length was greater than 15mm in 161 (89.4%) patients and greater than 25mm in 81 (45.0%) patients. Overall, 89 (49.4%) patients had no/mild fibrosis (F0F1), and 51 (28.3%) had extensive fibrosis or cirrhosis (F3F4). Good agreement was found for the METAVIR fibrosis stage between the two histological readings (κ=0.89). Agreement was perfect for the diagnosis of cirrhosis (κ=1). The METAVIR fibrosis stage was significantly correlated with sinusoidal fibrosis stage (r=0.54, p<0.001).
Correlation between serum markers and fibrosis stage
Significant correlations were found between METAVIR fibrosis stages and Fibrotest, MP3, Forns' score, APRI, Hepascore and Fibrometer. The best correlations were observed for Fibrometer (r=0.71), Fibrotest (r=0.70) and MP3 (r=0.69) (p<0.001). Fig. 1 shows the box-plots of fibrosis scores according to METAVIR fibrosis stages. Weaker correlations were also found between scores and histological activity, especially for APRI (r=0.56), MP3 (r=0.49) and Fibrometer (r=0.47) (p<0.001). In each case, correlation with fibrosis persisted after statistical adjustment for activity. There was no statistical interaction between biopsy length and correlation coefficients.
Fig. 1. Score values according to METAVIR fibrosis stages. The top and bottom of each box are the 25th and the 75th centiles. The line through the box is the median, and the error bars are the 5th and 95th centiles. Spearman correlation coefficients were as follows: FT, r=0.70; MP3, r=0.69; FM, r=0.71; HS, r=0.60; FS, r=0.55; APRI, r=0.59, p<0.001 for each correlation test.
Overall diagnostic performance of serum markers
Areas under ROC curves were used for evaluating the overall diagnostic performance of scores. For discriminating F0F1 versus F2F3F4, AUROCs ranged from 0.86 for Fibrometer to 0.78 for Forns' score (Fig. 2a). Fibrometer had a better AUROC than Forns' score (p<0.02) but the significance disappeared after the Dunn-Sidak correction due to multiple comparisons. For discriminating F0F1F2 versus F3F4, AUROCs were better and ranged from 0.91 for Fibrometer to 0.78 for Forns' score. The only significant difference was observed between Fibrometer and Forns' score (p<0.02). The difference was nearly significant between Fibrometer and APRI (p=0.07). Details about AUROCs and 95% confidence intervals are given in Table 2. AUROCs were not altered by separating patients according to genotype (1 versus non 1), length of biopsies (<25mm versus >25mm), and presence or absence of sinusoidal fibrosis, steatosis or intra-hepatic iron load (data not shown). A sensitivity analysis excluding the 19 patients with biopsies smaller than 15mm or with less than 7 portal tracts was also performed, showing very similar results (Table 2).
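The loss of significance after correction can be illustrated with the Sidak adjustment, which converts a raw p-value into 1 - (1 - p)^m for m comparisons. The sketch below is hypothetical: the text does not state how many comparisons were corrected for, so it assumes all 15 pairwise comparisons among the six scores, and it uses the reported bound p<0.02 as if it were the exact value.

```python
from math import comb

def sidak_adjusted_p(p_raw, n_comparisons):
    """Sidak-adjusted p-value: 1 - (1 - p)^m."""
    return 1 - (1 - p_raw) ** n_comparisons

# ASSUMPTION: every pair of the six scores was compared, giving 15 tests.
n_pairs = comb(6, 2)
adjusted = sidak_adjusted_p(0.02, n_pairs)
print(f"{n_pairs} comparisons, adjusted p = {adjusted:.2f}")
```

Under these assumptions the adjusted value is roughly 0.26, well above 0.05, which is consistent with the statement that the difference was no longer significant after correction.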
Sensitivity, specificity, positive and negative predictive values
NPV and PPV for the diagnosis of significant (F2F3F4) or extensive (F3F4) fibrosis are presented in Table 3. For each score, cut-offs were chosen according to original publications. This analysis could not be performed for Fibrometer since no cut-off was provided in the study by Cales et al. [15]. Applying different cut-offs, significant fibrosis (F2F3F4) was predicted in 21-86% of patients with PPV ranging from 55% to 94%. Since lower cut-offs were originally described to rule out significant fibrosis, specific attention must be paid to NPV, which ranged from 64% to 84%. For the same cut-offs, NPV to rule out extensive fibrosis were logically better and ranged from 90% to 100%. For example, an FT below 0.22, observed in 33.1% of patients, excluded significant fibrosis with 82.0% certainty and excluded extensive fibrosis with 94.7% certainty. The higher cut-offs selected 10% to 28% of patients for whom extensive fibrosis (F3F4) was confirmed with PPV ranging from 65% to 89%. The best predictive value was observed for MP3 >0.50, but this cut-off selected only 10% of patients. In Fig. 3, we plotted for each cut-off the percentage of selected patients against the PPVs for significant and extensive fibrosis. Strong and nearly linear inverse correlations were found (r=0.94 and r=0.95, respectively, p<0.001).
Analysis of discordances
Multiple comparisons were performed for each score between correctly classified, false-positive and false-negative patients. Neither biological nor histological features, including the length of liver biopsy, were associated with discordance. Fibrotest gives a quantitative estimate of fibrosis that allows a conversion to the METAVIR score. Comparing METAVIR fibrosis stages estimated by Fibrotest and evaluated by liver biopsy, an overall discordance was observed in 52% of cases. It was an underestimation of fibrosis in 71% and an overestimation in 31% of cases. A discordance was judged clinically significant when the gap between Fibrotest and biopsy was of at least two stages or when the Fibrotest gave F2 (0.49-0.58) and the biopsy showed F1 (two stages on the FT scale, which has an F1F2 step (0.32-0.48)). Such a significant discordance was observed in 42 (23%) of the 180 patients, and in 36 (22%) of the 161 patients whose biopsies were appropriate, i.e. greater than 15mm. The only variable associated with discordance was an FT value ranging from 0.22 to 0.74 (i.e. from F0F1 to F3). In this case, discordances were observed in 31/93 (33.3%) of cases compared to 11/87 (12.6%) of patients who had FT values predicting either F0 or F4 (p<0.01).
Combination of scores
Multiple stepwise logistic regression analysis was performed to test the statistical independence of scores. In logistic regression including the six scores, MP3 (p<0.001) and APRI (p<0.05) were the only variables independently associated with significant and extensive fibrosis. When MP3 was removed from the analysis, Fibrotest (p<0.001) and APRI (p<0.02) were the remaining variables associated with fibrosis. In a model including MP3 and Fibrotest, both variables were independently associated with fibrosis. Thus, we analysed the diagnostic performance of paired combinations of independent scores. As expected, simultaneous use of two scores improved NPV and PPV while slightly decreasing the number of selected patients. Diagnostic values of the best combinations, associating MP3 with APRI and Fibrotest with APRI, are shown in Table 4. For example, the concomitant presence of Fibrotest >0.59 and APRI >2 improved the PPV for significant fibrosis to 96.7% and for extensive fibrosis to 92.2%. By contrast, the concomitant presence of Fibrotest <0.22 and APRI <0.5, observed in 13.2% of patients, could rule out significant fibrosis with NPV of 94.1%.
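The paired Fibrotest and APRI cut-offs quoted above can be written as a simple decision rule. The Python sketch below is only an illustration of those two thresholds, not a reproduction of the algorithm in Fig. 4, which is not shown here; the function name and the returned labels are hypothetical, and it is not intended for clinical use.

```python
def classify_with_fibrotest_and_apri(fibrotest, apri):
    """Apply the paired cut-offs reported for Fibrotest and APRI.

    Returns one of three hypothetical labels:
    - 'extensive fibrosis likely (F3F4)'   when both scores exceed the high cut-offs
    - 'significant fibrosis unlikely (F0F1)' when both fall below the low cut-offs
    - 'indeterminate'                      otherwise
    """
    if fibrotest > 0.59 and apri > 2:
        return "extensive fibrosis likely (F3F4)"
    if fibrotest < 0.22 and apri < 0.5:
        return "significant fibrosis unlikely (F0F1)"
    # Scores disagree or fall in the intermediate range: no confident call.
    return "indeterminate"
```

For example, `classify_with_fibrotest_and_apri(0.70, 2.5)` falls in the first category, `classify_with_fibrotest_and_apri(0.15, 0.3)` in the second, and a pair such as Fibrotest 0.40 with APRI 1.0 is left indeterminate.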