Liver Stiffness Measurement Versus Clinicians' Prediction or Both for the Assessment of Liver Fibrosis in Patients with Chronic Hepatitis C
The American Journal of Gastroenterology Dec 2006
Pierre Nahon, M.D., Service d'Hepato-gastroenterologie, Hopital Jean Verdier, AP-HP, Bondy, France and UPRES 3409, UFR SMBH, Universite Paris XIII, Bobigny, France
Abstract
OBJECTIVES: The goal of this study was to estimate the additional value of liver stiffness measurement (LSM) when added to physicians' assessment of fibrosis based on epidemiological, clinical, and biological parameters.
METHODS: One hundred forty-two unselected patients with chronic hepatitis C were included. Liver biopsy and LSM were performed simultaneously. First, four physicians (two junior residents with limited experience in hepatology and two senior hepatologists) independently predicted the stage of fibrosis according to the METAVIR classification, using clinical, epidemiological, and biological data. For the second step, they were informed of LSM values and could modify their first evaluation if necessary. Finally, the two successive evaluations were compared with the histological fibrosis score.
RESULTS: Providing LSM values improved agreement between physicians and resulted in a better correlation between clinical impression and histological liver fibrosis. The diagnostic performances were only significantly improved with transient elastography for the diagnosis of cirrhosis where assessment improved in three of the four physicians (AUROC [area under receiver operating characteristic curve]: 0.76 vs 0.87, 0.80 vs 0.87, and 0.83 vs 0.89, all p < 0.05). Moreover, these performances were nearly similar for junior and senior physicians when LSM was provided with the AUROC ranging from 0.69 to 0.72 for significant fibrosis and 0.87 to 0.90 for cirrhosis.
CONCLUSIONS: Providing LSM values to physicians results in a better estimation of liver fibrosis and a more accurate diagnosis of cirrhosis. Moreover, it allows physicians with limited experience to predict liver fibrosis as well as experienced hepatologists.
PATIENTS AND METHODS
Patients
Between November 2003 and November 2004, 142 new consecutive patients referred to our hepatology unit for management of CHC were enrolled in this study. Inclusion criteria were: 1) presence of hepatitis C virus (HCV) RNA in serum and at least transiently elevated serum alanine aminotransferase levels, 2) absence of ascites, 3) no prior history of anti-HCV treatment, 4) absence of hepatitis B virus coinfection, 5) acceptance of LSM and LB, and 6) performance of LSM and LB on the same day. All 142 patients fulfilling these criteria were enrolled after providing written informed consent. For each patient, anamnestic and bioclinical data regarding HCV infection that were likely to influence the progression of liver fibrosis were recorded at the time of inclusion, defined as the day LB was performed: age, sex, geographic origin, date and mode of contamination, HIV coinfection, body mass index (BMI), diabetes mellitus, smoking habit, daily alcohol consumption, serum aspartate aminotransferase (AST), alanine aminotransferase (ALT), and gamma-glutamyl transferase (GGT) activities, bilirubin, prothrombin time, platelet count, serum albumin, and gamma-globulins.
LB
LB specimens were fixed in formalin and embedded in paraffin. Four-micrometer-thick sections were stained with hematoxylin-eosin-saffron and picrosirius red. All biopsy specimens were analyzed by an experienced hepatopathologist blinded to the results of LSM and clinical data. Liver fibrosis was evaluated semiquantitatively according to the METAVIR scoring system. Fibrosis was staged on a 0-4 scale: F0, no fibrosis; F1, portal fibrosis without septa; F2, portal fibrosis and few septa; F3, numerous septa without cirrhosis; and F4, cirrhosis. Steatosis was categorized by visual assessment as: 0, none; 1, steatosis in 1-10% of hepatocytes; 2, steatosis in 10-30% of hepatocytes; 3, steatosis in 30-100% of hepatocytes (3).
LSM
LSM was performed with FIBROSCAN (EchoSens, Paris, France) on the same day as liver biopsy by a single technician who was independent from the physician investigators and blinded to clinical and biological data. Details of the technical description and examination procedure have been previously described (16). Briefly, this system is equipped with a probe consisting of an ultrasonic transducer mounted on the axis of a vibrator. A vibration of mild amplitude and low frequency is transmitted from the vibrator to the tissue by the transducer itself. This vibration induces an elastic shear wave that propagates through the tissue. In the meantime, pulse-echo ultrasonic acquisitions are performed to follow the propagation of the shear wave and measure its velocity that is directly related to the tissue stiffness (or elastic modulus). The harder the tissue, the faster the shear wave propagates. The measurement depth was between 25 and 65 mm. Ten successful acquisitions were performed for each patient. The success rate was calculated as the ratio of the number of successful acquisitions over the total number of acquisitions. The median value was kept as representative of the liver elastic modulus. Only results of LSM obtained with 10 successful acquisitions and a success rate of at least 60% were considered reliable. Results are expressed in kiloPascals (kPa).
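The reliability rule described above (median of the successful acquisitions, 10 successful acquisitions, success rate of at least 60%) can be illustrated with a minimal sketch. The function name `summarize_lsm`, its parameters, and the example values are hypothetical and are not part of the FIBROSCAN software; the check `n_valid >= required_valid` is an assumption that generalizes the "10 successful acquisitions" requirement.

```python
from statistics import median


def summarize_lsm(valid_kpa, total_attempts, required_valid=10, min_success_rate=0.60):
    """Summarize one examination (hypothetical helper, not vendor code).

    valid_kpa      -- stiffness values (kPa) of the successful acquisitions
    total_attempts -- total number of acquisitions, successful or not
    """
    n_valid = len(valid_kpa)
    if total_attempts <= 0 or n_valid > total_attempts:
        raise ValueError("total_attempts must be positive and >= number of valid acquisitions")

    # Success rate = successful acquisitions / total acquisitions.
    success_rate = n_valid / total_attempts

    # Assumed reading of the rule: enough valid shots AND success rate >= 60%.
    reliable = n_valid >= required_valid and success_rate >= min_success_rate

    return {
        # The median of the valid acquisitions represents the liver elastic modulus.
        "median_kpa": median(valid_kpa) if n_valid else None,
        "success_rate": success_rate,
        "reliable": reliable,
    }


if __name__ == "__main__":
    # Illustrative values only, not patient data from the study.
    shots = [5.1, 6.0, 5.5, 7.2, 6.3, 5.9, 6.1, 5.8, 6.4, 6.0]
    print(summarize_lsm(shots, total_attempts=14))
```

With these illustrative values, 10 valid acquisitions out of 14 attempts pass the 60% threshold, whereas the same 10 acquisitions out of 20 attempts would be flagged as unreliable.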
Prediction of Liver Fibrosis by Physicians
Four physicians were chosen to predict liver fibrosis according to the METAVIR score in this cohort. Two of them were residents with limited clinical experience in hepatology (juniors 1 and 2) and two of them were confirmed hepatologists with extensive experience in the management of liver disease (seniors 1 and 2). For each physician, the usual anamnestic and bioclinical data mentioned above were provided for each patient to allow an initial prediction of liver fibrosis according to the METAVIR score (ranging from F0 to F4). Then, results of LSM for each patient were communicated to each physician allowing them to confirm or modify their initial prediction. Finally, these two successive predictions by each physician were compared with the fibrosis stage based on the METAVIR score for each patient.
Statistical Analysis
Qualitative variables were compared using Fisher's exact test or the χ2 test, while quantitative variables were compared using the nonparametric Wilcoxon test. Group means were compared by one-way analysis of variance (ANOVA) followed by Bonferroni tests.
The intra-class correlation coefficient (ICC) was calculated to determine agreement between physicians. The confidence interval for the ICC was estimated by bootstrap resampling. The Spearman correlation coefficient was calculated to measure the relationship between variables. Receiver operating characteristic (ROC) curves were constructed and the area under the receiver operating characteristic curve (AUROC) was calculated, with values close to 1.0 indicating high diagnostic accuracy. Cutoff values for the assessment of performance of LSM alone were those previously determined and published by our team (18). Performances of LSM alone, clinical impression, or these two assessments combined for the prediction of significant liver fibrosis (METAVIR score ≥F2) were evaluated by comparing pooled F2-F4 patients with F1 patients; performance for the prediction of severe liver fibrosis (METAVIR score ≥F3) was evaluated by comparing pooled F3-F4 patients with F1-F2 patients; performance for the prediction of cirrhosis (METAVIR score = F4) was evaluated by comparing F4 patients with pooled F1-F3 patients.
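The dichotomized comparisons described above can be sketched as follows. This is only an illustration under stated assumptions: the analyses in the study were run in R, whereas this sketch uses plain Python and computes the AUROC through its rank (Mann-Whitney) interpretation, with ties counted as one half. The helper names `dichotomize` and `auroc` and all example values are hypothetical.

```python
def dichotomize(stages, threshold):
    """Return True for patients whose METAVIR stage is >= threshold (e.g. 2 for >=F2)."""
    return [stage >= threshold for stage in stages]


def auroc(scores, labels):
    """AUROC as the probability that a positive case scores higher than a negative one.

    scores -- continuous or ordinal predictor (LSM in kPa, or a predicted stage)
    labels -- booleans, True for patients at or above the fibrosis threshold
    """
    positives = [s for s, y in zip(scores, labels) if y]
    negatives = [s for s, y in zip(scores, labels) if not y]
    if not positives or not negatives:
        raise ValueError("both classes must be present")

    wins = 0.0
    for p in positives:
        for n in negatives:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5  # ties count as half a concordant pair
    return wins / (len(positives) * len(negatives))


if __name__ == "__main__":
    # Illustrative values only, not data from the study.
    histology = [1, 1, 2, 2, 3, 4, 4]                 # METAVIR stage per patient
    lsm_kpa = [4.8, 6.1, 5.9, 8.8, 9.5, 14.0, 27.0]   # stiffness per patient

    for name, threshold in [(">=F2", 2), (">=F3", 3), ("F4", 4)]:
        print(name, auroc(lsm_kpa, dichotomize(histology, threshold)))
```

The same `auroc` function can be applied to a physician's predicted stage in place of `lsm_kpa`, which is how a prediction made before and after communication of LSM could be compared on the same endpoint.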
Multiple linear regression was used to assess the relationship between LSM and fibrosis after adjustment for steatosis (four categories).
All reported p-values are 2-tailed. The R statistical package was used for all analyses (R Development Core Team (2004). [R: A language and environment for statistical computing. R Foundation for Statistical Computing, Vienna, Austria. ISBN 3-900051-07-0, URL http://www.R-project.org]).
RESULTS
Characteristics of Patients and Distribution of Fibrosis METAVIR Stage
Table 1 shows the characteristics of patients enrolled in the study (overall population and according to METAVIR fibrosis stage). Distribution according to the METAVIR fibrosis stage was F1: N = 44 (31%), F2: N = 45 (32%), F3: N = 15 (11%), F4: N = 38 (26%). As expected and according to previously published data, mean LSM values increased progressively according to the METAVIR fibrosis stage in this cohort. Figure 1 shows a box plot of the actual LSM for the four stages of fibrosis. There was no statistical difference between group F1 and F2, F2 and F3, and F1 and F3. Conversely, the comparisons between F4 and F1, F4 and F2, and F4 and F3 were all statistically significant. A weak correlation was found between steatosis and LSM on univariate analysis (Spearman coefficient = 0.2, p = 0.02); on multivariate analysis (multiple linear regression) including steatosis and fibrosis, only fibrosis remained correlated to LSM. Mean biopsy length in this cohort was 15.8 ± 7.6 mm, ranging from 4.0 to 50.0 mm. Figures 2 and 3 reflect the distribution of biopsy length in the population under study and the presence of cirrhosis according to the quartiles of biopsy length.
Figure 1. Liver stiffness measurement according to fibrosis. Group means were compared by one-way analysis of variance (ANOVA) followed by Bonferroni tests. There were no statistical differences between group F1 and F2, F2 and F3, and F1 and F3. Conversely, the comparisons between F4 and F1, F4 and F2, and F4 and F3 were all statistically significant.
Physician Agreement and Correlations Between Fibrosis Predictions and Histological Fibrosis Stages
Intra-class correlation coefficients were used to study physician agreement in liver fibrosis prediction before and after communication of LSM results (Table 2). Agreement was determined first according to physician experience (junior 1 and 2/senior 1 and 2), then for all four clinicians. Knowledge of LSM values resulted in a significantly increased agreement for junior physicians. For senior physicians, an increased agreement was observed after communication of LSM results, but the difference was not statistically significant. Overall, agreement among all four physicians was significantly increased after communication of LSM results.
The Spearman coefficient was used to determine the correlation between the exact prediction of liver fibrosis by each physician and histological liver fibrosis as assessed by the pathologist according to the METAVIR score (Table 2). Communication of LSM results to physicians increased the correlation between predicted and histological fibrosis stages in all physicians, with a significant difference for junior 1 and senior 1, while this difference was not statistically significant for junior 2 and senior 2.
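As an illustration of the rank correlation used here, the following minimal sketch computes a Spearman coefficient between predicted and histological stages. It assumes the standard definition (Pearson correlation of average ranks, which handles the many ties expected with a five-level ordinal score); the function names and example stages are hypothetical and do not reproduce the R code used in the study.

```python
def average_ranks(values):
    """Rank values from 1..n, giving tied values the average of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j over the run of values tied with position i.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        avg_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = avg_rank
        i = j + 1
    return ranks


def spearman(x, y):
    """Spearman coefficient: Pearson correlation computed on average ranks."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("x and y must have the same length (at least 2)")
    rx, ry = average_ranks(x), average_ranks(y)
    n = len(rx)
    mx, my = sum(rx) / n, sum(ry) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(rx, ry))
    var_x = sum((a - mx) ** 2 for a in rx)
    var_y = sum((b - my) ** 2 for b in ry)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined when one variable is constant")
    return cov / (var_x * var_y) ** 0.5


if __name__ == "__main__":
    # Illustrative stages only (1 = F1 ... 4 = F4), not data from the study.
    histology = [1, 2, 2, 3, 4, 4, 1, 2]
    predicted = [1, 1, 2, 2, 4, 3, 2, 2]
    print(round(spearman(predicted, histology), 3))
```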
Physician and LSM Performance for Prediction of Liver Fibrosis (AUROC)
Table 3 shows AUROC for the prediction of cirrhosis (METAVIR score = F4), severe liver fibrosis (METAVIR score ≥F3), and significant liver fibrosis (METAVIR score ≥F2) before and after communication of LSM results.
The performance of LSM alone for the diagnosis of cirrhosis was 0.89 in this cohort. Performances of predictions by physicians using usual bioclinical data were as a whole lower and increased progressively with experience, ranging from 0.76 for junior 1 to 0.86 for senior 2. Knowledge of LSM results improved the performance of cirrhosis prediction in all physicians with equivalent values ranging from 0.87 to 0.90.
The performance of LSM alone for the diagnosis of severe liver fibrosis was 0.78 in this cohort. Performances of predictions by physicians using usual bioclinical data were equivalent ranging from 0.78 to 0.79 except for junior 1 (0.72). Knowledge of LSM results improved the performance of severe fibrosis prediction in all physicians with equivalent values ranging from 0.80 to 0.82.
The performance of LSM alone for the diagnosis of significant liver fibrosis was 0.68 in this cohort. Performances of predictions by physicians using usual bioclinical data ranged from 0.64 to 0.71. Knowledge of LSM results improved the performance of significant liver fibrosis prediction in all physicians with equivalent values ranging from 0.69 to 0.72.
Cases of Misclassified Patients
We determined the number of misclassified patients according to physician predictions before and after LSM value communication for the diagnosis of significant liver fibrosis (METAVIR score ≥F2) and cirrhosis (METAVIR score = F4). We then compared mean biopsy length between patients still misclassified after LSM results communication and the other patients (Table 4).
We first analyzed these cases according to both seniors' predictions. Misclassified patients for the diagnosis of significant fibrosis were defined as follows: patients with histological fibrosis score ≥F2 but predicted by both physicians as F1, or conversely, patients with histological fibrosis score F1 but predicted by both physicians as ≥F2. Before LSM results communication, 32/142 (22.5%) were misclassified according to these definitions; after LSM results communication to senior physicians, 16/142 (11.2%) were then misclassified, thus allowing a good prediction of liver fibrosis in 16/32 (50%) of these initially misclassified patients. Mean biopsy length was 14.4 ± 5.2 mm in these 16 misclassified patients versus 16.0 ± 7.9 mm in others (p = NS).
Misclassified patients for the diagnosis of cirrhosis were defined as follows: patients with histological fibrosis score F4 but predicted by both physicians as < F4, or conversely, patients with histological fibrosis score < F4 but predicted by both physicians as F4. Before LSM results communication, 25/142 (17.6%) were misclassified according to these definitions; after LSM results communication to senior physicians, 15/142 (10.5%) were then misclassified thus allowing a good prediction of liver fibrosis in 10/25 (40%) of these initially misclassified patients. Mean biopsy length was 16.2 ± 6.9 mm in these 15 misclassified patients versus 15.7 ± 7.7 mm in others (p = NS).
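The two definitions of misclassification given above, and the proportion of initially misclassified patients who were correctly reclassified, can be written compactly. The sketch below is only an illustration: the function names are hypothetical, stages are coded 1 to 4 for F1 to F4, and the second function assumes, as the text does, that the patients misclassified after LSM communication are a subset of those misclassified before.

```python
def misclassified_by_both(histology, pred_a, pred_b, threshold):
    """True when both physicians place the patient on the wrong side of the threshold.

    threshold -- 2 for significant fibrosis (>=F2), 4 for cirrhosis (F4)
    """
    truth = histology >= threshold
    return (pred_a >= threshold) != truth and (pred_b >= threshold) != truth


def corrected_fraction(n_before, n_after):
    """Share of initially misclassified patients correctly classified once LSM is known."""
    if n_before <= 0 or not 0 <= n_after <= n_before:
        raise ValueError("expected 0 <= n_after <= n_before and n_before > 0")
    return (n_before - n_after) / n_before


if __name__ == "__main__":
    # Counts reported in the text for the two senior physicians.
    print(corrected_fraction(32, 16))  # significant fibrosis: 16/32
    print(corrected_fraction(25, 15))  # cirrhosis: 10/25

    # Hypothetical patient: histology F4, predicted F3 and F2 by the two seniors.
    print(misclassified_by_both(4, 3, 2, threshold=4))
```

Requiring both physicians to be wrong makes this definition stricter than a per-physician count, which is consistent with the larger numbers reported for senior 2 alone.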
Using the same definitions, we also studied these cases according to the predictions of senior 2 alone. Before LSM results communication, 46/142 (32.4%) were misclassified for the diagnosis of significant fibrosis; after LSM results communication to senior 2, 26/142 (18.3%) were then misclassified, thus allowing a good prediction of liver fibrosis in 20/46 (43%) of these initially misclassified patients. Mean biopsy length was 15.5 ± 6.3 mm in these 26 misclassified patients versus 15.9 ± 7.9 mm in others (p = NS).
Before LSM results communication, 24/142 (16.9%) were misclassified for the diagnosis of cirrhosis; after LSM results communication to senior 2, 16/142 (11.2%) were then misclassified, thus allowing a good prediction of liver fibrosis in 8/24 (33%) of these initially misclassified patients. Mean biopsy length was 15.8 ± 6.9 mm in these 16 misclassified patients versus 15.8 ± 7.8 mm in others.
Finally, data concerning the 15 patients misclassified for the diagnosis of cirrhosis, relative to the histological METAVIR score, by both seniors' predictions after LSM results communication were individually collected and communicated to a fifth independent hepatologist, considered as referee (Table 5). Independent signs of cirrhosis displayed by ultrasonography (US) or endoscopy examination were also provided. This referee was asked to decide in each case whether discrepancies were apparently because of false bioclinical evaluations using LSM by senior physicians or possibly because of LB sampling error (using thresholds defined by Bedossa et al. (6), i.e., 35% misstaged METAVIR score when LB <15 mm, 25% misstaged METAVIR score when LB <25 mm). Among these 15 cases, 6/15 (40%) patients had LB <15 mm and 12/15 (80%) patients had LB <25 mm.
In two cases (cases 1 and 3), the diagnosis of cirrhosis was predicted by at least one of the senior physicians although the METAVIR score was < F4. In case 1, LSM values were high in a patient with a long biopsy sample, possibly as a result of perisinusoidal fibrosis. In case 3, the prediction of cirrhosis by seniors was mainly influenced by the existence of thrombocytopenia in a patient with a short LB sample. In all other cases, the diagnosis of cirrhosis according to the METAVIR score was "missed" by both senior physicians. Both of them were mainly influenced by the results of transient elastography, as LSM values were <14.5 kPa in all but one (case 2) of these situations. Therefore, the thresholds of LB length according to Bedossa et al. (6) were the main criteria to still consider these patients as "misclassified."
Overall, when all these data were taken into account case by case, the fifth independent physician decided that 8/15 (53%) of the so-called misclassified patients according to METAVIR score were probably well-staged for the diagnosis of cirrhosis by senior physicians using bioclinical information and LSM results.
The same individual data analysis was performed for the 16 patients misclassified for the diagnosis of significant fibrosis (METAVIR score ≥F2) (Table 6). Senior physicians were strongly, although not exclusively, influenced by LSM values. When major discrepancies were observed, biopsy length was the main criterion used by the fifth independent physician to decide whether clinical or histological assessment of liver fibrosis was the more reliable. Finally, this physician decided that 9/16 (56%) patients were well-staged by senior physicians for the diagnosis of significant liver fibrosis.
INTRODUCTION
Liver biopsy (LB) is still considered to be the gold standard for the assessment of liver fibrosis in patients with chronic hepatitis C (CHC) (1). Moreover, indications for therapeutic management in patients with significant liver fibrosis or screening for hepatocellular carcinoma (HCC) in patients with cirrhosis are based on the staging of fibrosis using various scoring systems (2). Besides its poor acceptability, major issues have been raised over the past few years regarding the diagnostic accuracy and performance of LB for liver fibrosis evaluation (3). Limits regarding the reliability of this technique have been discussed in terms of intra- and interobserver discrepancies or sampling errors leading to fibrosis misstaging (4-8). In clinical practice, the mean length of liver samples obtained by biopsy is usually less than the 25 mm suggested to be the minimum value for a reliable assessment of fibrosis (6). In parallel, noninvasive assessment of liver fibrosis has been developed (9-11). Many of these new diagnostic tools have been evaluated in CHC cohorts and fair correlations with liver fibrosis according to the METAVIR score have been described (12-15). These indexes rely on numerous blood parameters that are not directly linked to fibrosis but that take into account its consequences on the liver parenchyma, such as portal hypertension or impaired liver function. These indexes perform almost equally even though they use a large panel of different blood tests such as bilirubin, platelet count, prothrombin time, haptoglobin, and many more. More recently, the measurement of liver elasticity (stiffness) by transient elastography (FIBROSCAN, Echosens, Paris, France) has been proposed to assess liver fibrosis (16). This new parameter is easy to record at bedside, reproducible (Cales, personal communication), and seems to be correlated to the area of liver fibrosis. Tested in patients with CHC, its performance seemed at least equal to that of blood tests (17).
Published studies of liver stiffness measurement (LSM) have all correlated LSM with liver fibrosis stage as determined by the METAVIR fibrosis scoring system (17-19). Thus, optimal threshold values of LSM for the diagnosis of significant liver fibrosis or cirrhosis according to the METAVIR score have been determined. Although prediction of liver fibrosis by physicians using simple bioclinical data has already been reported (20, 21), the additional usefulness of LSM in clinical practice as well as the validity of the previously reported threshold values have not yet been studied. The purpose of our study was to estimate the additional predictive value of LSM by FIBROSCAN when added to physicians' subjective assessment of fibrosis based on epidemiological, clinical, and biological parameters.
DISCUSSION
Our study was designed to be as close as possible to clinical practice, and unselected patients were consecutively enrolled. This design highlights the fact that a large majority of patients had a biopsy sample less than 25 mm in length, reinforcing concerns about the relevance of LB for fibrosis staging in clinical practice and also providing a possible explanation for the overall poorer correlation between LSM and histology than previously reported (17-19). A better correlation was shown in two pilot studies considering patients with larger biopsy samples. In one of them, an even closer correlation was shown when only a subgroup of patients with larger biopsies was considered. The validity of LB as a gold standard obviously depends on the length of the biopsy sample, which must be considered suboptimal in our study as in common practice.
Despite this limitation, our study demonstrates two facts: 1) the subjective assessment of liver fibrosis by clinicians according to epidemiological and clinical data is variable and influenced, although not exclusively, by previous experience, and 2) the use of LSM slightly improves the diagnostic performance of clinicians, particularly for the diagnosis of cirrhosis.
The performance of the subjective assessment of liver fibrosis by the clinicians was dependent on two factors: the clinician and the degree of fibrosis.
As a whole, experienced physicians performed better and were closer to each other in their estimation. On the other hand, "junior" clinicians performed very differently, one of them being in the range of performances of seniors.
Not surprisingly, the distinction between F0F1/F2F3F4 was most difficult for all physicians, as it is also for pathologists; furthermore, in this study, LSM values were not significantly different among scores F1, F2, and F3, thus limiting the additional value of transient elastography for the prediction of significant fibrosis by clinicians. Conversely, the diagnostic performances for cirrhosis were much better, with AUROC ranging from 0.76 to 0.86, values that are close to those reported for blood tests but slightly inferior to those reported using FIBROSCAN alone either in the present study or in previous works (17-19).
The most important point is that diagnostic performance was slightly improved by the use of FIBROSCAN for all clinicians, although not significantly for every one of them. Moreover, performance was nearly the same for all clinicians, whatever their experience, when LSM was provided, with AUROC ranging from 0.69 to 0.72 for F ≥2 and 0.87 to 0.90 for cirrhosis. This suggests that a clinician with limited experience and background is able to perform as well as the most experienced when provided with LSM values.
Finally, we focused on patients who were misclassified by experienced physicians for the diagnosis of significant fibrosis and cirrhosis, i.e., patients in whom important clinical management decisions have to be made. Results suggest that 30-50% of patients initially misclassified for such decisions by simple bioclinical evaluation could ultimately be well evaluated with the use of transient elastography. Furthermore, no confounding factors such as steatosis were identified as a cause of overestimation of liver fibrosis by transient elastography.
Again, the AUROC values in our study were inferior to previously published results (17-19), but this could be partly explained by the short length of liver samples. We did not compare in the present study the performances of blood indexes such as APRI with FIBROSCAN. We consider that blood indexes are either impractical, if a complex calculation must be done at bedside by the clinician, or poorly performing, principally for the diagnosis of cirrhosis.
The thickness of the chest wall is a limitation to the use of FIBROSCAN in obese patients. Recent studies in overweight patients suggest that obesity (BMI up to 40) is not a contraindication to measuring liver stiffness (de Ledinghen et al., personal communication). Obesity by itself is not necessarily a limiting factor but it could be hypothesized that if the failure rate of LSM is around 7% in France, higher values may be observed in countries where overweight is more prevalent, at least until special probes can be designed to overcome this limitation.
Transient elastography, if widely available, could probably provide the clinician with an instant value that will help him to refine his prediction of liver fibrosis. Finally, we recognize that the validation of this procedure must be done in another population of patients with larger biopsy samples in order to improve the diagnostic value of histological assessment.