Selection Committee Member for the Trial Outcomes Region:
Jamie P. Dwyer, MD
Dr. Dwyer is Professor of Medicine and Director of the Nephrology Clinical Trials Center at Vanderbilt University Medical Center in Nashville. His research entails the design, conduct, and analysis of large-scale randomized clinical trials, particularly in delaying the progression of CKD, detecting renal failure within non-renal trials, and testing interventions in ESRD. Follow him @jamie_dwyer.
Competitors for the Trial Outcomes Region
Doubling of Creatinine vs 40% Reduction in eGFR
Is it time for the “3 Ds” of nephrology studies (death, dialysis, doubling of creatinine) to go down, down, down? Gunning for that final D is a 40% decrease in eGFR: a relative newcomer to the surrogate marker game, but one that has generated a lot of excitement since being pulled off the bench.
First, let’s talk about surrogate markers in general. Even the venerable doubling of creatinine is just a surrogate marker for the things our patients really care about (like dialysis, death, and quality of life). Several properties make a good surrogate marker: a strong relationship with an important outcome, ease of measurement, and the expectation that an intervention will have a similar effect on the surrogate as on the ultimate outcome. Both of our contestants meet these criteria, but to varying degrees.
Doubling of Creatinine
First, a rundown on the elder statesman: doubling of creatinine. Clean, elegant, and easy to calculate in your head, doubling of creatinine has been a key CKD endpoint for decades, and the FDA recognizes it as such.
But there’s a problem with this venerable biomarker. In CKD, reaching a doubling of creatinine (which, through mathematical wizardry beyond the scope of this write-up, is equivalent to a 57% decrease in eGFR) takes a really long time. In fact, data from the Chronic Renal Insufficiency Cohort (where baseline eGFR is already rather low) show that only around 5% of participants per year experience a doubling of creatinine.
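For readers who want a peek behind the curtain, the 57% figure comes from the shape of the creatinine-based eGFR equation. This is a minimal sketch that assumes the CKD-EPI 2009 creatinine equation, with serum creatinine above the equation’s sex-specific knot both before and after doubling. In that range eGFR scales with creatinine raised to the power -1.209, and the age and sex multipliers cancel when a patient is compared with herself. The helper name `egfr_ratio` is ours, for illustration only.

```python
# Sketch (assumption): CKD-EPI 2009 creatinine equation, creatinine above the
# knot (0.7 mg/dL in women, 0.9 mg/dL in men), where eGFR ~ Scr ** -1.209.
# Age, sex, and race multipliers are the same before and after, so they cancel.
CKD_EPI_SCR_EXPONENT = -1.209


def egfr_ratio(creatinine_fold_change: float,
               exponent: float = CKD_EPI_SCR_EXPONENT) -> float:
    """Follow-up eGFR as a fraction of baseline eGFR for a given
    fold change in serum creatinine (hypothetical helper)."""
    return creatinine_fold_change ** exponent


pct_decline = (1 - egfr_ratio(2.0)) * 100
print(f"Doubling of creatinine = {pct_decline:.1f}% drop in eGFR")
# prints: Doubling of creatinine = 56.7% drop in eGFR
```

If you substitute the MDRD study equation’s exponent (-1.154), the drop comes out at about 55%. The exact figure therefore depends on which eGFR equation a trial uses.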
Long trials delay important results and, perhaps more importantly, cost a lot of money, so a marker that occurs earlier in the course of CKD would seem ideal. A 40% drop in eGFR is smaller than a 57% drop, so patients reach a 40% endpoint sooner.
But, if we follow this logic to its absurdist end, why not use 30%? 20%? Heck, why not 5%?
If CKD were characterized by the constant and inexorable loss of eGFR over time, these other endpoints would work quite well. Perhaps everyone who experiences a 5% reduction in eGFR will go on to experience all those other endpoints in time. Why bother measuring?
Well, if CKD were characterized by the constant and inexorable loss of eGFR over time then it would be a very boring disease. We know from first-hand experience that this isn’t the case, and it’s those heterogeneous cases that put the ball back in doubling-of-creatinine’s court.
Here we provide a few examples of eGFR over time, starting at 100% of baseline function.
In blue, we have a well-behaved patient with CKD who has a slow, plodding course of eGFR decline over time. Using a 40% threshold in this individual, or those like her, would be entirely reasonable, as they would progress to the doubling of creatinine threshold anyway.
But what about the gentleman in red? Perhaps he was treated with a novel agent that acutely reduces eGFR but preserves it over the long term. A great example of this pattern is the use of renin-angiotensin system inhibitors in diabetic nephropathy. Acute dippers like him may drop below the 40% threshold transiently, leading to incorrect conclusions about their ultimate outcome. In this example, doubling of creatinine is the better outcome.
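Here is a toy sketch that makes the blue-versus-red contrast concrete. The trajectories are hypothetical: eGFR is expressed as a percentage of baseline at yearly visits and does not come from real patients. The endpoint is also deliberately simplified to the first visit at or below the threshold, with no confirmation visit. The function name `first_crossing` is our own.

```python
from typing import Optional, Sequence


def first_crossing(egfr: Sequence[float], decline: float) -> Optional[int]:
    """Index of the first visit where eGFR is at least `decline` (e.g. 0.40)
    below the baseline (first) value, or None if that never happens."""
    cutoff = egfr[0] * (1 - decline)
    for visit, value in enumerate(egfr):
        if value <= cutoff:
            return visit
    return None


# Hypothetical yearly eGFR values, as % of baseline (illustration only).
slow_decliner = [100 - 6 * year for year in range(11)]        # "blue": 100, 94, ..., 40
acute_dipper = [100, 58, 62, 63, 63, 63, 63, 63, 63, 63, 63]  # "red": early dip, then stable

for name, trajectory in [("slow decliner", slow_decliner),
                         ("acute dipper", acute_dipper)]:
    hits = [first_crossing(trajectory, d) for d in (0.40, 0.57)]
    print(name, hits)
# slow decliner [7, 10]
# acute dipper [1, None]
```

Under the 40% rule, the red patient counts as an event in year 1 even though his eGFR then stabilizes. Under the 57% (doubling of creatinine) rule, he never becomes an event.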
40% Reduction in eGFR
As we move on to the newcomer in this matchup, we need to consider power. If you remember your introductory epidemiology class (and who doesn’t), you’ll recall that the “power” of a study is strongly related to the number of events expected to occur. Studies examining very rare events need to enroll a lot of people (at great cost). A 40% decline in eGFR occurs more often than a doubling of creatinine (about 10%-50% more often, depending on the study). More events = smaller trials, right?
Well…not exactly. Increasing the event rate increases power only if the agent’s effect on the new endpoint stays just as strong. Let’s put some numbers to it.
Let’s say I have a population of patients with a 5% rate of doubling of creatinine. I have a new drug that I believe will cut that rate in half, to 2.5%. I’ll need to enroll about 1,800 people to test my hypothesis. But what if the rate of a 40% decline in this population is 10%? If the drug works just as well on this endpoint, it will halve that rate to 5% per year. Then I only need to enroll 870 people: nearly 1,000 fewer, and quite a bit of savings!
But wait…what if the effect of my drug isn’t as dramatic on the 40% outcome? What if it reduces the rate of 40% loss of eGFR from 10% to just 7.5% (perhaps because this is a “noisier” outcome, which tends to bias effects toward the null)? Well, now I need to enroll 4,000 people to detect the effect. I have more events, but my power problem got worse!
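You can reproduce the arithmetic above with the textbook normal-approximation formula for comparing two proportions. This sketch rests on assumptions the text does not state: a two-sided alpha of 0.05, 80% power, 1:1 randomization, event rates treated as simple proportions, and no dropout. Under those assumptions it lands close to the quoted figures. The function name `total_sample_size` is hypothetical.

```python
import math
from statistics import NormalDist


def total_sample_size(p_control: float, p_treated: float,
                      alpha: float = 0.05, power: float = 0.80) -> int:
    """Total enrollment (both arms, 1:1) to compare two event proportions
    with a two-sided test, via the normal-approximation formula."""
    z_alpha = NormalDist().inv_cdf(1 - alpha / 2)
    z_beta = NormalDist().inv_cdf(power)
    p_bar = (p_control + p_treated) / 2
    numerator = (z_alpha * math.sqrt(2 * p_bar * (1 - p_bar))
                 + z_beta * math.sqrt(p_control * (1 - p_control)
                                      + p_treated * (1 - p_treated))) ** 2
    per_arm = math.ceil(numerator / (p_control - p_treated) ** 2)
    return 2 * per_arm


scenarios = {
    "doubling of creatinine, 5% -> 2.5%": (0.05, 0.025),
    "40% eGFR decline, 10% -> 5%": (0.10, 0.05),
    "40% eGFR decline, 10% -> 7.5%": (0.10, 0.075),
}
for label, (p1, p2) in scenarios.items():
    print(f"{label}: {total_sample_size(p1, p2)}")
# prints 1812, 870, and 4010 for the three scenarios
```

Halving a 10% event rate needs far fewer patients than halving a 5% rate. But shrinking the effect from a 50% to a 25% relative reduction more than quadruples the required enrollment, because the formula divides by the squared difference in event rates.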
One way to frame the contest between these 2 champions is increased power versus an increased type 1 error rate. The potential power gain of the 40% eGFR reduction endpoint is lost if using it leads us to false conclusions. A meta-analysis of 37 randomized trials in CKD suggested that, in most cases, a 40% endpoint would improve statistical power while only minimally inflating the type 1 error rate. Interestingly, this study found that a 30% eGFR reduction endpoint could cause a potentially unacceptable increase in type 1 error rates, particularly in studies where an effective drug causes a brief, acute eGFR decline.
With that data in hand, it seems that the smart money in this contest is on a 40% reduction in eGFR. But, as they say in Vegas, sometimes it’s double or nothing.
Proteinuria vs Patient-Reported Outcomes
This matchup pits the quantifiable against the ephemeral, the chemical against the emotional, the immunoturbidimetric assay versus the survey. Yes, proteinuria is up against patient-reported measures, and this one promises to really foam up the old toilet bowl.
Proteinuria has a storied history as a surrogate marker in kidney studies. Beyond any doubt, proteinuria is predictive of a bad outcome, as this heat map demonstrates.
But just because proteinuria portends a poor prognosis (say that 10 times fast) does not mean it is necessarily a good outcome for a clinical trial. After all, the literature is rife with examples of clinical trials where an intervention improves a surrogate outcome but is subsequently found to have no effect (or perhaps even a negative effect) on a clinically important outcome.
Data for proteinuria as a surrogate marker in the CKD space is mixed. A secondary analysis of the RENAAL study, which evaluated losartan in patients with diabetic nephropathy, found that reduction of proteinuria was a strong predictor of ultimate outcome (in this case, achieving one of the 3 Ds—death, dialysis, doubling of creatinine). Moreover, there was a dose-response effect: more protein reduction, better ultimate outcome. On the flip side, studies such as VA NEPHRON-D, ONTARGET, and ALTITUDE found that reduction in proteinuria was a poor surrogate for the hard kidney outcomes.
Proteinuria is an attractive contender in this matchup because it may lie along the causal pathway of kidney disease progression, given its apparent ability to promote inflammation and subsequent fibrosis.
But what do we mean when we say proteinuria? A brief look at recent studies suggests that standardization is seriously lacking. Should we use the protein-creatinine ratio, as did this trial of cyclosporine vs MMF in kids with FSGS? Or should we use a 24-hour collection, with all of its difficulties, as was done in this trial examining the effect of prednisone in patients with IgA nephropathy (modest benefit)?
Proteinuria is ephemeral and variable. Are we to trust a single measure of proteinuria at the end of a trial, or should we try to better capture real effects by measuring proteinuria at multiple time points? And if we do that, should we average the results over time, treat them all individually, or use some mathematical combination of the 2 strategies?
One population-based study suggests that proteinuria performs best as a surrogate when multiple values over time are considered. From a practical perspective, this is not an “easy” study design, nor are the results particularly easy to explain to patients and clinicians.
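To see how much the choice matters, here is a sketch of three ways to collapse repeated measurements into one number per patient. The values are hypothetical urine protein-creatinine ratios (UPCR, g/g) at 3-month visits, and the choice of summaries is ours. We compare the last value, a time-weighted mean (the trapezoidal area under the curve divided by follow-up time), and a geometric mean. The geometric mean is included because proteinuria is right-skewed and is often analyzed on a log scale. The other option raised above, modelling each measurement individually (for example, with a mixed model), is not shown.

```python
from statistics import fmean, geometric_mean

# Hypothetical UPCR values (g/g) for one patient; illustration only.
months = [0, 3, 6, 9, 12]
upcr = [2.0, 1.4, 1.1, 1.3, 0.9]


def time_weighted_mean(times, values):
    """Trapezoidal area under the curve divided by total follow-up time."""
    area = sum((t1 - t0) * (v0 + v1) / 2
               for t0, t1, v0, v1 in zip(times, times[1:], values, values[1:]))
    return area / (times[-1] - times[0])


summaries = {
    "last value": upcr[-1],
    "simple mean": fmean(upcr),
    "time-weighted mean": time_weighted_mean(months, upcr),
    "geometric mean": geometric_mean(upcr),
}
for name, value in summaries.items():
    print(f"{name}: {value:.2f} g/g")
```

Even for this single hypothetical patient, the summaries range from 0.9 to about 1.3 g/g. That spread is why the analytic choice needs to be settled before the data come in.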
Ironically, proteinuria, this most “quantifiable” of surrogate outcomes, is not so quantifiable after all. It quickly becomes obvious that proteinuria is like trust in the government: probably important, but damned difficult to measure.
Enter patient-reported outcome measures (PROMs) and patient-reported experience measures (PREMs). Remember when we used to ask patients how they felt, and based on their answer we would decide how well our treatment was working? That’s a PROM. I think we’d all like to go back there.
In contrast to proteinuria, PROMs are hard outcome measures. Quality of life is unequivocally important to patients and doctors, and interventions that meaningfully move the needle on quality of life are worthy even in the absence of data suggesting they improve other outcomes like survival.
But where proteinuria can be measured reproducibly in a lab, quality of life measures need to come from patients directly. Encouragingly, the test-retest reliability of the SF-36 (a more-or-less standard quality of life instrument) was quite good, at least over a 2-week time frame. Whatever the caveats, using PROMs as a clinical trial outcome is a compelling appeal to our shared humanity.
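Test-retest reliability is commonly summarized with an intraclass correlation coefficient (ICC). The text does not say which statistic the SF-36 study used, so treat this as a generic sketch. It computes the two-way random-effects, absolute-agreement, single-measure ICC (Shrout and Fleiss ICC(2,1)) from the usual ANOVA mean squares. The input is hypothetical scores from the same patients surveyed twice, and the function and variable names are our own.

```python
from statistics import fmean
from typing import Sequence


def icc_2_1(scores: Sequence[Sequence[float]]) -> float:
    """ICC(2,1) for n subjects (rows) each measured on k occasions (columns)."""
    n, k = len(scores), len(scores[0])
    grand = fmean(x for row in scores for x in row)
    row_means = [fmean(row) for row in scores]
    col_means = [fmean(row[j] for row in scores) for j in range(k)]

    # Two-way ANOVA sums of squares: subjects, occasions, residual error.
    ss_rows = k * sum((m - grand) ** 2 for m in row_means)
    ss_cols = n * sum((m - grand) ** 2 for m in col_means)
    ss_total = sum((x - grand) ** 2 for row in scores for x in row)
    ss_error = ss_total - ss_rows - ss_cols

    ms_rows = ss_rows / (n - 1)
    ms_cols = ss_cols / (k - 1)
    ms_error = ss_error / ((n - 1) * (k - 1))
    return (ms_rows - ms_error) / (
        ms_rows + (k - 1) * ms_error + k * (ms_cols - ms_error) / n)


# Hypothetical physical-functioning scores (0-100), surveyed 2 weeks apart.
test_retest = [[50, 55], [60, 65], [70, 75], [80, 85]]
print(round(icc_2_1(test_retest), 3))  # 0.93
```

In this example every patient scores 5 points higher on retest. The ICC stays below 1 even though the rank order is perfectly preserved, which is the intended behaviour of an absolute-agreement measure.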
The question is: which PROM should we use? There are many options. You have the Kidney Disease Quality of Life Short Form (an 80-item survey). You have the Dialysis Symptom Index (a 30-item measure for dialysis patients). There is the Quality of Life Index (a 68-item, more general questionnaire). A recent systematic review found a total of 20 kidney-specific quality of life instruments. While the analysis demonstrated that the KDQOL-36 had the best performance characteristics, very few studies have pitted one instrument against another in the same population.
In contrast to PROMs, patient-reported experience measures (PREMs) ask patients how they perceive their care. How long did they sit in our waiting room? How well did they understand their doctor’s recommendations? These factors teach us how to change our practice toward a goal of, perhaps, improving a patient’s quality of life. In that sense, PREMs are a sort of surrogate outcome for PROMs. An intervention (like playing nice music in the waiting room) may improve a patient’s experience of the waiting room, but it probably isn’t going to improve their mental or physical quality of life (at least once they leave).
There are only 2 PREM instruments validated for patients with kidney disease (both geared towards dialysis). While these surveys may help us improve the experience in our dialysis facilities, it remains unclear whether they will be effective levers to drive improved overall patient care.
In the end, we have 2 titans of clinical trial outcomes competing for the gold. And while only one can be the victor during NephMadness, either is a reasonable outcome for a clinical trial, provided it is pre-specified, of course.
– Post prepared by Perry Wilson. Follow him @methodsmanmd.
How to Claim CME
US-based physicians can earn 1.0 CME credit for reading this region. Please register/log in at the NKF PERC portal. Click on “Continue,” click on the “Trial Outcomes Region,” then click on “Continue” to access the evaluation. You’ll need to click on “Continue” again to complete the evaluation, after which you can claim 1.0 credit and print your certificate. The CME activity will expire on June 15th, 2018.