A Reality Check On Artificial Intelligence: Are Health Care Claims Overblown?

As often happens when the tech industry gets involved, hype surrounds claims that artificial intelligence will help patients and even replace some doctors.

Health products powered by artificial intelligence, or AI, are streaming into our lives, from virtual care apps to wearable sensors.

One company boasted that its AI could outperform doctors at diagnosis. Others say computer systems that read medical images will make radiologists obsolete.

“There’s nothing that I’ve seen in my 30-plus years studying medicine that could be as impactful and transformative” as AI, said Dr. Eric Topol, a cardiologist and executive vice president of Scripps Research in La Jolla, Calif. AI can help doctors interpret medical images and could potentially take over many mundane medical chores, freeing doctors to spend more time talking to patients, Topol said.

Even the Food and Drug Administration, which has approved a number of AI products in the past five years, has hailed the potential of digital health.

Yet many health industry experts fear that AI-based products won’t live up to the hype. Many doctors and consumer advocates fear that the tech industry, which lives by the mantra “fail fast and fix it later,” is putting patients at risk, and that regulators aren’t doing enough to keep consumers safe.

Early experiments in AI provide a reason for caution, said Mildred Cho, a professor of pediatrics at Stanford鈥檚 Center for Biomedical Ethics.

Systems developed in one hospital often flop when deployed in a different facility, Cho said. Software used in patient care has been shown to discriminate against minorities. And AI systems sometimes learn to make predictions based on factors that have less to do with disease than with incidental details of how and where the data were collected. In one case, AI software incorrectly concluded that people with pneumonia were less likely to die if they also had asthma, an error that could have led doctors to deprive asthma patients of the extra care they need.
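The pneumonia error is an example of a model learning a confounded signal. Here is a minimal sketch, using made-up numbers rather than the actual study’s data: because asthma patients with pneumonia were historically routed to more aggressive care, they died less often in the records, so a model trained on those records concludes that asthma “protects” patients.

```python
# Hypothetical simulation (invented rates, not the real study's data).
# Asthma patients received extra care, so their recorded death rate is
# LOWER, and a naive model learns asthma as a protective factor.
import numpy as np
from sklearn.linear_model import LogisticRegression

rng = np.random.default_rng(0)
n = 10_000
asthma = rng.random(n) < 0.2            # assume 20% of pneumonia patients have asthma
# Assumed observed death rates: 5% with asthma (extra care), 12% without.
death_prob = np.where(asthma, 0.05, 0.12)
died = rng.random(n) < death_prob

model = LogisticRegression().fit(asthma.reshape(-1, 1), died)
print(model.coef_[0][0])  # negative: the model treats asthma as lowering risk
```

The model faithfully reproduces the historical data; the danger is that the pattern reflects how care was delivered, not the underlying disease.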

“It’s only a matter of time before something like this leads to a serious health problem,” said Dr. Steven Nissen, chairman of cardiology at the Cleveland Clinic.

Medical AI, which drew substantial venture capital funding in the third quarter alone, is “nearly at the peak of inflated expectations,” concluded a July report from a research firm. “As the reality gets tested, there will likely be a rough slide into the trough of disillusionment.”

That reality check could come in the form of disappointing results when AI products are ushered into the real world. Even Topol, the author of “Deep Medicine: How Artificial Intelligence Can Make Healthcare Human Again,” acknowledges that many AI products are little more than hot air. “It’s a mixed bag,” he said.

Experts such as Dr. Bob Kocher, a partner at the venture capital firm Venrock, are blunter. “Most AI products have little evidence to support them,” Kocher said. Some risks won’t become apparent until an AI system has been used by large numbers of patients. “We’re going to keep discovering a whole bunch of risks and unintended consequences of using AI on medical data,” Kocher said.

None of the AI products sold in the U.S. have been tested in randomized clinical trials, the strongest source of medical evidence, Topol said. The first and only randomized trial of an AI system, which found that colonoscopy with computer-aided diagnosis detected more small polyps than standard colonoscopy, was only recently published online.

Few tech startups publish their research in peer-reviewed journals, which allow other scientists to scrutinize their work, according to an article in the European Journal of Clinical Investigation. Such “stealth research,” described only in press releases or promotional events, often overstates a company’s accomplishments.

And although software developers may boast about the accuracy of their AI devices, experts note that AI models are mostly tested on computers, not in hospitals or other medical facilities. Using unproven software “may make patients into unwitting guinea pigs,” said Dr. Ron Li, medical informatics director for AI clinical integration at Stanford Health Care.

AI systems that learn to recognize patterns in data are often described as “black boxes” because even their developers don’t know how they have reached their conclusions. Given that AI is so new, and many of its risks unknown, the field needs careful oversight, said Pilar Ossorio, a professor of law and bioethics at the University of Wisconsin-Madison.

Yet the majority of AI devices don鈥檛 require FDA approval.

“None of the companies that I have invested in are covered by the FDA regulations,” Kocher said.

Legislation passed by Congress in 2016, championed by the tech industry, exempts many types of medical software from federal review, including certain fitness apps, electronic health records and tools that help doctors make medical decisions.

There’s been little research on whether the 320,000 medical apps now in use actually improve health, according to a report on AI published Dec. 17.

“Almost none of the [AI] stuff marketed to patients really works,” said one professor of medical ethics and health policy in the Perelman School of Medicine at the University of Pennsylvania.

The FDA has long focused its attention on devices that pose the greatest threat to patients. And consumer advocates acknowledge that some devices, such as ones that help people count their daily steps, need less scrutiny than ones that diagnose or treat disease.

Some software developers don’t bother to apply for FDA clearance or authorization, even when legally required.

Industry analysts say that AI developers have little interest in conducting expensive and time-consuming trials. “It’s not the main concern of these firms to submit themselves to rigorous evaluation that would be published in a peer-reviewed journal,” said Joachim Roski, a principal at Booz Allen Hamilton, a technology consulting firm. “That’s not how the U.S. economy works.”

But Oren Etzioni, chief executive officer at the Allen Institute for AI in Seattle, said AI developers have a financial incentive to make sure their medical products are safe.

“If failing fast means a whole bunch of people will die, I don’t think we want to fail fast,” Etzioni said. “Nobody is going to be happy, including investors, if people die or are severely hurt.”

Relaxing Standards At The FDA

The FDA has come under fire in recent years for allowing the sale of dangerous medical devices, which investigators have linked to patient injuries and deaths over the past decade.

Many of these devices were cleared for use through a controversial process called the 510(k) pathway, which allows companies to market “moderate-risk” products with no clinical testing as long as they’re deemed similar to existing devices.

In 2011, an expert committee concluded that the 510(k) process is so fundamentally flawed that the FDA should throw it out and start over.

Instead, the FDA is using the process to greenlight AI devices.

The FDA, headquartered just outside Washington, D.C., has long focused its attention on devices that pose the greatest threat to patients. (Al Drago/CQ Roll Call via AP Images)

Of the 14 AI products authorized by the FDA in 2017 and 2018, 11 were cleared through the 510(k) process, according to one study. None of these appear to have had new clinical testing, the study said. The FDA cleared an AI tool designed to help diagnose liver and lung cancer in 2018 based on its similarity to imaging software approved 20 years earlier. That software had itself been cleared because it was deemed “substantially equivalent” to still older products.

AI products cleared by the FDA today are largely “locked,” so that their calculations and results will not change after they enter the market, said Bakul Patel, director for digital health at the FDA’s Center for Devices and Radiological Health. The FDA has not yet authorized “unlocked” AI devices, whose results could vary from month to month in ways that developers cannot predict.

To deal with the flood of AI products, the FDA is testing a radically different approach to digital device regulation, focusing on evaluating companies, not products.

The FDA’s pilot precertification program, launched in 2017, is designed to “reduce the time and cost of market entry for software developers,” imposing the “least burdensome” system possible. FDA officials say they want to keep pace with AI software developers, who update their products much more frequently than makers of traditional devices, such as X-ray machines.

Scott Gottlieb said in 2017, while he was FDA commissioner, that government regulators need to make sure their approach to innovative products “is efficient and that it fosters, not impedes, innovation.”

Under the plan, the FDA would pre-certify companies that “demonstrate a culture of quality and organizational excellence,” which would allow them to provide less upfront data about their devices.

Pre-certified companies could then release devices with a “streamlined” review, or no FDA review at all. Once products are on the market, companies will be responsible for monitoring their own products’ safety and reporting back to the FDA. Nine companies have been selected for the pilot: Apple, FitBit, Samsung, Johnson & Johnson, Pear Therapeutics, Phosphorus, Roche, Tidepool and Verily Life Sciences.

High-risk products will still get a comprehensive FDA evaluation. “We definitely don’t want patients to be hurt,” said Patel, who noted that devices cleared through pre-certification can be recalled if needed. “There are a lot of guardrails still in place.”

But research shows that even devices cleared as low or moderate risk have been recalled due to serious risks to patients, said Diana Zuckerman, president of the National Center for Health Research. “People could be harmed because something wasn’t required to be proven accurate or safe before it is widely used.”

Johnson & Johnson, for example, has recalled widely used medical devices in recent years.

In a letter to the FDA, the American Medical Association and others questioned the wisdom of allowing companies to monitor their own performance and product safety.

“The honor system is not a regulatory regime,” said Dr. Jesse Ehrenfeld, who chairs the physician group’s board of trustees.

In an October letter to the FDA, Sens. Elizabeth Warren (D-Mass.), Tina Smith (D-Minn.) and Patty Murray (D-Wash.) questioned the agency’s ability to ensure company safety reports are “accurate, timely and based on all available information.”

When Good Algorithms Go Bad

Some AI devices are more carefully tested than others.

An AI screening tool for diabetic eye disease was studied in 900 patients at 10 primary care offices before being approved in 2018. The manufacturer, IDx Technologies, worked with the FDA for eight years to get the product right, said Dr. Michael Abramoff, the company’s founder and executive chairman.

The test, sold as IDx-DR, screens patients for diabetic retinopathy, a leading cause of blindness, and refers high-risk patients to eye specialists, who make a definitive diagnosis.

IDx-DR is the first “autonomous” AI product, one that can make a screening decision without a doctor. The company is now installing it in primary care clinics and grocery stores, where it can be operated by employees with a high school diploma. Abramoff’s company has taken the unusual step of buying liability insurance to cover any patient injuries.

Yet some AI-based innovations intended to improve care have had the opposite effect.

A Canadian company, for example, developed AI to predict a person’s risk of Alzheimer’s based on their speech. Predictions were more accurate for some patients than others. “Difficulty finding the right word may be due to unfamiliarity with English, rather than to cognitive impairment,” said co-author Frank Rudzicz, an associate professor of computer science at the University of Toronto.

Doctors at Mount Sinai hoped AI could help them use chest X-rays to predict which patients were at high risk of pneumonia. Although the system made accurate predictions from X-rays shot at Mount Sinai, the technology flopped when tested on images taken at other hospitals. Eventually, researchers realized the computer had merely learned to tell the difference between that hospital’s portable chest X-rays, taken at a patient’s bedside, and those taken in the radiology department. Doctors tend to use portable chest X-rays for patients too sick to leave their room, so it’s not surprising that these patients had a greater risk of lung infection.
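The Mount Sinai failure is a textbook case of a model exploiting a shortcut that doesn’t travel. A rough sketch with invented numbers: if sick patients at the training hospital are mostly scanned on one kind of machine, a “which machine took this image” signal looks predictive there but collapses at a hospital with different scanning habits.

```python
# Hypothetical illustration of shortcut learning (all rates invented).
# At the training hospital, being sick strongly predicts a portable scan;
# at another hospital, that link is absent, so the model's accuracy drops.
import numpy as np
from sklearn.linear_model import LogisticRegression

rng = np.random.default_rng(1)

def make_hospital(n, p_portable_given_sick):
    sick = rng.random(n) < 0.3                  # assumed 30% disease rate
    portable = rng.random(n) < np.where(sick, p_portable_given_sick, 0.1)
    noise = rng.normal(size=n)                  # stand-in for weak true signal
    X = np.column_stack([portable, noise])
    return X, sick

X_train, y_train = make_hospital(5000, 0.9)     # strong sick -> portable link
X_other, y_other = make_hospital(5000, 0.1)     # link absent elsewhere

model = LogisticRegression().fit(X_train, y_train)
print(model.score(X_train, y_train))            # looks strong at home
print(model.score(X_other, y_other))            # flops at the other hospital
```

The model is rewarded for learning the scanner, not the disease, which is exactly why systems developed in one hospital can flop in another.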

DeepMind, a company owned by Google, has created an AI-based mobile app that can predict which hospitalized patients will develop acute kidney injury up to 48 hours in advance. A blog post on the company’s website described the system, used at a London hospital, as a “game changer.” But the AI system also produced a high rate of false alarms alongside its correct results, according to a subsequent analysis. That may explain why patients’ kidney function didn’t measurably improve, said Dr. Saurabh Jha, associate professor of radiology at the Hospital of the University of Pennsylvania. Any benefit from early detection of serious kidney problems may have been diluted by a high rate of “overdiagnosis,” in which the AI system flagged borderline kidney issues that didn’t need treatment, Jha said. Google had no comment in response to Jha’s conclusions.

False positives can harm patients by prompting doctors to order unnecessary tests or withhold recommended treatments, Jha said. For example, a doctor worried about a patient’s kidneys might stop prescribing ibuprofen, a generally safe pain reliever that poses a small risk to kidney function, in favor of an opioid, which carries a serious risk of addiction.
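The false-alarm problem is largely base-rate arithmetic. Since the actual alarm rates aren’t given here, the numbers below are assumptions for illustration only: when the condition being flagged is rare, even a sensitive alert with a modest false-positive rate fires more often for healthy patients than for sick ones, simply because healthy patients vastly outnumber them.

```python
# Back-of-envelope illustration with assumed rates (not DeepMind's figures).
prevalence = 0.02        # assume 2% of monitored patients deteriorate
sensitivity = 0.90       # assume the alert catches 90% of real cases
false_pos_rate = 0.05    # assume it also fires for 5% of healthy patients

patients = 10_000
true_alerts = prevalence * patients * sensitivity            # 180 real catches
false_alerts = (1 - prevalence) * patients * false_pos_rate  # 490 false alarms
print(false_alerts / true_alerts)  # well over 2 false alarms per correct one
```

Under these assumptions the system is “90% sensitive” yet most of its alerts are wrong, which is how a promising detector can end up burying clinicians in noise.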

As these studies show, software with impressive results in a computer lab can founder when tested in real time, Stanford’s Cho said. That’s because diseases are more complex, and the health care system far more dysfunctional, than many computer scientists anticipate.

Many AI developers cull data from electronic health records because they hold huge amounts of detailed data, Cho said. But those developers often aren’t aware that they’re building atop a deeply broken system. Electronic health records were developed for billing, not patient care, and are filled with mistakes or missing data.

A KHN investigation published in March found sometimes life-threatening errors in patients’ medication lists, lab tests and allergies.

In view of the risks involved, doctors need to step in to protect their patients’ interests, said Dr. Vikas Saini, a cardiologist and president of the nonprofit Lown Institute, which advocates for wider access to health care.

“While it is the job of entrepreneurs to think big and take risks,” Saini said, “it is the job of doctors to protect their patients.”
