Doctors vs AI: New Microsoft AI System MAI-DxO Solves ‘Complex Diagnostic Challenges’ Better Than Clinicians

By Chaitanya Kohli

Microsoft’s Artificial Intelligence (AI) team has shared research suggesting that AI can outperform human doctors on some of the “most complex diagnostic challenges” in medicine, according to a blog post published by the tech giant.

Microsoft has developed an AI system that emulates a panel of expert physicians working through “intellectually demanding” medical cases taken from the New England Journal of Medicine (NEJM).

This system, called the Microsoft AI Diagnostic Orchestrator (MAI-DxO), solved more than 85% of the NEJM cases when paired with OpenAI’s o3 model. In contrast, doctors from the US and UK with 5 to 20 years of clinical experience, working without access to AI chatbots, textbooks, or colleagues, achieved a success rate of only 20%.

To make the comparison, Microsoft turned NEJM cases into interactive, stepwise diagnostic challenges in which an AI model or a human physician could ask questions and order tests before reaching a diagnosis.

Microsoft: AI Will Assist Doctors, Not Usurp Them

MAI-DxO achieved “higher diagnostic accuracy” at a lower overall testing cost than either the physicians or any individual foundation model. The foundation models tested included those from the GPT, Llama, Claude, Gemini, Grok, and DeepSeek families.

Notably, Microsoft said AI would play a complementary role in healthcare rather than take over the clinic, stressing that a doctor’s role extends well beyond diagnosing patients.

“While this technology [AI] is advancing rapidly, their [doctors’] clinical roles are much broader than simply making a diagnosis,” the blog post read.

“Clinical roles will, we believe, evolve with AI giving clinicians the ability to automate routine tasks, identify diseases earlier, personalise treatment plans, and potentially prevent some diseases altogether,” it added.

Microsoft also questioned how much AI systems’ strong results on medical examinations, such as the United States Medical Licensing Examination (USMLE), actually reveal about their clinical ability.

“In just three years, generative AI has advanced to the point of scoring near-perfect scores on the USMLE and similar exams. But these tests primarily rely on multiple-choice questions, which favour memorisation over deep understanding.

“By reducing medicine to one-shot answers on multiple-choice questions, such benchmarks overstate the apparent competence of AI systems and obscure their limitations,” the tech giant’s blog post read.

Why It Matters

Even as AI makes strides in healthcare, as Microsoft’s latest research illustrates, doubts remain about how well it will perform in real-world clinical settings.