AI Therapy Studies: What the Research Actually Shows

An honest look at the published studies on AI therapy, what they found, where the evidence is genuinely promising, and where researchers urge real caution.

Reviewed by Seph Fontane Pennock · 7 min read

Published July 6, 2026 · Last reviewed July 7, 2026

In short

The research on AI therapy is promising but early and mixed. Small randomized trials of chatbot tools like Woebot and Wysa, and a 2025 Dartmouth trial of Therabot, suggest AI conversation can reduce symptoms of depression and anxiety for some people with mild to moderate symptoms. Systematic reviews reach cautiously positive conclusions while flagging short studies, small samples, and weak long-term data. At the same time, safety research on general-purpose chatbots shows they can respond unsafely in crisis scenarios. The honest summary is that AI therapy may help with mild symptoms, the evidence is still early, and it is not a replacement for a licensed clinician or a crisis service.

What the research covers, and what it does not

Most of the strongest research on AI therapy looks at purpose-built mental-health chatbots that deliver structured techniques, usually drawn from cognitive behavioral therapy (CBT), rather than at general chatbots used informally for emotional support. That distinction runs through every application of AI in therapy and counseling.

The published studies tend to measure short-term changes in symptoms of depression and anxiety, often over two to eight weeks, in people with mild to moderate symptoms. Far less is known about long-term outcomes, severe conditions, crisis situations, or how these tools perform outside a controlled study. Keeping that scope in mind is the key to reading the evidence, and to understanding why researchers describe the field as encouraging but thin. For a plain-language overview of the bottom line, see our guide on whether AI therapy works.

I've followed this research since the first Woebot trial in 2017, and the pattern holds: the encouraging results come from purpose-built tools tested under clinical supervision, not from the general chatbots most people actually lean on. Keep that distinction in mind whenever a headline claims AI therapy works.

Seph Fontane Pennock, AI therapy expert

The Woebot trial (Fitzpatrick, 2017)

One of the most cited early studies is a randomized controlled trial of Woebot, a chatbot that delivers brief, conversational cognitive behavioral therapy. Published in 2017 in JMIR Mental Health by Fitzpatrick, Darcy, and Vierhile, it enrolled 70 college students who reported symptoms of depression and anxiety and compared two weeks of Woebot conversations against an information-only control.

The study reported that participants who used the chatbot saw a reduction in depressive symptoms over the two weeks compared with the control group, and that engagement was high. It was an important early demonstration that a fully automated conversational agent could deliver self-help techniques people would actually use. The limits matter too: the trial was small and short, and it enrolled only young adults, so the findings point to potential rather than proof for the general population.

Wysa and the wider chatbot studies

Wysa, another chatbot based on CBT and dialectical behavior therapy (DBT), has also been studied. A 2018 paper in JMIR mHealth and uHealth by Inkster, Sarda, and Subramanian used real-world app data to examine mood changes among people who used Wysa, and reported greater improvement in self-reported low mood among more engaged users than among less engaged users.

This kind of real-world analysis is useful because it reflects how people actually use an app, but it is not a randomized trial, so it cannot rule out that more motivated users simply improve more on their own. The engagement pattern also echoes what users describe in AI therapist reviews. Across the broader set of chatbot studies, a recurring pattern appears: encouraging short-term signals for mild symptoms, paired with study designs that are modest in size and length. That pattern supports measured optimism.

The Dartmouth Therabot trial (2025)

The most rigorous recent study is a randomized controlled trial of Therabot, a generative AI therapy chatbot developed at Dartmouth. Published in 2025 in NEJM AI, with Heinz and colleagues among the authors, the trial tested Therabot in adults with symptoms of depression or anxiety, or at high risk for an eating disorder, comparing the chatbot against a waitlist control over several weeks.

The researchers reported significant reductions in depression and anxiety symptoms among participants who used Therabot relative to the waitlist group, and observed that people formed a sense of working alliance with the tool. The trial is notable because it tested a generative model under careful clinical supervision rather than a scripted bot. The authors themselves frame it as an early and promising result that needs replication, larger and more diverse samples, and longer follow-up before anyone should treat AI therapy as established care.

Systematic reviews: the cautious consensus

When individual studies disagree or are small, systematic reviews help by pooling them. A 2020 systematic review in JMIR by Abd-Alrazaq and colleagues pooled the available randomized trials of mental-health chatbots to examine their effectiveness and safety. Its broad conclusion was cautiously positive: chatbots showed potential to improve some mental-health outcomes, particularly for depression and distress, while the evidence base was limited by small samples, short durations, varied quality, and a shortage of long-term and safety data.

That consensus keeps recurring across reviews of this field. The technology shows real promise for mild symptoms and for engagement, and the research is genuinely early and uneven. A responsible reading treats these tools as a supportive, low-cost first step that is still being validated. It calls for a balanced weighing of the pros and cons of AI therapy rather than a verdict.

The safety findings researchers take seriously

Effectiveness is only half the picture. A separate and important line of research looks at safety, especially how chatbots respond when someone is in crisis. Work from Stanford researchers and others has shown that general-purpose large language models can respond inappropriately or unsafely to prompts involving suicide, self-harm, or severe distress, sometimes missing risk signals or reinforcing harmful thinking.

These findings are a major reason experts urge caution, and why researchers now publish best practices for AI chatbots in therapy. A tool that helps with everyday stress is not automatically safe in an emergency, and general chatbots not built for mental health carry real risk when used that way. This is why every credible summary repeats the same boundary: AI therapy may support people with mild symptoms, but it is not a crisis service and not a replacement for a licensed clinician. If you are in crisis, call or text 988 in the US.

[Figure: The AI Therapy Evidence Timeline, 2017-2025]

Key takeaways

The overall picture is promising but early and mixed: AI therapy shows real potential for mild to moderate symptoms, on a still-thin evidence base.
The 2017 Woebot randomized trial (Fitzpatrick) found reduced depressive symptoms over two weeks, but the trial was small and short and enrolled only young adults.
Wysa real-world data (Inkster, 2018) linked higher engagement to greater mood improvement, though it was not a randomized trial.
The 2025 Dartmouth Therabot randomized trial (Heinz, NEJM AI) reported meaningful symptom reductions and called for replication and longer follow-up.
Systematic reviews (such as Abd-Alrazaq, 2020) reach a cautiously positive verdict while flagging small samples, short durations, and weak long-term data.
Safety research shows general-purpose chatbots can respond unsafely in crisis scenarios, so AI therapy is not a crisis service or a substitute for a licensed clinician.

Frequently asked questions

Is there evidence for AI therapy?

Yes, but it is early and limited. Small randomized trials of chatbots like Woebot, real-world studies of Wysa, a 2025 Dartmouth trial of Therabot, and systematic reviews all point to potential benefits for mild to moderate symptoms of depression and anxiety. The evidence is promising rather than conclusive, because most studies are short, small, and focused on mild symptoms.

What does the research say about AI therapy?

The research suggests AI therapy chatbots can reduce symptoms of depression and anxiety for some people in the short term, especially when the tool delivers structured techniques like CBT. Systematic reviews describe the results as cautiously positive but limited by study size, length, and quality. Separate safety research warns that general-purpose chatbots can respond unsafely in crisis situations.

What did the Woebot study find?

The 2017 randomized controlled trial of Woebot, published in JMIR Mental Health by Fitzpatrick, Darcy, and Vierhile, found that college students who used the chatbot for two weeks reported a reduction in depressive symptoms compared with an information-only control group. It was an early proof of concept from a small, short trial of young adults, so it points to potential rather than broad proof.

What was the Dartmouth Therabot trial?

It was a 2025 randomized controlled trial of Therabot, a generative AI therapy chatbot developed at Dartmouth, published in NEJM AI with Heinz among the authors. The study reported meaningful reductions in symptoms of depression and anxiety among participants who used the tool compared with a waitlist control. The authors describe it as an early, promising result that needs replication, larger samples, and longer follow-up.

Are AI therapy studies reliable?

They are a reasonable starting point, but they share real limits. Many trials are small, run for only a few weeks, and enroll people with mild symptoms, which makes it hard to generalize. Some studies use real-world app data rather than randomized designs. The most rigorous evidence, like the Dartmouth Therabot trial, is recent and still awaiting replication, so confident claims are not yet warranted.

Do studies show AI therapy is safe?

Not unconditionally. Studies of purpose-built mental-health chatbots have not shown major harms for mild symptoms, but safety research on general-purpose chatbots has found they can respond inappropriately or unsafely to prompts about suicide, self-harm, or severe distress. Because of this, experts treat AI therapy as a supportive tool for mild symptoms, not a crisis service. In a crisis, call or text 988 in the US.

Reviewed by Seph Fontane Pennock
Reviewed to ensure studies are described accurately and not overstated.

Important: This article is educational information about AI mental-health tools, not a substitute for professional care or a diagnosis. AI tools are not crisis services. If you are struggling, reach out to a licensed mental-health professional. In an emergency, call your local emergency number or, in the US, call or text 988.