In short
The research on AI therapy is promising but early and mixed. Small randomized trials of chatbot tools like Woebot, real-world studies of Wysa, and a 2025 Dartmouth trial of Therabot suggest AI conversation can reduce symptoms of depression and anxiety for some people with mild to moderate symptoms. Systematic reviews reach cautiously positive conclusions while flagging short studies, small samples, and weak long-term data. At the same time, safety research on general-purpose chatbots shows they can respond unsafely in crisis scenarios. The honest summary is that AI therapy may help with mild symptoms, the evidence is still early, and it is not a replacement for a licensed clinician or a crisis service. If you are in crisis or thinking about suicide, call or text 988 in the US to reach the Suicide and Crisis Lifeline, available 24 hours a day.
What the research covers, and what it does not
When people ask whether there is evidence for AI therapy, it helps to be precise about what has actually been studied. Most of the strongest research looks at purpose-built mental-health chatbots that deliver structured techniques, usually drawn from cognitive behavioral therapy, rather than at general chatbots used informally for emotional support.
The published studies tend to measure short-term changes in symptoms of depression and anxiety, often over two to eight weeks, in people with mild to moderate symptoms. Far less is known about long-term outcomes, severe conditions, crisis situations, or how these tools perform outside a controlled study. Keeping that scope in mind is the key to reading the evidence honestly, and to understanding why researchers describe the field as promising but early. For a plain-language overview of the bottom line, see "Does AI therapy work?"
The Woebot trial (Fitzpatrick, 2017)
One of the most cited early studies is a randomized controlled trial of Woebot, a chatbot that delivers brief, conversational cognitive behavioral therapy. Published in 2017 in JMIR Mental Health by Fitzpatrick, Darcy, and Vierhile, it enrolled college students who reported symptoms of depression and anxiety and compared two weeks of Woebot conversations against an information-only control.
The study reported that participants who used the chatbot saw a reduction in depressive symptoms over the two weeks compared with the control group, and that engagement was high. It was an important proof of concept that a fully automated conversational agent could deliver self-help techniques people would actually use. The limits matter too: the trial was small, lasted only two weeks, and enrolled only college students, so the findings point to potential rather than proof for the general population.
Wysa and the wider chatbot studies
Wysa, another chatbot, draws on both cognitive behavioral therapy (CBT) and dialectical behavior therapy (DBT), and it has also been studied. A 2018 paper in JMIR mHealth and uHealth by Inkster, Sarda, and Subramanian used real-world app data to examine mood changes among people who used Wysa, and reported greater improvement in self-reported low mood among more engaged users than among less engaged users.
This kind of real-world analysis is useful because it reflects how people actually use an app, but it is not a randomized trial, so it cannot rule out that more motivated users simply improve more on their own. Across the broader set of chatbot studies, a recurring pattern appears: encouraging short-term signals for mild symptoms, paired with study designs that are modest in size and length. That is a reason for measured optimism, not strong claims.
The Dartmouth Therabot trial (2025)
The most rigorous recent study is a randomized controlled trial of Therabot, a generative AI therapy chatbot developed at Dartmouth. Published in 2025 in NEJM AI by Heinz and colleagues, the trial tested Therabot in adults with symptoms of depression or anxiety, or at elevated risk for an eating disorder, comparing the chatbot against a waitlist control over several weeks.
The researchers reported meaningful reductions in symptoms among participants who used Therabot relative to the control group, along with the observation that people formed a sense of working alliance with the tool. The trial is notable because it tested a generative model under careful clinical supervision rather than a scripted bot. The authors themselves frame it as an early and promising result that needs replication, larger and more diverse samples, and longer follow-up before anyone should treat AI therapy as established care.
Systematic reviews: the cautious consensus
When individual studies disagree or are small, systematic reviews help by pooling them. A 2020 systematic review in the Journal of Medical Internet Research (JMIR) by Abd-Alrazaq and colleagues examined the effectiveness and safety of mental-health chatbots across the available trials. Its broad conclusion was cautiously positive: chatbots showed potential to improve some mental-health outcomes, particularly for depression and distress, while the evidence base was limited by small samples, short durations, varied quality, and a shortage of long-term and safety data.
That consensus keeps recurring across reviews of this field. The technology shows real promise for mild symptoms and for engagement, but the research is still early and uneven. A responsible reading neither dismisses AI therapy nor oversells it. It treats these tools as a supportive, low-cost first step that is still being validated, not as a proven clinical treatment.
The safety findings researchers take seriously
Effectiveness is only half the picture. A separate and important line of research looks at safety, especially how chatbots respond when someone is in crisis. Work from Stanford researchers and others has shown that general-purpose large language models can respond inappropriately or unsafely to prompts involving suicide, self-harm, or severe distress, sometimes missing risk signals or reinforcing harmful thinking.
These findings are a major reason experts urge caution. A tool that helps with everyday stress is not automatically safe in an emergency, and general chatbots not built for mental health carry real risk when used that way. This is why every credible summary, including this one, repeats the same boundary: AI therapy may support people with mild symptoms, but it is not a crisis service and not a replacement for a licensed clinician. If you are in crisis, call or text 988 in the US.
Key takeaways
- The overall picture is promising but early and mixed: AI therapy shows real potential for mild to moderate symptoms, but the evidence base is still thin.
- The 2017 Woebot randomized trial (Fitzpatrick) found reduced depressive symptoms over two weeks, but the trial was small and short and enrolled only college students.
- Wysa real-world data (Inkster, 2018) linked higher engagement to greater mood improvement, though it was not a randomized trial.
- The 2025 Dartmouth Therabot randomized trial (Heinz, NEJM AI) reported meaningful symptom reductions, and its authors called for replication and longer follow-up.
- Systematic reviews (such as Abd-Alrazaq, 2020) reach a cautiously positive verdict while flagging small samples, short durations, and weak long-term data.
- Safety research shows general-purpose chatbots can respond unsafely in crisis scenarios, so AI therapy is not a crisis service or a substitute for a licensed clinician.
Frequently asked questions
Is there evidence for AI therapy?
Yes, but it is early and limited. Small randomized trials of chatbots like Woebot, real-world studies of Wysa, a 2025 Dartmouth trial of Therabot, and systematic reviews all point to potential benefits for mild to moderate symptoms of depression and anxiety. The evidence is promising rather than conclusive, because most studies are short, small, and focused on mild symptoms.
What does the research say about AI therapy?
The research suggests AI therapy chatbots can reduce symptoms of depression and anxiety for some people in the short term, especially when the tool delivers structured techniques like CBT. Systematic reviews describe the results as cautiously positive but limited by study size, length, and quality. Separate safety research warns that general-purpose chatbots can respond unsafely in crisis situations.
What did the Woebot study find?
The 2017 randomized controlled trial of Woebot, published in JMIR Mental Health by Fitzpatrick, Darcy, and Vierhile, found that college students who used the chatbot for two weeks reported a reduction in depressive symptoms compared with an information-only control group. It was an early proof of concept: the trial was small and short and enrolled only college students, so it points to potential rather than broad proof.
What was the Dartmouth Therabot trial?
It was a 2025 randomized controlled trial of Therabot, a generative AI therapy chatbot developed at Dartmouth, published in NEJM AI by Heinz and colleagues. The study reported meaningful reductions in symptoms of depression and anxiety among participants who used the tool compared with a waitlist control. The authors describe it as an early, promising result that needs replication, larger samples, and longer follow-up.
Are AI therapy studies reliable?
They are a reasonable starting point, but they share real limits. Many trials are small, run for only a few weeks, and enroll people with mild symptoms, which makes it hard to generalize. Some studies use real-world app data rather than randomized designs. The most rigorous evidence, like the Dartmouth Therabot trial, is recent and still awaiting replication, so confident claims are not yet warranted.
Do studies show AI therapy is safe?
Not unconditionally. Studies of purpose-built mental-health chatbots have not reported major harms among people with mild symptoms, but safety data from those studies are limited, and safety research on general-purpose chatbots has found they can respond inappropriately or unsafely to prompts about suicide, self-harm, or severe distress. Because of this, experts treat AI therapy as a supportive tool for mild symptoms, not a crisis service. In a crisis, call or text 988 in the US.
