Edward Ungvarsky doesn’t implicitly trust expert witnesses. As a criminal defense attorney, he often has clients who are evaluated by psychologists, who then present their findings in court. The results of the forensic psychology tests they administer can have a big effect on the outcome of a trial. Some determine whether someone should have custody of a child, or whether defendants can understand the legal system and assist in their own defense, the central question in deciding whether a person is competent to stand trial. Others determine whether a defendant was sane when they committed a crime, or whether they are eligible for the death penalty.

So Ungvarsky, who practices in Virginia, Maryland, and DC, tries to make sure the psychologists use the right tools to get at those questions. “I always ask what test they’re doing and why,” he says. For example, he’ll make sure that the exams were developed by testing populations similar to his clients, a way of trying to avoid racial or class biases that might be built into the test, and that they address the specific issues he’s facing in court.

But Ungvarsky was surprised by a study by a group of lawyers and psychologists, published in February in the journal Psychological Science in the Public Interest, which found that many tests used in courtrooms are considered scientifically unreliable; in the headline, the journal dubbed them “junk science.” “This study shows that the rest of us need to catch up fast,” says Ungvarsky, “because the stakes are high.”

The team, led by Tess Neal, a psychology professor at Arizona State University, reviewed 364 assessment exams commonly used in the legal system and concluded that a third of them were not “generally accepted in the field” of forensic mental health, meaning that psychologists who had reviewed those tools didn’t think they were scientifically reliable or accurate. About 10 percent of the tools had not been subjected to empirical testing at all. The study also showed that it was rare for lawyers to challenge the scientific validity of those tests in court, and even rarer for them to succeed, meaning evidence from questionable tests was used at trial.

“Each year, hundreds of thousands of psychological assessments are conducted and used in court to help judges make legal decisions that profoundly affect people’s lives,” Neal said when she presented the findings at this year’s American Association for the Advancement of Science meeting in February. “There are some excellent psychologists doing scientifically informed, evidence-based assessments. However, many others are not.”

As a clinical psychologist, Neal had wanted to work in courts from the start of her career, and had long believed that psychologists had the right tools to help the legal system answer crucial questions about factors like sanity and competency. But as she worked her way through her doctoral training, she started to have doubts. “I was always kind of questioning and worried about some of what people were doing,” she says.

In a survey she conducted during her graduate residency, Neal asked practicing forensic psychologists which tools they had used during their two most recent evaluations. The responses varied widely. In many cases, Neal and her coauthor, Thomas Grisso, had never heard of the tests psychologists mentioned. “This finding surprised, and worried, us both,” she says.

Those findings led to the current paper. Of the tests Neal and her coauthors reviewed, only 67 percent were considered “generally accepted,” meaning that clinical psychologists believed they were effective tools. As a measure of acceptance, Neal relied mainly on a resource called the Mental Measurements Yearbook (MMY), in which volunteer reviewers assess the scientific merits of tests. The MMY asks thousands of psychologists to research how the tests were created and what their technical characteristics are, and then to give their professional opinions of them. Of the tests commonly used in court, Neal found that one third hadn’t been reviewed at all in the yearbook or similar resources. Of those that had been reviewed, only 40 percent had favorable ratings, meaning that the reviewers concluded they had strong scientific underpinnings.