Is Natural Language Processing Ready to Take on Legal Hearings?

Date: May 03, 2021
Topics: Natural Language Processing, Machine Learning

AI can help us read tens of thousands of case records within minutes, but some key Natural Language Processing challenges remain.

Every year, California holds thousands of parole hearings for eligible prisoners. At the epicenter of America’s mass incarceration crisis, the decision about whether to release a prisoner who has served the minimum required sentence comes down to two people: a parole commissioner and a deputy. During a three-hour hearing, they review the life story of a parole candidate, briefly deliberate, and then decide whether or not to grant parole. If they deny parole, the candidate must wait up to 15 years to re-appear before the Board of Parole Hearings.

In each of those hearings, a 150-page transcript of the entire conversation is produced for the government and public to review. And most likely, that transcript will never be read. In 2019 alone, the California Board of Parole Hearings held 6,061 hearings and granted parole in 1,181 cases. For a process of this scale, there isn’t much time to review cases to ensure consistency across parole decisions. The governor’s office and parole review unit are tasked with checking parole decisions, but they lack the resources to read every transcript, so as a matter of practicality, they generally only read transcripts for parole approvals. If parole is denied, unless an appellate attorney or another influential stakeholder pushes for a review, the transcript is usually just archived.

Machine learning opens the opportunity to devise a new approach: What if we could “read” thousands of hearing transcripts within minutes, writing out the most important factors for each case? At a glance we would know when a parolee’s last disciplinary infraction was, for example, or whether the prisoner participated in rehabilitation programming. We could then get a picture of how the parole process operates at scale, judge whether it is fair or not, and identify individual cases that appear inconsistent within it. With the knowledge gleaned, we could push for systemic changes where necessary and identify and rectify potential errors in individual cases directly. This approach would center on human discretionary judgment and use technology to ensure transparency and consistency.

We call this the “Recon Approach” and believe it has applications well beyond parole. For example, the approach could be adapted for use in the Social Security Administration, where administrative law judges must decide whether an unemployment claim is valid. It might also be brought to bear in immigration processes, where a single officer must determine whether or not to grant asylum. In a human-led legal decision-making process, machine learning can take on the role of making visible mountains of case records that would otherwise be boxed up on shelves in dusty archives. We outline this role in a paper forthcoming in the Berkeley Technology Law Journal, titled “The Recon Approach: A New Direction for Machine Learning in Criminal Law.” Our team includes Stanford Professor of Computer Science and Electrical Engineering Nick McKeown, University of Oregon Law Professor Kristen Bell, Stanford Professor of Computer Science and Linguistics Christopher Manning (a Stanford HAI associate director), and Stanford PhD students Jenny Hong and Catalin Voss, with support from Stanford HAI.

New Challenges for Natural Language Processing

Our vision requires a different flavor of Natural Language Processing (NLP) than what is commonly used today. Massive language models like BERT and GPT-3 have shown dramatic performance improvements across a large variety of NLP tasks in the last few years. However, even these advanced models struggle with the kinds of complex information aggregation tasks that we need to tackle in order to make legal records accessible. We see three main challenges that, we believe, deserve the focus of the NLP community. If solved, they will open up many new ways to apply NLP to the law.

1. We need models that can process longer text.

Most existing models have been applied to short text passages on the order of 500-1,000 words. Parole hearing transcripts average 10,000 words. Written decision records in the Social Security Administration are longer than 3,000 words. Asylum case records are frequently longer than 15,000 words.
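
To make the long-document problem concrete: a standard BERT-style encoder reads at most 512 tokens per pass, so a 10,000-word transcript has to be split into overlapping windows before it can be scored at all. The sketch below uses the Hugging Face transformers library purely as an illustration (our choice, not a detail of the Recon pipeline) to show this workaround and its cost: any fact whose evidence spans two windows is invisible to a model that reads one window at a time.

```python
# A minimal sketch of the long-document problem, assuming the Hugging Face
# `transformers` library; the Recon pipeline itself is not described here.
from transformers import AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("bert-base-uncased")

def chunk_transcript(text: str, window: int = 512, stride: int = 128):
    """Split a long transcript into overlapping windows of at most `window` tokens."""
    ids = tokenizer(text, add_special_tokens=False)["input_ids"]
    chunks = []
    for start in range(0, len(ids), window - stride):
        piece = ids[start:start + window]
        chunks.append(tokenizer.decode(piece))
        if start + window >= len(ids):
            break
    return chunks

# A ~10,000-word hearing transcript yields dozens of such windows; an answer
# whose evidence straddles two windows is lost on a model that reads one at a time.
```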

2. We need to move beyond Named Entity Recognition.

In order to identify a sub-region of a large piece of text where the model can look for the answer to a given question, existing information extraction systems typically rely on Named Entity Recognition (NER). NER spots all instances of entities such as companies, well-known individuals, or other concepts in a long piece of text. This approach works well for the kinds of questions we ask Siri, but parole hearings are not Wikipedia articles. For many of our extraction challenges, there is a single named entity — the parole candidate — about whom we are attempting to answer a large number of questions. The answers to those questions are spread across many sections, so even if we can identify the relevant named entities, we need to piece together information from various places in an unstructured hearing.
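
As a rough illustration of this limitation, here is what an off-the-shelf NER pipeline produces on a hypothetical transcript excerpt (the snippet and the use of spaCy are our own illustration, not material from the hearings): it tags people and dates, but nearly every mention refers to the same parole candidate, and the tags alone do not say which fact answers which question.

```python
# Hedged illustration with spaCy's small English model (an assumption for
# this sketch). The excerpt below is invented, not taken from a real hearing.
import spacy

nlp = spacy.load("en_core_web_sm")

excerpt = (
    "Mr. Doe received a disciplinary write-up in March 2015 for possession "
    "of contraband. He completed the anger management program in 2018."
)

for ent in nlp(excerpt).ents:
    print(ent.text, ent.label_)

# Typical output (labels vary by model): "Doe" PERSON, "March 2015" DATE, "2018" DATE.
# Knowing these spans exist still does not tell us which date answers
# "When was the last disciplinary infraction?" -- that requires linking the
# write-up mention to its date and comparing it against other sections.
```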

3. Existing models need to improve multi-step reasoning.

Consider the question: “When was the last disciplinary infraction this parole candidate incurred (if any)?” To answer this question, a human annotator skims the transcript to find whether any write-ups for misconduct are mentioned, then finds the dates corresponding to these, and then identifies the most recent one. This kind of multi-hop reasoning task remains challenging for NLP today.
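
A toy aggregation routine makes that multi-hop structure explicit. The sketch below assumes, purely for illustration, that infractions are flagged by the phrase "write-up" and dated in "Month YYYY" form; real transcripts are far messier, which is exactly why this remains hard for current models.

```python
# Toy multi-hop aggregation under hypothetical formatting assumptions:
# hop 1: find sentences mentioning misconduct; hop 2: extract their dates;
# hop 3: keep the most recent one.
import re
from datetime import datetime

MONTH_DATE = re.compile(
    r"(January|February|March|April|May|June|July|August|September|"
    r"October|November|December)\s+\d{4}"
)

def last_infraction_date(transcript: str):
    latest = None
    for sentence in transcript.split("."):
        if "write-up" not in sentence.lower():
            continue  # hop 1: only sentences that mention a write-up
        for match in MONTH_DATE.finditer(sentence):
            date = datetime.strptime(match.group(), "%B %Y")  # hop 2: parse the date
            if latest is None or date > latest:
                latest = date  # hop 3: keep the most recent
    return latest

print(last_infraction_date(
    "He received a write-up in March 2015. Another write-up followed in "
    "August 2012. He completed vocational programming in June 2019."
))  # -> 2015-03-01 00:00:00
```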

We believe that if we can build NLP models that consume long text at once, along with “region isolation techniques” beyond NER that isolate the most relevant part of a document for a given question, we will make considerable progress toward tackling the first two challenges. Some promising approaches, such as skimming neural networks, have been proposed in the literature, but more work is required to see whether they can be helpful for practical information aggregation.
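
One plausible baseline for such region isolation, sketched here with a TF-IDF retriever (our choice for illustration; the paper does not prescribe a method), is to score each chunk of a transcript against the question and pass only the top few chunks to a downstream reader.

```python
# A minimal region-isolation baseline using scikit-learn; this is a sketch of
# one possible approach, not the method used in Project Recon.
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.metrics.pairwise import cosine_similarity

def top_regions(chunks: list[str], question: str, k: int = 3) -> list[str]:
    """Return the k transcript chunks most lexically similar to the question."""
    vectorizer = TfidfVectorizer(stop_words="english")
    matrix = vectorizer.fit_transform(chunks + [question])
    scores = cosine_similarity(matrix[-1], matrix[:-1]).ravel()
    ranked = sorted(zip(scores, chunks), key=lambda pair: pair[0], reverse=True)
    return [chunk for _, chunk in ranked[:k]]

# Usage: feed the windows from `chunk_transcript` above and a question such as
# "When was the last disciplinary infraction?" to get candidate regions to read.
```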

A Pilot in California

Project Recon, a collaboration between computer scientists and legal scholars at Stanford and the University of Oregon, aims to pilot a machine learning system that reads case records for the California parole hearing system. We have obtained a dataset of over 35,000 parole hearing transcripts from the State of California. Last year, we won a lawsuit against the California Department of Corrections and Rehabilitation (CDCR) seeking to obtain race and attorney representation data for the prisoners mentioned in the hearings. We look forward to tackling the many technical challenges that lie before us.

We invite researchers in NLP and computational law to join us on this journey and bring their expertise to the table.

Catalin Voss is a PhD candidate in Artificial Intelligence at Stanford University, and Jenny Hong is a PhD candidate in Management Science and Engineering.

Stanford HAI’s mission is to advance AI research, education, policy and practice to improve the human condition.

Contributors: Catalin Voss and Jenny Hong