Design a Unified AI Evaluation Architecture for Gender-Based Violence Court Data
Background. TrackGBV uses AI to extract 60+ structured fields from court sentencing decisions in gender-based violence cases. We have three separate pieces of evaluation code in our system: a legacy eval pipeline that has been dormant for about a year, a standalone comparative evaluation notebook, and a new prompt improvement pipeline recently built by an MIT GenAI Lab student team. These three pieces were built at different times by different people, and we need someone to review all of them and tell us what the unified target should look like.
The project. Review the three existing evaluation codebases, identify what each does, what overlaps, what should be retired, and what gaps remain. Then design a target evaluation architecture that unifies the valuable pieces and supports two specific integrations we need: automatic evaluation runs after each extraction, and using human-corrected data as the preferred ground truth source.
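The first integration named above, automatic evaluation runs after each extraction, amounts to a post-extraction hook. A minimal sketch of that shape follows; every name here is a hypothetical placeholder, not TrackGBV's actual code:

```python
from typing import Any, Callable, Dict

# Hypothetical placeholder types standing in for the real pipeline's schemas.
ExtractionResult = Dict[str, Any]
EvalReport = Dict[str, Any]

def run_extraction(document_id: str) -> ExtractionResult:
    # Stand-in for the real extraction pipeline (60+ structured fields).
    return {"document_id": document_id, "fields": {"court": "High Court"}}

def evaluate(result: ExtractionResult) -> EvalReport:
    # Stand-in for an evaluation pass against ground truth.
    return {"document_id": result["document_id"],
            "fields_checked": len(result["fields"])}

def extract_and_evaluate(document_id: str,
                         record_eval: Callable[[EvalReport], None]) -> ExtractionResult:
    """Run extraction, then immediately trigger an evaluation run on its output."""
    result = run_extraction(document_id)
    record_eval(evaluate(result))
    return result
```

In a real system the hook would likely write to an evaluation store rather than call back into the caller, but the shape of the integration is the same: every extraction produces an evaluation record as a side effect.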
Deliverables, organized into three phases:
Phase 1 (Weeks 1-2): Codebase review and landscape mapping
- Review the three evaluation codebases (legacy eval pipeline, comparative evaluation notebook, prompt improvement pipeline)
- Document what each does, what metrics each uses, how each handles ground truth
- Produce a landscape map showing overlaps, gaps, and redundancies
Phase 2 (Weeks 3-4): Requirements and target architecture design
- Work with ICAAD to define requirements for the unified eval system, focusing on automatic post-extraction runs and human-corrected data as ground truth
- Design the target architecture: which components to retain, which to retire, what new pieces are needed
- Define the data flow between extraction, corrections, and evaluation
- Produce target architecture document with component diagrams and responsibilities
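One way to read the "human-corrected data as the preferred ground truth" requirement: whenever a reviewer has corrected a field, the corrected value supersedes the raw extraction as the evaluation reference. A minimal sketch under that assumption (stdlib dataclasses here for self-containment; the production schemas are pydantic models, and all names are hypothetical):

```python
from dataclasses import dataclass
from typing import Optional

@dataclass
class ExtractedField:
    field_name: str
    value: str

@dataclass
class HumanCorrection:
    field_name: str
    corrected_value: str

def ground_truth_value(extracted: ExtractedField,
                       correction: Optional[HumanCorrection]) -> str:
    """Prefer the human-corrected value whenever a reviewer supplied one;
    fall back to the raw extracted value otherwise."""
    if correction is not None:
        return correction.corrected_value
    return extracted.value
```

The target architecture document would pin down exactly this precedence rule, plus how corrections flow from the QC interface into the evaluation store.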
Phase 3 (Weeks 5-6): Backlog and handoff
- Translate the target architecture into a prioritized implementation backlog
- Estimate effort for each backlog item
- Document assumptions, open questions, and risks
- This backlog directly informs a follow-on Taproot project for the integration engineer who will implement the architecture
Commitment: 6-8 hours per week across 6 weeks.
Skills needed:
- Senior-level ML engineering or ML infrastructure experience
- Evaluation methodology background (metrics for structured extraction, MAE vs accuracy tradeoffs, multi-label and regression metrics)
- Ability to read and critically assess existing Python codebases (pandas, SQL, pydantic)
- Experience designing ML evaluation systems that integrate with production pipelines
- Clear technical writing for architecture documents and implementation backlogs
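To illustrate the metric choices mentioned above: structured extraction typically scores numeric fields by absolute error (MAE) and categorical fields by exact match, aggregated per field. The sketch below assumes that split; the field names and the numeric/categorical partition are illustrative, not TrackGBV's actual schema:

```python
# Hypothetical set of numeric fields; everything else is treated as categorical.
NUMERIC_FIELDS = {"sentence_length_months", "fine_amount"}

def field_error(field: str, predicted, actual) -> float:
    """Absolute error for numeric fields; 0/1 mismatch for categorical fields."""
    if field in NUMERIC_FIELDS:
        return abs(float(predicted) - float(actual))
    return 0.0 if predicted == actual else 1.0

def mean_errors(rows):
    """Aggregate per-field errors over (field, predicted, actual) rows.
    For numeric fields the result is MAE; for categorical fields it is
    the error rate (1 - accuracy)."""
    totals, counts = {}, {}
    for field, pred, act in rows:
        totals[field] = totals.get(field, 0.0) + field_error(field, pred, act)
        counts[field] = counts.get(field, 0) + 1
    return {f: totals[f] / counts[f] for f in totals}
```

For example, two sentence-length predictions off by 6 and 0 months give an MAE of 3.0, while one correct and one wrong offence type give an error rate of 0.5; reporting both on one dashboard is exactly the kind of tradeoff the unified design must settle.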
Note on sensitive content: This project involves working with infrastructure that processes court sentencing decisions in gender-based violence cases. The volunteer will not do case-level review, but will need a trauma-informed approach when working with the system and its data.
As we scale, our AI extraction accuracy becomes increasingly important. Our current evaluation infrastructure evolved organically across multiple volunteers and student teams, and we now have three separate pieces of eval code that don't speak to each other. Without a unified view, we cannot reliably answer "is our extraction getting better or worse over time?" or "does human correction data improve our AI when fed back in?"
This volunteer project is the diagnostic step. By reviewing what exists and designing a target architecture, the volunteer enables ICAAD to:
- Make an informed decision about what evaluation code to keep, retire, or build
- Set a clear specification for the follow-on implementation project, avoiding scope creep and duplicated work
- Close the feedback loop between human QC review and AI prompt improvement
- Establish ongoing accuracy measurement as cases expand to new jurisdictions
For a volunteer with ML evaluation experience, this is architectural work with durable impact. Your assessment and design directly shape how TrackGBV measures its accuracy for years to come.
ICAAD will support the volunteer by:
- Providing access to all three existing evaluation codebases, with context on when and why each was built
- Providing documentation and READMEs to accompany the code
- Defining clear integration requirements (automatic post-extraction runs, human-corrected data as ground truth) so the volunteer has concrete targets to design against
- Ensuring the volunteer has access to sample data, extracted output, and corrected ground-truth examples for context
- Committing to a review-and-sign-off process at the end of each phase, so the volunteer knows their direction is correct before moving forward
The volunteer will work directly with me (Director of Analytics and Justice Tech) for all decisions and reviews.
International Center for Advocates Against Discrimination Inc.
Location
Remote, US-NY
Website
https://www.icaad.ngo
Member Since
Oct 2021
Completed Taproot Plus Partnerships
0