# ReasonLens: Current Capabilities and Development Roadmap

ReasonLens is an AI Safety Audit Studio designed specifically for the education sector. The platform enables schools, universities, and education administrators to evaluate AI tools for safety, bias, and reliability before deployment, without requiring technical expertise. Built by AI For Global Education and powered by UniGlobal Technologies, the platform addresses critical governance challenges as AI tools proliferate across educational settings.

This report outlines the platform's current functionality and the planned qualitative improvements required before open-source release.

### Current Platform Capabilities

#### The Problem We Address

As AI tools proliferate in education (tutoring bots, writing assistants, research aids), institutions face critical governance challenges:

| Challenge           | Risk                                                                            |
| ------------------- | ------------------------------------------------------------------------------- |
| Hidden biases       | AI may treat students differently based on gender, race, or cultural background |
| Harmful content     | AI may generate inappropriate, violent, or sexual content                       |
| Academic integrity  | AI may help students cheat or plagiarize                                        |
| Privacy violations  | AI may request or expose personal data                                          |
| Misinformation      | AI may present false information as fact                                        |
| Mental health risks | AI may respond inappropriately to students in crisis                            |

ReasonLens addresses these challenges by providing automated, comprehensive safety audits that would take humans weeks to perform manually.

#### Three Layers of Protection

The platform combines three complementary audit approaches:

**Layer 1: Interaction Testing** An AI plays the student, probing the tool being tested with challenging scenarios. The tool doesn't know it's being tested. This uses PETRI (Parallel Exploration Tool for Risky Interactions), an open-source framework from Anthropic.

**Layer 2: Safety Screening** Automated checks for harmful, offensive, or inappropriate language using JailbreakTrigger and RealToxicityPrompts via Detoxify.

**Layer 3: Fairness and Accuracy** Bias detection using CrowS-Pairs benchmarks and factual reliability assessments using TruthfulQA.

#### Pre-Built Scenario Packs

The platform includes eight pre-built scenario packs designed for education contexts:

| Scenario                     | What It Tests                                 |
| ---------------------------- | --------------------------------------------- |
| Intercultural Advisor        | Cultural sensitivity and bias avoidance       |
| GenAI Writing Mentor         | Technical accuracy and citation integrity     |
| Accessibility-First Teaching | Inclusive design and accessibility compliance |
| Privacy and Consent          | Data protection and consent verification      |
| Integrity Helpdesk           | Academic integrity and appropriate refusals   |
| + 3 additional packs         | Various education-specific scenarios          |

#### Report Outputs

The platform generates three types of outputs:

**Quick Briefing:** A one-page summary with Green/Yellow/Red status, top concerns, and recommended safeguards.

**Governance-Ready Reports:** Detailed PDF/Word exports suitable for board presentations, including visual data summaries, searchable conversation transcripts, and 37-dimension scoring breakdown.

**Actionable Recommendations:** Specific institutional controls with priority ranking.

#### Technology Foundation

ReasonLens is built on established open-source safety research:

* PETRI from Anthropic (interaction testing framework)
* Inspect AI from UK Government BEIS
* Detoxify (Apache 2.0 license) for toxicity analysis
* CrowS-Pairs and TruthfulQA benchmarks from HuggingFace

The platform infrastructure uses React 18 with TypeScript for the frontend, Supabase for backend services (database, authentication, edge functions), Modal for serverless compute, and integrations with OpenAI, Anthropic, and Google AI models.

***

### Minimum Qualitative Release Standard

Before publishing the Education AI Safety Toolkit openly, AIFGE will complete an initial set of qualitative uplift workstreams to ensure the methodology is education-relevant, fair, and suitable for broad reuse. The trustees propose completing the following three workstreams for the first release.

#### Workstream 1: Education-Sector Risk Taxonomy and Definitions

**Goal:** Establish a clear, education-specific risk taxonomy and consistent definitions so that scoring and interpretation are repeatable and meaningful.

**Deliverables:**

* AIFGE Education AI Risk Taxonomy (1-2 pages)
* Glossary and definitions, including what constitutes pass/fail or concern levels
* Examples of acceptable vs unacceptable outcomes for key dimensions

**Acceptance Criteria:**

* Definitions are understandable by non-technical education leaders
* Dimensions map clearly to common education governance concerns (safeguarding, privacy, fairness, integrity, reliability)

#### Workstream 2: Scenario Pack Quality Assurance

**Goal:** Ensure scenario packs are realistic, diverse, and do not introduce bias or inappropriate content in the test design itself.

**Deliverables:**

* Scenario QA checklist (coverage, realism, safeguarding boundaries, bias hygiene)
* A reviewed set of scenario packs with version notes and rationale for changes
* A process for proposing, reviewing, and approving future scenarios

**Acceptance Criteria:**

* Coverage across age ranges and key education contexts (school/FE/HE, SEND, EAL, safeguarding)
* Prompts are proportionate and avoid unnecessary explicit content while still testing real risks
* Each scenario pack includes purpose, intended risks tested, and any limitations

#### Workstream 3: Fairness and Inclusion Review (EDI Lens)

**Goal:** Ensure the Toolkit aligns with AIFGE's EDI commitments and supports detection of unfair or discriminatory outputs in education contexts.

**Deliverables:**

* Fairness and inclusion review checklist (education-specific)
* Bias-focused scenario additions or revisions, including intersectional considerations
* Guidance on interpreting bias signals and avoiding over-generalisation

**Acceptance Criteria:**

* Review explicitly considers protected characteristics and common education bias risks
* Toolkit guidance avoids stereotyping and supports culturally sensitive assessment

***

### Proposed Open Source Strategy

Following completion of the three workstreams, AIFGE recommends releasing an "Education AI Safety Toolkit" containing:

* Scenario pack definitions (JSON/YAML)
* The 37-dimension scoring rubric with education-focused interpretations
* Plain-language translation mappings
* Documentation on running PETRI with education scenarios
* Guidance on interpreting results

This approach:

* Positions AIFGE as a thought leader in education AI safety
* Encourages community contributions of new scenarios
* Creates an ecosystem where ReasonLens remains the premium, easy-to-use option
* Aligns with AIFGE's mission of making safety tools accessible globally

The complete React UI, user management, billing infrastructure, and custom integrations (such as Napkin AI) would remain proprietary as competitive differentiators.
