Dark and Bright Side of Participatory Red-Teaming with Targets of Stereotyping for Eliciting Harmful Behaviors from Large Language Models | Prof. Hwajung Hong's Lab | KAIST Department of Industrial Design

article · green · Citations: 0 · 2026

Dark and Bright Side of Participatory Red-Teaming with Targets of Stereotyping for Eliciting Harmful Behaviors from Large Language Models

Sieun S. Kim, Yeeun Jo, Sungmin Na, Hyunseung Lim, Eunchae Lee, Yu Min Choi, Soohyun Cho, Hwajung Hong

ArXiv.org

Abstract

Red-teaming, where adversarial prompts are crafted to expose harmful behaviors and assess risks, offers a dynamic approach to surfacing underlying stereotypical bias in large language models. Because such subtle harms are best recognized by those with lived experience, involving targets of stereotyping as red-teamers is essential. However, critical challenges remain in leveraging their lived experience for red-teaming while safeguarding psychological well-being. We conducted an empirical study of participatory red-teaming with 20 individuals stigmatized by stereotypes against nonprestigious college graduates in South Korea. Through mixed methods analysis, we found participants transformed experienced discrimination into strategic expertise for identifying biases, while facing psychological costs such as stress and negative reflections on group identity. Notably, red-team participation enhanced their sense of agency and empowerment through their role as guardians of the AI ecosystem. We discuss implications for designing participatory red-teaming that prioritizes both the ethical treatment and empowerment of stigmatized groups.

Keywords

Safeguarding, Empowerment, Agency (philosophy), Citizen journalism, Sense of agency, Adversarial system, Motivated reasoning

Type

article

IF / Citations

- / 0

Full text

http://arxiv.org/abs/2602.19124

Publication year

2026
