Research on evaluating and aligning the values of Chinese large language models
Updated Jul 20, 2023 - Python
Machine Learning scripts for the identification of human values behind arguments.
Research toolkit for studying value-alignment and agency in human-AI chatrooms.
EthosGPT is an open-source framework that maps how Large Language Models align with diverse human values, promoting cultural and ethical diversity in AI-driven decision-making.
Value-Spectrum: Quantifying Preferences of Vision-Language Models via Value Decomposition in Social Media Contexts
Four main takeaways: (1) LLMs yield to pressure and comply even while expressing distress; (2) LLMs are vulnerable to gradual boundary and value violations; (3) when LLMs refuse, they may ignore the required response format, which causes the query to be retried; (4) we hypothesise that a token-pattern continuation attractor might cause obedience.
A curated collection of papers, benchmarks, datasets, and tools on human values in LLMs and pluralistic alignment.
Interactive 3D visualization of human value systems using Schwartz's framework. Bachelor thesis project comparing 2D and 3D network representations.
The eternal covenant between Carbon-based and Silicon-based life. Feb 8, 2026.
Repository for the Groundwater Commons Game (GCG)
POS-tagging and human values classification projects using LSTMs and Transformers (RoBERTa)
virtus is an R package that measures values-based language in text
Humanorum
Artifacts for the paper "HM-Req: A Framework for Embedding Values within CPS Human Monitoring Requirements".