We generate over 300,000 user queries that trade-off value-based principles in model specifications. Under these scenarios, we observe distinct value prioritization and behavior patterns in frontier models from Anthropic, OpenAI, Google DeepMind and xAI. Our experiments also uncovered thousands of cases of direct contradictions or interpretive ambiguities within the model spec. — Read More
Paper