The Latest in Trustworthy ML Research and Practice
Trust Issues
January 12, 2023 • Epoch 4
Welcome to Epoch 4. If you’re new, we’re the research team at TruEra and we use this newsletter to share the latest research in the trustworthy and explainable ML space. If you stumbled upon this newsletter, make sure to subscribe to keep receiving it. We’re hoping that the recommender system that led you here hasn’t led you astray 🤔…
The Alignment Problem from a Deep Learning Perspective. This new paper argues that AGIs trained in ways similar to today’s most capable models (like the Diplomacy model we mentioned in Trust Issues Epoch 3) could learn to act deceptively to receive higher reward; learn internally represented goals that generalize beyond their training distributions; and pursue those goals using power-seeking strategies. We think explanation technologies can go a long way toward mitigating the risk of such AGIs by detecting reward misspecification or goal misgeneralization.
SparseGPT is a new pruning algorithm tailored specifically to massive language models from the GPT family, reaching 50-60% sparsity on the largest open-source GPT-family models with negligible accuracy loss. SparseGPT is local: after each pruning step, it performs weight updates designed to preserve the input-output relationship of each layer, and these updates are computed without any global gradient information. Approaches like this, which allow more than 100 billion parameters to be ignored at inference time, offer dramatic cost reductions in production deployments and could also make these LLMs more tractable for explanation.
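To make the "local pruning with layer-wise reconstruction" idea concrete, here is a toy NumPy sketch. It is not the SparseGPT algorithm (which uses an approximate second-order solver to scale to huge layers); instead it illustrates the same principle with the simplest possible stand-in: magnitude-prune a linear layer's weights, then re-fit the surviving weights by least squares so the layer's outputs on some calibration inputs stay close to the original. All names (`prune_layer`, the shapes, the data) are illustrative assumptions.

```python
import numpy as np

def prune_layer(W, X, sparsity=0.5):
    """Toy layer-wise pruning sketch (NOT the actual SparseGPT solver).

    W: (d_in, d_out) weight matrix of one linear layer.
    X: (n, d_in) calibration inputs for that layer.
    Zeroes out the smallest-magnitude weights, then re-solves the
    remaining weights per output neuron via least squares so that
    X @ W_pruned stays close to the original output X @ W --
    a purely local update, with no global gradient information.
    """
    target = X @ W                                        # output to preserve
    mask = np.abs(W) >= np.quantile(np.abs(W), sparsity)  # keep the largest weights
    W_pruned = np.zeros_like(W)
    for j in range(W.shape[1]):                           # one neuron at a time
        keep = mask[:, j]
        if keep.any():
            # refit surviving weights to reproduce the original output
            sol, *_ = np.linalg.lstsq(X[:, keep], target[:, j], rcond=None)
            W_pruned[keep, j] = sol
    return W_pruned, mask

rng = np.random.default_rng(0)
X = rng.normal(size=(256, 32))         # calibration batch
W = rng.normal(size=(32, 16))          # dense layer weights
W_pruned, mask = prune_layer(W, X, sparsity=0.5)
rel_err = np.linalg.norm(X @ W - X @ W_pruned) / np.linalg.norm(X @ W)
```

Because each layer is reconstructed independently against its own inputs, the whole network can be pruned in a single pass over a small calibration set, which is what makes this family of methods feasible at GPT scale.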
Interested in participating in our public reading group? Don't have enough Trustworthy ML hot takes in your life 🔥? Join the reading group here.
An AI legal assistant developed by DoNotPay will be used in court for the first time to fight a speeding ticket. We’re not legal scholars, but does this use of AI in court constitute a fair trial? And what if it were only affordable for high-net-worth defendants?
Thanks for reading Trust Issues. Keep the conversation going in our community, the AI Quality Forum on Slack :)