AI safety unplugged: understanding AI risks

Exploring AI safety and risk management insights for cyber strategists

Large Language Models have hit us like a tsunami. But are we close to building machines that can think like humans? Do we understand what is happening inside AI models? And what about the AI safety risks we face right now? These questions were raised during an insightful session hosted by AI House in Davos.

Reed Albergotti, Technology Editor at Semafor, steered the panel “AI Safety Unplugged.” The discussion featured David Haber, Founder & CEO of Lakera; Max Tegmark, President of the Future of Life Institute; Seraphina Goldfarb-Tarrant, Head of Safety at Cohere; and Yann LeCun, VP & Chief AI Scientist at Meta.

Risks and challenges

The panelists shed light on the challenges observed in the ongoing adoption of AI and the risks posed by its current deployments:

Massive scale deployment. David Haber and Seraphina Goldfarb-Tarrant observed that numerous companies are deploying AI solutions on a massive scale. Easy access to unified AI interfaces enables widespread interaction. Consequently, even minor operational issues or model flaws can have a broad impact, affecting millions of users across the world.

No comprehensive evaluation methods. Seraphina highlighted a significant gap: the lack of comprehensive evaluation methods for AI solutions and their models. The current focus on red teaming and penetration testing helps to uncover vulnerabilities but falls short of evaluating what happens inside the models that generate the answers. These methods do not adequately simulate real-world scenarios or assess the quality of the outcomes generated by AI solutions.

Issues with risk awareness. David pointed out the rapid pace of the AI industry, with new capabilities swiftly rolled out to users. Yet there remains a lack of comprehensive understanding, even among model builders, of the internal workings of these systems. They struggle to explain exactly what happens inside their models. Echoing this concern, Yann LeCun noted that popular large language models are inherently unsafe and susceptible to manipulation. In his view, no amount of training can make them fully safe: someone can always find a way to jailbreak them.

Disinformation and deepfakes. The World Economic Forum, in its Global Risks Report 2024, identified misinformation and disinformation as top global risks for both the short and long term. Max Tegmark emphasized the urgency of addressing these risks. Society is already battling disinformation and grappling with deepfakes that manipulate images, voice, and video.

The long road to general AI. Yann suggested that achieving General AI is a distant goal, as today's AI still lags behind specialized human intelligence. Merely increasing data and computing power is insufficient, because humans learn in significantly different ways. Advancing to the next level of AI will require scientific breakthroughs. Nevertheless, Yann is optimistic about AI systems becoming more prevalent in our lives as assistants and interfaces that enhance our interaction with the digital world.

Recommendations

The panelists provided practical recommendations for companies adopting AI solutions, offering ways to navigate the challenges discussed:

Understand our data. Seraphina emphasized the need for a deeper understanding of the complexity and dependencies within our data. Recognizing unwanted elements in training data is crucial. The quality and nature of this data is a major differentiator between the less advanced and the more mature AI language models available on the market.

Focus on current risks. David stressed the importance of addressing the risks present in existing AI solutions rather than speculating about future scenarios with General AI. A common understanding of these risks among policymakers, researchers, businesses, and AI users is essential for effective remediation.

Accelerate AI progress. Yann explained that human intelligence works differently from currently available AI algorithms. To achieve something similar, AI systems must understand the physical world, learn like humans or animals, and master memory, planning, and reasoning. This progression calls for entirely new architectures, and their emergence requires scientific breakthroughs. Yann cautioned against overregulation of AI research and development, which could hinder progress.

AI as a countermeasure. Yann also suggested that better AI is the most effective countermeasure against misuse. He cited Meta’s recent success in using AI to detect and address hate speech and abusive content, with automated solutions operating in every language, resolving 95% of such cases last year.

Further reading

As highlighted by the panelists, comprehending the risks associated with AI remains a challenge for many organizations. For those interested in deepening their understanding of this topic and educating their business and technology stakeholders, the following resources are particularly valuable:

NIST AI Risk Management Framework. The NIST AI Risk Management Framework provides a thorough guide for organizations to identify and contextualize AI risks. It breaks down the nature of these risks and delineates the attributes of trustworthy AI systems. The framework is built around four key functions – Govern, Map, Measure, and Manage – and is subdivided into various categories and subcategories, offering a comprehensive structure for AI risk management.

NIST AI RMF Playbook. Complementing the framework, the NIST AI Risk Management Framework Playbook is a dynamic, evolving tool designed to keep pace with advancements in AI technology. It offers practical steps to implement the framework’s objectives, structured according to the four main functions of the AI RMF. Available for download in various formats from the NIST website, this playbook is a practical and accessible resource for professionals aiming to integrate these principles into their work.

LLM Security Playbook. Published by Lakera, the LLM Security Playbook offers a comprehensive overview of risks associated with Large Language Models (LLMs). It includes a taxonomy of LLM vulnerabilities, illustrative examples, and a summary of common best practices and tools for mitigating these risks.

Deloitte Tech Trends 2023. The Deloitte Tech Trends 2023 report provides insightful analysis of AI trust-related issues. It thoroughly explains data transparency, algorithmic explainability, and AI reliability, which are crucial to building trust in AI systems.

OWASP Top 10 for Large Language Model Applications. This OWASP project identifies the top 10 critical security risks in deploying and managing Large Language Models. It highlights the potential impact of these vulnerabilities and their ease of exploitation.

LLM AI Security and Governance Checklist. Created by the OWASP Top 10 for LLM Applications Team, this comprehensive guide is tailored for CISOs overseeing Generative AI deployment. It covers challenges, deployment strategies, a detailed checklist addressing adversarial risks, AI asset inventory, security and privacy training, governance, legal issues, regulatory compliance, and implementing LLM solutions.

OWASP Machine Learning Security Top Ten. Currently in draft form, the guidance focuses on the top 10 security issues in machine learning systems. This resource is particularly beneficial for developers and application security specialists.

Other sessions

If you are interested in other sessions hosted by AI House Davos, please refer to the AI House official program. If you are interested in other engaging sessions from the cyber perspective in Davos 2024, please look at this article. Our previous articles highlight the insights from the “AI – The Ultimate Invention” and “AI & Trust” sessions.

Navigating the AI revolution: AI trust and safety

Insights into AI trust challenges, solutions and cyber impacts

Cybersecurity in the digital era: a critical discussion

Cyber insights from the Open Forum organized by the World Economic Forum
