By: Justine Brooks
4 Jun, 2025
Ten ambitious research projects have been announced as part of the AI Safety Catalyst Grant program, launched through the new Canadian AI Safety Institute (CAISI) Research Program at CIFAR, which is funded by the Government of Canada. Each project will receive $100,000 for one year and support up to two postdoctoral researchers. The CIFAR AI Safety Postdoctoral Fellows will not only receive funding and research support but will also shape a burgeoning community of early-career researchers working on AI safety, building Canada’s next generation of talent in this field.
“As AI continues to shape how we live and work, it’s more important than ever to invest in homegrown talent to ensure this technology works for us, not the other way around. Canada is leading the effort to make AI safe, trustworthy and grounded in human values. That’s where the Catalyst Grants come in. These investments support world-class Canadian researchers who are tackling some of the most pressing challenges in AI, including safety and ethics, accountability, and real-world impact. Their work is essential to making sure AI reflects our values and serves the public good. By supporting Canadian innovation and talent, we are not only reinforcing our leadership in responsible AI but also building a more productive, people-focused economy and helping Canada lead the G7 in growth, trust and innovation.”
— The Honourable Evan Solomon, Minister of Artificial Intelligence and Digital Innovation and Minister responsible for the Federal Economic Development Agency for Southern Ontario
“The AI Safety Catalyst Grants represent a critical step forward in building a robust national ecosystem for AI safety research in Canada. By supporting emerging scholars and high-potential ideas, we’re fostering the community and capacity needed to ensure AI technologies serve the public good and are aligned with Canadian values of transparency and safety, now and into the future. In particular, we are excited about the breadth of projects that were funded, which will tackle topics ranging from misinformation to safety in AI applications to scientific discovery.”
— Catherine Régis and Nicolas Papernot, co-directors of the CAISI Research Program at CIFAR.
CIPHER: Countering influence through pattern highlighting and evolving responses
Matthew Taylor (Amii, University of Alberta), Brian McQuinn (University of Regina)
The rise in misinformation spurred by foreign interference in domestic politics has deleterious effects on the information ecosystem. Canada CIFAR AI Chair Matthew Taylor and University of Regina Associate Professor Brian McQuinn are developing human-in-the-loop techniques that combine multi-modal AI with expert human input to build a tool for detecting foreign interference. Once trained, the CIPHER model will be deployed through a global network of civil society organizations, empowering them to combat misinformation.
Adversarial robustness in knowledge graphs
Ebrahim Bagheri (University of Toronto), Jian Tang (Mila, HEC Montréal & McGill University) and Benjamin Fung (Mila, McGill University)
The introduction of false or misleading information into knowledge graphs, the models that power search and conversational agents, has serious implications for AI safety, as it allows misinformation to become embedded in models and spread widely. University of Toronto Professor Ebrahim Bagheri, Canada CIFAR AI Chair Jian Tang and McGill University Professor Benjamin Fung will design machine learning defenses to detect and mitigate adversarial modifications in knowledge graphs. By designing scalable adversarial training and robustness evaluation methodologies, their research will allow safer knowledge graphs to be deployed in practice.
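The defenses themselves are still to be built; as a minimal sketch of what detecting an adversarial modification can look like, the example below (with hypothetical relation names and facts) flags triples that violate a simple one-to-one constraint, the kind of inconsistency an injected false fact can introduce into a knowledge graph. Real defenses would rely on learned embedding scores and adversarial training rather than hand-written rules.

```python
# Illustrative only: a toy consistency check for knowledge-graph triples.
# Relation names and facts are hypothetical stand-ins.

from collections import defaultdict

# A knowledge graph as (head, relation, tail) triples, with one injected false fact.
triples = [
    ("Ottawa", "capital_of", "Canada"),
    ("Toronto", "capital_of", "Canada"),   # adversarially injected; conflicts with the first
    ("Canada", "member_of", "G7"),
]

# Relations we expect to be one-to-one: each tail should have exactly one head.
functional_relations = {"capital_of"}

def flag_suspicious(triples, functional_relations):
    """Return triples that violate a one-to-one constraint on a relation."""
    by_relation_tail = defaultdict(list)
    for h, r, t in triples:
        if r in functional_relations:
            by_relation_tail[(r, t)].append((h, r, t))
    suspicious = []
    for group in by_relation_tail.values():
        if len(group) > 1:   # more than one head claims the same functional role
            suspicious.extend(group)
    return suspicious

print(flag_suspicious(triples, functional_relations))
# Both conflicting triples are flagged for review; the check alone cannot say which is false.
```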
Sampling latent explanations from LLMs for safe and interpretable reasoning
Yoshua Bengio (Mila, Université de Montréal)
Ensuring that LLMs produce trustworthy and interpretable results is a major goal of AI safety researchers. Canada CIFAR AI Chair Yoshua Bengio will develop more trustworthy explanations of LLM behaviour by deploying generative flow networks in a novel way. The focus is on training AI to explain what humans say by examining the hidden reasons behind AI decisions and evaluating their accuracy, in order to disentangle the underlying causes of what AI generates. Ultimately, the project aims to develop a monitoring guardrail for AI agents that can lead to safer AI deployment across many applications.
On the safe use of diffusion-based foundation models
Mijung Park (Amii, University of British Columbia)
As generative foundation models are used in an increasing number of domains, concerns about privacy have accompanied their spread. Canada CIFAR AI Chair Mijung Park will address safety concerns related to diffusion models using computationally efficient, utility-preserving techniques. The project focuses on two important areas: not-safe-for-work (NSFW) content generation, and data privacy and memorization, reducing the risk of models memorizing private information, such as social security numbers, from their training datasets. By developing techniques for removing problematic data points, the project will help build safer, privacy-preserving foundation models.
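As a rough illustration of what a memorization check can involve, the sketch below (using random stand-in feature vectors and an assumed similarity threshold) flags generated samples that are near-duplicates of training items; a real pipeline would use learned embeddings of images or text and carefully calibrated thresholds.

```python
# Illustrative only: a toy memorization check comparing generated samples to
# training data by cosine similarity of feature vectors.

import numpy as np

rng = np.random.default_rng(0)
train_features = rng.normal(size=(1000, 64))   # stand-in training-set embeddings
generated = rng.normal(size=(5, 64))           # stand-in embeddings of generated samples
generated[0] = train_features[42] + 0.01 * rng.normal(size=64)  # a near-copy of a training item

def cosine_sim(a, b):
    a = a / np.linalg.norm(a, axis=-1, keepdims=True)
    b = b / np.linalg.norm(b, axis=-1, keepdims=True)
    return a @ b.T

THRESHOLD = 0.95  # assumed similarity above which a sample is treated as memorized
sims = cosine_sim(generated, train_features).max(axis=1)
flagged = np.where(sims > THRESHOLD)[0]
print("Samples flagged as possible memorization:", flagged, "max similarities:", sims.round(3))
```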
Advancing AI alignment through debate and shared normative reasoning
Gillian Hadfield (Vector Institute, Johns Hopkins University, University of Toronto [on leave])
Aligning AI systems with human values is one of the key challenges of AI safety. Canada CIFAR AI Chair Gillian Hadfield will draw on insights from economics, cultural evolution, cognitive science and political science to take a novel approach to the challenge of alignment. Using a debate framework, the project will assess and improve the normative reasoning skills of AI agents in a multi-agent reinforcement learning setting. The approach accounts for the pluralistic, heterogeneous nature of human values and recognizes that normative institutions developed to reconcile competing interests and preferences, an insight that can help address the alignment challenge and allow AI agents to be integrated into human normative systems.
Adversarial robustness of LLM safety
Gauthier Gidel (Mila, Université de Montréal)
Assessing the vulnerabilities of LLMs has become a key area of AI safety research. Canada CIFAR AI Chair Gauthier Gidel proposes a novel, more efficient and automated way of finding vulnerabilities in LLMs. By using optimization and borrowing methods from image-based adversarial attacks, the project aims to provide an efficient, automated attack method. This will allow model developers to improve the evaluation and training of LLMs, assessing their vulnerabilities and making them safer and more robust.
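To give a flavour of what an optimization-based attack can look like at its very simplest, the toy sketch below runs a greedy coordinate search over an adversarial suffix against a stand-in scoring function. The vocabulary, suffix length and objective are all hypothetical; a real attack would optimize against the target LLM's own gradients or logits.

```python
# Illustrative only: a toy greedy coordinate search for an adversarial suffix.
# The target model is replaced by a stand-in scoring function.

import random

VOCAB = ["alpha", "beta", "gamma", "delta", "epsilon"]
SUFFIX_LEN = 4

def stand_in_score(prompt_tokens):
    """Hypothetical objective: higher means closer to the unwanted target behaviour."""
    return sum(1.0 for t in prompt_tokens if t == "gamma") - 0.1 * len(prompt_tokens)

def greedy_coordinate_search(base_prompt, steps=20, seed=0):
    rng = random.Random(seed)
    suffix = [rng.choice(VOCAB) for _ in range(SUFFIX_LEN)]
    for _ in range(steps):
        pos = rng.randrange(SUFFIX_LEN)      # pick one suffix position to update
        best_tok, best_score = suffix[pos], stand_in_score(base_prompt + suffix)
        for tok in VOCAB:                    # try every candidate token at that position
            candidate = suffix[:pos] + [tok] + suffix[pos + 1:]
            score = stand_in_score(base_prompt + candidate)
            if score > best_score:
                best_tok, best_score = tok, score
        suffix[pos] = best_tok
    return suffix

print(greedy_coordinate_search(["please", "summarize", "this"]))
```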
Safe autonomous chemistry labs
Alán Aspuru-Guzik (Vector Institute, University of Toronto)
Self-driving laboratories have the potential to revolutionize science, yet without proper guardrails, there are safety risks. Canada CIFAR AI Chair Alán Aspuru-Guzik is developing a safety architecture for self-driving chemistry laboratories that draws inspiration from the aerospace industry. The safety framework will have three pillars: a physical black box device (similar to an airplane black box); multi-agent safety oversight systems; and the development of a digital twin to monitor environmental and laboratory conditions. Through these three pillars, Aspuru-Guzik aims to establish widespread safety benchmarks.
Safety assurance and engineering for multimodal foundation model-enabled AI systems
Foutse Khomh (Mila, Polytechnique Montréal), Lei Ma & Randy Goebel (Amii, University of Alberta)
Multi-modal foundation models are increasingly being deployed in the real world in a range of domains. Yet despite their importance, existing safety assurance approaches are not adequate for the complexity of multi-modal models. Canada CIFAR AI Chairs Foutse Khomh and Lei Ma along with University of Alberta Professor Randy Goebel are developing end-to-end safety assurance techniques for multi-modal foundation models in several key areas of application: robotics, software coding and autonomous driving. They will develop benchmarks, testing and evaluation frameworks with the potential to improve the safety of foundation models in the real world.
Maintaining meaningful control: Navigating agency and oversight in AI-assisted coding
Jackie Chi Kit Cheung (Mila, McGill University), Jin Guo (McGill University)
AI is increasingly being adopted by software engineers to generate, edit and debug code. Canada CIFAR AI Chair Jackie Chi Kit Cheung and McGill University’s Jin Guo will develop a safety framework to help software engineers understand and control AI-supported coding systems. Their methodology entails gathering practitioner insights, co-designing interfaces and conducting empirical tests. By incorporating human-computer interaction considerations, they aim to give engineers more control over, and insight into, the operations of AI-supported coding systems.
Formalizing constraints for assessing and mitigating agentic risk
Sheila McIlraith (Vector Institute, University of Toronto)
As AI agents are increasingly deployed in organizations in semi-autonomous roles, concerns about the risks have accompanied their use. Canada CIFAR AI Chair Sheila McIlraith will develop concrete tools for a technical safety solution, combining approaches such as context-specific evaluation, reward modeling and alignment. The project focuses on Desired Behavior Specifications, encoded in representations from which human-interpretable rules can be derived, such as a system that extracts a set of formal rules from a training manual. Ultimately, by developing a distributed governance model to mitigate the risks of agentic AI, the project aims to further responsible AI deployment in industry.
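To give a flavour of what checking agent behaviour against such a specification might involve, the sketch below encodes two hypothetical rules, of the kind that might be extracted from a training manual, and checks proposed agent actions against them. The rule contents and action format are invented for illustration; the project's actual representations and extraction methods remain to be developed.

```python
# Illustrative only: a toy "desired behaviour specification" as formal rules
# checked against proposed agent actions. All rules and fields are hypothetical.

from dataclasses import dataclass

@dataclass
class Action:
    name: str
    amount: float = 0.0
    approved_by_human: bool = False

# Each rule pairs a predicate over an action with a human-readable description.
RULES = [
    (lambda a: not (a.name == "transfer_funds" and a.amount > 10_000 and not a.approved_by_human),
     "Transfers above $10,000 require human approval"),
    (lambda a: a.name != "delete_records",
     "Agents may never delete records"),
]

def check(action):
    """Return descriptions of every rule the proposed action would violate."""
    return [desc for rule, desc in RULES if not rule(action)]

print(check(Action("transfer_funds", amount=25_000)))   # -> approval rule violated
print(check(Action("transfer_funds", amount=500)))      # -> [] (no violations)
```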