Dylan Hadfield-Menell

Cited by

	All	Since 2019
Citations	2880	2574
h-index	25	24
i10-index	37	36

Citations per year

Year	Citations
2015	10
2016	19
2017	79
2018	167
2019	196
2020	334
2021	407
2022	454
2023	823
2024	350

Public access (based on funding mandates)

16 articles	available
0 articles	not available

Co-authors

Anca D Dragan - Assistant Professor at UC Berkeley // Director, AI Safety and Alignment, Google DeepMind - Verified email at berkeley.edu
Stuart Russell - Professor of Computer Science, University of California, Berkeley - Verified email at cs.berkeley.edu
Pieter Abbeel - UC Berkeley | Covariant - Verified email at cs.berkeley.edu
Stephen Casper - PhD student, MIT - Verified email at mit.edu
Gillian Hadfield - Professor of Law and Professor of Strategic Management, University of Toronto; Faculty Affiliate - Verified email at utoronto.ca
Smitha Milli - Cornell Tech - Verified email at berkeley.edu
Rohan Chitnis - Meta AI, MIT, UC Berkeley - Verified email at fb.com
Thomas L. Griffiths - Professor of Psychology and Computer Science, Princeton University - Verified email at princeton.edu
Andreas Haupt - Massachusetts Institute of Technology - Verified email at mit.edu
Jaime Fernández Fisac - Assistant Professor of Electrical and Computer Engineering, Princeton University - Verified email at princeton.edu
Marc Khoury - University of California, Berkeley - Verified email at eecs.berkeley.edu
Sandy H Huang - Research Scientist, DeepMind - Verified email at berkeley.edu
McKane Andrus - UW HCDE - Verified email at uw.edu
Siddharth Srivastava - Arizona State University - Verified email at asu.edu
Simon Zhuang - Verified email at berkeley.edu
Gokul Swamy - PhD Candidate, Carnegie Mellon University - Verified email at andrew.cmu.edu
Micah Carroll - PhD student, UC Berkeley - Verified email at berkeley.edu
Alex X. Lee - Research Scientist, Google DeepMind - Verified email at google.com
Eric Tzeng - UC Berkeley - Verified email at eecs.berkeley.edu
Gabriel Kreiman - Professor, Harvard Medical School and Children's Hospital - Verified email at tch.harvard.edu

Dylan Hadfield-Menell

Massachusetts Institute of Technology

Verified email at csail.mit.edu

Artificial Intelligence


Title	Cited by	Year
Cooperative Inverse Reinforcement Learning - D Hadfield-Menell, SJ Russell, P Abbeel, A Dragan - Advances in Neural Information Processing Systems 29, 2016	706	2016
Inverse Reward Design - D Hadfield-Menell, S Milli, P Abbeel, SJ Russell, A Dragan - Advances in Neural Information Processing Systems 30, 2017	407	2017
The off-switch game - D Hadfield-Menell, A Dragan, P Abbeel, S Russell - Proceedings of the Twenty-Sixth International Joint Conference on Artificial …, 2017	151	2017
Open Problems and Fundamental Limitations of Reinforcement Learning from Human Feedback - S Casper, X Davies, C Shi, TK Gilbert, J Scheurer, J Rando, R Freedman, ... - Transactions on Machine Learning Research, 2023	141	2023
On the geometry of adversarial examples - M Khoury, D Hadfield-Menell - arXiv preprint arXiv:1811.00525, 2018	104*	2018
Toward Transparent AI: A survey on interpreting the inner structures of deep neural networks - T Räuker, A Ho, S Casper, D Hadfield-Menell - 2023 IEEE Conference on Secure and Trustworthy Machine Learning (SaTML), 464-483, 2023	93	2023
Pragmatic-pedagogic value alignment - JF Fisac, MA Gates, JB Hamrick, C Liu, D Hadfield-Menell, ... - Robotics Research: The 18th International Symposium ISRR, 49-57, 2020	90	2020
Guided search for task and motion plans using learned heuristics - R Chitnis, D Hadfield-Menell, A Gupta, S Srivastava, E Groshev, C Lin, ... - 2016 IEEE International Conference on Robotics and Automation (ICRA), 447-454, 2016	79	2016
Should robots be obedient? - S Milli, D Hadfield-Menell, A Dragan, S Russell - Proceedings of the 26th International Joint Conference on Artificial …, 2017	71	2017
Incomplete contracting and AI alignment - D Hadfield-Menell, GK Hadfield - Proceedings of the 2019 AAAI/ACM Conference on AI, Ethics, and Society, 417-422, 2019	67	2019
What are you optimizing for? aligning recommender systems with human values - J Stray, I Vendrov, J Nixon, S Adler, D Hadfield-Menell - arXiv preprint arXiv:2107.10939, 2021	64	2021
Conservative Agency via Attainable Utility Preservation - AM Turner, D Hadfield-Menell, P Tadepalli - Proceedings of the AAAI/ACM Conference on AI, Ethics, and Society, 385-391, 2020	58	2020
Expressive robot motion timing - A Zhou, D Hadfield-Menell, A Nagabandi, AD Dragan - Proceedings of the 2017 ACM/IEEE international conference on human-robot …, 2017	58	2017
Consequences of Misaligned AI - S Zhuang, D Hadfield-Menell - Advances in Neural Information Processing Systems 33, 15763-15773, 2020	57	2020
On the utility of model learning in hri - R Choudhury, G Swamy, D Hadfield-Menell, AD Dragan - 2019 14th ACM/IEEE International Conference on Human-Robot Interaction (HRI …, 2019	56	2019
Modular task and motion planning in belief space - D Hadfield-Menell, E Groshev, R Chitnis, P Abbeel - 2015 IEEE/RSJ International Conference on Intelligent Robots and Systems …, 2015	54	2015
The assistive multi-armed bandit - L Chan, D Hadfield-Menell, S Srinivasa, A Dragan - 2019 14th ACM/IEEE International Conference on Human-Robot Interaction (HRI …, 2019	47	2019
Spurious normativity enhances learning of compliance and enforcement behavior in artificial agents - R Köster, D Hadfield-Menell, R Everett, L Weidinger, GK Hadfield, ... - Proceedings of the National Academy of Sciences 119 (3), e2106028118, 2022	45*	2022
Unifying scene registration and trajectory optimization for learning from demonstrations with application to manipulation of deformable objects - AX Lee, SH Huang, D Hadfield-Menell, E Tzeng, P Abbeel - 2014 IEEE/RSJ International Conference on Intelligent Robots and Systems …, 2014	41	2014
Explore, establish, exploit: Red teaming language models from scratch - S Casper, J Lin, J Kwon, G Culp, D Hadfield-Menell - arXiv preprint arXiv:2306.09442, 2023	40	2023
