Zhengyuan Yang

Cited by

	All	Since 2019
Citations	4844	4830
h-index	29	29
i10-index	36	36

Citations per year: 2019: 61, 2020: 130, 2021: 303, 2022: 629, 2023: 1710, 2024: 1991

Public access

Based on funding mandates: 16 articles available, 0 articles not available

Co-authors

Lijuan Wang, Microsoft GenAI. Verified email at microsoft.com
Jianfeng Wang, Microsoft. Verified email at microsoft.com
Zicheng Liu, Microsoft. Verified email at microsoft.com
Linjie (Lindsey) Li, Senior Researcher, Microsoft. Verified email at microsoft.com
Jiebo Luo, Albert Arendt Hopeman Professor of Engineering, University of Rochester. Verified email at cs.rochester.edu
Kevin Lin, Microsoft. Verified email at microsoft.com
Zhe Gan, Research Scientist, Apple. Verified email at apple.com
Ce Liu, AI Research Scientist Director, Meta GenAI; IEEE Fellow. Verified email at meta.com
Liwei Wang, Assistant Professor at The Chinese University of Hong Kong. Verified email at cse.cuhk.edu.hk
Jinsong Su, Xiamen University. Verified email at xmu.edu.cn
Jianwei Yang, Principal Researcher, Microsoft Research, Redmond. Verified email at microsoft.com
Jiajun Deng (邓家俊), University of Adelaide, Australian Institute for Machine Learning. Verified email at adelaide.edu.au
Yuncheng Li, Google. Verified email at google.com
Chenglei Si, Stanford University. Verified email at stanford.edu
Boqing Gong, Research Scientist, Google. Verified email at google.com

Zhengyuan Yang

Researcher, Microsoft

Verified email at microsoft.com

Computer Vision, Multimedia, Vision + Language, Multimodal


Title	Cited by	Year
GIT: A generative image-to-text transformer for vision and language; J Wang, Z Yang, X Hu, L Li, K Lin, Z Gan, Z Liu, C Liu, L Wang; Transactions on Machine Learning Research (TMLR), 2022	419	2022
A fast and accurate one-stage approach to visual grounding; Z Yang, B Gong, L Wang, W Huang, D Yu, J Luo; IEEE International Conference on Computer Vision (ICCV), 4683-4693, 2019	345	2019
An empirical study of GPT-3 for few-shot knowledge-based VQA; Z Yang, Z Gan, J Wang, X Hu, Y Lu, Z Liu, L Wang; Proceedings of the AAAI Conference on Artificial Intelligence 36 (3), 3081-3089, 2022	335	2022
The dawn of LMMs: Preliminary explorations with GPT-4V(ision); Z Yang, L Li, K Lin, J Wang, CC Lin, Z Liu, L Wang; arXiv preprint arXiv:2309.17421 9 (1), 1, 2023	332	2023
TransVG: End-to-End Visual Grounding with Transformers; J Deng, Z Yang, T Chen, W Zhou, H Li; IEEE International Conference on Computer Vision (ICCV), 2021	281	2021
Scaling up vision-language pre-training for image captioning; X Hu, Z Gan, J Wang, Z Yang, Z Liu, Y Lu, L Wang; Proceedings of the IEEE/CVF conference on computer vision and pattern …, 2022	242	2022
MM-REACT: Prompting ChatGPT for multimodal reasoning and action; Z Yang, L Li, J Wang, K Lin, E Azarnasab, F Ahmed, Z Liu, C Liu, M Zeng, ...; arXiv preprint arXiv:2303.11381, 2023	240	2023
Improving One-stage Visual Grounding by Recursive Sub-query Construction; Z Yang, T Chen, L Wang, J Luo; European Conference on Computer Vision (ECCV), 2020	211	2020
MM-Vet: Evaluating large multimodal models for integrated capabilities; W Yu, Z Yang, L Li, J Wang, K Lin, Z Liu, X Wang, L Wang; The 41st International Conference on Machine Learning (ICML), 2024	208	2024
Prompting GPT-3 to be reliable; C Si, Z Gan, Z Yang, S Wang, J Wang, J Boyd-Graber, L Wang; International Conference on Learning Representations (ICLR 23), 2022	186	2022
End-to-end multi-modal multi-task vehicle control for self-driving cars with visual perceptions; Z Yang, Y Zhang, J Yu, J Cai, J Luo; 2018 24th International Conference on Pattern Recognition (ICPR), 2289-2294, 2018	186	2018
Action recognition with spatio–temporal visual attention on skeleton image sequences; Z Yang, Y Li, J Yang, J Luo; IEEE Transactions on Circuits and Systems for Video Technology 29 (8), 2405-2415, 2018	184	2018
Attentive relational networks for mapping images to scene graphs; M Qi, W Li, Z Yang, Y Wang, J Luo; IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), 3957-3966, 2019	170	2019
TAP: Text-Aware Pre-training for Text-VQA and Text-Caption; Z Yang, Y Lu, J Wang, X Yin, D Florencio, L Wang, C Zhang, L Zhang, ...; IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), 2021	155	2021
A Novel Graph-based Multi-modal Fusion Encoder for Neural Machine Translation; Y Yin, F Meng, J Su, C Zhou, Z Yang, J Zhou, J Luo; Annual Meeting of the Association for Computational Linguistics (ACL), 2020	140	2020
UniTAB: Unifying Text and Box Outputs for Grounded Vision-Language Modeling; Z Yang, Z Gan, J Wang, X Hu, F Ahmed, Z Liu, Y Lu, L Wang; European Conference on Computer Vision (ECCV), 521-539, 2022	130*	2022
Multimodal foundation models: From specialists to general-purpose assistants; C Li, Z Gan, Z Yang, J Yang, L Li, L Wang, J Gao; Foundations and Trends® in Computer Graphics and Vision 16 (1-2), 1-214, 2024	106	2024
PromptCap: Prompt-guided image captioning for VQA with GPT-3; Y Hu, H Hua, Z Yang, W Shi, NA Smith, J Luo; Proceedings of the IEEE/CVF International Conference on Computer Vision …, 2023	90*	2023
SAT: 2D Semantics Assisted Training for 3D Visual Grounding; Z Yang, S Zhang, L Wang, J Luo; IEEE International Conference on Computer Vision (ICCV), 2021	89	2021
ReCo: Region-Controlled Text-to-Image Generation; Z Yang, J Wang, Z Gan, L Li, K Lin, C Wu, N Duan, Z Liu, C Liu, M Zeng, ...; IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), 2023	82	2023
