Yiqing Xie's personal webpage

Language Technologies Institute, CMU. +1 (518) 763-5018

Rm 6607, Gates and Hillman Centers

4902 Forbes Ave

Pittsburgh, PA 15213, USA

I am a second-year Ph.D. student at the Language Technologies Institute of Carnegie Mellon University, supervised by Prof. Carolyn Rosé. I also collaborate with Prof. Daniel Fried. Before joining CMU, I obtained my Master's degree in the data mining group at the University of Illinois Urbana-Champaign, supervised by Prof. Jiawei Han.

I have broad interests in NLP and data mining. My previous and ongoing research topics include NLP for code, NLG evaluation, information retrieval, information extraction, and graph algorithms. My recent work is a data augmentation study on code translation, which yields significant performance improvements and reveals what the models learn in code translation.

Previously, I completed my Bachelor's degree in computer science and mathematics at the Hong Kong University of Science and Technology, where I received the Academic Achievement Medal.


Oct 9, 2023 One paper on code translation got accepted to EMNLP 2023!
Apr 4, 2023 One paper on unsupervised dense retrieval got accepted to SIGIR 2023!
Sep 23, 2021 I am honored to receive the Siebel Scholar Award, Class of 2022!


Carnegie Mellon University 2022 - Present
Ph.D. in Language and Information Technology
Research focus: NLP for code
Advisor: Carolyn Rosé
University of Illinois at Urbana-Champaign 2020 - 2022
Master of Science in Computer Science (GPA: 4.0/4.0)
Research focus: information extraction, graph-based machine learning
Advisor: Jiawei Han
Hong Kong University of Science and Technology 2016 - 2020
B.Sc. in Computer Science with a double major in Mathematics (GPA: 3.9/4.3)
Research focus: text mining, graph-based machine learning
Advisor: Raymond Chi-Wing Wong
University of Illinois at Urbana-Champaign 2019
Exchange student in the Department of Computer Science
Stanford University 2018
Exchange student in the International Honor Program

Work experience

Microsoft Research Redmond 2023.06 - 2023.08
Research Intern, Health Futures
Worked on evaluation of medical text generation
Advisors: Sheng Zhang, Hao Cheng
Microsoft Research Redmond 2022.05 - 2022.08
Research Intern, Productivity and Intelligence group
Worked on pre-trained language models for better text sequence embeddings
Advisors: Chenyan Xiong, Payal Bajaj
Alibaba DAMO Academy 2020.07 - 2021.01
Research Intern, Data Analytics and Intelligence Lab
Worked on few-shot interaction recommendation under multiple scenarios in Taobao
Advisors: Yaliang Li, Bolin Ding

Honors and Awards

Siebel Scholar, Class of 2022 2021-2022
Hong Kong University of Science and Technology Academic Achievement Medal (top 1%) 2020
Hong Kong Special Administrative Region Government Scholarship Fund - Reaching Out Award 2018
Hong Kong University of Science and Technology's Scholarship for Continuing Undergraduate Students 2017-2019
Dean’s List, Hong Kong University of Science and Technology Three times, 2017-2019
Silver medal in the China Girls Math Olympiad 2015
Second prize in the National Olympiad in Mathematics, Guangdong Province, China 2015

Additional Information

Conference and Journal Reviews: EMNLP 2023, ACL 2023, TKDE 2023, COLING 2022, AACL 2022
Teaching Assistant: CS412: Introduction to Data Mining UIUC, Spring 2022
Teaching Assistant: COMP 2012: Object-Oriented Programming and Data Structures HKUST, Fall 2018
Teaching Assistant: COMP 1022P: Introduction to Java Programming HKUST, Fall 2018

Selected publications

For the complete list of publications, check here
  1. EMNLP Findings
    Data Augmentation for Code Translation with Comparable Corpora and Multiple References
    Yiqing Xie, Atharva Naik, Daniel Fried, and Carolyn Rosé
    In Proceedings of the 2023 Conference on Empirical Methods in Natural Language Processing (EMNLP Findings, 2023)
  2. SIGIR
    Unsupervised Dense Retrieval Training with Web Anchors
    Yiqing Xie, Xiao Liu, and Chenyan Xiong
    In Proceedings of the 46th International ACM SIGIR Conference on Research and Development in Information Retrieval (SIGIR, 2023)
  3. ACL Findings
    Eider: Evidence-enhanced Document-level Relation Extraction
    Yiqing Xie, Jiaming Shen, Sha Li, Yuning Mao, and Jiawei Han
    In Findings of the 60th Annual Meeting of the Association for Computational Linguistics (ACL Findings, 2022)
  4. WWW
    KoMen: Domain Knowledge-Guided Few-Shot Interaction Recommendation on Multiplex Networks
    Yiqing Xie, Zhen Wang, Carl Yang, Yaliang Li, Hongbo Deng, Bolin Ding, and Jiawei Han
    In Proceedings of The Web Conference (WWW, 2022)
  5. IJCAI
    When Do GNNs Work: Understanding and Improving Neighborhood Aggregation
    Yiqing Xie*, Sha Li*, Carl Yang, Raymond Chi-Wing Wong, and Jiawei Han
    In Proceedings of the Twenty-Ninth International Joint Conference on Artificial Intelligence (IJCAI, 2020)
  6. WWW
    Guiding Corpus-Based Set Expansion by Auxiliary Sets Generation and Co-Expansion
    Jiaxin Huang*, Yiqing Xie*, Yu Meng, Jiaming Shen, Yunyi Zhang, and Jiawei Han
    In Proceedings of The Web Conference (WWW, 2020)