yangyang [at] virginia [dot] edu
Email / LinkedIn / GitHub / Google Scholar
Hi, I'm a PhD student in the Department of Computer Science at the University of Virginia, advised by Prof. Adwait Jog.
Prior to that, I received my B.S. degree from Jilin University, where I was a member of the ETECA Lab under Prof. Jingweijia Tan. I was also a visiting student at the State Key Laboratory of Processors at ICT, CAS, under Prof. Guangli Li.
I work on performance questions at the intersection of GPU architecture and security.
Current
Understanding and addressing performance issues in secure GPU architectures, especially confidential computing, including data movement [ISCA'25] and system-level CC overheads [ISPASS'25, +].
Future
I am interested in (i) scaling GPU-based CC, (ii) making cryptographic protocols more GPU-friendly, and (iii) investigating the memory wall in GPU-based CC.
Passed my PhD Qualifying Exam.
NetCrafter was accepted to ISCA'25; see you in Tokyo!
One paper was accepted to ISPASS'25; see you in Ghent!
University of Virginia
Jilin University
University of Virginia, USA
Advisor Adwait Jog
Topics GPU, Trusted Computing (TEE, Cryptography, etc.), Memory
ICT, CAS, China
Advisor Guangli Li
Topics Compiler, Profile-Guided Optimization, LLVM
Jilin University, China
Advisor Jingweijia Tan
Topics GPU Power Modeling, MCM-GPU, Under-Voltage Reliability
Thesis The Design and Implementation of a Binary Code Analysis Framework for NVIDIA GPUs
Building on our ISPASS'25 results, we propose X. Our evaluation shows that X substantially reduces CC overhead, achieving speedups of up to 5.1x.
We present NetCrafter, a set of novel techniques to manage network traffic, especially across low-bandwidth links in multi-GPU systems. NetCrafter reduces the volume of flit traffic by (i) stitching compatible, partially filled flits, (ii) trimming unnecessary flits to avoid redundant transfers, and (iii) sequencing flits so that latency-sensitive ones arrive at their destinations faster.
Confidential computing (CC) is a critical technology for protecting data in use. By leveraging encryption and virtual machine (VM) level isolation, CC allows existing code to run without modification while offering confidentiality and integrity guarantees. However, the performance impact of CC on GPU-based systems can be significant. In this work, we present a comprehensive performance evaluation of CC guided by a simple performance model. Specifically, we start by evaluating CUDA applications with a focus on data transfer, memory management, encryption, kernel launch, and kernel execution. We then present a detailed event-level analysis of these applications, revealing that the execution times of kernels that do not use unified virtual memory (UVM) are largely unaffected, while the associated kernel launch overhead and queuing time increase significantly. In contrast, the execution time of kernels using UVM increases drastically under CC, on top of these launch and queuing overheads. We also study CNN training and LLM inference to examine how CC overheads affect them. Finally, we consider several optimization techniques, including kernel fusion, overlapping, and quantization, toward addressing the overheads of CC.