Yufei Ding is an assistant professor in the Department of Computer Science, University of California at Santa Barbara. She received her Ph.D. in the Computer Science from North Carolina State University in 2017. Her research interest resides at the intersection of Compiler Technology and (Big) Data Analytics, with a focus on enabling High-Level Program Optimizations for data analytics and other data-intensive applications.
MZ: Tell us about yourself.
YD: I grew up in Mingguang, China, where I did my undergrad in Physics at the University of Science and Technology of China. I always liked physics, mathematics, as well as computer science. For my undergrad thesis, I was doing the Monte Carlo simulation for electron transport to approach the modeling semiconductor transport, which required extensive programming work throughout the project. It might be the first time that I realized "computer science" or "programming" become on top of my preference list. I have to say, I was struggling a bit at that time, as changing major would pose many practical challenges and it was the time to decide which major I wanted to pursue my graduate study. After considering the objective reality that I did not have a strong background in computer science, I chose a more conservation path and came to the United States as a master in Physics.
But I never stopped thinking about the path to computer science. I contacted several faculty members in the Computer Science Department during my master study, asking them for suggestions about which courses I need to take as well as the potential opportunity to work with them on some research project. Much negative feedback was received at that time, and it was understandable as I did not even know what Linux, a GPU, Java, Python, compilers, architecture were at that time. It was then that I met my Ph.D. advisor, Xipeng Shen, who is now at North Carolina State University. In our first meeting, he directly offered me an opportunity to work with him as a Ph.D. student. I have to say that he was willing to take a risk, but it was just amazing from my side. I took the opportunity and started my Ph.D. study in Computer Science in 2012. I still remember that our first project was to investigate the compilation order in JIT runtime system, which was also my course project in my grad compilers class. You can imagine how difficult that was for a student that had only taken courses on C programming and data structures. Fortunately, I was able to manage the project and got it published in ASPLOS 2014, our paper "Finding the Limit: Examining the Potential and Complexity of Compilation Scheduling for a JIT-Based Runtime System".
After I finished my Ph.D., I moved to Santa Barbara and joined UCSB as a faculty member.
MZ: There must be a lot of work as a junior faculty member. What do you do for fun when you are not working?
YD: My career path as an Assistant professor started almost at the same time as being a new mom. My spare time is mostly spent with my baby, which is a new kind of fun. Thanks to the good weather in California and the beautiful scenery in Santa Barbara, I could take my baby to the playground, swimming pool, or the beachside and enjoy our time together after work. I also like traveling but we have been avoiding travel since the baby is only 16 months now. Hopefully, the baby can soon join us in traveling around.
MZ: You mentioned that you had a background on physics as well. Do you see connections between physics and computer science research?
YD: Yes, I believe there are two key words that closely integrate my physics background and the computer science research on large-scope compiler optimizations: algorithms and abstraction. The key to compiler optimizations is essentially the design of different optimization algorithms. In physics, we learned a different set of algorithms, but the underlying principles are always similar. Then there is abstraction and modeling. Before we start to design an algorithm for some problem, we need to figure out the right abstraction, i.e., the most appropriate way to model this problem. The modeling part is the same no matter what field is the problem from. A good vision about the problem is always crucial.
MZ: How do you decide what's the next problem you are going to solve?
YD: It is always important to work on some practical questions that potentially lead to a broad impact, but we need to find a unique perspective that we have expertise on. Following this principle, I have decided to pursue large-scope compiler optimization for emerging fields like quantum computing and machine learning.
MZ: That's a very interesting angle, to look at machine learning and quantum computing from a compiler's perspective.
YD: The compiler is a tool we can utilize to mitigate the gap between algorithms/software and underlying hardware. Take quantum computing as an example, when a programmer writes a quantum program, they often assume there could be infinite number of qubits. But the truth is that not only the number of qubits is limited on current quantum devices, but also the operations that can be applied to these qubits are also restricted. It is the compiler's job to understand the hardware constraints and automatically map a quantum program to the physical devices.
MZ: It sounds like the hardware also has a critical impact on the type of optimizations you can do, and that might bring some opportunities on hardware and software co-design.
YD: Yes, I believe hardware-software co-design will be important in the future. The compiler plays a very important role here. As a compiler designer, we need to come up with good abstractions for both the algorithm and hardware. Through these unified abstraction and novel optimization algorithms, the compiler can explore the optimization space automatically and generate better algorithms for the underlying hardware.
MZ: Some of your work is about leveraging triangular inequality for optimizations. How did that work come about?
YD: Our first work on that is called Yinyang K-Means, which is about using algorithmic optimization based on the triangle inequality to eliminate unnecessary distance computations in K-Means clustering. There are actually a lot of distance-related problems in different fields. K-means is just one of them. Other examples include k-Nearest neighbor in deep learning, ICP in graph analytics, N-body simulation in computational physics. The triangle inequality can be applied in a similar way to save computations in these diverse fields, while previously they were all done by domain experts manually and often not in the optimal way due to the large optimization space. Later we proposed a domain specific language to unify these problems together, and a compilation framework to find the best optimizations automatically. That led to the PLDI 2017 paper on the triangle inequality for compiler-based strength reduction. Strength reduction is a traditional compiler optimization technique. By replacing expensive operations with equivalent but cheaper operations, it helps improve program performance. Traditional strength reduction is mostly at the level of an individual instruction or statement, but in this paper, we turn an algorithmic optimization technique, the Triangle Inequality into compiler optimization technique. The vision was built gradually. As fresh PhD students, we may be able to solve a difficult problem, but the underlying methodology and design principle are not that clear. It is usually after we read lots of papers when we get a vision of the field.
MZ: What kinds of questions do you like to see the PL community to put emphasis on for machine learning and quantum computing?
YD: Questions like: what is the right intermediate representation (IR) for these problems? what is the standard optimization procedure we should follow? And what is the role of program verification and correctness in these fields? For instance, in machine learning, you often can trade accuracy for performance improvement. But from a programming language perspective, how shall we design the IR and expose these optimizations in a more rigorous way?
MZ: What's your past experience of attending PLDI, do you have any stories to share from attending the conference with that?
YD: The programming languages field is a really nice community for young researchers. People here are very supportive, and willing to share with others about the understanding of the field and what they want to pursue in the future. When I first got the job offer from UCSB, I talked to a lot of people in the community and got many useful suggestions about how to become an independent researcher.
MZ: What kind of good suggestions did you receive to become an independent researcher?
YD: I got the suggestion that I should keep my research related to my PhD work, but expand it in a meaningful way, so that the community can easily distinguish the new contributions. That is why I'm still doing compiler optimization, but for different types of problems. I still stick to the same high-level framework, but each component is specially tailored for the new set of algorithms and the underlying hardware.
MZ: Are there any research ideas that you are your particularly favorites in the PL area?
YD: As a compiler researcher, I am interested in compiler frameworks that enable both algorithmic and hardware level optimization, as well as their synergy. I think Saman Amarasinghe from MIT has done good work in this area, with Petabricks and Halide. The question now is how to extend the idea to the emerging fields like ML and Quantum Computing.