People of Language Design and Implementation

An interview project in conjunction with PLDI 2019  

Todd Mytkowicz

Interview with Todd Mytkowicz

Todd Mytkowicz is a Senior Researcher at Microsoft Research Redmond. He thinks hard about abstractions that help programmers easily express complex problems yet are sufficiently constrained that we can build runtimes to implement those abstractions efficiently.

 

MZ: Tell us a little bit about yourself. How did you get to where you are?

TM: I was a junior undergrad in computer science at Syracuse University in 1998. That was right around the beginning of the internet boom. That summer I got a job as one of the first employees at a startup called Bitpipe. The company was in Boston, which was close to where I grew up. I went back that summer and worked for Bitpipe and I had a blast. It was a super fun environment. The people I worked with were great, and it was the heyday of the internet boom, which was a really exciting time to be in computing. I worked there for a couple of years while I was still in school and then joined them full time after graduation. I wrote a lot of code for systems used by lots of people. I really enjoyed my job as a developer but found myself drawn to night classes at MIT and Harvard because they had these interesting topics anyone could enroll in! I took this robotics class and was like, "hey, this is super fun!" To be honest, at that point I never even thought about graduate school. It was more that I enjoyed taking classes and learning. I stayed in Boston for two years and ultimately, I moved to Colorado.

MZ: So you were working at this startup called Bitpipe but then decided to move to Colorado for a change. Why Colorado, and did you end up going to grad school there?

TM: I left Boston, but I was still working remotely for Bitpipe when I moved to Colorado. At that point, I just needed a change. I'd grown up my entire life in the Northeast, and I wanted to move somewhere new. I moved to Colorado, partly because my brother was living there. More importantly, I really enjoy skiing. I was a ski bum for a while.

MZ: Colorado is a great spot to be skiing.

TM: Yes, it is. Right around that time was when the startup sold, so I didn't work for a little bit. I bought myself a pickup truck, and I skied a lot. I did a bunch of races against competitive people who were very good. I had pipe dreams of trying to get into professional skiing. As it turns out, I wasn't as good as I thought I was! I remember I did a race at a local mountain in Colorado. There was this one person in particular that beat me. He was ranked 25th in the world. He was not just faster, he was significantly faster! He won by seconds. Tens of seconds. It was clear that, without significant training, I was never going to make it professionally. I realized, okay, this is a fun dream, but maybe grad school is more important, or maybe a better bet.

MZ: That racing experience was fun and interesting, but eventually you decided to move on to pursue a higher degree?

TM: Yes, I then applied to schools, outside of Colorado as well. Pretty much everywhere I applied I got rejected, including the University of Colorado, which is ultimately where I got my PhD. The irony is that the person on the admissions committee that year, Amer Diwan, was ultimately my advisor. He rejected me the first time!

MZ: Your application to grad school was not accepted at first, but you eventually got in. What did you end up doing?

TM: I ended up taking classes in machine learning and cognitive science at both Colorado State and the University of Colorado. I worked with a professor named Mike Mozer at Boulder. That was super fun. But I didn't have the math background at the time to hack it in machine learning. I realized I needed to spend more time bulking up on my math skills. So, I started reading on my own, reading a lot of material and writing a lot of code. That's how I often think through and deal with problems: by writing a lot of code. I remember building neural networks back in 2003 for a homework assignment in a cognitive science class. What made me realize that I was going to focus on PL and not ML was that I spent more of my time trying to make the neural network fast than I did understanding the math behind why the heck this thing was actually working. At the time, I just cared about how fast they were. I remember spending hours and hours trying to make it faster and faster, and all in Lisp no less! That's what made me realize that systems and PL were where I fit. Through a friend, I got in touch with my advisor Amer Diwan. Amer said, "Hey, let's have a six-month trial period here to see if there's some overlap in how we think and attack problems." Lo and behold, there was. I had a great time. Ultimately, he said, "Yes, you should do a PhD with me."

MZ: So that's how you joined Amer's group. How was that experience?

TM: Amer was an awesome advisor. He ultimately left Colorado and went to Google, and he is still there to this day. We had a blast together. During my PhD, Amer and I also worked with Peter Sweeney, who was at IBM Research. Recently Peter moved to a startup in the New York area. I also worked with Matthias Hauswirth, who was Amer's former student and then a professor at the University of Lugano. The four of us had weekly meetings about performance and ultimately, about all sorts of things. Transactional memory was hot at the time, so we started to investigate the performance of transactional memory. That led me to my thesis.

MZ: What are some of the interesting lessons you learned about the performance of transactional memory?

TM: I visited Peter Sweeney at IBM Research for a summer as an intern. While we were working together, we were thinking a little bit about transactional memory and open nesting. We started talking about the tradeoff between performance and programmability. To get a handle on the work in this area, we started running a bunch of experiments, just basically running existing benchmarks. One day, something interesting happened. I ran an experiment and I concluded that open nesting is a good idea. Amer ran the same experiment, and he got a completely different conclusion. I went home and I was like, Amer, you're doing something wrong. So, I compiled everything for him. I set up a machine. He logged in, ran that program, and got the exact same result that he was getting before. It was the complete opposite of the conclusion I got. It turned out his environment variables contained his username, AmerD, whereas my username was much longer: TMMytkowicz. My environment variables were causing the stack to start in a different place. That caused massive changes in performance. This problem ended up being what I focused on for my dissertation work.
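As a rough illustration of the kind of experiment Todd is describing, here is a minimal sketch (not the actual harness from that work; `./my_benchmark` and the `EXPERIMENT_PADDING` variable are hypothetical) that varies only the size of the environment and watches how the runtime of an otherwise unchanged program moves:

```python
import os
import statistics
import subprocess
import time

def time_with_env_padding(cmd, pad_bytes, runs=5):
    """Run `cmd` with an extra environment variable of `pad_bytes` bytes.

    On Linux, the environment is copied onto the new process's stack, so
    changing its total size shifts where the stack (and anything placed
    relative to it) starts.
    """
    env = dict(os.environ)
    env["EXPERIMENT_PADDING"] = "x" * pad_bytes  # hypothetical padding variable
    times = []
    for _ in range(runs):
        start = time.perf_counter()
        subprocess.run(cmd, env=env, check=True,
                       stdout=subprocess.DEVNULL, stderr=subprocess.DEVNULL)
        times.append(time.perf_counter() - start)
    return statistics.median(times)

if __name__ == "__main__":
    benchmark = ["./my_benchmark"]  # placeholder for the program under study
    for pad in range(0, 4096, 512):
        print(f"env padding {pad:5d} bytes -> "
              f"{time_with_env_padding(benchmark, pad):.4f} s")
```

If the program's performance were insensitive to its starting stack address, the reported times would be flat across padding sizes; the surprise in the study was how often they are not.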

MZ: Who would have thought that simply having environment variables of different lengths could have such a big impact on performance!

TM: We eventually published a paper called "Producing Wrong Data Without Doing Anything Obviously Wrong". It was rejected a few times, but we persisted. I think a lot of people realized that some of these problems existed. The feedback we would always get when we submitted it was, "Well, we know this is a problem." But no one ever did anything about it or even acknowledged it in research. We ended up doing surveys and becoming a little bit more forthright, showing that, while people may colloquially know about it, no one did anything about it or talked about it in their papers. We put it out there as, "Hey, this is a potential problem, right?" We knew this work was a surrogate for, of course, many related problems. Like changing your link order. Or suppose you introduce a new compiler optimization, which changes where a particular hot loop starts in the address space. All of these things can potentially impact the performance of your program, a lot more than you might imagine.
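The link-order bias he mentions can be probed in a similarly simple way. Again, this is only a sketch under assumed names (the `.o` files and the `cc` invocation are placeholders, not the paper's setup): relink the same object files in shuffled orders and time the resulting binary.

```python
import random
import subprocess
import time

# Hypothetical object files making up the benchmark; the only thing that
# changes between trials is the order they are handed to the linker.
objects = ["main.o", "parser.o", "solver.o", "util.o"]

def link_and_time(objs, runs=5):
    """Link the objects in the given order, then report the best-of-N runtime."""
    subprocess.run(["cc", "-o", "bench", *objs], check=True)
    best = float("inf")
    for _ in range(runs):
        start = time.perf_counter()
        subprocess.run(["./bench"], check=True, stdout=subprocess.DEVNULL)
        best = min(best, time.perf_counter() - start)
    return best

if __name__ == "__main__":
    for trial in range(10):
        order = objects[:]
        random.shuffle(order)  # a different link order each trial
        print(f"{' '.join(order):40s} -> {link_and_time(order):.4f} s")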

MZ: You later moved to MSR. What have you been working on these days?

TM: What I've been focusing on is parallelizing sequential programs with ideas from symbolic execution. I worked very closely with Madan Musuvathi and Saeed Maleki. The first paper we had on this idea was back in 2014 at ASPLOS. It was on parallelizing the execution of DFAs, or finite state machines. We've been working on generalizing the idea of symbolically executing the program for all possible states and then finding very efficient ways to carry out that symbolic execution. Once you do, you have a parallel algorithm. We've applied this to a bunch of different techniques. We have a paper showing that this kind of idea also works on a bunch of dynamic programming algorithms.
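To make the DFA case concrete, here is a minimal sketch of the underlying trick (a toy illustration, not the algorithm from the ASPLOS 2014 paper itself): split the input into chunks, run each chunk from every possible start state in parallel to get a per-chunk state-to-state map, and then stitch the maps together.

```python
from concurrent.futures import ProcessPoolExecutor

# A toy DFA over {'a', 'b'}: state 0 is the start state; state 1 is the only
# accepting state (the DFA accepts strings with an odd number of 'a's).
STATES = (0, 1)

def step(state, ch):
    return state ^ 1 if ch == 'a' else state

def chunk_summary(chunk):
    """Run the DFA over `chunk` from *every* possible start state.

    The result is a transition map: for each state the DFA could have been
    in when the chunk began, the state it ends in afterward.
    """
    summary = {}
    for s in STATES:
        cur = s
        for ch in chunk:
            cur = step(cur, ch)
        summary[s] = cur
    return summary

def parallel_match(text, num_chunks=4):
    n = max(1, len(text) // num_chunks)
    chunks = [text[i:i + n] for i in range(0, len(text), n)]
    with ProcessPoolExecutor() as pool:
        summaries = list(pool.map(chunk_summary, chunks))
    # Compose the per-chunk maps left to right; this is cheap because it only
    # touches |STATES| entries per chunk, not the input itself.
    state = 0
    for summary in summaries:
        state = summary[state]
    return state == 1

if __name__ == "__main__":
    text = "ab" * 1000 + "a"
    print(parallel_match(text))  # True: odd number of 'a's
```

Because composing these per-chunk maps is associative, the final stitching step can itself be done as a parallel reduction, which is what lets the approach scale beyond this toy example.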

MZ: Tell us a little bit about your family.

TM: My wife was a social worker before staying home with our two kids. My dad has a master's in operations research. His master's work was on dynamic programming and linear algebra problems. That's a lot of what I've been spending my time on lately. I find that funny, as I did not really know this until I was in graduate school! My mother was a professor at a university called Curry College, which is outside of Boston. My daughter's in kindergarten. She gets out at about 2:00pm every Wednesday, and she, my wife, and my 3-year-old son all head up to the local hill for skiing. It's something I'm hoping to pass on to my kids.

MZ: Has your approach to research problems changed over time?

TM: I think more about the transition from research ideas to products. I am more motivated now to get my research into the hands of developers. Of course, being at MSR helps a lot in this regard. I have found that thinking about how people use your tools really helps hone the research challenges you tackle. Often in research we need to make assumptions to make forward progress, tackling little bits at a time. Because I am constantly thinking about the end user of the research, I tend to make different assumptions. For example, one of the first things I tell interns who visit me at MSR is that I do not expect them to write massive amounts of code. If we are going to accomplish a research goal in a short amount of time, it is really important that we get a crystal-clear understanding of the problem we are working on, what assumptions we are going to make, and how an end user will interact with it, rather than building a big system.

MZ: What kind of problems would you like to see solved in the next five, or maybe even ten, years?

TM: I think one of the things that is often the case with PL is that languages and systems are purpose-built to solve tasks as they become important in industry. JavaScript and the web are a great example. I look back on my career and I see that Jikes RVM was really a great project when I was a grad student because it solved a particular need. We needed to understand how garbage collection worked, and how different ideas from compilation, JIT compilation, and on-stack replacement worked. Giving people a playground to experiment with those ideas was super important. Now, in a similar vein, maybe the web is a place where people can apply similar ideas. The web has different layout engines, different JavaScript engines, different compilers. Likewise, I think we're seeing the same thing right now with machine learning. If you look at TensorFlow, it's a big programming language, where you write Python to produce what amounts to an AST in some low-level language. There are all sorts of people out there who are working on compiling that. Of course, it makes it even harder because, at least with a programming language like JavaScript, there's a standard semantics. I'm not sure we necessarily have that yet with some of these new domains. They're defined by implementation only. That's okay, but it brings with it its own challenges.

MZ: Are there research papers or books you find particularly inspiring?

TM: Maybe at the beginning of this interview, I made it sound like everything was obvious and easy for me, like it was obvious what I was going to do next in my career. That was not the way it was at the time. I remember I was not clear whether I wanted to become a professor or go into industry. I ended up reading a lot. I remember reading Edsger Dijkstra's "The Three Golden Rules for Successful Scientific Research". I read it as a grad student, just before being a newly minted PhD and thinking about what I wanted to work on next. I was really humbled by his advice. One of his golden rules suggests you work on a problem that you're uniquely suited to work on. At the time, I remember being really intimidated by his advice because who was I to feel like I was uniquely suited to work on any problem! Now, looking back on it after what I've worked on in my career, I have internalized his advice: it is unlikely any individual is uniquely suited to solve a particular problem. There are very few super-humans! It is much more useful to bring to bear your past experiences and make sure those give you an advantage on that problem. For example, my prior work in performance analysis was "unique" when I started at MSR in the RiSE group. I was one of the few people who worked on that aspect of systems, and thus I could bring to bear a unique vantage point when working with others. Anyways, I highly suggest reading his short paper. It was intimidating as a grad student, but it had a strong impact on my research career.

MZ: Some people think that fields like math and physics have a lot of these grand open questions but not so much in the PL field. What's your take on this?

TM: I don't know if you've ever read Danny Hillis's thesis. Danny Hillis was behind the Connection Machine. His thesis was a beautifully written piece. The idea behind the Connection Machine was a computer where you have a lot of little processors all interconnected in a grid. It was an interesting execution model for parallel programs at the time. In the last chapter of his thesis, he has a section titled "A New Hope for a Science of Computation". In it he states: "For example, in classical physics, most quantities are continuous. As physicists probe deeper and lower into more fundamental levels of reality, things begin to look discrete. The physicist of yesterday measured. The physicist of today counts. In computation things are reversed. We have begun in the other direction, and, because we have begun only recently, we have not gone very far. This is one of the reasons that computer science seems to be "no good". We have not gotten beyond counting. Knowing the lowest level rules is good, but it is in no way sufficient. Quantum chromodynamics is not much use in designing bridges. Computer science is not much use in designing computers." I love this quote. Ultimately, if I look at what I do in my day-to-day job, I study experimental phenomena. That's largely what performance analysis, particularly of big systems, entails. Even though it's a man-made system, the performance of a computer is sufficiently complex that I posit we cannot explain its behavior without measuring it! I'm not saying that I have some overarching description of an analogy to physics, but I think we're now starting to realize that we need to get into this realm of measurement in computer systems, even though they are man-made. The "physics" of our systems is an important topic.

In fact, that is one of the things I worked on when I was a PhD student. I have a paper in the physics journal Chaos with my co-advisor and Liz Bradley at the University of Colorado. We used hardware performance monitors to measure a computer's performance on a set of benchmarks. We showed that such a system experimentally obeys the classical definition of chaos. There are all sorts of very interesting physical phenomena going on in the machines we've created today. The fact that we can't describe them is, I think, where part of the research challenge comes in. I think we're already at the point where the machines we've created are beyond our understanding. So, we have nothing to do but resort to measurements, right? In Danny Hillis's thesis, he talks about how you can think about physics in the context of computation, and there's a physical property that emerges from the machine he created, in the way that he studied it. So, I think there's a lot to be studied and said with respect to the physics of computers. From a performance standpoint, I don't necessarily agree that there aren't open problems. Read section 7.2 of his thesis -- it's a beautifully written section! It was a belief that I already held, but it was nice that someone else held it as well.