Long ago (1988) I moved to Berkeley and started sending a monthly "newsletter" to my Boston friends. When I returned to Boston (1993), I continued the tradition for about five more years (or until I had kids). Looking back, I realize that I was actually blogging. Each newsletter contained anywhere from a few to several blog posts. Having been silent for the past decade or so, I've decided to resume these activities. Don't expect anything profound -- I tend to focus on what I find entertaining or amusing and perhaps sometimes informative. We shall see!

Sunday, June 25, 2023

How to Present your Research: Part 4: Presenting Results

We've talked a lot about storytelling so far, and I bet you're all wondering, "OK, storytelling is fine, but I have some actual research results I want to show!  Let's get to that." And indeed, we will. But, guess what -- each result is a mini story.

A result story has three basic parts: 1) Why am I showing these results? 2) How did I obtain these results? 3) Here are the results.  Let's dive into each one.

Why am I showing this to you?

Each experiment you perform or result you show should be either answering a research question or showing evidence to support/refute a hypothesis. I think these two are really the same thing, so for the rest of this post, we'll call them research questions (RQ for short).  Note that it has become a lot more common in recent publications (even in systems and ML) to explicitly state your research questions. I encourage you all to do this before you write any experiment code -- it's great research and mental discipline.

Before you dive into the details of the experiment, explain what RQ you are answering. This could be no more than a sentence: "We want to know if the overhead of OurGreatIdea is acceptable, so we conducted N experiments to determine if we were able to do SomethingAmazing without introducing more than X% overhead."  Wait -- notice how I snuck that goal in at the end? I think that 99% of the research out there asks the question, "How much overhead does something introduce?" And then, regardless of what that overhead is, we declare victory!  That's not exactly science. I urge you to think (before you run any experiment), "How much overhead is acceptable?" Once you do that, then you can decide if your overhead is great, good, acceptable, almost there, or terrible.

But, I digress. Sometimes, you simply have to state the RQ. Sometimes, you might want to go into a bit more detail. Some of the greatest insight comes from ablation studies; these answer the question, "Which of the N things I did accounts for the great results we get?" Let's say that you built a system that introduced three optimizations. People will wonder both A) How much does each thing matter? and B) Do the results compound? In the case of such an ablation study, you might want a slide to review the various things that you did and then explain that you're going to introduce each one by itself or remove each one and leave the others in. Depending on your system, one of those is likely to make more sense. If you can do both, go for it!
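
To make the ablation idea concrete, here is a minimal sketch in Python. The optimization names and the run_benchmark function are made-up placeholders, not any real system's API; it just shows the leave-one-out and add-one-in variants side by side.

    # Hypothetical sketch of an ablation study; run_benchmark and the
    # optimization names are placeholders, not a real system's API.
    OPTIMIZATIONS = ["batching", "caching", "prefetching"]

    def run_benchmark(enabled):
        # Placeholder: replace with your actual experiment. Returning the
        # number of enabled optimizations just makes the sketch runnable.
        return len(enabled)

    full = run_benchmark(set(OPTIMIZATIONS))  # baseline: everything on

    # Leave-one-out: remove each optimization and keep the others in.
    for opt in OPTIMIZATIONS:
        without = run_benchmark(set(OPTIMIZATIONS) - {opt})
        print(f"without {opt}: {without} (full system: {full})")

    # Add-one-in: introduce each optimization by itself.
    for opt in OPTIMIZATIONS:
        alone = run_benchmark({opt})
        print(f"only {opt}: {alone}")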

A cousin of the ablation study answers the question, "How much time do we spend in each part of our BigComplexSystem?" Usually, you can just explain that with a single sentence; sometimes you'll want an architectural picture to explain it.

Regardless of the complexity of the experiment, please explicitly tell the audience what question the result you are about to present is going to answer.

How did I do this?

Explain the experiment!  It sounds easy enough, but you'd be amazed how many times I am shown a graph with no explanation of: A) the data used, B) the workload run, or C) any details about whether I am seeing averages of many runs, a single run, the best of many runs, etc. So, describe your experiment.

In a paper, you will want to give precise details about the machine on which you're running. In a presentation, I think it's fine to categorize the platform, e.g., "We ran this on my laptop." or "We ran this on a huge server." or "We ran this on a typical server with a Big Honkin' GPU."  You can go into more specifics, but I think just putting the details on the slide is more helpful than saying aloud the particular model of processor you are using (no one can wrap their heads around that verbally; OK, maybe someone can, but I do not think most people can).  Typically, people want to know how much main memory you had and whether you are running a modern machine. If you are doing storage work, then of course they want to know something about the storage system too.  If you are doing ML, they may care what GPU you are using. Think carefully about what your audience wants to know and the best way to present it (on the slide, but say nothing; on the slide, and highlight key features; on the slide, and read it).

Describe the workload!  Tell the audience whether this is a standard benchmark in your field or one you made up (and if you made it up, you are going to have to justify why). Is this a microbenchmark or a macrobenchmark?  Is there input data? Where did that data come from? What exactly are you going to show? The average of N runs? One run? One cherry-picked run? A number you made up?  Sometimes, you might even say, "Here is what we expect a good system to do on this benchmark."  Then you can compare that strawman to your actual results. Alternatively, you might say, "To the best of our knowledge, the best system for this is X, and here is how it does on this benchmark." Again, then you can compare your result.
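
As a deliberately tiny illustration of the "average of N runs" point, here is a Python sketch; run_once is a stand-in for whatever your actual workload driver is, and the numbers mean nothing.

    # Hypothetical sketch: report a summary of N runs rather than a single
    # (possibly cherry-picked) number. run_once is a placeholder.
    import statistics

    def run_once():
        # Placeholder for one run of your workload; return elapsed seconds.
        return 1.0

    N = 10
    times = [run_once() for _ in range(N)]
    print(f"mean = {statistics.mean(times):.3f}s, "
          f"stdev = {statistics.stdev(times):.3f}s over {N} runs")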

Your goal is that when you finish describing your experiment, the audience is excited to see your results.

My Results

Tell me what I'm looking at! Again, this seems obvious, but I have sat through too many talks where a speaker excitedly tells me how great these results are before I have any clue what I'm looking at. So, breathe ...

Most results (in systems) are either tables or graphs/charts (I'll use the term graph here to describe a visual depiction of data). In either case, describe what the audience is looking at. For example:

Graphs

This graph has throughput on the Y axis and the number of cores on the X axis (if you are using a log scale, say that explicitly here, even if it's in the axis labels, which it should be). So, big bars (higher numbers) are better. [Please, always add this last part!]  Sometimes I just show the structure of the graph with no data as I explain this.  That allows the audience to understand what they are going to see before they get distracted by the actual data. The error bars show standard deviations, the middle 50 percent, whatever -- tell me!
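
If it helps to see that in code form, here is a minimal matplotlib sketch of the kind of graph described above (matplotlib is just my assumption; use whatever plotting tool you like, and every number below is a made-up placeholder, not a real result):

    # Minimal sketch of a throughput-vs-cores bar chart with labeled axes
    # and error bars; all values are placeholders, not real results.
    import matplotlib.pyplot as plt

    cores   = ["1", "2", "4", "8", "16"]   # X axis: number of cores
    mean_tp = [10, 19, 37, 70, 120]        # Y axis: throughput (ops/sec)
    stdev   = [1, 2, 3, 5, 9]              # error bars: one standard deviation

    fig, ax = plt.subplots()
    ax.bar(cores, mean_tp, yerr=stdev, capsize=4)
    ax.set_xlabel("Number of cores")
    ax.set_ylabel("Throughput (ops/sec)")  # say out loud: higher is better
    ax.set_title("Throughput vs. cores (error bars: one standard deviation)")
    fig.savefig("throughput_vs_cores.png")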

Tables

Most of the same rules apply here. Each row represents something, and the columns correspond to something else. Tables of numbers are relatively difficult to comprehend, so I encourage highlighting the things you want the audience to take away from them -- embolden the best value in each row/column; color-code good/bad/mediocre results. Basically, help the audience take away the message you want them to take away from your results.
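
One way you might do that highlighting is sketched below, using pandas (just one option among many; the workload names and numbers are placeholders):

    # Sketch: bold the best number in each row of a results table before
    # putting it on a slide. All values here are placeholders.
    import pandas as pd

    results = pd.DataFrame(
        {"Baseline": [12.0, 30.5, 8.2], "OurSystem": [9.1, 22.4, 8.0]},
        index=["Workload A", "Workload B", "Workload C"],
    )

    def bold_row_best(row):
        # Lower is better here (e.g., latency); adjust for your metric.
        return ["font-weight: bold" if v == row.min() else "" for v in row]

    styled = results.style.apply(bold_row_best, axis=1)
    styled.to_html("results_table.html")  # render, then screenshot for the slide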

Takeaways

Draw the listener's attention to the things you want them to get from the figure. Do not assume that your results are so obvious that they will get exactly what you want them to! Inevitably, they will draw a wrong conclusion or one different from the one you wanted them to draw.  Walk them through the results.  I like to compare and contrast with what we might expect. Some examples:

We expected that we would scale well until we were running one thread on each socket, but we found that we actually scaled well until we were running one thread on each core. As we'll see in the next graph, our cache footprint was small enough that the data fit nicely in the per-core caches. [This is a great technique to explain WHY you get the results you do -- show a macrobenchmark result and then follow it up with a microbenchmark result that explains the results you got in the macrobenchmark.]

We had hoped that participants using our tool would complete the task more quickly. While they did produce better solutions (point to the part of the picture that shows that), we noticed that they actually took longer. [When you have a surprising or unexpected result, this usually indicates something interesting, even if it didn't match your intuition. In this case, perhaps it's something like, "Our qualitative survey results suggest that people found our tool so much more enjoyable to use that they were willing to spend more time using it to produce high-quality solutions; in contrast, the old tool was so painful to use that participants stopped as soon as they produced anything close to correct."]

In my speaker notes, I almost always have a numbered list of 3 +/- 1 key points that I want the audience to take away from any result I present.  I do not write the details of those points in the notes, because I want the timing of my explanation to match the audience's ability to comprehend what I'm saying. If you write out the exact explanation, you will almost always deliver it too quickly for the audience to absorb. If the speaker has to think about it, their delivery almost always better matches the audience's receptive timing. I think this point is crucial!  When we are nervous, we almost always start to speak more quickly and often become quieter towards the ends of our sentences. You want to do everything possible to avoid this, especially when presenting results.

Key Takeaways:

  1. Don't forget any of the three parts: the question I'm answering, how I answered it, and what I found.
  2. Explain the format of the results you are presenting.
  3. Use microbenchmarks to explain the results of macrobenchmarks.
  4. Highlight surprising results.