Berkeley Andrus: I am working on an intent-mapping project to facilitate more flexible speech interfaces. My specific application is video games, where a player will be able to talk to a game and have it respond intelligently. My goal is to be able to take any number of spoken inputs (“Shoot him!”, “Take down those hostiles!”, “Target the sniper!”) and map them to a finite set of predefined commands ([Attack, enemies]).
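To make that target representation concrete, here is a toy sketch of the mapping I want the system to learn. The command vocabulary below is my own illustration for this post, not the project's actual command set:

```python
# Toy illustration of the intent-mapping goal: many free-form utterances map
# onto a small, fixed set of [command, target-type] pairs. The command
# vocabulary here is illustrative, not the project's actual command set.
commands = [
    ["Attack", "enemies"],
    ["Go", "vehicle"],
    ["Heal", "ally"],
]

# The kind of mapping the model is expected to learn:
examples = {
    "Shoot him!": ["Attack", "enemies"],
    "Take down those hostiles!": ["Attack", "enemies"],
    "Target the sniper!": ["Attack", "enemies"],
    "Go to the cruiser!": ["Go", "vehicle"],
}

for utterance, label in examples.items():
    assert label in commands
    print(f"{utterance!r} -> {label}")
```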

As you can imagine, I had a hard time finding data. I looked for existing data sets using terms like “Video game dialog”, “Player instructions”, and even “Examples of imperatives”. Once I accepted that I wasn’t going to find the data set I needed wrapped in a nice little bow, I started brainstorming how to make data.

My first attempt was to generate data with a Python script. I came up with beginnings of sentences (“Go to”, “Get out of”, “Shoot”, “Heal”) and ends of sentences (“the hospital”, “the cruiser”, “my position”). Then I simply concatenated them to generate labelled test cases (“Go to the cruiser” got labelled as [Go, vehicle]). I ended up with 480 test cases, most of which actually sounded like plausible speech.
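The generator itself was nothing fancy. A minimal sketch along the same lines (the phrase lists and labels below are illustrative stand-ins, not my original ones) looks like this:

```python
# Minimal sketch of the template-based generator: pair sentence beginnings with
# sentence endings and label each combination with a [command, target-type]
# pair. The phrase lists and labels are illustrative stand-ins.
beginnings = {
    "Go to": "Go",
    "Get out of": "Leave",
    "Shoot": "Attack",
    "Heal": "Heal",
}
endings = {
    "the hospital": "building",
    "the cruiser": "vehicle",
    "my position": "location",
}

test_cases = [
    (f"{start} {end}", [command, target])
    for start, command in beginnings.items()
    for end, target in endings.items()
]

print(test_cases[1])  # ('Go to the cruiser', ['Go', 'vehicle'])
```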

Then my adviser sent me an even better data source: http://wowwiki.fandom.com, which had loads of NPC dialog. I manually went through pages and pages of dialog and copied down anything that could be loosely interpreted as an imperative, putting labels on each sentence. The data was the reverse of what I wanted – it was things NPCs said to the player rather than the other way around – but at least it was natural, in-the-wild text instead of something I had written myself.

I ran with those two data sets, along with another smaller set my adviser had hand-written before I took over this project. Fast forward four months, and I was getting close to a good paper. My methods were working great on my three data sets. The only thing missing was an actual test on human-generated data. (Lab members do not count as ‘human’ in this case, for obvious reasons. No offense.)

It was time for me to collect some actual data. In an ideal world I would have recorded people playing video games and transcribed everything they said. I didn’t have the resources to do that, so instead I simulated it through a carefully designed digital survey. I took screenshots from Call of Duty and circled important elements in the screenshots, including allies, enemies, vehicles, buildings, etc. I then attached prompts to the screenshots, such as “What might you say to your teammate to get them to protect the circled person?” The respondent’s answers became my test cases, and my prompts gave me my labels. I ended up with 400 usable test cases to add to my paper.
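Pairing responses with labels was then mostly bookkeeping, since each prompt was written with its label already decided. A rough sketch of that step, assuming a CSV export with one column per prompt (the column names and label scheme here are my own assumptions, not the actual export format), might look like:

```python
import csv

# Hypothetical mapping from survey question columns to the label each prompt
# was designed to elicit. Both the column names and the labels are assumptions
# for illustration, not the actual survey export format.
prompt_labels = {
    "protect_circled_person": ["Protect", "ally"],
    "attack_circled_person": ["Attack", "enemies"],
    "enter_circled_vehicle": ["Go", "vehicle"],
}

test_cases = []
with open("survey_export.csv", newline="") as f:
    for row in csv.DictReader(f):
        for column, label in prompt_labels.items():
            response = row.get(column, "").strip()
            if response:  # respondents were allowed to skip questions
                test_cases.append((response, label))
```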

Example question from the survey we conducted

Now that you have the context of what I was able to accomplish, let me share some lessons I learned about survey administration and data collection:

Lesson 1: Pick the right tool for the job

I have a bad habit of not picking the right tool for the job, in my research and in life. I am much more comfortable twisting the tools I already have to meet my needs than finding a new tool that was actually designed for the thing I want to do. So my first thought after deciding to conduct a digital survey was to write a website from scratch, find some cheap way to host it, and collect data that way. Luckily I abandoned that idea in favor of one that was marginally better: Google Forms. I created a new Form and started writing questions, but quickly realized that Forms wouldn’t support the functionality I needed. I wanted to write 40+ questions, but I wanted each user to see only a random subset of them. On top of that, adding pictures was a bit of a pain, and I didn’t want to repeat the process 40+ times.
I finally decided to swallow my pride and learn a new tool. I looked at SurveyMonkey first, but it had the same problems as Google Forms. Then I found Qualtrics.

I had taken Qualtrics surveys before but never made one. So I signed up for the free 30-day trial and started digging around. It took me the better part of an afternoon, but before too long I was comfortable with the software and had finished creating my survey.

Another tool I found along my journey was SurveyCircle. I had been on Reddit, Facebook, and Slack begging people to take my survey. I thought of the dozens of times people had asked me to take their surveys and began wishing I had earned some more karma from the universe by actually responding to them.

That’s when I got an email from Jonas Johé, founder of SurveyCircle. He told me about his company’s service, where you basically trade responses to surveys. I took a few surveys that others had posted, which earned me enough points that my survey started showing up in their search results. It was a great solution to a problem I hadn’t even realized I had.

Long story short: take the time to think about how you are going to create and distribute your survey. There are lots of tools that are free and easy to learn. There are also tools that are expensive and hard to learn, but give you more power to create the thing you really want to create. Weigh the pros and cons and make a deliberate decision instead of just going with what you already know.

Lesson 2: Think about your respondent’s experience

When creating your data-collection survey, there are two goals you should keep in mind. First, you want unbiased, high-quality data. In my case that meant avoiding the words I secretly hoped respondents would use. For example, I knew my model would perform better if users said ‘Attack’ rather than one of its synonyms, so I deliberately kept the word ‘Attack’ out of my prompts to make sure I wasn’t skewing the responses I received. Getting high-quality data also meant providing the right amount of context – not too much and not too little – to help people understand how to answer questions and why I was collecting this data in the first place.

The second goal you should keep in mind is that you want your data to exist. In other words, make your survey as painless as possible to maximize the number of responses that you get. In my survey, I knew that some respondents would feel unprepared to answer questions about video games. To ease their trepidation, I ended my survey’s instructions by saying “There is no right answer – the goal of this survey is to record a range of possible answers that reflect how people communicate with one another. If you do not understand a question or cannot tell what an image depicts, feel free to skip it and move on to the next one.”

I don’t have any empirical evidence that this was the right thing to say, but I feel it’s important to anticipate your respondents’ discomfort and help them feel good about whatever answers they submit.

The other big thing you can do to make your respondents comfortable enough to respond to your questions is to keep your survey short. People have a short attention span, especially when they’re providing a free service to a stranger on the internet, so limiting the time you take out of their day is key to getting cooperation.

Put on your UX hat, get inside the mind of your respondents, do some user testing if you have time, and give people the best experience you can.

Lesson 3: Don’t be afraid to hit the pavement

I’m a natural introvert, and one of my favorite things about computer science research is sitting in the same chair in the same lab hour after hour, day after day. When I was distributing my survey, I started by spreading it online because that’s what I was comfortable with.

However, even with the help of SurveyCircle, I wasn’t getting as many responses as I needed. So on the third day of data collection, I decided to step outside my comfort zone. I reserved a table at BYU’s student center, enlisted the help of my wife and two-month-old daughter, and got ready to collect survey responses face-to-face.

We set out two laptops, a big bowl of candy, and a poster asking for volunteers. My wife and I sat behind the table for three hours, smiling and waving at passers-by, trying to attract attention for our cause. And, to my surprise, people came! We got more survey responses during that short window than we did during any other time the survey was open. A few of the participants were friends from classes who came to say hi and then felt obligated to help. But a lot of them were strangers who were attracted by the candy, by our cute baby, or in some cases by the opportunity to help contribute to a research paper.

Table with computers and a bowl of candy where DRAGN Lab members conducted a survey

As if the higher quantity weren’t enough, our in-person survey solicitation led to higher-quality data as well. Online we got our fair share of trolls and zero-effort replies that had to be thrown out. But we didn’t have any of that with the in-person respondents. Every response we got was top-notch.

Conclusion

In short, creating my own data set took time, but it was worth it. And if I’m being completely honest, it was a little bit of fun. Sometimes we get lucky as researchers and find the exact data set we’re looking for, but we shouldn’t be afraid to go out and collect data ourselves.

Leave a comment and let me know what you’ve learned about data collection. What tools did you use? What decisions made a difference?
