SXSW 2009 Notes: Designing for Wisdom of the Crowds - William Hertling's Thoughtstream

Derek Powazek spoke on Designing for Wisdom of the Crowds at SXSW Interactive 2009. He graciously posted the full slides. It also turns out that Derek works for HP’s MagCloud, a magazine publishing site. Here are my takeaways from his talk.

Wisdom of the Crowds began with Francis Galton. He observed a contest in which people had to guess the weight of a cow. Their individual guesses were off, but the average guess was 1209 pounds, and the actual weight was 1198, less than 1% off.
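
As a quick check on the "less than 1% off" figure, here is a minimal sketch that uses only the two numbers quoted above. The variable names are mine, not from the talk.

```python
# Figures as quoted in the talk notes above.
average_guess = 1209   # pounds, the crowd's average guess
actual_weight = 1198   # pounds, the measured weight

# Relative error of the crowd's average, as a percentage of the true weight.
error_pct = abs(average_guess - actual_weight) / actual_weight * 100

print(f"The crowd was off by {error_pct:.2f}%")
```

The crowd's average misses by 11 pounds, which is a bit over nine tenths of one percent.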

The question is how to apply the wisdom of crowds to create better communities online. When you look at web forums, you see lots of stupidity. But when you look at the most emailed stories on a news site, where the crowd is telling you which stories are the most interesting, the crowd does an effective job of picking stories.

Elements of wise crowds are:

Diversity
Independence (avoid group think)
Decentralization
Aggregation

Elements of bringing Wisdom of Crowds online are:

Small simple tasks
Large Diverse Group
Design for Selfishness
Result Aggregation

Small simple tasks:

One way that things can fall apart is by making the task too complicated. A blank comment form invites chaos. What you want is something with a specific output value, like a rating from 1 to 10, or a thumbs up/thumbs down.
Good examples of this include the T-shirt design site Threadless, and HotOrNot. (don’t visit the latter link from work.)
But a bad example is the initial launch of Wired Magazine’s Assignment Zero. They asked people to write news stories. People were interested in the idea, but when it came time to write an article, they were like “whoa, this is a lot of work”. So they changed the process midstream by shrinking the tasks: First, ask the users whom to interview. Second, ask which users would sign up to interview those people. Third, ask who would sign up to take the interview notes and write articles. Fourth, they hired editors to turn raw articles into magazine-quality articles.

Large Diverse Groups

Bad example #1: Groupthink at NASA led to a conclusion that it was safe to launch because everyone else thought it was safe to launch. It was inconceivable to think that it wasn’t safe to launch.
Bad example #2: Chevy Tahoe solicited input for advertisements. The only people motivated enough to contribute were environmentalists who submitted counter-advertising. Actual Tahoe fans weren’t motivated enough.
You want to encourage diverse groups to participate.

Design for Selfishness

Large groups of people aren’t going to contribute if they get nothing out of it. Is it worth my time? What do I get out of it?
Threadless: get $2,500 if you submit a winning design.
Google PageRank: people create web site links for their own reasons, not to help Google build a billion dollar business, but Google PageRank is ultimately dependent on those links.
Flickr Tags: people don’t tag photos to help Flickr, they tag to organize their photos. Flickr builds on top of that so that not only can they serve up photos by tag, but they can also divide the photos into clusters, so that the tag “apple” can be identified as meaning either computers, fruit, or NYC.

Result Aggregation

Favrd: gets favorited tweets from Twitter and aggregates them so you can see what the most favorited tweets of the previous day are.
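
To make the aggregation step concrete, here is a minimal sketch of counting favorites and listing the most favorited items. This is my own illustration, not Favrd's actual implementation. The function name and the sample tweet IDs are made up.

```python
from collections import Counter

def most_favorited(favorite_events, top_n=3):
    """Count how often each tweet ID was favorited and return the top few.

    favorite_events is a list of tweet IDs, one entry per favorite.
    """
    counts = Counter(favorite_events)
    return counts.most_common(top_n)

# Hypothetical data: each entry means "someone favorited this tweet".
yesterday = ["t1", "t2", "t1", "t3", "t1", "t2"]

for tweet_id, favorites in most_favorited(yesterday):
    print(tweet_id, favorites)
```

Each person only does one small thing (favoriting a tweet), and the site does the summing.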

Heisenberg Problem

Once we create a leaderboard, it creates a new motivation: people will try to get onto the leaderboard, regardless of whether they are contributing in a positive way. It creates an incentive for bad behavior.
Example: Flickr used to show an absolute ranking of interesting photos, which caused people to spam their photos into many groups. The correction was to show a random selection of interesting photos. Now there is less motivation for someone to compete/spam/game the system to get into the #1 slot, because there is no #1 slot. (Gaming the system was a recurring discussion theme all week.)
Also, show results only after voting is complete. Threadless shows voting results for T-shirt designs only when the week is done and all votes are in, not at all during the week.
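
Both corrections can be sketched in a few lines. This is only an illustration under my own assumptions, not how Flickr or Threadless actually implement it. The function names and parameters are hypothetical.

```python
import random

def showcase(interesting_photos, k, rng=random):
    """Return k photos picked at random from the pool of interesting photos.

    Because the result is a random sample and not a ranking,
    there is no #1 slot to fight over.
    """
    pool = list(interesting_photos)
    return rng.sample(pool, min(k, len(pool)))

def visible_tally(votes, voting_closed):
    """Return the vote counts only once voting has closed, otherwise None.

    votes is a list of design names, one entry per vote.
    """
    if not voting_closed:
        return None  # nothing to show while voting is still open
    tally = {}
    for design in votes:
        tally[design] = tally.get(design, 0) + 1
    return tally

# Hypothetical usage:
print(showcase(["sunset", "bridge", "cat", "market"], k=2))
print(visible_tally(["robot", "owl", "robot"], voting_closed=False))
print(visible_tally(["robot", "owl", "robot"], voting_closed=True))
```

In both cases the data is still collected. What changes is what gets shown back to the crowd, and when.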

Popularity does not have to rule

Amazon.com reviews for Battlestar Galactica show the most helpful favorable review and the most helpful critical review. The combination of the two is more informative than showing only the single most helpful review, because that would be unbalanced. And a histogram of reviews shows you, quantitatively and visually, how many reviews fall into 1, 2, 3, 4, or 5 stars. That gives you a good picture, again more helpful than just reading the most positive, most negative, or most popular review.
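
Here is a minimal sketch of that presentation: a star histogram plus one favorable and one critical review. It is not Amazon's implementation. The Review class, the sample data, and the cutoffs (4 stars and up counts as favorable, 2 stars and down as critical) are all my assumptions.

```python
from collections import Counter
from dataclasses import dataclass

@dataclass
class Review:
    stars: int          # 1 to 5
    helpful_votes: int  # how many readers marked the review helpful
    text: str

def star_histogram(reviews):
    """Return how many reviews fall into each star rating, 1 through 5."""
    counts = Counter(r.stars for r in reviews)
    return {stars: counts.get(stars, 0) for stars in range(1, 6)}

def balanced_pair(reviews, favorable_min=4, critical_max=2):
    """Return (most helpful favorable review, most helpful critical review).

    Either element is None if no review falls in that group.
    The cutoffs are assumptions made for this sketch.
    """
    favorable = [r for r in reviews if r.stars >= favorable_min]
    critical = [r for r in reviews if r.stars <= critical_max]
    top_favorable = max(favorable, key=lambda r: r.helpful_votes, default=None)
    top_critical = max(critical, key=lambda r: r.helpful_votes, default=None)
    return top_favorable, top_critical

# Hypothetical sample data.
SAMPLE_REVIEWS = [
    Review(5, 40, "Best show in years."),
    Review(5, 12, "Loved it."),
    Review(4, 7, "Strong start, uneven middle."),
    Review(2, 25, "Too bleak for me."),
    Review(1, 3, "Not my thing."),
]

# Print a simple text histogram, five stars first.
for stars, count in sorted(star_histogram(SAMPLE_REVIEWS).items(), reverse=True):
    print(f"{stars} stars: {'#' * count}")

favorable, critical = balanced_pair(SAMPLE_REVIEWS)
print("Most helpful favorable:", favorable.text)
print("Most helpful critical:", critical.text)
```

Sorting everything by helpful votes alone would surface only the 40-vote rave. Picking the top review from each side keeps the critical voice visible.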

Implicit versus Explicit Feedback

Explicit feedback is voting and rating. You are asking the audience to make an intentional decision. Threadless, Digg, Hot or Not, Zen, Amazon. The goal here is never to ask people to do more thinking than is necessary. If thumbs up/thumbs down will work, that’s enough. If 1 to 5 rating will work, don’t do a 1 to 10 rating.
Implicit feedback is pageviews, searches, velocity, interestingness, clickstream data. You can get more useful, better data when you don’t ask people a direct question.

(Personal aside: My passion is all around the implicit data…)

Design Matters: How you ask questions changes the answers you get

Two versions of Kvetch: the early dark version, and the later white version.
The 1997 version was all dark and black. And the comments were dark, as in “I want to kill my teacher”. But the site was supposed to be funny, so what was happening?
The later version of the site was white, with an open, airy design. Same text. The submitted comments became funny and lighthearted.
Red versus blue: In a psychological test, they changed only one thing, the color of the border surrounding information. The blue group did better on tests of creative work, the red group did better on tests of recall. Not just a little better, but hugely better. We associate red with danger and mistakes. People try to avoid mistakes. Red creates a fear response, people don’t want to mess up, so they pay attention to detail. Blue is cooler, more relaxing, and people connect to emotional content much better.

Seeing Things

Our brains work to create a story in our head based on inputs. If some of those inputs are missing, the brain works twice as hard to create a story that makes sense.
Fighter pilots: when they undergo G-forces that starve the brain of oxygen, they experience vivid hallucinations that contain a tiny part of reality but are mostly made up.
In online situations, we lack most of the data we would have in the real world: facial expressions, sounds, etc., and all that is left is lines of text on the screen. So our brains work really hard to make up a story. People make up a story when they are deprived of the data.
They did a study with two groups of people. The “in-control” group goes into a room, answers questions, and is told they are always right. The “out-of-control” group goes into a room, answers questions, and is told they are always wrong. Then each group is shown a chaos picture, such as static or random clouds. When presented with the picture, the in-control group said there was nothing. The out-of-control group saw all sorts of things that weren’t there.
Then they did a followup. They had the out-of-control group tell them a story about their morning or something they were passionate about. Then they showed the chaos pictures to those people, and the people said there was nothing there.