By now I’m sure everyone knows about how the NSA is spying on virtually every email coming in or out of the country, and nearly everyone that’s connected, even indirectly, to anyone even vaguely suspicious. If you’re not sure why we should care about privacy, read this piece by Cory Doctorow.

Brad Feld posted this morning about Lavabit committing corporate suicide. Lavabit is the company that provided Edward Snowden with secure email, and it was being forced by the US government (presumably) to violate its privacy and security agreement with its users. Rather than compromise security, the company chose to end business operations.

What I found particularly interesting was the comment thread, in which Brad’s readers were asking him to take a stance, and he said that he didn’t yet know what action to take (more or less).

During my drive to work, I started brainstorming what possible actions might be. I don’t know what would be effective, so consider this nothing more than a list of raw ideas.

  • Donate to the Electronic Frontier Foundation (EFF). More than anyone else, they are the point organization for privacy and security on the Internet. They're organizing information and fighting legal cases.
  • Don’t make it easy for people to spy on you. While we should assume that our emails, web browser activity, and everything else is widely available (both for legitimate government use as well as abuses of that power), we can still take steps to make it more difficult to be spied on. Some of these include:
  • If you are running a business, reconsider your use of cloud services. Although that's the direction we've all been heading in the last few years, is it worth the potential risk? How would you be affected if your private business correspondence, plans, and data were leaked to random folks, including your competitors? For many years the argument in favor of cloud computing was that you could leave the security to the professionals. Now that we know virtually every cloud provider can be compelled to hand over data, or is otherwise compromised, that argument no longer holds. (Consider also that many companies host on AWS. If Amazon is providing data to the NSA, then every company using AWS is also compromised.)
  • If you're an investor in a tech startup, consider that company's cloud strategy. Is privacy or security an integral aspect of what they're offering? That doesn't just mean companies selling privacy or security as a product, but any product whose value is threatened or diminished without privacy. If so, the company should strongly consider hosting in a privacy-friendly country, like Sweden, and might be better off being located outside the US entirely. For example, we can't even begin to comprehend how genetic data might be used in the future. I'd like to know where my 23andMe data is housed. (Given that Google is an investor, is it on Google servers? Great, now the NSA has my genetic profile.)
Any other ideas?

The Obama for America AWS infrastructure diagram is a bundle of awesome. It clearly shows the incredible web application architecture the Obama campaign was able to create in a very short time.

A tiny portion of the AWS Obama for America 2012 infrastructure diagram

It's been available online at awsofa.info for a while as a nice, browsable, Google Maps-powered way to explore the intricacies of the diagram.

However, if you've tried to print it, it's next to impossible. The browser's File->Print just captures what's onscreen, so you're forced to choose between a tiny portion of the diagram in high resolution or the entire diagram at a low, unusable resolution.

I really wanted to print the whole diagram on a plotter, and after playing with the HTML for an hour, I got a good 24″x200″ print hanging on the wall at my office.

Now hanging on the wall at my work.

I'm sharing the high-resolution PDF that was an intermediate step in my workflow. It's 27″x240″, but if you select "scale to paper size" on a plotter, it should print well at any large size. Note that it's considerably wider than it is tall, so it's helpful to set a custom page size.
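If you'd rather build your own print file straight from the online viewer, one generic approach is to fetch and stitch the underlying map tiles with a small script. To be clear, this is a hedged sketch and not the workflow I used (I massaged the HTML by hand), and the tile URL pattern below is a placeholder you'd have to discover yourself, for example via the browser's network inspector:

```python
# Hypothetical sketch: stitch map tiles into one large printable image.
# The tile URL pattern and grid extents are placeholders, not the real values
# used by awsofa.info.
from io import BytesIO

import requests
from PIL import Image

TILE_URL = "https://example.com/tiles/{z}/{x}/{y}.png"  # placeholder pattern
TILE_SIZE = 256              # standard map tile size
ZOOM, COLS, ROWS = 5, 40, 5  # assumed extent of the diagram at this zoom

mosaic = Image.new("RGB", (COLS * TILE_SIZE, ROWS * TILE_SIZE), "white")
for x in range(COLS):
    for y in range(ROWS):
        resp = requests.get(TILE_URL.format(z=ZOOM, x=x, y=y), timeout=30)
        tile = Image.open(BytesIO(resp.content))
        mosaic.paste(tile, (x * TILE_SIZE, y * TILE_SIZE))

mosaic.save("awsofa_full.png")  # hand this off to the plotter driver
```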

I hope this helps anyone trying to print it.

Update 2013-07-25: I just learned via JP that:

  1. Miles Ward is the author of the diagram. Thank you Miles for creating such an awesome, useful diagram.
  2. There’s a hidden easter egg in the diagram. Look for a Take 5 candy bar.
  3. You can download a PNG of the diagram courtesy of JP. 

The online world is buzzing with news of Elon Musk’s hyperloop, a very fast transportation system that could take people from LA to NY in under an hour. Elon will reveal the details of the system on August 12th, as he mentioned on Twitter.

He’s been talking about it for over a year, and has said:

“This system I have in mind, how would you like something that can never crash, is immune to weather, it goes 3 or 4 times faster than the bullet train. It goes an average speed of twice what an aircraft would do. You would go from downtown LA to downtown San Francisco in under 30 minutes. It would cost you much less than an air ticket or any other mode of transport. I think we could actually make it self-powering if you put solar panels on it, you generate more power than you would consume in the system. There’s a way to store the power so it would run 24/7 without using batteries. Yes, this is possible, absolutely.”

This widely shared concept photo is actually the Aeromovel, a pneumatic train system:

It’s pretty awesome stuff.

Small spoiler alert! Don’t read further unless you want to see a tiny bit of a scene from the last half of The Last Firewall.

Now the one problem with writing near-term science fiction is that stuff keeps coming true before I can get the books out. In this case, I have a vactrain in The Last Firewall. Leon and Mike must hijack the train to avoid detection. Here’s part of the scene where they discuss it:

“Now how do we get to Tucson?” Leon asked. “We are not driving again.”
Mike stared off into space. “I have an idea: the Continental.”
The supersonic subterranean maglev was an early gift from AI-kind to humans, running in a partial vacuum at a peak of three thousand miles an hour.
“The train only stops in LA and NY,” Leon said. “And besides, we’ll be listed on the passenger manifest.”
“There are emergency exits.” Mike pushed a link over in netspace. “And with your new implant, can you hack the manifest?”
Leon glanced at the shared news article, accompanied by a photograph of a small concrete building peeking out of a cactus covered landscape.
“Marana, Arizona, about a half hour north of Tucson,” Mike said. “Emergency egress number three.”
“So we hop on the Continental and trigger an emergency stop when we’re near the exit?”
“Exactly,” Mike said. “Think that hopped-up implant of yours can fool some train sensors?”

It's one of my favorite bits of technology in the book, and I was daydreaming about it before I even started writing the first draft. Now, thanks to Elon Musk, we may all get to ride in it.

ROS, the open source Robot Operating System, is accelerating development in robotics because scientists don't have to reinvent everything from scratch:

As an example of how ROS works, imagine you’re building an app. That app is useless without hardware and software – that is, your computer and operating system. Before ROS, engineers in different labs had to build that hardware and software specifically for every robotic project. As a result, the robotic app-making process was incredibly slow – and done in a vacuum.  

Now ROS, along with complementary robot prototypes, provide that supporting hardware and software. Robot researchers can shortcut straight to the app building. And since other researchers around the world are using the same tools, they can easily share their developments from one project to another. 
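To make the "same tools" point concrete, here's a minimal sketch of a ROS publisher node in Python, patterned on the standard rospy tutorial and assuming a ROS 1 installation. Any other lab's node can subscribe to this topic without knowing anything about the robot or code behind it:

```python
#!/usr/bin/env python
# Minimal ROS publisher node (rospy), modeled on the standard ROS tutorial.
import rospy
from std_msgs.msg import String

def talker():
    pub = rospy.Publisher('chatter', String, queue_size=10)
    rospy.init_node('talker', anonymous=True)
    rate = rospy.Rate(1)  # publish once per second
    while not rospy.is_shutdown():
        msg = "status update at %s" % rospy.get_time()
        pub.publish(msg)  # any node on the ROS graph can subscribe to 'chatter'
        rate.sleep()

if __name__ == '__main__':
    try:
        talker()
    except rospy.ROSInterruptException:
        pass
```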

I wrote a similar article last year about how we should expect to see an acceleration in both AI and robotics due to this effect. The remaining barrier to participation is cost:

The reason we haven't seen even greater amateur participation in robotics and AI, up until this point, has been the cost: whether it's the $400,000 to buy a PR2, or the $3 million to replicate IBM's Watson. This too is about to change.

It's about to change because the cost of electronics declines quickly: by 2025, the same processing capacity it takes to run Watson will be available to us in a general purpose personal computer. Robotics hardware might not decrease in cost as quickly as pure silicon does, but it will surely come down. When it hits the price of a car ($25,000), I'm sure we'll see hobbyists with them.

PR2 fetches a beer from the fridge

This is a great post and video about two robots making pancakes together. What’s amazing is that it’s not all preprogrammed. They’re figuring this stuff out on the fly:

James uses the Web for problem solving, just like we would.  To retrieve the correct bottle of pancake mix from the fridge, it looks up a picture on the Web and then goes online to find the cooking instructions. 

Rosie makes use of gravity compensation when pouring the batter, with the angle and the time for pouring the pancake mix adjusted depending on the weight of the mix.  The manipulation of the spatula comes in to play when Rosie’s initially inaccurate depth estimation is resolved by sensors detecting contact with the pancake maker.

1. Diagram from Google's patent application for floating data centers.

The technology in Avogadro Corp and A.I. Apocalypse is frequently polarizing: readers either love it or believe it’s utterly implausible.

The intention is for the portrayal to be as realistic as possible. Anything I write about either exists today as a product, is in active research, or is extrapolated from current trends. The process I use to extrapolate tech trends is described in an article I wrote called How to Predict the Future. I’ve also drawn upon my twenty years as a software developer, my work on social media strategy, and a bit of experience in writing and using recommendation engines, including competing for the Netflix Prize.

Let’s examine a few specific ideas manifested in the books and see where those ideas originated.

    • Floating Data Centers: (Status: Research) Google filed a patent in 2007 for a floating data center based on a barge. The patent application was discovered and shared on Slashdot in 2008. As with many companies, filing a patent application doesn't mean that Google will be deploying ocean-based data centers any time soon, but simply that the idea is feasible and they'd like to own the right to do so in the future, if it becomes viable. And of course, there is the very real problem of piracy.
Pelamis Wave converter in action.
    • Portland Wave Converter: (Status: Real) In Avogadro Corp I describe the Portland Wave Converter as a machine that converts wave motion into electrical energy. This was also described as part of the Google patent application for a floating data center. (See diagram 1.) But Pelamis Wave Power is an existing commercialization of this technology. You can buy and use wave power converters today. Pelamis did a full-scale test in 2004, installed the first multi-machine farm in 2008 off the coast of Portugal, is doing testing off the coast of Scotland, and is actively working on installing up to 170MW in Scottish waters.
Pionen Data Center. (Src: Pingdom)
    • Underground Data Center: (Status: Real) The Swedish data center described as being in a converted underground bunker is in fact the Pionen data center owned by Bahnhof. Originally a nuclear bunker, it sits nearly a hundred feet underground and is capable of withstanding a nuclear attack. It has backup power provided by submarine engines, triple-redundant backbone connections to the Internet, and fifteen full-time employees on site.
    • Netflix Prize: (Status: Real) A real competition that ran from 2006 through 2009, the Netflix Prize was a one million dollar contest to develop a better recommendation algorithm than Netflix's original Cinematch. Thousands of people participated, and hundreds of teams beat Netflix's algorithm, but only the first team to better it by 10%, the required threshold, received the payout. I entered the competition and realized within a few weeks that there were many other ways recommendation engine technology could be put to use, including a never-before-done approach to customer support that increased the helpfulness of support content by 25%.
    • Email-to-Web Bridge: (Status: Real) At the time I wrote Avogadro Corp, IBM had a technical paper describing how they built an email-to-web bridge as a research experiment. Five years later, I can't seem to find the article anymore, but I did find working examples of services that do the same thing. In fact, www4mail appears to have been operating since 1998.
    • Decision-Making via Email: (Status: Real) From 2003 to 2011, I worked in a position where everyone I interacted with in my corporation was physically and organizationally remote. We interacted daily via email and weekly via phone meetings. Many decisions were communicated by email. They might later be discussed in a meeting, but if a decision came down from a manager, we'd just have to work within its constraints. Through social engineering, it is possible to make those emails even more effective. For example: employee A, a manager, is about to go on vacation. ELOPe sends an email from employee A to employee B, explaining a decision that has been made and asking employee B to handle any questions about it. Everyone else receives an email announcing the decision and directing questions to employee B. The combination of an official email announcement plus a very real human contact acting as point person becomes very persuasive. On the other hand, some Googlers have read Avogadro Corp, and they've said the culture at Google is very different: they are centrally located and therefore do much more in face-to-face meetings.
Foster-Miller Armed Robot (Src: Wikipedia)
  • iRobot military robots: (Status: Real) iRobot has both military bots and maritime bots, although what I envisioned for the deck robots on the floating data centers is closer to the Foster-Miller Talon, an armed, tank-style robot. The Gavia is probably the closest equivalent to the underwater patrolling robots. It accepts modular payloads, and while it’s not clear if that could include an offensive capability, it seems possible.
  • Language optimization based on recommendation engines: (Status: Made Up) Unfortunately, not real. It's not impossible, but it's also not a straightforward extrapolation; there are hard problems to solve. Jacob Perkins, CTO of Weotta, wrote an excellent blog post analyzing ELOPe's language optimization skills. He divides the language optimization into three parts: topic analysis, outcome analysis, and language generation. Although challenging, topic analysis is feasible, and there are off-the-shelf programming libraries to assist with it (see the sketch after the quote below), as there are for language generation. The really challenging part is the outcome analysis. He writes:

    “This sounds like next-generation sentiment analysis. You need to go deeper than simple failure vs. success, positive vs. negative, since you want to know which email chains within a given topic produced the best responses, and what language they have in common. In other words, you need a language model that weights successful outcome language much higher than failure outcome language. The only way I can think of doing this with a decent level of accuracy is massive amounts of human verified training data. Technically do-able, but very expensive in terms of time and effort.

    What really pushes the bounds of plausibility is that the language model can’t be universal. Everyone has their own likes, dislikes, biases, and preferences. So you need language models that are specific to individuals, or clusters of individuals that respond similarly on the same topic. Since these clusters are topic specific, every individual would belong to many (topic, cluster) pairs. Given N topics and an average of M clusters within each topic, that’s N*M language models that need to be created. And one of the major plot points of the book falls out naturally: ELOPe needs access to huge amounts of high end compute resources.”

    This is a case where it’s nice to be a science fiction author. 🙂
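As a footnote on the feasible part: topic analysis really is within reach of off-the-shelf libraries today. Here's a minimal illustrative sketch using scikit-learn (my choice for the example; assuming version 1.0 or later), clustering a few made-up emails into crude topics. The outcome analysis and per-person language models remain the genuinely hard parts, exactly as Jacob says.

```python
# Minimal topic-analysis sketch with scikit-learn (illustrative only).
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.decomposition import NMF

emails = [
    "Please approve the budget for the offshore data center expansion.",
    "The wave power converters need maintenance before the next storm.",
    "Budget approval is still pending for the Portland data center.",
    "Schedule a dive team to inspect the underwater power converters.",
]

vec = TfidfVectorizer(stop_words="english")
X = vec.fit_transform(emails)

nmf = NMF(n_components=2, random_state=0)  # assume two underlying topics
nmf.fit(X)

terms = vec.get_feature_names_out()
for i, component in enumerate(nmf.components_):
    top_terms = [terms[j] for j in component.argsort()[-4:][::-1]]
    print(f"topic {i}: {top_terms}")
```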

I hope you enjoyed this post. If you have any other questions about the technology of Avogadro Corp, just let me know!

Everyone would like a sure-fire way to predict the future. Maybe you’re thinking about startups to invest in, or making decisions about where to place resources in your company, or deciding on a future career, or where to live. Maybe you just care about what things will be like in 10, 20, or 30 years.

There are many techniques to think logically about the future, to inspire idea creation, and to predict when future inventions will occur.

 

I’d like to share one technique that I’ve used successfully. It’s proven accurate on many occasions. And it’s the same technique that I’ve used as a writer to create realistic technothrillers set in the near future. I’m going to start by going back to 1994.

 

Predicting Streaming Video and the Birth of the Spreadsheet
There seem to be two schools of thought on how to predict the future of information technology: looking at software or looking at hardware. I believe that looking at hardware curves is always simpler and more accurate.
This is the story of a spreadsheet I’ve been keeping for almost twenty years.
In the mid-1990s, a good friend of mine, Gene Kim (founder of Tripwire and author of When IT Fails: A Business Novel) and I were in graduate school together in the Computer Science program at the University of Arizona. A big technical challenge we studied was piping streaming video over networks. It was difficult because we had limited bandwidth to send the bits through, and limited processing power to compress and decompress the video. We needed improvements in video compression and in TCP/IP – the underlying protocol that essentially runs the Internet.
The funny thing was that no matter how many incremental improvements researchers made (there were dozens of people working on different angles of this), streaming video always seemed to be just around the corner. I heard “Next year will be the year for video” or similar refrains many times over the course of several years. Yet it never happened.
Around this time I started a spreadsheet, seeding it with all of the computers I’d owned over the years. I included their processing power, the size of their hard drives, the amount of RAM they had, and their modem speed. I calculated the average annual increase of each of these attributes, and then plotted these forward in time.
I looked at the future predictions for "modem speed" (as I called it back then; today we'd call it internet connection speed or bandwidth). By this time, I was tired of hearing that streaming video was just around the corner, so I decided to forget about trying to predict advancements in software compression and just look at the hardware trend. The hardware trend showed that internet connection speeds were increasing, and that by 2005 connections would be fast enough to reasonably stream video in real time without resorting to heroic amounts of video compression or miracles in internet protocols. Gene Kim laughed at my prediction.
Nine years later, in February 2005, YouTube arrived. Streaming video had finally made it.
The same spreadsheet also predicted we'd see a music downloading service in 1999 or 2000. Napster arrived in June 1999.
The data has held up surprisingly well over the long term. Using just two data points, the modem I had in 1986 and the modem I had in 1998, the spreadsheet predicts that I'd have a 25 megabit/second connection in 2012. As I currently have a 30 megabit/second connection, that's a very accurate fifteen-year prediction.
Why It Works Part One: Linear vs. Non-Linear
Without really understanding the concept, it turns out that what I was doing was using linear trends (advancements that proceed smoothly over time), to predict the timing of non-linear events (technology disruptions) by calculating when the underlying hardware would enable a breakthrough. This is what I mean by “forget about trying to predict advancements in software and just look at the hardware trend”.
It’s still necessary to imagine the future development (although the trends can help inspire ideas). What this technique does is let you map an idea to the underlying requirements to figure out when it will happen.
For example, it answers questions like these:
When will the last magnetic platter hard drive be manufactured?
2016. I plotted the growth in capacity of magnetic platter hard drives and flash drives back in 2006 or so, and saw that flash would overtake magnetic media in 2016.
When will a general purpose computer be small enough to be implanted inside your brain?
2030. Based on the continual shrinking of computers, by 2030 an entire computer will be the size of a pencil eraser, which would be easy to implant.
When will a general purpose computer be able to simulate human level intelligence?
Between 2024 and 2050, depending on which estimate of the complexity of human intelligence is selected, and the number of computers used to simulate it.
Wait a second: human-level artificial intelligence by 2024? Gene Kim would laugh at this. Isn't AI a really challenging field? Haven't people been predicting that artificial intelligence is just around the corner for forty years?
Why It Works Part Two: Crowdsourcing
At my panel on the future of artificial intelligence at SXSW, one of my co-panelists objected to the notion that exponential growth in computer power was, by itself, all that was necessary to develop human level intelligence in computers. There are very difficult problems to solve in artificial intelligence, he said, and each of those problems requires effort by very talented researchers.
I don’t disagree, but the world is a big place full of talented people. Open source and crowdsourcing principles are well understood: When you get enough talented people working on a problem, especially in an open way, progress comes quickly.
I wrote an article for the IEEE Spectrum called The Future of Robotics and Artificial Intelligence is Open. In it, I examine how the hobbyist community is now building inexpensive unmanned aerial vehicle auto-pilot hardware and software. What once cost $20,000 and was produced by skilled researchers in a lab, now costs $500 and is produced by hobbyists working part-time.
Once the hardware is capable enough, the invention is enabled. Before this point, it can’t be done.  You can’t have a motor vehicle without a motor, for example.
As the capable hardware becomes widely available, the invention becomes inevitable, because it enters the realm of crowdsourcing: now hundreds or thousands of people can contribute to it. When enough people had enough bandwidth for sharing music, it was inevitable that someone, somewhere was going to invent online music sharing. Napster just happened to have been first.
IBM’s Watson, which won Jeopardy, was built using three million dollars in hardware and had 2,880 processing cores. When that same amount of computer power is available in our personal computers (about 2025), we won’t just have a team of researchers at IBM playing with advanced AI. We’ll have hundreds of thousands of AI enthusiasts around the world contributing to an open source equivalent to Watson. Then AI will really take off.
(If you doubt that many people are interested, recall that more than 100,000 people registered for Stanford’s free course on AI and a similar number registered for the machine learning / Google self-driving car class.)
Of course, this technique doesn’t work for every class of innovation. Wikipedia was a tremendous invention in the process of knowledge curation, and it was dependent, in turn, on the invention of wikis. But it’s hard to say, even with hindsight, that we could have predicted Wikipedia, let alone forecast when it would occur.
(If one had the idea of a crowd-curated online knowledge system, one could apply the litmus test of internet connection rates to assess when there would be a viable number of contributors and users. A documentation system such as a wiki is useless without any way to access it. But I digress…)
Objection, Your Honor
A common objection is that these trends won't keep increasing exponentially because we'll run into a fundamental limitation: for computer processing speeds, the manufacturing limits of silicon, the heat dissipation limit, the signal propagation limit, and so on.
I remember first reading statements like the above in the mid-1980s about the Intel 80386 processor. I think the statement was that they were using an 800 nm process for manufacturing the chips, but they were about to run into a fundamental limit and wouldn’t be able to go much smaller. (Smaller equals faster in processor technology.)
Semiconductor manufacturing processes (Source: Wikipedia)
But manufacturing technology has proceeded to get smaller and smaller.  Limits are overcome, worked around, or solved by switching technology. For a long time, increases in processing power were due, in large part, to increases in clock speed. As that approach started to run into limits, we’ve added parallelism to achieve speed increases, using more processing cores and more execution threads per core. In the future, we may have graphene processors or quantum processors, but whatever the underlying technology is, it’s likely to continue to increase in speed at roughly the same rate.
Why Predicting The Future Is Useful: Predicting and Checking
There are two ways I like to use this technique. The first is as a seed for brainstorming. By projecting out linear trends and having a solid understanding of where technology is going, it frees up creativity to generate ideas about what could happen with that technology.
It never occurred to me, for example, to think seriously about neural implant technology until I was looking at the physical size trend chart, and realized that neural implants would be feasible in the near future. And if they are technically feasible, then they are essentially inevitable.
What OS will they run? From what app store will I get my neural apps? Who will sell the advertising space in our brains? What else can we do with uber-powerful computers about the size of a penny?
The second way I like to use this technique is to check other people's assertions. There's a company called Lifenaut that is archiving data about people to provide a life-after-death personality simulation. It's a wonderfully compelling idea, but it's a little like video streaming in 1994: the hardware simply isn't there yet. If the earliest we're likely to see human-level AI is 2024, and even that would require a cluster of 1,000+ computers, then it seems implausible that Lifenaut will be able to provide realistic personality simulation anytime before that.* On the other hand, if they have the commitment needed to keep working on this project for fifteen years, they may be excellently positioned when the necessary horsepower is available.
At a recent Science Fiction Science Fact panel, other panelists and most of the audience believed that strong AI was fifty years off, and brain augmentation technology was a hundred years away. That’s so distant in time that the ideas then become things we don’t need to think about. That seems a bit dangerous.
* The counter-argument frequently offered is “we’ll implement it in software more efficiently than nature implements it in a brain.” Sorry, but I’ll bet on millions of years of evolution.

How To Do It

This article is How To Predict The Future, so now we've reached the how-to part. I'm going to show some spreadsheet calculations and formulas, but I promise they are fairly simple. There are three parts to the process: calculate the annual increase in a technology trend, forecast the linear trend out, and then map future disruptions to the trend.
Step 1: Calculate the annual increase
It turns out that you can do this with just two data points, and it’s pretty reliable. Here’s an example using two personal computers, one from 1996 and one from 2011. You can see that cell B7 shows that computer processing power, in MIPS (millions of instructions per second), grew at a rate of 1.47x each year, over those 15 years.
     A                     B        C
1                          MIPS     Year
2    Intel Pentium Pro     541      1996
3    Intel Core i7 3960X   177730   2011
4
5    Gap in years          15       =C3-C2
6    Total Growth          328.52   =B3/B2
7    Rate of growth        1.47     =B6^(1/B5)
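If you prefer code to spreadsheet cells, the same math is a few lines of Python (mirroring the formulas above):

```python
# Annual growth rate from two data points: total growth ^ (1 / years).
mips_1996, mips_2011 = 541, 177_730   # Pentium Pro vs. Core i7 3960X
years = 2011 - 1996                   # 15
total_growth = mips_2011 / mips_1996  # ~328.5x
annual_rate = total_growth ** (1 / years)
print(round(annual_rate, 2))          # 1.47
```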
I like to use data related to technology I have, rather than technology that’s limited to researchers in labs somewhere. Sure, there are supercomputers that are vastly more powerful than a personal computer, but I don’t have those, and more importantly, they aren’t open to crowdsourcing techniques.
I also like to calculate these figures myself, even though you can research similar data on the web. That’s because the same basic principle can be applied to many different characteristics.
Step 2: Forecast the linear trend
The second step is to take the technology trend and project it out over time. In this case we take the annual increase in advancement (B$7 from the table above), raise it to the power of the number of elapsed years, and multiply it by the base level (B$11). The formula displayed in cell C12 is the key one.
     A      B                 C
10   Year   Expected MIPS     Formula
11   2011   177,730           =B3
12   2012   261,536           =B$11*(B$7^(A12-A$11))
13   2013   384,860
14   2014   566,335
15   2015   833,382
16   2020   5,750,410
17   2025   39,678,324
18   2030   273,783,840
19   2035   1,889,131,989
20   2040   13,035,172,840
21   2050   620,620,015,637
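And the projection itself, again mirroring the spreadsheet (using the rounded 1.47 rate, so the figures drift slightly from the table, which carries full precision):

```python
# Forecast: base MIPS * rate ^ (years elapsed since the base year).
base_year, base_mips, rate = 2011, 177_730, 1.47
for year in (2012, 2015, 2020, 2025, 2030, 2040, 2050):
    expected = base_mips * rate ** (year - base_year)
    print(year, f"{expected:,.0f}")
```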
I also like to use a sanity check to ensure that what appears to be a trend really is one. The trick is to pick two data points in the past: one is as far back as you have good data for, the other is halfway to the current point in time. Then run the forecast to see if the prediction for the current time is pretty close. In the bandwidth example, picking a point in 1986 and a point in 1998 exactly predicts the bandwidth I have in 2012. That’s the ideal case.
Step 3: Mapping non-linear events to linear trend
The final step is to map disruptions to enabling technology. In the case of the streaming video example, I knew that a minimal quality video signal was composed of a resolution of 320 pixels wide by 200 pixels high at 16 frames per second, with a minimum of 1 byte per pixel. I assumed an achievable amount of video compression: a compressed video signal would be 20% of the uncompressed size (a 5x reduction). The underlying requirement based on those assumptions was an available bandwidth of about 1.6 megabits/second, which we would hit in 2005.
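Spelled out as a quick calculation (same assumptions as above):

```python
# Minimal streaming-video bandwidth from the assumptions above.
width, height, fps, bytes_per_pixel = 320, 200, 16, 1
raw_bits_per_sec = width * height * fps * bytes_per_pixel * 8  # ~8.2 Mbit/s raw
compressed = raw_bits_per_sec * 0.20                           # assume 5x compression
print(f"{compressed / 1e6:.1f} Mbit/s")                        # ~1.6 Mbit/s
```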
In the case of implantable computers, I assume that a computer the size of a pencil eraser (a 1/4″ cube) could easily be inserted into a human's skull. By looking at the physical size of computers over time, we'll hit this by 2030:
Year   Size (cubic inches)   Notes
1986   1782                  Apple //e with two disk drives
2012   6.125                 Motorola Droid 3

Elapsed years                26
Size delta                   290.94
Rate of shrinkage per year   1.24

Future Size
2012   6.13
2013   4.92
2014   3.96
2015   3.18
2020   1.07
2025   0.36
2030   0.12                  Less than 1/4 inch on a side cube. Could easily fit in your skull.
2035   0.04
2040   0.01
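The same projection in code form, using the rounded 1.24 rate from the table (so the figures drift slightly from the spreadsheet's full-precision values):

```python
# Project computer volume forward: size / rate ^ (years elapsed).
base_year, base_size_cubic_inches, rate = 2012, 6.125, 1.24
for year in (2015, 2020, 2025, 2030, 2035, 2040):
    size = base_size_cubic_inches / rate ** (year - base_year)
    print(year, f"{size:.2f} cubic inches")
```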
This is a tricky prediction: traditional desktop computers have tended to be big square boxes constrained by the standardized form factors of components such as hard drives, optical drives, and power supplies. I chose to use computers I owned that were designed for compactness in their time. Also, I chose a 1996 Toshiba Portege 300CT as a sanity check: if I project the trend between the Apple //e and the Portege forward, my Droid should be about 1 cubic inch, not 6. So this is not an ideal prediction to make, but it still clues us in to the general direction and timing.
The predictions for human-level AI are more straightforward, but more difficult to display, because there's a range of assumptions for how difficult it will be to simulate human intelligence, and a range of projections depending on how many computers you can bring to bear on the problem. Combining three factors (time, brain complexity, available computers) doesn't make a nice 2-axis graph, but I have made the full human-level AI spreadsheet available to explore.
I’ll leave you with a reminder of a few important caveats:
  1. Not everything in life is subject to exponential improvements.
  2. Some trends, even those that appear to be consistent over time, will run into limits. For example, it’s clear that the rate of settling new land in the 1800s (a trend that was increasing over time) couldn’t continue indefinitely since land is finite. But it’s necessary to distinguish genuine hard limits (e.g. amount of land left to be settled) from the appearance of limits (e.g. manufacturing limits for computer processors).
  3. Some trends run into negative feedback loops. In the late 1890s, when all forms of personal and cargo transport depended on horses, there was a horse manure crisis. (Read Gotham: The History of New York City to 1898.) Had one plotted the trend over time, it would have looked as if cities like New York would soon be buried under horse manure. Of course, that's a negative feedback loop: if the horse manure had kept growing, at a certain point people would have left the city. As it turns out, the automobile solved the problem and enabled cities to keep growing.

 

So please keep in mind that this is a technique that works for a subset of technology, and it’s always necessary to apply common sense. I’ve used it only for information technology predictions, but I’d be interested in hearing about other applications.

This is a repost of an article I originally wrote for Feld.com. If you enjoyed this post, please check out my novels Avogadro Corp: The Singularity Is Closer Than It Appears and A.I. Apocalypse, near-term science-fiction novels about realistic ways strong AI might emerge. They’ve been called “frighteningly plausible”, “tremendous”, and “thought-provoking”.

I’ve been thinking about the web and the role and effect of social networks. While I’m a user of Facebook, and like certain parts of it, there are other aspects of it that concern me, both for the impact it’s having now, as well as for the future. As an idea person, I ponder how we can get the benefits of social networking without the costs, while regaining the open web we used to have.

If you haven’t done so, go read Anil Dash’s The Web We Lost. I’ll wait.

I’m going to cover three topics in this post:

  1. The shortcomings of social networking as they exist today. 
  2. The benefits of social networking. I don’t want to throw away the good parts.
  3. A description of what a truly open social network would look like.

The Problems of Today’s Social Networks

These are the main problems I see. I’m not trying to represent all people’s needs or concerns, just capture a few of the high-level problems.

Transitory Nature

My first problem with Facebook and Twitter is the transitory nature of the information. I'm used to the world of books, magazines, and blogs, where information is created and then accessible over the long term. Years later I can find Rebecca Blood's series of articles on eating organic on a food stamp budget, my review of our Miele dishwasher written in 2006, or my project in 2007 to build the SUV of baby strollers. These are events that stand out in my mind.
Yet if I want to find an old Twitter or Facebook post, it’s nearly impossible, even if it happened just a few months ago. There was a post on Facebook where I asked for people who wanted to review my next novel and twenty-five people volunteered. Now it’s a few months later, and I want to find that post again. I can’t. (Having grown used to this problem, I took a screenshot of it, but that’s an awful solution.)
The point is that properly indexed and searchable historical information is valuable to us, our friends, and possibly our descendants. However, it's not valuable to Facebook and Twitter, whose focus is on the real-time stream.

Ownership and Control Over Our Data

It should be unambiguous that we own our own data: our posts, our social network, our photos, and that we should have control over that information. As a blogger and author, I would choose to make much of that public, but it should be my choice. Similarly, it should be possible to have it be private. My data shouldn’t be used for commercial purposes without my explicit opt-in, and I should have control over who gets it and how they use it.
Personally, I’d like my content to be creative commons licensed: It’s mine, but you can use it for non-commercial purposes if you give me attribution. 
Yet this is not the case today. We have problems, again and again, with Facebook, Google, Instagram, and other services claiming the right to use our material for advertising, using it commercially, reselling it, and so on.

Advertising

We should have the right to be free from advertising if we wish, and certainly to have our children not exposed to advertising. But the way social networks exist today, the advertising is forced on us whether we want it or not. And while I can ignore it (although I still hate the visual distraction), it's harder for my kids to do so.
We’ve unfortunately ended up in a situation where the only revenue model for these businesses seems to be advertising based, even though there are alternatives.

Siloing of Networks and Identity

I have a blog, a couple of other websites, and accounts on Twitter, Facebook, LinkedIn, Google Plus, FourSquare, YouTube, and Flickr. But, for all intents and purposes, there's just one me. We try to glue these pieces together: sharing Instagram photos on Facebook, using TweetDeck to see Facebook and Twitter posts in one place, sharing check-ins. But this is a terrible approach, because our friends and readers either see the same information in multiple places (if we share and cross-link) or miss it entirely (if we don't). Because the networks are fighting over control points, they're disallowing the natural openness that should be possible.

Privacy

For some people, privacy is a big concern. This isn't a big one for me, because I subscribe to Tim O'Reilly's basic theory that obscurity is a bigger concern. (He was talking about authors and piracy, but I think the theory applies to most people, whether they're furthering their career, starting a business, selling a product, etc.) I'm concerned about the use and misuse of my data by commercial interests, but I think that can be handled through mechanisms other than privacy. If I'm wrong, then yes, privacy becomes a bigger issue.

The Benefits of Social Networking as it Exists Today

Yet for all these complaints, there are pieces that are working.
I have a niece and her husband whom I don't get to see often, but they're active on Facebook, and I feel much more of a connection to them than to family who aren't on Facebook. I'm glad I can share what my kids are up to with my mom. I have far more interactions with fans on Facebook than I ever had comments on my blog.
The attempts to surface the content that matters to me are imperfect (to be honest, often awful), but they exist in some form:
  • I don’t see a hundredth of the tweets of the people I follow, but using TweetDeck and searches on hashtags and particular people, I’m able to find many I am interested in.
  • Google Circles are far too much work to maintain, but for a few small groups of people, it helps me find the content about them.
  • Facebook’s automated algorithms are awful, showing me the same few stories over and over and over, but it’s an attempt in the right direction: trying to glean from some mix of people plus likes plus comments what to show to me. 

The Solution

I think there is a solution that combines the best of social networks and the best of the old, open web. I think it’s also possible to get there with what we have today, and iterate over time to make it better. 
What I’m going to describe is a federated social network. 
Others have discussed distributed social networks. You can read a good overview at the EFF: An Introduction to the Federated Social Network. If you look at the list of projects attempting distributed social networking, you'll notice that they all list features they'll support, like microblogging, calendars, wikis, and images. You'd host the social network on your own server or on a service provider.
Despite distributed social network and federated social network being used somewhat interchangeably in the EFF article, I want to argue that there are critical differences. 
The fully distributed social network described in the EFF article and in the list of projects feels like mesh networking: theoretically superior, totally open, and fault tolerant, but in practice very hard to create at any scale.
I prefer to use the term federated social network to describe a social network in which the core infrastructure is centrally managed, but all of the content and services are provided by third parties. The network is singular and centralized; the endpoints are many and federated. To continue the analogy to computer networking, it's a bit like the Internet: a few backbones under the control of big companies tie everything together, but we all get to plug into a neutral infrastructure.
(I'll acknowledge that in recent years we've seen the weakness of this approach: we end up with a few big companies with too much control. But it's still probably better to have an imperfect Internet than a non-existent mesh network.)
Here’s my vision.

SocialX Level One: 

Let’s start by imagining a website called SocialX. I have an identity on SocialX, and I tie in multiple endpoints into my account: Twitter, my blog, and Flickr. 
Behind the scenes, SocialX will use the Twitter API to pull in tweets, RSS to pull in blog posts, and the Flickr API to pull in photos.
Visitors to my profile on SocialX will see an interwoven, chronological stream of my content, including tweets, blog posts, and photos, similar to the stream on Facebook or Google Plus.
SocialX will be smart enough to eliminate or combine duplicate content. If a tweet points to my own blog post, it can surmise that these should be displayed together (or the tweet suppressed), knowing the tweet is my own glue between Twitter and my blog: the tweet is an introduction to the blog post.
Similarly, if a blog post includes a Flickr photo, then the photo doesn't need to be shown separately in my stream.
Of course, SocialX will feature commenting, like all other social networks. Let’s talk about comments on blogs first. Let’s assume I’m using a comment service like DISQUS. By properly identifying the blog post in question, SocialX can display the DISQUS comment stream exactly as it would appear on the blog: in other words, both SocialX and the original blog post share the same comment stream. Comment on my blog, and your comment will show up in the SocialX stream associated with the post. Comment on SocialX, and the comment will show up on the blog.
Twitter replies can be treated as comments. In fact, the current approach of handling related messages on Twitter is obscured behind the "view conversations" button. On SocialX, Twitter replies will look like associated comments, and if you reply on SocialX, your comment gets posted back to Twitter as a reply. So both Twitter and SocialX will share the same sequence of content; it will simply be represented as comments on SocialX and as replies/conversations on Twitter.
In other words, the user interface of SocialX might look a lot like Facebook or Google Plus, but behind the scenes, we have two-way synchronization of comments.
SocialX can handle the concepts of liking/+1/resharing in a similar manner. The two high-level concepts are "show your interest in something" and "promote something." Each can be mapped back to an underlying action that makes sense for the associated service. For Twitter, "show interest" can be mapped to favoriting a tweet, and "promote" can be mapped to a retweet.
So far, we've discussed how a single user's stream of content looks. In other words, we've looked at it from the content provider's point of view.
If a user named Tom comes to SocialX to view content, he can, of course, view a single user’s content stream. But Tom likely has multiple friends, and of course this is social networking, not just the web, so we’ve got to use social graphs to determine who Tom is interested in. 
SocialX will use any available social graphs that it’s connected to, and will display the sum total of them. So if Tom connects with Twitter, he’ll see the streams of everyone he follows on Twitter. If Tom connects with Twitter and LinkedIn, Tom will see interwoven streams of both. (Although SocialX will try to remove redundant entries across services by scanning posts to see if the content is the same.)
All of this is possible today. We can make it work with a handful of existing services by plugging them into a centralized network and doing the integration work on the central network to get those existing providers connected. It's about bootstrapping.
Notice that we don't need everyone to use SocialX for it to start being valuable. If Tom visits the site and follows another Twitter user named Sally, we can display Sally's Twitter stream for Tom, and probably auto-discover her blog feed, making the service useful for Tom before Sally ever starts to use it. In essence, at this point we have a very nice social reader.

SocialX Level Two:

The next step beyond this is an API for the platform. Rather than force the platform to do work to integrate each new endpoint, we provide an open API so that other services can integrate into the network. When the next post-hipster-photo service comes out, they can integrate with SocialX just as Instagram once did with Facebook and Twitter APIs.
The API will require services to support a common set of actions for posting, commenting, liking, and promoting. Services will be required to provide posts in two formats: a ready-to-render HTML format, as well as a semantic form that allows other services to create viewers.  (Semantic HTML would work as well.) 
We require the semantic form because SocialX can't be the only one in the business of rendering these streams. So SocialX will also provide an API for other services to provide a reader/viewer, or whatever you'd like to call it. This enables the equivalents of TweetDeck and Hootsuite in our environment. If someone can provide a superior user experience, they're welcome to do so.
We also need to take a stab at figuring out what content to display. Should SocialX display everything, like Twitter? Use circles, like Google Plus? Heuristics like Facebook? Have a great search ability?
Let’s open it up to third parties to figure it out. A third party can consume all the streams I’m subscribed to, and then take their best attempt to figure out what I’m interested in. And if we set up this API in a smart way, it’ll function like a pipeline, so that we could have a circle-defining service divide up streams into circle-specific streams, and an interest-heuristic take each circle and figure out the most interesting content within that circle, etc.
Services like news.me are a perfect example of existing stream filtering; they just do it out of band.
Newsle is another good example of a content service we'd want to plug in, because its news stories are associated with people we follow, even if they originate outside someone's own content stream.
So far we have a content API on one side of the service that allows us to pull in content from and about people. On the other side of the service, we have a filter API that can remix, organize, and filter what stories appear in the stream. And a reader API to consume the final stream and render it.
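To make the shape of the ecosystem concrete, here's a hypothetical sketch of those three interfaces in Python. Every name in it is illustrative only, since SocialX is a thought experiment rather than an existing service:

```python
# Hypothetical sketch of the three SocialX-style APIs described above.
# All names are illustrative; nothing here refers to an existing service.
from dataclasses import dataclass
from typing import Iterable, Protocol

@dataclass
class Post:
    author: str
    timestamp: float
    html: str        # ready-to-render form
    semantic: dict   # machine-readable form so other services can build viewers

class ContentService(Protocol):
    """Twitter/Flickr/blog adapters: pull a user's posts into the network."""
    def fetch(self, user_id: str) -> Iterable[Post]: ...
    def comment(self, post: Post, text: str) -> None: ...  # two-way sync to origin

class StreamFilter(Protocol):
    """Third-party remixing and ranking: circles, interest heuristics, search."""
    def filter(self, posts: Iterable[Post], viewer_id: str) -> Iterable[Post]: ...

class Reader(Protocol):
    """TweetDeck/Hootsuite-style clients: render the final stream."""
    def render(self, posts: Iterable[Post]) -> str: ...
```

A filter pipeline is then just a chain of StreamFilter implementations, which is what would let a circle-defining service feed an interest-heuristic service downstream.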
SocialX will continue to provide a default, base-level filter and reader native to the service, but all content originates from somewhere else.
Now we have a rich ecosystem that invites new players to create content, filter it, and display it.
In contrast to distributed social networking systems that spread out the network but build in the features, SocialX would distribute the features but have a singular central network.

SocialX Level Three:

Technology businesses need to make money. I respect that. As a technology guy, I’m often on that side of the fence. Content providers want to make money. I respect that, too. As an author and a blogger, I’d like to earn something from my writing.
But I also want to be free from advertising. 
How can we resolve this dilemma?
Advertising is just one way of making money, but I’d like to suggest two other ways.

Patronage

Let’s think about Twitter for the moment. Their need to make money from advertising has led to all sorts of decisions that their users hate. They want to insert ads into the tweetstream. They want control over all Twitter clients, to ensure their ads are shown. They’re restricting what can be done with the Twitter API.
Anything that makes a company's users hate it can't be good.
Here's a different idea. The more followers one has on Twitter, the more valuable Twitter is. At the very top of the ecosystem, there are users with millions of followers, whose tweets are worth thousands of dollars each. Even at the lower end of the system, a user with 10k or 50k followers on Twitter is likely gaining tremendous value from that network.
What if Twitter charged the top 1% of most-followed users a fee? Twitter would be free to use below 2,500 followers, but beyond that your follower count would be capped unless you pay. A fee starting at $20/year, at roughly 1 cent per follower, would raise about $200M a year, in the same ballpark as their current ad-based revenue.
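As a back-of-envelope check on that figure, here's one way the numbers could pencil out. All of the inputs are my own illustrative assumptions, not figures from Twitter or from this post:

```python
# Patronage revenue, back of the envelope (all inputs are assumptions).
active_users = 200_000_000
paying_share = 0.01               # top 1% most-followed accounts
avg_followers_of_payers = 10_000
fee_per_follower = 0.01           # roughly 1 cent per follower
revenue = active_users * paying_share * avg_followers_of_payers * fee_per_follower
print(f"${revenue / 1e6:.0f}M per year")  # about $200M
```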

Ad-Free, Premium Subscriptions

The second opportunity is to charge for an ad-free, premium experience. If 10% of Twitter users paid $10 annually for an ad-free experience, that's $500M in revenue. Personally, I'd be delighted to pay for an ad-free experience. Part of the reason this doesn't work well today is that my reading time is split between Twitter, Blogger, WordPress, individually hosted blogs, news sites, Facebook, Google Plus, and so on.
It’s simply not feasible to pay them all individually.
However, if I’m getting the content for all these services through one central network, and can pay once for an ad-free experience, suddenly it starts to make sense. 
SocialX knows who the user is, what they've viewed, and which services helped display the content.
Now we start to see a revenue model that can work across this ecosystem. Revenue could come from a mix of patronage, paid ad-free users, and advertisements. We'll keep ads in the system to support free users, but now that we have multiple revenue streams, there's less pressure to orient the entire experience around serving ads and invading people's privacy.
Example 1: Ben is a paid subscriber of the system. Ben's $5/month fee is apportioned out based on what he interacts with: liking an item, sharing it, bookmarking it, or clicking "more" to keep reading beyond the fold. He's going to pay $5/month no matter what, so there's no incentive for him to behave oddly. He'll just do whatever he wants.
If Ben interacts with 300 pieces of content in a month, each gets allocated $5/300, about 1.7 cents.
That amount is shared among the ecosystem partners, something like this (a quick code sketch follows the list):
  • network infrastructure: 15% (SocialX)
  • stream optimization: 15% (news.me, tbd)
  • reader: 15% (the feedlys, tweetdecks, hootsuites of the world)
  • content service: 15% (the twitter, flickr, blogger, wordpresses of the world)
  • content creator: 30% (you, me, joe blogger, etc.)
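Here's that allocation as a quick sketch, using the shares listed above:

```python
# Illustrative allocation of Ben's monthly fee across the ecosystem.
fee, interactions = 5.00, 300
per_item = fee / interactions  # about 1.7 cents per piece of content
shares = {
    "network infrastructure": 0.15,
    "stream optimization":    0.15,
    "reader":                 0.15,
    "content service":        0.15,
    "content creator":        0.30,
}
for role, share in shares.items():
    print(f"{role}: {per_item * share * 100:.2f} cents per item")
```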
Example 2: Amanda is a free user of the system. She sees ads when using SocialX. Amanda will be assigned an ad provider at random, or she can choose a specific one. (Because the ads, too, will be an open part of the system.) Ad providers will be able to access user data for profiling, unless the user opts out.
If Amanda clicks on 5 ads during the month, that will generate some amount of ad revenue. The ad provider keeps 20% of the revenue, and the rest flows through the system as above. The revenue is allocated to whatever content Amanda was viewing at the time.
Ad providers have an incentive not to be evil, because users have a choice and can switch to a different provider.
Example 3: George is a famous actor from a famous science fiction show. He has four million followers. Only the first 2,500 of George's followers on SocialX will be able to view his stream unless George pays a patronage fee. He does, which for his level of usage is $4,000 a year. However, George is also a content provider, so when his content is interacted with (liked, reshared, etc.), he'll also earn money. Since George is frequently resharing other people's content, the original content creator will get the bulk of that revenue (25% instead of 30%), but we'll give George 5% for sharing.

Conclusion 

Let me come back to some of my problems with existing social networks, and see if we’ve improved on any of them:

  • Advertising: We've made a good dent in advertising. By having a central network and a monetization process that relies on a combination of paid ad-free experiences, patronage, and advertising, we've taken some of the pressure off ads as the only revenue model, and hence as the primary force behind the user experience. We're allowing people to select their ad provider, so they can choose whether they want targeted ads, random ads, organic product ads, or whatever they want. Ad providers can't be evil, or customers will switch providers.
  • Ownership and Control Over Our Data: SocialX owns very little data. It resides in the third-party services. When users can choose among blogging platforms, or host one themselves, they regain control over their data: they're free to pick the best available terms, or to keep the data entirely in their own hands.
  • Privacy: 
    • My primary privacy concern is over the commercial use of my data, and in this regard, I have much more control. I can choose to use a stream filtering service which profiles me and my interests and receive a more personalized stream, or I can choose not to. Either way, the data is only used to benefit me. I can pay to opt-out of advertising totally, or opt-out of targeted advertising at no cost. 
    • I haven’t really thought through the scenario of “I don’t want anyone but a select group of people to see this content,” the other type of privacy concern. My guess is that we could solve this architecturally by having selectable privacy providers that live upstream from the filters and readers. These privacy providers would tag content with visibility attributes as it is onboarded. 
  • Transitory Nature: My concern here was the case of being able to find a given Facebook post where I had solicited beta-readers. In the SocialX case, I see a few fixes:
    • Some of the “stream filter providers” could be search engines. 
    • I could have chosen to originate my post as a blog entry.
    • The platform could support better bookmarking of posts. 
  • Siloing of Networks and Identity: By its very nature, this is the anti-silo of networks and identity.

The main problem we’re left with is that we need a benevolent organization to host SocialX. Because it is a centralized social network, someone must host it, and we have to trust that someone to keep it open.
A few years ago, I was sure this was going to be Google’s social strategy. It seemed to fit their mission of making the world’s information accessible. It seemed to be a platform play akin to Android. Alas, it hasn’t turned out to be, and I no longer trust them to be the neutral player.
It could be built as a distributed social network, but then we're back to the current situation: lots of distributed social networks, none with the momentum to get off the ground.
If you’ve made it this far — thanks for reading! This is my longest post by far, and I appreciate you making it all the way through my thought experiment. Would this work? What are the shortcomings? How could this become a reality? I’d love feedback and discussion.