Square root of x divided by zero: The speed, size and dependability of programming languages

Posted on October 23, 2010

This may be the most interesting blog post I’ve read this year. It compares 33 programming languages in both speed of execution and expressiveness of the language. How does Python compare to Ruby? How fast is Java?

Square root of x divided by zero: The speed, size and dependability of programming languages

Database Anti-Patterns from PDX Ruby Tuesday

Posted on August 4, 2010

hertling

On Tuesday night, Xavier Shay (@xshay) gave a short talk on database anti-patterns.

Here are my rough notes:

STI for Shared Data

STI = single table inheritance. google “rails sti”.
this is sometimes called a “god table”. You can spot it as a single table with many null columns, because different kinds of objects are being stored in it.
Example: a database for books with columns id, type, name, illustrator. for books, there is no illustrator, so you have a null field.
This gets complicated over time. You have to loosen database constraints (you can’t enforce a value for illustrator), and logic is required to handle the null case, even for books.
Using class table inheritance is one solution: books is one table, comics is a second table, and the comics table has the illustrator field. Instead of complicating the handling of books with a null illustrator field, comics, which need the extra data, handle getting it from their own table.

Not Deleting Data

People are afraid to delete data

in part, afraid of creating dead references, e.g. can’t use has_many
business needs to go back in past

It’s bad to try to use one database for both reporting and operational needs.
Example: a users database, in which records can be marked as deleted.

then either user names can’t be unique, or then user names can’t be reused. either way, this then gets into extra coding and/or relaxing constraints.
plus, all your queries become “find users where state is not deleted”, so all queries become more complex and slower.

Solution: have two tables, and move the deleted users into the old_users table, which gives you your history.
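Here is a minimal in-memory sketch of the two-table idea, in plain Ruby rather than a real database. The `UserStore` class and its method names are made up for illustration; the hash and array stand in for the live users table and the old_users archive.

```ruby
# Hypothetical in-memory stand-in for two tables: users and old_users.
class UserStore
  attr_reader :users, :old_users

  def initialize
    @users = {}      # live rows, keyed by unique user name
    @old_users = []  # archived rows, no uniqueness constraint
  end

  # The unique-name constraint only has to cover live users.
  def create(name)
    raise ArgumentError, "name taken: #{name}" if @users.key?(name)
    @users[name] = { name: name }
  end

  # "Delete" moves the row to the archive instead of flagging it.
  def delete(name)
    row = @users.delete(name)
    raise KeyError, "no such user: #{name}" unless row
    @old_users << row.merge(deleted_at: Time.now)
  end
end

store = UserStore.new
store.create("alice")
store.delete("alice")
store.create("alice")  # the name can be reused, and the history is kept
```

Queries against the live table need no "state is not deleted" filter, and the uniqueness constraint stays strict.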

Different data, same database

No notes here, sadly.

Not locking data

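Example: if the user hits pay twice, two requests can both load the unpaid order, and you have a race condition here:

order = Order.find(id)
order.mark_as_paid!

Solution: load the order with a row lock inside a transaction, so the second request waits until the first has finished:

Order.transaction do
  order = Order.find(id, :lock => true)  # SELECT ... FOR UPDATE
  order.mark_as_paid!
end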
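The same check-then-act problem can be reproduced without a database. This is only a sketch: the `Order` class below is a plain Ruby stand-in (not an ActiveRecord model), and a `Mutex` plays the role of the row lock.

```ruby
# Hypothetical in-memory order; the Mutex plays the role of the row lock.
class Order
  attr_reader :charges

  def initialize
    @paid = false
    @charges = 0
    @lock = Mutex.new
  end

  # Check and update inside the lock, so only one caller sees paid == false.
  def mark_as_paid!
    @lock.synchronize do
      return false if @paid
      @charges += 1
      @paid = true
    end
  end
end

order = Order.new
# Simulate the user hitting "pay" several times at once.
threads = 5.times.map { Thread.new { order.mark_as_paid! } }
threads.each(&:join)
puts order.charges  # 1
```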

Notes from CloudCamp PDX 2010

Posted on July 20, 2010

hertling

CloudCamp 2010

Portland, Oregon

#oscon #cloudcamp #pdx

CloudCamp is a free birds of a feather session at OSCON, the O’Reilly Open Source Conference. I came out of general interest, and because one of the promised tracks is deploying your own cloud using open source tools.

Promo: New user group: pdxdevops

Lightning Talks

open cloud

Sam Johnston

Google, Zurich

open source / open cloud: freedom. You can move from one cloud to another. avoids lock-in.
unfettered competition leads to commoditization leads to utility computing.
case study: free software

open source is a happy medium between free software and proprietary software that leads to useful stuff, good for business.
open source is trademarked, giving it some instant recognizability and specific criteria for being open source

criteria for open cloud

open interfaces (atompub)
open formats (open document)

http://opencloudinitiative.org

Adrian Cole

Opscode

@adrianfcole

5 APIs for Provisioning

Provisioning

Allows access to cheap resources
APIs -> automation
Tools exist

Manage Complexity

multi-cloud APIs

abstract what is commoditized
provide a consistent substrate
reduce complexity and lock-in

Dasein Cloud

Written by guy who did first JDBC
Focuses on services

Apache Deltacloud http://deltacloud.org

Ruby implementation
provides REST endpoint. Can use curl to manipulate the clouds.

ruby cloud computing library
compute and storage across many providers (about 6)

jclouds

multi-cloud framework
zero lock-in to cloud apis
written in java
runs in google app engine

libcloud http://libcloud.org

was a python library, with java coming soon
is about compute
works with 16 providers

The Simple Cloud API

Doug Tidwell

http://www.simplecloud.org

The Simple Cloud API brings cloud technologies to PHP and the PHPilosophy to the cloud, starting with common interfaces for three cloud application services: File Storage Services, Document Storage Services, Simple Queue Services.

Joint effort of Zend, GoGrid, IBM, MS, Nirvanix, and Rackspace.

But you can build libraries to support other clouds

Supports 3 areas:

File storage (s3, nirvanix, azure blob, rackspace)
Document storage (s3 doc, azure doc)
Simple queues (sqs, …)

Uses Factory and Adapter design patterns
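A rough sketch of how Factory and Adapter fit together for file storage. It is written in Ruby rather than PHP, and none of the class names come from the Simple Cloud API itself; the two adapters are in-memory fakes standing in for real vendor calls.

```ruby
# Common interface the application codes against (the adapter target).
class StorageAdapter
  def store(path, data)
    raise NotImplementedError
  end

  def fetch(path)
    raise NotImplementedError
  end
end

# Two hypothetical adapters; real ones would call each vendor's API.
class InMemoryBlobAdapter < StorageAdapter
  def initialize
    @blobs = {}
  end

  def store(path, data)
    @blobs[path] = data
  end

  def fetch(path)
    @blobs.fetch(path)
  end
end

class PrefixedBucketAdapter < StorageAdapter
  def initialize(bucket)
    @bucket = bucket
    @objects = {}
  end

  def store(path, data)
    @objects["#{@bucket}/#{path}"] = data
  end

  def fetch(path)
    @objects.fetch("#{@bucket}/#{path}")
  end
end

# Factory: the adapter is chosen from configuration, not application code.
module StorageFactory
  def self.build(config)
    case config.fetch(:adapter)
    when :memory then InMemoryBlobAdapter.new
    when :bucket then PrefixedBucketAdapter.new(config.fetch(:bucket))
    else raise ArgumentError, "unknown adapter: #{config[:adapter]}"
    end
  end
end

storage = StorageFactory.build(adapter: :bucket, bucket: "notes")
storage.store("cloudcamp.txt", "lightning talks")
puts storage.fetch("cloudcamp.txt")  # lightning talks
```

Switching providers then means changing the configuration hash, while the calling code keeps using `store` and `fetch`.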

Eric

Principal consultant with Center Stance

Cloud Consultants: do implementations in the cloud

Not much of an open source person, more of a cloud person.

SalesForce.com, SAAS.
VisualForce is a templating language + Apex (java like) = to do addons for SalesForce.
App Exchange: app marketplace.
managed and unmanaged packages.

managed packages are controlled, no code.

940 packages in the app exchange.
less than 10% of those are open source: about 80 packages.

Cory

Dyn, Inc.

DynDNS.org

Doing DynDNS for over 12 years. 3.5M people using it.
Dynect Platforms: hosts companies like twitter, 37signals, zappos.
Geotarget multiple clouds

Users in EU, go to Amazon EU, users in the Western USA go to GoGrid, users in the Eastern USA go to …
Automatically redirect traffic to servers that are running (active failover)

DNS can give you a slider for your traffic: how much do you want to send to the cloud vs. your own servers? you can base it on latency, on location, on etc.
DNS resolution time is part of overall latency for users. DynDNS is faster (like 32 ms vs 120 ms in example.) that’s 90ms you’re getting back to be able to do more in your own server.
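A toy sketch of the geotargeting, failover and "slider" ideas from this talk. The host names, the `ROUTES` table and both function names are invented for illustration; a real DNS platform makes these decisions inside the resolver.

```ruby
# Hypothetical region -> ordered list of endpoints (preferred first).
ROUTES = {
  eu:      ["amazon-eu.example.com", "own-dc.example.com"],
  us_west: ["gogrid-west.example.com", "own-dc.example.com"]
}.freeze

# Return the first healthy endpoint for a region (active failover).
# `healthy` is the collection of endpoints currently passing checks.
def resolve(region, healthy)
  candidates = ROUTES.fetch(region) { raise ArgumentError, "unknown region" }
  candidates.find { |host| healthy.include?(host) }
end

# The "slider": send `cloud_share` (0.0..1.0) of traffic to the cloud.
# `roll` is a number in [0, 1), normally rand.
def pick_pool(cloud_share, roll = rand)
  roll < cloud_share ? :cloud : :own_servers
end

puts resolve(:eu, ["own-dc.example.com"])  # own-dc.example.com
```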

Unpanel

Hahahaha: They asked “who considers themselves an expert on the topic of open source and cloud computing?” Five people raised their hands. “OK, you’re the panel. Come on up.”

How is CC going to change the choice of the dev platform?
Is open source still relevant in cloud computing?
Will open source save us from a handful of monopolies?
What are the implications on hardware? What will change for hardware?

Stuart Smith, Rackspace: Is open source still relevant?

Only if you value freedom.
In fact, it is even more important.
When your proprietary software vendor goes out of business, you still have the software, you still have the license key.
When your proprietary cloud vendor goes out of business, your company is fucked.

Will open source save us from monopolies?

Just being free isn’t enough. There have been other free efforts that have been crushed by monopolies.
You have to have people adopt it.
The only way it is going to work is if everyone gets involved. otherwise cloud computing will be dominated by a few proprietary stacks.

How does this influence our choice of platforms?

With some platforms, like Google App Engine, you either drink the koolaid, or you don’t.

We’re going through this change between latency sensitive and bandwidth sensitive. Everything moving to data centers. highly multicore systems. now losing in the market place to classic out of order design. we’re going to see lots more cores, lots more latency sensitive. gpu assisted. more message passing hardware to avoid going through the OS.

Breakout Discussions

Why open? Open stack. Open cloud.

Open is:

creative commons license on the specs themselves. if the specs themselves are copyrighted, you can’t even tell your customers about them.
patents: you can’t have key technology locked up.
trademarks: when you start talking about “amazon compatibility”, you have problems. so the relevant names must be open for use.
implementations: you need to have multiple implementations.

open design / transparency / open process: so the community can participate, so i can understand the design, what is going on.

open process is hard: because standards bodies are in theory open, but they cost $12,000 to join, so it’s not really open.
if it’s not open, then other people can’t innovate and move things forward. that’s limited to the standards setters.

then what are the options? a different standards body?

we had an unconference, and invited people to participate, and we were able to learn from each other and move things forward.

(this was on the format used for virtual machines)

Open cloud is:

open formats
open interfaces
open source
open data

“multiple, interoperable implementations, at least one of which is open source”

having an open source implementation does give you a real viable alternative.
example: if there was an open format for microsoft office, and they said, well all you have to do is implement microsoft office yourself, then it isn’t really viable, unless there really is an existing open source implementation.

part of the core of open source is the right to fork.

if you don’t have the right to go, then you are married to the solution (e.g. whoever will buy MySQL)
this would include the right to fork a spec

let the best API float to the top.

Machine Learning and Data Mining in Ruby and R

Posted on April 7, 2010

hertling

My notes from the @pdxruby talk on 2010/04/06

Machine Learning and Data Mining

Randall Thomas

Engine Yard

www.evilmartini.com/blog

Randall’s Slides from Talk
netflix, amazon, google: recommending movies, books and music, links based on your personal experience
- the future is about information…not data (how many gigabytes of data do you have sitting around?)
- if it’s so cool, how come everyone isn’t doing it? it’s hard
world’s shortest stats course
- two types of statistics
  - descriptive: the average height in this room is 5’ 6”
  - inferential: odds are, this horse is going to come in first.
- the two tasks
  - classification: you try to come up with a system for classification (cluster analysis, decision trees)
  - prediction: card counting, i predict that this deck is hot
  - or both: we want to both classify the data and draw inferences about new data
two types
- supervised learning
- unsupervised learning: the way a bayesian filter works… i have no idea what the inputs were, but i can look at the macro behavior, and then make predictions. this is also the way markov models work, the way spam filters work.
R
- heavy-weight lifting tool for statistics
- has shell for working in statistics
5 numbers, one picture
- pallas.telperion.info/ruby-stats
RSRuby
- lets you eval R code
Computer friendly data descriptions
- feature vector: simple 0 or 1 for each feature. beer, wine, whiskey, gin are the features. (1 if you like it, 0 if you don’t)
  - attempt bitwise and of vectors
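A small sketch of the feature-vector idea, assuming the four drinks above as the feature list. The function names are mine, not from the talk.

```ruby
FEATURES = %w[beer wine whiskey gin].freeze

# Encode a list of liked drinks as a 0/1 vector in FEATURES order.
def feature_vector(likes)
  FEATURES.map { |f| likes.include?(f) ? 1 : 0 }
end

# Element-wise AND: 1 only where both people like the drink.
def shared(a, b)
  a.zip(b).map { |x, y| x & y }
end

alice = feature_vector(%w[beer whiskey])
bob   = feature_vector(%w[wine whiskey gin])
overlap = shared(alice, bob)
puts overlap.inspect  # [0, 0, 1, 0]
puts overlap.sum      # 1
```

Counting the ones in the ANDed vector gives a crude similarity score between two people.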
Clustering…
- Simple Geometric: just use the distance formula. If you have 2 dimensions, or 3 dimensions, there is a simple formula. that formula generalizes to N dimensions
- R code: plot(sort(mydata$profits))
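The distance formula mentioned above, sketched in Ruby for any number of dimensions, plus a nearest-center assignment as the simplest geometric clustering step. Function names and sample points are illustrative only.

```ruby
# Euclidean distance between two points of the same dimension.
def distance(a, b)
  raise ArgumentError, "dimension mismatch" unless a.length == b.length
  Math.sqrt(a.zip(b).sum { |x, y| (x - y)**2 })
end

# Assign a point to the nearest of several named cluster centers.
def nearest(point, centers)
  centers.min_by { |_name, center| distance(point, center) }.first
end

puts distance([0, 0], [3, 4])              # 5.0
puts distance([1, 2, 3, 4], [1, 2, 3, 4])  # 0.0
puts nearest([1, 1], { low: [0, 0], high: [10, 10] })  # low
```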
Not Simple Geometric Clustering
- Support Vector Machines: create maximal separation of otherwise inseparable data by projecting it onto different planes.
- You can separate the data into two groups: one that is good, and one that is bad. People attacking your IP ports, and people who aren’t. Spam, and not spam.
- You can apply the SVM over and over again recursively… this turns into a decision tree.
Read:
- First: Introductory Statistics with R by Peter Dalgaard (2nd edition) – teaching you the basics in a tutorial fashion
- Second: A Handbook of Statistical Analyses Using R by Brian S Everitt and Torsten Hothorn
  - load the free PDF in R: vignette(package = "HSAUR")
- The Elements of Statistical Learning by Hastie, Tibshirani, Friedman
  - www-stat.stanford.edu/~hastie/Papers/ESLII.pdf
Regression in R
Examples of companies doing this…
- Collective Intellect: doing mining of memes

Kick Ass Mashups: Punk Rock APIs – notes from SXSWi presentation

Posted on March 14, 2010

hertling

Wow, I loved this presentation! Feel like a programming kung fu master… – Will

Revenge Of Kick-Ass Mash-Ups with Punk Rock APIs

Kent Brewster

http://kentbrewster.com/

@kentbrew

Notable Mash-Ups

Google Maps Mash-Up: first recorded AJAX mash-up, probably inspired most of the state of the modern art.
Flickr Blog badges

Punk Rock: DIY ethic

Other generative things

lego blocks, Erector sets, refrigerator boxes
original apple //e

is your site sterile?

users are cows, not customers
real customers are coke and GM
any unauthorized use is abuse

Your existing API

you already have an API: HTML
you’re already being screen-scraped

you know this

If you open up an API, you get pinpoint data about how it is being used.

Sterile APIs: HTML, RSS
Generative APIs: Free.
Punk Rock APIs: use generative APIs to turn sterile APIs into generative APIs.

Job interview at netflix: asked to review code. looking through real source code, he found that they had cribbed his own code. hired.
Netflix Bubble Widgets

single line javascript include

Pipes.yahoo.com

this is why yahoo is still relevant. they are doing amazing stuff like pipes.

Some very little javascript can do amazing things because it relies on Yahoo Pipes to do the heavy lifting.
YQL: yahoo query language. amazing tool.

select * from twitter.search where q='earthquake';
This works because the community contributes tables (see community tables) that actually do the fetching/parsing of the data.

bit.ly/kb_twit
bit.ly/kb_sxsw

used YQL, and a bit of xpath.
filtered results, nice presentation, runs fast.

Advice for Hackers

Go easy on the server. Since every request comes from a separate IP address, client-side mash-ups look like botnet attacks.
Respect robots.txt

Pipes and YQL respect robots.txt

Create and pass an application ID even if it’s not required.
Let the site know what you’re doing. They might hire you.

Advice for Site Owners

Build your API first. Build your site on your API, and then open it up to the community. Example: Flickr.
Whitelist Pipes and YQL: It’s the right thing to do.

They are giving you a free API caching mechanism
Twitter has done it. If you are running up against twitter API limits, try it.

How to open an API where you work

Build an interesting mash-up
Write the documentation for the API you wish you had.
Don’t write a spec. Write the actual docs.
Give it to the back-end guys.

To Be Useful for Client-Side Mash-Ups

Return Javascript
Wrap the requested JSON in the client’s preferred Javascript callback
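A sketch of what "wrap the JSON in the client's callback" means on the server side, in Ruby. The `jsonp` helper and its validation rule are my own; the talk did not show server code.

```ruby
require "json"

# Wrap a payload in the caller's callback so a <script> include can run it.
# Only plain identifiers (optionally dotted) are accepted as callback names,
# since the name is echoed straight into executable JavaScript.
def jsonp(payload, callback)
  unless callback =~ /\A[A-Za-z_$][\w$]*(\.[A-Za-z_$][\w$]*)*\z/
    raise ArgumentError, "invalid callback name"
  end
  "#{callback}(#{JSON.generate(payload)});"
end

puts jsonp({ "title" => "Punk Rock APIs" }, "showResult")
# showResult({"title":"Punk Rock APIs"});
```

The client names its own function in the request, and the reply is a script that calls that function with the data.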

To be useful for repeated calls… (some complicated stuff I didn’t get)

something having to do with square brackets

Every Javascript reply must have HTTP Status 200

If it comes back with anything else, the browser won’t see the response and the calling script will hang forever.

Demo the Last: Missing Kids CAPTCHA

http://bit.ly/kb_captcha

Questions…

What if a call never returns?

You have to set a timeout. Probably requires a global variable.

Examples of business mashups? Examples of doing it to correct a company’s bad UI?

People are more interesting to me… so not so aware.
Don’t surprise anyone in your IT group. If you show it to your boss, and they think it is awesome, you’ve really stuck the IT group in a corner.

If you’re a company, and you’ve never done this before, go talk to Mashery, or other companies like that.

Raw Notes from Coding For Pleasure at SXSWi 2010

Posted on March 14, 2010

hertling

Building Apps in Your Spare Time

#codingforpleasure

Gina Trapani

write stuff mainly to procrastinate writing
Firefox scripts to improve gmail (better gmail 2 0.9.8.1)
ThinkTank – ask your friends

Matt Haughey

Side projects

Wrote fuelly: social miles per gallon.
MetaFilter (1999), written when blogs were still new

Adam Pash

MixTape: playlists shared with friends
Belvedere
Texter: shorthand for your computer. Like textexpander for the mac.

Why should I develop an app in my spare time?

Just built a tool for ourselves (and 25,000 other users).
Just wanted something as clean as possible. Not an overbearing UI like slashdot.
Fill a need… Gmail
Want an archive of tweets.
Very important to scratch your own itch
Ego motivation… opportunity to get users right away, get feedback
You can build anything… that is really exciting.

Pash: I am not a programmer by trade, and I am not a great programmer, but I can still make anything.
Trapani: it’s amazing what you can do now between APIs and the languages available

Don’t expect to make money. Metafilter was a success, but it took 6 years before they made money. There can be a huge slog. If your motivation is only money, you’ll shutter the project. If you build an app you use every day, then at least you can still use it every day.
“The internet is so ready to give you an answer to any problem” — Pash
You can work on stuff that will further your career
If you don’t have an idea you are excited about, then you aren’t going to make it happen.

All the beloved things… twitter, flickr… they didn’t start as a plan to make a lot of money.
How can I do it?

You have to dedicate time.
If you are really excited about it, you can find the time.
The first thing to go for most people is the television. Two hours of veg time at the end of the day is the easiest thing to go.
It can be a relaxing time… just enjoy it, watch TV, plan to put a year into it.
Use frameworks… don’t reinvent the wheel. Rapidly prototype. Google what you need to do, and copy and paste code. Use libraries and plugins that exist, there are plugins for everything.
Collaboration is a big deal

it’s so much more fun to work with someone
it’s so helpful to bounce ideas off someone

You really don’t need to be a coder or to hire someone to start. You can go from zero to competent in just about any language in about six months.
Dan Bricklin, inventor of the spreadsheet (will: about a billion years ago), was like “iphone development, this sounds interesting“, and went out and bought a MacBook, an iPhone development book, and wrote an app, and put it in the store for $3
Did you ever pay anyone?

Yeah, I don’t really have the skills or competency anymore in design, so I hire some designers. Same for CSS… I don’t have the skills any more to make this work in dozens of browsers. I sent it to some kids in (the middle of nowhere), and paid them $100.
I’ve never hired anyone because I’m cheap, but I barter with people. “I’ll build something for you if you design something for me.”

Open source

Trapani: everything I’ve done is open source. At lifehacker, we have this big community of people doing open source. Why not use those resources?

There is nothing more awesome than waking up to check your email and finding a code contribution.

But you can’t rely on that. It’s a big commitment for someone to get the code, work on it, and submit a change.

Pitching your idea to the company… to sponsor them

You’ve got to make the case for why to do it
Google’s 20% time is a good example to cite
Or it may be synergistic: e.g. for lifehacker it raises their credibility for their employees to be doing open source

Questions…

Talk about ownership when you are working at a company

Check your company’s policy beforehand. Some have weird policies like even what you do on your own time is owned by the company.
If you can convince your company to open source it, then it isn’t an issue at all.

I am a developer, and I like to build super-visualize things, but I am not a designer. How can i find someone to work with?

There are some sites to help. But that is kind of a crapshoot.
you network a lot.
Go to an ignite in Portland.
Look up the portfolio of designers you meet.
Don’t go to rubycon to find a designer.
Go to social events or design events.

Talk about programming where you might not want to open source the code. Talk about some successful examples of that.

I had security issues – a giant login system with crappy code. I wanted to keep that code secret.
One motivation to make your code good is to open source it.
But if you can’t do open source… then you have to hire programmers, or find one fan of your work to work with. and still keep it closed.

What about liability…worried about being sued.

I made a music sharing site that uses mp3s shared on servers around the country. So I made an LLC, and now MixTape belongs to that LLC.
Having a terms of service can help. Lawyers can help you do terms of service and LLC for less than $1k.
Or copy and paste from Google or someone else. Something is better than nothing.

Tradeoff with APIs… you are at the mercy of the service. You get a lot, but then the service could go away.
How do you get users? I’m the sole user of like a half a dozen apps.

It’s not easy. Integrate them into whatever you do. For fuelly, we made badges people could put on their blogs. Talk about it on twitter.
Talk to developers about things you made. No one wants to talk to a PR person. We want to talk to developers.

As a designer, I want to learn programming. Where should I go?

Google is great.

I’m not hearing why the stuff you make is as awesome as it is. What are the decisions you can make, what are the freedoms you have, that you don’t have to make money off it

You are the user. You are the designer. You can make the application what you want it to be. It can be very satisfying.

At what point do you reach break even on the server costs?

I’m spending $100/month for the server, and using AdSense will cover the costs.
You can do “donate a dollar” via paypal, but that is sporadic.
It’s weird to do a project where covering the hosting cost is considered a success.
Amazon referrals, ads, are a passive way to do it.

Share a couple of websites that would be good resources

prototype
jquery
open languages have great documentation… documentation plus comments is amazing.
free git book online
stackoverflow
peepcode
just google your programming question

Raw Notes from Beyond LAMP: Scaling Websites Past MySQL at SXSWi 2010

Posted on March 14, 2010

hertling

Sorry, I got to “Beyond LAMP” about 25 minutes late – my notes don’t include anything from the first part of the meeting.

Updated: Here are some additional great notes covering the beginning part of the session, as well as some more organized notes from the end part.

twitter uses cassandra

no disk seeks when you do a write
no master, you can do write on any machine
when you post a tweet, it gets written into the queue for each of your followers. so if you have 1,000 followers, then it’s written 1,000 times.
cassandra designed to use commodity servers
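The "written once per follower" idea is usually called fan-out on write. Below is a toy in-memory sketch in Ruby; the `Timelines` class is hypothetical and says nothing about how Twitter or Cassandra actually store the queues.

```ruby
# Hypothetical in-memory timeline store: one queue per user.
class Timelines
  def initialize
    @followers = Hash.new { |h, k| h[k] = [] }  # author -> follower names
    @queues    = Hash.new { |h, k| h[k] = [] }  # user   -> tweets to read
  end

  def follow(follower, author)
    @followers[author] << follower
  end

  # One write per follower: posting costs O(followers)...
  def post(author, text)
    @followers[author].each { |f| @queues[f] << "#{author}: #{text}" }
    @followers[author].length
  end

  # ...so that reading a timeline is a single lookup, newest first.
  def timeline(user)
    @queues[user].reverse
  end
end
```

The tradeoff is cheap reads at the price of many writes, which fits a store where writes need no disk seeks.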

monitoring

one of the tricks is to know when a machine needs to be replaced when all you have are hundreds of commodity servers
monitor, monitor, monitor
cpu load, file descriptors, bandwidth, database connections, database performance, disk space, etc.
need monitoring system, ability to graph, all centrally, so you don’t have to go to individual machines.

Being on the front page of digg crashed server (digg sends a lot of traffic)

Why not memcache the front page? It gets loaded all day long, and it is always the same.
Rewrote the system to only read from database when it’s not memcached. Refresh memcache once per day.
Before change, was 60% db writes, 40% reads. After change was 99% db writes, only 1% reads. All the reads were now coming from memcache.
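A minimal sketch of that pattern: read from the cache, and only hit the database on a miss or after the daily expiry. `TtlCache` is a made-up in-memory stand-in for memcached, and the block passed to `fetch` stands in for the database query.

```ruby
# Hypothetical cache with a time-to-live, standing in for memcached.
class TtlCache
  def initialize(ttl_seconds)
    @ttl = ttl_seconds
    @entries = {}  # key -> [value, expires_at]
  end

  # Return the cached value, or run the block (the database read) on a miss.
  def fetch(key, now = Time.now)
    value, expires_at = @entries[key]
    return value if expires_at && now < expires_at
    fresh = yield
    @entries[key] = [fresh, now + @ttl]
    fresh
  end
end

db_reads = 0
cache = TtlCache.new(24 * 60 * 60)  # refresh once per day
3.times { cache.fetch("front_page") { db_reads += 1; "<html>...</html>" } }
puts db_reads  # 1
```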

At twitter, expect that people will come along and read what you’ve written. So they do write-thru caching. The tweet is put first into the cache, then into the database. This way they never need to read the database to get the recent tweets.
when you get beyond a certain point, you can’t analyze the data on a single machine. you have terabytes and terabytes of data.

Hadoop lets you run distributed jobs, that automatically retry when systems go down or fail.
Without this, as the data grows, you end up asking simpler and simpler questions.
With this, you can ask more sophisticated questions.

Scaling search…

they can process 1 to 2 searches per second
search is hard

What was the first thing to blow up for you?

1st was mysql, 2nd was apache.

made the switch over to nginx for serving up images. much, much faster. Using Apache was like using a sledgehammer to serve up images.

connection issues with postgres.
migrating data schema when at scale is really hard… turn off indexes before copying data
twitter: one thing that kept our ops team awake at night…

we are a rails app
how do we maintain relationships?
we had it normalized with a follower table: user_id, follower_id in a single table.
lookups against this table were terrible
they built an intermediate solution… denormalized data structure
while they worked on a longer term solution… they built a custom social graph data tool.
it needs to work across 7 orders of magnitude: from someone with 1 follower to someone with 1M followers
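To make the normalized versus denormalized distinction concrete, here is a small Ruby sketch. The sample rows and function names are invented, and this is not Twitter's actual data structure, only the general shape of the tradeoff.

```ruby
# Normalized form: one row per (user_id, follower_id) pair.
FOLLOWS = [
  [1, 2], [1, 3], [2, 3]
].freeze

# Lookup against the normalized table scans every row.
def followers_scan(rows, user_id)
  rows.select { |uid, _| uid == user_id }.map { |_, fid| fid }
end

# Denormalized form: precompute user_id -> follower ids once,
# so each lookup is a single hash access.
def build_index(rows)
  rows.each_with_object(Hash.new { |h, k| h[k] = [] }) do |(uid, fid), index|
    index[uid] << fid
  end
end

index = build_index(FOLLOWS)
puts followers_scan(FOLLOWS, 1).inspect  # [2, 3]
puts index[1].inspect                    # [2, 3]
```

The index answers the same question without touching unrelated rows, but it has to be kept in sync on every follow and unfollow.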

Questions

Deployment

twitter uses murder: bit-torrent for deployment. seed some servers, then those servers help feed others. brought main app deployment time from 12 minutes to 37 seconds. check out twitter opensource – they open source most of their tools

Hardware databases

twitter is using some, facebook experimenting with them. they are PCI express cards almost as fast as main memory.

databases versus key stores

it’s natural to go to denormalized – you just want the data you want in the form you want
over time, more of the logic goes into the application code, so database indexes are less useful

how do you manage when data is on particular servers

at twitter, using cassandra, are already consistent

there are bunch of new systems that have different tradeoffs. some have eventual consistency, some don’t. if your application can’t handle eventual consistency, then cassandra isn’t for you.

did anyone consider any of the top ten databases, like say oracle

twitter: we strongly prefer open source systems. as we scale, we like to be able to peek under the hood, and see what is going on.
facebook: we like open source, we like the way open source projects work together, we like to be nimble – these proprietary systems are not so nimble.
it’s a combination of openness, ideology, and cost.

berkeleydb versus memcached

memcached is just a wrapper on top of berkeley db

William Hertling's Thoughtstream

A writer musing about science fiction, A.I., and the Internet.
