Friday, June 29, 2007

MapReduce or How Google Rocks

This past week I gave a presentation (which can be found here) on MapReduce, Google's homegrown parallel processing framework.

Google has hundreds of terabytes (1 TB = 1,000 GB = 1,000,000 MB) of information to process every day. They have to rank, sort, and search billions of web pages across the world. And they have to do it as cheaply and quickly as possible.

Since Google has opted to use thousands of cheap rack servers instead of supercomputers to do all their heavy lifting, their software solution must fit this model. Google also has a dislike for Windows and uses only Linux computers, and Linux is a Unix-like operating system. But Unix, as anyone who has used it before knows, has very little facility for parallel processing. (update: Perhaps I should say that Unix has little built-in functionality for parallel processing, and most parallel processing has to be customized to suit your individual needs/constraints.)

Thus, Google created a platform called MapReduce to distribute their giant processing tasks across thousands of cheap computers.

Essentially, MapReduce is a two-step process carried out by a myriad of workers and one master controller.

Pseudo Algorithm

  1. Master gathers the location of the worker computers and assigns them to either the map group or the reduce group.
  2. Master locates all portions of the dataset to be crunched (usually spread across hundreds of servers).
  3. The dataset is parsed and chunked into digestible 64 MB portions.
  4. The map workers pull their chunk of the data and emit <key, value> pairs to the reduce workers.
  5. The reduce workers reduce all the values for each key to the desired output and store the final dataset on more file servers.
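To make the two steps concrete, here is a minimal single-process sketch of the flow above, using word counting (the classic MapReduce example) as the job. It is not Google's implementation: every "worker" is just a function call, the 64 MB chunks are stood in for by a couple of lines each, and the names `split_into_chunks`, `run_mapreduce`, `word_count_map` and `word_count_reduce` are hypothetical.

```python
from collections import defaultdict

# Hypothetical single-process sketch: each "worker" is just a function call here.

def split_into_chunks(lines, chunk_size):
    """Master's job: cut the dataset into chunks (64 MB each in Google's system)."""
    return [lines[i:i + chunk_size] for i in range(0, len(lines), chunk_size)]

def run_mapreduce(lines, map_fn, reduce_fn, chunk_size=2):
    # Map phase: every chunk would be handed to a different map worker.
    grouped = defaultdict(list)
    for chunk in split_into_chunks(lines, chunk_size):
        for key, value in map_fn(chunk):
            grouped[key].append(value)  # shuffle: collect all values for one key
    # Reduce phase: every key would be sent to some reduce worker.
    return {key: reduce_fn(key, values) for key, values in grouped.items()}

def word_count_map(chunk):
    """Emit <word, 1> for every word in the chunk."""
    for line in chunk:
        for word in line.split():
            yield word.lower(), 1

def word_count_reduce(key, values):
    """Collapse all the 1s for a word into a total count."""
    return sum(values)

counts = run_mapreduce(["the cat sat", "the dog", "a cat"], word_count_map, word_count_reduce)
print(counts)  # {'the': 2, 'cat': 2, 'sat': 1, 'dog': 1, 'a': 1}
```

The point of the split is that `word_count_map` only ever sees one chunk and `word_count_reduce` only ever sees one key, so in a real cluster both can run on thousands of machines at once.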

Example - MapReduce Grep

Grep is a Unix command that searches through each line of a file for a specified piece of information and returns the lines that contain that information.

  1. Master gathers workers and datasets
  2. Map workers search through their chunks for the specified search string and emit <line identifier, 1> for each line that contains it.
  3. Reduce workers take all the emitted line identifiers, trace each one back to the original dataset, pull the line from the file and copy it to the new output file.
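The grep steps above can be sketched the same way. This is a toy, single-process illustration under the assumption that a "line identifier" is simply the line's global position in the dataset; `grep_map`, `grep_reduce` and `mapreduce_grep` are hypothetical names.

```python
from collections import defaultdict

def grep_map(chunk_start, chunk, pattern):
    """Map worker: emit <line identifier, 1> for each matching line in its chunk."""
    for offset, line in enumerate(chunk):
        if pattern in line:
            yield chunk_start + offset, 1  # identifier = global line number (assumed)

def grep_reduce(line_ids, dataset):
    """Reduce worker: trace identifiers back to the dataset and copy the lines out."""
    return [dataset[line_id] for line_id in sorted(line_ids)]

def mapreduce_grep(dataset, pattern, chunk_size=2):
    emitted = defaultdict(int)
    for start in range(0, len(dataset), chunk_size):
        chunk = dataset[start:start + chunk_size]
        for line_id, one in grep_map(start, chunk, pattern):
            emitted[line_id] += one
    return grep_reduce(emitted.keys(), dataset)

log = ["error: disk full", "ok", "all good", "error: timeout", "ok"]
print(mapreduce_grep(log, "error"))  # ['error: disk full', 'error: timeout']
```

A design note: in Dean and Ghemawat's MapReduce paper, the distributed grep's map function emits the matching line itself and the reduce function is the identity, which avoids the trip back to the original dataset that this variant makes.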

Results

Given about 1,800 servers, each with two 2 GHz Intel Xeon processors and 4 GB of RAM, MapReduce can search through 1 TB of data and return a new dataset in about 2.5 minutes. How fast is that? Well, in comparison, my laptop would most likely take about 2-3 weeks to complete the whole task.
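One way to answer "how fast is that?" is to turn the 2.5 minutes into a throughput. This back-of-the-envelope sketch uses the post's own decimal units and assumes the work is spread evenly over all 1,800 machines.

```python
MB_PER_TB = 1_000_000   # the post's definition: 1 TB = 1,000,000 MB
SECONDS = 2.5 * 60      # "about 2.5 minutes"
SERVERS = 1_800

aggregate_mb_per_s = MB_PER_TB / SECONDS            # the whole cluster
per_server_mb_per_s = aggregate_mb_per_s / SERVERS  # each machine's share (even split assumed)

print(f"cluster: {aggregate_mb_per_s:,.0f} MB/s")     # cluster: 6,667 MB/s
print(f"per server: {per_server_mb_per_s:.1f} MB/s")  # per server: 3.7 MB/s
```

Each individual machine only has to chew through a few megabytes per second; the speed comes from having 1,800 of them doing it at once.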

Google has a lot of cool technology. I will be posting more work from their repository of research in the next couple of weeks, as I will soon be working for Google, so stay tuned!

Thursday, June 28, 2007

The Importance of Getting out of Debt

In my last post I urged you all to start saving for retirement. I even did some math to show that one of the best things you can do is to start saving in an investment account as soon as possible. Even if you have debt, I explained, starting the Roth IRA now will lead to great advantages upon retirement.

However I would like to expand my analysis a bit.

My friend Will has recently turned me on to Dave Ramsey. Dave Ramsey is one of the leaders of the financial ideology that promotes the Judeo-Christian-Muslim idea of living a debt-free life. Will and Dave Ramsey have pointed out that there are several benefits to first getting completely out of debt and then saving for retirement. I think they would also agree with my post against credit cards.

Hedge Against Job Loss

First, Will and Dave point out that if one pays off debt first and builds up a small emergency fund, then a hedge against job and income loss is formed. The essential thing is that while times are good, there is plenty of money to pay back loans and monthly expenses. However, when times are bad, there is no money to pay back debts, which always have to be paid back. As Dave notes: most Americans are about three paychecks away from bankruptcy.


Let No Man Be Your Master

Second, Will and Dave point out that when you are indebted to someone, you are also enslaved to them in some fashion. Have you ever thought to yourself, "Gosh I would sure love to do X, but in order to pay back my debts I have to have a job and I can't do X and have a job at the same time :-/ " ? If you have, you know what it means to be a slave to your creditor.

Psychological Freedom

While it may be true that some folks can borrow and borrow and eventually go into bankruptcy and never feel any remorse, guilt or shame, most of us feel a sort of shadow hanging over us when we go into debt. I personally feel anxiety and much doubt when I go to sign for a loan or put a good bit of cash on a credit card. This psychological/spiritual burden is so prominent that Dave Ramsey actually has his callers who have just made their final debt payments shout at the top of their lungs, "I'm debt freee!!!"

So while it may be true that one can end up with slightly more money at retirement by first investing in retirement and then paying off debt, there are several great advantages to being out of debt as fast as possible: hedging against income loss, an end to financial slavery, and psychological freedom.

Tuesday, June 26, 2007

To Save or not to Save: A Tale of Why I love the Roth IRA

As an applied mathematician I feel the constant need to prove my worth by working on practical problems for the good of the general public, like a mathematical superhero of sorts. So far I have solved problems/questions such as "How much of the earth needs to be covered in solar panels to power the world?" and "What is the probability of getting a meaningful 4-letter acronym from the subtitle of your blog?". I also wrote "The Mathematics of Credit Cards", in which I ranted against the wretchedness of the credit card industry.

Sustainability and Responsibility

Most people see sustainability as it relates to industry and the environment; I see it also in terms of societal and familial preservation. In this article I hope to demonstrate how investing in a Roth IRA retirement account can lead to a sustainable retirement in which neither your children nor your fellow citizens will be taxed by your failure to think about the future when you were younger.

First, how much will it cost to retire? Assuming that you retire at 65 and live to be 85, you will need at least $30,000 per year (assuming you own your house and cars). This totals about $600,000 and leaves little room for error or inflation. In fact, if we account for inflation (3.5% per year), we will need to save about $850,000. And that's if we retired today! If, like me, you retire in 40 years or so, spending only $30,000 a year (in today's dollars) means you'll need about $3.3 million due to inflation!
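The inflation figures can be reproduced with a simple sum. This sketch assumes each retirement year's spending is today's $30,000 grown by 3.5% per year, and that the savings earn nothing during retirement; `retirement_need` is a hypothetical helper, not the author's script.

```python
def retirement_need(annual_spend, years_retired, inflation, years_until_retirement=0):
    """Total spending over retirement, with every year's spending grown by inflation.

    Assumes no investment growth during retirement (a simple sum of costs).
    """
    first_year = annual_spend * (1 + inflation) ** years_until_retirement
    return sum(first_year * (1 + inflation) ** year for year in range(years_retired))

flat = 30_000 * 20                                     # ignoring inflation
today = retirement_need(30_000, 20, 0.035)             # retiring now
in_40_years = retirement_need(30_000, 20, 0.035, 40)   # retiring in 40 years

print(f"{flat:,.0f}  {today:,.0f}  {in_40_years:,.0f}")
# roughly 600,000  848,000  3,359,000
```

Under these assumptions the totals come out to about $848,000 and $3.36 million, in line with the $850,000 and $3.3 million quoted above.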

If I made the equivalent of $50,000 a year with a 3.5% annual raise for the rest of my life, and simply set my savings aside without any growth, then I'd have to save about 80% of each paycheck in order to reach $3.3 million!
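Here is a quick check of the 80% figure, assuming the savings just sit there and earn nothing (the situation the Roth IRA is about to rescue us from). `lifetime_earnings` is a hypothetical helper.

```python
def lifetime_earnings(start_salary, raise_rate, years):
    """Sum of every year's salary, with a fixed percentage raise each year."""
    return sum(start_salary * (1 + raise_rate) ** year for year in range(years))

TARGET = 3_300_000  # the inflation-adjusted retirement goal from above
earnings = lifetime_earnings(50_000, 0.035, 40)
required_rate = TARGET / earnings  # fraction of every paycheck, savings earning nothing

print(f"earnings: {earnings:,.0f}, required savings rate: {required_rate:.0%}")
```

Forty years of raises add up to roughly $4.23 million in total pay, so reaching $3.3 million without any growth means saving about 78% of it, which is the "about 80%" above.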

But that's impossible! Who in the world besides Tiger Woods and Bill Gates can save 80% of their paycheck?

Off in the distance we see our good friend Roth IRA charging towards us on his white steed. "I'll help you!" he says!

"But how?" we ask.

"By the power of growth funds, stocks, bonds and other compound gains financial instruments....and it's all tax free growth too!"


"But Roth IRA, we heard that the stock market is volatile, and that we could lose all our money if terrorists collapse and the government fails. Shouldn't we invest in Gold?"

"No!" a voice booms. Oh look, it's Dave Ramsey, the guy who keeps telling me to get out of debt. "Gold averages about 2% return on investment since the 1900's. And in time of crisis, it's no where near as liquid as you think! Besides, starting in 1900, even with the Great Depression, the stock market averages a 12% return per year!"

Wow, thank you, surprise visitors! So if we want to avoid capital gains taxes (to the tune of 15% of all gains made!), we simply chip in money to our Roth IRA. The main limitation is that you can only put up to $4,000 per year into it. Luckily, the maximum contribution is periodically adjusted for inflation (every couple of years I'll get to invest more into it).

So let's see, if I chip in only about 8% of my $50,000 salary, with only 10% growth on average, I will be left with $3.3 million by the time all my hair turns gray! Sweet! That's exactly what I need!

"But wait!" Dave Ramsey exclaims. "Shouldn't you pay off your debts first, and then start to invest in retirement? After all, the credit card companies and students loans are charging you an average interest rate of 15%!"

Well let's do the math. If I have $1000 at the end of every month which I can pay towards debt, my Roth IRA, or a regular (taxable) investment account how much will I have at the end of my 40 year working career?

Let's assume I have $50,000 in assorted debt at 15% interest, and that my IRA earns 10% a year. Furthermore, my regular investment account also earns 10% a year, but I make trades (and thus pay capital gains taxes) on a third of my stock thus limiting my growth to 5%.

Scenario 1

If I pay off all my debt first with my extra money, and then begin to save in the Roth IRA and the regular investment account I will end my career with about $4.5 million.

Scenario 2

However, if I first contribute the maximum amount to my Roth IRA, and then take the leftovers to pay back the debt, and after the debt is paid back continue investing in my regular account I will have about $4.8 million in the bank.

The difference? If you start investing in the Roth IRA at the very beginning it will take you 3 years longer to pay back that $50,000 debt, but during that time you will have been able to contribute about $30,000 to your Roth IRA which will grow to be about $300,000 by the time you retire.

Whether you follow Dave Ramsey's suggestion or mine, one thing is clear: the sooner you start to save in that Roth IRA, the better. Every year earlier that you start saving is a year that you will thank your younger, wiser self for when you retire.

If you would like to start a Roth IRA you can use the same company that I use, Edward Jones. In fact if you give my good friend Ryan Russell a call I'm sure he would appreciate your business!

Note: I have a script saved on my computer for calculating retirement savings, if you would like to see your projected retirement savings email me your financial assumptions and I'd love to help you out!

Friday, June 22, 2007

My Wife

Malia and I have been married for nearly 6 months now. It seems like just yesterday we were getting engaged atop St. Mary’s Glacier. Then we were planning our wedding in what felt like hardly any time.

We spent many nights up late planning and talking and waiting until the last minute to say goodbye for the evening.

And now we’re married and go to sleep by 10:30 or 11:00 at night. And it’s wonderful!

I’ve learned so much about Malia and about myself. I know that she feels the same. And it’s only been 6 months…how much more will we get to learn in the coming years?

She’s learned to live with my dreaming/ranting about Solar Parabolas and how we should never pay for electricity again. She understands that I have a world-changing idea every other day, only to be replaced by a new one soon after.

And she’s learned technical terms like PHP and Matlab. She even enjoys watching documentaries with me!

And though it’s only been six months, and we’re still naïve newlyweds, I can say without a doubt that marrying Malia has been the best and most rewarding thing I’ve ever done.

As it says right there on the side of this blog…I love my wife!


Tuesday, June 19, 2007

Machine Learning - Introduction

After my latest post, "Why Robots are the way of the future," I realized I have not really written about one of my greater passions...Machine Learning/Artificial Intelligence.

Machine Learning and Artificial Intelligence are not quite the same thing. Usually Machine Learning falls under the category of Computational Science, Mathematics, Computer Science and Statistics. Artificial Intelligence usually belongs to the Computer Science, Cognitive Science, Psychology and perhaps Mathematical Philosophy departments.

Machine Learning is generally the field of study related to the question: How can we teach a machine to perform a certain task as well as it can be performed?

However, Artificial Intelligence is generally an answer to the question: How can we simulate (or reproduce) the cognitive functions of humans/animals/intelligent beings? A lot of Artificial Intelligence seeks to model the human mind, which doesn't always do things optimally.

Both fields of study are important for developing algorithms and systems that are able to interact with, aid, and enhance our lives. For instance, Machine Learning was used to create software to control a refinery. Up until the 1980s most refineries and their myriad pipes, valves, and gauges were controlled by humans. Any potential catastrophes had to be averted by alert workers. However, ML systems were developed that could optimally monitor and control refineries...even better than humans can! Furthermore, as conditions in the refinery change over time, the software adapts and retrains itself without having to be reprogrammed over and over.

Interestingly, many of the Machine Learning algorithms are fairly simple in their approach. For instance, a basic classification algorithm is as follows:

Given a set of descriptions x and their associated objects/predictions y which are part of a set of classification categories {y_1, y_2, ... , y_n},

Build a model (a brain)

- For each unique classification y:

- take the mean of all descriptions x labeled with category y

Make Predictions

- take an unclassified description x and evaluate the distances to each of the category means

- the unclassified description x will be classified according to the closest mean

Enhance the model with new examples

- given additional examples x and y, recalculate the mean(x)'s with the new information...the model is now enhanced.
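The steps above describe what is usually called a nearest-mean (or nearest-centroid) classifier. Here is a minimal sketch, assuming each description x is a single number such as a length; for multi-number descriptions you would use a Euclidean distance instead of `abs`. `NearestMeanClassifier` and its methods are hypothetical names. Keeping running totals means the "enhance the model" step is just one more addition, with no need to re-scan old examples.

```python
from collections import defaultdict

class NearestMeanClassifier:
    """Hypothetical sketch of the algorithm above for one-number descriptions x."""

    def __init__(self):
        self.totals = defaultdict(float)  # running sum of x for each category y
        self.counts = defaultdict(int)    # number of examples seen for each y

    def learn(self, x, y):
        """Build or enhance the model: fold one labeled example into y's mean."""
        self.totals[y] += x
        self.counts[y] += 1

    def mean(self, y):
        return self.totals[y] / self.counts[y]

    def predict(self, x):
        """Classify x by the category whose mean is closest."""
        return min(self.counts, key=lambda y: abs(x - self.mean(y)))

model = NearestMeanClassifier()
for x, y in [(1.0, "small"), (3.0, "small"), (9.0, "large"), (11.0, "large")]:
    model.learn(x, y)
print(model.predict(4.5))  # small (mean 2.0 is closer than mean 10.0)
print(model.predict(6.5))  # large (distance 3.5 beats distance 4.5)
model.learn(7.0, "small")  # a new example moves the "small" mean to about 3.67
print(model.predict(6.5))  # small (the enhanced model changes its answer)
```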

For example: Bobby stands by the road and, for every vehicle that passes by, quickly measures the length of the vehicle. Bobby also writes down whether the vehicle was an "18-wheeler" or "other". Using the above algorithm he calculates the mean "18-wheeler" length to be 20.4 feet, and the mean "other" vehicle length to be 10.9 feet. A blind girl named Jane comes along and says that she has just measured a vehicle that is 16.9 feet long, but can't tell if it's an "18-wheeler" or not. Bobby says that since the unobserved vehicle is closer in length to an "18-wheeler", it must be one!
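Bobby's reasoning is just two subtractions and a comparison. This small sketch uses the two means from the story directly.

```python
means = {"18-wheeler": 20.4, "other": 10.9}  # Bobby's mean lengths in feet
jane_length = 16.9                           # Jane's unlabeled vehicle

# Distance from Jane's measurement to each category mean.
distances = {label: abs(jane_length - mean) for label, mean in means.items()}
prediction = min(distances, key=distances.get)  # closest mean wins

print(prediction)  # 18-wheeler
```

Jane's vehicle is about 3.5 feet from the "18-wheeler" mean and about 6.0 feet from the "other" mean, so the closest-mean rule labels it an 18-wheeler.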

This may not seem like a very intelligent algorithm (and it's not), but it does demonstrate one key feature of an intelligent algorithm: the ability to form its internal computing algorithm from external data.

In the next few articles on Machine Learning I'd like to discuss some other more intelligent algorithms such as Support Vector Machines, Neural Networks and Random Forests...all three of which are some of the coolest and most effective AI/ML algorithms. Until then, see if you can think up your own intelligent algorithm...or use for an algorithm. I'd love to hear about it!

Robots are the Way of the Future

I have this really cool robot called Roomba made by iRobot. It vacuums the floor for Malia and me. All we have to do is push the clean button and away he goes, zipping around the carpet in a semi-chaotic pattern until it's all clean. And it works too!

This is a great development for two reasons:

1.) It's just cool.

2.) It really does save us time.

Statements like "Robots are the way of the future" may seem kind of cheesy and cliché. However, I would like to say that there are plenty of cool robots out there. And as Artificial Intelligence and Machine Learning develop, robotics will only become more and more useful.

A few months ago PBS's Nova aired a program about the Great Robot Race, sponsored by DARPA, in which many vehicles competed to be the first completely machine-driven vehicle to make a trek across a desert in California/Nevada. The winner, Stanford's "Stanley," was a small SUV that drove itself with no human help at all (which is amazing!). It combined GPS with camera vision, laser range finders and a few other sensors, all controlled by a computer running Machine Learning software. Essentially, the researchers who created the small intelligent SUV taught the car to drive in the desert, and it succeeded on the grueling course of roughly 130 miles with an average speed of about 19 mph!

The significant thing about robots like the winner of the DARPA challenge and Roomba is that they are actually learning the task set out for them. By learning I mean that the robot's own internal algorithms are being modified as the robot carries out its task, in response to poor or good performance. If you think about it, we learn this way most of the time. In fact, our whole public education system falls under the same category as these robots' education: Supervised Learning. (Well, ideally at least; a lot of education falls under the category of the 'Expert System', where the teacher poses as an expert and relays his wisdom and knowledge to the student via memorization.)

Supervised Learning is the process where a student is presented a set of examples (descriptions and their corresponding object), and asked to perform a similar identification task on some previously unseen examples. The performance on the unseen examples is evaluated by a supervisor, and the feedback given to the student. The student takes the feedback and reshapes her internal processing algorithms. The process repeats until the supervisor is satisfied with the student's performance.
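Here is a minimal sketch of that feedback loop, using a perceptron, one classic supervised learning algorithm (swapped in here for illustration; the post doesn't name a specific one). As a simplification, the "supervisor" checks the student's answers on the labeled examples themselves, and the loop repeats until every answer is right. The toy data points and the names `predict` and `train` are hypothetical.

```python
def predict(weights, bias, features):
    """Student's answer: +1 or -1 depending on which side of a line the example falls."""
    score = sum(w * f for w, f in zip(weights, features)) + bias
    return 1 if score > 0 else -1

def train(examples, max_rounds=100):
    """Perceptron-style supervised learning on (features, label) pairs, label in {+1, -1}."""
    weights = [0.0] * len(examples[0][0])
    bias = 0.0
    for _ in range(max_rounds):
        mistakes = 0
        for features, label in examples:
            if predict(weights, bias, features) != label:  # supervisor: "wrong"
                # Student reshapes her internal weights toward the right answer.
                weights = [w + label * f for w, f in zip(weights, features)]
                bias += label
                mistakes += 1
        if mistakes == 0:  # supervisor is satisfied with every answer
            break
    return weights, bias

examples = [((2, 1), 1), ((3, 2), 1), ((-1, -2), -1), ((-2, -1), -1)]
weights, bias = train(examples)
print(weights, bias)                   # [2.0, 1.0] 1.0
print(predict(weights, bias, (1, 3)))  # 1 (a previously unseen example)
```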

Robots can even become so good at learning that they can interface with our brain's functionality! In one recent news story, a man has a prosthetic arm attached to his body and nervous system. Over time, he and the robotic arm are learning to communicate with each other...now that's cool! Star Wars is no longer pure fantasy.

I love Machine Learning and Artificial Intelligence, not only because we can use it to make prosthetic arms and automatic carpet cleaners, but because it teaches us something about ourselves. When we explore the world of learning we also question how it is that we know things. We question what knowledge is, how we obtain it, and ultimately what reality is. And at the end of those questions we will always have more, but that's the point of learning!

Thursday, June 14, 2007

They Took My Toothpaste!

Last night my Mom flew to Houston from Denver. As she passed through the TSA security checkpoint, the guards notified her that she would have to relinquish her recently purchased black cherry jam. You can imagine the disappointment she must have felt, since black cherries don't really grow in Houston.

I experienced a similar disappointment last year when flying to Cozumel. I had just bought some new toothpaste. In my excitement to get to Mexico I forgot all about having packed it in my carry-on baggage. I forgot about it, that is, until I was notified that I would have to hand over my toothpaste. Doh!

To me it was a feeling of disappointment and bewilderment, akin to having my bike stolen a few years ago. This was further compounded by the regret that I could have saved my toothpaste from the grubby paws of the TSA had I put it in a Ziploc bag. Why a Ziploc bag anyway? It must have something to do with explosives. But it borders on the absurd to think that would-be bombers are somehow thwarted by putting their liquid explosives in plastic bags!

The question arose in my mind: "What do they do with all the contraband anyway?" My mother and I can't be the only ones who have forgotten to put our jams, toothpastes and other dangerous fare in protective Ziploc bags.

It turns out that the airports are turning a tidy profit on these items! http://www.cbsnews.com/stories/2006/08/12/national/main1890143.shtml . This article reports that the Pennsylvania airport is making about 100-200 thousand dollars a year auctioning off those confiscated pocket knives and scissors on eBay!

And as for the toothpastes and black cherry jams of the world? The fortunate (or unfortunate) community of homeless get those. While I don't mind that the homeless are getting a break at my expense, I do mind that the TSA apparently has so little concern for their safety that they are shipping the homeless hundreds of potential bombs and chemical warfare agents!

But you'd have to be nearly subhuman to be that blatantly callous and uncaring toward the poor. No, there's probably an easier explanation. The real reason they ship my toothpaste to the homeless is that the TSA has very little conviction that my toothpaste and my mother's black cherry jam are anything other than what they are supposed to be. Which reminds me....why did they confiscate my toothpaste again?

Tuesday, June 5, 2007

Airplanes are Awesome!

Last Wednesday I got to go flying with my good friend Josh. He and his instructor allowed me to come along with them as Josh practiced his short runway and grassy field landings.

Shortly after 9:30 AM we were taxiing down runway 11R, getting ready for our turn on the tarmac. The air was buzzing with the sound of the propeller and a few of the other neighboring planes' engines. Luckily, Josh got me a pair of airplane headphones, which block out the engine noise and, with the attached microphone, allow everybody in the airplane to talk to each other!

Finally our turn came. We were positioned at the beginning of the runway facing south. There was a wind blowing from the north (I think). Josh gave the engine its full power and kept the brakes on. As the plane rocked a bit, he finally released the brakes and off we jumped. Within a few feet, the nose was off the ground.

As Josh concentrated on keeping the nose in the air and the tail from smashing into the ground, we wiggled down the runway. As I was to later learn, the wiggling, or more precisely, the yawing, was due to the fact that the ultra-concentration of our brave pilot caused a lack of concentration on the rudder. Thus as we took off I realized that riding in a Cessna 172 is nothing like riding in the comfort of a 737!

We climbed high into the air and I could see all of Boulder, Broomfield, Superior, Louisville and some of Lafayette. Since it had snowed in the higher elevations the previous night, the Front Range mountains were quite majestic!

After about 7 minutes, the Erie Airport came into view. We made some preparations and then we landed...but we didn't stop! Nope, we just sped up and kept on flying. Once in the air we circled back around and did the same thing about 5 more times.

Finally our pilot and the instructor had grown tired of terrorizing me and decided that it was time to fly back to Jeffco for a real landing. After making one pass and being rejected by the air traffic controller, we were finally allowed to make a final safe landing.

Flying around the Boulder area in a little plane was a really wonderful experience. It still amazes me that human beings have managed to get themselves off the ground and into the air at such great heights. The thrill of flying and the wonder that it inspires have created a great desire in me to (hopefully) soon acquire a private pilot's license!

Thanks Josh for your wonderful airplane skills!