Friday, October 22, 2010

Solving Linear Equations Just Got Much Faster

In computer science, we measure the efficiency of an algorithm in a couple of ways, but one of the most important is called "asymptotic complexity", usually written in something called big-O notation. This notation captures how an algorithm's running time grows as the problem size gets larger. For example, an algorithm with O(n) performance takes about twice as long to run if the problem size doubles. An algorithm with O(n^2) performance would take four times as long.

You can get a feel for this without computers. Most of us have sorted books alphabetically before, by title or by author. If you have just a few books, say ten or fewer, you can usually eyeball it and get them sorted without using any special algorithm. What you are really doing here (I think) is a mental version of what we CS types call a "selection sort": you find the alphabetically first book, place it first, then find the next one and place it, and so on. With a small set of books, that works fine. But suppose you have 1,000 books to sort. Now the problem is much harder: you can't hold the order of 1,000 books in your head, so the only practical way to carry out that algorithm is to pick up whatever book is at hand and start checking it against all the others. Whenever you find one that comes earlier alphabetically, put down the book you're holding and pick up the new one; continue until you've looked at every book. The book you're holding is now the alphabetically first one, so place it on the shelf. Great! Now repeat another 999 times.
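For the curious, here is roughly what that procedure looks like in code - a minimal Python sketch of selection sort, with a few book titles standing in for a real shelf:

```python
def selection_sort(books):
    """Sort a list of titles in place, the way described above:
    repeatedly find the alphabetically first remaining book and
    put it in its final spot on the shelf."""
    for i in range(len(books)):
        # "Pick up" the first unsorted book, then scan the rest,
        # switching to any book that comes earlier alphabetically.
        first = i
        for j in range(i + 1, len(books)):
            if books[j] < books[first]:
                first = j
        books[i], books[first] = books[first], books[i]
    return books

print(selection_sort(["Moby-Dick", "Dracula", "Emma", "Beloved"]))
# ['Beloved', 'Dracula', 'Emma', 'Moby-Dick']
```

The two nested loops are the whole story: for each of the n shelf positions, you scan the remaining books, which is where the O(n^2) behavior comes from.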

This takes a while. In big-O terms, it's an O(n^2) algorithm, which means that running it for 1,000 books takes about 100^2 = 10,000 times longer than running it for 10 books. Not very practical!

You probably intuitively understand this, because if you were really given the job of sorting 1,000 books, you wouldn't do it this way. An alternate (and much faster) way is to start by sorting the books by first letter: you put all the books with authors starting with A in one pile, all the books with authors starting with B in another pile, and so on. Eventually you have 26 piles, some bigger and some smaller. The average pile size will be 1,000/26, or about 40 books. If you just ran selection sort on the piles, each would take about 4^2 = 16 times longer than sorting 10 books, and you'd have to do it 26 times, so you end up doing 416 times more work than sorting 10 books. But that's a 24x improvement over just running selection sort. (To the detail-oriented: yes, it's true that since the piles are different sizes, you won't actually do quite this well in practice. But you can easily make the large piles small by running our "binning" algorithm on the second letter for those large piles: take the S pile and make new piles for authors starting Sa, Sb, Sc, and so on.) (Also: what about the time needed to split the 1,000 books into bins in the first place? The good news here is that deciding which pile a book goes into is easy: just look at the book. So you look at each book once. That takes some time, but it's not significant compared to the rest of the algorithm, which involves comparing pairs of books over and over. In big-O terms, this is an O(n) contribution to the overall algorithm, so it doesn't change the overall complexity.)
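Here is the same idea as a short Python sketch - bin by first letter, then run selection sort on each (now much smaller) pile; the author names are just placeholders:

```python
from collections import defaultdict

def selection_sort(books):
    # Selection sort again, written compactly: find the alphabetically
    # first remaining book and swap it into place.
    for i in range(len(books)):
        first = min(range(i, len(books)), key=lambda j: books[j])
        books[i], books[first] = books[first], books[i]
    return books

def bin_then_sort(books):
    """Split the books into piles by first letter (a single O(n) pass),
    selection-sort each small pile, then shelve the piles in order."""
    piles = defaultdict(list)
    for book in books:
        piles[book[0].upper()].append(book)
    shelf = []
    for letter in sorted(piles):
        shelf.extend(selection_sort(piles[letter]))
    return shelf

print(bin_then_sort(["Woolf", "Austen", "Wells", "Atwood", "Borges"]))
# ['Atwood', 'Austen', 'Borges', 'Wells', 'Woolf']
```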

So, now you understand how complexity works. We care a lot about the complexity of algorithms that we use all the time. For example, above I talked about sorting, because we sort all the time. You can hardly take two steps in your average code base without tripping over a sort. It turns out that sorting (assuming nothing about the data being sorted) takes O(n log n) time, meaning that it scales almost linearly: that log n term rises slowly, but it's there. O(n log n) is slower than O(n), but it's much, much faster than O(n^2). Suppose an algorithm takes 1 second with 1,000 data points. Then an O(n) algorithm would take 1,000 seconds with 1 million data points. An O(n log n) algorithm would take about 2,000 seconds (using base-10 logs here: n grows a thousandfold while log n only doubles, from 3 to 6). But an O(n^2) algorithm would take 1 million seconds.
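If you want to check those numbers yourself, here's the back-of-the-envelope calculation in Python - just scaling the one-second measurement by the ratio of the cost functions, with base-10 logs as in the text:

```python
import math

def scaled_time(cost, n_small, t_small, n_large):
    # Extrapolate from a measured runtime on a small problem,
    # assuming runtime is proportional to the cost function.
    return t_small * cost(n_large) / cost(n_small)

costs = [("O(n)", lambda n: n),
         ("O(n log n)", lambda n: n * math.log10(n)),
         ("O(n^2)", lambda n: n ** 2)]

for name, cost in costs:
    print(name, scaled_time(cost, 1_000, 1.0, 1_000_000), "seconds")
# Prints roughly 1000, 2000, and 1000000 seconds respectively.
```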

Another problem that comes up a lot in CS (not as much as sorting, but a lot) is solving systems of linear equations. Here the measure of problem size is "number of variables", and we'd like to solve systems with lots of variables: millions, even billions of them. The naive, basic algorithm for solving them is something called Gaussian elimination, and it's an O(n^3) algorithm - that's cubed. As you can imagine, this performs pretty badly with large numbers of variables. With a million variables, an O(n^3) algorithm will run a billion times slower than it would with 1,000 variables. So if you could solve a system of a thousand variables in a microsecond, it would take 1,000 seconds - almost 17 minutes - to solve one with a million variables. There are some algorithms that solve linear systems slightly faster than this, but improvements have been slow.
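To make that n^3 concrete, here's a bare-bones Gaussian elimination in Python - no pivoting and no attention to numerical issues, so it's a sketch of the idea rather than something you'd actually use (in practice you'd call a library solver):

```python
def gaussian_eliminate(A, b):
    """Solve A x = b by naive Gaussian elimination (no pivoting).
    The three nested loops over the n rows/columns are where the
    O(n^3) behavior comes from."""
    n = len(A)
    # Forward elimination: zero out everything below the diagonal.
    for k in range(n):
        for i in range(k + 1, n):
            factor = A[i][k] / A[k][k]
            for j in range(k, n):
                A[i][j] -= factor * A[k][j]
            b[i] -= factor * b[k]
    # Back substitution: solve for x from the last row upward.
    x = [0.0] * n
    for i in range(n - 1, -1, -1):
        s = sum(A[i][j] * x[j] for j in range(i + 1, n))
        x[i] = (b[i] - s) / A[i][i]
    return x

# 2x + y = 5 and x + 3y = 10  ->  x = 1, y = 3
print(gaussian_eliminate([[2.0, 1.0], [1.0, 3.0]], [5.0, 10.0]))
```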

So along come these Carnegie Mellon researchers. They've figured out a clever algorithm that reduces the complexity of solving (certain kinds of) systems of linear equations to O(n (log n)^2) time. It's hard to express how huge an advance this is. Take our example above, where solving a system of 1,000 variables takes a microsecond and solving a system of a million variables takes 17 minutes. If the new algorithm also handled the 1,000-variable system in a microsecond, it would solve the million-variable one in about 4,000 microseconds - just 4 milliseconds. Or you could solve an even larger problem - a billion variables - in about 9 seconds. This is an extraordinary achievement.
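Here's the same back-of-the-envelope scaling as before, in Python; the one-microsecond baseline for 1,000 variables is just an assumption carried over from the example, and the constants are ignored entirely:

```python
import math

old = lambda n: n ** 3                   # Gaussian elimination's growth
new = lambda n: n * math.log10(n) ** 2   # the new algorithm's growth (base-10 logs)

t_1000 = 1e-6  # assume 1,000 variables take one microsecond either way
for n in (10 ** 6, 10 ** 9):
    t_old = t_1000 * old(n) / old(1_000)
    t_new = t_1000 * new(n) / new(1_000)
    print(f"{n} variables: {t_old:.3g} s old vs {t_new:.3g} s new")
# With these assumptions: roughly 1,000 s vs 0.004 s for a million
# variables, and roughly 1e12 s vs 9 s for a billion.
```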

Read the whole story here.

UPDATE: I meant to mention this in the original post, but forgot. It's worth worrying a bit about the constant. The problem with "asymptotic complexity" is that it only tells you how runtimes compare once problem sizes get very large. For smaller problem sizes, the constant factor, or even constant-time setup costs and other lower-order terms, can dominate the actual runtime.

The constant shows up as follows: suppose that algorithm A has a runtime of approximately n seconds for problem size n, and that algorithm B has a runtime of 10^-10 n^2 seconds for problem size n. Then algorithm A has O(n) complexity and B has O(n^2) complexity. But for problem size n=1,000, A takes 1,000 seconds while B takes only 10^-4 seconds. For problem size n=1,000,000, A takes 1,000,000 seconds while B takes 100 seconds. As you can see, B is much faster over a very wide range of problem sizes. So even though B is "slower" in the asymptotic sense, it's probably faster in most real cases.

Lower-order terms can matter, too: suppose algorithm A has a runtime of 10^-6 n + 2 seconds, while algorithm B has a runtime of 10^-6 n^2 seconds. Both A and B have the same constant. But for problem size n = 1,000, A takes about 2 seconds while B takes 1 second: the added 2 seconds for A (presumably for setup or precomputation) dominates the runtime. For much larger problems, of course, A pulls ahead: its asymptotic complexity is O(n) compared to B's O(n^2).
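Here are those two toy runtime models in a short Python sketch, using exactly the numbers from the examples above, just to show where the crossover points fall:

```python
# Toy runtime models from the two examples above (times in seconds).
A1 = lambda n: n                  # O(n), but with a huge constant
B1 = lambda n: 1e-10 * n ** 2     # O(n^2), tiny constant
A2 = lambda n: 1e-6 * n + 2       # O(n) plus a 2-second setup cost
B2 = lambda n: 1e-6 * n ** 2      # O(n^2), same constant, no setup cost

for n in (1_000, 1_000_000, 100_000_000_000):
    print(f"n={n}:  A1={A1(n):g}s  B1={B1(n):g}s  A2={A2(n):g}s  B2={B2(n):g}s")
# B1 stays ahead of A1 until n reaches about 10^10; B2 only beats A2
# below n of roughly 1,400, after which A2's O(n) growth wins out.
```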

So this new algorithm has great asymptotic complexity, but it's worth waiting to see the constants, the lower-order terms, and the other practical details (like how well the algorithm uses the cache, how well it can be parallelized, and so on) before declaring it a big win.
