In my mind, computing philosophy basically breaks down into two camps. The first camp believes that anything can be done with an OS and a piece of general-purpose computing hardware. These are the generalists, and they can usually be found running Unix of some flavor.
The second camp believes that optimizing the hell out of a problem can only be done by dedicated hardware that is purpose built to attack the problem under consideration.
There's a third camp, really in between the two, which believes in a hybrid approach. That, incidentally, is the right camp to be in, as you'll always have a date... I mean, you'll be taking the best of both worlds and leaving the suboptimal in the other camps to be set fire to at a later date.
If you think about it hard enough, this is really the reasoning behind dedicated silicon firewalls, network devices and storage devices: purpose-built hardware platforms that deliver highly optimized, high-performing services that are a subset of all services.
The reason I bring this up is that there is a heartbeat of the unbalanced in the data warehousing world. Leaving aside the 800lb gorilla of Teradata for the moment, the Data Warehousing Cabal (DWC) seems to be pushing down the path that the only way to be effective with data analysis and manipulation is to use dedicated DW appliances. They're not entirely wrong, but there's a difference between being mostly wrong and being all wrong; mostly wrong is still a little right. (Ever seen The Princess Bride? Same principle.)
The right answer is to abstract the discussion away from the technology (HA!) and evaluate what your data needs really are. The closer you get to the business transaction that is generating historical data, the more likely you are to be in a relational database of some form that really exists only to support operational transactions and recent-history reporting. The further away you are from that same transaction, both in location and in time, the more likely you are to be dealing with a summarized (or at best aged) version of the truth of that transaction, and the closer you are to a data warehouse (if not a data archive).
In my view, the greatest opportunity that the Greenplum/Exadata/NEOs of the world offer us is to build massive data archives of transactional data at a fraction of a penny/cent per txn that can be reasonably well interrogated for a similar cost. The true battleground is the in-between, where OLTP and DW overlap to a certain extent. This is where the computing philosophy needs to be applied, but tempered with a real understanding of the business needs being serviced. If your business needs are really analytics-based, then, put quite simply, traditional relational databases will not be able to meet the cost test applied to them, as their compute and storage needs will simply price them out of existence. If it's truly operational work that you're trying to do, then work on a copy of the database in its purest form and move on. And if you're building a third camp there, I would kindly invite you to set yourself on fire.