As more and more organizations look to Big Data analytics as a way to extract better business intelligence from their large data stores, another question keeps coming up: how do you transition your current network storage, SANs, NAS, etc., to support Big Data when it inherently can't deal with terabytes or even petabytes of unstructured data? It is from this that we are seeing new methods for dealing with large volumes of data that may well result in a new storage platform design.
One platform that is stepping up to the plate is Hadoop. I wrote about it previously, as it is right now considered to be at the forefront of dealing with large amounts of data in a way that lets organizations retrieve business analytics that previously weren't available. The best thing about Hadoop is that it LOVES data, whether it's structured, unstructured or complex, and it shines in situations that call for analytics like clustering and targeting.
But how does this tie back to storage? Simply put, Hadoop helps IT folks efficiently store and access large amounts of data. It can do this because Hadoop is designed to run as a platform across many machines that don't share memory or disks. This means you can buy a bunch of cheap servers, throw them in a rack and install Hadoop. The data itself is broken into blocks and distributed across those machines, which is far more efficient and avoids having to store everything in one place. It also builds in data replication: each block is written to multiple nodes, which can be a real help in DR situations.
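To make that concrete, here is a minimal sketch of how a client might drop a file into HDFS (Hadoop's distributed file system) and request three copies of every block. The namenode address and file paths are hypothetical placeholders, not values from any particular cluster:

```java
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FileStatus;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;

public class HdfsPutExample {
    public static void main(String[] args) throws Exception {
        Configuration conf = new Configuration();
        // Placeholder namenode address; substitute your cluster's fs.defaultFS.
        conf.set("fs.defaultFS", "hdfs://namenode:8020");

        FileSystem fs = FileSystem.get(conf);
        Path local = new Path("/tmp/sales-logs.csv");   // hypothetical local file
        Path remote = new Path("/data/sales-logs.csv"); // hypothetical HDFS path

        // Copy the file in; HDFS splits it into blocks spread across the cluster.
        fs.copyFromLocalFile(local, remote);

        // Ask for three copies of every block (the common default),
        // so losing a server or a disk doesn't mean losing the data.
        fs.setReplication(remote, (short) 3);

        FileStatus status = fs.getFileStatus(remote);
        System.out.println("Replication factor: " + status.getReplication());
        fs.close();
    }
}
```

Notice that the client never decides where the blocks land; Hadoop spreads them across the cheap servers in the rack for you, and the replication factor is just a per-file setting.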
But it's more than that, because every server in the environment can help process the data. The code is broken up and shipped to each server, which processes its own little piece of the data; once every server has done its individual task, the results are pulled back together into one cohesive output. This model is known as MapReduce, and it is this brilliant piece of ingenuity that lets Hadoop do things traditional storage just cannot do: every processor in the cluster works in parallel, speeding the whole process up and making it possible in the first place.
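To see what that split-then-combine flow looks like in practice, here is a sketch of the classic word count job written against Hadoop's Java MapReduce API; the input and output paths are supplied on the command line rather than being anything specific to this example:

```java
import java.io.IOException;
import java.util.StringTokenizer;

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Job;
import org.apache.hadoop.mapreduce.Mapper;
import org.apache.hadoop.mapreduce.Reducer;
import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;
import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat;

public class WordCount {

    // The "map" half: each server runs this over its own slice of the
    // input, emitting (word, 1) pairs in parallel.
    public static class TokenizerMapper
            extends Mapper<Object, Text, Text, IntWritable> {
        private final static IntWritable ONE = new IntWritable(1);
        private final Text word = new Text();

        @Override
        public void map(Object key, Text value, Context context)
                throws IOException, InterruptedException {
            StringTokenizer itr = new StringTokenizer(value.toString());
            while (itr.hasMoreTokens()) {
                word.set(itr.nextToken());
                context.write(word, ONE);
            }
        }
    }

    // The "reduce" half: the framework gathers all the counts for the same
    // word from every server and this step folds them into one total.
    public static class IntSumReducer
            extends Reducer<Text, IntWritable, Text, IntWritable> {
        private final IntWritable result = new IntWritable();

        @Override
        public void reduce(Text key, Iterable<IntWritable> values, Context context)
                throws IOException, InterruptedException {
            int sum = 0;
            for (IntWritable val : values) {
                sum += val.get();
            }
            result.set(sum);
            context.write(key, result);
        }
    }

    public static void main(String[] args) throws Exception {
        Configuration conf = new Configuration();
        Job job = Job.getInstance(conf, "word count");
        job.setJarByClass(WordCount.class);
        job.setMapperClass(TokenizerMapper.class);
        job.setCombinerClass(IntSumReducer.class);
        job.setReducerClass(IntSumReducer.class);
        job.setOutputKeyClass(Text.class);
        job.setOutputValueClass(IntWritable.class);
        FileInputFormat.addInputPath(job, new Path(args[0]));
        FileOutputFormat.setOutputPath(job, new Path(args[1]));
        System.exit(job.waitForCompletion(true) ? 0 : 1);
    }
}
```

You would package this into a jar and launch it with something like hadoop jar wordcount.jar WordCount /data/in /data/out; the framework handles splitting the input, shipping the mapper code to the server holding each piece, and funneling the per-word counts into the reducers.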
Of course, don't expect to simply install Hadoop into your environment and have everything work. As with any project, there are plenty of prerequisites, hardware requirements and configuration steps that need to be handled ahead of time. But it is exciting to see all these amazing advancements coming out of the cloud movement, and it is clear that technologies such as Hadoop are going to be a key part of that change.