Traditional data analysis fails to cope with the advent of Big Data, which is essentially huge data, both structured and unstructured. Much more is needed than being able to navigate relational database management systems and draw insights using statistical algorithms.
The good news is that the analytics part remains the same whether you are dealing with small datasets, large datasets or even unstructured datasets. What is needed the most in big data is the ability to draw relevant information from the humungous amounts of data being processed every minute. This requires technology to join hands with traditional analytics.
Let us now look at some of the key skills needed to be a big data analyst:
1) Programming
While a traditional data analyst might be able to get away without being a full-fledged programmer, a big data analyst needs to be very comfortable with coding. One of the main reasons for this requirement is that big data is still evolving. Few standard processes have been established around the large, complex datasets a big data analyst has to deal with. A lot of customization is required on a daily basis to deal with the unstructured data.
Which languages are required? The list includes R, Python, Java, C++, Ruby, SQL, Hive, SAS, SPSS, MATLAB, Weka, Julia and Scala. As you can see, not knowing a particular language should not be a barrier for a big data scientist. At a minimum, one needs to know R, Python and Java. While working you may end up using various tools. A programming language is only a tool, and the more tools you have in your kitty, the merrier.
2) Data Warehousing
Experience with relational and non-relational database systems is a must. Examples of relational databases include MySQL, Oracle, DB2 and Teradata. Examples of non-relational (NoSQL) stores include HBase, MongoDB, CouchDB and Cassandra, along with distributed storage such as HDFS.
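To make the distinction concrete, here is a minimal sketch in Python that totals the same order data two ways. It assumes an in-memory SQLite table as a stand-in for a relational system and a list of JSON strings as a stand-in for a document store. The function names and the order fields (customer, amount, items) are made up for illustration and do not come from any particular product.

```python
import json
import sqlite3


def relational_total(orders):
    """Load (customer, amount) rows into a table and total them with SQL."""
    conn = sqlite3.connect(":memory:")  # throwaway in-memory database
    conn.execute("CREATE TABLE orders (customer TEXT, amount REAL)")
    conn.executemany("INSERT INTO orders VALUES (?, ?)", orders)
    rows = conn.execute(
        "SELECT customer, SUM(amount) FROM orders "
        "GROUP BY customer ORDER BY customer"
    ).fetchall()
    conn.close()
    return dict(rows)


def document_total(documents):
    """Total nested item amounts from JSON documents with no fixed schema."""
    totals = {}
    for raw in documents:
        record = json.loads(raw)
        # A document may omit "items" entirely; treat that as no items.
        spent = sum(item["amount"] for item in record.get("items", []))
        totals[record["customer"]] = totals.get(record["customer"], 0.0) + spent
    return totals


if __name__ == "__main__":
    print(relational_total([("ann", 10.0), ("bob", 5.0), ("ann", 2.5)]))
    print(document_total([
        '{"customer": "ann", "items": [{"amount": 10.0}, {"amount": 2.5}]}',
        '{"customer": "bob"}',
    ]))
```

The relational version needs the schema declared up front and pushes the aggregation into SQL. The document version tolerates records with missing fields, but the aggregation logic moves into application code.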
3) Computational frameworks
A big data analyst needs a good understanding of, and familiarity with, frameworks such as Apache Spark, Apache Storm, Apache Samza, Apache Flink and the classic Hadoop MapReduce. These technologies help in processing Big Data, much of which can be handled as streams.
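The MapReduce idea behind several of these frameworks can be sketched in a few lines of plain Python. This is a single-machine illustration of the map, shuffle and reduce steps applied to a word count. It does not use the Hadoop or Spark APIs, and the function names are chosen only for this example.

```python
from collections import defaultdict


def map_phase(lines):
    """Map: emit a (word, 1) pair for every word in every line."""
    for line in lines:
        for word in line.lower().split():
            yield word, 1


def shuffle_phase(pairs):
    """Shuffle: group all emitted values by their key."""
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups


def reduce_phase(groups):
    """Reduce: collapse each key's list of values into a single count."""
    return {key: sum(values) for key, values in groups.items()}


def word_count(lines):
    """Run the three phases in order on an iterable of text lines."""
    return reduce_phase(shuffle_phase(map_phase(lines)))


if __name__ == "__main__":
    print(word_count(["big data", "Big analysis"]))
```

In a real cluster the map and reduce steps run in parallel on many machines and the shuffle moves data across the network. The framework handles that distribution, while the analyst mostly supplies the map and reduce logic.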
4) Quantitative Aptitude and Statistics
While the processing of Big Data requires great use of technology, fundamental to any analysis of data is a good knowledge of statistics and linear algebra. Statistics is a basic building block of data science, and an understanding of core concepts like summary statistics, probability distributions, random variables and the hypothesis testing framework is important if you are a data scientist of any genre.
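As a small illustration of two of these concepts, the sketch below computes summary statistics and a one-sample t statistic using only the Python standard library. The function names and the sample values are invented for the example. Turning the t statistic into a p-value would need a t distribution, for instance from a statistics package, which is left out here.

```python
import math
import statistics


def summarize(sample):
    """Return basic summary statistics for a list of numbers."""
    return {
        "n": len(sample),
        "mean": statistics.mean(sample),
        "median": statistics.median(sample),
        "stdev": statistics.stdev(sample),  # sample standard deviation (n - 1)
    }


def one_sample_t(sample, mu0):
    """t statistic for the null hypothesis that the population mean is mu0."""
    standard_error = statistics.stdev(sample) / math.sqrt(len(sample))
    return (statistics.mean(sample) - mu0) / standard_error


if __name__ == "__main__":
    data = [2.0, 4.0, 4.0, 4.0, 5.0, 5.0, 7.0, 9.0]  # made-up observations
    print(summarize(data))
    print(one_sample_t(data, 4.0))
```

A large absolute t value suggests the sample mean is far from the hypothesized mean relative to the sampling noise. How large counts as large depends on the sample size and the chosen significance level.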
5) Business Knowledge
To keep the analysis focused, and to validate, sort, relate and evaluate the data, the most critical skill of a big data scientist is a good knowledge of the domain one is working in. In fact, the reason big data analysts are so much in demand is that it is very rare to find people who have a thorough understanding of the technical aspects, statistics and the business. There are analysts who are good at business and statistics but not at programming. There are expert programmers who lack the know-how to put their programs in the context of the business goal.