Big data and security

The advent of big data is one of several significant current changes that will fundamentally alter the IT landscape. It brings both security challenges and security opportunities.

Big data is a phenomenon that has emerged as organisations store increasingly large and complex data sets and seek the advantages that big data analytics offers. As the cost of data storage decreases, a growing number of data feeds from mobile devices, logs, sensors, surveillance tools and social networking supply diverse, complex data to the data pool.

A 3Vs model is used to describe big data: Volume (the amount of data and the size of the data sets), Variety (the complexity and diversity of the data types) and Velocity (the speed with which data comes in and goes out). A fourth V, Value, should also be part of the mix, so that only relevant data is analysed and properly interpreted. Conventional relational database methods fail under conditions of high volume, variety and velocity, so new technologies such as Hadoop and Splunk are used to process big data. IBM has pioneered the use of silicon nanophotonics on chips to speed up the processing of big data.

The storage and processing of big data have security implications. Much of the data is sensitive, which raises privacy concerns as well as the threats of identity theft and theft of proprietary information. Data at rest needs to be encrypted, and in some cases data masking is appropriate to camouflage sensitive data. Network communications need adequate access control. Transactions and anomalies need to be logged, and administrator activity in particular should be closely scrutinised. Nodes need to be monitored, and applications permitted into the cluster need to be properly vetted. Currently, few organisations have the capability to analyse terabytes of data, so this function is often outsourced, introducing additional security hazards.
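Data masking can take several forms. As a minimal sketch, assuming records arrive as Python dicts with hypothetical `card_number` and `email` fields, the example below hides all but the last four digits of a card number and replaces an email address with a keyed hash (HMAC-SHA-256), so masked records can still be matched to the same customer without exposing the raw value. `MASKING_KEY` and the helper names are placeholders invented for illustration. This shows masking only; encrypting data at rest is a separate control.

```python
import hashlib
import hmac

# Placeholder key for illustration only; a real deployment would fetch it
# from a key-management system rather than hard-coding it.
MASKING_KEY = b"example-masking-key"


def mask_card_number(card_number: str, visible: int = 4) -> str:
    """Replace all but the last `visible` digits with '*', keeping separators."""
    total_digits = sum(c.isdigit() for c in card_number)
    hide_count = max(total_digits - visible, 0)
    masked, seen = [], 0
    for c in card_number:
        if c.isdigit():
            masked.append("*" if seen < hide_count else c)
            seen += 1
        else:
            masked.append(c)
    return "".join(masked)


def pseudonymise(value: str, key: bytes = MASKING_KEY) -> str:
    """Deterministic keyed token: the same input always maps to the same token.

    The HMAC hex digest is truncated to 16 characters for readability.
    """
    return hmac.new(key, value.encode("utf-8"), hashlib.sha256).hexdigest()[:16]


def mask_record(record: dict) -> dict:
    """Return a copy of a record with the (hypothetical) sensitive fields masked."""
    masked = dict(record)
    masked["card_number"] = mask_card_number(record["card_number"])
    masked["email"] = pseudonymise(record["email"])
    return masked


if __name__ == "__main__":
    row = {"card_number": "4111 1111 1111 1234", "email": "alice@example.com", "amount": 42}
    print(mask_record(row))
```

Printing the masked record shows the card number as `**** **** **** 1234`, while the email becomes an opaque 16-character hex token and non-sensitive fields such as `amount` pass through unchanged. A keyed hash is used rather than a plain hash because common values such as email addresses could otherwise be recovered by hashing a list of guesses.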

However, the emergence of big data analytics also provides opportunities for increased security. Big data will increasingly be used to compile security profiles of individual behaviour. The theory is that the behaviour of a hacker or of malware will differ from the user’s normal behaviour (a deviation from a baseline), so it can be identified once normal behaviour has been defined through big data analytics. Crowdsourcing, which incorporates data from other users and organisations in the network, will be used to refine the normal baseline. Advanced statistical methods and heuristic behavioural analysis are applied to the data to define that baseline. In this way, big data analytics will be able to detect advanced persistent threats, and other attacks that traditional malware signature scanning misses, by quickly identifying deviations from the normal.
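A simple z-score test can stand in for the "advanced statistical methods" mentioned above to show how a deviation from a baseline might be flagged. The sketch below is an illustration under assumptions, not a production detector: it builds a baseline (mean and standard deviation) for one metric, such as megabytes a user downloads per day, from that user's history, optionally blends in observations from peer users as a crude stand-in for crowdsourcing, and flags values more than `threshold` standard deviations from the mean. The function names, the `peer_weight` parameter and the linear blending of means and standard deviations are hypothetical choices made for this example.

```python
from statistics import mean, pstdev


def build_baseline(own_history, peer_history=(), peer_weight=0.3):
    """Return (mean, standard deviation) describing 'normal' for one metric.

    own_history: past observations for this user (must be non-empty).
    peer_history: observations from other users (hypothetical crowdsourced feed).
    peer_weight: hypothetical share of the baseline taken from peer data.
    """
    own_mu, own_sigma = mean(own_history), pstdev(own_history)
    if not peer_history:
        return own_mu, own_sigma
    peer_mu, peer_sigma = mean(peer_history), pstdev(peer_history)
    # Crude linear blend of the two baselines, for illustration only.
    mu = (1 - peer_weight) * own_mu + peer_weight * peer_mu
    sigma = (1 - peer_weight) * own_sigma + peer_weight * peer_sigma
    return mu, sigma


def is_anomalous(value, baseline, threshold=3.0):
    """Flag a value whose z-score against the baseline exceeds the threshold."""
    mu, sigma = baseline
    if sigma == 0:
        # No variation in the history: any different value is a deviation.
        return value != mu
    return abs(value - mu) / sigma > threshold


if __name__ == "__main__":
    # Illustrative data only: megabytes downloaded per day.
    own = [100, 110, 95, 105, 90, 100, 100]
    peers = [80, 120, 100, 100]

    own_only = build_baseline(own)
    blended = build_baseline(own, peers)

    for mb in (108, 120, 2000):
        print(mb, is_anomalous(mb, own_only), is_anomalous(mb, blended))
```

With this sample data, a 120 MB day is flagged against the user's own baseline (z ≈ 3.35) but not once peer data widens it (z ≈ 2.37), while a 2000 MB day is flagged either way. This shows why the choice of data feeding the baseline matters as much as the detection rule itself.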

High-risk industries such as banking, insurance and government will be the first to analyse big data for trends that can then be used to trigger security alarms.

The first challenge for CISOs and for the security industry is to find ways of introducing new, valuable data feeds into big data analytics, so as to better define baseline normal behaviour and highlight deviations caused by malware and hacking. This approach is likely to outperform existing malware and hacker identification methods. The second challenge is to present the analytics in a valuable, meaningful and uncomplicated manner. In this way, big data analytics will become an invaluable tool in the security arsenal.
