
Machine Learning in the World of Big Data

The concept of machine learning is becoming ever more important not just in the world of computer science but also in business. The explosion of data has given this technology a new lease of life and an impetus that is becoming difficult to ignore.

Machine learning algorithms have been used extensively in Google’s self-driving car. They also provide the backbone of the recommendation engines at the likes of Amazon and Netflix, and banks use machine learning as part of their toolkit for fighting fraud. Add email spam filters, web search engines, online ads and credit scoring, and it is becoming harder to escape the use of machine learning algorithms.

So how can businesses use machine learning? Any organisation that generates large quantities of data has the potential to benefit. Every large dataset, however carefully planned, designed and thought through, will have issues, whether that is inconsistencies in the date format (for example dd/mm/yyyy in the UK market and mm/dd/yyyy in the US) or missing values where no data exists for a field. Machine learning can be used to help clean up the data.
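To make the two data problems above concrete, here is a minimal Python sketch of the cleaning step. It is a simple rule-based baseline rather than machine learning: it assumes each record carries a market code that tells us which date format to expect, and it fills missing numbers with the median of the observed values. The field names, the sample records and the `DATE_FORMATS` mapping are all hypothetical.

```python
from datetime import date, datetime
from statistics import median

# Hypothetical mapping from market code to the date format used in that market.
DATE_FORMATS = {"UK": "%d/%m/%Y", "US": "%m/%d/%Y"}


def parse_date(raw, market):
    """Parse a date string using the market's format; return None on failure."""
    try:
        return datetime.strptime(raw, DATE_FORMATS[market]).date()
    except (KeyError, ValueError, TypeError):
        return None


def fill_missing(values):
    """Replace None entries with the median of the values that are present."""
    observed = [v for v in values if v is not None]
    if not observed:
        return list(values)
    fill = median(observed)
    return [fill if v is None else v for v in values]


# Illustrative records only: the first two describe the same day
# written in the UK and US conventions.
records = [
    {"market": "UK", "date": "03/04/2017", "spend": 120.0},
    {"market": "US", "date": "04/03/2017", "spend": None},
    {"market": "UK", "date": "26/07/2017", "spend": 80.0},
]

dates = [parse_date(r["date"], r["market"]) for r in records]
spend = fill_missing([r["spend"] for r in records])
```

A string such as 03/04/2017 is ambiguous on its own, which is why the sketch relies on the market code rather than guessing. A learned model could replace the median with a prediction based on the other fields, but the shape of the task is the same.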

Then there is insight into the data. Companies that understand their customers’ requirements have a better chance of keeping those customers. Those that can predict a trend before it becomes a trend have a better chance of reacting and growing. Machine learning algorithms are very good at identifying patterns in large datasets. They can be used to uncover hidden themes through sorting, cataloguing or grouping elements of data.
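As an illustration of grouping elements of data, the sketch below implements a very small version of k-means clustering in plain Python. It is not a production implementation: the customer figures are invented, the starting centroids are simply the first k points so that the result is repeatable, and in practice a library routine such as scikit-learn's `KMeans` would be the usual choice.

```python
from math import dist


def kmeans(points, k, iterations=20):
    """Tiny k-means for illustration; returns (centroids, labels)."""
    # Deterministic start: use the first k points as the initial centroids.
    centroids = [points[i] for i in range(k)]
    labels = [0] * len(points)
    for _ in range(iterations):
        # Assignment step: attach each point to its nearest centroid.
        labels = [
            min(range(k), key=lambda c: dist(p, centroids[c])) for p in points
        ]
        # Update step: move each centroid to the mean of its members.
        for c in range(k):
            members = [p for p, label in zip(points, labels) if label == c]
            if members:
                centroids[c] = tuple(
                    sum(axis) / len(members) for axis in zip(*members)
                )
    return centroids, labels


# Hypothetical customers: (visits per month, average basket value).
customers = [(1, 10), (20, 95), (2, 12), (1, 8), (22, 100), (19, 90)]

centroids, labels = kmeans(customers, k=2)
```

With this made-up data the occasional, low-spend customers end up in one group and the frequent, high-spend customers in the other, without anyone having labelled them in advance.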

There is a wide range of tools available to the data scientist. These range from libraries and APIs that require a high level of programming knowledge, such as Scikit-learn, Mahout and MLlib, to extensions for analytics packages such as R, SAS or Stata. For data scientists who don’t have extensive Java or Python programming experience, the extensions to standard analytics packages will be an easy way into machine learning.

The tools are available and, in most cases, free. The open source community has really embraced the big data arena and is providing a plethora of tools to assist companies and data scientists.

Blog post by Richard Skeggs (Senior Research Data Manager at the Business and Local Government Data Research Centre). Please email us if you have any questions about the contents of this post.

Published 26 July 2017