MIT builds tool that could assist Asian Big Data plans

By GovInsider

Artificial intelligence can help data scientists be more efficient, rather than put them out of jobs, researchers believe.

Researchers at MIT have built software that could help solve a global shortage of data scientists and analysts. The United States is short of 140,000 to 190,000 data analysts. In Asia, Malaysia has said that it needs 1,500 data scientists by 2020. At last count in April, it had only 80.


Singapore has said that data science and analytics is “at the heart” of its Smart Nation movement. Both governments have launched massive open online courses to help citizens build these skills. MIT’s Computer Science and Artificial Intelligence Lab has built a prototype that can help: it can find patterns in vast amounts of data much faster than humans.


The Data Science Machine is software that uses artificial intelligence to sift through Big Data, pick the right variables to analyse, and develop a model to predict patterns from the data. The team behind it believes that it can make data scientists more efficient, rather than putting them out of jobs.


“We view the Data Science Machine as a natural complement to human intelligence,” said Max Kanter, whose MIT master’s thesis is the basis of the prototype. The researchers entered the software in three data science competitions, and it finished ahead of 615 of the 906 participating teams.


While the human teams typically took months to build a prediction algorithm, the software built its models in two to 12 hours. In two of the competitions, it was 96% and 94% as accurate as the winning human submissions. In the third, it was a “more modest” 87% as accurate, the lab said.


“Typically the Data Science Machine does well and beats many humans, but some humans beat it,” Kanter told IEEE, an association of engineering professionals. “So it would be naive to say that human data scientists don’t have any value.” The MIT lab is now tweaking the software to allow for more human control, instead of leaving data scientists out of the loop.