Data is the New Oil

Deep Learning is a revolutionary field, but it cannot work as intended without data. The field concerned with collecting and handling very large datasets is known as Big Data, a term that refers to the abundance of digital data. Data is as important to Deep Learning algorithms as the architecture of the network itself, i.e., the software. Acquiring and cleaning the data is among the most valuable parts of the work, because without data, neural networks cannot learn.

Most of the time, researchers can use the data given to them directly, but there are many instances where the data is not clean. That means it cannot be used directly to train the neural network because it contains samples that are not representative of what the algorithm is meant to classify. One problem is bad data: for example, you want to build a neural network that identifies cats in color images, but the dataset contains black-and-white images. Another problem is labels that are missing or unusable. For example, when you want to classify images of people as male or female, there might be pictures without a tag, or pictures whose tag is corrupted by a misspelling such as ‘ale’ instead of ‘male.’ Even though these might seem like unlikely scenarios, they happen all the time. Handling these problems and cleaning up the data is known as data wrangling.
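To make the idea of data wrangling concrete, here is a minimal sketch in Python that combines the two problems described above: it drops images that are not in color and repairs or discards bad tags. Everything about the record layout is an assumption made for illustration. The `path`, `mode`, and `label` fields are hypothetical, the mode values "RGB" and "L" simply mirror the names Pillow uses for color and grayscale images, and the 0.8 similarity cutoff is an arbitrary choice. Fuzzy matching is only one possible way to repair misspelled tags.

```python
from difflib import get_close_matches

# Assumed label set, taken from the male/female example in the text.
VALID_LABELS = ["male", "female"]


def clean_label(raw, valid=VALID_LABELS, cutoff=0.8):
    """Return a valid label, or None if the tag is missing or unrecoverable."""
    if raw is None:
        return None
    label = raw.strip().lower()
    if label in valid:
        return label
    # Try to repair a misspelling such as "ale" by fuzzy matching.
    # The cutoff is an illustrative value, not a recommended setting.
    matches = get_close_matches(label, valid, n=1, cutoff=cutoff)
    return matches[0] if matches else None


def wrangle(records):
    """Keep color images that have a usable label; drop everything else."""
    cleaned = []
    for rec in records:
        # Hypothetical field: "RGB" marks a color image, "L" a grayscale one.
        if rec.get("mode") != "RGB":
            continue
        label = clean_label(rec.get("label"))
        if label is None:
            continue
        cleaned.append({**rec, "label": label})
    return cleaned


# Made-up records covering each case mentioned in the text.
records = [
    {"path": "a.jpg", "mode": "RGB", "label": "male"},
    {"path": "b.jpg", "mode": "RGB", "label": "ale"},       # misspelled tag
    {"path": "c.jpg", "mode": "RGB", "label": None},        # missing tag
    {"path": "d.jpg", "mode": "L", "label": "female"},      # black and white
    {"path": "e.jpg", "mode": "RGB", "label": " Female "},  # stray spaces, capitals
]

cleaned = wrangle(records)
for rec in cleaned:
    print(rec["path"], rec["label"])
```

In a real project, automatic repairs like the one applied to ‘ale’ would usually be logged and reviewed by a person, since a fuzzy match can assign the wrong label with no visible error.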
