Data science uses computers to analyze large data sets. One of its main tools is machine learning, the discipline of teaching a computer to recognize patterns and infer rules. In supervised machine learning, the computer is presented with a set of objects (called instances in data scientists' jargon) whose properties are known, and is asked to learn their salient characteristics in order to make predictions about other objects whose properties are only partially known (as in the example shown at the bottom). In unsupervised machine learning, the computer is asked to learn how to recognize and characterize instances, often for the purpose of either grouping together similar objects or singling out information-rich outliers. Data science also involves representing and visualizing data efficiently. The discipline is becoming increasingly crucial in many applied and experimental sciences, from astronomy to biology to physics to medicine. Technological advances have led to a steep increase in the size and complexity of data sets, rendering obsolete many techniques that were used for decades and pushing us to develop new ones suited to analyzing the new generation of data.
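To make the distinction between supervised and unsupervised learning concrete, here is a minimal sketch in Python using only the standard library. It is an illustration under stated assumptions, not a method described in the text: the toy data, the "star" and "galaxy" labels, and the function names `nearest_neighbor_predict` and `find_outliers` are all hypothetical. The supervised part uses a one-nearest-neighbor rule, and the unsupervised part flags outliers with a simple standard-deviation threshold; both are chosen only because they are short.

```python
import math
import statistics


def nearest_neighbor_predict(labeled, query):
    """Supervised sketch: return the label of the closest labeled instance.

    `labeled` is a list of (features, label) pairs whose labels are known.
    """
    _, label = min(labeled, key=lambda pair: math.dist(pair[0], query))
    return label


def find_outliers(values, threshold=2.0):
    """Unsupervised sketch: flag values more than `threshold` standard
    deviations away from the mean. No labels are used."""
    mean = statistics.fmean(values)
    std = statistics.pstdev(values)
    if std == 0:
        return []
    return [v for v in values if abs(v - mean) / std > threshold]


# Hypothetical toy data: two features per instance, with known labels.
labeled = [
    ((1.0, 1.0), "star"),
    ((1.2, 0.8), "star"),
    ((5.0, 5.0), "galaxy"),
    ((5.5, 4.5), "galaxy"),
]

# Supervised: predict the label of a new instance from the labeled ones.
prediction = nearest_neighbor_predict(labeled, (4.8, 5.1))

# Unsupervised: single out the unusual measurement in an unlabeled list.
outliers = find_outliers([10.1, 9.9, 10.0, 10.2, 9.8, 10.0, 25.0])

print(prediction)
print(outliers)
```

In the supervised call the known labels drive the prediction, while the outlier search sees only the raw values. Real analyses would typically rely on more robust methods and far larger data sets than this toy example.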