Artificial Intelligence and Machine Learning

The pursuit of Artificial Intelligence (AI) has a long history that predates the invention of the computer, but it only became practical with the development of computer science. The birth of modern AI is usually traced to the Dartmouth Conference in 1956, where the term "Artificial Intelligence" was proposed. Since then, researchers have investigated AI from multiple disciplines; some approaches succeeded and others failed. For a review of AI history, you can read this Wikipedia page.

AI research has many branches, such as reasoning as search and robotics. Currently, the most successful AI methodology is Machine Learning (ML), so as a rough approximation we can say:

AI $\approx$ ML

Machine learning was first proposed by Arthur Samuel, who described it as the "Field of study that gives computers the ability to learn without being explicitly programmed" [1]. Below are some other definitions of machine learning:

“Machine learning is based on algorithms that can learn from data without relying on rules-based programming.” - McKinsey & Co.
“Machine learning algorithms can figure out how to perform important tasks by generalizing from examples.” – University of Washington.
“The field of Machine Learning seeks to answer the question ‘How can we build computer systems that automatically improve with experience, and what are the fundamental laws that govern all learning processes?’” – Carnegie Mellon University.
“Machine learning research is part of research on artificial intelligence, seeking to provide knowledge to computers through data, observations and interacting with the world. That acquired knowledge allows computers to correctly generalize to new settings.” – Yoshua Bengio, Université de Montréal.

These definitions each have their own emphasis, but the main idea is the same: machine learning is about building algorithms that learn automatically from data.

Simply put, the purpose of ML is to find a function **$y=f(x)$**, in which $x$ is the input data and $y$ is the output data. $f()$ is the function (or model) that converts $x$ to $y$. Here are some examples:

Language translation: $x$ is a sentence in one language (e.g., English), and $y$ is the same sentence translated into another language (e.g., Chinese).
Voice recognition: $x$ is a recording of speech, and $y$ is the corresponding text.
Face recognition: $x$ is a photo (a collection of pixels) of a person, and $y$ is the identity of that person.
Autonomous vehicles: $x$ is a video showing the current road conditions, and $y$ is a driving decision.
Stock price prediction: $x$ is the stock prices over the past few days, and $y$ is the predicted stock price for tomorrow.

To build $f()$, we could simply program it by working out the relationship between $x$ and $y$ manually. However, an $f()$ built by hand cannot cope with many complicated situations. Machine learning provides another way to build $f()$: the computer observes the data and establishes the relationship between $x$ and $y$ itself. The main characteristic of ML is therefore that $f()$ is not predefined or designed by a human, but learned from data ($X=[x_1, x_2, ..., x_n]$, $Y=[y_1, y_2, ..., y_n]$, for $n$ pieces of data). These data are called "training data", since they are used to train the model $f()$.
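To make the idea of learning $f()$ from training data concrete, here is a minimal Python sketch. It assumes the simplest possible model, a straight line $f(x) = wx + b$, and fits $w$ and $b$ by ordinary least squares. The function name `fit_linear` and the toy data are hypothetical, chosen only for illustration; real ML models are far more flexible than this.

```python
def fit_linear(xs, ys):
    """Learn w and b for f(x) = w*x + b by ordinary least squares.

    Assumes xs contains at least two distinct values.
    """
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Slope: covariance of x and y divided by the variance of x.
    cov_xy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    w = cov_xy / var_x
    b = mean_y - w * mean_x
    # The returned function is the learned model f().
    return lambda x: w * x + b


# Hypothetical training data, generated here from y = 2x + 1.
X = [0.0, 1.0, 2.0, 3.0, 4.0]
Y = [1.0, 3.0, 5.0, 7.0, 9.0]

f = fit_linear(X, Y)   # f is learned from (X, Y), not written by hand
print(f(10.0))         # prints 21.0
```

Nothing in `fit_linear` encodes the rule $y = 2x + 1$; the relationship is recovered from the pairs in `X` and `Y` alone.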

In addition to training data, machine learning requires a separate set of data called "test data", which is used to evaluate the trained model. You must never use test data to train the model; otherwise the model can memorize the test data and cheat on its evaluation.
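The following Python sketch illustrates why test data must be kept out of training. It uses a deliberately poor, hypothetical "model" that only memorizes the pairs it has seen (the names `train_memorizer` and `mean_squared_error` and the toy data are assumptions made for this example). Such a model looks perfect on any data it was trained on, so only held-out test data reveals how badly it generalizes.

```python
def train_memorizer(xs, ys, default=0.0):
    """A deliberately poor 'model': it memorizes the training pairs
    and returns `default` for any input it has never seen."""
    table = dict(zip(xs, ys))
    return lambda x: table.get(x, default)


def mean_squared_error(f, xs, ys):
    """Average squared difference between predictions and true outputs."""
    return sum((f(x) - y) ** 2 for x, y in zip(xs, ys)) / len(xs)


# Hypothetical data, generated here from y = 2x + 1.
X = [0.0, 1.0, 2.0, 3.0, 4.0, 5.0]
Y = [2.0 * x + 1.0 for x in X]

# Hold out the last two pairs as test data.
X_train, Y_train = X[:4], Y[:4]
X_test, Y_test = X[4:], Y[4:]

# Correct procedure: train on the training data only.
f = train_memorizer(X_train, Y_train)
train_error = mean_squared_error(f, X_train, Y_train)
test_error = mean_squared_error(f, X_test, Y_test)

# Incorrect procedure: the test data leaks into training.
f_leaky = train_memorizer(X, Y)
leaky_error = mean_squared_error(f_leaky, X_test, Y_test)

print(train_error)  # 0.0: perfect on data it has memorized
print(test_error)   # 101.0: poor on unseen data
print(leaky_error)  # 0.0: misleadingly perfect, because it saw the test data
```

The leaky model has learned nothing more than the honest one, yet its test score suggests it is flawless, which is exactly the kind of cheating that a strict train/test separation prevents.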