With efficient modeling, we can extract value from data. The "data chain" is a value-added chain that transforms data from raw material into products and services. People have recognized the potential value of data and often compare it to raw material, a commodity, or an asset. Let's look at these analogies:
Data as raw material
Data is commonly viewed as a new type of raw material and resource. This is natural because, as explained in the "data chain" discussion, data needs to be processed before it becomes usable and its value can be mined. The so-called "data economy" is built on that mined value. Some even believe data is becoming the most important resource of the future, since it is the fuel of AI, and AI is expected to be the key driver of productivity growth.
However, data has some special characteristics compared with traditional raw materials like oil. First, using this type of raw material doesn't exhaust it. Second, since it's not exhaustible, it can be shared by many users rather than being used exclusively by one. Third, data is very heterogeneous: each piece of data can have its own unique value.
Data as commodity
Some people view data as a kind of commodity, since it takes effort to create or produce data. However, I think data is fundamentally not a commodity. The most important reason is that end users cannot use data on its own. Data is typically "consumed" by models, which in turn provide services and products. We can view a model built on data as a commodity, but that's not the case for standalone data.
In addition, data differs from commodities in some other respects. First, the replication cost of data is nearly zero. Second, the marginal utility of data is unstable.
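To make the point about marginal utility a bit more concrete, here is a minimal sketch under a strong assumption of my own: that the "value" of a dataset is how precisely it lets us estimate a simple average. In that toy setting, the gain from one extra record depends heavily on how many records the holder already has. The functions `standard_error` and `marginal_gain` are hypothetical names introduced only for this illustration, and real models can behave very differently.

```python
import math


def standard_error(n: int, sigma: float = 1.0) -> float:
    """Standard error of a sample mean computed from n independent records.

    Assumption for illustration: records are independent with standard
    deviation sigma, so the error shrinks as sigma / sqrt(n).
    """
    if n <= 0:
        raise ValueError("n must be positive")
    return sigma / math.sqrt(n)


def marginal_gain(n: int, sigma: float = 1.0) -> float:
    """Error reduction obtained by adding one more record to n records."""
    return standard_error(n, sigma) - standard_error(n + 1, sigma)


if __name__ == "__main__":
    # The same extra record is worth far less to a holder of a large dataset.
    for n in (1, 10, 100, 1000):
        print(f"records held: {n:>4}  gain from one more: {marginal_gain(n):.6f}")
```

In this sketch the same record is worth much more to someone holding ten records than to someone holding a thousand, which is one reason a single fixed price per record is hard to justify.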
Data as asset
Data can be viewed as a new type of asset, since 1) it can create profit for its owner; 2) it can be controlled by its owner. This type of asset is becoming essential to the modern economy, enabling a range of new business models. However, compared with traditional assets, data has its own unique characteristics: 1) The expected revenue from data is difficult to predict in many cases, since we can only understand its value after building a model with it; 2) The ownership of data is difficult to protect and transfer.
These characteristics make it challenging to price and trade data. Since its value can hardly be understood before modeling, the buyer will be reluctant to offer a high price. And because data can be replicated at virtually no cost, trading it doesn't actually transfer ownership to the new owner; it merely widens the set of parties who hold the data.
I argue that resolving these issues of data pricing and trading is essential to building a sustainable data economy. Otherwise, "data economy" is just a buzzword.
Since data is now an important raw material, or even an asset, its ownership should be seriously protected. In recent years, data privacy has gained intense public attention, but I argue it will become an even more central problem for society. As we enter the AI era, more wealth will be created by AI and by the data assets behind it. Therefore, the current paradigm, in which personal data is collected, utilized, and monetized by big companies, will lead to a great wealth imbalance and polarization. This is already happening.
Therefore, it's important to establish a new paradigm in which personal data is protected and reasonably priced, so that the people who provide it can share in the revenue of AI. At the same time, such a paradigm must not prevent data from being used in AI.
To achieve this, there are several technologies we can adopt: