Just as an artist creates unique masterpieces from simple colors and a cook conjures up exotic flavors from simple ingredients, data scientists gain insights by linking different datasets. The speed, volume, and variety of data are increasing exponentially as it races around the world. Despite its non-rival nature, data is likened to oil and gold because of its underlying value potential.
For example, in addition to a vehicle's model, age and claims history, motor insurers can take into account how that particular vehicle is actually driven. Tax evasion can be curbed by sharing income tax data with the Goods and Services Tax Network (GSTN). Yes, sometimes one and one can actually be eleven!
Hence, linkages matter even more than the accumulation of separate records. It is therefore crucial to separate questions of value creation from those of value capture.
Both the Union government and the states have proposed a number of policy, legislative and regulatory frameworks for data. These include the Personal Data Protection Bill, national cybersecurity and artificial intelligence (AI) strategies, and frameworks for non-personal data and responsible AI. Platforms proposed across various sectors also include mandates to share data, often with government involvement.
Personal vs non-personal
Data that directly or indirectly identifies a specific individual is considered "personal data" (PD). Accordingly, all other data would be "non-personal data" (NPD).
However, such mutually exclusive binary classifications have limitations. For example, if you are the only person in a large crowd with location tracking enabled on your phone, you could be uniquely identified, even though location data is treated as NPD.
In some cases, individuals have been re-identified with more than 90% accuracy by combining anonymized data (treated as NPD) with public records! Moreover, anonymization itself can be reversible, much as a spectrum analyzer can break a particular hue down into its constituent primary colors.
More ways to slice!
Other data classifications include: at rest – in motion; at the edge – in the cloud; encrypted – unencrypted; structured – unstructured; low frequency – high frequency; real time – historical; national – transnational; physical – physiological; public sector – private sector; individual – community; raw – processed.
The list is indeed endless. Researchers, too, often contrast empirical data with simulated data to test their hypotheses.
There may be more than one way to skin a cat, but data can be sliced and diced in far more ways than that! Yes, data can be sorted extremely meticulously, with an almost infinitely sharp focus. As a result, online search results, shopping recommendations, medical treatments, and even loan offers can be hyper-personalized.
Algorithmic trading in securities can be triggered by simultaneously linking seemingly unrelated datasets such as weather forecasts, exchange rates and crude oil production.
Likewise, fraud could be detected from factors such as the location, frequency and amount of transactions and, lo and behold, even how much pressure the user applies to the phone screen and the angle at which they hold the device!
The possibility of generating "synthetic data" using "digital twins" also opens up immense opportunities for innovation. For example, promising vaccine candidates are being identified proactively by testing their effectiveness against possible mutations of the SARS-CoV-2 virus even before those mutations emerge.
Data is the elephant in the digital space. To deal with it, India needs an overarching policy framework for data governance rather than relying, like the blind men in the parable, on discrete, one-dimensional tools based on simple binaries that can create gaps and overlaps.
Nobel laureate in economics Ronald Coase famously said, "If you torture the data long enough, it will confess." Even so, data must be handled with care, accountability and transparency to ensure that it is used lawfully and for the common good.
Binaries can be useful for assessing various aspects of data, but Schrödinger's cat might just meow "the-lines-blur", inspired by another Nobel Prize winner, Bob Dylan!
Deepak Maheshwari is a Senior Fellow at the Center for The Digital Future.