Leveraging Data — Make Your Product and Business More Valuable
Without Data, You Will Be Lost
The Data Economy Is Here
Over the last 10 to 15 years, the economy has undergone a massive shift towards a data-centric model. It all started in the tech sector, where data is used to provide valuable services.
Nowadays, most drivers use either Apple Maps or Google Maps for real-time traffic information. The service would be pretty useless without the location data provided by many smartphones (and vehicles).
The takeaway here is that data is becoming a product itself. What started in the web sector has begun to migrate into more and more fields, e.g., predictive maintenance or digital health.
The change in the economy is driven by one key factor: demographic change.
Demographic change leads to a shortage of skilled workers, which can already be felt in many fields. Fewer people will work, yet the economy still needs to support more and more people. This is only possible by increasing productivity through technology.
In healthcare, the situation is even worse: fewer nurses and doctors must care for an aging society without decreasing the quality of care.
Intelligent Systems Will Dominate Their Markets
To tackle the shortage of workers while increasing productivity, we need intelligent systems.
Unfortunately, the status quo today is sophisticated systems full of sensors that generate plenty of data, yet this data is often not used as much as it could and should be.
In some fields, data has already become highly valuable to the products, and I think this will hold true in almost all industries. It may take more time in some areas, but the general trend is towards increased data processing to increase the value of the products.
A very eye-opening example is the rise and fall of dedicated navigation systems. Before 2010, almost everybody had a navigation device from TomTom, Garmin, or a similar company in their car. The devices were sold as standalone products, with no (or only periodic) map updates, for which you often had to pay extra.
Although Google Maps started in 2005, navigation didn't make it into the product until 2009. The actual killer feature, however, is real-time traffic, 'crowdsourced' from Android devices starting in 2013. By that point, smartphones were ubiquitous; therefore, the location data from these devices was available to Google.
Their service was superior (and free) and forced traditional navigation system providers to change their business models or move into a more specialized niche to survive. TomTom, for example, now provides maps for Apple Maps. In my opinion, this is a significant downgrade considering that they sold complete navigation systems before.
This shift will happen in all industries. If you cannot create more intelligent products by using data, a competitor or a startup surely will. Such systems save users time and money, allowing them to do more with less.
What Are the Ingredients for Intelligent Systems?
This is one of the main questions that remains. Intelligent systems create value by comparing newly recorded data against some "old" data used to train the system, which is therefore referred to as training data.
The quality of the training data is crucial to the success of your AI project, especially if you are working in regulated environments such as digital health or autonomous driving.
The key takeaway is that data is generally available, but the right data sets are hard to acquire in many cases.
You Need the Right Data — Not Just Any Data
To get meaningful results, you need a large set of training data. Beyond sheer size, there are further requirements:
- It has to be diverse enough to include all features relevant to your problem
- You will likely need labels or annotations
- Edge cases must be contained in the data set
These requirements can be met by using simulation-based data synthesis. This means you create data from (deterministic) mathematical or physical models that describe the underlying first principles (not to be confused with an AI model). A domain expert in the field usually does this work.
From this model, you can generate data with accurate annotations because the annotations come straight from the model itself; they are not added in a post-processing step.
Furthermore, such models are parameterizable: you can automatically vary the model within pre-defined bounds to synthesize many different realizations, leading to diverse data sets.
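To make this concrete, here is a minimal Python sketch. It is illustrative only: a simple damped oscillator stands in for your domain model, and the function names and parameter bounds are assumptions I made up for the example. Note how the label of each sample is the sampled parameter set itself, so the annotation comes straight from the model:

```python
import numpy as np

def damped_oscillator(t, amplitude, frequency, damping):
    """Deterministic first-principles model (a physical model, not an AI model)."""
    return amplitude * np.exp(-damping * t) * np.sin(2 * np.pi * frequency * t)

def synthesize_dataset(n_samples, seed=0):
    """Sample model parameters within pre-defined bounds and simulate.

    The annotation (the true parameter set) comes straight from the model,
    not from a post-processing step.
    """
    rng = np.random.default_rng(seed)
    t = np.linspace(0.0, 10.0, 500)
    signals, labels = [], []
    for _ in range(n_samples):
        params = {
            "amplitude": rng.uniform(0.5, 2.0),   # pre-defined bounds
            "frequency": rng.uniform(0.1, 1.0),
            "damping": rng.uniform(0.01, 0.5),
        }
        signals.append(damped_oscillator(t, **params))
        labels.append(params)
    return np.stack(signals), labels

signals, labels = synthesize_dataset(100)  # 100 diverse, annotated realizations
```

Because every realization is drawn from randomized parameters within bounds, the resulting data set is diverse by construction, and you can deliberately push the bounds to cover edge cases.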
Simulations are a powerful way to generate data in many fields.
The Next Step: Integration Into Your Workflow
The next step in the data journey is integrating the data synthesis into your workflow. Simulation models can be exposed through automatically generated APIs that let you parameterize the model and start a simulation run.
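As a rough sketch of what calling such an auto-generated API could look like (the endpoint URL, payload schema, and response fields below are purely hypothetical):

```python
import requests

# Hypothetical auto-generated endpoint; URL and payload schema are assumptions.
SIMULATION_API = "https://simulation.example.com/api/v1/oscillator/run"

payload = {
    "parameters": {"amplitude": 1.2, "frequency": 0.4, "damping": 0.1},
    "n_realizations": 50,        # how many parameter variations to simulate
    "output_format": "parquet",  # annotated signals plus ground-truth labels
}

response = requests.post(SIMULATION_API, json=payload, timeout=60)
response.raise_for_status()
job = response.json()
print(f"Simulation job {job['job_id']} queued with status {job['status']}")
```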
This is great news because now you can track your data quality and AI model drift and react automatically by generating the right data to counter the drift. This integration should become a part of your MLOps pipelines in the future.
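A minimal sketch of that feedback loop, assuming a single scalar feature is monitored and using a two-sample Kolmogorov-Smirnov test from SciPy as a stand-in drift detector (the feature choice, function name, and threshold are illustrative assumptions):

```python
import numpy as np
from scipy.stats import ks_2samp

def detect_drift(train_feature, prod_feature, alpha=0.05):
    """Flag drift when the production distribution of a monitored feature
    differs significantly from the training distribution (two-sample KS test)."""
    _, p_value = ks_2samp(train_feature, prod_feature)
    return p_value < alpha

rng = np.random.default_rng(1)
# Placeholder distributions; in practice these come from your monitoring system.
train_feature = rng.normal(0.40, 0.1, size=1000)  # e.g., dominant frequency
prod_feature = rng.normal(0.55, 0.1, size=1000)   # drifted in production

if detect_drift(train_feature, prod_feature):
    # React automatically: re-parameterize the simulation around the drifted
    # operating range and synthesize fresh, correctly annotated training data.
    print("Drift detected: triggering simulation-based data synthesis ...")
```

In a real pipeline, the branch would call the simulation API from the previous sketch with bounds shifted towards the drifted operating range, then retrain or fine-tune the model on the augmented data set.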