Simulating Auto-Annotated Artificial X-Ray Data Sets
Artificial Data for AI in Healthcare
The demand for intelligent data-driven systems is increasing in all areas of our lives. Applications range from medical imaging and autonomous driving to smart homes and many more.
For such systems to work robustly and reliably, they need large amounts of training data. This training data needs to be labeled and often annotated, too. Especially in healthcare applications, annotating medical images is very time-consuming and, because the work is tedious, error-prone.
Often, suitable data is not readily available, especially in heavily regulated areas where data safety or data privacy is of high importance. Artificial Intelligence (AI) in healthcare is a field where both aspects matter.
When intelligent systems are allowed to support the diagnostic process, their reliability has to be ensured. Unfortunately, large amounts of annotated medical images are very hard to get quickly. This leads directly to a cold-start problem: the application cannot be developed until enough data is available.
Several months can pass between having an idea and starting any kind of AI training, spent on the following tasks:
- data acquisition from at least a few different sources and ideally many different individuals
- data annotation by specifically trained people
This process is costly, and that is just for trying out a new idea without knowing whether it is even feasible.
An alternative to real-life data is the use of artificial data.
Synthetic X-Ray Data
Simulating artificial X-ray images based on real physics is a straightforward process. Generally, a source emits X-rays towards a detector plate. The person to be X-rayed is positioned between the source and the detector.
When the X-rays travel through the body, they are attenuated by the tissues. Each kind of tissue (e.g. muscle, fat, …) has a different attenuation coefficient. Therefore, depending on which kinds of tissue and how much of each lie between the source and the detector, a different shade of grey is seen on the image. Typical attenuation coefficients for different tissues can be found in various publications [1].
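The attenuation described above is commonly modelled with the Beer-Lambert law, I = I0 * exp(-sum(mu_i * d_i)), where mu_i is the linear attenuation coefficient of tissue i and d_i the distance the ray travels through it. The following is a minimal sketch under simplifying assumptions: a single X-ray energy, no scatter, and placeholder coefficients that are not taken from the tables in [1] (those list mass attenuation coefficients, which have to be multiplied by the tissue density to obtain linear ones). The function and variable names are made up for illustration.

```python
import math

# Illustrative linear attenuation coefficients in 1/cm.
# Placeholder values for this sketch, NOT taken from the NIST tables.
MU_PER_CM = {"soft_tissue": 0.2, "bone": 0.5}

def transmitted_fraction(segments):
    """Return I/I0 for a ray crossing a list of (tissue, thickness_cm) segments."""
    # Beer-Lambert law: I = I0 * exp(-sum(mu_i * d_i))
    optical_depth = sum(MU_PER_CM[tissue] * thickness for tissue, thickness in segments)
    return math.exp(-optical_depth)

# Two rays through 10 cm of material: one hits only soft tissue, one also crosses bone.
soft_only = transmitted_fraction([("soft_tissue", 10.0)])
with_bone = transmitted_fraction([("soft_tissue", 8.0), ("bone", 2.0)])
print(f"soft tissue only: {soft_only:.3f}")
print(f"with 2 cm bone:   {with_bone:.3f}")
```

The ray that crosses bone arrives with a lower intensity, which is what produces the contrast between bone and soft tissue on the detector.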
Simulating such images only takes a few seconds on a consumer-grade laptop. Such fast simulations in combination with automatically adjustable models allow the simulation of large data sets with high diversity.
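To illustrate how an automatically adjustable model turns into a diverse data set, here is a rough sketch. It is heavily simplified and not the simulation used for the images in this post: the "body" is a 2D grid of attenuation coefficients, the rays are parallel instead of spreading out from a point source, and the only adjustable parameters are the size and position of a single bone block. All names, the pixel size and the coefficients are assumptions.

```python
import math
import random

PIXEL_CM = 0.1               # hypothetical pixel edge length
MU_SOFT, MU_BONE = 0.2, 0.5  # placeholder coefficients in 1/cm

def random_phantom(rng, size=64):
    """2D attenuation map: soft tissue with one randomly placed bone block."""
    mu = [[MU_SOFT] * size for _ in range(size)]
    w, h = rng.randint(8, 20), rng.randint(8, 20)
    x0, y0 = rng.randint(0, size - w), rng.randint(0, size - h)
    for y in range(y0, y0 + h):
        for x in range(x0, x0 + w):
            mu[y][x] = MU_BONE
    return mu

def project(mu):
    """Parallel rays run top to bottom; returns I/I0 per detector column."""
    columns = zip(*mu)  # transpose rows into columns
    return [math.exp(-sum(col) * PIXEL_CM) for col in columns]

def simulate_dataset(n_samples, seed=0):
    rng = random.Random(seed)  # seeded so the data set is reproducible
    return [project(random_phantom(rng)) for _ in range(n_samples)]

images = simulate_dataset(100)
print(len(images), "detector profiles of", len(images[0]), "pixels each")
```

A realistic setup would vary far more parameters (anatomy, pose, source position, exposure), but the pattern stays the same: draw model parameters at random, simulate, repeat.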
Even more importantly, the artificial data can be annotated automatically. This has several advantages:
- ground truth is perfectly known
- fast and inexpensive
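The ground truth is known because the simulation itself places every structure in the scene, so producing an annotation is mostly bookkeeping. The sketch below assumes a hypothetical model made of axis-aligned blocks and derives a label mask and bounding boxes directly from that geometry; the `Structure` class and the `annotate` function are invented for this example.

```python
from dataclasses import dataclass

@dataclass
class Structure:
    """Axis-aligned block placed into the hypothetical model."""
    label: str
    x0: int
    y0: int
    width: int
    height: int

def annotate(structures, image_width, image_height):
    """Derive a label mask and bounding boxes directly from the model geometry."""
    mask = [[0] * image_width for _ in range(image_height)]
    boxes = []
    for index, s in enumerate(structures, start=1):
        for y in range(s.y0, s.y0 + s.height):
            for x in range(s.x0, s.x0 + s.width):
                mask[y][x] = index  # 0 stays background
        boxes.append({"id": index, "label": s.label,
                      "bbox": (s.x0, s.y0, s.width, s.height)})
    return mask, boxes

structures = [Structure("vertebra", 10, 5, 12, 8),
              Structure("vertebra", 10, 20, 12, 8)]
mask, boxes = annotate(structures, image_width=32, image_height=32)
print(boxes)
```

No human has to look at the image: the mask and the boxes come from the same parameters that generated it.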
The annotation style is flexible and depends on your application. The next picture shows the unannotated image followed by three different annotation styles (left to right):
- simulated X-ray without annotation
- each vertebral body together with its processes as a single annotation
- only the body of each vertebra
- the body and the processes of each vertebra as individual annotations
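One way to support several annotation styles is to store the ground truth at the finest level and group it on demand. The sketch below assumes a hypothetical export format in which every anatomical part comes with its set of pixel coordinates; the vertebra names, part names and style names are made up for illustration.

```python
# Hypothetical part-level ground truth as a simulation might export it:
# each entry is (vertebra name, part name, set of pixel coordinates).
PARTS = [
    ("L1", "body", {(10, 5), (11, 5)}),
    ("L1", "spinous_process", {(14, 5)}),
    ("L2", "body", {(10, 9), (11, 9)}),
    ("L2", "spinous_process", {(14, 9)}),
]

def annotations(parts, style):
    """Group part-level pixels into the requested annotation style."""
    result = {}
    for vertebra, part, pixels in parts:
        if style == "whole_vertebra":
            key = vertebra                 # body and processes merged
        elif style == "body_only":
            if part != "body":
                continue                   # processes are dropped
            key = vertebra
        elif style == "individual_parts":
            key = f"{vertebra}/{part}"     # every part stays separate
        else:
            raise ValueError(f"unknown style: {style}")
        result.setdefault(key, set()).update(pixels)
    return result

for style in ("whole_vertebra", "body_only", "individual_parts"):
    print(style, sorted(annotations(PARTS, style)))
```

Switching the style then requires no new simulation and no manual relabeling, only a different grouping of the same ground truth.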
TL;DR
- growing number of data-driven applications that need large, annotated data sets
- data acquisition and annotation are time-consuming and thus expensive
- physics-based simulations enable the creation of large data sets with automatic annotation
References
[1] https://www.nist.gov/pml/x-ray-mass-attenuation-coefficients