Simulating Auto-Annotated Artificial X-Ray Data Sets

Artificial Data for AI in Healthcare

Feb 14, 2022

The demand for intelligent data-driven systems is increasing in all areas of our lives. Applications can be found in medical imaging, autonomous driving, smart homes and many other areas.

For such systems to work robustly and reliably, they need large amounts of training data. This training data needs to be labeled and often annotated, too. Especially in health applications, annotating medical images is very time-consuming and error-prone because the work is tedious.
Often, suitable data is not readily available, especially in heavily regulated areas where data safety or data privacy is of high importance. Artificial Intelligence (AI) in healthcare is a field where both aspects matter.

When intelligent systems are allowed to support the diagnosis process, it has to be ensured that these systems are reliable. Unfortunately, large amounts of annotated medical images are very hard to get quickly. This leads directly to a cold-start problem: the application cannot be developed until enough data is available.

From having an idea to starting any kind of AI training, several months can pass with the following tasks:

data acquisition from at least a few different sources and ideally many different individuals
data annotation by specifically trained people

This is a costly process, and that is just for trying out a new idea without knowing whether it is even feasible.

An alternative to real-life data is the use of artificial data.

Synthetic X-Ray Data

Simulating artificial X-ray images based on real physics is a straightforward process. Generally, a source emits X-rays towards a detector plate. Between the source and the detector is the person who needs to be X-rayed.

When the X-rays travel through the body, they are attenuated by the tissues. Each kind of tissue (e.g. muscle, fat, …) has a different attenuation coefficient. Therefore, depending on which kinds of tissue and how much of each lie between the source and the detector, a different shade of grey is seen on the image. Typical attenuation coefficients for different tissues can be found in various publications [1].
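To make the attenuation idea concrete, here is a minimal sketch in Python. It is not the simulation used for the images in this post. It assumes parallel rays instead of a point source, a single X-ray energy, and the Beer-Lambert law, where the detected intensity is I = I0 * exp(-sum(mu_i * d_i)) for the tissues i along a ray. The phantom shape, the function names and the coefficient values are illustrative assumptions and are not taken from [1].

```python
import numpy as np

# Illustrative linear attenuation coefficients in 1/cm.
# These are assumed round numbers, not values taken from [1].
MU_AIR, MU_SOFT_TISSUE, MU_BONE = 0.0, 0.2, 0.5


def make_phantom(shape=(32, 64, 64)):
    """Label volume with axes (z, y, x): 0 = air, 1 = soft tissue, 2 = bone."""
    z, y, x = np.indices(shape)
    cy, cx = shape[1] / 2, shape[2] / 2
    labels = np.zeros(shape, dtype=np.uint8)
    # Elliptic cylinder as a very simple "body"
    body = ((x - cx) / 28) ** 2 + ((y - cy) / 20) ** 2 <= 1
    # Thin cylinder in the middle as a very simple "bone"
    bone = (x - cx) ** 2 + (y - cy) ** 2 <= 6 ** 2
    labels[body] = 1
    labels[bone] = 2
    return labels


def project(labels, voxel_cm=0.1, axis=1, i0=1.0):
    """Parallel-beam projection along `axis` using the Beer-Lambert law."""
    mu_table = np.array([MU_AIR, MU_SOFT_TISSUE, MU_BONE])
    mu = mu_table[labels]                        # attenuation per voxel
    path_integral = mu.sum(axis=axis) * voxel_cm  # sum of mu * distance per ray
    return i0 * np.exp(-path_integral)           # intensity at the detector


image = project(make_phantom())
```

Rays that only pass through air arrive with the full intensity, while rays that cross the bone arrive with the lowest intensity, which produces the different shades of grey.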

Side view of four vertebrae (L2 to L5) embedded in a simple virtual body phantom.

Simulating such images only takes a few seconds on a consumer-grade laptop. Such fast simulations in combination with automatically adjustable models allow the simulation of large data sets with high diversity.
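How adjustable models can lead to diverse data sets can be sketched as follows. This is a toy example under the same simplifications as before (parallel rays, one X-ray energy) and not the actual pipeline. All parameter names, value ranges and attenuation coefficients are assumptions chosen for illustration.

```python
import numpy as np


def random_phantom_params(rng):
    """Draw one hypothetical set of phantom parameters (ranges are assumptions)."""
    return {
        "body_half_width": rng.uniform(22, 30),  # in voxels
        "body_half_depth": rng.uniform(14, 22),  # in voxels
        "bone_radius": rng.uniform(4, 8),        # in voxels
        "bone_shift": rng.uniform(-5, 5),        # lateral offset in voxels
    }


def simulate(params, size=64, voxel_cm=0.1, mu_tissue=0.2, mu_bone=0.5):
    """Simulate one image from an elliptic body with a spherical 'bone' inside."""
    # Coordinates relative to the volume centre, axes are (z, y, x)
    z, y, x = np.indices((size, size, size)) - size / 2
    body = (x / params["body_half_width"]) ** 2 + (y / params["body_half_depth"]) ** 2 <= 1
    bone = (x - params["bone_shift"]) ** 2 + y ** 2 + z ** 2 <= params["bone_radius"] ** 2
    mu = np.where(bone, mu_bone, np.where(body, mu_tissue, 0.0))
    # Beer-Lambert law along the y axis (assumed beam direction)
    return np.exp(-mu.sum(axis=1) * voxel_cm)


def make_dataset(n, seed=0):
    """Generate n images together with the parameters that produced them."""
    rng = np.random.default_rng(seed)
    params = [random_phantom_params(rng) for _ in range(n)]
    images = np.stack([simulate(p) for p in params])
    return images, params


images, params = make_dataset(4)
```

Because every image is stored together with its parameters and the random seed is fixed, the same data set can be regenerated at any time.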

Even more important, it’s possible to automatically annotate the artificial data. This has several advantages:

ground truth is perfectly known
fast and inexpensive

The annotation style is flexible and depends on your application. The next picture shows the simulated X-ray without annotation, followed by three different annotation styles (left to right):

simulated X-ray without annotation
each vertebra body with its processes as a single annotation
only the body of each vertebra
each vertebra and its processes are annotated individually

Artificial data: simulated front view X-ray of the four vertebrae (L2 to L5) with different annotation styles.
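Why the annotation comes for free can be shown with a small sketch. In a simulation, every voxel of the model already carries a label, so projecting the labels onto the detector plane yields the annotation directly. The example below is a simplified illustration, not the method used for the picture above: the box-shaped "vertebrae", the function names and the parallel projection are all assumptions.

```python
import numpy as np


def make_label_volume(size=64, n_vertebrae=4):
    """Hypothetical phantom: 0 = background, 1..n = stacked box-shaped 'vertebrae'."""
    labels = np.zeros((size, size, size), dtype=np.uint8)  # axes: (z, y, x)
    height = size // (n_vertebrae * 2)
    for i in range(n_vertebrae):
        z0 = (2 * i + 1) * height - height // 2
        labels[z0:z0 + height, 24:40, 22:42] = i + 1
    return labels


def annotate(labels, axis=1):
    """Project every labelled structure onto the detector plane."""
    annotations = {}
    for label_id in np.unique(labels):
        if label_id == 0:
            continue  # skip background
        # A detector pixel belongs to the structure if any voxel on its ray does
        mask = (labels == label_id).any(axis=axis)
        rows, cols = np.nonzero(mask)
        bbox = (int(rows.min()), int(cols.min()), int(rows.max()), int(cols.max()))
        annotations[int(label_id)] = {"mask": mask, "bbox": bbox}
    return annotations


annotations = annotate(make_label_volume())
```

Different annotation styles then only differ in how the voxels are labelled, for example one label per vertebra or separate labels for the body and the processes.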

TL;DR

growing number of data-driven applications that need large annotated data sets
data acquisition and annotation are time-consuming and thus expensive
physics-based simulations enable the creation of large data sets including direct annotation

References

[1] https://www.nist.gov/pml/x-ray-mass-attenuation-coefficients

Written by Orell Garten