Degenerative cervical myelopathy (DCM) is a chronic condition in which degenerative changes of the cervical spine cause progressive, non-traumatic compression of the spinal cord. As the compression worsens, DCM can cause neurologic deficits, impaired mobility, and a significant reduction in quality of life.
The CSM-International and CSM-North American clinical trials are the two largest clinical trials that have studied clinical outcomes after surgical decompression of the spinal cord in DCM. Patients were included if they had one or more clinical signs of myelopathy and imaging evidence of cervical spinal cord compression. Each patient had an MRI scan of the cervical spine before undergoing surgery, and was then assessed at 6, 12, and 24 months after surgery.
Each patient had a pre-operative MRI of the cervical spine that, at a minimum, included T2-weighted and T1-weighted sequences in both the axial and sagittal planes. Unfortunately, the MRIs were stored in various formats. The majority were DICOM files, but many were stored as tiled series of JPEGs or PNGs. In addition, some MRIs were missing or corrupted. We included only the MRIs that were stored as DICOM files, which limited us to 289 patients.
We chose to represent each MRI as a series of 2D axial images, treating each axial slice independently of the other slices in the scan. This let us make use of existing deep learning models such as VGG16 and ResNet50, and we thought it was a reasonable compromise. The downside of this approach is that any feature that manifests predominantly along the z-axis (the craniocaudal direction) is lost. We extracted the T2-weighted axial sequence for each patient and saved it as a new set of DICOM files. This was done manually using OsiriX Lite.
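As a rough illustration of this representation, the sketch below turns each axial slice into a standalone 224x224 three-channel array of the kind ImageNet-pretrained models such as ResNet50 expect. It assumes the pixel data has already been read from the DICOM files (for example with pydicom's `dcmread(path).pixel_array`) and sorted into slice order. The min-max normalization, the nearest-neighbour resizing, and the helper names `slice_to_rgb` and `series_to_batch` are hypothetical choices, not the preprocessing used in this study.

```python
import numpy as np

def slice_to_rgb(pixels: np.ndarray, size: int = 224) -> np.ndarray:
    """Convert one 2D axial slice into a (size, size, 3) float32 array in [0, 255]."""
    img = pixels.astype(np.float32)
    # Min-max normalize: MRI intensities have no fixed scale across scanners.
    lo, hi = img.min(), img.max()
    img = (img - lo) / (hi - lo) * 255.0 if hi > lo else np.zeros_like(img)
    # Nearest-neighbour resize by sampling row/column indices (no extra dependencies).
    rows = np.arange(size) * img.shape[0] // size
    cols = np.arange(size) * img.shape[1] // size
    img = img[rows[:, None], cols[None, :]]
    # Replicate the grayscale channel to the 3 channels ImageNet models expect.
    return np.stack([img] * 3, axis=-1)

def series_to_batch(slices):
    """Stack a patient's axial slices, each handled independently, into one batch."""
    return np.stack([slice_to_rgb(s) for s in slices], axis=0)
```

Because each slice is processed on its own, a patient's scan simply becomes a batch of independent images, which is why information spread along the z-axis is not available to the model.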
There are a number of pathologic changes that can be identified on an MRI scan of a patient with DCM. The full range of imaging findings is summarized in a 2016 article in Neurosurgical Focus (https://www.ncbi.nlm.nih.gov/pubmed/27246488).
To summarize, the structural changes related to DCM that can be detected on MRI include:
- Spinal cord compression
- Cervical stenosis
- Cord signal change
- Ligamentous pathology
- Sagittal alignment
We chose to focus our deep learning model on detecting spinal cord compression for the following reasons:
- Spinal cord compression is highly sensitive for myelopathy. A 2010 study of 103 patients (https://www.ncbi.nlm.nih.gov/pubmed/20150835) found that spinal cord compression was 100% sensitive and 79.6% specific for clinical myelopathy.
- Spinal cord compression can be reliably graded on T2-weighted axial images using a number of grading systems, whose inter-rater reliability is greater than 80% and in some studies over 95% (https://www.ncbi.nlm.nih.gov/pubmed/27246488).
- Even though spinal cord compression is not 100% specific for clinical myelopathy, its presence is a concerning finding that warrants continued follow-up.
For these reasons, we believed that a deep learning model capable of reliably detecting spinal cord compression would serve as a useful screening tool for identifying patients who have symptoms of clinical myelopathy or are at risk of developing it.
To standardize data labeling, we used the qualitative criteria outlined in the 2010 study cited above (https://www.ncbi.nlm.nih.gov/pubmed/20150835). Importantly, we did not differentiate between partial and circumferential spinal cord compression. Instead, we defined spinal cord compression as any indentation of the spinal cord parenchyma that changed the contour of the spinal cord perimeter. Labelers assessed each T2-weighted axial slice and assigned a label of:
- 1: evidence of partial or circumferential spinal cord compression or
- 0: no spinal cord compression.
Two labelers independently labeled the scans of 110 patients, corresponding to 5635 individual axial images. The remaining 179 patients were not labeled at this stage and were kept for model testing.
The two labelers had excellent agreement (96.4%) on images without compression and good agreement (88.1%) on images with compression. When we examined the images on which the labelers disagreed, we found that they tended to show minimal partial compression.
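The post does not say exactly how the per-class agreement figures were computed. One common definition is specific agreement: for a given class, twice the number of images both labelers placed in that class, divided by the sum of the two labelers' counts for that class. The hypothetical helper below computes positive and negative specific agreement from two lists of 0/1 labels, along with overall agreement and Cohen's kappa, which corrects for the agreement expected by chance.

```python
import numpy as np

def agreement_stats(labels_a, labels_b):
    """Agreement statistics for two raters' binary (0/1) labels on the same images."""
    a = np.asarray(labels_a)
    b = np.asarray(labels_b)
    both_pos = np.sum((a == 1) & (b == 1))  # both say "compressed"
    both_neg = np.sum((a == 0) & (b == 0))  # both say "not compressed"
    disagree = np.sum(a != b)
    observed = np.mean(a == b)
    # Chance agreement from each rater's marginal rate of labeling "compressed".
    p_a, p_b = a.mean(), b.mean()
    expected = p_a * p_b + (1 - p_a) * (1 - p_b)
    return {
        # Specific agreement: 2 * agreed count / (rater A count + rater B count).
        "positive_agreement": float(2 * both_pos / (2 * both_pos + disagree)),
        "negative_agreement": float(2 * both_neg / (2 * both_neg + disagree)),
        "observed": float(observed),
        "kappa": float((observed - expected) / (1 - expected)),
    }
```

Kappa is worth reporting alongside raw agreement here because most slices are not compressed, so two labelers would agree on many "0" images even by chance.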
In the first part of this report we described how we represented and prepared the data. In summary, we collected MRI scans from patients with degenerative cervical myelopathy (DCM) enrolled in the CSM-International and CSM-North American trials, and extracted the T2-weighted axial sequence for each patient. We focused on identifying spinal cord compression in these axial images because it is a highly sensitive (100%) and reasonably specific (79.6%) finding for clinical myelopathy. Two labelers went through a subset of the images and labeled each T2-weighted axial image as showing or not showing spinal cord compression, based on a pre-determined set of qualitative criteria.
We looked at established deep convolutional neural networks (CNNs) and, after some comparison, decided to focus on ResNet50 because of its strong performance on ImageNet and relatively modest memory requirements. Previous studies have achieved good results using transfer learning with ImageNet-pretrained weights to classify MRI and CT images, so we attempted the same and tested various degrees of fine-tuning. We placed a priority on model simplicity, so we tried to get the best possible performance from a single ResNet50 CNN before building more complex models through ensembling.
The ResNet family of CNNs has become commonplace since placing first in the ILSVRC 2015 competition. The architecture uses residual units with identity shortcut connections, which address the degradation problem, in which accuracy saturates and then degrades as plain networks get deeper. A downside of ResNet50 is that, given its depth (roughly 25 million parameters) and the size of our dataset, we could not realistically train it from scratch. That's OK, because we were intending to use pre-trained weights for some of the layers anyway.
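For reference, the residual unit described in the original ResNet paper (He et al.) learns a residual mapping F and adds the unit's input back through an identity shortcut, so each block only has to learn a correction to the identity:

```latex
\mathbf{y} = \mathcal{F}(\mathbf{x}, \{W_i\}) + \mathbf{x}
```

Here x and y are the unit's input and output, and W_i are the weights of its stacked layers. If the identity mapping is already close to optimal, the layers can push F toward zero instead of having to reproduce the identity, which is one reason very deep ResNets remain trainable.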
We split our labeled dataset into a training/validation cohort containing 80% of the data and reserved the remaining 20% for model testing. We trained a number of model architectures and compared them using overall accuracy on the testing dataset. We implemented the models in Keras v2.2.4 with a TensorFlow v1.5 backend, and used data augmentation with random scaling, rotation, and horizontal flips during training. The following architectures were tested.
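The post does not say whether the 80/20 split was made per image or per patient. One option, sketched here as an assumption, is to split at the patient level so that slices from the same scan never end up in both the training and testing sets, since neighbouring slices from one patient can look very similar. The `(patient_id, slice_path, label)` record format and the `split_by_patient` helper are hypothetical.

```python
import random
from collections import defaultdict

def split_by_patient(slice_records, test_fraction=0.2, seed=0):
    """Split (patient_id, slice_path, label) records so no patient spans both sets."""
    by_patient = defaultdict(list)
    for rec in slice_records:
        by_patient[rec[0]].append(rec)
    # Shuffle patient IDs reproducibly, then hold out a fraction of patients.
    patients = sorted(by_patient)
    random.Random(seed).shuffle(patients)
    n_test = round(len(patients) * test_fraction)
    test_ids = set(patients[:n_test])
    train = [r for p in patients if p not in test_ids for r in by_patient[p]]
    test = [r for p in patients if p in test_ids for r in by_patient[p]]
    return train, test
```

With this approach the split is 80/20 by patient, so the image-level proportions will only be approximately 80/20 when patients have different numbers of slices.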
Model 4, which had two fully connected layers with 512 units each, performed best, with 92.99% accuracy. There is certainly room for improvement, but we started to run into GPU memory constraints with deeper networks, so for now we settled for this performance. We were pleasantly surprised to achieve ~93% accuracy with a relatively simple network configuration.
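Below is a minimal sketch of a Model 4-style classifier, written against the current `tf.keras` API rather than the standalone Keras 2.2.4 and TensorFlow 1.5 setup used here. Only the two 512-unit fully connected layers come from the text. The global average pooling, ReLU activations, single sigmoid output, Adam optimizer with a learning rate of 1e-4, and the hypothetical `n_trainable` argument (how many of the final ResNet50 layers to unfreeze, standing in for the "degrees of fine-tuning") are all assumptions.

```python
import tensorflow as tf

def build_model(weights="imagenet", n_trainable=0, input_shape=(224, 224, 3)):
    """Sketch: ResNet50 base plus two 512-unit dense layers for binary classification."""
    base = tf.keras.applications.ResNet50(
        include_top=False, weights=weights, input_shape=input_shape, pooling="avg"
    )
    # Freeze all but the last `n_trainable` layers of the pretrained base.
    cutoff = len(base.layers) - n_trainable
    for i, layer in enumerate(base.layers):
        layer.trainable = i >= cutoff
    x = tf.keras.layers.Dense(512, activation="relu")(base.output)
    x = tf.keras.layers.Dense(512, activation="relu")(x)
    # One sigmoid unit: estimated probability that the slice shows cord compression.
    out = tf.keras.layers.Dense(1, activation="sigmoid")(x)
    model = tf.keras.Model(inputs=base.input, outputs=out)
    model.compile(
        optimizer=tf.keras.optimizers.Adam(learning_rate=1e-4),
        loss="binary_crossentropy",
        metrics=["accuracy"],
    )
    return model
```

For example, `build_model(n_trainable=0)` trains only the new head on top of frozen ImageNet features, while a larger `n_trainable` also fine-tunes the last blocks of ResNet50. Input images would normally be passed through `tf.keras.applications.resnet50.preprocess_input` first.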
So far, we have tested our model on individual T2-weighted axial slices and achieved 93% accuracy at identifying spinal cord compression. However, we have not yet shown that the model would be useful in a real clinical setting.
In the real world, patients can present to their primary care physician with a wide variety of symptoms that may suggest cervical myelopathy. These patients often undergo an MRI of the cervical spine, which specialist radiologists then interpret to identify abnormal scans, a laborious and time-consuming process.
We wanted to determine whether our model could distinguish between healthy patients and patients with a confirmed diagnosis of DCM. We used a dataset of 32 healthy control patients who underwent MRIs of the cervical spine, as well as the 179 patients enrolled in the CSM-International and CSM-North American studies who had a confirmed diagnosis of cervical myelopathy. Our model was not trained on any of these images. We thus had two cohorts of patients to classify with our model:
- Healthy Control — 32 patients
- Cervical Myelopathy — 179 patients
For each patient, we applied our convolutional neural network to each T2-weighted axial slice, and the model output a class prediction for each slice. The number of slices per patient ranged from 18 to 82, with a median of 43. We used a simple threshold to generate a patient-level prediction: if the model classified more than one slice as showing spinal cord compression, the patient was labeled abnormal.
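The patient-level rule above takes only a few lines. This sketch assumes the model outputs a per-slice probability of compression that is thresholded at 0.5 to obtain the slice label (an assumption; the post only says the model output a class prediction), and then applies the stated rule that more than one compressed slice makes the patient abnormal. The function name `patient_is_abnormal` is hypothetical.

```python
def patient_is_abnormal(slice_probs, prob_threshold=0.5, slice_count_threshold=1):
    """Return 1 (abnormal) if MORE THAN `slice_count_threshold` slices are
    predicted to show spinal cord compression, else 0 (normal)."""
    n_compressed = sum(1 for p in slice_probs if p >= prob_threshold)
    return int(n_compressed > slice_count_threshold)

print(patient_is_abnormal([0.1, 0.9, 0.2]))  # → 0 (one flagged slice)
print(patient_is_abnormal([0.1, 0.9, 0.7]))  # → 1 (two flagged slices)
```

Requiring more than one flagged slice presumably makes the patient-level call less sensitive to an isolated false-positive slice, at the cost of missing compression that is visible on only a single slice.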
The model was able to distinguish between patients in the healthy control cohort and the diseased cohort with high sensitivity (0.9665) and high specificity (0.8529).
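Here sensitivity is the fraction of myelopathy patients that the model labeled abnormal, and specificity is the fraction of healthy controls that it labeled normal. A small helper for computing both from patient-level labels (1 = myelopathy or abnormal, 0 = healthy or normal) might look like the following. The function name and the toy numbers in the example are illustrative only and are not the study's data.

```python
def sensitivity_specificity(y_true, y_pred):
    """Return (sensitivity, specificity) for binary labels, where 1 = abnormal."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)  # myelopathy, flagged
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)  # myelopathy, missed
    tn = sum(1 for t, p in pairs if t == 0 and p == 0)  # healthy, cleared
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)  # healthy, flagged
    return tp / (tp + fn), tn / (tn + fp)

# Toy example: 10 myelopathy patients (9 flagged), 4 controls (3 cleared).
y_true = [1] * 10 + [0] * 4
y_pred = [1] * 9 + [0] + [0] * 3 + [1]
print(sensitivity_specificity(y_true, y_pred))  # → (0.9, 0.75)
```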
- We used a dataset of 5635 labeled MRI images from 110 patients to train a deep convolutional neural network to detect cervical spinal cord compression. We achieved high (93%) accuracy on this classification task.
- We tested our model on a dataset of MRIs from 179 patients with cervical myelopathy and 32 healthy control patients. The model identified patients with cervical myelopathy with high sensitivity (97%) and specificity (85%).
- Our deep learning model could potentially be used in a primary care setting to rapidly screen cervical spine MRI scans and flag abnormal scans for further review.
- In a follow-up post, we will describe how we used this model to stratify patients according to disease severity and predict clinical outcomes 6 months after surgery to decompress the spinal cord.