Abdominal Trauma Detection

RSNA Kaggle Competition

Computed tomography (CT) scans have become crucial for detecting injuries during patient evaluation, but interpreting this data can be complex and time-consuming. The RSNA Abdominal Trauma Detection Kaggle competition therefore challenged the community to devise a deep learning approach that classifies abdominal injuries from multi-phase CT scans, in the hopes of assisting medical professionals with diagnosis.

Overview

In this competition, we’re given computed tomography (CT) scans provided by various institutions. The goal is to build a model that extracts critical features from these scans and classifies injuries (if present) to the liver, spleen, and kidneys, as well as any bowel or extravasation injuries.

Pipeline + Model Architecture

In this project, multiple prominent architectures were pipelined together to form several solutions. The major pipelines experimented with in this repository are summarized below, followed by a minimal sketch of the first one:

  • 2.5D Backbone Feature Extractor → 3D CNN → Prediction Head
  • Mask Generator → Merge Input and Mask → 3D CNN → Prediction Head
  • Slice Predictor → Input Slice Interpolation → 2.5D Backbone Feature Extractor → 3D CNN → Prediction Head
  • Mask Generator → Backbone Feature Extractor (one for input and one for mask) → Merge Input and Mask Features → 3D CNN → Prediction Head
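
To make the first pipeline concrete, here is a minimal, hypothetical sketch of how its stages fit together; the module name, layer widths, and number of targets are illustrative assumptions rather than the exact code in this repository. The 2.5D slicing step is covered in detail in the next section.

import torch
import torch.nn as nn

# Illustrative sketch of pipeline 1: 2.5D backbone → 3D CNN → prediction head.
class TraumaNet(nn.Module):
    def __init__(self, backbone_2d, feat_channels, n_targets, slice_channels=3):
        super().__init__()
        self.slice_channels = slice_channels
        self.backbone = backbone_2d                # shared 2D CNN applied per slice group
        self.cnn_3d = nn.Sequential(               # fuses features across slice groups
            nn.Conv3d(feat_channels, 256, kernel_size=3, padding=1),
            nn.BatchNorm3d(256),
            nn.ReLU(inplace=True),
            nn.AdaptiveAvgPool3d(1),
        )
        self.head = nn.Linear(256, n_targets)      # per-target injury logits

    def forward(self, scans):                      # scans: (B, C, H, W), C divisible by slice_channels
        b, c, h, w = scans.shape
        x = scans.view(b * (c // self.slice_channels), self.slice_channels, h, w)
        x = self.backbone(x)                       # (B * C/L, F, H', W')
        x = x.reshape(b, c // self.slice_channels, x.shape[-3], x.shape[-2], x.shape[-1])
        x = x.permute(0, 2, 1, 3, 4)               # (B, F, C/L, H', W') for Conv3d
        x = self.cnn_3d(x).flatten(1)              # (B, 256)
        return self.head(x)                        # (B, n_targets)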

Backbone Feature Extractor

The primary backbone feature extractors were ResNet and the Vision Transformer. These architectures are notable for how effectively they extract features from visual data through residual connections and self-attention modules, respectively (He et al., 2015; Dosovitskiy et al., 2021). Since the input is a stack of CT slices, it takes the shape \((B, C, H, W)\), where \(B\) is the batch size, \(C\) is the number of slices in the scan, \(H\) is the image height, and \(W\) is the image width. The first thought is to apply a 3D CNN directly, but this would be computationally expensive and memory intensive, especially for large values of \(C\). Instead, we adopt the 2.5D CNN paradigm depicted below (Avesta et al., 2022), in which we process the CT scan as separate slices and concatenate the extracted features.

2.5D vs. 3D convolutional neural network (Avesta et al., 2022).

We define a preset slice length \(L\), the number of channels each processed slice consists of. The stack can then be processed as follows, where SLICE_CHANNELS = \(L\):

b, c, h, w = scans.shape
# Fold the slice dimension into the batch so the 2D backbone sees (B * C/L, L, H, W)
x = scans.view(b * (c // SLICE_CHANNELS), SLICE_CHANNELS, h, w)
x = self.backbone(x)
# Unfold back to (B, C/L, F, H', W') so the slices can be fused downstream
x = x.reshape(b, c // SLICE_CHANNELS, x.shape[-3], x.shape[-2], x.shape[-1])

The backbone can be defined using torchvision's model definitions. Note that (1) the first convolutional layer must be changed to accept the chosen slice length and (2) the network head is discarded.

import torch.nn as nn
from torchvision.models import resnet18, ResNet18_Weights

backbone = resnet18(weights=None)  # or weights=ResNet18_Weights.DEFAULT for ImageNet pretraining
# Drop the average pooling and fully connected head; keep only the convolutional stages
self.backbone = nn.Sequential(*list(backbone.children())[:-2])
# Replace the first convolution so it accepts SLICE_CHANNELS input channels instead of 3
self.backbone[0] = nn.Conv2d(SLICE_CHANNELS, 64, kernel_size=(7, 7), stride=(2, 2), padding=(3, 3), bias=False)

Mask Generator

The idea behind the mask generator component of the pipeline is to predict mask regions for the relevant organs, giving the model more context for the downstream task of classifying injuries. Both SAM-Med2D and TotalSegmentator were investigated for this mask generation task.

SAM-Med2D

This model is a fine-tuned version of the Segment Anything Model (SAM) trained on 4.6 million medical images. SAM-Med2D incorporates domain-specific knowledge from the medical field by inserting adapter layers into the base transformer blocks (Cheng et al., 2023).

SAM-Med2D model (Cheng et al., 2023).
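
As a rough illustration of the adapter idea (not SAM-Med2D's exact design, which is described in Cheng et al., 2023), an adapter is a small bottleneck MLP inserted into each transformer block with a residual connection, so only a few extra parameters need to be trained for the medical domain:

import torch.nn as nn

# Illustrative adapter block: down-project, non-linearity, up-project, residual add.
class Adapter(nn.Module):
    def __init__(self, dim, bottleneck=64):
        super().__init__()
        self.down = nn.Linear(dim, bottleneck)
        self.act = nn.GELU()
        self.up = nn.Linear(bottleneck, dim)

    def forward(self, x):
        # The residual connection preserves the pretrained block's output
        return x + self.up(self.act(self.down(x)))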

Following the instructions from the official implementation, we can generate and apply the masks as follows:

import os
from argparse import Namespace

import numpy as np

from SAM_Med2D.segment_anything import sam_model_registry
from SAM_Med2D.segment_anything.automatic_mask_generator import SamAutomaticMaskGenerator

# Build the SAM-Med2D model and an automatic mask generator on top of it
model = sam_model_registry["vit_b"](Namespace(image_size=256, encoder_adapter=True, sam_checkpoint=MASK_MODEL)).to(DEVICE)
mask_generator = SamAutomaticMaskGenerator(model, pred_iou_thresh=0.4, stability_score_thresh=0.5)

def apply_masks(self, id, input):
    size = MASK_DEPTH  # 12: number of consecutive slices each generated mask is applied to
    if id + '.npz' in os.listdir(os.path.join(MASK_FOLDER, self.mode)):
        # Masks for this scan were generated previously; load and apply the cached copy
        masks = np.load(os.path.join(MASK_FOLDER, self.mode, id + '.npz'))
        for i in range(size // 2, N_CHANNELS, size):
            input[i - (size // 2):i + (size // 2), :, :] *= masks[str(i)]
    else:
        save_masks = {}
        for i in range(size // 2, N_CHANNELS, size):
            # Take the 3 slices centered at i and move them to channels-last (H, W, 3)
            image = input[i - 1:i + 2, :, :].transpose(0, 1).transpose(1, 2)
            masks = self.mask_generator.generate(image.to(DEVICE))
            # Combine the proposals into one soft mask, keeping the highest stability score per pixel
            mask = np.zeros(image.shape[:-1])
            for m in masks:
                mask = np.where(np.logical_and(m['segmentation'], m['stability_score'] > mask), m['stability_score'], mask)
            # Weight the surrounding block of slices by the soft mask and cache it for reuse
            input[i - (size // 2):i + (size // 2), :, :] *= mask
            save_masks[str(i)] = mask
        np.savez(os.path.join(MASK_FOLDER, self.mode, id + '.npz'), **save_masks)

In hindsight, I shouldn’t have chosen id and input as variable names. :laughing:

The naive approach is to generate the mask and zero out the CT-scan input at non-masked locations. This masked input can then be passed through the feature extractor and its features concatenated with those of the original input, with the combined feature vector fused by a multi-layer perceptron. Another approach is an attention-based mechanism, which likely makes better use of the organ segmentation context: a scaled dot-product cross-attention in which the keys \(K\) and values \(V\) are derived from the original input and the queries \(Q\) are computed from the segmentation mask (a sketch follows below).
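
A minimal sketch of this cross-attention fusion is shown below; the module name and the assumption that both inputs arrive as flattened token sequences of dimension d_model are illustrative, not the repository's exact implementation.

import torch
import torch.nn as nn

# Illustrative scaled dot-product cross-attention: queries come from the mask features,
# keys and values from the original input features.
class MaskCrossAttention(nn.Module):
    def __init__(self, d_model):
        super().__init__()
        self.q_proj = nn.Linear(d_model, d_model)
        self.k_proj = nn.Linear(d_model, d_model)
        self.v_proj = nn.Linear(d_model, d_model)

    def forward(self, input_feats, mask_feats):
        # input_feats, mask_feats: (B, N, d_model) token sequences, e.g. flattened spatial features
        q = self.q_proj(mask_feats)     # queries Q from the segmentation mask
        k = self.k_proj(input_feats)    # keys K from the original input
        v = self.v_proj(input_feats)    # values V from the original input
        attn = torch.softmax(q @ k.transpose(-2, -1) / (q.shape[-1] ** 0.5), dim=-1)
        return attn @ v                 # mask-guided aggregation of the input features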

TotalSegmentator

Because SAM-Med2D follows the segment-anything paradigm, its masks are class-agnostic, which makes it difficult to obtain segmentations of specific organs. Instead, we look at TotalSegmentator, which predicts labeled masks for individual anatomical structures directly. This approach is, for the most part, accurate, but suffers from computational inefficiency.
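
For reference, here is a minimal sketch of generating organ masks with TotalSegmentator's Python API; the file paths are hypothetical, and the roi_subset argument and exact signature are assumptions that depend on the installed version, so consult its documentation.

# Hedged sketch: the TotalSegmentator API varies between releases; check the installed version's docs.
from totalsegmentator.python_api import totalsegmentator

# Segment a CT volume in NIfTI format and write labeled organ masks to a directory.
# Restricting the run to the organs of interest (if supported) reduces the runtime cost.
totalsegmentator(
    "ct_volume.nii.gz",   # hypothetical input path
    "segmentations/",     # hypothetical output directory
    roi_subset=["liver", "spleen", "kidney_left", "kidney_right"],
)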

References

2023

  1. SAM-Med2D
    Junlong Cheng, Jin Ye, Zhongying Deng, Jianpin Chen, and 11 more authors
    Jul 2023

2022

  1. Comparing 3D, 2.5D, and 2D Approaches to Brain Image Segmentation
    Arman Avesta, Sajid Hossain, MingDe Lin, Mariam Aboian, and 2 more authors
    medRxiv, Jul 2022

2021

  1. An Image is Worth 16x16 Words: Transformers for Image Recognition at Scale
    Alexey Dosovitskiy, Lucas Beyer, Alexander Kolesnikov, Dirk Weissenborn, and 8 more authors
    Jul 2021

2015

  1. Deep Residual Learning for Image Recognition
    Kaiming He, Xiangyu Zhang, Shaoqing Ren, and Jian Sun
    Jul 2015