Small objects, big stakes: detecting kidney stones at high resolution

The smallest target sets the budget

In our dataset, kidney stones have a median bounding box of about 39 pixels at native resolution, with the smallest 5 percent under 18 pixels. CT is commonly stored at 512 by 512 in-plane. At that size, a stone is a handful of pixels, and a detector that downsamples aggressively will simply lose it. The smallest clinically important object dictates the resolution the whole pipeline has to support.

1280 pixels and a stride-4 head

Atlas upsamples each slice to 1280 by 1280 and adds a stride-4 detection head, which produces a 320 by 320 feature map capable of localizing objects as small as about 8 pixels. That recovers recall on the smallest stones while preserving the fine soft-tissue detail that subtler findings, like fat stranding or wall thickening, depend on. On the internal cohort, kidney stones reached an F1 of 81.7 percent.
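The arithmetic behind those numbers is easy to check. The sketch below is illustrative only: the function names are hypothetical, and the stride-8 comparison is an assumption about what a coarser head would look like, not a description of Atlas's other heads.

```python
def feature_map_side(input_side: int, stride: int) -> int:
    """Side length of the feature map produced by a head with this stride."""
    if input_side % stride != 0:
        raise ValueError("input side must be divisible by the stride")
    return input_side // stride


def cells_spanned(object_side_px: float, stride: int) -> float:
    """Feature-map cells an object of this side length covers along one axis."""
    return object_side_px / stride


NATIVE_SIDE = 512      # common in-plane CT size, from the text
UPSAMPLED_SIDE = 1280  # Atlas input size, from the text

scale = UPSAMPLED_SIDE / NATIVE_SIDE        # upsampling factor per axis
grid = feature_map_side(UPSAMPLED_SIDE, 4)  # stride-4 head

print(f"upsampling factor: {scale}")
print(f"stride-4 grid: {grid} x {grid}")
# An 8-pixel object covers two cells per axis at stride 4,
# but only one cell at a coarser stride of 8 (assumed for comparison).
print(f"8 px at stride 4: {cells_spanned(8, 4)} cells")
print(f"8 px at stride 8: {cells_spanned(8, 8)} cells")
```

At stride 4 an 8-pixel object still covers more than one cell along each axis, which is roughly the point at which a grid-based head has something to localize.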

Why a single-stage detector

Atlas uses a one-stage detector rather than a two-stage region-proposal network. Contemporary medical adaptations of single-stage detectors localize CT findings well [1], inference latency stays compatible with emergency triage, and the model footprint fits a hospital-grade GPU. Bounding-box regression uses Wise-IoU, which focuses learning on ordinary-quality boxes rather than chasing outliers [2].
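To make the Wise-IoU idea concrete, here is a minimal plain-Python sketch of the loss as described in the Wise-IoU paper: a distance-weighted IoU loss (the v1 form) and the non-monotonic focusing gain (the v3 form). Which variant and which hyperparameters Atlas uses is not stated here, so the defaults `alpha=1.9` and `delta=3.0` are assumptions taken from the paper, and all function names are hypothetical. A real training loop would compute this on tensors and exclude the enclosing-box term from the gradient.

```python
import math


def iou(a, b):
    """IoU of two boxes given as (x1, y1, x2, y2)."""
    iw = max(0.0, min(a[2], b[2]) - max(a[0], b[0]))
    ih = max(0.0, min(a[3], b[3]) - max(a[1], b[1]))
    inter = iw * ih
    union = (a[2] - a[0]) * (a[3] - a[1]) + (b[2] - b[0]) * (b[3] - b[1]) - inter
    return inter / union if union > 0 else 0.0


def wise_iou_v1(pred, gt):
    """IoU loss scaled by a penalty on normalized center distance."""
    l_iou = 1.0 - iou(pred, gt)
    # Squared distance between box centers.
    dx = (pred[0] + pred[2]) / 2 - (gt[0] + gt[2]) / 2
    dy = (pred[1] + pred[3]) / 2 - (gt[1] + gt[3]) / 2
    # Width and height of the smallest box enclosing both boxes.
    wg = max(pred[2], gt[2]) - min(pred[0], gt[0])
    hg = max(pred[3], gt[3]) - min(pred[1], gt[1])
    diag_sq = wg ** 2 + hg ** 2
    if diag_sq == 0:
        return l_iou
    return math.exp((dx ** 2 + dy ** 2) / diag_sq) * l_iou


def focusing_gain(beta, alpha=1.9, delta=3.0):
    """Non-monotonic gain; beta is a box's IoU loss divided by the running mean IoU loss."""
    return beta / (delta * alpha ** (beta - delta))


def wise_iou_v3(pred, gt, mean_l_iou):
    """v1 loss reweighted by how much of an outlier this box is."""
    beta = (1.0 - iou(pred, gt)) / mean_l_iou
    return focusing_gain(beta) * wise_iou_v1(pred, gt)
```

The gain is small for very good boxes (low `beta`) and for outliers (high `beta`), and largest in between, which is the sense in which the loss concentrates on ordinary-quality boxes.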

The catch: contrast phase

Resolution is not the only variable for stones. Their conspicuity depends on contrast phase, because intravenous contrast can raise the attenuation of surrounding structures and so narrow the gap to the high-attenuation signature a stone detector relies on [3]. This is part of why kidney stones were among the classes most sensitive to the move from the internal cohort to an external one [4], and why protocol-stratified evaluation is on the roadmap.
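Protocol-stratified evaluation amounts to computing the same metric separately per contrast phase rather than pooling all studies. The sketch below shows one way that could look. It is not the Atlas evaluation code: the phase labels, the record layout, and the counts are toy values invented for illustration.

```python
from collections import defaultdict


def f1(tp: int, fp: int, fn: int) -> float:
    """F1 from detection counts; 0.0 when there are no positives at all."""
    denom = 2 * tp + fp + fn
    return 2 * tp / denom if denom else 0.0


def stratified_f1(records):
    """Pool (phase, tp, fp, fn) records per phase and return F1 for each phase."""
    totals = defaultdict(lambda: [0, 0, 0])
    for phase, tp, fp, fn in records:
        totals[phase][0] += tp
        totals[phase][1] += fp
        totals[phase][2] += fn
    return {phase: f1(*counts) for phase, counts in totals.items()}


# Toy per-study counts, not real results; phase names are hypothetical.
toy_records = [
    ("non_contrast", 8, 1, 1),
    ("non_contrast", 2, 1, 1),
    ("portal_venous", 3, 2, 4),
]

for phase, score in stratified_f1(toy_records).items():
    print(f"{phase}: F1 = {score:.3f}")
```

Reporting per-phase scores alongside the pooled figure makes it visible when a class performs well overall only because one protocol dominates the cohort.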

References