Because real-world datasets for the pharmaceutical inspection domain are unavailable, we have created the Sensum Solid Oral Dosage Forms (SensumSODF) dataset, intended for research and evaluation purposes.

The dataset consists of two main types of solid oral dosage forms:
  • capsule: non-translucent hard-shelled capsules with print, which are normally used for dry, powdered ingredients
  • softgel: translucent soft-shelled capsules, which are primarily used for oils and for active ingredients dissolved or suspended in oil

The dataset consists of 836 defect-free and 153 defective examples of non-translucent hard-shelled capsules with a size of 192 × 320 pixels, and 846 defect-free and 345 defective examples of translucent soft-shelled capsules with a size of 144 × 144 pixels. Defective examples exhibit diverse defects such as cracks, dents, smudges, impurities, and air bubbles. Defective regions range from small to large structures and are hand-annotated by a domain expert in pharmaceutical product inspection. The annotations coarsely indicate the defective area in each example, so to some extent they also include minor defect-free areas.
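The counts above imply a noticeable class imbalance, which matters when choosing evaluation metrics. The following is a minimal Python sketch that records the stated figures and derives per-subset totals and defective fractions. The names `SubsetStats` and `SUBSETS` are hypothetical and are not part of the dataset or any accompanying tooling; the image size is stored exactly as written in the description, without assuming which number is the width.

```python
from dataclasses import dataclass
from typing import Dict, Tuple


@dataclass(frozen=True)
class SubsetStats:
    """Counts for one dosage-form subset, copied from the dataset description."""
    defect_free: int
    defective: int
    size_px: Tuple[int, int]  # as stated in the text; axis order is not assumed

    @property
    def total(self) -> int:
        return self.defect_free + self.defective

    @property
    def defective_fraction(self) -> float:
        return self.defective / self.total


# Hypothetical container; the keys mirror the two dosage-form names in the text.
SUBSETS: Dict[str, SubsetStats] = {
    "capsule": SubsetStats(defect_free=836, defective=153, size_px=(192, 320)),
    "softgel": SubsetStats(defect_free=846, defective=345, size_px=(144, 144)),
}

for name, stats in SUBSETS.items():
    print(f"{name}: {stats.total} examples, "
          f"{stats.defective_fraction:.1%} defective")
```

With these figures, roughly 15% of the capsule examples and 29% of the softgel examples are defective, so plain accuracy would likely be a misleading metric on either subset.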

Please fill out the download form before downloading the SensumSODF dataset.


If you use this dataset in your scientific work, please cite our paper:

Domen Rački, Dejan Tomaževič, Danijel Skočaj:
Detection of surface defects on pharmaceutical solid oral dosage forms with convolutional neural networks;
In: Neural Computing and Applications, August 2021.

    author = {Ra{\v{c}}ki, Domen and Toma{\v{z}}evi{\v{c}}, Dejan and Sko{\v{c}}aj, Danijel},
    title = {Detection of surface defects on pharmaceutical solid oral dosage forms with convolutional neural networks},
    journal = {Neural Computing and Applications},
    year = {2021},
    month = {August},
    day = {17},
    issn = {1433-3058},
    doi = {10.1007/s00521-021-06397-6}


The data is released under the Creative Commons Attribution-NonCommercial-ShareAlike 4.0 International License (CC BY-NC-SA 4.0). To use the data in a way that falls under the commercial-use clause of the license, please contact us.