Datasets

Curated clinical datasets for model training, evaluation, and benchmarking. All datasets are de-identified per HIPAA Safe Harbor.

The Dataset Object

Field          Type      Description
id             string    Unique dataset ID (e.g. ds_t2d_cohort_v3)
name           string    Display name of the dataset
modality       string    Data modality: ehr | imaging | genomics | claims | wearable
recordCount    integer   Number of de-identified patient records in the dataset
fhirResources  string[]  FHIR R4 resource types included in this dataset
license        string    Data use license (e.g. CC-BY-4.0, Research-Only)
sizeGb         number    Approximate compressed size of the dataset in gigabytes
accessStatus   string    Nullable. Your org's access status: open | pending | approved | denied
updatedAt      datetime  ISO 8601 UTC timestamp of the last dataset update
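
As a rough illustration of how a client might model the fields above, here is a minimal Python sketch. The class name `Dataset`, the helper `parse_dataset`, and the snake_case attribute names are hypothetical and not part of the API. The sketch also assumes that `accessStatus` and `updatedAt` may be missing from a payload, since the sample response on this page omits them.

```python
from dataclasses import dataclass
from datetime import datetime
from typing import Any, Dict, List, Optional


@dataclass
class Dataset:
    """Client-side view of a dataset object (hypothetical helper type)."""
    id: str
    name: str
    modality: str
    record_count: int
    fhir_resources: List[str]
    license: str
    size_gb: float
    access_status: Optional[str] = None    # documented as nullable
    updated_at: Optional[datetime] = None  # assumption: may be absent


def parse_dataset(raw: Dict[str, Any]) -> Dataset:
    """Map the camelCase JSON fields onto the dataclass above."""
    updated = raw.get("updatedAt")
    return Dataset(
        id=raw["id"],
        name=raw["name"],
        modality=raw["modality"],
        record_count=int(raw["recordCount"]),
        fhir_resources=list(raw.get("fhirResources", [])),
        license=raw["license"],
        size_gb=float(raw["sizeGb"]),
        access_status=raw.get("accessStatus"),
        # Replace a trailing "Z" so fromisoformat also works on older Pythons.
        updated_at=(
            datetime.fromisoformat(updated.replace("Z", "+00:00"))
            if updated
            else None
        ),
    )
```

Keeping the optional fields as `None` by default lets the same parser handle both full objects and trimmed list entries.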

Endpoints

GET /v1/datasets

Browse marketplace datasets.

Parameters

Parameter   In     Type     Required  Description                                  Example
modality    query  string   optional  Data modality (ehr|imaging|genomics|claims)  ehr
minRecords  query  integer  optional  Minimum record count                         10000
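
The two optional query parameters can be combined into a request URL. The sketch below uses only the Python standard library; the function name `build_datasets_url` and the constant `BASE_URL` are hypothetical, and the base URL is taken from the curl example on this page.

```python
from typing import Optional
from urllib.parse import urlencode

# Base URL taken from the curl example; adjust if your environment differs.
BASE_URL = "https://api.healthcloud.ai/v1/datasets"


def build_datasets_url(
    modality: Optional[str] = None,
    min_records: Optional[int] = None,
) -> str:
    """Build the list URL, leaving out any parameter that was not supplied."""
    params = {}
    if modality is not None:
        params["modality"] = modality
    if min_records is not None:
        params["minRecords"] = min_records
    query = urlencode(params)
    return f"{BASE_URL}?{query}" if query else BASE_URL


print(build_datasets_url(modality="ehr", min_records=10000))
# https://api.healthcloud.ai/v1/datasets?modality=ehr&minRecords=10000
```

Omitting both arguments yields the bare endpoint URL, which matches the parameters being optional.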

Request

curl "https://api.healthcloud.ai/v1/datasets?modality=ehr" \
  -H "Authorization: Bearer hc_test_sk_demo_live_xxxxxx"
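
For readers calling the API from Python rather than curl, the same request might be prepared as follows. This is a sketch using only the standard library; the helper name `make_request` is hypothetical, and reading the key from an environment variable named `HC_API_KEY` is an assumption for illustration, not a documented convention.

```python
import os
import urllib.request


def make_request(url: str, api_key: str) -> urllib.request.Request:
    """Prepare a GET request carrying the Bearer token from the curl example."""
    return urllib.request.Request(
        url,
        headers={"Authorization": f"Bearer {api_key}"},
        method="GET",
    )


# HC_API_KEY is a hypothetical variable name; the fallback is the demo
# placeholder shown in the curl example above.
api_key = os.environ.get("HC_API_KEY", "hc_test_sk_demo_live_xxxxxx")
req = make_request("https://api.healthcloud.ai/v1/datasets?modality=ehr", api_key)

# To actually send it:
# with urllib.request.urlopen(req) as resp:
#     body = resp.read().decode("utf-8")
```

Building the request separately from sending it keeps the key out of the URL and makes the header easy to inspect in tests.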

Response

200 OK
{
  "data": [
    {
      "id": "ds_t2d_cohort_v3",
      "name": "T2D Longitudinal Cohort v3",
      "modality": "ehr",
      "recordCount": 84000,
      "fhirResources": ["Patient", "Observation", "Condition"],
      "license": "CC-BY-4.0",
      "sizeGb": 12.4
    }
  ],
  "meta": { "total": 89 }
}
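
A response of this shape could be consumed as sketched below. The function name `summarize` is hypothetical, and the embedded JSON is the sample body from this page. Note that `meta.total` (89) is larger than the number of items shown; how the remaining items are retrieved is not described here, so the sketch does not attempt pagination.

```python
import json
from typing import List, Tuple

SAMPLE_BODY = """
{
  "data": [
    {
      "id": "ds_t2d_cohort_v3",
      "name": "T2D Longitudinal Cohort v3",
      "modality": "ehr",
      "recordCount": 84000,
      "fhirResources": ["Patient", "Observation", "Condition"],
      "license": "CC-BY-4.0",
      "sizeGb": 12.4
    }
  ],
  "meta": { "total": 89 }
}
"""


def summarize(body: str) -> Tuple[int, List[str]]:
    """Return the reported total and one summary line per dataset in the body."""
    payload = json.loads(body)
    lines = [
        f'{ds["id"]}: {ds["recordCount"]} records, {ds["sizeGb"]} GB'
        for ds in payload.get("data", [])
    ]
    return payload["meta"]["total"], lines


total, lines = summarize(SAMPLE_BODY)
print(total)     # 89
print(lines[0])  # ds_t2d_cohort_v3: 84000 records, 12.4 GB
```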

Related Resources