Datasets
Curated clinical datasets for model training, evaluation, and benchmarking. All datasets are de-identified per HIPAA Safe Harbor.
The Dataset Object
| Field | Type | Description |
|---|---|---|
| id | string | Unique dataset ID (e.g. ds_t2d_cohort_v3) |
| name | string | Display name of the dataset |
| modality | string | Data modality, one of: ehr, imaging, genomics, claims, wearable |
| recordCount | integer | Number of de-identified patient records in the dataset |
| fhirResources | string[] | FHIR R4 resource types included in this dataset |
| license | string | Data use license (e.g. CC-BY-4.0, Research-Only) |
| sizeGb | number | Approximate compressed size of the dataset in gigabytes |
| accessStatus | string (nullable) | Your organization's access status, one of: open, pending, approved, denied |
| updatedAt | datetime | ISO 8601 UTC timestamp of the last dataset update |
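As an illustration of the field table above, here is a minimal Python sketch that maps a raw dataset payload onto a typed record. The `Dataset` class and `parse_dataset` helper are hypothetical names, not part of any official SDK. The sketch assumes that `accessStatus` and `updatedAt` may be missing or null, and that timestamps use a trailing `Z` for UTC.

```python
from dataclasses import dataclass
from datetime import datetime
from typing import Any, Optional


@dataclass(frozen=True)
class Dataset:
    """Hypothetical typed view of the Dataset object described above."""
    id: str
    name: str
    modality: str
    record_count: int
    fhir_resources: tuple[str, ...]
    license: str
    size_gb: float
    access_status: Optional[str] = None   # nullable per the field table
    updated_at: Optional[datetime] = None


def parse_dataset(raw: dict[str, Any]) -> Dataset:
    """Convert the camelCase JSON fields into a Dataset record."""
    updated = raw.get("updatedAt")
    return Dataset(
        id=raw["id"],
        name=raw["name"],
        modality=raw["modality"],
        record_count=int(raw["recordCount"]),
        fhir_resources=tuple(raw.get("fhirResources", [])),
        license=raw["license"],
        size_gb=float(raw["sizeGb"]),
        access_status=raw.get("accessStatus"),
        # Assumption: ISO 8601 UTC timestamps end in "Z"; normalize the
        # suffix so fromisoformat accepts it on older Python versions.
        updated_at=(
            datetime.fromisoformat(updated.replace("Z", "+00:00"))
            if updated else None
        ),
    )
```

Keeping the optional fields as `None` rather than raising lets the same parser handle payloads that omit them.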
Endpoints
Browse marketplace datasets with GET /v1/datasets, optionally filtered by the query parameters below.
Parameters
| Parameter | In | Type | Required | Description |
|---|---|---|---|---|
| modality | query | string | optional | Data modality (ehr, imaging, genomics, claims). Example: ehr |
| minRecords | query | integer | optional | Minimum record count. Example: 10000 |
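The two query parameters above can be combined into a request URL. The following is a rough sketch using only the Python standard library; `build_datasets_url` is a hypothetical helper, and the base URL is taken from the request example in this section. The modality value is passed through unchecked, since the full set of values accepted by the filter is not confirmed here.

```python
from typing import Optional
from urllib.parse import urlencode

# Base URL as it appears in the request example of this section.
BASE_URL = "https://api.healthcloud.ai/v1/datasets"


def build_datasets_url(
    modality: Optional[str] = None,
    min_records: Optional[int] = None,
) -> str:
    """Build the listing URL, including only the filters that were given."""
    params: dict[str, object] = {}
    if modality is not None:
        # Not validated client-side: the accepted values are an assumption.
        params["modality"] = modality
    if min_records is not None:
        if min_records < 0:
            raise ValueError("min_records must be non-negative")
        params["minRecords"] = min_records
    query = urlencode(params)
    return f"{BASE_URL}?{query}" if query else BASE_URL


print(build_datasets_url(modality="ehr", min_records=10000))
```

Both parameters are optional, so calling the helper with no arguments returns the bare listing URL.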
Request
curl "https://api.healthcloud.ai/v1/datasets?modality=ehr" \
  -H "Authorization: Bearer hc_test_sk_demo_live_xxxxxx"
Response (200)
{
  "data": [
    {
      "id": "ds_t2d_cohort_v3",
      "name": "T2D Longitudinal Cohort v3",
      "modality": "ehr",
      "recordCount": 84000,
      "fhirResources": ["Patient", "Observation", "Condition"],
      "license": "CC-BY-4.0",
      "sizeGb": 12.4
    }
  ],
  "meta": { "total": 89 }
}
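To show how the response envelope might be consumed, here is a small Python sketch that reads the `data` array and `meta.total` from a response body. `summarize_listing` is a hypothetical helper. Reading `meta.total` as the number of matching datasets across the whole result set, rather than the number returned in this response, is an assumption based on the sample (one item returned, total of 89).

```python
import json
from typing import Any


def summarize_listing(body: str) -> dict[str, Any]:
    """Summarize a dataset listing response given its JSON body as text."""
    payload = json.loads(body)
    datasets = payload.get("data", [])
    # Assumption: meta.total counts all matches, not just this response.
    total = payload.get("meta", {}).get("total", len(datasets))
    return {
        "ids": [d["id"] for d in datasets],
        "returned": len(datasets),
        "total": total,
        "records_in_response": sum(d.get("recordCount", 0) for d in datasets),
    }


# The sample response from this section, abbreviated to the fields used here.
SAMPLE_BODY = """
{
  "data": [
    {"id": "ds_t2d_cohort_v3", "modality": "ehr", "recordCount": 84000}
  ],
  "meta": {"total": 89}
}
"""

summary = summarize_listing(SAMPLE_BODY)
print(summary["ids"], summary["returned"], summary["total"])
```

Comparing `returned` with `total` gives a quick signal that the response may be a partial listing; how to request the remaining items is not covered in this section.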