Purveyor: Centers for Medicare and Medicaid Services

Years in the DataCore: 2012-2017

Years of data owned: 2012-2017

Unit of data: Claim

Dataset website:

General description: This is also known as the Physician/Supplier Part B claims file and contains final action fee-for-service claims submitted on a CMS-1500 claim form. Most of the claims are from non-institutional providers, such as physicians, physician assistants, clinical social workers, and nurse practitioners. Claims from free-standing facilities are also included.

Common Key Linking Variables

CLM_ID is the unique identifier for a given claim.

Hospital Linking:

  • CARR_NUM can be used to identify the Carrier from which the claim was submitted.

Provider Linking:

  • NPI can be used to uniquely identify providers.

Geographic Linking:

  • ZIP code data are provided for every claim.

Carrier Structure

Base Claim File

Every row of the claim file represents a claim submitted to CMS.

The Primary Key of the claim file is CLM_ID.

Line File

Every row of the Line file represents data that can occur more than once for a given Claim, so a single Claim can have several Line rows.

The Primary Key of the line file is the combination of CLM_ID and CLM_LN.
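
The relationship between the two files can be sketched as follows. This is a minimal illustration using Python's built-in sqlite3 module, not the DataCore's actual SQL schema: only CLM_ID and CLM_LN come from the description above, while the table names, the LINE_AMT column, and the sample rows are made up for the example.

```python
import sqlite3


def build_demo_db():
    """Create an in-memory database with a claim table and a line table."""
    con = sqlite3.connect(":memory:")
    con.executescript("""
        CREATE TABLE claim (
            CLM_ID TEXT PRIMARY KEY            -- one row per claim
        );
        CREATE TABLE line (
            CLM_ID   TEXT NOT NULL REFERENCES claim (CLM_ID),
            CLM_LN   INTEGER NOT NULL,         -- line number within the claim
            LINE_AMT REAL,                     -- hypothetical column, illustration only
            PRIMARY KEY (CLM_ID, CLM_LN)       -- composite key of the line file
        );
    """)
    # Made-up rows: claim A1 has two lines, claim B2 has one.
    con.executemany("INSERT INTO claim VALUES (?)", [("A1",), ("B2",)])
    con.executemany(
        "INSERT INTO line VALUES (?, ?, ?)",
        [("A1", 1, 10.0), ("A1", 2, 5.5), ("B2", 1, 7.25)],
    )
    return con


def lines_per_claim(con):
    """Join lines to their claim on CLM_ID and summarize per claim."""
    return con.execute("""
        SELECT c.CLM_ID, COUNT(l.CLM_LN), SUM(l.LINE_AMT)
        FROM claim AS c
        JOIN line AS l ON l.CLM_ID = c.CLM_ID
        GROUP BY c.CLM_ID
        ORDER BY c.CLM_ID
    """).fetchall()
```

Calling lines_per_claim(build_demo_db()) returns one summary row per claim. Inserting a second line row with an existing (CLM_ID, CLM_LN) pair raises sqlite3.IntegrityError, which is the behavior a composite primary key is meant to provide.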

DataCore Staff Errata

5/28/2019: No data errata, data exceptions or data corrections have been issued.

DataCore Purveyor Errata

5/28/2019: No data errata, data exceptions or data corrections have been implemented.


CMS sent the claims files as comma-separated value (.csv) files along with a SAS load script and a data dictionary. The data dictionary files were found to be incorrect and could not be used to load the data into SQL, so the process below was used instead.
For the code used for these processes, email

  1. The .csv files were loaded into SAS using the provided SAS load files.
  2. SQL tables were created using the proc sql "create table like" command in SAS.
  3. SAS was then used to convert the .csv files into tab-separated value (.tsv) files.
  4. A bulk copy program (BCP) was used to upload the .tsv files into SQL.
  5. The provided data dictionary was used to generate metadata about the dataset fields, and that metadata was used to generate the data dictionary.
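
The conversion in step 3 was done in SAS, and that code is not reproduced here. As a rough illustration of the same idea in a different tool, the sketch below converts comma-separated text to tab-separated text with Python's standard csv module. The function name csv_to_tsv and the sample column NOTE are hypothetical.

```python
import csv
import io


def csv_to_tsv(csv_text: str) -> str:
    """Re-write comma-separated text as tab-separated text."""
    # newline="" lets the csv module handle line endings itself.
    reader = csv.reader(io.StringIO(csv_text, newline=""))
    out = io.StringIO(newline="")
    writer = csv.writer(out, delimiter="\t", lineterminator="\n")
    for row in reader:
        # Each row is a list of field values with the CSV quoting removed.
        writer.writerow(row)
    return out.getvalue()


# Made-up sample: the quoted field contains a comma, which is safe
# once the delimiter is a tab.
sample = 'CLM_ID,NOTE\n1,"a, b"\n'
converted = csv_to_tsv(sample)
```

A tab delimiter avoids quoting for values that contain commas. Values that themselves contain tabs or line breaks would still be quoted by the writer, so they would need separate handling before a bulk load.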