Purveyor: Centers for Medicare and Medicaid Services

Years in the DataCore: 2012-2017

Years of data owned: 2012-2017

Unit of data: Claim

Dataset website:

General description: This includes the base Medicare A/B/D segment which includes beneficiary enrollment information, such as the beneficiary unique identifier, state and county codes, ZIP code, date of birth, date of death, sex, race, age, monthly entitlement indicators (A/B/C/D), reasons for entitlement, and monthly managed care indicators (yes/no). As of 2006, it includes variables specific to enrollment in Part D, and as of 2017 Part C was added. The MBSF has four segments: 1) Beneficiary Summary File or Medicare enrollment, 2) Chronic Conditions, 3) Cost & Utilization, and 4) NDI death information with ICD-10 cause of death through 2008. 

Common Key Linking Variables

The MBSF provides State and county data for every claim. This allows for only geographical linking using this table; however, other forms of linkage can be used by using the other datasets which link to MBSF. 

Master Beneficiary Summary File Structure

There is a single table within the MBSF; this table has a primary key of DSYSRTKY, which maps to DESY_SORT_KEY in all other CMS datasets, and year. 

DataCore Staff Errata

5/28/2019: No data errata, data exceptions or data corrections have been issued.

DataCore Purveyor Errata

5/28/2019: No data errata, data exceptions or data corrections have been implemented.


CMS sent the claims files as comma separated value files (.csv) along with a SAS load script and a data dictionary. It was found that the data dictionary files were incorrect and could not be used to load the data into SQL. Instead, the process below was used.

For the code used for these processes, email

  • The .csvfiles were loaded into SAS using the provided SAS load files.
  • SQL tables were created using the proc sql "create table like" command in SAS. 
  • SAS was then used to convert the .csv into Tab Separated Value files (.tsv)
  • A bulk copy program (BCP) was used in order to upload the .tsv into SQL. 
  • The provided data dictionary was used to generate metadata about the dataset fields and was used to generate the data dictionary.