MIT Database Administrator: Lohith Kini (Email:
This MIT web page has been assembled and is maintained by present and former students, researchers and faculty of the Department of Biological Engineering to provide an interface between the population experience of common mortal diseases in the United States and Japan and quantitative cascade models based on biological and clinical information about these diseases. (See list of contributors below.)
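It contains two major elements: historical mortality data for the United States and Japan, and CancerFit, a program that fits quantitative cascade models to those data.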
One may select either “View U.S. Mortality Data” or “View Japanese Mortality Data” by clicking on the appropriate icon.
Clicking on "View U.S. Mortality Data" opens a list of forms of mortality as recorded in Vital Statistics of the United States, beginning with "All Causes" and ending with "Senility". This site contains most but not all data regarding the more common forms of cancer and other forms of mortality. Researchers interested in organizing data for any unlisted disease(s) should contact Prof. W.G. Thilly, thilly@mit.edu, for advice and/or assistance. We would be pleased to include links to historical databases for other countries.
Clicking on a particular group of diseases such as “Digestive organs and Peritoneum (150-169)” under Malignant Neoplasms displays a more specific list of cancers.
Numbers shown in brackets are the International Statistical Classification of Diseases and Related Health Problems (ICD-9) codes used to categorize diagnoses recorded as the cause of death. In some cases we have combined data from several cancer sites in order to obtain a more complete historical record. For example, Colon Cancer (153) has been recorded only since 1958, but when combined with Anal Cancer (154) and Small Intestine (152) it yields a historical intersection with Lower Gastro-intestinal Tract Cancer, for which records are continuous from 1900-2006. Occasionally, the printed record contained obvious typographic errors or nonsensical data. In these cases interpolations were used to fill in missing values, and such interpolations are clearly printed in red in the primary record of number of deaths on the Excel file sheets "Raw Data".
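As an illustration of how a gap in an annual death series might be filled, the following is a minimal sketch only. The page does not say which interpolation method was used, so linear interpolation between the nearest recorded years is an assumption here, and the function name and the sample counts are hypothetical.

```python
def fill_missing_linear(values):
    """Fill interior None entries by linear interpolation.

    Assumption: linear interpolation between the nearest recorded
    neighbours. Leading or trailing gaps are left as None because
    they have no neighbour on one side.
    """
    filled = list(values)
    known = [i for i, v in enumerate(filled) if v is not None]
    for left, right in zip(known, known[1:]):
        gap = right - left
        step_total = filled[right] - filled[left]
        for k in range(1, gap):
            filled[left + k] = filled[left] + step_total * k / gap
    return filled


# Hypothetical annual death counts with two unusable entries.
deaths = [100, None, None, 130, 128]
filled_deaths = fill_missing_linear(deaths)
```

Values produced this way correspond to the entries that the "Raw Data" sheets mark in red, so that interpolated numbers stay distinguishable from recorded ones.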
Clicking next on a specific cancer site such as “Lower GI Tract” opens a page of summary data recorded from 1900-2006, organized by gender and ethnic group (EA, European-Americans, and NEA, Non-European Americans, predominantly African-Americans) and secondarily with regard to (a) age of death (displayed chart), (b) calendar year of birth and (c) calendar year of death. Charts for (b) and (c) are opened by clicking the desired gender and ethnic group for each category.
The summary charts plot age-specific mortality rates (annual deaths per 100,000 population) on the y-axis as a function of age of death on the x-axis. Each birth decade cohort’s age-specific mortality rate is depicted by joined symbols so that the form and historical changes in age-specific lifetime mortality rates for this form of death may be observed in a single chart.
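The rate plotted on the y-axis is a simple ratio, sketched below for clarity. The function name and the death and population counts are hypothetical and are not taken from the database.

```python
def age_specific_rate(deaths, population, per=100_000):
    """Annual deaths per `per` persons within one age-of-death interval."""
    if population <= 0:
        raise ValueError("population must be positive")
    return deaths / population * per


# Hypothetical (deaths, population) counts for one birth cohort,
# keyed by age-of-death interval.
cohort = {
    "50-54": (120, 400_000),
    "55-59": (210, 380_000),
}

rates = {age: age_specific_rate(d, p) for age, (d, p) in cohort.items()}
```

Joining the resulting points in age order gives one cohort's curve on the chart.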
Alternatively, one may choose to observe the mortality rates of individual birth decade cohorts displayed over calendar years, or the death rates for a specific age interval, e.g. 50-54 yrs, displayed over the entire period of recording.
Finally, the complete record for any disease may be downloaded to inspect the raw annual data as recorded by the U.S. Census Bureau or U.S. Public Health Service along with several additional ways to view the data.
If desired, all data on this website (~66 MB of Excel(TM) files) may be downloaded by clicking the icon “Download all Mortality Data”.
Clicking “CancerFit” below opens a page containing four links.
The first link, when clicked, shows the basic assumptions and equations used in a cascade model including but not limited to the assumptions of CancerFit v.5.0.
THIS LINK MUST BE STUDIED AND UNDERSTOOD BEFORE ANY FURTHER STEPS COULD BE USEFUL.
The age-specific mortality rates for age-of-death intervals (15-19, 20-24, …, 100-104) of a particular population cohort defined by gender, ethnic group and birth decade, e.g. 1890-99, corrected for coincident forms of death and any effect of medical intervention on survival, constitute a function INC(h,t), defined for any gender and ethnic group born in the decade “h” and dying in the age interval “t”.
The calculated “best-fit” function CAL(h,t) of the model to INC(h,t) is obtained over wide ranges of values for the initiation mutation rates, R_i, R_j, …, R_n, and the promotion mutation rates, R_A, R_B, …, R_m; the preneoplastic colony growth rate, mu; the fraction, “F”, of persons at risk of the particular disease for any combination of required inherited or environmental risks; and a function, “f”, that represents the possibility of a synchronously mortal form of disease(s) that share(s) required risks with the disease studied.
These data are compared by CancerFit v.5.0 to a cascade model that assumes “n” initiation mutations are required in the fetal juvenile period and “m” promotion mutations are required in an initiated stem cell in order to create. The goodness of fit of CAL(h,t) to INC(h,t), i.e. GOF(h,t), is calculated for the set of parameter values minimizing the sum of (log INC(h,t) - log CAL(h,t))^2 over all age-of-death intervals employed.
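The goodness-of-fit measure described above, a sum of squared differences between the logarithms of observed and calculated rates, can be sketched as follows. This is not the CancerFit source code: the function name and the sample rates are hypothetical, and base-10 logarithms are an assumption because the text says only "log". Changing the base rescales the sum by a constant and does not change which parameter set minimizes it.

```python
import math


def goodness_of_fit(inc, cal):
    """Sum over age intervals of (log INC - log CAL)^2.

    inc, cal: sequences of positive rates, one entry per
    age-of-death interval, in the same order.
    Assumption: base-10 logarithm.
    """
    if len(inc) != len(cal):
        raise ValueError("inc and cal must cover the same age intervals")
    return sum((math.log10(o) - math.log10(c)) ** 2 for o, c in zip(inc, cal))


# Hypothetical observed and calculated rates for three age intervals.
observed = [10.0, 100.0, 1000.0]
calculated = [10.0, 10.0, 1000.0]
gof = goodness_of_fit(observed, calculated)
```

Because the differences are taken on a log scale, a tenfold miss at a low rate counts as much as a tenfold miss at a high rate.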
The second link, when clicked, will download the entire source code of CancerFit, written for MATLAB v7.6 or higher. The download file (CancerFit v5.0, approximately 66 MB) is a zipped file containing the MATLAB source code along with all the mortality data from this MIT repository. An interested user who downloads the zip file, titled CancerFitv5_0.zip, first has to unzip it. If you are using Mac OS X, the zip file will show up in your Downloads list and will be automatically unzipped and available in the location where your downloaded items are sent. The unzipped folder contains the folders “Mortality Files”, “src” and “util” along with the files “CancerFit.fig” and “CancerFit.m”. The model equations are implemented in the files listed under the “src” folder, and the interface itself is programmed in the files “CancerFit.fig” and “CancerFit.m”. The folder “Mortality Files” contains the mortality and population data for all ~111 diseases available on this website as Excel(TM) and text files, both of which can be directly accessed for analysis by the CancerFit program.
The third link is a tutorial describing the steps a CancerFit user needs to take in order to analyze a particular age-specific lifetime mortality function, here using cancer of the lower GI tract in European-American males born 1890-1899 as an example.
The fourth and final link opens a page containing example results obtained for cancer of the lower GI tract in European-American males (EAM), birth interval 1890-99, using estimated post-diagnosis five-year survival rates (see Herrero-Jimenez et al., 1998, 2000) to define INC(h,t). The program CancerFit v.5.0 was run iteratively for all twenty-five pairs of different numbers of initiation events (n = 1,2,3,4,5) and promotion events (m = 1,2,3,4,5).
First, the best fits of CAL(h = 1890-99, 15 < t < 104) were calculated for the twenty-five combinations of n = 1-5 and m = 1-5 under the parsimonious conditions of homogeneous risk, F = 1, and no synchronous mortal diseases sharing risk factors with colorectal cancer, f = 1. Values of (Π_i R_i)^(1/n) and (Π_A R_A)^(1/m) were permitted to range from 10^-9 to 10^0, and the range of mu was set at 0.1 to 0.3.
Second, the best fits of CAL(h,t) to INC(h,t) were assessed under the additional assumption of inhomogeneous risk, i.e., the parameter “F” representing a hypothetical fraction of the population at risk was allowed to range from 0 to 1.
Third, we considered the possibility of both population inhomogeneity, F < 1, and a competing synchronous mortal disease having genetic and/or environmental risks shared with colorectal cancer, i.e., the parameter “f” representing this possibility was allowed to range from 0 to 1. This assumption did not, however, further reduce the values of GOF(h,t).
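The iteration over the twenty-five (n, m) pairs described above can be sketched as below. This shows only the bookkeeping of the search, not the CancerFit model: `rank_pairs` and `toy_gof` are hypothetical names, and `toy_gof` is an arbitrary stand-in with no biological meaning. In a real run, the callable would perform the fit for the given n and m and return the resulting GOF(h,t).

```python
from itertools import product


def rank_pairs(gof_for_pair, n_values=range(1, 6), m_values=range(1, 6)):
    """Evaluate gof_for_pair(n, m) for every (n, m) pair.

    Returns a list of ((n, m), gof) tuples sorted from the
    smallest (best) to the largest goodness-of-fit value.
    """
    results = [((n, m), gof_for_pair(n, m)) for n, m in product(n_values, m_values)]
    return sorted(results, key=lambda item: item[1])


def toy_gof(n, m):
    # Arbitrary placeholder so the sketch runs; NOT a model result.
    return (n - 3) ** 2 + (m - 4) ** 2 + 0.5


ranked = rank_pairs(toy_gof)
best_pair, best_gof = ranked[0]
```

Repeating the same ranking under each trial condition (F = 1 and f = 1; F free; F and f both free) would allow the best GOF values to be compared across conditions.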
A figure at the bottom of these sample results depicts the degree of concordance of the two trial conditions given n = 2 and m = 1, namely F = 1, f = 1 (population homogeneity, no synchronous competing risk) and F < 1, f = 1 (population inhomogeneity, no synchronous competing risk), with the adult lifetime incidence data INC(h,t) for lower GI tract cancer in European-American males born 1890-99.
People
Publications