Mining Biomedical Data Using MMTx and UMLS 163

format input data using any commonly available spreadsheet program (such as Excel), provided the spreadsheet software can hold all the lines of input data that need to be read in. For example, if the GeneRIF database is being processed, each record has the following format, with fields separated by tabs:

139 2827859 15501399 2005-05-14 12:17 T-cell recognition of the outer surface protein A (OspA) epitope is important in the induction of autoimmunity in treatment-resistant Lyme arthritis (OspA).

The tabs can be replaced with “|” characters in any text editor that supports find-and-replace operations, so that MMTx will have an easier time with the data (see Note 3). In Microsoft Word, the data could be transformed as follows:

1. Open the database file (generif_basic) in Word.
2. Highlight a single tab character and press Ctrl-C to copy it.
3. Select Edit → Find.
4. Select the Replace tab.
5. Click in the “Find what” textbox and press Ctrl-V to paste in the tab.
6. Click in the “Replace with” textbox and type “|” (without quotes).
7. Select “Replace All.”
8. Save the newly formatted database.

This results in text in the default MMTx format, as shown next.

139|2827859|15501399|2005-05-14 12:17|T-cell recognition of the outer surface protein A (OspA) epitope is important in the induction of autoimmunity in treatment-resistant Lyme arthritis (OspA).

3.6. Running and Handling Results From MMTx

Perhaps the most difficult part of data mining with MMTx is handling the overwhelming amount of data generated from the original input. This problem is compounded by the fact that MMTx is not really designed as an end-user program: it is geared toward producing machine-readable data for analysis by software tools, not for direct interpretation by an end user. Potential users must therefore overcome a fairly work-intensive initial barrier before they can assess the utility of MMTx and UMLS.
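The Word procedure is adequate for a one-off conversion, but the same tab-to-pipe substitution is easy to script, which avoids loading a large database file into a word processor and is in keeping with handling MMTx data programmatically. The following is a minimal Python sketch, not part of MMTx itself; it assumes the input is UTF-8 text with one tab-delimited record per line, and the script name and output file name in the usage comment are hypothetical.

```python
import sys
from typing import Iterable, Iterator


def tabs_to_pipes(line: str) -> str:
    """Replace every tab with '|', the scripted equivalent of Word's Replace All."""
    return line.replace("\t", "|")


def convert_lines(lines: Iterable[str]) -> Iterator[str]:
    """Convert lines lazily so a large file never has to fit in memory."""
    for line_no, line in enumerate(lines, start=1):
        if "|" in line:
            # A literal '|' already in the record text would be read as an extra
            # field delimiter after conversion, so flag it for manual review.
            print(f"warning: line {line_no} already contains '|'", file=sys.stderr)
        yield tabs_to_pipes(line)


def main(src_path: str, dst_path: str) -> None:
    # UTF-8 is an assumption about the input file's encoding.
    with open(src_path, encoding="utf-8") as src, \
         open(dst_path, "w", encoding="utf-8") as dst:
        dst.writelines(convert_lines(src))


if __name__ == "__main__":
    # Hypothetical usage: python tabs_to_pipes.py generif_basic generif_basic.mmtx
    if len(sys.argv) == 3:
        main(sys.argv[1], sys.argv[2])
    else:
        print("usage: tabs_to_pipes.py INPUT OUTPUT", file=sys.stderr)
```

Streaming the file line by line keeps memory use constant regardless of file size. The warning covers one case the Word procedure silently ignores: a record whose text already contains “|” would gain an extra field after conversion.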
The simple example in this chapter lowers that barrier somewhat, but because MMTx is geared toward machine-readable output, MMTx-generated data are best handled programmatically, either by processing the MMTx output directly or, preferably, through the Java API. Whether or not software tools are used to process and analyze the results, a human is ultimately needed for the final analysis.

3.6.1. Choosing a Data Model

Regardless of whether the analysis will be software assisted, one consideration remains the same: choosing a data model. As discussed in Subheading 3.3., there are three different data models. The default “strict” model utilizes the highest