

Mining Biomedical Data Using MMTx and UMLS

format input data using any commonly available spreadsheet program (such as Excel), provided the spreadsheet software can support all the lines of input data that are required to be read in. For example, if the geneRIF database is being processed, each record has the following format, with fields separated by tabs:

139 2827859 15501399 2005-05-14 12:17 T-cell recognition of the outer surface protein A (OspA) epitope is important in the induction of autoimmunity in treatment-resistant Lyme arthritis (OspA).

The tabs can be replaced with “|” characters in any text editor that supports find-and-replace operations, so that MMTx will have an easier time with the data (see Note 3). In Microsoft Word the data could be transformed as follows:

1. Open the database file (generif_basic) in Word.
2. Highlight a single tab character and press Ctrl-C to copy it.
3. Select Edit → Find.
4. Select the Replace tab.
5. Select the “Find what” textbox and press Ctrl-V to paste in the tab.
6. Select the “Replace with” textbox and type “|” (without quotes).
7. Select “Replace All.”
8. Save the newly formatted database.

This will result in text in the default MMTx format, as shown next:

139|2827859|15501399|2005-05-14 12:17|T-cell recognition of the outer surface protein A (OspA) epitope is important in the induction of autoimmunity in treatment-resistant Lyme arthritis (OspA).

3.6. Running and Handling Results From MMTx

Perhaps the most difficult part of data mining with MMTx is handling the overwhelming amount of data generated from the original input data. This problem is compounded by the fact that MMTx is not really designed to be an end-user program. MMTx is focused more on producing machine-readable data for analysis by software tools than on direct interpretation by an end-user. Potential users must therefore overcome a fairly work-intensive initial barrier before they can assess the utility of MMTx and UMLS.
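As an alternative to the Word procedure described earlier in this section, the same tab-to-“|” substitution can be scripted. This scales better to a file the size of the geneRIF database and can be rerun whenever the file is updated. The following is a minimal sketch, not part of MMTx itself. It assumes UTF-8 input and that no text field contains a literal “|” character. The script name tabs_to_pipes.py and the output file name are hypothetical.

```python
import sys
from typing import Iterable, Iterator


def tabs_to_pipes(lines: Iterable[str]) -> Iterator[str]:
    """Replace every tab with "|" in each line, like Word's "Replace All".

    Assumption: text fields contain no literal "|" characters; any that do
    would be indistinguishable from delimiters, and none are escaped here.
    """
    for line in lines:
        # Remove only the line terminator (\n or \r\n) so data is untouched.
        yield line.rstrip("\r\n").replace("\t", "|")


def main(in_path: str, out_path: str) -> None:
    # Stream line by line so a large file such as generif_basic
    # never has to be held in memory all at once.
    with open(in_path, encoding="utf-8") as src, \
         open(out_path, "w", encoding="utf-8") as dst:
        for converted in tabs_to_pipes(src):
            dst.write(converted + "\n")


if __name__ == "__main__":
    if len(sys.argv) == 3:
        main(sys.argv[1], sys.argv[2])
    else:
        print("usage: tabs_to_pipes.py INPUT OUTPUT", file=sys.stderr)
```

For example, `python tabs_to_pipes.py generif_basic generif_basic.txt` would write the pipe-delimited records to a new file and leave the original database unchanged.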
The simple example presented here should help users past some of that initial barrier. However, because MMTx focuses on generating machine-readable data, the best way to handle MMTx-generated data is programmatically, either by processing the output of MMTx directly or, preferably, through the Java API. Whether or not software tools are used to process and analyze the results, a human is ultimately needed for the final analysis.

3.6.1. Choosing a Data Model

Regardless of whether the analysis will be software-assisted, one consideration remains the same: choosing a data model. As discussed in Subheading 3.3., there are three different data models. The default “strict” model utilizes the highest
