Patentee Name Harmonisation - ecoom.be
An inspection of Table 3 reveals that, in several cases, names cannot automatically be assumed to
refer to one and the same patentee (e.g. “AG INTERNATIONAL”, “AH
INTERNATIONAL” and “AL INTERNATIONAL”, or “APPLIED GENERICS” and “APPLIED
GENETICS”). In other words, approximate string searching is very powerful in identifying
potential matches but does not yield conclusive findings. The difference with the former
example in Table 2 is the presence of proper names. Approximate string searching is conclusive
when identifying spelling variations of common words, but far less conclusive in the case of
proper names. Without additional validation efforts, the number of mismatches can be
considerable. The shorter the strings being assessed, the more marked this problem becomes.
This lack of accuracy precludes the adoption of approximate string searching in an
automated manner. This issue might be addressed by using address information, as discussed in
section 6.3 - Introducing address information (in conjunction with name similarity); see footnote 13.
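The point about proper names can be illustrated with a small sketch. It assumes Levenshtein edit distance as the similarity measure, which is only one possible form of approximate string searching and is not necessarily the one used for Table 3; the function name levenshtein is hypothetical. The first two pairs come from the examples above, while the third pairs a correctly spelled name with a misspelling of the kind that occurs in patentee lists.

```python
def levenshtein(a: str, b: str) -> int:
    """Edit distance counting insertions, deletions and substitutions.

    Assumption: Levenshtein distance stands in here for "approximate
    string searching"; the source does not prescribe this measure.
    """
    previous = list(range(len(b) + 1))
    for i, char_a in enumerate(a, start=1):
        current = [i]
        for j, char_b in enumerate(b, start=1):
            cost = 0 if char_a == char_b else 1
            current.append(min(
                previous[j] + 1,         # delete char_a
                current[j - 1] + 1,      # insert char_b
                previous[j - 1] + cost,  # substitute (or keep) the character
            ))
        previous = current
    return previous[-1]


pairs = [
    ("APPLIED GENERICS", "APPLIED GENETICS"),   # proper names
    ("AG INTERNATIONAL", "AH INTERNATIONAL"),   # proper names
    ("INTERNATIONAL BUSINESS MACHINES",
     "INTERNATIOAL BUSINESS MACHINES"),         # spelling variation
]
for left, right in pairs:
    print(f"{left} | {right} | distance {levenshtein(left, right)}")
```

Under this assumed measure all three pairs are a single edit apart, so the distance alone cannot separate a misspelling from two distinct patentees; some additional validation is needed.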
6.2 Automatic acronym generation
As mentioned in the introduction, the presence and use of acronyms and abbreviations
increase the number of name variations. To deal with acronyms, one might consider an
automated method of generating acronyms for company names that consist of several parts.
These generated acronyms can then be matched with acronyms already present in the list of
patentee names. When a match is found, one could consider harmonizing both names. This,
however, requires each generated acronym to be unambiguously related to an acronym already
present. Unfortunately, this is rarely the case, as the following experiment illustrates.
Acronyms were generated automatically for a test set of names
consisting of several parts (i.e. containing at least one space). The starting point was each
name with its legal form indication removed. All characters other than letters (A-Z) and digits
(0-9) were replaced with a space (“ ”), resulting in a string of words separated by
spaces. An acronym was then generated by taking the first character of every word. These acronyms
can be linked back to the cleaned names to see whether automatically generated acronyms match
acronyms already present in the name list.
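The procedure described above can be sketched as follows. This is a minimal illustration, not the implementation used in the experiment: the function names generate_acronym and group_by_acronym are hypothetical, names are assumed to arrive with the legal form indication already removed upstream, upper-casing is an added assumption, and the single-word entry "IBM" is assumed to be present in the name list for the sake of the example.

```python
import re
from collections import defaultdict


def generate_acronym(name: str) -> str:
    """First character of every word, after replacing each character
    that is not A-Z or 0-9 with a space.

    Assumption: the legal form indication was already removed upstream.
    """
    words = re.sub(r"[^A-Z0-9]", " ", name.upper()).split()
    return "".join(word[0] for word in words)


def group_by_acronym(names):
    """Map each generated acronym to the multi-part names producing it."""
    groups = defaultdict(list)
    for name in names:
        if " " in name.strip():  # only names consisting of several parts
            groups[generate_acronym(name)].append(name)
    return dict(groups)


# Small illustrative name list; the "IBM" entry is a hypothetical
# acronym assumed to be already present in the list.
names = [
    "INTERNATIOAL BUSINESS MACHINES",
    "INDUSTRIEANLAGEN-BETRIEBSGESELLSCHAFT MBH",
    "INFORMATION BUSINESS MACHINES",
    "IBM",
]

groups = group_by_acronym(names)
existing_acronyms = {name for name in names if " " not in name}

# Link generated acronyms back to acronyms already in the list.
for acronym, members in groups.items():
    if acronym in existing_acronyms:
        print(acronym, "<-", members)
```

In this sketch three unrelated-looking names all map to the same generated acronym, so a match with an existing acronym does not by itself identify a single patentee.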
Table 4 and Table 5 contain all names in the test set for which the automatically generated
acronym is “IBM” and “ICC” respectively.
Table 4: Names with automatically generated acronym "IBM"

AUTOMATICALLY GENERATED ACRONYM    ORIGINAL NAME
IBM                                IBM BUSINESS MACHINES
IBM                                IINTERNATIONAL BUSINESS MACHINES
IBM                                INDUSTRIEANLAGEN BETRIEBSGESELLSCHAFT MBH
IBM                                INDUSTRIEANLAGEN-BETRIEBSGESELLSCHAFT MBH
IBM                                INDUSTRIEANLAGEN-BETRIEBTGESELLSCHAFT MBH
IBM                                INERNATIONAL BUSINESS MACHINES
IBM                                INFORMATION BUSINESS MACHINES
IBM                                INTELLECTUAL BUSINESS MACHINES
IBM                                INTENATIONAL BUSINESS MACHINES
IBM                                INTERANATIONAL BUSINESS MACHINES
IBM                                INTERANTIONAL BUSINESS MACHINES
IBM                                INTERNAIONAL BUSINESS MACHINES
IBM                                INTERNAITONAL BUSINESS MACHINES
IBM                                INTERNAL BUSINESS MACHINE
IBM                                INTERNATIAONAL BUSINESS MACHINES
IBM                                INTERNATIIONAL BUSINESS MACHINES
IBM                                INTERNATINAL BUSINESS MACHINES
IBM                                INTERNATIOAL BUSINESS MACHINES
IBM                                INTERNATIOANAL BUSINESS MACHINES
IBM                                INTERNATIOANL BUSINESS MACHINES
IBM                                INTERNATIOINAL BUSINESS MACHINES
13 For example, according to EPO applicant information, “APPLIED GENERICS” is always situated in Biggar,
GB, while “APPLIED GENETICS” is always situated in Freeport, US, suggesting two different companies.