06.06.2013 Views

Patentee Name Harmonisation - ecoom.be


An inspection of Table 3 reveals that, in several cases, names cannot automatically be assumed to refer to one and the same patentee (e.g. “AG INTERNATIONAL”, “AH INTERNATIONAL” and “AL INTERNATIONAL”, or “APPLIED GENERICS” and “APPLIED GENETICS”). In other words, approximate string searching is very powerful in identifying potential matches but does not yield conclusive findings. The difference from the earlier example in Table 2 is the presence of proper names. Approximate string searching is conclusive when identifying spelling variations of common words, but far less conclusive in the case of proper names. Without additional validation efforts, the number of mismatches can be considerable. The shorter the strings being assessed, the more marked this problem becomes. This lack of accuracy precludes the adoption of approximate string searching in an automated manner. This issue might be addressed by using address information, as discussed in section 6.3 - Introducing address information (in conjunction with name similarity) (see footnote 13).
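To make the ambiguity concrete, the following is a minimal sketch of one common approximate string measure, the Levenshtein edit distance. The text does not state which similarity measure was used for Table 3, so the choice of measure and the function name `levenshtein` are assumptions made purely for illustration.

```python
def levenshtein(a: str, b: str) -> int:
    """Edit distance counting single-character insertions, deletions, substitutions.

    Illustrative only: the source does not say which approximate
    string measure produced Table 3.
    """
    previous = list(range(len(b) + 1))  # distances from the empty prefix of a
    for i, ch_a in enumerate(a, start=1):
        current = [i]
        for j, ch_b in enumerate(b, start=1):
            cost = 0 if ch_a == ch_b else 1
            current.append(min(
                previous[j] + 1,         # delete ch_a
                current[j - 1] + 1,      # insert ch_b
                previous[j - 1] + cost,  # substitute (or keep) the character
            ))
        previous = current
    return previous[-1]


pairs = [
    ("AG INTERNATIONAL", "AH INTERNATIONAL"),
    ("APPLIED GENERICS", "APPLIED GENETICS"),
]
for left, right in pairs:
    print(left, "|", right, "->", levenshtein(left, right))
```

Both pairs differ by a single substituted character, so under this measure they are as close as a one-letter typo would be, even though they may denote different patentees. A distance threshold alone therefore cannot separate spelling variants from distinct proper names.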

6.2 Automatic acronym generation

As mentioned in the introduction, the presence and use of acronyms and abbreviations increase the number of name variations. To deal with acronyms, an automated method of generating acronyms for company names that consist of several parts might be considered. These generated acronyms can then be matched with acronyms already present in the list of patentee names. When a match is found, one could consider harmonizing both names. This, however, requires each generated acronym to be unambiguously related to an acronym already present. Unfortunately, this is rarely the case, as the following experiment illustrates.

Acronyms were generated automatically, beginning with a test set of names consisting of different parts (i.e. containing at least one space). The starting point was the name after its legal form indication had been removed. All characters other than letters (A-Z) and digits (0-9) were replaced with a space (“ ”), resulting in a string of words separated by spaces. An acronym was then generated by taking the first character of every word. These acronyms can be linked back to the cleaned names to see whether automatically generated acronyms match acronyms already present in the name list.
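The generation step described above can be sketched as follows. This is a minimal illustration, not the authors' implementation: the function name `generate_acronym` is hypothetical, the input is assumed to have had its legal form indication removed already, and the upper-casing is an added assumption so that the A-Z test applies to every letter.

```python
import re


def generate_acronym(cleaned_name: str) -> str:
    """Build an acronym from the first character of every word.

    Assumes the legal form indication has already been stripped
    from cleaned_name (that step is not shown here).
    """
    # Replace every character that is not A-Z or 0-9 with a space.
    spaced = re.sub(r"[^A-Z0-9]", " ", cleaned_name.upper())
    # Splitting on whitespace also discards the empty pieces that
    # runs of consecutive spaces would otherwise produce.
    words = spaced.split()
    return "".join(word[0] for word in words)


print(generate_acronym("INFORMATION BUSINESS MACHINES"))
print(generate_acronym("INDUSTRIEANLAGEN-BETRIEBSGESELLSCHAFT MBH"))
```

Because the hyphen is replaced by a space, the hyphenated and unhyphenated spellings of a name yield the same acronym.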

Table 4 and Table 5 contain all names in the test set for which the automatically generated acronym was “IBM” and “ICC” respectively.

Table 4: Names with automatically generated acronym "IBM"

| Automatically generated acronym | Original name |
|---|---|
| IBM | IBM BUSINESS MACHINES |
| IBM | IINTERNATIONAL BUSINESS MACHINES |
| IBM | INDUSTRIEANLAGEN BETRIEBSGESELLSCHAFT MBH |
| IBM | INDUSTRIEANLAGEN-BETRIEBSGESELLSCHAFT MBH |
| IBM | INDUSTRIEANLAGEN-BETRIEBTGESELLSCHAFT MBH |
| IBM | INERNATIONAL BUSINESS MACHINES |
| IBM | INFORMATION BUSINESS MACHINES |
| IBM | INTELLECTUAL BUSINESS MACHINES |
| IBM | INTENATIONAL BUSINESS MACHINES |
| IBM | INTERANATIONAL BUSINESS MACHINES |
| IBM | INTERANTIONAL BUSINESS MACHINES |
| IBM | INTERNAIONAL BUSINESS MACHINES |
| IBM | INTERNAITONAL BUSINESS MACHINES |
| IBM | INTERNAL BUSINESS MACHINE |
| IBM | INTERNATIAONAL BUSINESS MACHINES |
| IBM | INTERNATIIONAL BUSINESS MACHINES |
| IBM | INTERNATINAL BUSINESS MACHINES |
| IBM | INTERNATIOAL BUSINESS MACHINES |
| IBM | INTERNATIOANAL BUSINESS MACHINES |
| IBM | INTERNATIOANL BUSINESS MACHINES |
| IBM | INTERNATIOINAL BUSINESS MACHINES |
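Linking generated acronyms back to the names that produced them can be sketched as a simple grouping step. The snippet below is illustrative only: `generate_acronym` and `group_by_acronym` are hypothetical helper names, and the sample is a handful of rows taken from Table 4 rather than the full test set.

```python
import re
from collections import defaultdict


def generate_acronym(cleaned_name: str) -> str:
    """First character of every word, after replacing non A-Z/0-9 with spaces."""
    words = re.sub(r"[^A-Z0-9]", " ", cleaned_name.upper()).split()
    return "".join(word[0] for word in words)


def group_by_acronym(names):
    """Map each generated acronym to the list of names that produce it."""
    groups = defaultdict(list)
    for name in names:
        if " " in name:  # the test set only holds names with at least one space
            groups[generate_acronym(name)].append(name)
    return dict(groups)


# A few rows from Table 4 (assumed to be already stripped of legal forms).
sample = [
    "IBM BUSINESS MACHINES",
    "INDUSTRIEANLAGEN-BETRIEBSGESELLSCHAFT MBH",
    "INFORMATION BUSINESS MACHINES",
    "INTELLECTUAL BUSINESS MACHINES",
    "INTERNAL BUSINESS MACHINE",
]

for acronym, members in group_by_acronym(sample).items():
    flag = "ambiguous" if len(members) > 1 else "unique"
    print(acronym, flag, members)
```

All five sample names collapse onto the single acronym "IBM", although at least one of them (INDUSTRIEANLAGEN-BETRIEBSGESELLSCHAFT MBH) is plainly not a spelling variant of the others. An acronym shared by several names in this way cannot be resolved to one patentee without further validation.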

13 For example, according to EPO applicant information, “APPLIED GENERICS” is always situated in Biggar, GB, while “APPLIED GENETICS” is always situated in Freeport, US, suggesting two different companies.
