06.06.2013 Views

Patentee Name Harmonisation - ecoom.be


Some legal form indications are not removed from the name because they are, to all intents and purposes, part of the name and removing them could make the underlying name less comprehensible. A typical example is the German Kommanditgesellschaft, abbreviated as “KG”. The part “gesellschaft” is part of the full name in a significant number of cases, as in “ESPE STIFTUNG & CO. PRODUKTION - UND VERTRIEBS KG”. To prevent mutilation of names of this kind, “KG” will not be removed from the name. In one of the next steps (common company word removal), “KG” will be removed to obtain a reduced name that can be used for searching related names.

Table 15 also shows that the order of removing and harmonizing legal form indications is important. It is clear that “ + CO. AG” should be replaced with “ & COMPANY” before removing “ AG”.
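The effect of rule order can be illustrated with a minimal Python sketch. The two rules, the function name `apply_rules`, and the sample names are illustrative assumptions; the validated statements are the ones listed in Table 15 and Appendix 2.

```python
import re

# Hypothetical two-rule excerpt; each rule is anchored at the end of the name.
# The compound form is harmonized before the bare legal form is removed.
RULES = [
    (re.compile(r" \+ CO\. AG$"), " & COMPANY"),
    (re.compile(r" AG$"), ""),
]


def apply_rules(name, rules=RULES):
    """Apply every rule once, in list order."""
    for pattern, replacement in rules:
        name = pattern.sub(replacement, name)
    return name


print(apply_rules("EXAMPLE + CO. AG"))                        # intended order
print(apply_rules("EXAMPLE + CO. AG", list(reversed(RULES))))  # wrong order
```

With the rules reversed, “ AG” is stripped first, so the compound rule no longer matches and the name is left ending in “ + CO.”.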

All 1,060 spelling variations of legal forms occurring at the end of names were converted into search and replace statements or rules, as listed in Table 15. All these rules and spelling variations were validated. For every rule or spelling variation, all names containing the spelling variation were scanned to make sure that the rule only affects actual legal form indications. If the number of occurrences was higher than 500 (47 of the 1,060 spelling variations), only a sample of 500 names was checked.

A rule or search and replace statement is retained only if more than 99% of the occurrences found are actually legal form indications. As the vast majority of rules resulted in 0 mistakes, the overall accuracy is greater than 99%.
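The validation procedure can be sketched as follows. This is a sketch under stated assumptions, not the procedure as implemented: the text does not say how the sample of 500 was drawn, so random sampling is assumed here, and the function names are hypothetical. The judgements themselves come from manual checking.

```python
import random

SAMPLE_CAP = 500  # from the text: at most 500 names are checked per variation


def names_to_check(matching_names, seed=0):
    """Return all matching names, or a sample of 500 if there are more.

    Random sampling is an assumption; the text only gives the sample size.
    """
    names = list(matching_names)
    if len(names) <= SAMPLE_CAP:
        return names
    return random.Random(seed).sample(names, SAMPLE_CAP)


def keep_rule(judgements):
    """Decide whether a rule is retained.

    judgements holds one boolean per checked name (True means the match is
    an actual legal form indication). The rule is kept only if strictly more
    than 99% are True; integer arithmetic avoids rounding issues.
    """
    total = len(judgements)
    if total == 0:
        return False
    return 100 * sum(judgements) > 99 * total
```

Under this reading, a rule with 5 false hits in a sample of 500 (exactly 99%) is dropped, while one with 4 false hits is kept.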

The full list of all search and replace statements for legal forms to be removed at the end of a name can be found in Appendix 2.

It has to be stressed that the objective is not to maximize the total number of matches (at the cost of introducing mismatches) but to minimize the number of mismatches given a reasonable number of matches.

This means that a considerable number of legal form indications will still be present in the names after legal form removal. On the one hand, only legal forms that were identified on the basis of the top 40 occurring last words were removed and harmonized, leaving a substantial number of legal forms unchanged. On the other hand, not all spelling variations of identified legal form indications were removed because some of the occurrences might have nothing to do with legal form indications but may be mere coincidences.

In addition to the actual legal form removal and harmonization, some other words commonly used in a company context and appearing as last words were also harmonized:

• “CO” was harmonized to “COMPANY”. Preceding “+”, “AND”, “U”, or “UND” were harmonized to “&”

• “AND” preceding “COMPANY” was harmonized to “&”

• “ CORP” was harmonized to “ CORPORATION”

• “ E C.” was harmonized to “ & COMPANY”

• “ & C.” and “ & C” were harmonized to “ & COMPANY”
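The harmonizations listed above can be sketched as ordered regular expressions applied to the end of a name. This is a minimal illustration, not the report's actual statements: the names `END_WORD_RULES` and `harmonize_last_words` are hypothetical, and accepting an optional trailing period after “CO” and “CORP” is an assumption.

```python
import re

# Hypothetical end-of-name rules, applied in order.
END_WORD_RULES = [
    # "+", "AND", "U" or "UND" before "CO" becomes "&", and "CO" becomes "COMPANY"
    (re.compile(r" (?:\+|AND|U|UND) CO\.?$"), " & COMPANY"),
    # a remaining "CO" (for example after "&") becomes "COMPANY"
    (re.compile(r" CO\.?$"), " COMPANY"),
    # "AND" preceding "COMPANY" becomes "&"
    (re.compile(r" AND COMPANY$"), " & COMPANY"),
    (re.compile(r" CORP\.?$"), " CORPORATION"),
    (re.compile(r" E C\.$"), " & COMPANY"),
    (re.compile(r" & C\.?$"), " & COMPANY"),
]


def harmonize_last_words(name):
    """Apply each end-of-name rule once, in list order."""
    for pattern, replacement in END_WORD_RULES:
        name = pattern.sub(replacement, name)
    return name


for sample in ["EXAMPLE UND CO", "EXAMPLE CORP", "EXAMPLE E C.", "EXAMPLE & C"]:
    print(sample, "->", harmonize_last_words(sample))
```

Because every pattern is anchored with `$`, a word such as “CO” in the middle of a name is left untouched, in line with the restriction to last words.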

At this stage, only legal form indications appearing at the end of a name were removed and harmonized. In some countries, legal form indications can also appear in front of company names. The approach described above for last words can also be used for first words.

Table 17 contains the top 50 occurring first words after cleanup, together with the number of names containing the word as a first word, the cumulative number of names for this word and all higher ranked words, and the percentage of the cumulative number of names compared to the total number of names (443,722). First words are identified on the basis of the first occurrence of a space in a name; all characters other than A-Z and 0-9 are then removed, resulting in a cleaned version of the first word. Appendix 4 contains the list of the top 200 occurring first words.
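The first-word extraction and the columns of the frequency table can be sketched in a few lines of Python. The function names are hypothetical, and the sketch assumes that names are already in upper case, as in the examples quoted in the text.

```python
import re
from collections import Counter


def cleaned_first_word(name):
    """Take the text before the first space, then drop everything except A-Z and 0-9."""
    first = name.split(" ", 1)[0]
    return re.sub(r"[^A-Z0-9]", "", first)


def first_word_table(names, top=50):
    """Return rows of (word, number of names, cumulative count, cumulative %)."""
    counts = Counter(cleaned_first_word(n) for n in names)
    total = len(names)
    rows = []
    cumulative = 0
    for word, count in counts.most_common(top):
        cumulative += count
        rows.append((word, count, cumulative, 100.0 * cumulative / total))
    return rows


# Hypothetical input names, for illustration only.
for row in first_word_table(["SOC. A", "SOC B", "THE C", "SOC D"]):
    print(row)
```

The percentage is taken relative to all names passed in, which mirrors the comparison with the total number of names described above.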

Table 17: Top 50 occurring first words (columns: FIRST WORD (CLEANED), NBR NAMES, CUM, %)
