Patentee Name Harmonisation - ecoom.be
Implementation
Implementation is straightforward. All spelling variations identified in Table 19, Table 20 and
Table 21 are converted into search-and-replace statements or rules, as in the previous step
(legal form indication treatment).
All identified and validated occurrences of common company words were removed by running
a program that reads the search-and-replace statements or rules and executes an update query
on the data. The query replaces the given keyword (a spelling variation of a common company
word) with a given string (an empty string simply removes the word) and, at the same time,
stores the spelling variation that was found in a new field.
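The rule-driven update described above can be sketched as follows. This is a minimal illustration, not the program used in the study: the table layout (`names`, `removed_word`), the rule format and the three example rules are assumptions, since the actual keywords come from Tables 19 to 21. The sketch also assumes that a name consisting only of the keyword should be left untouched.

```python
import sqlite3

# Hypothetical rules: (keyword, replacement, position).
# The real keywords are listed in Tables 19-21.
RULES = [
    ("COMPANY", "", "end"),
    ("THE", "", "start"),
    ("HOLDINGS", "", "any"),
]

def apply_rule(name, keyword, replacement, position):
    """Return the rewritten name, or None if the rule does not apply."""
    words = name.split()
    if len(words) < 2:  # assumption: never strip a name down to nothing
        return None
    if position == "end":
        idx = len(words) - 1 if words[-1] == keyword else None
    elif position == "start":
        idx = 0 if words[0] == keyword else None
    else:  # "any": first whole-word occurrence anywhere in the name
        idx = words.index(keyword) if keyword in words else None
    if idx is None:
        return None
    # Replace the keyword, or drop it when the replacement is empty.
    words[idx:idx + 1] = [replacement] if replacement else []
    return " ".join(words)

def run_rules(conn, rules):
    """Apply each rule with an UPDATE and record the keyword that matched."""
    for keyword, replacement, position in rules:
        rows = conn.execute("SELECT id, name FROM names").fetchall()
        for row_id, name in rows:
            new_name = apply_rule(name, keyword, replacement, position)
            if new_name is not None:
                conn.execute(
                    "UPDATE names SET name = ?, removed_word = ? WHERE id = ?",
                    (new_name, keyword, row_id),
                )
    conn.commit()

# Usage with made-up names in an in-memory database.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE names (id INTEGER PRIMARY KEY, name TEXT, removed_word TEXT)"
)
conn.executemany(
    "INSERT INTO names (name) VALUES (?)",
    [("ACME COMPANY",), ("THE WIDGET WORKS",), ("COMPANY",)],
)
run_rules(conn, RULES)
result = conn.execute(
    "SELECT name, removed_word FROM names ORDER BY id"
).fetchall()
```

In this sketch the `removed_word` field keeps only the last keyword that matched a given name; a real implementation might prefer one field per treatment step.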
Result
Common company words have been removed at the end of names in 68,152 names, at the
beginning of names in 2,463 names, and anywhere in the name in 7,662 names.
Not all common, non-distinctive words are removed; only the most frequently used ones are
identified, using the last word index, first word index and full text index. A more in-depth
analysis of the indexes could reveal additional words that are safe to remove.
Impact
The number of unique names drops from 392,226 to 385,771, an additional reduction of 6,455
names, for a total reduction of 57,951 names (13.1%).
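The figures above can be cross-checked with a few lines of arithmetic. The starting count is not stated in this section; in the sketch below it is inferred from the reported total reduction, so it should be read as a derived value rather than a reported one.

```python
before_step = 392_226      # unique names before this step (from the text)
after_step = 385_771       # unique names after this step (from the text)
total_reduction = 57_951   # cumulative reduction so far (from the text)

step_reduction = before_step - after_step

# Inferred, not stated in the text: unique names before any harmonization.
initial_count = after_step + total_reduction

total_share = round(100 * total_reduction / initial_count, 1)
```

The step reduction matches the reported 6,455 names, and the cumulative share rounds to the reported 13.1%.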
2.3 Spelling variation harmonization
Description
One cause of name variation is spelling variation (mistakes, typographical errors, etc.).
Approximate string matching (for example, based on Levenshtein distance, also called edit
distance) can be used to find similar words and so identify candidate spelling variations. The
problem is that such variations cannot be validated in proper names.
For example, “AMTECH” and “IMTECH” have a Levenshtein distance of 1, but can they safely
be combined into one organization name?
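To make the distance measure concrete, here is a minimal sketch of the standard dynamic-programming computation of Levenshtein distance. It is an illustration only; the text does not say which implementation, if any, was used.

```python
def levenshtein(a: str, b: str) -> int:
    """Minimum number of single-character insertions, deletions and
    substitutions needed to turn string a into string b."""
    # previous[j] holds the distance between the processed prefix of a and b[:j].
    previous = list(range(len(b) + 1))
    for i, ch_a in enumerate(a, start=1):
        current = [i]  # distance from a[:i] to the empty string
        for j, ch_b in enumerate(b, start=1):
            cost = 0 if ch_a == ch_b else 1
            current.append(min(
                previous[j] + 1,         # delete ch_a
                current[j - 1] + 1,      # insert ch_b
                previous[j - 1] + cost,  # substitute, or keep on a match
            ))
        previous = current
    return previous[-1]

distance_proper_names = levenshtein("AMTECH", "IMTECH")
distance_plain_words = levenshtein("SYSTEM", "SYSTEMS")
```

Both pairs are one edit apart, which illustrates the problem: the distance alone cannot tell two different proper names from two forms of the same plain word.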
However, spelling, language and grammatical variations can be identified in plain words,
whether English or from other languages.
For example, “SYSTEM”, “SYSTEMS”, “SYSTEMEN” and “SYSTEMES” can all be harmonized to
“SYSTEM” or “SYSTEMS”.
Spelling variation harmonization can mutilate organization names and make them less
comprehensible. However, the idea is not to use these harmonized names as final harmonized
names, but as a kind of technical search name that can be used to identify name variations of
the same organization.
Analysis
Spelling variations that can be harmonized were identified by using a full text index of the
organization names.
Sorting the index by number of occurrences reveals the most commonly used words. Sorting
the index alphabetically then brings variations of those commonly used words close together,
so that they can be identified.
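The two sorting passes can be sketched as follows. This is a simplified illustration: the index here is a plain word counter built in memory, the helper names are hypothetical, and the sample names are made up.

```python
from collections import Counter

def build_word_index(names):
    """Count how often each word occurs across all names."""
    index = Counter()
    for name in names:
        index.update(name.upper().split())
    return index

def most_common_words(index, n):
    """First pass: words sorted by number of occurrences, highest first."""
    return index.most_common(n)

def alphabetical_neighbours(index, prefix):
    """Second pass: words sorted alphabetically, filtered on a shared prefix."""
    return [(word, index[word]) for word in sorted(index) if word.startswith(prefix)]

# Made-up names for illustration only.
sample_names = [
    "ALPHA SYSTEMS",
    "BETA SYSTEMS",
    "GAMMA SYSTEMS",
    "DELTA SYSTEME",
    "EPSILON SYSTEM",
]
index = build_word_index(sample_names)
top_word = most_common_words(index, 1)
variants = alphabetical_neighbours(index, "SYSTEM")
```

Filtering on a prefix is a shortcut for what a reviewer does by eye when scanning the alphabetical listing; variations that differ at the start of the word would not be caught this way.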
Table 22 contains spelling variations of words that can be harmonized.
Table 22: Spelling variations and their harmonized equivalent
KEYWORD             NBR       REMARKS
"SYSTEMEN"          48        "SYSTEM"
"SYSTEMES"          164       "SYSTEM"
"SYSTEME"           1,140     "SYSTEM"
"SYSTEMS"           10,104    "SYSTEM"
"INTERNATIONALE"    109       "INTERNATIONAL"
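As a sketch of how the Table 22 rows could be used to build the technical search name described earlier, the mapping below is applied word by word and names are then grouped by the resulting key. The mapping contains only the rows shown above; the function names and the sample organization names are hypothetical.

```python
from collections import defaultdict

# Keyword -> harmonized equivalent, taken from the Table 22 rows shown above.
HARMONIZED = {
    "SYSTEMEN": "SYSTEM",
    "SYSTEMES": "SYSTEM",
    "SYSTEME": "SYSTEM",
    "SYSTEMS": "SYSTEM",
    "INTERNATIONALE": "INTERNATIONAL",
}

def search_name(name):
    """Replace each whole word by its harmonized equivalent, if it has one."""
    return " ".join(HARMONIZED.get(word, word) for word in name.upper().split())

def group_by_search_name(names):
    """Group original names that share the same technical search name."""
    groups = defaultdict(list)
    for name in names:
        groups[search_name(name)].append(name)
    return dict(groups)

# Made-up names for illustration only.
groups = group_by_search_name(
    ["ACME SYSTEMS", "ACME SYSTEMES", "ACME INTERNATIONALE"]
)
```

The original names are kept as the group members, so the harmonized key serves only for matching and never replaces the name itself.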