06.06.2013 Views

Patentee Name Harmonisation - ecoom.be


"TECHNOLOGIES"      7,587   "TECHNOLOGY"
"TECHNOLOGIEN"         61   "TECHNOLOGY"
"TECHNOLOGIE"         705   "TECHNOLOGY"
"INDUSTRIELLES"       112   "INDUSTRIEL"
"INDUSTRIELLE"        415   "INDUSTRIEL"
"INDUSTRIELE"          16   "INDUSTRIEL"
"INDUSTRIES"        6,095   "INDUSTRY"
"INDUSTRIELS"          71   "INDUSTRIEL"
"INSTITUT"          3,753   "INSTITUTE"
"SERVICES"          2,181   "SERVICE"
"ELECTRONICS"       2,742   "ELECTRONIC"
"ENTERPRISES"       1,622   "ENTERPRISE"
"DESIGNS"             358   "DESIGN"
"CHEMICALS"           899   "CHEMICAL"
"HOLDINGS"          1,457   "HOLDING"
"LABORATORIES"      1,373   "LABORATORY"
"COMMUNICATIONS"    1,521   "COMMUNICATION"
"INSTRUMENTS"         992   "INSTRUMENT"
"PLASTICS"            959   "PLASTIC"
"MACHINES"            388   "MACHINE"
"SCIENCES"            843   "SCIENCE"

Implementation

Implementation is again simple and straightforward. As in the previous steps, all spelling variations identified in Table 22 are translated into search-and-replace rules. A program reads these rules and, for each one, executes an update query on the data that replaces the keyword (the spelling variation) with the given string (the harmonized word) and, at the same time, records the spelling variation found in a new field.
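
The rule-driven replacement described above can be sketched in a few lines of Python. This is a minimal illustration, not the program used in the study: the rules are a handful of rows from Table 22, the function name `harmonize` is hypothetical, and matching on whole words only is an assumption. The names are processed in memory rather than through an update query.

```python
import re

# A few rules taken from Table 22: spelling variation -> harmonized word.
RULES = {
    "TECHNOLOGIES": "TECHNOLOGY",
    "TECHNOLOGIEN": "TECHNOLOGY",
    "TECHNOLOGIE": "TECHNOLOGY",
    "INDUSTRIES": "INDUSTRY",
    "LABORATORIES": "LABORATORY",
    "INSTITUT": "INSTITUTE",
}

# One pattern matching any variation as a whole word (assumption: the
# original rules match whole words only, never substrings of longer words).
PATTERN = re.compile(r"\b(" + "|".join(map(re.escape, RULES)) + r")\b")


def harmonize(name):
    """Return (harmonized_name, variations_found) for one patentee name."""
    found = []

    def replace(match):
        # Record the variation, mirroring the "new field" in the text.
        found.append(match.group(1))
        return RULES[match.group(1)]

    return PATTERN.sub(replace, name), found


print(harmonize("ACME TECHNOLOGIES INSTITUT"))
```

Because of the word boundaries, a name that already contains "INSTITUTE" is left untouched, and the list of found variations plays the role of the extra field that stores which spelling variation was replaced.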

Result

Spelling variations have been harmonized in 45,715 names.

Not all spelling and language variations have been harmonized; only the most commonly used ones were identified, using the full-text index. A more in-depth analysis of the indexes could reveal additional words that are safe to harmonize.

Impact

The number of unique names drops from 385,771 to 384,235, an additional reduction of 1,536 names, bringing the total reduction to 59,487 names (13.4%).
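
The figures can be checked with a short calculation. The starting count of unique names is not stated in this section; the sketch below infers it as the current count plus the total reduction, which is an assumption.

```python
before_step = 385_771      # unique names before this step
after_step = 384_235       # unique names after this step
total_reduction = 59_487   # cumulative reduction reported in the text

step_reduction = before_step - after_step

# Assumption: starting count = current count + cumulative reduction.
original_count = after_step + total_reduction
total_share = round(100 * total_reduction / original_count, 1)

print(step_reduction, original_count, total_share)
```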

2.4 Condensing<br />

Description<br />

After all previous cleaning steps have been implemented, a significant number of variations are still present because of alternative spellings caused by separators, punctuation and other non-alphanumerical characters that are not relevant for identifying a name (e.g. “3 COM” and “3COM”, or “AAF-MCQUAY”, “AAF MCQUAY” and “AAF – MCQUAY”).

Analysis<br />

Condensing names by simply removing all non-alphanumerical characters and spaces, leaving only letters and numbers, is enough to eliminate many variations without introducing mismatches.

Implementation<br />

All non-alphanumerical characters are removed by a program that reads each name character by character and drops every character outside the ranges a-z, A-Z and 0-9.
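
A minimal sketch of this condensing step in Python is shown below. The function name `condense` is hypothetical, and the example names are illustrative variants of the kind discussed in the description.

```python
import string

# Characters that survive condensing: a-z, A-Z and 0-9 only.
KEEP = set(string.ascii_letters + string.digits)


def condense(name):
    """Drop every character that is not an ASCII letter or digit."""
    return "".join(ch for ch in name if ch in KEEP)


for variant in ["3 COM", "3COM", "AAF-MCQUAY", "AAF MCQUAY", "AAF – MCQUAY"]:
    print(repr(variant), "->", condense(variant))
```

Restricting the kept set to ASCII letters and digits follows the ranges named in the text. Accented letters would also be dropped under this rule, so the sketch assumes that earlier cleaning steps have already handled them.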
