06.06.2013

Patentee Name Harmonisation - ecoom.be



220 "Ü" "U"
221 "Ý" "Y"
159 "Ÿ" "Y"

Implementation

All occurrences of accented characters are replaced with their unaccented equivalents, as defined in Table 9, by executing several update queries on the data.
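The update step can be sketched as one query per mapping row. This is a minimal illustration, not the procedure actually used: it runs on SQLite via Python's `sqlite3` module, assumes a hypothetical table `patentee` with a `name` column, and includes only the three rows of Table 9 that are shown at the top of this section. The codes are read as Windows-1252 (ANSI) code points, which matches those rows (159 is "Ÿ" in that code page).

```python
import sqlite3

# Hypothetical subset of Table 9: only the three rows shown in this section.
# Keys are Windows-1252 (ANSI) character codes, values the unaccented letter.
ACCENT_MAP = {
    220: "U",  # Ü
    221: "Y",  # Ý
    159: "Y",  # Ÿ
}


def replace_accents(conn: sqlite3.Connection) -> int:
    """Run one UPDATE query per mapping row; return how many names changed."""
    changed: set[int] = set()
    for code, plain in ACCENT_MAP.items():
        accented = bytes([code]).decode("cp1252")  # e.g. 159 -> "Ÿ"
        # Remember which rows this query will touch before updating them.
        rows = conn.execute(
            "SELECT rowid FROM patentee WHERE instr(name, ?) > 0", (accented,)
        ).fetchall()
        changed.update(rowid for (rowid,) in rows)
        conn.execute(
            "UPDATE patentee SET name = replace(name, ?, ?) WHERE instr(name, ?) > 0",
            (accented, plain, accented),
        )
    conn.commit()
    return len(changed)


if __name__ == "__main__":
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE patentee (name TEXT)")
    conn.executemany(
        "INSERT INTO patentee (name) VALUES (?)",
        [("MÜLLER GMBH",), ("ÝVES SA",), ("ACME CORP",)],
    )
    print(replace_accents(conn))  # 2
```

Counting distinct row ids, rather than summing per-query row counts, avoids counting a name twice when it contains more than one accented character.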

Results

Accented characters have been replaced with their unaccented equivalents in 19,934 names.

There is no guarantee that all accented characters have been eliminated from the names, as an exhaustive list of all possible accented characters in all languages has not been used.

However, no other accented characters that can be represented as a single character, as defined by the standard ASCII/ANSI character code page, are present in the names.
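The check that no other single-character accented letters from the ANSI code page remain could be carried out along these lines. This sketch assumes the ANSI code page is Windows-1252 (cp1252) and treats a character as "accented" when its Unicode canonical decomposition (NFD) is a base letter followed by combining marks; that criterion is an assumption made here, not necessarily the one used in the study.

```python
import unicodedata


def ansi_accented_chars() -> set[str]:
    """Accented letters that exist as a single character in Windows-1252."""
    found: set[str] = set()
    for code in range(128, 256):
        try:
            ch = bytes([code]).decode("cp1252")
        except UnicodeDecodeError:
            continue  # a few codes (e.g. 129, 141) are undefined in cp1252
        decomposed = unicodedata.normalize("NFD", ch)
        # Accented letter = base letter followed by one or more combining marks.
        if len(decomposed) > 1 and all(unicodedata.combining(c) for c in decomposed[1:]):
            found.add(ch)
    return found


def names_with_accents(names: list[str]) -> list[str]:
    """Return the names that still contain any of those accented letters."""
    accented = ansi_accented_chars()
    return [name for name in names if any(ch in accented for ch in name)]


if __name__ == "__main__":
    print(names_with_accents(["MULLER GMBH", "SØRENSEN AS", "CAFÉ SA"]))
    # ['CAFÉ SA']  (Ø has no canonical decomposition, so it is not counted)
```

Letters such as "Ø" or "Æ" fall outside this definition; in this pipeline they would instead surface in the special-character check.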

Impact

The number of unique names decreases from 440,206 to 438,366, an additional reduction of 1,840 names and a cumulative reduction of 5,356 names (1.2%).
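The figures above can be cross-checked with a few lines of arithmetic. The count before any cleaning (443,722) is derived here as 438,366 + 5,356, and the 1.2% is assumed to be relative to that initial count; neither value is stated explicitly in this section (using 438,366 as the base would also round to 1.2%).

```python
# Figures from the Impact paragraph.
BEFORE_STEP = 440_206    # unique names before accent replacement
AFTER_STEP = 438_366     # unique names after accent replacement
TOTAL_REDUCTION = 5_356  # cumulative reduction reported

step_reduction = BEFORE_STEP - AFTER_STEP
# Derived (assumption): unique names before any cleaning step.
initial_count = AFTER_STEP + TOTAL_REDUCTION
total_pct = 100 * TOTAL_REDUCTION / initial_count

print(f"step reduction:  {step_reduction:,}")   # step reduction:  1,840
print(f"initial count:   {initial_count:,}")    # initial count:   443,722
print(f"total reduction: {total_pct:.1f}%")     # total reduction: 1.2%
```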

1.1.5 Check for special characters<br />

Description

After replacement of SGML-coded characters, proprietary coded characters and accented characters, no special characters should remain. 'Special' refers to a character that is not expected in a name because it is not a letter, a digit, or a regular punctuation character.

Analysis

Special characters are identified by querying the data for characters that are not part of the following set of letters, digits and punctuation characters: A-Z; 0-9; “-”; “+”; “'”; “"”; “#”; “*”; “@”; “!”; “?”; “/”; “&”; “(”; “)”; “:”; “;”; “,”; “.”; “ ” (space).
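The search for characters outside the allowed set can be expressed as a single negated character class. This is a sketch in Python rather than the query actually used; it assumes names are upper-case (only A-Z is allowed) and that the apostrophe and double quote in the list are the plain ASCII characters ' and ".

```python
import re

# Everything outside A-Z, 0-9 and the listed punctuation is "special".
# Assumptions: names are upper-case, and the apostrophe and double quote
# in the list are the plain ASCII characters ' and ".
SPECIAL = re.compile(r"""[^A-Z0-9\-+'"#*@!?/&():;,. ]""")


def find_special(names: list[str]) -> list[tuple[str, list[str]]]:
    """Return each name containing special characters, with those characters."""
    hits = []
    for name in names:
        chars = sorted(set(SPECIAL.findall(name)))
        if chars:
            hits.append((name, chars))
    return hits


if __name__ == "__main__":
    sample = ["ACME (UK) LTD.", "SMITH & SONS", "FOO~BAR", "X_Y CORP"]
    print(find_special(sample))
    # [('FOO~BAR', ['~']), ('X_Y CORP', ['_'])]
```

Returning the offending characters alongside each name makes it easy to judge, as was done for the 63 flagged names, whether any of them matter for harmonization.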

63 names were found to contain special characters, but none of them is problematic for the harmonization (in any case, non-alphanumeric characters and spaces are removed in a later step).

1.2 Punctuation cleaning (pre-parsing)<br />

1.2.1 Replace double spaces<br />

Description

Double spaces should be replaced with a single space.

Analysis

Double spaces are identified by querying the data for names matching the pattern “%  %” (two consecutive spaces between the wildcards).

1,781 names were found to contain double spaces.
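The detection query amounts to a `LIKE` filter with two spaces between the wildcards. The sketch below runs it on SQLite through Python's `sqlite3` module; the table `patentee` and column `name` are hypothetical names, not taken from the study.

```python
import sqlite3


def count_double_spaces(conn: sqlite3.Connection) -> int:
    """Count names containing two consecutive spaces (LIKE pattern '%  %')."""
    (count,) = conn.execute(
        "SELECT COUNT(*) FROM patentee WHERE name LIKE '%  %'"
    ).fetchone()
    return count


if __name__ == "__main__":
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE patentee (name TEXT)")
    conn.executemany(
        "INSERT INTO patentee (name) VALUES (?)",
        [("ACME  CORP",), ("ACME CORP",), ("BETA   LTD",)],
    )
    print(count_double_spaces(conn))  # 2
```

A run of three or more spaces also matches the pattern, so such names are counted once.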

Implementation

All occurrences of double spaces are replaced with a single space by executing an update query on the data.
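A sketch of such an update query, again on SQLite via Python's `sqlite3` with a hypothetical `patentee` table and `name` column. One design point worth noting: a single `replace(name, '  ', ' ')` pass turns three consecutive spaces into two, so this sketch repeats the update until no row matches. Whether the original procedure did the same is not stated here.

```python
import sqlite3


def collapse_double_spaces(conn: sqlite3.Connection) -> int:
    """Replace double spaces with single ones until none remain.

    Returns the number of distinct names that were changed.
    """
    affected = {
        rowid
        for (rowid,) in conn.execute("SELECT rowid FROM patentee WHERE name LIKE '%  %'")
    }
    while True:
        # One pass turns "   " (three spaces) into "  ", so repeat until stable.
        cur = conn.execute(
            "UPDATE patentee SET name = replace(name, '  ', ' ') WHERE name LIKE '%  %'"
        )
        if cur.rowcount == 0:
            break
    conn.commit()
    return len(affected)


if __name__ == "__main__":
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE patentee (name TEXT)")
    conn.executemany(
        "INSERT INTO patentee (name) VALUES (?)",
        [("ACME  CORP",), ("BETA   LTD",), ("GAMMA SA",)],
    )
    print(collapse_double_spaces(conn))  # 2
```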

Result

Double spaces have been replaced in 1,781 names.

No double spaces remain in the names.
