Patentee Name Harmonisation - ecoom.be
Patentee Name Harmonisation - ecoom.be
Patentee Name Harmonisation - ecoom.be
Create successful ePaper yourself
Turn your PDF publications into a flip-book with our unique Google optimized e-Paper software.
1.1.4 Replace accented characters<br />
Description<br />
Some foreign languages such as French and German use accented characters like “é” or “ü”.<br />
However, these accented characters are not always used in the spelling of names and,<br />
therefore, should <strong>be</strong> replaced with their unaccented equivalents.<br />
For some characters in some languages, replacement of accented characters with unaccented<br />
equivalents might not <strong>be</strong> straightforward <strong>be</strong>cause of alternative spelling. For example,<br />
characters in German (and in other languages such as Hungarian) with a diacritic mark<br />
(’umlaut’: “ä”, “ö”, “ü”) have an alternative spelling with an additional “e”; therefore, the real<br />
equivalent of “wärme” is “waerme”, the real equivalent of “förderung” is “foerderung” and the<br />
real equivalent of “für” is “fuer”. However, in practice, the simplified spelling can also <strong>be</strong> found,<br />
so both “für”, “fuer” and “fur” can appear. This raises the question whether the German “ä”, “ö”<br />
and “ü” should <strong>be</strong> replaced by “a”, “o” and “u” or “ae”, “oe” and “ue” respectively.<br />
As the simplified spelling (without the additional “e”) already appears in the original names, in<br />
addition to the accented spelling and the real equivalent with the additional “e”, all accented<br />
characters are replaced with their simple underlying equivalent without an umlaut and without<br />
an additional “e”. This means that, as a result of this cleaning step, “für” will <strong>be</strong> harmonized<br />
with “fur”, but “fuer” will not <strong>be</strong> harmonized with “fur” or “für”.<br />
A separate step – Umlaut harmonization - will <strong>be</strong> used later on to harmonize language-specific<br />
spelling variations of accented characters.<br />
Analysis<br />
All kinds of accented characters can appear in different languages. An exhaustive list of all<br />
possible accented characters in all languages has not <strong>be</strong>en used. Instead, this step focuses on<br />
those characters that can <strong>be</strong> represented as a single character, as defined by the standard<br />
ASCII/ANSI character code page. All other possible accented characters cannot always <strong>be</strong><br />
represented correctly by all software, and are mostly coded (see above, SGML and proprietary<br />
coded characters).<br />
Table 9 contains the accented characters that can <strong>be</strong> found in the ASCII/ANSI character code<br />
page and their unaccented variant (only uppercase characters are taken into account, as the<br />
dataset is supposed only to contain uppercase characters).<br />
Table 9: Accented characters and their unaccented equivalent<br />
CODE CHARACTER UNACCENTED<br />
EQUIVALENT<br />
192 "À" "A"<br />
193 "Á" "A"<br />
194 "Â" "A"<br />
195 "Ã" "A"<br />
196 "Ä" "A"<br />
197 "Å" "A"<br />
198 "Æ" "AE"<br />
199 "Ç" "C"<br />
200 "È" "E"<br />
201 "É" "E"<br />
202 "Ê" "E"<br />
203 "Ë" "E"<br />
204 "Ì" "I"<br />
205 "Í" "I"<br />
206 "Î" "I"<br />
207 "Ï" "I"<br />
209 "Ñ" "N"<br />
210 "Ò" "O"<br />
211 "Ó" "O"<br />
212 "Ô" "O"<br />
213 "Õ" "O"<br />
214 "Ö" "O"<br />
217 "Ù" "U"<br />
218 "Ú" "U"<br />
219 "Û" "U"<br />
21