Patentee Name Harmonisation - ecoom.be
1.1.3 Replace proprietary coded characters<br />
Description<br />
In addition to SGML character coding, data suppliers can use other proprietary character coding<br />
to represent special characters.<br />
For USPTO data, codes like “{UMLAUT OVER (A)}” and “{DOT OVER (E)}” can be found. These<br />
coded characters should be replaced with their normal ASCII/ANSI equivalents whenever<br />
possible.<br />
Analysis<br />
Proprietary coded characters are identified by querying the data for the following patterns:<br />
“%{%}%”, “%[%]%” and “%(%)%”.<br />
Not every query result is a proprietary coded character, but the query results can be used<br />
to identify all proprietary coded characters that occur in the data.<br />
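The pattern search described above can be sketched in Python as follows. This is only an illustration, not the queries used in the study: the regular expressions are rough counterparts of the three patterns, and the function names, the `CODE_SHAPE` expression (which assumes upper-case codes shaped like “{UMLAUT OVER (A)}”) and the sample names are all hypothetical.<br />

```python
import re
from collections import Counter

# Rough regex counterparts of the patterns "%{%}%", "%[%]%" and "%(%)%":
# an opening bracket followed, somewhere later in the name, by its closing bracket.
BRACKET_PATTERNS = [
    re.compile(r"\{.*\}"),
    re.compile(r"\[.*\]"),
    re.compile(r"\(.*\)"),
]

# Assumed shape of the USPTO-style codes quoted in the text,
# e.g. "{UMLAUT OVER (A)}" or "{OVERSCORE (E)}".
CODE_SHAPE = re.compile(r"\{[A-Z ]+\([A-Z]\)\}")


def candidate_names(names):
    """Return the names that match at least one bracket pattern."""
    return [n for n in names if any(p.search(n) for p in BRACKET_PATTERNS)]


def count_codes(names):
    """Count every distinct brace-delimited code found in the names."""
    counts = Counter()
    for name in names:
        counts.update(CODE_SHAPE.findall(name))
    return counts


# Invented sample names, for illustration only.
sample = [
    "M{UMLAUT OVER (U)}LLER GMBH",
    "ACME (HOLDINGS) LTD",
    "SOCI{ACUTE OVER (E)}T{ACUTE OVER (E)} X",
    "PLAIN NAME INC",
]
print(candidate_names(sample))
print(count_codes(sample))
```

In this sample, “ACME (HOLDINGS) LTD” is returned as a candidate although it contains no coded character, which is why the candidates still have to be reviewed before a replacement table is drawn up.<br />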
Table 8 contains the proprietary coded characters that were found in the names.<br />
Table 8: Proprietary character codes and their ASCII/ANSI equivalent<br />
PROPRIETARY CODED CHARACTER    REPLACEMENT CHARACTER<br />
"{UMLAUT OVER (A)}" “Ä”<br />
"{UMLAUT OVER (E)}" “Ë”<br />
"{UMLAUT OVER (O)}" “Ö”<br />
"{UMLAUT OVER (U)}" “Ü”<br />
"{UMLAUT OVER (N)}" “N”<br />
"{UMLAUT OVER (R)}" “R”<br />
"{UMLAUT OVER (Z)}" “Z”<br />
"{ACUTE OVER (A)}" “Á”<br />
"{ACUTE OVER (E)}" “É”<br />
"{ACUTE OVER (T)}" “T”<br />
"{ACUTE OVER (V)}" “V”<br />
"{GRAVE OVER (B)}" “B”<br />
"{GRAVE OVER (R)}" “R”<br />
"{OVERSCORE (A)}" “A”<br />
"{OVERSCORE (D)}" “D”<br />
"{OVERSCORE (E)}" “E”<br />
"{OVERSCORE (O)}" “O”<br />
"{OVERSCORE (U)}" “U”<br />
"{DOT OVER (A)}" “A”<br />
"{DOT OVER (E)}" “E”<br />
"{DOT OVER (U)}" “U”<br />
"{HAECK OVER (C)}" “C”<br />
"{HAECK OVER (S)}" “S”<br />
Implementation<br />
All occurrences of proprietary coded characters are replaced with their respective ASCII/ANSI<br />
equivalent, as defined in Table 8, by executing several update queries on the data.<br />
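As an illustration of this replacement step, the sketch below applies the Table 8 mapping to a list of names in memory, using Python rather than the update queries mentioned above. The names `REPLACEMENTS`, `replace_proprietary_codes` and `harmonise` are hypothetical, and the sketch assumes the codes appear in the data exactly as written in Table 8 (upper case, same spacing).<br />

```python
import re

# Mapping copied from Table 8: proprietary code -> replacement character.
REPLACEMENTS = {
    "{UMLAUT OVER (A)}": "Ä",
    "{UMLAUT OVER (E)}": "Ë",
    "{UMLAUT OVER (O)}": "Ö",
    "{UMLAUT OVER (U)}": "Ü",
    "{UMLAUT OVER (N)}": "N",
    "{UMLAUT OVER (R)}": "R",
    "{UMLAUT OVER (Z)}": "Z",
    "{ACUTE OVER (A)}": "Á",
    "{ACUTE OVER (E)}": "É",
    "{ACUTE OVER (T)}": "T",
    "{ACUTE OVER (V)}": "V",
    "{GRAVE OVER (B)}": "B",
    "{GRAVE OVER (R)}": "R",
    "{OVERSCORE (A)}": "A",
    "{OVERSCORE (D)}": "D",
    "{OVERSCORE (E)}": "E",
    "{OVERSCORE (O)}": "O",
    "{OVERSCORE (U)}": "U",
    "{DOT OVER (A)}": "A",
    "{DOT OVER (E)}": "E",
    "{DOT OVER (U)}": "U",
    "{HAECK OVER (C)}": "C",
    "{HAECK OVER (S)}": "S",
}

# One alternation over all codes, escaped so braces and parentheses are literal.
_CODE_RE = re.compile("|".join(re.escape(code) for code in REPLACEMENTS))


def replace_proprietary_codes(name):
    """Replace every Table 8 code in a single name with its equivalent."""
    return _CODE_RE.sub(lambda m: REPLACEMENTS[m.group(0)], name)


def harmonise(names):
    """Return the cleaned names, the number of names changed, and the unique count."""
    cleaned = [replace_proprietary_codes(n) for n in names]
    changed = sum(1 for old, new in zip(names, cleaned) if old != new)
    return cleaned, changed, len(set(cleaned))
```

Counting the changed names and the remaining unique names in the same pass gives the kind of figures reported under Results and Impact. Codes that are missing from the mapping are left untouched.<br />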
Results<br />
Proprietary character codes have been replaced with their ASCII/ANSI equivalent in 62 names.<br />
The possibility cannot be ruled out that other proprietary character codes are still present in the<br />
names.<br />
Impact<br />
The number of unique names drops from 440,237 to 440,206, an additional reduction of 31 names,<br />
which brings the total reduction to 3,516 names (0.8%).<br />