06.06.2013 Views

Patentee Name Harmonisation - ecoom.be


1.1.3 Replace proprietary coded characters

Description

In addition to SGML character coding, data suppliers can use other proprietary codings to represent special characters. USPTO data, for example, contains codes such as “{UMLAUT OVER (A)}” and “{DOT OVER (E)}”. These coded characters should be replaced with their normal ASCII/ANSI equivalents whenever possible.

Analysis

Proprietary coded characters are identified by querying the data for the patterns “%{%}%”, “%[%]%” and “%(%)%” (with “%” acting as a wildcard), i.e. any name that contains an opening brace, square bracket or parenthesis followed somewhere later by its closing counterpart.

Not every match is a proprietary coded character, but the query results can be used to identify all proprietary coded characters that occur.
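The search described above can be sketched in Python as follows. This is only an illustration of the pattern matching, not the queries used in the original work: the function names and the sample names are hypothetical, and the token extraction is restricted to brace-delimited codes as an assumption, since those are the ones shown in the USPTO examples.

```python
import re
from collections import Counter

# Opener/closer pairs behind the three query patterns.
PAIRS = [("{", "}"), ("[", "]"), ("(", ")")]

# A brace-delimited token such as "{UMLAUT OVER (A)}".
BRACE_TOKEN = re.compile(r"\{[^{}]*\}")


def is_candidate(name):
    """True if the name has an opener followed later by its closer."""
    for opener, closer in PAIRS:
        start = name.find(opener)
        if start != -1 and name.find(closer, start + 1) != -1:
            return True
    return False


def brace_tokens(names):
    """Count each distinct brace-delimited token across the names."""
    counts = Counter()
    for name in names:
        counts.update(BRACE_TOKEN.findall(name))
    return counts


# Hypothetical sample names, not taken from the source data.
sample_names = [
    "M{UMLAUT OVER (U)}LLER GMBH",
    "SOCI{ACUTE OVER (E)}T{ACUTE OVER (E)} EXEMPLE",
    "ACME (HOLDINGS) LTD",
    "PLAIN NAME INC",
]

candidates = [n for n in sample_names if is_candidate(n)]
tokens = brace_tokens(sample_names)
```

In this sample, "ACME (HOLDINGS) LTD" is returned as a candidate although it contains no coded character, which is the kind of false positive that has to be screened out by inspecting the results.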

Table 8 contains the proprietary coded characters that were found in the names.

Table 8: Proprietary character codes and their ASCII/ANSI equivalent

PROPRIETARY CODED CHARACTER    REPLACEMENT CHARACTER
"{UMLAUT OVER (A)}"            “Ä”
"{UMLAUT OVER (E)}"            “Ë”
"{UMLAUT OVER (O)}"            “Ö”
"{UMLAUT OVER (U)}"            “Ü”
"{UMLAUT OVER (N)}"            “N”
"{UMLAUT OVER (R)}"            “R”
"{UMLAUT OVER (Z)}"            “Z”
"{ACUTE OVER (A)}"             “Á”
"{ACUTE OVER (E)}"             “É”
"{ACUTE OVER (T)}"             “T”
"{ACUTE OVER (V)}"             “V”
"{GRAVE OVER (B)}"             “B”
"{GRAVE OVER (R)}"             “R”
"{OVERSCORE (A)}"              “A”
"{OVERSCORE (D)}"              “D”
"{OVERSCORE (E)}"              “E”
"{OVERSCORE (O)}"              “O”
"{OVERSCORE (U)}"              “U”
"{DOT OVER (A)}"               “A”
"{DOT OVER (E)}"               “E”
"{DOT OVER (U)}"               “U”
"{HAECK OVER (C)}"             “C”
"{HAECK OVER (S)}"             “S”

Implementation

All occurrences of proprietary coded characters are replaced with their respective ASCII/ANSI equivalent, as defined in Table 8, by executing several update queries on the data.
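A minimal sketch of such update queries, using Python's built-in sqlite3 module, might look like this. The database engine, the table name `patentee`, the column name `name` and the sample rows are all assumptions made for illustration; the source does not state which database or schema was used. The mapping itself is transcribed from Table 8.

```python
import sqlite3

# Table 8, transcribed as proprietary code -> replacement character.
TABLE_8 = {
    "{UMLAUT OVER (A)}": "Ä", "{UMLAUT OVER (E)}": "Ë",
    "{UMLAUT OVER (O)}": "Ö", "{UMLAUT OVER (U)}": "Ü",
    "{UMLAUT OVER (N)}": "N", "{UMLAUT OVER (R)}": "R",
    "{UMLAUT OVER (Z)}": "Z",
    "{ACUTE OVER (A)}": "Á", "{ACUTE OVER (E)}": "É",
    "{ACUTE OVER (T)}": "T", "{ACUTE OVER (V)}": "V",
    "{GRAVE OVER (B)}": "B", "{GRAVE OVER (R)}": "R",
    "{OVERSCORE (A)}": "A", "{OVERSCORE (D)}": "D",
    "{OVERSCORE (E)}": "E", "{OVERSCORE (O)}": "O",
    "{OVERSCORE (U)}": "U",
    "{DOT OVER (A)}": "A", "{DOT OVER (E)}": "E",
    "{DOT OVER (U)}": "U",
    "{HAECK OVER (C)}": "C", "{HAECK OVER (S)}": "S",
}


def replace_coded_characters(conn):
    """Run one UPDATE per Table 8 entry.

    Returns the number of row updates performed; a row containing two
    different codes is counted once for each code.
    """
    updates = 0
    for code, replacement in TABLE_8.items():
        cur = conn.execute(
            "UPDATE patentee SET name = REPLACE(name, ?, ?) "
            "WHERE INSTR(name, ?) > 0",
            (code, replacement, code),
        )
        updates += cur.rowcount
    return updates


# Hypothetical in-memory table with made-up names.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE patentee (name TEXT)")
conn.executemany(
    "INSERT INTO patentee VALUES (?)",
    [
        ("M{UMLAUT OVER (U)}LLER GMBH",),
        ("SOCI{ACUTE OVER (E)}T{ACUTE OVER (E)} EXEMPLE",),
        ("ACME (HOLDINGS) LTD",),
    ],
)

updates = replace_coded_characters(conn)
cleaned = [row[0] for row in conn.execute("SELECT name FROM patentee ORDER BY rowid")]
```

Because each query replaces one exact code string, names with ordinary parentheses, such as the third sample row, are left unchanged.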

Results

Proprietary character codes have been replaced with their ASCII/ANSI equivalent in 62 names. Other proprietary character codes may still be present in the names.

Impact

The number of unique names drops from 440,237 to 440,206, an additional reduction of 31 names, which brings the total reduction to 3,516 names (0.8%).
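The figures above can be cross-checked with a few lines of arithmetic. Note that the initial number of unique names is not stated in this section; the sketch infers it as the current count plus the total reduction, which is an assumption about how the percentage was computed.

```python
before_step = 440_237      # unique names before this step
after_step = 440_206       # unique names after this step
total_reduction = 3_516    # cumulative reduction reported so far

# Reduction achieved by this step alone.
step_reduction = before_step - after_step

# Inferred starting point (assumption: not given in this section).
initial_names = after_step + total_reduction

# Cumulative reduction as a percentage of the inferred starting point.
total_pct = round(100 * total_reduction / initial_names, 1)
```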
