Patentee Name Harmonisation - ecoom.be

No other HTML codes are present in the names.

Impact
From 443,722 unique names to 442,795 unique names, a reduction of 927 names (0.2%).

1.1.2 Replace SGML coded characters

Description
SGML coded characters such as “&AMP;” or “&OACUTE;” should be replaced with their normal ASCII/ANSI equivalent whenever possible.

Analysis
SGML coded characters are identified by querying the data for the following pattern: “%&%;%”. Not every match is a real SGML coded character. For example, “HITACHI ENGINEERING & SERVICES CO; LTD.” matches the pattern although no SGML coded character is involved. The query result can nevertheless be used to identify all SGML coded characters that occur. Table 7 lists the SGML coded characters that were found in the names.

Table 7: SGML codes and their ASCII/ANSI equivalent

SGML CODE    REPLACEMENT CHARACTER
&AMP;        &
&OACUTE;     Ó
&SECT;       §
&UACUTE;     Ú
⋆            (replace with space)
&BULL;       .
&EXCL;       !

Implementation
All occurrences of SGML coded characters are replaced with their respective ASCII/ANSI equivalent, as defined in Table 7, by executing several update queries on the data.

The order of replacement is important, especially for the “&AMP;” SGML code. Every SGML code starts with an ampersand, and sometimes this leading ampersand is itself represented by the SGML code “&AMP;”. For example, the code “&AMP;EXCL;” might appear. This is in fact the SGML code for an exclamation mark (“&EXCL;”) with its first ampersand also coded as an SGML character. Such codes are converted correctly if the “&AMP;” code is first replaced with “&”, giving “&EXCL;”, which can then be replaced with “!”. The “&AMP;” code must therefore always be replaced before any other code.

Because replacement with a space can leave leading or trailing spaces, names must be checked and trimmed of leading and trailing spaces after the SGML coded characters have been replaced.
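
The analysis and replacement order described above can be sketched in Python. This is only an illustration, not the update queries used in the report: the names `SGML_REPLACEMENTS`, `is_candidate`, `find_codes` and `replace_sgml` are hypothetical, the regular expressions are an assumed translation of the “%&%;%” pattern, and the “⋆” entry is copied as it appears in this copy of Table 7.

```python
import re

# Replacement table from Table 7. "&AMP;" comes first on purpose, so that
# double-coded entities such as "&AMP;EXCL;" resolve correctly.
SGML_REPLACEMENTS = [
    ("&AMP;", "&"),
    ("&OACUTE;", "Ó"),
    ("&SECT;", "§"),
    ("&UACUTE;", "Ú"),
    ("⋆", " "),  # shown as this character in this copy of Table 7
    ("&BULL;", "."),
    ("&EXCL;", "!"),
]

# Assumed regex equivalent of the pattern "%&%;%": an ampersand followed,
# anywhere later in the name, by a semicolon.
CANDIDATE = re.compile(r"&.*;")

# Assumed stricter pattern for a real code: "&", capital letters, ";".
CODE = re.compile(r"&[A-Z]+;")


def is_candidate(name: str) -> bool:
    """True if the name matches the broad pattern (may be a false positive)."""
    return CANDIDATE.search(name) is not None


def find_codes(name: str) -> list[str]:
    """Return the substrings that look like actual SGML codes."""
    return CODE.findall(name)


def replace_sgml(name: str) -> str:
    """Apply the Table 7 replacements in order, then trim the result."""
    for code, replacement in SGML_REPLACEMENTS:
        name = name.replace(code, replacement)
    # Replacing a code with a space can leave leading or trailing spaces.
    return name.strip()
```

With this ordering, `replace_sgml("FOO &AMP;EXCL; BAR")` first becomes “FOO &EXCL; BAR” and then “FOO ! BAR”. If “&EXCL;” were handled before “&AMP;”, the same input would end up as “FOO &EXCL; BAR” with the code left unresolved.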

Results
SGML coded characters have been replaced by their ASCII/ANSI equivalent in 12,430 names. No other SGML coded characters are present in the names.

Impact
From 442,795 unique names to 440,237 unique names, an additional reduction of 2,558 names, or a total reduction of 3,485 names (0.8%).
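
The impact figures can be checked with a few lines of Python. The sketch assumes that the percentages are taken relative to the 443,722 unique names reported at the start of the preceding step, which reproduces the reported values; the function name `reduction` is hypothetical.

```python
def reduction(before: int, after: int) -> tuple[int, float]:
    """Return the number of names removed and the percentage of `before`."""
    removed = before - after
    return removed, round(100 * removed / before, 1)


# Unique-name counts quoted in the Impact paragraphs.
baseline = 443_722    # before the preceding step (assumed baseline)
after_html = 442_795  # after the preceding step
after_sgml = 440_237  # after SGML code replacement

step_html = reduction(baseline, after_html)     # (927, 0.2)
additional_sgml = after_html - after_sgml       # 2558
cumulative = reduction(baseline, after_sgml)    # (3485, 0.8)
```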

1.1.3 Replace proprietary coded characters

Description
In addition to SGML character coding, data suppliers can use other, proprietary character coding to represent special characters. In USPTO data, codes like “{UMLAUT OVER (A)}” and “{DOT OVER (E)}” can be found. These coded characters should be replaced with their normal ASCII/ANSI equivalents whenever possible.

Analysis
Proprietary coded characters are identified by querying the data for the following patterns: “%{%}%”, “%[%]%” and “%(%)%”. Not every match is a proprietary coded character, but the query result can be used to identify all proprietary coded characters that occur. Table 8 lists the proprietary coded characters that were found in the names.

Table 8: Proprietary character codes and their ASCII/ANSI equivalent

PROPRIETARY CODED CHARACTER    REPLACEMENT CHARACTER
"{UMLAUT OVER (A)}"            “Ä”
"{UMLAUT OVER (E)}"            “Ë”
"{UMLAUT OVER (O)}"            “Ö”
"{UMLAUT OVER (U)}"            “Ü”
"{UMLAUT OVER (N)}"            “N”
"{UMLAUT OVER (R)}"            “R”
"{UMLAUT OVER (Z)}"            “Z”
"{ACUTE OVER (A)}"             “Á”
"{ACUTE OVER (E)}"             “É”
"{ACUTE OVER (T)}"             “T”
"{ACUTE OVER (V)}"             “V”
"{GRAVE OVER (B)}"             “B”
"{GRAVE OVER (R)}"             “R”
"{OVERSCORE (A)}"              “A”
"{OVERSCORE (D)}"              “D”
"{OVERSCORE (E)}"              “E”
"{OVERSCORE (O)}"              “O”
"{OVERSCORE (U)}"              “U”
"{DOT OVER (A)}"               “A”
"{DOT OVER (E)}"               “E”
"{DOT OVER (U)}"               “U”
"{HAECK OVER (C)}"             “C”
"{HAECK OVER (S)}"             “S”

Implementation
All occurrences of proprietary coded characters are replaced with their respective ASCII/ANSI equivalent, as defined in Table 8, by executing several update queries on the data.

Results
Proprietary character codes have been replaced with their ASCII/ANSI equivalent in 62 names. It cannot be ruled out that other proprietary character codes are still present in the names.

Impact
From 440,237 unique names to 440,206 unique names, an additional reduction of 31 names, or a total reduction of 3,516 names (0.8%).
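
As an illustration of the Table 8 replacement, the following Python sketch uses a single regular-expression pass instead of the update queries mentioned in the report. The names `PROPRIETARY_REPLACEMENTS`, `CODE_PATTERN`, `replace_proprietary` and `unknown_codes` are hypothetical, and the regular expression is an assumption about the shape of the curly-brace codes. Codes that are not in Table 8 are left untouched and can be listed for manual review, since further codes may still be present in the names.

```python
import re

# Replacement table from Table 8 (proprietary code -> replacement).
PROPRIETARY_REPLACEMENTS = {
    "{UMLAUT OVER (A)}": "Ä", "{UMLAUT OVER (E)}": "Ë",
    "{UMLAUT OVER (O)}": "Ö", "{UMLAUT OVER (U)}": "Ü",
    "{UMLAUT OVER (N)}": "N", "{UMLAUT OVER (R)}": "R",
    "{UMLAUT OVER (Z)}": "Z",
    "{ACUTE OVER (A)}": "Á", "{ACUTE OVER (E)}": "É",
    "{ACUTE OVER (T)}": "T", "{ACUTE OVER (V)}": "V",
    "{GRAVE OVER (B)}": "B", "{GRAVE OVER (R)}": "R",
    "{OVERSCORE (A)}": "A", "{OVERSCORE (D)}": "D",
    "{OVERSCORE (E)}": "E", "{OVERSCORE (O)}": "O",
    "{OVERSCORE (U)}": "U",
    "{DOT OVER (A)}": "A", "{DOT OVER (E)}": "E",
    "{DOT OVER (U)}": "U",
    "{HAECK OVER (C)}": "C", "{HAECK OVER (S)}": "S",
}

# Assumed shape of a code: "{", capital words, "(", one capital letter, ")}".
CODE_PATTERN = re.compile(r"\{[A-Z ]+\([A-Z]\)\}")


def replace_proprietary(name: str) -> str:
    """Replace every code listed in Table 8; leave unknown codes as they are."""
    return CODE_PATTERN.sub(
        lambda match: PROPRIETARY_REPLACEMENTS.get(match.group(0), match.group(0)),
        name,
    )


def unknown_codes(name: str) -> list[str]:
    """List code-like substrings that Table 8 does not cover."""
    return [
        code for code in CODE_PATTERN.findall(name)
        if code not in PROPRIETARY_REPLACEMENTS
    ]
```

For example, `replace_proprietary("M{UMLAUT OVER (U)}LLER GMBH")` gives “MÜLLER GMBH”, while a made-up code such as “{TILDE OVER (N)}” would be returned by `unknown_codes` and left in the name.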
