06.06.2013 Views

Patentee Name Harmonisation - ecoom.be


2 NAME CLEANING

2.1 Legal form indication treatment

Description

A lot of organization names contain some kind of legal form indication (e.g. “INC.”, “LIMITED”, “LTD.”). These legal form indications cause considerable name variation because of abbreviations, spelling variations, and legal form variations of names.

The name of a company can mostly be separated from the legal form without changing the real company name, although there are some exceptions where the legal form really is part of the name (see below). Moving and harmonizing legal forms to a separate field can greatly reduce the number of name variations.

The idea is to end up with the real name, with non-relevant legal form indications removed. It is not the intention to mutilate the organization name; the name still has to be complete and comprehensible. Whenever the legal form is part of the name, the legal form will not be removed.

For example, “S.A.B.C.A.” or “SABCA” stands for “Société Anonyme Belge de Constructions Aéronautiques”. “Société Anonyme” or “SA” is a legal form indication, but removing it from the name would leave “BCA” or “Belge de Constructions Aéronautiques”, making the name hard to recognize.

This also means that if there is any doubt whether part of the name is a legal form indication, the name should be left unchanged (some parts of a name can accidentally coincide with variations or abbreviations of a legal form).

The legal form indications are not completely deleted. They are removed from the name but, at the same time, the harmonized legal form is transferred to a different field. This gives the end user the opportunity to decide whether two names that are identical except for the legal form should be considered the same entity.

For example, “IBM AG”, “IBM INCORPORATED” and “IBM INC” will all be harmonized to “IBM” but, in a separate field, the first name will still be labeled as “AG” while the other two names will be labeled as “INCORPORATED”. This leaves the choice to the user: query on the harmonized name only (combining all three names), or query on the combination of the harmonized name and the harmonized legal form (again splitting up the result between “IBM AG”, on the one hand, and “IBM INCORPORATED”, covering both “IBM INCORPORATED” and “IBM INC”, on the other).
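The separation of name and legal form described above can be sketched as follows. This is a minimal illustration in Python, not the procedure actually used for the harmonization: the `LEGAL_FORMS` mapping and the function name `split_legal_form` are hypothetical, the mapping contains only the variants mentioned in the examples, and only a legal form appearing as the last word is handled. A name consisting of a single word (such as “SABCA”) is left unchanged, in line with the rule that the name must remain complete and comprehensible.

```python
import re

# Hypothetical mapping from legal form variants to a harmonized label.
# Illustrative only; it covers just the variants used in the examples.
LEGAL_FORMS = {
    "INC": "INCORPORATED",
    "INCORPORATED": "INCORPORATED",
    "LTD": "LIMITED",
    "LIMITED": "LIMITED",
    "AG": "AG",
}


def split_legal_form(name):
    """Return (harmonized_name, harmonized_legal_form or None)."""
    words = name.strip().split()
    if len(words) < 2:
        # A single-word name is never reduced, even if it resembles a legal form.
        return name.strip(), None
    # Compare the last word without punctuation, so "INC." matches "INC".
    key = re.sub(r"[^A-Z0-9]", "", words[-1].upper())
    legal_form = LEGAL_FORMS.get(key)
    if legal_form is None:
        # Unknown last word: when in doubt, leave the name unchanged.
        return name.strip(), None
    # The legal form goes to a separate field instead of being discarded.
    return " ".join(words[:-1]), legal_form


for raw in ["IBM AG", "IBM INCORPORATED", "IBM INC", "SABCA"]:
    print(raw, "->", split_legal_form(raw))
```

Keeping the two values in separate fields lets a query use the harmonized name alone or the name together with the legal form.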

Analysis<br />

An official list of legal forms and their official abbreviations for all countries applying for patents can be a starting point for the identification of legal forms, but it is not very useful due to all kinds of variations appearing in the patentee names.

An alternative approach is to index the last word of all organization names and check the most frequently occurring words. All words that are not common English words are potentially legal form indications and can be checked in detail to see if they can be removed.

Table 13 contains the top 50 occurring last words after cleanup, along with the number of names containing the word as a last word, the cumulative number of names for this word and all higher ranked words, and the percentage of the cumulative number of names compared to the total number of names (443,722). Last words are identified on the basis of the last occurrence of a space in a name; then all non-(A-Z) and non-(0-9) characters are removed resulting in a cleaned version of the last word. Appendix 3 contains the list of the top 200 occurring last words.
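The last-word indexing and the columns of such a table can be sketched as follows. This is a minimal Python illustration under stated assumptions, not the original implementation: the function names `cleaned_last_word` and `last_word_table` are hypothetical, names are converted to upper case before cleaning, and a name without any space is treated as being its own last word.

```python
import re
from collections import Counter


def cleaned_last_word(name):
    """Take the text after the last space and keep only A-Z and 0-9."""
    last = name.rsplit(" ", 1)[-1]
    # Assumption: upper-case first, so that only A-Z and 0-9 remain.
    return re.sub(r"[^A-Z0-9]", "", last.upper())


def last_word_table(names, top=50):
    """Rows of (word, count, cumulative count, cumulative % of all names)."""
    counts = Counter(cleaned_last_word(n) for n in names)
    total = len(names)
    rows, cumulative = [], 0
    for word, count in counts.most_common(top):
        cumulative += count
        rows.append((word, count, cumulative, 100.0 * cumulative / total))
    return rows


# Made-up sample names, for illustration only.
sample = ["IBM INC.", "ACME INC", "FOO LTD.", "BAR INC"]
for row in last_word_table(sample):
    print(row)
```

Words near the top of such a ranking that are not common English words are the candidates to inspect manually as possible legal form indications.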

Table 13: Top 50 occurring last words (columns: LAST WORD (CLEANED), NBR OF NAMES, CUM, %)
