21.01.2013 Views

STN Sequence Databases - FIZ Karlsruhe


<strong>Sequence</strong> Searching on <strong>STN</strong> ®<br />

Part I: <strong>STN</strong> <strong>Sequence</strong> <strong>Databases</strong>


Agenda<br />

• <strong>STN</strong> sequence searchable databases<br />

• Comparison to web-based resources<br />

• Results of a comparative search example<br />

• Summary and resources<br />



<strong>STN</strong> sequence searchable databases<br />

• DGENE<br />

– Thomson Reuters GENESEQ™<br />

– Value-added patent sequence data from around the globe<br />

• USGENE®<br />

– The USPTO Genetic <strong>Sequence</strong> Database<br />

– A new and unique access point to USPTO sequence data<br />


<strong>STN</strong> sequence searchable databases (cont.)<br />

• PCTGEN<br />

– WIPO/PCT Patent Application Biosequences<br />

– The complete collection of e-published sequences from WIPO<br />

• CAS REGISTRY℠<br />

– Chemical Abstracts Service (CAS) REGISTRY<br />

– Worldwide value-added patent and non-patent sequence data<br />


Thomson Reuters GENESEQ (DGENE)<br />

• Largest value-added patent sequence database<br />

• Used routinely by all major patent offices*<br />

• <strong>Sequence</strong>s from the basic patents of the 43 authorities of the Derwent World Patents Index®<br />

• Bibliography, enhanced title, abstract, indexing, and patent location provided for each sequence<br />

• Patent Family and Legal Status display<br />

• Updated every two weeks<br />

• 1981 – present<br />

* See page 11: www.trilateral.net/projects/biotechnology/guide2.pdf<br />



Relationship between DWPI patent family<br />

and DGENE sequence database<br />

[Diagram: a WPINDEX patent family record and the DGENE sequence records derived from its WO basic publication]<br />

WPINDEX family record:<br />
AN .... WPINDEX<br />
TI ….<br />
PA ….<br />
PI WO …. A1<br />
FR …. A1<br />
US …. A1<br />
US …. B2<br />
AB ….<br />

DGENE sequence records:<br />
AN .... Protein DGENE<br />
PI WO …. A1<br />
SEQ 1 ….<br />
AB ……<br />

AN .... DNA DGENE<br />
PI WO …. A1<br />
SEQ 2 ….<br />
AB ……<br />

AN .... Peptide DGENE<br />
PI WO …. A1<br />
SEQ n ….<br />
AB ……<br />

WPINDEX = Derwent World Patents Index (DWPI℠) on <strong>STN</strong><br />

DGENE = GENESEQ on <strong>STN</strong><br />


What exactly is the “value-add” in DGENE?<br />

• DWPI patent title, concise sequence description, abstract, and keyword indexing per sequence<br />

– Context of each sequence illuminated within the invention<br />

– Superior text-based refinement of sequence searches<br />

– Efficient scanning and review of search results for relevance<br />

• Feature tables for sequence modifications/annotations<br />

– Extensive detailed annotations provided by indexers<br />


What exactly is the “value-add” in DGENE? (cont.)<br />

• Patent sequence location (claim, example, etc.)<br />

– Assigned manually by Thomson Reuters indexers<br />

– Flexible filtering of searches to sequences described in the claims<br />

• <strong>Sequence</strong>s intellectually derived by indexers<br />

– Unique sequence hits not disclosed in formal listings<br />


Some editorial insights regarding<br />

WIPO/PCT sequences indexed in DGENE<br />

• On average, 120 WIPO/PCT basic patents have sequences indexed into DGENE each week<br />

• Of these, about 15-20 may have electronic listings available; the rest are keyed manually<br />

– <strong>Sequence</strong>s are independently double-keyed with a guaranteed accuracy of 99.995% (1 in 20,000)<br />

• About 15% of PCTs with electronic listings have extra sequences indexed from the specification<br />

• Typically 1 or 2 documents per week will also have intellectually derived sequences indexed, based upon the wording of the patent claims<br />

Source: Colin Williams, GENESEQ Editorial & Content Manager, Thomson Reuters (12/2006)<br />
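
The figures quoted above can be cross-checked with a few lines of arithmetic. The sketch below is illustrative only: it assumes "1 in 20,000" means one keying error per 20,000 residues, and the variable names are hypothetical rather than anything used by Thomson Reuters.

```python
# Figures quoted on the slide (Thomson Reuters, 12/2006).
RESIDUES_PER_ERROR = 20_000       # "1 in 20,000" (assumed to be per residue)
DOCS_PER_WEEK = 120               # WIPO/PCT basics indexed into DGENE per week
ELECTRONIC_RANGE = (15, 20)       # documents that arrive with electronic listings

# Convert the error rate into the accuracy percentage quoted on the slide.
accuracy_pct = (1 - 1 / RESIDUES_PER_ERROR) * 100

# Share of weekly documents that must be keyed manually, for both ends of the range.
manual_share = [
    (DOCS_PER_WEEK - n) / DOCS_PER_WEEK * 100 for n in ELECTRONIC_RANGE
]

print(f"accuracy: {accuracy_pct:.3f}%")
print(f"manually keyed: {min(manual_share):.1f}% to {max(manual_share):.1f}%")
```

Under these assumptions the stated accuracy and error rate agree, and well over four fifths of the weekly documents are keyed by hand.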



Derived sequences are intellectually created by indexers from wording in the patent text<br />

In this example, the indexer has intellectually derived this sequence from the wild type lipolytic hydrolase.<br />

AN AEJ92622 protein DGENE<br />

TI Hydrolyzing/synthesizing carboxylic acid ester/amide from<br />

chiral/prochiral reactants for preparing e.g. pharmaceuticals,<br />

comprises contacting reactants with a polypeptide having hydrolytic<br />

activity.<br />

IN Svendsen A; Vind J; Brask J; De Maria L<br />

PA (NOVO) NOVOZYMES AS.<br />

PI WO 2006084470 A2 20060817 17<br />

AI WO 2006-DK76 20060210<br />

PRAI EP 2005-388012 20050210<br />

PSL Claim 16<br />

DED 19 OCT 2006 (first entry)<br />

LA English<br />

OS 2006-560037 [57]<br />

DESC Variant fungal lipolytic hydrolase #2.<br />

KW hydrolysis; lipase; pharmaceutical; pesticide; enzyme; mutein.<br />

ORGN Thermomyces lanuginosus. Synthetic.<br />

AB The new invention relates to a enzymatic method of hydrolyzing or<br />

synthesizing carboxylic acid ester or amide from chiral or prochiral<br />

reactants, by providing reactants for hydrolysis or synthesis, and<br />

contacting the reactants with a polypeptide which has hydrolytic<br />

activity on ester or amide, and a sequence 50% homologous to<br />

Thermomyces lanuginosus lipase. Also described is a polypeptide,<br />

which has hydrolase activity on an ester or amide substrate, and has<br />

an amino acid sequence that has at least 80% identity to SEQ ID No:<br />

5 and compared to SEQ ID No: 5 comprises a substitution corresponding<br />

to I90Q, N92TD, F95Y, F113Y, I202M, V203GM, L269T and 270F. . . . .<br />



At the end of the abstract, indexers explain exactly how they derived the sequence<br />

The polypeptide is at least 80% homologous to any of SEQ ID No: 1-6<br />

being amino acid sequences of lipolytic enzymes of fungus such as<br />

Rhizomucor miehei (SWISSPROT P19515), Rhizopus delemar, Fusarium<br />

oxysporum, Penicillium camemberti (SWISSPROT P25234), Thermomyces<br />

lanuginosus (SWISSPROT O59952) and Thermomyces ibadanensis . . . . .<br />

The method is useful in the preparation of pharmaceuticals or<br />

pesticides, where the synthesis includes synthesis of 2-butyl<br />

propionate. This sequence is a variant fungal lipolytic hydrolase<br />

(lipase), V203M T231R N233R. This sequence is not shown in the<br />

specification, but was created by the indexer using the information<br />

given in claim 16.<br />

SQL 269<br />

SEQ 1 evsqdlfnqf nlfaqysaaa ycgknndapa gtnitctgna cpevekadat<br />

51 flysfedsgv gdvtgflald ntnklivlsf rgsrsienwi gnlnfdlkei<br />

101 ndicsgcrgh dgftsswrsv adtlrqkved avrehpdyrv vftghslgga<br />

151 latvagadlr gngydidvfs ygaprvgnra faefltvqtg gtlyrithtn<br />

201 dimprlppre fgyshsspey wiksgtlvpv rrrdivkieg idatggnnqp<br />

251 nipdipahlw yfgligtcl<br />

FEATURE TABLE:<br />

Key | Location | Qualifier |<br />

Modified-site | 203 | note | "Wild type Val replaced by Met"<br />

Modified-site | 231 | note | "Wild type Thr replaced by Arg"<br />

Modified-site | 233 | note | "Wild type Asn replaced by Arg"<br />

The indexer has added explanatory<br />

sentences to the abstract and<br />

annotations to the feature table.<br />
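
The substitution codes in the indexer's note (V203M T231R N233R) can be checked against the displayed sequence. The following is a minimal sketch, not a DGENE tool: the function names are hypothetical, and it assumes the usual wild-type residue, 1-based position, mutant residue notation. It confirms that the variant sequence carries the mutant residues and reverts them to recover the wild-type residues at those positions.

```python
import re

# Variant sequence exactly as displayed in the DGENE record (SQL 269).
VARIANT = (
    "evsqdlfnqf nlfaqysaaa ycgknndapa gtnitctgna cpevekadat "
    "flysfedsgv gdvtgflald ntnklivlsf rgsrsienwi gnlnfdlkei "
    "ndicsgcrgh dgftsswrsv adtlrqkved avrehpdyrv vftghslgga "
    "latvagadlr gngydidvfs ygaprvgnra faefltvqtg gtlyrithtn "
    "dimprlppre fgyshsspey wiksgtlvpv rrrdivkieg idatggnnqp "
    "nipdipahlw yfgligtcl"
).replace(" ", "")


def parse_substitution(code):
    """Split a code such as 'V203M' into (wild_type, position, mutant)."""
    match = re.fullmatch(r"([A-Z])(\d+)([A-Z])", code)
    if match is None:
        raise ValueError(f"not a single-residue substitution: {code!r}")
    return match.group(1), int(match.group(2)), match.group(3)


def revert_to_wild_type(variant, codes):
    """Check each mutant residue is present, then restore the wild-type residue."""
    residues = list(variant)
    for code in codes:
        wild, position, mutant = parse_substitution(code)
        index = position - 1  # substitution codes use 1-based positions
        if residues[index].upper() != mutant:
            raise ValueError(f"{code}: found {residues[index]!r} at {position}")
        residues[index] = wild.lower()
    return "".join(residues)


wild_type = revert_to_wild_type(VARIANT, ["V203M", "T231R", "N233R"])
print(VARIANT[200:205], "->", wild_type[200:205])
print(VARIANT[228:235], "->", wild_type[228:235])
```

The same check works in the other direction: applying the codes to a wild-type sequence should reproduce the record's SEQ field.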



Patent family and Legal Status information now displayable in DGENE<br />

• Patent family information from WPINDEX<br />

– FAM format provides all publication, application, and priority numbers and dates<br />

• Legal Status information from INPADOCDB<br />

– LS format is the standard legal status display for INPADOCDB<br />

– LS2 format contains the same information as LS but also displays field headers<br />


DGENE patent family display<br />

=> FILE DGENE<br />

=> S CA2325774/PN<br />

L1 17 CA 2325774/PN<br />

=> D FAM<br />

There are 17 sequence records<br />

in DGENE for CA2325774.<br />

The U.S. member of the family<br />

was granted on May 11, 2004.<br />

L1 ANSWER 1 OF 17 DGENE COPYRIGHT 2010 THOMSON REUTERS on <strong>STN</strong><br />

PI CA 2325774 A1 20010610 (200144)* EN 79[1]<br />

US 20030115627 A1 20030619 (200341) EN<br />

US 6734344 B2 20040511 (200431) EN<br />

ADT CA 2325774 A1 CA 2000-2325774 20001208;<br />

US 6734344 B2 Provisional US 1999-170168P 19991210;<br />

US 6734344 B2 US 2000-733643 20001208;<br />

US 20030115627 A1 Provisional US 1999-170168P 19991210;<br />

US 20030115627 A1 US 2000-733643 20001208<br />

PRAI US 1999-170168P 19991210<br />

US 2000-733643 20001208<br />



DGENE legal status display<br />

=> FILE DGENE<br />

=> S WO2002079175/PN<br />

L1 4 WO2002079175/PN<br />

=> D PI LS<br />

There are 4 sequence records in<br />

DGENE for WO2002079175.<br />

L2 ANSWER 1 OF 4 DGENE COPYRIGHT 2010 THOMSON REUTERS on <strong>STN</strong><br />

PI WO 2002079175 A1 20021010 67<br />

LEGAL STATUS INPADOCDB COPYRIGHT 2010 EPO / <strong>FIZ</strong> KARLSRUHE on <strong>STN</strong><br />

AN ABP55034 DGENE<br />

20010330 GBA PRI Patent application<br />

GB 2001-8097 A 20010330<br />

20020318 WOW APP International application Number<br />

WO 2002-IB817 W 20020318<br />

. . .<br />

20040219 WOREG REFERENCE TO NATIONAL CODE<br />

DE8642 - DE: IMPACT ABOLISHED FOR DE - I.E. PCT APPL. NOT ENT.<br />

GERMAN PHASE<br />

NIF Lapses, Expiries, Withdrawals, Refusals<br />

20050608 WOWWG + WIPO INFORMATION: GRANT IN NATIONAL OFFICE<br />

EP 2002707045<br />

200824.................................20080612<br />



<strong>Sequence</strong>Base USPTO Genetic <strong>Sequence</strong><br />

Database (USGENE)<br />

• <strong>Sequence</strong>s from all relevant USPTO published patent applications and granted (issued) patents<br />

• Original publication title, abstract, and claims<br />

• Publication, application, related application, and priority numbers and dates; full assignee and inventor names<br />


<strong>Sequence</strong>Base USPTO Genetic <strong>Sequence</strong><br />

Database (USGENE) (cont.)<br />

• Organism name, <strong>Sequence</strong> Length, Molecule Type, SEQ ID NO, Feature Tables, and Patent <strong>Sequence</strong> Location<br />

• Calculated patent expiration date<br />

• Patent Family and Legal Status display<br />

• Updated weekly, within 3 days of publication<br />

• 1982 – present<br />

New in 2009!<br />


USGENE consolidates unique USPTO<br />

sequence data from different sources<br />

• USPTO Publication Site for Issued and Published <strong>Sequence</strong>s (PSIPS)<br />

– The official mega-publication download site, 2001-date<br />

• International Nucleotide <strong>Sequence</strong> Database Collaboration (INSDC) (NCBI/EMBL/DDBJ, GenBank)<br />

– U.S. granted patent nucleotide sequences, 1982-date<br />


USGENE consolidates unique USPTO<br />

sequence data from different sources (cont.)<br />

• USPTO Protein Database (NCBI/EMBL)<br />

– U.S. granted patent protein/peptide sequences, 1982-date<br />

• USPTO Published Applications and Patents Full-Text<br />

– Fills in omissions and coverage gaps, and enhances timeliness<br />

The USGENE <strong>Sequence</strong> Source (/SSO) field indicates the source from which any given USGENE sequence record was derived.<br />


USGENE combines these sequences with<br />

bibliographic data and claims text<br />

[Diagram: four sequence sources feed USGENE, where they are combined with USPTO bibliographic data, title, abstract, and claims text]<br />

• INSDC: USPTO nucleotide sequences<br />

• NCBI/EMBL-EBI: USPTO peptide sequences<br />

• USPTO full-text sequences<br />

• USPTO PSIPS sequences<br />


Relationship between DWPI patent family<br />

and USGENE sequence databases<br />

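[Diagram: a WPINDEX patent family record and the USGENE sequence records derived from its U.S. published application (A1) and U.S. granted patent (B2)]<br />

WPINDEX family record:<br />
AN .... WPINDEX<br />
TI ….<br />
PA ….<br />
PI WO …. A1<br />
FR …. A1<br />
US …. A1<br />
US …. B2<br />
AB ….<br />

USGENE sequence records from the U.S. published application:<br />
AN .... Protein USGENE<br />
PI US …. A1<br />
SEQ 1 ….<br />

AN .... DNA USGENE<br />
PI US …. A1<br />
SEQ n ….<br />

USGENE sequence records from the U.S. granted patent:<br />
AN .... Protein USGENE<br />
PI US …. B2<br />
SEQ 1 ….<br />

AN .... DNA USGENE<br />
PI US …. B2<br />
SEQ n ….<br />

WPINDEX = Derwent World Patents Index on <strong>STN</strong><br />

USGENE = The USPTO Genetic <strong>Sequence</strong> Database<br />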


USGENE sequence records are available<br />

within 3 days of publication by the USPTO<br />

L2 ANSWER 1 OF 1 USGENE COPYRIGHT 2010 SEQUENCEBASE CORP on <strong>STN</strong><br />

AN 20080256649.126 Protein USGENE<br />

TI Novel Acetylcholinesterase Gene Responsible for Insecticide Resistance<br />

and Applications Thereof (PublishedApplication)<br />

IN Weill Mylene (Montpellier, FR); Fort Philippe (Castelnau Le Lez, FR);<br />

Raymond Michel (Montpellier, FR); Pasteur Nicole (Montpellier, FR)<br />

PA CENTRE NATIONAL DE LA RECHERCHE SCIENTIFIQUE (PARIS CEDEX FR)<br />

PI US 20080256649 A1 20081016<br />

AI US 2003-518072 20030619<br />

RLI WO 2003-FR1876 20030619<br />

PRAI FR 2002-13799 20021105<br />

FR 2002-7622 20020620<br />

PSL Claim 4; SEQ ID NO 126<br />

DESC Anopheles gambiae Protein; sequence 126 of 129<br />

DT Patent<br />

AB The invention relates to a novel acetylcholinesterase gene (ace-1)<br />

responsible for resistance to organophosphorus and/or carbamates in<br />

. . .<br />

ECLM US20080256649 A1: 1. An insect acetylcholinesterase, characterized in<br />

that it comprises a central catalytic region which has an amino acid<br />

sequence selected from the group consisting of the sequence SEQ ID NO 1<br />

. . .<br />

SSO PROTEIN; USPTO; APPLICATION<br />

ORGN Anopheles gambiae<br />

SQL 737<br />

SEQ<br />

1 meirgllmgr lrlgrrmvpl gllgvtalll ilppsalvqg rhhelnngaa<br />

51 igshqlsaaa gvglssqsaq sgslasgvms svpaagasss ssssllsssa . . . .<br />

AN 20080256649.126 is SEQ ID NO: 126 from US20080256649.<br />

AN 20080256649.126 is displayed here in BRIEF format, which includes the Exemplary Claim (ECLM).<br />


USGENE also has an extensive backfile<br />

This USPTO example is US5210028,<br />

which was issued on May 11, 1993.<br />

Published sequence data like this are<br />

identified, extracted, standardized, and<br />

loaded into USGENE on <strong>STN</strong> (compare<br />

this to the <strong>STN</strong> record on the next slide).<br />



To facilitate precise searching, all USGENE<br />

sequences are in <strong>STN</strong> standardized format<br />

L1 ANSWER 1 OF 1 USGENE COPYRIGHT 2010 SEQUENCEBASE CORP on <strong>STN</strong><br />

AN 5210028.1 protein USGENE<br />

TI Process for the production of unfused IGF-II protein in E. coli<br />

(Patent)<br />

IN Schmitz Albert (Basel, CH); Marki Walter (Mohlin, CH)<br />

PA Ciba Geigy Corporation (Ardsley NY)<br />

PI US 5210028 A 19930511<br />

AI US 1990-616470 19901121<br />

AB A process for the preparation of a recombinant IGF-II (rIGF-II)<br />

without a covalently attached foreign protein moiety and without N-terminal<br />

attached methionine or a derivative of methionine or of a<br />

salt of said IGF-II, rIGF-II produced by said method, . . . .<br />

ECLM US5210028 A: We claim:1. A process for the production of a<br />

recombinant IGF-II without a covalently attached foreign protein<br />

moiety and without N-terminal attached methionine or a derivative of<br />

methionine, said process comprising:a) transforming a suitable strain<br />

of E. coli, said strain being a lon.sup.- and htpR.sup.- double<br />

mutant, with a hybrid vector comprising an expression cassette<br />

consisting of the following elements in the 5' to 3' direction, said<br />

elements which are operably linked: an inducible promoter, a<br />

ribosomal binding site, and the codon for the amino acid methionine<br />

linked in proper reading frame to a DNA sequence encoding a human<br />

IGF-II having the amino acid sequence of SEQ ID NO. . . .<br />

SSO PROTEIN; USPTO; GRANTED<br />

ORGN Human IGF II<br />

SQL 67<br />

SEQ<br />

1 ayrpsetlcg gelvdtlqfv cgdrgfyfsr pasrvsrrsr giveeccfrs<br />

51 cdlalletyc atpakse<br />

AN 5210028.1 is SEQ ID NO: 1<br />

from US5210028.<br />

AN 5210028.1 is displayed here<br />

in BRIEF format, which includes<br />

the Exemplary Claim (ECLM).<br />

Compare the <strong>STN</strong> standardized<br />

USGENE record to the original<br />

data source on the previous<br />

slide.<br />
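
Because every record uses the same SEQ layout (a position counter followed by blocks of ten residues), the display can be turned back into a plain sequence string and checked against the SQL field. The sketch below is a hypothetical helper written for this illustration, not an STN feature; it assumes the layout shown in the record above.

```python
def parse_stn_sequence(lines):
    """Join the residue blocks of an STN SEQ display into one string.

    Assumes each line looks like '51 cdlalletyc atpakse', optionally
    prefixed with the field code 'SEQ'.
    """
    residues = []
    for line in lines:
        tokens = line.split()
        if tokens and tokens[0] == "SEQ":
            tokens = tokens[1:]  # drop the field code
        # Position counters are purely numeric; residue blocks are letters.
        residues.extend(token for token in tokens if not token.isdigit())
    return "".join(residues)


# SEQ field of USGENE record 5210028.1 as displayed above (SQL 67).
igf2_display = [
    "SEQ",
    "1 ayrpsetlcg gelvdtlqfv cgdrgfyfsr pasrvsrrsr giveeccfrs",
    "51 cdlalletyc atpakse",
]

igf2 = parse_stn_sequence(igf2_display)
print(len(igf2), igf2)
```

Comparing the parsed length with the record's SQL value is a cheap way to catch truncated or mis-copied displays.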



USGENE is an essential tool for tackling<br />

business-critical searches<br />

• DGENE provides curated and indexed patent sequence data from the DWPI basic publication<br />

– 61% of basics are WIPO/PCT published applications<br />

– Updated biweekly, typically 65 days from publication<br />

• USGENE provides all available sequence data from the USPTO as a single merged resource<br />

– Both U.S. patents and U.S. published applications<br />

– Updated weekly, within 3 days of USPTO publication<br />

• <strong>Sequence</strong> listing variation often occurs between PCT and U.S. granted patent publication stages<br />

– Especially important, e.g., for freedom-to-operate searches<br />


USGENE and DGENE capture sequence<br />

data from different patent family members<br />

[Diagram: one WPINDEX patent family whose U.S. members supply USGENE sequence records and whose WO basic publication supplies DGENE sequence records]<br />

WPINDEX family record:<br />
AN .... WPINDEX<br />
PI WO ….. A1<br />
FR ….. A1<br />
EP ….. A1<br />
US ….. A1<br />
EP ….. B1<br />
US ….. B2<br />

USGENE sequence records from the U.S. published application:<br />
AN .... Protein USGENE<br />
PI US …. A1<br />
SEQ 1 ….<br />

AN .... DNA USGENE<br />
PI US …. A1<br />
SEQ 2 ….<br />

USGENE sequence records from the U.S. granted patent:<br />
AN .... Protein USGENE<br />
PI US …. B2<br />
SEQ 1 ….<br />

AN .... DNA USGENE<br />
PI US …. B2<br />
SEQ 2 ….<br />

DGENE sequence records:<br />
AN .... Protein DGENE<br />
PI WO …. A1<br />
SEQ 1 ….<br />

AN .... DNA DGENE<br />
PI WO …. A1<br />
SEQ 2 ….<br />

DGENE sequences are indexed by Thomson Reuters from DWPI basic publications.<br />

WPINDEX = Derwent World Patents Index on <strong>STN</strong><br />

DGENE = GENESEQ on <strong>STN</strong><br />

USGENE = USPTO Genetic <strong>Sequence</strong> Database<br />


<strong>Sequence</strong> listing variation often occurs<br />

between PCT and U.S. granted patent stage<br />

L1 ANSWER 1 OF 1 WPINDEX COPYRIGHT 2009 THOMSON REUTERS on <strong>STN</strong><br />

AN 1994-358278 [44] WPINDEX<br />

TI New polynucleotide(s) specific for hepatitis C virus types 4, 5 and 6 -<br />

and related antigenic peptide(s) and antibodies, useful in vaccines,<br />

diagnosis, HCV typing and treatment<br />

DC B04; D16; S03<br />

IN PIKE I H; SIMMONDS P; YAP P L<br />

PA (COMM-N) COMMON SERVICES AGENCY; (MURE-N) MUREX DIAGNOSTICS INT INC; . . .<br />

PI WO 9425602 A1 19941110 (199444)* EN 70[5]<br />

AU 9465797 A 19941121 (199508) EN<br />

FI 9505224 A 19951220 (199611) FI<br />

EP 698101 A1 19960228 (199613) EN [0]<br />

JP 09500009 W 19970107 (199711) JA 52[0]<br />

AU 695259 B 19980813 (199844) EN<br />

EP 698101 B1 20041103 (200475) EN<br />

DE 69434116 E 20041209 (200481) DE<br />

US 20050032047 A1 20050210 (200512) EN<br />

US 6881821 B2 20050419 (200527) EN<br />

. . . . .<br />

ADT WO 9425602 A1 WO 1994-GB957 19940505 . . . .<br />

PRAI GB 1994-263 19940107<br />

GB 1993-9237 19930505<br />

In this example the patent family has:<br />

• 9 sequences from WO9425602 in DGENE<br />

• 50 sequences from US20050032047 in USGENE<br />

• 58 sequences from US6881821 in USGENE<br />


USGENE enhancements<br />

• Patent family and Legal Status displays<br />

• Patent <strong>Sequence</strong> Location (/PSL)<br />

• Priority application info (/PRAI)<br />

• U.S. related application info (/RLI)<br />

• Calculated patent expiration date (/XPD)<br />

• Patent term adjustment info (/NTE, /PTA)<br />

• Concise, on-line <strong>Sequence</strong> Description (/DESC)<br />

Note: All fields available from June 5, 2007.<br />


Patent family and Legal Status information<br />

now displayable in USGENE<br />

• Patent family information from INPADOCDB<br />

– FAM format provides all publication, application, and priority numbers and dates in tabular format<br />

– CFAM format provides a simple table of all publication numbers and dates<br />

• Legal Status information from INPADOCDB<br />

– LS format is the standard legal status display for INPADOCDB<br />

– LS2 format contains the same information as LS but also displays field headers<br />


Patent family and Legal Status information<br />

available in USGENE<br />

=> FILE USGENE<br />

FILE 'USGENE' ENTERED AT 19:49:28 ON 12 JAN 2010<br />

COPYRIGHT (C) 2010 SEQUENCEBASE CORP<br />

FILE LAST UPDATED: 7 JAN 2010 <br />

MOST RECENT PUBLICATION DATE: 7 JAN 2010 <br />

FILE COVERS 1981 TO DATE<br />

>>> SIMULTANEOUS LEFT AND RIGHT TRUNCATION (SLART) IS AVAILABLE<br />

IN THE BASIC INDEX (/BI) AND FEATURE TABLE (/FEAT) FIELDS<br />

>>> DOWNLOAD THE USGENE WORKSHOP MANUAL:<br />

http://www.stn-international.com/USGENE_workshop_manual.html<br />

. . .<br />

=> S US2009004109/PN<br />

L1 103 US2009004109/PN<br />

(US20090004109/PN)<br />



Patent family and Legal Status information<br />

available in USGENE<br />

=> D BRIEF FAM LS<br />

L1 ANSWER 1 OF 103 USGENE COPYRIGHT 2010 SEQUENCEBASE CORP on <strong>STN</strong><br />

AN 20090004109.103 Protein USGENE <br />

TI Antibodies and Molecules Derived Therefrom that Bind to Steap-1 Proteins<br />

(PublishedApplication)<br />

IN Jacobovits Aya (Beverly Hills, CA); Etessami Soudabeh (Tarzana, CA);<br />

Challita-Eid Pia M. (Encino, CA); Perez-Villar Juan J. (Los Angeles, CA);<br />

Morrison Karen J. (Santa Monica, CA); Jia Xiao-Chi (Los Angeles, CA);<br />

Faris Mary (Los Angeles, CA); Gudas Jean (Pacific Palisades, CA); Raitano<br />

Arthur B. (Los Angeles, CA)<br />

PA AGENSYS INC (Santa Monica CA)<br />

PI US 20090004109 A1 20090101<br />

AI US 2004-587197 20040422<br />

RLI WO 2004-US12625 20040422<br />

PRAI US 2004-587197 20040422<br />

PSL SEQ ID NO 103<br />

DESC Homo Sapiens Protein; sequence 103 of 103<br />

DT Patent<br />

AB Antibodies and molecules derived there from that bind to novel STEAP-1<br />

…<br />

STEAP-1 can be used in active or passive immunization.<br />

ECLM US20090004109 A1: 1. An antibody or fragment thereof comprising an<br />

antigen binding site that binds specifically to STEAP-1 protein . . .<br />

SSO PROTEIN; USPTO; APPLICATION<br />

ORGN Homo Sapiens<br />

SQL 29<br />

SEQ 1 dkwmltrkqf gllslffavl haiyslsyp<br />

20090004109.103 is displayed<br />

here in BRIEF format.<br />



Patent family information available in<br />

USGENE<br />

PATENT FAMILY INFORMATION INPADOCDB COPYRIGHT 2010 EPO / <strong>FIZ</strong> KARLSRUHE on <strong>STN</strong><br />

AN 20090004109.103 USGENE<br />

+-------------PRAI-------------+ +--------------AI--------------+<br />

WO 2004-US12625 W 20040422 AU 2004-319915 A 20040422<br />

BR 2004-18766 A 20040422<br />

CA 2004-2563735 A 20040422<br />

EA 2006-1946 A 20040422<br />

EP 2004-750565 A 20040422<br />

JP 2007-509439 T 20040422<br />

MX 2006-12187 A 20061020<br />

NO 2006-5366 A 20061121<br />

US 2004-587197 A 20040422<br />

WO 2004-US12625 W 20040422<br />

+--------------AI--------------+ +--------------PI--------------+<br />

AU 2004-319915 A 20040422 AU 2004319915 A1 20051201<br />

BR 2004-18766 A 20040422 BR 2004018766 A 20071009<br />

CA 2004-2563735 A 20040422 CA 2563735 A1 20051201<br />

EA 2006-1946 A 20040422 EA 2006001946 A2 20070629<br />

EA 2006001946 A3 20070831<br />

EP 2004-750565 A 20040422 EP 1742966 A2 20070117<br />

EP 1742966 A4 20080402<br />

JP 2007-509439 T 20040422 JP 2008509880 T 20080403<br />

MX 2006-12187 A 20061020 MX 2006012187 A 20070328<br />

NO 2006-5366 A 20061121 NO 2006005366 A 20070119<br />

US 2004-587197 A 20040422 US 20090004109 A1 20090101<br />

WO 2004-US12625 W 20040422 WO 2005113601 A2 20051201<br />

WO 2005113601 A8 20070628<br />

1 priority, 10 applications, 13 publications<br />

20090004109.103 is displayed<br />

here in FAM format, which<br />

includes the patent family<br />

information from INPADOCDB.<br />



Legal Status information available in<br />

USGENE<br />

LEGAL STATUS INPADOCDB COPYRIGHT 2010 EPO / <strong>FIZ</strong> KARLSRUHE on <strong>STN</strong><br />

AN 20090004109.103 USGENE<br />

20040422 WOWA PRI PCT application claimed from national procedure<br />

WO 2004-US12625 WA 20040422<br />

.......................................20090115<br />

20040422 USA APP Patent application<br />

US 2004-587197 A 20040422<br />

.......................................20090115<br />

20090101 USA1 PUB FIRST PUBLISHED PATENT APPLICATION [FROM 2001 ONWARDS]<br />

US 20090004109 A1 20090101<br />

200903.................................20090115<br />

20090701 USAS ASSIGNMENT<br />

AGENSYS, INC., CALIFORNIA<br />

ASSIGNMENT OF ASSIGNORS INTEREST;ASSIGNORS:JAKOBOVITS,<br />

AYA;ESTESSAMI, SOUDABEH;CHALLITA-EID, PIA M.;AND<br />

OTHERS;REEL/FRAME:022905/0519;SIGNING DATES FROM 20090521<br />

TO 20090528<br />

CHG Change of Owner, Inventor, Applicant<br />

200942.................................20091015<br />

20090004109.103 is displayed here in LS format, which includes the Legal Status information from INPADOCDB.<br />


USGENE now provides Patent <strong>Sequence</strong><br />

Location (/PSL)<br />

• /PSL includes<br />

– <strong>Sequence</strong> identity number<br />

– If claimed, the corresponding claim number<br />

• Easily identify claimed sequences<br />

• Available from publication date of June 5,<br />

2007<br />
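
To illustrate how /PSL content identifies claimed sequences, the sketch below parses PSL values of the form that appears in the records in this presentation (for example "Claim 4; SEQ ID NO 126"). The helper name and the returned dictionary are hypothetical, and the pattern is inferred from these examples only, not from a published field specification.

```python
import re


def parse_psl(psl):
    """Extract claim numbers and the SEQ ID NO from a PSL field value."""
    claims = [int(number) for number in re.findall(r"Claim\s+(\d+)", psl)]
    seq_match = re.search(r"SEQ ID NO\s+(\d+)", psl)
    return {
        "claims": claims,
        "seq_id": int(seq_match.group(1)) if seq_match else None,
        "claimed": bool(claims),
    }


# PSL values taken from the USGENE records shown in this presentation.
for value in ["Claim 4; SEQ ID NO 126", "SEQ ID NO 103", "Claim 1; SEQ ID NO 2"]:
    print(value, "->", parse_psl(value))
```

A record whose PSL lacks a claim number, such as "SEQ ID NO 103", is treated here as not claimed.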



USGENE now provides Patent <strong>Sequence</strong><br />

Location (/PSL)<br />

7482426.2 is displayed here in BRIEF format, which includes the Patent <strong>Sequence</strong> Location (/PSL).<br />

L2 ANSWER 1 OF 1 USGENE COPYRIGHT 2010 SEQUENCEBASE CORP on <strong>STN</strong><br />

AN 7482426.2 protein USGENE<br />

TI C23 Polypeptides (Patent)<br />

IN Franz-Bacon Karin (San Diego, CA); Gorman Daniel M. (Newark, CA);<br />

McClanahan Terrill K. (Sunnyvale, CA)<br />

PA Schering Corporation (Kenilworth NJ)<br />

PI US 7482426 B2 20090127<br />

US 20030092123 A1 20030515<br />

AI US 2002-246983 20020918<br />

RLI US 1998-99898 19980618<br />

PRAI US 1997-50156P 19970619<br />

XPD 20180618 (calculated)<br />

NTE Subject to any Disclaimer, the term of this patent is extended or<br />

adjusted under 35 USC 154(b) by 548 days.<br />

PSL Claim 1; SEQ ID NO 2<br />

DESC Protein; sequence 2 of 4<br />

DT Patent<br />

AB Nucleic acids encoding a new family of small cysteine rich soluble<br />

. . .<br />

ECLM US7482426 B2: 1. An isolated polypeptide comprising the sequence of<br />

residues 1 to 90 of SEQ ID NO: 2.<br />

SSO PROTEIN; USPTO; GRANTED<br />

ORGN Not provided<br />

SQL 108<br />

SEQ 1 mkalcllllp vlgllvsskt lcsmeeaine riqevagsli fraissigle<br />

51 cqsvtsrgdl atcprgfavt gctcgsacgs wdvraettch cqcagmdwtg<br />

101 arccrvqp<br />
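
The XPD and NTE fields in this record can be related with simple date arithmetic. The sketch below rests on an assumption that the slides do not state: that the calculated expiration date is 20 years from the earliest non-provisional U.S. filing date (the RLI date here). The record's XPD value is consistent with that reading, but how USGENE actually derives XPD, and whether it includes the term adjustment, is not documented in this presentation.

```python
from datetime import date, timedelta


def add_years(day, years):
    """Shift a date by whole years (Feb 29 falls back to Feb 28)."""
    try:
        return day.replace(year=day.year + years)
    except ValueError:
        return day.replace(year=day.year + years, day=28)


earliest_us_filing = date(1998, 6, 18)   # RLI US 1998-99898 19980618
term_adjustment_days = 548               # from the NTE field, 35 USC 154(b)

# Assumed rule: 20-year term counted from the earliest non-provisional filing.
base_expiry = add_years(earliest_us_filing, 20)

# Hypothetical adjusted date if the 548 days are added on top of the base term.
adjusted_expiry = base_expiry + timedelta(days=term_adjustment_days)

print("base expiry:    ", base_expiry.strftime("%Y%m%d"))
print("adjusted expiry:", adjusted_expiry.strftime("%Y%m%d"))
```

The base date reproduces the XPD value shown in the record, which suggests the displayed XPD does not include the term adjustment; treat the adjusted date as an estimate to verify against the official record.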



PCTGEN is the World Patent Application<br />

Biosequences database on <strong>STN</strong><br />

• Produced by <strong>FIZ</strong> <strong>Karlsruhe</strong> and WIPO<br />

• <strong>Sequence</strong>s submitted and published electronically as a formal part of PCT patent applications<br />

• Publication number and date, patent applicant name(s), and the original publication title are provided for each sequence<br />

• <strong>Sequence</strong> length, SEQ ID, organism name, and molecule type are included for each sequence<br />

• Updated weekly, within 24 hours of publication<br />

• August 2001 – present<br />


Relationship between PCTFULL and<br />

PCTGEN databases<br />

[Diagram: a PCTFULL full-text record and the PCTGEN sequence records from the same WO publication]<br />

PCTFULL record:<br />
AN … PCTFULL<br />
TI ....<br />
PA ....<br />
PI WO …. A1<br />
AB ….<br />
DETD ....<br />
CLM ....<br />

PCTGEN sequence records:<br />
AN .... Protein PCTGEN<br />
PI WO …. A1<br />
SEQ 1 ….<br />

AN .... DNA PCTGEN<br />
PI WO …. A1<br />
SEQ 2 ….<br />

AN .... Peptide PCTGEN<br />
PI WO …. A1<br />
SEQ n ….<br />

PCTFULL = WIPO/PCT patent applications full-text<br />

PCTGEN = WIPO/PCT patent application biosequences<br />


Each PCTGEN sequence record includes<br />

publication title and bibliography<br />

L1 ANSWER 1 OF 1 PCTGEN COPYRIGHT 2010 WIPO on <strong>STN</strong><br />

AN 2006069200.16112 PRT PCTGEN<br />

TI Group B Streptococcus<br />

PA Tettelin, Herve<br />

Masignani, Vega<br />

PI WO 2006069200 20060629<br />

RLI US 2004-638943P 20041222; US 2004-640438P 20041230<br />

ED 20060630<br />

DT Patent<br />

ORGN Streptococcus agalactiae<br />

SQL 302<br />

SEQ<br />

1 mflmplasll gnltvwhhlk heiikipfsr ldilihlrpt lmlflpqitm<br />

51 qiylslnksm lgamdsvvsa gyfdqsdkii rilftivsai ggvflprlss<br />

. . . .<br />

251 atlsgavlyy intqmsvslv nyviqslvav tiyvgivfit kapviqllXX<br />

301 Xn<br />

FEATURE TABLE:<br />

Key |Location |<br />

==========+=============+=======================<br />

VARIANT |299, 300, 301|Xaa = Any Amino Acid<br />

AN 2006069200.16112 is<br />

SEQ ID NO: 16112 from<br />

WO2006069200.<br />

<strong>Sequence</strong>s are typically<br />

added to PCTGEN within 24<br />

hours of publication by WIPO.<br />
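
In the USGENE and PCTGEN records shown in this presentation, the accession number (AN) is the publication number followed by a period and the SEQ ID NO. The sketch below splits an AN on that basis. It is an inference from the examples only: the type and function names are hypothetical, and DGENE accession numbers such as AEJ92622 do not follow this pattern.

```python
from typing import NamedTuple


class SequenceAccession(NamedTuple):
    publication: str  # publication number without country code
    seq_id: int       # SEQ ID NO within that publication


def split_accession(an):
    """Split an AN such as '2006069200.16112' into publication and SEQ ID NO."""
    publication, _, seq_id = an.partition(".")
    if not publication or not seq_id.isdigit():
        raise ValueError(f"unexpected accession number format: {an!r}")
    return SequenceAccession(publication, int(seq_id))


# Accession numbers taken from records shown in this presentation.
for an in ["2006069200.16112", "20080256649.126", "5210028.1"]:
    parsed = split_accession(an)
    print(an, "->", parsed.publication, "SEQ ID NO", parsed.seq_id)
```

Grouping hits by the publication part gives a quick count of sequence records per patent document.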



Patent family and Legal Status information<br />

now displayable in PCTGEN<br />

• Patent family information from INPADOCDB<br />

– FAM format provides all publication, application, and priority numbers and dates in tabular format<br />

– CFAM format provides a table of all publication numbers and dates<br />

• Legal Status information from INPADOCDB<br />

– LS format is the standard Legal Status display for INPADOCDB<br />

– LS2 format contains the same information as LS but also displays field headers<br />


Patent family and Legal Status information<br />

available in PCTGEN<br />

=> FILE PCTGEN<br />

FILE 'PCTGEN' ENTERED AT 20:01:02 ON 12 JAN 2010<br />

COPYRIGHT (C) 2010 WIPO<br />

FILE LAST UPDATED: 8 JAN 2010 <br />

MOST RECENT PCT PUB DATE: 7 JAN 2010 <br />

PCTGEN CURRENTLY CONTAINS 7,004,444 BIOSEQUENCES<br />

>>> DOWNLOAD COMPLETE PCTGEN HELP AS PDF:<br />

http://www.stn-international.com/pctgen_help.html<br />

. . .<br />

>>> Percent identity sorting is now available


Patent family and Legal Status information<br />

available in PCTGEN<br />

=> D ALL FAM LS<br />

L1 ANSWER 1 OF 16 PCTGEN COPYRIGHT 2010 WIPO on <strong>STN</strong><br />

AN 2009010296.16 PRT PCTGEN<br />

TI Antibodies against 4-(methylnitrosamino)-1-(3-pyridy1)-1-butanone<br />

andmetabolites thereof [File created by using OCR software]<br />

PA Centre de Recherche Publique Sante<br />

PI WO 2009010296 20090122<br />

RLI EP 2007-7 20070718<br />

ED 20090128<br />

DT Patent<br />

ORGN artificial sequence<br />

SQL 11<br />

SEQ<br />

1 qqwrshppal t<br />

FEATURE TABLE:<br />

Key |Location|<br />

==========+========+==============================<br />

source | |Description<br />

| |of artificial sequence:CDR3 of<br />

| |VL of antibody produced<br />

WO2009010296.16 is displayed<br />

here in ALL format.<br />



Patent family information available<br />

in PCTGEN<br />

PATENT FAMILY INFORMATION INPADOCDB COPYRIGHT 2010 EPO / <strong>FIZ</strong> KARLSRUHE on<br />

<strong>STN</strong><br />

AN 2009010296.16 PCTGEN<br />

+-------------PRAI-------------+ +--------------AI--------------+<br />

EP 2007-14089 A 20070718 EP 2007-14089 A 20070718<br />

WO 2008-EP5876 W 20080717<br />

+--------------AI--------------+ +--------------PI--------------+<br />

EP 2007-14089 A 20070718 EP 2016954 A1 20090121<br />

WO 2008-EP5876 W 20080717 WO 2009010296 A2 20090122<br />

WO 2009010296 A3 20090618<br />

1 priority, 2 applications, 3 publications<br />

WO2009010296.16 is displayed here in FAM<br />

format, which includes the patent family<br />

information from INPADOCDB.<br />
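The summary line of the FAM display ("1 priority, 2 applications, 3 publications") can be reproduced by counting distinct entries per column. The following is a minimal sketch in Python, assuming the entries are transcribed by hand from the display above; the variable and function names are illustrative only and are not part of STN.

```python
# Entries transcribed from the FAM display above, as (number, kind, date).
# The AI column appears in both tables of the display, so it holds repeats.
prai_column = [("EP 2007-14089", "A", "20070718")]
ai_column = [
    ("EP 2007-14089", "A", "20070718"),
    ("WO 2008-EP5876", "W", "20080717"),
    ("EP 2007-14089", "A", "20070718"),
    ("WO 2008-EP5876", "W", "20080717"),
]
pi_column = [
    ("EP 2016954", "A1", "20090121"),
    ("WO 2009010296", "A2", "20090122"),
    ("WO 2009010296", "A3", "20090618"),
]


def count_label(n, singular, plural):
    """Return e.g. '1 priority' or '2 applications'."""
    return f"{n} {singular if n == 1 else plural}"


def family_summary(prai, ai, pi):
    """Count distinct priorities, applications and publications."""
    return ", ".join([
        count_label(len(set(prai)), "priority", "priorities"),
        count_label(len(set(ai)), "application", "applications"),
        count_label(len(set(pi)), "publication", "publications"),
    ])


print(family_summary(prai_column, ai_column, pi_column))
```

Note that the A2 and A3 publications of WO 2009010296 are counted separately because the kind code is part of each entry.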



Legal Status information available<br />

in PCTGEN<br />

LEGAL STATUS INPADOCDB COPYRIGHT 2010 EPO / <strong>FIZ</strong> KARLSRUHE on <strong>STN</strong><br />

AN 2009010296.16 PCTGEN<br />

20070718 EPA PRI Patent application<br />

EP 2007-14089 A 20070718<br />

.......................................20090129<br />

20080717 WOW APP International application Number<br />

WO 2008-EP5876 W 20080717<br />

.......................................20090129<br />

20090122 WOA2 PUB INTERNATIONAL APPLICATION PUBLISHED WITHOUT<br />

INTERNATIONAL SEARCH REPORT or INTERNATIONAL<br />

APPLICATION PUBLISHED WITH<br />

DECLARATION UNDER ARTICLE 17 (2) (A)<br />

WO 2009010296 A2 20090122<br />

200905.................................20090129<br />

20090318 WO121 EP: THE EPO HAS BEEN INFORMED BY WIPO THAT EP WAS<br />

DESIGNATED IN THIS APPLICATION<br />

EP 08784856 A2<br />

200912.................................20090320<br />

20090618 WOA3 PUB LATER PUBLICATION OF ISR WITH REVISED FRONT PAGE<br />

WO 2009010296 A3 20090618<br />

200926.................................20090625<br />

WO2009010296.16 is displayed here in LS format, which includes the Legal Status information from INPADOCDB.<br />



CAS REGISTRY<br />

• Produced by CAS<br />

• <strong>Sequence</strong>s from >3,000 life science journals<br />

and the basic patents of the 60 patent<br />

authorities in CAplus SM on <strong>STN</strong><br />

• Patent number, location, and standardized<br />

nomenclature provided for each sequence<br />

• >61.5 million sequence records<br />

• Updated daily<br />

• Comprehensively from 1957 - present<br />



Relationship between CAplus patent family<br />

and CAS REGISTRY databases<br />

AN .... CAPLUS<br />

TI ….<br />

PA ….<br />

PI WO …. A1<br />

AB ….<br />

FR …. A1<br />

US …. A1<br />

US …. B2<br />

IT RN ....<br />

RN .... Protein REGISTRY<br />

PI WO …. A1<br />

SEQ 1 ….<br />

RN .... DNA REGISTRY<br />

PI WO …. A1<br />

SEQ 2 ….<br />

RN .... Peptide REGISTRY<br />

PI WO …. A1<br />

SEQ 3 ….<br />



A typical REGISTRY sequence record for a<br />

patent sequence<br />

L1 ANSWER 1 OF 1 REGISTRY COPYRIGHT 2009 ACS on <strong>STN</strong><br />

ED Entered <strong>STN</strong>: 16 Jan 2007<br />

RN 917531-59-0 REGISTRY<br />

CN 20: PN: WO2006137596 SEQID: 21 unclaimed protein (CA INDEX NAME)<br />

FS PROTEIN SEQUENCE<br />

SQL 128<br />

PATENT ANNOTATIONS (PNTE):<br />

<strong>Sequence</strong> |Patent<br />

Source |Reference<br />

=========+============<br />

Not Given|WO2006137596<br />

|unclaimed<br />

|SEQID 21<br />

SEQ 1 KCDLALDPDL ARIMAHSRDY DEQLHVWLAW RDAIGPQIRD KYIQYVQMAN<br />

51 HAARLNGFHD AGQQQREAYE DSDINSQLTE LWATLAPLYR ELHAYVRRHL<br />

101 VQRYGPERVR PDGPMPAHLL GNMWSRAN<br />

MF Unspecified<br />

CI MAN<br />

SR CA<br />

LC <strong>STN</strong> Files: CA, CAPLUS<br />

DT.CA CAplus document type: Patent<br />

RL.P Roles from patents: PRP (Properties)<br />

WIPO and other patent sequences typically enter REGISTRY within 27 days of publication – in this example only 20 days.<br />

Since October 1999, CAS REGISTRY patent sequence<br />

records include the publication number, SEQ ID number, and<br />

an intellectually assigned claimed/unclaimed notation.<br />



The corresponding basic patent record is<br />

in CAplus on <strong>STN</strong><br />

CAS provides value added abstracting and indexing for the CAplus basic publication.<br />

L1 ANSWER 1 OF 1 CAPLUS COPYRIGHT 2009 ACS on <strong>STN</strong><br />

AN 2006:1357195 CAPLUS<br />

DN 146:95066<br />

TI Screening for effectors of insect dipeptidyl carboxypeptidase A as<br />

insecticides<br />

IN Shimokawatoko, Yasutaka; Craen, Marc Van De; Nooren, Irene; Turconi,<br />

Sandra; Naudet, Yann; Nys, Guy; Debaveye, Jurgen<br />

PA Sumitomo Chemical Company, Limited, Japan<br />

FAN.CNT 1<br />

PATENT NO. KIND DATE APPLICATION NO. DATE<br />

--------------- ---- -------- -------------------- --------<br />

PI WO 2006137596 A2 20061228 WO 2006-JP313039 20060623<br />

JP 2007000060 A 20070111 JP 2005-183031 20050623<br />

PRAI JP 2005-183031 A 20050623<br />

AB Methods of screening for effector of the dipeptidyl carboxypeptidase A<br />

(I) of insects for use as insecticides is described. . . . .<br />

IC ICM A01N<br />

CC 5-4 (Agrochemical Bioregulators)<br />

Section cross-reference(s): 3, 7, 12<br />

. . . .<br />

IT 917531-51-2 917531-53-4 917531-55-6 917531-57-8 917531-59-0<br />

RL: PRP (Properties)<br />

(unclaimed protein sequence; screening for effectors of insect<br />

dipeptidyl carboxypeptidase A as insecticides)<br />

The sequence is linked to the CAplus patent family record by its CAS Registry Number®.<br />



<strong>Sequence</strong>s are indexed from patents<br />

• 60 patent authorities from around the world<br />

– Including WIPO, EPO, USPTO, JP, DE, GB,<br />

FR, RU, and CA<br />

• For the major patent authorities listed<br />

above, timeliness is top notch<br />

– Bibliographic information is available in CAplus<br />

within 2 days of patent issuance<br />

– <strong>Sequence</strong>s are available in REGISTRY within 1<br />

month<br />



REGISTRY sequences also come from<br />

>3,000 life science journals<br />

• Biochemistry<br />

• Cell<br />

• EMBO Journal<br />

• Gene<br />

• Journal of Biological Chemistry<br />

• Journal of Cell Biology<br />

• Journal of Molecular Biology<br />

• Nature<br />

• Nucleic Acids Research<br />

• Nature Genetics<br />

• Proceedings of the National Academy of Sciences<br />

• Science<br />



Unique sequence types are indexed<br />

in CAS REGISTRY<br />

• Naturally occurring proteins and nucleotides<br />

• Chemically modified peptides and proteins<br />

• <strong>Sequence</strong>s deduced from gene translations<br />

• GenBank sequences and translations<br />

• Multichain proteins<br />

• Cyclic peptides<br />

• Fusion proteins<br />

• Peptide-metal complexes<br />

• <strong>Sequence</strong>s containing uncommon amino acids<br />

• Protein-nucleotide sequences (PNA sequences)<br />



<strong>Sequence</strong> searching in CAS REGISTRY<br />

has many advantages<br />

• Links to CAplus and other databases for<br />

references<br />

– Add date restrictions<br />

– Add keywords for relevance<br />

• Links to electronic full text of journal articles<br />

and patents<br />

• Current awareness alerts based on similarity<br />

search results<br />



Agenda<br />

• <strong>STN</strong> sequence searchable databases<br />

• Comparison to web-based resources<br />

• Results of a comparative search example<br />

• Summary and resources<br />



There are three main web resources that<br />

provide searchable patent sequence data<br />

• National Center for Biotechnology Information<br />

(NCBI) of the U.S. National Library of Medicine<br />

– www.ncbi.nlm.nih.gov<br />

• European Bioinformatics Institute* (EBI)<br />

– www.ebi.ac.uk<br />

• DNA DataBank of Japan (DDBJ)<br />

– www.ddbj.nig.ac.jp<br />

• The USPTO, EPO, and JPO rely on the NCBI,<br />

EBI, and DDBJ, respectively, to provide an<br />

interface for searching patent sequence data<br />

(* The EBI is the U.K.-based outstation of the European Molecular Biology Laboratory – EMBL)<br />



Why does <strong>STN</strong> have more patent<br />

sequences than NCBI?<br />

1. NCBI has a substantially incomplete collection of<br />

WIPO/PCT sequence data<br />

2. NCBI does not cover any USPTO Published<br />

Application sequence data<br />

3. DGENE and REGISTRY cover many more patent<br />

authorities than NCBI<br />

4. DGENE, USGENE, PCTGEN, and REGISTRY<br />

are typically much more timely than NCBI<br />

5. NCBI does not cover all of the USPTO granted<br />

patent sequence data available in USGENE<br />



In addition, NCBI/EMBL/DDBJ patent<br />

records have only minimal bibliographic<br />

and text data<br />

Reminder: NCBI/EMBL/DDBJ cover<br />

sequences from U.S. granted<br />

patents – sequences from U.S.<br />

published applications are not<br />

covered (see previous slide).<br />



Agenda<br />

• <strong>STN</strong> sequence searchable databases<br />

• Comparison to web-based resources<br />

• Results of a comparative search example<br />

• Summary and resources<br />



A simple BLAST ® example shows the<br />

importance of using <strong>STN</strong> databases<br />

Search Question<br />

Find all patent references to Breast Cancer 1, early<br />

onset isoform 1 (NCBI: NP_009225), or other very<br />

similar proteins (i.e. >80% match).<br />

(Search conducted on June 12, 2008)<br />



Homo sapiens Breast Cancer 1, early onset<br />

isoform 1 protein (NCBI: NP_009225)<br />



The best answer from DGENE<br />

=> D BIB AB PSL ORGN SCORE ALIGN<br />

114 sequence hits were<br />

found in DGENE.<br />

L2 ANSWER 1 OF 114 DGENE COPYRIGHT 2009 THOMSON REUTERS on <strong>STN</strong><br />

AN AFU71404 protein DGENE<br />

TI Diagnosing breast cancer or a susceptibility to breast cancer<br />

comprises detecting BRCA2 999del5 and BARD1 Cys557Ser.<br />

IN Stacey S N; Sulem P; Thorsteinsdottir U; Kong A<br />

PA (STAC-I) STACEY S N. . . . .<br />

PI US 2007092900 A1 20070426 65<br />

AI US 2006-515368 20060831<br />

PRAI US 2005-730703P 20051026<br />

OS 2007-556498 [54]<br />

CR N-PSDB: AFU71403<br />

DESC Human breast cancer 1 (BRCA1) protein.<br />

PSL Disclosure; SEQ ID NO 2<br />

AB The present invention relates to a methods for diagnosing and<br />

characterizing breast cancer or a susceptibility to breast cancer in<br />

an individual comprising detecting breast cancer 2 (BRCA2) 999del5<br />

and BRCA1 associated RING domain 1 (BARD1) Cys557Ser. . . . .<br />

ORGN Homo sapiens.<br />

SCORE 3722 100% of query self score 3722<br />

BLASTALIGN<br />

Query = 1863 letters<br />

Length = 1863<br />

Score = 3722 bits (9652), Expect = 0.0<br />

Identities = 1863/1863 (100%), Positives = 1863/1863 (100%)<br />

Query: 1 MDLSALRVEEVQNVINAMQKILECPICLELIKEPVSTKCDHIFCKFCMLKLLNQKKGPSQ<br />

MDLSALRVEEVQNVINAMQKILECPICLELIKEPVSTKCDHIFCKFCMLKLLNQKKGPSQ<br />

Sbjct: 1 MDLSALRVEEVQNVINAMQKILECPICLELIKEPVSTKCDHIFCKFCMLKLLNQKKGPSQ<br />

DGENE records feature<br />

extensive Thomson Reuters<br />

value-added content.<br />
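The SCORE and Identities lines in the display above report two different percentages: the hit score as a share of the query self score, and the share of identical positions in the alignment. The following is a minimal sketch in Python of how each figure can be computed and compared with the >80% criterion of the search question. The function names are illustrative, and which of the two measures the ">80% match" criterion refers to is an assumption left open here.

```python
def percent_identity(query_row: str, subject_row: str) -> float:
    """Share of aligned columns with identical residues, in percent.

    Both rows must come from the same alignment and have equal length.
    Gap columns ('-') count toward the length but never as a match.
    """
    if len(query_row) != len(subject_row):
        raise ValueError("aligned rows must have the same length")
    if not query_row:
        raise ValueError("alignment is empty")
    matches = sum(
        1 for q, s in zip(query_row, subject_row) if q == s and q != "-"
    )
    return 100.0 * matches / len(query_row)


def percent_of_self_score(hit_score: float, self_score: float) -> float:
    """Hit score expressed as a percentage of the query self score."""
    return 100.0 * hit_score / self_score


def exceeds(value: float, threshold: float = 80.0) -> bool:
    """True if the percentage is strictly above the threshold."""
    return value > threshold


# First alignment row from the display above (query and subject are identical).
row = "MDLSALRVEEVQNVINAMQKILECPICLELIKEPVSTKCDHIFCKFCMLKLLNQKKGPSQ"

print(percent_identity(row, row), exceeds(percent_identity(row, row)))
print(percent_of_self_score(3722, 3722))
```

For a full hit, the calculation would use the complete alignment rather than the first 60 residues shown on the slide.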



The best answer from USGENE<br />

=> D BIB AB ECLM ORGN SCORE ALIGN<br />

70 sequence hits were<br />

found in USGENE.<br />

L4 ANSWER 1 OF 70 USGENE COPYRIGHT 2009 SEQUENCEBASE CORP on <strong>STN</strong><br />

AN 20070083334.873670 Protein USGENE<br />

TI Methods and systems for annotating biomolecular sequences<br />

(PublishedApplication)<br />

IN Mintz Liat (Kendall-Park, NJ); Xie Hanqing (Plainsboro, NJ); . . . .<br />

PA Compugen Ltd<br />

PI US 20070083334 A1 20070412<br />

AI US 2006-443428 20060531<br />

DT Patent<br />

AB A method of annotating biomolecular sequences. The method comprises<br />

(a) computationally clustering the biomolecular sequences according<br />

to a progressive homology range, to thereby generate a plurality of<br />

clusters each being of a predetermined homology . . . .<br />

ECLM US20070083334 A1: 1-98. (canceled) 99. An isolated protein, having a<br />

sequence according to the amino acid sequence of SEQ ID NO: 836875.<br />

ORGN Homo Sapiens<br />

SCORE 3722 100% of query self score 3722<br />

BLASTALIGN<br />

Query = 1863 letters<br />

Length = 1863<br />

Score = 3722 bits (9652), Expect = 0.0<br />

Identities = 1863/1863 (100%), Positives = 1863/1863 (100%)<br />

Query: 1 MDLSALRVEEVQNVINAMQKILECPICLELIKEPVSTKCDHIFCKFCMLKLLNQKKGPSQ<br />

MDLSALRVEEVQNVINAMQKILECPICLELIKEPVSTKCDHIFCKFCMLKLLNQKKGPSQ<br />

Sbjct: 1 MDLSALRVEEVQNVINAMQKILECPICLELIKEPVSTKCDHIFCKFCMLKLLNQKKGPSQ<br />

Note that this best answer differs<br />

from that found in DGENE.<br />



The best answer from PCTGEN<br />

=> D BIB ORGN SCORE ALIGN<br />

L6 ANSWER 1 OF 7 PCTGEN COPYRIGHT 2009 WIPO on <strong>STN</strong><br />

AN 2007047796.8180 PRT PCTGEN<br />

TI TISSUE- AND SERUM-DERIVED GLYCOPROTEINSAND METHODS OF THEIR USE<br />

PA Institute for Systems Biology<br />

Zhang, Hui<br />

Aebersold, Rudolf H.<br />

PI WO 2007047796 20070426<br />

RLI US 2005-728044P 20051017<br />

ED 20070427<br />

DT Patent<br />

ORGN Homo sapiens<br />

SCORE 3722 100% of query self score 3722<br />

BLASTALIGN<br />

Query = 1863 letters<br />

Length = 1863<br />

Score = 3722 bits (9652), Expect = 0.0<br />

Identities = 1863/1863 (100%), Positives = 1863/1863 (100%)<br />

Query: 1 MDLSALRVEEVQNVINAMQKILECPICLELIKEPVSTKCDHIFCKFCMLKLLNQKKGPSQ<br />

MDLSALRVEEVQNVINAMQKILECPICLELIKEPVSTKCDHIFCKFCMLKLLNQKKGPSQ<br />

Sbjct: 1 MDLSALRVEEVQNVINAMQKILECPICLELIKEPVSTKCDHIFCKFCMLKLLNQKKGPSQ<br />

7 sequence hits were found in PCTGEN.<br />

Note that this is a different best answer from those found in DGENE and USGENE (previous slides).<br />



The best patent answer from REGISTRY<br />

Note that this is the same<br />

best answer found in<br />

DGENE.<br />

53 sequence hits were found<br />

in REGISTRY.<br />



The best patent answer from NCBI<br />

32 sequence hits were<br />

found at NCBI.<br />

Note that these patents are not the same<br />

best answers seen in DGENE, USGENE,<br />

PCTGEN, or REGISTRY.<br />

Reminder: NCBI provides only protein<br />

patent sequence coverage from U.S. issued<br />

patents. U.S. and WIPO/PCT (WO)<br />

published application protein sequences<br />

cannot be retrieved from NCBI.<br />



Summary of results for Breast Cancer 1,<br />

early onset isoform 1 protein (NCBI:<br />

NP_009225)<br />

Database     |SEQs > 80%|PNs|Patent families<br />
=============+==========+===+===============<br />
DGENE        |114       |36 |26<br />
USGENE       |70        |38 |22<br />
PCTGEN       |7         |4  |4<br />
REGISTRY     |52        |30 |23<br />
NCBI*        |32        |21 |9<br />
Total Unique |-         |-  |34<br />

(* All of the NCBI patent hits were for U.S. patents, and none of them were unique compared to USGENE.)<br />
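The patent family counts in the summary table above can be turned into a coverage figure per database, that is, the share of the 34 unique families each source retrieves on its own. The following is a minimal sketch in Python using only the numbers from the table; the names are illustrative.

```python
# Patent family counts per database, taken from the summary table above.
families_found = {
    "DGENE": 26,
    "USGENE": 22,
    "PCTGEN": 4,
    "REGISTRY": 23,
    "NCBI": 9,
}
TOTAL_UNIQUE_FAMILIES = 34  # "Total Unique" row of the table


def coverage_pct(database: str) -> float:
    """Share of all unique patent families found by one database, in percent."""
    return 100.0 * families_found[database] / TOTAL_UNIQUE_FAMILIES


# List the databases from highest to lowest single-source coverage.
for name in sorted(families_found, key=coverage_pct, reverse=True):
    print(f"{name:<9} {families_found[name]:>2} of {TOTAL_UNIQUE_FAMILIES} "
          f"families ({coverage_pct(name):.1f}%)")
```

The per-database counts add up to more than 34 because the same family is often found in several databases. No single source reaches the full set, which is the argument for searching the databases together.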



Agenda<br />

• <strong>STN</strong> sequence searchable databases<br />

• Comparison to web-based resources<br />

• Results of a comparative search example<br />

• Summary and resources<br />



Comparing databases<br />

• DGENE<br />

– The most comprehensive global patent<br />

sequence database<br />

– Implemented in-house at major patent offices*<br />

• USGENE<br />

– <strong>Sequence</strong>s from equivalent USPTO<br />

applications and patents<br />

– Incorporates all U.S. patent sequence data<br />

available at NCBI<br />

* See page 11: www.trilateral.net/projects/biotechnology/guide2.pdf<br />



Comparing databases (cont.)<br />

• PCTGEN<br />

– <strong>Sequence</strong>s from equivalent WIPO/PCT<br />

publications<br />

– The complete collection of e-published<br />

sequences from WIPO<br />

• REGISTRY<br />

– Complementary value-added indexing to<br />

DGENE<br />

– Unique non-patent literature coverage<br />



Database timeliness is an important factor<br />

in understanding comprehensiveness<br />

Database  |Update frequency|Typical timeliness<br />
==========+================+==================<br />
PCTGEN    |Weekly          |24 hours<br />
USGENE    |Weekly          |3 days<br />
REGISTRY  |Daily           |27 days<br />
DGENE     |Biweekly        |65 days<br />
NCBI/EMBL |Daily           |1-6 months<br />

Value<br />

added<br />



Resources for sequence searching on <strong>STN</strong><br />

• <strong>Sequence</strong> Searching on <strong>STN</strong> modular workshop<br />

www.fiz-k.com/bostonsequenceworkshop<br />

– <strong>Sequence</strong> Code Match (SCM) searching<br />

– DGENE, USGENE, PCTGEN content and searching<br />

– CAS REGISTRY and REGISTRY BLAST<br />

– Multifile searching and post-processing<br />

• USGENE resources, reference materials and FAQ<br />

www.sequencebase.com<br />

• CAS REGISTRY sequence coverage and<br />

resources<br />

www.cas.org/support/stngen/stndoc/sequences.html<br />



For more information …<br />

CAS<br />

• E-mail: help@cas.org<br />

• Training and Support:<br />

www.cas.org/support/<br />

<strong>FIZ</strong> <strong>Karlsruhe</strong><br />

• E-mail: helpdesk@fiz-karlsruhe.de<br />

• Training and Support:<br />

www.stn-international.de<br />

Feb. 2010
