Annotated Borrelia burgdorferi B31 Plasmid Nucleotide Sequences

Fraser et al. (1997) and Casjens et al. (2000)

Compiled by Sherwood Casjens, Daniel Haft, Jeremy Peterson, Brian Stevenson and Claire Fraser
Last modified on Feb. 3, 2000.

This document is also available as Macintosh Microsoft WORD 5.1 and OFFICE ’98 (WORD’98) documents from Sherwood Casjens.

Please send corrections, additions, comments, etc. to sherwood.casjens@hci.utah.edu.

Table of Contents

 

Introduction and Summary

Part I: Annotated Complete B31 Plasmid Gene/Pseudogene List

Part II: B31 Paralogous Gene Families

Part III: Putative B31 Plasmid Lipoprotein Genes

Part IV: The Pseudo-, Questionable, and Short Genes on the B31 Plasmids

Part V: Short Sequence Repeats in the B31 Plasmids

Part VI: Ambiguous Nucleotides in the B. burgdorferi B31 Genome Sequence

Part VII: Genome Sequence Assembly Methods

Part VIII: References

ORGANIZATION OF THIS DOCUMENT (read me first)

PURPOSE

This document contains a number of tables which cross-annotate the current knowledge of the B. burgdorferi B31 genome in various ways. We hope that this cross-referencing will allow readers to browse through the information profitably, and that it will allow them to become familiar with what is not known as well as what is known about this genome. Major conclusions from this analysis are published in Fraser et al. (1997) and Casjens et al. (2000).

ORGANIZATION

In each section of this document plasmids are listed with circular plasmids ascending in number (approximate size) followed by linear plasmids ascending in number as follows:

cp9, cp26, cp32-1, cp32-3, cp32-4, cp32-6, cp32-7, cp32-8, cp32-9,

lp5, lp17, lp21, lp25, lp28-1, lp28-2, lp28-3, lp28-4, lp36, lp38, lp54, lp56
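The ordering convention above (circular before linear, then ascending plasmid number) can be expressed as a sort key. The following is a minimal sketch, not part of the original analysis; the function name `plasmid_sort_key` and the regular expression are illustrative assumptions based only on the plasmid names listed here.

```python
import re

def plasmid_sort_key(name):
    """Sort key: circular (cp) before linear (lp), then by number, then by suffix.

    Assumes names of the form cpNN, lpNN, cpNN-M or lpNN-M, as listed above.
    """
    match = re.fullmatch(r"(cp|lp)(\d+)(?:-(\d+))?", name)
    if match is None:
        raise ValueError(f"unrecognized plasmid name: {name!r}")
    topology, number, suffix = match.groups()
    return (0 if topology == "cp" else 1, int(number), int(suffix or 0))

names = ["lp54", "cp32-3", "lp5", "cp9", "lp28-2", "cp26", "lp28-1", "cp32-1"]
ordered = sorted(names, key=plasmid_sort_key)
print(ordered)
```

Sorting numerically rather than alphabetically matters here: a plain string sort would place lp17 before lp5.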

OPEN READING FRAMES and PREDICTED GENES

Throughout this document we use the words "gene" and "protein" advisedly to mean putative gene and putative protein that has been predicted from the nucleotide sequence. Since little molecular biology has been done with these organisms, nearly all of the "genes" in this document are currently only identified as open reading frames.

GENBANK ACCESSION NUMBERS and GENE NAME PREFIXES

The B. burgdorferi B31 chromosome and plasmid sequences are available at the TIGR Borrelia web site or from GENBANK. The accession numbers from GENBANK and gene name prefixes are as follows (as reported in Fraser et al. (1997) and Casjens et al. (2000)):

Replicon | Accession # | Gene name prefix
Chromosome | AE000783 | BB0 (BBzero)
cp9 | AE000791 | BBC
cp26 | AE000792 | BBB
cp32-1 | AE001575 | BBP
cp32-3 | AE001576 | BBS
cp32-4 | AE001577 | BBR
cp32-6 | AE001578 | BBM
cp32-7 | AE001579 | BBO (BB"oh")
cp32-8 | AE001580 | BBL
cp32-9 | AE001581 | BBN
lp5 | AE001583 | BBT
lp17 | AE000793 | BBD
lp21 | AE001582 | BBU
lp25 | AE000785 | BBE
lp28-1 | AE000794 | BBF
lp28-2 | AE000786 | BBG
lp28-3 | AE000784 | BBH
lp28-4 | AE000789 | BBI
lp36 | AE000788 | BBK
lp38 | AE000787 | BBJ
lp54 | AE000790 | BBA
lp56 | AE001584 | BBQ
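Because each replicon has its own three-character gene name prefix, a gene name can be traced back to its replicon with a simple lookup. The sketch below is illustrative only; the dictionary is transcribed from the table above, while the function name `replicon_for` and the example chromosomal name "BB0001" are assumptions made for the example.

```python
# Gene name prefix -> replicon, transcribed from the table above.
PREFIX_TO_REPLICON = {
    "BB0": "chromosome",  # BB-zero
    "BBC": "cp9", "BBB": "cp26",
    "BBP": "cp32-1", "BBS": "cp32-3", "BBR": "cp32-4", "BBM": "cp32-6",
    "BBO": "cp32-7",      # BB-"oh", easily confused with BB-zero
    "BBL": "cp32-8", "BBN": "cp32-9",
    "BBT": "lp5", "BBD": "lp17", "BBU": "lp21", "BBE": "lp25",
    "BBF": "lp28-1", "BBG": "lp28-2", "BBH": "lp28-3", "BBI": "lp28-4",
    "BBK": "lp36", "BBJ": "lp38", "BBA": "lp54", "BBQ": "lp56",
}

def replicon_for(gene_name):
    """Return the replicon implied by the first three characters of a gene name."""
    prefix = gene_name.strip().upper()[:3]
    try:
        return PREFIX_TO_REPLICON[prefix]
    except KeyError:
        raise ValueError(f"unknown gene name prefix in {gene_name!r}") from None

print(replicon_for("BBO39"))   # letter O: cp32-7
print(replicon_for("BB0001"))  # digit zero: chromosome (hypothetical example name)
```

The explicit pair of BB0 and BBO entries mirrors the warning in the table: the two prefixes differ only in a zero versus the letter O.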

 

 

 

 

 

 

 

 

B31 Plasmid Open Reading Frame Summary

Sherwood Casjens - 1999

ALL B31 PLASMIDS

898 total gene-like entities. Among these gene-like entities are the following:

836 genes (which are not "questionable") + pseudogenes

167 pseudogenes (+ about 10 others that have marginal similarity to "intact" genes)

62 "questionable" genes (29 in-frame fragments of larger pseudogenes; 33 ≤300 bp genes inside a larger pseudogene in another frame and short genes that were not called in paralogous sequence elsewhere on the plasmids).

669 "intact" genes (which are not "questionable")

39 convincing similarity hits to genes of known function outside of Borrelia among plasmid genes

16 convincing similarity hits to genes of unknown function outside of Borrelia among plasmid genes

535 "intact" genes >300 bp (which are not "questionable")

134 "intact" genes ≤300 bp (which are not "questionable")

472 genes (which are not "questionable") have a paralog (it may not be intact)

197 genes (which are not "questionable") have no paralog (63 of these are >300 bp and 134 are ≤300 bp)

98 plasmid gene-like entities that encode potential lipoproteins

90 intact plasmid genes that encode potential lipoproteins

7 gene-like entities that we defined as pseudogenes have translation start codons that could possibly lead to expression of lipoproteins that are truncated relative to their paralogs

32 intact plasmid genes that are below but close to our lipidation cutoff

162 paralogous gene families, 107 of which have plasmid-borne members

9 paralogous gene families encode only predicted lipoproteins

17 paralogous gene families are heterogeneous in that at least 1 potential LP and at least one non-LP is found in the family
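The tallies in this list are nested: each total splits into the sub-counts listed beneath it. The following minimal sketch re-adds the figures quoted above to show how they relate; the variable names are illustrative and the grouping reflects our reading of the list, not a statement from the original analysis.

```python
# Figures transcribed from the "ALL B31 PLASMIDS" summary above.
total_entities = 898
questionable = 62
questionable_in_frame = 29   # in-frame fragments of larger pseudogenes
questionable_other = 33      # short genes inside pseudogenes or not called in paralogs
genes_plus_pseudogenes = 836
pseudogenes = 167
intact = 669
intact_long = 535            # > 300 bp
intact_short = 134           # 300 bp or shorter
with_paralog = 472
without_paralog = 197
without_paralog_long = 63
without_paralog_short = 134

checks = {
    "entities = (genes + pseudogenes) + questionable":
        genes_plus_pseudogenes + questionable == total_entities,
    "questionable = in-frame fragments + other":
        questionable_in_frame + questionable_other == questionable,
    "genes + pseudogenes = intact + pseudogenes":
        intact + pseudogenes == genes_plus_pseudogenes,
    "intact = long + short":
        intact_long + intact_short == intact,
    "intact = with paralog + without paralog":
        with_paralog + without_paralog == intact,
    "without paralog = long + short":
        without_paralog_long + without_paralog_short == without_paralog,
}

for label, ok in checks.items():
    print(("ok   " if ok else "FAIL ") + label)
```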

 

 

THE "LOW PSEUDOGENE" or "WELL BEHAVED" B31 PLASMIDS

These plasmids are: cp9, cp26, all seven of the cp32s, lp28-2, lp54 and the cp32-like portion of lp56

498 gene-like entities on the "well behaved" plasmids on which apparent protein-encoding genes occupy >70% of the DNA.

9 "questionable" genes (all are ≤300 bp genes inside a larger pseudogene in another frame or short genes that were not called in paralogous sequence elsewhere on the plasmids).

489 genes (which are not "questionable") + pseudogenes

22 pseudogenes

467 genes (which are not "questionable")

420 genes >300 bp (which are not "questionable")

47 genes ≤300 bp (which are not "questionable")

54 genes that encode potential lipoproteins

12 genes that are below but close to our lipidation cutoff

23 convincing matches to genes of known function outside of Borrelia among plasmid genes (which are not "questionable")

13 convincing matches to genes of unknown function outside of Borrelia among plasmid genes (which are not "questionable")

 

THE "HIGH PSEUDOGENE" or "NOT YET AMELIORATED" B31 PLASMIDS

These plasmids are: lp5, lp17, lp21, lp25, lp28-1, lp28-3, lp28-4, lp36, lp38 and the non-cp32-like portion of lp56

400 gene-like entities on the "bad" plasmids on which apparent protein-encoding genes occupy <75% of the DNA.

53 "questionable" genes (29 in-frame fragments of larger pseudogenes; 24 ≤300 bp genes inside a larger pseudogene in another frame and short genes that were not called in paralogous sequence elsewhere on the plasmids).

347 genes (which are not "questionable") + pseudogenes

145 pseudogenes

202 genes (which are not "questionable")

115 genes >300 bp (which are not "questionable")

87 genes ≤300 bp (which are not "questionable")

37 genes that encode potential lipoproteins

5 genes that are below but close to our lipidation cutoff

16 convincing matches to genes of known function outside of Borrelia among plasmid genes (which are not "questionable")

3 convincing matches to genes of unknown function outside of Borrelia among plasmid genes (which are not "questionable")
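Since every plasmid (or plasmid portion) falls into exactly one of the two groups, the per-group tallies should add up to the all-plasmid tallies. The sketch below checks this for the categories that are stated on the same basis in all three lists; the dictionary keys are our own labels, chosen for illustration.

```python
# Figures transcribed from the three summary lists above.
all_plasmids = {
    "gene-like entities": 898,
    "questionable genes": 62,
    "genes + pseudogenes": 836,
    "pseudogenes": 167,
    "intact genes": 669,
    "intact genes > 300 bp": 535,
    "intact genes <= 300 bp": 134,
    "hits to genes of known function": 39,
    "hits to genes of unknown function": 16,
}
well_behaved = {
    "gene-like entities": 498,
    "questionable genes": 9,
    "genes + pseudogenes": 489,
    "pseudogenes": 22,
    "intact genes": 467,
    "intact genes > 300 bp": 420,
    "intact genes <= 300 bp": 47,
    "hits to genes of known function": 23,
    "hits to genes of unknown function": 13,
}
high_pseudogene = {
    "gene-like entities": 400,
    "questionable genes": 53,
    "genes + pseudogenes": 347,
    "pseudogenes": 145,
    "intact genes": 202,
    "intact genes > 300 bp": 115,
    "intact genes <= 300 bp": 87,
    "hits to genes of known function": 16,
    "hits to genes of unknown function": 3,
}

# Collect any category where the two groups do not sum to the overall figure.
mismatches = {
    key: (well_behaved[key] + high_pseudogene[key], total)
    for key, total in all_plasmids.items()
    if well_behaved[key] + high_pseudogene[key] != total
}
print(mismatches or "all listed categories sum to the all-plasmid totals")
```

The lipoprotein and near-cutoff tallies are left out of this check because the per-group figures (54 + 37 and 12 + 5) do not sum to the overall figures as listed (90 or 98, and 32); the lists do not say why, so they may be counted on different bases.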

 

Part I

Annotated B. burgdorferi B31 Plasmid Gene List

Compiled by Sherwood Casjens, Dan Haft and Jeremy Peterson - April 1999

Definitions for Gene List

Note that these definitions are NOT necessarily absolutely identical to those used in the other published gene lists and maps for B. burgdorferi or on the TIGR WEB site. In particular we have an expanded definition of "pseudogene" that includes truncated members of paralogous gene families.

Putative genes and gene names column lists all the putative "gene-like entities" - genes and pseudogenes - currently recognized in the twenty-one B. burgdorferi B31 plasmids. We tentatively interpret those genes not indicated to be pseudogenes to be intact and potentially functional, but since the functionality of most Borrelia genes is unknown this may not be true. The gene and plasmid names used here are those used in Fraser et al. (1997) and Casjens et al. (2000). Of course any given putative pseudo-, questionable, short, fragmented or frameshifted genes could in principle have an important function, but it seems likely that a substantial fraction of them are not functional.

Daggers mark computer-recognized ORFs that are in frame with, and part of, a larger pseudogene entity. To avoid counting the entity twice, these were ignored when compiling gene and pseudogene numbers in Casjens et al. (2000).

Coordinates - these columns list the positions of the 5’ and 3’ ends of the gene or pseudogene on the sequence of the relevant plasmid.
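Because the two coordinate columns give the 5’ and 3’ ends rather than a left and right end, the orientation and length of each entry can be derived from them. The sketch below is an illustration under two assumptions that the text does not state explicitly: an entry whose 5’ coordinate is larger than its 3’ coordinate lies on the complementary strand, and no entry wraps around the origin of a circular plasmid. The class name `GeneSpan` is hypothetical.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class GeneSpan:
    """One gene-list entry: name plus 5' and 3' end coordinates."""
    name: str
    end5: int
    end3: int

    @property
    def strand(self):
        # Assumption: 5' end > 3' end means the complementary (reverse) strand.
        return "+" if self.end5 <= self.end3 else "-"

    @property
    def length_bp(self):
        # Inclusive coordinates; assumes the span does not cross the plasmid origin.
        return abs(self.end3 - self.end5) + 1

for span in (GeneSpan("BBC01", 163, 1269), GeneSpan("BBC04", 2700, 2593)):
    print(span.name, span.strand, span.length_bp, "bp")
```

With the cp9 coordinates from the gene list, BBC01 comes out as a 1107 bp forward-strand entry and BBC04 as a 108 bp reverse-strand entry, consistent with BBC04 being annotated as a short gene.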

Database hit outside Borrelia indicates all similarities to non-Borrelia sequences in the extant database as of January 1999. The criteria for inclusion in the list are those of the TIGR protocol, which uses BLAST (Altschul et al., 1997), and alignments can be found on the TIGR Borrelia WEB page. A search using EMOTIF (Nevill-Manning et al., 1998) did not find any additional convincing B31 plasmid gene similarities to previously known genes.

Common name column gives gene names previously used in the literature. If it was previously named in a strain other than B31, the Borrelia strain is given in parentheses. In addition, we and others have suggested more specific, clarifying common names for genes currently under study in the following paralogous families: mlp [family 113], bdr [80], rev [63] and erp [162/163/164] genes.

Paralog family column indicates the family of paralogous genes (homologs within B. burgdorferi B31) to which individual genes belong. A complete list of genes and pseudogenes in each of these paralogous gene families can be found in PART II of this document.

Comments Column

N-terminal lipidation consensus refers to genes whose products are most likely to be lipoproteins.

Near-consensus N-terminal lipidation signal refers to genes whose products may be lipoproteins, but whose N-terminal amino acid sequences did not quite meet the arbitrary cutoff that we set for criteria for inclusion in the "probable lipoproteins" category.

See PART III of this document for a discussion of the strategies used to identify possible lipoprotein encoding genes.

Authentic frameshift genes contain one or a few simple frameshifts relative to their paralogs. It is unlikely that these are actually expressed by programmed frameshifting mechanisms, since they usually do not contain the expected translationally "slippery" sequences. The TIGR computer uses this term for damaged genes (hence it currently replaces "pseudogene" in some parts of the TIGR Borrelia web page). These are considered to be pseudogenes in this analysis (Casjens et al., 2000).

Authentic point mutation gene has an in-frame stop codon relative to its paralogs. These are considered to be pseudogenes in this analysis.

Gene fragments or truncated genes are substantially shorter than other members of their paralogous families. Some of these could be expressed and have a function, although they are included in the pseudogene category for ease of discussion in this analysis and to point out that they are truncated.

Pseudogenes are regions of DNA that are similar in sequence to a paralogous Borrelia gene or to a gene from another organism, but which are obviously truncated and/or do not have full open reading frames relative to those homologs. These mostly appear to be mutationally damaged genes - they include "authentic frameshift", "authentic point mutation", fused and truncated genes. These pseudogenes often contain multiple frameshifts, deletions, insertions and inversions (see Casjens et al., 2000).

Exceptions to this definition of a pseudogene are the 15 silent vlsE cassettes on lp28-1; these are not damaged and are apparently "designed" to be a reservoir of antigenic variation for the vlsE protein. They are pseudogenes in that they are incomplete relative to the expressed vlsE gene and are probably not expressed themselves.

Of course the gene fragments whose reading frames are intact, that we include in this category for ease of discussion, could in fact be expressed and if so could perform a function. Nonetheless such fragments are very unusual in prokaryotes, and given the other evidence for many rearrangements in the B31 plasmids (Casjens et al., 2000) it seems likely that many, if not all of such fragments, may no longer have a biological function.

See PART IV of this document for a complete list of pseudogenes and the reasons why each is so classified.

"Questionable genes" were called by TIGR’s standard gene recognition protocol, but there is reason to suspect they may be spurious calls. Examples are "computer-called genes" that lie inside another gene or pseudogene, and small genes that were not called in paralogous sequence elsewhere in the Borrelia sequence. Those marked with daggers (†) lie inside larger pseudogenes but were nonetheless called as genes by the TIGR protocol.

See PART IV of this document for a complete list of questionable genes and the reasons why each is so classified.

Short genes are <300 bp in length but ARE NOT in the "questionable" or "pseudogene" categories. The Borrelia plasmids have an inordinately large fraction of called genes that are <300 bp in length. These are often not tightly packed and fall into regions that contain no larger genes. Of course any given putative short gene could in principle be functional, but it seems likely that a substantial fraction of them are not functional.

See PART IV of this document for a complete list of short plasmid "genes".
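The categories defined above are mutually exclusive for counting purposes: an entry is tallied as questionable, or as a pseudogene, or as a short gene, or as an ordinary intact gene. The following minimal sketch shows one way to apply that precedence. It is an illustration, not the procedure used in the analysis; the function name, the flag arguments and the order "questionable before pseudogene" are assumptions inferred from how the summary tallies are split, and the inclusive 300 bp bound follows the summary tallies rather than the "<300 bp" wording of the definition.

```python
def classify(length_bp, is_pseudogene=False, is_questionable=False, short_max_bp=300):
    """Assign a gene-like entity to one counting category.

    Precedence (assumed): questionable > pseudogene > short gene > gene.
    short_max_bp is inclusive, an assumption based on the summary tallies.
    """
    if is_questionable:
        return "questionable gene"
    if is_pseudogene:
        return "pseudogene"
    if length_bp <= short_max_bp:
        return "short gene"
    return "gene"

print(classify(108))                       # e.g. a 108 bp entry with no flags
print(classify(1107))                      # e.g. a 1107 bp entry with no flags
print(classify(102, is_pseudogene=True))   # truncated family member
```

Keeping the size test last ensures that a short pseudogene or a short questionable call is never also counted as a "short gene", which matches the definition given above.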

Putative functions were deduced in most cases from homologies to genes of known function.

WE EMPHASIZE ONE MORE TIME! Any given putative pseudo-, questionable, short, fragmented or frameshifted gene (as we have defined them) could in principle be functional. But it seems likely that a substantial fraction of them are not functional. We use the above pseudogene definitions only as terms to describe relevant features of the B31 plasmid genes, not to imply functionality in any specific cases.

A Complete B. burgdorferi B31 Plasmid Gene List

Putative Gene | 5’ end | 3’ end | Database hit outside Borrelia {organism of best database hit} | Common Name | Paralog Family | Comments/References

cp9: A homolog of cp9, called cp8.3, from B. garinii strain Ip21 was completely sequenced by Dunn et al. (1994)

BBC01 | 163 | 1269 | - | - | 57 | -
BBC02 | 1282 | 1836 | - | - | 50 | -
BBC03 | 1892 | 2449 | - | - | 49 | -
BBC04 | 2700 | 2593 | - | - | - | short gene
BBC05 | 2804 | 3709 | - | - | 161 | -
BBC06 | 4377 | 3856 | - | eppA | 95 | exported protein (Champion et al., 1994)
BBC07 | 4788 | 4507 | - | - | - | short gene
BBC08 | 5534 | 5977 | - | - | 55 | -
(BBC09) | - | - | - | - | - | Does not exist; erroneously present in original gene list and map in figure 2 of Fraser et al. (1997)
BBC10 | 6808 | 6284 | - | - | 63 | N-terminal lipidation consensus
BBC11 | 6974 | 7768 | - | - | 96 | -
BBC12 | 9203 | 7914 | - | - | 165 | -
             

cp26: Homolog of cp26 present in essentially all isolates (e.g., Tilly et al., 1997)

BBB01 | 16 | 321 | conserved hypothetical protein {Escherichia coli} | - | - | weak similarity to acylphosphatase
BBB02 | 751 | 311 | - | - | - | -
BBB03 | 2186 | 840 | weak (Ψ-BLAST) similarity to phage N15 gene 29 | - | - | The protein encoded by this gene has weak similarity to the putative "protelomerase" encoded by gene 29 of phage N15 (Ravin et al., in preparation). Circumstantial evidence suggests this N15 protein is responsible for hairpin end formation in the N15 prophage plasmid.
BBB04 | 3807 | 2479 | PTS system, cellobiose-specific IIC component (celB) {Bacillus stearothermophilus} | - | - | possible chitobiose transporter (Fraser et al., 1997)
BBB05 | 4084 | 4428 | PTS system, cellobiose-specific IIA component (celC) {Bacillus subtilis} | - | - | possible chitobiose transporter (Fraser et al., 1997)
BBB06 | 4440 | 4754 | PTS system, cellobiose-specific IIB component (celA) {Bacillus subtilis} | - | - | possible chitobiose transporter (Fraser et al., 1997)
BBB07 | 4769 | 5863 | - | - | - | -
BBB08 | 6517 | 5891 | - | - | - | N-terminal lipidation consensus
BBB09 | 6677 | 7711 | - | - | - | N-terminal lipidation consensus
BBB10 | 7836 | 8762 | - | - | 62 | -
BBB11 | 8781 | 9296 | - | - | 50 | -
BBB12 | 9275 | 10033 | plasmid partition protein {Bacillus subtilis} | - | 32 | putative plasmid partition function
BBB13 | 10104 | 10649 | - | - | 49 | -
BBB14 | 11417 | 10923 | - | - | - | N-terminal lipidation consensus
BBB15 | 11636 | 11737 | - | - | - | short gene
BBB16 | 12014 | 13603 | oligopeptide ABC transporter, periplasmic oligopeptide-binding protein {Escherichia coli} | oppAIV | 37 | N-terminal lipidation consensus, not surface exposed, and not essential in culture (Bono et al., 1998)
BBB17 | 15107 | 13896 | IMP dehydrogenase {Haemophilus influenzae} | guaB | - | IMP dehydrogenase (Margolis et al., 1994b; Zhou et al., 1997)
BBB18 | 16718 | 15135 | GMP synthase {Haemophilus influenzae} | guaA | - | putative GMP synthase (Margolis et al., 1994b); erroneous duplication in cp26 between BBB18 and BBB19 corrected in current gene list; affected originally released gene coordinates to right of BBB18
BBB19 | 16903 | 17532 | - | ospC | - | surface localized (Wilske et al., 1993), N-terminal lipidation consensus (Fuchs et al., 1992; Jauris-Heipke et al., 1993; Jauris-Heipke et al., 1995; Marconi et al., 1993c; Margolis et al., 1994a; Margolis et al., 1994b; Masuzawa et al., 1997; Stevenson and Barthold, 1994; Stevenson et al., 1994; Tilly et al., 1997; Wang et al., 1999; Wilske et al., 1996a; Wilske et al., 1996b); transcription start site (Marconi et al., 1993b); temperature regulation (Schwan et al., 1995; Stevenson et al., 1995)
BBB20 | 17733 | 17626 | - | - | - | short gene
BBB21 | 17750 | 17842 | - | - | - | short gene
BBB22 | 19321 | 17969 | conserved hypothetical protein MJ0326 {Methanococcus jannaschii} | - | 94 | 12 putative membrane spanning regions; homologs in E. coli
BBB23 | 20822 | 19434 | conserved hypothetical protein MJ0326 {Methanococcus jannaschii} | - | 94 | 12 putative membrane spanning regions; homologs in E. coli
BBB24 | 21364 | 20861 | - | - | - | near-consensus N-terminal lipidation signal
BBB25 | 21851 | 21342 | - | - | - | N-terminal lipidation consensus
BBB26 | 21898 | 22590 | - | - | - | -
BBB27 | 23154 | 22606 | - | - | - | N-terminal lipidation consensus
BBB28 | 23255 | 24496 | - | - | - | -
BBB29 | 24825 | 26450 | PTS system, maltose and glucose-specific IIABC component (malX) {Escherichia coli} | - | 16 | putative sugar transport

             

cp32-1

BBP01 | 66 | 1286 | - | - | 146 | -
BBP02 | 1306 | 1995 | - | - | 147 | -
BBP03 | 2011 | 2565 | - | - | 148 | -
BBP04 | 2575 | 3336 | - | - | 148 | -
BBP05 | 3369 | 3938 | - | - | 148 | -
BBP06 | 3948 | 4919 | - | - | 149 | (Casjens et al., 1997)
BBP07 | 4936 | 5394 | - | - | 150 | -
BBP08 | 5379 | 5777 | - | - | 107 | -
BBP09 | 5768 | 6154 | - | - | 108 | -
BBP10 | 6154 | 6717 | - | - | 151 | -
BBP11 | 6701 | 7810 | - | - | 152 | -
BBP12 | 7828 | 8253 | - | - | 153 | -
BBP13 | 8272 | 8724 | - | - | 154 | -
BBP14 | 8724 | 8957 | - | - | 155 | short gene
BBP15 | 8968 | 10239 | - | - | 156 | -
BBP16 | 10265 | 10945 | - | - | 157 | -
BBP17 | 10952 | 11899 | - | - | 159 | -
BBP18 | 11920 | 12462 | - | - | 160 | -
BBP19 | 12495 | 12824 | - | - | 139 | -
BBP20 | 12824 | 13696 | - | - | 140 | -
BBP21 | 13709 | 14311 | - | - | 141 | -
BBP22 | 14324 | 15136 | - | - | 142 | -
BBP23 | 15215 | 15415 | - | orfA-1; blyA-1 | 109 | putative hemolysin; short gene; sequenced for homologous plasmids in strain 297 by Porcella et al. (1996)
BBP24 | 15422 | 15766 | - | orfB; blyB-1 | 111 | putative hemolysin; sequenced for homologous plasmids in strain 297 by Porcella et al. (1996)
BBP25 | 15759 | 16091 | - | orfC | 112 | (Gilmore and Mbow, 1998); sequenced in homologous plasmids of strain 297 by Porcella et al. (1996)
BBP26 | 16081 | 16437 | - | orfD | 143 | (Gilmore and Mbow, 1998); sequenced in homologous plasmids of strain 297 by Porcella et al. (1996); near-consensus N-terminal lipidation signal but strain 297 homolog was not lipidated in E. coli
BBP27 | 17060 | 16581 | - | rev-1 | 63 | N-terminal lipidation consensus (Gilmore and Mbow, 1998); sequenced in homologous plasmids of strain 297 by Porcella et al. (1996)
BBP28 | 17232 | 17675 | - | mlpA | 113 | N-terminal lipidation consensus (Gilmore and Mbow, 1998); sequenced in several homologous plasmids of strain 297 by Porcella et al. (1996); lipidated in E. coli (Porcella et al., 1996); paralog lipidated in B. afzelii (Theisen, 1996)
BBP29 | 18728 | 17718 | - | orf4-1 | 161 | (Gilmore and Mbow, 1998)
BBP30 | 19114 | 20211 | - | orf1-1 | 57 | (Zuckert and Meyer, 1996)
BBP31 | 20224 | 20787 | - | orf2-1 | 50 | (Zuckert and Meyer, 1996)
BBP32 | 20766 | 21503 | plasmid partition protein {Bacillus subtilis} | orfC-1 | 32 | putative plasmid partition function (Zuckert and Meyer, 1996)
BBP33 | 21510 | 22115 | - | orf3-1 | 49 | (Zuckert and Meyer, 1996)
BBP34 | 22131 | 22760 | - | bdrA | 80 | contains 4.7 repeats of a 54 bp sequence; all "bdr" genes contain direct, tandem repeats (Casjens et al., 1999; Zuckert and Meyer, 1996)
BBP35 | 23231 | 24553 | - | orf8/7-1 | 165 | (Casjens et al., 1997; Zuckert and Meyer, 1996)
BBP36 | 24609 | 25031 | - | orf10-1 | 144 | (Casjens et al., 1997)
BBP37 | 25816 | 25043 | - | orf6-1 | 96 | (Casjens et al., 1997)
BBP38 | 26235 | 26765 | - | erpA | 162 | surface exposed (Lam et al., 1994); N-terminal lipidation consensus (Stevenson et al., 1996); lipidated in E. coli (Akins et al., 1995b; Wallich et al., 1995); erp-like genes have been sequenced from several other strains (Akins et al., 1999; Lam et al., 1994; Marconi et al., 1996b; Stevenson et al., 1997; Suk et al., 1995)
BBP39 | 26796 | 27929 | - | erpB | 163 | N-terminal lipidation consensus (Stevenson et al., 1996)
BBP40 | 28074 | 28652 | - | - | 114 | -
BBP41 | 28835 | 29398 | - | - | 115 | -
BBP42 | 29398 | 30747 | conserved hypothetical protein Orf26 of phage φO1205 {Streptococcus thermophilus} | - | 145 | (Amouriaux et al., 1993; Casjens et al., 1997); phage φO1205 Orf26 homology; Orf26 is a possible phage structural protein

             

cp32-3

BBS01 | 66 | 1286 | - | - | 146 | -
BBS02 | 1306 | 1995 | - | - | 147 | -
BBS03 | 2011 | 2565 | - | - | 148 | -
BBS04 | 2575 | 3336 | - | - | 148 | -
BBS05 | 3369 | 3938 | - | - | 148 | -
BBS06 | 3963 | 4919 | - | - | 149 | (Casjens et al., 1997)
BBS07 | 4936 | 5394 | - | - | 150 | -
BBS08 | 5379 | 5777 | - | - | 107 | -
BBS09 | 5768 | 6154 | - | - | 108 | -
BBS10 | 6154 | 6717 | - | - | 151 | -
BBS11 | 6701 | 7810 | - | - | 152 | -
BBS12 | 7828 | 8253 | - | - | 153 | -
BBS13 | 8272 | 8724 | - | - | 154 | -
BBS14 | 8724 | 8957 | - | - | 155 | short gene
BBS15 | 8968 | 10239 | - | - | 156 | -
BBS16 | 10265 | 10945 | - | - | 157 | -
BBS17 | 10952 | 11899 | - | - | 159 | -
BBS18 | 11920 | 12462 | - | - | 160 | -
BBS19 | 12495 | 12824 | - | - | 139 | -
BBS20 | 12824 | 13696 | - | - | 140 | -
BBS21 | 13709 | 14311 | - | - | 141 | -
BBS22 | 14324 | 15133 | - | - | 142 | -
BBS23 | 15212 | 15412 | - | blyA-3 | 109 | putative hemolysin; short gene
BBS24 | 15419 | 15763 | - | blyB-3 | 111 | putative hemolysin
BBS25 | 15756 | 16088 | - | - | 112 | -
BBS26 | 16078 | 16434 | - | - | 143 | near-consensus N-terminal lipidation signal
BBS27 | 16586 | 16900 | - | - | - | -
BBS28 | 16915 | 17046 | - | - | - | short gene
BBS29 | 17068 | 17694 | - | bdrF | 80 | contains 3.6 repeats of a 33 bp sequence
BBS30 | 17803 | 18246 | - | mlpC | 113 | N-terminal lipidation consensus
BBS31 | 19159 | 18290 | - | orf4-3 | 161 | (Zuckert and Meyer, 1996)
BBS32 | 19198 | 19392 | conserved hypothetical protein {Chlorella vulgaris} (similarity poor) | - | - | questionable gene; gene not called in paralogous sequence on other cp32s
BBS33 | 19605 | 20702 | - | orf1-3 | 57 | (Zuckert and Meyer, 1996)
BBS34 | 20715 | 21278 | - | - | 50 | -
BBS35 | 21257 | 21994 | plasmid partition protein {Bacillus subtilis} | orfC-3 | 32 | putative plasmid partition function (Stevenson et al., 1998b)
BBS36 | 22038 | 22577 | - | orf3-3 | 49 | (Stevenson et al., 1998b)
BBS37 | 22593 | 23180 | - | bdrE | 80 | contains 4.1 repeats of a 54 bp sequence
BBS38 | 23649 | 25013 | - | orf8/7-3 | 165 | (Casjens et al., 1997)
BBS39 | 25069 | 25491 | - | orf10-3 | 144 | (Casjens et al., 1997)
BBS40 | 26276 | 25503 | - | orf6-3 | 96 | (Casjens et al., 1997)
BBS41 | 26708 | 27295 | - | erpG; pG | 164 | N-terminal lipidation consensus (Stevenson et al., 1996; Wallich et al., 1995)
BBS42 | 27410 | 27916 | - | bapA | 95 | (Stevenson et al., 1996; Wallich et al., 1995)
BBS43 | 28067 | 28246 | - | - | - | short gene
BBS44 | 28236 | 28871 | - | - | 115 | -
BBS45 | 28871 | 30220 | conserved hypothetical protein Orf26 of phage φO1205 {Streptococcus thermophilus} | - | 145 | (Amouriaux et al., 1993; Casjens et al., 1997); phage φO1205 Orf26 homology; Orf26 is a possible phage structural protein

             

cp32-4

BBR01 | 66 | 1286 | - | - | 146 | -
BBR02 | 1306 | 1998 | - | - | 147 | pseudogene; authentic frameshift
BBR03 | 2013 | 2573 | - | - | 148 | -
BBR04 | 2580 | 3344 | - | - | 148 | -
BBR05 | 3340 | 3948 | - | orfI | 148 | (Casjens et al., 1997)
BBR06 | 3958 | 4929 | - | orfII | 149 | (Casjens et al., 1997)
BBR07 | 4952 | 5404 | - | orfIII | 150 | (Casjens et al., 1997)
BBR08 | 5389 | 5787 | - | orfIV | 107 | (Casjens et al., 1997)
BBR09 | 5778 | 6164 | - | orfV | 108 | (Casjens et al., 1997)
BBR10 | 6164 | 6727 | - | - | 151 | -
BBR11 | 6711 | 7820 | - | - | 152 | -
BBR12 | 7838 | 8263 | - | - | 153 | -
BBR13 | 8282 | 8734 | - | - | 154 | -
BBR14 | 8734 | 8967 | - | - | 155 | short gene
BBR15 | 8978 | 10270 | - | - | 156 | -
BBR16 | 10296 | 10889 | - | - | 157 | -
BBR17 | 10896 | 11843 | - | - | 159 | -
BBR18 | 11864 | 12415 | - | - | 160 | -
BBR19 | 12448 | 12777 | - | - | 139 | -
BBR20 | 12777 | 13649 | - | - | 140 | -
BBR21 | 13662 | 14264 | - | - | 141 | -
BBR22 | 14277 | 15089 | - | - | 142 | -
BBR23 | 15167 | 15367 | - | blyA-4 | 109 | putative hemolysin; short gene
BBR24 | 15374 | 15718 | - | blyB-4 | 111 | putative hemolysin
BBR25 | 15711 | 16043 | - | - | 112 | -
BBR26 | 16033 | 16389 | - | - | 143 | near-consensus N-terminal lipidation signal
BBR27 | 16467 | 16994 | - | bdrH | 80 | sequenced in homologous plasmids of strain 297 by Porcella et al. (1996) and in B. afzelii by Theisen (1996)
BBR28 | 17103 | 17522 | - | mlpD | 113 | N-terminal lipidation consensus
BBR29 | 18664 | 17576 | - | - | 161 | -
BBR30 | 18829 | 18737 | - | - | - | questionable gene; gene not called in paralogous sequence on other cp32s
BBR31 | 18960 | 20054 | - | - | 57 | -
BBR32 | 20067 | 20630 | - | - | 50 | -
BBR33 | 20609 | 21361 | plasmid partition protein {Bacillus subtilis} | orfC-4 | 32 | putative plasmid partition function (Stevenson et al., 1998b)
BBR34 | 21415 | 21957 | - | orf3-4 | 49 | (Stevenson et al., 1998b)
BBR35 | 21974 | 22249 | - | bdrG | 80 | authentic point mutation; has an in-frame stop codon
BBR36 | 22831 | 24153 | - | - | 165 | -
BBR37 | 24210 | 24632 | - | orf10-4 | 144 | (Casjens et al., 1997)
BBR38 | 25435 | 24644 | - | orf6-4 | 96 | (Casjens et al., 1997); sequence from strain N40 - accession # AF011453
BBR39 | 25636 | 25538 | - | - | - | questionable gene; gene not called in paralogous sequence on other cp32s
BBR40 | 25865 | 25966 | - | erpH | 162 | pseudogene; severely truncated relative to other erps; N-terminal lipidation consensus (Stevenson et al., 1996)
BBR41 | 26077 | 26817 | - | - | 161/162 | pseudogene; this is a "fusion" gene - a family [161] gene is fused to a [162] erp gene
BBR42 | 26853 | 27524 | - | erpY | 164 | N-terminal lipidation consensus
BBR43 | 27634 | 28200 | - | - | 114 | -
BBR44 | 28384 | 28947 | - | - | 115 | -
BBR45 | 28947 | 30296 | conserved hypothetical protein Orf26 of phage φO1205 {Streptococcus thermophilus} | - | 145 | homolog of Streptococcus thermophilus phage φO1205 gene orf26, which is likely to be a phage structural protein (Amouriaux et al., 1993; Casjens et al., 1997)

             

cp32-6

BBM01 | 66 | 1286 | - | - | 146 | -
BBM02 | 1306 | 1995 | - | - | 147 | -
BBM03 | 2010 | 2570 | - | - | 148 | -
BBM04 | 2577 | 3341 | - | - | 148 | -
BBM05 | 3337 | 3945 | - | - | 148 | -
BBM06 | 3955 | 4926 | - | - | 149 | -
BBM07 | 4949 | 5401 | - | - | 150 | -
BBM08 | 5386 | 5784 | - | - | 107 | -
BBM09 | 5775 | 6161 | - | - | 108 | -
BBM10 | 6161 | 6727 | - | - | 151 | -
BBM11 | 6711 | 7820 | - | - | 152 | -
BBM12 | 7838 | 8263 | - | - | 153 | -
BBM13 | 8282 | 8734 | - | - | 154 | -
BBM14 | 8734 | 8967 | - | - | 155 | short gene
BBM15 | 8978 | 10249 | - | - | 156 | -
BBM16 | 10275 | 10955 | - | - | 157 | -
BBM17 | 10962 | 11909 | - | - | 159 | -
BBM18 | 11930 | 12481 | - | - | 160 | -
BBM19 | 12514 | 12843 | - | - | 139 | -
BBM20 | 12843 | 13715 | - | - | 140 | -
BBM21 | 13728 | 14330 | - | - | 141 | -
BBM22 | 14343 | 15152 | - | - | 142 | -
BBM23 | 15231 | 15431 | - | blyA-6 | 109 | putative hemolysin; short gene
BBM24 | 15438 | 15782 | - | blyB-6 | 111 | putative hemolysin
BBM25 | 15775 | 16107 | - | - | 112 | -
BBM26 | 16097 | 16453 | - | - | 143 | near-consensus N-terminal lipidation signal
BBM27 | 17075 | 16596 | - | rev-6 | 63 | N-terminal lipidation consensus
BBM28 | 17247 | 17693 | - | mlpF | 113 | N-terminal lipidation consensus
BBM29 | 18680 | 17736 | - | - | 161 | -
BBM30 | 19069 | 20166 | - | - | 57 | -
BBM31 | 20179 | 20742 | - | - | 50 | -
BBM32 | 20721 | 21467 | plasmid partition protein {Bacillus subtilis} | orfC-6 | 32 | putative plasmid partition (Stevenson et al., 1998b)
BBM33 | 21520 | 22095 | - | orf3-6 | 49 | (Stevenson et al., 1998b)
BBM34 | 22102 | 22767 | - | bdrK | 80 | -
BBM35 | 23241 | 24563 | - | - | 165 | -
BBM36 | 24619 | 25041 | - | - | 144 | -
BBM37 | 25820 | 25053 | - | - | 96 | only [96] member with signal sequence
BBM38 | 26245 | 27012 | - | erpK | 164 | N-terminal lipidation consensus (Casjens et al., 1997)
BBM39 | 27745 | 27080 | - | - | - | -
BBM40 | 27731 | 27850 | - | - | - | questionable gene; gene not called in paralogous sequence on other cp32s
BBM41 | 27923 | 28486 | - | - | 115 | -
BBM42 | 28486 | 29835 | conserved hypothetical protein Orf26 of phage φO1205 {Streptococcus thermophilus} | - | 145 | phage φO1205 Orf26 homology; Orf26 is a possible phage structural protein (Amouriaux et al., 1993; Casjens et al., 1997)

             

cp32-7

BBO01 | 65 | 1285 | - | - | 146 | -
BBO02 | 1305 | 1994 | - | - | 147 | -
BBO03 | 2010 | 2564 | - | - | 148 | -
BBO04 | 2574 | 3335 | - | - | 148 | -
BBO05 | 3368 | 3937 | - | - | 148 | -
BBO06 | 3962 | 4918 | - | - | 149 | -
BBO07 | 4935 | 5393 | - | - | 150 | -
BBO08 | 5378 | 5776 | - | - | 107 | -
BBO09 | 5767 | 6153 | - | - | 108 | -
BBO10 | 6153 | 6719 | - | - | 151 | -
BBO11 | 6703 | 7812 | - | - | 152 | -
BBO12 | 7830 | 8255 | - | - | 153 | -
BBO13 | 8274 | 8726 | - | - | 154 | -
BBO14 | 8726 | 8959 | - | - | 155 | short gene
BBO15 | 8970 | 10301 | - | - | 156 | -
BBO16 | 10317 | 10955 | - | - | 157 | -
BBO17 | 10962 | 11900 | - | - | 159 | -
BBO18 | 11904 | 12470 | - | - | 160 | -
BBO19 | 12503 | 12832 | - | - | 139 | -
BBO20 | 12832 | 13707 | - | - | 140 | -
BBO21 | 13716 | 14318 | - | - | 141 | -
BBO22 | 14331 | 15143 | - | - | 142 | -
BBO23 | 15222 | 15422 | - | blyA-7 | 109 | putative hemolysin; short gene
BBO24 | 15429 | 15782 | - | blyB-7 | 111 | putative hemolysin
BBO25 | 15766 | 16098 | - | - | 112 | -
BBO26 | 16088 | 16444 | - | - | 143 | near-consensus N-terminal lipidation signal
BBO27 | 16522 | 17136 | - | bdrN | 80 | -
BBO28 | 17245 | 17664 | - | mlpG | 113 | N-terminal lipidation consensus
BBO29 | 18770 | 17715 | - | - | 161 | -
BBO30 | 19117 | 20211 | - | - | 57 | -
BBO31 | 20224 | 20787 | - | - | 50 | -
BBO32 | 20766 | 21512 | plasmid partition protein {Bacillus subtilis} | orfC-7 | 32 | putative plasmid partition function (Stevenson et al., 1998b)
BBO33 | 21522 | 22073 | - | orf3-7 | 49 | (Stevenson et al., 1998b)
BBO34 | 22088 | 22657 | - | bdrM | 80 | -
BBO35 | 22755 | 22630 | - | - | - | questionable gene; gene not called in paralogous sequence in other cp32s
BBO36 | 23093 | 24457 | - | - | 165 | -
BBO37 | 24513 | 24935 | - | - | 144 | -
BBO38 | 25720 | 24947 | - | - | 96 | -
BBO39 | 26152 | 26838 | - | erpL | 164 | N-terminal lipidation consensus (Casjens et al., 1997)
BBO40 | 26893 | 27981 | - | erpM | 163 | N-terminal lipidation consensus (Casjens et al., 1997)
BBO41 | 28117 | 28007 | - | - | 116 | questionable gene; gene not called in paralogous sequence in other cp32s
BBO42 | 28134 | 28700 | - | - | 114 | -
BBO43 | 28885 | 29448 | - | - | 115 | -
BBO44 | 29448 | 30797 | conserved hypothetical protein Orf26 of phage φO1205 {Streptococcus thermophilus} | - | 145 | (Amouriaux et al., 1993; Casjens et al., 1997); phage φO1205 Orf26 homology; Orf26 is a possible phage structural protein

             

cp32-8

           

BBL01

66

1286

   

146

 

BBL02

1306

1995

   

147

 

BBL03

2011

2565

   

148

 

BBL04

2575

3336

   

148

 

BBL05

3369

3938

   

148

 

BBL06

3948

4919

   

149

 

BBL07

4936

5394

   

150

 

BBL08

5379

5777

   

107

 

BBL09

5768

6154

   

108

 

BBL10

6154

6717

   

151

 

BBL11

6701

7810

   

152

 

BBL12

7828

8253

   

153

 

BBL13

8272

8724

   

154

 

BBL14

8724

8957

   

155

short gene

BBL15

8968

10239

   

156

 

BBL16

10265

10945

   

157

 

BBL17

10952

11899

   

159

 

BBL18

11920

12462

   

160

 

BBL19

12495

12824

   

139

 

BBL20

12824

13696

   

140

 

BBL21

13709

14311

   

141

 

BBL22

14324

15136

   

142

 

BBL23

15215

15415

 

blyA-8

109

putative hemolysin; short gene

BBL24

15422

15766

 

blyB-8

111

putative hemolysin

BBL25

15759

16091

   

112

 

BBL26

16081

16437

   

143

near-consensus N-terminal lipidation signal

BBL27

16515

17096

 

bdrO

80

 

BBL28

17205

17648

 

mlpH

113

N-terminal lipidation consensus

BBL29

18761

17691

   

161

 

BBL30

19091

20185

   

57

 

BBL31

20198

20761

   

50

 

BBL32

20740

21477

plasmid partition protein {Bacillus subtilis}

 

32

putative plasmid partition function

BBL33

21467

21556

     

short gene

BBL34

21540

22097

   

49

 

BBL35

22113

22688

 

bdrP

80

 

BBL36

23306

24688

   

165

 

BBL37

24744

25166

   

144

 

BBL38

25951

25178

   

96

 

BBL39

26370

26900

 

erpN

162

N-terminal lipidation consensus

BBL40

26931

28064

 

erpO

163

N-terminal lipidation consensus

BBL41

28209

28787

   

114

 

BBL42

28970

29533

   

115

 

BBL43

29533

30882

conserved hypothetical protein Orf26 of phage φO1205 {Streptococcus thermophilus}

 

145

phage φO1205 Orf26 homology; Orf26 is a possible phage structural protein

             

cp32-9

           

BBN01

66

1289

   

146

 

BBN02

1309

1998

   

147

 

BBN03

2013

2576

   

148

 

BBN04

2583

3347

   

148

 

BBN05

3343

3950

   

148

pseudogene; authentic frameshift

BBN06

3960

4935

   

149

pseudogene; authentic frameshift

BBN07

4958

5410

   

150

 

BBN08

5395

5793

   

107

 

BBN09

5784

6170

   

108

 

BBN10

6170

6733

   

151

 

BBN11

6717

7802

   

152

 

BBN12

7845

8270

   

153

 

BBN13

8289

8742

   

154

pseudogene; authentic frameshift

BBN14

8742

8975

   

155

short gene

BBN15

8986

10257

   

156

 

BBN16

10283

11034

   

157

pseudogene; authentic frameshift

BBN17

11041

11988

   

159

 

BBN18

12009

12560

   

160

pseudogene; authentic frameshift

BBN19

12593

12922

   

139

 

BBN20

12922

13794

   

140

 

BBN21

13807

14410

   

141

pseudogene; authentic frameshift

BBN22

14423

15230

 

orfX-9

142

pseudogene; authentic frameshift (Guina and Oliver, 1997)

BBN23

15312

15512

 

blyA-9

109

pore-forming hemolysin; short gene (Guina and Oliver, 1997)

BBN24

15519

15863

 

blyB-9

111

hemolysin accessory protein (Guina and Oliver, 1997)

BBN25

15856

16188

 

orfC-9

112

(Guina and Oliver, 1997)

BBN26

16178

16534

 

orfD-9

143

(Guina and Oliver, 1997); near-consensus N-terminal lipidation signal

BBN27

16612

17193

 

bdrR

80

(Guina and Oliver, 1997)

BBN28

17302

17727

 

mlpI

113

N-terminal lipidation consensus

BBN29

18784

17776

   

161

pseudogene; authentic frameshift

BBN30

19164

20261

   

57

 

BBN31

20275

20838

   

50

 

BBN32

20817

21569

plasmid partition protein {Bacillus subtilis}

 

32

putative plasmid partition function

BBN33

21614

22171

   

49

 

BBN34

22184

22720

 

bdrQ

80

 

BBN35

23194

24516

   

165

 

BBN36

24572

24994

   

144

 

BBN37

25779

25006

   

96

pseudogene; authentic frameshift

BBN38

26198

26767

 

erpP

162

N-terminal lipidation consensus

BBN39

26798

27826

 

erpQ

163

N-terminal lipidation consensus

BBN40

27991

27884

   

116

questionable gene; gene not called in paralogous sequence on other cp32s

BBN41

27984

28541

   

114

 

BBN42

28736

29299

   

115

 

BBN43

29299

30648

conserved hypothetical protein Orf26 of phage φO1205 {Streptococcus thermophilus}

 

145

phage φO1205 Orf26 homology; Orf26 is a possible phage structural protein

             

lp5

           

BBT01

195

635

   

57

pseudogene

BBT02

744

1094

   

57

pseudogene

BBT03

1208

1573

   

84

(pseudogene; also [57] related?)

BBT04

2148

3251

   

57

 

BBT05

3200

3350

   

57

pseudogene

BBT06

3340

4329

conserved hypothetical protein ywlC {Bacillus subtilis}

 

137

family of genes that includes yeast SUA gene

BBT07

4388

4816

   

52

pseudogene

             

lp17

         

lp17 from B31 was independently sequenced by Barbour et al. (1996). Hinnebusch et al. (1990) determined the sequences of the two telomeres; those sequences contain 29 bp and 78 bp that are not present in the TIGR sequence.

BBD001

214

405

 

orfA

166

short gene; N-terminal lipidation consensus

BBD01

332

802

 

orfB

76

 

BBD02

873

1019

   

57/77

pseudogene

BBD03

1117

1309

   

57

pseudogene

BBD04

1412

1765

 

orfC

57

pseudogene; different translation start called by Barbour et al. (1996)

BBD05

2389

2541

 

orfD

84

(pseudogene [57]?)

BBD05.1

3018

3604

   

57

pseudogene

BBD06

3143

3604

 

orfE

57

in-frame and inside pseudogene BBD05.1; questionable gene

BBD07

4373

4260

   

82

short gene

BBD08

4707

4802

transposase-like protein {Anabaena}

 

82

pseudogene

BBD09

5058

5738

 

orfF

   

BBD10

6454

5879

 

orfG

 

N-terminal lipidation consensus

BBD11

6681

7631

 

orfH

 

sequence difference caused Barbour et al. (1996) orfH to end at ~7556

BBD12

7752

7624

     

short gene

BBD13

7787

8110

 

orfI

   

BBD14

8269

9378

 

orfJ

62

 

BBD15

10015

9596

 

orfK

85

different start called by Barbour et al. (1996); N-terminal lipidation consensus

BBD15.01

10000

9940

   

175

pseudogene

BBD15.1

10152

10326

transposase-like protein {Anabaena}

 

82

pseudogene

BBD16

10520

10428

     

short gene

BBD17

10591

10683

     

short gene

BBD18

11648

10989

 

orfL

   

BBD19

12057

12167

     

short gene

BBD20

12250

12975

transposase-like protein {Anabaena}

 

82

pseudogene

21 bp repeat

13154

13329

     

8.3 tandem, direct repeats of TAATTAATATGTGATATAAAA; not in a gene

BBD21

13341

14078

plasmid partition protein {Bacillus subtilis}

orfM

32

putative plasmid partition function

BBD22

14072

14338

 

orfN

 

short gene

BBD23

14781

15725

transposase-like protein {Anabaena}

 

82

pseudogene

BBD24

16121

15894

 

orfO

 

N-terminal lipidation consensus; short gene

BBD25

16212

16367

     

short gene

             

lp21

           

BBU01

184

615

   

57

pseudogene

BBU02

746

1111

   

84

(pseudogene [57]?)

BBU03

1357

1241

   

172

short gene

BBU04

1486

2625

   

57

 

BBU05

2868

3653

plasmid partition protein {Bacillus subtilis}

 

32

putative plasmid partition function

63 bp repeat (not a gene)

3618

14636

     

tandem array of about 176 repeats of 63 bp sequence; not in ORF, has stop codons in all 6 frames (Casjens et al., 2000)

BBU06

14633

15232

   

49

 

BBU07

15349

15810

   

57

pseudogene

BBU08

15791

16081

   

137

pseudogene; short gene

BBU09

16548

16231

   

55

pseudogene?

BBU10

16603

16797

   

57

pseudogene; short gene

BBU11

16886

17875

protein Y02L_MYCTU {Mycobacterium tuberculosis}

 

137

family of genes that includes B. subtilis ywlC and yeast SUA genes

BBU12

17918

18362

   

52

pseudogene; authentic frameshift

             

lp25

           

BBE01

255

157

     

short gene

BBE02

4156

326

   

1

 

BBE03

4613

4422

   

98

short gene

BBE04

4719

4856

   

54

pseudogene; near-consensus N-terminal lipidation signal

BBE04.1

5377

5734

   

44

pseudogene

BBE05

5377

5526

   

44

inside and in-frame with BBE04.1

BBE06

5757

5903

     

N-terminal lipidation consensus; short gene

BBE07

6401

6185

pfs-I protein {Escherichia coli}

 

26

pseudogene

BBE08

6701

6558

     

N-terminal lipidation consensus; short gene

BBE09

6898

7758

   

44

N-terminal lipidation consensus

BBE10

7972

7877

     

short gene

BBE11

8446

8315

     

short gene

BBE12

8646

8524

     

short gene

BBE13

8863

8955

     

short gene

BBE14

9163

9375

     

short gene

BBE15

9490

9356

     

short gene

BBE16

10187

9570

   

99

near-consensus N-terminal lipidation signal (but short signal sequence?)

BBE17

10709

10203

       

BBE18

12079

11501

   

49

 

BBE19

12854

12099

plasmid partition protein {Bacillus subtilis}

 

32

putative plasmid partition function

BBE20

13393

12833

   

50

 

BBE21

14530

13406

   

57

 

BBE21.1

14767

14893

transposase-like protein {Anabaena}

 

82

pseudogene

BBE22

15578

15045

pyrazinamidase/nicotinamidase (pncA) {Mycobacterium tuberculosis}

   

putative pyrazinamidase/nicotinamidase (pncA)

BBE23

15973

16155

     

short gene

BBE23.1

16459

16540

   

57

pseudogene

BBE23.2

16540

16721

plasmid partition protein {Bacillus subtilis}

 

32

pseudogene

BBE24

16740

17300

   

49

pseudogene

BBE24.1

17902

18303

   

49

pseudogene

BBE25

18606

18505

     

short gene

BBE26

18586

18711

     

short gene; near-consensus N-terminal lipidation signal

BBE27

19055

19195

     

short gene

BBE28

19489

19340

     

near-consensus N-terminal lipidation signal; short gene

BBE29

19697

20883

adenine specific DNA methyltransferase {Helicobacter pylori}

 

167

pseudogene; adenine specific DNA methyltransferase

BBE29.1

21110

21476

   

102

pseudogene

BBE30

21558

21701

   

49

pseudogene

BBE31

22677

21949

   

60

N-terminal lipidation consensus

BBE32

23723

23418

   

57

pseudogene

BBE33

24100

23850

   

169

pseudogene; authentic frameshift

             

lp28-1

         

Zhang et al. (1997) determined the sequence of the right telomere of lp28-1.

BBF001

1

163

   

88

pseudogene; near-consensus N-terminal lipidation signal

BBF001.1

200

380

   

80

pseudogene

BBF01

467

1462

 

erpT (N40)

163

small patch of similarity to erp genes of family 163; near-consensus N-terminal lipidation signal and affinity to lipoprotein families [60] and [163]; (Fikrig et al., 1999)

BBF02

1720

2073

orf105 {Plasmodium falciparum} fairly poor match

 

88

pseudogene

BBF03

2619

2101

 

bdrS

80

pseudogene - N-terminal truncation; contains about 3 repeats each of 2 different 33 bp sequences

BBF04

2658

2804

   

57/77

pseudogene in [57]

BBF05

2777

3073

   

57

pseudogene in [57]

BBF06

3201

3377

   

57

pseudogene in [57] (actually a "fusion" gene)

BBF07

3529

3413

   

100

short gene

BBF08

3849

3685

   

72

pseudogene; paralog of fragment of BBK43

BBF09

4027

4179

   

71

pseudogene; paralog of C-term of BBK42

BBF10

4488

4982

   

70

 

BBF11

5435

5539

     

questionable gene; backwards inside BBF11.1

BBF11.1

5620

5412

   

32

pseudogene

BBF12

6540

5956

   

49

pseudogene (?) in [49] - patchy similarity to other [49] genes

BBF13

7381

6635

plasmid partition protein {Bacillus subtilis}

 

32

putative plasmid partition function

BBF14

7911

7357

   

50

 

BBF14.1

8197

8367

   

65

pseudogene

BBF16

8389

8571

   

64

pseudogene; paralog of N-term of BBK34

BBF17

8772

9026

   

68

pseudogene; paralog of N-term of BBK35

BBF18

9561

10049

transposase-like protein {Anabaena}

 

82

pseudogene

BBF19

10559

10036

transposase-like protein {Anabaena}

 

82

questionable gene; BBF18 and BBF19 are almost certainly inverted parts of one complex pseudogene

BBF19.1

10916

11200

   

175

pseudogene

BBF20

10991

10701

   

85

N-terminal lipidation consensus; pseudogene

BBF21

11550

11449

   

66

short gene

BBF22

12018

11794

   

44

pseudogene

BBF23

12992

12444

   

49

 

BBF24

13793

13032

plasmid partition protein {Bacillus subtilis}

 

32

putative plasmid partition function

BBF25

14329

13772

   

50

 

BBF26

15451

14354

   

57

 

BBF26.1

15663

16209

   

101

badly deleted pseudogene

BBF27

15925

15758

   

101

questionable gene; in-frame and inside BBF26.1

BBF28

16129

16001

     

questionable gene; out-of-frame (?) and inside BBF26.1

BBF29

16825

16457

   

49

pseudogene

BBF30

17415

17170

     

short gene

BBF31

17805

17394

   

50

pseudogene

BBF31.1

17920

18050

   

57

pseudogene

BBF32

26698

18430

   

170

15 tandem pseudogenes; N-terminal lipidation consensus; unexpressed reservoir of diversity for vlsE expression site; no frame disruptions (Zhang et al., 1997; Zhang and Norris, 1998a; Zhang and Norris, 1998b)

vlsE

27097

28170

 

vlsE

170

surface exposed; N-terminal lipidation consensus; this is the vlsE expression site; it is beyond the end of the TIGR lp28-1 sequence; there is a 100 bp gap (apparently unclonable sequence) between the TIGR and vlsE sequences (Zhang et al., 1997; Zhang and Norris, 1998a; Zhang and Norris, 1998b)

             

lp28-2

           

BBG01

116

1006

   

12

N-terminal lipidation consensus

BBG02

1047

1925

HP1353 gene {Helicobacter pylori}

 

102

N-terminal lipidation consensus; rather good similarity to HP1353 across the C-terminal three-quarters of the gene; HP1353 has an "FLSTC" sequence about 30 amino acids from its N-terminus, which is neither a good lipidation consensus nor a particularly good signal sequence; HP1352 and HP1354 are putative adenine methylases

BBG03

2104

2492

   

48

pseudogene; authentic frameshift

BBG04

2857

2753

     

short gene

BBG05

4056

2894

transposase-like protein {Anabaena}

 

82

pseudogene; one authentic frameshift; BBG05 is the most intact member of this family, which is homologous throughout its length to a putative transposase gene family originally found in Anabaena, Saccharopolyspora, Salmonella and thermophilic bacterium PS3 (Bancroft and Wolk, 1989; Donadio and Staver, 1993; Gulig et al., 1992; Krause et al., 1991; Murai et al., 1995); BBG05 was first characterized by Barbour and Carter (1997)

BBG06

4208

5365

   

57

 

BBG07

5378

5947

   

50

 

BBG08

5911

6675

plasmid partition protein {Bacillus subtilis}

 

32

putative plasmid partition function

BBG09

6737

7285

   

49

 

BBG10

10779

7486

   

101

weak similarity to phage TM4 tail tape measure protein in a psi-BLAST search

BBG11

11015

10779

     

short gene

BBG12

11491

11066

       

BBG13

12355

11504

       

BBG14

12752

12312

       

BBG15

12681

13166

       

BBG16

13495

13160

       

BBG17

14341

13511

       

BBG18

14885

14379

       

BBG19

15431

14889

   

117

 

BBG20

16482

15460

   

103

 

BBG21

17684

16497

       

BBG22

18827

18033

   

86

 

BBG23

19619

18840

   

86

 

BBG24

22310

19623

   

104

 

BBG25

22659

22276

   

143

N-terminal lipidation consensus

BBG26

23033

22662

       

BBG27

23725

23036

       

BBG28

24108

23725

       

BBG29

24489

25952

   

62

 

BBG30

25962

26387

       

BBG31

26567

27082

   

50

pseudogene; N-terminus missing relative to paralogs

BBG32

27113

27937

replicative DNA helicase, putative {Bacillus subtilis}

 

46

putative DNA helicase

BBG33

28031

28828

 

bdrT

80

contains 3 repeats of 87 bp sequence and 4 repeats of a 33 bp sequence

BBG34

29618

28857

   

88

 
             

lp28-3

           

BBH01

273

464

   

166

N-terminal lipidation consensus; short gene

BBH02

391

855

   

76

 

BBH03

926

1072

   

57/77

pseudogene

BBH04

1045

1365

   

57

pseudogene

BBH05

1498

1677

   

57

pseudogene

BBH06

2970

2263

     

near-consensus N-terminal lipidation signal

BBH07

3514

3086

   

50

pseudogene

BBH08

3730

3593

     

short gene

BBH09

7728

3895

   

1

 

BBH09.1

8091

7810

   

95

pseudogene

BBH10

8203

8003

     

questionable gene (overlaps BBH09.1)

BBH10.1

8240

8310

transposase-like protein {Anabaena}

 

82

pseudogene

BBH11

8796

8704

     

questionable gene; backwards inside BBH11.1

BBH11.1

8320

8850

   

1

pseudogene (in part)

BBH12

9455

9589

     

short gene

BBH13

10516

9851

 

bdrU

80

contains 5.6 repeats of a 54 bp sequence

BBH14

10934

10821

     

short gene

BBH15

11005

10913

     

short gene

BBH16

11068

11187

     

short gene

BBH17

11837

12025

     

short gene

BBH18

12105

13217

   

69

N-terminal lipidation consensus

BBH18.1

13571

13693

   

65

pseudogene; N-terminus inverted

BBH19

13590

13709

     

questionable gene; overlaps BBH18.1 partly in-frame

BBH20

14596

13840

   

171

pseudogene

BBH20.1

14750

15300

   

104

pseudogene

BBH21

--

--

     

No longer considered to be a realistic potential gene.

BBH22

14870

14766

     

questionable gene; inside BBH20.1 backwards

BBH23

15051

15158

   

104

questionable gene; inside of and in-frame with pseudogene BBH20.1

BBH24

15136

15342

   

104

questionable gene; mostly inside, in-frame with pseudogene BBH20.1

BBH24.1

15354

15750

   

86

pseudogene

BBH25

15810

15568

     

questionable gene; inside BBH24.1 backwards

BBH26

16519

17412

   

62

 

BBH27

17408

17971

   

50

 

BBH28

17947

18699

plasmid partition protein {Bacillus subtilis}

 

32

putative plasmid partition function

BBH29

18798

19424

   

49

 

BBH30

20871

20415

   

96

pseudogene

BBH31

20997

21104

     

short gene

BBH32

21470

22216

   

60

N-terminal lipidation consensus

BBH33

22678

22950

   

61

pseudogene

BBH34

23383

23192

   

62

pseudogene

BBH35

23447

23560

     

short gene

BBH36

24180

24031

   

44

questionable gene; in-frame, inside of BBH36.1

BBH36.1

24223

24041

   

44

pseudogene

BBH36.2

24751

25112

   

102

pseudogene

BBH37

26371

25436

   

12

N-terminal lipidation consensus

BBH38

26614

26498

     

short gene

BBH39

26754

26855

     

short gene

BBH40

27445

26981

transposase-like protein {Anabaena}

 

82

pseudogene

BBH41

28197

27628

   

48

 
             

lp28-4

           

BBI01

174

605

   

57

pseudogene

BBI02

736

1101

   

84

(pseudogene [57]?)

BBI02.1

1416

1346

   

172

pseudogene

BBI02.2

1617

1862

   

57

pseudogene

BBI03

1972

1850

     

short gene

BBI04

2219

2127

     

short gene

BBI05

2310

2191

     

short gene

BBI06

2536

3348

pfs-I protein {Escherichia coli}

 

26

putative 5'-methylthioadenosine/S-adenosylhomocysteine nucleosidase

BBI07

3441

3346

     

short gene

BBI08

3576

3752

     

questionable gene; overlaps BBI08.1

BBI08.1

3674

4314

   

59

pseudogene

BBI09

3745

3879

   

59

questionable gene; in-frame inside of pseudogene BBI08.1

BBI10

3911

4312

   

59

questionable gene; in-frame inside of pseudogene BBI08.1

BBI11

4721

4626

     

short gene

BBI12

5128

5343

     

short gene

BBI13

5609

5704

IS9016 (V-4) orf1 {Haemophilus influenzae} (very small similarity)

   

short gene

BBI14

6159

6269

   

60

N-terminal lipidation consensus; pseudogene

BBI15

6603

6830

   

60

pseudogene

BBI16

7183

8535

   

60

N-terminal lipidation consensus; contains 22 internal tandem repeats of 27 bp sequence that is not in other members of family [60]

BBI17

8967

8824

     

short gene

BBI18

10647

10498

     

short gene

BBI19

10749

11924

   

57

 

BBI20

11924

12475

   

50

 

BBI21

12454

13203

plasmid partition protein {Bacillus subtilis}

 

32

putative plasmid partition function

BBI22

13265

13834

   

49

 

BBI23

13989

14090

     

short gene

BBI24

14334

14438

     

short gene

BBI25

15211

15339

     

short gene

BBI26

15352

16527

multidrug-efflux transporter tetA(B) {Helicobacter pylori}

 

105

putative multidrug-efflux transporter

BBI27

17272

17096

   

60

pseudogene

BBI28

17874

17305

   

60

N-terminal lipidation consensus

BBI29

19183

18521

   

60

N-terminal lipidation consensus

BBI30

19403

19507

   

168

 

BBI31

20127

19618

   

48

N-terminal lipidation consensus

BBI31.1

20240

20340

   

98

pseudogene

BBI32

20479

20273

     

N-terminal lipidation consensus; questionable gene; backwards inside BBI31.1

BBI33

20482

20589

transposase-like protein {Anabaena}

 

82

pseudogene

BBI34

21562

20774

   

60

N-terminal lipidation consensus

BBI35

21992

22090

   

93

questionable gene; gene not called in paralogous sequence

BBI36

22931

22098

   

54

N-terminal lipidation consensus

BBI37

23056

23154

   

93

questionable gene; gene not called in paralogous sequence

BBI38

23995

23162

   

54

N-terminal lipidation consensus

BBI39

25089

24226

   

54

N-terminal lipidation consensus

BBI40

25320

25802

   

49

pseudogene

BBI41

26036

25797

transposase-like protein {Anabaena}

 

82

pseudogene

BBI42

26360

26911

   

52

pseudogene; N-terminal lipidation consensus

BBI43

27069

26884

   

55

pseudogene?; short gene

             

lp36

           

BBK001

86

14

transposase-like protein {Anabaena}

 

82

pseudogene

BBK01

188

1078

   

12

N-terminal lipidation consensus

BBK02

2478

1213

   

1

questionable gene; in-frame inside of BBK02.1

BBK02.1

3770

1213

   

1

pseudogene

BBK03

3222

2821

   

1

questionable gene; in-frame inside of BBK02.1

BBK04

3595

3419

   

1

questionable gene; in-frame inside of BBK02.1; near-consensus N-terminal lipidation signal

BBK05

5096

4905

     

short gene

BBK06

5126

5233

     

short gene

BBK07

6040

5291

   

59

N-terminal lipidation consensus

BBK08

6281

6180

     

short gene

BBK09

6366

6647

     

short gene

BBK10

6983

6807

   

1

pseudogene

BBK11

6956

7060

     

short gene

BBK12

7335

8030

   

59

N-terminal lipidation consensus

BBK13

8880

8167

protein slr1258 {Synechocystis PCC6803}

 

40

 

BBK14

8921

9013

     

short gene

BBK15

9373

9996

   

60

BBK16

10223

10101

plasmid partition protein {Bacillus subtilis}

 

32

pseudogene

BBK17

10301

11944

adenine deaminase {Bacillus subtilis}

 

61

putative adenine deaminase

BBK18

12143

12054

hypothetical protein {Cyanidium caldarium} (very small similarity)

   

short gene

BBK19

12602

13234

     

N-terminal lipidation consensus

BBK20

13212

13301

     

short gene

BBK21

14326

13580

plasmid partition protein {Bacillus subtilis}

 

32

putative plasmid partition function

BBK22

14841

14305

   

50

 

BBK23

15760

14837

   

62

 

BBK24

16275

16883

   

49

 

BBK24.1

17380

17580

     

questionable gene; backwards inside BBK25

BBK25

17565

16969

transposase-like protein {Anabaena}

 

82

pseudogene

BBK25.1

17880

20033

   

1

pseudogene

BBK26

18346

18462

   

1

questionable gene; in-frame inside of BBK25.1

BBK27

19094

18807

     

questionable gene; backwards inside BBK25.1

BBK28

19232

19348

   

1

questionable gene; in-frame inside of BBK25.1

BBK29

19807

19718

   

1

questionable gene; in-frame inside of BBK25.1

BBK30

19935

20033

   

1

questionable gene; in-frame inside of BBK25.1

BBK31

20026

20166

     

short gene

BBK32

20389

21450

 

P47 (P35 in N40)

 

fibronectin-binding protein; surface localized (Probert and Johnson, 1998); upregulated in stationary phase in N40 (Fikrig et al., 1997; Indest et al., 1997); near-consensus N-terminal lipidation signal

BBK33

21720

21890

   

65

pseudogene

BBK34

21912

22130

   

64

short gene

BBK35

22294

22464

   

68

short gene

BBK36

22545

22646

   

66

short gene

BBK37

23953

22913

   

75 & 175

pseudogene

BBK38

24146

23922

     

short gene; near-consensus N-terminal lipidation signal

BBK39

24293

24667

   

59

pseudogene

BBK40

25103

25654

 

bdrX

58/80

has family 80-like repeats

BBK41

26379

25816

   

70

 

BBK42

26803

26588

   

71

short gene

BBK42.1

26916

27078

   

72

short gene

BBK43

26937

27041

   

72

questionable gene; largely overlaps BBK42.1 in-frame

BBK44

27234

27350

   

100

short gene

BBK45

28315

27386

   

75

near-consensus N-terminal lipidation signal

BBK46

29212

28394

   

75

pseudogene; authentic frameshifts

BBK47

30463

29480

   

69

N-terminal lipidation consensus

BBK48

31585

30722

   

75

N-terminal lipidation consensus

BBK49

32822

31830

   

69

N-terminal lipidation consensus

BBK50

34079

33084

 

P37 (N40)

75

N-terminal lipidation consensus (Fikrig et al., 1997)

BBK51

34232

34327

     

short gene; near-consensus N-terminal lipidation signal

BBK52

35443

34598

 

P23 (297)

44

N-terminal lipidation consensus; previously only sequenced in strain 297 (Akins et al., 1994)

BBK52.1

35722

35811

   

174

authentic frameshift

BBK53

35868

36419

   

52

N-terminal lipidation consensus

BBK54

36577

36392

   

55

pseudogene?

             

lp38

           

BBJ001

482

1208

   

60

N-terminal lipidation consensus; pseudogene

BBJ01

482

664

   

60

questionable gene; in-frame and inside of BBJ001 pseudogene; N-terminal lipidation consensus

BBJ02

927

1208

   

60

questionable gene; in-frame and inside of BBJ001 pseudogene

BBJ02.1

1475

2367

   

48

pseudogene

BBJ03

1593

1742

   

48

questionable gene; in-frame and inside of BBJ02.1 pseudogene

BBJ04

2381

2271

     

N-terminal lipidation consensus; questionable gene; backwards inside BBJ02.1

BBJ05

3828

2768

transposase-like protein {Anabaena}

 

82

pseudogene

BBJ06

3486

3629

     

questionable gene; backwards inside BBJ05; near-consensus N-terminal lipidation signal

BBJ07

4307

4167

   

98

questionable gene; in-frame and inside of BBJ07.1

BBJ07.1

4409

4167

   

98

pseudogene

BBJ08

4576

5493

   

12

near-consensus N-terminal lipidation signal

17 bp repeat

5938

6063

     

7.6 repeats of the 17 bp sequence AATTGATATTAAAATAT; not in a gene

BBJ09

6089

6859

 

ospD

 

outer surface protein; N-terminal lipidation consensus; rather extensive short direct repeat upstream of gene (Marconi et al., 1994; Norris et al., 1992)

BBJ10

7270

7473

 

bdrY

58

pseudogene

BBJ11

7965

7783

     

short gene

(BBJ11.1)

8070

8260

   

171

possible pseudogene, but too weak a match to be in the TIGR master gene list

BBJ12

8725

8636

   

86

questionable gene; inside BBJ12.1

BBJ12.1

8782

8593

   

86

pseudogene

BBJ13

9155

8880

   

69

pseudogene

BBJ14

9125

9283

protein urf (51aa) {Thermoproteus tenax virus} (poor match)

   

questionable gene; overlaps BBJ13 backwards

BBJ15

10168

10043

     

questionable gene; backwards inside BBJ15.1

BBJ15.1

9450

10150

multidrug-efflux transporter tetA(B) {Helicobacter pylori}

 

105

pseudogene

7 bp repeat

10287

10373

     

12.3 repeats of the 7 bp sequence TAATAGT; not in a gene

BBJ16

11105

10521

   

49

 

BBJ17

11889

11155

plasmid partition protein {Bacillus subtilis}

 

32

putative plasmid partition function

BBJ18

12452

11865

   

50

 

BBJ19

13440

12448

   

62

 

BBJ20

13936

13775

   

167

pseudogene

BBJ21

15514

15657

     

questionable gene; backwards inside BBJ21.1

BBJ21.1

15976

15484

   

138

pseudogene

BBJ22

16003

15905

     

questionable gene; overlaps BBJ21.1

BBJ23

16505

17326

   

106

near-consensus N-terminal lipidation signal

BBJ24

17388

18167

   

106

 

BBJ25

18202

19251

       

BBJ26

19313

20005

ABC transporter, ATP-binding protein {Methanococcus jannaschii}

 

4

putative ABC transporter, ATP-binding subunit

BBJ27

19995

21227

       

BBJ28

21220

21945

       

BBJ29

21971

23005

   

90

 

BBJ30

23018

23119

   

91

short gene

BBJ31

23366

24085

   

59

 

BBJ32

24517

24389

   

173

short gene

BBJ33

24681

24791

     

short gene

BBJ34

26407

25340

   

92

N-terminal lipidation consensus

BBJ35

26858

26959

     

short gene

BBJ36

28053

26998

   

92

N-terminal lipidation consensus

BBJ37

28442

28281

     

short gene

BBJ38

28703

28608

     

short gene

BBJ39

29401

29267

     

short gene

BBJ39.1

29800

29600

   

54

pseudogene

BBJ40

29827

29919

     

questionable gene; no gene called in paralogous sequence

BBJ41

30771

29908

   

54

N-terminal lipidation consensus

BBJ42

31133

30945

   

(54?)

pseudogene if weak similarity to family 54 is real; not in TIGR master paralog list

BBJ43

31220

32137

   

90

 

BBJ44

32150

32251

   

91

short gene

BBJ45

32498

33217

   

59

 

BBJ45.1

33652

33522

   

173

pseudogene in [J32]

BBJ46

34668

34375

     

short gene

BBJ47

35591

34908

   

99

near-consensus N-terminal lipidation signal

BBJ48

36272

35637

       

BBJ49

36455

36315

   

92

pseudogene

BBJ50

36502

37203

 

BbK2.5-6

 

BBJ50 has "authentic frameshifts" relative to outer membrane protein gene BbK2.5-6 which was sequenced from strain 297 (accession #L31615)

BBJ51

37917

37418

   

171

pseudogene

             

lp54

         

lp54-like plasmids have been found in almost all Bb (sensu lato) isolates analyzed (e.g., Casjens et al., 1995; Marconi et al., 1996a; Mathiesen et al., 1997; Samuels et al., 1993)

BBA01

588

1070

 

p11/S3 (N40)

48

S3 does not correspond perfectly to BBA01 (Feng et al., 1996)

BBA02

1238

1122

     

short gene; Feng et al. (1996) recognized a different small ORF called S4 (or p5) in this region in strain N40

BBA03

1397

1903

 

BbK2.14 (297)

 

near-consensus N-terminal lipidation signal; not labeled with palmitate (Akins et al., 1995a)

BBA04

2829

1984

 

S2 (N40)

44

N-terminal lipidation consensus

BBA05

4192

2942

 

S1 (N40)

 

N-terminal lipidation consensus (Feng et al., 1995)

BBA06

4342

4226

     

short gene

BBA07

5091

4606

chpAI protein {Escherichia coli} (chpAI similar only to central region of BBA07; best guess is that this is false hit)

   

N-terminal lipidation consensus; patchy homolog chpAI does not have a lipidation consensus.

BBA08

5250

5582

   

139

has cp32 paralog

BBA09

5582

6451

   

140

has cp32 paralog

BBA10

6457

7080

   

141

has cp32 paralog

BBA11

7145

8176

   

142

has cp32 paralog

BBA12

8202

8378

     

short gene

BBA13

8378

8800

       

BBA14

8793

9155

   

143

N-terminal lipidation consensus; has cp32 paralogs but they do not have lipidation consensus

BBA15

9393

10211

 

ospA

53

outer surface protein (Barbour et al., 1983); N-terminal lipidation consensus and lipidated in Bb (Bergstrom et al., 1989; Brandt et al., 1990); transcription start site mapped (Jonsson et al., 1992); sequenced in numerous other strains (Bunikis et al., 1996; Caporale and Kocher, 1994; Jonsson et al., 1992; Marconi et al., 1993a; Rosa et al., 1992; Wallich et al., 1992; Wallich et al., 1989; Wang et al., 1997a; Wang et al., 1997b; Wang et al., 1997c; Will et al., 1995; Wilske et al., 1996a; Wilske et al., 1996b; Wilske et al., 1992; Zumstein et al., 1992); atomic resolution structure (Li et al., 1997); in vitro mutagenesis (McGrath et al., 1995)

BBA16

10224

11111

 

ospB

53

outer surface protein (Barbour et al., 1984); N-terminal lipidation consensus (Bergstrom et al., 1989)

BBA17

11390

11301

     

short gene

BBA18

11687

12880

   

57

has cp32 paralog

BBA19

12931

13512

   

50

has cp32 paralog

BBA20

13491

14240

plasmid partition protein {Bacillus subtilis}

 

32

putative plasmid partition function; has cp32 paralog

BBA21

14274

14816

   

49

has cp32 paralog

BBA22

15084

15239

     

short gene

BBA23

15294

15734

   

144

has cp32 paralog

BBA24

16512

15940

 

dbpB (297)

74

N-terminal lipidation consensus; binds decorin (Guo et al., 1998)

BBA25

17195

16635

 

dbpA (297)

74

N-terminal lipidation consensus, surface exposed on outer membrane, binds decorin (Guo et al., 1998; Hagman et al., 1998; Hanson et al., 1998)

BBA26

17386

17514

     

short gene

BBA27

17563

17679

     

short gene

BBA28

17906

17757

     

short gene

BBA29

18019

17897

     

short gene

BBA30

18064

18654

       

BBA31

18661

20010

protein Orf26 of phage φO1205 {Streptococcus thermophilus}

 

145

homolog of orf26 of Streptococcus thermophilus phage φO1205, which is likely to be a terminase subunit

11 bp repeat

20138

20216

     

7.1 repeats of TAAATCAATAT; not in a gene

BBA32

20654

20845

     

short gene; near-consensus N-terminal lipidation signal

BBA33

21023

21559

     

N-terminal lipidation consensus

BBA34

23209

21623

oligopeptide ABC transporter, periplasmic oligopeptide-binding protein {Escherichia coli}

oppAV

37

N-terminal lipidation consensus (Bono et al., 1998)

BBA35

23391

23284

     

short gene

BBA36

23710

24348

     

N-terminal lipidation consensus; has weak similarity to family 113

BBA37

24434

25036

       

BBA38

25389

26627

   

146

has cp32 paralog

BBA39

26642

27226

   

147

has cp32 paralog

BBA40

27326

27931

   

148

has cp32 paralog

BBA41

27950

28864

   

149

has cp32 paralog

BBA42

28871

29320

   

150

has cp32 paralog

BBA43

29320

29691

   

107

has cp32 paralog

BBA44

29688

30005

       

BBA45

30030

30566

   

151

has cp32 paralog

BBA46

30646

31713

   

152

has cp32 paralog

BBA47

31716

32135

   

153

has cp32 paralog

BBA48

32197

32682

   

154

has cp32 paralog

BBA49

32678

32893

   

155

short gene; has cp32 paralog

BBA50

32905

34299

       

BBA51

34321

34881

   

157

has cp32 paralog

BBA52

34924

35766

 

BK2.1 (297)

 

(Akins et al., 1993)

BBA53

35890

36162

   

158

short gene

BBA54

36192

36467

   

158

short gene; near-consensus N-terminal lipidation signal

BBA55

36558

37484

   

159

has cp32 paralog

BBA56

37477

38043

   

160

has cp32 paralog

BBA57

39341

38100

     

N-terminal lipidation consensus

BBA58

39566

39766

     

short gene

BBA59

40048

39812

 

12 kd lipoprotein

 

N-terminal lipidation consensus; short gene, but it is real and expressed (McGrath et al., 1997)

BBA60

40981

40151

 

P27 (B29)

 

N-terminal lipidation consensus, lipidated in Bb, surface exposed (Reindl et al., 1993)

BBA61

41955

41335

 

D6 (B. garinii VS102)

 

(Balmelli et al., 1996)

BBA62

42203

42406

 

7.5 kd;

6.6 kd (297)

 

N-terminal lipidation consensus; lipidated in Bb, possible surface exposure, short gene (Katona et al., 1992; Lahdenne et al., 1997); transcription start (Indest et al., 1997)

BBA63

42576

42454

     

short gene

BBA64

43483

42563

 

P35 antigen

54

N-terminal lipidation consensus (Gilmore et al., 1997); cell density-dependent expression and transcription start (Indest et al., 1997)

BBA65

44469

43624

   

54

N-terminal lipidation consensus

BBA66

45883

44651

   

54

N-terminal lipidation consensus

BBA67

46021

46197

     

short gene

BBA68

47164

46412

   

54

N-terminal lipidation consensus

BBA69

48203

47415

   

54

N-terminal lipidation consensus

BBA70

49158

48523

   

54

pseudogene

BBA71

49796

49386

   

54

pseudogene

BBA72

50031

49792

     

short gene; near-consensus N-terminal lipidation signal

BBA73

51112

50225

   

54

N-terminal lipidation consensus

BBA74

51642

52412

 

oms28

171

outer membrane protein with porin activity (Skare et al., 1996)

BBA75

52591

52496

     

short gene

BBA76

52642

53436

thymidylate synthase-complementing protein (thy1) {Dictyostelium discoideum}

 

65

one patch of similarity to thy1

             

lp56

         

Hinnebusch et al. (1990) determined the sequence of what turned out to be the right telomere of lp56 - this sequence (called TL49 in that paper) adds 25 bp to the right end of the TIGR sequence

BBQ01

279

545

   

55

pseudogene?

BBQ02

710

799

   

174

questionable gene; paralog not called in similar sequence elsewhere

BBQ03

856

1404

   

52

N-terminal lipidation consensus

BBQ04

1404

2265

   

44

pseudogene; authentic frameshift; N-terminal lipidation consensus

BBQ05

2744

3493

   

60

N-terminal lipidation consensus

BBQ06

3623

4105

   

48

 

BBQ07

4986

4252

   

49

 

BBQ08

5830

5072

plasmid partition protein {Bacillus subtilis}

 

32

putative plasmid partition function

BBQ09

6339

5806

   

50

 

BBQ10

6674

6339

   

62

pseudogene; fusion protein resulting from integration of a cp32-like plasmid into the linear lp56 precursor

BBQ11

6585

6800

   

148

pseudogene; result of integration of a cp32-like plasmid into the linear lp56 precursor

BBQ12

6800

7408

   

148

 

BBQ13

7418

8389

   

149

 

BBQ14

8406

8864

   

150

 

BBQ15

8849

9247

   

107

 

BBQ16

9238

9624

   

108

pseudogene; authentic frameshift

BBQ17

9624

10187

   

151

 

BBQ18

10171

11280

   

152

 

BBQ19

11276

11722

   

153

 

BBQ20

11741

12193

   

154

 

BBQ21

12193

12426

   

155

short gene

BBQ22

12437

13729

   

156

 

BBQ23

13755

14435

   

157

 

BBQ24

14442

15389

   

159

 

BBQ25

15410

15961

   

160

 

BBQ26

15994

16323

   

139

 

BBQ27

16323

17195

   

140

 

BBQ28

17208

17810

   

141

 

BBQ29

17823

18635

   

142

 

BBQ30

18713

18913

 

blyA-56

109

putative hemolysin; short gene

BBQ31

18920

19264

 

blyB-56

111

putative hemolysin

BBQ32

19257

19589

   

112

 

BBQ33

19579

19935

   

143

near-consensus N-terminal lipidation signal

BBQ34

20022

20735

 

bdrW

80

contains ~7 repeats of a 33 bp sequence

BBQ35

20844

21452

 

mlpJ

113

N-terminal lipidation consensus

BBQ36

21424

21513

     

questionable gene; gene not called in paralogous sequence on other cp32s

BBQ37

22479

21535

 

orf4-56

161

(Zuckert and Meyer, 1996)

BBQ38

22869

23963

 

orf1-56

57

(Zuckert and Meyer, 1996)

BBQ39

23976

24539

 

orf2-56

50

(Zuckert and Meyer, 1996)

BBQ40

24518

25270

plasmid partition protein {Bacillus subtilis}

orfC-56

32

putative plasmid partition function (Zuckert and Meyer, 1996)

BBQ41

25317

25874

 

orf3-56

49

(Zuckert and Meyer, 1996)

BBQ42

25890

26423

 

bdrV

80

(Zuckert and Meyer, 1996)

BBQ43

26879

28243

 

orf8/7-56

165

(Zuckert and Meyer, 1996)

BBQ44

28299

28721

   

144

 

BBQ45

29506

28733

   

96

 

BBQ46

29559

29651

     

questionable gene; gene not called in paralogous sequence on other cp32s; near consensus lipidation sequence

BBQ47

29895

30971

 

erpX

163

N-terminal lipidation consensus (Stevenson et al., 1998a)

BBQ48

31117

31707

   

114

 

BBQ49

31892

32455

   

115

 

BBQ50

32455

33804

hypothetical protein Orf26 of phage fO1205 gene {Streptococcus thermophilus}

 

145

phage fO1205 Orf26 homology; Orf26 is a possible phage structural protein

BBQ51

33873

35095

   

146

pseudogene; authentic frameshift

BBQ52

35115

35804

   

147

 

BBQ53

35819

36382

   

148

 

BBQ54

36388

37132

   

148

pseudogene; result of integration of a cp32-like plasmid into the linear lp56 precursor

BBQ55

37533

36933

   

62

pseudogene; result of integration of a cp32-like plasmid into the linear lp56 precursor

BBQ56

37809

37672

     

near-consensus N-terminal lipidation signal; short gene

BBQ57

38223

37819

   

101

questionable gene; in-frame and inside of BBQ60

BBQ58

38514

38419

   

101

questionable gene; in-frame and inside of BBQ60

BBQ59

39077

38568

   

101

questionable gene; in-frame and inside of BBQ60

BBQ60

39360

37817

   

101

pseudogene

BBQ61

39482

39360

   

117

questionable gene; in-frame and inside of BBQ63

BBQ62

39531

39902

     

questionable gene; backwards in BBQ63

BBQ63

39934

39400

   

117

pseudogene

BBQ64

40186

39962

   

103

questionable gene; in-frame and inside of BBQ65

BBQ65

40218

39961

   

103

pseudogene

BBQ66

40317

40409

     

short gene

BBQ67

43732

40439

C-terminal portion — adenine specific DNA methyltransferase {Helicobacter pylori}

N-terminal portion hits gene HP1353 {Helicobacter pylori}

 

102/

167

probable pseudogene or fusion gene; N-terminal portion is a good match to adenine-specific DNA methyltransferase; C-terminal portion is a good match to BBG02, which matches H. pylori HP1353 (HP1352 and HP1354 are putative adenine methylases)

BBQ68

44232

43918

   

138

questionable gene; in-frame and part of BBQ69

BBQ69

44581

43769

   

138

pseudogene

BBQ70

44612

44511

     

questionable gene; in-frame and overlaps BBQ69 (in part)

BBQ71

44582

45263

multidrug-efflux transporter tetA(B) {Helicobacter pylori}

 

105

pseudogene

BBQ72

45314

45427

     

short gene

BBQ73

45630

45530

   

60

pseudogene

BBQ74

46462

45804

   

60

pseudogene

BBQ75

46671

46781

   

168

pseudogene

BBQ76

46982

47077

     

short gene

BBQ77

47163

47279

transposase-like protein {Anabaena}

 

82

pseudogene contains a fragment of [82] in middle; probably actually part of BBQ80 pseudogene

BBQ78

47295

47393

     

questionable gene; out of frame inside BBQ81

BBQ79

47273

47569

transposase-like protein {Anabaena}

 

82

pseudogene (see BBQ77 comment)

BBQ80

48626

47787

   

60

pseudogene

BBQ81

49246

49047

   

48

pseudogene

BBQ82

49538

49347

   

76

pseudogene

BBQ83

49868

49755

     

short gene

BBQ84

50550

50398

   

84

short gene (pseudogene [57])

BBQ84.1

50900

50700

   

57

pseudogene

BBQ85

51528

51175

   

57

pseudogene

BBQ86

51823

51722

   

57

pseudogene

BBQ87

52067

51921

   

57/77

pseudogene

BBQ88

52608

52138

   

76

 

BBQ89

52726

52535

   

166

short gene; N-terminal lipidation consensus

             

Right 7.2 kbp of B31 chrm

   

Note that "genes" BB0850 and BB0851 in this region have been removed from the published (Fraser et al., 1997) gene list due to improvements in the TIGR gene-calling protocol.

   

The rightmost 7.2 kbp of the B31 chromosome contains largely plasmid-like sequences, many of which are pseudogenes. This is not true of the remainder or "constant portion" of the chromosome, including sequences near the left chromosomal end.

BB0843.1

903255

903415

   

32

pseudogene

BB0844

904900

903932

   

12

 

BB0845

905120

905224

     

questionable gene; inside BB0845.1 and backwards

BB0845.1

905255

905025

   

76

pseudogene

BB0845.11

905395

905295

   

166

possible pseudogene; this is a poor homolog and is not included in the TIGR analysis.

BB0845.2

905475

905775

   

105

pseudogene

BB0846

905865

905755

     

questionable gene; overlaps BB0845.2 backwards

BB0847

905839

905943

     

short gene

BB0848

905928

906029

     

short gene

BB0848.1

906075

906275

   

82

pseudogene

BB0849

906162

906260

     

questionable gene; inside BB0848.1

BB0849.1

906725

906275

   

57

pseudogene

BB0849.2

907225

908225

   

1

pseudogene

BB0850

——

——

     

No longer considered to be a realistic potential gene.

BB0851

——

——

     

No longer considered to be a realistic potential gene.

BB0852

908407

909588

   

138

 

BB0853

910175

909845

   

57

pseudogene

BB0853.1

910555

910375

   

57

pseudogene
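The two coordinate columns in the Part I rows above can be read mechanically. The following Python sketch does so under assumptions that the table itself does not state: coordinates are taken to be 1-based and inclusive, and a start coordinate larger than the end coordinate is taken to mean that the gene lies on the reverse strand. The 300 bp boundary is borrowed from the ">300 bp" class used in the Part II mini-summary. The names `Orf` and `describe_orf` are hypothetical.

```python
from typing import NamedTuple


class Orf(NamedTuple):
    name: str
    start: int  # first coordinate column of the table row
    end: int    # second coordinate column of the table row


def describe_orf(orf: Orf, short_cutoff_bp: int = 300) -> dict:
    """Derive length, strand and a 'short' flag from one table row.

    Assumptions (not stated in the table): inclusive coordinates, and
    start > end meaning reverse-strand orientation.
    """
    length_bp = abs(orf.end - orf.start) + 1
    strand = "+" if orf.start <= orf.end else "-"
    return {
        "name": orf.name,
        "length_bp": length_bp,
        "strand": strand,
        "short": length_bp <= short_cutoff_bp,
    }


# Three lp54 rows copied from the table above.
rows = [
    Orf("BBA16", 10224, 11111),  # ospB
    Orf("BBA17", 11390, 11301),  # annotated "short gene"
    Orf("BBA24", 16512, 15940),  # dbpB
]

for row in rows:
    print(describe_orf(row))
```

Under these assumptions BBA17 comes out at 90 bp, which is consistent with its "short gene" annotation.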

 

Part II

Paralogous Gene Families in B. burgdorferi B31

Compiled by Daniel Haft, Owen White and Sherwood Casjens - April 1999

Procedure for generation of the paralogous gene families.

1. Any pair of B31 proteins whose comparison scored better than 0.02 probability by FASTA3 was clustered. Any additional protein similar to any member of an existing cluster (with better than 0.02 probability) joined that cluster. Therefore each protein is a member of AT MOST a single cluster (no cluster is linked to any other cluster by a score better than 0.02). However, clusters need not be closed under transitivity; that is, if protein A scores better than 0.02 with B, and B scores better than 0.02 with C, A does not necessarily score better than 0.02 with C.

2. This preliminary clustering was followed by manual curation of amino acid sequence alignments. Final clusters were generated by parsing the approved alignments. Curation was as follows: For each initial cluster, multiple sequence alignments were generated by CLUSTALW and by a TIGR program called MSA (G. Sutton, unpublished; not to be confused with another MSA program) that runs on the MASPAR. The better of the two alignments was selected by inspection of both. Sometimes manual editing was performed to improve the alignment further.

3. Adjustments to the paralogous families were achieved by splitting clusters, not joining them or adding proteins to clusters (with a single exception, in which one additional GTP-binding protein was added to the cluster of all other GTP-binding proteins). The standard for splitting versus not splitting was as follows: Alignments were viewed in BELVU, which could be used to generate a UPGMA difference tree (which is NOT a phylogenetic tree). In some cases, a domain could be recognized that was responsible for the initial clustering. If large portions of the resulting alignment contained protein sequence that clearly was not homologous, although aligned, these were split. This happened most often for plasmid proteins that shared similar amino-terminal domains (signal and lipidation sequences), but that otherwise were easily resolvable into several different classes. Two notable cases, genes BBR41 and BBQ67, appear to be gene fusions that join large, easily recognizable parts of two genes that fall into two different paralogous families; thus, even though BBR41, for example, bridges two families by virtue of its two different domains, these two families were not joined since they have no similarity to one another.
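Step 1 of the procedure above amounts to single-linkage clustering: two proteins end up in the same cluster whenever a chain of sufficiently good pairwise scores connects them. The Python sketch below illustrates that idea with a small union-find structure. It is not the TIGR implementation. The function name `single_linkage`, the protein labels and the pairwise probabilities are all hypothetical, and "better than 0.02" is assumed to mean a probability below 0.02.

```python
from collections import defaultdict


def single_linkage(pairs, threshold=0.02):
    """Group proteins linked by any chain of scores better than threshold.

    pairs: iterable of (protein_a, protein_b, probability) tuples.
    Returns a sorted list of clusters (each a sorted list of names);
    proteins with no qualifying partner are left out.
    """
    parent = {}

    def find(x):
        parent.setdefault(x, x)
        while parent[x] != x:
            parent[x] = parent[parent[x]]  # path halving
            x = parent[x]
        return x

    for a, b, prob in pairs:
        root_a, root_b = find(a), find(b)  # registers both proteins
        if prob < threshold and root_a != root_b:
            parent[root_a] = root_b  # merge the two clusters

    clusters = defaultdict(set)
    for protein in parent:
        clusters[find(protein)].add(protein)
    return sorted(sorted(c) for c in clusters.values() if len(c) > 1)


# Made-up scores: A-B and B-C pass the threshold, A-C does not,
# yet A, B and C still form one cluster (no transitivity required).
example_pairs = [
    ("A", "B", 0.001),
    ("B", "C", 0.01),
    ("A", "C", 0.5),
    ("D", "E", 0.0001),
    ("C", "D", 0.3),
]
print(single_linkage(example_pairs))
```

The A/B/C example mirrors the transitivity remark in step 1: A and C share a cluster only because each is linked to B.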

Why are some numbers not used in the current set of paralogous family names?

Why are some published paralogous family names no longer used?

The methods TIGR uses to classify paralogous gene family members have changed somewhat since Fraser et al. (1997) was published. Where possible, the gene family names in that paper are used here. In cases where two or more families have fused since that publication, we chose one of the previous family names and have not re-used the others. Hence, not all numbers are currently used in the list of family names; 161 of the numbers between 1 and 175 are currently used as family names (the numbers 5, 7, 17, 24, 27, 28, 51, 67, 73, 79, 81, 83, 87, 169, etc., are not used).

Mini-summary of Some Paralogous Relationships

There are 161 paralogous families of B. burgdorferi B31 genes, 107 of which have plasmid-borne members. The family sizes vary from 2 members to 41 members. Family 57 has 41 members, of which only 16 appear to be full-length, intact genes. Some families have noticeable subgroups that are not delineated here; for example, family 148 (26 members, of which 24 are intact) has three clear subgroups, each of which has 8 members (these subgroups coincide with three contiguous related genes on each of the cp32s and on lp56).

Some of the largest paralogous families are as follows:

family    total genes    pseudogenes    apparently intact genes
32        29             4              25
49        26             6              20
50        23             3              20
54        14             4              10
57        41             25             16
60        15             6              9
62        12             3              6
80        18             1              17
82        17             17             0

It is curious to note that four of the above families (32, 49, 50 and 57) are members of the so-called "partition gene cluster" (Zuckert & Meyer, 1996; Casjens et al., 1999), intact variants of which are found on all or most of the B31 plasmids.

Most plasmid genes are members of paralogous families. Only 63 of the 535 plasmid non-pseudogenes >300 bp have no paralogs, whereas 93 of the 134 non-pseudogenes ≤300 bp have no paralogs.

The 63 >300 bp plasmid genes with no paralogs are the following:

BBB01, BBB02, BBB03, BBB04, BBB05, BBB06, BBB07, BBB08, BBB09, BBB14, BBB17, BBB18, BBB19, BBB24, BBB25, BBB26, BBB27, BBB28, BBS27, BBM39, BBD09, BBD10, BBD11, BBD13, BBD18, BBE17, BBE22, BBG12, BBG13, BBG14, BBG15, BBG16, BBG17, BBG18, BBG21, BBG26, BBG27, BBG28, BBG30, BBH06, BBK19, BBK32, BBJ09, BBJ25, BBJ27, BBJ28, BBJ48, BBA03, BBA05, BBA07, BBA13, BBA30, BBA33, BBA36, BBA37, BBA44, BBA50, BBA52, BBA57, BBA59, BBA60, BBA61, BBA62.
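The counts quoted above can be turned into proportions to make the contrast between the two size classes explicit. In this Python sketch only the four counts come from the text; the percentages are derived arithmetic, and the function name `fraction_without_paralogs` is hypothetical.

```python
# (genes without paralogs, all non-pseudogenes in the size class),
# as quoted in the text above.
counts = {
    "longer than 300 bp": (63, 535),
    "300 bp or shorter": (93, 134),
}


def fraction_without_paralogs(no_paralog: int, total: int) -> float:
    """Fraction of genes in a size class that have no paralog."""
    return no_paralog / total


for label, (no_paralog, total) in counts.items():
    frac = fraction_without_paralogs(no_paralog, total)
    print(f"{label}: {no_paralog}/{total} = {frac:.1%} without paralogs")
```

On these figures roughly 12% of the longer plasmid genes, but roughly 69% of the short ones, lack paralogs.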

•••••••••••••••••••••••••••••••••••••••••••••••••••••••••••••••••••••••••••••••••••••••••••••••

Paralogous Gene Families in B. burgdorferi B31

Definitions used in the following table:

* - Indicates pseudogenes as defined in PARTs I and IV.

LP - Indicates that the gene contains a "perfect" lipoprotein consensus (see PART III).

LP? - Indicates that the gene contains an "imperfect" but near-consensus lipidation sequence (see PART III). Cross referencing of this feature into gene families without plasmid members was not performed.

‡ - Indicates families with only chromosomal member genes.

† - Daggers indicate genes inside of a larger pseudogene (i.e., are part of a larger entity that is also in the gene list).

(...) - Gene names in parentheses are not in this location in the TIGR paralog list - see comments column in those cases.
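For readers who want to process the family table programmatically, the member-gene markers defined above can be split off with a regular expression. The Python sketch below is an illustration only. The names `FamilyMember` and `parse_member` are hypothetical; it handles locus-tag style entries (letters, digits, optional decimal suffix) carrying the markers defined above, and it deliberately rejects anything else rather than guessing. The ‡ marker is attached to family names, not to member genes, and is not handled here.

```python
import re
from dataclasses import dataclass

# Order of optional pieces follows the table entries:
# "(" "†" GENE "*" "LP" or "LP?" ")"
_ENTRY = re.compile(
    r"(?P<open>\()?"
    r"(?P<dagger>†)?"
    r"(?P<gene>[A-Za-z]+\d+(?:\.\d+)?)"
    r"(?P<pseudo>\*)?"
    r"(?P<lipo>LP\??)?"
    r"(?P<close>\))?"
)


@dataclass(frozen=True)
class FamilyMember:
    gene: str
    pseudogene: bool                # "*"
    lipoprotein: str                # "LP", "LP?" or "" if unmarked
    inside_larger_pseudogene: bool  # "†"
    parenthesized: bool             # "(...)": see the comments column


def parse_member(entry: str) -> FamilyMember:
    """Parse one member-gene cell such as '†BBK02*' or 'BBJ08LP?'."""
    match = _ENTRY.fullmatch(entry.strip())
    if match is None or bool(match["open"]) != bool(match["close"]):
        raise ValueError(f"unrecognized member entry: {entry!r}")
    return FamilyMember(
        gene=match["gene"],
        pseudogene=match["pseudo"] is not None,
        lipoprotein=match["lipo"] or "",
        inside_larger_pseudogene=match["dagger"] is not None,
        parenthesized=match["open"] is not None,
    )


for cell in ["BB0849.2*", "†BBK02*", "BBJ08LP?", "BBK01*LP", "(BBQ67*)"]:
    print(parse_member(cell))
```

Raising an error on unrecognized entries keeps any irregular annotations in the table visible instead of silently misclassifying them.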

Paralogous Family Name

Member Genes and Pseudogenes*

Comments

1 family

BB0849.2*

BBE02

BBH09

BBH11.1*

BBK02.1*

†BBK02*

†BBK03*

†BBK04*

BBK10*

BBK25.1*

†BBK26*

†BBK28*

†BBK29*

†BBK30*

7 members

‡2 family

BB0246

BB0255

BB0262

BB0761

4 members

‡3 family

BB0611

BB0757

2 members

4 family

BB0080

BB0146

BB0218

BB0318

BB0334

BB0335

BB0466

BB0573

BB0642

BB0677

BB0742

BB0754

BBJ26

13 members

‡6 family

BB0020

BB0727

2 members

‡8 family

BB0302

BB0719

2 members

‡9 family

BB0264

BB0518

BB0715

3 members

‡10 family

BB0076

BB0270

BB0694

3 members

‡11 family

BB0088

BB0540

BB0691

3 members

12 family

BB0844LP

BBG01LP

BBH37LP

BBJ08LP?

BBK01*LP

5 members; All are fairly near a telomere, transcribed towards center of plasmid.

‡13 family

BB0578

BB0596

BB0597

BB0680

BB0681

5 members

‡14 family

BB0419

BB0420

BB0551

BB0570

BB0672

BB0763

6 members

‡15 family

BB0517

BB0655

2 members

16 family

BB0116

BB0645

BBB29

3 members

‡18 family

BB0344

BB0607

2 members

‡19 family

BB0408

BB0629

2 members

‡20 family

BB0581

BB0623

2 members

‡21 family

BB0002

BB0620

2 members

‡22 family

BB0253

BB0613

2 members

‡23 family

BB0369

BB0834

2 members

‡25 family

BB0137

BB0593

2 members

26 family

BB0375

BB0588

BBE07*

BBI06

4 members

‡29 family

BB0451

BB0452

2 members

‡30 family

BB0036

BB0436

2 members

‡31 family

BB0035

BB0435

2 members

32 family

BB0269

BB0361

BB0431

BB0726

BB0843.1*

BBA20

BBB12

BBD21

BBE19

BBE23.2*

BBF11.1*

BBF13

BBF24

BBG08

BBH28

BBI21

BBJ17

BBK16*

BBK21

BBL32

BBM32

BBN32

BBO32

BBP32

BBQ08

BBQ40

BBR33

BBS35

BBU05

29 members; Previously called ORF-C (Zuckert and Meyer, 1996)

Homology to parA genes in other bacterial systems suggests these genes function in plasmid partitioning.

BBD21 is fairly distant member

‡33 family

BB0040

BB0312

BB0414

BB0565

BB0670

5 members

‡34 family

BB0738

BB0833

2 members

‡35 family

BB0405

BB0406

BB0562

BB0563

BB0564

5 members

‡36 family

BB0382

BB0383

BB0384

BB0385

4 members

37 family

BB0328LP

BB0329LP?

BB0330LP?

BBA34LP

BBB16LP

5 members; oppA homologous genes (oligopeptide ABC transporter)

‡38 family

BB0221

BB0290

2 members

‡39 family

BB0093

BB0094

BB0288

3 members

40 family

BB0223

BB0224

BBK13

3 members

‡41 family

BB0145

BB0216

BB0217

BB0332

BB0333

BB0640

BB0641

BB0746

BB0747

9 members

‡42 family

BB0059

BB0202

2 members

‡43 family

BB0074

BB0196

2 members

44 family

BB0158LP

BB0159

BBA04LP

BBE04.1*LP

†BBE05*

BBE09LP

BBF22*

†BBH36*

BBH36.1*

BBK52LP

BBQ04*LP

9 members; BBA04 protein is "S2 antigen"

‡45 family

BB0018

BB0815

2 members

46 family

BB0111

BBG32

2 members; These proteins are homologs to helicases; lipidation seems unlikely

‡47 family

BB0050

BB0051

2 members

48 family

BB0034

BBA01

BBG03*

BBH41

BBI31

BBJ02.1*

†BBJ03*

BBQ06

BBQ81*

9 members; BBG03* has patchy similarity to family and is slightly truncated at N-terminus

49 family

BBA21

BBB13

BBC03

BBE18

BBE24*

BBE24.1*

BBE30*

BBF12(*?)

BBF23

BBF29*

BBG09

BBH29

BBI22

BBI40*

BBJ16

BBK24

BBL34

BBM33

BBN33

BBO33

BBP33

BBQ07

BBQ41

BBR34

BBS36

BBU06

26 members

Previously called ORF-3 (Dunn et al., 1994; Zuckert and Meyer, 1996);

BBF12 is very patchy paralog (fusion pseudogene?);

BBE24.1 is a fragment of BBF12 that is not very related to the rest of the family;

BBF29 is patchy paralog (fusion pseudogene?)

50 family

BBA19

BBB11

BBC02

BBE20

BBF14

BBF25

BBF31*

BBG07

BBG31*

BBH07*

BBH27

BBI20

BBJ18

BBK22

BBL31

BBM31

BBN31

BBO31

BBP31

BBQ09

BBQ39

BBR32

BBS34

23 members

Previously called ORF-2 (Dunn et al., 1994; Zuckert and Meyer, 1996).

51

 

no longer exists - merged into family 62

52 family

BBI42LP

BBJ50*

BBK53LP

BBQ03LP

BBT07*

BBU12*

6 members; BBJ50* is fairly distant member of family

53 family

BBA15LP

BBA16LP

2 members; ospA & ospB genes

54 family

BBA64LP

BBA65LP

BBA66LP

BBA68LP

BBA69LP

BBA70*

BBA71*

BBA73LP

BBE04*LP?

BBI36LP

BBI38LP

BBI39LP

BBJ39.1*

BBJ41LP

(BBJ42*?)

14 members

Only BBA64, BBA65, BBA66 and BBA73 have a "standard" lipidation consensus; BBA68, BBA69, BBI36, BBI38, BBI39 and BBJ41 have sequences that fit a slightly relaxed consensus;

Includes "old family 87";

BBJ42* has very weak homology to family 54; it is not included in the TIGR computer’s paralog list.

55 family

BBC08

BBI43(*)

BBK54(*)

BBQ01(*)

BBU09(*)

5 members; A confusing family - it demonstrates the problems involved in attempting to make gene/pseudogene decisions on novel DNA sequence for which there is no information beyond sequence: relative sizes are BBC08 > BBU09>BBK54=BBI43=BBQ01; I called the shortest 4 pseudogenes, but obviously no one really knows; it is possible that BBC08 is unusually large for an as yet unknown reason.

‡56 family

BB0473

BB0583

BB0584

3 members

57 family

BB0849.1*

BB0853*

BB0853.1*

BBA18

BBC01

(BBD02*)

BBD03*

BBD04*

BBD05.1*

†BBD06*

BBE21

BBE23.1*

BBE32*

BBE33*

(BBF04*)

BBF05*

BBF06*

BBF26

BBF31.1*

BBG06

(BBH03*)

BBH04*

BBH05*

BBI01*

BBI02.2*

BBI19

BBL30

BBM30

BBN30

BBO30

BBP30

BBQ38

BBQ84.1*

BBQ85*

BBQ86*

(BBQ87*)

BBR31

BBS33

BBT01*

BBT02*

BBT04

BBT05*

BBU01*

BBU04

BBU07*

BBU10*

45 members - family 57 is distant relative of family 62

Previously called ORF-1 (Dunn et al., 1994; Zuckert and Meyer, 1996).

BBD02, BBF04, BBH03 & BBQ87 (family 77) appear to be fusions of part of a family 57 gene and something else;

BBD05, BBI02, BBQ84, BBT03, and BBU02 (family 84) also have weak similarities to family 57.

Many of the family 57 members are in highly recombined telomeric regions and are perhaps not functional (?) (Casjens et al., 2000).

58 family

BBJ10*

BBK40

2 members

BBK40 contains repeats that are somewhat similar to BBG33 in family 80

59 family

BBI08.1*

†BBI09*

†BBI10*

BBJ31

BBJ45LP?

BBK07LP

BBK12LP

BBK39*

6 members

only BBK07 and BBK39 have good lipidation consensus

60 family

BBE31LP

BBH32LP

BBI14*LP

BBI15*

BBI16LP

BBI27*

BBI28LP

BBI29LP

BBI34LP

BBJ001*LP

†BBJ01LP

†BBJ02

BBK15

BBQ05LP

BBQ73*

BBQ74*

BBQ80*

15 members

F01 is fairly poor paralog

BBK15 is missing the LP consensus - is it a pseudogene?

61 family

BBH33*

BBK17

2 members

62 family

BBB10

BBD14

BBG29

BBH26

BBH34*

BBJ19

BBK23

BBQ10*

BBQ55*

9 members - family 62 is related to family 57

Published families 51 and 62 now merged

63 family

BBC10LP

BBM27LP

BBP27LP

3 members

BBP27 is the rev gene of Gilmore et al. (1997)

64 family

BBF16*

BBK34

2 members

65 family

BBA76

BBF14.1*

BBH18.1*

BBK33*

4 members

66 family

BBF21

BBK36

2 members

68 family

BBF17*

BBK35

2 members

69 family

BBH18LP

BBJ13*

BBK47LP

BBK49LP

4 members

published families 69 & 81 merged

70 family

BBF10

BBK41

2 members

71 family

BBF09*

BBK42

2 members

72 family

BBF08*

BBK42.1*

†BBK43*

2 members

complex situation regarding pseudogenes

74 family

BBA24LP

BBA25LP

2 members

dbpA & dbpB (decorin binding proteins) (Guo et al., 1998)

75 family

BBK37*

BBK45LP?

BBK46*

BBK48LP

BBK50LP

5 members

BBK45 does not have the lipidation consensus as the gene was originally listed, but it has a near-consensus lipidation signal at alternate translation start (see Part III).

BBK37 is truncated and fused to other sequences C-terminal to its family 75 similarity; thus it is also placed in family 175 by virtue of similarity to that family at its C-terminus; it looks as though it could be expressed.

76 family

BB0845.1*

BBD01

BBH02

BBQ82*

BBQ88*

5 members

77 family

BBD02

BBF04

BBH03

BBQ87

4 members

These appear to be fusions between family 57 and something else (that is common to family 77 genes); see also comments under family 57

‡78 family

BB0283

BB0293

BB0774

BB0775

4 members

80 family

BBF001.1*

BBF03*

BBG33

BBH13

(BBK40)

BBL27

BBL35

BBM34

BBN27

BBN34

BBO27

BBO34

BBP34

BBQ34

BBQ42

BBR27

BBR35

BBS29

BBS37

19 members

Two subfamilies previously typified by ORF-E (Zuckert and Meyer, 1996; Zuckert et al., 1999) and rep (Porcella et al., 1996).

Now these are called the bdr genes, for Borrelia direct repeat containing genes (Zuckert et al., 1999).

Most cp32s carry two homologs in this family, one at or near gene position 28 (rep) and the other at or near gene position 34 (ORF-E).

BBK40 (family 58) contains repeats that are somewhat similar to BBG33, but is otherwise not very similar to this family.

82 family

BB0848.1*

BBD08*

BBD15.1*

BBD20*

BBD23*

BBE21.1*

BBF18*

†BBF19*

BBG05*

BBH10.1*

BBH40*

BBI33*

BBI41*

BBJ05*

BBK001*

BBK25*

BBQ77*

BBQ79*

17 members

BBG05 is "best" family member; it has one frameshift relative to homologous putative transposases in some other bacteria.

It does not appear that BBG05 could be expressed by programmed translational frameshifting, and so it appears that strain B31 no longer has an intact version of this gene.

84 family

BBD05*

BBI02(*)

BBQ84*

BBT03(*)

BBU02(*)

5 members; This group may in fact be part of family 57 since BBI02, BBU02 & BBT03 have weak similarity to the N-terminal region of [57] members; if so then all the genes in family 84 would be pseudogenes. These are all in the highly recombined telomeric regions which may carry no "real" genes (Casjens et al., 2000).

85 family

BBD15LP

BBF20*LP

2 members

86 family

BBG22

BBG23

BBH24.1*

†BBJ12*

BBJ12.1*

4 members

87 family

none

published family 87 (Fraser et al., 1997) merged into family 54

88 family

BBF001*

BBF02

BBG34

3 members

‡89 family

BB0712

BB0771

2 members

90 family

BBJ29

BBJ43

2 members

91 family

BBJ30

BBJ44

2 members

92 family

BBJ34LP

BBJ36LP

BBJ49*

3 members

93 family

BBI35

BBI37

2 members

94 family

BBB22

BBB23

2 members

95 family

BBC06

BBH09.1*

BBS42

3 members; BBC06 and BBS42 previously called eppA and bapA, respectively (Champion et al., 1994; Wallich et al., 1995)

96 family

BBC11

BBH30*

BBL38

BBM37

BBN37*

BBO38

BBP37

BBQ45

BBR38

BBS40

10 members; Previously called ORF-6 (Dunn et al., 1994; Zuckert and Meyer, 1996).

‡97 family

BB0068

BB0421

2 members

98 family

BBE03

BBI31.1*

†BBJ07*

BBJ07.1*

3 members

99 family

BBE16LP?

BBJ47LP?

2 members

100 family

BBF07

BBK44

2 members

101 family

BBF26.1*

†BBF27*

BBG10

†BBQ57

†BBQ58

†BBQ59

BBQ60*

3 members

102 family

BBE29.1*

BBG02LP

BBH36.2*

(BBQ67*)

4 members

The C-terminal portion of BBQ67 is similar to BBG02; the N-terminal part is similar to adenine methylases (see family 167)

103 family

BBG20

†BBQ64

BBQ65*

2 members

104 family

BBG24

BBH20.1*

†BBH23*

†BBH24*

2 members

105 family

BB0845.2*

BBI26

BBJ15.1*

BBQ71*

4 members

106 family

BBJ23

BBJ24

2 members

107 family

BBA43

BBL08

BBM08

BBN08

BBO08

BBP08

BBQ15

BBR08

BBS08

9 members

BBA43 is the most distant paralog of this family

108 family

BBL09

BBM09

BBN09

BBO09

BBP09

BBQ16*

BBR09

BBS09

8 members

109 family

BBL23

BBM23

BBN23

BBO23

BBP23

BBQ30

BBR23

BBS23

8 members

‡110 family

BB0079

BB0081

2 members

111 family

BBL24

BBM24

BBN24

BBO24

BBP24

BBQ31

BBR24

BBS24

8 members

112 family

BBL25

BBM25

BBN25

BBO25

BBP25

BBQ32

BBR25

BBS25

8 members

113 family

BBL28LP

BBM28LP

BBN28LP

BBO28LP

BBP28LP

BBQ35LP

BBR28LP

BBS30LP

8 members

cp32 mlp genes

114 family

BBL41

BBN41

BBO42

BBP40

BBQ48

BBR43

7 members

115 family

BBL42

BBM41

BBN42

BBO43

BBP41

BBQ49

BBR44

BBS44

8 members

116 family

BBN40

BBO41

2 members

117 family

BBG19

†BBQ61

BBQ63*

2 members

‡118 family

BB0098

BB0797

2 members

‡119 family

BB0136

BB0718

2 members

‡120 family

BB0147

BB0182

2 members

‡121 family

BB0172

BB0173

2 members

‡122 family

BB0179

BB0508

BB0643

3 members

‡123 family

BB0058

BB0195

2 members

‡124 family

BB0225

BB0737

2 members

‡125 family

BB0231

BB0245

BB0538

3 members

‡126 family

BB0251

BB0587

2 members

‡127 family

BB0295

BB0612

2 members

‡128 family

BB0304

BB0817

2 members

‡129 family

BB0316

BB0317

2 members

‡130 family

BB0678

BB0679

2 members

‡131 family

BB0366

BB0627

2 members

‡132 family

BB0415

BB0568

2 members

‡133 family

BB0471

BB0505

2 members

‡134 family

BB0567

BB0669

2 members

‡135 family

BB0638

BB0637

2 members

‡136 family

BB0652

BB0653

2 members

137 family

BB0734

BBT06

BBU08*

BBU11

4 members

138 family

BB0852

BBJ21.1*

†BBQ68

BBQ69*

3 members

139 family

BBA08

BBL19

BBM19

BBN19

BBO19

BBP19

BBQ26

BBR19

BBS19

9 members

140 family

BBA09

BBL20

BBM20

BBN20*

BBO20

BBP20

BBQ27

BBR20

BBS20

9 members

141 family

BBA10

BBL21

BBM21

BBN21*

BBO21

BBP21

BBQ28

BBR21

BBS21

9 members

142 family

BBA11

BBL22

BBM22

BBN22

BBO22

BBP22

BBQ29

BBR22

BBS22

9 members

143 family

BBA14LP

BBG25LP

BBL26LP?

BBM26LP?

BBN26LP?

BBO26LP?

BBP26LP?

BBQ33LP?

BBR26LP?

BBS26LP?

10 members

only BBA14 and BBG25 have lipidation consensus; others are one off from the current consensus (see section III below) - they are the most distant members of the family;

Porcella et al. (1996) showed that a homolog (from strain 297) of the cp32 members of this family is not lipidated in E. coli.

144 family

BBA23

BBL37

BBM36

BBN36

BBO37

BBP36

BBQ44

BBR37

BBS39

9 members

Previously called ORF-10 (Dunn et al., 1994; Zuckert and Meyer, 1996).

BBG27 is a weak paralog of family 144

145 family

BBA31

BBL43

BBM42

BBN43

BBO44

BBP42

BBQ50

BBR45

BBS45

9 members

phage fO1205 Orf26 homology; Orf26 is a possible phage structural protein

146 family

BBA38

BBL01

BBM01

BBN01

BBO01

BBP01

BBQ51*

BBR01

BBS01

9 members

147 family

BBA39

BBL02

BBM02

BBN02

BBO02

BBP02

BBQ52

BBR02

BBS02

9 members

148 family

BBA40

BBL03

BBL04

BBL05

BBM03

BBM04

BBM05

BBN03

BBN04

BBN05*

BBO03

BBO04

BBO05

BBP03

BBP04

BBP05

BBQ11*

BBQ12

BBQ53

BBQ54*

BBR03

BBR04

BBR05

BBS03

BBS04

BBS05

26 members

There are three subfamilies in 148; each subfamily has one member from each cp32; the three subfamily genes lie in a contiguous cluster on each cp32.

149 family

BBA41

BBL06

BBM06

BBN06*

BBO06

BBP06

BBQ13

BBR06

BBS06

9 members

150 family

BBA42

BBL07

BBM07

BBN07

BBO07

BBP07

BBQ14

BBR07

BBS07

9 members

151 family

BBA45

BBL10

BBM10

BBN10

BBO10

BBP10

BBQ17

BBR10

BBS10

9 members

152 family

BBA46

BBL11

BBM11

BBN11

BBO11

BBP11

BBQ18

BBR11

BBS11

9 members

153 family

BBA47

BBL12

BBM12

BBN12

BBO12

BBP12

BBQ19

BBR12

BBS12

9 members

154 family

BBA48

BBL13

BBM13

BBN13*

BBO13

BBP13

BBQ20

BBR13

BBS13

9 members

155 family

BBA49

BBL14

BBM14

BBN14

BBO14

BBP14

BBQ21

BBR14

BBS14

9 members

156 family

BBL15

BBM15

BBN15

BBO15

BBP15

BBQ22

BBR15

BBS15

9 members

157 family

BBA51

BBL16

BBN16*

BBM16

BBO16

BBP16

BBQ23

BBR16

BBS16

9 members

BBN16 authentic frameshift

158 family

BBA53

BBA54

2 members

159 family

BBA55

BBL17

BBM17

BBN17

BBO17

BBP17

BBQ24

BBR17

BBS17

9 members

160 family

BBA56

BBL18

BBM18

BBN18*

BBO18

BBP18

BBQ25

BBR18

BBS18

9 members

161 family

BBC05

BBL29

BBM29

BBN29*

BBO29

BBP29

BBQ37

BBR29

BBR41*

BBS31

10 members

Previously called ORF-4 (Dunn et al., 1994; Zuckert and Meyer, 1996). BBR41* is a fusion of family 161 and 162 genes.

162 family

BBL39LP

BBN38LP

BBP38LP

BBR40*LP

BBR41*

5 members

erp genes; BBR41* is an apparent fusion of family 161 and 162 genes.

163 family

BBF01LP?

BBL40LP

BBN39LP

BBO40LP

BBP39LP

BBQ47LP

6 members

erp genes; BBF01 has small patch of similarity to this gene family and an N-terminal sequence that is one off from our "stringent consensus" lipidation sequence.

164 family

BBM38LP

BBO39LP

BBR42LP

BBS41LP

4 members

erp genes

165 family

BBC12

BBL36

BBM35

BBN35

BBO36

BBP35

BBQ43

BBR36

BBS38

9 members

Previously called ORF-8/7 (Dunn et al., 1994; Zuckert and Meyer, 1996).

166 family

(BB0845.11*)

BBD001LP

BBH01LP

BBQ89LP

4 members; Could well all be pseudogenes since they are in highly recombined telomeric regions (Casjens et al., 2000).

BB0845.11 is a possible pseudogene; this is a poor homolog and is not included in the TIGR master gene list

167 family

BBE29*

BBQ67(*)

BBJ20*

3 members. The N-terminal portion of BBQ67 is paralogous to BBE29 and its C-terminal portion is similar to BBG02 (family 102). The N-terminal region of BBQ67 is a good match to the full length of an adenine methylase, so it could possibly be functional in spite of this apparent fusion to BBG02-like sequences.

BBJ20 is similar to the N-terminal portion of BBQ67 but not BBE29.

168 family

BBI30

BBQ75*

2 members

170 family

BBF32*

vlsELP

BBJ51*

3 members; BBF32 contains 15 tandem, direct repeats of vlsE-like sequences (in effect it is 15 pseudogenes, even though they are in-frame with one another);

The vlsE gene is the only known B31 gene that is absent from the TIGR sequence; this is due to its terminal location on lp28-1 and the presence of an unclonable sequence near it (see Zhang et al., 1997 and Casjens et al., 1999).

171 family

BBA74

BBH20*

(BBJ11.1*)

3 members - A74 is osm28 of Skare et al. (1996)

BBJ11.1 similarity to BBA74 is short and not strong - it is not in the TIGR computer’s master gene list

172 family

BBI02.1*

BBU03

2 members

173 family

BBJ32

BBJ45.1*

2 members

174 family

BBK52.1*

BBQ02

2 members

175 family

BBD15.01*

BBF19.1*

BBK37*

3 members

BBK37 is a fusion gene that also contains sequences similar to family 75 members; although BBK37 is truncated relative to intact family 75 members it appears that it might be expressed.

 

Part III

Borrelia burgdorferi B31 Plasmid Lipoprotein Genes

Compiled by Sherwood Casjens, Dan Haft & Claire Fraser - April, 1999

Find below in this section

(i) Background on lipidation consensus sequence

(ii) Numbers of possible lipoprotein genes on B31 plasmids

(iii) Cross-referenced table of the possible lipoprotein genes on B31 plasmids

(iv) Summary of lipoprotein analysis

(v) List of sequences of N-terminal 60 amino acids of the consensus and near-consensus lipoprotein genes on B31 plasmids

•••••••••••••••••••••••••••••••••••••••••••••••••••••••••••••••••••••••••••••••••••••••••••••••

Most authors use an "N-terminal lipidation consensus sequence" in which the proteins must

(i) have a Cys between positions 10 and 30

(ii) have a credible hydrophobic N-terminal signal sequence

(iii) have a positively charged amino acid very near the N-terminus

(iv) contain the following consensus relative to the above Cys (defined as position +1)

 

[L,A,V,I,F,T,M] — [L,A,V,I,F,S] — X — [G,A,S,N] — C

             -4                        -3              -2          -1            +1     position relative to C

We refer to this as the "stringent consensus" in the following discussion (for Gram positive bacterial consensus, see Sutcliffe and Russell, 1995).
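
Criteria (i) and (iv) above are simple enough to be checked mechanically. The following Python sketch is only an illustration and is not the procedure used for this analysis. The function name `stringent_cys_positions` is hypothetical, and criteria (ii) and (iii) (signal sequence and positive charge) are deliberately not tested here.

```python
import re

# Stringent consensus relative to the lipidated Cys (+1):
#   -4: [LAVIFTM]   -3: [LAVIFS]   -2: any   -1: [GASN]   +1: C
# A zero-width lookahead is used so overlapping candidate sites are all found.
STRINGENT = re.compile(r"(?=[LAVIFTM][LAVIFS].[GASN]C)")


def stringent_cys_positions(protein, first=10, last=30):
    """Return 1-based positions of Cys residues that satisfy the stringent
    consensus and lie between positions `first` and `last` (inclusive)."""
    positions = []
    for match in STRINGENT.finditer(protein.upper()):
        cys_pos = match.start() + 5  # match.start() is the 0-based index of -4
        if first <= cys_pos <= last:
            positions.append(cys_pos)
    return positions


# N-terminal residues as listed elsewhere in this document
ospC = "MKKNTLSAILMTLFLFISCNNSG"   # BBB19
erpL = "MNKKMKMFIICAVFALMISCKNYA"  # BBO39, which has M at position -3

print(stringent_cys_positions(ospC))
print(stringent_cys_positions(erpL))
```

A protein for which the function returns an empty list fails criterion (i) or (iv); a non-empty list is necessary but not sufficient for a lipoprotein call.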

B31 proteins that contain this consensus may be the most likely to be lipoproteins in reality, but lipoprotein gene identification in this way is not completely certain, especially in Borrelia. The following analysis should therefore be understood as a current best guess, and NOT as the final word in the identification of Borrelia burgdorferi B31 lipoprotein encoding genes.

Interestingly, the sequence "LTXIC" is present in conjunction with a decent signal sequence in A68, A69, I36, I38, I39 and J41. These genes are all in paralogous family [54], which contains a number of other consensus lipoprotein genes; could this mean that this "non-consensus" sequence is actually lipidated? Similarly the erpL gene (BBO39) has an M at position -3 but is likely to be a lipoprotein, since it is a member of a family of genes in which all other members have the above "stringent" lipidation consensus. Perhaps a better(?) "relaxed consensus" for potential Borrelia lipoproteins can be guessed at from "near-consensus" genes which are known surface proteins or which fall into paralogous families with "consensus gene" members. Most proteins of this type have conservative differences from the above consensus - in particular, S, G or M in position -4, T or M in position -3, and I, T or L in position -1; these could be included in a "relaxed consensus". In part because of these uncertainties, an analysis (described below) that was more complex than a simple consensus search was performed on all of the putative B31 encoded proteins in order to identify the potential lipoproteins.
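
One possible reading of the "relaxed consensus" suggested above can be written as a pattern. This is a sketch under the assumption that the additional residues (S, G or M at -4; T or M at -3; I, T or L at -1) are simply added to the stringent residue sets. The name `relaxed_cys_positions` is hypothetical, and a bare pattern of this kind will also match sites that are not lipidated.

```python
import re

# Assumed relaxed consensus: stringent residue sets plus the conservative
# substitutions named in the text.
#   -4: [LAVIFTM] + S, G      -3: [LAVIFS] + T, M
#   -2: any                   -1: [GASN] + I, T, L
RELAXED = re.compile(r"(?=[LAVIFTMSG][LAVIFSTM].[GASNITL]C)")


def relaxed_cys_positions(protein, first=10, last=30):
    """Return 1-based positions of Cys residues matching the assumed relaxed
    consensus and lying between `first` and `last` (inclusive)."""
    positions = []
    for match in RELAXED.finditer(protein.upper()):
        cys_pos = match.start() + 5  # Cys is four residues after position -4
        if first <= cys_pos <= last:
            positions.append(cys_pos)
    return positions


# N-terminal residues as listed elsewhere in this document
bbj41 = "MKNLKLNIIKLNVITAILTSICISCAPFG"  # family 54, "LTXIC" type
erpL = "MNKKMKMFIICAVFALMISCKNYA"        # BBO39, M at position -3

print(relaxed_cys_positions(bbj41))
print(relaxed_cys_positions(erpL))
```

Because the relaxed sets are broad, more than one Cys in a single N-terminus can match, which is one reason a pattern search alone was not relied upon.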

 

•••••••••••••••••••••••••••••••••••••••••••••••••••••••••••••••••••••••••••••••••••••••••••••••

Prediction of Lipoproteins

Putative B31 proteins were included in the predicted lipoprotein list by the following procedure:

1. A preliminary B31 potential lipoprotein list was generated by rules derived from other species (positive charge near N-terminus followed by a hydrophobic stretch of amino acids that is in turn followed immediately by a lipidation consensus).

2. This list was then manually curated using the Borrelia lipoprotein literature and by inspection of the N-terminal sequences. At this point a few genes were added and/or removed. Nearly all of the remaining proteins met the "stringent" consensus (above) and the remaining few met a somewhat more relaxed consensus with only conservative amino acid additions.

3. This list was then used to build a multiple sequence alignment of the putative lipoprotein N-terminal regions that included the presumptive lipoprotein signal and some additional sequence. The alignment was edited, poorly aligned sequences removed, and the alignment was trimmed at the N-terminus to include Met-1 of most but not all sequences, and at the C-terminus to include the predicted modified Cys residue and 4 residues beyond it (the analysis showed that these last 4 residues had little effect on the final results).

4. From this alignment, a Hidden Markov Model (HMM) for the N-terminal region of lipoproteins was constructed using the HMMER 1.8.4 package.

5. The HMM analysis provided a set of all B31 proteins with potential lipidation sequences, listed according to descending HMM score. This list was used in a final manual curation of the potential lipoprotein list as follows:

(i) Any high-scoring genes (HMM scores >25 with a lipidation consensus at position ≤40) that were previously missed were added to the list.

(ii) In some cases new start codon assignment allowed high scoring genes to be added to the list.

(iii) Genes that are members of "lipoprotein families" were added if their HMM score was >19.5. The HMM scores dropped off rather quickly; only 16 plasmid proteins not in the final potential lipoprotein list have the required N-terminal positive charge and hydrophobic signal sequence features and HMM scores above 11.

6. The resulting "Potential B31 Lipoprotein List" includes 141 genes given below.
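
The two numerical rules in step 5 can be sketched as a small Python predicate. This is an illustration only: the function and parameter names are hypothetical, the position limit in rule 5(i) is read here as "at or before residue 40", and the final list also reflects manual curation, so some listed genes fall below these thresholds.

```python
def passes_hmm_rules(hmm_score, consensus_cys_pos, in_lipoprotein_family):
    """Apply rules 5(i) and 5(iii) as read from the text.

    hmm_score             -- HMM score of the N-terminal region
    consensus_cys_pos     -- 1-based position of the consensus Cys, or None
    in_lipoprotein_family -- True if the gene belongs to a paralogous family
                             that already has predicted lipoprotein members
    """
    # Rule 5(i): high score and a lipidation consensus early in the protein
    if (hmm_score > 25
            and consensus_cys_pos is not None
            and consensus_cys_pos <= 40):
        return True
    # Rule 5(iii): lower threshold for members of "lipoprotein families"
    if in_lipoprotein_family and hmm_score > 19.5:
        return True
    return False


# Illustrative inputs (score, Cys position, family flag)
print(passes_hmm_rules(36.67, 20, False))
print(passes_hmm_rules(19.84, 25, True))
print(passes_hmm_rules(19.0, 25, True))
```

Rule 5(ii), which depends on reassigning start codons, is a manual step and is not modelled here.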

The HMM analysis also identified a number of additional proteins that have N-terminal regions that are similar to those predicted to be lipoproteins, but which do not quite meet the above criteria. We conclude that there may be as many as 20-50 lipoproteins in Borrelia beyond those we specifically predicted.

Note: Bona fide type I signal sequences on non-lipidated proteins share some of the above characteristics, including basic residues near the N-terminus followed by a hydrophobic region. Lipoproteins differ in having the modified Cys itself and the lipidation consensus region. The HMM model was not specifically designed to discriminate between type I and lipidated signal sequences, so many of the weakly predicted lipoproteins are likely to be membrane proteins even if they are not in fact lipidated.

•••••••••••••••••••••••••••••••••••••••••••••••••••••••••••••••••••••••••••••••••••••••••••••••

Summary of Lipoproteins Encoded by the B31 Genome

Predicted B31 Lipoproteins

136 putative B31 genes encode proteins that are the most likely to be lipoproteins.

38 of these putative lipoproteins are chromosomally encoded (all by "intact" genes).

98 of these are plasmid encoded.

90 putative lipoproteins are plasmid encoded by "intact" genes.

7 putative plasmid encoded lipoproteins (BBF20, BBF32, BBI14, BBJ001[BBJ01], BBK01, BBQ04, BBR40) come from ORFs currently classified as pseudogenes that may have intact translation starts with the lipidation consensus sequence.

1 putative plasmid lipoprotein (BBI32) is translated from a "questionable" gene (see discussion of these above).

9 paralogous gene families (families 36, 53, 63, 74, 85, 113, 164, 166, 170) encode only predicted lipoproteins, and several others (e.g., family 12) contain genes whose products have only a near-consensus lipidation sequence.

17 paralogous gene families (families 12, 21, 37, 40, 44, 52, 54, 59, 60, 69, 75, 92, 102, 136, 143, 162, 163) are heterogeneous in that at least 1 potential LP and at least one non-LP is found in the family. A substantial fraction of the proteins from these families that are not predicted to be lipidated by the above protocol have "near-consensus" sequences, suggesting that they might in fact also be lipoproteins.
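
The counts in this summary can be cross-checked with a few lines of Python. The variable names are hypothetical; the numbers and family lists are copied from the statements above.

```python
# Counts as stated in the summary above
chromosomal = 38
plasmid_intact = 90
plasmid_pseudogene = 7
plasmid_questionable = 1

plasmid_total = plasmid_intact + plasmid_pseudogene + plasmid_questionable
grand_total = chromosomal + plasmid_total

# Paralogous families as listed above
lp_only_families = [36, 53, 63, 74, 85, 113, 164, 166, 170]
mixed_families = [12, 21, 37, 40, 44, 52, 54, 59, 60, 69, 75, 92,
                  102, 136, 143, 162, 163]

print(plasmid_total, grand_total)
print(len(lp_only_families), len(mixed_families))
```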

The 38 chromosomal genes listed as potentially encoding lipoproteins are:

BB0028, BB0038, BB0071, BB0141, BB0144, BB0155, BB0213, BB0215, BB0224, BB0227, BB0298, BB0321, BB0324, BB0352, BB0328, BB0365, BB0382, BB0383, BB0384, BB0385, BB0398, BB0458, BB0464, BB0475, BB0536, BB0542, BB0553, BB0620, BB0628, BB0652, BB0664, BB0689, BB0758, BB0806, BB0823, BB0832, BB0840, BB0844

The 98 plasmid genes and pseudogenes that are potential lipoproteins are as follows:

(* indicates ORFs currently classified as pseudogenes; † indicates "questionable" gene calls; # pseudogenes not listed as LPs on the TIGR WEB site for technical reasons):

BBA04, BBA05, BBA07, BBA14, BBA15, BBA16, BBA24, BBA25, BBA33, BBA34, BBA36, BBA57, BBA59, BBA60, BBA62, BBA64, BBA65, BBA66, BBA68, BBA69, BBA73, BBB08, BBB09, BBB14, BBB16, BBB19, BBB25, BBB27, BBC10, BBD001, BBD10, BBD15, BBE06, BBE08, BBE09, BBE31, BBF20*, BBF32*, vlsE#, BBG01, BBG02, BBG25, BBH01, BBH18, BBH32, BBH37, BBI14*, BBI16, BBI28, BBI29, BBI32†, BBI34, BBI36, BBI38, BBI39, BBI42, BBJ001#*(BBJ01*†), BBJ09, BBJ34, BBJ36, BBJ41, BBK01*, BBK07, BBK12, BBK19, BBK47, BBK48, BBK49, BBK50, BBK52, BBK53, BBL28, BBL39, BBL40, BBM27, BBM28, BBM38, BBN28, BBN38, BBN39, BBO28, BBO39, BBO40, BBP27, BBP28, BBP38, BBP39, BBQ03, BBQ04*, BBQ05, BBQ35, BBQ47, BBQ89, BBR28, BBR40*, BBR42, BBS30, BBS41

The following is a list of the scores of the potential lipoprotein gene products obtained from the HMM analysis (see above; Dan Haft, unpublished). These scores were taken into account but were not the sole criteria for inclusion of genes as potential lipoprotein encoding genes.

GENE/aa

HMM SCORE

PREDICTED N-TERMINAL AA SEQUENCE

     

BBR40/1-24

36.67

MNKKMKNLIICAVFVLIISCKNNT

BBS41/1-24

36.54

MNKKMKNLIICAVFVLIISCKIDA

BBR42/1-24

36.17

MNKKIKMFIICAIFMLISSCKNDV

BBK52/1-24

35.38

MKKNIYILNIFLYIPLFYSCFLTP

BBQ47/1-24

35.27

MNKKMKIFIICAVFVLISSCKIDA

BBM28/1-22

34.68

MKIINIL--FCLFLLLLNSCNSND

BBP28/1-22

34.68

MKIINIL--FCLFLLLLNSCNSND

BBQ35/1-22

33.69

MKIINIL--FCISLLLLNSCNSND

BBC10/1-26

33.66

MQKINIAKLIFILIFSLFVISCELFI

BB0664/12-35

33.62

MNINTLFYGMIIIIFALISCNHKN

BBN28/1-22

33.44

MKIINIL--FCLFLLMLNSCNSND

BBO39/1-24

33.34

MNKKMKMFIICAVFALMISCKNYA

BBB08/1-22

32.71

MKKKFNF--IFPFIIFLFSCNISV

BBJ09/1-24

32.58

MKKLIKILLLSLFLLLSISCVHDK

BBA14/1-24

32.51

MQIKNFPFLFLLNSLIIFSCSTIA

BBL39/5-28

32.12

MNKKMKMFIICAVFILIGACKIHT

BBP38/1-24

32.12

MNKKMKMFIICAVFILIGACKIHT

BB0398/1-24

31.93

MNLLVKIAKFILILFLFTSCNQKQ

BBO40/1-22

31.65

MNKKILI--IFAVFALIISCKNYA

BBM27/1-24

31.33

MRNKNIFKLFFASMLFVMACKAYV

BBL28/1-22

30.97

MKIINIL--FCLFLLMLNGCNSND

BBO28/1-22

30.97

MKIINIL--FCLFLLMLNGCNSND

BBR28/1-22

30.97

MKIINIL--FCLFLLMLNGCNSND

BBS30/1-22

30.97

MKIINIL--FCLFLLMLNGCNSND

BBK48/1-24

30.91

MNLINKLFILTILFSSVISCKLYK

BBB16/1-26

30.43

MKILIKKLKVVLFLNLILLISCVNES

BBB19/1-23

30.38

MKKNTLSAI-LMTLFLFISCNNSG

BBE09/1-24

30.32

MQKDIYISNIFLYIPLFYSCFLTP

BBM38/5-26

30.32

MNKKMFI--ICAIFALIVSCKNYA

BBP27/1-24

30.23

MRNKNIFKLFFAAMLFVMACKAYV

BBI14/1-22

29.88

MKNKIIL--CMCVFSLLNSCNFDN

BBA65/1-24

29.85

MNKIKLSILITLGITTFFSCDLNN

BBK19/1-21

29.73

MKKYIIN---LSLCLLLLSCNLFS

BBD15/1-26

29.66

MNRKFVISLLFIILTFLLILGCDLSI

BBJ36/1-24

29.59

MKRKSNICISLLVTILFVSCKFFG

BBE31/1-22

29.58

MKYHIIV--SIFIFLFLNACNPDS

BBA07/1-24

29.55

MCGRRMKNILLFVILLFFSCKEFN

BBG25/1-27

29.44

MKNLKTKINFLGIFWLLLLFLSCESIP

BBN38/1-24

29.36

MNKKMKMFIVCAVFILIGACKIHT

BBD001/1-19

29.08

MNKL-----LIFIILLVFSCNLSN

BBH01/1-19

29.08

MNKL-----LIFIILLVFSCNLSN

BBQ89/1-19

29.08

MNKL-----LIFIILLVFSCNLSN

BB0155/1-23

28.94

MKKHYKALI-LSLLFAIISCNTKT

BBA69/6-29

28.90

LNIIKINIITMILTLICISCAPFN

BBK50/1-24

28.89

MNLIIKVMLISSLFSSFISCKLYE

BBJ01/1-22

28.70

MKYHIIV--SIFVFLFLNACNPDS

BBA73/1-25

28.62

MKRNKIWKTLKLFQITLLFSCSFYS

BBL40/1-22

28.55

MNKKTII--ICAVFALILSCKNYA

BBP39/1-22

28.55

MNKKTII--ICAVFALILSCKNYA

BBA59/1-21

28.15

MVKKII---FISFSIFIVSCSAIG

BBD10/1-23

28.14

MNSKFILK-YFILAFFLVSCQTYQ

BB0844/1-23

28.03

MKKKNLSI-YMIMLISLLSCNTSD

BB0365/1-26

27.84

MYKNGFFKNYLSLFLIFLVIACTSKD

BB0840/1-25

27.76

MKNINRLILLILTTHTLLFSCALIA

BBI29/1-22

27.73

MKNNIIL--CMCVFLLLNSCTANH

BB0806/1-26

27.55

MKFVLNNLFKGCLICFFLFFSCLTTD

BBN39/1-22

27.47

MNKKTLI--ICAVFALIISCKNFA

BB0328/1-22

27.39

MKYIKIA--LMLIIFSLIACISNA

BBA05/1-22

27.28

MNKIGIA--FIISFLLFVNCKGKS

BB0385/1-21

26.57

MLKKVY---YFLIFLFIVACSSSD

BBH32/1-22

26.56

MKYNTII--SIFVCLFLTACNPDF

BBE08/1-21

26.49

MQKNVY---CFIIFVLISSCNNYA

BB0141/2-30

26.48

MNLIFNINLYLKKYFLVLFLVLVACVGDN

BBH37/1-22

26.46

MKKKMFL--YTLLTIGLMSCNLNS

BBF20/1-26

26.35

MNKKFSISLLSTILAFLLVLGCDLSS

BB0352/1-26

26.27

MNNFMRIKNLILIAILLISPSCSTNK

BB0536/1-27

26.26

MNYQRIKNYCKFTSVFLFFLFSCVSNE

BB0383/1-22

25.89

MNKILLL--ILLESIVFLSCSGKG

BB0324/1-18

25.77

MKK------LILLNLIFISCYTIN

BB0652/1-22

25.61

MKKGSKL--ILILLVTFFACLLIF

BBA57/1-27

25.60

MNGKLRKALKIAIFTTLLLVISCNANM

BBA60/1-21

25.22

MSKKVI---LILLEILILSCDLSI

BB0215/1-24

25.14

MMKKVIILIFMLSTSLLYNCKNQD

BBA33/1-21

24.90

MKRYIY---VYIISVAVISCYLND

BBA68/6-29

24.71

LNIIKINIIAMILTLICTSCAPFS

BBK12/1-20

24.64

MSKLI----LAISILLIISCKWYV

BBK47/6-29

24.44

IKIFIIPNLVFSSLFLFESCSGFL

BBK49/6-29

24.44

IKIFIIPNLVFSSLFLFESCSGFL

BBJ34/1-23

24.32

MIKGNTFI-LILVTTMFVSCKFYG

BB0628/1-21

24.21

MRKCFV---SLSLLLIFFACSSNV

BBE06/1-21

23.97

MRFLNI---IKSLEFFVMSCNDIF

BB0028/6-29

23.83

ENYFKKRLILNLLIFLLLACSSES

BBA64/6-32

23.81

LKNNKLIAIFLLHVLTVLILISCSLEV

BBB09/1-20

23.76

MKYLKN----ISLFLLILGCKSIP

BBB14/2-26

23.7

ILYQNQLKFLKLLVFFLLISCTSLN

BB0224/3-26

23.4

GRKDLFFLILFLSFSIIISCRVKG

BBK07/1-20

23.4

MSKLI----LAISILLIISCKWHV

BBQ05/1-20

23.35

MKYYI----CVCVFLLLNACNSDF

BBA66/1-23

23.24

MKIKPLIQ-LKLLGLFLFSCTIDA

BBI28/1-21

23.22

MKCHIIA---TIFVFLFLACSTDF

BBI16/1-21

23.19

MKYHII---TTIFVFLFLACRPDF

BB0689/1-20

22.97

MKKLI----IIFTLFLSQACNLST

BBA25/1-25

22.81

MKIGKLNSIVIALFFKLLVACSIGL

BB0542/12-35

22.79

DLSKFFMYKLLFIIVFVLSCSSIF

BBB27/1-20

22.66

MKKFL----ISVYFLLFYGCSTIS

BB0823/1-20

22.36

MNTKT----LYLISLILLACNKNN

BBI34/1-22

22.36

MKHYIIV--HIFVFLFLNACYPVA

BBB25/1-21

22.29

MKYCFS---LILMVFICSSCKILN

BB0298/1-20

21.70

MKILW----LIILVNLFLSCGNES

BBA62/1-22

21.51

MTKLMYA--IFLSAILFVACETTR

BBA15/1-21

21.43

MKKYLLG---IGLILALIACKQNV

BB0144/1-20

21.38

MYKLF----LFFIIFMFLSCDEKK

BB0384/1-21

21.35

MFKRFI---FITLSLLVFACFKSN

BB0758/10-36

21.32

LKLLRQSINLKSLFPLSVLFFSCNVVD

BB0038/18-41

21.17

ELIPFYKFLFLFFFFTLLACSKVS

BB0620/1-20

21.03

MKRNFY----LIVLFIANNCFSID

BB0321/5-29

20.81

MVYSLKIQIEEEINIFVFISCLKLL

BB0382/1-19

20.76

MRIV-----IFIFGILLTSCFSRN

BBA36/1-21

20.70

MMQRIS---ILLMLLAVFSCKQFG

BBJ41/1-29

20.45

MKNLKLNIIKLNVITAILTSICISCAPFG

BB0464/1-19

20.42

MKILR-----LCLLFLFFACTFDY

BB0475/7-31

20.35

FKLKLLPILVISGILIVFMSCMKTS

BBG02/1-27

20.16

MEINLQSKLNNKNNNKLIFFISCSLVL

BB0553/1-21

20.07

MNKTKNR---SLTYFIILSCISLF

BBI39/1-29

19.84

MKNFKLNIIKLNVITAILTSICISCAPFG

BB0213/1-22

19.76

MQSGLKI--KLILFFCCFACSCDI

BBI36/1-29

19.62

MKNFKLNTIKLNVITAILTLICISCAPFG

BBI38/1-29

19.62

MKNFKLNTIKLNVITAILTLICISCAPFG

BBG01/1-22

19.51

MRKSLFL--YTLLMGGLMSCNLDS

BB0227/1-27

19.50

MLNPRTIKTTFMLISTLMIFNGCTKKL

BBA24/5-29

19.22

NNKTFNNLLKLTILVNLLISCGLTG

BB0071/1-24

19.19

MVRFLGFLYLITTIPLIKSCDAAQ

BBA34/1-23

19.11

MIIKKRGL-LILGIATVISCSAMS

BBA04/1-19

19.06

MKRV-----IVSFVVLILGCNLDD

BBK01/1-22

18.94

MRKSLFL--YALLMGGLMSCNLDS

BBH18/6-29

17.94

KRVGNKIFYISVVLILIVGCDWGT

BBI32/3-29

17.61

LNFRVIFLISHTQYMYSPILKSCEFIN

BB0832/1-22

17.38

MLKTLTK--IITISCLIVGCASLP

BB0458/16-39

17.09

KRSKMRLILMLLLXFLCFSTLLSQ

BBQ03/1-22

16.09

MRILVGV--FIIAALALLGCYLPD

BBK53/1-22

14.77

MRILVGV--CIIAALALLGCYLPD

BBI42/1-21

14.70

MRILVG---VCIIALALLGCYLPD

BBA16/1-20

13.07

MRLLIG----FALALALIGCAQKG
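
Each entry in the list above consists of three values: a GENE/aa label, an HMM score, and an aligned N-terminal sequence. The Python sketch below parses one entry under the assumption (not stated explicitly in this document) that the aa range gives the residue numbers covered by the aligned segment and that "-" characters are alignment gaps. The function names are hypothetical.

```python
def parse_entry(label, score, aligned):
    """Split a 'GENE/start-end' label and strip alignment gaps."""
    gene, aa_range = label.split("/")
    start, end = (int(x) for x in aa_range.split("-"))
    return {
        "gene": gene,
        "start": start,
        "end": end,
        "score": float(score),
        "residues": aligned.replace("-", ""),  # assumed: '-' marks a gap
    }


def range_matches_sequence(entry):
    """True if the residue count equals the length implied by the aa range."""
    return entry["end"] - entry["start"] + 1 == len(entry["residues"])


# Two entries copied from the list above
bbm28 = parse_entry("BBM28/1-22", "34.68", "MKIINIL--FCLFLLLLNSCNSND")
bb0664 = parse_entry("BB0664/12-35", "33.62", "MNINTLFYGMIIIIFALISCNHKN")

print(bbm28["gene"], range_matches_sequence(bbm28))
print(bb0664["gene"], range_matches_sequence(bb0664))
```

For the entries checked here the ungapped length agrees with the stated range, which supports this reading of the label, but it has not been verified for every entry.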

 

———————————————————————————————————————————————————————————————

Below this line are listed the HMM scores of "near cutoff" genes that did not

make it into the list of potential lipoprotein genes. Some of them could in fact

be lipoproteins. See description of lipoprotein prediction criteria above.

———————————————————————————————————————————————————————————————

BBK45/26-48

24.00

MNLMIKVLI..........FSLFLSFISCKLYE

BB0460/1-21

18.27

MKTRII............IFLSILSILSCSKSV

BBK32/1-24

18.13

MKKVKSKYL.........ALGLLFGFISCDLFI

BBQ46/1-17

17.71

MIKNVIYILp kt...kkFSQIVWLLIACKIWI

BB0323/1-25

17.40

MNIKNKLISl........LIVVAISFIACKTPP

BB0171/1-25

17.30

MGRNFLAILy........FCFLFLGFLSCSNVK

BB0329/1-23

16.13

MKLQRSLF..........LIIFFLTFLCCNNKE

BB0259/16-48

15.96

MKKINMFNRsscvlqnflFLFLFLSLVSCFAKK

BB0330/7-32

15.60

KKIGKKIKIv.......tLLMLAVSLIACNNNS

BBB24/18-41

15.59

LKIFKKYLL.........LFYLLFLTLSCSTIY

BBA32/1-20

15.14

MRITG.............LLFFLICLLSCSSFN

BB0639/1-20

14.81

MKKIF.............ILIVILTTFACTNKD

BBA03/1-21

13.90

MKKTII............VFIILAFMLNCKNKS

BBH06/1-23

12.76

MKKSFLSI..........YMLISISLLSCDVSR

BBS26/1-22

12.19

MKMLKRL...........HCLLIVLLLCCTTIA

BBP26/1-22

12.19

MKMLKRL...........HCLLIVLLLCCTTIA

BBO26/1-22

12.19

MKMLKRL...........HCLLIVLLLCCTTIA

BBL26/1-22

12.19

MKMLKRL...........HCLLIVLLLCCTTIA

BBR26/1-22

10.70

MKMLKRL...........HCLLIALLLCCTTIA

BBQ33/1-22

10.70

MKMLKRL...........HCLLIALLLCCTTIA

BBN26/1-22

10.70

MKMLKRL...........HCLLIALLLCCTTIA

BBM26/1-22

10.70

MKMLKRL...........HCLLIALLLCCTTIA

BB0560/1-23

10.68

MILIFYFK..........QIALFIIFRLCYIIK

BB0729/7-30

10.60

LYTLINIII.........MLILISIVYLCKRKN

BB0305/1-28

10.31

MNSISKIEFkv.....ycILVLILTVILCFNIY

BB0332/1-27

10.22

MLKFTLKKIlg......iIPTLLVIIFLCFFVM

BB0204/1-26

16.01

MKFIINLLLs.......tIKIITFTVIVCLTIL

BB0752/1-27

12.73

MKNKENEVLnl......tLNLTIIFLIFCNISI

BBF01/1-23

12.48

MKLLKIFM..........CAFLLLNLVNCKFDS

BB0315/1-22

11.64

MRKITIM...........ILFYGLIINVCPTTT

BB0353/1-21

11.45

MKREIYA............FLSNFIIFMCFFLG

BBJ47/5-28

11.16

SNCIKYIIL.........TMLIGLLIFCCATFV

Of this latter group (near, but below cutoff) the following are members of families with putative lipoprotein members:

BB0328, BB0329, BB0330, BBF01, BBK45, BBL26, BBM26, BBN26, BBO26, BBO39, BBP26, BBQ33, BBR26 & BBS26.

BBK32 has been shown to be an outer surface protein.

•••••••••••••••••••••••••••••••••••••••••••••••••••••••••••••••••••••••••••••••••••••••••••••••

B31 Plasmid-encoded Putative B. burgdorferi Lipoproteins

Footnotes to following table

1. Previously described gene or gene family name.

2. BB gene names are the open reading frames identified in the nucleotide sequence of the genome of isolate B31.

* Putative pseudogenes may not always be included in this table, but when they are they are indicated by an asterisk.

# Has a "near-cutoff" lipidation signal, but was not included in our list of potential lipoproteins.

† member of paralogous family but has NO lipidation signal.

3. vlsE, the rightmost gene on lp28-1 is not in the reported genomic sequence, apparently due to an unclonable sequence nearby (Zhang et al., 1997).

4. References to the complete B31 genome sequence (Casjens et al., 1999; Fraser et al., 1997) are not given in the table. Sequences from Borrelia isolates other than B31 are indicated in French brackets {....}.

 

Common

gene name1

BB gene name2

Paralog

Family

Location

Comments

References4

7.5 kd

lipoprotein

BBA62

lp54

transcription start (Indest et al., 1997)

also called 6.6 kbp lipoprotein

possible surface exposure (Katona et al., 1992; Lahdenne et al., 1997) {297}

12 Kda

lipoprotein

BBA59

lp54

 

short gene but it is expressed; (McGrath et al., 1997)

mlpA

mlpC

mlpD

mlpF

mlpG

mlpH

mlpI

mlpJ

BBP28

BBS30

BBR28

BBM28

BBO28

BBL28

BBN28

BBQ35

113

cp32-1

cp32-3

cp32-4

cp32-6

cp32-7

cp32-8

cp32-9

lp56

also called "LP" and "nlpH"

(Gilmore and Mbow, 1998; Porcella et al., 1996) {297}; lipidated in E. coli (Porcella et al., 1996); lipidated in B. afzelii DK1 (Theisen, 1996) {297, DK1}

dbpA

dbpB

BBA25

BBA26

74

lp54

binds decorin

surface exposed on outer membrane; binds decorin (Guo et al., 1998; Hagman et al., 1998; Hanson et al., 1998){N40}

erpA & B

erpG

erpH & Y

erpK

erpL & M

erpN & O

erpP & Q

erpX

BBP38, P39

BBS41

BBR40*, R42

BBM38

BBO39#, O40

BBL39, L40

BBN38, N39

BBQ47

162,

163

and

164

cp32-1

cp32-3

cp32-4

cp32-6

cp32-7

cp32-8

cp32-9

lp56

these genes fall into at least 3 paralogous classes [162,163,164] (Akins et al., 1999; Casjens et al., 1999; Stevenson et al., 1998b)

also called "p21" and "ospE & ospF"

surface exposed (Lam et al., 1994){N40}; lipidated in E. coli (Akins et al., 1995b; Wallich et al., 1995); erp-like genes have been sequenced from several strains (Akins et al., 1999; Lam et al., 1994; Marconi et al., 1996b; Stevenson et al., 1997; Stevenson et al., 1996; Suk et al., 1995){297, ZS7, N40}

fibronectin

binding

protein

BBK32#

lp36

fibronectin binding;

called P35 in strain N40;

upregulated in stationary phase in N40 (Fikrig et al., 1997; Indest et al., 1997)

one off from "stringent lipidation consensus" (G in position 1);

surface localized (Probert and Johnson, 1998)

oppAIV

oppAV

BBA34

BBB16

37

cp26

lp54

ABC oligopeptide transporter

B16 not surface exposed; not essential in culture (Bono et al., 1998)

ospA

ospB

BBA15

BBA16

53

lp54

lp54

transcription start site (Jonsson et al., 1992);

OspA atomic resolution structure (Li et al., 1997);

in vitro mutagenesis (McGrath et al., 1995)

outer surface protein (Barbour et al., 1983); lipidated in Bb (Bergstrom et al., 1989; Brandt et al., 1990); sequenced in numerous other strains (Bunikis et al., 1996; Caporale and Kocher, 1994; Jonsson et al., 1992; Marconi et al., 1993a; Rosa et al., 1992; Wallich et al., 1992; Wallich et al., 1989; Wang et al., 1997a; Wang et al., 1997b; Wang et al., 1997c; Will et al., 1995; Wilske et al., 1996a; Wilske et al., 1996b; Wilske et al., 1992; Zumstein et al., 1992)

ospC

BBB19

cp26

temperature regulated (Schwan et al., 1995; Stevenson et al., 1995)

transcription start site (Marconi et al., 1993b)

surface localized (Wilske et al., 1993); sequenced in {many strains} (e.g., Fuchs et al., 1992; Jauris-Heipke et al., 1993; Jauris-Heipke et al., 1995; Marconi et al., 1993c; Margolis et al., 1994a; Margolis et al., 1994b; Masuzawa et al., 1997; Stevenson and Barthold, 1994; Stevenson et al., 1994; Tilly et al., 1997; Wang et al., 1999; Wilske et al., 1996a; Wilske et al., 1996b)

ospD

BBJ09

lp38

short direct repeat upstream of gene

outer surface protein (Marconi et al., 1994; Norris et al., 1992){many strains}

P27

BBA60

lp54

 

(Reindl et al., 1993) {B29}

P35 antigen

BBA64

BBA65

BBA66

BBA68

BBA69

BBA70*

BBA71*

BBA73

BBI36

BBI38

BBI39

BBJ41

54

lp54

lp54

lp54

lp54

lp54

lp54

lp54

lp28-4

lp28-4

lp28-4

lp38

BBA64 cell density-dependent expression and transcription start (Indest et al., 1997)

 

 

(Gilmore et al., 1997)

 

rev

BBP27

BBC10

BBM27

63

cp32-1

cp9

cp32-4

 

(Gilmore and Mbow, 1998; Porcella et al., 1996)

S1

BBA05

lp54

 

(Feng et al., 1995){N40}

S2

BBA04

BBE09

BBK04*

BBK52

BBQ04*

44

lp54

lp25

lp36

lp36

lp56

 

(Feng et al., 1995){N40}

vlsE7

vlsE

BBF32*

BBJ52*†

170

lp28-1

lp28-1

lp36

F32 is 15 tandem unexpressed versions of part of the vlsE gene

cassette surface antigenicity variation mechanism (Zhang et al., 1997; Zhang and Norris, 1998a; Zhang and Norris, 1998b)

Below are plasmid open reading frames with LP "stringent" consensus that have not been previously given common names

 

BBA07

lp54

   
 

BBA14

BBG25

143

lp54

lp28-2

other cp32 members of family do not have lipidation consensus

 
 

BBA33

lp54

   
 

BBA36

lp54

   
 

BBA57

lp54

   
 

BBB08

cp26

   
 

BBB09

cp26

   
 

BBB14

cp26

   
 

BBB25

cp26

   
 

BBB27

cp26

   
 

BBE31

BBH32

BBI14*

BBI16

BBI28

BBI29

BBI34

BBJ001*

BBK15#

BBQ05

60

lp25

lp28-2

lp28-4

lp28-4

lp28-4

lp28-4

lp28-4

lp38

lp36

lp56

   
 

BBD001

BBH01

BBQ89

166

lp17

lp28-3

lp56

≤300 bp

≤300 bp

≤300 bp

 
 

BBD10

lp17

   
 

BBD15

BBF20*

85

lp17

lp28-1

   
 

BBE06

lp25

≤300 bp

 
 

BBE08

lp25

≤300 bp

 
 

BBG01

BBH37

BBK01*

BBJ08†

BB0844

12

lp28-2

lp28-3

lp36

lp38

chrm

   
 

BBG02

102

lp28-2

   
 

BBI32

lp28-4

questionable gene

 
 

BBJ34

BBJ36

92

lp38

lp38

   
 

BBK07

BBK12

59

lp36

lp36

   
 

BBK19

lp36

   
 

BBK45#

BBK46*†

BBK48

BBK50

75

lp36

lp36

lp36

lp36

   
 

BBK47

BBK49

BBH18

69

lp36

lp36

lp28-3

   
 

BBK53

BBI42

BBQ03

52

lp36

lp28-4

lp56

   

 

•••••••••••••••••••••••••••••••••••••••••••••••••••••••••••••••••••••••••••••••••••••••••••••••

N-terminal Sequences of Potential B31 Plasmid-Encoded Lipoproteins

The table below shows the first 60 amino acids of all the B31 plasmid genes that contain a putative N-terminal lipidation consensus. It also includes those B31 plasmid genes that are not included in our "potential lipoprotein list" but which have an HMM score (above) of >11 as well as some added by manual inspection. If fewer than 60 amino acids are shown in the table below, the entire predicted amino acid sequence of that gene’s translation is shown.

 

Notes on the following table:

* Asterisks mark the putative proteins which contain the following "stringent" consensus

[L,A,V,I,F,T,M] — [L,A,V,I,F,S] — X — [G,A,S,N] — C

             -4                        -3              -2          -1            +1     position relative to C

† Non-stringent consensus proteins that we consider to be potential lipoproteins (see rules above)

# Indicates a below cut-off gene that might be worthwhile considering when dealing with the potential lipoprotein genes in this genome, since it is particularly close to the above "stringent" consensus.

Underlined amino acids are non-stringent consensus amino acids within the consensus region.

Alternate, better(?) translation starts are double underlined in the following table.

C’s are red, positively charged K, R and H’s are blue and negatively charged D and E’s are green.

cp9

BBC10 *

MQKINIAKLIFILIFSLFVISCELFIIKRRATITETTTIEKKRINWLIMSVSGLNDEADE

 

cp26

BBB08 *

MKKKFNFIFPFIIFLFSCNISVSSIFIRPLDEVIKSEIALYESLGDGKFKTGIHAKNYFD

BBB09 *

MKYLKNISLFLLILGCKSIPNGNFNLHDTNHKLGKLKFQEDSIISRNYDNKISIVGVYNP

BBB14 *

MILYQNQLKFLKLLVFFLLISCTSLNVEHDQFGKTFRIYQSLNKNAELKGIFNYKTGITK

BBB16 * (oppAIV)

MKILIKKLKVVLFLNLILLISCVNESNRNKLVFKLNIGSEPATLDAQLINDTVGSGIVSQ

BBB19 * (ospC) (surface exposed, Wilske et al., 1993)

MKKNTLSAILMTLFLFISCNNSGKDGNTSANSADESVKGPNLTEISKKITDSNAVLLAVK

BBB24 #

MHLKTKFYKKTYILWTFLKIFKKYLLLFYLLFLTLSCSTIYFDGIPELKKDSKYIKLIQE

BBB25 †

MKYCFSLILMVFICSSCKILNIAEDLEKNFEKIERADYFLYFYPDSQIYIKKDKSSNKFS

BBB27 *

MKKFLISVYFLLFYGCSTISLVKIPEKDKINLTVLSSLMNYPDLKISNFKIKDYEHLHYS

 

cp32-1

BBP26 # (one member of this family [from strain 297] was not lipidated in E. coli, Porcella et al., 1996)

MKMLKRLHCLLIVLLLCCTTIANLPEEPKPPIIPTLKSLAKYETQLSEYVMYLVTFLAKT

BBP27 * (rev-1)

MRNKNIFKLFFAAMLFVMACKAYVEEKKEIDSLMEDVLALVNDSSGGKFKDYKDKINELK

BBP28 * (mlpA) (lipidated in E. coli, Porcella et al., 1996), (lipidated in Bb, Theisen, 1996)

MKIINILFCLFLLLLNSCNSNDNDTLKNNAQQTKSRGKRDLTQKEATPEKPKSKEELLRE

BBP38 * (erpA) (Erp homologs surface exposed in other Bb strains and lipidated in E. coli, Akins et al., 1995b; Lam et al., 1994; Wallich et al., 1995)

MEKFMNKKMKMFIICAVFILIGACKIHTSYDEQSNGEVKVKKIEFSEFTVKIKNKNNSNN

BBP39 * (erpB2)

MNKKTIIICAVFALILSCKNYAIKDLEQNAKGKIKGFIDKALDPAKDKITSSSSKVDELA

 

cp32-3

BBS26 # (one member of this family [from strain 297] was not lipidated in E. coli, Porcella et al., 1996)

MKMLKRLHCLLIVLLLCCTTIANLPEEPKPPIIQTLKSLAKYETQLSEYVMYLVTFLAKT

BBS30 * (mlpC)

MKIINILFCLFLLMLNGCNSNDNDTLKNNAQQTKRRGKRDLTQKETTQEKPKSKEELLRE

BBS41 * (erpG)

MNKKMKNLIICAVFVLIISCKIDASSEDLKQNVKEKVEGFLDKELMQGDDPNNSLFNPPP

 

cp32-4

BBR26 # (one member of this family [from strain 297] was not lipidated in E. coli, Porcella et al., 1996)

MKMLKRLHCLLIALLLCCTTIANLPEEPKPPIIPTLKSLAKYETQLSEHVMYLVTFLAKT

BBR28 * (mlpD)

MKIINILFCLFLLMLNGCNSNDTNNSQTKSRQKRDLTQKEATQEKPKSKEELLREKLNDN

BBR31 #

MKDFLITTKNPTCHNKHQHKLIYLTSTVDFLNKKDKKYTQQNILYYYNKNLKRNGLAPTT

BBR39 #

MYLSCVPPLKIASSVYPTCSTAQILHAMRTHKI

BBR40 * (erpH) truncated pseudogene

MNKKMKNLIICAVFVLIISCKNNTLSLYDEQSIG

BBR42 * (erpY)

MNKKIKMFIICAIFMLISSCKNDVTSKDLEGAVKDLESSEQNVKKTEQEIKKQVEGFLEI

 

cp32-6

BBM26 # (one member of this family [from strain 297] was not lipidated in E. coli, Porcella et al., 1996)

MKMLKRLHCLLIALLLCCTTIANLPEEPKPPIIQTLKSLAKYETQLSEYVMYLVTFLAKT

BBM27 * (rev-6)

MRNKNIFKLFFASMLFVMACKAYVEEKKEIDSLMEDVLALVNDSSGGKFKDYKDKINELK

BBM28 * (mlpF)

MKIINILFCLFLLLLNSCNSNDNDTLKNNAQQTKSRGKRDLTQKEATPEKPKSKEELLRE

BBM38 * (erpK)

MEQLMNKKMFIICAIFALIVSCKNYASGEDVKKSLEQDLKGKVKGFLDTKKEEFFGDFKK

 

cp32-7

BBO26 # (one member of this family [from strain 297] was not lipidated in E. coli, Porcella et al., 1996)

MKMLKRLHCLLIVLLLCCTTIANLPEEPKPPIIPTLKSLAKYETQLSEYVMYLVTFLAKT

BBO28 * (mlpG)

MKIINILFCLFLLMLNGCNSNDTNTKQTKSRQKRDLTQKEATQEKPKSKSKEDLLREKLS

BBO39 † (erpL)

MNKKMKMFIICAVFALMISCKNYASGENLKNSEQNLESSEQNVKKTEQEIKKQVEGFLEI

BBO40 * (erpM)

MNKKILIIFAVFALIISCKNYATGKDIKQNAKGKIKGFLDKVLDPAKDKITSSSSKVDEL

 

cp32-8

BBL26 # (one member of this family [from strain 297] was not lipidated in E. coli, Porcella et al., 1996)

MKMLKRLHCLLIVLLLCCTTIANLPEEPKPPIIPTLKSLAKYETQLSEYVMYLVTFLAKT

BBL28 * (mlpH)

MKIINILFCLFLLMLNGCNSNDNDTLKNNAQQTKSRRKRDLTQKEVTQEKPKSKEELLRE

BBL39 * (erpN)

MEKFMNKKMKMFIICAVFILIGACKIHTSYDEQSNGEVKVKKIEFSEFTVKIKNKNNSNN

BBL40 * (erpO)

MNKKTIIICAVFALILSCKNYAIKDLEQNAKGKIKGFIDKALDPAKDKITSSSSKVDELA

cp32-9

BBN26 # (one member of this family [from strain 297] was not lipidated in E. coli, Porcella et al., 1996)

MKMLKRLHCLLIALLLCCTTIANLPEEPNPPIIPTLKSLAKYETQLSEYVIYLVTFLAKT

BBN28 * (mlpI)

MKIINILFCLFLLMLNSCNSNDTNTSQTKSRQKRDLTQKEATQEKPKSKEDLLREKLSED

BBN38 * (erpP)

MEKFMNKKMKMFIVCAVFILIGACKIHTSYDEQSSGEINHTLYDEQSNGELKLKKIEFSK

BBN39 * (erpQ)

MNKKTLIICAVFALIISCKNFATGKDIKQNSEGKIKGFVNKILDPVKDKIASSGTKVDEV

 

lp5

BBT04 # (not a good looking signal sequence?)

MNSKTTNKTNRNCYNKVQHKLIVLISTICYLNKTHKKYTQKTILYYFNENLRKNGQTIST

 

lp17

BBD001 *

MNKLLIFIILLVFSCNLSNSDQNNPLNMSNKEKISEYQINESSNKYSIFKRNSSVKRYTF

BBD10 *

MNSKFILKYFILAFFLVSCQTYQIAYDRFSQVLDSQYDIGVNYSRDGIFKSVISIKYDKL

BBD15 †

MKNKLSVYTTIMLNFKFLKCVYLCFMVFVRLILIIKFRGKKFMNRKFVISLLFIILTFLL

 

lp21

none

 

lp25

BBE04 #

MKNPKSNKSKLNIITAILASIYISCAPIGKVNTKPNSDTNPENNQN

BBE06 * (E in signal seq)

MRFLNIIKSLEFFVMSCNDIFTKKGTLSNLKLSAVERCILDDMEIVIMN

BBE08 *

MQKNVYCFIIFVLISSCNNYANDKGLKRVKEYLEKEAKVFLCLSNFVL

BBE09 * (D in signal seq)

MQKDIYISNIFLYIPLFYSCFLTPPKSLKINSIKTEVFDFKIIEEGDITKYNKNPIKESN

BBE16 #

MGKILFFGLLLICIFLGFFFYKQKENNVIYNKIVEKFDDNVFVDETYTYLFKDSNLKELV

BBE25 #

MQTIKIQDIPTLFNKVGIIFCNINFESIIKINIY

BBE26 # (but no N-terminal + charge)

MYFYCLHLIVFIVICFGDFGICALGGVVFLGFIFLLYSVQCN

BBE28 #

MIFKEIKMMPQKLLIIKNCYSCQKLLKKNSKICCVVYRTRNKYPKTLITS

BBE31 *

MKYHIIVSIFIFLFLNACNPDSNTNQNNSKKGLLKIEKIPNKQIKNKLLDDLKNLIETAN

 

lp28-1

BBF01 #

MKLLKIFMCAFLLLNLVNCKFDSLNLSTKSVDDKNNSIAKLLQHLSKSEDQANKTSTSED

BBF11 # (but signal sequence too short?)

MVFYNRNIIFFSLCLVIPLIILIKILKLSIDHISD

BBF20 * (pseudogene)

MNKKFSISLLSTILAFLLVLGCDLSSNNAENKMDDIFNLEKKYMDNSNYKCLSKNEAIVK

BBF32 * (pseudogene cassette)

MFKTIIKQKNMKKISSAILLTTFFVFINCKSQVADKASVTGIAKGIKEIVEAAGGSEKLK

vlsE * (surface exposed, Zhang et al., 1997)

MNMKKISSASLLTTFFVFINCKSQVADKDDPTNKFYQSVIQLGNGFLDVFTSFGGLVAEA

 

lp28-2

BBG01 †

MRKSLFLYTLLMGGLMSCNLDSKLSSNKEQKNNNNVKEVSNSVQEDGLNDLYSNQEKQKS

BBG02 *

MEINLQSKLNNKNNNKLIFFISCSLVLVSTRPFDNRFTYYSKNRGVIIRPGYKIMKHILE

BBG06 #

MKKVFTFLKKLCIIYNINPIRSSTMINNSKKPNCHNKLQQKLIVLLSTLAYVNSKYNKYT

BBG07 #

MQNMAKSIQLVKPIVRCSNKKDLFIKIEKDNDKTIYHTKIMMDIYKFGLNKKKNKYRISL

BBG25 *

MKNLKTKINFLGIFWLLLLFLSCESIPSLPQKPTLTNKEDIENLMLDEAELFRYSTALNV

BBG32 # (bad signal sequence)

MKVASLIRSTCENENLILRSGFRDLDAIIQGFRESNFVVIGARPSVGKTAFALNIAHNIC

 

lp28-3

BBH01 *

MNKLLIFIILLVFSCNLSNSDQNNPLNMSNKEKISEYQINESSNKYSIFKRNSSVKRYTF

BBH06 #

MKKSFLSIYMLISISLLSCDVSRLNQRNINELKIFVEKAKYYSIKLDAIYNECTGAYNDI

BBH08 #

MVGVFIKAKTLEIKSFTSLHKRSSGCHIGSILLTYIAKEDDLAINM

BBH18 *

MKMKEKRVGNKIFYISVVLILIVGCDWGTIKDKSTEISKLLRTDKDKTKNQDRIELGEDN

BBH32 *

MKYNTIISIFVCLFLTACNPDFNTNKKRTLSKGIISNQDADSDKIIKNKLLDDLINLIEK

BBH37 †

MIKGKESIFMKKKMFLYTLLTIGLMSCNLNSKLSGNKEEQKNNNDIKEALNGVQENAINN

lp28-4

BBI14 * (pseudogene)

MKNKIILCMCVFSLLNSCNFDNDAEAATKKHADKIKN

BBI16 *

MKYHIITTIFVFLFLACRPDFNIDQKDIKYPPTEKSRPKTESSKQKESKPKTEEELKKKQ

BBI25 #

MLLILFFTLTMNMKKFFILNKEIGIGNCNLLFYLYFLKNINKI

BBI28 *

MKCHIIATIFVFLFLACSTDFNTDQKGIKYPPTEKSKPKTEDSKQKELKPKTEKELKKKQ

BBI29 *

MKNNIILCMCVFLLLNSCTANHEAEAKIKKHVDKTKNEYINEIKNLIATTKEIIEKRKLL

BBI32 *

MSLNFRVIFLISHTQYMYSPILKSCEFINNLKTVSSRLIKNILFIWRGINENFIFGIEVI

BBI34 *

MKHYIIVHIFVFLFLNACYPVASNKIELKPKTETSLNQEEVPNQEANYKEEKEAKEEGIN

BBI36 †

MKNFKLNTIKLNVITAILTLICISCAPFGNVNPNKLKNPITSKNLKKTKRSNHSRNLKKT

BBI38 †

MKNFKLNTIKLNVITAILTLICISCAPFGNVNPNKLKNPITSKNLKKPKRSNHSRNLKKT

BBI39 †

MKNFKLNIIKLNVITAILTSICISCAPFGNVNPNEPKNPTTSKSLKKTKRSNNSRNLKNT

BBI42 *

MRILVGVCIIALALLGCYLPDNQEQAVQTFFENSESSDMGSDEIVTEGIFSSLKLYASEH

 

lp36

BBK01 † (pseudogene)

MRKSLFLYALLMGGLMSCNLDSKLSSNKEQKNNNNVKEVSDSVQEDGLNDLYNNQEKQKS

BBK04 # (pseudogene; consensus at 2nd C - poor signal sequence)

MTEFMVVSSIEERLKTKSPLDIKIIDNSCGSGNFLISCLDYLTEKVWYELDKFEDVKKN

BBK07 *

MSKLILAISILLIISCKWHVDNPIDEATAESKSALTSVDQVLDEISEATGLSSEKITKLT

BBK12 *

MSKLILAISILLIISCKWYVDNTIDEATVESKSALTSIDQVLDEISEATGLSSEKITKLT

BBK19 *

MKKYIINLSLCLLLLSCNLFSKDSRSRQKYNFKVPAKSVSNPINKENIDTEKGTNTTLCI

BBK30 #

MVALFKFAIFQLSNTKTCTSSFKAKFMIQGNDN

BBK32 # (Fibronectin-binding protein, surface localized, Probert and Johnson, 1998)

MKKVKSKYLALGLLFGFISCDLFIRYEMKEESPGLFDKGNSILETSEESIKKPMNKKGKG

BBK38 # (but no N-terminal positive charge)

MLLPPFVMLSSTLTLVSLATSFTSCAIFSSPSLPNALLSYLDYLPTPFSLTTARSKAGSA

BBK45 #

MITNNKCNIMILYYNNTLFLHKVSTMNLMIKVLIFSLFLSFISCKLYEAVDKSLIKDNKR

BBK47 *

MRLCLIKIFIIPNLVFSSLFLFESCSGFLSKKSIEQFALALKDHQENKNTTNTSVDKNSK

BBK48 †

MNLINKLFILTILFSSVISCKLYKKITYNADQVIDKLKSNNGSFNTLKSNDDSKRSGRKP

BBK49 †

MRLCLIKIFIIPNLVFSSLFLFESCSGFLSKKSIEQFALALKDHQENKNTTNTSVDKNSK

BBK50 †

MNLIIKVMLISSLFSSFISCKLYEKLTNKSQQALAKAFVYDKDIADNKSTNSTSKLDNSS

BBK51 #

MNLRINKFILILNSILELCIAESISKIFILEK

BBK52 *

MKKNIYILNIFLYIPLFYSCFLTPPKSSKINSIKTEVLDFKIIEEGNIIKYDKKPIEERN

BBK53 *

MRILVGVCIIAALALLGCYLPDNQEQAVQTFFENSESSDMGSDEIVTEGIFSSLKLYASE

 

lp38

BBJ01 (BBJ001) * (N-terminal end of pseudogene BBJ001)

MKYHIIVSIFVFLFLNACNPDSNTNQNNSKKELKTGRIPNKQIKNALLDDLKNLIETASA

BBJ04 # (bad signal sequence)

MLLLYKKTFSMIGFCFRALSENEKHFLMLLFSAGTIF

BBJ06 #

MIISSKSFIIFLLSYLENYCVLFPFLIYLKNYYMQSLSQAGHYKELNY

BBJ08 # (no N-terminal positive charge)

MFLYTLLTIGLISCNLDSKLPNKEQKNNNDIKETLGSSVQENALNNLYGNQEEKKDFKNF

BBJ09 * (ospD) (lipidated and surface exposed, Norris et al., 1992)

MKKLIKILLLSLFLLLSISCVHDKQELSSKSNLNNQKGYLDNEGANSNYESKKQSILSEL

BBJ23 #

MNGVIMREISCCFLLLTFSVVCVYSFDVSSRKFYGILEGYYSGKIEELSKKNDEDVYIYR

BBJ25 #

MKVFMKKFLFLILPCFGVFANELNDELGDFVIRGVDFEFRLDYLSVPNNFENNFDFILNI

BBJ28 #

MMLKIFVIVFNFCVLNLLNAGDGKSLIKEFENLYYPQLKNGIYAFKMNFKINVKNNLEES

BBJ34 *

MIKGNTFILILVTTMFVSCKFYGSDDTNKKNTSLNGDTREIDNIGSVILEQDGNKKGDTT

BBJ36 *

MKRKSNICISLLVTILFVSCKFFGNKSASKEKEETSFSDTASKISKSGTAASSDKQEKNT

BBJ41 †

MKNLKLNIIKLNVITAILTSICISCAPFGNVNPNEPKNPTTSKSLKKTKRSNNSRNLKNT

BBJ46 #

MDKFLTSNHPPIIIFTIGALCATVLICLIIIFIIHGIINPILIKKFKSINNSLQKITKEF

BBJ47 #

MRNISNCIKYIILTMLIGLLIFCCATFVWLIGIFYSNNFKEERNYSISPIDSVIMRKCYF

 

lp54

BBA03 #

MKKTIIVFIILAFMLNCKNKSNDAEPNNDLDEKSQAKSNLVDEDRIEFSKATPLEKLVSR

BBA04 * (S2)

MKRVIVSFVVLILGCNLDDNSKMERKGSNKLIRESGSDRRGQENRALGAMNFGLFSGDSG

BBA05 * (S1) (not labeled with palmitate, Feng et al., 1995)

MNKIGIAFIISFLLFVNCKGKSLEEDLKSTTSNNKQNLISNEKKSLNSKNNRLKDSRLSN

BBA07 *

MCGRRMKNILLFVILLFFSCKEFNYSDLRRRPSKVLNASNGASNKELKISFVDSLNDDQK

BBA14 *

MQIKNFPFLFLLNSLIIFSCSTIASLPEEPSSPQESTLKALSLYEAHLSSYIMYLQTFLV

BBA15 * (ospA) (lipidated and surface exposed, Barbour et al., 1983; Brandt et al., 1990)

MKKYLLGIGLILALIACKQNVSSLDEKNSVSVDLPGEMKVLVSKEKNKDGKYDLIATVDK

BBA16 * (ospB) (lipidated and surface exposed, Barbour et al., 1983; Brandt et al., 1990)

MRLLIGFALALALIGCAQKGAESIGSQKENDLNLEDSSKKSHQNAKQDLPAVTEDSVSLF

BBA24 * (dbpB) (surface exposed, protein lipidated in E. coli, Hagman et al., 1998; Hanson et al., 1998)

MIKCNNKTFNNLLKLTILVNLLISCGLTGATKIRLERSAKDITDEIDAIKKDAALKGVNF

BBA25 * (dbpA)

MKIGKLNSIVIALFFKLLVACSIGLVERTNAALESSSKDLKNKILKIKKEATGKGVLFEA

BBA26 # (but no N-terminal positive charge)

MNLFYANIAVTLFGITSLLCNLACDFKTCFQFAKIFQIGYFAN

BBA32 #

MRITGLLFFLICLLSCSSFNKSSNKSLLAKNKQKASDYNREYYQKNREKLKLRARERYRR

BBA33 *

MKRYIYVYIISVAVISCYLNDFSGMKENNCNKYDLSFFELSLAERENAILKIQRKFKSLT

BBA34 * (oppA IV) (not surface exposed/probably periplasmic, Bono et al., 1998)

MIIKKRGLLILGIATVISCSAMSKPKDDIVFGVGIGNEPTSLDPQFCSDRLGNLIINELF

BBA36 *

MMQRISILLMLLAVFSCKQFGDVKSLTEIDSGNGIPLVVSDVVKDLIPKEISLTPEEAEK

BBA54 #

MVAFVRKVCTIFILFFCLSFNLHSYAVENGVVIKVKIFNFKLNQQRSFEELERDLRLFIQ

BBA57 *

MNGKLRKALKIAIFTTLLLVISCNANMDTNDKNKALNEYKLKNISEVIKNSLQLESDPKL

BBA59 * (12 kd)

MVKKIIFISFSIFIVSCSAIGRGILIDSILNNVHKELEQEKKDEKKKNPQSKASIEENAD

BBA60 * (P27) (lipidated and surface exposed in strain B29, Reindl et al., 1993)

MSKKVILILLEILILSCDLSINKEQKTKEKTSEKQESEKQNIEKQEPEKQKQNAAKIIPT

BBA62 * (7.5 kd) (lipidated, in outer membrane, possible surface exposure, Katona et al., 1992; Lahdenne et al., 1997)

MTKLMYAIFLSAILFVACETTRISDEMENTSDEDSKVTAPMTDKDMMKSMPDKNTKSMKQ

BBA64 *

MKDNILKNNKLIAIFLLHVLTVLILISCSLEVKDSNESKKHKKEKRKGKVENLLVAINNL

BBA65 *

MNKIKLSILITLGITTFFSCDLNNKDNKDKVASFTETKYNELSPQKGTKTQDQRSTKNLK

BBA66 *

MKIKPLIQLKLLGLFLFSCTIDANLNEDYKNKVKGILNKAADDQETTSADTNSNAAKNIP

BBA68 †

MKKAKLNIIKINIIAMILTLICTSCAPFSKIDPKANANTKPKKITNPGENTQNFEDKSGD

BBA69 †

MKKAKLNIIKINIITMILTLICISCAPFNKINPKANENTKLKKNTRLKKPANPGENIQNF

BBA72 #

MHKESVLTKNKLNIIATILTLIGTSCAVNPIGPKVKSRTDIKESNQKSGNPESLNQKYQE

BBA73 *

MKRNKIWKTLKLFQITLLFSCSFYSKSNNTEAISELQSSPIKLGKIKVLQKTEKIVSTQN

lp56

BBQ03 *

MRILVGVFIIAALALLGCYLPDNQEQAVQTFFENSESIDMGSDEIVTEGIFSSLKLYASE

BBQ04 * (pseudogene)

MKKNICILNIFLYIPLFYSCFLTTPKSSKINSIKTEVLDFKIIEEGNIIKYDKKPIEESN

BBQ05 *

MKYYICVCVFLLLNACNSDFSTNQEDIKYPSDKEKSKSNMEASSKEEDPNKKIKNTLLND

BBQ33 # (one member of this family [from strain 297] was not lipidated in E. coli, Porcella et al., 1996)

MKMLKRLHCLLIALLLCCTTIANLPEEPKPPIIPTLKSLAKYETQLSEYVMYLVTFLAKT

BBQ35 * (mlpJ)

MKIINILFCISLLLLNSCNSNDNDTLKNNAQQTKSRKKRDLSQEELPQQEKITLTSDEEK

BBQ46 # (questionable signal sequence)

MIKNVIYILPKTKKFSQIVWLLIACKIWIVG

BBQ47 * (erpX)

MFGVVVNLRLMEWIMNKKMKIFIICAVFVLISSCKIDATGKDATGKDATGKDATGKDATG

BBQ56 #

MKIFFIFLDILYFLAYNICIDSYIINYELSIPCKRPLAKANGLLLY

BBQ89 *

MNKLLIFIILLVFSCNLSNSDQNNPLNMSNKEKISEYQINESSNKYSIFKRNSSVKRYTF
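
Several entries in the list above carry notes such as "no N-terminal positive charge" or "E in signal seq". The following Python sketch shows one way such notes could be checked against the listed N-terminal sequences. It is illustrative only: the function name `leader_features` is hypothetical, and treating everything before the first cysteine as the leader is a simplifying assumption made here, not the criterion used to compile this list.

```python
def leader_features(protein: str) -> dict:
    """Summarize the residues that precede the first cysteine.

    Assumption (for illustration only): the first cysteine marks the end
    of the leader region.
    """
    first_cys = protein.find("C")
    leader = protein if first_cys == -1 else protein[:first_cys]
    return {
        # 1-based position of the first cysteine, or None if there is none
        "first_cys_position": None if first_cys == -1 else first_cys + 1,
        # True if a lysine (K) or arginine (R) precedes that cysteine
        "has_positive_charge": any(aa in "KR" for aa in leader),
        # aspartate (D) or glutamate (E) residues preceding that cysteine
        "acidic_residues": [aa for aa in leader if aa in "DE"],
    }


# N-terminal sequences copied from the list above
examples = {
    "BBE26": "MYFYCLHLIVFIVICFGDFGICALGGVVFLGFIFLLYSVQCN",
    "BBE06": "MRFLNIIKSLEFFVMSCNDIFTKKGTLSNLKLSAVERCILDDMEIVIMN",
    "BBA15": "MKKYLLGIGLILALIACKQNVSSLDEKNSVSVDLPGEMKVLVSKEKNKDGKYDLIATVDK",
}
for gene, seq in examples.items():
    print(gene, leader_features(seq))
```

Under this simplification BBE26 reports no positive charge before its first cysteine and BBE06 reports a glutamate, in agreement with the notes attached to those entries. A realistic lipoprotein predictor would also have to evaluate the hydrophobic region and the residues around the cysteine.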

 

Part IV

Pseudo-, Questionable and Short B. burgdorferi B31 Plasmid Genes

by Sherwood Casjens, Granger Sutton, Jeremy Peterson & Dan Haft - completed March 1999.

The purpose of the following table is to call attention to the B31 plasmid genes that may not have a biological function. It lists (i) all the putative pseudogenes, (ii) all the computer (GLIMMER)-recognized genes that are "questionable" for some reason, and (iii) all the short (<300 bp) genes on the B. burgdorferi B31 plasmids. The reason for categorizing each gene as a "questionable gene" or "pseudogene" is given in the "COMMENTS" column. It is of course very possible that any given "short" or "questionable" gene or truncated "pseudogene" could be expressed and could have a biological function; these categories are only our best current guess at functionality.

 

DEFINITIONS

PSEUDOGENE - a region of DNA that is similar in sequence to a paralogous Borrelia gene or a gene from another organism, but which is truncated relative to other members of the gene family and/or does not have a full open reading frame.

QUESTIONABLE GENE - an ORF that is called a putative gene by the TIGR gene-recognition protocol, but may be a false recognition because it either (i) lies within another gene or pseudogene, or (ii) was not called in highly similar, paralogous sequence elsewhere on the B31 plasmids.

† -"Daggers" in column 1 indicate computer-recognized open reading frames that are in-frame inside of larger pseudogenes (they are questionable gene calls that are part of a larger pseudogene).
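
Criterion (i) above, and table comments such as "inside" and "backwards", can be checked from the 5’-end and 3’-end coordinates alone. The following Python sketch is illustrative only: the function names are hypothetical, and it assumes that a 5’-end coordinate larger than the 3’-end coordinate means the ORF lies on the reverse strand. It does not test reading frame, since a pseudogene may carry frameshifts and the frame then cannot be inferred from coordinates.

```python
def strand(five_prime: int, three_prime: int) -> str:
    # Assumption: a 5'-end larger than the 3'-end means the reverse strand.
    return "+" if five_prime < three_prime else "-"


def relation(inner: tuple[int, int], outer: tuple[int, int]) -> str:
    """Describe how ORF `inner` sits relative to `outer`.

    Each argument is a (5'-end, 3'-end) pair as given in the table.
    """
    i_lo, i_hi = sorted(inner)
    o_lo, o_hi = sorted(outer)
    same = strand(*inner) == strand(*outer)
    orientation = "same strand" if same else "backwards"
    if o_lo <= i_lo and i_hi <= o_hi:
        return f"inside, {orientation}"
    if i_lo <= o_hi and o_lo <= i_hi:
        return f"partial overlap, {orientation}"
    return "no overlap"


# Coordinates copied from the table in this part
print(relation((5435, 5539), (5620, 5412)))  # BBF11 vs BBF11.1
print(relation((3143, 3604), (3018, 3604)))  # BBD06 vs BBD05.1
print(relation((9125, 9283), (9155, 8880)))  # BBJ14 vs BBJ13
```

For these three pairs the sketch reports "inside, backwards", "inside, same strand" and "partial overlap, backwards", which matches the corresponding table comments.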

SHORT GENE - a <300 bp long ORF that was recognized as a putative gene by the TIGR gene-recognition protocol that is NOT in the "questionable" or "pseudogene" categories.

Genes <300 bp are rather highly over-represented on some of the B31 plasmids, so it is possible, or perhaps even likely, that some of these are in fact not real genes, but are spurious gene calls in "junk DNA" on these plasmids. Such short genes are often not closely packed like the Borrelia chromosomal genes and most other prokaryotic genes. About 9% (80/844) of the putative Bb chromosomal genes are <300 bp (and some of these are homologs of small genes of known function, which again shows that particular short plasmid genes may well have a function). The fraction of "short" genes is considerably higher on the B31 plasmids that contain putative decaying DNA regions.

A "*" in the SHORT GENE column of the table below indicates a <300 bp gene that is either a "questionable gene" or "pseudogene" and so does not fit into the "short gene" category as we define it.
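
The rules above for filling the SHORT GENE column can be written out as a short Python sketch. It is illustrative only: the class and function names are hypothetical, coordinates are treated as inclusive, and a 5’-end larger than the 3’-end is assumed to mean the reverse strand.

```python
from dataclasses import dataclass

SHORT_LIMIT_BP = 300  # threshold stated in the SHORT GENE definition


@dataclass
class PlasmidOrf:
    name: str
    five_prime: int   # coordinate of the 5'-end
    three_prime: int  # coordinate of the 3'-end
    questionable: bool = False
    pseudogene: bool = False

    @property
    def length_bp(self) -> int:
        # Inclusive length; independent of coordinate order.
        return abs(self.three_prime - self.five_prime) + 1

    @property
    def strand(self) -> str:
        # Assumption: 5'-end > 3'-end means the reverse strand.
        return "+" if self.five_prime < self.three_prime else "-"


def short_gene_column(orf: PlasmidOrf) -> str:
    """Return the expected SHORT GENE column entry for one ORF."""
    if orf.length_bp >= SHORT_LIMIT_BP:
        return ""        # not short at all
    if orf.questionable or orf.pseudogene:
        return "*"       # short, but already in another category
    return "SHORT"       # short gene as defined above


# Rows copied from the table below
rows = [
    PlasmidOrf("BBC04", 2700, 2593),
    PlasmidOrf("BBS32", 19198, 19392, questionable=True),
    PlasmidOrf("BBR02", 1306, 1998, pseudogene=True),
]
for orf in rows:
    print(orf.name, orf.length_bp, orf.strand, repr(short_gene_column(orf)))
```

With inclusive coordinates, BBC04 is 108 bp, BBS32 is 195 bp and BBR02 is 693 bp, each a whole number of codons.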

 

Pseudo-, Questionable and Short B. burgdorferi B31 Plasmid Genes

GENE NAME

5’-end

3’-end

SHORT GENE

QUESTIONABLE GENE

PSEUDOGENE

COMMENTS (paralogous family names in [square brackets])

cp9

           

BBC04

2700

2593

SHORT

   

no homolog outside of Borrelia

BBC07

4788

4507

SHORT

   

no homolog outside of Borrelia

             

cp26

           

BBB15

11636

11737

SHORT

   

no homolog outside of Borrelia

BBB20

17733

17626

SHORT

   

no homolog outside of Borrelia

BBB21

17750

17842

SHORT

   

no homolog outside of Borrelia

             

cp32-1

           

BBP14

8724

8957

SHORT

   

probably a real gene; has paralogs on other cp32s

BBP23

15215

15415

SHORT

   

probably a real gene; has paralogs on other cp32s

             

cp32-3

           

BBS14

8724

8957

SHORT

   

probably a real gene; has paralogs on other cp32s

BBS23

15212

15412

SHORT

   

probably a real gene; has paralogs on other cp32s

BBS28

16915

17046

SHORT

   

no homolog

BBS32

19198

19392

*

QUESTIONABLE

 

paralog not called in paralogous sequence elsewhere

BBS43

28067

28246

SHORT

   

no homolog

             

cp32-4

           

BBR02

1306

1998

   

PSEUDO

authentic frameshift

BBR14

8734

8967

SHORT

   

probably a real gene; has paralogs on other cp32s

BBR23

15167

15367

SHORT

   

probably a real gene; has paralogs on other cp32s

BBR30

18829

18737

*

QUESTIONABLE

 

paralog not called in paralogous sequence elsewhere

BBR35

21974

22357

   

PSEUDO

authentic point mutation

BBR39

25636

25538

*

QUESTIONABLE

 

paralog not called in paralogous sequence elsewhere

BBR40

25865

25966

*

 

PSEUDO

N-terminal erp gene [162] fragment (named erpH)

BBR41

26077

26817

   

PSEUDO

fusion of erp gene [162] and family [161] gene

             

cp32-6

           

BBM14

8734

8967

SHORT

   

probably a real gene; has paralogs on other cp32s

BBM23

15231

15431

SHORT

   

probably a real gene; has paralogs on other cp32s

BBM40

27731

27850

*

QUESTIONABLE

 

overlaps BBM39

             

cp32-7

           

BBO14

8726

8959

SHORT

   

probably a real gene; has paralogs on other cp32s

BBO23

15222

15422

SHORT

   

probably a real gene; has paralogs on other cp32s

BBO35

22755

22630

*

QUESTIONABLE

 

paralog not called in paralogous sequence elsewhere

BBO41

28117

28007

*

QUESTIONABLE

 

paralog not called in paralogous sequence elsewhere

             

cp32-8

           

BBL14

8724

8957

SHORT

   

probably a real gene; has paralogs on other cp32s

BBL23

15215

15415

SHORT

   

probably a real gene; has paralogs on other cp32s

BBL33

21467

21556

SHORT

   

no homolog

             

cp32-9

           

BBN05

3343

3950

   

PSEUDO

authentic frameshift

BBN06

3960

4935

   

PSEUDO

authentic frameshift

BBN13

8289

8742

   

PSEUDO

authentic frameshift

BBN14

8742

8975

SHORT

   

probably a real gene; has paralogs on other cp32s

BBN16

10283

11034

   

PSEUDO

authentic frameshift

BBN18

12009

12560

   

PSEUDO

authentic point mutation

BBN21

13807

14410

   

PSEUDO

authentic frameshift

BBN22

14423

15320

   

PSEUDO

authentic frameshift

BBN23

15312

15512

SHORT

   

probably a real gene; has paralogs on other cp32s

BBN29

18783

17776

   

PSEUDO

authentic point mutation

BBN37

25779

25006

   

PSEUDO

authentic point mutation

BBN40

27991

27884

*

QUESTIONABLE

 

gene not called in paralogous sequence elsewhere

             

lp5

           

BBT01

195

635

   

PSEUDO

[57] fragment

BBT02

744

1094

   

PSEUDO

[57] fragment

BBT03

1208

1573

   

?

[84] ([57] fragment?)

BBT05

3200

3350

*

 

PSEUDO

[57] fragment

BBT07

4388

4816

   

PSEUDO

[52] C-terminal fragment

             

lp17

           

BBD001

214

405

SHORT

   

no homolog outside of Borrelia

BBD02

873

1019

*

 

PSEUDO

[77] ([57] fragment?)

BBD03

1117

1309

*

 

PSEUDO

[57] fragment

BBD04

1412

1765

   

PSEUDO

[57] fragment

BBD05

2389

2541

SHORT

 

?

[84] ([57] fragment?)

BBD05.1

3018

3604

   

PSEUDO

[57] fragment

BBD06

3143

3604

 

QUESTIONABLE

 

in-frame inside of D05.1

BBD07

4373

4260

SHORT

   

no homolog outside of Borrelia

BBD08

4707

4802

*

 

PSEUDO

[82] fragment

BBD12

7752

7624

SHORT

   

no homolog outside of Borrelia

BBD15.01

10000

9940

*

 

PSEUDO

[175] fragment

BBD15.1

10152

10326

*

 

PSEUDO

[82] fragment

BBD16

10520

10428

SHORT

   

no homolog outside of Borrelia

BBD17

10591

10683

SHORT

   

no homolog outside of Borrelia

BBD19

12057

12167

SHORT

   

no homolog outside of Borrelia

BBD20

12250

12975

   

PSEUDO

[82] fragment

BBD22

14072

14338

SHORT

   

no homolog outside of Borrelia

BBD23

14781

15725

   

PSEUDO

[82] fragment

BBD24

16121

15894

SHORT

   

no homolog outside of Borrelia

BBD25

16212

16367

SHORT

   

no homolog outside of Borrelia

             

lp21

           

BBU01

184

615

   

PSEUDO

[57] fragment

BBU02

746

1111

   

?

[84] ([57] fragment?)

BBU03

1357

1241

SHORT

   

no homolog outside of Borrelia

BBU07

15349

15810

   

PSEUDO

[57] fragment

BBU08

15791

16081

*

 

PSEUDO

[137] fragment

BBU09

26548

16231

   

PSEUDO

[55] fragment

BBU10

16603

16797

*

 

PSEUDO

[57] fragment

BBU12

17918

18362

   

PSEUDO

[52] authentic frameshift

             

lp25

           

BBE01

255

157

SHORT

   

no homolog outside of Borrelia

BBE03

4613

4422

SHORT

   

no homolog outside of Borrelia

BBE04

4719

4856

   

PSEUDO

family [54] fragment

BBE04.1

5377

5734

   

PSEUDO

family [44] fragment

BBE05

5377

5526

*

QUESTIONABLE

 

E05 inside E04.1 and out of frame

BBE06

5757

5903

SHORT

   

no homolog outside of Borrelia

BBE07

6401

6185

*

 

PSEUDO

[26] fragment

BBE08

6701

6558

SHORT

   

no homolog outside of Borrelia

BBE10

7972

7877

SHORT

   

no homolog outside of Borrelia

BBE11

8446

8315

SHORT

   

no homolog outside of Borrelia

BBE12

8646

8524

SHORT

   

no homolog outside of Borrelia

BBE13

8863

8955

SHORT

   

no homolog outside of Borrelia

BBE14

9163

9375

SHORT

   

no homolog outside of Borrelia

BBE15

9490

9356

SHORT

   

no homolog outside of Borrelia

BBE21.1

14767

14893

*

 

PSEUDO

[82] fragment

BBE23

15973

16155

SHORT

   

no homolog outside of Borrelia

BBE23.1

16459

16540

*

 

PSEUDO

[57] fragment

BBE23.2

16540

16721

*

 

PSEUDO

[32] fragment

BBE24

16915

17169

*

 

PSEUDO

[49] fragment

BBE24.1

17902

18303

   

PSEUDO

[49?] fragment - see paralog list

BBE25

18606

18505

SHORT

   

no homolog outside of Borrelia

BBE26

18586

18711

SHORT

   

no homolog outside of Borrelia

BBE27

19055

19195

SHORT

   

no homolog outside of Borrelia

BBE28

19489

19340

SHORT

   

no homolog outside of Borrelia

BBE29

19697

20883

   

PSEUDO

DNA methyltransferase pseudogene

BBE29.1

21110

21476

   

PSEUDO

[102] fragment

BBE30

21558

21701

*

 

PSEUDO

[49] fragment

BBE32

23723

23418

   

PSEUDO

[57] fragment

BBE33

24100

23850

*

 

PSEUDO

[57] fragment

             

lp28-1

           

BBF001

1

163

*

 

PSEUDO

[88] fragment

BBF001.1

200

300

*

 

PSEUDO

[80] fragment

BBF02

1720

2073

   

PSEUDO

[88] fragment

BBF03

2619

2101

   

PSEUDO

[80] fragment

BBF04

2658

2804

*

 

PSEUDO

[77] ([57] fragment?)

BBF05

2777

3073

*

 

PSEUDO

[57] fragment

BBF06

3201

3377

*

 

PSEUDO

[57] fragment / overlaps K45 homology

BBF07

3529

3413

SHORT

   

no homolog outside of Borrelia

BBF08

3849

3685

*

 

PSEUDO

[72] authentic frameshift

BBF09

3962

4179

*

 

PSEUDO

[71] authentic frameshift

BBF11

5435

5539

*

QUESTIONABLE

 

inside BBF11.1 and backwards

BBF11.1

5620

5412

*

 

PSEUDO

[32] fragment

BBF12

6540

5956

   

?

patches of [49] homology; fusion pseudogene?

BBF14.1

8197

8367

*

 

PSEUDO

[65] authentic frameshift

BBF16

8389

8607

*

 

PSEUDO

[64] authentic point mutation

BBF17

8772

9026

*

 

PSEUDO

[68] authentic frameshift

BBF18

9561

10049

   

PSEUDO

[82] pseudogene

BBF19

10559

10036

 

QUESTIONABLE

 

(BBF19 is an inverted part of BBF18)

BBF19.1

10916

11200

*

 

PSEUDO

[175] fragment

BBF20

10991

10701

*

 

PSEUDO

[85] fragment

BBF21

11550

11449

SHORT

   

no homolog outside of Borrelia

BBF22

12018

11794

*

 

PSEUDO

[44] fragment

BBF26.1

16209

15663

   

PSEUDO

badly deleted [101] pseudogene

BBF27

15925

15758

*

QUESTIONABLE

 

in frame inside F26.1

BBF28

16129

16001

*

QUESTIONABLE

 

out of frame (?) inside F26.1

BBF29

16825

16457

   

PSEUDO

[49] fragment (patchy similarity)

BBF30

17415

17170

SHORT

   

no homolog outside of Borrelia

BBF31

17805

17394

*

 

PSEUDO

[50] fragment

BBF31.1

17920

18050

   

PSEUDO

[57] fragment

BBF32

26698

18430

   

PSEUDO

vlsE recombination cassette; apparently functional but not transcribed. In the wider sense, BBF32 contains 15 unexpressed pseudogenes (Zhang et al., 1997).

             

lp28-2

           

BBG03

2104

2492

   

PSEUDO

[48] fragment

BBG04

2857

2753

SHORT

   

no homolog outside of Borrelia

BBG05

4056

2894

   

PSEUDO

one frameshift relative to homologous transposases of this type

BBG11

11015

10779

SHORT

   

no homolog outside of Borrelia

BBG31

26567

27082

   

PSEUDO

N-terminal truncation of family [50]

             

lp28-3

           

BBH01

273

464

SHORT

   

no homolog outside of Borrelia

BBH03

926

1072

*

 

PSEUDO

[77] ([57] fragment?)

BBH04

1045

1365

   

PSEUDO

[57] fragment

BBH05

1498

1677

*

 

PSEUDO

[57] fragment

BBH07

3514

3086

   

PSEUDO

[50] fragment

BBH08

3730

3593

SHORT

   

no homolog outside of Borrelia

BBH09.1

8091

7810

*

 

PSEUDO

[92] fragment

BBH10

8203

8003

*

QUESTIONABLE

 

partially overlaps BBH09.1 backwards

BBH10.1

8240

8310

*

 

PSEUDO

[82] fragment

BBH11

8796

8704

*

QUESTIONABLE

 

backwards inside BBH11.1

BBH11.1

8320

8850

   

PSEUDO

[1] fragment (in part)

BBH12

9455

9589

SHORT

   

no homolog outside of Borrelia

BBH14

10934

10821

SHORT

   

no homolog outside of Borrelia

BBH15

11005

10913

SHORT

   

no homolog outside of Borrelia

BBH16

11068

11187

SHORT

   

no homolog outside of Borrelia

BBH17

11837

12025

SHORT

   

no homolog outside of Borrelia

BBH18.1

13571

13693

*

 

PSEUDO

[65] fragment

BBH19

13590

13709

*

QUESTIONABLE

 

overlaps BBH18.1 partly in-frame

BBH20

14596

13840

   

PSEUDO

[171] fragment

BBH20.1

14750

15300

   

PSEUDO

[104] fragment

BBH22

14870

14766

*

QUESTIONABLE

 

backwards inside BBH20.1

BBH23

15051

15158

*

QUESTIONABLE

 

in-frame inside BBH20.1

BBH24

15136

15342

*

QUESTIONABLE

 

mostly within BBH20.1

BBH24.1

15354

15750

   

PSEUDO

[86] fragment

BBH25

15810

15568

*

QUESTIONABLE

 

backwards inside BBH24.1

BBH30

20871

20415

   

PSEUDO

[96] fragment

BBH31

20997

21104

SHORT

   

no homolog outside of Borrelia

BBH33

22678

22950

*

 

PSEUDO

[61] fragment

BBH34

23383

23192

*

 

PSEUDO

[62] fragment

BBH35

23447

23560

SHORT

   

no homolog outside of Borrelia

BBH36

24180

24031

*

QUESTIONABLE

 

in frame inside H36.1

BBH36.1

24223

24041

*

 

PSEUDO

[44] fragment

BBH36.2

24751

25112

   

PSEUDO

[102] fragment

BBH38

26614

26498

SHORT

   

no homolog outside of Borrelia

BBH39

26754

26855

SHORT

   

no homolog outside of Borrelia

BBH40

27445

26981

   

PSEUDO

[82] fragment

             

lp28-4

           

BBI01

174

605

   

PSEUDO

[57] fragment

BBI02

736

1101

   

?

[84] (fragment of [57]?)

BBI02.1

1416

1346

*

 

PSEUDO

[172] fragment

BBI02.2

1617

1862

*

 

PSEUDO

[57] fragment

BBI03

1972

1850

SHORT

   

no homolog outside of Borrelia

BBI04

2219

2127

SHORT

   

no homolog outside of Borrelia

BBI05

2310

2191

SHORT

   

no homolog outside of Borrelia

BBI07

3441

3346

SHORT

   

no homolog outside of Borrelia

BBI08

3576

3752

*

QUESTIONABLE

 

overlaps BBI08.1 at C-terminus

BBI08.1

3674

4314

   

PSEUDO

[59] fragment

BBI09

3745

3879

*

QUESTIONABLE

 

in-frame inside I08.1

BBI10

3911

4312

 

QUESTIONABLE

 

in-frame inside I08.1

BBI11

4721

4626

SHORT

   

no homolog outside of Borrelia

BBI12

5128

5343

SHORT

   

no homolog outside of Borrelia

BBI13

5609

5704

SHORT

   

no homolog outside of Borrelia

BBI14

6159

6269

*

 

PSEUDO

[60] fragment

BBI15

6603

6830

*

 

PSEUDO

[60] fragment

BBI17

8967

8824

SHORT

   

no homolog outside of Borrelia

BBI18

10647

10498

SHORT

   

no homolog outside of Borrelia

BBI23

13989

14090

SHORT

   

no homolog outside of Borrelia

BBI24

14334

14438

SHORT

   

no homolog outside of Borrelia

BBI25

15211

15339

SHORT

   

no homolog outside of Borrelia

BBI27

17272

17096

*

 

PSEUDO

[60] fragment

BBI30

19403

19507

SHORT

   

no homolog outside of Borrelia

BBI31.1

20240

20340

*

 

PSEUDO

[98] fragment

BBI32

20479

20273

*

QUESTIONABLE

 

backwards inside of BBI31.1

BBI33

20482

20589

*

 

PSEUDO

[82] fragment

BBI35

21992

22090

*

QUESTIONABLE

 

paralog not called in paralogous sequence elsewhere

BBI37

23056

23154

*

QUESTIONABLE

 

paralog not called in paralogous sequence elsewhere

BBI40

25320

25802

*

 

PSEUDO

[49] fragment

BBI41

26036

25797

   

PSEUDO

[82] fragment

BBI43

27069

26884

*

 

PSEUDO

[55] fragment - see paralog list

             

lp36

           

BBK001

86

14

*

 

PSEUDO

[82] fragment

BBK02

2478

1213

 

QUESTIONABLE

 

in frame inside K02.1

BBK02.1

3770

1213

   

PSEUDO

pseudogene in [1]

BBK03

3222

2821

 

QUESTIONABLE

 

in frame inside K02.1

BBK04

3595

3419

*

QUESTIONABLE

 

in frame inside K02.1

BBK05

5096

4905

SHORT

   

no homolog outside of Borrelia

BBK06

5126

5233

SHORT

   

no homolog outside of Borrelia

BBK08

6281

6180

SHORT

   

no homolog outside of Borrelia

BBK09

6366

6647

SHORT

   

no homolog outside of Borrelia

BBK10

6983

6807

*

 

PSEUDO

[1] fragment

BBK11

6956

7060

SHORT

   

no homolog outside of Borrelia

BBK14

8921

9013

SHORT

   

no homolog outside of Borrelia

BBK16

10223

10101

*

 

PSEUDO

[32] fragment

BBK18

12143

12054

SHORT

   

conserved hypothetical protein

BBK20

13212

13301

SHORT

   

no homolog outside of Borrelia

BBK24.1

17380

17580

*

QUESTIONABLE

 

inside BBK25 and backwards

BBK25

17565

16969

   

PSEUDO

[82] fragment

BBK25.1

17880

20033

   

PSEUDO

[1] fragment

BBK26

18346

18462

*

QUESTIONABLE

 

in frame inside of K25.1

BBK27

19094

18807

*

QUESTIONABLE

 

backwards inside of BBK25.1

BBK28

19232

19348

*

QUESTIONABLE

 

in frame inside of K25.1

BBK29

19807

19718

*

QUESTIONABLE

 

in frame inside of K25.1

BBK30

19935

20033

*

QUESTIONABLE

 

in frame inside of K25.1

BBK31

20026

20166

SHORT

   

no homolog outside of Borrelia

BBK33

21720

21890

*

 

PSEUDO

[65] fragment

BBK34

21912

22130

SHORT

   

no homolog outside of Borrelia

BBK35

22294

22464

SHORT

   

no homolog outside of Borrelia

BBK36

22545

22646

SHORT

   

no homolog outside of Borrelia

BBK37

23953

22913

   

PSEUDO

[175 & 75] pseudogene

BBK38

24146

23922

SHORT

   

no homolog outside of Borrelia

BBK39

24293

24667

   

PSEUDO

[59] fragment

BBK42

26803

26588

SHORT

   

no homolog outside of Borrelia

BBK42.1

26916

27078

SHORT

   

no homolog outside of Borrelia

BBK43

26937

27041

 

QUESTIONABLE

 

largely in-frame inside of BBK42.1

BBK44

27234

27350

SHORT

   

no homolog outside of Borrelia

BBK46

29212

28394

   

PSEUDO

[75] authentic frameshifts

BBK51

34232

34327

SHORT

   

no homolog outside of Borrelia

BBK52.1

35722

35811

*

 

PSEUDO

[174] authentic frameshift

BBK54

36577

36392

SHORT

   

[55] fragment? - see paralog list

             

lp38

           

BBJ001

482

1208

   

PSEUDO

[60] fragment

BBJ01

482

664

*

QUESTIONABLE

 

in frame inside J001

BBJ02

927

1208

*

QUESTIONABLE

 

in frame inside J001

BBJ02.1

1475

2367

   

PSEUDO

[48] fragment

BBJ03

1593

1742

*

QUESTIONABLE

 

in frame inside of BBJ02.1

BBJ04

2381

2271

*

QUESTIONABLE

 

backwards inside of J02.1

BBJ05

3828

2768

   

PSEUDO

[82] fragment

BBJ06

3486

3629

*

QUESTIONABLE

 

backwards inside of BBJ05

BBJ07

4307

4167

*

QUESTIONABLE

 

inside J07.1

BBJ07.1

4409

4167

*

 

PSEUDO

[98] fragment

BBJ10

7270

7473

*

 

PSEUDO

[58] fragment

BBJ11

7965

7783

SHORT

   

no homolog outside of Borrelia

BBJ11.1

8070

8260

*

 

?

[171] fragment (weak - not in TIGR gene list)

BBJ12

8725

8636

*

QUESTIONABLE

 

inside of and out-of-frame with BBJ12.1

BBJ12.1

8782

8593

*

 

PSEUDO

[86] fragment

BBJ13

9155

8880

*

 

PSEUDO

[69] fragment

BBJ14

9125

9283

*

QUESTIONABLE

 

overlaps BBJ13 backwards

BBJ15

10168

10043

*

QUESTIONABLE

 

inside BBJ15.1 and backwards

BBJ15.1

9450

10150

   

PSEUDO

[105] fragment

BBJ20

13936

13775

*

 

PSEUDO

[167] pseudogene

BBJ21

15514

15657

*

QUESTIONABLE

 

backwards inside BBJ21.1

BBJ21.1

15976

15484

*

 

PSEUDO

[138] fragment

BBJ22

16003

15905

 

QUESTIONABLE

 

overlaps BBJ21.1 in part

BBJ30

23018

23119

SHORT

   

no homolog outside of Borrelia

BBJ32

24517

24389

SHORT

   

no homolog outside of Borrelia

BBJ33

24681

24791

SHORT

   

no homolog outside of Borrelia

BBJ35

26858

26959

SHORT

   

no homolog outside of Borrelia

BBJ37

28442

28281

SHORT

   

no homolog outside of Borrelia

BBJ38

28703

28608

SHORT

   

no homolog outside of Borrelia

BBJ39

29401

29267

SHORT

   

no homolog outside of Borrelia

BBJ39.1

29680

29600

   

PSEUDO

[54] fragment

BBJ40

29827

29919

*

QUESTIONABLE

 

paralog not called between BBJ41 and BBJ42

BBJ42

31133

30945

SHORT

 

?

possible [54] fragment (similarity not great; not in TIGR paralog list)

BBJ44

32150

32251

SHORT

   

no homolog outside of Borrelia

BBJ45.1

33652

33522

*

 

PSEUDO

[173] authentic frameshift

BBJ46

34668

34375

SHORT

   

no homolog outside of Borrelia

BBJ49

36455

36315

*

 

PSEUDO

[92] fragment

BBJ50

36502

37203

   

PSEUDO

authentic point mutation (see annotated gene list)

BBJ51

38522

37382

   

PSEUDO

[170] fragment/authentic frameshift

             

lp54

           

BBA02

1238

1122

SHORT

   

no homolog outside of Borrelia

BBA06

4342

4226

SHORT

   

no homolog outside of Borrelia

BBA12

8202

8378

SHORT

   

no homolog outside of Borrelia

BBA17

11390

11301

SHORT

   

no homolog outside of Borrelia

BBA22

15084

15239

SHORT

   

no homolog outside of Borrelia

BBA26

17386

17514

SHORT

   

no homolog outside of Borrelia

BBA27

17563

17679

SHORT

   

no homolog outside of Borrelia

BBA28

17906

17757

SHORT

   

no homolog outside of Borrelia

BBA29

18019

17897

SHORT

   

no homolog outside of Borrelia

BBA32

20654

20845

SHORT

   

no homolog outside of Borrelia

BBA35

23391

23284

SHORT

   

no homolog outside of Borrelia

BBA49

32678

32893

SHORT

   

no homolog outside of Borrelia

BBA53

35890

36162

SHORT

   

no homolog outside of Borrelia

BBA54

36192

36467

SHORT

   

no homolog outside of Borrelia

BBA58

39566

39766

SHORT

   

no homolog outside of Borrelia

BBA59

40048

39812

SHORT

   

12 kd lipoprotein

BBA62

42203

42406

SHORT

   

7.5 kd lipoprotein

BBA63

42576

42454

SHORT

   

no homolog outside of Borrelia

BBA67

46021

46197

SHORT

   

no homolog outside of Borrelia

BBA70

49158

48523

   

PSEUDO

[54] fragment

BBA71

49796

49386

   

PSEUDO

[54] fragment

BBA72

50031

49792

SHORT

   

no homolog outside of Borrelia

BBA75

52591

52496

SHORT

   

no homolog outside of Borrelia

             

lp56

           

BBQ01

279

545

*

 

PSEUDO

[55] fragment? see paralog list

BBQ02

710

799

*

QUESTIONABLE

 

paralog not called in right end of lp28-4

BBQ04

1404

2265

   

PSEUDO

[44] authentic frameshift

BBQ10

6674

6339

   

PSEUDO

[51] fragment; result of cp32 integration

BBQ11

6586

6800

   

PSEUDO

[148] fragment; result of cp32 integration

BBQ16

9238

9624

   

PSEUDO

[108] authentic frameshift

BBQ21

12193

12426

SHORT

   

probably a real gene; has paralogs on other cp32s

BBQ30

18713

18913

SHORT

   

probably a real gene; has paralogs on other cp32s

BBQ36

21424

21513

*

QUESTIONABLE

 

paralog not called in paralogous sequence elsewhere

BBQ46

29559

29651

*

QUESTIONABLE

 

paralog not called in paralogous sequence elsewhere

BBQ51

33873

35095

   

PSEUDO

[146] authentic frameshift

BBQ54

36389

37132

   

PSEUDO

[148] fragment; result of cp32 integration

BBQ55

37533

36933

   

PSEUDO

[51] fragment; result of cp32 integration

BBQ56

37809

37672

SHORT

   

no homolog outside of Borrelia

BBQ57

38223

37819

 

QUESTIONABLE

 

in frame and inside of BBQ60

BBQ58

38514

38419

*

QUESTIONABLE

 

in frame and inside of BBQ60

BBQ59

39077

38568

 

QUESTIONABLE

 

in frame and inside of BBQ60

BBQ60

39360

37817

   

PSEUDO

[101] fragment

BBQ61

39482

39360

*

QUESTIONABLE

 

in frame and inside of BBQ63

BBQ62

39531

39902

 

QUESTIONABLE

 

backwards and inside of BBQ63

BBQ63

39934

39400

   

PSEUDO

[117] fragment

BBQ64

40186

39962

*

QUESTIONABLE

 

inside of and in-frame with BBQ65

BBQ65

40218

39961

*

 

PSEUDO

[103] fragment

BBQ66

40317

40409

SHORT

   

no homolog outside of Borrelia

BBQ67

43732

40439

   

PSEUDO

putative adenine specific DNA methyltransferase (e.g., BBE29 [167]) fused to [102] - like sequences

BBQ68

44232

43918

 

QUESTIONABLE

 

in-frame and inside of BBQ69

BBQ69

44581

43769

   

PSEUDO

[138] authentic frameshift/fragment

BBQ70

44612

44511

*

QUESTIONABLE

 

in-frame in BBQ69 in part

BBQ71

44582

45263

   

PSEUDO

[105] fragment

BBQ72

45314

45427

SHORT

   

no homolog outside of Borrelia

BBQ73

45630

45530

*

 

PSEUDO

[60] fragment

BBQ74

46462

45804

   

PSEUDO

[60] fragment

BBQ75

46671

46781

*

 

PSEUDO

[168] fragment

BBQ76

46982

47077

SHORT

   

no homolog outside of Borrelia

BBQ77

47163

47279

*

 

PSEUDO

contains a [82] fragment which may be part of one deleted pseudogene that includes BBQ79

BBQ78

47295

47393

*

QUESTIONABLE

 

out of frame inside BBQ81

BBQ79

47273

47596

   

PSEUDO

[82] fragment

BBQ80

48626

47787

*

 

PSEUDO

[60] fragment

BBQ81

49246

49047

*

 

PSEUDO

[48] fragment

BBQ82

49538

49347

*

 

PSEUDO

[76] fragment

BBQ83

49868

49755

SHORT

   

no homolog outside of Borrelia

BBQ84

50550

50398

*

 

PSEUDO

[84] fragment (also [57] fragment?)

BBQ84.1

50900

50700

*

 

PSEUDO

[169] authentic frameshift

BBQ85

51528

51175

   

PSEUDO

[57] fragment

BBQ86

51823

51722

*

 

PSEUDO

[57] fragment

BBQ87

52067

51921

*

 

PSEUDO

[77] ([57] fragment?)

BBQ89

52726

52535

SHORT

   

no homolog outside of Borrelia

             

Large Chromosome pseudogenes (and short and questionable genes in rightmost 7.2 kbp)

BB0119

117763

116828

   

PSEUDO

single authentic frameshift

BB0843.1

903255

903415

*

 

PSEUDO

[32] fragment

BB0845

905120

905224

*

QUESTIONABLE

 

inside BB0845.1 and backwards

BB0845.1

905255

905025

*

 

PSEUDO

[76] pseudogene

BB0845.11

905395

905295

*

 

PSEUDO?

[166] fragment; possible pseudogene; this is a relatively poor homolog and is not included in the TIGR analysis

BB0845.2

905475

905775

*

 

PSEUDO

[105] fragment

BB0846

905865

905755

*

QUESTIONABLE

 

overlaps BB845.3 backwards

BB0847

905839

905943

SHORT

     

BB0848

905928

906029

SHORT

     

BB0848.1

906075

906275

*

 

PSEUDO

[82] fragment

BB0849

906162

906260

*

QUESTIONABLE

 

inside BB0848.1

BB0849.1

906725

906275

   

PSEUDO

[57] fragment

BB0849.2

907225

908225

   

PSEUDO

[1] fragment

BB0853

910175

909845

   

PSEUDO

[57] fragment

BB0853.1

910555

910375

*

 

PSEUDO

[57] fragment

 

Part V

Direct, Tandem Repeat Arrays of "Short" Sequences and "Long" Inverted Repeats in the B. burgdorferi B31 Plasmids

Sherwood Casjens, Granger Sutton and Brian Stevenson - February, 1999

This compilation was not done in a completely rigorous fashion, so there may be additional short tandem repeat arrays on the B31 plasmids. The roles of these repeats are unknown with the exception of the vlsE cassettes on lp28-1 (Zhang et al., 1997).

Plasmid

Unit length (bp)

# of repeats

bp in array

In gene? (location)

Paralogous gene family

Typical repeat unit sequence

             

lp17

21

8.3

175

no (13154)

--

TAATTAATATGTGATATAAAA

             

lp21

63

176.2

11,004

no (3618)

--

ATAAATCATATAAATAAATATTTCATAAATAATAAGTAAAAGTGGTTTAGTTTTGGAGTGTAT (see below)

             

lp28-1

33(A)

2.5

83

BBF03

[80]

tactactaagattgatactgttaaaagcgaact

lp28-1

33(B)

3.1

101

BBF03

[80]

ggaatccaataacaaagttcttttggaaaagct

lp28-1

~570

15

8800

BBF32

[170]

vlsE pseudogene cassettes

             

lp28-2

87(A)

3.1

272

BBG33

[80]

CAATCTTGTTACTAAGATTGATACTGTTAAAAGTGAACTTACTACTAAGATTGATAATGTAGAAAAGAATTTACAAAAGGATATATC

lp28-2

33(B)

4.4

145

BBG33

[80]

AAAGCTGGAAGCCAATAACAAACTTCTTTTGGA

             

lp28-3

54(A)

5.6

304

BBH13

[80]

tagaaaagaatttacaaaaagacatatttaatttagatgctaagatagattctg

             

lp28-4

27

21.5

661

BBI16

[60]

aacagaagaagagcttaagaaaaaaca

             

lp38

17

7.6

125

no (5938)

--

AATTGATATTAAAATAT

 

7

12.3

86

no (10287)

--

TAATAGT

             

lp54

11

7.1

78

no (20128)

--

TAAATCAATAT

             

cp32-3

54

3.6

194

BBS29

[80]

ATACCAAGATAGATAATGTAGAAAAGAATTTACAAAAAGATATATCTAATTTAG

             

cp32-3

87

4.1

354

BBS37

[80]

TTTAAACACTAAAATAGACAATGTTGAAAAGAATTTAAATCTAAAAATAGATAATTTAGACTCTAAAATAGATACTGTAGAAAAGAA

(note that this complex 87 bp repeat is actually made up of two parts, a 54 bp part and a 33 bp part that is a fragment of the 54 bp sequence).

Each of the cp32s has an "ortho-paralog" of both BBS29 and BBS37 (except cp32-1 and cp32-6, which have only an ortho-paralog of BBS37); these are not listed here. All of these do contain similar repeats. W. Zuckert has performed a more detailed analysis of the family 80 repeats and named these genes "bdr" for Borrelia direct repeat (Zuckert, Meyer & Barbour, 1999).

 

The 63 bp repeat tract on plasmid lp21

The lp21 "63 bp repeat tract" is a tandem array of 176 near repeats of a 63 bp sequence. The 63 bp sequence unit is actually made up of 34 different, very similar sequences (analysis by G. Sutton & S. Casjens). In the table below, the three columns represent: (1) a letter name given to each type of 63 bp repeat unit, (2) the number of that particular repeat unit present in the lp21 array, and (3) the sequence of that type of repeat unit.

a 28 --TGGAGTGTATAAATCATAAAAATAAATATTTTATAAAGAATAAGTAAAAGTGGTTTAGTTT

b 2 --TGGAGTGTATAAATCATAAAAATAAATATTTTATAAAGAATTAGTAAAAGTGGTTTAGTTT

c 1 --TGGAGTGTATAAATCATAAAAATAAATATTTTATAAATAATAAGTAAAAGTGGTTTAGTTT

d 14 --TGGAGTGTATAAATCATATAAATAAATATTTTATAAAGAATAAGTAAAAGTGGTTTAGTTT

e 1 --TGGAGTGTATAAATCATATAAATAAATATTTTATAAATAATAAGTAAAAGTGGTTTAGTTT

f 1 A--GGAGTGTATAAATCATAAAAATAAATATTTTATAAAGAATAAGTAAAAGTGGTTTAGTTT

g 1 A--GGAGTGTATAAATCATATAAATAAATATTTTATAAAGAATAAGTAAAAGTGGTTTAGTTT

h 5 ACTGGACTACATAAATCATAAAAATAAATATTTCATAAAGAATAAGTAAAAGTGGTTTAGTTT

i 6 ACTGGACTACATAAATCATAAAAATAAATATTTTATAAAGAATAAGTAAAAGTGGTTTAGTTT

j 1 ACTGGACTACATAAATCATAAAAATAAATATTTTATAAATAATAAGTAAAAGTGGTTTAGTTT

k 1 ACTGGACTACATAAATCATAAAAATAGATATTTTATAAAGAATAAGTAAAAGTGGTTTAGTTT

l 6 ACTGGACTACATAAATCATATAAATAAATATTTTATAAAGAATAAGTAAAAGTGGTTTAGTTT

m 5 ACTGGACTACATAAATCATATAAATAGATATTTTATAAAGAATAAGTAAAAGTGGTTTAGTTT

n 3 ACTGGAGTGTATAAATCATAAAAATAAATATTTCATAAAGAATAAGTAAAAGTGGTTTAGTTT

o 5 ACTGGAGTGTATAAATCATAAAAATAAATATTTCATAAATAATAAGTAAAAGTGGTTTAGTTT

p 1 ACTGGAGTGTATAAATCATAAAAATAAATATTTTATAAAGAATAAGTAAAAGTGGTGTAGTTT

q 38 ACTGGAGTGTATAAATCATAAAAATAAATATTTTATAAAGAATAAGTAAAAGTGGTTTAGTTT

r 1 ACTGGAGTGTATAAATCATAAAAATAAATATTTTATAAATAATAAGTAAAAGTGGTTTAGTTT

s 1 ACTGGAGTGTATAAATCATAAAAATAGATATTTTATAAAGAATAAGTAAAAGTGGTTTAGTTT

t 2 ACTGGAGTGTATAAATCATATAAATAAATATTTCATAAATAATAAGTAAAAGTGGTTTAGTTT

u 6 ACTGGAGTGTATAAATCATATAAATAAATATTTTATAAAGAATAAGTAAAAGTGGTTTAGTTT

v 2 AGTGGACTACATAAATCATAAAAATAAATATTTCATAAAGAATAAGTAAAAGTGGTTTAGTTT

w 14 AGTGGACTACATAAATCATAAAAATAAATATTTTATAAAGAATAAGTAAAAGTGGTTTAGTTT

x 1 AGTGGACTACATAAATCATAAAAATAAATATTTTATAAAGAATTAGTAAAAGTGGTTTAGTTT

y 3 AGTGGACTACATAAATCATAAAAATAAATATTTTATAAATAATAAGTAAAAGTGGTTTAGTTT

z 1 AGTGGACTACATAAATCATAAAAATAGATATTTTATAAAGAATAAGTAAAAGTGGTTTAGTTT

A 6 AGTGGACTACATAAATCATATAAATAAATATTTTATAAAGAATAAGTAAAAGTGGTTTAGTTT

B 1 AGTGGACTACATAAATCATATAAATAAATATTTTATAAATAATAAGTAAAAGTGGTTTAGTTT

C 3 AGTGGACTACATAAATCATATAAATAGATATTTTATAAAGAATAAGTAAAAGTGGTTTAGTTT

D 3 AGTGGACTACATAAATCATATAAATAGATATTTTATAAATAATAAGTAAAAGTGGTTTAGTTT

E 1 AGTGGAGTGTATAAATCATAAAAATAAATATTTCATAAAGAATAAGTAAAAGTGGTTTAGTTT

F 8 AGTGGAGTGTATAAATCATAAAAATAAATATTTTATAAAGAATAAGTAAAAGTGGTTTAGTTT

G 3 AGTGGAGTGTATAAATCATATAAATAAATATTTTATAAAGAATAAGTAAAAGTGGTTTAGTTT

H 1 CCTCGGGTACATAAATCATAAAAATAAATATTTCATAAAGAATAAGTAAAAGTGGTTTAGTTT
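The totals implied by this table can be checked with a short tally. The following is a minimal sketch in Python, assuming that the copy numbers are transcribed correctly from the table above and that the seven types written with a two-position gap ("--", types a through g) are 61 bp long while the rest are 63 bp; the names COUNTS, GAPPED and summarize are illustrative only.

```python
# Copy number of each repeat-unit type, transcribed from the table above.
COUNTS = {
    "a": 28, "b": 2, "c": 1, "d": 14, "e": 1, "f": 1, "g": 1,
    "h": 5, "i": 6, "j": 1, "k": 1, "l": 6, "m": 5, "n": 3, "o": 5,
    "p": 1, "q": 38, "r": 1, "s": 1, "t": 2, "u": 6, "v": 2, "w": 14,
    "x": 1, "y": 3, "z": 1, "A": 6, "B": 1, "C": 3, "D": 3, "E": 1,
    "F": 8, "G": 3, "H": 1,
}

# Assumption: types a-g carry a "--" gap in the alignment, so they are 61 bp.
GAPPED = set("abcdefg")


def unit_length(name):
    """Length in bp of one repeat unit of the given type."""
    return 61 if name in GAPPED else 63


def summarize(counts):
    """Tally types, units, 61 bp units and total bp in whole units."""
    return {
        "types": len(counts),
        "units": sum(counts.values()),
        "units_61bp": sum(n for name, n in counts.items() if name in GAPPED),
        "bp_in_whole_units": sum(unit_length(name) * n
                                 for name, n in counts.items()),
    }


if __name__ == "__main__":
    print(summarize(COUNTS))
```

Under these assumptions the whole units account for 10,992 bp, which is 12 bp short of the 11,004 bp array length given in the table in Part V; the difference presumably corresponds to the partial repeat copy mentioned in Part VII.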

 

The types of repeat units appear in the order given below. In the original document, single and double underlining mark the two major large perfect repeats of contiguous groups of 63 bp units within the array: units 2-19 are identical to units 129-146, and units 20-30 are identical to units 31-41 (see Part VII). These large perfect sequence repeats probably indicate recent duplications of sections of the array; some smaller pieces of these groups (e.g., "w, l, q, q") can also be found elsewhere in the array. The longest tandem group of perfect 63 bp repeats is three "q" repeats in a row.

H, t, b, d, q, C, q, A, a, d, a, i, q, a, q, w, l, q, q, q, a, o, D, h, h, A,

G, a, o, a, q, a, o, D, h, h, A, G, a, o, a, e, u, F, d, d, q, C, D, a, a, d,

F, w, l, q, q, n, d, d, x, n, n, l, a, q, i, l, w, a, w, d, a, d, F, u, a, o,

w, a, q, w, l, q, q, q, a, i, a, q, q, k, d, d, q, E, u, a, q, q, p, a, q, d,

F, G, q, q, u, F, F, q, u, j, F, s, w, q, q, u, g, f, F, w, q, A, q, v, t, b,

d, q, C, q, A, a, d, a, i, q, a, q, w, l, q, q, i, q, q, a, w, A, a, w, y, B,

a, y, y, q, z, r, m, h, w, a, m, w, a, m, w, m, v, i, m, c
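The order list can be examined programmatically. The sketch below, in Python, transcribes the list above and checks three statements made in this document: the array holds 176 units, the longest run of identical adjacent units is three "q" units, and the two large duplicated blocks described in Part VII (units 2-19 versus 129-146, and units 20-30 versus 31-41) are identical at the level of unit types. The names ORDER, block and longest_run are illustrative, and unit numbering is assumed to be 1-based from the first unit listed.

```python
from itertools import groupby

# Order of repeat-unit types in the lp21 array, transcribed from the list above.
ORDER = """
H t b d q C q A a d a i q a q w l q q q a o D h h A
G a o a q a o D h h A G a o a e u F d d q C D a a d
F w l q q n d d x n n l a q i l w a w d a d F u a o
w a q w l q q q a i a q q k d d q E u a q q p a q d
F G q q u F F q u j F s w q q u g f F w q A q v t b
d q C q A a d a i q a q w l q q i q q a w A a w y B
a y y q z r m h w a m w a m w m v i m c
""".split()


def block(first, last):
    """Return units first..last (1-based, inclusive)."""
    return ORDER[first - 1:last]


def longest_run(units):
    """Return (run length, unit type) for the longest run of identical units."""
    return max((len(list(group)), name) for name, group in groupby(units))


if __name__ == "__main__":
    print("units in array:", len(ORDER))
    print("units 2-19 == units 129-146:", block(2, 19) == block(129, 146))
    print("units 20-30 == units 31-41:", block(20, 30) == block(31, 41))
    print("longest run:", longest_run(ORDER))
```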

 

 

 

The cp32/cp9 ~180 bp Inverted Repeats

Each of the cp32s, the cp32-like portion of lp56 and cp9 carry a similar pair of ~180 bp inverted repeats. Each plasmid carries a left inverted repeat unit (IRL) and a right inverted repeat unit (IRR) which surround the putative plasmid partitioning gene cluster (Casjens et al., 2000).

        outside      inside                    inside           outside

ORF-4     --------------->    "partition genes"   <-----------------    ORF-8/7

                IRL                                   IRR
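For readers less familiar with the term, two sequences form an inverted repeat when one is the reverse complement of the other on the same strand. The following minimal Python sketch illustrates that test; the toy sequence is hypothetical and is not taken from any B31 plasmid, and the real IRL and IRR units are similar rather than identical, so an exact test like this one would need to be relaxed to tolerate mismatches.

```python
# Complement table for unambiguous DNA bases.
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq):
    """Return the reverse complement of an upper-case DNA sequence."""
    return seq.translate(COMPLEMENT)[::-1]


def is_perfect_inverted_repeat(left, right):
    """True if 'right' is exactly the reverse complement of 'left'."""
    return reverse_complement(left) == right


if __name__ == "__main__":
    left = "AAGCT"                      # hypothetical toy sequence
    right = reverse_complement(left)    # its inverted partner
    print(left, right, is_perfect_inverted_repeat(left, right))
```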

These inverted repeats are not all completely identical; in some locations all IRL’s are more like each other than they are like the IRR’s (e.g., some members in tier 1 below), and in other places IRL’s from one plasmid are more like IRR’s from the same plasmid than they are like IRL’s or IRR’s from other plasmids (some members in tier 3 below). Below, we indicate the presence of smaller inverted repeats within the larger repeat units as follows (---> <---).

Each of the inverted repeat units overlaps a putative gene such that the ATG marked with asterisks (***) is the most likely start codon. Each of these has a credible Shine-Dalgarno ribosome binding site marked by "xxxx"; a "====>" indicates the 5’-portion of the putative gene, which is translated outward from the repeat unit. A paralogous family 161 gene (previously called the ORF-4 family) starts within each IRL and a family 165 gene (previously called the ORF-8/7 family) starts within each IRR; curiously, these two (otherwise unrelated) protein families have very similar N-terminal sequences of 13-15 amino acids. In some, but not all, of these genes, the open reading frame extends 5’ of the marked ATG to the TTG marked with bullets (•••), but the ribosome binding site of that TTG is not as credible as that of the ATG described above.

The roles of these inverted repeats are unknown, but they are positioned such that one might expect an outward directed promoter to initiate transcription in each of them. It is thus possible, or even likely, because of the extreme similarity of these regions, that these outward-directed promoters might be coordinately controlled.

It is interesting to note that these inverted repeats do not surround similar "partitioning" gene clusters on the linear plasmids, where family 161 and 165 genes are not located adjacent to the "partitioning" gene cluster.

The inverted repeat sequences below are named as follows: each of the cp32’s is indicated simply by its cp32 number, lp56 is indicated by "56" and cp9 by "cp9". Note that cp32-5, which is not present in strain B31 MI but is present in some other B31 cultures, is included in this analysis.

 

Inside end

 

--——————————————————>............<————————————————

1-IRL

ACGGG..CTTAACTAATTTCTTTAGTAGATAATAGAGAATTTAGCTAAGC

3-IRL

ATGGG..CTTAGCTAAGTTCTTTAACA......AGAGAACGTAGCTAAGC

4-IRL

GTGGGAACTTGGCGAAATTCTTTTTAA......AGGGAATTTGGTTAAGT

5-IRL

TAGGG..ATTAACTAAGTTTTTTAGTAGATAATAGAAAATTTAGCTAAGC

6-IRL

GTGGGGACTTAACGAGATTCTTTAAGA......AAAGAGTTTGGTTAAGT

7-IRL

TTGGGGACTTAACGAGATTCTTTGAGA......AAAGAGTTTGGTTAAGT

8-IRL

ACGGG..CTTAACTAATTTCTTTAGTAGATAATAGAGAATTTAGCTAAGC

9-IRL

ATGGG..CTTAACTAAGTTCTTTAACA......AGAGAATTTAGCTAAGC

56-IRL

ACGGG..CTTAACTAATTTCTTTAGTAGATAATAGAGAATTTAGCTAAGC

cp9-IRL

TAGGG..CTTTACTAAGTTCTTTTAAA......AGAGAATTTAGCAAAGC

1-IRR

TTGGG..TTTAGCTAAGTTCTTTAACA......AGAGAATTTAGCTAAGC

3-IRR

TTGGG..TTTAGCTAAGTTCTTGGATA......AGAGAATTTAAATAAAC

4-IRR

TTGGG..TTTAGCTAAGTTCTTTAACA......AGAGAATTTAAATAAGC

6-IRR

TTGGG..TTTAGCTAAGTTCTTTAACA......AGAGAATTTAAATAAGC

7-IRR

TTGGG..TTTAGCTAAGTTCTCTAACA......AGAGAATTTAAATAAGC

8-IRR

TTGGG..TTTAGCTAAGTTCTTAGACA......AGAGAATTTAAATAAGC

9-IRR

TTGGG..TTTAGCTAAGTTCTTTAACA......AGAGAATTTAAATAAGT

56-IRR

TTGGG..TTTAGCTAAGTTCTTAGATA......AGAGAATTTAAATAAAC

cp9-IRR

TAGGG..CTTTTCTAAGTTCTTTTAAA......AGAGAATTTAGTAAAGC

   
 

————....————————————>..<———————————....————

1-IRL

CC....TATTTTTTTGTAAAATTTTTTGTAAAAAAG....TTGGCAAAAA

3-IRL

CCG...CACCTTTTTGTAAAGATTTTTGTAAAAAAG....TTGGCAAAAA

4-IRL

CCCAC.TTCTTTTGTGTAAAATTTTTTGTAAAAAAG....TTGGCAAAAA

5-IRL

CC....TAATTTTTTGTAAAAATTTTTGTAAAAAAG....TTGGCAAAAA

6-IRL

CCCAC.TTCTTTTTTGTAAAAATTTTTGTAAAAAGC....CTGACAAAAA

7-IRL

CCCAC.TTCTTTTTTACAAAAATTTTTGTAAAAAAG....TTGGCAAAAA

8-IRL

CC...TATTTTTTTTGTAAAAATTTTTGTAAAAAAG....TTGGCAAAAA

9-IRL

CCGCAC...TTTTTGTAAAAATTTTTTGTAAAGAAG....TTGGCAAAAA

56-IRL

CC...TAT.TTTTTTGTAAAAATTTTTGTAAAAAAG....TTGGCAAAAA

cp9-IRL

CCTA..AGTCTTTTAACAAAAATTTTTATTAAAAAA..AGTTGACAAAAA

1-IRR

CC...TAT.TTTTTTGTAAAATTTTTTGTAAAAAAG....TTGGCAAAAA

3-IRR

CCAACTAT.TTTTTTACAAAAATTTTTGTAAAAAAAAAAGTTGGCAAAAA

4-IRR

CCAACTA.ATTTTTTGTAAAATTTTTTGTAAAAAAG....TTGGCAAAAA

6-IRR

CCAACTAT.TTTTTTGTAAAAATTTTTGTAAAAAAG....TTGGCAAAAA

7-IRR

CCAACTAA.TTTTTTGTAAAATTTTTTGTAAAAAAG....TTGGCAAAAA

8-IRR

CCAACTATTTTTTTTGTAAAGATTTTTGTAAAAAAG....TTGGCAAAAA

9-IRR

CCAACTA.TTTTTTTGTAAAAATTTTTGTAAAAAAG....TTGTCAAAAA

56-IRR

CCAACTA.TTTTTTTGTAAAATTTTTTGTAAAAAAG...CCTGACAAAAA

cp9-IRb

CCTA..AGTGTTTTAATAAAAAATTTTTTGTAAAAA..AGTTGGCAAAAA

   
 

......•••..................................xxxx

1-IRL

TAGTTTTTGCTATATACTTATTTTTATAAATAACC..ATA...GGAGTAA

3-IRL

TAGTTTTTGCTATATATTTATTTTTAT..ACAAAT..ATAA..GGAGAAA

4-IRL

TAGTTTTTGCTATATAATTA..TTTATTACAA....AATAA..GGAGGAA

5-IRL

TAGTTTTTGCTATATACTTATATTTATTAATACCA..ATTAAAGGAGGAA

6-IRL

TAGTTTTTGCTATATACTTAT..TTATTACTAA....ATAA..AGGAGGA

7-IRL

TAGTTTTTGCTATATACTTATATTTATTACTAT....AAAA..GGAGTAA

8-IRL

TAGTTTTTGCTATATACTTATATTTATTAATACAA..ACAA..GGAGGAA

9-IRL

TAGTTTTTGCTATATACTTATATTTATTAATACAT..ATAAACGGAGGAA

56-IRL

TAGTTTTTGCTATATACTTATATTTATTGAAAA...AACA...GGAGGAA

cp9-IRa

TAGTTTTTGCTATATATTTATATATAAGAAAATTATAACTTACGGAGTAA

1-IRR

TAGTTTTTGCTATATACTTATTTTTATAAATAACC..ATA...GGAGTAA

3-IRR

TAGTTTTTGCTATATATTTATTTTTAT.ACAAAT...ATAA..GGAGAAA

4-IRR

TAGTTTTTGCTATATAATTAT..TTATTACAA....AATAA..GGAGGAA

6-IRR

TAGTTTTTGCTATATACTTAT..TTATTACTA....AATAA..AGGAGGA

7-IRR

TAGTTTTTGCTATATACTTATATTTATTACTATAA.AA.....GGAGTAA

8-IRR

TAGTTTTTGCTATATACTTATATTTATTAATACA..AACAA..GGAGGAA

9-IRR

TAGTTTTTGCTATGTAATTAT.....TTATTACAA.AATAA..GGAGGAA

56-IRR

TAGTTTTTGCTATATACTTATATTTTTTACTATAA.AA.....GGAGTAA

cp9-IRb

TAGTTTTTGCTATATATTTATATATAAGAAAATTATAACTTGCGGAGTAA

   
 

....***=============================>translation

1-IRL

AAAGATGGAAAATCTTTCAAACAATAAT...CAAGAAATACAAAATAATA

3-IRL

AAAGATGGAAAATCTTTCAAACAATAATAATC......CACAAGAAAATA

4-IRL

AAAGATGGAAAATCTTTCAAACAATAATAATC......CACAAGAAAATA

5-IRL

AAAGATGGAAAATCTTTCAAACAATAATAATCAAGAAATACAAAATAATA

6-IRL

AAACATGAACAATGTTTCAAACAATAATAATCAAGAAATACAAAATAATA

7-IRL

AAAGATGGAAAATCTTTCAAACAATAATAATCAAGAAATACAAAATAATA

8-IRL

AAAGATGGAAAATCTTTCAAACAATAAT...CAAGAAATACAAAATAATA

9-IRL

AAAGATGGAAAATCTTTAAAACAATAATAATC......CACAAGAAAATA

56-IRL

AAAGATGGAAAATCTTTCAAACAATAATAATCAAGAAATACAAAATAATA

cp9-IRa

AAAAATGAAAAACC..GCAAA.AACAATAATC......CACAAGAAATTA

1-IRR

AAAGATGGAAAATCTTTCAAACAATAATAATC......CACAAGAAAATA

3-IRR

AAAGATGGAAAATCTTTCAAACAATAATAATC......CACAAGAAAATA

4-IRR

AAAGATGGAAAATCTTTCAAACAATAATAATC......CACAAGAAAATA

6-IRR

AAACATGAACAATGTTTCAAACAATAATAATC......CACAAGACAATA

7-IRR

AAAGATGGAAAATCTTTCAAACAATAATAATC......CACAAGAAAATA

8-IRR

AAAGATGGAAAATCTTTCAAACAATAATAATC......CACAAGAAAATA

9-IRR

AAAGATGGAAAATCTTTCAAACAATAATAATC......CACAAGAAAATA

56-IRR

AAAGATGGAAAATCTTTCAAACAATAATAATC......CACAAGAAAATA

cp9-IRb

AAAAATGAAAAACT.....TA.AACAATAATC......CACAAAAAATTA

   
 

Outside end

1-IRL

TTCAA

3-IRL

TTCAA

4-IRL

TTCAA

5-IRL

TTCAA

6-IRL

TTCAA

7-IRL

TTCAA

8-IRL

TTCAA

9-IRL

TTCAA

56-IRL

TTCAA

cp9-IRa

ATCAA

1-IRR

TTCAA

3-IRR

TTCAA

4-IRR

TTCAA

6-IRR

TTCAA

7-IRR

TTCAA

8-IRR

TTCAA

9-IRR

TTCAA

56-IRR

TTCAA

cp9-IRb

ATCAA

 

 

Part VI

"AMBIGUOUS" NUCLEOTIDES IN THE BORRELIA BURGDORFERI B31 GENOME SEQUENCE

Compiled by Jeremy Peterson & Sherwood Casjens - January, 1999

A few nucleotide positions in the B. burgdorferi B31 genome sequence were determined to have sequencing template clones of two types in the DNA library. These are likely to represent heterogeneity in the culture that was sequenced and are listed below. They are indicated in the GENBANK sequences with ambiguous nucleotide symbols.

Standard ambiguous nucleotide nomenclature is used in the GENBANK entries and below:

R=A,G

Y=C,T

M=A,C

K=G,T

S=C,G

W=A,T

N=A,G,C,T
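When scanning the GENBANK sequences for these positions, the symbols listed above can be handled with a small lookup table. The following is a minimal sketch in Python covering only the seven symbols used in this document; the function names are illustrative, and positions are reported 1-based to match the lists below.

```python
# Ambiguity symbols used in this document, mapped to the bases they stand for.
AMBIGUITY = {
    "R": "AG",
    "Y": "CT",
    "M": "AC",
    "K": "GT",
    "S": "CG",
    "W": "AT",
    "N": "AGCT",
}


def expand(symbol):
    """Return the set of bases a symbol can represent (A, C, G, T map to themselves)."""
    return set(AMBIGUITY.get(symbol, symbol))


def find_ambiguous(seq):
    """Return (symbol, 1-based position) for every ambiguous symbol in seq."""
    return [(base, pos) for pos, base in enumerate(seq, start=1)
            if base in AMBIGUITY]


if __name__ == "__main__":
    print(sorted(expand("R")))        # bases represented by R
    print(find_ambiguous("ACRTN"))    # toy sequence, not from the genome
```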

Ambiguous nucleotides in the long chromosome

nucleotide, position

R, 18473

R, 62338

Y, 83386

M, 88493

R, 141298

R, 175458

Y, 188937

M, 194913

S, 198850

M, 210217

Y, 260019

K, 267641

R, 318524

W, 322559

M, 351692

Y, 367774

M, 390255

M, 390264

M, 390397

Y, 461147

R, 461510

Y, 478223

R, 511724

W, 540030

R, 540041

M, 540430

M, 540431

R, 546978

S, 552218

K, 565089

S, 586413

K, 608556

K, 658807

K, 658810

R, 691440

M, 758684

R, 762044

W, 781552

R, 796301

N, 804071

N, 804561

K, 834040

W, 889305

Ambiguous nucleotides in cp9

none

Ambiguous nucleotides in cp26

none

Ambiguous nucleotides in cp32-1

none

Ambiguous nucleotides in cp32-3

none

Ambiguous nucleotides in cp32-4

none

Ambiguous nucleotides in cp32-6

none

Ambiguous nucleotides in cp32-7

none

Ambiguous nucleotides in cp32-8

none

Ambiguous nucleotides in cp32-9

none

Ambiguous nucleotides in lp5

nucleotide, position

M, 4530

Ambiguous nucleotides in lp17

none

Ambiguous nucleotides in lp21

none

Ambiguous nucleotides in lp25

nucleotide, position

Y, 2440

M, 3601

Y, 6064

S, 21223

R, 21237

W, 21835

Y, 21837

Ambiguous nucleotides in lp28-1

none

Ambiguous nucleotides in lp28-2

none

Ambiguous nucleotides in lp28-3

nucleotide, position

M, 12885

W, 27274

Ambiguous nucleotides in lp28-4

none

Ambiguous nucleotides in lp36

none

Ambiguous nucleotides in lp38

nucleotide, position

M, 30313

Y, 30403

Ambiguous nucleotides in lp54

nucleotide, position

R, 10215

K, 10662

Ambiguous nucleotides in lp56

none

 

 

Part VII

B. burgdorferi B31 Plasmid Sequence Assembly

by Granger Sutton, Patti Rosa and Sherwood Casjens - February 1999

An improved version of TIGR ASSEMBLER was required to uniquely assemble the highly similar cp32s and lp56, as well as the very similar lp5 and lp21 plasmids. Very similar regions of DNA with only single base pair differences needed to be differentiated. In order to do this, unique base pairs within these repetitive regions needed to be identified. The new TIGR ASSEMBLER does this by counting the number of occurrences of each 32 bp oligomer (32mer) in the random sequence reads. 32mers which are not unique to a single plasmid or region will be over-represented. In determining the assembly, TIGR ASSEMBLER gives more weight to overlapping sequence reads which contain the same relatively under-represented 32mers, while giving much less weight to over-represented 32mers. In regions where there were few or no unique base pairs, clone mate constraints guided the assembly process. Clone mates are sequence reads from opposite ends of the same DNA clone. By using the known orientation of the clone mates and approximate size of the clone, TIGR ASSEMBLER chooses sequence read overlaps which satisfy the clone mate constraints. A new feature of TIGR ASSEMBLER which was essential for assembling the long tandem 63 bp repeat in lp21 is tandem 32mer masking. By default, any 32mer which occurs more than once in a single sequence read is not used for sequence overlap determination. This allowed small differences in the tandem repeat unit to determine correct overlaps while ignoring the myriad of possible non unique overlaps. The last important new feature of TIGR ASSEMBLER is the ability to initialize the assembly process with a previous set of assemblies from TIGR ASSEMBLER. The new TIGR ASSEMBLER did not achieve the final assemblies of the cp32s/lp56 or lp21 on the first iteration. The assemblies were inspected and, when determined to be incorrect, an assembly or portion of an assembly was discarded. 
The remaining assemblies and portion of assemblies were used to jump start TIGR ASSEMBLER and the process was repeated with different parameters until unique, internally consistent assemblies were achieved. Internal consistency was judged from clone mate constraints and base pair differences between overlapping sequence reads. This process was carried out without using information from the restriction maps of the plasmids that are described in the text.
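The 32mer counting and tandem masking described above can be illustrated with a toy example. The following Python sketch is not the TIGR ASSEMBLER code, which is not reproduced in this document; it only shows the general idea under stated assumptions. The function names, the use of a small k for readability, and the 1/count weighting of shared k-mers are all assumptions made for illustration.

```python
from collections import Counter


def kmers(read, k):
    """All overlapping k-mers of a read, in order."""
    return [read[i:i + k] for i in range(len(read) - k + 1)]


def usable_kmers(read, k):
    """Tandem masking: keep only k-mers that occur exactly once in this read."""
    per_read = Counter(kmers(read, k))
    return {kmer for kmer, n in per_read.items() if n == 1}


def kmer_read_counts(reads, k):
    """Number of reads in which each usable k-mer appears."""
    totals = Counter()
    for read in reads:
        totals.update(usable_kmers(read, k))
    return totals


def overlap_weight(read_a, read_b, totals, k):
    """Score a candidate overlap; rarer shared k-mers contribute more.

    The 1/count weighting is an assumption chosen for illustration.
    """
    shared = usable_kmers(read_a, k) & usable_kmers(read_b, k)
    return sum(1.0 / totals[kmer] for kmer in shared)


if __name__ == "__main__":
    reads = ["GATTACAGG", "TTACAGGCC", "CCCGATTAC"]  # hypothetical reads
    totals = kmer_read_counts(reads, k=5)
    print(overlap_weight(reads[0], reads[1], totals, k=5))
    print(sorted(usable_kmers("ACGTACGTTT", 4)))  # "ACGT" is masked
```

In this toy setting a k-mer shared by many reads contributes little to any one overlap, while a k-mer seen in only a few reads contributes more, which mirrors the preference for relatively under-represented 32mers described in the text.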

Because of the difficulties encountered in the sequence assembly process (due to the extensive similarities among the plasmids), it was necessary to confirm the accuracy of the assembly of the plasmid sequences in B31. Restriction maps of five of the cp32 plasmids and of cp26 from B31 have been previously described in Casjens et al. (1997b) and Tilly et al. (1997), respectively. We therefore screened the pUC plasmid library clones used in the sequencing project (Fraser et al., 1997) for ones that hybridized uniquely or nearly so to only one of the remaining B31 linear plasmids in Southern analysis of CHEF gels of intact and of restricted B31 MI DNA, and these were used to construct restriction site maps of the cognate plasmids according to the strategy described by Casjens et al. (1997b). The structures of cp9 and lp17 were not confirmed in this way, since (i) they assembled unambiguously even with the original, less stringent TIGR ASSEMBLER, (ii) Barbour et al. (1996) previously reported the sequence of B31 lp17, and (iii) Dunn et al. (1994) previously reported the sequence of a cp9-like plasmid from a related isolate. Our sequence assemblies agree with those sequences. For example, for lp56, all 23 mapped sites were correctly predicted within experimental error by the assembled sequence. In total, 344 restriction enzyme cleavage sites were mapped on the 19 plasmids, and all are correctly predicted by the sequence. In addition, no sites were found in the sequence that were not in the plasmid DNAs and vice versa (six differences between predictions made by the sequence and the previously published cp32-1, cp32-3, cp32-4 and cp32-6 restriction maps were found by further experimentation to be mapping errors).

Assembly of the sequences of the cp32s and the closely related portion of lp56 was particularly difficult. Nonetheless, the assemblies are likely to be correct, since all of their restriction maps are predicted perfectly by the nucleotide sequences, which were assembled without knowledge of the restriction maps. All eight previously mapped restriction sites (or lack thereof) that are diagnostic of a single cp32 are present in the assembled sequence on the correct cp32, and all of the 19 blocks of sequence that had been previously mapped to individual cp32s (Casjens et al., 1997; Stevenson et al., 1998b; Zuckert and Meyer, 1996) are present in the correct cp32 at the experimentally determined location. (We note that the pOMB25 sequence that was attributed without mapping data to cp32-1 (Zuckert and Meyer, 1996) is actually in the cp32-3 sequence.) Although we feel it is unlikely, it remains possible that small regions of similar sequences could be placed on the wrong cp32 by our assembly technique.

Assembly of the lp21 sequence had a special problem in that it contains a long tract of a 63 bp direct repeat. The lp21 sequence reported here contains 11,004 bp, or 176 (plus one partial) tandem copies of the repeat unit. There are no unique, unrelated sequences interspersed among the 63 bp repeats, and not all of the repeats are identical, as is indicated experimentally by the small number of Tsp509I sites within the tract. This non-identity made assembly from random sequencing runs possible. In the reported sequence there are 34 distinct repeat sequences; 27 of these types (128 total repeat units) are 63 bp long and 7 types (48 units) are 61 bp long. The maximum number of adjacent identical units is three, and there are two large exact repeats within the tract; units 2-19 are identical to units 129-146 and units 20-30 are identical to units 31-41. In order to experimentally characterize this repeat tract further, we used Southern analysis to measure the sizes of several restriction fragments that contain all the repeats. CHEF electrophoresis gels of B31 MI DNA cleaved with MseI, DraI, AseI, HindIII, EcoRI, StuI, BsrGI, XbaI and EcoO109I (all of which are predicted not to cleave within the repeat region) gave single DNA bands of 11, 13, 14, 13, 16, 17, 18, 16 and 18 kbp (all ±1 kb), respectively, that hybridize to a 63 bp repeat DNA probe. In addition, Tsp509I gave probe reactive fragments of about 3.0, 3.5 and 4.3 kbp. MseI and Tsp509I cleave at TTAA and AATT, respectively, and so are expected to cleave the 71.8% A+T Borrelia DNA to an average fragment size of 60-70 bp; indeed the fragments mentioned above are by far the largest fragments (as observed by ethidium bromide staining) in complete digests of B31 DNA made by these two enzymes. The three Tsp509I fragments from genomic B31 DNA were gel purified and used as probes in Southern analyses of electrophoresis gels of uncleaved and MseI digested B31 DNA. 
All were found to hybridize only to lp21 DNA and to the 11 kbp MseI band (data not shown). We also confirmed predictions from the sequence that three other four-bp-recognizing restriction enzymes, AluI, RsaI and Sau3AI, do not cut the repeat region, and that the six-bp-recognizing SspI cuts the entire repeat region into very small (<300 bp) fragments. Calculations from the locations of the cleavage sites for the above nine enzymes in the surrounding unique sequence gave a repeat tract length value of 11.9±1.0 kbp. This is in reasonable agreement with the 11 kbp in the sequence, as are the predicted sizes of the Tsp509I repeat-containing bands (2.9, 3.6 and 4.5 kbp; cf. measured sizes above), and we conclude that the assembly of this repeat region is likely to be accurate. Thus the sequences of all twenty-one of the plasmids are strongly supported either by physical maps that are correctly predicted by the sequence or by independent sequence determinations.
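The expected average fragment size of 60-70 bp quoted above for MseI (TTAA) and Tsp509I (AATT) can be approximated with a simple calculation. The Python sketch below assumes a random sequence with independent bases, with A and T each at half of the 71.8% A+T fraction and G and C each at half of the remainder; this model and the function name are assumptions for illustration, not the method used by the authors.

```python
def expected_fragment_length(site, at_fraction):
    """Mean spacing (bp) between occurrences of 'site' in random DNA.

    Assumes independent bases with P(A) = P(T) = at_fraction / 2 and
    P(G) = P(C) = (1 - at_fraction) / 2.
    """
    p_at = at_fraction / 2.0
    p_gc = (1.0 - at_fraction) / 2.0
    p_site = 1.0
    for base in site:
        p_site *= p_at if base in "AT" else p_gc
    return 1.0 / p_site


if __name__ == "__main__":
    for name, site in [("MseI", "TTAA"), ("Tsp509I", "AATT")]:
        size = expected_fragment_length(site, 0.718)
        print(f"{name} ({site}): about {size:.0f} bp")
```

Under this model both enzymes give a mean spacing of roughly 60 bp, at the lower end of the range quoted in the text, so repeat-containing fragments of 3 to 11 kbp stand out clearly in complete digests.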

 

 

 

 

 

Part VIII

REFERENCES

Akins, D. R., Caimano, M. J., Yang, X., Cerna, F., Norgard, M. V., and Radolf, J. D. (1999) Molecular and Evolutionary Analysis of Borrelia burgdorferi 297 Circular Plasmid-Encoded Lipoproteins with OspE- and OspF-Like Leader Peptides. Infect Immun 67: 1526-1532.

Akins, D. R., Popova, T., Brusca, J., Goldberg, M. L., Li, M., Baker, S. C., and Norgard, M. V. (1993) Use of PhoA Gene Fusions and Anchored PCR to Identify and Express Borrelia burgdorferi Candidate Outer Membrane Proteins. GENBANK ACCESSION #: L31427.

Akins, D. R., Popova, T., Brusca, J., Goldberg, M. L., Li, M., Baker, S. C., and Norgard, M. V. (1995a) Use of PhoA Gene Fusions and Anchored PCR to Identify and Express Borrelia burgdorferi Candidate Outer Membrane Proteins. GENBANK ACCESSION #: L31423.

Akins, D. R., Porcella, S., Norgard, M. V., and Radolf, J. D. (1994) Borrelia burgdorferi protein p23. GENBANK ACCESSION #: L31616.

Akins, D. R., Porcella, S. F., Popova, T. G., Shevchenko, D., Baker, S. I., Li, M., Norgard, M. V., and Radolf, J. D. (1995b) Evidence for in vivo but not in vitro expression of a Borrelia burgdorferi outer surface protein F (OspF) homologue. Mol Microbiol 18: 507-20.

Altschul, S. F., Madden, T. L., Schaffer, A. A., Zhang, J., Zhang, Z., Miller, W., and Lipman, D. J. (1997) Gapped BLAST and PSI-BLAST: a new generation of protein database search programs. Nucleic Acids Res 25: 3389-402.

Amouriaux, P., Assous, M., Margarita, D., Baranton, G., and Saint Girons, I. (1993) Polymerase chain reaction with the 30-kb circular plasmid of Borrelia burgdorferi B31 as a target for detection of the Lyme borreliosis agents in cerebrospinal fluid. Res Microbiol 144: 211-9.

Balmelli, T., Valsangiacomo, C., Peter, O., and Piffaretti, J.-C. (1996) Borrelia garinii 70 kbp plasmid D6 protein gene. GENBANK ACCESSION #: U50840.

Bancroft, I., and Wolk, C. P. (1989) Characterization of an insertion sequence (IS891) of novel structure from the cyanobacterium Anabaena sp. strain M-131. J Bacteriol 171: 5949-54.

Barbour, A. G., and Carter, C. J. (1997) Putative transposase with translational frameshifting in the Lyme disease agent Borrelia burgdorderi. GENBANK ACCESSION #: U85588.

Barbour, A. G., Carter, C. J., Bundoc, V., and Hinnebusch, J. (1996) The nucleotide sequence of a linear plasmid of Borrelia burgdorferi reveals similarities to those of circular plasmids of other prokaryotes. J Bacteriol 178: 6635-9.

Barbour, A. G., Tessier, S. L., and Hayes, S. F. (1984) Variation in a major surface protein of Lyme disease spirochetes. Infect Immun 45: 94-100.

Barbour, A. G., Tessier, S. L., and Todd, W. J. (1983) Lyme disease spirochetes and ixodid tick spirochetes share a common surface antigenic determinant defined by a monoclonal antibody. Infect Immun 41: 795-804.

Bergstrom, S., Bundoc, V. G., and Barbour, A. G. (1989) Molecular analysis of linear plasmid-encoded major surface proteins, OspA and OspB, of the Lyme disease spirochaete Borrelia burgdorferi. Mol Microbiol 3: 479-86.

Bono, J. L., Tilly, K., Stevenson, B., Hogan, D., and Rosa, P. (1998) Oligopeptide permease in Borrelia burgdorferi: putative peptide-binding components encoded by both chromosomal and plasmid loci. Microbiology 144: 1033-44.

Brandt, M. E., Riley, B. S., Radolf, J. D., and Norgard, M. V. (1990) Immunogenic integral membrane proteins of Borrelia burgdorferi are lipoproteins. Infect Immun 58: 983-91.

Bunikis, J., Olsen, B., Fingerle, V., Bonnedahl, J., Wilske, B., and Bergstrom, S. (1996) Molecular polymorphism of the lyme disease agent Borrelia garinii in northern Europe is influenced by a novel enzootic Borrelia focus in the North Atlantic. J Clin Microbiol 34: 364-8.

Caporale, D. A., and Kocher, T. D. (1994) Sequence variation in the outer-surface-protein genes of Borrelia burgdorferi. Mol Biol Evol 11: 51-64.

Casjens, S., Delange, M., Ley, H. L., 3rd, Rosa, P., and Huang, W. M. (1995) Linear chromosomes of Lyme disease agent spirochetes: genetic diversity and conservation of gene order. J Bacteriol 177: 2769-80.

Casjens, S., Palmer, N., van Vugt, R., Huang, W. M., Stevenson, B., Rosa, P., Lathigra, R., Sutton, G., Peterson, J., Dodson, R., Haft, D., Hickey, E., Gwinn, M., White, O., and Fraser, C. (2000) A bacterial genome in flux: The twelve linear and nine circular extrachromosomal DNAs of an infectious isolate of the Lyme disease spirochete Borrelia burgdorferi. Molecular Microbiology 35: in press.

Casjens, S., van Vugt, R., Tilly, K., Rosa, P. A., and Stevenson, B. (1997) Homology throughout the multiple 32-kilobase circular plasmids present in Lyme disease spirochetes. J Bacteriol 179: 217-27.

Champion, C. I., Blanco, D. R., Skare, J. T., Haake, D. A., Giladi, M., Foley, D., Miller, J. N., and Lovett, M. A. (1994) A 9.0-kilobase-pair circular plasmid of Borrelia burgdorferi encodes an exported protein: evidence for expression only during infection. Infect Immun 62: 2653-61.

Donadio, S., and Staver, M. J. (1993) IS1136, an insertion element in the erythromycin gene cluster of Saccharopolyspora erythraea. Gene 126: 147-51.

Dunn, J. J., Buchstein, S. R., Butler, L. L., Fisenne, S., Polin, D. S., Lade, B. N., and Luft, B. J. (1994) Complete nucleotide sequence of a circular plasmid from the Lyme disease spirochete, Borrelia burgdorferi. J Bacteriol 176: 2706-17.

Feng, S., Das, S., Barthold, S. W., and Fikrig, E. (1996) Characterization of two genes, p11 and p5, on the Borrelia burgdorferi 49-kilobase linear plasmid. Biochim Biophys Acta 1307: 270-2.

Feng, S., Das, S., Lam, T., Flavell, R. A., and Fikrig, E. (1995) A 55-kilodalton antigen encoded by a gene on a Borrelia burgdorferi 49-kilobase plasmid is recognized by antibodies in sera from patients with Lyme disease. Infect Immun 63: 3459-66.

Fikrig, E., Barthold, S. W., Sun, W., Feng, W., Telford, S. R., 3rd, and Flavell, R. A. (1997) Borrelia burgdorferi P35 and P37 proteins, expressed in vivo, elicit protective immunity. Immunity 6: 531-9.

Fikrig, E., Chen, M., Barthold, S. W., Anguita, J., Feng, W., Telford, S. R., and Flavell, R. A. (1999) Borrelia burgdorferi erpT expression in the arthropod vector and murine host. Mol Microbiol 21: 281-90.

Fraser, C. M., Casjens, S., Huang, W. M., Sutton, G. G., Clayton, R., Lathigra, R., White, O., Ketchum, K. A., Dodson, R., Hickey, E. K., Gwinn, M., Dougherty, B., Tomb, J. F., Fleischmann, R. D., Richardson, D., Peterson, J., Kerlavage, A. R., Quackenbush, J., Salzberg, S., Hanson, M., van Vugt, R., Palmer, N., Adams, M. D., Gocayne, J., Venter, J. C., et al. (1997) Genomic sequence of a Lyme disease spirochaete, Borrelia burgdorferi. Nature 390: 580-6.

Fuchs, R., Jauris, S., Lottspeich, F., Preac-Mursic, V., Wilske, B., and Soutschek, E. (1992) Molecular analysis and expression of a Borrelia burgdorferi gene encoding a 22 kDa protein (pC) in Escherichia coli. Mol Microbiol 6: 503-9.

Gilmore, R. D., Jr., Kappel, K. J., and Johnson, B. J. (1997) Molecular characterization of a 35-kilodalton protein of Borrelia burgdorferi, an antigen of diagnostic importance in early Lyme disease. J Clin Microbiol 35: 86-91.

Gilmore, R. D., Jr., and Mbow, M. L. (1998) A monoclonal antibody generated by antigen inoculation via tick bite is reactive to the Borrelia burgdorferi Rev protein, a member of the 2.9 gene family locus. Infect Immun 66: 980-6.

Guina, T., and Oliver, D. B. (1997) Cloning and analysis of a Borrelia burgdorferi membrane-interactive protein exhibiting haemolytic activity. Mol Microbiol 24: 1201-13.

Gulig, P. A., Caldwell, A. L., and Chiodo, V. A. (1992) Identification, genetic analysis and DNA sequence of a 7.8-kb virulence region of the Salmonella typhimurium virulence plasmid. Mol Microbiol 6: 1395-411.

Guo, B. P., Brown, E. L., Dorward, D. W., Rosenberg, L. C., and Hook, M. (1998) Decorin-binding adhesins from Borrelia burgdorferi. Mol Microbiol 30: 711-23.

Hagman, K. E., Lahdenne, P., Popova, T. G., Porcella, S. F., Akins, D. R., Radolf, J. D., and Norgard, M. V. (1998) Decorin-binding protein of Borrelia burgdorferi is encoded within a two-gene operon and is protective in the murine model of Lyme borreliosis. Infect Immun 66: 2674-83.

Hanson, M. S., Cassatt, D. R., Guo, B. P., Patel, N. K., McCarthy, M. P., Dorward, D. W., and Hook, M. (1998) Active and passive immunity against Borrelia burgdorferi decorin binding protein A (DbpA) protects against infection. Infect Immun 66: 2143-53.

Hinnebusch, J., Bergstrom, S., and Barbour, A. G. (1990) Cloning and sequence analysis of linear plasmid telomeres of the bacterium Borrelia burgdorferi. Mol Microbiol 4: 811-20.

Indest, K. J., Ramamoorthy, R., Sole, M., Gilmore, R. D., Johnson, B. J., and Philipp, M. T. (1997) Cell-density-dependent expression of Borrelia burgdorferi lipoproteins in vitro. Infect Immun 65: 1165-71.

Jauris-Heipke, S., Fuchs, R., Motz, M., Preac-Mursic, V., Schwab, E., Soutschek, E., Will, G., and Wilske, B. (1993) Genetic heterogenity of the genes coding for the outer surface protein C (OspC) and the flagellin of Borrelia burgdorferi. Med Microbiol Immunol (Berl) 182: 37-50.

Jauris-Heipke, S., Liegl, G., Preac-Mursic, V., Rossler, D., Schwab, E., Soutschek, E., Will, G., and Wilske, B. (1995) Molecular analysis of genes encoding outer surface protein C (OspC) of Borrelia burgdorferi sensu lato: relationship to ospA genotype and evidence of lateral gene exchange of ospC. J Clin Microbiol 33: 1860-6.

Jonsson, M., Noppa, L., Barbour, A. G., and Bergstrom, S. (1992) Heterogeneity of outer membrane proteins in Borrelia burgdorferi: comparison of osp operons of three isolates of different geographic origins. Infect Immun 60: 1845-53.

Katona, L. I., Beck, G., and Habicht, G. S. (1992) Purification and immunological characterization of a major low-molecular-weight lipoprotein from Borrelia burgdorferi. Infect Immun 60: 4995-5003.

Krause, M., Harwood, J., Fierer, J., and Guiney, D. (1991) Genetic analysis of homology between the virulence plasmids of Salmonella dublin and Yersinia pseudotuberculosis. Infect Immun 59: 1860-3.

Lahdenne, P., Porcella, S. F., Hagman, K. E., Akins, D. R., Popova, T. G., Cox, D. L., Katona, L. I., Radolf, J. D., and Norgard, M. V. (1997) Molecular characterization of a 6.6-kilodalton Borrelia burgdorferi outer membrane-associated lipoprotein (lp6.6) which appears to be downregulated during mammalian infection. Infect Immun 65: 412-21.

Lam, T. T., Nguyen, T. P., Montgomery, R. R., Kantor, F. S., Fikrig, E., and Flavell, R. A. (1994) Outer surface proteins E and F of Borrelia burgdorferi, the agent of Lyme disease. Infect Immun 62: 290-8.

Li, H., Dunn, J. J., Luft, B. J., and Lawson, C. L. (1997) Crystal structure of Lyme disease antigen outer surface protein A complexed with an Fab. Proc Natl Acad Sci U S A 94: 3584-9.

Marconi, R. T., Casjens, S., Munderloh, U. G., and Samuels, D. S. (1996a) Analysis of linear plasmid dimers in Borrelia burgdorferi sensu lato isolates: implications concerning the potential mechanism of linear plasmid replication. J Bacteriol 178: 3357-61.

Marconi, R. T., Konkel, M. E., and Garon, C. F. (1993a) Variability of osp genes and gene products among species of Lyme disease spirochetes. Infect Immun 61: 2611-7.

Marconi, R. T., Samuels, D. S., and Garon, C. F. (1993b) Transcriptional analyses and mapping of the ospC gene in Lyme disease spirochetes. J Bacteriol 175: 926-32.

Marconi, R. T., Samuels, D. S., Landry, R. K., and Garon, C. F. (1994) Analysis of the distribution and molecular heterogeneity of the ospD gene among the Lyme disease spirochetes: evidence for lateral gene exchange. J Bacteriol 176: 4572-82.

Marconi, R. T., Samuels, D. S., Schwan, T. G., and Garon, C. F. (1993c) Identification of a protein in several Borrelia species which is related to OspC of the Lyme disease spirochetes. J Clin Microbiol 31: 2577-83.

Marconi, R. T., Sung, S. Y., Hughes, C. A., and Carlyon, J. A. (1996b) Molecular and evolutionary analyses of a variable series of genes in Borrelia burgdorferi that are related to ospE and ospF, constitute a gene family, and share a common upstream homology box. J Bacteriol 178: 5615-26.

Margolis, N., Hogan, D., Cieplak, W., Jr., Schwan, T. G., and Rosa, P. A. (1994a) Homology between Borrelia burgdorferi OspC and members of the family of Borrelia hermsii variable major proteins. Gene 143: 105-10.

Margolis, N., Hogan, D., Tilly, K., and Rosa, P. A. (1994b) Plasmid location of Borrelia purine biosynthesis gene homologs. J Bacteriol 176: 6427-32.

Masuzawa, T., Komikado, T., and Yanagihara, Y. (1997) PCR-restriction fragment length polymorphism analysis of the ospC gene for detection of mixed culture and for epidemiological typing of Borrelia burgdorferi sensu stricto. Clin Diagn Lab Immunol 4: 60-3.

Mathiesen, D. A., Oliver, J. H., Jr., Kolbert, C. P., Tullson, E. D., Johnson, B. J., Campbell, G. L., Mitchell, P. D., Reed, K. D., Telford, S. R., 3rd, Anderson, J. F., Lane, R. S., and Persing, D. H. (1997) Genetic heterogeneity of Borrelia burgdorferi in the United States. J Infect Dis 175: 98-107.

McGrath, B. C., Dunn, J. J., Buchstein, S. R., and Luft, B. J. (1997) Adaptation of RARE cleavage for mapping genes in Borrelia burgdorferi. GENBANK ACCESSION #: U22451.

McGrath, B. C., Dunn, J. J., Gorgone, G., Guttman, D., Dykhuizen, D., and Luft, B. J. (1995) Identification of an immunologically important hypervariable domain of major outer surface protein A of Borrelia burgdorferi [published erratum appears in Infect Immun 1995 Jun;63(6):2390]. Infect Immun 63: 1356-61.

Murai, N., Kamata, H., Nagashima, Y., Yagisawa, H., and Hirata, H. (1995) A novel insertion sequence (IS)-like element of the thermophilic bacterium PS3 promotes expression of the alanine carrier protein-encoding gene. Gene 163: 103-7.

Nevill-Manning, C. G., Wu, T. D., and Brutlag, D. L. (1998) Highly specific protein sequence motifs for genome analysis. Proc Natl Acad Sci U S A 95: 5865-71.

Norris, S. J., Carter, C. J., Howell, J. K., and Barbour, A. G. (1992) Low-passage-associated proteins of Borrelia burgdorferi B31: characterization and molecular cloning of OspD, a surface-exposed, plasmid-encoded lipoprotein. Infect Immun 60: 4662-72.

Porcella, S. F., Popova, T. G., Akins, D. R., Li, M., Radolf, J. D., and Norgard, M. V. (1996) Borrelia burgdorferi supercoiled plasmids encode multicopy tandem open reading frames and a lipoprotein gene family. J Bacteriol 178: 3293-307.

Probert, W., and Johnson, B. (1998) Identification of a 47 kDa fibronectin-binding protein expressed by Borrelia burgdorferi isolate B31. Mol Microbiol 30: 1003-15.

Reindl, M., Redl, B., and Stoffler, G. (1993) Isolation and analysis of a linear plasmid-located gene of Borrelia burgdorferi B29 encoding a 27 kDa surface lipoprotein (P27) and its overexpression in Escherichia coli. Mol Microbiol 8: 1115-24.

Rosa, P. A., Schwan, T., and Hogan, D. (1992) Recombination between genes encoding major outer surface proteins A and B of Borrelia burgdorferi. Mol Microbiol 6: 3031-40.

Samuels, D. S., Marconi, R. T., and Garon, C. F. (1993) Variation in the size of the ospA-containing linear plasmid, but not the linear chromosome, among the three Borrelia species associated with Lyme disease. J Gen Microbiol 139: 2445-9.

Schwan, T. G., Piesman, J., Golde, W. T., Dolan, M. C., and Rosa, P. A. (1995) Induction of an outer surface protein on Borrelia burgdorferi during tick feeding. Proc Natl Acad Sci U S A 92: 2909-13.

Skare, J. T., Champion, C. I., Mirzabekov, T. A., Shang, E. S., Blanco, D. R., Erdjument-Bromage, H., Tempst, P., Kagan, B. L., Miller, J. N., and Lovett, M. A. (1996) Porin activity of the native and recombinant outer membrane protein Oms28 of Borrelia burgdorferi. J Bacteriol 178: 4909-18.

Stevenson, B., and Barthold, S. W. (1994) Expression and sequence of outer surface protein C among North American isolates of Borrelia burgdorferi. FEMS Microbiol Lett 124: 367-72.

Stevenson, B., Bockenstedt, L. K., and Barthold, S. W. (1994) Expression and gene sequence of outer surface protein C of Borrelia burgdorferi reisolated from chronically infected mice. Infect Immun 62: 3568-71.

Stevenson, B., Bono, J. L., Schwan, T. G., and Rosa, P. (1998a) Borrelia burgdorferi erp proteins are immunogenic in mammals infected by tick bite, and their synthesis is inducible in cultured bacteria. Infect Immun 66: 2648-54.

Stevenson, B., Casjens, S., and Rosa, P. (1998b) Evidence of past recombination events among the genes encoding the Erp antigens of Borrelia burgdorferi. Microbiology 144: 1869-79.

Stevenson, B., Casjens, S., van Vugt, R., Porcella, S. F., Tilly, K., Bono, J. L., and Rosa, P. (1997) Characterization of cp18, a naturally truncated member of the cp32 family of Borrelia burgdorferi plasmids. J Bacteriol 179: 4285-91.

Stevenson, B., Schwan, T. G., and Rosa, P. A. (1995) Temperature-related differential expression of antigens in the Lyme disease spirochete, Borrelia burgdorferi. Infect Immun 63: 4535-9.

Stevenson, B., Tilly, K., and Rosa, P. A. (1996) A family of genes located on four separate 32-kilobase circular plasmids in Borrelia burgdorferi B31. J Bacteriol 178: 3508-16.

Suk, K., Das, S., Sun, W., Jwang, B., Barthold, S. W., Flavell, R. A., and Fikrig, E. (1995) Borrelia burgdorferi genes selectively expressed in the infected host. Proc Natl Acad Sci U S A 92: 4269-73.

Sutcliffe, I., and Russell, R. (1995) Lipoproteins of Gram-positive bacteria. J Bacteriol 177: 1123-8.

Theisen, M. (1996) Molecular cloning and characterization of nlpH, encoding a novel, surface-exposed, polymorphic, plasmid-encoded 33-kilodalton lipoprotein of Borrelia afzelii. J Bacteriol 178: 6435-42.

Tilly, K., Casjens, S., Stevenson, B., Bono, J. L., Samuels, D. S., Hogan, D., and Rosa, P. (1997) The Borrelia burgdorferi circular plasmid cp26: conservation of plasmid structure and targeted inactivation of the ospC gene. Mol Microbiol 25: 361-73.

Wallich, R., Brenner, C., Kramer, M. D., and Simon, M. M. (1995) Molecular cloning and immunological characterization of a novel linear-plasmid-encoded gene, pG, of Borrelia burgdorferi expressed only in vivo. Infect Immun 63: 3327-35.

Wallich, R., Helmes, C., Schaible, U. E., Lobet, Y., Moter, S. E., Kramer, M. D., and Simon, M. M. (1992) Evaluation of genetic divergence among Borrelia burgdorferi isolates by use of OspA, fla, HSP60, and HSP70 gene probes. Infect Immun 60: 4856-66.

Wallich, R., Schaible, U. E., Simon, M. M., Heiberger, A., and Kramer, M. D. (1989) Cloning and sequencing of the gene encoding the outer surface protein A (OspA) of a European Borrelia burgdorferi isolate. Nucleic Acids Res 17: 8864.

Wang, I., Dykhuizen, D., Qiu, W., Dunn, J., Bosler, E., and Luft, B. (1999) Genetic diversity of ospC in a local population of Borrelia burgdorferi sensu stricto. Genetics 151: 15-30.

Wang, J., Masuzawa, T., Komikado, T., and Yanagihara, Y. (1997a) Consensus sequence on the genes encoding the major outer surface proteins (OspA and OspB) of Borrelia garinii isolate. Microbiol Immunol 41: 83-91.

Wang, J., Masuzawa, T., Li, M., and Yanagihara, Y. (1997b) Deletion in the genes encoding outer surface proteins OspA and OspB of Borrelia garinii isolated from patients in Japan. Microbiol Immunol 41: 673-9.

Wang, J., Masuzawa, T., Li, M., and Yanagihara, Y. (1997c) An unusual illegitimate recombination occurs in the linear-plasmid-encoded outer-surface protein A gene of Borrelia afzelii. Microbiology 143: 3819-25.

Will, G., Jauris-Heipke, S., Schwab, E., Busch, U., Rossler, D., Soutschek, E., Wilske, B., and Preac-Mursic, V. (1995) Sequence analysis of ospA genes shows homogeneity within Borrelia burgdorferi sensu stricto and Borrelia afzelii strains but reveals major subgroups within the Borrelia garinii species. Med Microbiol Immunol (Berl) 184: 73-80.

Wilske, B., Busch, U., Eiffert, H., Fingerle, V., Pfister, H. W., Rossler, D., and Preac-Mursic, V. (1996a) Diversity of OspA and OspC among cerebrospinal fluid isolates of Borrelia burgdorferi sensu lato from patients with neuroborreliosis in Germany. Med Microbiol Immunol (Berl) 184: 195-201.

Wilske, B., Busch, U., Fingerle, V., Jauris-Heipke, S., Preac-Mursic, V., Rossler, D., and Will, G. (1996b) Immunological and molecular variability of OspA and OspC. Implications for Borrelia vaccine development. Infection 24: 208-12.

Wilske, B., Luft, B., Schubach, W. H., Zumstein, G., Jauris, S., Preac-Mursic, V., and Kramer, M. D. (1992) Molecular analysis of the outer surface protein A (OspA) of Borrelia burgdorferi for conserved and variable antibody binding domains. Med Microbiol Immunol 181: 191-207.

Wilske, B., Preac-Mursic, V., Jauris, S., Hofmann, A., Pradel, I., Soutschek, E., Schwab, E., Will, G., and Wanner, G. (1993) Immunological and molecular polymorphisms of OspC, an immunodominant major outer surface protein of Borrelia burgdorferi. Infect Immun 61: 2182-91.

Zhang, J. R., Hardham, J. M., Barbour, A. G., and Norris, S. J. (1997) Antigenic variation in Lyme disease borreliae by promiscuous recombination of VMP-like sequence cassettes. Cell 89: 275-85.

Zhang, J. R., and Norris, S. J. (1998a) Genetic variation of the Borrelia burgdorferi gene vlsE involves cassette-specific, segmental gene conversion. Infect Immun 66: 3698-704.

Zhang, J. R., and Norris, S. J. (1998b) Kinetics and in vivo induction of genetic variation of vlsE in Borrelia burgdorferi. Infect Immun 66: 3689-97.

Zhou, X., Cahoon, M., Rosa, P., and Hedstrom, L. (1997) Expression, purification, and characterization of inosine 5’-monophosphate dehydrogenase from Borrelia burgdorferi. J Biol Chem 272: 21977-81.

Zuckert, W. R., and Meyer, J. (1996) Circular and linear plasmids of Lyme disease spirochetes have extensive homology: characterization of a repeated DNA element. J Bacteriol 178: 2287-98.

Zuckert, W., Meyer, J., and Barbour, A. (1999) Comparative analysis and immunological characterization of the Borrelia Bdr protein family. Infect Immun 67: 3257-66.

Zumstein, G., Fuchs, R., Hofmann, A., Preac-Mursic, V., Soutschek, E., and Wilske, B. (1992) Genetic polymorphism of the gene encoding the outer surface protein A (OspA) of Borrelia burgdorferi. Med Microbiol Immunol 181: 57-70.