Interpreting the Eukaryotic GeneMark.hmm Output
GeneMark.hmm output contains the predicted protein-coding exon boundaries and the
predicted proteins. The output file is divided into three sections, as follows:
The Output Header
Each output generated by GeneMark.hmm has a header describing the parameters and
matrix used in the analysis. This information is purely for recordkeeping purposes.
Here's a sample header:
GeneMark.hmm (Version 1.0.0)
Sequence name: /test-human/humhbb/sequence-humhbb.txt
Sequence length: 73308 bp
G+C content: 39.46%
Matrices file: /test-sequences/human.mtx (Homo sapiens)
Fri May 14 16:27:47 1999
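For recordkeeping in scripts, the header can be read into a small dictionary. The following is a minimal sketch, assuming the header has the "Key: value" layout of the sample above; parse_header is a hypothetical helper and is not part of GeneMark.hmm.

```python
SAMPLE_HEADER = """\
GeneMark.hmm (Version 1.0.0)
Sequence name: /test-human/humhbb/sequence-humhbb.txt
Sequence length: 73308 bp
G+C content: 39.46%
Matrices file: /test-sequences/human.mtx (Homo sapiens)
Fri May 14 16:27:47 1999
"""


def parse_header(text):
    """Collect the 'Key: value' lines of a header into a dict (hypothetical helper)."""
    fields = {}
    for line in text.splitlines():
        # Split on the first ': ' only; lines without it (version, date) are skipped.
        key, sep, value = line.partition(": ")
        if sep:
            fields[key.strip()] = value.strip()
    # Convert the two numeric fields when they are present.
    if "Sequence length" in fields:
        fields["Sequence length"] = int(fields["Sequence length"].split()[0])
    if "G+C content" in fields:
        fields["G+C content"] = float(fields["G+C content"].rstrip("%"))
    return fields


header = parse_header(SAMPLE_HEADER)
print(header["Sequence length"], header["G+C content"])
```

The version and date lines carry no "Key: value" pair, so this sketch leaves them out; a fuller parser would need to handle them separately.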
The Predicted Exon Boundaries
This section describes the predicted exons. The 'Gene #' column is the sequential gene number. 'Exon #' refers to the order of the exon within the current
gene. 'Strand' indicates which DNA strand the gene was found on: '+' refers to the direct strand, '-' to the complementary strand. 'Exon Type' can be one of
four options: initial, internal, terminal, or single. The 'Exon Range' columns give the exon boundaries relative
to the beginning of the sequence (the 5' end of the direct strand). 'Exon Length' is the exon length in base pairs. 'Start/End Frame' indicates the codon
positions of the exon boundaries.
Predicted genes/exons
Gene  Exon  Strand  Exon      Exon Range    Exon    Start/End
   #     #          Type                    Length  Frame

   1     1    +     Terminal   6168   6449     282  1 3

   2     3    -     Terminal  13450  13528      79  3 3
   2     2    -     Internal  16097  16311     215  1 2
   2     1    -     Initial   16436  16468      33  1 3

   3     1    +     Initial   19541  19632      92  1 2
   3     2    +     Terminal  19755  20169     415  3 3

   4     1    +     Initial   34531  34622      92  1 2
   4     2    +     Internal  34745  34967     223  3 3
   4     3    +     Terminal  35854  35982     129  1 3

   5     1    +     Initial   39467  39558      92  1 2
   5     2    +     Internal  39681  39903     223  3 3
   5     3    +     Terminal  40770  40898     129  1 3

   6     1    +     Initial   45995  46144     150  1 3
   6     2    +     Internal  47314  47417     104  1 2
   6     3    +     Terminal  50485  50572      88  3 3

   7     1    +     Initial   54790  54881      92  1 2
   7     2    +     Internal  55010  55232     223  3 3
   7     3    +     Terminal  60474  60557      84  1 3

   8     1    +     Initial   62187  62278      92  1 2
   8     2    +     Internal  62409  62631     223  3 3
   8     3    +     Terminal  63482  63610     129  1 3

   9     1    +     Initial   68183  68396     214  1 1
   9     2    +     Terminal  68586  68746     161  2 3

  10     1    +     Single    68770  69078     309  1 3

  11     1    +     Single    70355  70819     465  1 3

  12     1    +     Initial   72905  73053     149  1 2

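The columns of the exon table are related by simple arithmetic, which can serve as a sanity check when post-processing the output. The sketch below assumes each exon row is available as nine whitespace-separated fields (gene, exon, strand, type, range start, range end, length, start frame, end frame); parse_exons and is_consistent are hypothetical helpers. The frame relation it tests is an observation that holds for the sample rows, not a documented rule of the program.

```python
from collections import defaultdict

# Rows for genes 2 and 3, copied from the sample table. Assumed layout:
# gene, exon, strand, type, range start, range end, length, start frame, end frame.
SAMPLE_ROWS = """\
2 3 - Terminal 13450 13528 79 3 3
2 2 - Internal 16097 16311 215 1 2
2 1 - Initial 16436 16468 33 1 3
3 1 + Initial 19541 19632 92 1 2
3 2 + Terminal 19755 20169 415 3 3
"""


def parse_exons(text):
    """Turn whitespace-separated exon rows into dicts (hypothetical helper)."""
    exons = []
    for line in text.splitlines():
        parts = line.split()
        # Skip blank lines, column headings, and anything that is not a data row.
        if len(parts) != 9 or not parts[0].isdigit():
            continue
        start, end, length, frame_start, frame_end = map(int, parts[4:])
        exons.append({
            "gene": int(parts[0]), "exon": int(parts[1]),
            "strand": parts[2], "type": parts[3],
            "start": start, "end": end, "length": length,
            "frame_start": frame_start, "frame_end": frame_end,
        })
    return exons


def is_consistent(exon):
    """Check the length and frame arithmetic observed in the sample rows."""
    length_ok = exon["end"] - exon["start"] + 1 == exon["length"]
    # Advance (length - 1) bases from the starting codon position, cycling 1..3.
    expected_end = (exon["frame_start"] - 1 + exon["length"] - 1) % 3 + 1
    return length_ok and expected_end == exon["frame_end"]


exons = parse_exons(SAMPLE_ROWS)
coding_length = defaultdict(int)
for exon in exons:
    coding_length[exon["gene"]] += exon["length"]

print(all(is_consistent(e) for e in exons))  # True
print(dict(coding_length))                   # {2: 327, 3: 507}
```

For gene 3 the exon lengths sum to 507 bp, that is 169 codons, which is consistent with the 168_aa protein reported for that gene plus a stop codon. This reading is an inference from the sample numbers.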
The Predicted Protein Sequences
Each sequence has a FASTA-style header line which contains the sequence file name, the
program name, the gene number, and the number of amino acids, separated by '|'.
>sequence-humhbb.txt|GeneMark.hmm|gene 1|93_aa
NHQVVRLGCRPSSATSEDSVFSTAKHKLRYCGCEKLEVDIPALWPLLLTFTSWRLEVVVQ
ATVADHTSSTIIAFLQESLREKKVKKNLETTSE
>sequence-humhbb.txt|GeneMark.hmm|gene 2|108_aa
MKAVALPQNLNSMDTSLLLDSEYGVDSLLLPRSQLQSPHFLLSLLPMVPDLACIQGSDPF
HVSLWLKVVGRSFKKGYSIERPRLGMVIAGQRQKLCVDIDKSSDYAEL
>sequence-humhbb.txt|GeneMark.hmm|gene 3|168_aa
MVHFTAEEKAAVTSLWSKMNVEEAGGEALGRLLVVYPWTQRFFDSFGNLSSPSAILGNPK
VKAHGKKVLTSFGDAIKNMDNLKPAFAKLSELHCDKLHVDPENFKVSSGAGDVIFWLYIL
TLIEAHNLIGKTNKDLRNHGSSLMLEQQTSSEHNQNLHDSELVTVKDY
>sequence-humhbb.txt|GeneMark.hmm|gene 4|147_aa
MGHFTEEDKATITSLWGKVNVEDAGGETLGRLLVVYPWTQRFFDSFGNLSSASAIMGNPK
VKAHGKKVLTSLGDAIKHLDDLKGTFAQLSELHCDKLHVDPENFKLLGNVLVTVLAIHFG
KEFTPEVQASWQKMVTGVASALSSRYH
>sequence-humhbb.txt|GeneMark.hmm|gene 5|147_aa
MGHFTEEDKATITSLWGKVNVEDAGGETLGRLLVVYPWTQRFFDSFGNLSSASAIMGNPK
VKAHGKKVLTSLGDAIKHLDDLKGTFAQLSELHCDKLHVDPENFKLLGNVLVTVLAIHFG
KEFTPEVQASWQKMVTAVASALSSRYH
>sequence-humhbb.txt|GeneMark.hmm|gene 6|113_aa
MGNPKVKAHGKKVLISFGKAVMLTDDLKGTFATLSDLHCNKLHVDPENFLPKGRTISDGN
ENVGEWEFKDREDTFLQSCKKRENSQCLPLQNVHATERVRKPGKCQFLKYREH
>sequence-humhbb.txt|GeneMark.hmm|gene 7|132_aa
MVHLTPEEKTAVNALWGKVNVDAVGGEALGRLLVVYPWTQRFFESFGDLSSPDAVMGNPK
VKAHGKKVLGAFSDGLAHLDNLKGTFSQLSELHCDKLHVDPENFRIAIEEPNTFCVCENN
QSEIFSQVPDEG
>sequence-humhbb.txt|GeneMark.hmm|gene 8|147_aa
MVHLTPEEKSAVTALWGKVNVDEVGGEALGRLLVVYPWTQRFFESFGDLSTPDAVMGNPK
VKAHGKKVLGAFSDGLAHLDNLKGTFATLSELHCDKLHVDPENFRLLGNVLVCVLAHHFG
KEFTPPVQAAYQKVVAGVANALAHKYH
>sequence-humhbb.txt|GeneMark.hmm|gene 9|124_aa
MEQSWAENDFDELREEGFRRSNYSKLKEEVRTNGKEVKNFEKKLDEWITRITNAQKSLKD
LMELKTKAGELPESDGENGTKLENTLQDIIQENFPNLARQPKFTFRKYRERHKDTPREKQ
LQDT
>sequence-humhbb.txt|GeneMark.hmm|gene 10|102_aa
MKEKMLRAAREKGRVTHKGKPIRLTADLSAETLQARRKWGPIFNIVKEKNFRPRISYPAK
LSFISIGEIKSFTDKQMLRDFVTTRPALQELLKEALNMERNN
>sequence-humhbb.txt|GeneMark.hmm|gene 11|154_aa
MTRGITTDPTEIQTTVREYYKHLYANKLENLEEMDKFLDTYTLPRLNQEEVVSLNRPITG
SEIEAIINSLSTKKSPGPVGFIAEFYQRYKEELVPFLLKLFQSIEKEGILPNSFYEASII
LIPKPDRDTTKKENVTPISLMNIDAKILNKILAN
>sequence-humhbb.txt|GeneMark.hmm|gene 12|49_aa
MDEAGNYHSQQTITRTINQTPHVLTHRWELNNENTWTHEEEHHTLGTVM
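
The protein section can be read back with a few lines of code. The following is a minimal sketch, assuming each header sits on a single line with four '|'-separated fields as in the records above; parse_proteins is a hypothetical helper, not part of GeneMark.hmm. It also compares the declared number of amino acids with the actual sequence length.

```python
# Two records copied from the sample output above.
SAMPLE_FASTA = """\
>sequence-humhbb.txt|GeneMark.hmm|gene 12|49_aa
MDEAGNYHSQQTITRTINQTPHVLTHRWELNNENTWTHEEEHHTLGTVM
>sequence-humhbb.txt|GeneMark.hmm|gene 10|102_aa
MKEKMLRAAREKGRVTHKGKPIRLTADLSAETLQARRKWGPIFNIVKEKNFRPRISYPAK
LSFISIGEIKSFTDKQMLRDFVTTRPALQELLKEALNMERNN
"""


def parse_proteins(text):
    """Parse the predicted protein records into dicts (hypothetical helper)."""
    records = []
    current = None
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            # Assumed header layout: file name | program | "gene N" | "M_aa".
            name, program, gene, size = line[1:].split("|")
            current = {
                "file": name,
                "program": program,
                "gene": int(gene.split()[1]),
                "declared_aa": int(size.split("_")[0]),
                "sequence": "",
            }
            records.append(current)
        elif current is not None:
            # Sequence lines are wrapped, so join them back together.
            current["sequence"] += line
    return records


records = parse_proteins(SAMPLE_FASTA)
for record in records:
    matches = len(record["sequence"]) == record["declared_aa"]
    print(record["gene"], record["declared_aa"], matches)
```

A mismatch between the declared and actual lengths would most likely point to a truncated or badly wrapped record rather than to a problem with the prediction itself.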