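I. LOCUS
In the GenBank format, a record begins with lines such as:
LOCUS NM_001469 2156 bp mRNA linear PRI 16-DEC-2004
DEFINITION Homo sapiens thyroid autoantigen 70kDa (Ku antigen) (G22P1), mRNA.
The LOCUS field contains a number of different data elements, including locus name, sequence length, molecule type, GenBank division, and modification date. In this example the locus name is NM_001469, the sequence length is 2156 bp, the molecule type is mRNA, "linear" is the molecule topology (as opposed to circular), PRI is the GenBank division for primate sequences, and the modification date is 16-DEC-2004.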
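As a rough illustration of how these elements sit on the line, the sketch below splits the example LOCUS line on whitespace. The function name `parse_locus` and the dictionary keys are made up for this note, and the code assumes exactly the eight-token layout of this example; real records can deviate from it, so treat this as a sketch rather than a general parser.

```python
def parse_locus(line):
    """Split a LOCUS line into named fields.

    Assumption: the line has exactly the eight whitespace-separated tokens
    seen in the example (LOCUS, name, length, 'bp', molecule type, topology,
    division, date). Other layouts are rejected rather than guessed at.
    """
    tokens = line.split()
    if len(tokens) != 8 or tokens[0] != "LOCUS" or tokens[3] != "bp":
        raise ValueError(f"unexpected LOCUS layout: {line!r}")
    return {
        "locus_name": tokens[1],
        "length_bp": int(tokens[2]),
        "molecule_type": tokens[4],
        "topology": tokens[5],
        "division": tokens[6],
        "modification_date": tokens[7],
    }


record = parse_locus("LOCUS NM_001469 2156 bp mRNA linear PRI 16-DEC-2004")
print(record["division"], record["length_bp"])  # prints: PRI 2156
```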
II. COMMENT
1. REVIEWED REFSEQ: describes how this RefSeq record was generated.
2. Summary: describes the function of the sequence.
III. Features
Definition: information about genes and gene products, as well as regions of biological significance reported in the sequence. These can include regions of the sequence that code for proteins and RNA molecules.
The sub-entries under Features are too numerous to cover here; when needed, look them up in The DDBJ/EMBL/GenBank Feature Table.
1. Key: the feature type (e.g., CDS), given in the first column; the second column is headed Location/Qualifiers.
2. complement: the complementary strand. If a feature is located on the complementary strand, the word "complement" will appear before the base span.
3. < and >: partial ends. If the "<" symbol precedes a base span, the sequence is partial on the 5' end (e.g., CDS <1..206). If the ">" symbol follows a base span, the sequence is partial on the 3' end (e.g., CDS 435..>915).
4. /db_xref: its value is a cross-reference to another database.
/db_xref="taxon:9606" links to Taxonomy (species classification).
/db_xref="GeneID:2547" links to Gene.
/db_xref="LocusID:2547" links to LocusLink.
/db_xref="MIM:152690" links to OMIM.
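The /db_xref values above all follow a database:identifier pattern, so they can be collected with a small regular expression. The sketch below is only an illustration: the names `DB_XREF` and `collect_db_xrefs` are hypothetical, and it assumes each qualifier sits on a single line in exactly the quoted form shown above.

```python
import re

# Assumption: every cross-reference looks like /db_xref="database:identifier".
DB_XREF = re.compile(r'/db_xref="([^":]+):([^"]+)"')


def collect_db_xrefs(lines):
    """Group db_xref identifiers by database name."""
    xrefs = {}
    for line in lines:
        for database, identifier in DB_XREF.findall(line):
            xrefs.setdefault(database, []).append(identifier)
    return xrefs


qualifier_lines = [
    '/db_xref="taxon:9606"',
    '/db_xref="GeneID:2547"',
    '/db_xref="LocusID:2547"',
    '/db_xref="MIM:152690"',
]
xrefs = collect_db_xrefs(qualifier_lines)
print(xrefs["taxon"], xrefs["MIM"])  # prints: ['9606'] ['152690']
```

Identifiers are kept as strings and stored in lists, since a feature may carry more than one cross-reference to the same database.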
IV. Two examples:
Key             Location/Qualifiers
CDS             23..400
                /product="alcohol dehydrogenase"
                /gene="adhI"
might be read as:
The feature CDS is a coding sequence beginning at base 23 and ending at base 400, has a product called 'alcohol dehydrogenase' and is coded for by a gene called 'adhI'.
A more complex description:
Key             Location/Qualifiers
CDS             join(544..589,688..>1032)
                /product="T-cell receptor beta-chain"
which might be read as:
This feature, which is a partial coding sequence, is formed by joining the elements indicated to form one contiguous sequence encoding a product called T-cell receptor beta-chain.
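The way these two locations are read can be sketched in code. The example below handles only the forms mentioned in these notes: a single base span, the "<" and ">" partial markers, join(...), and an outer complement(...). The names `parse_location` and `covered_length` are hypothetical, and the full feature table syntax has more operators than this sketch supports.

```python
import re

# One base span such as 23..400, <1..206 or 688..>1032.
SPAN = re.compile(r"^(<?)(\d+)\.\.(>?)(\d+)$")


def parse_location(location):
    """Parse a simple feature location into a strand and a list of spans."""
    strand = "+"
    if location.startswith("complement(") and location.endswith(")"):
        strand = "-"  # feature lies on the complementary strand
        location = location[len("complement("):-1]
    if location.startswith("join(") and location.endswith(")"):
        location = location[len("join("):-1]
    spans = []
    for part in location.split(","):
        match = SPAN.match(part.strip())
        if match is None:
            raise ValueError(f"unsupported location element: {part!r}")
        lt, start, gt, end = match.groups()
        spans.append({
            "start": int(start),
            "end": int(end),
            "partial_start": lt == "<",  # "<" before the first base
            "partial_end": gt == ">",    # ">" before the last base
        })
    return {"strand": strand, "spans": spans}


def covered_length(parsed):
    """Number of bases covered by all spans (both endpoints are inclusive)."""
    return sum(s["end"] - s["start"] + 1 for s in parsed["spans"])


simple = parse_location("23..400")
joined = parse_location("join(544..589,688..>1032)")
print(covered_length(simple))             # prints: 378
print(covered_length(joined))             # prints: 391
print(joined["spans"][-1]["partial_end"])  # prints: True
```

The flags are named after the position they are attached to rather than after the 5' or 3' end, because on the complementary strand the start of the span corresponds to the 3' end of the feature.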