您好,登錄后才能下訂單哦!
小編給大家分享一下perl如何提取miRNA前后500bp序列,相信大部分人都還不怎么了解,因此分享這篇文章給大家參考一下,希望大家閱讀完這篇文章后大有收獲,下面讓我們一起去了解一下吧!
在沒有位置信息時,提取miRNA前后500bp的序列
在提取miRNA前后500bp序列時,若沒有其位置信息,提取會比較麻煩,可用以下方法提取:
1.首先利用blast軟件將miRNA的前體序列與基因組比對,獲取前體序列的位置信息,比對方法:
makeblastdb -in Donkey_Hic_genome.20180408.fa -dbtype nucl -title Donkey_Hic_genome.20180408.fa blastall -i miRNA_Pre.fa -d Donkey_Hic_genome.20180408.fa -p blastn -e 0.01 -b 5 -v 5 -m 8 -o blast.out #Donkey_Hic_genome.20180408.fa 參考基因組染色體序列 #miRNA_Pre.fa miRNA的前體序列
由于序列可能較短,所以e-value值可調高一些。比對結果如下:
bta-miR-34c Chr20 100.00 113 0 0 1 113 39385825 39385713 6e-57 224 bta-miR-370 Chr07 100.00 109 0 0 1 109 70327160 70327268 1e-54 216 bta-miR-34b Chr20 100.00 112 0 0 1 112 39386420 39386309 2e-56 222 bta-miR-383 Chr27 100.00 110 0 0 1 110 21958895 21959004 4e-55 218 bta-miR-449a Chr01 100.00 112 0 0 1 112 103603665 103603554 2e-56 222 bta-miR-378 Chr09 100.00 111 0 0 1 111 51049164 51049054 9e-56 220 bta-miR-378 Chr13 96.88 32 1 0 67 98 32631593 32631624 3e-06 56.0 bta-miR-378 Chr13 93.94 33 2 0 62 94 29900614 29900582 2e-04 50.1 bta-miR-378 Chr07 100.00 27 0 0 71 97 65072541 65072515 1e-05 54.0 bta-miR-378 Chr30 96.55 29 1 0 69 97 28926810 28926782 2e-04 50.1 bta-miR-378 Chr18 93.94 33 2 0 62 94 21721834 21721866 2e-04 50.1 bta-miR-206 Chr08 100.00 112 0 0 1 112 78794660 78794771 2e-56 222 bta-miR-1 Chr15 100.00 108 0 0 1 108 48711439 48711546 5e-54 214 bta-miR-1 Chr07 88.89 63 7 0 32 94 26712206 26712268 2e-10 69.9 bta-let-7c Chr18 100.00 112 0 0 1 112 32412721 32412610 2e-56 222 bta-let-7c Chr20 89.19 74 8 0 18 91 29703021 29703094 1e-14 83.8
然后對比對結果篩選,盡量選擇完全匹配,并且匹配長度最長的。篩選后的blast.out結果如下:
bta-miR-34c Chr20 100.00 113 0 0 1 113 39385825 39385713 6e-57 224 bta-miR-370 Chr07 100.00 109 0 0 1 109 70327160 70327268 1e-54 216 bta-miR-34b Chr20 100.00 112 0 0 1 112 39386420 39386309 2e-56 222 bta-miR-383 Chr27 100.00 110 0 0 1 110 21958895 21959004 4e-55 218 bta-miR-449a Chr01 100.00 112 0 0 1 112 103603665 103603554 2e-56 222 bta-miR-378 Chr09 100.00 111 0 0 1 111 51049164 51049054 9e-56 220 bta-miR-206 Chr08 100.00 112 0 0 1 112 78794660 78794771 2e-56 222 bta-miR-1 Chr15 100.00 108 0 0 1 108 48711439 48711546 5e-54 214 bta-let-7c Chr18 100.00 112 0 0 1 112 32412721 32412610 2e-56 222
3.接下來就可以提取序列啦,我有提供腳本哦,用法:
perl /share/work/wangq/script/lv/miRNA_500_fasta.pl -id blast.out -fa Donkey_Hic_genome.20180408.fa -out result.fa -miR miRNA.fa -l 500
-id 后跟篩選的blast結果;-fa后跟參考基因組染色體序列;-miR后跟miRNA的序列文件;-l后跟提取miRNA前后多長序列,默認500。
提取結果(miRNA序列大寫標注,其余小寫):
>bta-miR-34c acaaggcacagcatcacccgccgccctgctgggaggaggccgccaccctccgcggcgaactgccgacccgaagcgtttgagaggagaaagctgcgcttcgagactggatgcgtcagcatttcttccgcgcggcgcggcgagcgcgtggtccgcgtgcgagcaaaccccctgcaaacgcaggcgggctgatgcttctagctggagttaaacagttagctatcactaggagtaaaagcaagattgggcgatcccatgtaaggaaagcaaaatctggggcgtttagtagctttaattgtaaggtgtaaagacatttgaattgctgggggaagccccctgtgtaacgttcagagctgtgagtcactgtgcctattttccattgtttatcagggtactcaccaatccaccaactaaagtgagtaccacggagccagtgatctgcctgtcacgacgcatgggggcaccaacttgagactgaagtttgtgatgaaagtaaagcttttttgctgtgagtctagttactAGGCAGTGTAGTTAGCTGATTGctaataataccaatcactaaccacacggccaggtaaaaagatttgggaattcatccagatgagctgcgtgtgcacaccagtgggtttgggggcaagaaggggattggaataccctaatagtacgcattgcctgtttatccatagctcagccaagagagaaatcagcattttagctgctaaatatacaacatatgtagtaaatatacatttttaacatataatttttcaatattccttcaggcgcttaaccaaaaaaattttcagatatattgaggaatagaggttttctctcttagctcatttatgtcattgtgtaaacttgtgattattttgaactatctgtaagactgtattactattttgaaagaataaagtgccctaaagtcataaattgtggtcattcatgtgtccattgcctttcctaagttggctttatgatgtccttctcagcgcctgccacagtaattataactttcctatcgctaaaatggttaaattgttcctagattaacaaaatctcccgggaaggctattcacaagcactttttagtatttttctaagaacacgtta >bta-miR-370 tgtttattgttcgtgtctcagagttgctctgaggccagtgtgctgggcacccagcaggccatctgggaggggagaaaggaagctagccatgtatgtcagtaggggtcagcgtggaggcctttggggtttctgggaaacggttcgaacatggagaatctcgtgatgtggacgcccccgagggcagccccatttcatgaactggatcccttagtcggatttctgttttccagggcacttgtttaagctttcatgccgtgtccaaaaaaaagagcagatcccaagagtttcctgcccgaggcccgtcttgccagaagccctcggcttggcctgagcatcgagatcattcctctaatcagcctgggggtggtctgacccgcggggtgggcggcgtggggcggggaggggcagggcgtgtgcggggcgggagctgctgggggagctggcggcggttcctttcaccctcgccgtggacccgcgggggcgtggctgtcctcggtctacaaatcgcgcaagtcggggcacaagacagagaggccaggtcacgtctctgcagttacacagctcatgagtGCCTGCTGGGGTGGAACCTGGTctgtctgtctgtctagcaccacagctcgggcgctgctgcagagggaacaaagatttgggtgagggcctcagagacgggctggggaaggggtttactcgggctgactttgacatgaaaacaatagctaattctgcttggggcctagtgctgtgactatcttacagatgggaaacaggcacagacaggttagttaactttcctgatgctttccagctcatcaggattggaggctggatttggaggcaggcggtgtggttcgagcatatgtgctctaaccattgggaaacactgcctcctgactgtgagcacacgtagagatggcacatggagtccaagaatctgggtttgagtccttttaccactgcctggtgggtgtactgtccagtcaatcatacatatttcattcttcctcttgacagagtctctgcagaggcccagcgagtcgttggcccgtgggaaatatttttttaagaatgcatgtgtgtgatagaattttatatctatcgaaggggagggg
腳本代碼如下:
#!/usr/bin/perl -w use strict; use Getopt::Long; use Bio::SeqIO; use Bio::Seq; my $version = "1.3"; my %opts; GetOptions(\%opts, "id=s", "fa=s", "miR=s", "l=s","out=s","h"); if(!defined($opts{out}) || !defined($opts{fa}) || !defined($opts{miR}) || !defined($opts{id}) ||defined($opts{h})) { print <<"Usage End."; Description: $version:lefse analysis Usage Forced parameter: -id blast.out <infile> must be given -out outfile <outfile> must be given -fa genome fasta file <infile>must be given -miR miRNA fasta file <infile>must be given -l length <int> Other parameter: -h Help document Usage End. exit; } my $len = $opts{l}; $len ||=500; my $in = Bio::SeqIO->new(-file => "$opts{fa}" , -format => 'Fasta'); my %fasta; while ( my $seq = $in->next_seq() ) { my($id,$sequence)=($seq->id,$seq->seq); $fasta{$id}=$sequence; } my $ina = Bio::SeqIO->new(-file => "$opts{miR}" , -format => 'Fasta'); my %mi; while ( my $seq = $ina->next_seq() ) { my($id,$sequence)=($seq->id,$seq->seq); $mi{$id}=$sequence; } open(IN,"$opts{id}") ||die "open file $opts{id} faild.\n"; open(OUT,">$opts{out}") ||die "open file $opts{out} faild.\n"; while(<IN>){ chomp; my @line = split("\t"); if($line[8] < $line[9]){ my $miR = lc(substr( $fasta{$line[1]},$line[8]-1, $line[9] - $line[8]+1)); my $small = lc($mi{$line[0]}); $miR =~ s/$small/$mi{$line[0]}/; my $before = lc(substr( $fasta{$line[1]},$line[8]-$len-1, $len)); my $laft = lc(substr( $fasta{$line[1]},$line[9], $len)); print OUT ">$line[0]\n$before$miR$laft\n"; }elsif($line[8] > $line[9]){ my $miR = lc(substr( $fasta{$line[1]},$line[9]-1, $line[8] - $line[9]+1)); my $before = lc(substr( $fasta{$line[1]},$line[9]-$len-1, $len)); my $laft = lc(substr( $fasta{$line[1]},$line[8], $len)); my $gene = $before.$miR.$laft; $gene = &reverse_complement_IUPAC($gene); my $small = lc($mi{$line[0]}); $gene =~ s/$small/$mi{$line[0]}/; print OUT ">$line[0]\n$gene\n"; } } close(IN); close(OUT); sub reverse_complement_IUPAC { my $dna = shift; # reverse the DNA sequence my $revcomp = reverse($dna); # complement the reversed DNA sequence $revcomp =~ tr/ABCDGHMNRSTUVWXYabcdghmnrstuvwxy/TVGHCDKNYSAABWXRtvghcdknysaabwxr/; return $revcomp; }
以上是“perl如何提取miRNA前后500bp序列”這篇文章的所有內容,感謝各位的閱讀!相信大家都有了一定的了解,希望分享的內容對大家有所幫助,如果還想學習更多知識,歡迎關注億速云行業資訊頻道!
免責聲明:本站發布的內容(圖片、視頻和文字)以原創、轉載和分享為主,文章觀點不代表本網站立場,如果涉及侵權請聯系站長郵箱:is@yisu.com進行舉報,并提供相關證據,一經查實,將立刻刪除涉嫌侵權內容。