reference-fasta spec
spec version: 0.1.0
Summary
A reference.fasta file contains the DNA sequences of all the primary-reference genomes used in primerscheme generation. Its purpose is to provide a reference genome and coordinate system for reference-based assembly and consensus generation.
1.1 Format overview
reference.fasta files are typical .fasta format files, with text representing the nucleotide sequence of the reference. Each genome starts with a header line (starting with >) that denotes the id of the genome, followed by lines of nucleotide data.
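The layout described above can be read with a few lines of Python. This is a minimal sketch rather than a reference parser: the function name `read_fasta` is hypothetical, and it assumes the id is the first whitespace-delimited token after `>`, which is the usual FASTA convention but is not stated by this spec.

```python
import io


def read_fasta(handle):
    """Parse FASTA text into a dict mapping genome id -> sequence string."""
    records = {}
    current_id = None
    for line in handle:
        line = line.strip()
        if not line:
            continue  # ignore blank lines
        if line.startswith(">"):
            # Assumption: the id is the first whitespace-delimited token
            # after ">"; anything after it is treated as a description.
            fields = line[1:].split()
            current_id = fields[0] if fields else ""
            records[current_id] = []
        elif current_id is not None:
            # Sequence data may span several lines; collect them in order.
            records[current_id].append(line)
    return {genome_id: "".join(chunks) for genome_id, chunks in records.items()}


example = io.StringIO(">MN908947.3\nATTAAAGGTT\nTATACCTTCC\n")
genomes = read_fasta(example)
print(genomes["MN908947.3"])  # prints ATTAAAGGTTTATACCTTCC
```

In this sketch a repeated id silently replaces the earlier record, so a stricter reader may prefer to raise an error on duplicates.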
The id provided in the header line should match the chrom field of the corresponding primers in the primer.bed file.
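One way to check this rule is to compare the set of header ids with the set of chrom values. The sketch below is illustrative only: the helper names are hypothetical, the id is assumed to be the first token after `>`, the chrom field is assumed to be the first whitespace-delimited column of each BED line, and lines starting with `#` are assumed to be comments.

```python
import io


def fasta_ids(handle):
    """Return the ids found on FASTA header lines, in file order."""
    ids = []
    for line in handle:
        if line.startswith(">"):
            # Assumption: the id is the first token after ">".
            fields = line[1:].split()
            ids.append(fields[0] if fields else "")
    return ids


def bed_chroms(handle):
    """Return the set of chrom values (first column) in a BED file."""
    chroms = set()
    for line in handle:
        line = line.strip()
        # Assumption: blank lines and "#" lines carry no primer records.
        if not line or line.startswith("#"):
            continue
        chroms.add(line.split()[0])
    return chroms


def unmatched_chroms(fasta_handle, bed_handle):
    """Return BED chrom values that have no matching FASTA header id."""
    return sorted(bed_chroms(bed_handle) - set(fasta_ids(fasta_handle)))


fasta = io.StringIO(">MN908947.3\nATTAAAGG\n")
bed = io.StringIO("MN908947.3\t47\t78\nNC_006432.1\t126\t154\n")
print(unmatched_chroms(fasta, bed))  # prints ['NC_006432.1']
```

An empty list means every primer refers to a genome that is present in the reference file.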
1.2 Examples
Single fasta
>MN908947.3
ATTAAAGGTTTATACCTTCCCA...
The corresponding bedfile should be
MN908947.3 47 78 ...
Multi fasta
>MN908947.3
ATTAAAGGTTTATACCTTCCCA...
>NC_006432.1
CGGACACACAAAAAGAAAGAAA...
The corresponding bedfile should be
MN908947.3 47 78 ...
NC_006432.1 126 154 ...
1.3 Best practices
As the reference.fasta is often used for reference-based assembly, using a high-quality genome with minimal Ns or ambiguous bases is advisable.
Using RNA sequences in the reference.fasta is not advised, as DNA is expected.
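These recommendations can be checked per sequence with a simple base count. The following is a rough sketch under stated assumptions: `base_report` is a hypothetical helper, ambiguous bases are taken to be the IUPAC nucleotide ambiguity codes (including N), and the presence of U is used as a hint that the sequence is RNA. The spec sets no threshold, so the report only counts.

```python
from collections import Counter

# IUPAC nucleotide ambiguity codes; N is included as the fully ambiguous base.
IUPAC_AMBIGUOUS = set("RYSWKMBDHVN")


def base_report(sequence):
    """Summarise Ns, ambiguous bases and RNA-style U bases in a sequence."""
    counts = Counter(sequence.upper())
    return {
        "length": sum(counts.values()),
        "n_count": counts["N"],
        # Counts every ambiguity code, so this figure includes the Ns.
        "ambiguous_count": sum(counts[base] for base in IUPAC_AMBIGUOUS),
        # Assumption: any U suggests RNA, since DNA uses T instead.
        "contains_u": counts["U"] > 0,
    }


report = base_report("ATTAAAGGTTNNRU")
print(report["n_count"], report["ambiguous_count"], report["contains_u"])  # prints 2 3 True
```

What counts as "minimal" is left to the scheme author; the counts simply make the decision easier to review.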