primer-bedfile spec
spec version: 0.1.0
Summary
A primer.bed file describes an amplicon sequencing primer scheme and is generated by tooling. Its purpose is to encapsulate all the information needed to i) reproduce a primer scheme and ii) facilitate correct bioinformatic analysis of resulting sequencing data. It therefore incorporates both wet lab and analytical elements. These include primer sequences, their associated pools, and relative concentrations, as well as their coordinates with respect to one or more reference genome sequences.
1.1 Format overview
primer.bed files are tab-delimited text where each line describes a single primer that forms part of a primer pair associated with an amplicon. A compliant primer.bed file contains one or more pairs of primers. The format of primer.bed is loosely based on the Browser Extensible Data (BED) specification, with seven required columns followed by one optional column.
Column | Name | Type | Brief description | Restrictions |
---|---|---|---|---|
1 | chrom | String | Chromosome name | [A-Za-z0-9_] |
2 | primerStart | Int | Primer start position | u64 |
3 | primerEnd | Int | Primer end position | u64 |
4 | primerName | String | Primer name | [a-zA-Z0-9\-]+_[0-9]+_(LEFT\|RIGHT)_[0-9]+ |
5 | pool | Int | Primer pool | u64 |
6 | strand | String | Primer strand | [-+.] |
7 | primerSeq | String | Primer Sequence in 5’ to 3’ | [ABCDGHKMNRSTUVWY] |
8 | primerWeight | Optional(float) | Primer weight for rebalancing | f64 |
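As an illustration of the column layout above, the following is a minimal Python sketch that splits one tab-delimited line into typed fields. The names PrimerRecord and parse_primer_line are hypothetical and not part of this spec or of any tool, and treating an empty eighth column the same as an absent one is an assumption.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class PrimerRecord:
    """One primer.bed line (hypothetical container, not defined by the spec)."""
    chrom: str
    primer_start: int
    primer_end: int
    primer_name: str
    pool: int
    strand: str
    primer_seq: str
    primer_weight: Optional[float] = None


def parse_primer_line(line: str) -> PrimerRecord:
    """Split one tab-delimited primer.bed line into typed fields."""
    fields = line.rstrip("\r\n").split("\t")
    if len(fields) not in (7, 8):
        raise ValueError(f"expected 7 or 8 columns, got {len(fields)}")
    # Assumption: an empty eighth column is treated like a missing one.
    has_weight = len(fields) == 8 and fields[7] != ""
    return PrimerRecord(
        chrom=fields[0],
        primer_start=int(fields[1]),
        primer_end=int(fields[2]),
        primer_name=fields[3],
        pool=int(fields[4]),
        strand=fields[5],
        primer_seq=fields[6],
        primer_weight=float(fields[7]) if has_weight else None,
    )
```

This sketch only converts types; it does not enforce the restrictions column.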
1.2 Field descriptions
chrom
: The name of the corresponding reference sequence chromosome for the primer. This must match a valid sequence ID inside an accompanying reference sequence FASTA file, by convention named reference.fasta.
primerStart
: The start position of the primer on the chrom.
primerEnd
: The non-inclusive end position of the primer on the chrom. Must be greater than primerStart.
primerName
: The name of the primer in the form {prefix}_{amplicon_number}_{direction}_{primer_number}.
prefix
: Must match regex [a-zA-Z0-9\-]+.
amplicon_number
: The number of the amplicon. Must be a positive integer incrementing from 1.
direction
: The direction of the primer. Must be either LEFT or RIGHT.
primer_number
: The number of the primer. Must be a positive integer incrementing from 1.
pool
: The PCR pool the primer belongs to. Must be a positive integer incrementing from 1.
strand
: The strand of the primer. Must be either + or -. Should match the direction in primerName (LEFT = +, RIGHT = -).
primerSeq
: The sequence of the primer in the 5’ to 3’ direction. Restricted to DNA IUPAC codes.
primerWeight
: The normalised weight for each primer for each pool. Can be left blank for equimolar pools.
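The per-field rules above can be sketched as a small checker. This is an illustrative Python sketch, not the validation performed by any existing tool. The function name check_primer_fields is hypothetical, the regular expressions are copied from the Restrictions column in 1.1, and accepting only + and - for strand follows the field description (the table additionally lists "."), which is an assumption.

```python
import re

# Patterns copied from the Restrictions column of the table in 1.1.
PRIMER_NAME_RE = re.compile(r"([a-zA-Z0-9\-]+)_([0-9]+)_(LEFT|RIGHT)_([0-9]+)")
PRIMER_SEQ_RE = re.compile(r"[ABCDGHKMNRSTUVWY]+")
EXPECTED_STRAND = {"LEFT": "+", "RIGHT": "-"}


def check_primer_fields(primer_start, primer_end, primer_name, pool, strand, primer_seq):
    """Return a list of human-readable problems; an empty list means none found."""
    problems = []
    if primer_start < 0:
        problems.append("primerStart must not be negative")
    if primer_end <= primer_start:
        problems.append("primerEnd must be greater than primerStart")
    match = PRIMER_NAME_RE.fullmatch(primer_name)
    if match is None:
        problems.append("primerName does not match the required form")
    else:
        _prefix, amplicon_number, direction, primer_number = match.groups()
        if int(amplicon_number) < 1 or int(primer_number) < 1:
            problems.append("amplicon_number and primer_number must start at 1")
        # The spec words this as "should", so treat it as a soft rule.
        if strand in ("+", "-") and strand != EXPECTED_STRAND[direction]:
            problems.append("strand should match the direction in primerName")
    if pool < 1:
        problems.append("pool must be a positive integer")
    # Assumption: only + and - are accepted, per the field description.
    if strand not in ("+", "-"):
        problems.append("strand must be + or -")
    if PRIMER_SEQ_RE.fullmatch(primer_seq) is None:
        problems.append("primerSeq contains characters outside the allowed codes")
    return problems
```

The checker looks at one record at a time, so rules that span the whole file (for example, numbers incrementing from 1 without gaps) are outside its scope.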
1.3 Examples
A seven-column primer.bed file, without primerWeight:
MN908947.3 47 78 SARS-CoV-2_1_LEFT_1 1 + CTCTTGTAGATCTGTTCTCTAAACGAACTTT
MN908947.3 419 447 SARS-CoV-2_1_RIGHT_1 1 - AAAACGCCTTTTTCAACTTCTACTAAGC
MN908947.3 344 366 SARS-CoV-2_2_LEFT_1 2 + TCGTACGTGGCTTTGGAGACTC
MN908947.3 707 732 SARS-CoV-2_2_RIGHT_1 2 - TCTTCATAAGGATCAGTGCCAAGCT
An eight-column primer.bed file, with concentrations defined in the optional eighth primerWeight column, and a comment line:
# PrimerWeight included
MN908947.3 47 78 SARS-CoV-2_1_LEFT_1 1 + CTCTTGTAGATCTGTTCTCTAAACGAACTTT 1.4
MN908947.3 419 447 SARS-CoV-2_1_RIGHT_1 1 - AAAACGCCTTTTTCAACTTCTACTAAGC 1.4
MN908947.3 344 366 SARS-CoV-2_2_LEFT_1 2 + TCGTACGTGGCTTTGGAGACTC 1.6
MN908947.3 707 732 SARS-CoV-2_2_RIGHT_1 2 - TCTTCATAAGGATCAGTGCCAAGCT 1.6
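To show how such a file might be consumed, here is a short Python sketch that reads the eight-column example, skips the comment line, and groups primers by pool with their weights. The function name primers_by_pool is hypothetical, and treating any line that starts with # as a comment is an assumption based on the example rather than on a stated rule. The columns are written with explicit tab characters, since the format is tab-delimited.

```python
from collections import defaultdict

# The eight-column example, with explicit tabs between columns.
EXAMPLE = (
    "# PrimerWeight included\n"
    "MN908947.3\t47\t78\tSARS-CoV-2_1_LEFT_1\t1\t+\tCTCTTGTAGATCTGTTCTCTAAACGAACTTT\t1.4\n"
    "MN908947.3\t419\t447\tSARS-CoV-2_1_RIGHT_1\t1\t-\tAAAACGCCTTTTTCAACTTCTACTAAGC\t1.4\n"
    "MN908947.3\t344\t366\tSARS-CoV-2_2_LEFT_1\t2\t+\tTCGTACGTGGCTTTGGAGACTC\t1.6\n"
    "MN908947.3\t707\t732\tSARS-CoV-2_2_RIGHT_1\t2\t-\tTCTTCATAAGGATCAGTGCCAAGCT\t1.6\n"
)


def primers_by_pool(text):
    """Map each pool number to a list of (primerName, primerWeight) tuples."""
    pools = defaultdict(list)
    for line in text.splitlines():
        # Assumption: blank lines and lines starting with '#' carry no primer.
        if not line.strip() or line.startswith("#"):
            continue
        fields = line.split("\t")
        # The eighth column is optional and may be blank (equimolar pools).
        weight = float(fields[7]) if len(fields) > 7 and fields[7] else None
        pools[int(fields[4])].append((fields[3], weight))
    return dict(pools)


if __name__ == "__main__":
    for pool, primers in sorted(primers_by_pool(EXAMPLE).items()):
        print(pool, primers)
```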
1.4 Best practices
primer.bed files contain information about how to replicate the primer pools used in multiplexed PCR. They do not contain information about the PCR protocol, input material, or sequencing method and analysis. Therefore, additional information is needed for true reproducibility.
To explicitly differentiate versions of primer.bed, this spec is designed to fit into larger metadata standards, such as primal-page with PrimalScheme Labs or primaschema with pha4ge primer-schemes.
primalbedtools carries out schema validation and common operations on primer.bed files.