spec version: 0.1.0


Summary

A primer.bed file describes an amplicon sequencing primer scheme and is generated by tooling. Its purpose is to encapsulate all the information needed to i) reproduce a primer scheme and ii) facilitate correct bioinformatic analysis of resulting sequencing data. It therefore incorporates both wet lab and analytical elements. These include primer sequences, their associated pools, and relative concentrations, as well as their coordinates with respect to one or more reference genome sequences.

1.1 Format overview

primer.bed files are tab-delimited text where each line describes a single primer that forms part of a primer pair associated with an amplicon. A compliant primer.bed file contains one or more pairs of primers. The format of primer.bed is loosely based on the Browser Extensible Data (BED) specification, with seven required columns followed by one optional column.

Column  Name          Type             Brief description              Restrictions
1       chrom         String           Chromosome name                [A-Za-z0-9_]
2       primerStart   Int              Primer start position          u64
3       primerEnd     Int              Primer end position            u64
4       primerName    String           Primer name                    [a-zA-Z0-9\-]+_[0-9]+_(LEFT|RIGHT)_[0-9]+
5       pool          Int              Primer pool                    u64
6       strand        String           Primer strand                  [-+.]
7       primerSeq     String           Primer sequence in 5’ to 3’    [ABCDGHKMNRSTUVWY]
8       primerWeight  Optional(float)  Primer weight for rebalancing  f64
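
As an illustration of the column layout, the following is a minimal Python sketch that splits one line into typed fields. The names PrimerLine and parse_primer_line are hypothetical and not part of this spec or of any existing tool; the sketch assumes an empty or absent eighth column means no weight was given, and it does not check the Restrictions column.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class PrimerLine:
    """One primer.bed record (hypothetical container, not defined by the spec)."""
    chrom: str
    primer_start: int
    primer_end: int
    primer_name: str
    pool: int
    strand: str
    primer_seq: str
    primer_weight: Optional[float] = None


def parse_primer_line(line: str) -> PrimerLine:
    """Split one tab-delimited primer.bed line into typed fields."""
    fields = line.rstrip("\r\n").split("\t")
    if len(fields) not in (7, 8):
        raise ValueError(f"expected 7 or 8 columns, got {len(fields)}")
    # Assumption: a missing or blank eighth column means "no weight given".
    has_weight = len(fields) == 8 and fields[7] != ""
    return PrimerLine(
        chrom=fields[0],
        primer_start=int(fields[1]),
        primer_end=int(fields[2]),
        primer_name=fields[3],
        pool=int(fields[4]),
        strand=fields[5],
        primer_seq=fields[6],
        primer_weight=float(fields[7]) if has_weight else None,
    )
```

Keeping primer_weight as None rather than defaulting it to 1.0 preserves the difference between a seven column file and an eight column file with explicit weights.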

1.2 Field descriptions

  1. chrom: The name of the corresponding reference sequence chromosome for the primer. This must match a valid sequence ID inside an accompanying reference sequence FASTA file, by convention named reference.fasta.
  2. primerStart: The start position of the primer on the chrom.
  3. primerEnd: The non-inclusive end position of the primer on the chrom. Must be greater than primerStart.
  4. primerName: The name of the primer in the form {prefix}_{amplicon_number}_{direction}_{primer_number}.
    • prefix: Must match regex [a-zA-Z0-9\-]+.
    • amplicon_number: The number of the amplicon. Must be a positive integer incrementing from 1.
    • direction: The direction of the primers. Must be either LEFT or RIGHT.
    • primer_number: The number of the primer. Must be a positive integer incrementing from 1.
  5. pool: The PCR pool the primer belongs to. Must be a positive integer incrementing from 1.
  6. strand: The strand of the primer. Must be either + or -. Should match the direction given in primerName (LEFT=+, RIGHT=-).
  7. primerSeq: The sequence of the primer in the 5’ to 3’ direction. Restricted to DNA IUPAC codes.
  8. primerWeight: The normalised weight of the primer within its pool. Can be left blank for equimolar pools.
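
The field rules above could be checked along the following lines. This is a hedged Python sketch, not the validation performed by any existing tool: check_primer and its constants are hypothetical names, the strand and direction mismatch is reported as a problem even though the rule is only a "should", and chrom is not checked because that requires the accompanying reference FASTA.

```python
import re

# Patterns taken from the primerName and primerSeq restrictions in this spec.
PRIMER_NAME_RE = re.compile(r"^([a-zA-Z0-9\-]+)_([0-9]+)_(LEFT|RIGHT)_([0-9]+)$")
PRIMER_SEQ_RE = re.compile(r"^[ABCDGHKMNRSTUVWY]+$")
EXPECTED_STRAND = {"LEFT": "+", "RIGHT": "-"}


def check_primer(start, end, name, pool, strand, seq):
    """Return a list of problems found in one record; an empty list means none."""
    problems = []
    if end <= start:
        problems.append("primerEnd must be greater than primerStart")
    match = PRIMER_NAME_RE.match(name)
    if match is None:
        problems.append("primerName is not {prefix}_{amplicon_number}_{direction}_{primer_number}")
    else:
        _prefix, amplicon_number, direction, primer_number = match.groups()
        if int(amplicon_number) < 1 or int(primer_number) < 1:
            problems.append("amplicon_number and primer_number must start from 1")
        if strand != EXPECTED_STRAND[direction]:
            problems.append("strand does not match direction")
    if pool < 1:
        problems.append("pool must be a positive integer")
    if not PRIMER_SEQ_RE.match(seq):
        problems.append("primerSeq contains characters outside the allowed codes")
    return problems
```

Returning a list instead of raising on the first failure lets a caller report every problem in a record at once.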

1.3 Examples

A seven-column primer.bed file, without primerWeight:

MN908947.3	47	78	SARS-CoV-2_1_LEFT_1	1	+	CTCTTGTAGATCTGTTCTCTAAACGAACTTT
MN908947.3	419	447	SARS-CoV-2_1_RIGHT_1	1	-	AAAACGCCTTTTTCAACTTCTACTAAGC
MN908947.3	344	366	SARS-CoV-2_2_LEFT_1	2	+	TCGTACGTGGCTTTGGAGACTC
MN908947.3	707	732	SARS-CoV-2_2_RIGHT_1	2	-	TCTTCATAAGGATCAGTGCCAAGCT

An eight-column primer.bed file, with concentrations defined in the optional eighth primerWeight column, and a comment line:

# PrimerWeight included
MN908947.3	47	78	SARS-CoV-2_1_LEFT_1	1	+	CTCTTGTAGATCTGTTCTCTAAACGAACTTT	1.4
MN908947.3	419	447	SARS-CoV-2_1_RIGHT_1	1	-	AAAACGCCTTTTTCAACTTCTACTAAGC	1.4
MN908947.3	344	366	SARS-CoV-2_2_LEFT_1	2	+	TCGTACGTGGCTTTGGAGACTC	1.6
MN908947.3	707	732	SARS-CoV-2_2_RIGHT_1	2	-	TCTTCATAAGGATCAGTGCCAAGCT	1.6
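
To show how such records might be consumed, here is a small Python sketch that reads primer.bed text and derives one span per amplicon from its LEFT and RIGHT primers. The function name amplicon_spans is hypothetical. The sketch assumes that lines beginning with # are comments, as in the example above, and that all primers of an amplicon share one pool; neither assumption is stated as a rule elsewhere in this spec.

```python
def amplicon_spans(text):
    """Map (chrom, amplicon_number) to (start, end, pool) over all its primers."""
    spans = {}
    for line in text.splitlines():
        # Assumption: blank lines and lines starting with "#" carry no primer.
        if not line.strip() or line.startswith("#"):
            continue
        chrom, start, end, name, pool = line.split("\t")[:5]
        start, end, pool = int(start), int(end), int(pool)
        # primerName is {prefix}_{amplicon_number}_{direction}_{primer_number}.
        amplicon_number = int(name.split("_")[-3])
        key = (chrom, amplicon_number)
        if key in spans:
            lo, hi, _ = spans[key]
            start, end = min(lo, start), max(hi, end)
        spans[key] = (start, end, pool)
    return spans


example = (
    "# PrimerWeight included\n"
    "MN908947.3\t47\t78\tSARS-CoV-2_1_LEFT_1\t1\t+\tCTCTTGTAGATCTGTTCTCTAAACGAACTTT\t1.4\n"
    "MN908947.3\t419\t447\tSARS-CoV-2_1_RIGHT_1\t1\t-\tAAAACGCCTTTTTCAACTTCTACTAAGC\t1.4\n"
)
print(amplicon_spans(example))  # {('MN908947.3', 1): (47, 447, 1)}
```

The resulting span includes the primer sequences themselves; an analysis that needs the primer-trimmed insert would instead take the LEFT primerEnd and the RIGHT primerStart.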

1.4 Best practices

primer.bed files contain the information needed to replicate the primer pools used in multiplexed PCR. They do not contain information about the PCR protocol, input material, or sequencing method and analysis. Additional information is therefore needed for true reproducibility.

To distinguish explicitly between versions of a primer.bed file, this spec is designed to fit into larger metadata standards, such as primal-page with PrimalScheme Labs or primaschema with pha4ge primer-schemes.

primalbedtools carries out schema validation and common operations on primer.bed files.