A vim plugin for DNA sequences and sequencing files | bioinformatics


This started as a joke colorscheme for vim where every character is the same color except for ACGT, but then I realized that the colors were actually useful for helping me visually parse DNA sequences. So I turned it into a plugin with a couple more features. There’s nothing super fancy in there, but it has made squinting at CIGAR strings and SAM flags less painful for me.

Link to the repository/screenshots

Current features:

  • A/C/G/T/U/N are colored (consistent with IGV colors for ACGT)
  • Using the commands :SAM, :BAM, :GAF, or :PAF in their respective files will tell you the description of the field your cursor is hovering over (e.g., using :SAM in column 2 of a SAM/BAM file will print a message along the lines of “FLAG: 2064 – supplementary alignment, reverse strand”)
  • Operation blocks within CIGAR strings are colored separately from each other
  • Using :Phred will decode the Phred score of the hovered character (e.g., using it on a D will print “D is score 35, 0.0003 probability of error”)
  • Sequence names in FASTA/FASTQ files are colored
  • Tags in alignment files are colored
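
For anyone curious about what the decoding behind :SAM, :Phred, and the CIGAR coloring involves, here is a minimal Python sketch. It is not the plugin's code (the plugin itself is written for vim), and the function names are made up for illustration. The flag descriptions paraphrase the FLAG bit table in the SAM specification, and the Phred decoding assumes the common Phred+33 encoding.

```python
import re

# Bit descriptions paraphrased from the FLAG field of the SAM specification.
SAM_FLAG_BITS = [
    (0x1, "read paired"),
    (0x2, "read mapped in proper pair"),
    (0x4, "read unmapped"),
    (0x8, "mate unmapped"),
    (0x10, "read reverse strand"),
    (0x20, "mate reverse strand"),
    (0x40, "first in pair"),
    (0x80, "second in pair"),
    (0x100, "secondary alignment"),
    (0x200, "fails quality checks"),
    (0x400, "PCR or optical duplicate"),
    (0x800, "supplementary alignment"),
]

# One CIGAR operation block: a length followed by a single operation code.
CIGAR_OP = re.compile(r"(\d+)([MIDNSHP=X])")


def decode_sam_flag(flag):
    """Return the descriptions of every bit set in a SAM FLAG value."""
    return [name for bit, name in SAM_FLAG_BITS if flag & bit]


def decode_phred(char, offset=33):
    """Return (score, error probability) for one quality character.

    Assumes Phred+33 by default; the error probability is 10^(-score/10).
    """
    score = ord(char) - offset
    return score, 10 ** (-score / 10)


def split_cigar(cigar):
    """Split a CIGAR string into (length, operation) blocks."""
    return [(int(length), op) for length, op in CIGAR_OP.findall(cigar)]


if __name__ == "__main__":
    print(decode_sam_flag(2064))
    score, p_error = decode_phred("D")
    print(f"D is score {score}, {p_error:.4f} probability of error")
    print(split_cigar("5S90M2I3M"))
```

With these definitions, decode_sam_flag(2064) reports the reverse-strand and supplementary bits (2048 + 16), and decode_phred("D") gives a score of 35 with an error probability of about 0.0003, matching the examples in the feature list.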

The main problem I ran into while developing this was performance. There is a noticeable lag when applying the syntax highlighting to long lines (around 10000 characters in length), which can of course be a problem for SAM or PAF files that contain the actual read sequences. Performance was perfectly fine for long files with shorter lines (e.g., reference FASTA files with lots of 80-character lines).

To address the performance issue, I capped the column up to which vim applies syntax highlighting at 9000 (setlocal synmaxcol=9000). This means that anything past column 9000 on a line isn’t highlighted, but it does remove the lag.


June 1, 2025