A vim plugin for DNA sequences and sequencing files | bioinformatics
This started as a joke colorscheme for vim where every character is the same color except for ACGT, but then I realized that the colors were actually useful for helping me visually parse DNA sequences. So I turned it into a plugin with a couple more features. There’s nothing super fancy in there, but it has made squinting at CIGAR strings and SAM flags less painful for me.
Link to the repository/screenshots
Current features:
- A/C/G/T/U/N are colored (consistent with IGV colors for ACGT)
- Running :SAM, :BAM, :GAF, or :PAF in the corresponding file type prints a description of the field under the cursor (e.g., running :SAM in column 2 of a SAM/BAM file prints a message along the lines of “FLAG: 2064 – supplementary alignment, reverse strand”)
- Operation blocks within CIGAR strings are colored separately from each other
- Running :Phred decodes the Phred score of the character under the cursor (e.g., running it on a D prints “D is score 35, 0.0003 probability of error”)
- Sequence names in FASTA/FASTQ files are colored
- Tags in alignment files are colored
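For anyone curious about the arithmetic behind the FLAG and Phred messages, here is a minimal Python sketch. It is not the plugin's code (the plugin is written for vim). The function names decode_flag and decode_phred are made up for illustration, the flag labels are paraphrased from the SAM specification, and the Phred offset of 33 is an assumption that happens to match the "D is score 35" example.

```python
# Illustrative sketch only: names and labels here are hypothetical,
# not taken from the plugin.

# SAM FLAG bits, with labels paraphrased from the SAM specification.
SAM_FLAG_BITS = [
    (0x1, "paired"),
    (0x2, "proper pair"),
    (0x4, "unmapped"),
    (0x8, "mate unmapped"),
    (0x10, "reverse strand"),
    (0x20, "mate reverse strand"),
    (0x40, "first in pair"),
    (0x80, "second in pair"),
    (0x100, "secondary alignment"),
    (0x200, "failed QC"),
    (0x400, "duplicate"),
    (0x800, "supplementary alignment"),
]


def decode_flag(flag: int) -> list[str]:
    """Return the labels of every bit set in a SAM FLAG value."""
    return [name for bit, name in SAM_FLAG_BITS if flag & bit]


def decode_phred(char: str, offset: int = 33) -> tuple[int, float]:
    """Convert one quality character to (score, error probability).

    Assumes Phred+33 encoding by default: score = ASCII code - 33,
    and error probability = 10 ** (-score / 10).
    """
    score = ord(char) - offset
    return score, 10 ** (-score / 10)


if __name__ == "__main__":
    # 2064 = 2048 (supplementary) + 16 (reverse strand)
    print(decode_flag(2064))  # ['reverse strand', 'supplementary alignment']
    score, p_error = decode_phred("D")
    print(f"D is score {score}, {p_error:.4f} probability of error")
```

The flag is just a bit field, so decoding is a matter of testing each bit in turn, and the Phred score is a fixed offset from the character's ASCII code.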
The main problem I ran into while developing this was performance. There is a noticeable lag when applying syntax highlighting to long lines (around 10,000 characters), which is of course a problem for SAM or PAF files that contain the actual read sequences. Performance was perfectly fine for long files with shorter lines (e.g., reference FASTA files with lots of 80-character lines).
To address the performance issue, I capped the column up to which vim applies syntax highlighting at 9000 (setlocal synmaxcol=9000). Although this means the ends of very long lines aren’t highlighted, it does remove the lag.
June 1, 2025