Exploration of a DNA Sequencing Basecaller using Activation Patching
This write-up of an undergraduate project is my first LW post. My goals are a) to gather feedback on the project and the post, if more experienced authors are willing, and b) to share the results of a mech interp exploration of a non-LLM model (specifically, a DNA basecaller) in case the idea is interesting to anyone. Apologies in advance for any inconveniences and mistakes, and thank you in advance for your understanding.

Summary

As a first AI Safety/mech interp learning project, I tried applying activation patching to a DNA sequencing basecaller. A basecaller is a deep learning model that converts time-series electrical signals into a sequence of DNA bases. I looked at data related to errors in DNA sequences. In that data, I found overall MLP dominance, especially in the earlier and later layers. I also found greater self-attention activity in the middle layers and higher activations concentrated in specific attention heads.

This experiment interested me because basecallers are part of the modern DNA sequencing pipeline. Increasing the accuracy of sequencing methods could support work towards pathogen-agnostic surveillance systems. This technology is already highly accurate, but some difficulties remain. Examples include repeated bases (homopolymers) and higher performance on species included in the training data than on other species. Mech interp seems like a potential path to understanding these systematic challenges. Additionally, finding patterns similar to those in LLMs would suggest universality. It could also add to mech interp's insights about the general behavior of deep learning models.

Key limitations prevent my work from producing conclusive insights. Still, I'd love to know if anyone more experienced can glean anything interesting, or can say whether this is a worthwhile direction for future research. The full paper and codebase are also available online for anyone curious.

Self-attention vs MLP recovery and degradation scores for the repeated-base (homopolymer) error group.
Higher scores indicate pa