Amazingly, the majority of the human genome is made up of repeated
sequences. Repetitions have been shown to be connected with diseases such
as cancer, myotonic dystrophy, and Huntington’s disease, and with important
phenomena such as chromosome fragility, expansion diseases, gene
silencing, and rapid morphological variation. Repetitions are common in
other species as well, and are claimed to be a major evolutionary
force during vertebrate evolution.
In this work we mathematically model string duplication, and ask
several coding-theoretic questions:
1. Is there new information created strictly by duplication? What is
the capacity of such systems?
2. Can string duplication account for diversity? Can we reach every
We also mention other results concerning probabilistic models and
error-correcting codes. The talk is based on joint work with Ohad
Elishco, Farzad Farnoud, Siddharth Jain, and Jehoshua Bruck.