In the past, I got good results with trying to reduce the variance in entropy in-between tokens, which you can implement very easily by starting with each single character as its own token and then doing a greedy merge of the most numerous outlier token pairs in a loop until you reach your desired token count. https://arxiv.org/abs/2206.12693