That's also an interesting one. I think I found it, though it's a little different from what I remembered: it's not just the scientific notation that helps, but more so the addition of position tokens.
https://arxiv.org/abs/2102.13019
The essence of it from the abstract:
"[...] In particular, the model fails to learn addition of five-digit numbers when using subwords (e.g., “32”), and it struggles to learn with character-level representations (e.g., “3 2”). By introducing position tokens (e.g., “3 10e1 2”), the model learns to accurately add and subtract numbers up to 60 digits. We conclude that modern pretrained language models can easily learn arithmetic from very few examples, as long as we use the proper surface representation."
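To make the three surface forms concrete, here's a rough Python sketch that renders a non-negative integer each way and decodes the position-token form back. This is my own illustration, not code from the paper: the function names are hypothetical, and the position-token rule (each digit followed by a `10e<power>` marker, with no marker on the units digit) is an assumption generalized from the single example "3 10e1 2". Note that "10e1" here is just a place-value label meaning 10^1 (tens), not a float literal (in Python, `10e1` would evaluate to 100.0).

```python
def subword(n: int) -> str:
    # Plain decimal string, e.g. 32 -> "32"; the model's tokenizer decides how to split it.
    return str(n)


def char_level(n: int) -> str:
    # One token per digit, space-separated, e.g. 32 -> "3 2".
    return " ".join(str(n))


def position_tokens(n: int) -> str:
    # Hypothetical helper: each digit is followed by a marker for its power of ten;
    # the units digit gets no marker (assumed from the "3 10e1 2" example).
    if n < 0:
        raise ValueError("sketch only handles non-negative integers")
    digits = str(n)
    parts = []
    for i, d in enumerate(digits):
        power = len(digits) - 1 - i
        parts.append(d)
        if power > 0:
            parts.append(f"10e{power}")
    return " ".join(parts)


def decode_position_tokens(s: str) -> int:
    # Inverse of position_tokens: sum digit * 10**power, where a digit with no
    # following "10e<k>" marker is treated as the units digit.
    tokens = s.split()
    total = 0
    i = 0
    while i < len(tokens):
        digit = int(tokens[i])
        if i + 1 < len(tokens) and tokens[i + 1].startswith("10e"):
            power = int(tokens[i + 1][3:])
            i += 2
        else:
            power = 0
            i += 1
        total += digit * 10 ** power
    return total


if __name__ == "__main__":
    for n in (32, 12345):
        print(subword(n), "|", char_level(n), "|", position_tokens(n))
```

The explicit markers tell the model each digit's place value directly, so it doesn't have to infer it from how far the digit sits from the end of the number, which is presumably why this form helps so much in their experiments.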