That's also an interesting one, but I think I found it, but it's a little differ...

That's also an interesting one, but I think I found it, but it's a little different from what I remembered because it's not just the scientific notation that helps but more so the addition of positional tokens:

https://arxiv.org/abs/2102.13019

The essence of it from the abstract:

". In particular, the model fails to learn addition of five-digit numbers when using subwords (e.g., “32”), and it struggles to learn with character-level representations (e.g., “3 2”). By introducing position tokens (e.g., “3 10e1 2”), the model learns to accurately add and subtract numbers up to 60 digits. We conclude that modern pretrained language models can easily learn arithmetic from very few examples, as long as we use the proper surface representation."