I've done some charset detection, although it's been a while. Heuristics kind of work for some things. I'm a big fan of this one: if it's decodable as UTF-8, it's probably UTF-8, unless it contains zero bytes (which most text doesn't). If there are a lot of zero bytes, maybe it's UCS-2 or UTF-16, and you can try to figure out the byte order and check whether it decodes as UTF-16.
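For what it's worth, here is roughly how that heuristic could look in Python. This is a minimal sketch, not a real detector: the function name `guess_charset` is made up for illustration, treating any zero byte (rather than "a lot") as the UTF-16 signal is a simplification, and guessing byte order from where the zeros sit assumes mostly ASCII-range text.

```python
from typing import Optional


def guess_charset(data: bytes) -> Optional[str]:
    """Return a best-guess encoding name, or None if these heuristics give up."""
    if b"\x00" not in data:
        # No zero bytes: if it decodes as UTF-8, assume it is UTF-8.
        try:
            data.decode("utf-8")
            return "utf-8"
        except UnicodeDecodeError:
            return None

    # Zero bytes present: maybe UTF-16 (or UCS-2). Assumption: for mostly
    # ASCII-range text, the zero is the high byte of each 16-bit unit, so
    # zeros at even offsets suggest big-endian and at odd offsets little-endian.
    even_zeros = data[0::2].count(0)
    odd_zeros = data[1::2].count(0)
    candidate = "utf-16-be" if even_zeros > odd_zeros else "utf-16-le"
    try:
        data.decode(candidate)
        return candidate
    except UnicodeDecodeError:
        return None


if __name__ == "__main__":
    for sample in ("héllo".encode("utf-8"),
                   "hello".encode("utf-16-le"),
                   "hello".encode("utf-16-be"),
                   b"\xff\xfe\xfd"):
        print(sample, guess_charset(sample))
```

A `None` result is the "much harder guessing game" case, where the input is probably some legacy 8-bit encoding and these checks can't tell you which one.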

If it doesn't fit in those categories, you've got a much harder guessing game. Usually you can't actually ask the source what the encoding is: they probably don't know, might not understand the question, or might not be contactable. You have to guess something, so if you don't have better information, you may as well lean on someone else's detection work.