How does this technology work? I'd think it would have to have a huge cache of video to make this work.
Update: I checked IntoNow and this is what they say:
"IntoNow, which is based on our patented platform SoundPrint, analyzes the ambient audio being generated from your television in three-second increments. The audio is then converted into a “fingerprint”—basically, the show’s unique signature for ID—that is matched on the back-end to our reference set (which covers 130 channels of live broadcasting and has more than five years history). Once we make a match, we return all the metadata associated with that show and episode—things like title, description, cast, and associate links. This all happens in seconds."
So it actually uses the audio, which is much better, although I watch a lot of TV in bed with closed captions (CC) on.
I still don't know how they analyze shows airing for the first time. They could recognize voices, but what if an actor appears in two different TV shows?
Presumably, if they're on the East Coast, they get the shows first (at least for the major networks and national cable). And if you're watching at the same time as that first airing, they must be encoding in real time.
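For anyone curious what "fingerprint a three-second chunk and match it against a reference set" could look like, here is a toy sketch in Python. To be clear, this is not IntoNow's SoundPrint algorithm, which isn't described in the quote. It assumes a made-up scheme: measure the power at a handful of probe frequencies with the Goertzel algorithm, turn "is this band louder than the next one?" into bits, and pick the reference window with the smallest Hamming distance. Every name here (`fingerprint`, `build_reference`, `identify`, `tone_mix`), the 8 kHz sample rate, and the probe frequencies are hypothetical, and the "shows" are just synthetic tones.

```python
import math
import random

SAMPLE_RATE = 8000      # Hz; assumed for this toy example
WINDOW_SECONDS = 3      # the three-second increment mentioned in the quote
BANDS = [300, 500, 800, 1200, 1800, 2500, 3200]  # Hz; arbitrary probe frequencies


def goertzel_power(samples, freq, rate=SAMPLE_RATE):
    """Signal power near `freq`, computed with the Goertzel algorithm."""
    coeff = 2.0 * math.cos(2.0 * math.pi * freq / rate)
    s_prev, s_prev2 = 0.0, 0.0
    for x in samples:
        s = x + coeff * s_prev - s_prev2
        s_prev2, s_prev = s_prev, s
    return s_prev ** 2 + s_prev2 ** 2 - coeff * s_prev * s_prev2


def windows(samples, rate=SAMPLE_RATE, seconds=WINDOW_SECONDS):
    """Yield consecutive, non-overlapping windows of `seconds` length."""
    size = rate * seconds
    for start in range(0, len(samples) - size + 1, size):
        yield samples[start:start + size]


def fingerprint(window):
    """One bit per pair of adjacent bands: 1 if the lower band is louder."""
    powers = [goertzel_power(window, f) for f in BANDS]
    return tuple(int(a > b) for a, b in zip(powers, powers[1:]))


def hamming(a, b):
    """Number of positions where two fingerprints differ."""
    return sum(x != y for x, y in zip(a, b))


def build_reference(shows):
    """Map {title: samples} to a list of (fingerprint, title, window_index)."""
    reference = []
    for title, samples in shows.items():
        for index, window in enumerate(windows(samples)):
            reference.append((fingerprint(window), title, index))
    return reference


def identify(clip, reference):
    """Return (title, window_index) of the closest reference fingerprint."""
    fp = fingerprint(clip)
    best = min(reference, key=lambda entry: hamming(fp, entry[0]))
    return best[1], best[2]


def tone_mix(freqs, seconds, noise=0.0, seed=0):
    """Synthetic stand-in for show audio: summed sine tones plus Gaussian noise."""
    rng = random.Random(seed)
    n = SAMPLE_RATE * seconds
    return [
        sum(math.sin(2.0 * math.pi * f * t / SAMPLE_RATE) for f in freqs)
        + rng.gauss(0.0, noise)
        for t in range(n)
    ]


# Two fake "shows" in the reference set, then a noisy clip of the first one,
# standing in for audio picked up by a phone microphone.
shows = {
    "show_a": tone_mix([300, 1200, 2500], WINDOW_SECONDS),
    "show_b": tone_mix([500, 1800, 3200], WINDOW_SECONDS),
}
reference = build_reference(shows)
clip = tone_mix([300, 1200, 2500], WINDOW_SECONDS, noise=0.1, seed=1)
print(identify(clip, reference))
```

Under this sketch, nothing needs to know the actors or the voices: a first-run episode only has to be fingerprinted as it airs and appended to `reference`, which would fit the "encoding in real time" guess above. A real system would need far more robust features and a proper index instead of a linear scan.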