Unfortunately, the safety filter has so many false positives (basically any image with a large amount of flesh-colored area) that it's just easier to disable it and handle things manually.
That'll only work for a little while longer for future big-name public model releases (obviously the cat's out of the bag for the current version of Stable Diffusion), right up until they incorporate the filter into the training process.
At that point, the model users get to download will be incapable of producing anything that comes close to triggering the filter, and there will be no way to work around it short of training or fine-tuning your own model, which is prohibitively expensive for 'normal' people, even those with top-of-the-line graphics cards like a 4090.
That problem is being solved. Pornhub now has an AI R&D unit.[1] Their current project is to upscale and colorize out-of-copyright vintage porn. As a training set, they use modern porn. They point out that they have access to a big training set.
> Their current project is to upscale and colorize out-of-copyright vintage porn.
But not very well. I collect this stuff and I have my own copies, so I can tell you that this doesn't look better than the b/w originals in quality/detail, and it's easy to see that the color is not great, especially if there are lots of hard lights and shadows dancing around.
That being said, I don't know why it's not working. Seems like it should work. I'd expect it to at least be clean of scratches and stabilized. Any relevant papers I should read about AI restoration of old film?
This prediction doesn’t track with what is already happening. Dreambooth is allowing all kinds of people to fine-tune their own models at home with Nvidia graphics cards, and people are sharing all kinds of updated models that do really well at specific art styles or with NSFW subjects. Go check the NSFW subreddit unstable_diffusion for examples. It seems lots of people are training NSFW models with their own preferred datasets, and last I saw, someone merged all those checkpoints together into one model.
So if I were to make a prediction, it would be that the training sets for open models from big companies will get scrubbed of NSFW content, nerds on Reddit will release their own versions with it added back in, the big companies will make sure everyone knows they didn’t add that stuff, and that’s where it will stand.
I agree with your prediction. Sorry, I was unclear in my post, and left that part unsaid. I agree that it will likely just be the big newly released 'base' models that will be scrubbed of NSFW images, but there's really no way to prevent these models from making those kinds of images at all.
It will only take a few dedicated individuals, of whom I know there is no shortage.
AI-generated art with Dreambooth works only for avatar-type pictures. It cannot create complex gestures (a complicated hand movement, like patting a cat). For now.
I know a person who fine-tuned Stable Diffusion, and he said it took 2 weeks of training on 8x A100 80 GB GPUs, costing him somewhere between $500 and $700 (and he got a pretty big discount; at today's prices for peer GPU rental, it would be over $1,000).
Sure, it's peanuts compared to what it must have cost to train stable diffusion from scratch. However, I think most normal people would not consider spending $500 to fine-tune one of these.
Edit: Though I do agree that once this kind of filtering is in place during training, NSFW models will begin to pop up all over the place.
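Taking those figures at face value, the implied hourly rate is easy to check: 8 GPUs for 14 days is 2,688 GPU-hours, which puts the discounted bill at roughly $0.19 to $0.26 per A100-hour and the "over $1,000" figure at more than about $0.37 per hour. A quick sketch (the inputs come from the comment above; the hourly rates are derived, not quoted):

```python
# Back-of-the-envelope check of the fine-tuning cost quoted above.
# Inputs are taken from the comment; the per-hour rates are derived from them.

def gpu_hours(num_gpus: int, days: float) -> float:
    """Total GPU-hours for num_gpus running around the clock for `days` days."""
    return num_gpus * days * 24


def implied_rate(total_cost: float, hours: float) -> float:
    """Dollars per GPU-hour implied by a total bill."""
    return total_cost / hours


hours = gpu_hours(num_gpus=8, days=14)  # 8x A100 for 2 weeks
low, high = implied_rate(500, hours), implied_rate(700, hours)
undiscounted = implied_rate(1000, hours)  # the "over $1,000" figure, as a floor

print(f"{hours:.0f} GPU-hours")  # 2688 GPU-hours
print(f"discounted: ${low:.2f}-${high:.2f} per GPU-hour")
print(f"today's prices: more than ${undiscounted:.2f} per GPU-hour")
```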
Spot fine-tuning with Dreambooth (not as good as full fine-tuning, but it can pick up a specific subject or style much faster) can be done with about $0.08 of GPU compute, although optimizing it is harder.
Are these services using textual inversion? If so, I wonder how well they would work on a Stable Diffusion model that was trained with the filter in place from the start, so that it couldn't generate anything close to triggering the filter.
As it is right now, Stable Diffusion can generate adult imagery by itself; however, it seems like it was fine-tuned after the fact to 'cover up' that capability as much as possible before the model was released publicly.
I believe the safety filter is trivial to disable: it was added in one of the last commits before Stable Diffusion’s public release and isn't baked into the model, so most forks just remove the safety checker code [1].
As for textual inversion, JoePenna’s Dreambooth implementation [2] uses Textual Inversion.
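For anyone wondering why removing it is so easy: as far as I understand the released pipeline, the safety checker is a separate post-processing step. It compares a CLIP embedding of the finished image against a fixed set of concept embeddings with per-concept thresholds and replaces the image (with a black one, in the diffusers version) if any threshold is exceeded; it never touches the generator's weights. Here is a toy sketch of that structure, assuming made-up 3-dimensional embeddings, thresholds, and function names (none of these are the real pipeline's API):

```python
import math
from typing import Callable, List, Optional, Sequence

Vector = Sequence[float]
Image = List[List[int]]  # toy grayscale image: rows of 0-255 pixel values


def cosine(a: Vector, b: Vector) -> float:
    """Cosine similarity between two embeddings."""
    dot = sum(x * y for x, y in zip(a, b))
    return dot / (math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b)))


def make_checker(concepts: List[Vector], thresholds: List[float]) -> Callable[[Vector], bool]:
    """Hypothetical checker: flag if the embedding is too close to any concept."""
    def is_flagged(embedding: Vector) -> bool:
        return any(cosine(embedding, c) > t for c, t in zip(concepts, thresholds))
    return is_flagged


def postprocess(embedding: Vector, image: Image,
                checker: Optional[Callable[[Vector], bool]]) -> Image:
    """Runs AFTER generation; the model that produced `image` is never consulted."""
    if checker is not None and checker(embedding):
        return [[0] * len(row) for row in image]  # replace flagged output with black
    return image


checker = make_checker(concepts=[[1.0, 0.0, 0.0]], thresholds=[0.8])
img = [[200, 180], [190, 170]]
print(postprocess([0.9, 0.1, 0.0], img, checker))  # [[0, 0], [0, 0]]
print(postprocess([0.9, 0.1, 0.0], img, None))     # [[200, 180], [190, 170]]
```

Passing `None` as the checker is effectively what the forks do: the image comes back unchanged, and nothing about the generator itself changes.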
This (training a model with no NSFW content) would be preferable to me. No false positives to worry about. People who do want to generate NSFW stuff can fine-tune or train their own model, nobody owes that functionality to them in freely available ones.
Apple ended up not implementing that, IIRC, while Google Photos has had it the whole time.
Google's is actually worse. Apple was only going to match against known CSAM images, while Google uses ML to identify new images, which resulted in one parent being investigated by police over a medical image of their own child.
I am fine with photos that are uploaded to the cloud being scanned. I do not want my own device spending energy scanning images before they have even left my network or device. Google scanning my Google Drive files is fine with me. Apple is much worse.
The Apple one was only going to scan photos stored on iCloud. It scans them on your phone, but if you don’t use iCloud, it doesn’t scan anything. It’s a neat trick that lets Apple's servers know they aren’t storing illegal content without ever having to look at it.
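Part of how that works, per Apple's published technical summary, is private set intersection combined with threshold secret sharing: roughly speaking, each match contributes a share of an account-level key, and the server cannot decrypt anything until an account crosses the match threshold. Below is a minimal sketch of the threshold part only, using textbook Shamir secret sharing. It is not Apple's implementation; the prime, threshold, share count, and key value are arbitrary choices for illustration.

```python
import random
from typing import List, Tuple

PRIME = 2**127 - 1  # a Mersenne prime; all arithmetic is done modulo PRIME
Share = Tuple[int, int]


def split(secret: int, threshold: int, count: int, rng: random.Random) -> List[Share]:
    """Hide `secret` as the constant term of a random polynomial of degree
    threshold-1 and hand out `count` points on it."""
    coeffs = [secret] + [rng.randrange(PRIME) for _ in range(threshold - 1)]

    def poly(x: int) -> int:
        return sum(c * pow(x, i, PRIME) for i, c in enumerate(coeffs)) % PRIME

    return [(x, poly(x)) for x in range(1, count + 1)]


def reconstruct(points: List[Share]) -> int:
    """Lagrange interpolation evaluated at x = 0 recovers the constant term."""
    secret = 0
    for j, (xj, yj) in enumerate(points):
        num, den = 1, 1
        for m, (xm, _) in enumerate(points):
            if m != j:
                num = num * (-xm) % PRIME
                den = den * (xj - xm) % PRIME
        secret = (secret + yj * num * pow(den, -1, PRIME)) % PRIME
    return secret


rng = random.Random(0)
key = 123456789  # stands in for the key that would unlock the vouchers
shares = split(key, threshold=3, count=5, rng=rng)
print(reconstruct(shares[:3]) == key)  # True: 3 shares meet the threshold
print(reconstruct(shares[:2]) == key)  # below threshold: yields an unrelated value
```

With fewer shares than the threshold, every possible key is equally consistent with what the server holds, which is what lets it learn nothing from a handful of matches.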
If it’s anything like the regular scanning iPhones do, it’s done overnight while plugged in.
That makes the phone the scanner, and by definition the scan happens before iCloud. Apple makes it very hard to use an iPhone or iPad without keeping iCloud enabled. I don't want my phone spying on me; that sets a dangerous precedent. I don't care how many times Apple scans my photos after I store them unencrypted on iCloud servers, but scanning them on my device, without my permission, before they leave my device is a violation of my privacy. Apple will look at your photos if your phone detects something, and it will inform the authorities without letting you know. With other companies, once something is detected on their servers, they forward your information to government authorities. Apple instead wanted an in-house team to filter out false positives before notifying authorities. Apple is worse in every single way.
If you have things on your device that match entries in the CSAM database, then yes, there's a chance you're the victim of a targeted attack exploiting highly experimental collisions... but the odds that you "accidentally generated" that content are not realistic.
Are you assuming that digital images are evenly distributed over the set of all possible 256-bit vectors?
Because I don't think that's a reasonable assumption.
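For reference, here is what the uniform assumption would buy you. Under the standard birthday approximation, n items hashed uniformly into b bits contain at least one collision with probability about 1 - exp(-n(n-1)/2^(b+1)). The sketch below evaluates that; the 10^12-image figure is an arbitrary illustration, not a real corpus size. The point of the question above is that a perceptual hash is deliberately not uniform over its outputs (similar images are supposed to land close together), so this number says little about it.

```python
import math


def collision_probability(n: int, bits: int) -> float:
    """Birthday-bound chance that at least two of n items share a hash,
    ASSUMING hashes are uniformly distributed over all 2**bits values."""
    x = n * (n - 1) / 2 ** (bits + 1)
    return -math.expm1(-x)  # 1 - exp(-x), computed accurately for tiny x


print(collision_probability(77_163, 32))   # about 0.5: the classic 32-bit case
print(collision_probability(10**12, 256))  # about 4.3e-54
```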
Even if image recognition were perfectly solved with no known edge cases (ha!), when an entire topic is a semantic stop sign for most people, you can't expect the mysterious opaque box that is a guilty-enough-to-investigate detection mechanism to get rapid updates and corrections when new failure modes are discovered.
You should spend some time with an internet search engine and the term "perceptual hashing". What you're talking about is a different type of hashing (cryptographic hashing), which can be useful for classifying image files, but not images. Cryptographic hashing has a very concrete definition that is specified down to the bit; perceptual hashing is a fuzzy space because it's trying to yield similar (not necessarily identical) hashes for images that humans consider similar. Much different space, much different problem, much different collision situation. Cryptographic hashing is not the only kind of hashing.
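A minimal illustration of the difference, using the textbook "average hash" (aHash) as a stand-in perceptual hash. This is not NeuralHash, which is a learned model; it only demonstrates the property described above: a tiny change a human wouldn't notice completely changes the SHA-256 digest while leaving the perceptual hash identical. The pixel values are made up.

```python
import hashlib
from typing import List

Image = List[List[int]]  # grayscale pixels, 0-255


def average_hash(img: Image) -> int:
    """aHash: one bit per pixel, set if the pixel is brighter than the mean.
    Real implementations first downscale to 8x8; this input is already tiny."""
    pixels = [p for row in img for p in row]
    mean = sum(pixels) / len(pixels)
    bits = 0
    for p in pixels:
        bits = (bits << 1) | (1 if p > mean else 0)
    return bits


def sha256_hex(img: Image) -> str:
    """Cryptographic hash of the raw pixel bytes."""
    return hashlib.sha256(bytes(p for row in img for p in row)).hexdigest()


def hamming(a: int, b: int) -> int:
    """Number of differing bits between two hashes."""
    return bin(a ^ b).count("1")


original = [[10, 200, 30, 220],
            [15, 210, 25, 230],
            [12, 205, 35, 215],
            [18, 195, 28, 225]]
tweaked = [row[:] for row in original]
tweaked[0][0] += 3  # brighten one pixel slightly; visually identical

print(hamming(average_hash(original), average_hash(tweaked)))  # 0
print(sha256_hex(original) == sha256_hex(tweaked))  # False
```

The flip side of that tolerance is that unrelated images can also end up a small Hamming distance apart, which is exactly why the collision analysis differs from the cryptographic case.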
Oh wow, https://www.apple.com/child-safety/pdf/CSAM_Detection_Techni... So they essentially just use CNN output to automatically determine whether to report people to the authorities? For some reason I assumed they were just comparing the files they knew to be CSAM.
Yeah, that's bad. What about DeepDream/CNN reversing? Couldn't a rogue Apple engineer just create an innocuous-looking false positive, say a cat picture, share it on Reddit, and get everybody who downloads it flagged to the police for CSAM?
No. There are two hashes used in the Apple system, one public and neural and one hidden, and the intent of both is to match specific known images, not unknown new ones. The result of passing both hashes is a manual review, not automatic reporting. I've never seen a published attack that would actually be a problem; they all misread how the system worked.
(Also, it's not reported to the police but to NCMEC, which is not a government agency. This is for Fourth Amendment privacy reasons.)
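A toy model of the flow described in that comment: a public hash, a hidden hash, a per-account threshold, and a manual review step at the end. The hash functions, the threshold of 2, and the class and method names are all made up for illustration; they are not Apple's hashes or thresholds. What the sketch does show is why colliding only the public hash doesn't get a match counted, and that crossing the threshold queues a human review rather than sending a report.

```python
import hashlib
from dataclasses import dataclass, field
from typing import Callable, Dict, Set

Hasher = Callable[[bytes], int]


@dataclass
class ToyMatcher:
    """Counts an item as a match only if BOTH hashes hit the known-image sets."""
    hash_a: Hasher
    hash_b: Hasher
    known_a: Set[int]
    known_b: Set[int]
    threshold: int
    matches: Dict[str, int] = field(default_factory=dict)

    def submit(self, account: str, item: bytes) -> bool:
        """Record an upload; return True once the account should go to MANUAL review."""
        if self.hash_a(item) in self.known_a and self.hash_b(item) in self.known_b:
            self.matches[account] = self.matches.get(account, 0) + 1
        return self.matches.get(account, 0) >= self.threshold


def weak_hash(item: bytes) -> int:
    # Hypothetical stand-in for the public hash: deliberately easy to collide.
    return len(item)


def strong_hash(item: bytes) -> int:
    # Hypothetical stand-in for the hidden server-side hash.
    return int.from_bytes(hashlib.sha256(item).digest()[:8], "big")


known = [b"known-image-1", b"known-image-2"]
m = ToyMatcher(weak_hash, strong_hash,
               {weak_hash(k) for k in known}, {strong_hash(k) for k in known},
               threshold=2)
collider = b"x" * len(known[0])  # collides with the public hash only
print(m.submit("alice", collider))  # False: the hidden hash doesn't match
print(m.submit("bob", known[0]))    # False: 1 match, below threshold
print(m.submit("bob", known[1]))    # True: threshold reached, queue for review
```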
The CSAM flagging generally isn’t reported to police to prevent the situation you describe. Google would get the report and once some threshold is reached, a person reviews the report(s) and decides if the police are notified.
How can you be so sure? As I understand it, the hash is of features in the image and not the image itself. Are the CSAM feature detection heuristics public?