Toxicity Classifier
Check whether text is toxic, insulting, threatening, or hateful before you post or share it. Runs a small AI model 100% in your browser — no account, no upload.
Your data never leaves your device
Frequently Asked Questions
-
What counts as "toxic"?
The model returns a score for each of six categories defined by the Jigsaw/Conversation AI dataset of Wikipedia comments, which is used to train moderation models: toxic, severe toxic, obscene, threat, insult, and identity hate. A higher score means the model is more confident that the category applies, and a single text can score high in several categories at once.
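One way to read the six scores is to compare each one against a cutoff and list the categories that meet it. The TypeScript sketch below assumes the scores arrive as label/score pairs with values between 0 and 1. The `CategoryScore` type, the `flaggedCategories` helper, the label spellings, the sample numbers, and the 0.5 cutoff are all illustrative assumptions, not values taken from this tool.

```typescript
// Hypothetical shape of one entry in the classifier output (an assumption).
interface CategoryScore {
  label: string;
  score: number; // assumed to lie between 0 and 1
}

// Illustrative cutoff only; choose one that suits your own tolerance.
const DEFAULT_THRESHOLD = 0.5;

// Return the labels whose score meets the threshold, highest score first.
function flaggedCategories(
  scores: CategoryScore[],
  threshold: number = DEFAULT_THRESHOLD,
): string[] {
  return scores
    .filter((entry) => entry.score >= threshold) // filter() copies, so the input is not mutated
    .sort((a, b) => b.score - a.score)
    .map((entry) => entry.label);
}

// Made-up scores for demonstration; not real model output.
const sampleScores: CategoryScore[] = [
  { label: "toxic", score: 0.93 },
  { label: "severe_toxic", score: 0.04 },
  { label: "obscene", score: 0.21 },
  { label: "threat", score: 0.01 },
  { label: "insult", score: 0.78 },
  { label: "identity_hate", score: 0.02 },
];

console.log(flaggedCategories(sampleScores)); // ["toxic", "insult"]
console.log(flaggedCategories(sampleScores, 0.2)); // ["toxic", "insult", "obscene"]
```

Lowering the cutoff flags more borderline text at the cost of more false alarms, so the right value depends on how the result will be used.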
-
Which model is used?
A BERT-based classifier (Xenova/toxic-bert) served as a quantized ONNX file (~60 MB). It runs on WebAssembly via @huggingface/transformers, and the weights are cached in your browser after the first use.
-
Is my text uploaded anywhere?
No. All classification runs in your browser, and your text is never sent to a server. Only the model files are downloaded.
-
How accurate is it?
The model is trained on English-language online comments, so it performs best on similar content. It can miss sarcasm, dog whistles, and non-English toxicity. Treat it as an aid, not as the sole arbiter.