Toxicity Classifier

Check whether text is toxic, insulting, threatening, or hateful before you post or share it. The tool runs a small AI model entirely in your browser: no account, no upload.

Your data never leaves your device

Frequently Asked Questions

What counts as "toxic"?

The model returns a score for each of six categories defined by the Jigsaw/Conversation AI dataset, a collection of Wikipedia comments used to train moderation models: toxic, severe toxic, obscene, threat, insult, and identity hate. A higher score means the model is more confident that the category applies.
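
The per-category scores can also be read programmatically. The sketch below is a minimal TypeScript illustration, not the tool's actual code. It assumes results arrive as one `{ label, score }` entry per category with scores between 0 and 1, and it uses a hypothetical `flaggedCategories` helper with an arbitrary 0.5 threshold. The label spellings (such as `severe_toxic`) and the sample scores are made up for illustration.

```typescript
// Assumed result shape: one entry per category, score in [0, 1].
interface CategoryScore {
  label: string;
  score: number;
}

// Hypothetical helper: return the labels whose score meets the threshold,
// highest score first. The 0.5 default is illustrative, not a recommendation.
function flaggedCategories(scores: CategoryScore[], threshold = 0.5): string[] {
  return scores
    .filter((s) => s.score >= threshold)
    .sort((a, b) => b.score - a.score)
    .map((s) => s.label);
}

// Made-up scores for illustration only.
const example: CategoryScore[] = [
  { label: "toxic", score: 0.96 },
  { label: "severe_toxic", score: 0.02 },
  { label: "obscene", score: 0.31 },
  { label: "threat", score: 0.01 },
  { label: "insult", score: 0.74 },
  { label: "identity_hate", score: 0.03 },
];

// With the default threshold, only "toxic" and "insult" are flagged.
console.log(flaggedCategories(example));
```

Because each category gets its own score, a single text can be flagged under several categories at once. Lowering the threshold catches more borderline text at the cost of more false alarms.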

Which model is used?

A BERT-based classifier (Xenova/toxic-bert) served as a quantized ONNX file (~60 MB). It runs on WebAssembly via @huggingface/transformers, and the weights are cached in your browser after the first use.

Is my text uploaded anywhere?

No. All classification runs in your browser, and your text is never sent to a server. Only the model weights are downloaded, on first use.

How accurate is it?

The model was trained on English-language online comments, so it performs best on similar text. It can miss sarcasm, dog whistles, and toxicity in other languages. Treat it as an aid, not the sole arbiter.
