Zentropi CoPE

Zentropi CoPE (Content Policy Enforcement) is a policy-adaptive AI text classifier. Unlike classifiers with fixed taxonomies, CoPE has no predefined categories; instead, you write your own policy text describing what you want to detect, and the model classifies content against those policies. This makes it particularly useful for platforms with nuanced or unusual content policies that off-the-shelf classifiers handle poorly.

The model powering the integration is CoPE-A-9B (version 1.x, released July 2025).

Requirements

A Zentropi account with API access
One or more labeler versions created in the Zentropi UI, each with a policy definition

Configuration

In Coop, go to Settings → Integrations and add your Zentropi credentials:

API Key: your Zentropi API key
Labeler Versions (optional): a list of labeler version IDs and labels you’ve created in the Zentropi UI. Adding them here makes them available by name when building rules.

Signals

Each Zentropi labeler version you’ve created in the Zentropi UI is a separate signal in Coop. When building a rule condition, select the Zentropi signal and enter the labeler version ID in the subcategory field.

Coop sends the text field value to the Zentropi API and receives a score between 0 and 1:

0 = confidently safe (model is confident the content does not violate your policy)
0.5 = uncertain
1 = confidently violating (model is confident the content violates your policy)

This score can be used with any comparator in a rule condition, for example score > 0.8 to trigger only on high-confidence violations.

Writing a policy

Zentropi classifiers work best when policy definitions follow a structured format:

Overview: a brief description of the policy subject
Definition of Terms: precise definitions of key words and phrases
Interpretation of Language: guidance on how to handle ambiguous language
Definition of Labels: what is included and excluded from the label

The Zentropi documentation and sample code notebook walk through policy authoring in detail.

Limitations

Text only: the integration classifies text fields; image and video content are not supported
8,000 token limit: text longer than 8K tokens will be truncated
US English only: performance degrades significantly for other languages and locales
Binary classification: each labeler version returns either “violating” (1) or “not violating” (0) with a confidence score; there are no intermediate categories or multi-label outputs
Policy design matters: the model cannot classify content that requires external verification (e.g., whether a link is malicious). Biases in the training data may affect classification patterns across demographic groups; monitor and audit decisions regularly.

Model Card


Model	CoPE-A-9B
Version	1.x
Release date	July 20, 2025
Training data	~60,000 labels across unique policy/content pairs; mix of automated and manual annotation; covers hate speech, sexual content, self-harm, harassment, and toxicity
Annotation methodology	Novel training methodology for policy interpretation rather than memorization; trained across conflicting policy formulations
Performance	Hate Speech: 91% (internal), 84% (public Ethos benchmark); Sexual Content: 89%; Toxic Speech: 90%; Self-Harm: 88%; Harassment: 73%
Compared to	Outperforms GPT-4o, Llama-3.1-8B, LlamaGuard3-8B, and ShieldGemma-9B across most categories

Coop Documentation