
SynthID Text
This project showcases the use of SynthID Text for watermarking text generated by LLMs. The code in this repo also demonstrates how to train a detector for such watermarked text. The detector can be uploaded to a private HF Hub repo (private for security reasons) and initialized again through pretrained model loading, which is also shown in this script.
See our blog post: https://huggingface.co/blog/synthid-text
Python version
You need Python 3.9 to run this example.
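A quick way to confirm the interpreter before installing anything is a small version check. The sketch below is illustrative only: the helper name `python_is_supported` is hypothetical, and it assumes 3.9 acts as a lower bound, which this README does not state explicitly.

```python
import sys

# Assumption: 3.9 is treated as a minimum; the README only names "3.9".
MINIMUM_PYTHON = (3, 9)


def python_is_supported(version_info=sys.version_info, minimum=MINIMUM_PYTHON):
    """Return True when the (major, minor) version is at least `minimum`.

    Hypothetical helper, not part of this project or of transformers.
    """
    return tuple(version_info[:2]) >= tuple(minimum)


# Report the result for the interpreter that is running this snippet.
print("Python", sys.version.split()[0], "supported:", python_is_supported())
```

Running it with the interpreter you plan to use for `detector_training.py` shows whether that interpreter meets the assumed minimum.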
Installation and running
After installing transformers, install the requirements for this project from the requirements.txt file provided in this folder.
pip install -r requirements.txt
To run the detector training
python detector_training.py --model_name=google/gemma-7b-it
Check the script for more tunable parameters, and see the paper at https://www.nature.com/articles/s41586-024-08025-4 for more information on these parameters.
Caveat
Make sure to run the detector training and the detection on the same hardware (CPU, GPU, or TPU) to get consistent results. We use deterministic randomness, which is hardware dependent.
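One way to guard against accidentally mixing hardware is to record the device type next to the trained detector and compare it at detection time. The following is a minimal sketch under stated assumptions: the function names and the JSON layout are hypothetical and are not part of `detector_training.py` or of transformers, and the caller is assumed to know which device type it is running on.

```python
import json

# Device types named in the caveat above.
KNOWN_DEVICE_TYPES = {"cpu", "gpu", "tpu"}


def make_detector_metadata(device_type):
    """Serialize the hardware type used for detector training (hypothetical helper)."""
    device_type = device_type.lower()
    if device_type not in KNOWN_DEVICE_TYPES:
        raise ValueError(f"Unknown device type: {device_type!r}")
    return json.dumps({"trained_on": device_type})


def check_detection_hardware(metadata_json, device_type):
    """Raise RuntimeError if detection runs on different hardware than training."""
    trained_on = json.loads(metadata_json)["trained_on"]
    if device_type.lower() != trained_on:
        raise RuntimeError(
            f"Detector was trained on {trained_on!r} but detection is running on "
            f"{device_type.lower()!r}; results may be inconsistent."
        )


# Example: metadata written after training on a GPU, checked before detection.
metadata = make_detector_metadata("gpu")
check_detection_hardware(metadata, "gpu")  # same hardware, so no error is raised
```

Storing the metadata string alongside the detector weights would let the check run automatically whenever the detector is loaded.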