transformers

mirror of https://github.com/huggingface/transformers.git synced 2025-07-16 11:08:23 +06:00

Kian Sierra McGettigan f7076cd346 Flax mistral (#26943)	2024-01-31 14:19:02 +01:00

* direct copy from llama work
* mistral modules forward pass working
* flax mistral forward pass with sliding window
* added tests
* added layer collection approach
* Revert "added layer collection approach" This reverts commit `0e2905bf22`.
* Revert "Revert "added layer collection approach"" This reverts commit `fb17b6187a`.
* fixed attention outputs
* added mistral to init and auto
* fixed import name
* fixed layernorm weight dtype
* freeze initialized weights
* make sure conversion consideres bfloat16
* added backend
* added docstrings
* added cache
* fixed sliding window causal mask
* passes cache tests
* passed all tests
* applied make style
* removed commented out code
* applied fix-copies ignored other model changes
* applied make fix-copies
* removed unused functions
* passed generation integration test
* slow tests pass
* fixed slow tests
* changed default dtype from jax.numpy.float32 to float32 for docstring check
* skip cache test for FlaxMistralForSequenceClassification since if pad_token_id in input_ids it doesn't score previous input_ids
* updated checkpoint since from_pt not included
* applied black style
* removed unused args
* Applied styling and fixup
* changed checkpoint for doc back
* fixed rf after adding it to hf hub
* Add dummy ckpt
* applied styling
* added tokenizer to new ckpt
* fixed slice format
* fix init and slice
* changed ref for placeholder TODO
* added copies from Llama
* applied styling
* applied fix-copies
* fixed docs
* update weight dtype reconversion for sharded weights
* removed Nullable input ids
* Removed unnecessary output attentions in Module
* added embedding weight initialziation
* removed unused past_key_values
* fixed deterministic
* Fixed RMS Norm and added copied from
* removed input_embeds
* applied make style
* removed nullable input ids from sequence classification model
* added copied from GPTJ
* added copied from Llama on FlaxMistralDecoderLayer
* added copied from to FlaxMistralPreTrainedModel methods
* fix test deprecation warning
* freeze gpt neox random_params and fix copies
* applied make style
* fixed doc issue
* skipped docstring test to allign # copied from
* applied make style
* removed FlaxMistralForSequenceClassification
* removed unused padding_idx
* removed more sequence classification
* removed sequence classification
* applied styling and consistency
* added copied from in tests
* removed sequence classification test logic
* applied styling
* applied make style
* removed freeze and fixed copies
* undo test change
* changed repeat_kv to tile
* fixed to key value groups
* updated copyright year
* split casual_mask
* empty to rerun failed pt_flax_equivalence test FlaxWav2Vec2ModelTest
* went back to 2023 for tests_pr_documentation_tests
* went back to 2024
* changed tile to repeat
* applied make style
* empty for retry on Wav2Vec2

internal	Generate: consolidate output classes (#28494)	2024-01-15 17:04:08 +00:00
main_classes	Improve Backbone API docs (#28666)	2024-01-25 11:51:58 +00:00
model_doc	Flax mistral (#26943)	2024-01-31 14:19:02 +01:00
tasks	[docs] Fix datasets in guides (#28715)	2024-01-26 09:29:07 -08:00
_config.py	[`Styling`] stylify using ruff (#27144)	2023-11-16 17:43:19 +01:00
_redirects.yml	Extended semantic segmentation to image segmentation (#27039)	2023-11-23 15:58:21 +00:00
_toctree.yml	[`HfQuantizer`] Move it to "Developper guides" (#28768)	2024-01-30 07:20:20 +01:00
accelerate.md	Fix typos (#25936)	2023-09-04 11:15:12 +01:00
add_new_model.md	Update add_new_model.md (#26365)	2023-09-25 12:58:11 +02:00
add_new_pipeline.md	Fix broken link on page (#28451)	2024-01-11 09:26:13 -08:00
add_tensorflow_model.md	Remove `utils/documentation_tests.txt` (#26213)	2023-09-18 13:33:01 +02:00
attention.md	Migrate doc files to Markdown. (#24376)	2023-06-20 18:07:47 -04:00
autoclass_tutorial.md	Docs for AutoBackbone & Backbone (#27456)	2023-12-11 08:22:17 -05:00
benchmarks.md	Migrate doc files to Markdown. (#24376)	2023-06-20 18:07:47 -04:00
bertology.md	Migrate doc files to Markdown. (#24376)	2023-06-20 18:07:47 -04:00
big_models.md	Fix typos (#25936)	2023-09-04 11:15:12 +01:00
chat_templating.md	Update chat template warnings/guides (#27634)	2023-11-27 18:40:10 +00:00
community.md	Update community.md (#25928)	2023-09-04 11:16:34 +01:00
contributing.md	Enable doc in Spanish (#16518)	2022-04-04 10:25:46 -04:00
create_a_model.md	[docs] fixed links with 404 (#27327)	2023-11-06 19:45:03 +00:00
custom_models.md	Add config tip to custom model docs (#28601)	2024-01-22 13:46:04 +00:00
custom_tools.md	[doc] Always call it Agents for consistency (#25958)	2023-09-05 12:27:20 +01:00
debugging.md	[docs] DeepSpeed (#28542)	2024-01-24 08:31:28 -08:00
deepspeed.md	[docs] Fix doc format (#28684)	2024-01-24 11:18:59 -08:00
fast_tokenizers.md	Migrate doc files to Markdown. (#24376)	2023-06-20 18:07:47 -04:00
fsdp.md	[docs] Trainer docs (#28145)	2023-12-20 10:37:23 -08:00
generation_strategies.md	Generate: fix speculative decoding (#28166)	2023-12-20 18:55:35 +00:00
glossary.md	[Doc] Spanish translation of glossary.md (#27958)	2023-12-13 09:21:59 -08:00
hf_quantizer.md	`HfQuantizer` class for quantization-related stuff in `modeling_utils.py` (#26610)	2024-01-30 02:48:25 +01:00
hpo_train.md	Remove-auth-token (#27060)	2023-11-13 14:20:54 +01:00
index.md	Flax mistral (#26943)	2024-01-31 14:19:02 +01:00
installation.md	README: install transformers from conda-forge channel (#28313)	2024-01-04 09:36:16 -08:00
llm_tutorial_optimization.md	F.scaled_dot_product_attention support (#26572)	2023-12-09 05:38:14 +09:00
llm_tutorial.md	Generate: All logits processors are documented and have examples (#27796)	2023-12-07 15:11:35 +00:00
model_memory_anatomy.md	Fix typos (#25936)	2023-09-04 11:15:12 +01:00
model_sharing.md	Fix typos (#25936)	2023-09-04 11:15:12 +01:00
model_summary.md	Migrate doc files to Markdown. (#24376)	2023-06-20 18:07:47 -04:00
multilingual.md	Fix typo in example code (#25583)	2023-08-18 07:58:59 +02:00
notebooks.md	Enable doc in Spanish (#16518)	2022-04-04 10:25:46 -04:00
pad_truncation.md	[Doc] Spanish translation of pad_truncation.md (#27890)	2023-12-08 10:32:18 -08:00
peft.md	[`Peft`] `modules_to_save` support for peft integration (#27466)	2023-11-14 10:32:57 +01:00
perf_hardware.md	docs: replace torch.distributed.run by torchrun (#27528)	2023-11-27 16:26:33 +00:00
perf_infer_cpu.md	[docs] Update CPU/GPU inference docs (#26881)	2023-10-31 09:44:51 -07:00
perf_infer_gpu_one.md	Add qwen2 (#28436)	2024-01-17 16:02:22 +01:00
perf_torch_compile.md	Fix rendering for `torch.compile()` docs (#25432)	2023-08-10 13:25:00 +02:00
perf_train_cpu_many.md	Doc (#28431)	2024-01-11 08:55:48 -08:00
perf_train_cpu.md	improve efficient training on CPU documentation (#28646)	2024-01-24 09:07:13 -08:00
perf_train_gpu_many.md	[`docs`] Improve visualization for vertical parallelism (#28583)	2024-01-25 17:55:11 +00:00
perf_train_gpu_one.md	Improving Training Performance and Scalability Documentation (#28497)	2024-01-16 11:30:26 +01:00
perf_train_special.md	[docs] MPS (#28016)	2023-12-15 13:17:29 -08:00
perf_train_tpu_tf.md	Migrate doc files to Markdown. (#24376)	2023-06-20 18:07:47 -04:00
performance.md	[docs] Update CPU/GPU inference docs (#26881)	2023-10-31 09:44:51 -07:00
perplexity.md	Migrate doc files to Markdown. (#24376)	2023-06-20 18:07:47 -04:00
philosophy.md	[docs] fixed links with 404 (#27327)	2023-11-06 19:45:03 +00:00
pipeline_tutorial.md	[ASR Pipe] Improve docs and error messages (#26476)	2023-09-29 18:32:37 +01:00
pipeline_webserver.md	Suggestions on Pipeline_webserver (#25570)	2023-08-18 10:17:44 +02:00
pr_checks.md	Docstring check (#26052)	2023-10-04 15:13:37 +02:00
preprocessing.md	[`docs`] Update preprocessing.md (#28719)	2024-01-26 11:58:57 +00:00
quantization.md	`HfQuantizer` class for quantization-related stuff in `modeling_utils.py` (#26610)	2024-01-30 02:48:25 +01:00
quicktour.md	[TYPO] fix typo/format in quicktour.md (#25519)	2023-08-16 08:03:23 +02:00
run_scripts.md	docs: replace torch.distributed.run by torchrun (#27528)	2023-11-27 16:26:33 +00:00
sagemaker.md	[docs] fixed links with 404 (#27327)	2023-11-06 19:45:03 +00:00
serialization.md	Migrate doc files to Markdown. (#24376)	2023-06-20 18:07:47 -04:00
task_summary.md	[Doc] Fix token link in What 🤗 Transformers can do (#28123)	2023-12-18 15:06:54 -08:00
tasks_explained.md	Migrate doc files to Markdown. (#24376)	2023-06-20 18:07:47 -04:00
testing.md	[docs] DeepSpeed (#28542)	2024-01-24 08:31:28 -08:00
tf_xla.md	Migrate doc files to Markdown. (#24376)	2023-06-20 18:07:47 -04:00
tflite.md	Migrate doc files to Markdown. (#24376)	2023-06-20 18:07:47 -04:00
tokenizer_summary.md	Fix typo: Roberta -> RoBERTa (#25302)	2023-08-03 14:17:30 -07:00
torchscript.md	Migrate doc files to Markdown. (#24376)	2023-06-20 18:07:47 -04:00
trainer.md	[docs] Trainer docs (#28145)	2023-12-20 10:37:23 -08:00
training.md	Fix semantic error in evaluation section (#27675)	2023-11-24 12:41:16 +01:00
transformers_agents.md	[doc] Always call it Agents for consistency (#25958)	2023-09-05 12:27:20 +01:00
troubleshooting.md	Migrate doc files to Markdown. (#24376)	2023-06-20 18:07:47 -04:00