transformers/docs/source
Kian Sierra McGettigan f7076cd346
Flax mistral (#26943)
* direct copy from llama work

* mistral modules forward pass working

* flax mistral forward pass with sliding window

* added tests

* added layer collection approach

* Revert "added layer collection approach"

This reverts commit 0e2905bf22.

* Revert "Revert "added layer collection approach""

This reverts commit fb17b6187a.

* fixed attention outputs

* added mistral to init and auto

* fixed import name

* fixed layernorm weight dtype

* freeze initialized weights

* make sure conversion considers bfloat16

* added backend

* added docstrings

* added cache (see the cache sketch below)

* fixed sliding window causal mask (see the mask sketch below)

* passes cache tests

* passed all tests

* applied make style

* removed commented out code

* applied fix-copies, ignored other model changes

* applied make fix-copies

* removed unused functions

* passed generation integration test

* slow tests pass

* fixed slow tests

* changed default dtype from jax.numpy.float32 to float32 for docstring check

* skip cache test for FlaxMistralForSequenceClassification since, if pad_token_id is in input_ids, it doesn't score previous input_ids

* updated checkpoint since from_pt is not included

* applied black style

* removed unused args

* Applied styling and fixup

* changed checkpoint for doc back

* fixed ref after adding it to hf hub

* Add dummy ckpt

* applied styling

* added tokenizer to new ckpt

* fixed slice format

* fix init and slice

* changed ref for placeholder TODO

* added copies from Llama

* applied styling

* applied fix-copies

* fixed docs

* update weight dtype reconversion for sharded weights

* removed nullable input ids

* Removed unnecessary output attentions in Module

* added embedding weight initialization

* removed unused past_key_values

* fixed deterministic

* Fixed RMS Norm and added copied from (see the RMSNorm sketch below)

* removed input_embeds

* applied make style

* removed nullable input ids from sequence classification model

* added copied from GPTJ

* added copied from Llama on FlaxMistralDecoderLayer

* added copied from to FlaxMistralPreTrainedModel methods

* fix test deprecation warning

* freeze gpt neox random_params and fix copies

* applied make style

* fixed doc issue

* skipped docstring test to align # copied from

* applied make style

* removed FlaxMistralForSequenceClassification

* removed unused padding_idx

* removed more sequence classification

* removed sequence classification

* applied styling and consistency

* added copied from in tests

* removed sequence classification test logic

* applied styling

* applied make style

* removed freeze and fixed copies

* undo test change

* changed repeat_kv to tile

* fixed to key-value groups

* updated copyright year

* split causal_mask

* empty commit to rerun failed pt_flax_equivalence test in FlaxWav2Vec2ModelTest

* went back to 2023 for tests_pr_documentation_tests

* went back to 2024

* changed tile to repeat (see the repeat_kv sketch below)

* applied make style

* empty commit for retry on Wav2Vec2
2024-01-31 14:19:02 +01:00
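A few of the commit steps above are worth unpacking. The sliding-window causal mask combines the usual causal constraint (a query attends only to itself and earlier positions) with Mistral's window constraint (only the most recent `window` positions are visible). A minimal sketch of the idea, with illustrative names rather than the PR's exact code:

```python
import jax.numpy as jnp

def sliding_window_causal_mask(seq_len: int, window: int) -> jnp.ndarray:
    """Boolean [seq_len, seq_len] mask: query i may attend to key j
    iff j <= i (causal) and i - j < window (sliding window)."""
    rows = jnp.arange(seq_len)[:, None]  # query positions i
    cols = jnp.arange(seq_len)[None, :]  # key positions j
    return (cols <= rows) & ((rows - cols) < window)

# With window=3, position 5 attends only to positions 3, 4 and 5:
# sliding_window_causal_mask(6, 3)[5] -> [F, F, F, T, T, T]
```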
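"added cache" refers to Flax's standard key/value cache pattern for autoregressive decoding, where cached keys and values live in a mutable "cache" variable collection. A simplified sketch of that pattern, assuming a [batch, seq, heads, head_dim] layout (the real model pre-allocates the cache to max_length in a separate init pass):

```python
import jax.numpy as jnp
import flax.linen as nn
from jax import lax

class KVCacheSketch(nn.Module):
    """Sketch of the Flax KV-cache pattern, not the PR's exact code."""

    @nn.compact
    def __call__(self, key, value):
        # key/value: [batch, cur_len, num_heads, head_dim]
        is_initialized = self.has_variable("cache", "cached_key")
        cached_key = self.variable("cache", "cached_key", jnp.zeros, key.shape, key.dtype)
        cached_value = self.variable("cache", "cached_value", jnp.zeros, value.shape, value.dtype)
        cache_index = self.variable("cache", "cache_index", lambda: jnp.array(0, dtype=jnp.int32))
        if is_initialized:
            cur_len = key.shape[1]
            idx = cache_index.value
            # write the new step(s) into the cache at the current position
            key = lax.dynamic_update_slice(cached_key.value, key, (0, idx, 0, 0))
            value = lax.dynamic_update_slice(cached_value.value, value, (0, idx, 0, 0))
            cached_key.value, cached_value.value = key, value
            cache_index.value = idx + cur_len
        return key, value
```

When calling `apply`, the "cache" collection must be passed as mutable (`mutable=["cache"]`) so the updates persist across decoding steps.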
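"changed repeat_kv to tile" and the later "changed tile to repeat" concern grouped-query attention, where each key/value head is shared by a group of query heads. `jnp.repeat` duplicates each kv head consecutively (k0, k0, k1, k1), while `jnp.tile` repeats the whole block (k0, k1, k0, k1); only the consecutive layout matches the usual convention that query head h uses kv head h // n_rep. A sketch with illustrative names:

```python
import jax.numpy as jnp

def repeat_kv(hidden_states: jnp.ndarray, n_rep: int) -> jnp.ndarray:
    """Expand [batch, seq, num_kv_heads, head_dim] to
    [batch, seq, num_kv_heads * n_rep, head_dim].
    jnp.repeat gives the consecutive layout (k0, k0, k1, k1) expected by
    the head grouping; jnp.tile would interleave blocks (k0, k1, k0, k1)
    and pair query heads with the wrong kv heads."""
    return jnp.repeat(hidden_states, n_rep, axis=2)
```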
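"fixed layernorm weight dtype" and "Fixed RMS Norm" relate to Mistral's RMS normalization, where the learned scale is kept in float32 and the variance is computed in float32 even when activations are bfloat16. A minimal sketch, assuming the usual RMSNorm formulation (not the PR's exact code):

```python
import jax.numpy as jnp
import flax.linen as nn

class RMSNormSketch(nn.Module):
    hidden_size: int
    eps: float = 1e-6
    dtype: jnp.dtype = jnp.float32  # dtype of the output activations

    @nn.compact
    def __call__(self, hidden_states):
        # keep the learned scale in float32 regardless of activation dtype
        weight = self.param("weight", lambda rng: jnp.ones(self.hidden_size, dtype=jnp.float32))
        # compute the variance in float32 for numerical stability
        variance = jnp.mean(jnp.square(hidden_states.astype(jnp.float32)), axis=-1, keepdims=True)
        hidden_states = hidden_states * (1.0 / jnp.sqrt(variance + self.eps))
        return (weight * hidden_states).astype(self.dtype)
```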
| Name | Last commit | Date |
| --- | --- | --- |
| de | Fix broken link on page (#28451) | 2024-01-11 09:26:13 -08:00 |
| en | Flax mistral (#26943) | 2024-01-31 14:19:02 +01:00 |
| es | Fix broken link on page (#28451) | 2024-01-11 09:26:13 -08:00 |
| fr | README: install transformers from conda-forge channel (#28313) | 2024-01-04 09:36:16 -08:00 |
| hi | Hindi translation of pipeline_tutorial.md (#26837) | 2023-10-25 11:21:49 -07:00 |
| it | TF: purge TFTrainer (#28483) | 2024-01-12 16:56:34 +00:00 |
| ja | Add tf_keras imports to prepare for Keras 3 (#28588) | 2024-01-30 17:26:36 +00:00 |
| ko | Fix broken link on page (#28451) | 2024-01-11 09:26:13 -08:00 |
| ms | TVP model (#25856) | 2023-11-21 16:41:55 +00:00 |
| pt | README: install transformers from conda-forge channel (#28313) | 2024-01-04 09:36:16 -08:00 |
| te | Added Telugu [te] translations (#26828) | 2023-10-20 15:27:55 -07:00 |
| tr | Translate index.md to Turkish (#27093) | 2023-11-08 08:35:20 -05:00 |
| zh | Generate: consolidate output classes (#28494) | 2024-01-15 17:04:08 +00:00 |
| _config.py | [Styling] stylify using ruff (#27144) | 2023-11-16 17:43:19 +01:00 |