transformers/docs/source
Kian Sierra McGettigan f7076cd346
Flax mistral (#26943)
* direct copy from llama work

* mistral modules forward pass working

* flax mistral forward pass with sliding window

* added tests

* added layer collection approach

* Revert "added layer collection approach"

This reverts commit 0e2905bf22.

* Revert "Revert "added layer collection approach""

This reverts commit fb17b6187a.

* fixed attention outputs

* added mistral to init and auto

* fixed import name

* fixed layernorm weight dtype

* freeze initialized weights

* make sure conversion considers bfloat16

* added backend

* added docstrings

* added cache (see the cache sketch below)

* fixed sliding window causal mask (see the mask sketch below)

* passes cache tests

* passed all tests

* applied make style

* removed commented out code

* applied fix-copies, ignored other model changes

* applied make fix-copies

* removed unused functions

* passed generation integration test

* slow tests pass

* fixed slow tests

* changed default dtype from jax.numpy.float32 to float32 for docstring check

* skip cache test for FlaxMistralForSequenceClassification since, if pad_token_id is in input_ids, it doesn't score previous input_ids

* updated checkpoint since from_pt is not included

* applied black style

* removed unused args

* Applied styling and fixup

* changed checkpoint for doc back

* fixed ref after adding it to hf hub

* Add dummy ckpt

* applied styling

* added tokenizer to new ckpt

* fixed slice format

* fix init and slice

* changed ref for placeholder TODO

* added copies from Llama

* applied styling

* applied fix-copies

* fixed docs

* update weight dtype reconversion for sharded weights

* removed nullable input ids

* Removed unnecessary output attentions in Module

* added embedding weight initialization

* removed unused past_key_values

* fixed deterministic

* Fixed RMS Norm and added copied from (see the RMSNorm sketch below)

* removed input_embeds

* applied make style

* removed nullable input ids from sequence classification model

* added copied from GPTJ

* added copied from Llama on FlaxMistralDecoderLayer

* added copied from to FlaxMistralPreTrainedModel methods

* fix test deprecation warning

* freeze gpt neox random_params and fix copies

* applied make style

* fixed doc issue

* skipped docstring test to align # copied from

* applied make style

* removed FlaxMistralForSequenceClassification

* removed unused padding_idx

* removed more sequence classification

* removed sequence classification

* applied styling and consistency

* added copied from in tests

* removed sequence classification test logic

* applied styling

* applied make style

* removed freeze and fixed copies

* undo test change

* changed repeat_kv to tile

* fixed to key-value groups

* updated copyright year

* split causal_mask

* empty commit to rerun failed pt_flax_equivalence test in FlaxWav2Vec2ModelTest

* went back to 2023 for tests_pr_documentation_tests

* went back to 2024

* changed tile to repeat (see the repeat_kv sketch below)

* applied make style

* empty commit for retry on Wav2Vec2
2024-01-31 14:19:02 +01:00
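A few of the commit steps above are worth unpacking. The sliding-window causal mask combines the usual causal constraint (a query attends only to itself and earlier positions) with Mistral's window constraint (only the most recent `window` positions are visible). A minimal sketch of the idea, with illustrative names rather than the PR's exact code:

```python
import jax.numpy as jnp

def sliding_window_causal_mask(seq_len: int, window: int) -> jnp.ndarray:
    """Boolean [seq_len, seq_len] mask: query i may attend to key j
    iff j <= i (causal) and i - j < window (sliding window)."""
    rows = jnp.arange(seq_len)[:, None]  # query positions i
    cols = jnp.arange(seq_len)[None, :]  # key positions j
    return (cols <= rows) & ((rows - cols) < window)

# With window=3, position 5 attends only to positions 3, 4 and 5:
# sliding_window_causal_mask(6, 3)[5] -> [F, F, F, T, T, T]
```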
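"added cache" refers to Flax's standard key/value cache pattern for autoregressive decoding, where cached keys and values live in a mutable "cache" variable collection. A simplified sketch of that pattern, assuming a [batch, seq, heads, head_dim] layout (the real model pre-allocates the cache to max_length in a separate init pass):

```python
import jax.numpy as jnp
import flax.linen as nn
from jax import lax

class KVCacheSketch(nn.Module):
    """Sketch of the Flax KV-cache pattern, not the PR's exact code."""

    @nn.compact
    def __call__(self, key, value):
        # key/value: [batch, cur_len, num_heads, head_dim]
        is_initialized = self.has_variable("cache", "cached_key")
        cached_key = self.variable("cache", "cached_key", jnp.zeros, key.shape, key.dtype)
        cached_value = self.variable("cache", "cached_value", jnp.zeros, value.shape, value.dtype)
        cache_index = self.variable("cache", "cache_index", lambda: jnp.array(0, dtype=jnp.int32))
        if is_initialized:
            cur_len = key.shape[1]
            idx = cache_index.value
            # write the new step(s) into the cache at the current position
            key = lax.dynamic_update_slice(cached_key.value, key, (0, idx, 0, 0))
            value = lax.dynamic_update_slice(cached_value.value, value, (0, idx, 0, 0))
            cached_key.value, cached_value.value = key, value
            cache_index.value = idx + cur_len
        return key, value
```

When calling `apply`, the "cache" collection must be passed as mutable (`mutable=["cache"]`) so the updates persist across decoding steps.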
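"changed repeat_kv to tile" and the later "changed tile to repeat" concern grouped-query attention, where each key/value head is shared by a group of query heads. `jnp.repeat` duplicates each kv head consecutively (k0, k0, k1, k1), while `jnp.tile` repeats the whole block (k0, k1, k0, k1); only the consecutive layout matches the usual convention that query head h uses kv head h // n_rep. A sketch with illustrative names:

```python
import jax.numpy as jnp

def repeat_kv(hidden_states: jnp.ndarray, n_rep: int) -> jnp.ndarray:
    """Expand [batch, seq, num_kv_heads, head_dim] to
    [batch, seq, num_kv_heads * n_rep, head_dim].
    jnp.repeat gives the consecutive layout (k0, k0, k1, k1) expected by
    the head grouping; jnp.tile would interleave blocks (k0, k1, k0, k1)
    and pair query heads with the wrong kv heads."""
    return jnp.repeat(hidden_states, n_rep, axis=2)
```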
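"fixed layernorm weight dtype" and "Fixed RMS Norm" relate to Mistral's RMS normalization, where the learned scale is kept in float32 and the variance is computed in float32 even when activations are bfloat16. A minimal sketch, assuming the usual RMSNorm formulation (not the PR's exact code):

```python
import jax.numpy as jnp
import flax.linen as nn

class RMSNormSketch(nn.Module):
    hidden_size: int
    eps: float = 1e-6
    dtype: jnp.dtype = jnp.float32  # dtype of the output activations

    @nn.compact
    def __call__(self, hidden_states):
        # keep the learned scale in float32 regardless of activation dtype
        weight = self.param("weight", lambda rng: jnp.ones(self.hidden_size, dtype=jnp.float32))
        # compute the variance in float32 for numerical stability
        variance = jnp.mean(jnp.square(hidden_states.astype(jnp.float32)), axis=-1, keepdims=True)
        hidden_states = hidden_states * (1.0 / jnp.sqrt(variance + self.eps))
        return (weight * hidden_states).astype(self.dtype)
```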
| Name | Last commit | Date |
| --- | --- | --- |
| de | Fix broken link on page (#28451) | 2024-01-11 09:26:13 -08:00 |
| en | Flax mistral (#26943) | 2024-01-31 14:19:02 +01:00 |
| es | Fix broken link on page (#28451) | 2024-01-11 09:26:13 -08:00 |
| fr | README: install transformers from conda-forge channel (#28313) | 2024-01-04 09:36:16 -08:00 |
| hi | Hindi translation of pipeline_tutorial.md (#26837) | 2023-10-25 11:21:49 -07:00 |
| it | TF: purge TFTrainer (#28483) | 2024-01-12 16:56:34 +00:00 |
| ja | Add tf_keras imports to prepare for Keras 3 (#28588) | 2024-01-30 17:26:36 +00:00 |
| ko | Fix broken link on page (#28451) | 2024-01-11 09:26:13 -08:00 |
| ms | TVP model (#25856) | 2023-11-21 16:41:55 +00:00 |
| pt | README: install transformers from conda-forge channel (#28313) | 2024-01-04 09:36:16 -08:00 |
| te | Added Telugu [te] translations (#26828) | 2023-10-20 15:27:55 -07:00 |
| tr | Translate index.md to Turkish (#27093) | 2023-11-08 08:35:20 -05:00 |
| zh | Generate: consolidate output classes (#28494) | 2024-01-15 17:04:08 +00:00 |
| _config.py | [Styling] stylify using ruff (#27144) | 2023-11-16 17:43:19 +01:00 |