* Add models
* Add models and update `_toctree.yml`
* Update docs/source/ja/model_doc/chinese_clip.md
Co-authored-by: Steven Liu <59462357+stevhliu@users.noreply.github.com>
* Update docs/source/ja/model_doc/camembert.md
Co-authored-by: Steven Liu <59462357+stevhliu@users.noreply.github.com>
* Update docs/source/ja/model_doc/bros.md
Co-authored-by: Steven Liu <59462357+stevhliu@users.noreply.github.com>
* Update docs/source/ja/model_doc/bros.md
Co-authored-by: Steven Liu <59462357+stevhliu@users.noreply.github.com>
* Update docs/source/ja/model_doc/blip-2.md
Co-authored-by: Steven Liu <59462357+stevhliu@users.noreply.github.com>
* Update docs/source/ja/model_doc/camembert.md
Co-authored-by: Steven Liu <59462357+stevhliu@users.noreply.github.com>
* solve merge conflicts and update paper titles
* Update docs/source/ja/model_doc/bridgetower.md
Co-authored-by: Steven Liu <59462357+stevhliu@users.noreply.github.com>
* Update docs/source/ja/model_doc/canine.md
Co-authored-by: Steven Liu <59462357+stevhliu@users.noreply.github.com>
* Update docs/source/ja/model_doc/chinese_clip.md
Co-authored-by: Steven Liu <59462357+stevhliu@users.noreply.github.com>
* Update the authors' names in bros.md
---------
Co-authored-by: Steven Liu <59462357+stevhliu@users.noreply.github.com>
* add working conversion script
* first non-working version of modeling code
* update modeling code (working)
* make style
* make fix-copies
* add config docstrings
* add config to ignore docstring formatting due to unconventional markdown
* fix copies
* fix generation num_return_sequences
* enrich docs
* add and fix tests beside integration tests
* update integration tests
* update repo id
* add tie weights and make style
* correct naming in .md
* fix imports and so on
* correct docstrings
* fix fp16 speech forward
* fix speechencoder attention
* make style
* fix copied from
* rename SeamlessM4Tv2-v2 to SeamlessM4Tv2
* Apply suggestions on configuration
Co-authored-by: Arthur <48595927+ArthurZucker@users.noreply.github.com>
* remove useless public models
* fix private models + better naming for T2U models
* clean speech encoder relative position embeddings
* refactor chunk attention
* add docstrings to chunk attention method
* improve naming and docstrings
* rename some attention variables + add temperature sampling in T2U model
* rename DOCSTRINGS variable names
* make style + remove 2 useless config parameters
* enrich model card
* remove any attention_head reference + fix temperature in T2U
* new fmt and make style
* Apply suggestions from code review
Co-authored-by: Arthur <48595927+ArthurZucker@users.noreply.github.com>
* rename spkr_id->speaker_id and change docstrings of get_char_input_ids
* simplify v2attention
* make style
* Update seamless_m4t_v2.md
* update code and tests with last update
* update repo ids
* fill article name, abstract and authors
* update not_doctested and slow_doc tests
---------
Co-authored-by: Arthur <48595927+ArthurZucker@users.noreply.github.com>
* add distribution head to forecasting
* formatting
* Add generate function for forecasting
* Add generate function to prediction task
* formatting
* use argsort
* add past_observed_mask ordering
* fix arguments
* docs
* add back test_model_outputs_equivalence test
* formatting
* cleanup
* formatting
* use ACT2CLS
* formatting
* fix add_start_docstrings decorator
* add distribution head and generate function to regression task
add distribution head and generate function to regression task. Also added PatchTSTForForecastingOutput and PatchTSTForRegressionOutput.
* add distribution head and generate function to regression task
add distribution head and generate function to regression task. Also added PatchTSTForForecastingOutput and PatchTSTForRegressionOutput.
* fix typos
* add forecast_masking
* fixed tests
* use set_seed
* fix doc test
* formatting
* Update docs/source/en/model_doc/patchtst.md
Co-authored-by: NielsRogge <48327001+NielsRogge@users.noreply.github.com>
* better var names
* rename PatchTSTTranspose
* fix argument names and docs string
* remove compute_num_patches and unused class
* remove assert
* renamed to PatchTSTMasking
* use num_labels for classification
* use num_labels
* use default num_labels from super class
* move model_type after docstring
* renamed PatchTSTForMaskPretraining
* bs -> batch_size
* more review fixes
* use hidden_state
* rename encoder layer and block class
* remove commented seed_number
* edit docstring
* Add docstring
* formatting
* use past_observed_mask
* doc suggestion
* make fix-copies
* use Args:
* add docstring
* add docstring
* change some variable names and add PatchTST before some class names
* formatting
* fix argument types
* fix tests
* change x variable to patch_input
* format
* formatting
* fix-copies
* Update tests/models/patchtst/test_modeling_patchtst.py
Co-authored-by: Patrick von Platen <patrick.v.platen@gmail.com>
* move loss to forward
* Update src/transformers/models/patchtst/modeling_patchtst.py
Co-authored-by: Patrick von Platen <patrick.v.platen@gmail.com>
* Update src/transformers/models/patchtst/modeling_patchtst.py
Co-authored-by: Patrick von Platen <patrick.v.platen@gmail.com>
* Update src/transformers/models/patchtst/modeling_patchtst.py
Co-authored-by: Patrick von Platen <patrick.v.platen@gmail.com>
* Update src/transformers/models/patchtst/modeling_patchtst.py
Co-authored-by: Patrick von Platen <patrick.v.platen@gmail.com>
* Update src/transformers/models/patchtst/modeling_patchtst.py
Co-authored-by: Patrick von Platen <patrick.v.platen@gmail.com>
* formatting
* fix a bug when pre_norm is set to True
* output_hidden_states is set to False as default
* set pre_norm=True as default
* format docstring
* format
* output_hidden_states is None by default
* add missing docs
* better var names
* docstring: remove default to False in output_hidden_states
* change labels name to target_values in regression task
* format
* fix tests
* change to forecast_mask_ratios and random_mask_ratio
* change mask names
* change future_values to target_values param in the prediction class
* remove nn.Sequential and make PatchTSTBatchNorm class
* black
* fix argument name for prediction
* add output_attentions option
* add output_attentions to PatchTSTEncoder
* formatting
* Add attention output option to all classes
* Remove PatchTSTEncoderBlock
* create PatchTSTEmbedding class
* use config in PatchTSTPatchify
* Use config in PatchTSTMasking class
* add channel_attn_weights
* Add PatchTSTScaler class
* add output_attentions arg to test function
* format
* Update doc with image patchtst.md
* fix-copies
* rename Forecast <-> Prediction
* change name of a few parameters to match with PatchTSMixer.
* Remove *ForForecasting class to match with other time series models.
* make style
* Remove PatchTSTForForecasting in the test
* remove PatchTSTForForecastingOutput class
* change test_forecast_head to test_prediction_head
* style
* fix docs
* fix tests
* change num_labels to num_targets
* Remove PatchTSTTranspose
* remove arguments in PatchTSTMeanScaler
* remove arguments in PatchTSTStdScaler
* add config as an argument to all the scaler classes
* reformat
* Add norm_eps for batchnorm and layernorm
* reformat.
* reformat
* edit docstring
* update docstring
* change variable name pooling to pooling_type
* fix output_hidden_states as tuple
* fix bug when calling PatchTSTBatchNorm
* change stride to patch_stride
* create PatchTSTPositionalEncoding class and restructure the PatchTSTEncoder
* formatting
* initialize scalers with configs
* edit output_hidden_states
* style
* fix forecast_mask_patches doc string
* doc improvements
* move summary to the start
* typo
* fix docstring
* turn off masking when using prediction, regression, classification
* return scaled output
* adjust output when using distribution head
* remove _num_patches function in the config
* get config.num_patches from patchifier init
* add output_attentions docstring, remove tuple in output_hidden_states
* change SamplePatchTSTPredictionOutput and SamplePatchTSTRegressionOutput to SamplePatchTSTOutput
* remove print("model_class: ", model_class)
* change encoder_attention_heads to num_attention_heads
* change norm to norm_layer
* change encoder_layers to num_hidden_layers
* change shared_embedding to share_embedding, shared_projection to share_projection
* add output_attentions
* more robust check of norm_type
* change dropout_path to path_dropout
* edit docstring
* remove positional_encoding function and add _init_pe in PatchTSTPositionalEncoding
* edit shape of cls_token and initialize it
* add a check on the num_input_channels.
* edit head_dim in the Prediction class to allow the use of cls_token
* remove some positional_encoding_type options, remove learn_pe arg, initialize pe
* change Exception to ValueError
* format
* norm_type is "batchnorm"
* make style
* change cls_token shape
* Change forecast_mask_patches to num_mask_patches. Remove forecast_mask_ratios.
* Bring PatchTSTClassificationHead on top of PatchTSTForClassification
* change encoder_ffn_dim to ffn_dim and edit the docstring.
* update variable names to match with the config
* add generation tests
* change num_mask_patches to num_forecast_mask_patches
* Add examples explaining the use of these models
* make style
* Revert "Revert "[time series] Add PatchTST (#25927)" (#27486)"
This reverts commit 78f6ed6c70.
* make style
* fix default std scaler's minimum_scale
* fix docstring
* close code blocks
* Update docs/source/en/model_doc/patchtst.md
Co-authored-by: amyeroberts <22614925+amyeroberts@users.noreply.github.com>
* Update tests/models/patchtst/test_modeling_patchtst.py
Co-authored-by: amyeroberts <22614925+amyeroberts@users.noreply.github.com>
* Update src/transformers/models/patchtst/modeling_patchtst.py
Co-authored-by: amyeroberts <22614925+amyeroberts@users.noreply.github.com>
* Update src/transformers/models/patchtst/configuration_patchtst.py
Co-authored-by: amyeroberts <22614925+amyeroberts@users.noreply.github.com>
* Update src/transformers/models/patchtst/modeling_patchtst.py
Co-authored-by: amyeroberts <22614925+amyeroberts@users.noreply.github.com>
* Update src/transformers/models/patchtst/modeling_patchtst.py
Co-authored-by: amyeroberts <22614925+amyeroberts@users.noreply.github.com>
* Update src/transformers/models/patchtst/modeling_patchtst.py
Co-authored-by: amyeroberts <22614925+amyeroberts@users.noreply.github.com>
* Update src/transformers/models/patchtst/modeling_patchtst.py
Co-authored-by: amyeroberts <22614925+amyeroberts@users.noreply.github.com>
* Update src/transformers/models/patchtst/modeling_patchtst.py
Co-authored-by: amyeroberts <22614925+amyeroberts@users.noreply.github.com>
* Update src/transformers/models/patchtst/modeling_patchtst.py
Co-authored-by: amyeroberts <22614925+amyeroberts@users.noreply.github.com>
* Update src/transformers/models/patchtst/modeling_patchtst.py
Co-authored-by: amyeroberts <22614925+amyeroberts@users.noreply.github.com>
* fix tests
* add add_start_docstrings
* move examples to the forward's docstrings
* update prepare_batch
* update test
* fix test_prediction_head
* fix generation test
* use seed to create generator
* add output_hidden_states and config.num_patches
* add loc and scale args in PatchTSTForPredictionOutput
* edit outputs if not return_dict
* use self.share_embedding to check instead of checking type.
* remove seed
* make style
* seed is an optional int
* fix test
* generator device
* Fix assertTrue test
* swap order of items in outputs when return_dict=False.
* add mask_type and random_mask_ratio to unittest
* Update modeling_patchtst.py
* add add_start_docstrings for regression model
* make style
* update model path
* Edit the ValueError comment in forecast_masking
* update examples
* make style
* fix commented code
* update examples: remove config from from_pretrained call
* Edit example outputs
* Set default target_values to None
* remove config setting in regression example
* Update configuration_patchtst.py
* Update configuration_patchtst.py
* remove config from examples
* change default d_model and ffn_dim
* norm_eps default
* set has_attentions to True and define self.seq_length = self.num_patches
* update docstring
* change variable mask_input to do_mask_input
* fix blank space.
* change logger.debug to logger.warning.
* remove unused PATCHTST_INPUTS_DOCSTRING
* remove all_generative_model_classes
* set test_missing_keys=True
* remove undefined params in the docstring.
---------
Co-authored-by: nnguyen <nnguyen@us.ibm.com>
Co-authored-by: NielsRogge <48327001+NielsRogge@users.noreply.github.com>
Co-authored-by: Patrick von Platen <patrick.v.platen@gmail.com>
Co-authored-by: Nam Nguyen <namctin@gmail.com>
Co-authored-by: Wesley Gifford <79663411+wgifford@users.noreply.github.com>
Co-authored-by: amyeroberts <22614925+amyeroberts@users.noreply.github.com>
* added flash attention for opt
* added to list
* fix use cache (#3)
* style fix
* fix text
* test fix2
* reverted until 689f599
* torch fx tests are working now!
* small fix
* added TODO docstring
* changes
* comments and .md file modification
---------
Co-authored-by: Younes Belkada <49240599+younesbelkada@users.noreply.github.com>
* initial commit
* Add initial testing files and modify __init__ files to add UnivNet imports.
* Fix some bugs
* Add checkpoint conversion script and add references to transformers pre-trained model.
* Add UnivNet entries for auto.
* Add initial docs for UnivNet.
* Handle input and output shapes in UnivNetGan.forward and add initial docstrings.
* Write tests and make them pass.
* Write docs.
* Add UnivNet doc to _toctree.yml and improve docs.
* fix typo
* make fixup
* make fix-copies
* Add upsample_rates parameter to config and improve config documentation.
* make fixup
* make fix-copies
* Remove unused upsample_rates config parameter.
* apply suggestions from review
* make style
* Verify and add reason for skipped tests inherited from ModelTesterMixin.
* Add initial UnivNetGan integration tests
* make style
* Remove noise_length input to UnivNetGan and improve integration tests.
* Fix bug and make style
* Make UnivNet integration tests pass
* Add initial code for UnivNetFeatureExtractor.
* make style
* Add initial tests for UnivNetFeatureExtractor.
* make style
* Properly initialize weights for UnivNetGan
* Get feature extractor fast tests passing
* make style
* Get feature extractor integration tests passing
* Get UnivNet integration tests passing
* make style
* Add UnivNetGan usage example
* make style and use feature extractor from hub in integration tests
* Update tips in docs
* apply suggestions from review
* make style
* Calculate padding directly instead of using get_padding methods.
* Update UnivNetFeatureExtractor.to_dict to be UnivNet-specific.
* Update feature extractor to support using model(**inputs) and add the ability to generate noise and pad the end of the spectrogram in __call__.
* Perform padding before generating noise to ensure the shapes are correct.
* Rename UnivNetGan.forward's noise_waveform argument to noise_sequence.
* make style
* Add tests to test generating noise and padding the end for UnivNetFeatureExtractor.__call__.
* Add tests for checking batched vs unbatched inputs for UnivNet feature extractor and model.
* Add expected mean and stddev checks to the integration tests and make them pass.
* make style
* Make it possible to use model(**inputs), where inputs is the output of the feature extractor.
* fix typo in UnivNetGanConfig example
* Calculate spectrogram_zero from other config values.
* apply suggestions from review
* make style
* Refactor UnivNet conversion script to use load_state_dict (following persimmon).
* Rename UnivNetFeatureExtractor to UnivNetGanFeatureExtractor.
* make style
* Switch to using torch.tensor and torch.testing.assert_close for testing expected values/slices.
* make style
* Use config in UnivNetGan modeling blocks.
* make style
* Rename the spectrogram argument of UnivNetGan.forward to input_features, following Whisper.
* make style
* Improving padding documentation.
* Add UnivNet usage example to the docs.
* apply suggestions from review
* Move dynamic_range_compression computation into the mel_spectrogram method of the feature extractor.
* Improve UnivNetGan.forward return docstring.
* Update table in docs/source/en/index.md.
* make fix-copies
* Rename UnivNet components to have pattern UnivNet*.
* make style
* make fix-copies
* Update docs
* make style
* Increase tolerance on flaky unbatched integration test.
* Remove torch.no_grad decorators from UnivNet integration tests to try to avoid flax/Tensorflow test errors.
* Add padding_mask argument to UnivNetModel.forward and add batch_decode feature extractor method to remove padding.
* Update documentation and clean up padding code.
* make style
* make style
* Remove torch dependency from UnivNetFeatureExtractor.
* make style
* Fix UnivNetModel usage example
* Clean up feature extractor code/docstrings.
* apply suggestions from review
* make style
* Add comments for tests skipped via ModelTesterMixin flags.
* Add comment for model parallel tests skipped via the test_model_parallel ModelTesterMixin flag.
* Add # Copied from statements to copied UnivNetFeatureExtractionTest tests.
* Simplify UnivNetFeatureExtractorTest.test_batch_decode.
* Add support for unbatched padding_masks in UnivNetModel.forward.
* Refactor unbatched padding_mask support.
* make style
* tvp model for video grounding
add tokenizer auto
fix param in TVPProcessor
add docs
clear comments and enable different torch dtype
add image processor test and model test and fix code style
* fix conflict
* fix model doc
* fix image processing tests
* fix tvp tests
* remove torch in processor
* fix grammar error
* add more details on tvp.md
* fix model arch for loss, grammar, and processor
* add docstring and do not regard TvpTransformer, TvpVisionModel as individual model
* use pad_image
* update copyright
* control first downsample stride
* reduce first only works for ResNetBottleNeckLayer
* fix param name
* fix style
* add testing
* fix style
* rm init_weight
* fix style
* add post init
* fix comments
* do not test TvpTransformer
* fix warning
* fix style
* fix example
* fix config map
* add link in config
* fix comments
* fix style
* rm useless param
* change attention
* change test
* add notes
* fix comments
* fix tvp
* import checkpointing
* fix gradient checkpointing
* Use a more accurate example in readme
* update
* fix copy
* fix style
* update readme
* delete print
* remove tvp test_forward_signature
* remove TvpTransformer
* fix test init model
* merge main and make style
* fix tests and others
* fix image processor
* fix style and model_input_names
* fix tests
* Enable large-v3 downloading and update language list
* Fix type annotation
* make fixup
* Export Whisper feature extractor
* Fix error after extractor loading
* Do not use pre-computed mel filters
* Save the full preprocessor properly
* Update docs
* Remove comment
Co-authored-by: Arthur <48595927+ArthurZucker@users.noreply.github.com>
* Add alignment heads consistent with each Whisper version
* Remove alignment heads calculation
* Save fast tokenizer format as well
* Fix slow to fast conversion
* Fix bos/eos/pad token IDs in the model config
* Add decoder_start_token_id to config
---------
Co-authored-by: Arthur <48595927+ArthurZucker@users.noreply.github.com>
* Updated albert.md doc for ALBERT model
* Update docs/source/en/model_doc/albert.md
Fixed Resources heading
Co-authored-by: Steven Liu <59462357+stevhliu@users.noreply.github.com>
* Update the ALBERT model doc resources
Fixed resource example for fine-tuning the ALBERT sentence-pair classification.
Co-authored-by: Steven Liu <59462357+stevhliu@users.noreply.github.com>
* Update docs/source/en/model_doc/albert.md
Removed resource duplicate
Co-authored-by: Steven Liu <59462357+stevhliu@users.noreply.github.com>
* Updated albert.md doc with reviewed changes
* Updated albert.md doc for ALBERT
* Update docs/source/en/model_doc/albert.md
Removed duplicates from updated docs/source/en/model_doc/albert.md
Co-authored-by: Steven Liu <59462357+stevhliu@users.noreply.github.com>
* Update docs/source/en/model_doc/albert.md
---------
Co-authored-by: Steven Liu <59462357+stevhliu@users.noreply.github.com>
* Initial commit of PatchTST model classes
Co-authored-by: Phanwadee Sinthong <phsinthong@gmail.com>
Co-authored-by: Nam Nguyen <namctin@gmail.com>
Co-authored-by: Vijay Ekambaram <vijaykr.e@gmail.com>
Co-authored-by: Ngoc Diep Do <55230119+diepi@users.noreply.github.com>
Co-authored-by: Wesley Gifford <79663411+wgifford@users.noreply.github.com>
* Add PatchTSTForPretraining
* update to include classification
Co-authored-by: Phanwadee Sinthong <phsinthong@gmail.com>
Co-authored-by: Nam Nguyen <namctin@gmail.com>
Co-authored-by: Vijay Ekambaram <vijaykr.e@gmail.com>
Co-authored-by: Ngoc Diep Do <55230119+diepi@users.noreply.github.com>
Co-authored-by: Wesley Gifford <79663411+wgifford@users.noreply.github.com>
* clean up auto files
* Add PatchTSTForPrediction
* Fix relative import
* Replace original PatchTSTEncoder with ChannelAttentionPatchTSTEncoder
* temporary adding absolute path + add PatchTSTForForecasting class
* Update base PatchTSTModel + Unittest
* Update ForecastHead to use the config class
* edit cv_random_masking, add mask to model output
* Update configuration_patchtst.py
* add masked_loss to the pretraining
* add PatchEmbeddings
* Update configuration_patchtst.py
* edit loss which considers mask in the pretraining
* remove patch_last option
* Add commits from internal repo
* Update ForecastHead
* Add model weight initialization + unittest
* Update PatchTST unittest to use local import
* PatchTST integration tests for pretraining and prediction
* Added PatchTSTForRegression + update unittest to include label generation
* Revert unrelated model test file
* Combine similar output classes
* update PredictionHead
* Update configuration_patchtst.py
* Add Revin
* small edit to PatchTSTModelOutputWithNoAttention
* Update modeling_patchtst.py
* Updating integration test for forecasting
* Fix unittest after class structure changed
* docstring updates
* change input_size to num_input_channels
* more formatting
* Remove some unused params
* Add a comment for pretrained models
* add channel_attention option
add channel_attention option and remove unused positional encoders.
* Update PatchTST models to use HF's MultiHeadAttention module
* Update paper + github urls
* Fix hidden_state return value
* Update integration test to use PatchTSTForForecasting
* Adding dataclass decorator for model output classes
* Run fixup script
* Rename model repos for integration test
* edit argument explanation
* change individual option to shared_projection
* style
* Rename integration test + import cleanup
* Fix output_hidden_states return value
* removed unused mode
* added std, mean and nops scaler
* add initial distributional loss for prediction
* fix typo in docs
* add generate function
* formatting
* add num_parallel_samples
* Fix a typo
* copy weighted_average function, edit PredictionHead
* edit PredictionHead
* add distribution head to forecasting
* formatting
* Add generate function for forecasting
* Add generate function to prediction task
* formatting
* use argsort
* add past_observed_mask ordering
* fix arguments
* docs
* add back test_model_outputs_equivalence test
* formatting
* cleanup
* formatting
* use ACT2CLS
* formatting
* fix add_start_docstrings decorator
* add distribution head and generate function to regression task
add distribution head and generate function to regression task. Also added PatchTSTForForecastingOutput and PatchTSTForRegressionOutput.
* add distribution head and generate function to regression task
add distribution head and generate function to regression task. Also added PatchTSTForForecastingOutput and PatchTSTForRegressionOutput.
* fix typos
* add forecast_masking
* fixed tests
* use set_seed
* fix doc test
* formatting
* Update docs/source/en/model_doc/patchtst.md
Co-authored-by: NielsRogge <48327001+NielsRogge@users.noreply.github.com>
* better var names
* rename PatchTSTTranspose
* fix argument names and docs string
* remove compute_num_patches and unused class
* remove assert
* renamed to PatchTSTMasking
* use num_labels for classification
* use num_labels
* use default num_labels from super class
* move model_type after docstring
* renamed PatchTSTForMaskPretraining
* bs -> batch_size
* more review fixes
* use hidden_state
* rename encoder layer and block class
* remove commented seed_number
* edit docstring
* Add docstring
* formatting
* use past_observed_mask
* doc suggestion
* make fix-copies
* use Args:
* add docstring
* add docstring
* change some variable names and add PatchTST before some class names
* formatting
* fix argument types
* fix tests
* change x variable to patch_input
* format
* formatting
* fix-copies
* Update tests/models/patchtst/test_modeling_patchtst.py
Co-authored-by: Patrick von Platen <patrick.v.platen@gmail.com>
* move loss to forward
* Update src/transformers/models/patchtst/modeling_patchtst.py
Co-authored-by: Patrick von Platen <patrick.v.platen@gmail.com>
* Update src/transformers/models/patchtst/modeling_patchtst.py
Co-authored-by: Patrick von Platen <patrick.v.platen@gmail.com>
* Update src/transformers/models/patchtst/modeling_patchtst.py
Co-authored-by: Patrick von Platen <patrick.v.platen@gmail.com>
* Update src/transformers/models/patchtst/modeling_patchtst.py
Co-authored-by: Patrick von Platen <patrick.v.platen@gmail.com>
* Update src/transformers/models/patchtst/modeling_patchtst.py
Co-authored-by: Patrick von Platen <patrick.v.platen@gmail.com>
* formatting
* fix a bug when pre_norm is set to True
* output_hidden_states is set to False as default
* set pre_norm=True as default
* format docstring
* format
* output_hidden_states is None by default
* add missing docs
* better var names
* docstring: remove default to False in output_hidden_states
* change labels name to target_values in regression task
* format
* fix tests
* change to forecast_mask_ratios and random_mask_ratio
* change mask names
* change future_values to target_values param in the prediction class
* remove nn.Sequential and make PatchTSTBatchNorm class
* black
* fix argument name for prediction
* add output_attentions option
* add output_attentions to PatchTSTEncoder
* formatting
* Add attention output option to all classes
* Remove PatchTSTEncoderBlock
* create PatchTSTEmbedding class
* use config in PatchTSTPatchify
* Use config in PatchTSTMasking class
* add channel_attn_weights
* Add PatchTSTScaler class
* add output_attentions arg to test function
* format
* Update doc with image patchtst.md
* fix-copies
* rename Forecast <-> Prediction
* change name of a few parameters to match with PatchTSMixer.
* Remove *ForForecasting class to match with other time series models.
* make style
* Remove PatchTSTForForecasting in the test
* remove PatchTSTForForecastingOutput class
* change test_forecast_head to test_prediction_head
* style
* fix docs
* fix tests
* change num_labels to num_targets
* Remove PatchTSTTranspose
* remove arguments in PatchTSTMeanScaler
* remove arguments in PatchTSTStdScaler
* add config as an argument to all the scaler classes
* reformat
* Add norm_eps for batchnorm and layernorm
* reformat.
* reformat
* edit docstring
* update docstring
* change variable name pooling to pooling_type
* fix output_hidden_states as tuple
* fix bug when calling PatchTSTBatchNorm
* change stride to patch_stride
* create PatchTSTPositionalEncoding class and restructure the PatchTSTEncoder
* formatting
* initialize scalers with configs
* edit output_hidden_states
* style
* fix forecast_mask_patches doc string
---------
Co-authored-by: Gift Sinthong <gift.sinthong@ibm.com>
Co-authored-by: Nam Nguyen <namctin@gmail.com>
Co-authored-by: Vijay Ekambaram <vijaykr.e@gmail.com>
Co-authored-by: Ngoc Diep Do <55230119+diepi@users.noreply.github.com>
Co-authored-by: Wesley Gifford <79663411+wgifford@users.noreply.github.com>
Co-authored-by: Wesley M. Gifford <wmgifford@us.ibm.com>
Co-authored-by: nnguyen <nnguyen@us.ibm.com>
Co-authored-by: Ngoc Diep Do <diiepy@gmail.com>
Co-authored-by: Kashif Rasul <kashif.rasul@gmail.com>
Co-authored-by: NielsRogge <48327001+NielsRogge@users.noreply.github.com>
Co-authored-by: Patrick von Platen <patrick.v.platen@gmail.com>
* only dir not even init
* init
* tokenizer removed and reference of codegen added
* modeling file updated a lot remaining app_rotary_emb
* conversion script done
* conversion script fixed, a lot of factoring done and most tests pass
* added token_clf and extractive_QA_head
* integration tests pass
* flash attn tests pass!
* config done
* more docs in modeling file
* some style fix
* style and others
* doc test error fix
* more doc fix
* some attention fixes
* most fixes
* style and other fixes
* docs fix and config
* doc fix
* some comments
* conversion script updated
* conversion script updated
* Revert "conversion script updated"
This reverts commit e92378c54084ec0747041b113083d1746ecb6c7f.
* final comments
* add Phi to language_modeling.md
* edit phi.md file
* rebase and fix
* removed phi-1.5 example
* changed model_type from 'phi'->'mixformer-sequential'
* small change
* small change
* revert small change
* changed mixformer-sequential->phi
* small change
* added phi-1.5 example instead of phi-1
* doc test might pass now
* rebase and small change
* added the dropout layer
* more fixes
* modified .md file
* very very small doc change
* init commit
* attention arch done except rotary emb
* rotary emb done
* text encoder working
* outputs matching
* arch first pass done
* make commands done, tests and docs remaining
* all tests passed, only docs remaining
* docs done
* doc-builder fix
* convert script removed (not relevant)
* minor comments done
* added ckpt conversion script
* tokenizer done
* very minor fix of index.md 2
* mostly make fixup related
* all done except fe and rotary emb
* very small change
* removed unidecode dependency
* style changes
* tokenizer removed require_backends
* added require_inflect to tokenizer tests
* removed VOCAB_FILES in tokenizer test
* inflect dependency removed
* added rotary pos emb cache and simplified the apply method
* style
* little doc change
* more comments
* feature extractor added
* added processor
* auto-regressive config added
* added CLVPConditioningEncoder
* comments done except the test one
* weights added successfully (NOT tested)
* tokenizer fix with numbers
* generate outputs matching
* almost tests passing Integ tests not written
* Integ tests added
* major CUDA error fixed
* docs done
* rebase and multiple fixes
* fixed rebase overwrites
* generate code simplified and tests for AutoRegressive model added
* minor changes
* refactored gpt2 code in clvp file
* weights done and all code refactored
* mostly done except the fast_tokenizer
* doc test fix
* config file's doc fixes
* more config fix
* more comments
* tokenizer comments mostly done
* modeling file mostly refactored and can load modules
* ClvpEncoder tested
* ClvpDecoder, ClvpModel and ClvpForCausalLM tested
* integration and all tests passed
* more fixes
* docs almost done
* ckpt conversion refactored
* style and some failing tests fix
* comments
* temporary output fix but test_assisted_decoding_matches_greedy_search test fails
* majority changes done
* use_cache outputs same now! Along with the assisted_greedy_decoding test fix
* more comments
* more comments
* prepare_inputs_for_generation fixed and _prepare_model_inputs added
* style fix
* clvp.md change
* moved clvpconditionalencoder norms
* add model to new index
* added tokenizer input_ids_with_special_tokens
* small fix
* config mostly done
* added config-tester and changed conversion script
* more comments
* comments
* style fix
* some comments
* tokenizer changed back to prev state
* small comments
* added output hidden states for the main model
* style fix
* comments
* small change
* revert small change
* .
* Update clvp.md
* Update test_modeling_clvp.py
* :)
* some minor change
* new fixes
* remove to_dict from FE
* Fix error in convert_openai_to_hf.py: "_download() missing 1 required positional argument: root"
* Fix error in convert_openai_to_hf.py: "TypeError: byte indices must be integers or slices, not str"
* Fix decoder_attention_heads value in convert_openai_to_hf.py.
Correct the assignment for `decoder_attention_heads` in the conversion script for the Whisper model.
* Black reformat convert_openai_to_hf.py file.
* Fix Whisper model configuration defaults (for Tiny).
- Correct encoder/decoder layers and attention heads count.
- Update model width (`d_model`) to 384.
* Add docstring to the convert_openai_to_hf.py script with a doctest
* Add shebang and +x permission to the convert_openai_to_hf.py
* convert_openai_to_hf.py: reuse the read model_bytes in the _download() function
* Move convert_openai_to_hf.py doctest example to whisper.md
* whisper.md: Add an inference example to the Conversion section.
* whisper.md: remove `model.config.forced_decoder_ids` from examples (deprecated)
* whisper.md: Remove "## Format Conversion" section; not used by users
* whisper.md: Use librispeech_asr_dummy dataset and load_dataset()
* first batch of structure improvements for model_docs
* second batch of structure improvements for model_docs
* more structure improvements for model_docs
* more structure improvements for model_docs
* structure improvements for cv model_docs
* more structural refactoring
* addressed feedback about image processors
* Add type annotations to TFConvNextDropPath
* Use tf.debugging.assert_equal for TFConvNextEmbeddings shape check
* Add TensorFlow implementation of ConvNeXTV2
* check_docstrings: add TFConvNextV2Model to exclusions
TFConvNextV2Model and TFConvNextV2ForImageClassification have docstrings
which are equivalent to their PyTorch cousins, but a parsing issue prevents them
from passing the test.
Adding exclusions for these two classes as discussed in #25558.
* first raw commit
* still POC
* tentative convert script
* almost working speech encoder conversion scripts
* intermediate code for encoder/decoders
* add modeling code
* first version of speech encoder
* make style
* add new adapter layer architecture
* add adapter block
* add first tentative config
* add working speech encoder conversion
* base model convert works now
* make style
* remove unnecessary classes
* remove unnecessary functions
* add modeling code speech encoder
* rework logics
* forward pass of sub components work
* add modeling codes
* some config modifs and modeling code modifs
* save WIP
* new edits
* same output speech encoder
* correct attention mask
* correct attention mask
* fix generation
* new generation logics
* erase comments
* make style
* fix typo
* add some descriptions
* new state
* clean imports
* add tests
* make style
* make beam search and num_return_sequences>1 work
* correct edge case issue
* correct SeamlessM4TConformerSamePadLayer copied from
* replace ACT2FN relu by nn.relu
* remove unnecessary return variable
* move back a class
* change name conformer_attention_mask -> conv_attention_mask
* better nit code
* add some Copied from statements
* small nits
* small nit in dict.get
* rename t2u model -> conditionalgeneration
* ongoing refactoring of structure
* update models architecture
* remove SeamlessM4TMultiModal classes
* add tests
* adapt tests
* some non-working code for vocoder
* add seamlessM4T vocoder
* remove buggy line
* fix some hifigan related bugs
* remove hifigan specific config
* change
* add WIP tokenization
* add seamlessM4T working tokenizer
* update tokenization
* add tentative feature extractor
* Update converting script
* update working FE
* refactor input_values -> input_features
* update FE
* changes in generation, tokenizer and modeling
* make style and add t2u_decoder_input_ids
* add intermediate outputs for ToSpeech models
* add vocoder to speech models
* update valueerror
* update FE with languages
* add vocoder convert
* update config docstrings and names
* update generation code and configuration
* remove todos and update config.pad_token_id to generation_config.pad_token_id
* move block vocoder
* remove unnecessary code and uniformize tospeech code
* add feature extractor import
* make style and fix some copies from
* correct consistency + make fix-copies
* add processor code
* remove comments
* add fast tokenizer support
* correct pad_token_id in M4TModel
* correct config
* update tests and codes + make style
* make some suggested corrections - correct comments and change naming
* rename some attributes
* rename some attributes
* remove unnecessary sequential
* remove option to use dur predictor
* nit
* refactor hifigan
* replace normalize_mean and normalize_var with do_normalize + save lang ids to generation config
* add tests
* change tgt_lang logic
* update generation ToSpeech
* add support import SeamlessM4TProcessor
* fix generate
* make tests
* update integration tests, add option to only return text and update tokenizer fast
* fix wrong function call
* update import and convert script
* update integration tests + update repo id
* correct paths and add first test
* update how new attention masks are computed
* update tests
* take first care of batching in vocoder code
* add batching with the vocoder
* add waveform lengths to model outputs
* make style
* add generate kwargs + forward kwargs of M4TModel
* add docstrings forward methods
* reformat docstrings
* add docstrings t2u model
* add another round of modeling docstrings + reformat speaker_id -> spkr_id
* make style
* fix check_repo
* make style
* add seamlessm4t to toctree
* correct check_config_attributes
* write config docstrings + some modifs
* make style
* add docstrings tokenizer
* add docstrings to processor, fe and tokenizers
* make style
* write first version of model docs
* fix FE + correct FE test
* fix tokenizer + add correct integration tests
* fix most tokenization tests
* make style
* correct most processor test
* add generation tests and fix num_return_sequences > 1
* correct integration tests - still one left
* make style
* correct position embedding
* change numbeams to 1
* refactor some modeling code and correct one test
* make style
* correct typo
* refactor intermediate fnn
* refactor feedforward conformer
* make style
* remove comments
* make style
* fix tokenizer tests
* make style
* correct processor tests
* make style
* correct S2TT integration
* Apply suggestions from Sanchit code review
Co-authored-by: Sanchit Gandhi <93869735+sanchit-gandhi@users.noreply.github.com>
* correct typo
* replace torch.nn->nn + make style
* change Output naming (waveforms -> waveform) and ordering
* nit renaming and formatting
* remove return None when not necessary
* refactor SeamlessM4TConformerFeedForward
* nit typo
* remove almost copied from comments
* add a copied from comment and remove an unnecessary dropout
* remove inputs_embeds from speechencoder
* remove backward compatibility function
* reformat class docstrings for a few components
* remove unnecessary methods
* split over 2 lines smthg hard to read
* make style
* replace two steps offset by one step as suggested
* nice typo
* move warnings
* remove useless lines from processor
* make generation non-standard test more robust
* remove torch.inference_mode from tests
* split integration tests
* enrich md
* rename control_symbol_vocoder_offset->vocoder_offset
* clean convert file
* remove tgt_lang and src_lang from FE
* change generate docstring of ToText models
* update generate docstring of tospeech models
* unify how to deal with text_decoder_input_ids
* add default spkr_id
* unify tgt_lang for t2u_model
* simplify tgt_lang verification
* remove a todo
* change config docstring
* make style
* simplify t2u_tgt_lang_id
* make style
* enrich/correct comments
* enrich .md
* correct typo in docstrings
* add torchaudio dependency
* update tokenizer
* make style and fix copies
* modify SeamlessM4TConverter with new tokenizer behaviour
* make style
* correct small typo docs
* fix import
* update docs and add requirement to tests
* add convert_fairseq2_to_hf in utils/not_doctested.txt
* update FE
* fix imports and make style
* remove torchaudio in FE test
* add seamless_m4t.md to utils/not_doctested.txt
* nits and change the way docstring dataset is loaded
* move checkpoints from ylacombe/ to facebook/ orga
* refactor warning/error to be in the 119 line width limit
* round overly precise floats
* add stereo audio behaviour
* refactor .md and make style
* enrich docs with more precise architecture description
* readd undocumented models
* make fix-copies
* apply some suggestions
* Apply suggestions from code review
Co-authored-by: Sanchit Gandhi <93869735+sanchit-gandhi@users.noreply.github.com>
Co-authored-by: Arthur <48595927+ArthurZucker@users.noreply.github.com>
* correct bug from previous commit
* refactor a parameter allowing to clean the code + some small nits
* clean tokenizer
* make style and fix
* make style
* clean tokenizers arguments
* add precisions for some tests
* move docs from not_tested to slow
* modify tokenizer according to last comments
* add copied from statements in tests
* correct convert script
* correct parameter docstring style
* correct tokenization
* correct multi gpus
* make style
* clean modeling code
* make style
* add copied from statements
* add copied statements
* add support with ASR pipeline
* remove file added inadvertently
* fix docstrings seamlessM4TModel
* add seamlessM4TConfig to OBJECTS_TO_IGNORE because of unconventional markdown
* add seamlessm4t to assisted generation ignored models
---------
Co-authored-by: Sanchit Gandhi <93869735+sanchit-gandhi@users.noreply.github.com>
Co-authored-by: Arthur <48595927+ArthurZucker@users.noreply.github.com>
* initial commit
* add processor, add fuyu naming
* add draft processor
* fix processor
* remove dropout to fix loading of weights
* add image processing fixes from Pedro
* fix
* fix processor
* add basic processing fuyu test
* add documentation and TODO
* address comments, add tests, add doc
* replace assert with torch asserts
* add Mixins and fix tests
* clean imports
* add model tester, clean imports
* fix embedding test
* add updated tests from pre-release model
* Processor: return input_ids used for inference
* separate processing and model tests
* relax test tolerance for embeddings
* add test for logit comparison
* make sure fuyu image processor is imported in the init
* fix formatting
* more formatting issues
* and more
* fixups
* remove some stuff
* nits
* update init
* remove the fuyu file
* Update integration test with release model
* Update conversion script.
The projection is not used, as confirmed by the authors.
* improve generation
* Remove duplicate function
* Trickle down patches to model call
* processing fuyu updates
* remove things
* fix prepare_inputs_for_generation to fix generate()
* remove model_input
* update
* add generation tests
* nits
* draft leverage automodel and autoconfig
* nits
* fix dtype patch
* address comments, update READMEs and doc, include tests
* add working processing test, remove refs to subsequences
* add tests, remove Sequence classification
* processing
* update
* update the conversion script
* more processing cleanup
* safe import
* take out ModelTesterMixin for early release
* more cleanup
* more cleanup
* more cleanup
* and more
* register a buffer
* nits
* add postprocessing of generate output
* nits
* updates
* add one working test
* fix test
* make fixup works
* fixup
* Arthur's updates
* nits
* update
* update
* fix processor
* update tests
* pass more fixups
* fix
* nits
* don't import torch
* skip fuyu config for now
* fixup done
* fixup
* update
* oups
* nits
* Use input embeddings
* no buffer
* update
* styling processing fuyu
* fix test
* update licence
* protect torch import
* fixup and update not doctested
* kwargs should be passed
* updates
* update the imports in the test
* protect import
* protecting imports
* protect imports in type checking
* add testing decorators
* protect top level import structure
* fix typo
* fix check init
* move requires_backend to functions
* Imports
* Protect types
---------
Co-authored-by: Pedro Cuenca <pedro@huggingface.co>
Co-authored-by: ArthurZucker <arthur.zucker@gmail.com>
Co-authored-by: Arthur <48595927+ArthurZucker@users.noreply.github.com>
Co-authored-by: Lysandre <lysandre@huggingface.co>
* Chore: Typo fixed in multiple files of docs/source/en/model_doc
* Update docs/source/en/model_doc/nllb-moe.md
Co-authored-by: Aryan V S <avs050602@gmail.com>
---------
Co-authored-by: Aryan V S <avs050602@gmail.com>
* docs: feat: model resources for CLIP
* fix: resolve suggestion
Co-authored-by: Steven Liu <59462357+stevhliu@users.noreply.github.com>
* fix: resolve suggestion
* fix: resolve suggestion
* fix: resolve suggestion
Co-authored-by: Steven Liu <59462357+stevhliu@users.noreply.github.com>
* fix: resolve suggestion
* fix: resolve suggestions
Co-authored-by: Steven Liu <59462357+stevhliu@users.noreply.github.com>
---------
Co-authored-by: Steven Liu <59462357+stevhliu@users.noreply.github.com>
* add FA-2 support for mistral
* fixup
* add sliding windows
* fixing few nits
* v1 slicing cache - logits do not match
* add comment
* fix bugs
* more mem efficient
* add warning once
* add warning once
* oops
* fixup
* more comments
* copy
* add safety checker
* fixup
* Update src/transformers/models/mistral/modeling_mistral.py
Co-authored-by: Arthur <48595927+ArthurZucker@users.noreply.github.com>
* copied from
* up
* raise when padding side is right
* fixup
* add doc + few minor changes
* fixup
---------
Co-authored-by: Arthur <48595927+ArthurZucker@users.noreply.github.com>
* add Bros boilerplate
* copy and pasted modeling_bros.py from official Bros repo
* update copyright of bros files
* copy tokenization_bros.py from official repo and update import path
* copy tokenization_bros_fast.py from official repo and update import path
* copy configuration_bros.py from official repo and update import path
* remove trailing period in copyright line
* copy and paste bros/__init__.py from official repo
* save formatting
* remove unused unnecessary pe_type argument - using only crel type
* resolve import issue
* remove unused model classes
* remove unnecessary tests
* remove unused classes
* fix original code's bug - layer_module's argument order
* clean up modeling auto
* add bbox to prepare_config_and_inputs
* set temporary value to hidden_size (32 is too low because of the
Bros' positional embedding)
* remove decoder test, update create_and_check* input arguments
* add missing variable to model tests
* do make fixup
* update bros.mdx
* add boilerplate for no_head inference test
* update BROS_PRETRAINED_MODEL_ARCHIVE_LIST (add naver-clova-ocr prefix)
* add prepare_bros_batch_inputs function
* update modeling_common to add bbox inputs in Bros Model Test
* remove unnecessary model inference
* add test case
* add model_doc
* add test case for token_classification
* apply fixup
* update modeling code
* update BrosForTokenClassification loss calculation logic
* revert logits preprocessing logic to make sure logits have original shape
* - update class name
* - add BrosSpadeOutput
- update BrosConfig arguments
* add boilerplate for no_head inference test
* add prepare_bros_batch_inputs function
* add test case
* add test case for token_classification
* update modeling code
* update BrosForTokenClassification loss calculation logic
* revert logits preprocessing logic to make sure logits have original shape
* apply masking on the fly
* add BrosSpadeForTokenLinking
* update class name
put docstring to the beginning of the file
* separate the logits calculation logic and loss calculation logic
* update logic for loss calculation so that logits shape doesn't change
when return
* update typo
* update prepare_config_and_inputs
* update dummy node initialization
* update last_hidden_states getting logic to consider when return_dict is False
* update box first token mask param
* bugfix: remove random attention mask generation
* update keys to ignore on load missing
* run make style and quality
* apply make style and quality of other codes
* update box_first_token_mask to bool type
* update index.md
* apply make style and quality
* apply make fix-copies
* pass check_repo
* update bros model doc
* docstring bugfix fix
* add checkpoint for doc, tokenizer for doc
* Update README.md
* Update docs/source/en/model_doc/bros.md
Co-authored-by: amyeroberts <22614925+amyeroberts@users.noreply.github.com>
* Update bros.md
* Update src/transformers/__init__.py
Co-authored-by: amyeroberts <22614925+amyeroberts@users.noreply.github.com>
* Update docs/source/en/model_doc/bros.md
Co-authored-by: amyeroberts <22614925+amyeroberts@users.noreply.github.com>
* Apply suggestions from code review
Co-authored-by: amyeroberts <22614925+amyeroberts@users.noreply.github.com>
* apply suggestions from code review
* apply suggestions from code review
* revert test_processor_markuplm.py
* Update test_processor_markuplm.py
* apply suggestions from code review
* apply suggestions from code review
* apply suggestions from code review
* update BrosSpadeELForTokenClassification head name to entity linker
* add doc string for config params
* update class, var names to more explicit and apply suggestions from code review
* remove unnecessary keys to ignore
* update relation extractor to be initialized with config
* add bros processor
* apply make style and quality
* update bros.md
* remove bros tokenizer, add bros processor that wraps bert tokenizer
* revert change
* apply make fix-copies
* update processor code, update itc -> initial token, stc -> subsequent token
* add type hint
* remove unnecessary condition branches in embedding forward
* fix auto tokenizer fail
* update docstring for each classes
* update bbox input dimension as standard 2 points and convert them to 4
points in forward pass
* update bros docs
* apply suggestions from code review : update Bros -> BROS in bros.md
* 1. box prefix var -> bbox
2. update variable names to be more explicit
* replace einsum with torch matmul
* apply style and quality
* remove unused argument
* remove unused arguments
* update docstrings
* apply suggestions from code review: add BrosBboxEmbeddings, replace
einsum with classical matrix operations
* revert einsum update
* update bros processor
* apply suggestions from code review
* add conversion script for bros
* Apply suggestions from code review
* fix readme
* apply fix-copies
---------
Co-authored-by: amyeroberts <22614925+amyeroberts@users.noreply.github.com>
* initial commit
* updates
* nits
* update conversion script
* update conversion script
* use path to load
* add tips etc
* some modeling logic
* modeling update
* more nits
* nits
* normal layer norm
* update config and doc
* nits
* update doc remove unused
* update
* fix inits and stuff
* fixup
* revert wrong changes
* updates
* more nits
* add default config values to the configuration file
* fixup happy
* update
* 2 tests left
* update readmes
* more nits
* slow test and more documentation
* update readme
* fix licences
* styling
* use fast if possible when saving tokenizer
* remove todo
* remove tokenization tests
* small last nits
* Apply suggestions from code review
Co-authored-by: Matt <Rocketknight1@users.noreply.github.com>
* nits to skip the timeout doctest
* fix integration test
* fix test
* update eos token
* update to allow fast tokenization
* styling
* fix codeLlama as well for the update post processor
* Apply suggestions from code review
Co-authored-by: amyeroberts <22614925+amyeroberts@users.noreply.github.com>
* add more copied from statements
* update
* doc passes doctest
* remove `# final layer norm?`
* change docstring prompt
* update
* Update README.md
Co-authored-by: Sylvain Gugger <35901082+sgugger@users.noreply.github.com>
* don't doctest the conversion script as it requires more packages
* don't init a model in the config
* oups
* fix doctest
---------
Co-authored-by: Matt <Rocketknight1@users.noreply.github.com>
Co-authored-by: amyeroberts <22614925+amyeroberts@users.noreply.github.com>
Co-authored-by: Sylvain Gugger <35901082+sgugger@users.noreply.github.com>
* docs: feat: model resources for llama
* fix: resolve suggestion
Co-authored-by: Steven Liu <59462357+stevhliu@users.noreply.github.com>
Co-authored-by: Jungnerd <46880056+jungnerd@users.noreply.github.com>
Co-authored-by: Wonhyeong Seo <wonhseo@kakao.com>
---------
Co-authored-by: Steven Liu <59462357+stevhliu@users.noreply.github.com>
Co-authored-by: Jungnerd <46880056+jungnerd@users.noreply.github.com>
Co-authored-by: Wonhyeong Seo <wonhseo@kakao.com>
* Add proper Falcon docs and conversion script
* Autodetect the decoder architecture instead of using an arg
* Update docs now that we can autodetect
* Fix doc error
* Add doc to toctree
* Quick doc update
* add a warning=True tip to the Llama2 doc
* code llama needs a tip too
* doc nit
* build PR doc
* doc nits
Co-authored-by: Lysandre <lysandre@huggingface.co>
---------
Co-authored-by: Lysandre <lysandre@huggingface.co>
* add all
* Revert "Delete .github directory"
This reverts commit 9b0ff7b052e2b20b629a26fb13606b78a42944d1.
* make conversion script backward compatible
* fixup
* more styling
* copy to llama changes
* fix repo consistency
* nits
* document correct classes
* updates
* more fixes
* nits
* update auto mappings
* add readmes
* small updates
* llama-code replace with llama_code
* make fixup
* updates to the testing suite
* fix fast nits
* more small fixes
* fix decode
* fix template processing
* properly reset the normalizer
* nits processor
* tokenization tests pass
* styling
* last tests
* additional nits
* one test is left
* nits
Co-authored-by: faabian <faabian@users.noreply.github.com>
* update failing test
* fixup
* remove decode infilling; users should handle it on their own after generation, since padding can be a problem
* update
* make test slow and more meaningful
* fixup
* doc update
* fixup
* Apply suggestions from code review
* add kwargs doc
* tokenizer requires `requires_backend`
* type requires_backends
* CodeLlama instead of LlamaCode
* more name changes
* nits
* make doctests happy
* small pipeline nits
* last nit
* Apply suggestions from code review
Co-authored-by: Sylvain Gugger <35901082+sgugger@users.noreply.github.com>
* update
* add codellama to toctree
---------
Co-authored-by: Sylvain Gugger <35901082+sgugger@users.noreply.github.com>
* Add FlaxClipTextModelWithProjection
This is necessary to support the Flax port of Stable Diffusion XL: fb6d705fb5/text_encoder_2/config.json (L3)
Co-authored-by: Martin Müller <martin.muller.me@gmail.com>
Co-authored-by: Juan Acevedo <juancevedo@gmail.com>
* Use FlaxCLIPTextModelOutput
* make fix-copies again
* Apply suggestions from code review
Co-authored-by: Sanchit Gandhi <93869735+sanchit-gandhi@users.noreply.github.com>
* Use `return_dict` for consistency with other uses.
Co-authored-by: Sanchit Gandhi <93869735+sanchit-gandhi@users.noreply.github.com>
* Fix docstring example.
* Add new model to FlaxCLIPTextModelTest
* Add to IGNORE_NON_AUTO_CONFIGURED list
* Fix naming convention.
---------
Co-authored-by: Martin Müller <martin.muller.me@gmail.com>
Co-authored-by: Juan Acevedo <juancevedo@gmail.com>
Co-authored-by: Sanchit Gandhi <93869735+sanchit-gandhi@users.noreply.github.com>
* docs: feat: model resources for llama2
Co-authored-by: Woojun Jung <hello_984@naver.com>
* fix: add description for dpo and rearrange posts
* docs: feat: add llama2 notebook resources
* style: one liners for each resource
Co-Authored-By: Woojun Jung <46880056+jungnerd@users.noreply.github.com>
Co-Authored-By: Kihoon Son <75935546+kihoon71@users.noreply.github.com>
* Apply suggestions from code review
Co-authored-by: Steven Liu <59462357+stevhliu@users.noreply.github.com>
* Fix typo
Co-authored-by: Steven Liu <59462357+stevhliu@users.noreply.github.com>
---------
Co-authored-by: Woojun Jung <hello_984@naver.com>
Co-authored-by: Woojun Jung <46880056+jungnerd@users.noreply.github.com>
Co-authored-by: Kihoon Son <75935546+kihoon71@users.noreply.github.com>
Co-authored-by: Steven Liu <59462357+stevhliu@users.noreply.github.com>
* init commit
* config updated also some modeling
* Processor and Model config combined
* extraction pipeline (up to before spectrogram & mel_conditioner) added but not properly tested
* model loading successful!
* feature extractor done!
* FE can now be called from HF
* postprocessing added in fe file
* same as prev commit
* Pop2PianoConfig doc done
* cfg docs slightly changed
* fe docs done
* batched
* batched working!
* temp
* v1
* checking
* trying to go with generate
* with generate and model tests passed
* before rebasing
* .
* tests done docs done remaining others & nits
* nits
* LogMelSpectogram shifted to FeatureExtractor
* is_tf removed from pop2piano/init
* import solved
* tokenization tests added
* minor fixed regarding modeling_pop2piano
* tokenizer changed to only return midi_object and other changes
* Updated paper abstract(Camera-ready version) (#2)
* more comments and nits
* ruff changes
* code quality fix
* sg comments
* t5 change added and rebased
* comments except batching
* batching done
* comments
* small doc fix
* example removed from modeling
* ckpt
* forward is compatible with fe and generation done
* comments
* comments
* code-quality fix(maybe)
* ckpts changed
* doc file changed from mdx to md
* test fixes
* tokenizer test fix
* changes
* nits done main changes remaining
* code modified
* Pop2PianoProcessor added with tests
* other comments
* added Pop2PianoProcessor to dummy_objects
* added require_onnx to modeling file
* changes
* update .md file
* remove extra line in index.md
* back to the main index
* added pop2piano to index
* Added tokenizer.__call__ with valid args and batch_decode and aligned the processor part too
* changes
* added return types to 2 tokenizer methods
* the PR build test might work now
* added backends
* PR build fix
* vocab added
* comments
* refactored vocab into 1 file
* added conversion script
* comments
* essentia version changed in .md
* comments
* more tokenizer tests added
* minor fix
* tests extended for outputs acc check
* small fix
---------
Co-authored-by: Jongho Choi <sweetcocoa@snu.ac.kr>
* Initial addition of t5forsequenceclassification
* Adding imports and adding tests
* Formatting
* Running make fix-copies
* Adding mt5forseq
* Formatting
* run make fix-copies
* Adding to docs
* Add model_parallel
* Fix bug
* Fix
* Remove TODO
* Fixing tests for T5ForSequenceClassification
* Undo changes to dependency_versions_table.py
* Change classification head to work with T5Config directly
* Change seq length to let tests pass
* PR comments for formatting
* Formatting
* Initial addition of UMT5ForSequenceClassification
* Adding to inits and formatting
* run make fix-copies
* Add doc for UMT5ForSeqClass
* Update UMT5 config
* Fix docs
* Skip torch fx test for SequenceClassification
* Formatting
* Add skip to UMT5 tests as well
* Fix umt5 tests
* Running make fix-copies
* PR comments
* Fix for change to sentence_representation
* Rename seq_len to hidden_size since that's what it is
* Use base_model to follow format of the rest of the library
* Update docs
* Extract the decoder_input_ids changes and make one liner
* Make one-liner
* pull and push updates
* add docs
* fix modeling
* Add and run test
* make copies
* add task
* fix tests and fix small issues
* Checks on a Pull Request
* fix docs
* add desc pvt.md
* Resolve typo in check_repo.py
* Specify encoding when opening modeling files
* Deprecate the OpenLlama architecture
* Add disclaimer pointing to Llama
I'm open to different wordings here
* Match the capitalisation of LLaMA
* add llama
* add other readmes
* update padding id in readme
* add link to paper
* fix paths and tokenizer
* more nits
* styling
* fit operation in 2 lines when possible
* nits
* Apply suggestions from code review
Co-authored-by: Sylvain Gugger <35901082+sgugger@users.noreply.github.com>
* add form
* update readme
* update readme, we don't have a default pad token
* update test and tokenization
* LLaMA instead of Llama
* nits
* add expected text
* add greedy output
* styling
* Update src/transformers/models/llama/modeling_llama.py
Co-authored-by: Sylvain Gugger <35901082+sgugger@users.noreply.github.com>
* sequential device map
* skip relevant changes
---------
Co-authored-by: Sylvain Gugger <35901082+sgugger@users.noreply.github.com>
* first raw version of the bark integration
* working code on small models with single run
* add converting script from suno weights 2 hf
* many changes
* correct past_kv output
* working implementation for inference
* update the converting script according to the architecture changes
* add a working end-to-end inference code
* remove some comments and make small changes
* remove unnecessary comment
* add docstrings and ensure no unnecessary intermediary output during audio generation
* remove done TODOs
* make style + add config docstrings
* modification for batch inference support on the whole model
* add details to .generation_audio method
* add copyright
* convert EncodecModel from original library to transformers implementation
* add two classes in order to facilitate model and sub-models loading from the hub
* add support of loading the whole model
* add BarkProcessor
* correct modeling according to processor output
* Add proper __init__ and auto support
* Add up-to-date copyright/license message
* add relative import instead of absolute
* cleaner head_dim computation
* small comment removal or changes
* more verbose LayerNorm init method
* specify eps for clearer comprehension
* more verbose variable naming in the MLP module
* remove unnecessary BarkBlock parameter
* clearer code in the forward pass of the BarkBlock
* remove _initialize_modules method for cleaner code
* Remove unnecessary methods from sub-models
* move code to remove unnecessary function
* rename a variable for clarity and change an assert
* move code and change variable name for clarity
* remove unnecessary asserts
* correct small bug
* correct a comment
* change variable names for clarity
* remove asserts
* change import from absolute to relative
* correct small error due to comma missing + correct import
* Add attribute Bark config
* add first version of tests
* update attention_map
* add tie_weights and resize_token_embeddings for fineModel
* correct getting attention_mask in generate_text_semantic
* remove Bark inference trick
* leave more choices in barkProcessor
* remove _no_split_modules
* fix error in forward of block and introduce clearer notations
* correct converting script with last changes
* make style + add draft bark.mdx
* correct BarkModelTest::test_generate_text_semantic
* add Bark in main README
* add dummy_pt_objects for Bark
* add missing models in the main init
* correct test_decoder_model_past_with_large_inputs
* disable torchscript test
* change docstring of BarkProcessor
* Add test_processor_bark
* make style
* correct copyrights
* add bark.mdx + make style, quality and consistency
* Apply suggestions from code review
Co-authored-by: Sanchit Gandhi <93869735+sanchit-gandhi@users.noreply.github.com>
* Remove unnecessary test method
* simplify logic of a test
* Only check first ids for slow audio generation
* split full end-to-end generation tests
* remove unnecessary comment
* change submodel names for clearer naming
* remove ModuleDict from modeling_bark
* combine two if statements
* ensure that an edge-case misuse won't happen
* modify variable name
* move code snippet to the right place (coarse instead of semantic)
* change BarkSemanticModule -> BarkSemanticModel
* align BarkProcessor with transformers paradigm
* correct BarkProcessor tests with last commit changes
* change _validate_voice_preset to an instance method instead of a class method
* tie_weights already called with post_init
* add codec_model config to configuration
* update bark modeling tests with recent BarkProcessor changes
* remove SubModelPretrainedModel + change speakers embeddings prompt type in BarkModel
* change absolute imports to relative
* remove TODO
* change docstrings
* add examples to docs and docstrings
* make style
* use BatchFeature in BarkProcessor instead of dict
* continue improving docstrings and docs + make style
* correct docstrings examples
* more comprehensible speaker_embeddings load/Save
* rename speaker_embeddings_dict -> speaker_embeddings
* correct bark.mdx + add bark to documentation_tests
* correct docstrings configuration_bark
* integrate last nit suggestions
* integrate BarkGeneration configs
* make style
* remove bark tests from documentation_tests.txt because timeout - tested manually
* add proper generation config initialization
* small bark.mdx documentation changes
* rename bark.mdx -> bark.md
* add torch.no_grad behind BarkModel.generate_audio()
* replace assert by ValueError in convert_suno_to_hf.py
* integrate a series of short comments from reviewer
* move SemanticLogitsProcessors and remove .detach() from Bark docs and docstrings
* actually remove SemanticLogitsProcessor from modeling_bark.py
* BarkProcessor returns a single output instead of tuple + correct docstrings
* make style + correct bug
* add initializer_range to BarkConfig + correct slow modeling tests
* add .clone() to history_prompt.coarse_prompt to avoid modifying input array
* Making sure no extra "`" are present
* remove extra characters in modeling_bark.py
* Correct output if history_prompt is None
* remove TODOs
* remove ravel comment
* completing generation_configuration_bark.py docstrings
* change docstrings - number of audio codebooks instead of Encodec codebooks
* change 'bias' docstrings in configuration_bark.py
* format code
* rename BarkModel.generate_audio -> BarkModel.generate_speech
* modify AutoConfig instead of EncodecConfig in BarkConfig
* correct AutoConfig wrong init
* refactor BarkModel and sub-models generate_coarse, generate_fine, generate_text_semantic
* remove SemanticLogitsProcessor and replace it with SuppressTokensLogitsProcessor
* move nb_codebook related config arguments to BarkFineConfig
* rename bark.mdx -> bark.md
* correcting BarkModelConfig from_pretrained + remove keys_to_ignore
* correct bark.md with correct hub path
* correct code bug in bark.md
* correct list tokens_to_suppress
* modify Processor to load nested speaker embeddings in a safer way
* correct batch sampling in BarkFineModel.generate_fine
* Apply suggestions from code review
Small docstrings correction and code improvements
Co-authored-by: amyeroberts <22614925+amyeroberts@users.noreply.github.com>
* give more details about num_layers in docstrings
* correct indentation mistake
* correct submodelconfig order of docstring variables
* put audio models in alphabetical order in utils/check_repo.py
* remove useless line from test_modeling_bark.py
* make BarkCoarseModelTest inherit from (ModelTesterMixin, GenerationTesterMixin, unittest.TestCase) instead of BarkSemanticModelTest
* make a Tester class for each sub-model instead of inheriting
* add test_resize_embeddings=True for Bark sub-models
* add Copied from transformers.models.gpt_neo.modeling_gpt_neo.GPTNeoSelfAttention._split_heads
* remove 'Copied fom Bark' comment
* remove unnecessary comment
* change np.min -> min in modeling_bark.py
* refactored all custom layers to have Bark prefix
* add attention_mask as an argument of generate_text_semantic
* refactor sub-models start docstrings to have more precise config class definition
* move _tied_weights_keys overriding
* add docstrings to generate_xxx in modeling_bark.py
* add loading whole BarkModel to convert_suno_to_hf
* refactor attribute and variable names
* make style convert_suno
* update bark checkpoints
* remove never entered if statement
* move bark_modeling docstrings after BarkPretrainedModel class definition
* refactor modeling_bark.py: kv -> key_values
* small nits - code refactoring and removing unnecessary lines from _init_weights
* nits - replace inplace method by variable assigning
* remove *optional* when necessary
* remove some lines in generate_speech
* add default value for optional parameter
* Refactor preprocess_histories_before_coarse -> preprocess_histories
Co-authored-by: Sylvain Gugger <35901082+sgugger@users.noreply.github.com>
* correct usage after refactoring
* refactor Bark's generate_xxx -> generate and modify docstrings and tests accordingly
* update docstrings python in configuration_bark.py
* add bark files in utils/documentation_test.txt
* correct docstrings python snippet
* add the ability to use parameters in the form of e.g coarse_temperature
* add semantic_max_new_tokens in python snippet in docstrings for quicker generation
* Reformat sub-models kwargs in BarkModel.generate
Co-authored-by: amyeroberts <22614925+amyeroberts@users.noreply.github.com>
* correct kwargs in BarkModel.generate
* correct attention_mask kwarg in BarkModel.generate
* add tests for sub-models args in BarkModel.generate and correct BarkFineModel.test_generate_fp16
* enrich BarkModel.generate docstrings with a description of how to use the kwargs
---------
Co-authored-by: Sanchit Gandhi <93869735+sanchit-gandhi@users.noreply.github.com>
Co-authored-by: amyeroberts <22614925+amyeroberts@users.noreply.github.com>
Co-authored-by: Sylvain Gugger <35901082+sgugger@users.noreply.github.com>
* Initial commit
* Update src/transformers/models/falcon/configuration_falcon.py
Co-authored-by: Sylvain Gugger <35901082+sgugger@users.noreply.github.com>
* Update src/transformers/models/falcon/configuration_falcon.py
Co-authored-by: Sylvain Gugger <35901082+sgugger@users.noreply.github.com>
* Cleanup config docstring
* Update src/transformers/models/falcon/configuration_falcon.py
Co-authored-by: Sylvain Gugger <35901082+sgugger@users.noreply.github.com>
* Convert to relative imports
* Remove torch < 1.8 warning
* Restructure cos_sin header
* qkv -> query, key, value
* Refactor attention calculation
* Add a couple of config variables to account for the different checkpoints
* Successful merging of the code paths!
* Fix misplaced line in the non-parallel attention path
* Update config and tests
* Add a pad_token_id when testing
* Support output_attentions when alibi is None
* make fixup
* Skip KV cache shape test
* No more _keys_to_ignore_on_load_missing
* Simplify self attention a bit
* Simplify self attention a bit
* make fixup
* stash commit
* Some more attention mask updates
* Should pass all tests except assisted generation!
* Add big model generation test
* make fixup
* Add temporary workaround for test
* Test overrides for assisted generation
* Update src/transformers/models/falcon/modeling_falcon.py
Co-authored-by: Arthur <48595927+ArthurZucker@users.noreply.github.com>
* Update src/transformers/models/falcon/modeling_falcon.py
Co-authored-by: Arthur <48595927+ArthurZucker@users.noreply.github.com>
* Update src/transformers/models/falcon/modeling_falcon.py
Co-authored-by: Arthur <48595927+ArthurZucker@users.noreply.github.com>
* Update tests/models/falcon/test_modeling_falcon.py
Co-authored-by: Arthur <48595927+ArthurZucker@users.noreply.github.com>
* Test overrides for assisted generation
* Add generation demo
* Update copyright
* Make the docstring model actually small
* Add module-level docstring
* Remove all assertions
* Add copied from bloom
* Reformat the QKV layer
* Add copied from bloom
* Update src/transformers/models/falcon/modeling_falcon.py
Co-authored-by: Arthur <48595927+ArthurZucker@users.noreply.github.com>
* Remove unused line and reformat
* No single letter variables
* Cleanup return names
* Add copied from line
* Remove the deprecated arguments blocks
* Change the embeddings test to an alibi on/off test
* Remove position_ids from FalconForQA
* Remove old check for token type IDs
* Fix the alibi path when multi_query is False
* Update src/transformers/models/falcon/modeling_falcon.py
Co-authored-by: amyeroberts <22614925+amyeroberts@users.noreply.github.com>
* Update src/transformers/models/falcon/modeling_falcon.py
Co-authored-by: amyeroberts <22614925+amyeroberts@users.noreply.github.com>
* Update tests/models/falcon/test_modeling_falcon.py
Co-authored-by: amyeroberts <22614925+amyeroberts@users.noreply.github.com>
* Update config naming
* Fix typo for new_decoder_architecture
* Add some comments
* Fix docstring
* Fix docstring
* Create range in the right dtype from the start
* Review comment cleanup
* n_head_kv -> num_kv_heads
* self.alibi -> self.use_alibi
* self.num_kv -> self.num_kv_heads
* Reorder config args
* Made alibi arguments Optional
* Add all model docstrings
* Add extra checkpoints
* Add author info for Falcon
* Stop removing token_type_ids because our checkpoints shouldn't return it anymore
* Add one hopeful comment for the future
* Fix typo
* Update tests, fix cache issue for generation
* Use -1e9 instead of -inf to avoid float overflow
* Recompute the rotary embeddings much less often
* Re-enable disabled tests
* One final fix to attention mask calculation, and update tests
* Cleanup targeting falcon-40b equivalency
* Post-rebase docs update
* Update docstrings, especially in the config
* More descriptive variable names, and comments where we can't rename them
---------
Co-authored-by: Sylvain Gugger <35901082+sgugger@users.noreply.github.com>
Co-authored-by: Arthur <48595927+ArthurZucker@users.noreply.github.com>
Co-authored-by: amyeroberts <22614925+amyeroberts@users.noreply.github.com>
* Squash 88 commits
* Use markdown
* Remove mdx files due to bad rebase
* Fix modeling files due to bad rebase
* Fix style
* Update comment
* fix
---------
Co-authored-by: ydshieh <ydshieh@users.noreply.github.com>
* Add tf code for efficientformer
* Fix return dict bug - return last hidden state after last stage
* Fix corresponding return dict bug
* Override test tol
* Change default values of training to False
* Set training to default False X3
* Rm axis from ln
* Set init in dense projection
* Rm debug stuff
* Make style; all tests pass.
* Modify year to 2023
* Fix attention biases codes
* Update the shape list logic
* Add a batch norm eps config
* Remove extra comments in test files
* Add conditional attn and hidden states return for serving output
* Change channel dim checking logic
* Add exception for withteacher model in training mode
* Revert layer count for now
* Add layer count for conditional layer naming
* Transpose for conv happens only in main layer
* Make tests smaller
* Make style
* Update doc
* Rm from_pt
* Change to actual expected image class label
* Remove stray print in tests
* Update image processor test
* Remove the old serving output logic
* Make style
* Make style
* Complete test
* First commit
* Add auto-translation with GPT-4
* make fixup
* Add a functional layernorm for TF
* Add all the auxiliary imports etc.
* Add the extra processor and tests
* rebase to main
* Add all the needed fixes to the GPT code
* make fixup
* Make convolutions channels-last so they run on CPU
* make fixup
* Fix final issues
* Fix other models affected by test change
* Clarify comment on the sparse_prompt_embeddings check
* Refactor functional_layernorm, use shape_list in place of .shape in some places
* Remove deprecated torch-alike code
* Update tests/models/sam/test_modeling_tf_sam.py
Co-authored-by: amyeroberts <22614925+amyeroberts@users.noreply.github.com>
* Update tests/models/sam/test_modeling_tf_sam.py
Co-authored-by: amyeroberts <22614925+amyeroberts@users.noreply.github.com>
* Refactor processor with common methods and separated private methods
* make fixup
* Quietly delete the file that didn't do anything (sorry Sylvain)
* Refactor the processor tests into one file
* make fixup
* Clean up some unnecessary indirection
* Fix TF mask postprocessing
* Add more processor equivalence tests
* Refactor generate_crop_boxes to use framework-neutral np code
* Make the serving output correctly conditional
* Fix error message line length
* Use dict keys rather than indices internally in both TF and PT SAM call/forward
* Return dicts internally in the call/forward methods
* Revert changes to common tests and just override check_pt_tf_outputs
* Revert changes to other model tests
* Clarify comments for functional layernorm
* Add missing transpose from PT code
* Removed unused copied from in PT code
* Remove overrides for tests that don't exist in TF
* Fix transpose and update tests for PT and TF to check pred_masks
* Add training flag
* Update tests to use TF checkpoints
* Update index.mdx
* Add missing cross-test decorator
* Remove optional extra asterisks
* Revert return_dict changes in PT code
* Update src/transformers/models/sam/modeling_tf_sam.py
Co-authored-by: Sylvain Gugger <35901082+sgugger@users.noreply.github.com>
* Remove None return annotations on init methods
* Update tests/models/sam/test_processor_sam.py
Co-authored-by: amyeroberts <22614925+amyeroberts@users.noreply.github.com>
* Fix input_boxes shapes
* make fixup
---------
Co-authored-by: amyeroberts <22614925+amyeroberts@users.noreply.github.com>
Co-authored-by: Sylvain Gugger <35901082+sgugger@users.noreply.github.com>
* First draft of RWKV-4
* Add support for generate
* Style post-rebase
* Properly use state
* Write doc
* Fix doc
* More math
* Add model to README, dummies and clean config
* Fix init
* multiple fixes:
- fix common tests
- fix configuration default values
- add CI test for checking state computation
- fix some CI tests
* correct tokenizer
* some tweaks
- fix config docstring
- fix failing tests
* fix CI tests
- add output_attention / output_hidden_states
- override test_initialization
- fix failing CIs
* fix conversion script
- fix sharded case
- add new arguments
* add slow tests + more fixes on conversion script
* add another test
* final fixes
* change single name variable
* add mock attention mask for pipeline to work
* correct eos token id
* fix nits
* add checkpoints
* Apply suggestions from code review
Co-authored-by: amyeroberts <22614925+amyeroberts@users.noreply.github.com>
* add `tie_word_embeddings` in docstring
* change tensor name
* fix final nits
* Trigger CI
---------
Co-authored-by: younesbelkada <younesbelkada@gmail.com>
Co-authored-by: Younes Belkada <49240599+younesbelkada@users.noreply.github.com>
Co-authored-by: amyeroberts <22614925+amyeroberts@users.noreply.github.com>
* first draft - gives index error in question_answering.py
* maturing
* no labels
* pipeline should know about QA
* fixing checks
* formatting
* fixed docstring
* initial commit
* formatting
* adding the class to many places
* towards less unhappy checks
* nearly there
* and gpt neox for qa
* use right model
* forgot this one
* base_model_prefix is "gpt_neox" for GPTNeoX* models
* unnecessary stuff
* Update src/transformers/models/gpt_neox/modeling_gpt_neox.py
Co-authored-by: Sylvain Gugger <35901082+sgugger@users.noreply.github.com>
* format
* Update src/transformers/models/gpt_neox/modeling_gpt_neox.py
Co-authored-by: amyeroberts <22614925+amyeroberts@users.noreply.github.com>
* removed gpt2 stuff
---------
Co-authored-by: Prof. Peter Schneider-Kamp <jps@ordbogen.com>
Co-authored-by: Sylvain Gugger <35901082+sgugger@users.noreply.github.com>
Co-authored-by: amyeroberts <22614925+amyeroberts@users.noreply.github.com>
* first draft - gives index error in question_answering.py
* maturing
* no labels
* pipeline should know about QA
* fixing checks
* formatting
* fixed docstring
* initial commit
* formatting
* adding the class to many places
* towards less unhappy checks
* nearly there
* Update src/transformers/models/gpt_neo/modeling_gpt_neo.py
Co-authored-by: amyeroberts <22614925+amyeroberts@users.noreply.github.com>
* avoid error
* moving to device of start/end_logits
---------
Co-authored-by: Prof. Peter Schneider-Kamp <jps@ordbogen.com>
Co-authored-by: amyeroberts <22614925+amyeroberts@users.noreply.github.com>
* [doc] Try a few ≠ ways of linking to Papers, users, and org profiles
* Empty commit
* Empty commit now that the backend is fixed
---------
Co-authored-by: Lysandre <lysandre@huggingface.co>
* first draft - gives index error in question_answering.py
* maturing
* no labels
* pipeline should know about QA
* fixing checks
* formatting
* fixed docstring
* make sure legacy code executes
* comment
* like this
---------
Co-authored-by: Prof. Peter Schneider-Kamp <jps@ordbogen.com>
Adds FocalNet by Microsoft to transformers
---------
Co-authored-by: Niels Rogge <nielsrogge@Nielss-MacBook-Pro.local>
Co-authored-by: alaradirik <alaradirik@gmail.com>
* Add model to doc tests
* Remove generate and replace by prepare_inputs_for_generation
* More fixes
* Remove print statements
* Update integration tests
* Fix generate
* Remove model from auto mapping
* Use auto processor
* Fix integration tests
* Fix test
* Add inference code snippet
* Remove is_encoder_decoder
* Update docs
* Remove notebook link
* resolve conflicts
* rebase and make style
* test
* test
* test
* rebase and make style
* rebase and make style
* tests
* tests
* rewrite some functions
* rebase and make style
* fix load_tf_weights_in_cpmant
* reformat some unrelated files
* upgrade quality
* fix some bugs & docstring
* add models and tests
* solve conflicts
* resolve conflicts
* resolve conflicts
* resolve conflicts
* resolve conflicts
* tests
* resolve conflicts
* resolve conflicts
* fix load_tf_weights_in_cpmant
* reformat some unrelated files
* upgrade quality
* fix some bugs & docstring
* save resolution
* make style
* delete redefinition code
* reformat function
* reformat
* resolve conflicts
* resolve conflicts
* resolve conflicts
* resolve conflicts
* resolve conflicts
* tests
* resolve conflicts
* resolve conflicts
* fix load_tf_weights_in_cpmant
* reformat some unrelated files
* upgrade quality
* resolve conflicts
* resolve conflicts
* resolve conflicts
* resolve conflicts
* resolve conflicts
* fix load_tf_weights_in_cpmant
* reformat some unrelated files
* upgrade quality
* resolve conflicts
* make style
* fix bugs and refactor
* modify docstrings and make style
* unify import format in __init__.py
* fix import-altclip bug
* fix copies to update index.md
* fix unused config parameters
* fix unused config parameters
* fix unused config parameters
* update README_ja.md
* dummy commit for unit test
* fix attention mask
* add CPMAntTokenizer&-Fast to auto-mapping
* drop redundant changes in README_ko
* fix defaults in docstring
* fix use_cache and some docstring
* add missing args in tokenizer
* modify tester inheritance
* add is_jieba_available
* fix some bugs
* make style and fix-copies
* add doctests
* skip integration tests
* add is_jieba_available
* fix bugs in common tests
* adjust docstrings and make style
* add argument docstring
* adjust code to some specifications
* make style and fix-copies
* add fast tokenization test
* dummy commit for unit test
* dummy commit for unit test
* dummy commit for unit test
* normalize some comments and names
* Bert->CPMAnt
* camel names and drop redundant codes
* make style and fix-copies
* add CpmTokenizerFast _import_structure
* drop cpmanttokenizerfast in model_doc
* fix some problems
* fix CPMAnt tokenization for common test
* make style and fixup
* fix copies and fixup
* fix bugs in tokenization test
* dummy commit for connection failure in unittest
* fix copies
* drop trailing comma
* fix decorator in tests
* dummy commit for connection failure in unittest
---------
Co-authored-by: Gong Baitao <gongbaitao11@gmail.com>
* Adding Llama FastTokenizer support.
- Requires https://github.com/huggingface/tokenizers/pull/1183 version
- Only support byte_fallback for llama, raise otherwise (safety net).
- Lots of questions are special tokens
How to test:
```python
from transformers.convert_slow_tokenizer import convert_slow_tokenizer
from transformers import AutoTokenizer
from tokenizers import Tokenizer
tokenizer = AutoTokenizer.from_pretrained("huggingface/llama-7b")
if False:
    new_tokenizer = Tokenizer.from_file("tok.json")
else:
    new_tokenizer = convert_slow_tokenizer(tokenizer)
    new_tokenizer.save("tok.json")
strings = [
    "This is a test",
    "生活的真谛是",
    "生活的真谛是[MASK]。",
    # XXX: This one is problematic because of special tokens
    # "<s> Something something",
]
for string in strings:
    encoded = tokenizer(string)["input_ids"]
    encoded2 = new_tokenizer.encode(string).ids
    assert encoded == encoded2, f"{encoded} != {encoded2}"
    decoded = tokenizer.decode(encoded)
    decoded2 = new_tokenizer.decode(encoded2)
    assert decoded.strip() == decoded2, f"{repr(decoded)} != {repr(decoded2)}"
```
The converter + some test script.
The test script.
Tmp save.
Adding Fast tokenizer + tests.
Adding the tokenization tests.
Correct combination.
Small fix.
Fixing tests.
Fixing with latest update.
Rebased.
fix copies + normalized added tokens + copies.
Adding doc.
TMP.
Doc + split files.
Doc.
Versions + try import.
Fix Camembert + warnings -> Error.
Fix by ArthurZucker.
Not a decorator.
* Fixing comments.
* Adding more to docstring.
* Doc rewriting.
* Initial commit
* more stash commit
* Yet another stash commit
* yet more stash commit
* Mostly working except for docs / repo consistency
* Stop importing model list from torch file
* Add TF BLIP models to docs
* Add auto classes
* Move get_text_features and get_image_features
* Update src/transformers/models/blip/modeling_tf_blip.py
Co-authored-by: amyeroberts <22614925+amyeroberts@users.noreply.github.com>
* Update src/transformers/models/blip/modeling_tf_blip.py
Co-authored-by: amyeroberts <22614925+amyeroberts@users.noreply.github.com>
* Update src/transformers/models/blip/modeling_tf_blip.py
Co-authored-by: amyeroberts <22614925+amyeroberts@users.noreply.github.com>
* Update src/transformers/models/blip/modeling_tf_blip_text.py
Co-authored-by: amyeroberts <22614925+amyeroberts@users.noreply.github.com>
* Update src/transformers/models/blip/modeling_tf_blip.py
Co-authored-by: amyeroberts <22614925+amyeroberts@users.noreply.github.com>
* Update src/transformers/models/blip/modeling_tf_blip.py
Co-authored-by: amyeroberts <22614925+amyeroberts@users.noreply.github.com>
* Update src/transformers/models/blip/modeling_tf_blip.py
Co-authored-by: amyeroberts <22614925+amyeroberts@users.noreply.github.com>
* Update src/transformers/models/blip/modeling_tf_blip.py
Co-authored-by: amyeroberts <22614925+amyeroberts@users.noreply.github.com>
* Update src/transformers/models/blip/modeling_tf_blip.py
Co-authored-by: amyeroberts <22614925+amyeroberts@users.noreply.github.com>
* Update tests/models/blip/test_modeling_tf_blip.py
Co-authored-by: amyeroberts <22614925+amyeroberts@users.noreply.github.com>
* Update tests/models/blip/test_modeling_tf_blip.py
Co-authored-by: amyeroberts <22614925+amyeroberts@users.noreply.github.com>
* Update src/transformers/models/blip/modeling_tf_blip.py
Co-authored-by: amyeroberts <22614925+amyeroberts@users.noreply.github.com>
* Update src/transformers/models/blip/modeling_tf_blip.py
Co-authored-by: Joao Gante <joaofranciscocardosogante@gmail.com>
* Update tests/models/blip/test_modeling_tf_blip_text.py
Co-authored-by: amyeroberts <22614925+amyeroberts@users.noreply.github.com>
* Update src/transformers/models/blip/modeling_tf_blip_text.py
Co-authored-by: Joao Gante <joaofranciscocardosogante@gmail.com>
* Update src/transformers/models/blip/modeling_tf_blip.py
Co-authored-by: amyeroberts <22614925+amyeroberts@users.noreply.github.com>
* Use channels_last convolutions in TF (better performance + compatibility)
* Remove _shape function
* Move multi-line statement to one line in PT + TF
* Specify tf.keras.layers instead of importing from it
* Remove test_gradient_checkpointing and empty test_training methods
* move some multi-line statements to one line
* Update docstring for generate
* Remove pruned heads set
* Remove self.seq_len_dim
* Fixed issues with loss computation, should resolve some tests. Also ensured that the PT version follows the config for output_attentions and output_hidden_states
* ensure original model follows config in more cases
* Skip the same cross-attention tests in the PT tests - didn't realize we did it twice!
* Add training args throughout the models and layers
* make fixup
* Fix docstring for inputs_embeds
* Add docstring for is_decoder
* Add docstrings to text models
* Remove redundant computation
* Add unpack_inputs / keras_serializable
* Add modeling_tf_blip to doctests
* Add config classes for keras serialization
* Changes to allow model porting with pt-to-tf
* Quick fix to decoder head and test tweaks
* Revert an issue with masking the embeddings outputs
* Allow missing keys in some equivalence tests (for unused layers)
* Add tf-pt equivalence tests back in
* Update src/transformers/models/blip/modeling_tf_blip.py
Co-authored-by: Sylvain Gugger <35901082+sgugger@users.noreply.github.com>
* Update src/transformers/models/blip/modeling_tf_blip_text.py
Co-authored-by: Sylvain Gugger <35901082+sgugger@users.noreply.github.com>
* Update src/transformers/models/blip/modeling_tf_blip_text.py
Co-authored-by: Sylvain Gugger <35901082+sgugger@users.noreply.github.com>
* make fixup
* Refactor invert_attention_mask out into tf_utils
* Re-enable cross-tests on the PT side too
---------
Co-authored-by: amyeroberts <22614925+amyeroberts@users.noreply.github.com>
Co-authored-by: Joao Gante <joaofranciscocardosogante@gmail.com>
Co-authored-by: Sylvain Gugger <35901082+sgugger@users.noreply.github.com>
* Initial commit
* update modeling code
* update doc
* add functions necessary
* fix imports
* revert changes
* fixup
* more styling to get going
* remove standalone encoder
* update code
* styling
* fix config and model
* update code and some refactoring
* make more tests pass
* Adding NLLB-200 - MoE - 54.5B for no language left behind
Fixes #21300
* fix more common tests
* style
* update testing file
* update
* update
* Router2 doc
* update check config with sparse layer
* add dummy router
* update current conversion script
* create on the fly conversion script
* Fixup
* style
* style 2
* fix empty return
* fix return
* Update default config sparse layers
* easier to create sparse layers
* update
* update conversion script
* update modeling
* add to toctree
* styling
* make ruff happy
* update docstring
* update conversion script
* update, will break tests but implementing top2
* update
* ❗local groups are supported here
* ⚠️ Support for local groups is now removed ⚠️
This is because it has to work with model parallelism that we do not support
* finish simplification
* Fix forward
* style
* fixup
* Update modelling and test, refactoring
* update tests
* remove final layer_norm as it is done in the FF
* routing works! Logits test added
* nit in test
* remove top1router
* style
* make sure sparse are tested. Had to change route_tokens a little bit
* add support for unsplit models when converting
* fixup
* style
* update tests
* update test
* REFACTOR
* encoder outputs match!
* style
* update testing
* 🎉encoder and decoder logits match 🎉
* styling
* update tests
* cleanup tests
* fix router test and CIs
* cleanup
* cleanup test styling
* fix tests
* Finally the generation tests match!
* cleanup
* update test
* style testing file
* remove script
* cleanup
* more cleanup
* nits
* update
* NLLB tokenizer is wrong and will be fixed soon
* use LongTensors
* update tests
* revert some small changes
* fix second expert sampling and batch prioritized routing
* update tests
* finish last tests
* make ruff happy
* update
* ruff again
* style
* Update docs/source/en/model_doc/nllb-moe.mdx
Co-authored-by: Sylvain Gugger <35901082+sgugger@users.noreply.github.com>
* Updates based on review
* style and fix import issue
* nit
* more nits
* cleanup
* styling
* update test_seconde_expert_policy
* fix name
* last nit on the markdown examples
---------
Co-authored-by: Sylvain Gugger <35901082+sgugger@users.noreply.github.com>
* add mega file structure and plain pytorch version of mega source code
* added config class with old naming conventions
* filled in mega documentation
* added config class and embeddings with optional token types
* updated notes
* starting the conversion process, deleted intermediate and added use_cache back to config
* renamed config attributes in modeling_mega.py
* checkpointing before refactoring incremental decoding functions
* removed stateful incremental key/values for EMA and self-attention
* refactored MovingAverageGatedAttention to remove stateful k/v history and use unified attention mask
* MovingAverageGatedAttention works with incremental decoding + past values, added sequence length enforcement
* more comments in MovingAverageGatedAttention + checkpointing before GatedCrossAttention
* bug fix in attention mask handling in MovingAverageGatedAttention
* removed incremental state from GatedCrossAttention and removed IncrementalState class
* finished gated cross attention and got MegaLayer working
* fixed causal masking in mega decoder
* fixed how padding and causal masks are passed through MegaLayer with and without k/v caching
* finished MegaModel; tested with encoder, decoder-only, and cross-attention type inputs; started work on downstream classes; removed mentions of position_ids
* added optional dense hidden layer for masked and causal LM classes
* docstring updates in MultiHeadEMA and GatedCrossAttention, removed unnecessary inputs in cross-attention
* removed before_attn_fn in Mega class and updated docstrings and comments up to there
* bug fix in MovingAverageGatedAttention masking
* working conversion of MLM checkpoint in scratchpad script -- perfect matches
* moved arg for hidden dense layer in LM head to config; discovered issue where from_pretrained is renaming gamma and beta parameters
* renamed gamma and beta parameters to avoid HF renaming when loading from checkpoint
* finished checkpoint conversion script
* cleanup old class in mega config script
* removed 'copied from' statements and passing integration tests
* added num_attention_heads=1 to config for integration compatibility, decoder tests working, generation tests failing
* fixed tuple output of megamodel
* all common tests passing after fixing issues in decoder, gradient retention, and initialization
* added mega-specific tests, ready for more documentation and style checks
* updated docstrings; checkpoint before style fixes
* style and quality checks, fixed initialization problem in float_tensor, ready for PR
* added mega to toctree
* removed unnecessary arg in megaconfig
* removed unused arg and fixed code samples with leftover roberta models
* Apply suggestions from code review
Applied all suggestions except the one renaming a class, as I'll need to update that throughout
Co-authored-by: Arthur <48595927+ArthurZucker@users.noreply.github.com>
* fixed issue where .view breaks batch dimension, conversion script fixed with absolute imports, updated readme with Mega->MEGA
* removed asserts in Mega code, renamed sequencenorm, gatedcrossattention, and NFFN, replaced get_activation_fn with ACT2FN, and added sequencenorm to layer norms
* reformatted .forward() docstrings to match style and removed unused mask input in cross-attention
* removed all reset_parameters() methods and rolled into MegaPreTrainedModel._init_weights()
* renamed all single-letter variables and improved readability in tensor size comments, Mega->MEGA in 2 documentation files
* variable names in NFFN
* manual Mega->MEGA changes in docs
* Mega->MEGA in config auto
* style and quality fixes
* Apply suggestions from code review
Co-authored-by: Arthur <48595927+ArthurZucker@users.noreply.github.com>
* renamed parameters and variables with confusing names, added copied from statements, moved fft conv to its own method, other cleanup from PR comments
* commit before dealing with merge conflicts
* made new attention activation functions available in ACT2FN and added generation test from OPT
* style and quality in activations and tests
* documentation fixes, renaming variables in dropout and rotary positions, used built-in causal masking, encoders->layers in MegaModel, moved comments into docstrings
* style and quality fixes after latest updates, before rotary position ids
* causal mask in MegaBlock docstring + added missing device passing
* Apply suggestions from code review
Co-authored-by: Arthur <48595927+ArthurZucker@users.noreply.github.com>
* Update README.md
Co-authored-by: Sylvain Gugger <35901082+sgugger@users.noreply.github.com>
* added Mega prefixes where missing, reverted MegaSequenceNorm to if-else, other module renaming requested in PR
* style and quality fixes + readme updates pointing to main
---------
Co-authored-by: Arthur <48595927+ArthurZucker@users.noreply.github.com>
Co-authored-by: Sylvain Gugger <35901082+sgugger@users.noreply.github.com>
* add new model of MGP-STR
* fix the failing checks
* remove torch and numpy from mgp_tokenization
* remove unused import from modeling_mgp_str
* add test_processing_mgp_str
* rm test_processing_mgp_str.py
* add test_processing_mgp_str
* add test_processing_mgp_str
* add test_processing_mgp_str
* rm test_processing_mgp_str and add softmax outs to model
* rm test_processing_mgp_str and add softmax outs to model
* rewrite the code of mgp-str according to PR suggestions
* rewrite the code of mgp-str according to PR suggestions
* remove representation_size from MGPSTRConfig
* reformat configuration_mgp_str.py
* format test_processor_mgp_str.py
* add test for tokenizer and complete model/processor test and model file
* rm unnecessary tuple in modeling_mgp_str
* reduce hidden_size/layers/label_size in test_model
* add integration tests and change MGPSTR to Mgpstr
* add test for logit values
* reformat test model file
---------
Co-authored-by: yue kun <yuekun.wp@alibaba-inc.com>
* added informer to gitignore
* added informer to gitignore
* WIP informer2020
* added checking that instantiate works
* added config using gluonTS by kashif
* WIP config
* adding InformerConfig. need to remove FeatureEmbedder
* done InformerConfig, but need to change the names
* Done informer model init. working on enc-dec
* added things to address, after reading again enc-dec in the paper
* done modeling - checking initialization work
* moved enc-dec init to InformerEncoder/Decoder init
* added 'init_std' to config, now model init works!
* WIP conversion script, and added code sources
* WIP conversion script: loading original informer pth works
* WIP conversion script: change defaults in the config
* WIP conversion script: supporting Informer input embedding
* WIP conversion script: added parameters for the informer embed
* WIP conversion script: change dim_feedforward=2048
* WIP conversion script: remove unused args for loading checkpoint
* just cleaning up
* DataEmbedding removed, after thinking with Kashif
* working on forward pass
* WIP forward pass: trying to establish working batch for forward pass
* cleaning and finalizing
* adding HF names and docs
* init after cleaning works
* WIP in tests
* added docs for the informer specific args
* fix style
* undo change
* cleaning informer, now need to work only on enc-dec
* initial enc-dec classes
* added encoder and decoder
* added todo
* add todos for conv_layers
* added decoder docs from vanilla
* added encoder docs from vanilla
* remove encoder decoder from the original informer
* removed AttentionLayer from the original paper
* removed TriangularCausalMask, same as decoder_attention_mask
* initial sparse attention
* use conv_layers
* fixed test_config test
* fix parenthesis when iterating zip(layers, conv_layers)
* error found in prob attention, added sizes as comments
* fix sizes
* added proposal for q_reduce indexing, and remove unused
* WIP ProbMask, and changed factor=2 for testing
* remove unused libs for this PR for creating the env
* fix checking the attn_weights.size() after bmm
* Q_reduce: changed from torch.gather to simple slicing
* WIP calculate final attn_output
* finish adding v_aggregated, attn_output ready
* changed tgt_len to u in attention_mask, need to fix the size error
* comment attention_mask for encoder, and fix if cond for v_agg
* added ProbMask support (wip), removed old original code
* finished ProbMask 😃
* Revert "remove unused libs for this PR for creating the env"
This reverts commit 11a081e09e.
* fixes
* make style
* fix initial tests
* fix more tests
* dry
* make style
* remove unused files
* style
* added integration tests
* fix num_static_real_features
* fix header
* remove unused function
* fix example
* fix docs
* Update src/transformers/models/informer/configuration_informer.py
Co-authored-by: NielsRogge <48327001+NielsRogge@users.noreply.github.com>
* Update src/transformers/models/informer/modeling_informer.py
Co-authored-by: NielsRogge <48327001+NielsRogge@users.noreply.github.com>
* Update src/transformers/models/informer/configuration_informer.py
Co-authored-by: NielsRogge <48327001+NielsRogge@users.noreply.github.com>
* Update src/transformers/models/informer/configuration_informer.py
Co-authored-by: NielsRogge <48327001+NielsRogge@users.noreply.github.com>
* Update src/transformers/models/informer/configuration_informer.py
Co-authored-by: NielsRogge <48327001+NielsRogge@users.noreply.github.com>
* Update src/transformers/models/informer/configuration_informer.py
Co-authored-by: NielsRogge <48327001+NielsRogge@users.noreply.github.com>
* fixes for reviewer
* use prediction_length from model
* fix style
* fixed informer.mdx
* added to index
* updated readme
* undo
* make fix-copies
* typo
* fix copy
* added Informer to toctree
* in order
* fixed comments
* remove unneeded new lines in docs
* make static real and cat optional
* fix use of distil conv layers
* fixed integration test
* added checkpoint for convlayer
* make fix-copies
* updated from time series model
* make fix-copies
* copy decoder
* fix unit tests
* updated scaling config
* fix integration tests
* IGNORE_NON_TESTED
* IGNORE_NON_AUTO_CONFIGURED
* IGNORE_NON_AUTO_CONFIGURED
* updated check configs
* fix formatting
* undo change from time series
* prediction_length should not be None
* align with the blog: prettify ProbSparse and change attention_factor to sampling_factor
* make style
* make fix-copies
* niels CR: update contributed by
* niels CR: update configuration_informer.py
Co-authored-by: NielsRogge <48327001+NielsRogge@users.noreply.github.com>
* niels CR: update kashif -> huggingface
Co-authored-by: NielsRogge <48327001+NielsRogge@users.noreply.github.com>
* niels CR: `sampling_factor` only relevant when `attention_type`=prob
* make style
* fixed U_part: added multiplication by `L_Q`
* fixed bug: remove `is not None` from `if config.distil`
* fixed test: `decoder_seq_length` to `encoder_seq_length` in cross_attentions check
* fix integration tests
* updated model hub
* do not shift as in training
* undo
* fix make-copies
* make fix-copies
* added `if prediction_length is None`
* changed `ProbSparseAttention` to `InformerProbSparseAttention`
* changed `V_sum` -> `v_mean_dim_time`
* changed `ConvLayer` to `InformerConvLayer` and fixed `super()`
* TimeSeriesTransformer->Informer in decoder's Copied from
* more descriptive in ProbSparse
* make style
* fix copied from
* Revert "added `if prediction_length is None`"
This reverts commit b4cbddfa05.
* fixed indent
* use InformerSinusoidalPositionalEmbedding
* make fix-style
* fix from #21860
* fix name
* make fix-copies
* use time series utils
* fix dec num_heads
* docstring
* added time series util doc
* _import_structure
* formatting
* changes from review
* make style
* fix docs
* fix doc
* removed NegativeLogLikelihood
---------
Co-authored-by: Kashif Rasul <kashif.rasul@gmail.com>
Co-authored-by: NielsRogge <48327001+NielsRogge@users.noreply.github.com>
* [Whisper] Add model for audio classification
* make fix-copies
* add to docs
* add docstring
* empty returns
* add code example
* switch to fleurs
* stick everything on one line
Adds the ALIGN model to transformers. ALIGN is introduced in "Scaling Up Visual and Vision-Language Representation Learning With Noisy Text Supervision" by Chao Jia, Yinfei Yang, Ye Xia, Yi-Ting Chen, Zarana Parekh, Hieu Pham, Quoc V. Le, Yunhsuan Sung, Zhen Li, Tom Duerig.
* first draft of model summary
* restructure docs
* finish first draft
* ✨minor reviews and edits
* apply feedback
* save important info, create new page for attention
* add attention doc to toctree
* ✨ few more minor fixes
* config and tokenization(fast too) changed and ErnieEncoder added
* Slow Tokenization Added
* Tokenizer(slow) is now working and Fast Tokenizer removed
* Added Config code
* Added Base Model and utils
* ErnieMModel is now working
* All added except tests
* All tests passed except ErnieUIEM
* All tests passed
* all fixes done
* all fixes done
* fixed MAP
* fixed check_code_quality
* fixed Build PR Documentation issue
* Added changes(comments) and also updated to the latest upstream/main
* Added fixup
* Added # Copied comments
* Added fixup
* Added more comments and some nits
* Added fixup
* Fixed README_hd.md
* Added more fixes
* ErnieMTokenizer (being sentencepiece) protected and other docs edited
* Added code_quality fix
* Fixed for
* Added more fix
* modified AZ
* ernie-m tokenization test added!
* attention mask part fixed(with 0->self.config.pad_token_id)
* applied make fixup
* Add X-MOD to Readme
* Add documentation for X-MOD
* Implement X-MOD
* Fix formatting of X-MOD docs
* Change signature of X-MOD forward methods to use lang_ids
* Minor changes
* Rebase with main and run make fix-copies
* Make suggested changes to docstrings
* Improve code readability
Co-authored-by: Younes Belkada <49240599+younesbelkada@users.noreply.github.com>
* Fix code style
* Conversion script: Remove asserts and type annotations
* Remove _TOKENIZER_FOR_DOC
* XMOD -> Xmod
* Update copyright note
* Fix doctests
* Fix docstring
* Add integration test for FillMaskPipeline
* Revert "Add integration test for FillMaskPipeline"
This reverts commit 4381eb3b1d0f5d85785f89caba83928e6efa6d1f.
* Add end-to-end integration test for mask fill
* make style
* Rebase with main and make fix-copies
---------
Co-authored-by: Younes Belkada <49240599+younesbelkada@users.noreply.github.com>
* First draft
* More improvements
* More improvements
* Improve conversion script
* Convert all weights
* Make forward pass work
* Make logits match
* More improvements
* More improvements
* More improvements
* Use get_input_embeddings
* Improve some more
* Improve model tests
* Improve model tests
* More improvements
* Fix processor
* Update files
* Update prepare_inputs_for_generation
* More improvements
* Fix copies
* More fixes
* Make fixup
* More improvements
* Add support for seq2seq language model
* More improvements
* Fix test
* More improvements
* Improve conversion script
* Remove some todo's
* Fix README's
* Improve conversion script
* Fix generation
* Fix style and remove Blip2Model
* Fix model outputs
* More improvements
* Set eos_token_id in config
* Fix quality
* Small improvements
* Add processor tests
* More improvements
* Apply suggestions
* Apply suggestions
* Add integration test
* Update image URL
* Add integration test
* Fix model_type
* Update style
* Improve docs
* Add doc tests
* Fix copies
* Remove tests which are passing
* Improve some more
* Add tests for seq2seq language models
* Minor fix
* Convert more checkpoints
* finalize CI
* Fix blip and blip2 processors
* add `accelerate` support for `blip2`
* clean up
* make style
* Update conversion script
* Update conversion script some more
* Update organization
* revert toc file
* add blip-2 to toc file
* Some more improvements
* Fix docstring
* Improve docs
---------
Co-authored-by: ydshieh <ydshieh@users.noreply.github.com>
Co-authored-by: younesbelkada <younesbelkada@gmail.com>
* doc: introduce new section for XLM-V model
* doc: mention more details for XLM-V integration
* docs: paper abstract in italics, model identifier for base model added
* doc: mention new XLM-V support
* auto: add XLM-V mapping
* doc: run make fix-copies ;)
* make SpeechT5 model by copying Wav2Vec2
* add paper to docs
* whoops added docs in wrong file
* remove SpeechT5Tokenizer + put CTC back in the name
* remove deprecated class
* remove unused docstring
* delete SpeechT5FeatureExtractor, use Wav2Vec2FeatureExtractor instead
* remove classes we don't need right now
* initial stab at speech encoder prenet
* add more speech encoder prenet stuff
* improve SpeechEncoderPrenet
* add encoder (not finished yet)
* add relative position bias to self-attention
* add encoder CTC layers
* fix formatting
* add decoder from BART, doesn't work yet
* make it work with generate loop
* wrap the encoder into a speech encoder class
* wrap the decoder in a text decoder class
* changed my mind
* changed my mind again ;-)
* load decoder weights, make it work
* add weights for text decoder postnet
* add SpeechT5ForCTC model that uses only the encoder
* clean up EncoderLayer and DecoderLayer
* implement _init_weights in SpeechT5PreTrainedModel
* cleanup config + Encoder and Decoder
* add head + cross attention masks
* improve doc comments
* fixup
* more cleanup
* more fixup
* TextDecoderPrenet works now, thanks Kendall
* add CTC loss
* add placeholders for other pre/postnets
* add type annotation
* fix freeze_feature_encoder
* set padding tokens to 0 in decoder attention mask
* encoder attention mask downsampling
* remove features_pen calculation
* disable the padding tokens thing again
* fixup
* more fixup
* code review fixes
* rename encoder/decoder wrapper classes
* allow checkpoints to be loaded into SpeechT5Model
* put encoder into wrapper for CTC model
* clean up conversion script
* add encoder for TTS model
* add speech decoder prenet
* add speech decoder post-net
* attempt to reconstruct the generation loop
* add speech generation loop
* clean up generate_speech
* small tweaks
* fix forward pass
* enable always dropout on speech decoder prenet
* sort declaration
* rename models
* fixup
* fix copies
* more fixup
* make consistency checker happy
* add Seq2SeqSpectrogramOutput class
* doc comments
* quick note about loss and labels
* add HiFi-GAN implementation (from Speech2Speech PR)
* rename file
* add vocoder to TTS model
* improve vocoder
* working on tokenizer
* more better tokenizer
* add CTC tokenizer
* fix decode and batch_code in CTC tokenizer
* fix processor
* two processors and feature extractors
* use SpeechT5WaveformFeatureExtractor instead of Wav2Vec2
* cleanup
* more cleanup
* even more fixup
* notebooks
* fix log-mel spectrograms
* support reduction factor
* fixup
* shift spectrograms to right to create decoder inputs
* return correct labels
* add labels for stop token prediction
* fix doc comments
* fixup
* remove SpeechT5ForPreTraining
* more fixup
* update copyright headers
* add usage examples
* add SpeechT5ProcessorForCTC
* fixup
* push unofficial checkpoints to hub
* initial version of tokenizer unit tests
* add slow test
* fix failing tests
* tests for CTC tokenizer
* finish CTC tokenizer tests
* processor tests
* initial test for feature extractors
* tests for spectrogram feature extractor
* fixup
* more fixup
* add decorators
* require speech for tests
* modeling tests
* more tests for ASR model
* fix imports
* add fake tests for the other models
* fixup
* remove jupyter notebooks
* add missing SpeechT5Model tests
* add missing tests for SpeechT5ForCTC
* add missing tests for SpeechT5ForTextToSpeech
* sort tests by name
* fix Hi-Fi GAN tests
* fixup
* add speech-to-speech model
* refactor duplicate speech generation code
* add processor for SpeechToSpeech model
* add usage example
* add tests for speech-to-speech model
* fixup
* enable gradient checkpointing for SpeechT5FeatureEncoder
* code review
* push_to_hub now takes repo_id
* improve doc comments for HiFi-GAN config
* add missing test
* add integration tests
* make number of layers in speech decoder prenet configurable
* rename variable
* rename variables
* add auto classes for TTS and S2S
* REMOVE CTC!!!
* S2S processor does not support save/load_pretrained
* fixup
* these models are now in an auto mapping
* fix doc links
* rename HiFiGAN to HifiGan, remove separate config file
* REMOVE auto classes
* there can be only one
* fixup
* replace assert
* reformat
* feature extractor can process input and target at same time
* update checkpoint names
* fix commit hash
* updated resources for LayoutLM
* Apply suggestions from code review
Co-authored-by: Steven Liu <59462357+stevhliu@users.noreply.github.com>
* fixed formatting, removed extra section
---------
Co-authored-by: Steven Liu <59462357+stevhliu@users.noreply.github.com>
* Added resource section to GPT-J docs
* Added most of the links found
* Addressing review comments
* Fixing formatting
* Update docs/source/en/model_doc/gptj.mdx
Co-authored-by: Steven Liu <59462357+stevhliu@users.noreply.github.com>
* Fixing one of the labels
---------
Co-authored-by: Steven Liu <59462357+stevhliu@users.noreply.github.com>
* [FT] First commit for graphormer architecture.
The model has no tokenizer, as it uses a collator and preprocessing function for its input management.
Architecture to be tested against original one.
The arch might need to be changed to fit the checkpoint, but a revert to the original arch will make the code less nice to read.
TODO: doc
* [FIX] removed test model
* [FIX] import error
* [FIX] black and flake
* [DOC] added paper refs
* [FIX] [DOC]
* [FIX] black
* [DOC] Updated READMEs
* [FIX] Order of imports + rm Tokenizer calls
* [FIX] Moved assert in class to prevent doc build failure
* [FIX] make fix-copies
* [Doc] update from code review
Co-authored-by: Sylvain Gugger <35901082+sgugger@users.noreply.github.com>
* [FIX] Removed Graphormer from Sequence classification model list
* [DOC] Added HF copyright to Cython file
* [DOC] Fixed comments
* [FIX] typos in class doc + removed config classes.
Todo: update doc from paper definitions
* [FIX] Removed dependency to fairseq, and replaced all asserts with Exception management
* [FIX] Homogenized initialization of weights to pretrained constructor
* [FIX] [CP] Updated multi_hop parameter to get same results as in original implementation
* [DOC] Relevant parameter description in the configuration file
* [DOC] Updated doc and comments in main graphormer file
* [FIX] make style and quality checks
* [DOC] Fix doc format
* [FIX] [WIP] Updated part of the tests, though still a wip
* [FIX] [WIP]
* [FIX] repo consistency
* [FIX] Changed input names for more understandability
* [FIX] [BUG] updated num_classes params for propagation in the model
* simplified collator
* [FIX] Updated tests to follow new naming pattern
* [TESTS] Updated test suite along with model
* [FIX] rm tokenizer import
* [DOC] add link to graphormerdoc
* Changed section in doc from text model to graph model
* Apply suggestions from code review
Spacing, inits
Co-authored-by: Sylvain Gugger <35901082+sgugger@users.noreply.github.com>
* [DOC] Explain algos_graphormer functions
* Cython soft import protection
* Rm call to Callable in configuration graphormer
* [FIX] replaced asserts with Exceptions
* Add org to graphormer checkpoints
* Prefixed classes with Graphormer
* Management of init functions
* format
* fixes
* fix length file
* update indent
* relaunching ci
* Errors for missing cython imports
* fix style
* fix style doc
Co-authored-by: Sylvain Gugger <35901082+sgugger@users.noreply.github.com>
* `blip` support for training
* remove labels creation
* remove unneeded `decoder_input_ids` creation
* final changes
- add colab link to documentation
- reduction = mean for loss
* fix nits
* update link
* clearer error message
* torch.jit._state
* Fix past CI
* Fix for perceiver
* Fix REALM
* Fix for Bloom
* Fix for SwinModel
* Fix for TrajectoryTransformerModel
* Fix for test_wav2vec2_with_lm
* make style
Co-authored-by: ydshieh <ydshieh@users.noreply.github.com>
* Copy RoBERTa
* formatting
* implement RoBERTa with prelayer normalization
* update test expectations
* add documentation
* add conversion script for DinkyTrain weights
* update checkpoint repo
Unfortunately the original checkpoints assume a hacked roberta model
* add to RoBERTa-PreLayerNorm docs to toc
* run utils/check_copies.py
* lint files
* remove unused import
* fix check_repo reporting wrongly a test is missing
* fix import error, caused by rebase
* run make fix-copies
* add RobertaPreLayerNormConfig to ROBERTA_EMBEDDING_ADJUSMENT_CONFIGS
* Fix documentation <Facebook> -> Facebook
Co-authored-by: Arthur <48595927+ArthurZucker@users.noreply.github.com>
* fixup: Fix documentation <Facebook> -> Facebook
Co-authored-by: Arthur <48595927+ArthurZucker@users.noreply.github.com>
* Add missing Flax header
Co-authored-by: Arthur <48595927+ArthurZucker@users.noreply.github.com>
* expected_slice -> EXPECTED_SLICE
Co-authored-by: Arthur <48595927+ArthurZucker@users.noreply.github.com>
* update copies after rebase
* add missing copied from statements
* make fix-copies
* make prelayernorm explicit in code
* fix checkpoint path for the original implementation
* add flax integration tests
* improve docs
* update utils/documentation_tests.txt
* lint files
* Remove Copyright notice
Co-authored-by: Sylvain Gugger <35901082+sgugger@users.noreply.github.com>
* make fix-copies
* Remove EXPECTED_SLICE calculation comments
Co-authored-by: Arthur <48595927+ArthurZucker@users.noreply.github.com>
Co-authored-by: Arthur <48595927+ArthurZucker@users.noreply.github.com>
Co-authored-by: Sylvain Gugger <35901082+sgugger@users.noreply.github.com>
* Add templates for gpt-sw3
* Add templates for gpt-sw3
* Added sentencepiece tokenizer
* intermediate commit with many changes
* fixed conflicts
* Init commit for tokenization port
* Tokenization progress
* Remove fast tokenizer
* Clean up and rename spm.model -> spiece.model
* Remove TF -> PT conversion script template, Clean up Megatron -> PT script
* Optimize encode & decode performance
* added new attention
* added new attention
* attention for gpt-sw3 working
* attention good
* Cache is now working
* fixed attention mask so that it works with causal attention
* fixed baddbmm bug for cpu and caching
* updated config with correct parameters
* Refactor and leave optimizations as separate functions to avoid breaking expected functionality
* Fix special tokens mapping for both tokenizers
* cleaning up of code and comments
* HF compatible attention outputs
* Tokenizer now passing tests, add documentation
* Update documentation
* reverted back to base implementation after checking that it is identical to pretrained model
* updated gpt-sw3 config
* updated conversion script
* aligned parameters with gpt-sw3 config
* changed default scale_attn_by_inverse_layer_idx to true
* removed flag from conversion script
* added temporary model path
* reverted back to functioning convert script
* small changes to default config
* updated tests for gpt-sw3
* make style, make quality, minor cleanup
* Change local paths to testing online repository
* Change name: GptSw3 -> GPTSw3
* Remove GPTSw3TokenizerFast references
* Use official model repository and add more model sizes
* Added reference to 6.7b model
* Add GPTSw3DoubleHeadsModel to IGNORE_NON_AUTO_CONFIGURED, like GPT2DoubleHeadsModel
* Remove pointers to non-existing TFGPTSw3
* Add GPTSw3 to docs/_toctree.yml
* Remove TF artifacts from GPTSw3 in __init__ files
* Update READMEs with 'make fix-copies'
* Add 20b model to archive list
* Add documentation for GPT-Sw3
* Fix typo in documentation for GPT-Sw3
* Do 'make fix-copies' again after having updated docs
* Fix some typos in docs
* Update src/transformers/models/gpt_sw3/configuration_gpt_sw3.py
Co-authored-by: Arthur <48595927+ArthurZucker@users.noreply.github.com>
* Update src/transformers/models/gpt_sw3/configuration_gpt_sw3.py
Co-authored-by: Arthur <48595927+ArthurZucker@users.noreply.github.com>
* Update src/transformers/models/gpt_sw3/__init__.py
Co-authored-by: Arthur <48595927+ArthurZucker@users.noreply.github.com>
* Update src/transformers/models/gpt_sw3/__init__.py
Co-authored-by: Arthur <48595927+ArthurZucker@users.noreply.github.com>
* Update src/transformers/models/gpt_sw3/convert_megatron_to_pytorch.py
Co-authored-by: Arthur <48595927+ArthurZucker@users.noreply.github.com>
* Update src/transformers/models/gpt_sw3/modeling_gpt_sw3.py
Co-authored-by: Arthur <48595927+ArthurZucker@users.noreply.github.com>
* Update tests/models/gpt_sw3/test_tokenization_gpt_sw3.py
Co-authored-by: Arthur <48595927+ArthurZucker@users.noreply.github.com>
* Update src/transformers/models/gpt_sw3/modeling_gpt_sw3.py
Co-authored-by: Arthur <48595927+ArthurZucker@users.noreply.github.com>
* Update src/transformers/models/gpt_sw3/modeling_gpt_sw3.py
Co-authored-by: Arthur <48595927+ArthurZucker@users.noreply.github.com>
* Resolve comments from PR feedback
* Resolve more comments from PR feedback, also set use_cache=True in convert script
* Add '# Copied from' comments for GPTSw3 modeling
* Set 'is_parallelizable = False'
* Remove '# Copied from' where code was modified and add 'with x->y' when appropriate
* Remove parallelize in mdx
* make style, make quality
* Update GPTSw3Config default values and corresponding documentation
* Update src/transformers/models/gpt_sw3/tokenization_gpt_sw3.py
Co-authored-by: Sylvain Gugger <35901082+sgugger@users.noreply.github.com>
* Update src/transformers/models/gpt_sw3/__init__.py
Co-authored-by: Sylvain Gugger <35901082+sgugger@users.noreply.github.com>
* Clean up and protect GPTSw3Tokenizer imports with is_sentencepiece_available
* Make style, make quality
* Add dummy object for GPTSw3Tokenizer via 'make fix-copies'
* make fix-copies
* Remove GPTSw3 modeling classes
* make style, make quality
* Add GPTSw3 auto-mappings for other GPT2 heads
* Update docs/source/en/model_doc/gpt-sw3.mdx
Co-authored-by: Arthur <48595927+ArthurZucker@users.noreply.github.com>
* Update src/transformers/models/gpt_sw3/convert_megatron_to_pytorch.py
Co-authored-by: Arthur <48595927+ArthurZucker@users.noreply.github.com>
* Update src/transformers/models/gpt_sw3/tokenization_gpt_sw3.py
Co-authored-by: Arthur <48595927+ArthurZucker@users.noreply.github.com>
* Remove old TODO-comment
* Add example usage to GPTSw3Tokenizer docstring
* make style, make quality
* Add implementation details and example usage to gpt-sw3.mdx
Co-authored-by: JoeyOhman <joeyoh@kth.se>
Co-authored-by: Arthur <48595927+ArthurZucker@users.noreply.github.com>
Co-authored-by: Sylvain Gugger <35901082+sgugger@users.noreply.github.com>
* read to load
* base functionality
* revert init
* fix dummy data
* moving right along
* moving right along
* finally
* cleanup
* pull out comment
* add test
* update docstring for main class
* flake comments and rewriting copies from `make repo-consistency`
* remove irrelevant differences/accidental spaces
* put copies back after space removals
* mid
* final test pass
* stray comment
* update test file
* update test file
* fixup
* black
* missed
* black missed one more
* style
* add doc update
* fix order of output class
* comment
* Revert "comment"
This reverts commit 03f86b6948.
* remove redundant function, and redundant reshape
* move change out of common
* style
* put common spaces back
* reorder kwargs in output
* doc style
* biogpt initial commit
* updated init
* fix faster decoding with use_cache
* 1. fix input_ids and input_embeds with correct device
2. added _keys_to_ignore_on_load_missing
3. updated prepare_inputs_for_generation
* add activation_dropout and scale_embedding
* replace fsmt attention with bart attention
* added test
* run make fix-copies
* doc init and fix build
* updated README with proper information
* 1. added tips to docs
2. updated BioGptTokenizer func
* 1. added tokenizer test
2. refactor tokenizer
* make fixup
* add biogpt fairseq to hf converter
* updated layer names more similar to original checkpoints
* config update doc string and set defaults
* added "#copied" from bart model and updated doc strings
* enable model_input_names in tokenizer
* 1. positionalembedding depending on attention_mask
2. added attention mask to prepare for generation
* added test to verify past and generation
* BioGptLMHeadModel -> BioGptForCausalLM
* fix typo
* tokenization and test
Copyright and updated assertion
* updated Copyright and
one func at time in line
* Copyright updates and
minor doc fix
* replace assertion with ValueError
* rm extra space
* added code syntax
* revert cmnt position change
* add tokenizer to auto
* updated doc string
* tokenizer doc string update
* biogpt hub model update to microsoft/biogpt
* make fixup
* rm cmnt to fix flake8 5.0.4 vs 6 error
* add minimal working gpt2 tokenizer
* graph mode and output equivalence tests working
* not today tensorflow. serialization test passing!
* fix style, documentation, docstrings and all that jazz
* passing consistency checks
* move keras nlp to tf dependencies
* fix tf modeling utils and gpt2 attention to enable compiling
* fix (I hope) keras nlp dependencies
* revert changes on generation
* remove debug prints
* remove redundant tf dummy objects
* add from config, get config and max length settings to address review
* let flake ignore the error on distillation you are welcome
* test from config
* add padding test
* address sgugger review
* Add Donut image processor
* Update src/transformers/image_transforms.py
Co-authored-by: Alara Dirik <8944735+alaradirik@users.noreply.github.com>
* Fix docstrings
* Full var names in docstring
Co-authored-by: Alara Dirik <8944735+alaradirik@users.noreply.github.com>
* First draft
* Make conversion script work
* Add id2label mapping, run code quality
* Fix copies
* Add first draft of feature extractor
* Update conversion script to use feature extractor
* Make more tests pass
* Add docs
* update input_features to input_values + pad by default to max length
* Fix doc tests
* Add feature extractor tests
* Add proper padding/truncation to feature extractor
* Add support for conversion of all audioset checkpoints
* Improve docs and extend conversion script
* Fix README
* Rename spectogram to spectrogram
* Fix copies
* Add integration test
* Remove dummy conv
* Update to ast
* Update organization
* Fix init
* Rename model to AST
* Add require_torchaudio annotator
* Move import of ASTFeatureExtractor under an is_speech_available check
* Fix rebase
* Add pipeline config
* Update name of classifier head
* Rename time_dimension and frequency_dimension for clarity
* Remove print statement
* Fix pipeline test
* Fix pipeline test
* Fix index table
* Fix init
* Fix conversion script
* Rename to ForAudioClassification
* Fix index table
Co-authored-by: Niels Rogge <nielsrogge@Nielss-MacBook-Pro.local>
* add model files etc for MobileNetV2
* rename files for MobileNetV1
* initial implementation of MobileNetV1
* fix conversion script
* cleanup
* write docs
* tweaks
* fix conversion script
* extract hidden states
* fix test cases
* make fixup
* fixup it all
* remove main from doc link
* fixes
* fix tests
* fix up
* use google org
* fix weird assert
* fixup
* use google organization for checkpoints
* Add DiNAT
* Adds DiNAT + tests
* Minor fixes
* Added HF model
* Add natten to dependencies.
* Cleanup
* Minor fixup
* Reformat
* Optional NATTEN import.
* Reformat & add doc to _toctree
* Reformat (finally)
* Dummy objects for DiNAT
* Add NAT + minor changes
Adds NAT as its own independent model + docs, tests
Adds NATTEN to ext deps to ensure ci picks it up.
* Remove natten from `all` and `dev-torch` deps, add manual pip install to ci tests
* Minor fixes.
* Fix READMEs.
* Requested changes to docs + minor fixes.
* Requested changes.
* Add NAT/DiNAT tests to layoutlm_job
* Correction to Dinat doc.
* Requested changes.
* Add resources of OpenAI GPT
* Delete Deploy section and add .
* Add scripts
* Update docs/source/en/model_doc/openai-gpt.mdx
Co-authored-by: Steven Liu <59462357+stevhliu@users.noreply.github.com>
* Delete causal-language-modeling section
* Add TFOpenAIGPTLMHeadModel
* Add resources from community
* Delete a link
Co-authored-by: Steven Liu <59462357+stevhliu@users.noreply.github.com>
Adds an image-guided object detection method to the OwlViTForObjectDetection class, as described in the original paper. One-shot (image-guided) object detection lets users search the input image for objects similar to a query image.
Co-authored-by: Dhruv Karan <k4r4n.dhruv@gmail.com>
* WIP: Added CLIP resources from HuggingFace blog
* ADD: Notebooks documentation to clip
* Add link straight to notebook
Co-authored-by: Steven Liu <59462357+stevhliu@users.noreply.github.com>
* Change notebook links to colab
Co-authored-by: Ambuj Pawar <your_email@abc.example>
Co-authored-by: Steven Liu <59462357+stevhliu@users.noreply.github.com>
* allow loading projection in text and vision model
* begin tests
* finish test for CLIPTextModelTest
* style
* add slow tests
* add new classes for projection heads
* remove with_projection
* add in init
* add in doc
* fix tests
* fix some more tests
* fix copies
* fix docs
* remove leftover from fix-copies
* add the head models in IGNORE_NON_AUTO_CONFIGURED
* fix docstr
* fix tests
* Apply suggestions from code review
Co-authored-by: Patrick von Platen <patrick.v.platen@gmail.com>
* add docstr for models
Co-authored-by: Patrick von Platen <patrick.v.platen@gmail.com>
* add model files etc for MobileNetV2
* rename files for MobileNetV1
* initial implementation of MobileNetV1
* fix conversion script
* cleanup
* write docs
* tweaks
* fix conversion script
* extract hidden states
* fix test cases
* make fixup
* fixup it all
* rename V1 to V2
* fix checkpoints
* fixup
* implement first block + weight conversion
* add remaining layers
* add output stride and dilation
* fixup
* add tests
* add deeplabv3+ head
* a bit of fixup
* finish deeplab conversion
* add link to doc
* fix issue with JIT trace
in_height and in_width would be Tensor objects during JIT trace, which caused Core ML conversion to fail on the remainder op. By making them ints, the result of the padding calculation becomes a constant value.
* cleanup
* fix order of models
* fix rebase error
* remove main from doc link
* add image processor
* remove old feature extractor
* fix converter + other issues
* fixup
* fix unit test
* add to onnx tests (but these appear broken now)
* add post_process_semantic_segmentation
* use google org
* remove unused imports
* move args
* replace weird assert
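
The JIT trace fix noted above (casting `in_height` and `in_width` to ints so the padding result is a constant) can be sketched roughly as follows. This is a minimal pure-Python illustration, assuming the padding calculation is TensorFlow-style "SAME" padding; the function name `tf_same_padding` and its parameters are hypothetical and not taken from the PR.

```python
def tf_same_padding(in_height, in_width, kernel_size, stride):
    """Return (left, right, top, bottom) padding for TF-style "SAME" convolution.

    Hypothetical helper for illustration only.
    """
    # Cast to plain ints first. During tracing these values could be tensor-like
    # objects, and the remainder op below would then be recorded in the graph
    # instead of being folded into a constant.
    in_height, in_width = int(in_height), int(in_width)

    def pad_along(size):
        # Total padding needed along one spatial dimension.
        remainder = size % stride
        if remainder == 0:
            return max(kernel_size - stride, 0)
        return max(kernel_size - remainder, 0)

    pad_h = pad_along(in_height)
    pad_w = pad_along(in_width)

    # Any odd leftover pixel goes to the bottom/right, as TensorFlow does.
    pad_top, pad_left = pad_h // 2, pad_w // 2
    return (pad_left, pad_w - pad_left, pad_top, pad_h - pad_top)


print(tf_same_padding(224, 224, kernel_size=3, stride=2))  # (0, 1, 0, 1)
```

Because every value is a Python int by the time the remainder is taken, the returned tuple is a compile-time constant from the point of view of a tracer or converter.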
* move generation_*.py src files into generation/*.py
* populate generation.__init__ with lazy loading
* move imports and references from generation.xxx.object to generation.object
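
The lazy loading mentioned for `generation.__init__` can be illustrated with a small sketch. This is not the library's actual mechanism; `LazyModule` and the import structure below are hypothetical, and standard-library modules stand in for the real `generation/*.py` files so the example runs on its own.

```python
import importlib
import types


class LazyModule(types.ModuleType):
    """Hypothetical namespace that imports a submodule only on first attribute access."""

    def __init__(self, name, import_structure):
        super().__init__(name)
        # Map each public object name to the module that defines it.
        self._object_to_module = {
            obj: module for module, objects in import_structure.items() for obj in objects
        }
        self._resolved = []  # names that have actually been imported so far

    def __getattr__(self, name):
        # Called only when normal lookup fails, i.e. the first time a name is used.
        mapping = self.__dict__.get("_object_to_module", {})
        if name not in mapping:
            raise AttributeError(f"module {self.__name__!r} has no attribute {name!r}")
        value = getattr(importlib.import_module(mapping[name]), name)
        setattr(self, name, value)  # cache so later lookups skip __getattr__
        self._resolved.append(name)
        return value


# Stand-in structure: in a real package the keys would be submodules such as
# "generation.utils" and the values the objects they export.
generation = LazyModule("generation", {"math": ["sqrt"], "json": ["dumps"]})

print(generation._resolved)   # [] (nothing imported yet)
print(generation.sqrt(9.0))   # 3.0
print(generation._resolved)   # ['sqrt']
```

With a layout like this, callers can write `generation.object` instead of `generation.xxx.object`, and the cost of importing a submodule is paid only when one of its objects is first used.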
* Add first draft
* Update conversion script
* Improve conversion script
* Improve conversion script some more
* Add conditional embeddings
* Add initial decoder
* Fix activation function of decoder
* Make decoder outputs match original implementation
* Make decoder outputs match original implementation
* Add more copied from statements
* Improve model outputs
* Fix auto tokenizer file
* Fix more tests
* Add test
* Improve README and docs, improve conditional embeddings
* Fix more tests
* Remove print statements
* Remove initial embeddings
* Improve conversion script
* Add interpolation of position embeddings
* Finish addition of interpolation of position embeddings
* Add support for refined checkpoint
* Fix refined checkpoint
* Remove unused parameter
* Improve conversion script
* Add support for training
* Fix conversion script
* Add CLIPSegFeatureExtractor
* Fix processor
* Fix CLIPSegProcessor
* Fix conversion script
* Fix most tests
* Fix equivalence test
* Fix README
* Add model to doc tests
* Use better variable name
* Convert other checkpoint as well
* Update config, add link to paper
* Add docs
* Update organization
* Replace base_model_prefix with clip
* Fix base_model_prefix
* Fix checkpoint of config
* Fix config checkpoint
* Remove file
* Use logits for output
* Fix tests
Co-authored-by: Niels Rogge <nielsrogge@Nielss-MacBook-Pro.local>
* docs: Fix typo in ONNX parser help: 'tolerence' => 'tolerance'
* docs: Resolve many typos in the English docs
Typos found via 'codespell ./docs/source/en'
* initial commit
* First draft that gets outputs without crashing!
* Add all the ported openfold dependencies
* testing
* Restructure config files for ESMFold
* Debugging to find output discrepancies
* Mainly style
* Make model runnable without extra deps
* Remove utils and merge them to the modeling file
* Use correct gelu and remove some debug prints
* More cleanup
* Update esm docs
* Update conversion script to support ESMFold properly
* Port some top-level changes from ESMFold repo
* Expand EsmFold docstrings
* Make attention_mask optional (default to all 1s)
* Add inference test for ESMFold
* Use config and not n kwargs
* Add modeling output class
* Remove einops
* Remove chunking in ESM FFN
* Update tests for ESMFold
* Quality
* Repo consistency
* Remove tree dependency from ESMFold
* make fixup
* Add an error in case my structure map function breaks later
* Remove needless code
* Stop auto-casting the LM to float16 so CPU tests pass
* Stop auto-casting the LM to float16 so CPU tests pass
* Final test updates
* Split test file
* Copyright and quality
* Unpin PyTorch to see built doc
* Fix config file to_dict() method
* Add some docstrings to the output
* Skip TF checkpoint tests for ESM until we reupload those
* make fixup
* More docstrings
* Unpin to get even with main
* Flag example to write
Co-authored-by: Sylvain Gugger <Sylvain.gugger@gmail.com>