Anton Vlasjuk
d95c864a25
[Attention] Refactor Attention Interface for Bart-based Models ( #38108 )
...
* starting attn refactor for encoder decoder models via bart (eager + sdpa)
* flash attention works, remove unnecessary code
* flex attention support for bart! gotta check that the renaming is not too aggressive
* some comments
* skip flex grad test for standalone as done with the other test
* revert flex attn rename (for now), sdpa simplify, and todos
* more todos
* refactor mask creation for reuse
* modular attempt at biogpt
* first batch of other models
* fix attn dropout
* fix autoformer copies
* hubert
* another batch of models
* copies/style + last round of bart models --> whisper next?
* remove unnecessary _reshape function and remove copy to whisper
* add skip for decoder-only models out of enc-dec (same as in bart)
* bring back licences
* remove comment, added to pr read instead
* mostly docs
* disable sew flex attn for now, as its attn mask handling is unclear
* oops
* test fixes for enc-dec
* torch fx fixes + try at flex attn
* skip on mbart
* some more fixes
* musicgen skip / delete old attn class logic + sdpa compose compile skip
* disable flex attn for musicgen, not worth the effort
* more fixes and style
* flex attention test for dropout and encoder decoder models that don't have main input names
* informer fixes
* the weirdest thing I've encountered yet...
* style
* remove empty tensor attempt, found core root in previous commits
* disable time series due to tests being very text centric on inputs
* add speech to text to be ignoring the other attns, also due to tests
* update docs
* remaining issues resolved ?
* update docs for current state --> nllb moe and pegasus x sdpa is questionable :D
* some models have not set the is_causal flag...
* change dtype in softmax to old behaviour + some modular fixes
* I hate it but it is what it is
* fixes from main for bart
* forgot this one
* some model fixes
* style
* current status
* marian works now
* fixing some copies
* some copy fixes + time series x informer
* last models possibly and fixes on style/copies
* some post merge fixes
* more fixes
* make attention interface callable and move warnings there
* style lol
* add comment to "unsupported"
* remove callable interface and change interface warnings + some copies
* fix
* ternary is ugly af, make it simpler
* how did that happen
* fix flex attn test
* failing the test
* no more fallback! fixing copies next
* style + attn fixed
* fixing copies and mask creation
* wrong copy
* fixup tests and disable flex attn for now
* fixup last tests?
2025-05-22 17:12:58 +02:00
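The entry above replaces per-model attention code with a pluggable interface selected by implementation name (eager, sdpa, flash, flex). A minimal sketch of that dispatch pattern, with illustrative names rather than the library's actual registry:

```python
import torch
import torch.nn.functional as F

def eager_attention(q, k, v, mask=None):
    # Plain softmax attention, written out explicitly.
    scores = q @ k.transpose(-2, -1) / (q.shape[-1] ** 0.5)
    if mask is not None:
        scores = scores + mask
    return F.softmax(scores, dim=-1) @ v

def sdpa_attention(q, k, v, mask=None):
    # Defer to PyTorch's fused scaled_dot_product_attention kernel.
    return F.scaled_dot_product_attention(q, k, v, attn_mask=mask)

# Hypothetical registry: an implementation name (e.g. from the model
# config) maps to an attention function with a shared signature.
ATTENTION_FUNCTIONS = {"eager": eager_attention, "sdpa": sdpa_attention}

q = k = v = torch.randn(2, 4, 8, 16)  # (batch, heads, seq, head_dim)
out_eager = ATTENTION_FUNCTIONS["eager"](q, k, v)
out_sdpa = ATTENTION_FUNCTIONS["sdpa"](q, k, v)
```

Keeping one shared signature is what lets dozens of Bart-derived models reuse a single code path instead of each carrying its own eager/sdpa/flash branches.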
Yuanyuan Chen
da4ff2a5f5
Add Optional to remaining types ( #37808 )
...
More Optional typing
Signed-off-by: cyy <cyyever@outlook.com>
2025-04-28 14:20:45 +01:00
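For context, the change makes annotations like `max_length: int = None` explicit, since implicit Optional is deprecated under PEP 484 and flagged by strict type checkers. A small illustrative example (hypothetical function, not from the PR):

```python
from typing import Optional

# An argument defaulting to None must be annotated Optional[...]
# (or `int | None` on newer Pythons), not bare `int`.
def truncate(text: str, max_length: Optional[int] = None) -> str:
    if max_length is None:
        return text
    return text[:max_length]

print(truncate("hello world", 5))  # -> "hello"
```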
Cyril Vallez
0cfbf9c95b
Force torch>=2.6 with torch.load to avoid vulnerability issue ( #37785 )
...
* fix all main files
* fix test files
* oups forgot modular
* add link
* update message
2025-04-25 16:57:09 +02:00
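A rough sketch of such a version gate, assuming a hypothetical helper name rather than the library's actual check:

```python
from packaging import version

# torch.load on torch < 2.6 has known weights_only bypasses, so
# pickle-based checkpoint loading is refused below that floor.
def torch_load_is_safe(torch_version: str) -> bool:
    return version.parse(torch_version) >= version.parse("2.6")

print(torch_load_is_safe("2.5.1"))  # -> False
print(torch_load_is_safe("2.6.0"))  # -> True
```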
cyyever
1e6b546ea6
Use Python 3.9 syntax in tests ( #37343 )
...
Signed-off-by: cyy <cyyever@outlook.com>
2025-04-08 14:12:08 +02:00
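With Python 3.9 as the minimum, the built-in generics of PEP 585 replace the `typing.List`/`typing.Dict` aliases in annotations. An illustrative example:

```python
# list[...] and dict[...] are usable directly in annotations on 3.9+;
# no `from typing import List, Dict` needed.
def count_tokens(batches: list[list[str]]) -> dict[str, int]:
    counts: dict[str, int] = {}
    for batch in batches:
        for token in batch:
            counts[token] = counts.get(token, 0) + 1
    return counts

print(count_tokens([["a", "b"], ["a"]]))  # -> {'a': 2, 'b': 1}
```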
cyyever
41a0e58e5b
Set weights_only in torch.load ( #36991 )
2025-03-27 14:55:50 +00:00
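`weights_only=True` restricts `torch.load` to an allow-list of tensor and container types instead of running arbitrary pickle bytecode. A minimal sketch:

```python
import os
import tempfile
import torch

with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "state.pt")
    torch.save({"weight": torch.ones(3)}, path)
    # weights_only=True refuses to unpickle arbitrary objects,
    # closing the code-execution hole in pickle-based loading.
    state = torch.load(path, weights_only=True)

print(state["weight"])
```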
Joao Gante
62c7ea0201
CI: avoid human error, automatically infer generative models ( #33212 )
...
* tmp commit
* move tests to the right class
* remove ALL all_generative_model_classes = ...
* skip tf roberta
* skip InstructBlipForConditionalGenerationDecoderOnlyTest
* videollava
* reduce diff
* reduce diff
* remove on vlms
* fix a few more
* manual rebase bits
* more manual rebase
* remove all manual generative model class test entries
* fix up to ernie
* a few more removals
* handle remaining cases
* recurrent gemma
* it's better here
* make fixup
* tf idefics is broken
* tf bert + generate is broken
* don't touch tf :()
* don't touch tf :(
* make fixup
* better comments for test skips
* revert tf changes
* remove empty line removal
* one more
* missing one
2025-02-13 16:27:11 +01:00
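A toy sketch of the idea behind inferring generative classes instead of maintaining manual `all_generative_model_classes` lists (hypothetical class names; the library's real `can_generate` check is more involved):

```python
class GenerationMixin:
    def generate(self):
        return "generated"

class PreTrainedModel:
    @classmethod
    def can_generate(cls) -> bool:
        # A class counts as generative if it inherits the generation mixin.
        return issubclass(cls, GenerationMixin)

class BertForMaskedLM(PreTrainedModel):
    pass

class GPTForCausalLM(PreTrainedModel, GenerationMixin):
    pass

generative = [c.__name__ for c in (BertForMaskedLM, GPTForCausalLM) if c.can_generate()]
print(generative)  # -> ['GPTForCausalLM']
```

Deriving the list from the class hierarchy removes the human error of forgetting to register (or de-register) a model in a hand-maintained list.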
Arthur
b912f5ee43
use torch.testing.assert_close instead to get more details about errors in CIs ( #35659 )
...
* use torch.testing.assert_close instead to get more details about errors in CIs
* fix
* style
* test_all
* revert for I bert
* fixes and updates
* more image processing fixes
* more image processors
* fix mamba and co
* style
* less strict
* ok I won't be strict
* skip and be done
* up
2025-01-24 16:55:28 +01:00
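`torch.testing.assert_close` fails with a diagnostic message (greatest absolute/relative difference, number of mismatched elements) rather than a bare `assert torch.allclose(...)` failure. For example:

```python
import torch

a = torch.tensor([1.0, 2.0, 3.0])
b = torch.tensor([1.0, 2.0, 3.0001])

# Passes: the difference is within the absolute tolerance.
torch.testing.assert_close(a, b, atol=1e-3, rtol=0)

try:
    # Fails: tolerance tightened below the actual difference.
    torch.testing.assert_close(a, b, atol=1e-6, rtol=0)
except AssertionError as err:
    print(err)  # the message reports where and by how much the tensors differ
```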
amyeroberts
1de7dc7403
Skip tests properly ( #31308 )
...
* Skip tests properly
* [test_all]
* Add 'reason' as kwarg for skipTest
* [test_all] Fix up
* [test_all]
2024-06-26 21:59:08 +01:00
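Passing an explicit reason makes skips self-documenting in the test report. A minimal stdlib `unittest` example of the pattern:

```python
import unittest

class ExampleTest(unittest.TestCase):
    def test_requires_gpu(self):
        # The reason string shows up in the test report instead of
        # a bare, unexplained skip.
        self.skipTest(reason="test requires a GPU")

result = unittest.TextTestRunner(verbosity=0).run(
    unittest.defaultTestLoader.loadTestsFromTestCase(ExampleTest)
)
print(len(result.skipped))  # -> 1
```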
amyeroberts
25245ec26d
Rename test_model_common_attributes -> test_model_get_set_embeddings ( #31321 )
...
* Rename to test_model_get_set_embeddings
The old method name was misleading - the test checks the ability to get and set embeddings, not attributes common to all models
* Explicitly skip
2024-06-07 19:40:26 +01:00
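A minimal sketch of what the renamed test exercises (toy model, not the library's test harness): the model exposes its input embeddings through get/set accessors.

```python
import torch
from torch import nn

class TinyModel(nn.Module):
    def __init__(self, vocab_size: int = 10, dim: int = 4):
        super().__init__()
        self.embed = nn.Embedding(vocab_size, dim)

    def get_input_embeddings(self) -> nn.Embedding:
        return self.embed

    def set_input_embeddings(self, value: nn.Embedding) -> None:
        self.embed = value

model = TinyModel()
assert isinstance(model.get_input_embeddings(), nn.Embedding)

# Swapping the embedding table in (e.g. after resizing the vocabulary)
# must round-trip through the accessors.
new = nn.Embedding(10, 4)
model.set_input_embeddings(new)
assert model.get_input_embeddings() is new
```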
Arthur
673440d073
update ruff version ( #30932 )
...
* update ruff version
* fix research projects
* Empty
* Fix errors
---------
Co-authored-by: Lysandre <lysandre@huggingface.co>
2024-05-22 06:40:15 +02:00
Wesley Gifford
f72c7c22d9
PatchTST and PatchTSMixer fixes ( #28083 )
...
* 🐛 fix .max bug
* remove prediction_length from regression output dimensions
* fix parameter names, fix output names, update tests
* ensure shape for PatchTST
* ensure output shape for PatchTSMixer
* update model, batch, and expected for regression distribution test
* update test expected
Signed-off-by: Wesley M. Gifford <wmgifford@us.ibm.com>
* Update tests/models/patchtst/test_modeling_patchtst.py
Co-authored-by: amyeroberts <22614925+amyeroberts@users.noreply.github.com>
* Update tests/models/patchtst/test_modeling_patchtst.py
Co-authored-by: amyeroberts <22614925+amyeroberts@users.noreply.github.com>
* Update tests/models/patchtst/test_modeling_patchtst.py
Co-authored-by: amyeroberts <22614925+amyeroberts@users.noreply.github.com>
* Update src/transformers/models/patchtsmixer/modeling_patchtsmixer.py
Co-authored-by: amyeroberts <22614925+amyeroberts@users.noreply.github.com>
* Update tests/models/patchtsmixer/test_modeling_patchtsmixer.py
Co-authored-by: amyeroberts <22614925+amyeroberts@users.noreply.github.com>
* Update tests/models/patchtsmixer/test_modeling_patchtsmixer.py
Co-authored-by: amyeroberts <22614925+amyeroberts@users.noreply.github.com>
* standardize on patch_length
Signed-off-by: Wesley M. Gifford <wmgifford@us.ibm.com>
* Update tests/models/patchtsmixer/test_modeling_patchtsmixer.py
Co-authored-by: amyeroberts <22614925+amyeroberts@users.noreply.github.com>
* Update tests/models/patchtsmixer/test_modeling_patchtsmixer.py
Co-authored-by: amyeroberts <22614925+amyeroberts@users.noreply.github.com>
* Make arguments more explicit
Signed-off-by: Wesley M. Gifford <wmgifford@us.ibm.com>
* adjust prepared inputs
Signed-off-by: Wesley M. Gifford <wmgifford@us.ibm.com>
---------
Signed-off-by: Wesley M. Gifford <wmgifford@us.ibm.com>
Co-authored-by: Wesley M. Gifford <wmgifford@us.ibm.com>
Co-authored-by: Kashif Rasul <kashif.rasul@gmail.com>
Co-authored-by: amyeroberts <22614925+amyeroberts@users.noreply.github.com>
2024-01-29 10:09:26 +00:00
Arindam Jati
749f94e460
Fix PatchTSMixer slow tests ( #27997 )
...
* fix slow tests
* revert formatting
---------
Co-authored-by: Arindam Jati <arindam.jati@ibm.com>
Co-authored-by: Kashif Rasul <kashif.rasul@gmail.com>
2023-12-13 13:34:25 +01:00
Arindam Jati
b242d0f297
[Time series] Add PatchTSMixer ( #26247 )
...
* patchtsmixer initial commit
* x,y->context_values,target_values, unittest added
* cleanup code
* minor
* return hidden states
* model tests, partial integration tests
* ettm notebook temporary
* minor
* config mask bug fix, tests updated
* final ETT notebooks
* add selfattn
* init
* added docstrings
* PatchTSMixerForPretraining -> PatchTSMixerForMaskPretraining
* functionality tests added
* add start and input docstrings
* docstring edits
* testcase edits
* minor changes
* docstring error fixed
* ran make fixup
* finalize integration tests and docs
* minor
* cleaned gitignore
* added dataclass decorator, ran black formatter
* ran ruff
* formatting
* add slow decorator
* renamed in_Channel to input_size and default to 1
* shorten dataclass names
* use smaller model for testing
* moved the 3 heads to the modeling file
* use scalers instead of revin
* support forecast_channel_indices
* fix regression scaling
* undo reg. scaling
* removed unneeded classes
* forgot missing
* add more layers
* add copied positional_encoding
* use patchmask from patchtst
* removed dependency on layers directory
* formatting
* set seed
* removed unused imports
* fixed forward signature test
* adding distributional head for PatchTSMixerForecasting
* add generate to forecast
* testcases for generate
* add generate and distributional head for regression
* raise Exception for negative values for negative binomial distribution
* formatting changes
* remove copied from patchtst and add TODO for test passing
* make copies
* doc edits
* minor changes
* format issues
* minor changes
* minor changes
* format docstring
* change some class names to PatchTSMixer + class name: Transpose to PatchTSMixerTranspose, GatedAttention to PatchTSMixerGatedAttention
* change NormLayer to PatchTSMixerNormLayer
* change MLP to PatchTSMixerMLP
* change PatchMixer to PatchMixerBlock, FeatureMixer to FeatureMixerBlock
* change ChannelFeatureMixer to ChannelFeatureMixerBlock
* change PatchMasking to PatchTSMixerMasking
* change Patchify to PatchTSMixerPatchify
* list to `list`
* fix docstrings
* formatting
* change bs to batch_size, edit forecast_masking
* edit random_masking
* change variable name and update docstring in PatchTSMixerMasking
* change variable name and update docstring in InjectScalerStatistics4D
* update forward call in PatchTSMixerTranspose
* change variable name and update docstring in PatchTSMixerNormLayer
* change variable name and update docstring in PatchTSMixerMLP
* change variable name and update docstring in ChannelFeatureMixerBlock
* formatting
* formatting issues
* docstring issue
* fixed observed_mask type in docstrings
* use FloatTensor type
* formatting
* fix rescaling issue in forecasting, fixed integration tests
* add docstring from decorator
* fix docstring
* Update README.md
Co-authored-by: NielsRogge <48327001+NielsRogge@users.noreply.github.com>
* Update src/transformers/models/patchtsmixer/configuration_patchtsmixer.py
Co-authored-by: NielsRogge <48327001+NielsRogge@users.noreply.github.com>
* Update src/transformers/models/patchtsmixer/modeling_patchtsmixer.py
Co-authored-by: NielsRogge <48327001+NielsRogge@users.noreply.github.com>
* Update src/transformers/models/patchtsmixer/configuration_patchtsmixer.py
Co-authored-by: NielsRogge <48327001+NielsRogge@users.noreply.github.com>
* Update src/transformers/models/patchtsmixer/modeling_patchtsmixer.py
Co-authored-by: NielsRogge <48327001+NielsRogge@users.noreply.github.com>
* PatchTSMixerChannelFeatureMixerBlock
* formatting
* ForPretraining
* use num_labels instead of n_classes
* remove commented out code
* docstring fixed
* nn.functional used instead of one letter F
* x_tmp renamed
* one letter variable x removed from forward calls
* one letter variable y removed
* remove commented code
* rename patch_size, in_channels, PatchTSMixerBackbone
* add config to heads
* add config to heads tests
* code reafactoring to use config instead of passing individual params
* docstring fixes part 1
* docstring fixes part 2
* removed logger.debug
* context_values -> past_values
* formatting changes
* pe -> positional_encoding
* removed unused target variable
* self.mode logic fixed
* formatting change
* edit docstring and var name
* change n_targets to num_targets
* rename input_size to num_input_channels
* add head names with prefix PatchTSMixer
* edit docstring in PatchTSMixerForRegression
* fix var name change in testcases
* add PatchTSMixerAttention
* return dict for all exposed classes, test cases added
* format
* move loss function to forward call
* make style
* adding return dict/tuple
* make repo-consistency
* remove flatten mode
* code refactoring
* rename data
* remove PatchTSMixer and keep only PatchTSMixerEncoder
* docstring fixes
* removed unused code
* format
* format
* remove contiguous and formatting changes
* remove model description from config
* replace asserts with ValueError
* remove nn.Sequential from PatchTSMixerNormLayer
* replace if-else with map
* remove all nn.Sequential
* format
* formatting
* fix gradient_checkpointing error after merge, and formatting
* make fix-copies
* remove comments
* reshape
* doesn't support gradient checkpointing
* correct Patchify
* masking updates
* batchnorm copy from
* format checks
* scaler edits
* remove comments
* format changes
* remove self.config
* correct class PatchTSMixerMLP(nn.Module):
* make fix
* doc updates
* fix-copies
* scaler class correction
* doc edits
* scaler edits
* update readme with links
* injectstatistics add
* fix-copies
* add norm_eps option to LayerNorm
* format changes
* fix copies
* correct make copies
* use parametrize
* fix doc string
* add docs to toctree
* make style
* doc segmenting
* docstring edit
* change forecast to prediction
* edit doc
* doc edits
* remove PatchTSMixerTranspose
* add PatchTSMixerPositionalEncoding and init position_enc
* remove positional_encoding
* edit forecast_masking, remove forecast_mask_ratios
* fix broken code
* var rename target_values -> future_values
* num_features -> d_model
* fix broken code after master merge
* repo consistency
* use positional embedding
* prediction_logits -> prediction_outputs, make fix-copies
* uncommented @slow
* minor changes
* loss first in tuple
* tuple and dict same ordering
* style edits
* minor changes
* dict/tuple consistent enablement
* Update src/transformers/models/patchtsmixer/modeling_patchtsmixer.py
Co-authored-by: Arthur <48595927+ArthurZucker@users.noreply.github.com>
* Update tests/models/patchtsmixer/test_modeling_patchtsmixer.py
Co-authored-by: Arthur <48595927+ArthurZucker@users.noreply.github.com>
* Update src/transformers/models/patchtsmixer/modeling_patchtsmixer.py
Co-authored-by: Arthur <48595927+ArthurZucker@users.noreply.github.com>
* fix formatting
* formatting
* usage tip
* test on cpu only
* add sample usage
* change PatchTSMixerForClassification to PatchTSMixerForTimeSeriesClassification
* push changes
* fix copies
* std scaling set to default True case
* minor changes
* style changes
---------
Co-authored-by: Arindam Jati <arindam.jati@ibm.com>
Co-authored-by: vijaye12 <vijaye12@in.ibm.com>
Co-authored-by: Kashif Rasul <kashif.rasul@gmail.com>
Co-authored-by: nnguyen <nnguyen@us.ibm.com>
Co-authored-by: vijaye12 <vijaykr.e@gmail.com>
Co-authored-by: NielsRogge <48327001+NielsRogge@users.noreply.github.com>
Co-authored-by: Nam Nguyen <namctin@gmail.com>
Co-authored-by: Wesley Gifford <79663411+wgifford@users.noreply.github.com>
Co-authored-by: Arthur <48595927+ArthurZucker@users.noreply.github.com>
2023-12-05 15:31:35 +01:00
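At the core of the model is the patchify step repeatedly touched above (Patchify, patch_length, PatchTSMixerPatchify): each channel's time series is sliced into fixed-length patches. A rough sketch with `Tensor.unfold`, using illustrative names rather than the library API:

```python
import torch

def patchify(past_values: torch.Tensor, patch_length: int, patch_stride: int) -> torch.Tensor:
    # past_values: (batch_size, seq_len, num_input_channels)
    x = past_values.transpose(1, 2)  # -> (batch, channels, seq_len)
    # Slide a window of patch_length with step patch_stride along time.
    patches = x.unfold(dimension=-1, size=patch_length, step=patch_stride)
    return patches  # -> (batch, channels, num_patches, patch_length)

series = torch.randn(2, 32, 3)  # batch of 2, context length 32, 3 channels
out = patchify(series, patch_length=8, patch_stride=8)
print(out.shape)  # -> torch.Size([2, 3, 4, 8])
```

With stride equal to patch length the patches tile the series without overlap; a smaller stride yields overlapping patches, trading more tokens for more context per patch.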