1. 05 Oct, 2021 1 commit
  2. 02 Oct, 2021 1 commit
  3. 30 Sep, 2021 2 commits
    • MMPT (#2373) · 666d8c26
      Po-Yao Huang authored
      Summary:
      # Before submitting
      
      - [x] Was this discussed/approved via a Github issue? (no need for typos, doc improvements)
      - [x] Did you read the [contributor guideline](https://github.com/pytorch/fairseq/blob/main/CONTRIBUTING.md)?
      - [x] Did you make sure to update the docs?
      - [x] Did you write any new necessary tests?
      
      ## What does this PR do?
      Release the code and model for two of our papers at FAIR:
      1. VideoCLIP: Contrastive Pre-training for Zero-shot Video-Text Understanding (Xu et al., EMNLP 2021)
      2. VLM: Task-agnostic Video-Language Model Pre-training for Video Understanding (Xu et al., ACL Findings 2021)
      
      ## PR review
      dianaml0 (Diana Liskovich, referred by Myle Ott)
      
      ## Did you have fun?
      Yes! 👍
      
      Pull Request resolved: https://github.com/fairinternal/fairseq-py/pull/2373
      
      Reviewed By: dianaml0
      
      Differential Revision: D31278832
      
      Pulled By: berniebear
      
      fbshipit-source-id: b6a0fad4caf44b062be0c46c12842b26792b35a3
    • Update speech_text_joint_to_text example to the latest fairseq · c0d098ea
      Yun Tang authored
      Summary:
      The code in the speech_text_joint_to_text example has drifted from the latest fairseq codebase:
      1. the task class is imported twice
      2. the newly added TransformerEncoderLayerBase is equivalent to the old TransformerEncoderLayer
      3. the Wav2VecEncoder API changed (wav2vec2_asr.py)
      
      Reviewed By: kahne
      
      Differential Revision: D31299458
      
      fbshipit-source-id: 6eb64e2692ca3c2729248d55ccefe74283fe4ef0
  4. 27 Sep, 2021 1 commit
    • Use safe_getattr and safe_hasattr (#2347) · f34abcf2
      Myle Ott authored
      Summary:
      We use omegaconf.DictConfig objects in non-strict mode, so hasattr behaves unexpectedly and returns True even for keys that were never set:
      ```
      >>> import omegaconf
      >>> omegaconf.__version__
      '2.0.6'
      >>> x = omegaconf.DictConfig({"a": 1})
      >>> hasattr(x, "foo")
      True
      ```
      
      This violates some assumptions in various parts of the code. For example, previously this command was incorrectly missing the final layer norm due to upgrade logic that relied on `hasattr`, but is fixed after this diff:
      ```
      CUDA_VISIBLE_DEVICES=0 python train.py --task dummy_lm --arch transformer_lm_gpt3_small --optimizer adam --lr 0.0001 --max-sentences 8 --log-format json --log-interval 1
      ```
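      
      As a rough illustration, the safe variants only report attributes that actually hold a value. This is a minimal sketch assuming omegaconf is available; the real helpers live in fairseq/utils.py and may differ in detail:
      ```
      from omegaconf import OmegaConf
      
      def safe_getattr(obj, k, default=None):
          """Return obj.k only if it exists and is not None, otherwise default."""
          if OmegaConf.is_config(obj):
              return obj[k] if k in obj and obj[k] is not None else default
          return getattr(obj, k, default)
      
      def safe_hasattr(obj, k):
          """Unlike hasattr on a non-strict DictConfig, a missing key yields False."""
          return getattr(obj, k, None) is not None
      ```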
      
      Pull Request resolved: https://github.com/fairinternal/fairseq-py/pull/2347
      
      Reviewed By: alexeib
      
      Differential Revision: D31170584
      
      Pulled By: myleott
      
      fbshipit-source-id: bd767b7497794314f58f0f8073cdd4332b214006
  5. 20 Sep, 2021 3 commits
  6. 16 Sep, 2021 1 commit
    • update on branch renaming (#3879) · f6abcc2a
      dianaml0 authored
      Summary:
      # Before submitting
      
      - [ ] Was this discussed/approved via a Github issue? (no need for typos, doc improvements)
      - [ ] Did you read the [contributor guideline](https://github.com/pytorch/fairseq/blob/master/CONTRIBUTING.md)?
      - [ ] Did you make sure to update the docs?
      - [ ] Did you write any new necessary tests?
      
      ## What does this PR do?
      Fixes # (issue).
      
      ## PR review
      Anyone in the community is free to review the PR once the tests have passed.
      If we didn't discuss your PR in Github issues there's a high chance it will not be merged.
      
      ## Did you have fun?
      Make sure you had fun coding!
      
      Pull Request resolved: https://github.com/pytorch/fairseq/pull/3879
      
      Reviewed By: myleott
      
      Differential Revision: D30969142
      
      Pulled By: dianaml0
      
      fbshipit-source-id: 902154c03fd68ae6645d3e0ac07b7d729dfc7934
  7. 15 Sep, 2021 2 commits
  8. 14 Sep, 2021 1 commit
    • add TTS · 0ac3f327
      Changhan Wang authored
      Summary: [fairseq-py] add TTS
      
      Reviewed By: wnhsu
      
      Differential Revision: D30720666
      
      fbshipit-source-id: b5288acec72bea1d3a9f3884a4ed51b616c7a403
  9. 13 Sep, 2021 4 commits
  10. 09 Sep, 2021 2 commits
    • annotation added for jitable · e3fafbdf
      Xianfeng Rui authored
      Summary:
      1) add type annotations for encoder_out
      2) force dropout to be a float so the module stays TorchScript-compatible (a rough sketch follows)
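      
      A minimal, runnable sketch of point (2), using a toy module rather than the fairseq encoder:
      ```
      import torch
      from torch import nn
      import torch.nn.functional as F
      
      class TinyEncoder(nn.Module):
          def __init__(self, dropout):
              super().__init__()
              # cast eagerly: torch.jit.script wants a concrete float attribute,
              # not an int or a config node
              self.dropout_p = float(dropout)
      
          def forward(self, x: torch.Tensor) -> torch.Tensor:
              # the return annotation plays the same role as annotating encoder_out
              return F.dropout(x, p=self.dropout_p, training=self.training)
      
      scripted = torch.jit.script(TinyEncoder(0))
      print(scripted(torch.randn(2, 4)).shape)
      ```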
      
      Reviewed By: cndn
      
      Differential Revision: D30826657
      
      fbshipit-source-id: aca79845d7ae48d450b602a7be8f56404f4c7bab
    • Fairseq needs to store and load metadata from model state_dict · 50b65368
      Yuan Shangguan (June) authored
      Summary:
      ## TL;DR
      Fairseq checkpoint saving and loading should mirror torch's checkpointing by saving and loading "state_dict()._metadata".
      
      ## Long Story:
      
      #### What happened:
      During model loading and saving, quantization-aware-training models in PyTorch hit a puzzling bug: the state_dict entry "fake_weight_quant.weight.min_val" is reported as mismatched against "min_vals".
      
      #### What was the reason:
      - torch uses state_dict()._metadata to store module._version, but fairseq never stored this metadata in checkpoints, nor loaded it back when restoring them.
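      
      A hypothetical sketch of the fix (illustrative helper names, not fairseq's checkpoint_utils API): persist state_dict()._metadata next to the weights and reattach it on load so that module-version-dependent upgrades (such as the QAT observers above) resolve correctly.
      ```
      import torch
      
      def save_model_state(model, path):
          sd = model.state_dict()
          # _metadata is an attribute on the state dict, not a regular key
          torch.save({"model": sd, "model_metadata": getattr(sd, "_metadata", None)}, path)
      
      def load_model_state(model, path):
          ckpt = torch.load(path, map_location="cpu")
          sd = ckpt["model"]
          if ckpt.get("model_metadata") is not None:
              sd._metadata = ckpt["model_metadata"]  # reattach module versions
          model.load_state_dict(sd)
      ```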
      
      Reviewed By: frankseide
      
      Differential Revision: D30649933
      
      fbshipit-source-id: ce262486b9b95fbcece463fa05c4e1903d4232d7
  11. 08 Sep, 2021 1 commit
  12. 07 Sep, 2021 1 commit
    • fix default lprob score of beam search with prefix tokens (#2267) · 5cfd3738
      Jingfei Du authored
      Summary:
      # Before submitting
      The default score was set to the minimum of all lprobs, which could let beam search select tokens other than the prefix tokens. This change uses an admittedly hacky value that is strictly smaller than any lprob (a rough sketch follows the checklist below).
      - [ ] Was this discussed/approved via a Github issue? (no need for typos, doc improvements)
      - [ ] Did you read the [contributor guideline](https://github.com/pytorch/fairseq/blob/master/CONTRIBUTING.md)?
      - [ ] Did you make sure to update the docs?
      - [ ] Did you write any new necessary tests?
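      
      Roughly, the idea looks like the following sketch (variable names are hypothetical; fairseq's sequence_generator handles per-beam prefixes rather than a single token):
      ```
      import torch
      
      def force_prefix(lprobs: torch.Tensor, prefix_tok: int) -> torch.Tensor:
          """Keep only the forced prefix token selectable at this step."""
          # use the dtype's true minimum instead of lprobs.min(), which can tie
          # with (and thus fail to dominate) an existing log-probability
          mask = torch.ones_like(lprobs, dtype=torch.bool)
          mask[:, prefix_tok] = False
          return lprobs.masked_fill(mask, torch.finfo(lprobs.dtype).min)
      ```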
      
      ## What does this PR do?
      Fixes # (issue).
      
      ## PR review
      Anyone in the community is free to review the PR once the tests have passed.
      If we didn't discuss your PR in Github issues there's a high chance it will not be merged.
      
      ## Did you have fun?
      Make sure you had fun coding!
      
      Pull Request resolved: https://github.com/fairinternal/fairseq-py/pull/2267
      
      Reviewed By: myleott
      
      Differential Revision: D30730475
      
      Pulled By: jingfeidu
      
      fbshipit-source-id: 7dab4e9ed2fc094910467bad776155230987e21a
  13. 01 Sep, 2021 2 commits
    • Releasing models for our paper "Masked Language Modeling and the Distributional Hypothesis" (#1930) · 14c5bd02
      Koustuv Sinha authored
      Summary:
      Paper submitted to EMNLP: https://arxiv.org/abs/2104.06644
      
      Pull Request resolved: https://github.com/fairinternal/fairseq-py/pull/1930
      
      Reviewed By: lematt1991
      
      Differential Revision: D28885634
      
      Pulled By: shruti-bh
      
      fbshipit-source-id: d433c87cff3603b3e676a129029a827c510a72c7
    • EMA · 8feccf94
      Vimal Manohar authored
      Summary:
      Adds Exponential moving average (EMA) model for Kaizen semi-supervised training https://arxiv.org/abs/2106.07759
      
      1. Add `ema.store_ema` to enable storing EMA. EMA will be written to extra_state in the state dict while saving checkpoint.
      2. `ema.ema_start_update` to control when the EMA starts accumulating
      3. Tasks can use `uses_ema` property to decide if the EMA should be passed to the task. (Default is False)
      4. `load_ema_from_checkpoint` can be used to load the EMA model in place of the regular model for evaluation. Pyspeech has an eval-ema option for this.
      
      ```
      This module has the EMA class used to store a copy of the exponentially decayed
      model params.
      
      Typical usage of EMA class involves initializing an object using an existing
      model (random or from a seed model) and setting the config like ema_decay,
      ema_start_update which determine how the EMA model is updated. After every
      update of the model i.e. at the end of the train_step, the EMA should be updated
      by passing the new model to the EMA.step function. The EMA model state dict
      can be stored in the extra state under the key of "ema" and dumped
      into a checkpoint and loaded. The EMA object can be passed to tasks
      by setting task.uses_ema property.
      EMA is a smoothed/ensemble model which might have better performance
      when used for inference or further fine-tuning. EMA class has a
      reverse function to load the EMA params into a model and use it
      like a regular model.
      ```
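      
      As a rough illustration of the update rule described above (a simplified sketch, not fairseq's EMA class; the real implementation also honors ema_start_update and checkpoint storage):
      ```
      import copy
      import torch
      
      class SimpleEMA:
          def __init__(self, model, decay=0.999):
              self.decay = decay
              self.model = copy.deepcopy(model)  # smoothed copy of the seed model
              for p in self.model.parameters():
                  p.requires_grad_(False)
      
          @torch.no_grad()
          def step(self, new_model):
              # called at the end of every train_step with the freshly updated model
              for ema_p, p in zip(self.model.parameters(), new_model.parameters()):
                  ema_p.mul_(self.decay).add_(p, alpha=1.0 - self.decay)
      ```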
      
      Reviewed By: cruvadom
      
      Differential Revision: D24238379
      
      fbshipit-source-id: 879d3ba5070a614b7d365f9503af357001e875b2
  14. 31 Aug, 2021 3 commits
    • Indexed Huffman Coded dataset (#2029) · 68a81202
      Pierre Andrews authored
      Summary:
      ## What does this PR do?
      
      Currently, binarized datasets are stored as a bin representation of int tensors. At best, each int is coded as uint16 on disk.
      
      When coding a fixed-size vocabulary dataset where we know the frequency of each symbol and where some symbols are more common than others, we can do better. This happens in particular when binarizing a dataset split into subword units, as the most common "tokenizers" like bpe and spm will choose subwords with high frequencies over subwords with low frequencies.
      
      In practice, if we know the frequency of all symbols (or a good estimate), we can use entropy encoding methods to compress the data. The idea is to assign shorter representations to frequent symbols and longer ones to infrequent symbols.
      
      In this PR, we build a Huffman code from a frequency table and use this code to encode a dataset. The PR provides the Huffman coder implementation (using the single-queue approach, as we usually start with a sorted set of symbols) as well as a memory-mapped dataset implementation that stores the data compressed with a Huffman code and can return indexed tensors from it.
      
      Over a whole dataset, depending on how many symbols we sample to evaluate the frequency, we can save between 25% and 30% of storage space.
      
      ## Follow Ups
      
      Currently, the binarizer/preprocess script makes too many assumptions about the dataset writers, so the Huffman dataset writer cannot be used with it straight out of the box. I will open follow-up PRs to provide easy-to-use scripts to build such datasets. In the meantime, it's as simple as doing:
      ```
      from fairseq.data.huffman import HuffmanCodeBuilder, HuffmanMMapIndexedDatasetBuilder
      
      # estimate symbol frequencies from a sample file to build the Huffman code
      code_builder = HuffmanCodeBuilder()
      with open(sample_file, 'r', encoding="utf-8") as sample:
          for line in sample:
              code_builder.add(*line.strip().split(" "))
      
      coder = code_builder.build_code()
      
      # encode the full dataset with the resulting code
      with HuffmanMMapIndexedDatasetBuilder('/tmp/testing_huffman', coder) as builder:
          with open(dataset_file, 'r', encoding="utf-8") as dataset:
              for line in dataset:
                  builder.add_item(line.strip().split(' '))
      ```
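      
      Reading the data back then follows the usual indexed-dataset pattern (constructor details are assumptions here, not verified against the final API):
      ```
      from fairseq.data.huffman import HuffmanMMapIndexedDataset
      
      dataset = HuffmanMMapIndexedDataset('/tmp/testing_huffman')
      print(len(dataset), dataset[0])  # decoded int tensor for the first encoded line
      ```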
      
      A lot of the `HuffmanMMapIndexedDataset` code comes from the normal `MMapIndexedDataset`, and we could probably extract the commonalities into a base class.
      
      The `HuffmanCoder` is also really a special kind of `Dictionary`; again, a common base class could be abstracted out of them.
      
      Pull Request resolved: https://github.com/fairinternal/fairseq-py/pull/2029
      
      Reviewed By: dianaml0
      
      Differential Revision: D29557468
      
      Pulled By: Mortimerp9
      
      fbshipit-source-id: a01b6d98f38f937934cadebb3786133e257adefe
    • Fix test_eval_bleu unittest (#2236) · 5277ec47
      Rengan Xu authored
      Summary:
      Pull Request resolved: https://github.com/fairinternal/fairseq-py/pull/2236
      
      The test_eval_bleu unittest in TestTranslation in tests/test_binaries.py failed after the sacrebleu version was updated to 2.0.0 in the OSS testing tool. Added a fix so that the test passes with both sacrebleu 1.x and 2.0.0.
      
      Reviewed By: myleott, sravyapopuri388
      
      Differential Revision: D30525920
      
      fbshipit-source-id: 8ef27509cec45422a8d22003c87c2a7acb55225d
    • fix beam search with prefix tokens (#2227) · 932a3d4a
      Jingfei Du authored
      Summary:
      1. added a test for generating pad tokens during beam search with prefix tokens
      2. modified lprobs for the pad token and the prefix tokens to avoid generating pad (a rough sketch follows)
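      
      A rough sketch of the idea in (2), with hypothetical variable names (the real change lives in fairseq's sequence_generator):
      ```
      import torch
      
      def mask_pad_while_forcing_prefix(lprobs, forcing_prefix, pad_idx):
          # lprobs: (bsz * beam, vocab); forcing_prefix: bool per row, True while a
          # prefix token is still being forced for that hypothesis
          lprobs[forcing_prefix, pad_idx] = torch.finfo(lprobs.dtype).min
          return lprobs
      ```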
      
      # Before submitting
      
      - [ ] Was this discussed/approved via a Github issue? (no need for typos, doc improvements)
      - [ ] Did you read the [contributor guideline](https://github.com/pytorch/fairseq/blob/master/CONTRIBUTING.md)?
      - [ ] Did you make sure to update the docs?
      - [ ] Did you write any new necessary tests?
      
      ## What does this PR do?
      Fixes # (issue).
      
      ## PR review
      Anyone in the community is free to review the PR once the tests have passed.
      If we didn't discuss your PR in Github issues there's a high chance it will not be merged.
      
      ## Did you have fun?
      Make sure you had fun coding!
      
      Pull Request resolved: https://github.com/fairinternal/fairseq-py/pull/2227
      
      Reviewed By: xianxl
      
      Differential Revision: D30649356
      
      Pulled By: jingfeidu
      
      fbshipit-source-id: d94903a912e767391c8fca61f98f65b5cea3b56e
  15. 27 Aug, 2021 1 commit
  16. 26 Aug, 2021 1 commit
  17. 19 Aug, 2021 1 commit
    • (fix #2177) Erase the encoder_embed_dim default (#2213) · 1f7ef9ed
      Pierre Andrews authored
      Summary:
      Fix https://github.com/fairinternal/fairseq-py/issues/2177 for the transformer conversion to Hydra.
      
      The way the defaults are dealt with now is different, so when you use the legacy Namespace configuration you end up with a default encoder_embed_dim, which in the VGG case sets up an encoder attention in the TransformerDecoderLayer with the wrong dimensions.
      The easiest solution is to erase the default value for encoder_embed_dim (by forcing it to None) when converting the VGG config to the raw Namespace for the decoder layer (a rough sketch follows).
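      
      A minimal sketch of that conversion step (hypothetical helper name, not the actual code path):
      ```
      from argparse import Namespace
      
      def vgg_cfg_to_decoder_args(vgg_cfg: Namespace) -> Namespace:
          args = Namespace(**vars(vgg_cfg))
          # erase the stale transformer default so the decoder layer is not built
          # with the wrong encoder attention dimensions
          args.encoder_embed_dim = None
          return args
      ```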
      
      Tested with:
      `pytest tests/speech_recognition/test_vggtransformer.py -k Transformer`
      
      Pull Request resolved: https://github.com/fairinternal/fairseq-py/pull/2213
      
      Test Plan: pytest tests/speech_recognition/test_vggtransformer.py -k Transformer
      
      Reviewed By: sshleifer
      
      Differential Revision: D30425143
      
      Pulled By: Mortimerp9
      
      fbshipit-source-id: 92f6dea2ffbb68e441700bcc55274b3167a587b3
  18. 17 Aug, 2021 2 commits
  19. 12 Aug, 2021 1 commit
  20. 05 Aug, 2021 1 commit
    • --fp16-adam-stats (#2139) · 9825786f
      Sam Shleifer authored
      Summary:
      - stores exp_avg and exp_sq_avg in fp16, with `scale` variables to avoid overflow/underflow (a rough sketch of the scaling idea follows the list)
      - myleott added this to gshard, following github.com/openai/jukebox/blob/master/jukebox/utils/fp16.py
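      
      A rough sketch of the scaling idea (not fairseq's exact code), following the jukebox fp16 utilities referenced above:
      ```
      import torch
      
      def to_fp16_with_scale(t: torch.Tensor, target_range: float = 1000.0):
          # rescale so the largest magnitude sits near target_range, then cast to
          # half; keeping the fp32 scale lets us recover the tensor without overflow
          scale = target_range / (t.abs().max() + 1e-8)
          return (t * scale).half(), scale
      
      def from_fp16(t_half: torch.Tensor, scale: torch.Tensor) -> torch.Tensor:
          return t_half.float() / scale
      ```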
      
      Pull Request resolved: https://github.com/fairinternal/fairseq-py/pull/2139
      
      Reviewed By: myleott
      
      Differential Revision: D30113175
      
      Pulled By: sshleifer
      
      fbshipit-source-id: 03995c8eb096629675eadec4e7b8e7f18fc2730e
  21. 03 Aug, 2021 2 commits
    • Quant Noise · 3d90df4a
      Ishani Karmarkar authored
      Summary: Implemented fixed-bit scalar quantization with quant noise for PyText models (a conceptual sketch follows).
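      
      A conceptual sketch of fixed-bit scalar quantization with quant noise (illustration only; this diff targets PyText models, and fairseq's own implementation lives under fairseq/modules/quantization):
      ```
      import torch
      
      def quant_noise_scalar(w: torch.Tensor, bits: int = 8, p: float = 0.5) -> torch.Tensor:
          # uniform fixed-bit scalar quantization of the weights
          scale = (w.max() - w.min()) / (2 ** bits - 1)
          w_q = torch.round((w - w.min()) / scale) * scale + w.min()
          # quant noise: only a random fraction p of the weights sees the quantized
          # value during training, the rest keep full precision
          noise_mask = torch.rand_like(w) < p
          return torch.where(noise_mask, w_q, w)
      ```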
      
      Reviewed By: AkshatSh
      
      Differential Revision: D29662977
      
      fbshipit-source-id: ebab68a4a5ff1583a0c6dfadcf2671663e232c18
    • Adding Hydra based trainer target to fairseq in fbcode · fe15926d
      Edan Tessel Sneh authored
      Summary: Adding the fairseq entrypoint section of the e2e pipeline so that passing FairseqConfig to hydra_main runs smoothly.
      
      Reviewed By: jieru-hu
      
      Differential Revision: D29714729
      
      fbshipit-source-id: e3694e0037bb4c4f69208c1d6ec7df91d42fb588
  22. 02 Aug, 2021 2 commits
  23. 31 Jul, 2021 1 commit
    • iPQ · 9d70f9ca
      Ishani Karmarkar authored
      Summary: Implemented iterative product quantization (iPQ trainer) and unit tests
      
      Reviewed By: AkshatSh, AdithyaSagar007
      
      Differential Revision: D29662949
      
      fbshipit-source-id: fdc1f124decc722b54225a7fe0031695823e1c69
  24. 30 Jul, 2021 3 commits
    • add paper link (#2116) · 97240193
      Ann Lee authored
      Summary:
      # Before submitting
      
      - [ ] Was this discussed/approved via a Github issue? (no need for typos, doc improvements)
      - [ ] Did you read the [contributor guideline](https://github.com/pytorch/fairseq/blob/master/CONTRIBUTING.md)?
      - [ ] Did you make sure to update the docs?
      - [ ] Did you write any new necessary tests?
      
      ## What does this PR do?
      Fixes # (issue).
      
      ## PR review
      Anyone in the community is free to review the PR once the tests have passed.
      If we didn't discuss your PR in Github issues there's a high chance it will not be merged.
      
      ## Did you have fun?
      Make sure you had fun coding!
      
      Pull Request resolved: https://github.com/fairinternal/fairseq-py/pull/2116
      
      Reviewed By: michaelauli
      
      Differential Revision: D30019908
      
      Pulled By: an918tw
      
      fbshipit-source-id: ca8d7a6e97ed81e7df9a15e778c68fad8fb0a308
    • update hubert decode config (#2106) · 20fbc348
      Wei-Ning Hsu authored
      Summary:
      Update the HuBERT decode config YAML to make it compatible with the new decoder config.
      
      Pull Request resolved: https://github.com/fairinternal/fairseq-py/pull/2106
      
      Reviewed By: alexeib
      
      Differential Revision: D29967631
      
      Pulled By: wnhsu
      
      fbshipit-source-id: fe39c5126f50c3024022f8333e2f3aa97065cbfc
    • Add speech/text joint training for speech to text task (step 2) · 7a6706f5
      Yun Tang authored
      Summary:
      Add scripts for speech/text joint training for the speech-to-text task. It includes scripts/recipes from the following papers:
      "A General Multi-Task Learning Framework to Leverage Text Data for Speech to Text Tasks", ICASSP 2021
      "Improving Speech Translation by Understanding and Learning from the Auxiliary Text Translation Task", ACL 2021
      "FST: the FAIR Speech Translation System for the IWSLT21 Multilingual Shared Task", IWSLT 2021
      
      Reviewed By: kahne
      
      Differential Revision: D29820444
      
      fbshipit-source-id: 925eaedb69233e0a6f4c110045db63a6007a2b60