Reduce the limit on attention weights from 50 to 25.

This commit is contained in:
Daniel Povey 2022-10-21 12:13:23 +08:00
parent c5cb52fed1
commit 9f68b5717c


@@ -1116,7 +1116,7 @@ class RelPositionMultiheadAttention(nn.Module):
 # this mechanism instead of, say, a limit on entropy, because once the entropy
 # gets very small gradients through the softmax can become very small, and
 # some mechanisms like that become ineffective.
-attn_weights_limit = 50.0
+attn_weights_limit = 25.0
 # caution: this penalty will be affected by grad-scaling in amp.
 # It's OK; this is just an emergency brake, and under normal
 # conditions it shouldn't be active
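The comments in this hunk describe the limit as an "emergency brake": a penalty on pre-softmax attention scores whose magnitude exceeds `attn_weights_limit`, which is zero under normal conditions and only activates when scores blow up. A minimal sketch of such a penalty (the function name and the penalty scale are illustrative assumptions, not the actual icefall implementation):

```python
import torch

def attn_weights_penalty(attn_scores: torch.Tensor,
                         limit: float = 25.0,
                         scale: float = 1.0e-04) -> torch.Tensor:
    # Penalize pre-softmax attention scores whose absolute value exceeds
    # `limit`. When all scores are within the limit the excess is zero,
    # so the penalty contributes nothing: it acts only as an emergency
    # brake, as the comments above describe.
    excess = (attn_scores.abs() - limit).clamp(min=0.0)
    return scale * excess.sum()

# Example: scores shaped (batch, heads, query, key); an extra loss term
# that is added to the training loss only matters if scores run away.
scores = 10.0 * torch.randn(2, 4, 16, 16)
loss_extra = attn_weights_penalty(scores, limit=25.0)
```

Because the extra term is added to the loss, its gradient is multiplied by the AMP grad scaler like everything else, which is the caveat the comment about grad-scaling points at.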