Refactoring that does not affect results.

Daniel Povey 2022-07-30 07:45:29 +08:00
parent 3110138ab5
commit 17bc002e6e

@@ -124,10 +124,9 @@ param_rms_smooth1: Smoothing proportion for parameter matrix, if assumed rank of
           (4) is for smoothing the grad covariance used for (2)
-          (5) is for smoothing the final learning-rate matrix Z relative to
-              its own diagonal.  Only the cov_max is actually used.
-              the cov_min ends up not affecting the result, so we set it
-              to 0.0.
+          (5) is for smoothing the inverse Z^{-1} of the final learning-rate
+              matrix Z relative to its own diagonal.  Only cov_min[4] is
+              actually used; we ignore cov_max[4].
 cov_pow: This was mainly added for development and experimentation purposes;
          it allows you to smooth the parameter covariance matrices at the
          stages (1), (2), (3) of smoothing mentioned above, and also
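
The (4)/(5) numbering above refers to the optimizer's smoothing stages, which (as a working assumption, since the full docstring is not shown in this hunk) map to positions 0..4 of the cov_min / cov_max tuples. A minimal sketch of that indexing for stage (5), where only cov_min[4] is consulted and cov_max[4] is ignored; the names used here are hypothetical, not taken from the file:

    # Sketch only -- not the optimizer's real code.  Stage (5), which smooths
    # Z^{-1} (the inverse of the final learning-rate matrix Z) relative to its
    # own diagonal, is assumed to live at index 4 of the per-stage tuples.
    STAGE_Z_DIAG = 4  # hypothetical name for stage (5), zero-based

    def stage5_smoothing_amount(cov_min: tuple, cov_max: tuple) -> float:
        # cov_max[STAGE_Z_DIAG] is deliberately not read, matching the
        # updated docstring above.
        return cov_min[STAGE_Z_DIAG]
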
@@ -167,8 +166,8 @@ param_rms_smooth1: Smoothing proportion for parameter matrix, if assumed rank of
                  lr=3e-02,
                  betas=(0.9, 0.98),
                  size_lr_scale=0.1,
-                 cov_min=(0.025, 0.0025, 0.02, 0.0001, 0.0),
-                 cov_max=(10.0, 80.0, 5.0, 400.0, 10.0),
+                 cov_min=(0.025, 0.0025, 0.02, 0.0001, 0.1),
+                 cov_max=(10.0, 80.0, 5.0, 400.0, 100.0),
                  cov_pow=(1.0, 1.0, 1.0, 1.0),
                  param_rms_smooth0=0.4,
                  param_rms_smooth1=0.2,
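
A quick, dependency-free check (a sketch using only the literal defaults from the hunk above) that the stage-(5) entries, i.e. index 4, are the only defaults this commit changes:

    old_cov_min = (0.025, 0.0025, 0.02, 0.0001, 0.0)
    new_cov_min = (0.025, 0.0025, 0.02, 0.0001, 0.1)
    old_cov_max = (10.0, 80.0, 5.0, 400.0, 10.0)
    new_cov_max = (10.0, 80.0, 5.0, 400.0, 100.0)

    changed = [i for i in range(5)
               if (old_cov_min[i], old_cov_max[i]) != (new_cov_min[i], new_cov_max[i])]
    assert changed == [4]
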
@@ -974,7 +973,7 @@ param_rms_smooth1: Smoothing proportion for parameter matrix, if assumed rank of
         # this is smoothing Z relative to its own diagonal.  This is z_inv,
         # so by applying a minimum here, we are applying a maximum of the
         # eigs of Z after normalizing so the diagonal is 1.
-        Z_prime_inv_diag *= (1. + 1. / group["cov_max"][4])
+        Z_prime_inv_diag *= (1. + group["cov_min"][4])
         # We really want the SVD on Z, which will be used for the learning-rate matrix
         # Q, but Z_prime is better, numerically, to work on because it's closer to
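
The commit message promises identical results, and the default values above show why: the old scale (1. + 1. / cov_max[4]) with cov_max[4] = 10.0 and the new scale (1. + cov_min[4]) with cov_min[4] = 0.1 are both 1.1. A minimal sketch checking that numerically; Z_prime_inv_diag is stood in by an arbitrary tensor here, since the surrounding optimizer code is not part of this diff:

    import torch

    old_group = {"cov_min": (0.025, 0.0025, 0.02, 0.0001, 0.0),
                 "cov_max": (10.0, 80.0, 5.0, 400.0, 10.0)}
    new_group = {"cov_min": (0.025, 0.0025, 0.02, 0.0001, 0.1),
                 "cov_max": (10.0, 80.0, 5.0, 400.0, 100.0)}

    # Stand-in for the diagonal of Z'^{-1}; the real tensor is built elsewhere.
    Z_prime_inv_diag = torch.tensor([0.5, 1.0, 2.0])

    old_scaled = Z_prime_inv_diag * (1. + 1. / old_group["cov_max"][4])  # factor 1.1
    new_scaled = Z_prime_inv_diag * (1. + new_group["cov_min"][4])       # factor 1.1

    assert torch.allclose(old_scaled, new_scaled)
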