Parametrized artificial neural networks (ANNs) can be very expressive ansatzes for variational algorithms, reaching state-of-the-art energies on many quantum many-body Hamiltonians. Nevertheless, training an ANN can be slow and can be stymied by local minima in the parameter landscape. One approach to mitigating this issue is parallel tempering. In this work, we focus on the role played by the temperature distribution of the parallel tempering replicas.