I don't know how the x264 settings were chosen, but I think some settings could be improved.
1. Stay within Level 4.1 limits for good playback compatibility on hardware devices:
This means at most --ref 4 for 1080p and --ref 9 for 720p. You can calculate the maximum --ref allowed at a given resolution under Level 4.1 with this formula, adapted from the H.264 specification: --ref = Truncate(8388608 / (width * height)). Truncate means dropping the fractional part: for 1920x1080 the division gives about 4.05, so 4 reference frames; for 1280x720 it gives about 9.10, so 9 reference frames.
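The formula can be checked with a few lines of Python. This is a minimal sketch; the constant and the function `max_ref_level41` are hypothetical names. The 8388608 figure matches Level 4.1's limit of 32768 macroblocks (MaxDpbMbs) times 256 luma samples per macroblock, and the cap at 16 reflects H.264's maximum of 16 reference frames.

```python
# Level 4.1 DPB limit in luma samples:
# 32768 macroblocks (MaxDpbMbs) * 256 samples per macroblock = 8388608.
LEVEL_41_DPB_SAMPLES = 8388608


def max_ref_level41(width: int, height: int) -> int:
    """Largest --ref that fits the Level 4.1 DPB at this frame size (sketch)."""
    # Integer division performs the "Truncate" step from the formula.
    refs = LEVEL_41_DPB_SAMPLES // (width * height)
    # H.264 never allows more than 16 reference frames.
    return min(refs, 16)


for w, h in [(1920, 1080), (1280, 720)]:
    print(f"{w}x{h}: --ref {max_ref_level41(w, h)}")
```

The specification counts whole macroblocks, so 1080p is strictly coded as 1920x1088. That still yields 4 here, so the simpler pixel-based formula gives the same answer for both resolutions.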
2. Motion estimation method:
Even the x264 developers consider anything beyond --me umh overkill unless you have time to waste, which is why --me tesa is only used in the "placebo" preset. And if you do go beyond umh, they recommend tesa over esa.
3. Subpixel motion estimation:
Use at least --subme 9. You could also try --subme 10, which is 3-5% more efficient than --subme 9. It requires --trellis 2, so it's slower, but you could offset that by using --subme 10 --trellis 2 with --me umh instead of --me esa.
4. B-frame decision:
--b-adapt 2 gets much slower as the --bframes count grows, but since you are not using an insane number of B-frames, you could try it to increase quality even more.
5. --tune film
For CGI/3D animation, x264 developers recommend trying --tune film.
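Putting the suggestions together, here is a sketch that assembles them into an x264 argument list in Python. The function name, the default of 3 B-frames, and raising an error on a --subme 10 / --trellis mismatch are illustrative assumptions, not behavior of x264 itself. Input and output arguments are omitted, and other Level 4.1 constraints (such as bitrate/VBV limits) are not handled.

```python
import shlex

# Level 4.1 DPB size in luma samples: 32768 macroblocks * 256 samples each.
LEVEL_41_DPB_SAMPLES = 8388608


def suggested_x264_args(width: int, height: int, bframes: int = 3,
                        subme: int = 10, trellis: int = 2) -> list[str]:
    """Build the option list suggested above (hypothetical helper, a sketch)."""
    # Largest --ref that fits Level 4.1, capped at H.264's limit of 16.
    refs = min(LEVEL_41_DPB_SAMPLES // (width * height), 16)
    # --subme 10 needs --trellis 2; reject the mismatch up front.
    if subme >= 10 and trellis != 2:
        raise ValueError("--subme 10 requires --trellis 2")
    return [
        "--level", "4.1",
        "--ref", str(refs),
        "--me", "umh",              # umh rather than the slower esa/tesa
        "--subme", str(subme),
        "--trellis", str(trellis),
        "--bframes", str(bframes),  # assumed modest B-frame count
        "--b-adapt", "2",           # slower, more accurate B-frame decision
        "--tune", "film",           # suggested for CGI/3D animation
    ]


print("x264 " + shlex.join(suggested_x264_args(1920, 1080)))
```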