MaxEnt‑Guided Policy Optimization