tract: The impact of user satisfaction in policy learning task-oriented dialogue systems has long been a subject of research interest. Most current models for estimating the user satisfaction either (i) treat out-of-context short-texts, such as product reviews, or (ii) rely on turn features instead of on distributed semantic representations. In this work we adopt deep neural networks that use distributed semantic representation learning for estimating the user satisfaction in conversations. We evaluate the impact of modelling context length in these networks. Moreover, we show that the proposed hierarchical network outperforms state-of-the-art quality estimators. Furthermore, we show that applying these networks to infer the reward function in a Partial Observable Markov Decision Process (POMDP) yields to a great improvement in the task success rate.