Abstract: Microscopic traffic simulation provides a controllable, repeatable, and efficient testing environment for autonomous vehicles (AVs). For an unbiased evaluation of AV safety performance, the probability distributions of the joint state space of all vehicles in the simulated naturalistic driving environment (NDE) should ideally be consistent with those of the real-world driving environment. However, although human driving behaviors have been extensively investigated in transportation engineering, most existing models were developed for traffic flow analysis without considering the distributional consistency of driving behaviors, which may cause significant evaluation bias in AV testing. To fill this research gap, a distributionally consistent NDE modeling framework is proposed. Using large-scale naturalistic driving data, empirical distributions are obtained to construct stochastic human driving behavior models under different conditions, which serve as the basic behavior models. To reduce the model errors caused by limited data quantity and to mitigate error accumulation during simulation, an optimization framework is designed to further enhance the basic models. Specifically, the vehicle state evolution is modeled as a Markov chain, and its stationary distribution is twisted to match the distribution from the real-world driving environment. In a case study of a highway driving environment using real-world naturalistic driving data, the distributional accuracy of the generated NDE is validated. The generated NDE is further used to test the safety performance of an AV model to validate the effectiveness of the NDE.
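As a rough illustration of the Markov-chain view mentioned in the abstract, the sketch below estimates a transition matrix from a discretized state sequence, computes its stationary distribution by power iteration, and measures the gap to a target distribution. It is a toy example under stated assumptions, not the proposed framework: the function names, the two-state discretization, and the use of total variation distance as the mismatch measure are all hypothetical choices made here for illustration.

```python
# Toy sketch: stationary distribution of an empirical Markov chain.
# All names and the distance measure are illustrative assumptions,
# not taken from the paper.

def empirical_transition_matrix(states, n_states):
    """Row-normalized transition counts from a discretized state sequence."""
    counts = [[0.0] * n_states for _ in range(n_states)]
    for s, t in zip(states, states[1:]):
        counts[s][t] += 1.0
    matrix = []
    for row in counts:
        total = sum(row)
        if total > 0:
            matrix.append([c / total for c in row])
        else:
            # Unvisited state: fall back to a uniform row so the row is a valid distribution.
            matrix.append([1.0 / n_states] * n_states)
    return matrix


def stationary_distribution(P, iters=10_000, tol=1e-12):
    """Power iteration pi <- pi P, starting from the uniform distribution."""
    n = len(P)
    pi = [1.0 / n] * n
    for _ in range(iters):
        nxt = [sum(pi[i] * P[i][j] for i in range(n)) for j in range(n)]
        if max(abs(a - b) for a, b in zip(nxt, pi)) < tol:
            return nxt
        pi = nxt
    return pi


def total_variation(p, q):
    """Total variation distance between two discrete distributions."""
    return 0.5 * sum(abs(a - b) for a, b in zip(p, q))


if __name__ == "__main__":
    # Hypothetical discretized trajectory over two states (0 and 1).
    trajectory = [0, 0, 1, 0, 1, 1, 0]
    P = empirical_transition_matrix(trajectory, n_states=2)
    pi = stationary_distribution(P)
    target = [0.6, 0.4]  # made-up "real-world" distribution
    print("stationary:", pi)
    print("TV distance to target:", total_variation(pi, target))
```

In a setting like the one described, the mismatch between the simulated stationary distribution and the real-world distribution would be the quantity an optimization step tries to reduce; how the transition probabilities are actually adjusted is left to the framework itself.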