Optimal transport distances have recently yielded significant progress in image processing for pattern recognition, shape identification, and histogram matching. In this study, the use of such a distance is investigated for a seismic tomography problem that exploits the complete waveform: full waveform inversion. In its conventional formulation, this high-resolution seismic imaging method is based on the minimization of the L2 distance between predicted and observed data. Application of this method is generally hampered by the local minima of the associated L2 misfit function, which correspond to velocity models matching the data up to one or several phase shifts. Conversely, the optimal transport distance appears to be a more suitable tool for comparing oscillatory signals, owing to its ability to detect shifted patterns. However, its application to full waveform inversion is not straightforward: optimal transport relies on the crucial assumption that mass is conserved between the compared data, and this cannot be guaranteed for seismic signals. In this study, a distance based on the Kantorovich-Rubinstein norm is introduced to overcome this difficulty.
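The contrast between the two misfit measures can be illustrated on a toy problem. The sketch below is a minimal 1D illustration under stated assumptions, not the method developed in this study: the observed trace is assumed to be a 5 Hz Ricker wavelet, the predicted trace is the same wavelet delayed by a time shift, and the transport-type misfit is computed as the L1 norm of the running integral of the residual, which coincides with the Kantorovich-Rubinstein norm in 1D only when the residual has zero total mass. The function names `ricker`, `l2_misfit`, and `kr_misfit_1d` are hypothetical and chosen for this example.

```python
import numpy as np


def ricker(t, f0=5.0):
    """Ricker wavelet with peak frequency f0 (Hz), centred at t = 0."""
    a = (np.pi * f0 * t) ** 2
    return (1.0 - 2.0 * a) * np.exp(-a)


def l2_misfit(pred, obs, dt):
    """Conventional least-squares misfit: 0.5 * integral of (pred - obs)^2."""
    return 0.5 * np.sum((pred - obs) ** 2) * dt


def kr_misfit_1d(pred, obs, dt):
    """Assumed 1D transport-type misfit: L1 norm of the running integral
    of the residual (valid as a KR norm only for zero-mass residuals)."""
    running_integral = np.cumsum(pred - obs) * dt
    return np.sum(np.abs(running_integral)) * dt


# Time axis and "observed" trace (assumed values, for illustration only).
dt = 0.002
t = np.arange(-2.0, 2.0, dt)
obs = ricker(t)

# Evaluate both misfits for predicted traces delayed by 0 to 0.6 s.
shifts = np.linspace(0.0, 0.6, 61)
l2 = np.array([l2_misfit(ricker(t - s), obs, dt) for s in shifts])
kr = np.array([kr_misfit_1d(ricker(t - s), obs, dt) for s in shifts])

# Flag interior shifts where the L2 misfit has a strict local minimum.
l2_local_min = (l2[1:-1] < l2[:-2]) & (l2[1:-1] < l2[2:])

print("L2 misfit has an interior local minimum:", bool(l2_local_min.any()))
print("Transport-type misfit is non-decreasing:", bool(np.all(np.diff(kr) > -1e-9)))
```

In this single-wavelet setting the L2 curve is non-monotonic in the time shift, whereas the transport-type curve does not decrease as the shift grows. Signals with several cycles, or residuals with non-zero total mass, may behave differently and fall outside what this sketch covers.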