NVIDIA NeMo Codec Demo (22kHz)

This app demonstrates the NVIDIA NeMo codec model (nvidia/nemo-nano-codec-22khz-0.6kbps-12.5fps) used in Kani TTS.

How it works:

  1. Upload an audio file (wav, mp3, flac, etc.).
  2. The audio is automatically resampled to 22kHz if its sample rate differs.
  3. The NeMo codec encodes the 22kHz audio into discrete tokens.
  4. The NeMo codec then decodes these tokens back into audio.
  5. You can listen to the original, the 22kHz version (if resampled), and the final reconstructed audio.
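
The preprocessing in step 2 (together with the first-channel rule from the note at the end) can be sketched in plain Python. This is an illustration only, not the app's actual code: the helper names `first_channel` and `resample_linear` are hypothetical, linear interpolation stands in for whatever resampler the app really uses, and the exact target rate of 22050 Hz is an assumption (the text only says 22kHz). The encode and decode steps need the NeMo codec itself and are not shown.

```python
# Assumed exact target rate; the description above only says "22kHz".
TARGET_RATE = 22_050


def first_channel(frames):
    """Keep channel 0 of frames given as [(ch0, ch1, ...), ...].

    Mono input (a flat list of numbers) is passed through unchanged.
    """
    return [f[0] if isinstance(f, (tuple, list)) else f for f in frames]


def resample_linear(samples, src_rate, dst_rate=TARGET_RATE):
    """Resample a mono signal by linear interpolation.

    A simple stand-in for a proper band-limited resampler. Returns a copy
    of the input when the rates already match.
    """
    if src_rate == dst_rate or len(samples) < 2:
        return list(samples)
    n_out = max(1, round(len(samples) * dst_rate / src_rate))
    last = len(samples) - 1
    out = []
    for i in range(n_out):
        pos = i * src_rate / dst_rate      # position in the source signal
        lo = min(int(pos), last)           # sample at or before pos
        hi = min(lo + 1, last)             # next sample, clamped at the end
        frac = pos - lo
        out.append(samples[lo] * (1 - frac) + samples[hi] * frac)
    return out


if __name__ == "__main__":
    stereo = [(0.0, 9.0), (1.0, 9.0), (2.0, 9.0), (3.0, 9.0)]
    mono = first_channel(stereo)
    print(resample_linear(mono, src_rate=4, dst_rate=2))
```

Linear interpolation is used here only to keep the sketch dependency-free; it does not low-pass filter before downsampling, so a real pipeline would normally rely on an audio library's resampler instead.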

Technical details:

  • Sample rate: 22kHz
  • Bitrate: ~0.6kbps
  • Frame rate: 12.5fps
  • Codebooks: 4 levels per frame
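
The figures above fit together arithmetically, which a short sketch can make concrete. It assumes the nominal bitrate is exactly 600 bits per second and is split evenly across the 4 codebook levels, and that a partial final frame is rounded up; neither assumption is stated in the description, and the function names are hypothetical.

```python
import math

FRAME_RATE_FPS = 12.5      # frames per second, from the list above
CODEBOOKS_PER_FRAME = 4    # codebook levels per frame, from the list above
BITRATE_BPS = 600          # assumed nominal value behind "~0.6kbps"


def bits_per_code():
    """Bits carried by one token if the bitrate is split evenly."""
    return BITRATE_BPS / (FRAME_RATE_FPS * CODEBOOKS_PER_FRAME)


def token_count(duration_s):
    """Return (frames, tokens) for a clip of the given length in seconds.

    Rounding a partial last frame up is an assumption about the codec.
    """
    frames = math.ceil(duration_s * FRAME_RATE_FPS)
    return frames, frames * CODEBOOKS_PER_FRAME


if __name__ == "__main__":
    print(bits_per_code())     # 12.0
    print(token_count(10))     # (125, 500)
```

Under these assumptions each token carries about 12 bits, which would correspond to roughly 4096 entries per codebook, and a 10-second clip becomes 125 frames, or 500 tokens.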

Note: Processing happens locally. Larger files will take longer. If the input is stereo, only the first channel is processed.