


The Youtube2text library is designed to produce output in a format suitable for audio-text pairing. At the time of writing, the library supports three functionalities, including:

Retrieve a YouTube URL as audio and text output.
Download YouTube audio in an audio file format such as .wav.

Install the library with pip. To retrieve a YouTube URL as audio and text output, run the following in a Python environment:

    from youtube2text import Youtube2Text

    converter = Youtube2Text()
    converter.url2text("")

The library currently supports audio in wav or flac format and text in csv format. The output is stored in sub-folders of the designated path or of the default path (\youtube2text). The directories are laid out as follows. The audio folder contains the audio file as a whole, while the audio-chunks folder stores snippets of that audio file matching the metadata in the text file. The csv file in the text folder is the output of the transcription from audio to text.

My goal is to prepare audio and transcribed text to train a custom Automatic Speech Recognition (ASR) model. Any attempt to use a large pretrained model requires the newly added audio data to be sampled at the frequency that model expects (for example, 16 kHz): the selected sampling rate should match the one the existing NLP model was trained with. For that reason, it is important that the audio retrieval process allows the sample rate to be changed, and Youtube2text is designed with this in mind.
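Because the sample rate has to match what a pretrained model expects, it can help to verify a retrieved wav file before using it for training. The sketch below is not part of Youtube2text; it only uses Python's standard `wave` module. The helper names (`write_silent_wav`, `wav_sample_rate`, `matches_expected_rate`) are hypothetical, and the 16 kHz default is taken from the example frequency mentioned above.

```python
import wave


def write_silent_wav(path, sample_rate_hz, seconds=1):
    """Write a mono 16-bit wav of silence; a stand-in for a retrieved file."""
    with wave.open(path, "wb") as wav_file:
        wav_file.setnchannels(1)
        wav_file.setsampwidth(2)  # 2 bytes per sample = 16-bit PCM
        wav_file.setframerate(sample_rate_hz)
        wav_file.writeframes(b"\x00\x00" * sample_rate_hz * seconds)


def wav_sample_rate(path):
    """Return the sample rate (in Hz) stored in the wav header."""
    with wave.open(path, "rb") as wav_file:
        return wav_file.getframerate()


def matches_expected_rate(path, expected_hz=16000):
    """Check whether the file uses the rate the target model expects."""
    return wav_sample_rate(path) == expected_hz
```

In practice one would call `matches_expected_rate` on each file in the audio or audio-chunks folder and resample any file that fails the check. Note that the `wave` module reads wav only, so flac output would need a different reader.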

Notice how the texts are all lowercase and are not separated by end-of-sentence punctuation. Because of that, several sequential post-processing steps have to be performed to obtain working pairs of audio and text.
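The text does not spell out the post-processing steps, so the following is only one plausible step, written under the assumption that any other transcript paired with the audio should be brought into the same lowercase, unpunctuated form. The function name `normalize_transcript` is hypothetical, and keeping apostrophes (so contractions stay readable) is a design choice of this sketch.

```python
import string

# Remove every ASCII punctuation mark except the apostrophe,
# so that contractions such as "it's" stay intact.
_PUNCT_TO_DROP = string.punctuation.replace("'", "")
_DROP_TABLE = str.maketrans("", "", _PUNCT_TO_DROP)


def normalize_transcript(text):
    """Lowercase the text, strip punctuation, and collapse whitespace."""
    without_punct = text.translate(_DROP_TABLE)
    return " ".join(without_punct.lower().split())
```

For instance, `normalize_transcript("Hello, World!  It's fine.")` yields `"hello world it's fine"`, which matches the style of text described above.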
