Pipecat-AI's smart-turn is an open-source, community-driven audio turn detection model for conversational voice AI systems. It uses Meta AI's Wav2Vec2-BERT as its backbone and aims to judge when a speaker has finished their turn in a way that matches human expectations more closely than traditional voice activity detection (VAD). The model is at an early stage: it currently supports English only and was trained on a limited dataset. Future goals include multi-language support, faster inference, and broader, more inclusive training data. Contributions and experimentation from the community are encouraged.
Table of contents
Current state of the model
Run the proof-of-concept model checkpoint
Project goals
Model architecture
Inference
Training
Things to do
Contributors