A developer shares a frustrating 3-month experience trying to mirror the WildDeepfake dataset (67 GB) from HuggingFace to Kaggle for a final-year deepfake detection project. The journey involved cloning the dataset via git-lfs, uploading it to Google Drive over 1.5 months, copying it to a GCS bucket over 30 days using gsutil, and then hitting a wall when Kaggle's upload-from-GCS feature silently failed. The post also describes struggles with gdown rate limits and with a 24 GB Polyglotfake dataset. The motivation was to use Kaggle's faster cached dataset loading and non-interactive notebook execution for overnight ML experiments.

7 Comments
