
A developer shares a frustrating 3-month experience trying to mirror the WildDeepfake dataset (67GB) from HuggingFace to Kaggle for a final-year deepfake detection project. The journey involved cloning via git-lfs, uploading to Google Drive over 1.5 months, copying to a GCS bucket over 30 days using gsutil, and then hitting a wall when Kaggle's upload-from-GCS feature silently failed. Additional struggles with gdown rate limits and a 24GB Polyglotfake dataset are also described. The motivation was to leverage Kaggle's faster cached dataset loading and non-interactive notebook execution for overnight ML experiments.

So, you know what? I just wasted 3 months of my life

Debajyati Dey

Maybe I am old, but by the time it had taken me more than a day to transfer a file of (by today's standards) not unreasonable size, I would sit back and ask whether the approach I am trying is even a good idea.
I don't know the platform, but a simple HTTP server and a port forward (and maybe a friend with a fast internet connection, if that is the bottleneck) sound like a more suitable tool for the job of transferring data from local to somewhere I can run code (did I understand that correctly?)…
But then I am still in my "why use cloud when I can run it myself" phase…

Let's frame it like this: imagine you had wasted two years on that project, and now you are going back in time to just three months after starting it. And now you are deciding to stop before it's too late.