Automatically detecting broken videos
Hi,
Sorry if this has been mentioned before; I did a search in Sift Talk and didn't find anything.
I've been sifting through the top videos for the last year and I've found a sizable number of dead ones (which, unfortunately, I can't tag as such). Is a scheduled task that checks for dead videos already in place? If not, how easy would it be to implement? I don't know whether any video sites have an API that lets you check if a link is still valid, but I'd guess the big names would. If they do, it would only be a matter of polling a single video every minute, picking videos in decreasing order of number of votes times number of days since last seen, or something along those lines.
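To make the ordering idea concrete, here's a rough sketch of "votes times days since last seen", highest first. It assumes "last seen" means the last date the link was confirmed working. The `Video` record and its field names are hypothetical and are not the site's actual schema. The actual link check against a video site's API is left out, since which APIs exist is exactly the open question.

```python
from dataclasses import dataclass
from datetime import date


@dataclass
class Video:
    # Hypothetical record; field names are illustrative only.
    video_id: str
    votes: int
    last_seen: date  # assumed: last date the link was confirmed working


def priority(video: Video, today: date) -> int:
    """Number of votes times number of days since the video was last seen."""
    days = (today - video.last_seen).days
    return video.votes * days


def polling_order(videos: list[Video], today: date) -> list[str]:
    """Video IDs in decreasing priority, i.e. the order to check them at one per minute.

    Python's sort is stable, so equal priorities keep their input order.
    """
    ranked = sorted(videos, key=lambda v: priority(v, today), reverse=True)
    return [v.video_id for v in ranked]
```

Each minute the scheduler would take the next ID from the front of this list, check it, and update its `last_seen` date if the link still works.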
It's a valid idea, but technically infeasible.
If you consider polling tens or hundreds of thousands of videos every minute, it'd be impossible. Just doing that once would probably take hours (because for every video we'd need to perform slow Intertube processing) and kill our servers in the interim.
[edit]
Half reading as usual... Just noticed you suggested checking 1 video per minute. However, that's equally infeasible. If we polled 100,000 videos at a rate of 1 per minute, a single run would take over two months (100,000 minutes is roughly 69 days).
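A quick check of the run-time arithmetic: how long one full pass over a catalogue takes at a fixed polling rate. The function name and the idea of varying `checks_per_minute` are just for illustration.

```python
MINUTES_PER_DAY = 60 * 24


def full_run_days(num_videos: int, checks_per_minute: float = 1.0) -> float:
    """Days needed to check every video once at a fixed polling rate."""
    return num_videos / checks_per_minute / MINUTES_PER_DAY


print(round(full_run_days(100_000), 1))  # 69.4
```

Doubling the rate halves the run, so even at 2 checks per minute a full pass over 100,000 videos still takes about five weeks.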
I don't think it would be necessary to poll all videos, only the ones most likely to be dead. I don't have the data at hand, but I would assume that's correlated with how old they are and how long ago they were last viewed. There's also no need to poll unsifted videos or videos with few votes or views, since they're less likely to be viewed soon than videos with many votes or views.
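The filtering step could look something like this. It skips unsifted videos and anything with few votes or views, so only the remaining candidates would go into the polling queue. The `Video` fields and both thresholds are made-up placeholders; real cut-offs would need tuning against actual site data.

```python
from dataclasses import dataclass


@dataclass
class Video:
    # Hypothetical fields for illustration only.
    video_id: str
    sifted: bool
    votes: int
    views: int


# Hypothetical cut-offs, not taken from any real data.
MIN_VOTES = 10
MIN_VIEWS = 1_000


def worth_polling(video: Video) -> bool:
    """Keep only sifted videos that have enough votes and enough views."""
    return video.sifted and video.votes >= MIN_VOTES and video.views >= MIN_VIEWS


def candidates(videos: list[Video]) -> list[Video]:
    """Videos that would be put in the dead-link polling queue."""
    return [v for v in videos if worth_polling(v)]
```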
You can poll around 10,000 videos per week (60 x 24 x 7 = 10,080). It'd be interesting to calculate the total number of views for the top 10,000 most viewed videos compared to the total number of views for all videos. I bet the ratio is over 50%!
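Both numbers are easy to compute: the weekly budget at one check per minute, and the share of all views that the top-k most-viewed videos account for. The view counts would have to come from the site's own stats, which we don't have here, so this only shows the calculation. The function names are hypothetical.

```python
def weekly_budget(checks_per_minute: int = 1) -> int:
    """Number of checks possible in one week at a fixed rate."""
    return checks_per_minute * 60 * 24 * 7


def top_k_view_share(view_counts: list[int], k: int) -> float:
    """Fraction of all views that belong to the k most-viewed videos."""
    total = sum(view_counts)
    if total == 0:
        return 0.0
    top = sorted(view_counts, reverse=True)[:k]
    return sum(top) / total


print(weekly_budget())  # 10080
```

Running `top_k_view_share(all_view_counts, 10_080)` on the real view counts would settle the "over 50%" bet.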