Scraping vs. the YouTube API.

Who needs the YouTube API when you have scraping?

Created: Sep 14, 2021


~5 min read


If you haven’t seen the FIRST article, no need to. It’s basically meaningless now. But you know what, here you go anyway!

Hahahahahaha. Hahaha. Ha.

So here’s a funny story. I went through all of that trouble to use the YouTube API, right? The OAuth tokens and API keys and everything. Here’s the problem, right? I actually maxed out the quota. Yep, that’s right. Used up all my calls instantly.

So! That isn’t gonna work! But as you can see, the support page still works? What kind of magic is that? Why don’t I show you?

scrape-youtube

Let me introduce you to scrape-youtube. This is a package that basically says fuck you to all the APIs and quota bullshit and just uses the public youtube pages to scrape the info off of them. It’s technically optimized for discord bots, but as everything I touch usually devolves into a chaotic discord mess, close enough right?

Let me show you a snippet to explain how it works.

youtube.search('lofi hip hop beats to relax/study to', { type: 'live' }).then((results) => {
    console.log(results.streams);
});

So, pretty simple right? The first part is the actual search query, something you’d type on the youtube search bar. The second part is the type of results you want. You can leave this one out if you only care about videos. On my end though, I need to look for channels and live streams. Luckily, those are supported: type: 'channel' and type: 'live'.

Once you do this, you end up getting a result! This result has videos, channels, streams, and playlists. Or movies. Either or. Anyways! Now that we’ve found that, let’s go into the step by step explanation on how to use it!
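If it helps to picture that result, here’s a rough sketch. The bucket names (videos, channels, streams, playlists) are the ones mentioned above; the sample entries and the summarize helper are made up for illustration and aren’t part of scrape-youtube.

```javascript
// Hypothetical sample shaped like the result described above.
// Only the bucket names come from the article; the entries are invented.
const sampleResult = {
  videos: [{ title: 'some video' }],
  channels: [{ id: 'UC-example-main' }, { id: 'UC-example-other' }],
  streams: [],
  playlists: [],
};

// Count how many entries landed in each bucket.
// A bucket that is missing entirely is treated as empty.
const summarize = (result) =>
  Object.fromEntries(
    ['videos', 'channels', 'streams', 'playlists'].map((key) => [
      key,
      (result[key] || []).length,
    ])
  );

console.log(summarize(sampleResult));
```

Handy for a quick sanity check on what a given type option actually gave you back.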

Step 1: The Setup.

Yeah so this isn’t hard. One thing to note: This won’t work on the client-side. You need to be on the server-side here. Node and all that. Now that we’ve clarified that…

So you need to install it by doing npm install scrape-youtube. Once you do that, import it by doing something like this.

import youtube from 'scrape-youtube';

Step 2: The Query.

So, I need to get my main channel, live channel, and an active live stream on my channel. Fortunately, active live streams are the only kind of live streams that are retrieved when you do type: 'live', so we don’t need to make sure of that.

Anyways. Let me show you the queries.

// For sub count.
const enbyssResult = await youtube.search('enbyss', { type: 'channel' });
const liveResult = await youtube.search('enbyss live', { type: 'channel' });
// For streams. This needs its own name, since liveResult is already declared above.
const streamResult = await youtube.search('enbyss live', { type: 'live' });

Pretty simple eh? I mean, if I look up my name directly, then I’m bound to show up. Of course though, it’s best to be safe.

const extractSubs = (result, id) => 
  result.channels.filter(channel => channel.id === id)[0].subscriberCount;
const extractStreams = (result, channelId) => 
  result.streams.filter(stream => stream.channel.id === channelId);

Here are two small functions. The first parameter is the result you got, so something like enbyssResult; the second is the channel ID in question. So for me, I pass in the ID of my main and live channel. This isn’t private, by the way. If you go to any channel, the URL should look like this:

https://www.youtube.com/channel/UC88yu6qLzwoM53aXLGHiKJQ

That part after channel/? That’s the ID of that channel. The reason I pass this in is to make sure that the result I got is mine, and not some other result. Of course this only works if your channel is one of the top 20 results for the query you used, but honestly I’d say that’s reasonable enough.
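If you’d rather not copy the ID by hand, something like this could pull it out of the URL. It’s a minimal sketch: channelIdFromUrl is a name I made up, and it assumes the URL is in the /channel/<id> form shown above.

```javascript
// Pull the channel ID out of a /channel/<id> URL.
// Returns null for URLs that don't follow that pattern (e.g. /c/name or /user/name).
// Note: new URL() throws on strings that aren't valid URLs at all.
const channelIdFromUrl = (url) => {
  const match = new URL(url).pathname.match(/^\/channel\/([^/]+)/);
  return match ? match[1] : null;
};

console.log(
  channelIdFromUrl('https://www.youtube.com/channel/UC88yu6qLzwoM53aXLGHiKJQ')
);
```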

As you can see, extractSubs gets the sub count automatically. It also does [0], because once you filter by your channel ID, you’re only gonna end up with one result. That’s a unique ID, it can’t exactly be copied, so there you go!
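One catch with that [0]: if the channel isn’t in the results at all, the filter comes back empty and reading subscriberCount throws. Here’s a more forgiving variant, as a sketch. safeExtractSubs is my own name for it, and the sample data (including what the subscriberCount values look like) is invented.

```javascript
// A more forgiving take on extractSubs: returns null instead of throwing
// when the channel isn't among the results.
const safeExtractSubs = (result, id) => {
  const channel = (result.channels || []).find((c) => c.id === id);
  return channel ? channel.subscriberCount : null;
};

// Made-up data, just to show both outcomes.
const fakeResult = {
  channels: [
    { id: 'UC-example-main', subscriberCount: 123 },
    { id: 'UC-example-other', subscriberCount: 9 },
  ],
};

console.log(safeExtractSubs(fakeResult, 'UC-example-main')); // channel found
console.log(safeExtractSubs(fakeResult, 'UC-not-there')); // channel missing
```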

But extractStreams stops before the [0]! Why? Well let me show you!

Step 3: I live for this, haha. I’m funny.

So firstly, fun fact, I’m not always live - which means that sometimes a live stream just doesn’t exist. So we have to make sure that I’m actually live. Here, liveDetails is whatever extractStreams gave back for my live channel.

if (liveDetails.length !== 0) {
  // ...
}

Yep, pretty easy. If I’m live, there should be one result - my stream. If I’m not, well there’s no live stream associated with my channel right? So there shouldn’t be any. In other words, that if block only runs if I’m live. So what do I put in it?

let streamTitle;
let streamLink;
let streamWatchers;
let streamThumbnail;

if (liveDetails.length !== 0) {
  streamTitle = liveDetails[0].title;
  streamLink = liveDetails[0].link;
  streamWatchers = liveDetails[0].watching;
  streamThumbnail = liveDetails[0].thumbnail;
}

Oh yeah! All the details! Here I’m getting the title of my stream, a direct link to it, the number of viewers, and a link to the thumbnail. Why a link to the thumbnail? Just in case I want to display it! Not sure if I would, but why not eh? Once I do this, all I need to do is return the deets!

return {
  isLive: liveDetails.length !== 0,
  streamTitle,
  streamLink,
  streamWatchers,
  streamThumbnail,
};

I need isLive as an easy way of checking that I’m actually live. It’s just the same condition as the if block, so there you go!
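Here’s all of Step 3 squashed into one function, as a sketch rather than my exact code. buildLiveStatus is a name I’m inventing here, it assumes liveDetails is the array extractStreams gives back, and the sample stream is fake (only the field names match the ones used above).

```javascript
// Turn the filtered stream list into the object returned above.
// liveDetails is assumed to be the array extractStreams gives back.
const buildLiveStatus = (liveDetails) => {
  const [stream] = liveDetails; // undefined when the list is empty
  return {
    isLive: stream !== undefined,
    streamTitle: stream?.title,
    streamLink: stream?.link,
    streamWatchers: stream?.watching,
    streamThumbnail: stream?.thumbnail,
  };
};

// Made-up stream entry; only the field names come from the article.
const fakeStream = {
  title: 'test stream',
  link: 'https://example.com/watch',
  watching: 3,
  thumbnail: 'https://example.com/thumb.jpg',
};

console.log(buildLiveStatus([fakeStream])); // live: fields filled in
console.log(buildLiveStatus([])); // not live: fields left undefined
```

The optional chaining (?.) needs Node 14 or newer. On anything older, the if block from earlier does the same job.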

That’s it?

Mhm! That’s it! No fucking with Google systems to get tokens and feel like you’re Indiana Jones going through the Matrix, only for the Matrix trial to run out and blue ball you. Just use a library, and it works - easy! So yeah, if you were thinking of using the YouTube API because you want to get how many subs you have, or see if you’re live, no need!

Thank you for reading, fuck Google for its API, and thank GOD for Dr Kain.