This is my very first post here on Medium, and hopefully I will keep posting new findings and experiences as my career in Computer Science moves forward.
So, last weekend I was trying to build a system that would scrape news from sites that don't expose a public API and mail me the headlines, along with links, for the stories I am interested in. I spend a lot of time reading the news when I visit these sites, so this seemed like a reasonably good project: I would only read the news I actually care about.
Enough said about the project, let's move to the part where I ran into a problem. I needed to scrape the news throughout the day (at fixed intervals, because new stories can be published at any time), but I only wanted to send the mail once a day, in the evening. So I needed some sort of database to hold the headlines and links scraped during the day until the mail goes out. I decided to go with a NoSQL database and to try Redis, as it's very easy to get started with. I also had to schedule the scraping and mailing functions, for which I was going to use Celery, so Redis could double as the message broker.
Here's the thing… I had never worked with Redis before, so I went to the official Redis page, only to find out that Redis doesn't officially support Windows. I did some research on Stack Overflow and found a few solutions, but they were either outdated or required the Windows Subsystem for Linux (WSL). Long story short, I wasn't willing to spend my time setting up Redis. That's when I realized this problem can be solved easily with Docker. I already had Docker installed on my Windows machine, so all I had to do was run one command from the command prompt.
docker run -p 6379:6379 redis
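Docker will pull the latest Redis image from Docker Hub and start the container, publishing the container's port 6379 on the same port of the host. You are now running Redis locally on your machine within a minute. All that was left was to pass the Redis URL to my project as an environment variable named REDIS_URL. With the port mapping above, that URL is redis://localhost:6379.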
Now I can connect to Redis from the Python project (this needs the redis package from pip, plus import os and import redis at the top of the file):
r = redis.from_url(os.environ.get('REDIS_URL'))
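One thing to watch out for: if REDIS_URL is not set, os.environ.get returns None and the connection call fails with a confusing error. Here is a minimal sketch of a slightly more defensive setup. The names DEFAULT_URL, get_redis_url and connect are my own for illustration, and the fallback URL assumes the docker run command above with its 6379:6379 port mapping.

```python
import os

# Assumption: the container was started with -p 6379:6379 as shown above,
# so Redis is reachable on localhost. Database 0 is the Redis default.
DEFAULT_URL = "redis://localhost:6379/0"


def get_redis_url() -> str:
    """Return REDIS_URL from the environment, or the local default."""
    return os.environ.get("REDIS_URL", DEFAULT_URL)


def connect():
    """Create a Redis client and check that the server actually answers."""
    # Imported here so the URL helper works even where redis is not installed.
    import redis  # third-party package: pip install redis

    client = redis.from_url(get_redis_url())
    # ping() raises a connection error right away if the container is not
    # running, instead of failing later on the first set/get.
    client.ping()
    return client
```

Calling r = connect() once at startup gives the same kind of client as the one-liner above, but a stopped container shows up immediately rather than in the middle of a scrape.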
With the above code, I can easily add things to Redis and retrieve them. Here's a short example of adding a news headline with its link and retrieving it:
# add to Redis using headline as key
r.set(<headline>, <link_to_news>)
# retrieve from Redis (the value comes back as bytes)
r.get(<headline>)
NOTE: Redis returns the data as bytes, so call .decode() on the result to get a string.
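To show how this fits the "scrape all day, mail once in the evening" flow, here is a rough sketch, not my exact project code. The "news:" key prefix and the function names save_headline and collect_news are assumptions for illustration. The prefix is there because Celery keeps its own keys in the same Redis database when Redis is used as the broker, and we don't want to read those back as headlines.

```python
# Hypothetical prefix so our keys can be told apart from anything else
# (for example Celery's broker keys) stored in the same Redis database.
PREFIX = "news:"


def save_headline(client, headline: str, link: str) -> None:
    """Store one scraped story, using the prefixed headline as the key."""
    client.set(PREFIX + headline, link)


def collect_news(client) -> dict:
    """Return every stored story as {headline: link}, decoded to str."""
    news = {}
    # scan_iter walks the keyspace in small batches; match limits it
    # to the keys written by save_headline.
    for key in client.scan_iter(match=PREFIX + "*"):
        value = client.get(key)
        if value is None:  # key was removed between the scan and the get
            continue
        # With a default client both keys and values are bytes.
        headline = key.decode("utf-8")[len(PREFIX):]
        news[headline] = value.decode("utf-8")
    return news
```

The scraping task would call save_headline(r, headline, link) for each story it finds, and the evening mail task would build its message from collect_news(r). If you would rather not decode by hand, redis-py also accepts decode_responses=True when creating the client, in which case the .decode calls above must be dropped.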
That's it. You just got Redis working on your Windows machine within a minute. I could have just shared this last bit of code directly, but I wanted to share the experiences and events that led me to this approach, because how you approach a problem is crucial in Computer Science. I hope you liked it. Have a good day…