How to scrape page source with Go and chromedp

It’s clear what we are trying to achieve, so let’s consider the requirements. First, we need a tool that renders web pages, since JavaScript is so widely used nowadays. Second, we need an API to communicate with the headless browser. Finally, saving the result can be tricky, because browsers are designed to work with the rendered page rather than to hand back its source code.

Headless browser

So we are looking for a headless browser. We are going to use Chrome’s headless-shell because it’s easy to use and it’s based on Chromium. Its most significant advantage is the Docker image, which we can run just as easily on our local machine as anywhere in the cloud.

Docker, GO and CGO application build

I avoided Docker for a very long time. I started as a sysadmin, setting up FreeBSD and early versions of Debian on bare-metal servers. As soon as the “cloud” came onto the market, I switched to AWS and GCE and have used them ever since. EC2 was always my go-to choice for deploying something on the internet quickly. Docker, however, I kept avoiding for some reason until about two years ago, when I started using it, though without much trust.

But today’s story is not about my love-hate relationship with Docker; it’s about deploying a Go application that uses the VIPS library. A couple of problems came up along the way, and they are worth documenting for future generations, because I couldn’t find much help with them.