Ask Me Anything
April 22nd 2020

AMA With Hamlet Batista: Python for SEO

Below is an excerpt from an AMA (ask me anything) with RankSense Founder and CEO, Hamlet Batista. This event was held on our newly launched Slack community, the Content Strategy Collective. Upcoming AMAs include Jill Nicholson, Senior Director of Customer Education at Chartbeat; Lisa Deignan, Global SEO Analyst at Lionbridge; Mike Leonhard, Founder of Composely; and more.

Join the Content Strategy Collective here.

How did you get started with Python? Why did you turn to that as a practice for SEO/marketing?

I have a developer background, so I learned Python many years ago, in 2004 to be exact. We have used it extensively over the years internally to automate tasks here and there. It is also very easy to teach to team members who are not technical.

Can you show us some examples and use cases of how you and RankSense generate titles and subtitles using NLG?

I would love to. I’m really excited about this capability and can’t wait to get this out! We are hoping to release this in our product next week. 

Do you also focus on long-form text generation or do you stay in the 50 – 250 words range?

I try to stay up to date on the progress of long-form text generation as well. The main challenge with NLG currently is keeping the text factual. I plan to write an article covering that using question answering. I research and try more than five times what I share, primarily because most of it is either not practical or too complex to share.

What resources would you recommend for people moving into the world of machine learning and applying those concepts to the SEO space?

I’d recommend this Coursera course and, if you are a bit technical, you can try this deep learning course. I’ve also written many articles in SEJ that are more SEO-specific.

Any specific Python libraries you use for SEO that we should familiarize ourselves with?

I don’t use any SEO-specific libraries, unless you count this nice wrapper for Google Search Console. I submitted a pull request to enable its use in Colab. You just reminded me that I need to address the maintainer’s comments to get it merged.

How do you prioritize product efforts at RankSense and what’s the decision-making process?

We work in weekly sprints, which gives us a lot of flexibility. I use the feedback from my articles and presentations to re-prioritize our roadmap. For example, the last SEJ article was a big home run, so we dropped everything else and are pushing it out ASAP!

Do you manage a longer-term (6-12 month) roadmap on top of this that paints your broader vision?

My product vision spans 2-3 years, including things where the latest tech is not there yet. We have a lot of computer vision work that is too slow, expensive or impractical to add to the roadmap. So, I try to be patient, waiting for the SOTA (State of the Art) to improve enough to include it in our immediate plans.

How much of the content creation pipeline do you think ML will automate in the near future?

I’d say definitely over 50%. Check out the amazing work MarketMuse is already doing, and the NLP industry is progressing exponentially. We live in an amazing time!

Do you have any recommendations (courses or techniques) for brushing up your public speaking skills?

Absolutely. I took a public speaking course on Coursera a couple of years ago. It also depends on whether you prefer in-person training. But I definitely recommend professional training; it makes a big difference. This one is really good. It teaches a specific technique that worked well for me.

Since Google started using BERT in its search engine, how much weight do you think it has on ranking by now?

I wouldn’t think about it as a weight in their rankings. It is more of a way to interpret queries better. Before, they would not match queries and documents as precisely; now they are better at that. I’d say it would lead to completely different rankings. Remember that rankings are tied to queries and probably semantics.

What are some of the possibilities/challenges you envision in the future when we talk about deploying the state-of-the-art NLP models like T5, BART, etc. to solve tasks in SEO?

I am excited, as I see more menial work getting done, and getting done faster. There are many SEO tasks that don’t get done because they are boring and time-consuming. I also see this raising the bar across the board in the industry, which I hope will earn it more respect and appreciation. I also think SEOs will enjoy doing more higher-level, strategic work.

What resources better explain the role played by structured data in the establishment of entities and the knowledge graph?

The structured data you add to a website specifies the entities and relationships. A knowledge graph binds all site data together into a bigger data structure. Here is a simple tutorial that you can follow to build a knowledge graph in Python from scratch.

There are relational/tabular databases, which are more similar to spreadsheets, and then there are hierarchical/graph databases. The main difference is that graph databases make it easy to connect relevant entities. JSON-LD is better represented as a graph. Building a graph means adding and connecting relevant entities, for example, people, places, etc. My talk at SMX Advanced will focus on a specific use case: an internal search engine.
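As a rough illustration of why JSON-LD maps naturally onto a graph, here is a minimal Python sketch that flattens nested structured data into (subject, predicate, object) triples and groups them by entity. The JSON-LD snippet, the example.com URLs, and the helper names `to_triples` and `build_graph` are all invented for illustration, and the sketch assumes every entity carries an `@id`.

```python
import json
from collections import defaultdict

# Hypothetical JSON-LD markup, invented for this example.
JSON_LD = """
{
  "@context": "https://schema.org",
  "@type": "Article",
  "@id": "https://example.com/post",
  "headline": "Python for SEO",
  "author": {
    "@type": "Person",
    "@id": "https://example.com/people/jane",
    "name": "Jane Doe",
    "worksFor": {
      "@type": "Organization",
      "@id": "https://example.com/org",
      "name": "Example Co"
    }
  }
}
"""


def to_triples(node, triples=None):
    """Flatten nested JSON-LD into (subject, predicate, object) triples.

    Assumes every entity has an "@id"; real markup often omits it.
    """
    if triples is None:
        triples = []
    subject = node["@id"]
    for key, value in node.items():
        if key in ("@context", "@id"):
            continue
        values = value if isinstance(value, list) else [value]
        for item in values:
            if isinstance(item, dict):
                # Nested entity: add an edge to it, then walk into it.
                triples.append((subject, key, item["@id"]))
                to_triples(item, triples)
            else:
                # Plain value: store it as a property of the subject.
                triples.append((subject, key, item))
    return triples


def build_graph(triples):
    """Group triples by subject so each entity lists its outgoing edges."""
    graph = defaultdict(list)
    for subject, predicate, obj in triples:
        graph[subject].append((predicate, obj))
    return graph


triples = to_triples(json.loads(JSON_LD))
graph = build_graph(triples)

for subject, edges in graph.items():
    print(subject)
    for predicate, obj in edges:
        print(f"  {predicate} -> {obj}")
```

Once the data is in this shape, connecting entities from different pages is a matter of reusing the same `@id`, which is the property that makes graphs a better fit than flat tables here.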

Summary of Translate Model for Knowledge Graph Embedding should be a good starting point, but the math might be a bit intimidating. Take advantage of these popular papers with code to try the ideas. Knowledge Graph Embedding is a complex topic overall, and one of the things I will do in my talk is simplify it and make it practical. Google recently released an article on generating structured data with JavaScript. The developer resources should be a must-have for tech SEOs. Try this codelab too.
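To make the translation-based embedding idea mentioned above a little less intimidating, here is a tiny sketch of the scoring rule used by TransE, one well-known model in that family: a fact (head, relation, tail) is considered plausible when the head vector plus the relation vector lands close to the tail vector. The entity names and the two-dimensional vectors below are hand-picked assumptions for illustration; real embeddings are learned from data and have many more dimensions.

```python
import math

# Hypothetical, hand-picked 2-D embeddings (not trained on anything).
entities = {
    "person_a": [0.0, 0.0],
    "company_a": [1.0, 1.0],
    "company_b": [3.0, -2.0],
}
relations = {
    "founderOf": [1.0, 1.0],
}


def transe_distance(head, relation, tail):
    """Euclidean distance between (head + relation) and tail.

    A lower distance means the triple is scored as more plausible.
    """
    h, r, t = entities[head], relations[relation], entities[tail]
    return math.sqrt(sum((hi + ri - ti) ** 2 for hi, ri, ti in zip(h, r, t)))


for candidate in ("company_a", "company_b"):
    score = transe_distance("person_a", "founderOf", candidate)
    print(f"person_a founderOf {candidate}: distance {score:.3f}")
```

With these toy vectors, `company_a` sits exactly where `person_a + founderOf` points, so it gets the lower distance of the two candidates.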

I also wrote an intro to JS that includes a structured data example.

Do you have any simple scripts that are must-haves (e.g., identifying 404s)?

Absolutely. I shared many in How to Use Python to Analyze SEO Data: A Reference Guide.
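As an illustration of the kind of simple script the question has in mind (this is not one of the scripts from the reference guide), here is a minimal sketch that flags URLs returning 404 or failing to respond, using only the Python standard library. The function names, the user agent string, and the example URLs are assumptions made for this example.

```python
from urllib.error import HTTPError, URLError
from urllib.request import Request, urlopen


def fetch_status(url, timeout=10):
    """Return the HTTP status code for a URL, or None if the request failed.

    Sends a HEAD request; urlopen follows redirects, so the status
    reported is the one for the final URL. Some servers reject HEAD.
    """
    request = Request(
        url,
        method="HEAD",
        headers={"User-Agent": "status-check-sketch"},  # hypothetical UA
    )
    try:
        with urlopen(request, timeout=timeout) as response:
            return response.status
    except HTTPError as error:
        # 4xx and 5xx responses are raised as HTTPError; keep the code.
        return error.code
    except URLError:
        # DNS failure, refused connection, timeout, and similar.
        return None


def find_broken(urls, get_status=fetch_status):
    """Return (url, status) pairs for URLs that are 404 or unreachable."""
    broken = []
    for url in urls:
        status = get_status(url)
        if status is None or status == 404:
            broken.append((url, status))
    return broken
```

A typical call would be `find_broken(["https://example.com/", "https://example.com/missing"])`. The `get_status` parameter is there so the logic can be tried with a stand-in function instead of live requests.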

Do you know of any good resources for analyzing log files?

There are many resources on this topic.
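Independent of any particular resource, a first pass at log analysis can be sketched in a few lines of Python. The example below assumes the server writes the common "combined" access log format, and the sample lines are fabricated for illustration. It counts which paths and status codes appear in requests whose user agent claims to be Googlebot.

```python
import re
from collections import Counter

# Pattern for the "combined" access log format (an assumption about
# how the server is configured; adjust it to match your own logs).
LOG_PATTERN = re.compile(
    r'(?P<ip>\S+) \S+ \S+ \[(?P<time>[^\]]+)\] '
    r'"(?P<method>\S+) (?P<path>\S+) [^"]*" '
    r'(?P<status>\d{3}) \S+ "(?P<referrer>[^"]*)" "(?P<agent>[^"]*)"'
)

# Fabricated sample lines, for illustration only.
SAMPLE_LOG = """\
66.249.66.1 - - [22/Apr/2020:10:00:00 +0000] "GET /blog/ HTTP/1.1" 200 5120 "-" "Mozilla/5.0 (compatible; Googlebot/2.1; +http://www.google.com/bot.html)"
66.249.66.1 - - [22/Apr/2020:10:00:05 +0000] "GET /old-page HTTP/1.1" 404 312 "-" "Mozilla/5.0 (compatible; Googlebot/2.1; +http://www.google.com/bot.html)"
203.0.113.7 - - [22/Apr/2020:10:01:00 +0000] "GET /blog/ HTTP/1.1" 200 5120 "-" "Mozilla/5.0 (Windows NT 10.0; Win64; x64)"
"""


def googlebot_status_counts(lines):
    """Count (path, status) pairs for requests claiming to be Googlebot."""
    counts = Counter()
    for line in lines:
        match = LOG_PATTERN.match(line)
        # Lines that do not fit the pattern are skipped silently.
        if match and "Googlebot" in match["agent"]:
            counts[(match["path"], match["status"])] += 1
    return counts


counts = googlebot_status_counts(SAMPLE_LOG.splitlines())
for (path, status), hits in counts.most_common():
    print(f"{status} {path}: {hits}")
```

User agent strings can be spoofed, so a check like this only shows what claims to be Googlebot; confirming the source of the requests is a separate step.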

Is there a specific methodology to follow for specific NLP tasks? Any recommended resources on that topic?

I’d recommend trying the Hugging Face pipelines first, so you can see which ones are more practical for your use cases. This notebook should be a good starting point. Then you can investigate how to create the pipeline steps directly as you will have more control.
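As a hedged sketch of what trying a pipeline first can look like (this is not the notebook mentioned above), the example below wraps the `summarization` pipeline from the `transformers` library to draft short page descriptions. The use case, the helper names, and the length settings are assumptions for illustration. Loading the pipeline requires `transformers` with a backend such as PyTorch installed, and downloads a default model on first use.

```python
def load_default_summarizer():
    """Build a Hugging Face summarization pipeline with its default model."""
    # Imported lazily so draft_descriptions can be tried without the library.
    from transformers import pipeline

    return pipeline("summarization")


def draft_descriptions(pages, summarizer, max_length=60, min_length=10):
    """Map each URL to a short machine-written summary of its body text.

    `pages` is a dict of URL -> text. `summarizer` is any callable that
    behaves like the summarization pipeline: it returns a list of dicts
    with a "summary_text" key. Lengths are measured in model tokens.
    """
    drafts = {}
    for url, text in pages.items():
        result = summarizer(text, max_length=max_length, min_length=min_length)
        drafts[url] = result[0]["summary_text"].strip()
    return drafts
```

Usage would look like `draft_descriptions({"https://example.com/page": page_text}, load_default_summarizer())`. Treat the output as a draft for a human to review, since keeping generated text factual remains the main challenge.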

Written by Stephen Jeske