SEO For Web Engineers: 38 Hard-Earned Lessons and Tips

I like to joke that SEO stands for "somebody else's obligation" because it's easy to point a finger when something goes wrong. Engineers know this pain. Lots of fingers get pointed at them, sometimes by "SEO people." But the reality is this: there is no such thing as search engine optimization unless your technical ducks are in a row.

Engineers have a responsibility to understand their role in SEO, and likewise, those who work with engineers have a responsibility to partner with them -- not blame them when something goes wrong. That kind of relationship requires information to be shared openly and honestly.

I hope this article highlights important-yet-seldom-discussed topics that are worthy of discussion within engineering circles, but also among those who rely on engineering teams.

  • The "503 Service Unavailable" HTTP response code is the best way to handle planned or unexpected downtime. It has a minimal impact on search rankings compared to other 5XX responses.

  • 500, 502 and 504 HTTP response codes cause Google to suppress a webpage or de-index it altogether. Each time a person or search crawler receives one of these response codes, expect to lose 5-10 organic visits over time.

  • Communicate quickly and send updates to stakeholders when something goes wrong. Otherwise, you'll be pestered by lots of people looking for answers (and this gets in the way of your team looking for solutions).

  • Create a custom skin and tracking event for each harmful response code (e.g. 4XX and 5XX errors). Problems are easier to diagnose when a layperson can provide some details.

  • Be careful when calculating error rates. A complex web page might call the server 150 times as it loads to completion, which means that raw log files understate the frequency of harmful response codes that happen up front. For example, imagine that a web page is loaded twice. The first time, it responds with a "200 OK" status and loads everything else on the page. The second time, it responds with a "502 Bad Gateway" status and the rest of the page can't load. The server was called 151 times in total and only one of those calls returned a 502, yet the error rate for a user is 50%, not 0.7%!

  • Resist the temptation to dismiss anecdotal evidence. Many bugs that are dismissed as "can't reproduce" are symptomatic of a bigger issue.

  • Caching is no substitute for fundamental site optimization. Think of a cached page as having a great photo on a dating website. It's the first thing that people see, but when you start a relationship, a person gets to know the "real you." Same goes for users and search engines.

  • Similarly, AMP-enabled pages are no fix for a slow mobile site.

  • Be aware of page size restrictions. For example, Akamai has a hard 1MB file size limit that results in a 500 response code when it is exceeded.

  • Merge your internal logs with your CDN's logs. Otherwise, more than 90% of a problem may go undetected.

  • Consider using the "304-Not Modified" response code on large websites with lots of pages that don't change very often.

  • Look for queries that don't need to be dynamic (e.g. logic that populates listing pages that rarely change). You can avoid unnecessary strain on the server by caching queries and scheduling refreshes.

  • When you change URLs, make sure that the redirects are verified at launch. This will carry over the maximum amount of the old pages' trust and equity. Letting URLs break and then fixing them later can be suicide: Google decays the value of a broken page over time.

  • Check to see if a rewrite flag or rule is causing a redirect chain. This can easily happen when a site began its life on HTTP and was later migrated to HTTPS. Some URLs ping-pong between secure and non-secure versions until reaching a final destination. These extra hops decay the equity that the original URL had.

  • If you undo or reverse a redirection, clear your CDN's cache to avoid a redirect loop.

  • Err on the side of innocent until proven guilty. A website's power users are the ones who are most likely to resemble bots because of "unnatural" browsing speeds or browser-installed plug-ins that may trip a honeypot. It may sound like a fringe case, but a single power user on a community website like Quora can attract 10,000 to 12,000 monthly visits.

  • A bot in Russia isn't automatically bad and a bot in the United States isn't automatically good. Plenty of bad actors deploy bots from within Amazon's U.S.-based AWS servers.

  • Pick a measurement tool quickly (like Rigor, Lighthouse or PageSpeed Insights) and stick with it. Trends are more important than exact numbers and it's easy to waste time quibbling over a tool.

  • Mobile page speed matters, even if you're running an AMP version of the site. Google judges websites by their native mobile experiences (including speed, UX and other factors).

  • Demand that someone take ownership of each tracking pixel and tag that's added to a page, then make these stakeholders justify their tags every six months. If you don't, people will ask you to add junk to the page until your team gets blamed for a slow site.

  • Server response time is especially important for sites with millions of pages. Google won't stick around for long if your servers are slow to respond.

  • If you're running NGINX on a large website, make sure that on-the-fly Gzip compression isn't doing more harm than good (i.e. causing a bottleneck that slows the server response time).

  • Clear out anything that blocks the rendering of a page. This will improve a bunch of metrics all at once. (Even a plain-text website can get bottlenecked when loading JS, CSS and fonts.)

  • Focus on what happens during the first 200ms and 2s of page load. Some pages never load in "full" because of dynamic elements (like advertisements).

  • Time-to-first-byte is an important metric, but just as important is what is contained inside that first byte. The browser should be able to construct above-the-fold content before opening a new connection with the server.

  • Define the size of page elements to avoid jumping and dancing pages. Users get frustrated when pages move around and it makes the entire page feel slow, even if it is loading quickly.

  • Read what Ilya Grigorik publishes on this subject. Even veteran developers may learn something.

  • Client-side rendering can mean SEO death. (Look what happened to Hulu.) Google recommends that you provide their search crawler with a server-side rendered page, even if users will see a client-side rendered page. (Note: Google doesn't consider this cloaking, even though it seems like it is.)

  • Provide Googlebot with simple pagination in cases where users see "infinite scroll." 

  • Avoid using "block-level" links, even though it simplifies the code. All of that extra stuff that's packed into that single <a> tag makes it harder for Googlebot to pass contextual value to the destination page.

  • Use the robots.txt file to disallow search engines from crawling staging and QA sites.

  • Register staging and QA sites in Google Search Console. It sounds counterintuitive (because you don't want search engines to find these domains), but if the test domain gets indexed accidentally, you can deindex the entire domain in Search Console.

  • Get someone (preferably on the SEO team, but if there isn't one, a product manager) to define everything that must be built into a page, including the obvious things like <title> tags and other metadata. It's tedious and they will hate you for asking, but they will hate you more if you build a page that doesn't make allowances for these critical tags.

  • Links are the lifeblood of a website and the web overall. Anything important should never be more than five clicks away from the homepage, so have lots of questions for the "great simplifiers" who want to eliminate landing pages, navigation links, etc. 

  • Never, ever, let marketers send newsletters and promotional e-mails from the same IPs that the websites are hosted on. A rogue employee who violates the CAN-SPAM Act could get the entire website blacklisted.

  • Make sure someone takes the time to fill out the annual "is your contact information up-to-date" survey that registrars require. If you don't, you'll make it easier for some bad actor to steal the domain away from you on a technicality.

  • A page that starts to render, then becomes plain white, is often breaking because of a stray document.write() call that runs after the page has started loading and wipes out what was already rendered.

  • Google will try to follow relative paths inside of JavaScript, even when they don't exist. This can result in polluted crawl error reports.

  • Act fast because Google is a fickle lover. It takes months to build a house and minutes to burn it down, so snuff that match quickly and take the time to teach everyone about fire safety!
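
To make the error-rate tip above concrete, here is a minimal Python sketch of the arithmetic. It assumes that each log entry can be flagged as either the initial HTML document request or a follow-up asset request. The LogEntry fields, the paths and the error_rates function are hypothetical names made up for illustration, not part of any real logging tool.

```python
from dataclasses import dataclass


@dataclass
class LogEntry:
    path: str
    status: int
    is_document: bool  # True for the initial HTML request, False for assets


def error_rates(entries):
    """Return (per-request error rate, per-page-view error rate) for 5XX codes."""
    total = len(entries)
    errors = sum(1 for e in entries if e.status >= 500)
    documents = [e for e in entries if e.is_document]
    document_errors = sum(1 for e in documents if e.status >= 500)
    per_request = errors / total if total else 0.0
    per_page_view = document_errors / len(documents) if documents else 0.0
    return per_request, per_page_view


# First load: the document succeeds and pulls in 149 assets (150 calls).
entries = [LogEntry("/listing", 200, True)]
entries += [LogEntry(f"/asset/{i}", 200, False) for i in range(149)]
# Second load: the document fails with a 502 and nothing else is requested.
entries.append(LogEntry("/listing", 502, True))

per_request, per_page_view = error_rates(entries)
print(f"Per request:   {per_request:.1%}")    # prints "Per request:   0.7%"
print(f"Per page view: {per_page_view:.1%}")  # prints "Per page view: 50.0%"
```

The per-page-view number is the one that reflects what a user (or a crawler) actually experienced, so it is probably the better one to put on a dashboard.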

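The redirect-chain tip above can also be checked offline before launch. The following Python sketch walks a table of redirect rules and reports how many hops a URL takes and whether it loops. It assumes that the rules can be exported as a simple source-to-destination mapping; the trace_redirects function and the example.com URLs are hypothetical and only for illustration.

```python
def trace_redirects(start, redirects, max_hops=10):
    """Follow redirect rules from `start`.

    Returns (chain, looped): the list of URLs visited, and whether the
    chain came back to a URL it had already visited.
    """
    chain = [start]
    seen = {start}
    current = start
    while current in redirects:
        current = redirects[current]
        chain.append(current)
        if current in seen:
            return chain, True
        seen.add(current)
        if len(chain) - 1 >= max_hops:
            break  # give up on very long chains
    return chain, False


# Hypothetical rules left over from an HTTP-to-HTTPS migration.
rules = {
    "http://example.com/old": "https://example.com/old",
    "https://example.com/old": "http://example.com/new",
    "http://example.com/new": "https://example.com/new",
}

chain, looped = trace_redirects("http://example.com/old", rules)
print(len(chain) - 1, "hops, loop:", looped)  # prints "3 hops, loop: False"
```

Any URL that takes more than one hop is a candidate for a rule that points straight at the final destination.
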
Did I miss something important? Tell me on Twitter or LinkedIn. Thanks for reading!