Adding a sitemap and robots.txt to a Remix app

20/05/2023

Intro

To rank well with search engines, apart from setting the meta data (which you can read about in this post, the code from which is also our starting point), it is very beneficial (I would even say mandatory) to also have a sitemap and a robots.txt file. The first provides a list of all the pages our website has (which means it also needs to be updated whenever we add blog posts, projects, or whatever else we have in our app), and the second is pretty much just a text file with instructions for crawlers on how they should approach our site.

Adding robots.txt

Adding the robots.txt file is the easier part, and Remix routing lets us create such routes very easily. In the routes folder, we need to create a file called [robots.txt].tsx; the square brackets escape the dot in the file name, so the route is served as robots.txt at the root of our domain. After creating the file, we add the following content to it:

export const loader = () => {
  // Served as plain text at /robots.txt; keep the directives flush left so
  // the generated file contains no stray indentation
  const robotText = `User-agent: Googlebot
Disallow: /nogooglebot/

User-agent: *
Allow: /

Sitemap: https://lukalazic.com/sitemap.xml`;

  return new Response(robotText, {
    status: 200,
    headers: {
      "Content-Type": "text/plain",
    },
  });
};

All done! This is everything we need for our robots file!
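
To quickly verify it works, start the dev server and request the route. Here is a minimal check; it's just a sketch that assumes the app runs on http://localhost:3000, so adjust the port to your setup:

// Fetch the route from the running dev server and print the response
const res = await fetch("http://localhost:3000/robots.txt");
console.log(res.headers.get("Content-Type")); // text/plain
console.log(await res.text());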

Adding the sitemap

For our sitemap, the process is a bit more complex. We need to add all of our pages to it, both static and dynamic (when I say dynamic pages here, I refer to those read from a database, in our case the blog and project pages).

Static pages

To start, we will create a file named [sitemap.xml].tsx in the routes folder, just like the robots file. In there, we need to add our static pages (in the case of this website, at the time of writing, that's the homepage, the main projects page and the main blog page). We will grab the latest blog post and the latest project, take their last modified dates, and use those as the last modified dates of the /blog and /projects pages as well. We then need to return all of that as an application/xml response through a Remix loader function. The initial code will look something like this:

import type { LoaderFunction } from "@remix-run/node";
import { getPosts } from "~/models/post.server";
import { getProjects } from "~/models/project.server";

export const loader: LoaderFunction = async () => {
  // These two functions get a list of all the posts and projects, using prisma
  // I will write more blog posts on prisma in the future and explain how it's used
  const posts = await getPosts();
  const projects = await getProjects();

  // Assuming the lists are sorted oldest-first, the last element is the
  // most recently updated entry
  const lastModifiedBlogDate = posts[posts.length - 1]?.updatedAt.toISOString();
  const lastModifiedProjectDate =
    projects[projects.length - 1]?.updatedAt.toISOString();
  
  const content = `<?xml version="1.0" encoding="UTF-8"?>
      <urlset xmlns="http://www.sitemaps.org/schemas/sitemap/0.9">
      <url>
      <loc>https://lukalazic.com/</loc>
      <lastmod>2022-11-12T16:43:34.833Z</lastmod>
      </url>
      <url>
      <loc>https://lukalazic.com/blog</loc>
      <lastmod>${lastModifiedBlogDate}</lastmod>
      </url>
      <url>
      <loc>https://lukalazic.com/projects</loc>
      <lastmod>${lastModifiedProjectDate}</lastmod>
      </url>
      </urlset>
      `;

  return new Response(content, {
    status: 200,
    headers: {
      // The XML version and encoding are declared at the top of the body
      // itself; they are not HTTP headers
      "Content-Type": "application/xml; charset=utf-8",
    },
  });
};
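
For reference, here is a minimal sketch of what getPosts and getProjects might look like with Prisma. This is an assumption on my part: it presumes Post and Project models with slug and updatedAt fields and a prisma client exported from ~/db.server, so the actual implementation may differ:

// app/models/post.server.ts — a sketch, not the exact implementation
import { prisma } from "~/db.server";

export async function getPosts() {
  // Sorted oldest-first, so the last array element is the newest post
  return prisma.post.findMany({ orderBy: { updatedAt: "asc" } });
}

// getProjects in app/models/project.server.ts would look the same,
// just querying prisma.project instead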

After this is set up, when we run the app and navigate to the /sitemap.xml route, we will see our initial sitemap with the homepage, blog page and projects page in it. Next, we need to add the dynamic content: the blog posts and projects.

Dynamic Pages

For the pages coming from our DB, we'll build two lists of XML fragments, still inside the same loader:

  const postList = posts.map(
    (post) => `<url>
    <loc>https://lukalazic.com/blog/${post.slug}</loc>
    <lastmod>${post.updatedAt.toISOString()}</lastmod>
    </url>`
  );

  const projectList = projects.map(
    (project) => `<url>
    <loc>https://lukalazic.com/projects/${project.slug}</loc>
    <lastmod>${project.updatedAt.toISOString()}</lastmod>
    </url>`
  );

We are mapping over all of our posts/projects and returning the XML needed to display them in the sitemap. Now, we just drop those into our page list:

  const content = `<?xml version="1.0" encoding="UTF-8"?>
      <urlset xmlns="http://www.sitemaps.org/schemas/sitemap/0.9">
      <url>
      <loc>https://lukalazic.com/</loc>
      <lastmod>2022-11-12T16:43:34.833Z</lastmod>
      </url>
      <url>
      <loc>https://lukalazic.com/blog</loc>
      <lastmod>${lastModifiedBlogDate}</lastmod>
      </url>
      ${postList.join("")}
      <url>
      <loc>https://lukalazic.com/projects</loc>
      <lastmod>${lastModifiedProjectDate}</lastmod>
      </url>
      ${projectList.join("")}
      </urlset>
      `;
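
One caveat the snippets above skip: if a slug ever contains a character that is special in XML, such as an ampersand, the generated file becomes invalid. A small helper could be applied to each interpolated value (escapeXml is a hypothetical name I'm introducing here, not something from the original code):

// Hypothetical helper: escapes the five characters that are special in XML
// (the ampersand must be replaced first)
const escapeXml = (value: string): string =>
  value
    .replace(/&/g, "&amp;")
    .replace(/</g, "&lt;")
    .replace(/>/g, "&gt;")
    .replace(/"/g, "&quot;")
    .replace(/'/g, "&apos;");

// Usage: <loc>https://lukalazic.com/blog/${escapeXml(post.slug)}</loc>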

And we are done! Now all our pages are in our sitemap, and Google should index us with much better results. The complete file can be found on GitHub, in case I make some updates to it or you want to copy it.
