Adding a sitemap and robots.txt to a Remix app
By Luka Lazic
20/05/2023
Intro
To rank well with search engines, apart from setting the meta data (which you can read about in this post; the code from that post is also our starting point here), it is also very beneficial (I would even say mandatory) to have a sitemap and a robots.txt file. The first provides a list of all the pages that our website has (which means it also needs to be updated when we add blog posts, projects, or whatever else we have in our app), and the second is pretty much just a text file with instructions telling crawlers how they should approach our site.
Adding robots.txt
Adding the robots.txt file is the easier part, and Remix routing lets us serve a file like this very easily through a resource route. In the routes folder, we need to create a file called [robots.txt].tsx (the square brackets escape the special characters in the filename, so the route matches robots.txt literally). When accessed from the browser, this serves robots.txt at the root of our domain. After creating the file, we add the following content to it:
export const loader = () => {
  // Start the text right after the backtick so the served file
  // has no stray leading newline; blank lines separate the groups
  const robotText = `User-agent: Googlebot
Disallow: /nogooglebot/

User-agent: *
Allow: /

Sitemap: https://lukalazic.com/sitemap.xml`;

  return new Response(robotText, {
    status: 200,
    headers: {
      "Content-Type": "text/plain",
    },
  });
};
All done! This is everything we need for our robots file!
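To verify the route, start the dev server and request it. A quick sketch, assuming the app is running locally on port 3000:

// Run in any environment with top-level await and a global fetch (e.g. Node 18+)
const response = await fetch("http://localhost:3000/robots.txt");
console.log(response.headers.get("content-type")); // text/plain
console.log(await response.text()); // the rules we defined above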
Adding the sitemap
For our sitemap, the process is a bit more complex. We need to add all of our pages to it, both static and dynamic (by dynamic pages I mean those read from a database; in our case, the blog and project pages).
Static pages
To start, we will create a file named [sitemap.xml].tsx in the routes folder, just like the robots file. In there we need to add our static pages (in the case of this website, at the time of writing, that's the homepage, the projects main page, and the blog main page). We will grab the latest blog post and the latest project and use their last-modified dates as the last-modified dates for those pages as well. We need to return that as an application/xml response through a Remix loader function. The initial code will look something like this:
import type { LoaderFunction } from "@remix-run/node";
import { getPosts } from "~/models/post.server";
import { getProjects } from "~/models/project.server";

export const loader: LoaderFunction = async () => {
  // These two functions get a list of all the posts and projects, using Prisma
  // I will write more blog posts on Prisma in the future and explain how it's used
  const posts = await getPosts();
  const projects = await getProjects();

  // The lists are ordered oldest-first, so the last entry is the newest
  const lastModifiedBlogDate = posts[posts.length - 1]?.updatedAt.toISOString();
  const lastModifiedProjectDate =
    projects[projects.length - 1]?.updatedAt.toISOString();

  const content = `<?xml version="1.0" encoding="UTF-8"?>
<urlset xmlns="http://www.sitemaps.org/schemas/sitemap/0.9">
  <url>
    <loc>https://lukalazic.com/</loc>
    <lastmod>2022-11-12T16:43:34.833Z</lastmod>
  </url>
  <url>
    <loc>https://lukalazic.com/blog</loc>
    <lastmod>${lastModifiedBlogDate}</lastmod>
  </url>
  <url>
    <loc>https://lukalazic.com/projects</loc>
    <lastmod>${lastModifiedProjectDate}</lastmod>
  </url>
</urlset>`;

  return new Response(content, {
    status: 200,
    headers: {
      "Content-Type": "application/xml; charset=utf-8",
    },
  });
};
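As a side note, getPosts and getProjects themselves aren't shown in this post. Purely for context, here is a minimal sketch of what getPosts could look like, assuming a Prisma Post model with slug and updatedAt fields (the real implementation may differ, and getProjects would be analogous):

// ~/models/post.server.ts (hypothetical sketch)
import { prisma } from "~/db.server";

export async function getPosts() {
  // Oldest first, so the last array entry is the most recently updated post
  return prisma.post.findMany({
    orderBy: { updatedAt: "asc" },
    select: { slug: true, updatedAt: true },
  });
}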
After this is set up, when we run the app and navigate to the /sitemap.xml route, we will see our initial sitemap. The homepage, blog page, and projects page will be there. Next, we need to add the dynamic content: the blog posts and projects.
Dynamic Pages
For the pages coming from our DB, we'll build the following lists of URL entries (inside the same loader, right after the getPosts and getProjects calls):
const postList = posts.map(
  (post) => `<url>
  <loc>https://lukalazic.com/blog/${post.slug}</loc>
  <lastmod>${post.updatedAt.toISOString()}</lastmod>
</url>`
);

const projectList = projects.map(
  (project) => `<url>
  <loc>https://lukalazic.com/projects/${project.slug}</loc>
  <lastmod>${project.updatedAt.toISOString()}</lastmod>
</url>`
);
We are mapping over all of our posts/projects and returning the XML needed to list them in the sitemap. Now we just drop those into our page list:
const content = `<?xml version="1.0" encoding="UTF-8"?>
<urlset xmlns="http://www.sitemaps.org/schemas/sitemap/0.9">
  <url>
    <loc>https://lukalazic.com/</loc>
    <lastmod>2022-11-12T16:43:34.833Z</lastmod>
  </url>
  <url>
    <loc>https://lukalazic.com/blog</loc>
    <lastmod>${lastModifiedBlogDate}</lastmod>
  </url>
  ${postList.join("")}
  <url>
    <loc>https://lukalazic.com/projects</loc>
    <lastmod>${lastModifiedProjectDate}</lastmod>
  </url>
  ${projectList.join("")}
</urlset>`;
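One optional hardening step: everything interpolated into the sitemap must be valid XML, so if a slug could ever contain a character like &, it's worth escaping it first. A small hypothetical helper (not part of the post's original code):

// Hypothetical helper: escape the characters that are special in XML
const escapeXml = (value: string) =>
  value
    .replace(/&/g, "&amp;")
    .replace(/</g, "&lt;")
    .replace(/>/g, "&gt;")
    .replace(/"/g, "&quot;")
    .replace(/'/g, "&apos;");

// Usage in the map: <loc>https://lukalazic.com/blog/${escapeXml(post.slug)}</loc>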
And we are done: now all our pages are in our sitemap, and Google should index us with much better results. The complete file can be looked up on GitHub in case I make some updates to it or you want to copy it.