sitemap.xml as a plain text - bug in declaration in urlset #10515

Open
poizon opened this issue Dec 9, 2022 · 8 comments · Fixed by #10516

Comments


poizon commented Dec 9, 2022

What version of Hugo are you using (hugo version)?

$ hugo version
hugo v0.104.3-58b824581360148f2d91f5cc83f69bd22c1aa331+extended linux/amd64 BuildDate=2022-10-04T14:25:23Z VendorInfo=gohugoio

Does this issue reproduce with the latest release?

I found a bug in the default sitemap template (for multilingual sites).
After generating the sitemap (e.g. https://example.org/en/sitemap.xml) and opening it in a browser, you see the XML as plain text, because the urlset declaration is not valid.

Solution

You need to add these declarations to the urlset tag in the sitemap.xml template:
<urlset xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" xsi:schemaLocation="http://www.sitemaps.org/schemas/sitemap/0.9 http://www.sitemaps.org/schemas/sitemap/0.9/sitemap.xsd http://www.w3.org/TR/xhtml11/xhtml11_schema.html http://www.w3.org/2002/08/xhtml/xhtml1-strict.xsd" xmlns="http://www.sitemaps.org/schemas/sitemap/0.9" xmlns:xhtml="http://www.w3.org/TR/xhtml11/xhtml11_schema.html">

Member

jmooring commented Dec 9, 2022

The problem is a 301 redirect with:

http://www.w3.org/1999/xhtml
wget http://www.w3.org/1999/xhtml

--2022-12-09 15:20:18-- http://www.w3.org/1999/xhtml
Resolving www.w3.org (www.w3.org)... 104.18.23.19, 104.18.22.19, 2606:4700::6812:1613, ...
Connecting to www.w3.org (www.w3.org)|104.18.23.19|:80... connected.
HTTP request sent, awaiting response... 301 Moved Permanently
Location: https://www.w3.org/1999/xhtml [following]
--2022-12-09 15:20:18-- https://www.w3.org/1999/xhtml
Connecting to www.w3.org (www.w3.org)|104.18.23.19|:443... connected.
HTTP request sent, awaiting response... 301 Moved Permanently
Location: https://www.w3.org/1999/xhtml/ [following]
--2022-12-09 15:20:18-- https://www.w3.org/1999/xhtml/
Reusing existing connection to www.w3.org:443.
HTTP request sent, awaiting response... 200 OK
Length: unspecified [text/html]
Saving to: ‘xhtml’


idarek commented Dec 11, 2022

The default sitemap.xml template in Hugo contains

<urlset xmlns="http://www.sitemaps.org/schemas/sitemap/0.9"
  xmlns:xhtml="http://www.w3.org/1999/xhtml">

Changing it just to

<urlset xmlns="http://www.sitemaps.org/schemas/sitemap/0.9"
  xmlns:xhtml="http://www.w3.org/TR/xhtml11/xhtml11_schema.html">

makes sitemaps in subfolders display in the browser exactly like the root one.


As mentioned here:
https://stackoverflow.com/questions/16798979/xsd-for-sitemap-with-hreflang

But then Google Search Console will start complaining about an incorrect namespace.

First, it's better to consider whether we need to fix this at all. I don't think we do. Search engines such as Google read these files, not users. I understand that users may want to preview them, but overall I am sticking with the defaults.

@jmooring
Member

@idarek The PR I submitted (#10516) fixes this trivial redirect problem, and adheres to both the sitemap protocol and Google's recommendations for multilingual sites.
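For readers who want to see what such a multilingual sitemap looks like, here is a minimal Python sketch. It is not the Hugo template from the PR. The function name `build_sitemap` and the example URLs are hypothetical, and it assumes the commonly documented layout in which every `url` entry lists all language versions of the page, including itself, as `xhtml:link` elements in the `http://www.w3.org/1999/xhtml` namespace.

```python
import xml.etree.ElementTree as ET

SITEMAP_NS = "http://www.sitemaps.org/schemas/sitemap/0.9"
XHTML_NS = "http://www.w3.org/1999/xhtml"

# Serialize the sitemap namespace as the default and XHTML as "xhtml:".
ET.register_namespace("", SITEMAP_NS)
ET.register_namespace("xhtml", XHTML_NS)


def build_sitemap(pages):
    """Build a sitemap string (hypothetical helper, for illustration only).

    `pages` is a list of dicts mapping a language code to the URL of that
    translation, e.g. {"en": ".../en/", "de": ".../de/"}.
    """
    urlset = ET.Element(f"{{{SITEMAP_NS}}}urlset")
    for translations in pages:
        for loc in translations.values():
            url = ET.SubElement(urlset, f"{{{SITEMAP_NS}}}url")
            ET.SubElement(url, f"{{{SITEMAP_NS}}}loc").text = loc
            # Each entry lists every language version, including itself.
            for alt_lang, alt_loc in translations.items():
                ET.SubElement(
                    url,
                    f"{{{XHTML_NS}}}link",
                    {"rel": "alternate", "hreflang": alt_lang, "href": alt_loc},
                )
    return ET.tostring(urlset, encoding="unicode")


if __name__ == "__main__":
    print(build_sitemap([
        {"en": "https://example.org/en/", "de": "https://example.org/de/"},
    ]))
```

The namespace values here are identifiers declared on `urlset`. Nothing in this sketch fetches them over the network.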

bep pushed a commit that referenced this issue Dec 11, 2022

idarek commented Dec 17, 2022

Works locally. Fails in production (Cloudflare Pages).

Works in both on my end. I copied the templates from the Hugo GitHub repo into my layout, and everything works fine now.

@McShelby

I also commented on the changeset:

An XML namespace is usually not meant for browsing. The expected value should use the http:// prefix, as mentioned in the docs.

Switching to https:// may result in unexpected behaviour (e.g. tools that consume those XML files may have trouble with this).

jmooring reopened this Dec 22, 2022
@jmooring
Member

@McShelby

I agree. Given that the host + path is unique, I incorrectly assumed that the protocol was irrelevant. Looking at the spec, it is indeed a full string comparison.

John Mueller of Google, in this thread, states:

Yes, it's essentially an identifier. Google accepts both, since people tend to use them interchangeably nowadays.

But just because Google does, it does not mean that everyone else does. I will revert 3fd0b78.
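To illustrate why the scheme matters to a strict consumer, here is a small Python sketch using the standard library's namespace-aware `xml.etree.ElementTree`. The helper names `make_sitemap` and `count_alternates` are hypothetical. The point is only that a lookup by namespace name is an exact string match, so a tool looking for `http://www.w3.org/1999/xhtml` will not find elements declared under the `https://` variant.

```python
import xml.etree.ElementTree as ET

SITEMAP_NS = "http://www.sitemaps.org/schemas/sitemap/0.9"
XHTML_HTTP = "http://www.w3.org/1999/xhtml"
XHTML_HTTPS = "https://www.w3.org/1999/xhtml"


def make_sitemap(xhtml_ns: str) -> str:
    """Build a one-URL sitemap that declares the given xhtml namespace."""
    return (
        f'<urlset xmlns="{SITEMAP_NS}" xmlns:xhtml="{xhtml_ns}">'
        "<url><loc>https://example.org/en/</loc>"
        '<xhtml:link rel="alternate" hreflang="de" href="https://example.org/de/"/>'
        "</url></urlset>"
    )


def count_alternates(sitemap_xml: str, xhtml_ns: str = XHTML_HTTP) -> int:
    """Count link elements whose namespace name equals xhtml_ns exactly."""
    root = ET.fromstring(sitemap_xml)
    return len(root.findall(f".//{{{xhtml_ns}}}link"))


if __name__ == "__main__":
    # A consumer that expects the http:// identifier:
    print(count_alternates(make_sitemap(XHTML_HTTP)))   # finds the link
    print(count_alternates(make_sitemap(XHTML_HTTPS)))  # finds nothing
```

A consumer written like this silently sees no alternate links once the declaration switches to `https://`, which is the kind of breakage described above.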

In the future, please open a new issue instead of commenting on a commit or a closed issue. Thanks.
