Thank you for the report. We only support automatic conversion from "extended" Latin characters like the Polish ą, ś, ć, the German ä, ü, Nordic special characters, and so on. For other alphabets like Cyrillic, Greek or Chinese, you simply need to make up your own "English" names for pages. Or you can use the auto-numbering feature for certain categories.
After all, I'm not sure characters in non-Latin alphabets are easily mappable to Latin ones. If there is some standard way of doing it, please point us to a reference and we'll consider adding support for mapping characters from those alphabets to Latin in URLs.
Piotr Gabryjeluk
visit my blog
Mapping Greek / Chinese / Japanese characters to Latin equivalents does not make sense, in my opinion. I remember at least a few other people complaining about this issue before. Although this is not necessarily a bug, it is definitely a serious disadvantage for users who write in non-Latin charsets.
An idea just crossed my mind:
- if the page name contains only Latin characters, or all non-ASCII characters can be mapped to Latin ones, we make the conversion as we do today, e.g.
Dog -> dog
dojść -> dojsc
- if there are characters that cannot be mapped to the ASCII charset, we calculate the page name using a hashing function, e.g.
ἀγαθή -> hashing(ἀγαθή) -> b2f14ec5337be003d0a9aa6c7542aa73
This way we can still consistently produce unique page names without adding extra complexity.
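A minimal sketch of that two-branch scheme in Python (the function name, the simplified slug rule and the choice of MD5 are my own assumptions; the 32-character hash in the example above just happens to match MD5's output length):

    import hashlib
    import re
    import unicodedata

    def page_unix_name(title: str) -> str:
        # Branch 1: decompose accented Latin characters (e.g. "ś" becomes
        # "s" plus a combining acute accent) and drop the combining marks.
        decomposed = unicodedata.normalize("NFKD", title)
        stripped = "".join(c for c in decomposed if not unicodedata.combining(c))
        if stripped.isascii():
            # Everything mapped to ASCII: slugify as we do today.
            return re.sub(r"[^a-z0-9]+", "-", stripped.lower()).strip("-")
        # Branch 2: unmappable characters remain, so hash the original title.
        return hashlib.md5(title.encode("utf-8")).hexdigest()

    print(page_unix_name("Dog"))    # dog
    print(page_unix_name("dojść"))  # dojsc
    print(page_unix_name("ἀγαθή"))  # 32-character hex digest

One caveat: NFKD decomposition alone misses letters like the Polish ł, so branch 1 would want an explicit transliteration table in a real implementation.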
Note that the conversion we are discussing takes place in many places in Wikidot, e.g.
- creating pages
- making links to pages
- resolving page addresses
Any thoughts on whether this is a solution worth exploring?
Michał Frąckowiak @ Wikidot Inc.
Visit my blog at michalf.me
Isn't there an easier, less horrible way to do this?
Try this URL:
That's a 7-bit ASCII encoding of a UTF-8 page name. Note what happens when the page is displayed: the address bar appears to be interpreted using the same charset as the page content. I've looked for a few hours now and I can't find this behaviour codified anywhere, but I'm fairly sure that's what happens. The %-encoded form is displayed until the page loads, and then it is decoded as UTF-8.
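For reference, this is how that round-trip looks in code (a Python sketch; the page name is only an example):

    from urllib.parse import quote, unquote

    name = "ἀγαθή"
    # Each byte of the UTF-8 encoding becomes a %XX escape, so the URL
    # itself stays 7-bit ASCII.
    encoded = quote(name)  # '%E1%BC%80%CE%B3%CE%B1%CE%B8%CE%AE'
    # A client that treats the address as UTF-8 turns it back into the
    # original characters once the page's charset is known.
    assert unquote(encoded) == name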
Unfortunately, something in the Wikidot path prevents similar names from getting through, even when entered by hand. For instance, if part of the above link is copied and pasted into the address bar as the suffix of a Wikidot address, it magically disappears.
Is this filtering necessary? You are serving UTF-8 content, so why is the server filtering out 8-bit data in the address?
One of our assumptions (in the beginning) was to allow only [a-z0-9] characters in page addresses. Some people appreciated it at the time, because it eliminated ambiguities in page names. E.g. in MediaWiki (and Wikipedia) you can have pages called Wikidot, wikidot and WikiDot, and they are all different pages.
OK, we will talk about it internally.
Michał Frąckowiak @ Wikidot Inc.
Visit my blog at michalf.me
I can see that making page names case-insensitive is a major win. Maybe you could go halfway and down-case only the Latin characters (0-127).
Decoding and down-casing the full Unicode is not something to get into (see here). The 0-127 range of characters can be down-cased without regard to the UTF-8 decoding, because multi-byte characters always have the top bit set on all component bytes.
I think that solution would be understood and accepted by users.
Another option would be to down-case only if there were no characters with the top bit set.
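A sketch of both variants in Python (hypothetical helper names, assuming the page name arrives as raw UTF-8 bytes):

    def ascii_lower(data: bytes) -> bytes:
        # Touch only bytes 0x41-0x5A ('A'-'Z'). Every byte of a multi-byte
        # UTF-8 sequence is 0x80 or above, so no non-Latin character can
        # be corrupted mid-sequence.
        return bytes(b + 32 if 0x41 <= b <= 0x5A else b for b in data)

    def ascii_lower_if_pure(data: bytes) -> bytes:
        # The second option: down-case only when no top-bit byte appears.
        return ascii_lower(data) if all(b < 0x80 for b in data) else data

    print(ascii_lower("WikiDot-ἀγαθή".encode("utf-8")).decode("utf-8"))
    # prints: wikidot-ἀγαθή

Either way, the multi-byte sequences pass through untouched, which is exactly why down-casing the 0-127 range is safe.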
Since no further answer was given by the original author, I'm closing this bug.
Piotr Gabryjeluk
visit my blog