Language codes with variants

serettig

Hi,

On translate-h5p.tk there's interest in working on the Chinese translation of H5P. While preparing for this, it became clear that H5P is currently pretty inconsistent in the way Chinese variants are marked. This leads to a lot of duplication and confusion.

There are currently files with these language codes:

  • zh
  • zh-hans (= simplified Chinese characters)
  • zh-hant (= traditional Chinese characters)
  • zh-cn (= Chinese as used in mainland China, uses simplified characters)
  • zh-tw (= Chinese as used in Taiwan, uses traditional characters)

It is reasonable to assume that zh-hans and zh-cn are duplicates of each other, just like zh-hant and zh-tw. This confusion needs to be cleared up before the translation effort can proceed, so I'd like to ask Joubel to decide which variant should be used for H5P. I think it makes most sense to settle on zh-hans and zh-hant. The other files should be deleted and the language list in the core should be updated accordingly.
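To make the proposed cleanup concrete, here is a minimal Python sketch of how the duplicate files could be spotted. It assumes the pairing suggested above (zh-cn = zh-hans, zh-tw = zh-hant); the alias table, the function names and the example file names are hypothetical and not taken from the H5P code base.

```python
# Hypothetical alias table: region-based codes mapped to script-based ones.
# The pairing is the assumption made in this post, not confirmed H5P behaviour.
CANONICAL = {
    "zh-cn": "zh-hans",
    "zh-tw": "zh-hant",
}


def canonical_code(code):
    """Return the script-based code for a language code, lower-cased."""
    normalized = code.strip().lower().replace("_", "-")
    return CANONICAL.get(normalized, normalized)


def find_duplicates(filenames):
    """Group language files that collapse to the same canonical code."""
    groups = {}
    for name in filenames:
        code = name.rsplit(".", 1)[0]  # strip the file extension
        groups.setdefault(canonical_code(code), []).append(name)
    # Only codes with more than one file are duplicates.
    return {code: names for code, names in groups.items() if len(names) > 1}
```

Run against a language folder listing, this would report zh-cn next to zh-hans and zh-tw next to zh-hant, while a plain "zh" file stays unassigned because its script cannot be inferred from the code alone.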

Another question that arises here is how H5P implements language fallback. If a user has set zh-hans as their language, will H5P use zh as a fallback if no file exists for this variant? If it doesn't, all Chinese language files that only use "zh" would have to be duplicated as "zh-hans"... This would be less than ideal. So would it make sense to remove all "zh" files and move them to "zh-hans" instead?

I'm not sure whether a fallback to simplified Chinese would even make sense, since it would mix simplified and traditional characters in the same interface.
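For illustration, this is roughly what a prefix-based fallback could look like, as a minimal Python sketch. It is not how H5P is known to behave (that is exactly the open question); the function names and the "en" default are assumptions.

```python
def fallback_chain(code, default="en"):
    """List the codes to try, most specific first, ending with the default."""
    parts = code.lower().replace("_", "-").split("-")
    # "zh-hans" -> ["zh-hans", "zh"]
    chain = ["-".join(parts[:i]) for i in range(len(parts), 0, -1)]
    if default not in chain:
        chain.append(default)
    return chain


def pick_language(requested, available, default="en"):
    """Return the first code in the fallback chain that has a file, or None."""
    for candidate in fallback_chain(requested, default):
        if candidate in available:
            return candidate
    return None
```

With such a scheme, zh-hans falls back to zh and then to the default, and zh-hant never falls back to zh-hans. A zh-hant user would still receive a plain "zh" file, though, so the scheme only avoids mixed scripts if "zh" files are removed or their script is made explicit.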

Sebastian

BV52

Hi Sebastian,

Thank you for bringing this up. I think we need more input from community members who are experts in the Chinese language. According to the core team, the language codes most likely came from Drupal.

-BV