Assess your systems for Unicode-readiness

Last updated on June 21, 2024

This page provides resources and guidance for developers working with Indigenous languages to align their systems with the Indigenous Languages Technology Standard.


Unicode-readiness

To support Indigenous languages, a system needs to handle many more characters than most systems handle today. Many computer systems currently support only a limited set of characters from the Latin script:

  • The letters A-Z
  • Numeric digits
  • Limited punctuation

The Unicode Standard is the only character set that can support all writing systems in the world. Unicode has room for over one million characters, including those used by Indigenous languages, so it can support all the world’s languages and scripts in a single, universal standard. Unicode is expanded with new characters as they come into use in the world’s languages.
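
The "over one million" figure is the size of the Unicode codespace: 17 planes of 65,536 code points each. Not every code point is assigned to a character, but each encoded character has its own code point. The short Python sketch below works through the arithmetic; the sample characters are illustrative choices, not taken from this page.

```python
# The Unicode codespace is divided into 17 planes of 65,536 code points each.
PLANES = 17
CODE_POINTS_PER_PLANE = 0x10000  # 65,536

codespace_size = PLANES * CODE_POINTS_PER_PLANE
print(f"{codespace_size:,} code points")  # 1,114,112 code points

# Code points are written U+0000 through U+10FFFF.
assert codespace_size - 1 == 0x10FFFF

# Each encoded character has its own code point (illustrative samples).
for ch in ["A", "\u00e9", "\u0294", "\u140a"]:
    print(f"{ch}  U+{ord(ch):04X}")
```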

Unicode-ready

When all applicable components of a system support Unicode, the system is Unicode-ready. In a Unicode-ready system, any Unicode character, including the characters of Indigenous languages, can be:

  • Read
  • Written 
  • Stored
  • Processed
  • Displayed

If any component of a system can’t handle Unicode, data can be corrupted when it is converted between components, and the system can fail. This is why it is important to consider the whole system when assessing Unicode-readiness.
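
To show how corruption happens between components, the Python sketch below has one component write text correctly as UTF-8 while two other, hypothetical components read or store it with a one-byte character set (Windows-1252 and Latin-1). The sample text and component roles are assumptions for illustration. Note that the first failure raises no error at all, which is why it can go unnoticed.

```python
# Hypothetical sample text containing ʔ (U+0294 LATIN LETTER GLOTTAL STOP).
original = "\u0294a"

# Component A writes the text correctly as UTF-8: b'\xca\x94a'.
utf8_bytes = original.encode("utf-8")

# Failure 1: component B (hypothetical) reads those bytes as Windows-1252.
# No error is raised; the two bytes of ʔ become two unrelated characters.
misread = utf8_bytes.decode("cp1252")
print(misread)              # Ê”a
print(misread == original)  # False

# Failure 2: component C (hypothetical) stores text as Latin-1 and
# replaces anything it cannot represent with "?".
stored = original.encode("latin-1", errors="replace").decode("latin-1")
print(stored)               # ?a
```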


Why a system needs to be Unicode-ready

Unicode support enables systems to use BC Sans, B.C.’s inclusive font. BC Sans can display text in multiple languages, including every character used in Indigenous languages in B.C.

Many systems currently use the American Standard Code for Information Interchange (ASCII) character set or a limited extension of it, such as:

  •  ISO-8859-1 (Latin-1)
  •  Windows-1252 (Western European)

These character sets encode each character in a single byte, so each one can represent at most 256 characters. Unicode, by contrast, can be encoded in several ways (UTF-8, UTF-16 and UTF-32). The encoding best suited to Indigenous languages is UTF-8.
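
Python’s built-in codecs make the one-byte limit easy to see. In this sketch, `can_encode` is a hypothetical helper, and ʔ (U+0294) is just one example of a character outside these sets; `ascii`, `latin-1` and `cp1252` are Python’s codec names for ASCII, ISO-8859-1 and Windows-1252.

```python
def can_encode(text: str, charset: str) -> bool:
    """Return True if every character in text can be represented in charset."""
    try:
        text.encode(charset)
    except UnicodeEncodeError:
        return False
    return True


# A one-byte code has only 2 ** 8 = 256 possible values.
print(2 ** 8)  # 256

sample = "\u0294"  # ʔ, U+0294 LATIN LETTER GLOTTAL STOP
for charset in ["ascii", "latin-1", "cp1252", "utf-8"]:
    print(f"{charset:8} {can_encode(sample, charset)}")
```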

When encoded as UTF-8, a character takes between one and four bytes, and every character outside the ASCII range takes at least two. The system’s architecture needs to be tested to ensure it handles multi-byte characters correctly.
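
The Python sketch below prints how many UTF-8 bytes some sample characters take, and shows one common multi-byte bug: cutting text at a fixed byte limit (for example, to fit a hypothetical fixed-size field) can split a character and leave invalid UTF-8. The helper names `utf8_length` and `truncate_utf8` are hypothetical, and the samples are illustrative.

```python
import unicodedata


def utf8_length(ch: str) -> int:
    """Number of bytes a string takes when encoded as UTF-8."""
    return len(ch.encode("utf-8"))


def truncate_utf8(text: str, max_bytes: int) -> str:
    """Cut text to at most max_bytes of UTF-8 without leaving half a character.

    Note: this keeps whole code points only; it can still separate a base
    letter from a following combining mark.
    """
    encoded = text.encode("utf-8")[:max_bytes]
    # errors="ignore" drops an incomplete multi-byte sequence left at the end.
    return encoded.decode("utf-8", errors="ignore")


samples = ["a", "\u00e9", "\u0294", "\u140a", "\U0001F600"]
for ch in samples:
    print(f"U+{ord(ch):04X} {unicodedata.name(ch)}: {utf8_length(ch)} byte(s)")

# A naive byte cut in the middle of ʔ produces invalid UTF-8.
broken = "a\u0294".encode("utf-8")[:2]
try:
    broken.decode("utf-8")
except UnicodeDecodeError as err:
    print("naive cut fails:", err.reason)

print(truncate_utf8("a\u0294", 2))  # a
```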


How to assess a system for Unicode-readiness

To assess whether a system is ready for Unicode, take the following approach: 

Step 1: Learn foundational concepts about languages in IM/IT systems 

Review important terminology and the history of languages in IM/IT systems to make your Unicode-readiness assessment easier to complete.

Step 2: Review system components

Identify the technologies used in your system so you can build a gap assessment for Unicode-readiness.
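
Part of a gap assessment is recording which encoding each component is configured to use. As one hedged example, the Python sketch below reports a Python runtime’s own encoding settings and flags any that are not UTF-8. It covers only that single component; databases, web servers, file formats and connected services each have their own settings (for example a database character set or an HTTP `Content-Type` charset) that need to be checked with their own tools. The helper names are hypothetical.

```python
import codecs
import locale
import sys


def is_utf8(encoding_name: str) -> bool:
    """True if encoding_name is any alias of UTF-8 (e.g. 'UTF-8', 'utf8')."""
    try:
        return codecs.lookup(encoding_name).name == "utf-8"
    except LookupError:
        return False


def runtime_encoding_report() -> dict:
    """Encoding settings of the running Python process (one system component)."""
    return {
        "filesystem encoding": sys.getfilesystemencoding(),
        "preferred locale encoding": locale.getpreferredencoding(False),
        # sys.stdout may be missing or have no encoding in some environments.
        "stdout encoding": getattr(sys.stdout, "encoding", None) or "unknown",
    }


for setting, value in runtime_encoding_report().items():
    status = "ok" if is_utf8(value) else "REVIEW"
    print(f"{status:6} {setting}: {value}")
```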

Step 3: Identify problematic system operations

Consider the text operations your system performs. For example, does it check for specific characters or calculate the length of text?
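
Both operations behave differently once text goes beyond A-Z. The Python sketch below uses a hypothetical sample string to show that "length" depends on what is counted, that visually identical text can use different code point sequences, and that a character check written for A-Z rejects valid text. The validation rule is an illustrative assumption, not a recommended policy. Counting user-perceived characters (grapheme clusters) is not built into Python’s standard library; the third-party `regex` package supports it with `\X`.

```python
import re
import unicodedata

# Hypothetical sample "x̱aʔ": x + U+0331 COMBINING MACRON BELOW + a + U+0294.
text = "x\u0331a\u0294"

# 1. Text length depends on what is being counted.
code_points = len(text)                 # len() counts code points
utf8_bytes = len(text.encode("utf-8"))  # bytes needed to store it as UTF-8
print(code_points, utf8_bytes)          # 4 6  (a reader sees 3 letters)

# 2. Visually identical text can use different code point sequences.
composed = "\u00e9"     # é as a single code point
decomposed = "e\u0301"  # e + U+0301 COMBINING ACUTE ACCENT
print(composed == decomposed)  # False
print(unicodedata.normalize("NFC", decomposed) == composed)  # True

# 3. A character check written for A-Z rejects valid text.
ASCII_LETTERS = re.compile(r"[A-Za-z]+")


def ascii_check(s: str) -> bool:
    return ASCII_LETTERS.fullmatch(s) is not None


def unicode_letter_check(s: str) -> bool:
    # Accept any letter (category L*) or combining mark (category M*).
    return all(unicodedata.category(ch)[0] in "LM" for ch in s)


print(ascii_check(text), unicode_letter_check(text))  # False True
```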

Step 4: Test dataflows and evaluate data exchanges 

Examine how data moves through your system, from entry through processing and storage to output. Identify the other systems your system exchanges data with and check whether they can handle Indigenous languages.
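
One way to test a dataflow is a round-trip check: send known test strings through each stage and confirm that the same text comes back. The Python sketch below uses stand-in stages (a UTF-8 file, JSON, an in-memory SQLite database, and a hypothetical legacy component that stores Latin-1). All stage functions and test strings are hypothetical placeholders for the real components and language data of your system.

```python
import json
import os
import sqlite3
import tempfile

# Hypothetical test strings; replace with text from the languages your system must support.
TEST_STRINGS = ["x\u0331a\u0294", "e\u0301", "\u140a", "plain ASCII"]


def via_file(text: str) -> str:
    """Write text to a file as UTF-8 and read it back."""
    with tempfile.TemporaryDirectory() as tmp:
        path = os.path.join(tmp, "sample.txt")
        with open(path, "w", encoding="utf-8") as f:
            f.write(text)
        with open(path, encoding="utf-8") as f:
            return f.read()


def via_json(text: str) -> str:
    """Serialize to JSON without escaping non-ASCII characters, then parse."""
    return json.loads(json.dumps(text, ensure_ascii=False))


def via_sqlite(text: str) -> str:
    """Store text in a TEXT column and read it back."""
    conn = sqlite3.connect(":memory:")
    try:
        conn.execute("CREATE TABLE t (value TEXT)")
        conn.execute("INSERT INTO t (value) VALUES (?)", (text,))
        (value,) = conn.execute("SELECT value FROM t").fetchone()
        return value
    finally:
        conn.close()


def via_latin1_system(text: str) -> str:
    """Hypothetical legacy component: stores Latin-1, replacing what it can't encode."""
    return text.encode("latin-1", errors="replace").decode("latin-1")


STAGES = {
    "file": via_file,
    "json": via_json,
    "sqlite": via_sqlite,
    "legacy latin-1": via_latin1_system,
}


def find_lossy_stages(samples):
    """Return (stage name, input) pairs where the text did not survive unchanged."""
    failures = []
    for name, stage in STAGES.items():
        for s in samples:
            if stage(s) != s:
                failures.append((name, s))
    return failures


for name, s in find_lossy_stages(TEST_STRINGS):
    print(f"{name}: corrupted {s!r}")
```

This check compares strings exactly. If a component intentionally normalizes text (for example to Unicode Normalization Form C), compare normalized forms with `unicodedata.normalize` instead, so that only real data loss is reported.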


Developer resources

We've set up a DevHub repository to help you adapt existing systems, or create new ones, so that they are compatible with Unicode. The site contains resources to support assessment, testing and other developer activities. Resources include:


Learn IM/IT terminology and concepts for Unicode-readiness

Learn the foundational IM/IT terminology and concepts needed to assess your system for Unicode-readiness.