Assess your systems for Unicode-readiness

Last updated on July 9, 2024

This page provides resources and guidance for developers working with Indigenous languages to align their systems with the Indigenous Languages Technology Standard.

Unicode-readiness

To support Indigenous languages, systems need to handle a wide range of characters. Right now, many computer systems support only a limited set of characters from the Latin script:

The letters A-Z
Numeric digits
Limited punctuation

The Unicode Standard is the only character set that can support all writing systems in the world. Unicode has room for over one million characters, including those used by Indigenous languages. This allows it to support all the world’s languages and scripts in a single, universal standard. Unicode continues to expand, adding characters as the need for them is identified in the world’s languages.
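As a small illustration of that scale, the following Python sketch uses only the standard library to report the size of the Unicode code space and to look up the code point and official name of a few characters. The three sample characters are chosen for illustration only and are not an official list.

```python
import sys
import unicodedata

# Total number of code points in the Unicode code space (U+0000 to U+10FFFF).
code_space_size = sys.maxunicode + 1

# Illustrative sample characters (an assumption, not an official list):
# schwa, glottal stop, and capital C with acute.
samples = ["\u0259", "\u0294", "\u0106"]

# Map each character's code point label to its official Unicode name.
described = {f"U+{ord(ch):04X}": unicodedata.name(ch) for ch in samples}

print(code_space_size)
for code_point, name in described.items():
    print(code_point, name)
```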

Unicode-ready

When all applicable components of a system support Unicode, the system is Unicode-ready. When a system is Unicode-ready, any Unicode characters, including those of Indigenous languages, can be:

Read
Written
Stored
Processed
Displayed

If any component of a system can’t handle Unicode, data can be corrupted as it passes through that component, and the system may break. This is why it is important to consider the whole system when assessing Unicode-readiness.
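The kind of corruption described above can be reproduced in a few lines. This Python sketch is a simplified simulation, not a model of any particular system: one imagined component decodes UTF-8 bytes as Windows-1252, and another can store only ASCII. The sample text (the language name SENĆOŦEN, written with escape sequences) and the variable names are illustrative assumptions.

```python
# Sample text: SENĆOŦEN, which uses Ć (U+0106) and Ŧ (U+0166).
original = "SEN\u0106O\u0166EN"

# The text as it would be sent or stored by a Unicode-ready component.
utf8_bytes = original.encode("utf-8")

# Simulated fault 1: a component wrongly assumes the bytes are Windows-1252.
# Each byte becomes its own character, so Ć and Ŧ each turn into two symbols.
garbled = utf8_bytes.decode("cp1252")

# Simulated fault 2: a component can only store ASCII and substitutes "?"
# for anything else. The original characters cannot be recovered.
lossy = original.encode("ascii", errors="replace").decode("ascii")

print(len(original), len(garbled))  # 8 10
print(lossy)                        # SEN?O?EN
```

In the first case the data can sometimes be repaired by reversing the wrong decoding step. In the second case the information is lost for good.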

Why a system needs to be Unicode-ready

The Unicode character set enables systems to use B.C.’s inclusive font (BC Sans). BC Sans can display text from multiple languages, including every character used in Indigenous languages in B.C.

Many systems currently use the American Standard Code for Information Interchange (ASCII) character set or a limited extended version like:

ISO-8859-1 (Latin1)
Windows-1252 (Western European)

These character sets use a single byte to encode each character, which limits them to at most 256 characters. Unicode, by contrast, supports several encoding methods. The one best suited to Indigenous languages is UTF-8.

Unicode characters may use more than one byte when encoded as UTF-8. The system’s architecture needs to be tested to ensure it correctly handles multi-byte characters.
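To make the multi-byte behaviour concrete, here is a minimal Python sketch. It measures how many bytes a few sample characters need in UTF-8, and shows what happens when a field limited by byte count cuts a string in the middle of a character. The helper names `utf8_byte_length` and `truncate_utf8` are hypothetical, and the sample characters are illustrative.

```python
def utf8_byte_length(text: str) -> int:
    """Return the number of bytes needed to store text as UTF-8."""
    return len(text.encode("utf-8"))


def truncate_utf8(text: str, max_bytes: int) -> str:
    """Cut text to at most max_bytes of UTF-8 without splitting a character."""
    encoded = text.encode("utf-8")[:max_bytes]
    # errors="ignore" drops an incomplete multi-byte sequence left at the end.
    return encoded.decode("utf-8", errors="ignore")


# Illustrative characters, written as escapes so the code points are explicit.
samples = {
    "A": "A",                        # basic Latin letter
    "C with acute": "\u0106",        # Ć
    "schwa": "\u0259",               # ə
    "k with line below": "\u1e35",   # ḵ
}
byte_lengths = {name: utf8_byte_length(ch) for name, ch in samples.items()}
print(byte_lengths)

# A naive 4-byte cut of SENĆOŦEN splits the two-byte Ć and cannot be decoded.
word = "SEN\u0106O\u0166EN"
try:
    word.encode("utf-8")[:4].decode("utf-8")
    naive_cut_ok = True
except UnicodeDecodeError:
    naive_cut_ok = False

print(naive_cut_ok)            # False
print(truncate_utf8(word, 4))  # SEN
```

This sketch only protects individual code points. A cut can still separate a base letter from a combining mark that follows it, so length limits are safer when expressed in characters rather than bytes.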

How to assess a system for Unicode-readiness

To assess whether a system is ready for Unicode, take the following approach:

Step 1: Learn foundational concepts about languages in IM/IT systems

Review important terminology and the history of language to make it easier to complete your Unicode-readiness assessment.

Step 2: Review system components

Identify the technologies used in your systems to create a gap assessment for Unicode-readiness.

If your system components don't support Unicode, learn how to update or replace your system.
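One possible starting point for a gap assessment is to list the character encoding each component is configured to use, then check whether that encoding can represent sample text. The Python sketch below assumes a made-up inventory: the component names and their encodings are hypothetical, as is the helper `can_encode`.

```python
# Sample text: two language names, SENĆOŦEN and Kwak̓wala, written with escapes.
SAMPLE_TEXT = "SEN\u0106O\u0166EN Kwak\u0313wala"

# Hypothetical inventory: component name -> encoding it is configured to use.
component_encodings = {
    "web form": "utf-8",
    "legacy database": "latin-1",
    "report export": "cp1252",
    "message queue": "utf-16",
}


def can_encode(encoding: str, text: str) -> bool:
    """Return True if every character in text can be represented in encoding."""
    try:
        text.encode(encoding)
        return True
    except UnicodeEncodeError:
        return False


# Components whose configured encoding cannot represent the sample text.
gaps = sorted(
    name
    for name, encoding in component_encodings.items()
    if not can_encode(encoding, SAMPLE_TEXT)
)
print(gaps)  # ['legacy database', 'report export']
```

A component that passes this check is not automatically Unicode-ready. The check only shows that its encoding can represent the sample, so fonts, input methods and text operations still need their own review.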

Step 3: Identify problematic system operations

Consider the text operations your system performs. For example, does it check for specific characters or calculate text length?
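The two operations mentioned above, length calculation and character checks, can be illustrated with a short Python sketch that uses only the standard library. The sample word Kwak̓wala contains a combining mark (U+0313), so its length depends on what is counted. The helper `is_letters_and_marks` is a hypothetical name, and counting non-combining code points only approximates what a reader sees as letters.

```python
import re
import unicodedata

# Kwak̓wala: the second k is followed by U+0313 COMBINING COMMA ABOVE.
word = "Kwak\u0313wala"

code_points = len(word)                 # counts the combining mark separately
utf8_bytes = len(word.encode("utf-8"))  # storage size in UTF-8
# Approximate count of letters as a reader sees them: skip combining marks.
visible_letters = sum(1 for ch in word if unicodedata.combining(ch) == 0)
print(code_points, utf8_bytes, visible_letters)  # 9 10 8

# A character check written for ASCII rejects the word.
ascii_only_check = re.fullmatch(r"[A-Za-z]+", word) is not None
print(ascii_only_check)  # False


def is_letters_and_marks(text: str) -> bool:
    """Accept any Unicode letters (L*) and combining marks (M*)."""
    return bool(text) and all(
        unicodedata.category(ch)[0] in ("L", "M") for ch in text
    )


print(is_letters_and_marks(word))  # True

# The same letter can be stored precomposed or as base letter plus mark.
composed = "\u00e1"      # á as one code point
decomposed = "a\u0301"   # a followed by COMBINING ACUTE ACCENT
same_without_normalizing = composed == decomposed
same_after_nfc = unicodedata.normalize("NFC", decomposed) == composed
print(same_without_normalizing, same_after_nfc)  # False True
```

Comparisons, searches and uniqueness checks are generally more reliable when both sides are normalized to the same form first.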

Step 4: Test dataflows and evaluate data exchanges

Examine how data moves through your system, from entry through processing and storage to output. Identify the other systems your system communicates with and check whether they can handle Indigenous languages.
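A round-trip test is one simple way to examine a dataflow: send sample text through a path and confirm that it comes back unchanged. The Python sketch below uses an in-memory SQLite database and a JSON payload as stand-ins for storage and data exchange. Both are assumptions for illustration, so substitute the real database and interfaces your system uses. The function names are hypothetical.

```python
import json
import sqlite3

# Sample language names, SENĆOŦEN and Kwak̓wala, written with escapes.
SAMPLES = ["SEN\u0106O\u0166EN", "Kwak\u0313wala"]


def round_trip_database(text: str) -> str:
    """Store text in an in-memory SQLite table and read it back."""
    conn = sqlite3.connect(":memory:")
    try:
        conn.execute("CREATE TABLE names (value TEXT)")
        conn.execute("INSERT INTO names (value) VALUES (?)", (text,))
        (stored,) = conn.execute("SELECT value FROM names").fetchone()
        return stored
    finally:
        conn.close()


def round_trip_json(text: str) -> str:
    """Serialize text to UTF-8 JSON bytes, as if sent to another system."""
    payload = json.dumps({"name": text}, ensure_ascii=False).encode("utf-8")
    return json.loads(payload.decode("utf-8"))["name"]


# True means the text survived both paths unchanged.
results = {
    text: round_trip_database(text) == text and round_trip_json(text) == text
    for text in SAMPLES
}
print(all(results.values()))  # True
```

In a real assessment, each hop between components or systems would get its own round-trip check, so that a failure points to the specific place where text is altered.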

Developer resources

We've set up a DevHub repository to help you adapt existing systems, or create new ones, so that they are compatible with Unicode. The site contains resources to support assessment, testing and other developer activities. Resources include:

Learn IM/IT terminology and concepts for Unicode-readiness

Learn the foundational IM/IT terminology and concepts needed to assess your system for Unicode-readiness.


