Kaveh Ranjbar

Internationalisation of the RIPE Database Content

Contributors: Denis Walker

At RIPE 63, the RIPE NCC was asked to consider possible options for internationalising the content of the RIPE Database. This article gives a quick overview of the current situation, as well as possible developments and challenges.


Background

The RIPE NCC provides services to members from 76 countries. The RIPE Database is a public and open database used by the Internet community from the RIPE NCC service region and beyond.

The RIPE Database design documentation names ASCII as the chosen character set for RIPE Database content. Currently, most of the data consists of US-ASCII characters in English.

However, the RIPE Database's broad user community uses many different character sets, and in some cases, such as person names or addresses, restricting input to US-ASCII results in inaccurate data. On the other hand, allowing non-US-ASCII characters brings its own challenges: for example, it might render the data unusable for people who query it, because they may not be able to read the script the data is written in.

Current situation

Right now there are no policies regarding the use of non-Latin characters in the RIPE Database. From a technical point of view, the core storage of the RIPE Database stores data as raw bytes and is encoding-agnostic. This means users could choose their own encoding when entering data into the RIPE Database. However, this has not been tested and could produce unexpected results. To analyse the current behaviour, we can divide user interaction with the RIPE Database into two main areas: database updates and queries.

RIPE Database updates

For email updates, the message is decoded according to the encoding declared in the email. If no encoding is declared, US-ASCII is used as the default. If the email is encoded in UTF-8, the data is stored as UTF-8.
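
As a rough illustration of this decoding step, the sketch below uses Python's standard email package to decode a plain-text update with the charset it declares, falling back to US-ASCII when none is declared. The function name decode_update_body and the "first text/plain part" rule for multipart messages are assumptions for illustration; this is not the RIPE NCC's actual update code.

```python
from email import message_from_bytes
from email.message import Message


def decode_update_body(raw_email: bytes) -> str:
    """Decode the text of an email update using the charset it declares.

    Hypothetical helper: falls back to US-ASCII when no charset is declared,
    and raises UnicodeDecodeError if undeclared non-ASCII bytes appear.
    """
    msg: Message = message_from_bytes(raw_email)
    if msg.is_multipart():
        # Assumption: the update text is the first text/plain part.
        part = next(p for p in msg.walk() if p.get_content_type() == "text/plain")
    else:
        part = msg
    charset = part.get_content_charset("us-ascii")  # default when undeclared
    payload = part.get_payload(decode=True)  # undoes base64/quoted-printable
    return payload.decode(charset)
```

A message declaring charset=utf-8 yields the original accented text, while a message without any Content-Type header is read as plain US-ASCII.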

For Syncupdates, the connection uses HTTPS, and our servers offer UTF-8 as well as ISO-8859-1 but prefer UTF-8. So if the client supports UTF-8, the data is stored as UTF-8. The same applies to web-based updates: our web-based update tools (Webupdates and Quick updates) all use HTTPS, and our server offers and prefers UTF-8; if the client does not support UTF-8, ISO-8859-1 is selected.
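
The charset selection described above can be sketched as simple Accept-Charset negotiation. This is a minimal sketch under the assumption that the server picks UTF-8 whenever the client accepts it with a non-zero quality value and otherwise falls back to ISO-8859-1; parse_accept_charset and negotiate_charset are hypothetical names, and the real servers may negotiate differently.

```python
from typing import Dict, Optional


def parse_accept_charset(header: str) -> Dict[str, float]:
    """Map each charset named in an Accept-Charset header to its q-value."""
    weights: Dict[str, float] = {}
    for item in header.split(","):
        parts = [p.strip() for p in item.split(";")]
        name = parts[0].lower()
        if not name:
            continue
        q = 1.0  # q defaults to 1 when not given
        for param in parts[1:]:
            key, _, value = param.partition("=")
            if key.strip().lower() == "q":
                try:
                    q = float(value)
                except ValueError:
                    q = 0.0  # treat a malformed q-value as "not acceptable"
        weights[name] = q
    return weights


def negotiate_charset(accept_charset: Optional[str]) -> str:
    """Pick UTF-8 whenever the client accepts it, else fall back to ISO-8859-1."""
    if accept_charset is None:
        return "utf-8"  # no header: the client accepts any charset
    weights = parse_accept_charset(accept_charset)
    wildcard = weights.get("*", 0.0)  # "*" covers charsets not listed explicitly
    if weights.get("utf-8", wildcard) > 0:
        return "utf-8"
    return "iso-8859-1"
```

Note that the server's own preference wins here even if the client ranks ISO-8859-1 higher, which is one reading of "offers and prefers UTF-8".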

The RIPE Database Update API also uses HTTPS and behaves exactly like Syncupdates.

At the data entry level, many attributes, including all object primary keys, restrict their syntax to a very specific set of ASCII characters. Attributes that allow free-form text input, such as address and description, would not reject non-ASCII characters during syntax checking. However, as this has not been tested, we cannot guarantee that the data finally stored in the database is what the user intended.
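
The split between restricted keys and free-form text might look like the following sketch. The pattern used for nic-hdl is a simplified, hypothetical stand-in rather than the real RIPE Database syntax definition, and check_attribute is an illustrative name.

```python
import re
from typing import Callable, Dict

# Simplified, hypothetical key pattern. Explicit ASCII ranges are used
# because \w would also match non-ASCII letters in Python 3.
KEY_PATTERN = re.compile(r"[A-Za-z0-9][A-Za-z0-9-]*")

SYNTAX_RULES: Dict[str, Callable[[str], bool]] = {
    # Primary key: restricted ASCII set, anything else is rejected.
    "nic-hdl": lambda value: KEY_PATTERN.fullmatch(value) is not None,
    # Free-form text: any printable text passes, including non-ASCII scripts.
    "address": str.isprintable,
    "descr": str.isprintable,
}


def check_attribute(name: str, value: str) -> bool:
    """Return True if the value passes the (sketched) syntax check."""
    rule = SYNTAX_RULES.get(name)
    if rule is None:
        raise KeyError(f"no syntax rule for attribute {name!r}")
    return rule(value)
```

With rules like these, a Farsi address passes syntax checking unchanged, which is exactly why the rest of the update path needs testing.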

RIPE Database queries

Command-line queries sent through the CLI return results exactly as they are stored in the database. Since port 43 queries use raw TCP connections, the raw data from RIPE Database storage is sent to the user's terminal, and how it is interpreted depends entirely on the behaviour of that terminal. If the terminal supports UTF-8, the user can see any data that is stored in UTF-8.
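
Because port 43 is a raw TCP service, the client receives bytes and has to choose an encoding itself. The sketch below assumes the public whois.ripe.net server and a query line terminated by CR LF, as in the WHOIS protocol (RFC 3912); whois_query and render are hypothetical helpers, with render standing in for whatever encoding the user's terminal applies.

```python
import socket


def whois_query(query: str, host: str = "whois.ripe.net", port: int = 43) -> bytes:
    """Send one query over raw TCP and return the reply bytes undecoded."""
    with socket.create_connection((host, port), timeout=10) as sock:
        # RFC 3912: the query is a line of text terminated by CR LF.
        # Assumption: the query text itself is sent as UTF-8.
        sock.sendall(query.encode("utf-8") + b"\r\n")
        chunks = []
        while True:
            chunk = sock.recv(4096)
            if not chunk:  # the server closes the connection after replying
                break
            chunks.append(chunk)
    return b"".join(chunks)


def render(raw: bytes, terminal_encoding: str) -> str:
    """Stand-in for a terminal: interpret the bytes with its own encoding."""
    return raw.decode(terminal_encoding, errors="replace")
```

For example, the UTF-8 bytes of "Zürich" come out as "ZÃ¼rich" when rendered as ISO-8859-1, which is what a terminal set to that encoding would display.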

Web-based queries and API-based queries behave in the same way: again, our web server offers UTF-8 and ISO-8859-1 and prefers UTF-8. The data is then presented to the user as it is stored in the RIPE Database.
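
On the web and API side, the charset travels in the Content-Type header instead of being left to a terminal. A minimal client-side sketch, assuming the caller supplies the URL: it advertises a preference for UTF-8 and decodes the body with whatever charset the server declares. fetch_text and decode_http_body are illustrative names, and the ISO-8859-1 fallback is an assumption borrowed from the older HTTP/1.1 specification (RFC 2616).

```python
from email.message import Message
from urllib.request import Request, urlopen


def decode_http_body(body: bytes, content_type: str) -> str:
    """Decode a response body with the charset named in its Content-Type."""
    header = Message()
    header["Content-Type"] = content_type
    # Assumed fallback: RFC 2616's default for text/* bodies with no charset.
    charset = header.get_content_charset("iso-8859-1")
    return body.decode(charset)


def fetch_text(url: str) -> str:
    """Request a URL, preferring UTF-8, and decode the reply as declared."""
    request = Request(url, headers={"Accept-Charset": "utf-8, iso-8859-1;q=0.5"})
    with urlopen(request, timeout=10) as response:
        content_type = response.headers.get("Content-Type", "text/plain")
        return decode_http_body(response.read(), content_type)
```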

Challenges

Given the current technical situation, the update software needs significant end-to-end testing with a range of UTF-8 characters. Depending on the results, changes may be necessary. From a policy point of view, nothing has been decided. Most of the current data is in Latin characters, but nothing prevents or requires users to enter their address in their local script. This means that, at the moment, a user in Iran can choose to enter their organisation's address in Farsi. There is no policy either prohibiting or encouraging this.
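
One way to start the end-to-end testing mentioned above is a small matrix of sample strings and update charsets. The sketch assumes that storage keeps the submitted bytes unchanged (as described for the core storage) and that queries read them back as UTF-8; the sample strings and helper names are illustrative, and a real test plan would exercise the actual update and query paths.

```python
from typing import Dict, List, Tuple

# Illustrative samples in several scripts; a real test set would be far larger.
SAMPLES: Dict[str, str] = {
    "latin": "Zürich",
    "cyrillic": "Москва",
    "farsi": "تهران",
}


def round_trips(text: str, update_charset: str, query_charset: str = "utf-8") -> bool:
    """Simulate update -> byte storage -> query and check the text survives."""
    try:
        stored = text.encode(update_charset)  # storage keeps the raw bytes as sent
    except UnicodeEncodeError:
        return False  # the update charset cannot express this text at all
    returned = stored.decode(query_charset, errors="replace")
    return returned == text


def run_matrix() -> List[Tuple[str, str]]:
    """Return (sample, update charset) pairs that do not survive the round trip."""
    failures = []
    for label, text in SAMPLES.items():
        for charset in ("utf-8", "iso-8859-1"):
            if not round_trips(text, charset):
                failures.append((label, charset))
    return failures
```

Under these assumptions every ISO-8859-1 update fails: Cyrillic and Farsi cannot be encoded at all, and Latin text stored as ISO-8859-1 bytes is misread by a UTF-8 query, which is the mixed-encoding risk of byte-agnostic storage.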

The benefit is that the address will be more accurate and more relevant for local users. The downside, however, is that the whole address field might be unreadable to anyone who cannot read Farsi; most users would not even know which city the address is in.

Possible Changes

Based on similar implementations elsewhere, the current data set should probably be restricted to US-ASCII characters. This means that all current object attributes would only accept US-ASCII characters. An additional set of optional attributes (mainly for contact and locally relevant information such as name, address, city and description) could be made available in some objects to mirror the standard information in a local language. These could be identified by a suffix on the attribute name, for example "address-local-lang:".
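
A validator for this proposed split could be as simple as the sketch below: standard attributes must be pure US-ASCII, while attributes carrying the example -local-lang suffix may contain any characters. The suffix comes from the example above and the attribute names are illustrative; none of this is an existing RIPE Database feature.

```python
from typing import List, Tuple

# Example suffix from the proposal; the final naming has not been decided.
LOCAL_SUFFIX = "-local-lang"

Attribute = Tuple[str, str]  # (attribute name, value)


def non_ascii_violations(attributes: List[Attribute]) -> List[str]:
    """Names of standard attributes whose values contain non-US-ASCII text."""
    return [
        name
        for name, value in attributes
        # Local-language attributes are exempt from the US-ASCII restriction.
        if not name.endswith(LOCAL_SUFFIX) and not value.isascii()
    ]
```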

These additional optional attributes should only be allowed in an object if the corresponding original attribute is present. They provide local-language versions of existing information rather than replacing it. So users will have the option to provide information in their local language, but they must always provide it in English for an international audience. It should also be made clear that, in case of any dispute or question, the English data is the authoritative registered information; the local-language information is only a complementary, optional dataset.
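
The pairing rule could be checked along these lines. The sketch assumes an object is a list of (name, value) pairs and uses the example -local-lang suffix; it reports local-language attributes that lack their base attribute, and returns only the standard values when the authoritative registered data is needed. All names are hypothetical.

```python
from typing import List, Tuple

LOCAL_SUFFIX = "-local-lang"  # example suffix from the proposal

Attribute = Tuple[str, str]  # (attribute name, value)


def orphaned_local_attributes(attributes: List[Attribute]) -> List[str]:
    """Local-language attributes whose base attribute is missing."""
    present = {name for name, _ in attributes}
    orphans = []
    for name, _ in attributes:
        if name.endswith(LOCAL_SUFFIX):
            base = name[: -len(LOCAL_SUFFIX)]
            if base not in present:
                orphans.append(name)
    return orphans


def authoritative_values(attributes: List[Attribute], base_name: str) -> List[str]:
    """Only the standard (English) values count as registered information."""
    return [value for name, value in attributes if name == base_name]
```

An update whose object returns a non-empty list from orphaned_local_attributes would be rejected under this rule.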


About the author

Kaveh Ranjbar, based in Amsterdam

As Chief Information Officer at the RIPE NCC, I am mainly involved with the planning, operation and development of the RIPE NCC's global information services as well as research and development. This includes the RIPE NCC's authoritative DNS services as well as K-root infrastructure, data collection and measurement networks such as RIPE Atlas and RIS, data provisioning systems such as RIPEstat, and the RIPE NCC's data analysis efforts. I have been with the RIPE NCC since 2008, working in different capacities. Before the RIPE NCC, I worked for more than 12 years in the Internet services and ISP sectors, mostly in senior technical management positions. I was the engineering founder of one of the largest Iranian ISPs and helped several IT startups with their software and business process implementations. I have an M.Sc. in Software Engineering from the University of Oxford, UK, and Lean Engineering/Agile Management training at MIT.
