This thesis describes a multilingual browser for WWW documents, intended for users whose machines have no multilingual font sets. It also presents an experimental service for browsing documents on the Internet, together with its evaluation.
Since libraries hold books and other materials from ancient to modern societies around the world, a multilingual environment for handling electronic materials is indispensable to realizing a digital library. However, multilingual documents are poorly supported on current computers and networks. There are two major reasons for this: first, the lack of a widely accepted character encoding system for multilingual documents, and second, the lack of multilingual character font sets on ordinary users' personal computers (PCs) and workstations (WSs). For example, ordinary PCs and WSs used in Japan have fonts for the characters in the JIS character set, which covers Japanese and ASCII characters, but do not have fonts for other languages such as Korean and Chinese. Regarding the first problem, multilingual character code sets such as Unicode (ISO/IEC 10646-1) and ISO-2022-JP-2, which are gradually spreading, are helpful for international information access. Regarding the second, however, the author considers it impractical to assume that a multilingual font set is installed on every PC and WS. In addition, from the viewpoint of Japanese libraries, library information such as an online public access catalog (OPAC) contains not only foreign-language characters but also characters from Japanese classics that are not included in the standard code sets and have been added locally by librarians, i.e., Gaiji. Gaiji are helpful for local users, who have the locally added font glyphs, but they are a serious obstacle for remote users accessing the OPAC via the Internet, who do not.
These considerations led the author to develop a multilingual browser that requires no multilingual font sets on user PCs and WSs. There are three alternatives for realizing such a browser: 1) to create a page image from a source document, 2) to replace all text other than ASCII with in-line images, character by character or string by string, and 3) to create and send a package, named Multilingual-HTML (MHTML), consisting of the source text string and the minimum set of font glyphs needed to display it. Each of these approaches can be realized as a filter located between a WWW document server and a user client. In a preliminary evaluation of the three methods, the author found that the image-based approaches (1 and 2) have the advantage that users need only a conventional WWW browser, but they are far worse than the MHTML approach in terms of the size of the data transferred over the network and the overhead of network communication. The disadvantage of MHTML, on the other hand, is that a user needs a viewer to display a packaged document on his or her terminal. This problem can be solved easily by implementing the viewer in Java as an applet that is loaded automatically into a WWW browser.
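The packaging idea behind the third alternative can be illustrated with a minimal sketch. This is not the author's implementation: the function names, the dictionary layout, and the 32-byte placeholder bitmap (standing in for a 16x16 one-bit glyph) are all assumptions made for illustration, and the actual MHTML format is not specified here.

```python
from typing import Callable, Dict


def build_mhtml_package(text: str, glyph_for: Callable[[str], bytes]) -> Dict[str, object]:
    """Bundle a source text with only the glyphs needed to display it.

    `glyph_for` is a hypothetical lookup that returns a bitmap for one
    character; a real server would read it from its font files.
    """
    # Each distinct non-ASCII character is collected once, so a character
    # that occurs many times in the text still costs only one glyph.
    needed = sorted({ch for ch in text if ord(ch) > 0x7F})
    return {"text": text, "glyphs": {ch: glyph_for(ch) for ch in needed}}


def placeholder_glyph(ch: str) -> bytes:
    # Assumption: a blank 16x16 one-bit bitmap (32 bytes) per character.
    return bytes(32)


package = build_mhtml_package("Tokyo 東京 / 京都", placeholder_glyph)
print(len(package["glyphs"]), "glyphs packaged")
```

Because the glyph set grows with the number of distinct characters rather than with the length of the text, such a package stays small compared with shipping one image per character or per page.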
The author has developed an MHTML browser that transforms a source HTML document into MHTML form and displays the transformed document on a user terminal. The transformation function is realized as a gateway server located between a WWW server and a user terminal running a conventional WWW browser, e.g., Netscape Navigator or Microsoft Internet Explorer. The display function is realized as a Java applet that is stored on the MHTML server and loaded automatically into the user terminal. To get a foreign-language document, a user sends a request to the MHTML server specifying the URL and the language of the document. The MHTML server sends a request, via a proxy, to the WWW server specified by the URL, retrieves the source document, and transforms it into MHTML form. The viewer then receives the MHTML document and displays it on the user terminal. Since all glyphs required to display a document are loaded from the MHTML server, no fonts for foreign characters need to be preinstalled.
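The request flow through the gateway can be sketched roughly as follows. The function `serve_mhtml`, its parameters, and the stub fetcher and transformer are hypothetical names introduced only for illustration; the sketch performs no network access and does not reproduce the actual server.

```python
from typing import Callable, Dict


def serve_mhtml(
    url: str,
    language: str,
    fetch: Callable[[str], str],
    transform: Callable[[str, str], Dict[str, object]],
) -> Dict[str, object]:
    """Handle one user request: retrieve the source, then convert it."""
    # Step 1: the gateway, not the user's browser, contacts the WWW server.
    source_html = fetch(url)
    # Step 2: the source is converted into a package for the viewer applet.
    return transform(source_html, language)


def fake_fetch(url: str) -> str:
    # Stand-in for an HTTP request made through a proxy.
    return "<p>안녕하세요</p>"


def fake_transform(html: str, language: str) -> Dict[str, object]:
    # Stand-in for the HTML-to-MHTML conversion; glyphs are left empty here.
    return {"language": language, "text": html, "glyphs": {}}


result = serve_mhtml("http://example.org/index.html", "ko", fake_fetch, fake_transform)
print(result["language"], result["text"])
```

Passing the fetcher and the transformer as arguments keeps the sketch testable without a network connection.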
The MHTML server adopts the multilingual character encoding system ISO-2022-JP-2 as its internal character encoding. This scheme has two advantages: a document written in multiple languages can be displayed, and, since ISO-2022-JP-2 is open-ended with respect to new character code sets, the MHTML server can likewise be extended to new languages. In addition, locally defined glyphs, i.e., Gaiji, can be displayed on a remote terminal by loading the glyphs into the MHTML server.
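As a small illustration of how ISO-2022-JP-2 carries several languages in one 7-bit byte stream, the following sketch uses the `iso2022_jp_2` codec from the Python standard library. It is an assumption-laden demonstration of the encoding itself, not of the MHTML server, and the sample string is invented.

```python
# A mixed Japanese and Korean string; ISO-2022-JP-2 (RFC 1554) switches
# between character sets with escape sequences, for example ESC $ B for
# JIS X 0208-1983 and ESC $ ( C for KS C 5601-1987.
original = "日本語 and 한국어"

encoded = original.encode("iso2022_jp_2")
decoded = encoded.decode("iso2022_jp_2")

# Every byte stays within the 7-bit range, so the stream passes through
# channels that are not 8-bit clean.
is_seven_bit = all(byte < 0x80 for byte in encoded)

print(encoded)
print(decoded == original, is_seven_bit)
```

Supporting a further character set amounts to recognizing one more designation sequence, which is the sense in which the scheme is open-ended.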
The MHTML server developed by the author has been open to the public as an experimental service since August 1996. The current implementation supports four languages: Japanese, Korean, Chinese (traditional and simplified), and Thai. So far it has received access requests for about 1500 pages.
A Multilingual Browser for WWW Documents and Its Server System