Jonathan Fung

    UTF-8 vs UTF-16 for CJK Data

    May 14, 2025 · 1 min read

    • unicode
    • Character encoding in corpus construction (2004) - Anthony McEnery, Zhonghua Xiao
      • Page 10: “While UTF-32 is wasteful of memory and disk space for all languages, UTF-16 also doubles the size of a file containing single-byte characters (such as English), though for CJK languages that have already used 2-byte encodings traditionally, the file size remains more or less the same.”
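
    The size comparison in the quote can be checked directly. The sketch below is an illustration rather than anything taken from the paper: the helper name `encoded_sizes`, the sample strings, and the choice of GB2312 as the "traditional 2-byte encoding" are all assumptions made for the example. It also adds UTF-8, which the quoted passage does not discuss, since UTF-8 needs three bytes for each CJK character in the Basic Multilingual Plane.

```python
def encoded_sizes(text: str) -> dict[str, int]:
    """Return the byte length of `text` under several encodings.

    The explicit little-endian variants are used so that Python does not
    prepend a byte order mark, which would inflate the counts.
    """
    return {
        "gb2312": len(text.encode("gb2312")),     # assumed legacy 2-byte CJK encoding
        "utf-8": len(text.encode("utf-8")),
        "utf-16": len(text.encode("utf-16-le")),
        "utf-32": len(text.encode("utf-32-le")),
    }


# Hypothetical sample strings, chosen only for illustration.
english = "hello"      # 5 ASCII characters
chinese = "你好世界"    # 4 CJK characters, all in the Basic Multilingual Plane

print(encoded_sizes(english))
# {'gb2312': 5, 'utf-8': 5, 'utf-16': 10, 'utf-32': 20}
print(encoded_sizes(chinese))
# {'gb2312': 8, 'utf-8': 12, 'utf-16': 8, 'utf-32': 16}
```

    For these samples, UTF-16 doubles the English text but matches the legacy 2-byte size for the Chinese text, while UTF-8 is about 50% larger than UTF-16 for the Chinese text. Real corpora usually mix CJK characters with ASCII punctuation, digits, and markup, so the actual ratio depends on the data.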
