Write a JavaScript function that computes the byte length of a string

function getByteLength(str) {
  let byteLength = 0;
  for (let i = 0; i < str.length; i++) {
    // codePointAt reads a surrogate pair as one code point; charCodeAt would
    // return each UTF-16 half separately and never reach the 4-byte branch.
    const codePoint = str.codePointAt(i);
    if (codePoint <= 0x007f) {
      byteLength += 1;
    } else if (codePoint <= 0x07ff) {
      byteLength += 2;
    } else if (codePoint <= 0xffff) {
      byteLength += 3;
    } else {
      byteLength += 4; // For characters beyond BMP (Basic Multilingual Plane)
      i++; // Skip the low surrogate that this code point already covered
    }
  }
  return byteLength;
}


// Example usage:
const str1 = "hello";
const str2 = "你好世界";
const str3 = "emoji: 😂";
const str4 = "mixed: 你好😂world";


console.log(`"${str1}" byte length:`, getByteLength(str1)); // Output: 5
console.log(`"${str2}" byte length:`, getByteLength(str2)); // Output: 12
console.log(`"${str3}" byte length:`, getByteLength(str3)); // Output: 11
console.log(`"${str4}" byte length:`, getByteLength(str4)); // Output: 22


// ---  Blob method (lets the runtime do the UTF-8 encoding) ---
// Requires a global Blob (browsers and recent Node.js versions).
function getByteLengthBlob(str) {
  const blob = new Blob([str]);
  return blob.size;
}

console.log(`"${str1}" byte length (Blob):`, getByteLengthBlob(str1)); // Output: 5
console.log(`"${str2}" byte length (Blob):`, getByteLengthBlob(str2)); // Output: 12
console.log(`"${str3}" byte length (Blob):`, getByteLengthBlob(str3)); // Output: 11
console.log(`"${str4}" byte length (Blob):`, getByteLengthBlob(str4)); // Output: 22



// ---  TextEncoder API (most modern and generally recommended) ---

function getByteLengthEncoder(str) {
  const encoder = new TextEncoder();
  const encoded = encoder.encode(str);
  return encoded.length;
}

console.log(`"${str1}" byte length (TextEncoder):`, getByteLengthEncoder(str1)); // Output: 5
console.log(`"${str2}" byte length (TextEncoder):`, getByteLengthEncoder(str2)); // Output: 12
console.log(`"${str3}" byte length (TextEncoder):`, getByteLengthEncoder(str3)); // Output: 11
console.log(`"${str4}" byte length (TextEncoder):`, getByteLengthEncoder(str4)); // Output: 22

Explanation and Improvements:

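UTF-8 Handling: UTF-8 encodes each code point in 1 to 4 bytes, and getByteLength() picks the length from the code point's value. The value must be a full code point, not a UTF-16 code unit: charCodeAt() returns the two halves of a surrogate pair separately (each at most 0xffff), which would count an emoji such as 😂 as 6 bytes instead of 4, while codePointAt() returns the whole code point.
Blob Method: Creating a Blob and reading its size property hands the UTF-8 encoding to the runtime, so no hand-written loop is needed. It allocates a Blob object on every call, so it is not guaranteed to be faster than the loop; measure in your target environment if performance matters. This is getByteLengthBlob().
TextEncoder API: TextEncoder is designed specifically for encoding text to UTF-8 bytes and is the generally recommended approach. This is getByteLengthEncoder(), usually the best option unless you need to support very old browsers. Note that encode() builds the full byte array just to read its length.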
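
The difference between UTF-16 code units and code points is easiest to see on a single emoji. The sketch below is illustrative only: utf8Length is a hypothetical helper name (not part of the original answer), and it uses for...of, which walks a string by code point, as one possible way to avoid the surrogate-pair pitfall.

```javascript
// Hypothetical helper for illustration: counts UTF-8 bytes by code point.
function utf8Length(str) {
  let bytes = 0;
  // for...of iterates by code point, so a surrogate pair arrives as one item
  for (const ch of str) {
    const cp = ch.codePointAt(0);
    if (cp <= 0x7f) bytes += 1;
    else if (cp <= 0x7ff) bytes += 2;
    else if (cp <= 0xffff) bytes += 3;
    else bytes += 4;
  }
  return bytes;
}

const emoji = "😂"; // U+1F602, outside the BMP
console.log(emoji.length);         // 2 (UTF-16 code units)
console.log(emoji.charCodeAt(0));  // 55357 (0xD83D, the high surrogate only)
console.log(emoji.codePointAt(0)); // 128514 (0x1F602, the full code point)
console.log(utf8Length(emoji));    // 4
console.log(new TextEncoder().encode(emoji).length); // 4
```

A loop that applies the 1/2/3/4-byte thresholds to charCodeAt() values would treat the two surrogates as two 3-byte characters and report 6 for this string.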

Which method to use:

For environments without TextEncoder or Blob, the manual getByteLength() loop works, provided it counts code points rather than UTF-16 code units.
If you prefer to let the runtime do the encoding and a global Blob is available, getByteLengthBlob() works too; benchmark it before assuming it is faster on large strings.
In most cases, use getByteLengthEncoder(). It is efficient and well-supported in modern browsers.
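
One way to combine these choices is a wrapper that prefers TextEncoder and falls back to a manual count. This is a minimal sketch under stated assumptions: byteLengthOf and countUtf8BytesLegacy are hypothetical names, and the fallback is written in ES5 style with charCodeAt() plus explicit surrogate detection, on the assumption that an environment lacking TextEncoder may also lack codePointAt().

```javascript
// Hypothetical ES5-style fallback: counts UTF-8 bytes from UTF-16 code units.
function countUtf8BytesLegacy(str) {
  var bytes = 0;
  for (var i = 0; i < str.length; i++) {
    var unit = str.charCodeAt(i);
    if (unit <= 0x7f) {
      bytes += 1;
    } else if (unit <= 0x7ff) {
      bytes += 2;
    } else if (unit >= 0xd800 && unit <= 0xdbff && i + 1 < str.length) {
      var next = str.charCodeAt(i + 1);
      if (next >= 0xdc00 && next <= 0xdfff) {
        bytes += 4; // valid surrogate pair: one code point beyond the BMP
        i++;        // skip the low surrogate
      } else {
        bytes += 3; // lone high surrogate, counted like U+FFFD (3 bytes)
      }
    } else {
      bytes += 3;   // other BMP code units, including lone surrogates
    }
  }
  return bytes;
}

// Hypothetical wrapper: use TextEncoder when the runtime provides it.
function byteLengthOf(str) {
  if (typeof TextEncoder !== "undefined") {
    return new TextEncoder().encode(str).length;
  }
  return countUtf8BytesLegacy(str);
}

console.log(byteLengthOf("你好世界"));                   // 12
console.log(countUtf8BytesLegacy("mixed: 你好😂world")); // 22
```

Counting a lone surrogate as 3 bytes mirrors TextEncoder, which replaces unpaired surrogates with U+FFFD before encoding, so both paths should agree on malformed input as well.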


posted @ 2024-12-01 09:49 王铁柱6
