Converting a UTF-8 file to UTF-16

A C++ program needs to read a UTF-8 encoded file. Unfortunately, with char* it cannot get extended characters (♥ ♥ ♦ • ◘, etc.), and wchar_t* interprets them incorrectly. My plan for handling this is:

1) Create a new file.

2) Name it [original name]Utf-16.

3) Copy the original file into the new one, converting it on the fly.

4) Extract the data.

5) Delete this temporary file once it is no longer needed.

I'm stuck at 3). Is there a function somewhere along the lines of "FileUTF8toUTF16"?
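As far as I know there is no ready-made "FileUTF8toUTF16" in standard C++, but the temporary file in steps 1) to 5) may not be needed at all if the file is read as raw bytes and converted in memory. Below is a minimal portable sketch of that idea. The names utf8_to_utf16 and read_utf8_file_as_utf16 are hypothetical helpers written for this illustration, not library functions, and the decoder is hand-written rather than based on the Win32 API.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <fstream>
#include <iterator>
#include <stdexcept>
#include <string>

// Hypothetical helper: decode UTF-8 bytes into UTF-16 code units.
// A leading UTF-8 BOM (EF BB BF) is skipped. Malformed input throws.
std::u16string utf8_to_utf16(const std::string& in) {
    std::u16string out;
    std::size_t i = 0;
    if (in.size() >= 3 && in.compare(0, 3, "\xEF\xBB\xBF") == 0) {
        i = 3;  // skip the BOM
    }
    while (i < in.size()) {
        const unsigned char b0 = static_cast<unsigned char>(in[i]);
        std::uint32_t cp = 0;  // decoded code point
        int extra = 0;         // number of continuation bytes expected
        if (b0 < 0x80)                { cp = b0;        extra = 0; }
        else if ((b0 & 0xE0) == 0xC0) { cp = b0 & 0x1F; extra = 1; }
        else if ((b0 & 0xF0) == 0xE0) { cp = b0 & 0x0F; extra = 2; }
        else if ((b0 & 0xF8) == 0xF0) { cp = b0 & 0x07; extra = 3; }
        else throw std::runtime_error("invalid UTF-8 lead byte");

        if (static_cast<std::size_t>(extra) > in.size() - i - 1) {
            throw std::runtime_error("truncated UTF-8 sequence");
        }
        for (int k = 1; k <= extra; ++k) {
            const unsigned char b = static_cast<unsigned char>(in[i + k]);
            if ((b & 0xC0) != 0x80) {
                throw std::runtime_error("invalid UTF-8 continuation byte");
            }
            cp = (cp << 6) | (b & 0x3F);
        }
        i += static_cast<std::size_t>(extra) + 1;

        // Reject overlong forms, surrogate code points and out-of-range values.
        static const std::uint32_t min_cp[] = {0x0, 0x80, 0x800, 0x10000};
        if (cp < min_cp[extra] || cp > 0x10FFFF || (cp >= 0xD800 && cp <= 0xDFFF)) {
            throw std::runtime_error("invalid UTF-8 code point");
        }

        if (cp < 0x10000) {
            out.push_back(static_cast<char16_t>(cp));
        } else {
            assert(cp <= 0x10FFFF);  // guaranteed by the check above
            cp -= 0x10000;           // split into a surrogate pair
            out.push_back(static_cast<char16_t>(0xD800 + (cp >> 10)));
            out.push_back(static_cast<char16_t>(0xDC00 + (cp & 0x3FF)));
        }
    }
    return out;
}

// Hypothetical helper: read a whole file as bytes and convert it in memory.
std::u16string read_utf8_file_as_utf16(const std::string& path) {
    std::ifstream f(path, std::ios::binary);
    if (!f) {
        throw std::runtime_error("cannot open " + path);
    }
    const std::string bytes((std::istreambuf_iterator<char>(f)),
                            std::istreambuf_iterator<char>());
    return utf8_to_utf16(bytes);
}
```

With something like this, step 4) can work directly on the returned std::u16string, so nothing has to be created or deleted on disk. On Windows, where wchar_t is 16 bits wide, the same approach works with std::wstring.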

This is what I use:

    // First call: ask how many WCHARs the converted text needs.
    int nLenWide = MultiByteToWideChar(CP_UTF8, 0, (LPCSTR)(pData + nOffset),
                                       (int)(nDataLen - nOffset), NULL, 0);

    // Second call: convert straight into the CString's buffer.
    if (MultiByteToWideChar(CP_UTF8, 0, (LPCSTR)(pData + nOffset),
                            (int)(nDataLen - nOffset),
                            str.GetBuffer(nLenWide), nLenWide) != nLenWide)
    {
        str.ReleaseBuffer(0);
        ASSERT(false);
        return str;
    }
    str.ReleaseBuffer(nLenWide);
    return str;

Here pData is a BYTE pointer to the UTF-8 data, and nOffset is usually 3 (the length of the UTF-8 BOM, the bytes EF BB BF).
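Since not every UTF-8 file starts with a BOM, the offset is probably better computed than hard-coded. A small sketch of that check follows; utf8_bom_length is a hypothetical name chosen for this example, and it assumes the whole file is already in memory.

```cpp
#include <cassert>
#include <cstddef>

// Hypothetical helper: returns 3 if the buffer starts with the UTF-8 BOM
// (EF BB BF), otherwise 0. The result can be used as the offset at which
// the actual text begins.
std::size_t utf8_bom_length(const unsigned char* data, std::size_t len) {
    assert(data != nullptr || len == 0);  // precondition: valid buffer
    const bool has_bom = len >= 3 &&
                         data[0] == 0xEF &&
                         data[1] == 0xBB &&
                         data[2] == 0xBF;
    return has_bom ? 3 : 0;
}
```

In the snippet above this would amount to setting nOffset from the result of such a check on pData and nDataLen instead of assuming 3.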

This may help you: "Conversion between Unicode UTF-16 and UTF-8 in C++/Win32".

Code:

    //////////////////////////////////////////////////////////////////////////////
    //
    // *** Routines to convert between Unicode UTF-8 and Unicode UTF-16 ***
    //
    // By Giovanni Dicanio
    //
    // Last update: 2010, January 2nd
    //
    //
    // These routines use ::MultiByteToWideChar and ::WideCharToMultiByte
    // Win32 API functions to convert between Unicode UTF-8 and UTF-16.
    //
    // UTF-16 strings are stored in instances of CStringW.
    // UTF-8 strings are stored in instances of CStringA.
    //
    // On error, the conversion routines use AtlThrow to signal the
    // error condition.
    //
    // If input string pointers are NULL, empty strings are returned.
    //
    //
    // Prefixes used in these routines:
    // --------------------------------
    //
    //  - cch  : count of characters (CHAR's or WCHAR's)
    //  - cb   : count of bytes
    //  - psz  : pointer to a NUL-terminated string (CHAR* or WCHAR*)
    //  - str  : instance of CString(A/W) class
    //
    //
    //
    // Useful Web References:
    // ----------------------
    //
    // WideCharToMultiByte Function
    // http://msdn.microsoft.com/en-us/library/dd374130.aspx
    //
    // MultiByteToWideChar Function
    // http://msdn.microsoft.com/en-us/library/dd319072.aspx
    //
    // AtlThrow
    // http://msdn.microsoft.com/en-us/library/z325eyx0.aspx
    //
    //
    // Developed on VC9 (Visual Studio 2008 SP1)
    //
    //
    //////////////////////////////////////////////////////////////////////////////

    // Headers this listing appears to rely on (assumed; the original post
    // does not show its includes).
    #include <atlbase.h>    // ATL core: AtlThrow, ATLASSERT
    #include <atlstr.h>     // CStringA, CStringW
    #include <strsafe.h>    // StringCchLengthA, StringCchLengthW
    #include <limits.h>     // INT_MAX


    namespace UTF8Util
    {

    //----------------------------------------------------------------------------
    // FUNCTION: ConvertUTF8ToUTF16
    // DESC: Converts Unicode UTF-8 text to Unicode UTF-16 (Windows default).
    //----------------------------------------------------------------------------
    CStringW ConvertUTF8ToUTF16( __in const CHAR * pszTextUTF8 )
    {
        //
        // Special case of NULL or empty input string
        //
        if ( (pszTextUTF8 == NULL) || (*pszTextUTF8 == '\0') )
        {
            // Return empty string
            return L"";
        }

        //
        // Consider CHAR's count corresponding to total input string length,
        // including end-of-string (\0) character
        //
        const size_t cchUTF8Max = INT_MAX - 1;
        size_t cchUTF8;
        HRESULT hr = ::StringCchLengthA( pszTextUTF8, cchUTF8Max, &cchUTF8 );
        if ( FAILED( hr ) )
        {
            AtlThrow( hr );
        }

        // Consider also terminating \0
        ++cchUTF8;

        // Convert to 'int' for use with MultiByteToWideChar API
        int cbUTF8 = static_cast<int>( cchUTF8 );

        //
        // Get size of destination UTF-16 buffer, in WCHAR's
        //
        int cchUTF16 = ::MultiByteToWideChar(
            CP_UTF8,                // convert from UTF-8
            MB_ERR_INVALID_CHARS,   // error on invalid chars
            pszTextUTF8,            // source UTF-8 string
            cbUTF8,                 // total length of source UTF-8 string,
                                    // in CHAR's (= bytes), including end-of-string \0
            NULL,                   // unused - no conversion done in this step
            0                       // request size of destination buffer, in WCHAR's
            );
        ATLASSERT( cchUTF16 != 0 );
        if ( cchUTF16 == 0 )
        {
            AtlThrowLastWin32();
        }

        //
        // Allocate destination buffer to store UTF-16 string
        //
        CStringW strUTF16;
        WCHAR * pszUTF16 = strUTF16.GetBuffer( cchUTF16 );

        //
        // Do the conversion from UTF-8 to UTF-16
        //
        int result = ::MultiByteToWideChar(
            CP_UTF8,                // convert from UTF-8
            MB_ERR_INVALID_CHARS,   // error on invalid chars
            pszTextUTF8,            // source UTF-8 string
            cbUTF8,                 // total length of source UTF-8 string,
                                    // in CHAR's (= bytes), including end-of-string \0
            pszUTF16,               // destination buffer
            cchUTF16                // size of destination buffer, in WCHAR's
            );
        ATLASSERT( result != 0 );
        if ( result == 0 )
        {
            AtlThrowLastWin32();
        }

        // Release internal CString buffer
        strUTF16.ReleaseBuffer();

        // Return resulting UTF16 string
        return strUTF16;
    }


    //----------------------------------------------------------------------------
    // FUNCTION: ConvertUTF16ToUTF8
    // DESC: Converts Unicode UTF-16 (Windows default) text to Unicode UTF-8.
    //----------------------------------------------------------------------------
    CStringA ConvertUTF16ToUTF8( __in const WCHAR * pszTextUTF16 )
    {
        //
        // Special case of NULL or empty input string
        //
        if ( (pszTextUTF16 == NULL) || (*pszTextUTF16 == L'\0') )
        {
            // Return empty string
            return "";
        }

        //
        // Consider WCHAR's count corresponding to total input string length,
        // including end-of-string (L'\0') character.
        //
        const size_t cchUTF16Max = INT_MAX - 1;
        size_t cchUTF16;
        HRESULT hr = ::StringCchLengthW( pszTextUTF16, cchUTF16Max, &cchUTF16 );
        if ( FAILED( hr ) )
        {
            AtlThrow( hr );
        }

        // Consider also terminating \0
        ++cchUTF16;

        //
        // WC_ERR_INVALID_CHARS flag is set to fail if invalid input character
        // is encountered.
        // This flag is supported on Windows Vista and later.
        // Don't use it on Windows XP and previous.
        //
    #if (WINVER >= 0x0600)
        DWORD dwConversionFlags = WC_ERR_INVALID_CHARS;
    #else
        DWORD dwConversionFlags = 0;
    #endif

        //
        // Get size of destination UTF-8 buffer, in CHAR's (= bytes)
        //
        int cbUTF8 = ::WideCharToMultiByte(
            CP_UTF8,                // convert to UTF-8
            dwConversionFlags,      // specify conversion behavior
            pszTextUTF16,           // source UTF-16 string
            static_cast<int>( cchUTF16 ),   // total source string length, in WCHAR's,
                                            // including end-of-string \0
            NULL,                   // unused - no conversion required in this step
            0,                      // request buffer size
            NULL, NULL              // unused
            );
        ATLASSERT( cbUTF8 != 0 );
        if ( cbUTF8 == 0 )
        {
            AtlThrowLastWin32();
        }

        //
        // Allocate destination buffer for UTF-8 string
        //
        CStringA strUTF8;
        int cchUTF8 = cbUTF8; // sizeof(CHAR) = 1 byte
        CHAR * pszUTF8 = strUTF8.GetBuffer( cchUTF8 );

        //
        // Do the conversion from UTF-16 to UTF-8
        //
        int result = ::WideCharToMultiByte(
            CP_UTF8,                // convert to UTF-8
            dwConversionFlags,      // specify conversion behavior
            pszTextUTF16,           // source UTF-16 string
            static_cast<int>( cchUTF16 ),   // total source string length, in WCHAR's,
                                            // including end-of-string \0
            pszUTF8,                // destination buffer
            cbUTF8,                 // destination buffer size, in bytes
            NULL, NULL              // unused
            );
        ATLASSERT( result != 0 );
        if ( result == 0 )
        {
            AtlThrowLastWin32();
        }

        // Release internal CString buffer
        strUTF8.ReleaseBuffer();

        // Return resulting UTF-8 string
        return strUTF8;
    }

    } // namespace UTF8Util

    //////////////////////////////////////////////////////////////////////////////
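The routines above depend on ATL and the Win32 API. For comparison, here is a rough portable sketch of the UTF-16 to UTF-8 direction using only the standard library. The function name utf16_to_utf8 is made up for this example, and the code is not a drop-in replacement for ConvertUTF16ToUTF8: it works on std::u16string and throws std::runtime_error on unpaired surrogates instead of using AtlThrow.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <stdexcept>
#include <string>

// Hypothetical helper: encode UTF-16 code units as UTF-8 bytes.
// Unpaired surrogates are treated as an error.
std::string utf16_to_utf8(const std::u16string& in) {
    std::string out;
    for (std::size_t i = 0; i < in.size(); ++i) {
        std::uint32_t cp = in[i];
        if (cp >= 0xD800 && cp <= 0xDBFF) {
            // High surrogate: it must be followed by a low surrogate.
            if (i + 1 >= in.size() || in[i + 1] < 0xDC00 || in[i + 1] > 0xDFFF) {
                throw std::runtime_error("unpaired high surrogate");
            }
            cp = 0x10000 + ((cp - 0xD800) << 10) + (in[i + 1] - 0xDC00);
            ++i;  // the low surrogate has been consumed as well
        } else if (cp >= 0xDC00 && cp <= 0xDFFF) {
            throw std::runtime_error("unpaired low surrogate");
        }
        assert(cp <= 0x10FFFF);  // a surrogate pair cannot exceed this

        if (cp < 0x80) {
            out += static_cast<char>(cp);
        } else if (cp < 0x800) {
            out += static_cast<char>(0xC0 | (cp >> 6));
            out += static_cast<char>(0x80 | (cp & 0x3F));
        } else if (cp < 0x10000) {
            out += static_cast<char>(0xE0 | (cp >> 12));
            out += static_cast<char>(0x80 | ((cp >> 6) & 0x3F));
            out += static_cast<char>(0x80 | (cp & 0x3F));
        } else {
            out += static_cast<char>(0xF0 | (cp >> 18));
            out += static_cast<char>(0x80 | ((cp >> 12) & 0x3F));
            out += static_cast<char>(0x80 | ((cp >> 6) & 0x3F));
            out += static_cast<char>(0x80 | (cp & 0x3F));
        }
    }
    return out;
}
```

On Windows itself the API-based routines are probably the better choice, since they are maintained by the platform; a sketch like this is mainly useful when the same code also has to build elsewhere.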