UTF8
A port of phputf8 to a unified set of files. Provides multi-byte aware replacement string functions.
For UTF-8 support to work correctly, the following requirements must be met:
API - Core_UTF8
- UTF8::$called
- UTF8::clean - Recursively cleans arrays, objects, and strings.
- UTF8::is_ascii - Tests whether a string contains only 7-bit ASCII bytes.
- UTF8::strip_ascii_ctrl - Strips out device control codes in the ASCII range.
- UTF8::strip_non_ascii - Strips out all non-7bit ASCII bytes.
- UTF8::transliterate_to_ascii - Replaces special/accented UTF-8 characters by ASCII-7 "equivalents".
- UTF8::strlen - Returns the length of the given string.
- UTF8::strpos - Finds position of first occurrence of a UTF-8 string.
- UTF8::strrpos - Finds position of last occurrence of a char in a UTF-8 string.
- UTF8::substr - Returns part of a UTF-8 string.
- UTF8::substr_replace - Replaces text within a portion of a UTF-8 string.
- UTF8::strtolower - Makes a UTF-8 string lowercase.
- UTF8::strtoupper - Makes a UTF-8 string uppercase.
- UTF8::ucfirst - Makes a UTF-8 string's first character uppercase.
- UTF8::ucwords - Makes the first character of every word in a UTF-8 string uppercase.
- UTF8::strcasecmp - Case-insensitive UTF-8 string comparison.
- UTF8::str_ireplace - Returns a string or an array with all occurrences of search in subject (ignoring case) replaced with the given replace value.
- UTF8::stristr - Case-insensitive UTF-8 version of strstr.
- UTF8::strspn - Finds the length of the initial segment matching mask.
- UTF8::strcspn - Finds the length of the initial segment not matching mask.
- UTF8::str_pad - Pads a UTF-8 string to a certain length with another string.
- UTF8::str_split - Converts a UTF-8 string to an array.
- UTF8::strrev - Reverses a UTF-8 string. This is a UTF8-aware version of [strrev](http://php.net/strrev).
- UTF8::trim - Strips whitespace (or other UTF-8 characters) from the beginning and end of a string.
- UTF8::ltrim - Strips whitespace (or other UTF-8 characters) from the beginning of a string.
- UTF8::rtrim - Strips whitespace (or other UTF-8 characters) from the end of a string.
- UTF8::ord - Returns the unicode ordinal for a character.
- UTF8::to_unicode - Takes a UTF-8 string and returns an array of ints representing the Unicode characters.
- UTF8::from_unicode - Takes an array of ints representing the Unicode characters and returns a UTF-8 string.
Recursively cleans arrays, objects, and strings. Removes ASCII control codes and converts to the requested charset while silently discarding incompatible characters.
UTF8::clean($_GET); // Clean GET data
Parameter | Type | Description | Default |
---|---|---|---|
$var | mixed | Variable to clean | |
$charset | string | Character set, defaults to UTF-8 | string(5) "UTF-8" |

Returns: mixed
Tests whether a string contains only 7-bit ASCII bytes. This is used to determine when to use native functions or UTF-8 functions.
$ascii = UTF8::is_ascii($str);
Parameter | Type | Description | Default |
---|---|---|---|
$str | mixed | String or array of strings to check | |

Returns: boolean
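
The check described above decides whether byte-oriented native functions are safe to use. The following is a minimal sketch of such a test, assuming a byte-level regular expression is acceptable. `is_ascii_sketch` is a hypothetical name used for illustration and is not the library's implementation.

```php
<?php
// Hypothetical stand-in for UTF8::is_ascii; not the library's own code.
function is_ascii_sketch($str): bool
{
    // Accept an array of strings by joining them into one string first.
    if (is_array($str)) {
        $str = implode('', $str);
    }
    // Without the /u modifier PCRE matches bytes, so any byte outside
    // 0x00-0x7F means the string is not pure 7-bit ASCII.
    return preg_match('/[^\x00-\x7F]/', $str) === 0;
}

var_dump(is_ascii_sketch('hello'));         // bool(true)
var_dump(is_ascii_sketch("na\xC3\xAFve"));  // bool(false)
```

A caller could use the result to fall back to the faster native string functions whenever every input is plain ASCII.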
Strips out device control codes in the ASCII range.
$str = UTF8::strip_ascii_ctrl($str);
Parameter | Type | Description | Default |
---|---|---|---|
$str | string | String to clean | |

Returns: string
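
As an illustration of stripping control codes, here is a minimal sketch. The exact set of bytes removed is an assumption: this version drops 0x00-0x08, 0x0B, 0x0C, 0x0E-0x1F and 0x7F, and keeps tab, line feed and carriage return. `strip_ascii_ctrl_sketch` is a hypothetical name, and the library may use a different set.

```php
<?php
// Hypothetical stand-in for UTF8::strip_ascii_ctrl; the byte set is an assumption.
function strip_ascii_ctrl_sketch(string $str): string
{
    // Remove ASCII control bytes but keep \t (0x09), \n (0x0A) and \r (0x0D).
    return preg_replace('/[\x00-\x08\x0B\x0C\x0E-\x1F\x7F]+/', '', $str);
}

// NUL and BEL are removed, the trailing newline survives.
var_dump(strip_ascii_ctrl_sketch("a\x00b\x07c\n") === "abc\n"); // bool(true)
```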
Strips out all non-7bit ASCII bytes.
$str = UTF8::strip_non_ascii($str);
Parameter | Type | Description | Default |
---|---|---|---|
$str | string | String to clean | |

Returns: string
Replaces special/accented UTF-8 characters by ASCII-7 "equivalents".
$ascii = UTF8::transliterate_to_ascii($utf8);
Parameter | Type | Description | Default |
---|---|---|---|
$str | string | String to transliterate | |
$case | integer | -1 lowercase only, +1 uppercase only, 0 both cases | integer 0 |

Returns: string
Returns the length of the given string. This is a UTF8-aware version of strlen.
$length = UTF8::strlen($str);
Parameter | Type | Description | Default |
---|---|---|---|
$str | string | String being measured for length | |

Returns: integer
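
To show why a UTF-8-aware length matters, this sketch compares PHP's byte-counting `strlen` with a character count. It assumes the mbstring extension is loaded and uses `mb_strlen` as a stand-in for the behaviour described above; it does not call the `UTF8` class itself.

```php
<?php
// "naïve" written with explicit UTF-8 bytes so the file encoding does not matter.
$str = "na\xC3\xAFve";

$bytes = strlen($str);              // native strlen counts bytes
$chars = mb_strlen($str, 'UTF-8');  // mbstring counts characters

printf("%d bytes, %d characters\n", $bytes, $chars); // prints "6 bytes, 5 characters"
```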
Finds position of first occurrence of a UTF-8 string. This is a UTF8-aware version of strpos.
$position = UTF8::strpos($str, $search);
Parameter | Type | Description | Default |
---|---|---|---|
$str | string | Haystack | |
$search | string | Needle | |
$offset | integer | Offset from which character in haystack to start searching | integer 0 |

Returns:
- integer - position of needle
- boolean - FALSE if the needle is not found

Finds position of last occurrence of a char in a UTF-8 string. This is a UTF8-aware version of strrpos.
$position = UTF8::strrpos($str, $search);
Parameter | Type | Description | Default |
---|---|---|---|
$str | string | Haystack | |
$search | string | Needle | |
$offset | integer | Offset from which character in haystack to start searching | integer 0 |

Returns:
- integer - position of needle
- boolean - FALSE if the needle is not found

Returns part of a UTF-8 string. This is a UTF8-aware version of substr.
$sub = UTF8::substr($str, $offset);
Parameter | Type | Description | Default |
---|---|---|---|
$str | string | Input string | |
$offset | integer | Offset | |
$length | integer | Length limit | null |

Returns: string
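
The difference between byte offsets and character offsets can be seen in the sketch below. It assumes the mbstring extension is available and uses `mb_substr` to stand in for the character-based behaviour described above; it is not the library's code.

```php
<?php
$str = "\xC3\xA9t\xC3\xA9"; // "été": 3 characters, 5 bytes

$byte_cut = substr($str, 0, 1);             // "\xC3": only half of the first character
$char_cut = mb_substr($str, 0, 1, 'UTF-8'); // "\xC3\xA9": the whole first character

var_dump(mb_check_encoding($byte_cut, 'UTF-8')); // bool(false)
var_dump(mb_check_encoding($char_cut, 'UTF-8')); // bool(true)
```

Cutting by bytes can leave an invalid sequence behind, which is the problem a UTF-8-aware substr avoids.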
Replaces text within a portion of a UTF-8 string. This is a UTF8-aware version of substr_replace.
$str = UTF8::substr_replace($str, $replacement, $offset);
Parameter | Type | Description | Default |
---|---|---|---|
$str | string | Input string | |
$replacement | string | Replacement string | |
$offset | integer | Offset | |
$length | unknown | | null |

Returns: string
Makes a UTF-8 string lowercase. This is a UTF8-aware version of strtolower.
$str = UTF8::strtolower($str);
Parameter | Type | Description | Default |
---|---|---|---|
$str | string | Mixed case string | |

Returns: string
Makes a UTF-8 string uppercase. This is a UTF8-aware version of strtoupper.
$str = UTF8::strtoupper($str);
Parameter | Type | Description | Default |
---|---|---|---|
$str | string | Mixed case string | |

Returns: string
Makes a UTF-8 string's first character uppercase. This is a UTF8-aware version of ucfirst.
$str = UTF8::ucfirst($str);
Parameter | Type | Description | Default |
---|---|---|---|
$str | string | Mixed case string | |

Returns: string
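
One way to uppercase only the first character is to split it off by character rather than by byte. The sketch below assumes the mbstring extension is loaded; `ucfirst_sketch` is a hypothetical helper and not the library's implementation.

```php
<?php
// Hypothetical stand-in for UTF8::ucfirst, built on mbstring.
function ucfirst_sketch(string $str): string
{
    // Take the first character (not the first byte) and the remainder.
    $first = mb_substr($str, 0, 1, 'UTF-8');
    $rest  = mb_substr($str, 1, null, 'UTF-8');

    // Uppercase only the first character and reattach the rest unchanged.
    return mb_strtoupper($first, 'UTF-8') . $rest;
}

// "école" becomes "École" (é is C3 A9, É is C3 89 in UTF-8).
var_dump(ucfirst_sketch("\xC3\xA9cole") === "\xC3\x89cole"); // bool(true)
```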
Makes the first character of every word in a UTF-8 string uppercase. This is a UTF8-aware version of ucwords.
$str = UTF8::ucwords($str);
Parameter | Type | Description | Default |
---|---|---|---|
$str | string | Mixed case string | |

Returns: string
Case-insensitive UTF-8 string comparison. This is a UTF8-aware version of strcasecmp.
$compare = UTF8::strcasecmp($str1, $str2);
Parameter | Type | Description | Default |
---|---|---|---|
$str1 | string | String to compare | |
$str2 | string | String to compare | |

Returns:
- integer - less than 0 if str1 is less than str2
- integer - greater than 0 if str1 is greater than str2
- integer - 0 if they are equal

Returns a string or an array with all occurrences of search in subject (ignoring case) replaced with the given replace value. This is a UTF8-aware version of str_ireplace.
Note: this function is very slow compared to the native version. Avoid using it when possible.
Parameter | Type | Description | Default |
---|---|---|---|
$search | string\|array | Text to replace | |
$replace | string\|array | Replacement text | |
$str | string\|array | Subject text | |
$count | integer | Receives the number of matched and replaced needles; passed by reference | null |

Returns:
- string - if the input was a string
- array - if the input was an array

Case-insensitive UTF-8 version of strstr. Returns all of input string from the first occurrence of needle to the end. This is a UTF8-aware version of stristr.
$found = UTF8::stristr($str, $search);
Parameter | Type | Description | Default |
---|---|---|---|
$str | string | Input string | |
$search | string | Needle | |

Returns:
- string - matched substring if found
- FALSE - if the substring was not found

Finds the length of the initial segment matching mask. This is a UTF8-aware version of strspn.
$found = UTF8::strspn($str, $mask);
Parameter | Type | Description | Default |
---|---|---|---|
$str | string | Input string | |
$mask | string | Mask for search | |
$offset | integer | Start position of the string to examine | null |
$length | integer | Length of the string to examine | null |

Returns: integer - length of the initial segment that contains characters in the mask

Finds the length of the initial segment not matching mask. This is a UTF8-aware version of strcspn.
$found = UTF8::strcspn($str, $mask);
Parameter | Type | Description | Default |
---|---|---|---|
$str | string | Input string | |
$mask | string | Mask for search | |
$offset | integer | Start position of the string to examine | null |
$length | integer | Length of the string to examine | null |

Returns: integer - length of the initial segment that contains characters not in the mask

Pads a UTF-8 string to a certain length with another string. This is a UTF8-aware version of str_pad.
$str = UTF8::str_pad($str, $length);
Parameter | Type | Description | Default |
---|---|---|---|
$str | string | Input string | |
$final_str_length | integer | Desired string length after padding | |
$pad_str | string | String to use as padding | string(1) " " |
$pad_type | string | Padding type: STR_PAD_RIGHT, STR_PAD_LEFT, or STR_PAD_BOTH | integer 1 |

Returns: string
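
Native `str_pad` measures the target length in bytes, so multi-byte input ends up under-padded. The sketch below shows right-padding by character count only. It assumes the mbstring extension is loaded; `str_pad_right_sketch` is a hypothetical helper, and left and both-sided padding are left out for brevity.

```php
<?php
// Hypothetical right-padding helper that counts characters, not bytes.
function str_pad_right_sketch(string $str, int $length, string $pad = ' '): string
{
    // Number of characters still needed to reach the requested length.
    $missing = $length - mb_strlen($str, 'UTF-8');
    if ($missing <= 0 || $pad === '') {
        return $str;
    }

    // Repeat the pad string often enough, then cut it to the exact size.
    $pad_len = mb_strlen($pad, 'UTF-8');
    $padding = str_repeat($pad, (int) ceil($missing / $pad_len));

    return $str . mb_substr($padding, 0, $missing, 'UTF-8');
}

$native = str_pad("\xC3\xA9", 3, '*');              // "é*": 3 bytes, only 2 characters
$aware  = str_pad_right_sketch("\xC3\xA9", 3, '*'); // "é**": 3 characters
```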
Converts a UTF-8 string to an array. This is a UTF8-aware version of str_split.
$array = UTF8::str_split($str);
Parameter | Type | Description | Default |
---|---|---|---|
$str | string | Input string | |
$split_length | integer | Maximum length of each chunk | integer 1 |

Returns: array
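
Splitting by character instead of by byte can be sketched with a Unicode-mode regular expression. `str_split_sketch` is a hypothetical name; throwing on invalid UTF-8 is this sketch's own choice and not documented library behaviour.

```php
<?php
// Hypothetical stand-in for UTF8::str_split, using PCRE's /u modifier.
function str_split_sketch(string $str, int $split_length = 1): array
{
    // An empty pattern in Unicode mode splits between every character.
    $chars = preg_split('//u', $str, -1, PREG_SPLIT_NO_EMPTY);
    if ($chars === false) {
        throw new InvalidArgumentException('Input is not valid UTF-8');
    }
    if ($split_length <= 1) {
        return $chars;
    }

    // Regroup single characters into chunks of the requested length.
    return array_map(
        function (array $chunk) {
            return implode('', $chunk);
        },
        array_chunk($chars, $split_length)
    );
}

$single = str_split_sketch("a\xC3\xA9b");    // three one-character strings
$pairs  = str_split_sketch("a\xC3\xA9b", 2); // "aé" and "b"
```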
Reverses a UTF-8 string. This is a UTF8-aware version of strrev.
$str = UTF8::strrev($str);
Parameter | Type | Description | Default |
---|---|---|---|
$str | string | String to be reversed | |

Returns: string
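
Reversing byte by byte would swap the bytes inside each multi-byte character. A minimal sketch of a character-wise reversal follows; `strrev_sketch` is a hypothetical name, and it assumes the input is valid UTF-8.

```php
<?php
// Hypothetical stand-in for UTF8::strrev; assumes valid UTF-8 input.
function strrev_sketch(string $str): string
{
    // Split into characters, reverse their order, and join them again.
    $chars = preg_split('//u', $str, -1, PREG_SPLIT_NO_EMPTY);

    return implode('', array_reverse($chars));
}

$aware  = strrev_sketch("ab\xC3\xA9"); // "éba"
$native = strrev("ab\xC3\xA9");        // the bytes of "é" come out swapped
```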
Strips whitespace (or other UTF-8 characters) from the beginning and end of a string. This is a UTF8-aware version of trim.
$str = UTF8::trim($str);
Parameter | Type | Description | Default |
---|---|---|---|
$str | string | Input string | |
$charlist | string | String of characters to remove | null |

Returns: string
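
Native `trim` treats the character list as individual bytes, which can tear a multi-byte character apart. The sketch below trims whole characters with a Unicode-mode regular expression. `trim_sketch` is a hypothetical helper and not the library's implementation.

```php
<?php
// Hypothetical stand-in for UTF8::trim with a character list.
function trim_sketch(string $str, ?string $charlist = null): string
{
    // Without a character list, plain whitespace trimming is byte-safe.
    if ($charlist === null) {
        return trim($str);
    }

    // Escape the list for use inside a character class, then strip matching
    // characters from both ends. /u makes PCRE work on whole characters.
    $class  = preg_quote($charlist, '/');
    $result = preg_replace('/^[' . $class . ']+|[' . $class . ']+$/uD', '', $str);

    // preg_replace returns null on invalid UTF-8; this sketch then
    // returns the input unchanged.
    return $result === null ? $str : $result;
}

$native = trim("\xC3\xA0bc", "\xC3\xA9");        // strips the 0xC3 byte of "à"
$aware  = trim_sketch("\xC3\xA0bc", "\xC3\xA9"); // leaves "àbc" intact
```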
Strips whitespace (or other UTF-8 characters) from the beginning of a string. This is a UTF8-aware version of ltrim.
$str = UTF8::ltrim($str);
Parameter | Type | Description | Default |
---|---|---|---|
$str | string | Input string | |
$charlist | string | String of characters to remove | null |

Returns: string
Strips whitespace (or other UTF-8 characters) from the end of a string. This is a UTF8-aware version of rtrim.
$str = UTF8::rtrim($str);
Parameter | Type | Description | Default |
---|---|---|---|
$str | string | Input string | |
$charlist | string | String of characters to remove | null |

Returns: string
Returns the unicode ordinal for a character. This is a UTF8-aware version of ord.
$digit = UTF8::ord($character);
Parameter | Type | Description | Default |
---|---|---|---|
$chr | string | UTF-8 encoded character | |

Returns: integer
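
How a code point is recovered from a UTF-8 sequence can be sketched by hand. `ord_sketch` is a hypothetical name; it reads the lead byte to find the sequence length and does not validate continuation bytes, so it should be read as an illustration of the bit layout and not as the library's code.

```php
<?php
// Hypothetical stand-in for UTF8::ord; continuation bytes are not validated.
function ord_sketch(string $chr): int
{
    if ($chr === '') {
        throw new InvalidArgumentException('Empty string');
    }
    $b0 = ord($chr[0]);

    if ($b0 < 0x80) {               // 0xxxxxxx: one byte, plain ASCII
        return $b0;
    }
    if (($b0 & 0xE0) === 0xC0) {    // 110xxxxx: two-byte sequence
        return (($b0 & 0x1F) << 6) | (ord($chr[1]) & 0x3F);
    }
    if (($b0 & 0xF0) === 0xE0) {    // 1110xxxx: three-byte sequence
        return (($b0 & 0x0F) << 12)
            | ((ord($chr[1]) & 0x3F) << 6)
            | (ord($chr[2]) & 0x3F);
    }
    if (($b0 & 0xF8) === 0xF0) {    // 11110xxx: four-byte sequence
        return (($b0 & 0x07) << 18)
            | ((ord($chr[1]) & 0x3F) << 12)
            | ((ord($chr[2]) & 0x3F) << 6)
            | (ord($chr[3]) & 0x3F);
    }
    throw new InvalidArgumentException('Invalid UTF-8 lead byte');
}

var_dump(ord_sketch("\xC3\xA9")); // int(233), U+00E9
var_dump(ord("\xC3\xA9"));        // int(195), only the first byte
```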
Takes a UTF-8 string and returns an array of ints representing the Unicode characters. Astral planes are supported, i.e. the ints in the output can be > 0xFFFF. Occurrences of the BOM are ignored. Surrogates are not allowed.
$array = UTF8::to_unicode($str);
The Original Code is Mozilla Communicator client code. The Initial Developer of the Original Code is Netscape Communications Corporation. Portions created by the Initial Developer are Copyright (C) 1998 the Initial Developer. Ported to PHP by Henri Sivonen (hsivonen@iki.fi); see http://hsivonen.iki.fi/php-utf8/. Slight modifications to fit with the phputf8 library by Harry Fuecks (hfuecks@gmail.com).
Parameter | Type | Description | Default |
---|---|---|---|
$str | string | UTF-8 encoded string | |

Returns:
- array - unicode code points
- FALSE - if the string is invalid

Takes an array of ints representing the Unicode characters and returns a UTF-8 string. Astral planes are supported, i.e. the ints in the input can be > 0xFFFF. Occurrences of the BOM are ignored. Surrogates are not allowed.
$str = UTF8::from_unicode($array);
The Original Code is Mozilla Communicator client code. The Initial Developer of the Original Code is Netscape Communications Corporation. Portions created by the Initial Developer are Copyright (C) 1998 the Initial Developer. Ported to PHP by Henri Sivonen (hsivonen@iki.fi); see http://hsivonen.iki.fi/php-utf8/. Slight modifications to fit with the phputf8 library by Harry Fuecks (hfuecks@gmail.com).
Parameter | Type | Description | Default |
---|---|---|---|
$arr | array | Unicode code points representing a string | |

Returns:
- string - UTF-8 string of characters
- boolean - FALSE if a code point cannot be found
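
The encoding direction can be sketched by hand as well. `from_unicode_sketch` is a hypothetical name and not the library's code. Following the description above, it skips the BOM (U+FEFF) and rejects surrogates; returning FALSE for out-of-range values is an assumption of this sketch.

```php
<?php
// Hypothetical stand-in for UTF8::from_unicode.
function from_unicode_sketch(array $points)
{
    $out = '';
    foreach ($points as $cp) {
        if ($cp === 0xFEFF) {
            continue; // byte order mark: ignored
        }
        // Surrogates and values outside the Unicode range are rejected.
        if ($cp < 0 || $cp > 0x10FFFF || ($cp >= 0xD800 && $cp <= 0xDFFF)) {
            return false;
        }
        if ($cp < 0x80) {            // one byte
            $out .= chr($cp);
        } elseif ($cp < 0x800) {     // two bytes
            $out .= chr(0xC0 | ($cp >> 6))
                  . chr(0x80 | ($cp & 0x3F));
        } elseif ($cp < 0x10000) {   // three bytes
            $out .= chr(0xE0 | ($cp >> 12))
                  . chr(0x80 | (($cp >> 6) & 0x3F))
                  . chr(0x80 | ($cp & 0x3F));
        } else {                     // four bytes (astral planes)
            $out .= chr(0xF0 | ($cp >> 18))
                  . chr(0x80 | (($cp >> 12) & 0x3F))
                  . chr(0x80 | (($cp >> 6) & 0x3F))
                  . chr(0x80 | ($cp & 0x3F));
        }
    }
    return $out;
}

// "h", "é", "€" and U+1F600 encoded as one UTF-8 string.
$str = from_unicode_sketch([0x68, 0xE9, 0x20AC, 0x1F600]);
```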