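22 Text Processing

22.1 String Objects

22.1.1 The String Constructor

The String constructor:

is %String%.
is the initial value of the "String" property of the global object.
creates and initializes a new String object when called as a constructor.
performs a type conversion when called as a function rather than as a constructor.
may be used as the value of an extends clause of a class definition. Subclass constructors that intend to inherit the specified String behaviour must include a super call to the String constructor to create and initialize the subclass instance with a [[StringData]] internal slot.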

22.1.1.1 String ( `value` )

This function performs the following steps when called:

If value is not present, then
1. Let s be the empty String.
Else,
1. If NewTarget is undefined and value is a Symbol, return SymbolDescriptiveString(value).
2. Let s be ? ToString(value).
If NewTarget is undefined, return s.
Return StringCreate(s, ? GetPrototypeFromConstructor(NewTarget, "%String.prototype%")).
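
The branch on NewTarget can be observed from script. The following is a minimal sketch with illustrative variable names; it only shows the outcomes the steps above predict.

```javascript
// Called as a function, String converts; called with `new`, it wraps.
const asFunction = String(123);          // primitive string "123"
const asConstructor = new String(123);   // String object wrapping "123"
const noArgument = String();             // value not present: empty string
const symbolText = String(Symbol("id")); // SymbolDescriptiveString: "Symbol(id)"

// With `new`, a Symbol reaches ToString(value), which throws a TypeError.
let constructorThrew = false;
try {
  new String(Symbol("id"));
} catch (err) {
  constructorThrew = err instanceof TypeError;
}
```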

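22.1.2 Properties of the String Constructor

The String constructor:

has a [[Prototype]] internal slot whose value is %Function.prototype%.
has the following properties:

22.1.2.1 String.fromCharCode ( ...`codeUnits` )

This function may be called with any number of arguments which form the rest parameter codeUnits.

It performs the following steps when called:

Let result be the empty String.
For each element next of codeUnits, do
1. Let nextCU be the code unit whose numeric value is ℝ(? ToUint16(next)).
2. Set result to the string-concatenation of result and nextCU.
Return result.

The "length" property of this function is 1_𝔽.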

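22.1.2.2 String.fromCodePoint ( ...`codePoints` )

This function may be called with any number of arguments which form the rest parameter codePoints.

It performs the following steps when called:

Let result be the empty String.
For each element next of codePoints, do
1. Let nextCP be ? ToNumber(next).
2. If nextCP is not an integral Number, throw a RangeError exception.
3. If ℝ(nextCP) < 0 or ℝ(nextCP) > 0x10FFFF, throw a RangeError exception.
4. Set result to the string-concatenation of result and UTF16EncodeCodePoint(ℝ(nextCP)).
Assert: If codePoints is empty, then result is the empty String.
Return result.

The "length" property of this function is 1_𝔽.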
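
The two factory functions differ in how they treat out-of-range input: fromCharCode wraps through ToUint16, while fromCodePoint rejects. A short sketch follows; throwsRangeError is a hypothetical helper introduced only for this illustration.

```javascript
// fromCharCode applies ToUint16, so 0x10041 wraps to 0x0041 ("A").
const wrapped = String.fromCharCode(0x10041);

// fromCodePoint encodes a supplementary code point as a surrogate pair.
const emoji = String.fromCodePoint(0x1f600);
const emojiLength = emoji.length; // two code units

// Hypothetical helper: reports whether fromCodePoint rejects the value.
function throwsRangeError(value) {
  try {
    String.fromCodePoint(value);
    return false;
  } catch (err) {
    return err instanceof RangeError;
  }
}

// Non-integral, negative, and too-large inputs are all rejected.
const rejected = [1.5, -1, 0x110000].map((v) => throwsRangeError(v));
```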

22.1.2.3 String.prototype

The initial value of String.prototype is the String prototype object.

This property has the attributes { [[Writable]]: false, [[Enumerable]]: false, [[Configurable]]: false }.

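22.1.2.4 String.raw ( `template`, ...`substitutions` )

This function may be called with a variable number of arguments. The first argument is template and the remainder of the arguments form the List substitutions.

It performs the following steps when called:

Let substitutionCount be the number of elements in substitutions.
Let cooked be ? ToObject(template).
Let literals be ? ToObject(? Get(cooked, "raw")).
Let literalCount be ? LengthOfArrayLike(literals).
If literalCount ≤ 0, return the empty String.
Let R be the empty String.
Let nextIndex be 0.
Repeat,
1. Let nextLiteralVal be ? Get(literals, ! ToString(𝔽(nextIndex))).
2. Let nextLiteral be ? ToString(nextLiteralVal).
3. Set R to the string-concatenation of R and nextLiteral.
4. If nextIndex + 1 = literalCount, return R.
5. If nextIndex < substitutionCount, then
  1. Let nextSubVal be substitutions[nextIndex].
  2. Let nextSub be ? ToString(nextSubVal).
  3. Set R to the string-concatenation of R and nextSub.
6. Set nextIndex to nextIndex + 1.

Note

This function is intended for use as a tag function of a Tagged Template (13.3.11). When called as such, the first argument will be a well formed template object and the rest parameter will contain the substitution values.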
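
As an illustration of the loop above, the sketch below calls String.raw both as a tag and directly with a hand-built object that has a "raw" property. The variable names are illustrative only.

```javascript
// As a tag: the raw literal keeps the backslash and the letter n.
const asTag = String.raw`a\n${1 + 1}b`;

// Called directly: literals and substitutions are interleaved, and the
// loop returns after the last literal, so the extra substitution 3 is unused.
const direct = String.raw({ raw: ["x", "y", "z"] }, 1, 2, 3);

// literalCount ≤ 0 returns the empty String before any substitution is read.
const empty = String.raw({ raw: [] }, "ignored");
```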

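22.1.3 Properties of the String Prototype Object

The String prototype object:

is %String.prototype%.
is a String exotic object and has the internal methods specified for such objects.
has a [[StringData]] internal slot whose value is the empty String.
has a "length" property whose initial value is +0_𝔽 and whose attributes are { [[Writable]]: false, [[Enumerable]]: false, [[Configurable]]: false }.
has a [[Prototype]] internal slot whose value is %Object.prototype%.

Unless explicitly stated otherwise, the methods of the String prototype object defined below are not generic and the this value passed to them must be either a String value or an object that has a [[StringData]] internal slot that has been initialized to a String value.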

22.1.3.1 String.prototype.at ( `index` )

Let O be the this value.
Perform ? RequireObjectCoercible(O).
Let S be ? ToString(O).
Let len be the length of S.
Let relativeIndex be ? ToIntegerOrInfinity(index).
If relativeIndex ≥ 0, then
1. Let k be relativeIndex.
Else,
1. Let k be len + relativeIndex.
If k < 0 or k ≥ len, return undefined.
Return the substring of S from k to k + 1.
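
A brief sketch of the relative-index arithmetic, assuming a runtime that implements String.prototype.at; the sample word is arbitrary.

```javascript
const word = "spec"; // len = 4

const first = word.at(0);        // k = 0
const last = word.at(-1);        // k = 4 + (-1) = 3
const outOfRange = word.at(4);   // k ≥ len: undefined
const tooNegative = word.at(-5); // k = -1 < 0: undefined
```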

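22.1.3.2 String.prototype.charAt ( `pos` )

Note 1

This method returns a single element String containing the code unit at index pos within the String value resulting from converting this object to a String. If there is no element at that index, the result is the empty String. The result is a String value, not a String object.

If pos is an integral Number, then the result of x.charAt(pos) is equivalent to the result of x.substring(pos, pos + 1).

This method performs the following steps when called:

Let O be the this value.
Perform ? RequireObjectCoercible(O).
Let S be ? ToString(O).
Let position be ? ToIntegerOrInfinity(pos).
Let size be the length of S.
If position < 0 or position ≥ size, return the empty String.
Return the substring of S from position to position + 1.

Note 2

This method is intentionally generic; it does not require that its this value be a String object. Therefore, it can be transferred to other kinds of objects for use as a method.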

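22.1.3.3 String.prototype.charCodeAt ( `pos` )

Note 1

This method returns a Number (a non-negative integral Number less than 2¹⁶) that is the numeric value of the code unit at index pos within the String resulting from converting this object to a String. If there is no element at that index, the result is NaN.

This method performs the following steps when called:

Let O be the this value.
Perform ? RequireObjectCoercible(O).
Let S be ? ToString(O).
Let position be ? ToIntegerOrInfinity(pos).
Let size be the length of S.
If position < 0 or position ≥ size, return NaN.
Return the Number value for the numeric value of the code unit at index position within the String S.

Note 2

This method is intentionally generic; it does not require that its this value be a String object. Therefore, it can be transferred to other kinds of objects for use as a method.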

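22.1.3.4 String.prototype.codePointAt ( `pos` )

Note 1

This method returns a non-negative integral Number less than or equal to 0x10FFFF_𝔽 that is the numeric value of the UTF-16 encoded code point (6.1.4) starting at the string element at index pos within the String resulting from converting this object to a String. If there is no element at that index, the result is undefined. If a valid UTF-16 surrogate pair does not begin at pos, the result is the code unit at pos.

This method performs the following steps when called:

Let O be the this value.
Perform ? RequireObjectCoercible(O).
Let S be ? ToString(O).
Let position be ? ToIntegerOrInfinity(pos).
Let size be the length of S.
If position < 0 or position ≥ size, return undefined.
Let cp be CodePointAt(S, position).
Return 𝔽(cp.[[CodePoint]]).

Note 2

This method is intentionally generic; it does not require that its this value be a String object. Therefore, it can be transferred to other kinds of objects for use as a method.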
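
charAt, charCodeAt, and codePointAt share the same bounds check but return different "missing" values. The sketch below contrasts them on a string containing one surrogate pair; the sample text is an arbitrary choice.

```javascript
const text = "a\u{1F600}"; // length 3: "a" plus a surrogate pair

// Out-of-range index: each method has its own sentinel.
const missingChar = text.charAt(5);       // ""
const missingUnit = text.charCodeAt(5);   // NaN
const missingPoint = text.codePointAt(5); // undefined

// In range: code unit versus code point.
const leadUnit = text.charCodeAt(1);   // 0xD83D, the lead surrogate alone
const fullPoint = text.codePointAt(1); // 0x1F600, the whole pair
const trailOnly = text.codePointAt(2); // 0xDE00, no pair begins at index 2
```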

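22.1.3.5 String.prototype.concat ( ...`args` )

Note 1

When this method is called it returns the String value consisting of the code units of the this value (converted to a String) followed by the code units of each of the arguments converted to a String. The result is a String value, not a String object.

This method performs the following steps when called:

Let O be the this value.
Perform ? RequireObjectCoercible(O).
Let S be ? ToString(O).
Let R be S.
For each element next of args, do
1. Let nextString be ? ToString(next).
2. Set R to the string-concatenation of R and nextString.
Return R.

The "length" property of this method is 1_𝔽.

Note 2

This method is intentionally generic; it does not require that its this value be a String object. Therefore, it can be transferred to other kinds of objects for use as a method.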

22.1.3.6 String.prototype.constructor

The initial value of String.prototype.constructor is %String%.

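22.1.3.7 String.prototype.endsWith ( `searchString` [ , `endPosition` ] )

This method performs the following steps when called:

Let O be the this value.
Perform ? RequireObjectCoercible(O).
Let S be ? ToString(O).
Let isRegExp be ? IsRegExp(searchString).
If isRegExp is true, throw a TypeError exception.
Let searchStr be ? ToString(searchString).
Let len be the length of S.
If endPosition is undefined, let pos be len; else let pos be ? ToIntegerOrInfinity(endPosition).
Let end be the result of clamping pos between 0 and len.
Let searchLength be the length of searchStr.
If searchLength = 0, return true.
Let start be end - searchLength.
If start < 0, return false.
Let substring be the substring of S from start to end.
If substring is searchStr, return true.
Return false.

Note 1

This method returns true if the sequence of code units of searchString converted to a String is the same as the corresponding code units of this object (converted to a String) starting at endPosition - length(this). Otherwise it returns false.

Note 2

Throwing an exception if the first argument is a RegExp is specified in order to allow future editions to define extensions that allow such argument values.

Note 3

This method is intentionally generic; it does not require that its this value be a String object. Therefore, it can be transferred to other kinds of objects for use as a method.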
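
The effect of endPosition and of the RegExp check can be sketched as follows. The file name is a made-up sample value.

```javascript
const file = "report.final.txt";

const plain = file.endsWith(".txt");         // end = len
const truncated = file.endsWith(".final", 12); // compares against "report.final"
const emptySearch = file.endsWith("", 3);    // searchLength = 0: always true
const tooLong = "ab".endsWith("abc");        // start < 0: false

// A RegExp argument is rejected before any comparison happens.
let regexRejected = false;
try {
  file.endsWith(/txt/);
} catch (err) {
  regexRejected = err instanceof TypeError;
}
```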

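22.1.3.8 String.prototype.includes ( `searchString` [ , `position` ] )

This method performs the following steps when called:

Let O be the this value.
Perform ? RequireObjectCoercible(O).
Let S be ? ToString(O).
Let isRegExp be ? IsRegExp(searchString).
If isRegExp is true, throw a TypeError exception.
Let searchStr be ? ToString(searchString).
Let pos be ? ToIntegerOrInfinity(position).
Assert: If position is undefined, then pos is 0.
Let len be the length of S.
Let start be the result of clamping pos between 0 and len.
Let index be StringIndexOf(S, searchStr, start).
If index is not-found, return false.
Return true.

Note 1

If searchString appears as a substring of the result of converting this object to a String, at one or more indices that are greater than or equal to position, this function returns true; otherwise, it returns false. If position is undefined, 0 is assumed, so as to search all of the String.

Note 2

Throwing an exception if the first argument is a RegExp is specified in order to allow future editions to define extensions that allow such argument values.

Note 3

This method is intentionally generic; it does not require that its this value be a String object. Therefore, it can be transferred to other kinds of objects for use as a method.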

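22.1.3.9 String.prototype.indexOf ( `searchString` [ , `position` ] )

Note 1

If searchString appears as a substring of the result of converting this object to a String, at one or more indices that are greater than or equal to position, then the smallest such index is returned; otherwise, -1_𝔽 is returned. If position is undefined, +0_𝔽 is assumed, so as to search all of the String.

This method performs the following steps when called:

Let O be the this value.
Perform ? RequireObjectCoercible(O).
Let S be ? ToString(O).
Let searchStr be ? ToString(searchString).
Let pos be ? ToIntegerOrInfinity(position).
Assert: If position is undefined, then pos is 0.
Let len be the length of S.
Let start be the result of clamping pos between 0 and len.
Let result be StringIndexOf(S, searchStr, start).
If result is not-found, return -1_𝔽.
Return 𝔽(result).

Note 2

This method is intentionally generic; it does not require that its this value be a String object. Therefore, it can be transferred to other kinds of objects for use as a method.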

22.1.3.10 String.prototype.isWellFormed ( )

This method performs the following steps when called:

Let O be the this value.
Perform ? RequireObjectCoercible(O).
Let S be ? ToString(O).
Return IsStringWellFormedUnicode(S).

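22.1.3.11 String.prototype.lastIndexOf ( `searchString` [ , `position` ] )

Note 1

If searchString appears as a substring of the result of converting this object to a String at one or more indices that are smaller than or equal to position, then the greatest such index is returned; otherwise, -1_𝔽 is returned. If position is undefined, the length of the String value is assumed, so as to search all of the String.

This method performs the following steps when called:

Let O be the this value.
Perform ? RequireObjectCoercible(O).
Let S be ? ToString(O).
Let searchStr be ? ToString(searchString).
Let numPos be ? ToNumber(position).
Assert: If position is undefined, then numPos is NaN.
If numPos is NaN, let pos be +∞; otherwise, let pos be ! ToIntegerOrInfinity(numPos).
Let len be the length of S.
Let searchLen be the length of searchStr.
If len < searchLen, return -1_𝔽.
Let start be the result of clamping pos between 0 and len - searchLen.
Let result be StringLastIndexOf(S, searchStr, start).
If result is not-found, return -1_𝔽.
Return 𝔽(result).

Note 2

This method is intentionally generic; it does not require that its this value be a String object. Therefore, it can be transferred to other kinds of objects for use as a method.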
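
indexOf and lastIndexOf treat a missing or NaN position differently: indexOf starts at 0, while lastIndexOf starts at +∞ (clamped to len - searchLen). A small sketch with an arbitrary sample word:

```javascript
const canal = "canal"; // "a" occurs at indices 1 and 3

const lastAny = canal.lastIndexOf("a");         // pos = +∞: 3
const lastNaN = canal.lastIndexOf("a", NaN);    // NaN also means +∞: 3
const lastUpTo2 = canal.lastIndexOf("a", 2);    // greatest index ≤ 2: 1
const lastNegative = canal.lastIndexOf("a", -5); // start clamps to 0: -1
const lastEmpty = canal.lastIndexOf("");        // empty search matches at len: 5
const firstFrom2 = canal.indexOf("a", 2);       // smallest index ≥ 2: 3
```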

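22.1.3.12 String.prototype.localeCompare ( `that` [ , `reserved1` [ , `reserved2` ] ] )

An ECMAScript implementation that includes the ECMA-402 Internationalization API must implement this method as specified in the ECMA-402 specification. If an ECMAScript implementation does not include the ECMA-402 API the following specification of this method is used:

This method returns a Number other than NaN representing the result of an implementation-defined locale-sensitive String comparison of the this value (converted to a String S) with that (converted to a String thatValue). The result is intended to correspond with a sort order of String values according to conventions of the host environment's current locale, and will be negative when S is ordered before thatValue, positive when S is ordered after thatValue, and zero in all other cases (representing no relative ordering between S and thatValue).

Before performing the comparisons, this method performs the following steps to prepare the Strings:

Let O be the this value.
Perform ? RequireObjectCoercible(O).
Let S be ? ToString(O).
Let thatValue be ? ToString(that).

The meaning of the optional second and third parameters to this method are defined in the ECMA-402 specification; implementations that do not include ECMA-402 support must not assign any other interpretation to those parameter positions.

The actual return values are implementation-defined to permit encoding additional information in them, but this method, when considered as a function of two arguments, is required to define a total ordering on the set of all Strings. This method is also required to recognize and honour canonical equivalence according to the Unicode Standard, including returning +0_𝔽 when comparing distinguishable Strings that are canonically equivalent.

Note 1

This method itself is not directly suitable as an argument to Array.prototype.sort because the latter requires a function of two arguments.

Note 2

This method may rely on whatever language- and/or locale-sensitive comparison functionality is available to the ECMAScript environment from the host environment, and is intended to compare according to the conventions of the host environment's current locale. However, regardless of comparison capabilities, this method must recognize and honour canonical equivalence according to the Unicode Standard. For example, the following comparisons must all return +0_𝔽: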

// Å ANGSTROM SIGN vs.
// Å LATIN CAPITAL LETTER A + COMBINING RING ABOVE
"\u212B".localeCompare("A\u030A")

// Ω OHM SIGN vs.
// Ω GREEK CAPITAL LETTER OMEGA
"\u2126".localeCompare("\u03A9")

// ṩ LATIN SMALL LETTER S WITH DOT BELOW AND DOT ABOVE vs.
// ṩ LATIN SMALL LETTER S + COMBINING DOT ABOVE + COMBINING DOT BELOW
"\u1E69".localeCompare("s\u0307\u0323")

// ḍ̇ LATIN SMALL LETTER D WITH DOT ABOVE + COMBINING DOT BELOW vs.
// ḍ̇ LATIN SMALL LETTER D WITH DOT BELOW + COMBINING DOT ABOVE
"\u1E0B\u0323".localeCompare("\u1E0D\u0307")

// 가 HANGUL CHOSEONG KIYEOK + HANGUL JUNGSEONG A vs.
// 가 HANGUL SYLLABLE GA
"\u1100\u1161".localeCompare("\uAC00")

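For a definition and discussion of canonical equivalence see the Unicode Standard, chapters 2 and 3, as well as Unicode Standard Annex #15, Unicode Normalization Forms and Unicode Technical Note #5, Canonical Equivalence in Applications. Also see Unicode Technical Standard #10, Unicode Collation Algorithm.

It is recommended that this method should not honour Unicode compatibility equivalents or compatibility decompositions as defined in the Unicode Standard, chapter 3, section 3.7.

Note 3

This method is intentionally generic; it does not require that its this value be a String object. Therefore, it can be transferred to other kinds of objects for use as a method.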

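22.1.3.13 String.prototype.match ( `regexp` )

This method performs the following steps when called:

Let O be the this value.
Perform ? RequireObjectCoercible(O).
If regexp is an Object, then
1. Let matcher be ? GetMethod(regexp, %Symbol.match%).
2. If matcher is not undefined, then
  1. Return ? Call(matcher, regexp, « O »).
Let S be ? ToString(O).
Let rx be ? RegExpCreate(regexp, undefined).
Return ? Invoke(rx, %Symbol.match%, « S »).

Note

This method is intentionally generic; it does not require that its this value be a String object. Therefore, it can be transferred to other kinds of objects for use as a method.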

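22.1.3.14 String.prototype.matchAll ( `regexp` )

This method performs a regular expression match of the String representing the this value against regexp and returns an iterator that yields match results. Each match result is an Array containing the matched portion of the String as the first element, followed by the portions matched by any capturing groups. If the regular expression never matches, the returned iterator does not yield any match results.

It performs the following steps when called:

Let O be the this value.
Perform ? RequireObjectCoercible(O).
If regexp is an Object, then
1. Let isRegExp be ? IsRegExp(regexp).
2. If isRegExp is true, then
  1. Let flags be ? Get(regexp, "flags").
  2. Perform ? RequireObjectCoercible(flags).
  3. If ? ToString(flags) does not contain "g", throw a TypeError exception.
3. Let matcher be ? GetMethod(regexp, %Symbol.matchAll%).
4. If matcher is not undefined, then
  1. Return ? Call(matcher, regexp, « O »).
Let S be ? ToString(O).
Let rx be ? RegExpCreate(regexp, "g").
Return ? Invoke(rx, %Symbol.matchAll%, « S »).

Note 1

This method is intentionally generic; it does not require that its this value be a String object. Therefore, it can be transferred to other kinds of objects for use as a method.

Note 2

Similarly to String.prototype.split, String.prototype.matchAll is designed to typically act without mutating its inputs.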

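22.1.3.15 String.prototype.normalize ( [ `form` ] )

This method performs the following steps when called:

Let O be the this value.
Perform ? RequireObjectCoercible(O).
Let S be ? ToString(O).
If form is undefined, let f be "NFC".
Else, let f be ? ToString(form).
If f is not one of "NFC", "NFD", "NFKC", or "NFKD", throw a RangeError exception.
Let ns be the String value that is the result of normalizing S into the normalization form named by f as specified in the latest Unicode Standard, Normalization Forms.
Return ns.

Note

This method is intentionally generic; it does not require that its this value be a String object. Therefore, it can be transferred to other kinds of objects for use as a method.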
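
The four normalization forms and the RangeError path can be exercised as below. This is a sketch with illustrative variable names; it assumes the runtime ships Unicode normalization data, as common engines do.

```javascript
const composed = "\u00C5";    // Å as a single code point
const decomposed = "A\u030A"; // A followed by COMBINING RING ABOVE

const toNFD = composed.normalize("NFD");      // splits into two code points
const toNFC = decomposed.normalize();         // form undefined: defaults to "NFC"
const angstrom = "\u212B".normalize("NFC");   // ANGSTROM SIGN maps to U+00C5
const ligature = "\uFB01".normalize("NFKC");  // compatibility form expands the fi ligature

// Any other form name is rejected.
let badFormRejected = false;
try {
  composed.normalize("nfc"); // form names are case-sensitive
} catch (err) {
  badFormRejected = err instanceof RangeError;
}
```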

22.1.3.16 String.prototype.padEnd ( `maxLength` [ , `fillString` ] )

This method performs the following steps when called:

Let O be the this value.
Perform ? RequireObjectCoercible(O).
Return ? StringPaddingBuiltinsImpl(O, maxLength, fillString, end).

22.1.3.17 String.prototype.padStart ( `maxLength` [ , `fillString` ] )

This method performs the following steps when called:

Let O be the this value.
Perform ? RequireObjectCoercible(O).
Return ? StringPaddingBuiltinsImpl(O, maxLength, fillString, start).

22.1.3.17.1 StringPaddingBuiltinsImpl ( `O`, `maxLength`, `fillString`, `placement` )

The abstract operation StringPaddingBuiltinsImpl takes arguments O (an ECMAScript language value), maxLength (an ECMAScript language value), fillString (an ECMAScript language value), and placement (start or end) and returns either a normal completion containing a String or a throw completion. It performs the following steps when called:

Let S be ? ToString(O).
Let intMaxLength be ℝ(? ToLength(maxLength)).
Let stringLength be the length of S.
If intMaxLength ≤ stringLength, return S.
If fillString is undefined, set fillString to the String value consisting solely of the code unit 0x0020 (SPACE).
Else, set fillString to ? ToString(fillString).
Return StringPad(S, intMaxLength, fillString, placement).

22.1.3.17.2 StringPad ( `S`, `maxLength`, `fillString`, `placement` )

The abstract operation StringPad takes arguments S (a String), maxLength (a non-negative integer), fillString (a String), and placement (start or end) and returns a String. It performs the following steps when called:

Let stringLength be the length of S.
If maxLength ≤ stringLength, return S.
If fillString is the empty String, return S.
Let fillLen be maxLength - stringLength.
Let truncatedStringFiller be the String value consisting of repeated concatenations of fillString truncated to length fillLen.
If placement is start, return the string-concatenation of truncatedStringFiller and S.
Else, return the string-concatenation of S and truncatedStringFiller.

Note 1

The argument maxLength will be clamped such that it can be no smaller than the length of S.

Note 2

The argument fillString defaults to " " (the String value consisting of the code unit 0x0020 SPACE).
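
The StringPad steps translate almost line for line into script. The function below is a hypothetical sketch, not a built-in: stringPadSketch and its string-valued placement parameter ("start" or "end") are names assumed for this illustration, and engines are free to implement padding differently.

```javascript
// Hypothetical helper mirroring the StringPad steps above.
function stringPadSketch(S, maxLength, fillString, placement) {
  const stringLength = S.length;
  if (maxLength <= stringLength) return S;
  if (fillString === "") return S;
  const fillLen = maxLength - stringLength;
  // Repeat the filler often enough, then truncate to exactly fillLen code units.
  const truncatedStringFiller = fillString
    .repeat(Math.ceil(fillLen / fillString.length))
    .slice(0, fillLen);
  return placement === "start"
    ? truncatedStringFiller + S
    : S + truncatedStringFiller;
}

const zeroPadded = stringPadSketch("7", 3, "0", "start");
const cycled = stringPadSketch("abc", 8, "12", "end");
```

Comparing the results with the built-in padStart and padEnd is a quick way to check that the sketch follows the same truncation rule.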

22.1.3.17.3 ToZeroPaddedDecimalString ( `n`, `minLength` )

The abstract operation ToZeroPaddedDecimalString takes arguments n (a non-negative integer) and minLength (a non-negative integer) and returns a String. It performs the following steps when called:

Let S be the String representation of n, formatted as a decimal number.
Return StringPad(S, minLength, "0", start).

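22.1.3.18 String.prototype.repeat ( `count` )

This method performs the following steps when called:

Let O be the this value.
Perform ? RequireObjectCoercible(O).
Let S be ? ToString(O).
Let n be ? ToIntegerOrInfinity(count).
If n < 0 or n = +∞, throw a RangeError exception.
If n = 0, return the empty String.
Return the String value that is made from n copies of S appended together.

Note 1

This method creates the String value consisting of the code units of the this value (converted to String) repeated count times.

Note 2

This method is intentionally generic; it does not require that its this value be a String object. Therefore, it can be transferred to other kinds of objects for use as a method.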

22.1.3.19 String.prototype.replace ( `searchValue`, `replaceValue` )

This method performs the following steps when called:

Let O be the this value.
Perform ? RequireObjectCoercible(O).
If searchValue is an Object, then
1. Let replacer be ? GetMethod(searchValue, %Symbol.replace%).
2. If replacer is not undefined, then
  1. Return ? Call(replacer, searchValue, « O, replaceValue »).
Let string be ? ToString(O).
Let searchString be ? ToString(searchValue).
Let functionalReplace be IsCallable(replaceValue).
If functionalReplace is false, then
1. Set replaceValue to ? ToString(replaceValue).
Let searchLength be the length of searchString.
Let position be StringIndexOf(string, searchString, 0).
If position is not-found, return string.
Let preceding be the substring of string from 0 to position.
Let following be the substring of string from position + searchLength.
If functionalReplace is true, then
1. Let replacement be ? ToString(? Call(replaceValue, undefined, « searchString, 𝔽(position), string »)).
Else,
1. Assert: replaceValue is a String.
2. Let captures be a new empty List.
3. Let replacement be ! GetSubstitution(searchString, string, position, captures, undefined, replaceValue).
Return the string-concatenation of preceding, replacement, and following.

Note

This method is intentionally generic; it does not require that its this value be a String object. Therefore, it can be transferred to other kinds of objects for use as a method.

22.1.3.19.1 GetSubstitution ( `matched`, `str`, `position`, `captures`, `namedCaptures`, `replacementTemplate` )

The abstract operation GetSubstitution takes arguments matched (a String), str (a String), position (a non-negative integer), captures (a List of either Strings or undefined), namedCaptures (an Object or undefined), and replacementTemplate (a String) and returns either a normal completion containing a String or a throw completion. For the purposes of this abstract operation, a decimal digit is a code unit in the inclusive interval from 0x0030 (DIGIT ZERO) to 0x0039 (DIGIT NINE). It performs the following steps when called:

Let stringLength be the length of str.
Assert: position ≤ stringLength.
Let result be the empty String.
Let templateRemainder be replacementTemplate.
Repeat, while templateRemainder is not the empty String,
1. NOTE: The following steps isolate ref (a prefix of templateRemainder), determine refReplacement (its replacement), and then append that replacement to result.
2. If templateRemainder starts with "$$", then
  1. Let ref be "$$".
  2. Let refReplacement be "$".
3. Else if templateRemainder starts with "$`", then
  1. Let ref be "$`".
  2. Let refReplacement be the substring of str from 0 to position.
4. Else if templateRemainder starts with "$&", then
  1. Let ref be "$&".
  2. Let refReplacement be matched.
5. Else if templateRemainder starts with "$'" (0x0024 (DOLLAR SIGN) followed by 0x0027 (APOSTROPHE)), then
  1. Let ref be "$'".
  2. Let matchLength be the length of matched.
  3. Let tailPos be position + matchLength.
  4. Let refReplacement be the substring of str from min(tailPos, stringLength).
  5. NOTE: tailPos can exceed stringLength only if this abstract operation was invoked by a call to the intrinsic %Symbol.replace% method of %RegExp.prototype% on an object whose "exec" property is not the intrinsic %RegExp.prototype.exec%.
6. Else if templateRemainder starts with "$" followed by 1 or more decimal digits, then
  1. If templateRemainder starts with "$" followed by 2 or more decimal digits, let digitCount be 2; otherwise let digitCount be 1.
  2. Let digits be the substring of templateRemainder from 1 to 1 + digitCount.
  3. Let index be ℝ(StringToNumber(digits)).
  4. Assert: 0 ≤ index ≤ 99.
  5. Let captureLen be the number of elements in captures.
  6. If index > captureLen and digitCount = 2, then
    1. NOTE: When a two-digit replacement pattern specifies an index exceeding the count of capturing groups, it is treated as a one-digit replacement pattern followed by a literal digit.
    2. Set digitCount to 1.
    3. Set digits to the substring of digits from 0 to 1.
    4. Set index to ℝ(StringToNumber(digits)).
  7. Let ref be the substring of templateRemainder from 0 to 1 + digitCount.
  8. If 1 ≤ index ≤ captureLen, then
    1. Let capture be captures[index - 1].
    2. If capture is undefined, then
      1. Let refReplacement be the empty String.
    3. Else,
      1. Let refReplacement be capture.
  9. Else,
    1. Let refReplacement be ref.
7. Else if templateRemainder starts with "$<", then
  1. Let gtPos be StringIndexOf(templateRemainder, ">", 0).
  2. If gtPos is not-found or namedCaptures is undefined, then
    1. Let ref be "$<".
    2. Let refReplacement be ref.
  3. Else,
    1. Let ref be the substring of templateRemainder from 0 to gtPos + 1.
    2. Let groupName be the substring of templateRemainder from 2 to gtPos.
    3. Assert: namedCaptures is an Object.
    4. Let capture be ? Get(namedCaptures, groupName).
    5. If capture is undefined, then
      1. Let refReplacement be the empty String.
    6. Else,
      1. Let refReplacement be ? ToString(capture).
8. Else,
  1. Let ref be the substring of templateRemainder from 0 to 1.
  2. Let refReplacement be ref.
9. Let refLength be the length of ref.
10. Set templateRemainder to the substring of templateRemainder from refLength.
11. Set result to the string-concatenation of result and refReplacement.
Return result.
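
Each branch of GetSubstitution corresponds to a replacement pattern that can be tried through String.prototype.replace. The sketch below picks one input per branch; the sample strings are arbitrary.

```javascript
// "$$", "$`", "$&", and "$'" with a plain string search (captures is empty).
const dollar = "abc".replace("b", "$$");          // literal "$"
const around = "abc".replace("b", "[$`|$&|$']");   // prefix, match, suffix

// Numbered captures.
const swapped = "John Smith".replace(/(\w+) (\w+)/, "$2, $1");

// "$10" with only one capture falls back to "$1" followed by a literal "0".
const fallback = "abc".replace(/(b)/, "$10");

// With no captures at all, "$1" is copied through unchanged.
const literalRef = "abc".replace("b", "$1");

// Named captures.
const named = "2024-05".replace(/(?<y>\d+)-(?<m>\d+)/, "$<m>/$<y>");
```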

22.1.3.20 String.prototype.replaceAll ( `searchValue`, `replaceValue` )

This method performs the following steps when called:

Let O be the this value.
Perform ? RequireObjectCoercible(O).
If searchValue is an Object, then
1. Let isRegExp be ? IsRegExp(searchValue).
2. If isRegExp is true, then
  1. Let flags be ? Get(searchValue, "flags").
  2. Perform ? RequireObjectCoercible(flags).
  3. If ? ToString(flags) does not contain "g", throw a TypeError exception.
3. Let replacer be ? GetMethod(searchValue, %Symbol.replace%).
4. If replacer is not undefined, then
  1. Return ? Call(replacer, searchValue, « O, replaceValue »).
Let string be ? ToString(O).
Let searchString be ? ToString(searchValue).
Let functionalReplace be IsCallable(replaceValue).
If functionalReplace is false, then
1. Set replaceValue to ? ToString(replaceValue).
Let searchLength be the length of searchString.
Let advanceBy be max(1, searchLength).
Let matchPositions be a new empty List.
Let position be StringIndexOf(string, searchString, 0).
Repeat, while position is not not-found,
1. Append position to matchPositions.
2. Set position to StringIndexOf(string, searchString, position + advanceBy).
Let endOfLastMatch be 0.
Let result be the empty String.
For each element p of matchPositions, do
1. Let preserved be the substring of string from endOfLastMatch to p.
2. If functionalReplace is true, then
  1. Let replacement be ? ToString(? Call(replaceValue, undefined, « searchString, 𝔽(p), string »)).
3. Else,
  1. Assert: replaceValue is a String.
  2. Let captures be a new empty List.
  3. Let replacement be ! GetSubstitution(searchString, string, p, captures, undefined, replaceValue).
4. Set result to the string-concatenation of result, preserved, and replacement.
5. Set endOfLastMatch to p + searchLength.
If endOfLastMatch < the length of string, then
1. Set result to the string-concatenation of result and the substring of string from endOfLastMatch.
Return result.
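
Two details of the algorithm above are easy to miss: matches never overlap (the search resumes at position + advanceBy), and an empty search string matches at every index including the end. A sketch, assuming a runtime that implements String.prototype.replaceAll:

```javascript
// Non-overlapping matches: positions is « 0 », the "a" at index 2 is left alone.
const nonOverlapping = "aaa".replaceAll("aa", "b");

// Empty search string: advanceBy = 1, so positions are 0, 1, 2, 3.
const betweenUnits = "abc".replaceAll("", "-");

// Functional replacement receives (searchString, position, string).
const withPositions = "a.b.c".replaceAll(".", (match, position) => String(position));

// A RegExp without the "g" flag is rejected.
let nonGlobalRejected = false;
try {
  "x".replaceAll(/x/, "y");
} catch (err) {
  nonGlobalRejected = err instanceof TypeError;
}
```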

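22.1.3.21 String.prototype.search ( `regexp` )

This method performs the following steps when called:

Let O be the this value.
Perform ? RequireObjectCoercible(O).
If regexp is an Object, then
1. Let searcher be ? GetMethod(regexp, %Symbol.search%).
2. If searcher is not undefined, then
  1. Return ? Call(searcher, regexp, « O »).
Let string be ? ToString(O).
Let rx be ? RegExpCreate(regexp, undefined).
Return ? Invoke(rx, %Symbol.search%, « string »).

Note

This method is intentionally generic; it does not require that its this value be a String object. Therefore, it can be transferred to other kinds of objects for use as a method.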

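22.1.3.22 String.prototype.slice ( `start`, `end` )

This method returns a substring of the result of converting this object to a String, starting from index start and running to, but not including, index end (or through the end of the String if end is undefined). If start is negative, it is treated as sourceLength + start where sourceLength is the length of the String. If end is negative, it is treated as sourceLength + end where sourceLength is the length of the String. The result is a String value, not a String object.

It performs the following steps when called:

Let O be the this value.
Perform ? RequireObjectCoercible(O).
Let S be ? ToString(O).
Let len be the length of S.
Let intStart be ? ToIntegerOrInfinity(start).
If intStart = -∞, let from be 0.
Else if intStart < 0, let from be max(len + intStart, 0).
Else, let from be min(intStart, len).
If end is undefined, let intEnd be len; else let intEnd be ? ToIntegerOrInfinity(end).
If intEnd = -∞, let to be 0.
Else if intEnd < 0, let to be max(len + intEnd, 0).
Else, let to be min(intEnd, len).
If from ≥ to, return the empty String.
Return the substring of S from from to to.

Note

This method is intentionally generic; it does not require that its this value be a String object. Therefore, it can be transferred to other kinds of objects for use as a method.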

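22.1.3.23 String.prototype.split ( `separator`, `limit` )

This method returns an Array into which substrings of the result of converting this object to a String have been stored. The substrings are determined by searching from left to right for occurrences of separator; these occurrences are not part of any String in the returned array, but serve to divide up the String value. The value of separator may be a String of any length or it may be an object, such as a RegExp, that has a %Symbol.split% method.

It performs the following steps when called:

Let O be the this value.
Perform ? RequireObjectCoercible(O).
If separator is an Object, then
1. Let splitter be ? GetMethod(separator, %Symbol.split%).
2. If splitter is not undefined, then
  1. Return ? Call(splitter, separator, « O, limit »).
Let S be ? ToString(O).
If limit is undefined, let lim be 2³² - 1; else let lim be ℝ(? ToUint32(limit)).
Let R be ? ToString(separator).
If lim = 0, then
1. Return CreateArrayFromList(« »).
If separator is undefined, then
1. Return CreateArrayFromList(« S »).
Let separatorLength be the length of R.
If separatorLength = 0, then
1. Let strLen be the length of S.
2. Let outLen be the result of clamping lim between 0 and strLen.
3. Let head be the substring of S from 0 to outLen.
4. Let codeUnits be a List consisting of the sequence of code units that are the elements of head.
5. Return CreateArrayFromList(codeUnits).
If S is the empty String, return CreateArrayFromList(« S »).
Let substrings be a new empty List.
Let i be 0.
Let j be StringIndexOf(S, R, 0).
Repeat, while j is not not-found,
1. Let T be the substring of S from i to j.
2. Append T to substrings.
3. If the number of elements in substrings is lim, return CreateArrayFromList(substrings).
4. Set i to j + separatorLength.
5. Set j to StringIndexOf(S, R, i).
Let T be the substring of S from i.
Append T to substrings.
Return CreateArrayFromList(substrings).

Note 1

The value of separator may be an empty String. In this case, separator does not match the empty substring at the beginning or end of the input String, nor does it match the empty substring at the end of the previous separator match. If separator is the empty String, the String is split up into individual code unit elements; the length of the result array equals the length of the String, and each substring contains one code unit.

If the this value is (or converts to) the empty String, the result depends on whether separator can match the empty String. If it can, the result array contains no elements. Otherwise, the result array contains one element, which is the empty String.

If separator is undefined, then the result array contains just one String, which is the this value (converted to a String). If limit is not undefined, then the output array is truncated so that it contains no more than limit elements.

Note 2

This method is intentionally generic; it does not require that its this value be a String object. Therefore, it can be transferred to other kinds of objects for use as a method.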
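
The special cases in the steps and in Note 1 (empty separator, empty input, undefined separator, limit) can be checked directly. A sketch with arbitrary sample values:

```javascript
const basic = "a,b,c".split(",");          // three substrings
const limited = "a,b,c".split(",", 2);     // stops once lim elements are collected
const zeroLimit = "a,b".split(",", 0);     // lim = 0: empty array
const noSeparator = "abc".split();         // separator undefined: whole string
const perUnit = "abc".split("");           // empty separator: one code unit each
const emptyInput = "".split(",");          // separator cannot match "": [""]
const emptyBoth = "".split("");            // separator matches "": []

// An empty separator splits by code unit, so a surrogate pair is torn apart.
const tornPair = "\u{1F600}".split("");
```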

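22.1.3.24 String.prototype.startsWith ( `searchString` [ , `position` ] )

This method performs the following steps when called:

Let O be the this value.
Perform ? RequireObjectCoercible(O).
Let S be ? ToString(O).
Let isRegExp be ? IsRegExp(searchString).
If isRegExp is true, throw a TypeError exception.
Let searchStr be ? ToString(searchString).
Let len be the length of S.
If position is undefined, let pos be 0; else let pos be ? ToIntegerOrInfinity(position).
Let start be the result of clamping pos between 0 and len.
Let searchLength be the length of searchStr.
If searchLength = 0, return true.
Let end be start + searchLength.
If end > len, return false.
Let substring be the substring of S from start to end.
If substring is searchStr, return true.
Return false.

Note 1

This method returns true if the sequence of code units of searchString converted to a String is the same as the corresponding code units of this object (converted to a String) starting at index position. Otherwise it returns false.

Note 2

Throwing an exception if the first argument is a RegExp is specified in order to allow future editions to define extensions that allow such argument values.

Note 3

This method is intentionally generic; it does not require that its this value be a String object. Therefore, it can be transferred to other kinds of objects for use as a method.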

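22.1.3.25 String.prototype.substring ( `start`, `end` )

This method returns a substring of the result of converting this object to a String, starting from index start and running to, but not including, index end of the String (or through the end of the String if end is undefined). The result is a String value, not a String object.

If either argument is NaN or negative, it is replaced with zero; if either argument is strictly greater than the length of the String, it is replaced with the length of the String.

If start is strictly greater than end, they are swapped.

It performs the following steps when called:

Let O be the this value.
Perform ? RequireObjectCoercible(O).
Let S be ? ToString(O).
Let len be the length of S.
Let intStart be ? ToIntegerOrInfinity(start).
If end is undefined, let intEnd be len; else let intEnd be ? ToIntegerOrInfinity(end).
Let finalStart be the result of clamping intStart between 0 and len.
Let finalEnd be the result of clamping intEnd between 0 and len.
Let from be min(finalStart, finalEnd).
Let to be max(finalStart, finalEnd).
Return the substring of S from from to to.

Note

This method is intentionally generic; it does not require that its this value be a String object. Therefore, it can be transferred to other kinds of objects for use as a method.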
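
slice and substring agree on in-range, ordered arguments and diverge everywhere else: slice counts negative indices from the end and never swaps, while substring clamps negatives to 0 and swaps. A sketch with an arbitrary sample string:

```javascript
const letters = "abcdef"; // len = 6

// Negative arguments.
const sliceNegative = letters.slice(-3, -1);         // from = 3, to = 5
const substringNegative = letters.substring(-3, -1); // both clamp to 0

// start greater than end.
const sliceReversed = letters.slice(4, 1);         // from ≥ to: empty
const substringReversed = letters.substring(4, 1); // swapped to (1, 4)

// NaN is converted to 0 by ToIntegerOrInfinity.
const substringNaN = letters.substring(NaN, 2);
```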

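22.1.3.26 String.prototype.toLocaleLowerCase ( [ `reserved1` [ , `reserved2` ] ] )

An ECMAScript implementation that includes the ECMA-402 Internationalization API must implement this method as specified in the ECMA-402 specification. If an ECMAScript implementation does not include the ECMA-402 API the following specification of this method is used:

This method interprets a String value as a sequence of UTF-16 encoded code points, as described in 6.1.4.

It works exactly the same as toLowerCase except that it is intended to yield a locale-sensitive result corresponding with conventions of the host environment's current locale. There will only be a difference in the few cases (such as Turkish) where the rules for that language conflict with the regular Unicode case mappings.

The meaning of the optional parameters to this method are defined in the ECMA-402 specification; implementations that do not include ECMA-402 support must not use those parameter positions for anything else.

Note

This method is intentionally generic; it does not require that its this value be a String object. Therefore, it can be transferred to other kinds of objects for use as a method.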

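22.1.3.27 String.prototype.toLocaleUpperCase ( [ `reserved1` [ , `reserved2` ] ] )

An ECMAScript implementation that includes the ECMA-402 Internationalization API must implement this method as specified in the ECMA-402 specification. If an ECMAScript implementation does not include the ECMA-402 API the following specification of this method is used:

This method interprets a String value as a sequence of UTF-16 encoded code points, as described in 6.1.4.

It works exactly the same as toUpperCase except that it is intended to yield a locale-sensitive result corresponding with conventions of the host environment's current locale. There will only be a difference in the few cases (such as Turkish) where the rules for that language conflict with the regular Unicode case mappings.

The meaning of the optional parameters to this method are defined in the ECMA-402 specification; implementations that do not include ECMA-402 support must not use those parameter positions for anything else.

Note

This method is intentionally generic; it does not require that its this value be a String object. Therefore, it can be transferred to other kinds of objects for use as a method.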

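22.1.3.28 String.prototype.toLowerCase ( )

This method interprets a String value as a sequence of UTF-16 encoded code points, as described in 6.1.4.

It performs the following steps when called:

Let O be the this value.
Perform ? RequireObjectCoercible(O).
Let S be ? ToString(O).
Let sText be StringToCodePoints(S).
Let lowerText be the result of toLowercase(sText), according to the Unicode Default Case Conversion algorithm.
Let L be CodePointsToString(lowerText).
Return L.

The result must be derived according to the locale-insensitive case mappings in the Unicode Character Database (this explicitly includes not only the file UnicodeData.txt, but also all locale-insensitive mappings in the file SpecialCasing.txt that accompanies it).

Note 1

The case mapping of some code points may produce multiple code points. In this case the result String may not be the same length as the source String. Because both toUpperCase and toLowerCase have context-sensitive behaviour, the methods are not symmetrical. In other words, s.toUpperCase().toLowerCase() is not necessarily equal to s.toLowerCase().

Note 2

This method is intentionally generic; it does not require that its this value be a String object. Therefore, it can be transferred to other kinds of objects for use as a method.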
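
Note 1 can be illustrated with two well-known SpecialCasing.txt entries. This is a sketch that assumes the runtime applies the full locale-insensitive mappings the section requires.

```javascript
// One code point can map to several: U+00DF (ß) uppercases to "SS".
const sharpS = "\u00DF";
const upper = sharpS.toUpperCase();

// U+0130 (İ) lowercases to "i" followed by U+0307 COMBINING DOT ABOVE.
const dottedI = "\u0130".toLowerCase();

// The mappings are not symmetrical.
const roundTrip = sharpS.toUpperCase().toLowerCase(); // "ss"
const direct = sharpS.toLowerCase();                  // "ß"
```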

22.1.3.29 String.prototype.toString ( )

This method performs the following steps when called:

Return ? ThisStringValue(this value).

Note

For a String object, this method happens to return the same thing as the valueOf method.

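22.1.3.30 String.prototype.toUpperCase ( )

This method interprets a String value as a sequence of UTF-16 encoded code points, as described in 6.1.4.

It behaves in exactly the same way as String.prototype.toLowerCase, except that the String is mapped using the toUppercase algorithm of the Unicode Default Case Conversion.

Note

This method is intentionally generic; it does not require that its this value be a String object. Therefore, it can be transferred to other kinds of objects for use as a method.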

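22.1.3.31 String.prototype.toWellFormed ( )

This method returns a String representation of this object with all leading surrogates and trailing surrogates that are not part of a surrogate pair replaced with U+FFFD (REPLACEMENT CHARACTER).

It performs the following steps when called:

Let O be the this value.
Perform ? RequireObjectCoercible(O).
Let S be ? ToString(O).
Let strLen be the length of S.
Let k be 0.
Let result be the empty String.
Repeat, while k < strLen,
1. Let cp be CodePointAt(S, k).
2. If cp.[[IsUnpairedSurrogate]] is true, then
  1. Set result to the string-concatenation of result and 0xFFFD (REPLACEMENT CHARACTER).
3. Else,
  1. Set result to the string-concatenation of result and UTF16EncodeCodePoint(cp.[[CodePoint]]).
4. Set k to k + cp.[[CodeUnitCount]].
Return result.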
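
For runtimes that lack the built-in, the loop above can be approximated in script. toWellFormedSketch is a hypothetical name for this illustration; it relies on the fact that codePointAt returns a value in the surrogate range only when no valid pair begins at that index.

```javascript
// Hypothetical helper following the toWellFormed steps; not the built-in itself.
function toWellFormedSketch(S) {
  let result = "";
  let k = 0;
  while (k < S.length) {
    const cp = S.codePointAt(k);
    if (cp >= 0xd800 && cp <= 0xdfff) {
      // Unpaired surrogate: substitute U+FFFD and advance one code unit.
      result += "\uFFFD";
      k += 1;
    } else {
      result += String.fromCodePoint(cp);
      // Supplementary code points occupy two code units.
      k += cp > 0xffff ? 2 : 1;
    }
  }
  return result;
}

const loneLead = toWellFormedSketch("a\uD83Db");
const loneTrail = toWellFormedSketch("\uDE00x");
const alreadyFine = toWellFormedSketch("a\u{1F600}");
```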

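22.1.3.32 String.prototype.trim ( )

This method interprets a String value as a sequence of UTF-16 encoded code points, as described in 6.1.4.

It performs the following steps when called:

Let S be the this value.
Return ? TrimString(S, start+end).

Note

This method is intentionally generic; it does not require that its this value be a String object. Therefore, it can be transferred to other kinds of objects for use as a method.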

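22.1.3.32.1 TrimString ( `string`, `where` )

The abstract operation TrimString takes arguments string (an ECMAScript language value) and where (start, end, or start+end) and returns either a normal completion containing a String or a throw completion. It interprets string as a sequence of UTF-16 encoded code points, as described in 6.1.4. It performs the following steps when called:

Perform ? RequireObjectCoercible(string).
Let S be ? ToString(string).
If where is start, then
1. Let T be the String value that is a copy of S with leading white space removed.
Else if where is end, then
1. Let T be the String value that is a copy of S with trailing white space removed.
Else,
1. Assert: where is start+end.
2. Let T be the String value that is a copy of S with both leading and trailing white space removed.
Return T.

The definition of white space is the union of WhiteSpace and LineTerminator. When determining whether a Unicode code point is in Unicode general category “Space_Separator” (“Zs”), code unit sequences are interpreted as UTF-16 encoded code point sequences as specified in 6.1.4.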

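22.1.3.33 String.prototype.trimEnd ( )

This method interprets a String value as a sequence of UTF-16 encoded code points, as described in 6.1.4.

It performs the following steps when called:

Let S be the this value.
Return ? TrimString(S, end).

Note

This method is intentionally generic; it does not require that its this value be a String object. Therefore, it can be transferred to other kinds of objects for use as a method.

22.1.3.34 String.prototype.trimStart ( )

This method interprets a String value as a sequence of UTF-16 encoded code points, as described in 6.1.4.

It performs the following steps when called:

Let S be the this value.
Return ? TrimString(S, start).

Note

This method is intentionally generic; it does not require that its this value be a String object. Therefore, it can be transferred to other kinds of objects for use as a method.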

22.1.3.35 String.prototype.valueOf ( )

This method performs the following steps when called:

Return ? ThisStringValue(this value).

22.1.3.35.1 ThisStringValue ( `value` )

The abstract operation ThisStringValue takes argument value (an ECMAScript language value) and returns either a normal completion containing a String or a throw completion. It performs the following steps when called:

If value is a String, return value.
If value is an Object and value has a [[StringData]] internal slot, then
1. Let s be value.[[StringData]].
2. Assert: s is a String.
3. Return s.
Throw a TypeError exception.

22.1.3.36 String.prototype [ %Symbol.iterator% ] ( )

This method returns an iterator object that iterates over the code points of a String value, returning each code point as a String value.

It performs the following steps when called:

Let O be the this value.
Perform ? RequireObjectCoercible(O).
Let s be ? ToString(O).
Let closure be a new Abstract Closure with no parameters that captures s and performs the following steps when called:
1. Let len be the length of s.
2. Let position be 0.
3. Repeat, while position < len,
  1. Let cp be CodePointAt(s, position).
  2. Let nextIndex be position + cp.[[CodeUnitCount]].
  3. Let resultString be the substring of s from position to nextIndex.
  4. Set position to nextIndex.
  5. Perform ? GeneratorYield(CreateIteratorResultObject(resultString, false)).
4. Return NormalCompletion(unused).
Return CreateIteratorFromClosure(closure, "%StringIteratorPrototype%", %StringIteratorPrototype%).

The value of the "name" property of this method is "[Symbol.iterator]".
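
Because the closure advances by cp.[[CodeUnitCount]], iteration keeps surrogate pairs together while still yielding unpaired surrogates one at a time. The sketch below compares it with code-unit splitting; the sample string is arbitrary.

```javascript
// "a", one surrogate pair, then a lone lead surrogate: 4 code units in total.
const mixed = "a\u{1F600}\uD83D";

const byCodePoint = [...mixed];     // uses String.prototype[Symbol.iterator]
const byCodeUnit = mixed.split(""); // one element per code unit

const iterator = mixed[Symbol.iterator]();
const firstStep = iterator.next();  // { value: "a", done: false }
```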

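22.1.4 Properties of String Instances

String instances are String exotic objects and have the internal methods specified for such objects. String instances inherit properties from the String prototype object. String instances also have a [[StringData]] internal slot. The [[StringData]] internal slot is the String value represented by this String object.

String instances have a "length" property, and a set of enumerable properties with integer-indexed names.

22.1.4.1 length

The number of elements in the String value represented by this String object.

Once a String object is initialized, this property is unchanging. It has the attributes { [[Writable]]: false, [[Enumerable]]: false, [[Configurable]]: false }.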

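22.1.5 String Iterator Objects

A String Iterator is an object that represents a specific iteration over some specific String instance object. There is not a named constructor for String Iterator objects. Instead, String Iterator objects are created by calling certain methods of String instance objects.

22.1.5.1 The %StringIteratorPrototype% Object

The %StringIteratorPrototype% object:

has properties that are inherited by all String Iterator objects.
is an ordinary object.
has a [[Prototype]] internal slot whose value is %Iterator.prototype%.
has the following properties:

22.1.5.1.1 %StringIteratorPrototype%.next ( )

Return ? GeneratorResume(this value, empty, "%StringIteratorPrototype%").

22.1.5.1.2 %StringIteratorPrototype% [ %Symbol.toStringTag% ]

The initial value of the %Symbol.toStringTag% property is the String value "String Iterator".

This property has the attributes { [[Writable]]: false, [[Enumerable]]: false, [[Configurable]]: true }.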

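22.2 RegExp (Regular Expression) Objects

A RegExp object contains a regular expression and the associated flags.

Note

The form and functionality of regular expressions is modelled after the regular expression facility in the Perl 5 programming language.

22.2.1 Patterns

The RegExp constructor applies the following grammar to the input pattern String. An error occurs if the grammar cannot interpret the String as an expansion of Pattern.

Syntax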

Pattern[UnicodeMode, UnicodeSetsMode, NamedCaptureGroups] ::
  Disjunction[?UnicodeMode, ?UnicodeSetsMode, ?NamedCaptureGroups]

Disjunction[UnicodeMode, UnicodeSetsMode, NamedCaptureGroups] ::
  Alternative[?UnicodeMode, ?UnicodeSetsMode, ?NamedCaptureGroups]
  Alternative[?UnicodeMode, ?UnicodeSetsMode, ?NamedCaptureGroups] | Disjunction[?UnicodeMode, ?UnicodeSetsMode, ?NamedCaptureGroups]

Alternative[UnicodeMode, UnicodeSetsMode, NamedCaptureGroups] ::
  [empty]
  Alternative[?UnicodeMode, ?UnicodeSetsMode, ?NamedCaptureGroups] Term[?UnicodeMode, ?UnicodeSetsMode, ?NamedCaptureGroups]

Term[UnicodeMode, UnicodeSetsMode, NamedCaptureGroups] ::
  Assertion[?UnicodeMode, ?UnicodeSetsMode, ?NamedCaptureGroups]
  Atom[?UnicodeMode, ?UnicodeSetsMode, ?NamedCaptureGroups]
  Atom[?UnicodeMode, ?UnicodeSetsMode, ?NamedCaptureGroups] Quantifier

Assertion[UnicodeMode, UnicodeSetsMode, NamedCaptureGroups] ::
  ^
  $
  \b
  \B
  (?= Disjunction[?UnicodeMode, ?UnicodeSetsMode, ?NamedCaptureGroups] )
  (?! Disjunction[?UnicodeMode, ?UnicodeSetsMode, ?NamedCaptureGroups] )
  (?<= Disjunction[?UnicodeMode, ?UnicodeSetsMode, ?NamedCaptureGroups] )
  (?<! Disjunction[?UnicodeMode, ?UnicodeSetsMode, ?NamedCaptureGroups] )

Quantifier ::
  QuantifierPrefix
  QuantifierPrefix ?

QuantifierPrefix ::
  *
  +
  ?
  { DecimalDigits[~Sep] }
  { DecimalDigits[~Sep] ,}
  { DecimalDigits[~Sep] , DecimalDigits[~Sep] }

Atom[UnicodeMode, UnicodeSetsMode, NamedCaptureGroups] ::
  PatternCharacter
  .
  \ AtomEscape[?UnicodeMode, ?NamedCaptureGroups]
  CharacterClass[?UnicodeMode, ?UnicodeSetsMode]
  ( GroupSpecifier[?UnicodeMode]opt Disjunction[?UnicodeMode, ?UnicodeSetsMode, ?NamedCaptureGroups] )
  (? RegularExpressionModifiers : Disjunction[?UnicodeMode, ?UnicodeSetsMode, ?NamedCaptureGroups] )
  (? RegularExpressionModifiers - RegularExpressionModifiers : Disjunction[?UnicodeMode, ?UnicodeSetsMode, ?NamedCaptureGroups] )

RegularExpressionModifiers ::
  [empty]
  RegularExpressionModifiers RegularExpressionModifier

RegularExpressionModifier :: one of
  i m s

SyntaxCharacter :: one of
  ^ $ \ . * + ? ( ) [ ] { } |

PatternCharacter ::
  SourceCharacter but not SyntaxCharacter

AtomEscape[UnicodeMode, NamedCaptureGroups] ::
  DecimalEscape
  CharacterClassEscape[?UnicodeMode]
  CharacterEscape[?UnicodeMode]
  [+NamedCaptureGroups] k GroupName[?UnicodeMode]

CharacterEscape[UnicodeMode] ::
  ControlEscape
  c AsciiLetter
  0 [lookahead ∉ DecimalDigit]
  HexEscapeSequence
  RegExpUnicodeEscapeSequence[?UnicodeMode]
  IdentityEscape[?UnicodeMode]

ControlEscape :: one of
  f n r t v

GroupSpecifier[UnicodeMode] ::
  ? GroupName[?UnicodeMode]

GroupName[UnicodeMode] ::
  < RegExpIdentifierName[?UnicodeMode] >

RegExpIdentifierName[UnicodeMode] ::
  RegExpIdentifierStart[?UnicodeMode]
  RegExpIdentifierName[?UnicodeMode] RegExpIdentifierPart[?UnicodeMode]

RegExpIdentifierStart[UnicodeMode] ::
  IdentifierStartChar
  \ RegExpUnicodeEscapeSequence[+UnicodeMode]
  [~UnicodeMode] UnicodeLeadSurrogate UnicodeTrailSurrogate

RegExpIdentifierPart[UnicodeMode] ::
  IdentifierPartChar
  \ RegExpUnicodeEscapeSequence[+UnicodeMode]
  [~UnicodeMode] UnicodeLeadSurrogate UnicodeTrailSurrogate

RegExpUnicodeEscapeSequence[UnicodeMode] ::
  [+UnicodeMode] u HexLeadSurrogate \u HexTrailSurrogate
  [+UnicodeMode] u HexLeadSurrogate
  [+UnicodeMode] u HexTrailSurrogate
  [+UnicodeMode] u HexNonSurrogate
  [~UnicodeMode] u Hex4Digits
  [+UnicodeMode] u{ CodePoint }

UnicodeLeadSurrogate ::
  any Unicode code point in the inclusive interval from U+D800 to U+DBFF

UnicodeTrailSurrogate ::
  any Unicode code point in the inclusive interval from U+DC00 to U+DFFF

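Each \u HexTrailSurrogate for which the choice of associated u HexLeadSurrogate is ambiguous shall be associated with the nearest possible u HexLeadSurrogate that would otherwise have no corresponding \u HexTrailSurrogate.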

HexLeadSurrogate ::
    Hex4Digits but only if the MV of Hex4Digits is in the inclusive interval from 0xD800 to 0xDBFF

HexTrailSurrogate ::
    Hex4Digits but only if the MV of Hex4Digits is in the inclusive interval from 0xDC00 to 0xDFFF

HexNonSurrogate ::
    Hex4Digits but only if the MV of Hex4Digits is not in the inclusive interval from 0xD800 to 0xDFFF

IdentityEscape[UnicodeMode] ::
    [+UnicodeMode] SyntaxCharacter
    [+UnicodeMode] /
    [~UnicodeMode] SourceCharacter but not UnicodeIDContinue

DecimalEscape ::
    NonZeroDigit DecimalDigits[~Sep]_opt [lookahead ∉ DecimalDigit]

CharacterClassEscape[UnicodeMode] ::
    d
    D
    s
    S
    w
    W
    [+UnicodeMode] p{ UnicodePropertyValueExpression }
    [+UnicodeMode] P{ UnicodePropertyValueExpression }

UnicodePropertyValueExpression ::
    UnicodePropertyName = UnicodePropertyValue
    LoneUnicodePropertyNameOrValue

UnicodePropertyName ::
    UnicodePropertyNameCharacters

UnicodePropertyNameCharacters ::
    UnicodePropertyNameCharacter UnicodePropertyNameCharacters_opt

UnicodePropertyValue ::
    UnicodePropertyValueCharacters

LoneUnicodePropertyNameOrValue ::
    UnicodePropertyValueCharacters

UnicodePropertyValueCharacters ::
    UnicodePropertyValueCharacter UnicodePropertyValueCharacters_opt

UnicodePropertyValueCharacter ::
    UnicodePropertyNameCharacter
    DecimalDigit

UnicodePropertyNameCharacter ::
    AsciiLetter
    _

CharacterClass[UnicodeMode, UnicodeSetsMode] ::
    [ [lookahead ≠ ^] ClassContents[?UnicodeMode, ?UnicodeSetsMode] ]
    [^ ClassContents[?UnicodeMode, ?UnicodeSetsMode] ]

ClassContents[UnicodeMode, UnicodeSetsMode] ::
    [empty]
    [~UnicodeSetsMode] NonemptyClassRanges[?UnicodeMode]
    [+UnicodeSetsMode] ClassSetExpression

NonemptyClassRanges[UnicodeMode] ::
    ClassAtom[?UnicodeMode]
    ClassAtom[?UnicodeMode] NonemptyClassRangesNoDash[?UnicodeMode]
    ClassAtom[?UnicodeMode] - ClassAtom[?UnicodeMode] ClassContents[?UnicodeMode, ~UnicodeSetsMode]

NonemptyClassRangesNoDash[UnicodeMode] ::
    ClassAtom[?UnicodeMode]
    ClassAtomNoDash[?UnicodeMode] NonemptyClassRangesNoDash[?UnicodeMode]
    ClassAtomNoDash[?UnicodeMode] - ClassAtom[?UnicodeMode] ClassContents[?UnicodeMode, ~UnicodeSetsMode]

ClassAtom[UnicodeMode] ::
    -
    ClassAtomNoDash[?UnicodeMode]

ClassAtomNoDash[UnicodeMode] ::
    SourceCharacter but not one of \ or ] or -
    \ ClassEscape[?UnicodeMode]

ClassEscape[UnicodeMode] ::
    b
    [+UnicodeMode] -
    CharacterClassEscape[?UnicodeMode]
    CharacterEscape[?UnicodeMode]

ClassSetExpression ::
    ClassUnion
    ClassIntersection
    ClassSubtraction

ClassUnion ::
    ClassSetRange ClassUnion_opt
    ClassSetOperand ClassUnion_opt

ClassIntersection ::
    ClassSetOperand && [lookahead ≠ &] ClassSetOperand
    ClassIntersection && [lookahead ≠ &] ClassSetOperand

ClassSubtraction ::
    ClassSetOperand -- ClassSetOperand
    ClassSubtraction -- ClassSetOperand

ClassSetRange ::
    ClassSetCharacter - ClassSetCharacter

ClassSetOperand ::
    NestedClass
    ClassStringDisjunction
    ClassSetCharacter

NestedClass ::
    [ [lookahead ≠ ^] ClassContents[+UnicodeMode, +UnicodeSetsMode] ]
    [^ ClassContents[+UnicodeMode, +UnicodeSetsMode] ]
    \ CharacterClassEscape[+UnicodeMode]

Note 1

The first two lines here are equivalent to CharacterClass.

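ClassStringDisjunction ::
    \q{ ClassStringDisjunctionContents }

ClassStringDisjunctionContents ::
    ClassString
    ClassString | ClassStringDisjunctionContents

ClassString ::
    [empty]
    NonEmptyClassString

NonEmptyClassString ::
    ClassSetCharacter NonEmptyClassString_opt

ClassSetCharacter ::
    [lookahead ∉ ClassSetReservedDoublePunctuator] SourceCharacter but not ClassSetSyntaxCharacter
    \ CharacterEscape[+UnicodeMode]
    \ ClassSetReservedPunctuator
    \b

ClassSetReservedDoublePunctuator :: one of
    && !! ## $$ %% ** ++ ,, .. :: ;; << == >> ?? @@ ^^ `` ~~

ClassSetSyntaxCharacter :: one of
    ( ) [ ] { } / - \ |

ClassSetReservedPunctuator :: one of
    & - ! # % , : ; < = > @ ` ~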

Note 2

A number of productions in this section are given alternative definitions in section B.1.2.

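22.2.1.1 Static Semantics: Early Errors

Note

This section is amended in B.1.2.1.

Pattern :: Disjunction

It is a Syntax Error if CountLeftCapturingParensWithin(Pattern) ≥ 2³² - 1.
It is a Syntax Error if Pattern contains two distinct GroupSpecifiers x and y such that the CapturingGroupName of x is the CapturingGroupName of y and such that MightBothParticipate(x, y) is true.

QuantifierPrefix :: { DecimalDigits , DecimalDigits }

It is a Syntax Error if the MV of the first DecimalDigits is strictly greater than the MV of the second DecimalDigits.

Atom :: (? RegularExpressionModifiers : Disjunction )

It is a Syntax Error if the source text matched by RegularExpressionModifiers contains the same code point more than once.

Atom :: (? RegularExpressionModifiers - RegularExpressionModifiers : Disjunction )

It is a Syntax Error if the source text matched by the first RegularExpressionModifiers and the source text matched by the second RegularExpressionModifiers are both empty.
It is a Syntax Error if the source text matched by the first RegularExpressionModifiers contains the same code point more than once.
It is a Syntax Error if the source text matched by the second RegularExpressionModifiers contains the same code point more than once.
It is a Syntax Error if any code point in the source text matched by the first RegularExpressionModifiers is also contained in the source text matched by the second RegularExpressionModifiers.

AtomEscape :: k GroupName

It is a Syntax Error if GroupSpecifiersThatMatch(GroupName) is empty.

AtomEscape :: DecimalEscape

It is a Syntax Error if the CapturingGroupNumber of DecimalEscape is strictly greater than CountLeftCapturingParensWithin(the Pattern containing AtomEscape).

NonemptyClassRanges :: ClassAtom - ClassAtom ClassContents

It is a Syntax Error if IsCharacterClass of the first ClassAtom is true or IsCharacterClass of the second ClassAtom is true.
It is a Syntax Error if IsCharacterClass of the first ClassAtom is false, IsCharacterClass of the second ClassAtom is false, and the CharacterValue of the first ClassAtom is strictly greater than the CharacterValue of the second ClassAtom.

NonemptyClassRangesNoDash :: ClassAtomNoDash - ClassAtom ClassContents

It is a Syntax Error if IsCharacterClass of ClassAtomNoDash is true or IsCharacterClass of ClassAtom is true.
It is a Syntax Error if IsCharacterClass of ClassAtomNoDash is false, IsCharacterClass of ClassAtom is false, and the CharacterValue of ClassAtomNoDash is strictly greater than the CharacterValue of ClassAtom.

RegExpIdentifierStart :: \ RegExpUnicodeEscapeSequence

It is a Syntax Error if the CharacterValue of RegExpUnicodeEscapeSequence is not the numeric value of some code point matched by the IdentifierStartChar lexical grammar production.

RegExpIdentifierStart :: UnicodeLeadSurrogate UnicodeTrailSurrogate

It is a Syntax Error if the RegExpIdentifierCodePoint of RegExpIdentifierStart is not matched by the UnicodeIDStart lexical grammar production.

RegExpIdentifierPart :: \ RegExpUnicodeEscapeSequence

It is a Syntax Error if the CharacterValue of RegExpUnicodeEscapeSequence is not the numeric value of some code point matched by the IdentifierPartChar lexical grammar production.

RegExpIdentifierPart :: UnicodeLeadSurrogate UnicodeTrailSurrogate

It is a Syntax Error if the RegExpIdentifierCodePoint of RegExpIdentifierPart is not matched by the UnicodeIDContinue lexical grammar production.

UnicodePropertyValueExpression :: UnicodePropertyName = UnicodePropertyValue

It is a Syntax Error if the source text matched by UnicodePropertyName is not a Unicode property name or property alias listed in the “Property name and aliases” column of Table 67.
It is a Syntax Error if the source text matched by UnicodePropertyValue is not a property value or property value alias, listed in PropertyValueAliases.txt, for the Unicode property or property alias given by the source text matched by UnicodePropertyName.

UnicodePropertyValueExpression :: LoneUnicodePropertyNameOrValue

It is a Syntax Error if the source text matched by LoneUnicodePropertyNameOrValue is not a Unicode property value or property value alias for the General_Category (gc) property listed in PropertyValueAliases.txt, nor a binary property or binary property alias listed in the “Property name and aliases” column of Table 68, nor a binary property of strings listed in the “Property name” column of Table 69.
It is a Syntax Error if the enclosing Pattern does not have a [UnicodeSetsMode] parameter and the source text matched by LoneUnicodePropertyNameOrValue is a binary property of strings listed in the “Property name” column of Table 69.

CharacterClassEscape :: P{ UnicodePropertyValueExpression }

It is a Syntax Error if MayContainStrings of the UnicodePropertyValueExpression is true.

CharacterClass :: [^ ClassContents ]

It is a Syntax Error if MayContainStrings of the ClassContents is true.

NestedClass :: [^ ClassContents ]

It is a Syntax Error if MayContainStrings of the ClassContents is true.

ClassSetRange :: ClassSetCharacter - ClassSetCharacter

It is a Syntax Error if the CharacterValue of the first ClassSetCharacter is strictly greater than the CharacterValue of the second ClassSetCharacter.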
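
Several of these early errors can be observed from script, because the RegExp constructor reports them as SyntaxError exceptions. The sketch below is illustrative only: `isEarlyError` and the `results` object are hypothetical names, not part of the specification, and the two checks that pass the `u` flag do so because Annex B relaxes those rules for patterns without `u` or `v`.

```javascript
// Returns true when constructing the pattern raises a SyntaxError.
function isEarlyError(source, flags = "") {
  try {
    new RegExp(source, flags);
    return false;
  } catch (err) {
    if (err instanceof SyntaxError) return true;
    throw err; // anything else is unexpected
  }
}

const results = {
  // QuantifierPrefix :: { DecimalDigits , DecimalDigits } with first MV > second MV
  quantifierOutOfOrder: isEarlyError("a{2,1}"),
  // NonemptyClassRanges: first CharacterValue > second CharacterValue
  classRangeOutOfOrder: isEarlyError("[b-a]"),
  // Two GroupSpecifiers with the same name that might both participate
  duplicateGroupName: isEarlyError("(?<n>a)(?<n>b)"),
  // AtomEscape :: k GroupName with no matching GroupSpecifier
  danglingNamedReference: isEarlyError("\\k<missing>", "u"),
  // AtomEscape :: DecimalEscape larger than the number of capturing groups
  backreferenceTooLarge: isEarlyError("(a)\\2", "u"),
  // A pattern that violates none of the rules above
  wellFormed: isEarlyError("(?<n>a)\\k<n>{1,2}[a-b]"),
};

console.log(results);
```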

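22.2.1.2 Static Semantics: CountLeftCapturingParensWithin ( `node` )

The abstract operation CountLeftCapturingParensWithin takes argument node (a Parse Node) and returns a non-negative integer. It returns the number of left-capturing parentheses in node. A left-capturing parenthesis is any ( pattern character that is matched by the ( terminal of the Atom :: ( GroupSpecifier_opt Disjunction ) production.

Note

This section is amended in B.1.2.2.

It performs the following steps when called:

Assert: node is an instance of a production in the RegExp Pattern grammar.
Return the number of Atom :: ( GroupSpecifier_opt Disjunction ) Parse Nodes contained within node.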
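
For a whole pattern, the same count can be recovered from script without parsing. The sketch below is an assumption-laden shortcut rather than the specified algorithm: `countCapturingGroups` is a hypothetical helper that appends an empty alternative so that the pattern is guaranteed to match the empty string, then reads the length of the resulting match array (one slot for the whole match plus one per capturing group).

```javascript
// Counts capturing groups by letting the engine build the match array.
// Appending "|" adds an empty alternative, so exec("") always succeeds.
function countCapturingGroups(source, flags = "") {
  const probe = new RegExp(source + "|", flags);
  return probe.exec("").length - 1;
}

console.log(countCapturingGroups("(a)(?:b)(?<n>c)((d))"));
console.log(countCapturingGroups("abc"));
console.log(countCapturingGroups("(?=(x))|(y)"));
```

Non-capturing groups and lookaround parentheses are not counted, while groups nested inside a lookaround are.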

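22.2.1.3 Static Semantics: CountLeftCapturingParensBefore ( `node` )

The abstract operation CountLeftCapturingParensBefore takes argument node (a Parse Node) and returns a non-negative integer. It returns the number of left-capturing parentheses within the enclosing pattern that occur to the left of node.

Note

This section is amended in B.1.2.2.

It performs the following steps when called:

Assert: node is an instance of a production in the RegExp Pattern grammar.
Let pattern be the Pattern containing node.
Return the number of Atom :: ( GroupSpecifier_opt Disjunction ) Parse Nodes contained within pattern that either occur before node or contain node.

22.2.1.4 Static Semantics: MightBothParticipate ( `x`, `y` )

The abstract operation MightBothParticipate takes arguments x (a Parse Node) and y (a Parse Node) and returns a Boolean. It performs the following steps when called:

Assert: x and y have the same enclosing Pattern.
If the enclosing Pattern contains a Disjunction :: Alternative | Disjunction Parse Node such that either x is contained within the Alternative and y is contained within the derived Disjunction, or x is contained within the derived Disjunction and y is contained within the Alternative, return false.
Return true.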

22.2.1.5 Static Semantics: CapturingGroupNumber

The syntax-directed operation CapturingGroupNumber takes no arguments and returns a positive integer.

Note

This section is amended in B.1.2.1.

It is defined piecewise over the following productions:

DecimalEscape :: NonZeroDigit

Return the MV of NonZeroDigit.

DecimalEscape :: NonZeroDigit DecimalDigits

Let n be the number of code points in DecimalDigits.
Return (the MV of NonZeroDigit × 10ⁿ plus the MV of DecimalDigits).

The definitions of “the MV of NonZeroDigit” and “the MV of DecimalDigits” are in 12.9.3.
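
The arithmetic in the second production can be sketched as follows. `capturingGroupNumber` is a hypothetical helper that takes the digits following the backslash as a string; it only illustrates the formula and does not perform the early-error check against the number of groups.

```javascript
// Computes (MV of NonZeroDigit × 10^n) + (MV of DecimalDigits),
// where n is the number of code points in DecimalDigits.
function capturingGroupNumber(decimalEscape) {
  const parts = /^([1-9])(\d*)$/.exec(decimalEscape);
  if (parts === null) throw new RangeError("not a DecimalEscape: " + decimalEscape);
  const nonZeroDigitMV = Number(parts[1]);
  if (parts[2] === "") return nonZeroDigitMV; // DecimalEscape :: NonZeroDigit
  const n = parts[2].length;
  return nonZeroDigitMV * 10 ** n + Number(parts[2]);
}

console.log(capturingGroupNumber("7"));
console.log(capturingGroupNumber("12"));
console.log(capturingGroupNumber("105"));
```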

22.2.1.6 Static Semantics: IsCharacterClass

The syntax-directed operation IsCharacterClass takes no arguments and returns a Boolean.

Note

This section is amended in B.1.2.3.

It is defined piecewise over the following productions:

ClassAtom :: -
ClassAtomNoDash :: SourceCharacter but not one of \ or ] or -
ClassEscape :: b
ClassEscape :: -
ClassEscape :: CharacterEscape

Return false.

ClassEscape :: CharacterClassEscape

Return true.

22.2.1.7 Static Semantics: CharacterValue

The syntax-directed operation CharacterValue takes no arguments and returns a non-negative integer.

Note 1

This section is amended in B.1.2.4.

It is defined piecewise over the following productions:

ClassAtom :: -

Return the numeric value of U+002D (HYPHEN-MINUS).

ClassAtomNoDash :: SourceCharacter but not one of \ or ] or -

Let ch be the code point matched by SourceCharacter.
Return the numeric value of ch.

ClassEscape :: b

Return the numeric value of U+0008 (BACKSPACE).

ClassEscape :: -

Return the numeric value of U+002D (HYPHEN-MINUS).

CharacterEscape :: ControlEscape

Return the numeric value according to Table 65.

Table 65: ControlEscape Code Point Values

ControlEscape	Numeric Value	Code Point	Unicode Name	Symbol
`t`	9	`U+0009`	CHARACTER TABULATION	<HT>
`n`	10	`U+000A`	LINE FEED (LF)	<LF>
`v`	11	`U+000B`	LINE TABULATION	<VT>
`f`	12	`U+000C`	FORM FEED (FF)	<FF>
`r`	13	`U+000D`	CARRIAGE RETURN (CR)	<CR>

CharacterEscape :: c AsciiLetter

Let ch be the code point matched by AsciiLetter.
Let i be the numeric value of ch.
Return the remainder of dividing i by 32.
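
As a small illustration of the `c AsciiLetter` rule, the remainder modulo 32 maps both the upper-case and lower-case form of a letter to the same control character. In the sketch below, `controlLetterValue` and `checks` are hypothetical names; each entry also confirms that the corresponding `\c` escape matches the computed character.

```javascript
// CharacterValue of CharacterEscape :: c AsciiLetter
function controlLetterValue(letter) {
  if (!/^[A-Za-z]$/.test(letter)) throw new RangeError("expected a single ASCII letter");
  return letter.codePointAt(0) % 32;
}

const checks = ["J", "j", "M", "I"].map((letter) => {
  const value = controlLetterValue(letter);
  // Build /^\cX$/ and test it against the character with that value.
  const matches = new RegExp("^\\c" + letter + "$").test(String.fromCharCode(value));
  return { letter, value, matches };
});

console.log(checks);
```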

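CharacterEscape :: 0 [lookahead ∉ DecimalDigit]

Return the numeric value of U+0000 (NULL).

Note 2

\0 represents the <NUL> character and cannot be followed by a decimal digit.

CharacterEscape :: HexEscapeSequence

Return the MV of HexEscapeSequence.

RegExpUnicodeEscapeSequence :: u HexLeadSurrogate \u HexTrailSurrogate

Let lead be the CharacterValue of HexLeadSurrogate.
Let trail be the CharacterValue of HexTrailSurrogate.
Let cp be UTF16SurrogatePairToCodePoint(lead, trail).
Return the numeric value of cp.

RegExpUnicodeEscapeSequence :: u Hex4Digits

Return the MV of Hex4Digits.

RegExpUnicodeEscapeSequence :: u{ CodePoint }

Return the MV of CodePoint.

HexLeadSurrogate :: Hex4Digits
HexTrailSurrogate :: Hex4Digits
HexNonSurrogate :: Hex4Digits

Return the MV of Hex4Digits.

CharacterEscape :: IdentityEscape

Let ch be the code point matched by IdentityEscape.
Return the numeric value of ch.

ClassSetCharacter :: SourceCharacter but not ClassSetSyntaxCharacter

Let ch be the code point matched by SourceCharacter.
Return the numeric value of ch.

ClassSetCharacter :: \ ClassSetReservedPunctuator

Let ch be the code point matched by ClassSetReservedPunctuator.
Return the numeric value of ch.

ClassSetCharacter :: \b

Return the numeric value of U+0008 (BACKSPACE).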

22.2.1.8 Static Semantics: MayContainStrings

The syntax-directed operation MayContainStrings takes no arguments and returns a Boolean. It is defined piecewise over the following productions:

CharacterClassEscape :: d
CharacterClassEscape :: D
CharacterClassEscape :: s
CharacterClassEscape :: S
CharacterClassEscape :: w
CharacterClassEscape :: W
CharacterClassEscape :: P{ UnicodePropertyValueExpression }
UnicodePropertyValueExpression :: UnicodePropertyName = UnicodePropertyValue
NestedClass :: [^ ClassContents ]
ClassContents :: [empty]
ClassContents :: NonemptyClassRanges
ClassSetOperand :: ClassSetCharacter

Return false.

UnicodePropertyValueExpression :: LoneUnicodePropertyNameOrValue

If the source text matched by LoneUnicodePropertyNameOrValue is a binary property of strings listed in the “Property name” column of Table 69, return true.
Return false.

ClassUnion :: ClassSetRange ClassUnion_opt

If the ClassUnion is present, return MayContainStrings of the ClassUnion.
Return false.

ClassUnion :: ClassSetOperand ClassUnion_opt

If MayContainStrings of the ClassSetOperand is true, return true.
If ClassUnion is present, return MayContainStrings of the ClassUnion.
Return false.

ClassIntersection :: ClassSetOperand && ClassSetOperand

If MayContainStrings of the first ClassSetOperand is false, return false.
If MayContainStrings of the second ClassSetOperand is false, return false.
Return true.

ClassIntersection :: ClassIntersection && ClassSetOperand

If MayContainStrings of the ClassIntersection is false, return false.
If MayContainStrings of the ClassSetOperand is false, return false.
Return true.

ClassSubtraction :: ClassSetOperand -- ClassSetOperand

Return MayContainStrings of the first ClassSetOperand.

ClassSubtraction :: ClassSubtraction -- ClassSetOperand

Return MayContainStrings of the ClassSubtraction.

ClassStringDisjunctionContents :: ClassString | ClassStringDisjunctionContents

If MayContainStrings of the ClassString is true, return true.
Return MayContainStrings of the ClassStringDisjunctionContents.

ClassString :: [empty]

Return true.

ClassString :: NonEmptyClassString

Return MayContainStrings of the NonEmptyClassString.

NonEmptyClassString :: ClassSetCharacter NonEmptyClassString_opt

If NonEmptyClassString is present, return true.
Return false.

22.2.1.9 Static Semantics: GroupSpecifiersThatMatch ( `thisGroupName` )

The abstract operation GroupSpecifiersThatMatch takes argument thisGroupName (a GroupName Parse Node) and returns a List of GroupSpecifier Parse Nodes. It performs the following steps when called:

Let name be the CapturingGroupName of thisGroupName.
Let pattern be the Pattern containing thisGroupName.
Let result be a new empty List.
For each GroupSpecifier gs that pattern contains, do
1. If the CapturingGroupName of gs is name, then
  1. Append gs to result.
Return result.

22.2.1.10 Static Semantics: CapturingGroupName

The syntax-directed operation CapturingGroupName takes no arguments and returns a String. It is defined piecewise over the following productions:

GroupName :: < RegExpIdentifierName >

Let idTextUnescaped be the RegExpIdentifierCodePoints of RegExpIdentifierName.
Return CodePointsToString(idTextUnescaped).

22.2.1.11 Static Semantics: RegExpIdentifierCodePoints

The syntax-directed operation RegExpIdentifierCodePoints takes no arguments and returns a List of code points. It is defined piecewise over the following productions:

RegExpIdentifierName :: RegExpIdentifierStart

Let cp be the RegExpIdentifierCodePoint of RegExpIdentifierStart.
Return « cp ».

RegExpIdentifierName :: RegExpIdentifierName RegExpIdentifierPart

Let cps be the RegExpIdentifierCodePoints of the derived RegExpIdentifierName.
Let cp be the RegExpIdentifierCodePoint of RegExpIdentifierPart.
Return the list-concatenation of cps and « cp ».

22.2.1.12 Static Semantics: RegExpIdentifierCodePoint

The syntax-directed operation RegExpIdentifierCodePoint takes no arguments and returns a code point. It is defined piecewise over the following productions:

RegExpIdentifierStart :: IdentifierStartChar

Return the code point matched by IdentifierStartChar.

RegExpIdentifierPart :: IdentifierPartChar

Return the code point matched by IdentifierPartChar.

RegExpIdentifierStart :: \ RegExpUnicodeEscapeSequence
RegExpIdentifierPart :: \ RegExpUnicodeEscapeSequence

Return the code point whose numeric value is the CharacterValue of RegExpUnicodeEscapeSequence.

RegExpIdentifierStart :: UnicodeLeadSurrogate UnicodeTrailSurrogate
RegExpIdentifierPart :: UnicodeLeadSurrogate UnicodeTrailSurrogate

Let lead be the code unit whose numeric value is the numeric value of the code point matched by UnicodeLeadSurrogate.
Let trail be the code unit whose numeric value is the numeric value of the code point matched by UnicodeTrailSurrogate.
Return UTF16SurrogatePairToCodePoint(lead, trail).

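22.2.2 Pattern Semantics

A regular expression pattern is converted into an Abstract Closure using the process described below. An implementation is encouraged to use more efficient algorithms than the ones listed below, as long as the results are the same. The Abstract Closure is used as the value of a RegExp object's [[RegExpMatcher]] internal slot.

A Pattern is a BMP pattern if its associated flags contain neither a u nor a v. Otherwise, it is a Unicode pattern. A BMP pattern matches against a String interpreted as consisting of a sequence of 16-bit values that are Unicode code points in the range of the Basic Multilingual Plane. A Unicode pattern matches against a String interpreted as consisting of Unicode code points encoded using UTF-16. In the context of describing the behaviour of a BMP pattern, “character” means a single 16-bit Unicode BMP code point. In the context of describing the behaviour of a Unicode pattern, “character” means a UTF-16 encoded code point (6.1.4). In either context, “character value” means the numeric value of the corresponding non-encoded code point.

The syntax and semantics of Pattern are defined as if the source text for the Pattern were a List of SourceCharacter values where each SourceCharacter corresponds to a Unicode code point. If a BMP pattern contains a non-BMP SourceCharacter, the entire pattern is encoded using UTF-16 and the individual code units of that encoding are used as the elements of the List.

Note

For example, consider a pattern expressed in source text as the single non-BMP character U+1D11E (MUSICAL SYMBOL G CLEF). Interpreted as a Unicode pattern, it would be a single-element (character) List consisting of the single code point U+1D11E. However, interpreted as a BMP pattern, it is first UTF-16 encoded to produce a two-element List consisting of the code units 0xD834 and 0xDD1E.

Patterns are passed to the RegExp constructor as ECMAScript String values in which non-BMP characters are UTF-16 encoded. For example, the single character MUSICAL SYMBOL G CLEF pattern, expressed as a String value, is a String of length 2 whose elements are the code units 0xD834 and 0xDD1E. So no further translation of the string is necessary to process it as a BMP pattern consisting of two pattern characters. However, to process it as a Unicode pattern, UTF16SurrogatePairToCodePoint must be used in producing a List whose sole element is a single pattern character, the code point U+1D11E.

An implementation may not actually perform such translations to or from UTF-16, but the semantics of this specification requires that the result of pattern matching be as if such translations were performed.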
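
The difference between the two interpretations is observable from script. The following sketch is illustrative: `utf16SurrogatePairToCodePoint` is a local JavaScript function written for this example (it mirrors the arithmetic of the abstract operation of the same name but is not the specification's definition), and `summary` is a hypothetical name.

```javascript
const clef = "\u{1D11E}"; // MUSICAL SYMBOL G CLEF

// Combines a lead and a trail surrogate code unit into one code point.
function utf16SurrogatePairToCodePoint(lead, trail) {
  return (lead - 0xd800) * 0x400 + (trail - 0xdc00) + 0x10000;
}

const summary = {
  stringLength: clef.length, // number of UTF-16 code units
  codeUnits: [clef.charCodeAt(0), clef.charCodeAt(1)],
  decoded: utf16SurrogatePairToCodePoint(clef.charCodeAt(0), clef.charCodeAt(1)),
  // BMP pattern: the input is two characters, so a single "." cannot span it.
  bmpOneCharacter: /^.$/.test(clef),
  bmpTwoCharacters: /^..$/.test(clef),
  // Unicode pattern: the surrogate pair is one character.
  unicodeOneCharacter: /^.$/u.test(clef),
};

console.log(summary);
```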

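22.2.2.1 Notation

The descriptions below use the following internal data structures:

A CharSetElement is one of the two following entities:
- If rer.[[UnicodeSets]] is false, then a CharSetElement is a character in the sense of the Pattern Semantics above.
- If rer.[[UnicodeSets]] is true, then a CharSetElement is a sequence whose elements are characters in the sense of the Pattern Semantics above. This includes the empty sequence, sequences of one character, and sequences of more than one character. For convenience, when working with CharSetElements of this kind, an individual character is treated interchangeably with a sequence of one character.
A CharSet is a mathematical set of CharSetElements.
A CaptureRange is a Record { [[StartIndex]], [[EndIndex]] } that represents the range of characters included in a capture, where [[StartIndex]] is an integer representing the start index (inclusive) of the range within Input, and [[EndIndex]] is an integer representing the end index (exclusive) of the range within Input. For any CaptureRange, these indices must satisfy the invariant that [[StartIndex]] ≤ [[EndIndex]].
A MatchState is a Record { [[Input]], [[EndIndex]], [[Captures]] } where [[Input]] is a List of characters representing the String being matched, [[EndIndex]] is an integer, and [[Captures]] is a List of values, one for each left-capturing parenthesis in the pattern. MatchStates are used to represent partial match states in the regular expression matching algorithms. The [[EndIndex]] is one plus the index of the last input character matched so far by the pattern, while [[Captures]] holds the results of capturing parentheses. The nth element of [[Captures]] is either a CaptureRange representing the range of characters captured by the nth set of capturing parentheses, or undefined if the nth set of capturing parentheses has not been reached yet. Due to backtracking, many MatchStates may be in use at any time during the matching process.
A MatcherContinuation is an Abstract Closure that takes one MatchState argument and returns either a MatchState or failure. The MatcherContinuation attempts to match the remaining portion (specified by the closure's captured values) of the pattern against Input, starting at the intermediate state given by its MatchState argument. If the match succeeds, the MatcherContinuation returns the final MatchState that it reached; if the match fails, the MatcherContinuation returns failure.
A Matcher is an Abstract Closure that takes two arguments (a MatchState and a MatcherContinuation) and returns either a MatchState or failure. A Matcher attempts to match a middle subpattern (specified by the closure's captured values) of the pattern against the MatchState's [[Input]], starting at the intermediate state given by its MatchState argument. The MatcherContinuation argument should be a closure that matches the rest of the pattern. After matching the subpattern to obtain a new MatchState, the Matcher calls the MatcherContinuation on that new MatchState to test whether the rest of the pattern can match as well. If it can, the Matcher returns the MatchState returned by the MatcherContinuation; if not, the Matcher may try different choices at its choice points, repeatedly calling the MatcherContinuation until it either succeeds or all possibilities have been exhausted.

22.2.2.1.1 RegExp Records

A RegExp Record is a Record value used to store information about a RegExp that is needed during compilation and possibly during matching.

It has the following fields:

Table 66: RegExp Record Fields

Field Name	Value	Meaning
`[[IgnoreCase]]`	a Boolean	indicates whether "i" appears in the RegExp's flags
`[[Multiline]]`	a Boolean	indicates whether "m" appears in the RegExp's flags
`[[DotAll]]`	a Boolean	indicates whether "s" appears in the RegExp's flags
`[[Unicode]]`	a Boolean	indicates whether "u" appears in the RegExp's flags
`[[UnicodeSets]]`	a Boolean	indicates whether "v" appears in the RegExp's flags
`[[CapturingGroupsCount]]`	a non-negative integer	the number of left-capturing parentheses in the RegExp's pattern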

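22.2.2.2 Runtime Semantics: CompilePattern

The syntax-directed operation CompilePattern takes argument rer (a RegExp Record) and returns an Abstract Closure that takes a List of characters and a non-negative integer and returns either a MatchState or failure. It is defined piecewise over the following productions:

Pattern :: Disjunction

Let m be CompileSubpattern of Disjunction with arguments rer and forward.
Return a new Abstract Closure with parameters (Input, index) that captures rer and m and performs the following steps when called:
1. Assert: Input is a List of characters.
2. Assert: 0 ≤ index ≤ the number of elements in Input.
3. Let c be a new MatcherContinuation with parameters (y) that captures nothing and performs the following steps when called:
  1. Assert: y is a MatchState.
  2. Return y.
4. Let cap be a List of rer.[[CapturingGroupsCount]] undefined values, indexed 1 through rer.[[CapturingGroupsCount]].
5. Let x be the MatchState { [[Input]]: Input, [[EndIndex]]: index, [[Captures]]: cap }.
6. Return m(x, c).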
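
The shape of the closure returned for Pattern :: Disjunction can be sketched in JavaScript. This is a minimal model under stated assumptions, not an engine implementation: a MatchState is a plain object `{ input, endIndex, captures }`, failure is modelled as `null`, and `charMatcher`, `compilePattern`, and `FAILURE` are hypothetical names introduced for the example.

```javascript
const FAILURE = null;

// Hypothetical helper: a Matcher that consumes exactly one given character
// (forward direction only).
function charMatcher(ch) {
  return (x, c) => {
    if (x.endIndex >= x.input.length || x.input[x.endIndex] !== ch) return FAILURE;
    return c({ ...x, endIndex: x.endIndex + 1 });
  };
}

// Models the closure returned by CompilePattern for an already compiled Matcher m.
function compilePattern(m, capturingGroupsCount) {
  return (input, index) => {
    const c = (y) => y; // final continuation: the match is complete
    // Slot 0 is unused so that captures can be indexed 1..capturingGroupsCount.
    const captures = new Array(capturingGroupsCount + 1).fill(undefined);
    const x = { input, endIndex: index, captures };
    return m(x, c);
  };
}

const matchA = compilePattern(charMatcher("a"), 0);
const input = Array.from("ba");
const atOne = matchA(input, 1); // succeeds: "a" is at index 1
const atZero = matchA(input, 0); // fails: "b" is at index 0

console.log(atOne, atZero);
```

The closure tests a match at exactly the given index; scanning through the input for the first matching index is the caller's job.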

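Note

A Pattern compiles to an Abstract Closure value. RegExpBuiltinExec can then apply this procedure to a List of characters and an offset within that List to determine whether the pattern would match starting at exactly that offset within the List, and, if it does match, what the values of the capturing parentheses would be. The algorithms in 22.2.2 are designed so that compiling a pattern may throw a SyntaxError exception; on the other hand, once the pattern is successfully compiled, applying the resulting Abstract Closure to find a match in a List of characters cannot throw an exception (except for any implementation-defined exceptions that can occur anywhere, such as out-of-memory).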

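22.2.2.3 Runtime Semantics: CompileSubpattern

The syntax-directed operation CompileSubpattern takes arguments rer (a RegExp Record) and direction (forward or backward) and returns a Matcher.

Note 1

This section is amended in B.1.2.5.

It is defined piecewise over the following productions:

Disjunction :: Alternative | Disjunction

Let m1 be CompileSubpattern of Alternative with arguments rer and direction.
Let m2 be CompileSubpattern of Disjunction with arguments rer and direction.
Return MatchTwoAlternatives(m1, m2).

Note 2

The | regular expression operator separates two alternatives. The pattern first tries to match the left Alternative (followed by the sequel of the regular expression); if it fails, it tries to match the right Disjunction (followed by the sequel of the regular expression). If the left Alternative, the right Disjunction, and the sequel all have choice points, all choices in the sequel are tried before moving on to the next choice in the left Alternative. If choices in the left Alternative are exhausted, the right Disjunction is tried instead of the left Alternative. Any capturing parentheses inside a portion of the pattern skipped by | produce undefined values instead of Strings. Thus, for example,

/a|ab/.exec("abc")

returns the result "a" and not "ab". Moreover,

/((a)|(ab))((c)|(bc))/.exec("abc")

returns the array

["abc", "a", "a", undefined, "bc", undefined, "bc"]

and not

["abc", "ab", undefined, "ab", "c", "c", undefined]

The order in which the two alternatives are tried is independent of the value of direction.

Alternative :: [empty]

Return EmptyMatcher().

Alternative :: Alternative Term

Let m1 be CompileSubpattern of Alternative with arguments rer and direction.
Let m2 be CompileSubpattern of Term with arguments rer and direction.
Return MatchSequence(m1, m2, direction).

Note 3

Consecutive Terms try to simultaneously match consecutive portions of Input. When direction is forward, if the left Alternative, the right Term, and the sequel of the regular expression all have choice points, all choices in the sequel are tried before moving on to the next choice in the right Term, and all choices in the right Term are tried before moving on to the next choice in the left Alternative. When direction is backward, the evaluation order of Alternative and Term is reversed.

Term :: Assertion

Return CompileAssertion of Assertion with argument rer.

Note 4

The resulting Matcher is independent of direction.

Term :: Atom

Return CompileAtom of Atom with arguments rer and direction.

Term :: Atom Quantifier

Let m be CompileAtom of Atom with arguments rer and direction.
Let q be CompileQuantifier of Quantifier.
Assert: q.[[Min]] ≤ q.[[Max]].
Let parenIndex be CountLeftCapturingParensBefore(Term).
Let parenCount be CountLeftCapturingParensWithin(Atom).
Return a new Matcher with parameters (x, c) that captures m, q, parenIndex, and parenCount and performs the following steps when called:
1. Assert: x is a MatchState.
2. Assert: c is a MatcherContinuation.
3. Return RepeatMatcher(m, q.[[Min]], q.[[Max]], q.[[Greedy]], x, c, parenIndex, parenCount).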

22.2.2.3.1 RepeatMatcher ( `m`, `min`, `max`, `greedy`, `x`, `c`, `parenIndex`, `parenCount` )

The abstract operation RepeatMatcher takes arguments m (a Matcher), min (a non-negative integer), max (a non-negative integer or +∞), greedy (a Boolean), x (a MatchState), c (a MatcherContinuation), parenIndex (a non-negative integer), and parenCount (a non-negative integer) and returns either a MatchState or failure. It performs the following steps when called:

If max = 0, return c(x).
Let d be a new MatcherContinuation with parameters (y) that captures m, min, max, greedy, x, c, parenIndex, and parenCount and performs the following steps when called:
1. Assert: y is a MatchState.
2. If min = 0 and y.[[EndIndex]] = x.[[EndIndex]], return failure.
3. If min = 0, let min2 be 0; otherwise let min2 be min - 1.
4. If max = +∞, let max2 be +∞; otherwise let max2 be max - 1.
5. Return RepeatMatcher(m, min2, max2, greedy, y, c, parenIndex, parenCount).
Let cap be a copy of x.[[Captures]].
For each integer k in the inclusive interval from parenIndex + 1 to parenIndex + parenCount, set cap[k] to undefined.
Let Input be x.[[Input]].
Let e be x.[[EndIndex]].
Let xr be the MatchState { [[Input]]: Input, [[EndIndex]]: e, [[Captures]]: cap }.
If min ≠ 0, return m(xr, d).
If greedy is false, then
1. Let z be c(x).
2. If z is not failure, return z.
3. Return m(xr, d).
Let z be m(xr, d).
If z is not failure, return z.
Return c(x).
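
The steps above translate almost line for line into a recursive function. The following is a sketch under stated assumptions rather than an engine implementation: MatchStates are plain objects `{ input, endIndex, captures }`, failure is `null`, +∞ is `Infinity`, and `charMatcher`, `repeatMatcher`, `FAILURE`, and the sample variables are hypothetical names.

```javascript
const FAILURE = null;

// Hypothetical helper: a Matcher accepting one character for which test(ch) holds
// (forward direction only).
function charMatcher(test) {
  return (x, c) => {
    const ch = x.input[x.endIndex];
    if (ch === undefined || !test(ch)) return FAILURE;
    return c({ ...x, endIndex: x.endIndex + 1 });
  };
}

function repeatMatcher(m, min, max, greedy, x, c, parenIndex, parenCount) {
  if (max === 0) return c(x);
  const d = (y) => {
    // Once min is satisfied, an iteration that consumed nothing is rejected.
    if (min === 0 && y.endIndex === x.endIndex) return FAILURE;
    const min2 = min === 0 ? 0 : min - 1;
    const max2 = max === Infinity ? Infinity : max - 1;
    return repeatMatcher(m, min2, max2, greedy, y, c, parenIndex, parenCount);
  };
  // Clear the captures that belong to the repeated Atom.
  const captures = x.captures.slice();
  for (let k = parenIndex + 1; k <= parenIndex + parenCount; k++) captures[k] = undefined;
  const xr = { input: x.input, endIndex: x.endIndex, captures };
  if (min !== 0) return m(xr, d);
  if (!greedy) {
    const z = c(x); // try to stop repeating first
    if (z !== FAILURE) return z;
    return m(xr, d);
  }
  const z = m(xr, d); // try one more repetition first
  if (z !== FAILURE) return z;
  return c(x);
}

// Models [a-z]{2,4} and [a-z]{2,4}? applied at index 1 of "abcdefghi".
const lower = charMatcher((ch) => ch >= "a" && ch <= "z");
const start = { input: Array.from("abcdefghi"), endIndex: 1, captures: [undefined] };
const identity = (y) => y;
const greedyEnd = repeatMatcher(lower, 2, 4, true, start, identity, 0, 0).endIndex;
const lazyEnd = repeatMatcher(lower, 2, 4, false, start, identity, 0, 0).endIndex;

console.log(greedyEnd, lazyEnd);
```

With the identity continuation as the sequel, the greedy form stops after four repetitions and the non-greedy form after two.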

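Note 1

An Atom followed by a Quantifier is repeated the number of times specified by the Quantifier. A Quantifier can be non-greedy, in which case the Atom pattern is repeated as few times as possible while still matching the sequel, or it can be greedy, in which case the Atom pattern is repeated as many times as possible while still matching the sequel. The Atom pattern is repeated rather than the input character sequence that it matches, so different repetitions of the Atom can match different input substrings.

Note 2

If the Atom and the sequel of the regular expression all have choice points, the Atom is first matched as many (or as few, if non-greedy) times as possible. All choices in the sequel are tried before moving on to the next choice in the last repetition of Atom. All choices in the last (nth) repetition of Atom are tried before moving on to the next choice in the next-to-last (n - 1)st repetition of Atom; at which point it may turn out that more or fewer repetitions of Atom are now possible; these are exhausted (again, starting with either as few or as many as possible) before moving on to the next choice in the (n - 1)st repetition of Atom, and so on.

Compare

/a[a-z]{2,4}/.exec("abcdefghi")

which returns "abcde" with

/a[a-z]{2,4}?/.exec("abcdefghi")

which returns "abc".

Consider also

/(aa|aabaac|ba|b|c)*/.exec("aabaac")

which, by the choice point ordering above, returns the array

["aaba", "ba"]

and not any of:

["aabaac", "aabaac"]
["aabaac", "c"]

The above ordering of choice points can be used to write a regular expression that calculates the greatest common divisor of two numbers (represented in unary notation). The following example calculates the gcd of 10 and 15:

"aaaaaaaaaa,aaaaaaaaaaaaaaa".replace(/^(a+)\1*,\1+$/, "$1")

which returns the gcd in unary notation "aaaaa".

Note 3

The fourth top-level step of RepeatMatcher (step 4) clears Atom's captures each time Atom is repeated. We can see its behaviour in the regular expression

/(z)((a+)?(b+)?(c))*/.exec("zaacbbbcac")

which returns the array

["zaacbbbcac", "z", "ac", "a", undefined, "c"]

and not

["zaacbbbcac", "z", "ac", "a", "bbb", "c"]

because each iteration of the outermost * clears all captured Strings contained in the quantified Atom, which in this case includes capture Strings numbered 2, 3, 4, and 5.

Note 4

Step 2.b of RepeatMatcher (the second substep of the second step) states that once the minimum number of repetitions has been satisfied, any more expansions of Atom that match the empty character sequence are not considered for further repetitions. This prevents the regular expression engine from falling into an infinite loop on patterns such as:

/(a*)*/.exec("b")

or the slightly more complicated:

/(a*)b\1+/.exec("baaaac")

which returns the array

["b", ""]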

22.2.2.3.2 EmptyMatcher ( )

The abstract operation EmptyMatcher takes no arguments and returns a Matcher. It performs the following steps when called:

Return a new Matcher with parameters (x, c) that captures nothing and performs the following steps when called:
1. Assert: x is a MatchState.
2. Assert: c is a MatcherContinuation.
3. Return c(x).

22.2.2.3.3 MatchTwoAlternatives ( `m1`, `m2` )

The abstract operation MatchTwoAlternatives takes arguments m1 (a Matcher) and m2 (a Matcher) and returns a Matcher. It performs the following steps when called:

Return a new Matcher with parameters (x, c) that captures m1 and m2 and performs the following steps when called:
1. Assert: x is a MatchState.
2. Assert: c is a MatcherContinuation.
3. Let r be m1(x, c).
4. If r is not failure, return r.
5. Return m2(x, c).

22.2.2.3.4 MatchSequence ( `m1`, `m2`, `direction` )

The abstract operation MatchSequence takes arguments m1 (a Matcher), m2 (a Matcher), and direction (forward or backward) and returns a Matcher. It performs the following steps when called:

If direction is forward, then
1. Return a new Matcher with parameters (x, c) that captures m1 and m2 and performs the following steps when called:
  1. Assert: x is a MatchState.
  2. Assert: c is a MatcherContinuation.
  3. Let d be a new MatcherContinuation with parameters (y) that captures c and m2 and performs the following steps when called:
    1. Assert: y is a MatchState.
    2. Return m2(y, c).
  4. Return m1(x, d).
Else,
1. Assert: direction is backward.
2. Return a new Matcher with parameters (x, c) that captures m1 and m2 and performs the following steps when called:
  1. Assert: x is a MatchState.
  2. Assert: c is a MatcherContinuation.
  3. Let d be a new MatcherContinuation with parameters (y) that captures c and m1 and performs the following steps when called:
    1. Assert: y is a MatchState.
    2. Return m1(y, c).
  4. Return m2(x, d).
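
These three combinators are small enough to model directly. The sketch below assumes the same simplified representation as before (MatchStates as plain objects, failure as `null`); `charMatcher` is a hypothetical forward-only helper, and all names are illustrative rather than taken from any engine.

```javascript
const FAILURE = null;

// Hypothetical helper: consumes one specific character, moving forward.
function charMatcher(ch) {
  return (x, c) => {
    if (x.input[x.endIndex] !== ch) return FAILURE;
    return c({ ...x, endIndex: x.endIndex + 1 });
  };
}

function emptyMatcher() {
  return (x, c) => c(x);
}

function matchTwoAlternatives(m1, m2) {
  return (x, c) => {
    const r = m1(x, c); // left alternative together with the sequel
    if (r !== FAILURE) return r;
    return m2(x, c); // only tried when the left side cannot lead to a match
  };
}

function matchSequence(m1, m2, direction) {
  if (direction === "forward") {
    return (x, c) => m1(x, (y) => m2(y, c));
  }
  // backward: the right-hand matcher runs first
  return (x, c) => m2(x, (y) => m1(y, c));
}

const identity = (y) => y;
const state = (text) => ({ input: Array.from(text), endIndex: 0, captures: [undefined] });

// Models /a|ab/: the first alternative wins when the sequel accepts it.
const aOrAb = matchTwoAlternatives(
  charMatcher("a"),
  matchSequence(charMatcher("a"), charMatcher("b"), "forward")
);
// Models /(?:a|ab)c/: the sequel "c" forces backtracking into the second alternative.
const thenC = matchSequence(aOrAb, charMatcher("c"), "forward");

const firstChoiceEnd = aOrAb(state("abc"), identity).endIndex;
const backtrackedEnd = thenC(state("abc"), identity).endIndex;
const emptyThenA = matchSequence(emptyMatcher(), charMatcher("a"), "forward")(state("a"), identity).endIndex;

console.log(firstChoiceEnd, backtrackedEnd, emptyThenA);
```

Backtracking falls out of the continuation-passing style: a Matcher that receives failure from its continuation simply tries its next choice.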

22.2.2.4 Runtime Semantics: CompileAssertion

The syntax-directed operation CompileAssertion takes argument rer (a RegExp Record) and returns a Matcher.

Note 1

This section is amended in B.1.2.6.

It is defined piecewise over the following productions:

Assertion :: ^

Return a new Matcher with parameters (x, c) that captures rer and performs the following steps when called:
1. Assert: x is a MatchState.
2. Assert: c is a MatcherContinuation.
3. Let Input be x.[[Input]].
4. Let e be x.[[EndIndex]].
5. If e = 0, or if rer.[[Multiline]] is true and the character Input[e - 1] is matched by LineTerminator, then
  1. Return c(x).
6. Return failure.

Note 2

Even when the y flag is used with a pattern, ^ always matches only at the beginning of Input, or (if rer.[[Multiline]] is true) at the beginning of a line.
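
The interaction between ^ and the y flag can be checked directly. In this illustrative sketch, `execAt` is a hypothetical helper that sets `lastIndex` before calling `exec`, so that the sticky match is attempted at a chosen position.

```javascript
// Attempts a sticky match at the given index and returns the matched text or null.
function execAt(regex, text, lastIndex) {
  regex.lastIndex = lastIndex;
  const match = regex.exec(text);
  return match === null ? null : match[0];
}

// Index 1 is not the beginning of Input, so ^ fails even though "b" is there.
const notAtStart = execAt(/^b/y, "ab", 1);
// Index 0 is the beginning of Input.
const atStart = execAt(/^a/y, "ab", 0);
// With the m flag, index 2 follows a LineTerminator, so ^ succeeds.
const atLineStart = execAt(/^b/my, "a\nb", 2);

console.log(notAtStart, atStart, atLineStart);
```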

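Assertion :: $

Return a new Matcher with parameters (x, c) that captures rer and performs the following steps when called:
1. Assert: x is a MatchState.
2. Assert: c is a MatcherContinuation.
3. Let Input be x.[[Input]].
4. Let e be x.[[EndIndex]].
5. Let InputLength be the number of elements in Input.
6. If e = InputLength, or if rer.[[Multiline]] is true and the character Input[e] is matched by LineTerminator, then
  1. Return c(x).
7. Return failure.

Assertion :: \b

Return a new Matcher with parameters (x, c) that captures rer and performs the following steps when called:
1. Assert: x is a MatchState.
2. Assert: c is a MatcherContinuation.
3. Let Input be x.[[Input]].
4. Let e be x.[[EndIndex]].
5. Let a be IsWordChar(rer, Input, e - 1).
6. Let b be IsWordChar(rer, Input, e).
7. If a is true and b is false, or if a is false and b is true, return c(x).
8. Return failure.

Assertion :: \B

Return a new Matcher with parameters (x, c) that captures rer and performs the following steps when called:
1. Assert: x is a MatchState.
2. Assert: c is a MatcherContinuation.
3. Let Input be x.[[Input]].
4. Let e be x.[[EndIndex]].
5. Let a be IsWordChar(rer, Input, e - 1).
6. Let b be IsWordChar(rer, Input, e).
7. If a is true and b is true, or if a is false and b is false, return c(x).
8. Return failure.

Assertion :: (?= Disjunction )

Let m be CompileSubpattern of Disjunction with arguments rer and forward.
Return a new Matcher with parameters (x, c) that captures m and performs the following steps when called:
1. Assert: x is a MatchState.
2. Assert: c is a MatcherContinuation.
3. Let d be a new MatcherContinuation with parameters (y) that captures nothing and performs the following steps when called:
  1. Assert: y is a MatchState.
  2. Return y.
4. Let r be m(x, d).
5. If r is failure, return failure.
6. Assert: r is a MatchState.
7. Let cap be r.[[Captures]].
8. Let Input be x.[[Input]].
9. Let xe be x.[[EndIndex]].
10. Let z be the MatchState { [[Input]]: Input, [[EndIndex]]: xe, [[Captures]]: cap }.
11. Return c(z).

Note 3

The form (?= Disjunction ) specifies a zero-width positive lookahead. In order for it to succeed, the pattern inside Disjunction must match at the current position, but the current position is not advanced before matching the sequel. If Disjunction can match at the current position in several ways, only the first one is tried. Unlike other regular expression operators, there is no backtracking into a (?= form (this unusual behaviour is inherited from Perl). This only matters when the Disjunction contains capturing parentheses and the sequel of the pattern contains backreferences to those captures.

For example,

/(?=(a+))/.exec("baaabac")

matches the empty String immediately after the first b and therefore returns the array:

["", "aaa"]

To illustrate the lack of backtracking into the lookahead, consider:

/(?=(a+))a*b\1/.exec("baaabac")

This expression returns

["aba", "a"]

and not:

["aaaba", "a"]

Assertion :: (?! Disjunction )

Let m be CompileSubpattern of Disjunction with arguments rer and forward.
Return a new Matcher with parameters (x, c) that captures m and performs the following steps when called:
1. Assert: x is a MatchState.
2. Assert: c is a MatcherContinuation.
3. Let d be a new MatcherContinuation with parameters (y) that captures nothing and performs the following steps when called:
  1. Assert: y is a MatchState.
  2. Return y.
4. Let r be m(x, d).
5. If r is not failure, return failure.
6. Return c(x).

Note 4

The form (?! Disjunction ) specifies a zero-width negative lookahead. In order for it to succeed, the pattern inside Disjunction must fail to match at the current position. The current position is not advanced before matching the sequel. Disjunction can contain capturing parentheses, but backreferences to them only make sense from within Disjunction itself. Backreferences to these capturing parentheses from elsewhere in the pattern always return undefined, because the negative lookahead must fail for the pattern to succeed. For example,

/(.*?)a(?!(a+)b\2c)\2(.*)/.exec("baaabaac")

looks for an a not immediately followed by some positive number n of a's, a b, another n a's (specified by the first \2), and a c. The second \2 is outside the negative lookahead, so it matches against undefined and therefore always succeeds. The whole expression returns the array:

["baaabaac", "ba", undefined, "abaac"]

Assertion :: (?<= Disjunction )

Let m be CompileSubpattern of Disjunction with arguments rer and backward.
Return a new Matcher with parameters (x, c) that captures m and performs the following steps when called:
1. Assert: x is a MatchState.
2. Assert: c is a MatcherContinuation.
3. Let d be a new MatcherContinuation with parameters (y) that captures nothing and performs the following steps when called:
  1. Assert: y is a MatchState.
  2. Return y.
4. Let r be m(x, d).
5. If r is failure, return failure.
6. Assert: r is a MatchState.
7. Let cap be r.[[Captures]].
8. Let Input be x.[[Input]].
9. Let xe be x.[[EndIndex]].
10. Let z be the MatchState { [[Input]]: Input, [[EndIndex]]: xe, [[Captures]]: cap }.
11. Return c(z).

Assertion :: (?<! Disjunction )

Let m be CompileSubpattern of Disjunction with arguments rer and backward.
Return a new Matcher with parameters (x, c) that captures m and performs the following steps when called:
1. Assert: x is a MatchState.
2. Assert: c is a MatcherContinuation.
3. Let d be a new MatcherContinuation with parameters (y) that captures nothing and performs the following steps when called:
  1. Assert: y is a MatchState.
  2. Return y.
4. Let r be m(x, d).
5. If r is not failure, return failure.
6. Return c(x).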
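
Because lookbehind bodies are compiled with direction backward, quantifiers inside them consume input from right to left. The sketch below contrasts this with forward matching; the variable names and sample strings are assumptions chosen for the example.

```javascript
// Forward: the first group is greedy first and leaves one digit for the second.
const forward = /^(\d+)(\d+)/.exec("1053");
// Backward: inside the lookbehind the second group is matched first,
// so it takes as much as it can and leaves one digit for the first group.
const backward = /(?<=(\d+)(\d+))$/.exec("1053");

// Positive lookbehind: digits that directly follow a dollar sign.
const price = /(?<=\$)\d+/.exec("cost: $42 or 17");
// Negative lookbehind: a number that does not directly follow a dollar sign.
const notPrice = /(?<!\$)\b\d+/.exec("cost: $42 or 17");

console.log(forward.slice(), backward.slice(), price[0], notPrice[0]);
```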

22.2.2.4.1 IsWordChar ( `rer`, `Input`, `e` )

The abstract operation IsWordChar takes arguments rer (a RegExp Record), Input (a List of characters), and e (an integer) and returns a Boolean. It performs the following steps when called:

Let InputLength be the number of elements in Input.
If e = -1 or e = InputLength, return false.
Let c be the character Input[e].
If WordCharacters(rer) contains c, return true.
Return false.
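
The \b and \B assertions reduce to comparing IsWordChar on either side of a position. The sketch below is a simplification: the hypothetical `isWordChar` assumes that the word characters are exactly the basic set `A-Z`, `a-z`, `0-9`, and `_`, which ignores any extra characters that WordCharacters(rer) may add under case-insensitive Unicode matching.

```javascript
// Simplified IsWordChar: out-of-range positions are never word characters.
function isWordChar(input, e) {
  if (e === -1 || e === input.length) return false;
  return /^[A-Za-z0-9_]$/.test(input[e]);
}

// Positions where \b would succeed: exactly one neighbour is a word character.
function wordBoundaries(text) {
  const input = Array.from(text);
  const positions = [];
  for (let e = 0; e <= input.length; e++) {
    if (isWordChar(input, e - 1) !== isWordChar(input, e)) positions.push(e);
  }
  return positions;
}

const boundaries = wordBoundaries("hello world");
const markedBoundaries = "hello world".replace(/\b/g, "|");
const markedNonBoundaries = "hello".replace(/\B/g, "-");

console.log(boundaries, markedBoundaries, markedNonBoundaries);
```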

22.2.2.5 Runtime Semantics: CompileQuantifier

The syntax-directed operation CompileQuantifier takes no arguments and returns a Record with fields [[Min]] (a non-negative integer), [[Max]] (a non-negative integer or +∞), and [[Greedy]] (a Boolean). It is defined piecewise over the following productions:

Quantifier :: QuantifierPrefix

Let qp be CompileQuantifierPrefix of QuantifierPrefix.
Return the Record { [[Min]]: qp.[[Min]], [[Max]]: qp.[[Max]], [[Greedy]]: true }.

Quantifier :: QuantifierPrefix ?

Let qp be CompileQuantifierPrefix of QuantifierPrefix.
Return the Record { [[Min]]: qp.[[Min]], [[Max]]: qp.[[Max]], [[Greedy]]: false }.

22.2.2.6 Runtime Semantics: CompileQuantifierPrefix

The syntax-directed operation CompileQuantifierPrefix takes no arguments and returns a Record with fields [[Min]] (a non-negative integer) and [[Max]] (a non-negative integer or +∞). It is defined piecewise over the following productions:

QuantifierPrefix :: *

Return the Record { [[Min]]: 0, [[Max]]: +∞ }.

QuantifierPrefix :: +

Return the Record { [[Min]]: 1, [[Max]]: +∞ }.

QuantifierPrefix :: ?

Return the Record { [[Min]]: 0, [[Max]]: 1 }.

QuantifierPrefix :: { DecimalDigits }

Let i be the MV of DecimalDigits (see 12.9.3).
Return the Record { [[Min]]: i, [[Max]]: i }.

QuantifierPrefix :: { DecimalDigits ,}

Let i be the MV of DecimalDigits.
Return the Record { [[Min]]: i, [[Max]]: +∞ }.

QuantifierPrefix :: { DecimalDigits , DecimalDigits }

Let i be the MV of the first DecimalDigits.
Let j be the MV of the second DecimalDigits.
Return the Record { [[Min]]: i, [[Max]]: j }.
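
The two operations can be combined into one small function that maps quantifier source text to its Record. This is an illustrative sketch: `compileQuantifier` is a hypothetical helper that recognizes the quantifier with a regular expression of its own rather than through the grammar, represents +∞ as `Infinity`, and folds in the early-error check for out-of-order bounds.

```javascript
// Maps quantifier text such as "*", "+?", "{2,4}" to { min, max, greedy }.
function compileQuantifier(text) {
  const parts = /^(?:([*+?])|\{(\d+)(?:(,)(\d*))?\})(\?)?$/.exec(text);
  if (parts === null) throw new SyntaxError("not a quantifier: " + text);
  const greedy = parts[5] === undefined; // a trailing ? makes it non-greedy
  if (parts[1] === "*") return { min: 0, max: Infinity, greedy };
  if (parts[1] === "+") return { min: 1, max: Infinity, greedy };
  if (parts[1] === "?") return { min: 0, max: 1, greedy };
  const min = Number(parts[2]);
  let max;
  if (parts[3] === undefined) max = min; // {n}
  else if (parts[4] === "") max = Infinity; // {n,}
  else max = Number(parts[4]); // {n,m}
  if (min > max) throw new SyntaxError("numbers out of order in quantifier");
  return { min, max, greedy };
}

console.log(compileQuantifier("*"));
console.log(compileQuantifier("{3}"));
console.log(compileQuantifier("{3,}"));
console.log(compileQuantifier("{2,4}?"));
```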

22.2.2.7 Runtime Semantics: CompileAtom

The syntax-directed operation CompileAtom takes arguments rer (a RegExp Record) and direction (forward or backward) and returns a Matcher.

Note 1

This section is amended in B.1.2.7.

It is defined piecewise over the following productions:

Atom :: PatternCharacter

Let ch be the character matched by PatternCharacter.
Let A be a one-element CharSet containing the character ch.
Return CharacterSetMatcher(rer, A, false, direction).

Atom :: .

Let A be AllCharacters(rer).
If rer.[[DotAll]] is not true, then
1. Remove from A all characters corresponding to a code point on the right-hand side of the LineTerminator production.
Return CharacterSetMatcher(rer, A, false, direction).
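
The effect of [[DotAll]] on the `.` atom is easy to observe. In the sketch below, the list of line terminators is written out by hand from the LineTerminator production (LF, CR, LS, PS), and the variable names are illustrative.

```javascript
// The four code points matched by LineTerminator.
const lineTerminators = ["\n", "\r", "\u2028", "\u2029"];

// Without the s flag, "." excludes every line terminator.
const withoutDotAll = lineTerminators.map((lt) => /^.$/.test(lt));
// With the s flag, nothing is removed from the character set.
const withDotAll = lineTerminators.map((lt) => /^.$/s.test(lt));

console.log(withoutDotAll, withDotAll);
console.log(/a.b/.test("a\nb"), /a.b/s.test("a\nb"));
```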

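Atom :: CharacterClass

Let cc be CompileCharacterClass of CharacterClass with argument rer.
Let cs be cc.[[CharSet]].
If rer.[[UnicodeSets]] is false, or if every CharSetElement of cs consists of a single character (including if cs is empty), return CharacterSetMatcher(rer, cs, cc.[[Invert]], direction).
Assert: cc.[[Invert]] is false.
Let lm be an empty List of Matchers.
For each CharSetElement s in cs containing more than 1 character, iterating in descending order of length, do
1. Let cs2 be a one-element CharSet containing the last code point of s.
2. Let m2 be CharacterSetMatcher(rer, cs2, false, direction).
3. For each code point c1 in s, iterating backwards from its second-to-last code point, do
  1. Let cs1 be a one-element CharSet containing c1.
  2. Let m1 be CharacterSetMatcher(rer, cs1, false, direction).
  3. Set m2 to MatchSequence(m1, m2, direction).
4. Append m2 to lm.
Let singles be the CharSet containing every CharSetElement of cs that consists of a single character.
Append CharacterSetMatcher(rer, singles, false, direction) to lm.
If cs contains the empty sequence of characters, append EmptyMatcher() to lm.
Let m2 be the last Matcher in lm.
For each Matcher m1 of lm, iterating backwards from its second-to-last element, do
1. Set m2 to MatchTwoAlternatives(m1, m2).
Return m2.

Atom :: ( GroupSpecifier_opt Disjunction )

Let m be CompileSubpattern of Disjunction with arguments rer and direction.
Let parenIndex be CountLeftCapturingParensBefore(Atom).
Return a new Matcher with parameters (x, c) that captures direction, m, and parenIndex and performs the following steps when called:
1. Assert: x is a MatchState.
2. Assert: c is a MatcherContinuation.
3. Let d be a new MatcherContinuation with parameters (y) that captures x, c, direction, and parenIndex and performs the following steps when called:
  1. Assert: y is a MatchState.
  2. Let cap be a copy of y.[[Captures]].
  3. Let Input be x.[[Input]].
  4. Let xe be x.[[EndIndex]].
  5. Let ye be y.[[EndIndex]].
  6. If direction is forward, then
    1. Assert: xe ≤ ye.
    2. Let r be the CaptureRange { [[StartIndex]]: xe, [[EndIndex]]: ye }.
  7. Else,
    1. Assert: direction is backward.
    2. Assert: ye ≤ xe.
    3. Let r be the CaptureRange { [[StartIndex]]: ye, [[EndIndex]]: xe }.
  8. Set cap[parenIndex + 1] to r.
  9. Let z be the MatchState { [[Input]]: Input, [[EndIndex]]: ye, [[Captures]]: cap }.
  10. Return c(z).
4. Return m(x, d).

Note 2

Parentheses of the form ( Disjunction ) serve both to group the components of the Disjunction pattern together and to save the result of the match. The result can be used either in a backreference (\ followed by a non-zero decimal number), referenced in a replace String, or returned as part of an array from the regular expression matching Abstract Closure. To inhibit the capturing behaviour of parentheses, use the form (?: Disjunction ) instead.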
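
The CaptureRange recorded for each group can be inspected from script through the `d` flag, which exposes start (inclusive) and end (exclusive) indices on the match result. The example below is a sketch that assumes an engine supporting the `d` flag; the pattern and input are arbitrary.

```javascript
// (b) captures, (?:c) only groups, and (d)? does not participate in this match.
const match = /a(b)(?:c)(d)?/d.exec("xabc");

const ranges = {
  whole: match.indices[0], // range of the entire match
  first: match.indices[1], // range captured by (b)
  second: match.indices[2], // undefined: the group captured nothing
};

console.log(match.slice(), ranges);
```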

Atom

RegularExpressionModifiers

Disjunction

)

令 addModifiers 为 RegularExpressionModifiers 匹配的源文本。
令 removeModifiers 为空字符串。
令 modifiedRer 为 UpdateModifiers(rer, CodePointsToString(addModifiers), removeModifiers)。
返回 Disjunction 的 CompileSubpattern，参数 modifiedRer 与 direction。

Atom

RegularExpressionModifiers

Disjunction

)

令 addModifiers 为第一个 RegularExpressionModifiers 匹配的源文本。
令 removeModifiers 为第二个 RegularExpressionModifiers 匹配的源文本。
令 modifiedRer 为 UpdateModifiers(rer, CodePointsToString(addModifiers), CodePointsToString(removeModifiers))。
返回 Disjunction 的 CompileSubpattern，参数 modifiedRer 与 direction。

AtomEscape

DecimalEscape

令 n 为 DecimalEscape 的 CapturingGroupNumber。
断言：n ≤ rer.[[CapturingGroupsCount]]。
返回 BackreferenceMatcher(rer, « n », direction)。

Note 3

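An escape sequence of the form \ followed by a non-zero decimal number n matches the result of the nth set of capturing parentheses (22.2.2.1). It is an error if the regular expression has fewer than n capturing parentheses. If the regular expression has n or more capturing parentheses but the nth one is undefined because it has not captured anything, then the backreference always succeeds.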
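
The behaviour described in this note can be observed directly. In the sketch below, the helper name `compileError` is hypothetical; it only reports the error name thrown by the RegExp constructor:

```javascript
// A backreference must repeat what the group captured.
console.log(/(a)\1/.test("aa")); // true
console.log(/(a)\1/.test("ab")); // false

// Group 1 did not participate in the match, so \1 succeeds trivially.
const m = /(?:(a)|b)\1/.exec("b");
console.log(m[0], m[1]); // "b" undefined

// Hypothetical helper: returns the error name, or null if the pattern compiles.
function compileError(source, flags) {
  try {
    new RegExp(source, flags);
    return null;
  } catch (e) {
    return e.name;
  }
}

// With the u flag, referring to a group that does not exist is an error.
console.log(compileError("(a)\\2", "u")); // "SyntaxError"
```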

AtomEscape

CharacterEscape

Let cv be the CharacterValue of CharacterEscape.
Let ch be the character whose character value is cv.
Let A be a one-element CharSet containing the character ch.
Return CharacterSetMatcher(rer, A, false, direction).

AtomEscape

CharacterClassEscape

令 cs 为 CharacterClassEscape 的 CompileToCharSet，参数 rer。
若 rer.[[UnicodeSets]] 为 false，或 cs 的每个 CharSetElement 都是单字符（含空），返回 CharacterSetMatcher(rer, cs, false, direction)。
令 lm 为空 Matcher 列表。
对 cs 中每个含多于 1 字符的 CharSetElement s（按长度降序）：
1. 令 cs2 为仅含 s 最后代码点的单元素 CharSet。
2. 令 m2 为 CharacterSetMatcher(rer, cs2, false, direction)。
3. 对 s 中每个代码点 c1（自倒数第二个向前）：
  1. 令 cs1 为仅含 c1 的 CharSet。
  2. 令 m1 为 CharacterSetMatcher(rer, cs1, false, direction)。
  3. 设 m2 为 MatchSequence(m1, m2, direction)。
4. 将 m2 追加入 lm。
令 singles 为包含 cs 中所有单字符元素的 CharSet。
将 CharacterSetMatcher(rer, singles, false, direction) 追加入 lm。
若 cs 含空序列，将 EmptyMatcher() 追加入 lm。
令 m2 为 lm 最末 Matcher。
对 lm 中每个 m1（自倒数第二向前）：
1. 设 m2 为 MatchTwoAlternatives(m1, m2)。
返回 m2。

AtomEscape :: k GroupName

令 matchingGroupSpecifiers 为 GroupSpecifiersThatMatch(GroupName)。
令 parenIndices 为新空列表。
对 matchingGroupSpecifiers 中每个 GroupSpecifier groupSpecifier：
1. 令 parenIndex 为 CountLeftCapturingParensBefore(groupSpecifier)。
2. 将 parenIndex 追加入 parenIndices。
返回 BackreferenceMatcher(rer, parenIndices, direction)。

22.2.2.7.1 CharacterSetMatcher ( `rer`, `A`, `invert`, `direction` )

The abstract operation CharacterSetMatcher takes arguments rer (a RegExp Record), A (a CharSet), invert (a Boolean), and direction (forward or backward) and returns a Matcher. It performs the following steps when called:

若 rer.[[UnicodeSets]] 为 true，则
1. 断言：invert 为 false。
2. 断言：A 的每个 CharSetElement 皆为单字符。
返回新的 Matcher，参数 (x, c)，捕获 rer、A、invert、direction，执行：
1. 断言：x 是 MatchState。
2. 断言：c 是 MatcherContinuation。
3. 令 Input 为 x.[[Input]]。
4. 令 e 为 x.[[EndIndex]]。
5. 若 direction 为 forward，令 f = e + 1。
6. 否则令 f = e - 1。
7. 令 InputLength 为 Input 元素数量。
8. 若 f < 0 或 f > InputLength，返回 failure。
9. 令 index = min(e, f)。
10. 令 ch 为字符 Input[index]。
11. 令 cc = Canonicalize(rer, ch)。
12. 若存在 A 中某 CharSetElement 含恰一字符 a 且 Canonicalize(rer, a) = cc，令 found = true；否则 found = false。
13. 若 invert = false 且 found = false，返回 failure。
14. 若 invert = true 且 found = true，返回 failure。
15. 令 cap 为 x.[[Captures]]。
16. 令 y 为 MatchState { [[Input]]: Input, [[EndIndex]]: f, [[Captures]]: cap }。
17. 返回 c(y)。
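
The steps above can be sketched in continuation-passing style. This is a simplified model, not an engine implementation: the names `characterSetMatcher`, `input`, and `endIndex` are assumptions for illustration, captures are omitted, and `null` stands in for failure.

```javascript
// Simplified model of CharacterSetMatcher. A state is { input, endIndex },
// where input is an array of characters. Returns a matcher (x, c) => state | null.
function characterSetMatcher(chars, invert, direction, canonicalize = (ch) => ch) {
  const wanted = new Set([...chars].map(canonicalize));
  return (x, c) => {
    const e = x.endIndex;
    const f = direction === "forward" ? e + 1 : e - 1;
    if (f < 0 || f > x.input.length) return null; // ran off either end: failure
    const ch = x.input[Math.min(e, f)]; // the character being consumed
    const found = wanted.has(canonicalize(ch));
    // Fail when (not inverted and not found) or (inverted and found).
    if (found === invert) return null;
    return c({ input: x.input, endIndex: f });
  };
}

const done = (state) => state; // continuation that accepts any state
const input = [..."abc"];
const forwardA = characterSetMatcher("a", false, "forward");
const backwardA = characterSetMatcher("a", false, "backward");
console.log(forwardA({ input, endIndex: 0 }, done).endIndex); // 1
console.log(backwardA({ input, endIndex: 1 }, done).endIndex); // 0
console.log(forwardA({ input, endIndex: 3 }, done)); // null
```

Taking min(e, f) as the index is what lets a single definition serve both directions: moving forward reads the character at e, moving backward reads the one at e - 1.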

22.2.2.7.2 BackreferenceMatcher ( `rer`, `ns`, `direction` )

The abstract operation BackreferenceMatcher takes arguments rer (a RegExp Record), ns (a List of positive integers), and direction (forward or backward) and returns a Matcher. It performs the following steps when called:

返回新的 Matcher，参数 (x, c)，捕获 rer、ns、direction，执行：
1. 断言：x 是 MatchState。
2. 断言：c 是 MatcherContinuation。
3. 令 Input 为 x.[[Input]]。
4. 令 cap 为 x.[[Captures]]。
5. 令 r 为 undefined。
6. 对 ns 中每个整数 n：
  1. 若 cap[n] 不为 undefined，则
    1. 断言：r 为 undefined。
    2. 设 r = cap[n]。
7. 若 r 为 undefined，返回 c(x)。
8. 令 e 为 x.[[EndIndex]]。
9. 令 rs 为 r.[[StartIndex]]。
10. 令 re 为 r.[[EndIndex]]。
11. 令 len = re - rs。
12. 若 direction 为 forward，令 f = e + len；否则 f = e - len。
13. 令 InputLength 为 Input 元素数量。
14. 若 f < 0 或 f > InputLength，返回 failure。
15. 令 g = min(e, f)。
16. 若存在区间 [0, len) 内整数 i 使 Canonicalize(rer, Input[rs + i]) ≠ Canonicalize(rer, Input[g + i])，返回 failure。
17. 令 y 为 MatchState { [[Input]]: Input, [[EndIndex]]: f, [[Captures]]: cap }。
18. 返回 c(y)。

22.2.2.7.3 Canonicalize ( `rer`, `ch` )

The abstract operation Canonicalize takes arguments rer (a RegExp Record) and ch (a character) and returns a character. It performs the following steps when called:

若 HasEitherUnicodeFlag(rer) 为 true 且 rer.[[IgnoreCase]] 为 true，则
1. 若 Unicode 字符数据库文件 CaseFolding.txt 为 ch 提供简单或常用大小写折叠映射，则返回应用该映射后的 ch。
2. 返回 ch。
若 rer.[[IgnoreCase]] 为 false，返回 ch。
断言：ch 是 UTF-16 代码单元。
令 cp 为数值等于 ch 数值的代码点。
令 u 为根据 Unicode 默认大小写转换算法 toUppercase(« cp ») 的结果。
令 uStr 为 CodePointsToString(u)。
若 uStr 长度 ≠ 1，返回 ch。
令 cu 为 uStr 的单个代码单元。
若 ch 数值 ≥ 128 且 cu 数值 < 128，返回 ch。
返回 cu。

Note

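In case-insensitive matches when HasEitherUnicodeFlag(rer) is true, all characters are implicitly case-folded using the simple mapping provided by the Unicode Standard immediately before they are compared. The simple mapping always maps to a single code point, so it does not map, for example, ß (U+00DF LATIN SMALL LETTER SHARP S) to ss or SS. It may however map code points outside the Basic Latin block to code points within it: for example, ſ (U+017F LATIN SMALL LETTER LONG S) case-folds to s (U+0073 LATIN SMALL LETTER S) and K (U+212A KELVIN SIGN) case-folds to k (U+006B LATIN SMALL LETTER K). Strings containing those code points are matched by regular expressions such as /[a-z]/ui.

In case-insensitive matches when HasEitherUnicodeFlag(rer) is false, the mapping is based on the Unicode Default Case Conversion algorithm toUppercase rather than toCasefold, which results in some subtle differences. For example, Ω (U+2126 OHM SIGN) is mapped by toUppercase to itself but by toCasefold to ω (U+03C9 GREEK SMALL LETTER OMEGA) along with Ω (U+03A9 GREEK CAPITAL LETTER OMEGA), so "\u2126" is matched by /[ω]/ui and /[\u03A9]/ui but not by /[ω]/i or /[\u03A9]/i. Also, no code point outside the Basic Latin block is mapped to a code point within it, so strings such as "\u017F" (ſ) and "\u212A" (K) are not matched by /[a-z]/i.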
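
The differences described in this note can be checked with a few patterns. The sketch below only restates the note's own examples as runnable code:

```javascript
const longS = "\u017F"; // LATIN SMALL LETTER LONG S
const kelvin = "\u212A"; // KELVIN SIGN
const ohm = "\u2126"; // OHM SIGN

// With the u flag, simple case folding maps these into Basic Latin or to ω.
console.log(/[a-z]/ui.test(longS)); // true
console.log(/[a-z]/ui.test(kelvin)); // true
console.log(/[ω]/ui.test(ohm)); // true

// Without it, toUppercase-based canonicalization leaves them unmatched.
console.log(/[a-z]/i.test(longS)); // false
console.log(/[a-z]/i.test(kelvin)); // false
console.log(/[ω]/i.test(ohm)); // false

// Simple folding never maps one code point to several: ß does not match "ss".
console.log(/^ß$/ui.test("ss")); // false
```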

22.2.2.7.4 UpdateModifiers ( `rer`, `add`, `remove` )

The abstract operation UpdateModifiers takes arguments rer (a RegExp Record), add (a String), and remove (a String) and returns a RegExp Record. It performs the following steps when called:

断言：add 与 remove 不含共同元素。
令 ignoreCase = rer.[[IgnoreCase]]。
令 multiline = rer.[[Multiline]]。
令 dotAll = rer.[[DotAll]]。
令 unicode = rer.[[Unicode]]。
令 unicodeSets = rer.[[UnicodeSets]]。
令 capturingGroupsCount = rer.[[CapturingGroupsCount]]。
若 remove 含 "i"，设 ignoreCase = false；否则若 add 含 "i"，设其为 true。
若 remove 含 "m"，设 multiline = false；否则若 add 含 "m"，设其为 true。
若 remove 含 "s"，设 dotAll = false；否则若 add 含 "s"，设其为 true。
返回 RegExp 记录 { [[IgnoreCase]]: ignoreCase, [[Multiline]]: multiline, [[DotAll]]: dotAll, [[Unicode]]: unicode, [[UnicodeSets]]: unicodeSets, [[CapturingGroupsCount]]: capturingGroupsCount }。
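
The steps above amount to a pure function over a flags record. The following is a minimal sketch, assuming the record is modelled as a plain object; the function name `updateModifiers` and the field names are hypothetical stand-ins for the internal fields:

```javascript
// Sketch of UpdateModifiers over a plain-object record.
// Only i, m, and s can be toggled; all other fields are carried over unchanged.
function updateModifiers(rer, add, remove) {
  for (const flag of add) {
    if (remove.includes(flag)) {
      throw new Error("add and remove must not share elements");
    }
  }
  const fields = { i: "ignoreCase", m: "multiline", s: "dotAll" };
  const next = { ...rer }; // the input record is never mutated
  for (const [flag, field] of Object.entries(fields)) {
    if (remove.includes(flag)) next[field] = false;
    else if (add.includes(flag)) next[field] = true;
  }
  return next;
}

const base = {
  ignoreCase: false, multiline: true, dotAll: false,
  unicode: true, unicodeSets: false, capturingGroupsCount: 2,
};
// Models the modifiers of a group written as (?i-m: ... ).
const modified = updateModifiers(base, "i", "m");
console.log(modified.ignoreCase, modified.multiline, modified.dotAll); // true false false
```

Returning a fresh record rather than mutating the input matches the way the modified record is only passed to the Disjunction inside the group.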

22.2.2.8 运行时语义：CompileCharacterClass

The syntax-directed operation CompileCharacterClass takes argument rer (a RegExp Record) and returns a Record with fields [[CharSet]] (a CharSet) and [[Invert]] (a Boolean). It is defined piecewise over the following productions:

CharacterClass

[

ClassContents

]

令 A 为 ClassContents 的 CompileToCharSet，参数 rer。
返回记录 { [[CharSet]]: A, [[Invert]]: false }。

CharacterClass :: [^ ClassContents ]

令 A 为 ClassContents 的 CompileToCharSet，参数 rer。
若 rer.[[UnicodeSets]] 为 true，则
1. 返回记录 { [[CharSet]]: CharacterComplement(rer, A), [[Invert]]: false }。
返回记录 { [[CharSet]]: A, [[Invert]]: true }。

22.2.2.9 运行时语义：CompileToCharSet

The syntax-directed operation CompileToCharSet takes argument rer (a RegExp Record) and returns a CharSet.

Note 1

This section is amended in B.1.2.8.

It is defined piecewise over the following productions:

ClassContents

[empty]

返回空 CharSet。

NonemptyClassRanges

ClassAtom

NonemptyClassRangesNoDash

令 A 为 ClassAtom 的 CompileToCharSet，参数 rer。
令 B 为 NonemptyClassRangesNoDash 的 CompileToCharSet，参数 rer。
返回 A 与 B 的并集。

NonemptyClassRanges :: ClassAtom - ClassAtom ClassContents

令 A 为第一个 ClassAtom 的 CompileToCharSet，参数 rer。
令 B 为第二个 ClassAtom 的 CompileToCharSet，参数 rer。
令 C 为 ClassContents 的 CompileToCharSet，参数 rer。
令 D 为 CharacterRange(A, B)。
返回 D 与 C 的并集。

NonemptyClassRangesNoDash

ClassAtomNoDash

NonemptyClassRangesNoDash

令 A 为 ClassAtomNoDash 的 CompileToCharSet，参数 rer。
令 B 为 NonemptyClassRangesNoDash 的 CompileToCharSet，参数 rer。
返回 A 与 B 的并集。

NonemptyClassRangesNoDash :: ClassAtomNoDash - ClassAtom ClassContents

令 A 为 ClassAtomNoDash 的 CompileToCharSet，参数 rer。
令 B 为 ClassAtom 的 CompileToCharSet，参数 rer。
令 C 为 ClassContents 的 CompileToCharSet，参数 rer。
令 D 为 CharacterRange(A, B)。
返回 D 与 C 的并集。

Note 2

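ClassContents can expand into a single ClassAtom and/or ranges of two ClassAtom separated by dashes. In the latter case the ClassContents includes all characters between the first ClassAtom and the second ClassAtom, inclusive; an error occurs if either ClassAtom does not represent a single character (for example, if one is \w) or if the first ClassAtom's character value is strictly greater than the second ClassAtom's character value.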

Note 3

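Even if the pattern ignores case, the case of the two ends of a range is significant in determining which characters belong to the range. Thus, for example, the pattern /[E-F]/i matches only the letters E, F, e, and f, while the pattern /[E-f]/i matches all uppercase and lowercase letters in the Unicode Basic Latin block as well as the symbols [, \, ], ^, _, and `.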

Note 4

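A - character can be treated literally or it can denote a range. It is treated literally if it is the first or last character of ClassContents, the beginning or end limit of a range specification, or immediately follows a range specification.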
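
The range and hyphen rules in Notes 3 and 4 can be checked with a short sketch (test characters chosen for illustration):

```javascript
// Note 3: the range endpoints are fixed before case folding is applied.
console.log(/[E-F]/i.test("e")); // true
console.log(/[E-F]/i.test("g")); // false
// E-f spans E..Z, the symbols [ \ ] ^ _ `, and a..f, so every letter matches.
console.log(/[E-f]/i.test("z")); // true
console.log(/[E-f]/i.test("^")); // true
console.log(/[E-f]/i.test("1")); // false

// Note 4: a hyphen is literal at the start or end, or right after a range.
console.log(/[-a]/.test("-")); // true
console.log(/[a-]/.test("-")); // true
console.log(/[a-c-e]/.test("-")); // true
console.log(/[a-c-e]/.test("d")); // false
```

In the last pattern, a-c is the range, and the following - and e are single characters, which is why d is not matched.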

ClassAtom :: -

返回含单字符 - U+002D 的 CharSet。

ClassAtomNoDash :: SourceCharacter but not one of \ or ] or -

返回含 SourceCharacter 匹配字符的 CharSet。

ClassEscape :: b

ClassEscape :: -

ClassEscape :: CharacterEscape

令 cv 为此 ClassEscape 的 CharacterValue。
令 c 为字符值为 cv 的字符。
返回含单字符 c 的 CharSet。

Note 5

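A ClassAtom can use any of the escape sequences that are allowed in the rest of the regular expression except for \b, \B, and backreferences. Inside a CharacterClass, \b means the backspace character, while \B and backreferences raise errors.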
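
A short sketch of the \b distinction follows. The helper name `compilesWithU` is hypothetical, and the error case is shown with the u flag because the non-Unicode grammar is relaxed by the Annex B additions:

```javascript
const backspace = "\u0008";

// Inside a character class, \b is the backspace character.
console.log(/[\b]/.test(backspace)); // true
// Outside a class, \b is a word-boundary assertion and consumes nothing.
console.log(/\b/.test(backspace)); // false

// Hypothetical helper: does the source compile as a u-mode pattern?
function compilesWithU(source) {
  try {
    new RegExp(source, "u");
    return true;
  } catch (e) {
    if (e instanceof SyntaxError) return false;
    throw e;
  }
}

// \B has no meaning inside a class, so a u-mode pattern is rejected.
console.log(compilesWithU("[\\B]")); // false
console.log(compilesWithU("[\\b]")); // true
```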

CharacterClassEscape :: d

返回含字符 0..9 的十元素 CharSet。

CharacterClassEscape :: D

令 S 为 CharacterClassEscape :: d 返回的 CharSet。
返回 CharacterComplement(rer, S)。

CharacterClassEscape :: s

返回含 WhiteSpace 或 LineTerminator 产生式右侧代码点对应字符的 CharSet。

CharacterClassEscape :: S

令 S 为 CharacterClassEscape :: s 返回的 CharSet。
返回 CharacterComplement(rer, S)。

CharacterClassEscape :: w

返回 MaybeSimpleCaseFolding(rer, WordCharacters(rer))。

CharacterClassEscape :: W

令 S 为 CharacterClassEscape :: w 返回的 CharSet。
返回 CharacterComplement(rer, S)。

CharacterClassEscape :: p{ UnicodePropertyValueExpression }

返回 UnicodePropertyValueExpression 的 CompileToCharSet，参数 rer。

CharacterClassEscape :: P{ UnicodePropertyValueExpression }

令 S 为 UnicodePropertyValueExpression 的 CompileToCharSet，参数 rer。
断言：S 仅含单代码点。
返回 CharacterComplement(rer, S)。

UnicodePropertyValueExpression :: UnicodePropertyName = UnicodePropertyValue

令 ps 为 UnicodePropertyName 匹配的源文本。
令 p 为 UnicodeMatchProperty(rer, ps)。
断言：p 是 Unicode 属性名或 Table 67 的 “属性名及别名” 列中的属性别名。
令 vs 为 UnicodePropertyValue 匹配的源文本。
令 v 为 UnicodeMatchPropertyValue(p, vs)。
令 A 为包含所有 Unicode 码点的 CharSet，这些码点的字符数据库定义中包含属性 p 且值为 v。
返回 MaybeSimpleCaseFolding(rer, A)。

UnicodePropertyValueExpression

LoneUnicodePropertyNameOrValue

令 s 为 LoneUnicodePropertyNameOrValue 匹配的源文本。
如果 UnicodeMatchPropertyValue(General_Category, s) 是 PropertyValueAliases.txt 中 General_Category (gc) 属性的 Unicode 属性值或属性值别名，则
1. 返回包含所有 Unicode 码点的 CharSet，这些码点的字符数据库定义中属性为 “General_Category” 且值为 s。
令 p 为 UnicodeMatchProperty(rer, s)。
断言：p 是 Table 68 的 “属性名及别名” 列中的二进制 Unicode 属性或属性别名，或 Table 69 的 “属性名” 列中的字符串二进制 Unicode 属性。
令 A 为包含所有 CharSetElements 的 CharSet，这些元素的字符数据库定义中属性 p 的值为 “True”。
返回 MaybeSimpleCaseFolding(rer, A)。
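
The two UnicodePropertyValueExpression forms above correspond to \p{Name=Value} and \p{LoneNameOrValue} in pattern source. A minimal sketch, with sample characters chosen for illustration:

```javascript
// Name=Value form: a non-binary property with an explicit value.
const greek = /\p{Script=Greek}/u;
console.log(greek.test("α")); // true
console.log(greek.test("a")); // false

// Lone form, resolved first as a General_Category value ...
console.log(/\p{Lu}/u.test("A")); // true
console.log(/\p{Lu}/u.test("a")); // false
// ... and otherwise as a binary property name.
console.log(/\p{Alphabetic}/u.test("é")); // true

// \P{...} is the complement of the same set.
console.log(/\P{Lu}/u.test("A")); // false
```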

ClassUnion

ClassSetRange

ClassUnion

opt

令 A 为 ClassSetRange 的 CompileToCharSet，参数 rer。
若 ClassUnion 存在，则
1. 令 B 为 ClassUnion 的 CompileToCharSet，参数 rer。
2. 返回 A ∪ B。
返回 A。

ClassUnion

ClassSetOperand

ClassUnion

opt

令 A 为 ClassSetOperand 的 CompileToCharSet，参数 rer。
若 ClassUnion 存在，则
1. 令 B 为 ClassUnion 的 CompileToCharSet，参数 rer。
2. 返回 A ∪ B。
返回 A。

ClassIntersection :: ClassSetOperand && [lookahead ≠ &] ClassSetOperand

令 A 为第一个 ClassSetOperand 的 CompileToCharSet，参数 rer。
令 B 为第二个 ClassSetOperand 的 CompileToCharSet，参数 rer。
返回 A ∩ B。

ClassIntersection :: ClassIntersection && [lookahead ≠ &] ClassSetOperand

令 A 为 ClassIntersection 的 CompileToCharSet，参数 rer。
令 B 为 ClassSetOperand 的 CompileToCharSet，参数 rer。
返回 A ∩ B。

ClassSubtraction :: ClassSetOperand -- ClassSetOperand

令 A 为第一个 ClassSetOperand 的 CompileToCharSet，参数 rer。
令 B 为第二个 ClassSetOperand 的 CompileToCharSet，参数 rer。
返回含 A 中不在 B 中的元素的 CharSet。

ClassSubtraction :: ClassSubtraction -- ClassSetOperand

令 A 为 ClassSubtraction 的 CompileToCharSet，参数 rer。
令 B 为 ClassSetOperand 的 CompileToCharSet，参数 rer。
返回含 A 中不在 B 中元素的 CharSet。

ClassSetRange :: ClassSetCharacter - ClassSetCharacter

令 A 为第一个 ClassSetCharacter 的 CompileToCharSet，参数 rer。
令 B 为第二个 ClassSetCharacter 的 CompileToCharSet，参数 rer。
返回 MaybeSimpleCaseFolding(rer, CharacterRange(A, B))。

Note 6

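The result will often consist of two or more ranges. When UnicodeSets is true and IgnoreCase is true, then MaybeSimpleCaseFolding(rer, [Ā-č]) will include only the odd-numbered code points of that range.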

ClassSetOperand

ClassSetCharacter

令 A 为 ClassSetCharacter 的 CompileToCharSet，参数 rer。
返回 MaybeSimpleCaseFolding(rer, A)。

ClassSetOperand

ClassStringDisjunction

令 A 为 ClassStringDisjunction 的 CompileToCharSet，参数 rer。
返回 MaybeSimpleCaseFolding(rer, A)。

ClassSetOperand

NestedClass

返回 NestedClass 的 CompileToCharSet，参数 rer。

NestedClass

[

ClassContents

]

返回 ClassContents 的 CompileToCharSet，参数 rer。

NestedClass :: [^ ClassContents ]

令 A 为 ClassContents 的 CompileToCharSet，参数 rer。
返回 CharacterComplement(rer, A)。

NestedClass :: \ CharacterClassEscape

返回 CharacterClassEscape 的 CompileToCharSet，参数 rer。

ClassStringDisjunction

\q{

ClassStringDisjunctionContents

}

返回 ClassStringDisjunctionContents 的 CompileToCharSet，参数 rer。

ClassStringDisjunctionContents

ClassString

令 s 为 ClassString 的 CompileClassSetString，参数 rer。
返回含唯一字符串 s 的 CharSet。

ClassStringDisjunctionContents :: ClassString | ClassStringDisjunctionContents

令 s 为 ClassString 的 CompileClassSetString，参数 rer。
令 A 为含字符串 s 的 CharSet。
令 B 为 ClassStringDisjunctionContents 的 CompileToCharSet，参数 rer。
返回 A 与 B 的并集。

ClassSetCharacter :: [lookahead ∉ ClassSetReservedDoublePunctuator] SourceCharacter but not ClassSetSyntaxCharacter

ClassSetCharacter :: \ CharacterEscape

ClassSetCharacter :: \ ClassSetReservedPunctuator

令 cv 为此 ClassSetCharacter 的 CharacterValue。
令 c 为字符值为 cv 的字符。
返回含单字符 c 的 CharSet。

ClassSetCharacter :: \b

返回含单字符 U+0008 (BACKSPACE) 的 CharSet。

22.2.2.9.1 CharacterRange ( `A`, `B` )

The abstract operation CharacterRange takes arguments A (a CharSet) and B (a CharSet) and returns a CharSet. It performs the following steps when called:

断言：A 与 B 各含恰一字符。
令 a 为 A 中该字符。
令 b 为 B 中该字符。
令 i 为 a 的字符值。
令 j 为 b 的字符值。
断言：i ≤ j。
返回含所有字符值位于 [i, j] 间（含）的字符集合的 CharSet。

22.2.2.9.2 HasEitherUnicodeFlag ( `rer` )

The abstract operation HasEitherUnicodeFlag takes argument rer (a RegExp Record) and returns a Boolean. It performs the following steps when called:

若 rer.[[Unicode]] 为 true 或 rer.[[UnicodeSets]] 为 true，
1. 返回 true。
返回 false。

22.2.2.9.3 WordCharacters ( `rer` )

The abstract operation WordCharacters takes argument rer (a RegExp Record) and returns a CharSet. It returns a CharSet containing the characters considered to be "word characters" for the purposes of \b, \B, \w, and \W. It performs the following steps when called:

令 basicWordChars 为含所有 ASCII 单词字符的 CharSet。
令 extraWordChars 为 CharSet，包含所有不在 basicWordChars 中但 Canonicalize(rer, c) 在其中的字符 c。
断言：除非 HasEitherUnicodeFlag(rer) 为 true 且 rer.[[IgnoreCase]] 为 true，否则 extraWordChars 为空。
返回 basicWordChars 与 extraWordChars 的并集。

22.2.2.9.4 AllCharacters ( `rer` )

The abstract operation AllCharacters takes argument rer (a RegExp Record) and returns a CharSet. It returns the set of "all characters" according to the regular expression flags. It performs the following steps when called:

若 rer.[[UnicodeSets]] 为 true 且 rer.[[IgnoreCase]] 为 true，则
1. Return the CharSet containing all Unicode code points c that do not have a Simple Case Folding mapping (that is, scf(c) = c).
否则若 HasEitherUnicodeFlag(rer) 为 true，
1. 返回含所有代码点值的 CharSet。
否则，
1. 返回含所有代码单元值的 CharSet。

22.2.2.9.5 MaybeSimpleCaseFolding ( `rer`, `A` )

The abstract operation MaybeSimpleCaseFolding takes arguments rer (a RegExp Record) and A (a CharSet) and returns a CharSet. If rer.[[UnicodeSets]] is false or rer.[[IgnoreCase]] is false, it returns A. Otherwise, it uses the Simple Case Folding (scf(cp)) definitions in the file CaseFolding.txt of the Unicode Character Database to map each CharSetElement of A character-by-character into a canonical form and returns the resulting CharSet. It performs the following steps when called:

若 rer.[[UnicodeSets]] 为 false 或 rer.[[IgnoreCase]] 为 false，返回 A。
令 B 为新空 CharSet。
对 A 的每个 CharSetElement s：
1. 令 t 为空字符序列。
2. 对 s 中每个单代码点 cp：
  1. 追加 scf(cp) 至 t。
3. 将 t 加入 B。
返回 B。

22.2.2.9.6 CharacterComplement ( `rer`, `S` )

The abstract operation CharacterComplement takes arguments rer (a RegExp Record) and S (a CharSet) and returns a CharSet. It performs the following steps when called:

令 A 为 AllCharacters(rer)。
返回含 A 中不在 S 中的 CharSetElement 的 CharSet。

22.2.2.9.7 UnicodeMatchProperty ( `rer`, `p` )

The abstract operation UnicodeMatchProperty takes arguments rer (a RegExp Record) and p (ECMAScript source text) and returns a Unicode property name. It performs the following steps when called:

如果 rer.[[UnicodeSets]] 为 true 并且 p 是 Table 69 的 “属性名” 列中列出的 Unicode 属性名，那么
1. 返回 Unicode 码点 p 的列表。
断言：p 是 Table 67 或 Table 68 的 “属性名及别名” 列中列出的 Unicode 属性名或属性别名。
令 c 为与 p 对应行 “规范属性名” 列中给出的规范属性名。
返回 Unicode 码点 c 的列表。

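Implementations must support the Unicode property names and aliases listed in Table 67, Table 68, and Table 69. To ensure interoperability, implementations must not support any other property names or aliases.

Note 1

For example, Script_Extensions (property name) and scx (property alias) are valid, but script_extensions or Scx aren't.

Note 2

The listed properties form a superset of what UTS18 RL1.2 requires.

Note 3

The spellings of entries in these tables (including casing) match the spellings used in the file PropertyAliases.txt in the Unicode Character Database. The precise spellings in that file are guaranteed to be stable.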

Table 67: Non-binary Unicode property aliases and their canonical property names

Property name and aliases	Canonical property name
`General_Category`	`General_Category`
`gc`	`General_Category`
`Script`	`Script`
`sc`	`Script`
`Script_Extensions`	`Script_Extensions`
`scx`	`Script_Extensions`

Table 68: Binary Unicode property aliases and their canonical property names

Property name and aliases	Canonical property name
`ASCII`	`ASCII`
`ASCII_Hex_Digit`	`ASCII_Hex_Digit`
`AHex`	`ASCII_Hex_Digit`
`Alphabetic`	`Alphabetic`
`Alpha`	`Alphabetic`
`Any`	`Any`
`Assigned`	`Assigned`
`Bidi_Control`	`Bidi_Control`
`Bidi_C`	`Bidi_Control`
`Bidi_Mirrored`	`Bidi_Mirrored`
`Bidi_M`	`Bidi_Mirrored`
`Case_Ignorable`	`Case_Ignorable`
`CI`	`Case_Ignorable`
`Cased`	`Cased`
`Changes_When_Casefolded`	`Changes_When_Casefolded`
`CWCF`	`Changes_When_Casefolded`
`Changes_When_Casemapped`	`Changes_When_Casemapped`
`CWCM`	`Changes_When_Casemapped`
`Changes_When_Lowercased`	`Changes_When_Lowercased`
`CWL`	`Changes_When_Lowercased`
`Changes_When_NFKC_Casefolded`	`Changes_When_NFKC_Casefolded`
`CWKCF`	`Changes_When_NFKC_Casefolded`
`Changes_When_Titlecased`	`Changes_When_Titlecased`
`CWT`	`Changes_When_Titlecased`
`Changes_When_Uppercased`	`Changes_When_Uppercased`
`CWU`	`Changes_When_Uppercased`
`Dash`	`Dash`
`Default_Ignorable_Code_Point`	`Default_Ignorable_Code_Point`
`DI`	`Default_Ignorable_Code_Point`
`Deprecated`	`Deprecated`
`Dep`	`Deprecated`
`Diacritic`	`Diacritic`
`Dia`	`Diacritic`
`Emoji`	`Emoji`
`Emoji_Component`	`Emoji_Component`
`EComp`	`Emoji_Component`
`Emoji_Modifier`	`Emoji_Modifier`
`EMod`	`Emoji_Modifier`
`Emoji_Modifier_Base`	`Emoji_Modifier_Base`
`EBase`	`Emoji_Modifier_Base`
`Emoji_Presentation`	`Emoji_Presentation`
`EPres`	`Emoji_Presentation`
`Extended_Pictographic`	`Extended_Pictographic`
`ExtPict`	`Extended_Pictographic`
`Extender`	`Extender`
`Ext`	`Extender`
`Grapheme_Base`	`Grapheme_Base`
`Gr_Base`	`Grapheme_Base`
`Grapheme_Extend`	`Grapheme_Extend`
`Gr_Ext`	`Grapheme_Extend`
`Hex_Digit`	`Hex_Digit`
`Hex`	`Hex_Digit`
`IDS_Binary_Operator`	`IDS_Binary_Operator`
`IDSB`	`IDS_Binary_Operator`
`IDS_Trinary_Operator`	`IDS_Trinary_Operator`
`IDST`	`IDS_Trinary_Operator`
`ID_Continue`	`ID_Continue`
`IDC`	`ID_Continue`
`ID_Start`	`ID_Start`
`IDS`	`ID_Start`
`Ideographic`	`Ideographic`
`Ideo`	`Ideographic`
`Join_Control`	`Join_Control`
`Join_C`	`Join_Control`
`Logical_Order_Exception`	`Logical_Order_Exception`
`LOE`	`Logical_Order_Exception`
`Lowercase`	`Lowercase`
`Lower`	`Lowercase`
`Math`	`Math`
`Noncharacter_Code_Point`	`Noncharacter_Code_Point`
`NChar`	`Noncharacter_Code_Point`
`Pattern_Syntax`	`Pattern_Syntax`
`Pat_Syn`	`Pattern_Syntax`
`Pattern_White_Space`	`Pattern_White_Space`
`Pat_WS`	`Pattern_White_Space`
`Quotation_Mark`	`Quotation_Mark`
`QMark`	`Quotation_Mark`
`Radical`	`Radical`
`Regional_Indicator`	`Regional_Indicator`
`RI`	`Regional_Indicator`
`Sentence_Terminal`	`Sentence_Terminal`
`STerm`	`Sentence_Terminal`
`Soft_Dotted`	`Soft_Dotted`
`SD`	`Soft_Dotted`
`Terminal_Punctuation`	`Terminal_Punctuation`
`Term`	`Terminal_Punctuation`
`Unified_Ideograph`	`Unified_Ideograph`
`UIdeo`	`Unified_Ideograph`
`Uppercase`	`Uppercase`
`Upper`	`Uppercase`
`Variation_Selector`	`Variation_Selector`
`VS`	`Variation_Selector`
`White_Space`	`White_Space`
`space`	`White_Space`
`XID_Continue`	`XID_Continue`
`XIDC`	`XID_Continue`
`XID_Start`	`XID_Start`
`XIDS`	`XID_Start`

Table 69: Binary Unicode properties of strings

Property name
`Basic_Emoji`
`Emoji_Keycap_Sequence`
`RGI_Emoji_Modifier_Sequence`
`RGI_Emoji_Flag_Sequence`
`RGI_Emoji_Tag_Sequence`
`RGI_Emoji_ZWJ_Sequence`
`RGI_Emoji`

22.2.2.9.8 UnicodeMatchPropertyValue ( `p`, `v` )

The abstract operation UnicodeMatchPropertyValue takes arguments p (ECMAScript source text) and v (ECMAScript source text) and returns a Unicode property value. It performs the following steps when called:

断言：p 是 Table 67 的 “规范属性名” 列中列出的规范、未别名的 Unicode 属性名。
断言：v 是 PropertyValueAliases.txt 中 Unicode 属性 p 的属性值或属性值别名。
令 value 为对应行 “规范属性值” 列中给出的 v 的规范属性值。
返回 Unicode 码点 value 的列表。

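Implementations must support the Unicode property values and property value aliases listed in PropertyValueAliases.txt for the properties listed in Table 67. To ensure interoperability, implementations must not support any other property values or property value aliases.

Note 1

For example, Xpeo and Old_Persian are valid Script_Extensions values, but xpeo and Old Persian aren't.

Note 2

This algorithm differs from the matching rules for symbolic values listed in UAX44: case, white space, U+002D (HYPHEN-MINUS), and U+005F (LOW LINE) are not ignored, and the Is prefix is not supported.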
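
The strict spelling rules for property names and values can be observed through the RegExp constructor. In this sketch the helper name `compiles` is hypothetical; it reports whether a u-mode pattern is accepted:

```javascript
// Hypothetical helper: true if the source is a valid u-mode pattern.
function compiles(source) {
  try {
    new RegExp(source, "u");
    return true;
  } catch (e) {
    if (e instanceof SyntaxError) return false;
    throw e;
  }
}

// Property names and aliases must be spelled exactly as listed.
console.log(compiles("\\p{Script_Extensions=Greek}")); // true
console.log(compiles("\\p{scx=Grek}")); // true
console.log(compiles("\\p{script_extensions=Greek}")); // false
console.log(compiles("\\p{Scx=Greek}")); // false

// The same applies to property values and their aliases.
console.log(compiles("\\p{scx=Xpeo}")); // true
console.log(compiles("\\p{scx=Old_Persian}")); // true
console.log(compiles("\\p{scx=xpeo}")); // false
console.log(compiles("\\p{scx=Old Persian}")); // false
```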

22.2.2.10 运行时语义：CompileClassSetString

The syntax-directed operation CompileClassSetString takes argument rer (a RegExp Record) and returns a sequence of characters. It is defined piecewise over the following productions:

ClassString

[empty]

返回空字符序列。

ClassString

NonEmptyClassString

返回 NonEmptyClassString 的 CompileClassSetString，参数 rer。

NonEmptyClassString

ClassSetCharacter

NonEmptyClassString

opt

令 cs 为 ClassSetCharacter 的 CompileToCharSet，参数 rer。
令 s1 为 cs 的单个 CharSetElement 所对应的字符序列。
若 NonEmptyClassString 存在，则
1. 令 s2 为 NonEmptyClassString 的 CompileClassSetString，参数 rer。
2. 返回 s1 与 s2 的连接。
返回 s1。

22.2.3 用于创建 RegExp 的抽象操作（Abstract Operations for RegExp Creation）

22.2.3.1 RegExpCreate ( `P`, `F` )

The abstract operation RegExpCreate takes arguments P (an ECMAScript language value) and F (a String or undefined) and returns either a normal completion containing an Object or a throw completion. It performs the following steps when called:

令 obj 为 ! RegExpAlloc(%RegExp%)。
返回 ? RegExpInitialize(obj, P, F)。

22.2.3.2 RegExpAlloc ( `newTarget` )

The abstract operation RegExpAlloc takes argument newTarget (a constructor) and returns either a normal completion containing an Object or a throw completion. It performs the following steps when called:

令 obj 为 ? OrdinaryCreateFromConstructor(newTarget, "%RegExp.prototype%", « [[OriginalSource]], [[OriginalFlags]], [[RegExpRecord]], [[RegExpMatcher]] »)。
执行 ! DefinePropertyOrThrow(obj, "lastIndex", PropertyDescriptor { [[Writable]]: true, [[Enumerable]]: false, [[Configurable]]: false })。
返回 obj。

22.2.3.3 RegExpInitialize ( `obj`, `pattern`, `flags` )

The abstract operation RegExpInitialize takes arguments obj (an Object), pattern (an ECMAScript language value), and flags (an ECMAScript language value) and returns either a normal completion containing an Object or a throw completion. It performs the following steps when called:

如果 pattern 是 undefined，令 P 为空字符串。
否则，令 P 为 ? ToString(pattern)。
如果 flags 是 undefined，令 F 为空字符串。
否则，令 F 为 ? ToString(flags)。
如果 F 包含 "d"、"g"、"i"、"m"、"s"、"u"、"v"、"y" 之外的任一代码单元，或 F 中任一代码单元出现多于一次，抛出 SyntaxError 异常。
如果 F 包含 "i"，令 i 为 true；否则令 i 为 false。
如果 F 包含 "m"，令 m 为 true；否则令 m 为 false。
如果 F 包含 "s"，令 s 为 true；否则令 s 为 false。
如果 F 包含 "u"，令 u 为 true；否则令 u 为 false。
如果 F 包含 "v"，令 v 为 true；否则令 v 为 false。
如果 u 为 true 或 v 为 true，则
1. 令 patternText 为 StringToCodePoints(P)。
否则，
1. 令 patternText 为将 P 的每个 16 位元素按 Unicode BMP 代码点解释的结果。不会对这些元素应用 UTF-16 解码。
令 parseResult 为 ParsePattern(patternText, u, v)。
如果 parseResult 是一个非空 SyntaxError 对象列表，抛出 SyntaxError 异常。
断言：parseResult 是一个 Pattern 解析节点。
设 obj.[[OriginalSource]] 为 P。
设 obj.[[OriginalFlags]] 为 F。
令 capturingGroupsCount 为 CountLeftCapturingParensWithin(parseResult)。
令 rer 为 RegExp 记录 { [[IgnoreCase]]: i, [[Multiline]]: m, [[DotAll]]: s, [[Unicode]]: u, [[UnicodeSets]]: v, [[CapturingGroupsCount]]: capturingGroupsCount }。
设 obj.[[RegExpRecord]] 为 rer。
Set obj.[[RegExpMatcher]] to CompilePattern of parseResult with argument rer.
执行 ? Set(obj, "lastIndex", +0_𝔽, true)。
返回 obj。
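
Several of the initialization steps above are observable from script. The sketch below uses a hypothetical helper, `tryCreate`, that returns either the new RegExp or the name of the error thrown:

```javascript
// Hypothetical helper: returns the RegExp, or the error name on failure.
function tryCreate(pattern, flags) {
  try {
    return new RegExp(pattern, flags);
  } catch (e) {
    return e.name;
  }
}

// Repeated or unknown flags are rejected, as is combining u with v.
console.log(tryCreate("a", "gg")); // "SyntaxError"
console.log(tryCreate("a", "x")); // "SyntaxError"
console.log(tryCreate("a", "uv")); // "SyntaxError"

// An undefined pattern is treated as the empty string.
console.log(new RegExp(undefined).source); // "(?:)"
// lastIndex is reset to 0 on initialization.
console.log(new RegExp("a", "g").lastIndex); // 0

// Without u or v the pattern and input are handled as code units, so a
// single dot cannot cover an astral character (two code units).
const astral = "\u{1F600}";
console.log(/^.$/.test(astral)); // false
console.log(/^.$/u.test(astral)); // true
```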

22.2.3.4 静态语义：ParsePattern ( `patternText`, `u`, `v` )

The abstract operation ParsePattern takes arguments patternText (a sequence of Unicode code points), u (a Boolean), and v (a Boolean) and returns a Parse Node or a non-empty List of SyntaxError objects.

Note

This section is amended in B.1.2.9.

It performs the following steps when called:

如果 v 为 true 且 u 为 true，则
1. 令 parseResult 为包含一个或多个 SyntaxError 对象的列表。
否则如果 v 为 true，则
1. 令 parseResult 为 ParseText(patternText, Pattern[+UnicodeMode, +UnicodeSetsMode, +NamedCaptureGroups])。
否则如果 u 为 true，则
1. 令 parseResult 为 ParseText(patternText, Pattern[+UnicodeMode, ~UnicodeSetsMode, +NamedCaptureGroups])。
否则，
1. 令 parseResult 为 ParseText(patternText, Pattern[~UnicodeMode, ~UnicodeSetsMode, +NamedCaptureGroups])。
返回 parseResult。

22.2.4 RegExp 构造函数（The RegExp Constructor）

RegExp 构造函数：

是 %RegExp%。
是全局对象 "RegExp" 属性的初始值。
作为构造函数调用时创建并初始化一个新的 RegExp 对象。
作为函数（而非构造函数）调用时，返回一个新的 RegExp 对象，或在唯一参数本身是 RegExp 对象时返回该参数。
可用作类定义 extends 子句的值。打算继承指定 RegExp 行为的子类构造函数必须包含对 RegExp 构造函数的 super 调用，以创建并初始化带有所需内部槽的子类实例。

22.2.4.1 RegExp ( `pattern`, `flags` )

调用该函数时执行以下步骤：

令 patternIsRegExp 为 ? IsRegExp(pattern)。
如果 NewTarget 为 undefined，则
1. 令 newTarget 为活动函数对象。
2. 如果 patternIsRegExp 为 true 且 flags 为 undefined，则
  1. 令 patternConstructor 为 ? Get(pattern, "constructor")。
  2. 如果 SameValue(newTarget, patternConstructor) 为 true，返回 pattern。
否则，
1. 令 newTarget 为 NewTarget。
如果 pattern 是一个对象且具有 [[RegExpMatcher]] 内部槽，则
1. 令 P 为 pattern.[[OriginalSource]]。
2. 如果 flags 是 undefined，令 F 为 pattern.[[OriginalFlags]]。
3. 否则，令 F 为 flags。
否则如果 patternIsRegExp 为 true，则
1. 令 P 为 ? Get(pattern, "source")。
2. 如果 flags 是 undefined，则
  1. 令 F 为 ? Get(pattern, "flags")。
3. 否则，
  1. 令 F 为 flags。
否则，
1. 令 P 为 pattern。
2. 令 F 为 flags。
令 O 为 ? RegExpAlloc(newTarget)。
返回 ? RegExpInitialize(O, P, F)。

Note

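If pattern is supplied using a StringLiteral, the usual escape sequence substitutions are performed before the String is processed by this function. If pattern must contain an escape sequence to be recognized by this function, any U+005C (REVERSE SOLIDUS) code points must be escaped within the StringLiteral to prevent them being removed when the contents of the StringLiteral are formed.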
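
The constructor steps and the escaping caveat in this note can be illustrated as follows (patterns chosen for illustration):

```javascript
const original = /a+/g;

// Called as a function with a RegExp and no flags, the argument is returned as-is.
console.log(RegExp(original) === original); // true
// Called with new, a fresh object is always created, copying source and flags.
const copy = new RegExp(original);
console.log(copy === original, copy.flags); // false "g"
// Explicit flags replace the original ones.
console.log(new RegExp(original, "i").flags); // "i"

// In a string literal, "\d" is just "d"; the backslash must be doubled.
console.log(new RegExp("\d+").source); // "d+"
console.log(new RegExp("\\d+").source); // "\d+"
console.log(new RegExp("\\d+").test("42")); // true
```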

22.2.5 RegExp 构造函数的属性（Properties of the RegExp Constructor）

RegExp 构造函数：

有一个 [[Prototype]] 内部槽，其值为 %Function.prototype%。
具有以下属性：

22.2.5.1 RegExp.escape ( `S` )

该函数返回 S 的一个拷贝，其中在正则表达式 Pattern 中可能具有特殊意义的字符已被等效转义序列替换。

调用时执行以下步骤：

如果 S 不是字符串，抛出 TypeError 异常。
令 escaped 为空字符串。
令 cpList 为 StringToCodePoints(S)。
对 cpList 中的每个代码点 cp，执行
1. 如果 escaped 为空字符串且 cp 被 DecimalDigit 或 AsciiLetter 匹配，则
  1. 注：转义前导数字确保输出对应的模式文本可在 \0 字符转义或 DecimalEscape（如 \1）之后使用并仍匹配 S，而不是被解释为前一转义序列的延伸。转义前导 ASCII 字母在 \c 后的情境中亦如此。
  2. 令 numericValue 为 cp 的数值。
  3. 令 hex 为 Number::toString(𝔽(numericValue), 16)。
  4. 断言：hex 的长度为 2。
  5. 设 escaped 为代码单元 0x005C (REVERSE SOLIDUS)、"x" 与 hex 的字符串拼接。
2. 否则，
  1. 设 escaped 为 escaped 与 EncodeForRegExpEscape(cp) 的字符串拼接。
返回 escaped。

Note

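Despite having similar names, EscapeRegExpPattern and RegExp.escape do not perform similar actions. The former escapes a pattern for representation as a string, while this function escapes a string for representation inside a pattern.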
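
A typical use is building a pattern that matches arbitrary text literally. Because RegExp.escape may not exist in older engines, the sketch below wraps it in a hypothetical helper, `escapeForPattern`, whose fallback is a common hand-written replacement and is not equivalent to the algorithm above (for example, it does not escape a leading digit or letter):

```javascript
// Hypothetical helper: prefers RegExp.escape, otherwise uses a simple fallback.
function escapeForPattern(text) {
  if (typeof RegExp.escape === "function") {
    return RegExp.escape(text);
  }
  // Fallback (assumption, not the spec algorithm): backslash-escape
  // the syntax characters and the solidus.
  return text.replace(/[\\^$.*+?()[\]{}|\/]/g, "\\$&");
}

const literalText = "a.b*c";
const literalMatcher = new RegExp(escapeForPattern(literalText));

// The escaped pattern matches the text itself ...
console.log(literalMatcher.test("a.b*c")); // true
// ... and "." and "*" no longer act as metacharacters.
console.log(literalMatcher.test("aXbbc")); // false
```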

22.2.5.1.1 EncodeForRegExpEscape ( `cp` )

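The abstract operation EncodeForRegExpEscape takes argument cp (a code point) and returns a String. It returns a String representing a Pattern for matching cp. If cp is white space or an ASCII punctuator, the returned value is an escape sequence. Otherwise, the returned value is a String representation of cp itself. It performs the following steps when called: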

如果 cp 被 SyntaxCharacter 匹配或 cp 是 U+002F (SOLIDUS)，则
1. 返回 0x005C (REVERSE SOLIDUS) 与 UTF16EncodeCodePoint(cp) 的字符串拼接。
否则如果 cp 是 Table 65 “Code Point” 列所列出的代码点，则
1. 返回 0x005C (REVERSE SOLIDUS) 与该行 “ControlEscape” 列中字符串的拼接。
令 otherPunctuators 为 ",-=<>#&!%:;@~'`" 与代码单元 0x0022 (QUOTATION MARK) 的字符串拼接。
令 toEscape 为 StringToCodePoints(otherPunctuators)。
如果 toEscape 包含 cp，或 cp 被 WhiteSpace 或 LineTerminator 匹配，或 cp 数值与前导代理或尾随代理相同，则
1. 令 cpNum 为 cp 的数值。
2. 如果 cpNum ≤ 0xFF，则
  1. 令 hex 为 Number::toString(𝔽(cpNum), 16)。
  2. 返回代码单元 0x005C (REVERSE SOLIDUS)、"x" 与 StringPad(hex, 2, "0", start) 的字符串拼接。
3. 令 escaped 为空字符串。
4. 令 codeUnits 为 UTF16EncodeCodePoint(cp)。
5. 对 codeUnits 中的每个代码单元 cu，执行
  1. 设 escaped 为 escaped 与 UnicodeEscape(cu) 的字符串拼接。
6. 返回 escaped。
返回 UTF16EncodeCodePoint(cp)。

22.2.5.2 RegExp.prototype

RegExp.prototype 的初始值是 RegExp 原型对象。

该属性具有 { [[Writable]]: false, [[Enumerable]]: false, [[Configurable]]: false }。

22.2.5.3 get RegExp [ %Symbol.species% ]

RegExp[%Symbol.species%] 是一个存取器属性，其 set 访问器函数为 undefined。其 get 访问器函数被调用时执行以下步骤：

返回 this 值。

此函数的 "name" 属性值为 "get [Symbol.species]"。

Note

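RegExp prototype methods normally use their this value's constructor to create a derived object. However, a subclass constructor may over-ride that default behaviour by redefining its %Symbol.species% property.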

22.2.6 RegExp 原型对象的属性（Properties of the RegExp Prototype Object）

RegExp 原型对象：

是 %RegExp.prototype%。
是一个普通对象。
不是 RegExp 实例，且没有 [[RegExpMatcher]] 内部槽或 RegExp 实例对象的其他内部槽。
有一个 [[Prototype]] 内部槽，其值为 %Object.prototype%。

Note

The RegExp prototype object does not have a "valueOf" property of its own; however, it inherits the "valueOf" property from the Object prototype object.

22.2.6.1 RegExp.prototype.constructor

RegExp.prototype.constructor 的初始值是 %RegExp%。

22.2.6.2 RegExp.prototype.exec ( `string` )

该方法在 string 中搜索一次正则表达式模式的出现，并返回包含匹配结果的数组；若 string 未匹配则返回 null。

调用时执行以下步骤：

令 R 为 this 值。
执行 ? RequireInternalSlot(R, [[RegExpMatcher]])。
令 S 为 ? ToString(string)。
返回 ? RegExpBuiltinExec(R, S)。

22.2.6.3 get RegExp.prototype.dotAll

RegExp.prototype.dotAll 是一个存取器属性，其 set 访问器函数为 undefined。其 get 访问器函数被调用时执行以下步骤：

令 R 为 this 值。
令 cu 为代码单元 0x0073 (LATIN SMALL LETTER S)。
返回 ? RegExpHasFlag(R, cu)。

22.2.6.4 get RegExp.prototype.flags

RegExp.prototype.flags 是一个存取器属性，其 set 访问器函数为 undefined。其 get 访问器函数被调用时执行以下步骤：

令 R 为 this 值。
如果 R 不是对象，抛出 TypeError 异常。
令 codeUnits 为新的空列表。
令 hasIndices 为 ToBoolean(? Get(R, "hasIndices"))。
如果 hasIndices 为 true，将代码单元 0x0064 (LATIN SMALL LETTER D) 追加到 codeUnits。
令 global 为 ToBoolean(? Get(R, "global"))。
如果 global 为 true，将代码单元 0x0067 (LATIN SMALL LETTER G) 追加到 codeUnits。
令 ignoreCase 为 ToBoolean(? Get(R, "ignoreCase"))。
如果 ignoreCase 为 true，将代码单元 0x0069 (LATIN SMALL LETTER I) 追加到 codeUnits。
令 multiline 为 ToBoolean(? Get(R, "multiline"))。
如果 multiline 为 true，将代码单元 0x006D (LATIN SMALL LETTER M) 追加到 codeUnits。
令 dotAll 为 ToBoolean(? Get(R, "dotAll"))。
如果 dotAll 为 true，将代码单元 0x0073 (LATIN SMALL LETTER S) 追加到 codeUnits。
令 unicode 为 ToBoolean(? Get(R, "unicode"))。
如果 unicode 为 true，将代码单元 0x0075 (LATIN SMALL LETTER U) 追加到 codeUnits。
令 unicodeSets 为 ToBoolean(? Get(R, "unicodeSets"))。
如果 unicodeSets 为 true，将代码单元 0x0076 (LATIN SMALL LETTER V) 追加到 codeUnits。
令 sticky 为 ToBoolean(? Get(R, "sticky"))。
如果 sticky 为 true，将代码单元 0x0079 (LATIN SMALL LETTER Y) 追加到 codeUnits。
返回代码单元为 codeUnits 列表元素的字符串值。若 codeUnits 为空，则返回空字符串。
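
Two consequences of these steps are that the result has a fixed order regardless of how the flags were written, and that the getter is generic: it reads ordinary properties and coerces them with ToBoolean. A minimal sketch:

```javascript
// Flags come back in a fixed order, not the order they were supplied in.
console.log(new RegExp("a", "ymig").flags); // "gimy"

// The getter works on any object, reading each flag property in turn.
const flagsGetter = Object.getOwnPropertyDescriptor(RegExp.prototype, "flags").get;
const fake = { global: true, sticky: 1, unicode: 0 };
console.log(flagsGetter.call(fake)); // "gy"
console.log(flagsGetter.call({})); // ""
```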

22.2.6.4.1 RegExpHasFlag ( `R`, `codeUnit` )

The abstract operation RegExpHasFlag takes arguments R (an ECMAScript language value) and codeUnit (a code unit) and returns either a normal completion containing either a Boolean or undefined, or a throw completion. It performs the following steps when called:

如果 R 不是对象，抛出 TypeError 异常。
如果 R 没有 [[OriginalFlags]] 内部槽，则
1. 如果 SameValue(R, %RegExp.prototype%) 为 true，返回 undefined。
2. 否则，抛出 TypeError 异常。
令 flags 为 R.[[OriginalFlags]]。
如果 flags 包含 codeUnit，返回 true。
返回 false。
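
The special case for %RegExp.prototype% is visible through any of the individual flag getters. A sketch using the global getter:

```javascript
const globalGetter = Object.getOwnPropertyDescriptor(RegExp.prototype, "global").get;

console.log(globalGetter.call(/a/g)); // true
console.log(globalGetter.call(/a/)); // false
// The prototype itself has no [[OriginalFlags]] slot but is tolerated.
console.log(globalGetter.call(RegExp.prototype)); // undefined

// Any other object without the slot is rejected.
let rejected = false;
try {
  globalGetter.call({});
} catch (e) {
  rejected = e instanceof TypeError;
}
console.log(rejected); // true
```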

22.2.6.5 get RegExp.prototype.global

RegExp.prototype.global 是一个存取器属性，其 set 访问器函数为 undefined。其 get 访问器函数被调用时执行以下步骤：

令 R 为 this 值。
令 cu 为代码单元 0x0067 (LATIN SMALL LETTER G)。
返回 ? RegExpHasFlag(R, cu)。

22.2.6.6 get RegExp.prototype.hasIndices

RegExp.prototype.hasIndices 是一个存取器属性，其 set 访问器函数为 undefined。其 get 访问器函数被调用时执行以下步骤：

令 R 为 this 值。
令 cu 为代码单元 0x0064 (LATIN SMALL LETTER D)。
返回 ? RegExpHasFlag(R, cu)。

22.2.6.7 get RegExp.prototype.ignoreCase

RegExp.prototype.ignoreCase 是一个存取器属性，其 set 访问器函数为 undefined。其 get 访问器函数被调用时执行以下步骤：

令 R 为 this 值。
令 cu 为代码单元 0x0069 (LATIN SMALL LETTER I)。
返回 ? RegExpHasFlag(R, cu)。

22.2.6.8 RegExp.prototype [ %Symbol.match% ] ( `string` )

调用该方法时执行以下步骤：

令 rx 为 this 值。
如果 rx 不是对象，抛出 TypeError 异常。
令 S 为 ? ToString(string)。
令 flags 为 ? ToString(? Get(rx, "flags"))。
如果 flags 不包含 "g"，则
1. 返回 ? RegExpExec(rx, S)。
否则，
1. 如果 flags 包含 "u" 或 flags 包含 "v"，令 fullUnicode 为 true；否则令 fullUnicode 为 false。
2. 执行 ? Set(rx, "lastIndex", +0_𝔽, true)。
3. 令 A 为 ! ArrayCreate(0)。
4. 令 n 为 0。
5. 重复，
  1. 令 result 为 ? RegExpExec(rx, S)。
  2. 如果 result 为 null，则
    1. 如果 n = 0，返回 null。
    2. 返回 A。
  3. 否则，
    1. 令 matchStr 为 ? ToString(? Get(result, "0"))。
    2. 执行 ! CreateDataPropertyOrThrow(A, ! ToString(𝔽(n)), matchStr)。
    3. 如果 matchStr 为空字符串，则
      1. 令 thisIndex 为 ℝ(? ToLength(? Get(rx, "lastIndex")))。
      2. 令 nextIndex 为 AdvanceStringIndex(S, thisIndex, fullUnicode)。
      3. 执行 ? Set(rx, "lastIndex", 𝔽(nextIndex), true)。
    4. 设 n 为 n + 1。

该方法的 "name" 属性值为 "[Symbol.match]"。

Note

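The %Symbol.match% property is used by the IsRegExp abstract operation to identify objects that have the basic behaviour of regular expressions. The absence of a %Symbol.match% property or the existence of such a property whose value does not Boolean coerce to true indicates that the object is not intended to be used as a regular expression object.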
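
The global and non-global branches, the empty-match advance, and the IsRegExp hook can each be observed through String.prototype.match and related methods. A sketch with illustrative inputs:

```javascript
// Global: all matched substrings, or null when there are none.
console.log("a1b2".match(/\d/g)); // ["1", "2"]
console.log("abc".match(/\d/g)); // null
// Non-global: the same result as exec, including index.
console.log("a1b2".match(/\d/).index); // 1

// Empty matches advance lastIndex so the loop terminates:
// one empty match at every position, including the end.
console.log("abc".match(/(?:)/g).length); // 4
// With u the advance is by code point, so an astral character is one step.
const astralText = "\u{1F600}";
console.log(astralText.match(/(?:)/g).length); // 3
console.log(astralText.match(/(?:)/gu).length); // 2

// Setting Symbol.match to false makes IsRegExp report false, so
// startsWith treats the object as a string ("/a/") instead of throwing.
const notARegExp = /a/;
notARegExp[Symbol.match] = false;
console.log("/a/".startsWith(notARegExp)); // true
```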

22.2.6.9 RegExp.prototype [ %Symbol.matchAll% ] ( `string` )

调用该方法时执行以下步骤：

令 R 为 this 值。
如果 R 不是对象，抛出 TypeError 异常。
令 S 为 ? ToString(string)。
令 C 为 ? SpeciesConstructor(R, %RegExp%)。
令 flags 为 ? ToString(? Get(R, "flags"))。
令 matcher 为 ? Construct(C, « R, flags »)。
令 lastIndex 为 ? ToLength(? Get(R, "lastIndex"))。
执行 ? Set(matcher, "lastIndex", lastIndex, true)。
如果 flags 包含 "g"，令 global 为 true。
否则，令 global 为 false。
如果 flags 包含 "u" 或 flags 包含 "v"，令 fullUnicode 为 true。
否则，令 fullUnicode 为 false。
返回 CreateRegExpStringIterator(matcher, S, global, fullUnicode)。

该方法的 "name" 属性值为 "[Symbol.matchAll]"。
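
Because the method iterates over a clone built with SpeciesConstructor, the original object's lastIndex is read once and never written. A sketch with illustrative inputs:

```javascript
const digits = /\d/g;
digits.lastIndex = 2; // the clone starts searching from here

const fromIndexTwo = [..."a1b2".matchAll(digits)].map((m) => m[0]);
console.log(fromIndexTwo); // ["2"]
// The original regex is untouched by the iteration.
console.log(digits.lastIndex); // 2

// Called directly, the method also accepts a non-global regex and
// then yields at most one result.
const firstOnly = [.../\d/[Symbol.matchAll]("a1b2")].map((m) => m[0]);
console.log(firstOnly); // ["1"]
```

String.prototype.matchAll itself rejects a non-global RegExp argument with a TypeError, which is why the last example calls the symbol-keyed method directly.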

22.2.6.10 get RegExp.prototype.multiline

RegExp.prototype.multiline 是一个存取器属性，其 set 访问器函数为 undefined。其 get 访问器函数被调用时执行以下步骤：

令 R 为 this 值。
令 cu 为代码单元 0x006D (LATIN SMALL LETTER M)。
返回 ? RegExpHasFlag(R, cu)。

22.2.6.11 RegExp.prototype [ %Symbol.replace% ] ( `string`, `replaceValue` )

This method performs the following steps when called:

Let rx be the this value.
If rx is not an Object, throw a TypeError exception.
Let S be ? ToString(string).
Let lengthS be the length of S.
Let functionalReplace be IsCallable(replaceValue).
If functionalReplace is false, then
1. Set replaceValue to ? ToString(replaceValue).
Let flags be ? ToString(? Get(rx, "flags")).
If flags contains "g", let global be true; otherwise let global be false.
If global is true, then
1. Perform ? Set(rx, "lastIndex", +0_𝔽, true).
Let results be a new empty List.
Let done be false.
Repeat, while done is false,
1. Let result be ? RegExpExec(rx, S).
2. If result is null, then
  1. Set done to true.
3. Else,
  1. Append result to results.
  2. If global is false, then
    1. Set done to true.
  3. Else,
    1. Let matchStr be ? ToString(? Get(result, "0")).
    2. If matchStr is the empty String, then
      1. Let thisIndex be ℝ(? ToLength(? Get(rx, "lastIndex"))).
      2. If flags contains "u" or flags contains "v", let fullUnicode be true; otherwise let fullUnicode be false.
      3. Let nextIndex be AdvanceStringIndex(S, thisIndex, fullUnicode).
      4. Perform ? Set(rx, "lastIndex", 𝔽(nextIndex), true).
Let accumulatedResult be the empty String.
Let nextSourcePosition be 0.
For each element result of results, do
1. Let resultLength be ? LengthOfArrayLike(result).
2. Let nCaptures be max(resultLength - 1, 0).
3. Let matched be ? ToString(? Get(result, "0")).
4. Let matchLength be the length of matched.
5. Let position be ? ToIntegerOrInfinity(? Get(result, "index")).
6. Set position to the result of clamping position between 0 and lengthS.
7. Let captures be a new empty List.
8. Let n be 1.
9. Repeat, while n ≤ nCaptures,
  1. Let capN be ? Get(result, ! ToString(𝔽(n))).
  2. If capN is not undefined, then
    1. Set capN to ? ToString(capN).
  3. Append capN to captures.
  4. NOTE: When n = 1, the preceding step puts the first element into captures (at index 0). More generally, the nth capture (the characters captured by the nth set of capturing parentheses) is at captures[n - 1].
  5. Set n to n + 1.
10. Let namedCaptures be ? Get(result, "groups").
11. If functionalReplace is true, then
  1. Let replacerArgs be the list-concatenation of « matched », captures, and « 𝔽(position), S ».
  2. If namedCaptures is not undefined, then
    1. Append namedCaptures to replacerArgs.
  3. Let replacementValue be ? Call(replaceValue, undefined, replacerArgs).
  4. Let replacementString be ? ToString(replacementValue).
12. Else,
  1. If namedCaptures is not undefined, then
    1. Set namedCaptures to ? ToObject(namedCaptures).
  2. Let replacementString be ? GetSubstitution(matched, S, position, captures, namedCaptures, replaceValue).
13. If position ≥ nextSourcePosition, then
  1. NOTE: position should not normally move backwards. If it does, it is an indication of an ill-behaving RegExp subclass or of an access-triggered side effect that changes the global flag or other characteristics of rx. In such cases, the corresponding substitution is ignored.
  2. Set accumulatedResult to the string-concatenation of accumulatedResult, the substring of S from nextSourcePosition to position, and replacementString.
  3. Set nextSourcePosition to position + matchLength.
If nextSourcePosition ≥ lengthS, return accumulatedResult.
Return the string-concatenation of accumulatedResult and the substring of S from nextSourcePosition.

The value of the "name" property of this method is "[Symbol.replace]".
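
Two aspects of the algorithm above are easy to check from script: the argument list handed to a functional replaceValue, and the way "lastIndex" is advanced past empty matches so that a global replace terminates. The sketch below is illustrative; the pattern, input string, and variable names are assumptions chosen for the example.

```javascript
const seenArgs = [];
const input = "a=1;b=22";

// The replacer receives: matched, each capture, position, S, and (if present) groups.
const replaced = /(?<key>\w+)=(\d+)/g[Symbol.replace](input, (...args) => {
  seenArgs.push(args);
  return "#";
});

// A pattern that matches the empty String still makes progress,
// because lastIndex is advanced after every empty match.
const emptyMatches = /x*/g[Symbol.replace]("abc", "-");
```

Here the second call to the replacer receives "b=22", "b", "22", the position 4, the whole input, and a groups object whose key property is "b".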

22.2.6.12 RegExp.prototype [ %Symbol.search% ] ( `string` )

This method performs the following steps when called:

Let rx be the this value.
If rx is not an Object, throw a TypeError exception.
Let S be ? ToString(string).
Let previousLastIndex be ? Get(rx, "lastIndex").
If previousLastIndex is not +0_𝔽, then
1. Perform ? Set(rx, "lastIndex", +0_𝔽, true).
Let result be ? RegExpExec(rx, S).
Let currentLastIndex be ? Get(rx, "lastIndex").
If SameValue(currentLastIndex, previousLastIndex) is false, then
1. Perform ? Set(rx, "lastIndex", previousLastIndex, true).
If result is null, return -1_𝔽.
Return ? Get(result, "index").

The value of the "name" property of this method is "[Symbol.search]".

Note

The "lastIndex" and "global" properties of this RegExp object are ignored when performing the search. The "lastIndex" property is left unchanged.
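
A short sketch of the save-and-restore behaviour described in the Note, using illustrative names:

```javascript
const searcher = /b/g;
searcher.lastIndex = 2; // would skip the first "b" if it were honoured

// The search always starts from index 0, regardless of lastIndex or the "g" flag.
const found = searcher[Symbol.search]("abcabc");
const lastIndexAfterSearch = searcher.lastIndex; // restored to its previous value

const notFound = /z/[Symbol.search]("abc");
```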

22.2.6.13 get RegExp.prototype.source

RegExp.prototype.source is an accessor property whose set accessor function is undefined. Its get accessor function performs the following steps when called:

Let R be the this value.
If R is not an Object, throw a TypeError exception.
If R does not have an [[OriginalSource]] internal slot, then
1. If SameValue(R, %RegExp.prototype%) is true, return "(?:)".
2. Otherwise, throw a TypeError exception.
Assert: R has an [[OriginalFlags]] internal slot.
Let src be R.[[OriginalSource]].
Let flags be R.[[OriginalFlags]].
Return EscapeRegExpPattern(src, flags).

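22.2.6.13.1 EscapeRegExpPattern ( `P`, `F` )

The abstract operation EscapeRegExpPattern takes arguments P (a String) and F (a String) and returns a String. It performs the following steps when called:

If F contains "v", then
1. Let patternSymbol be Pattern[+UnicodeMode, +UnicodeSetsMode].
Else if F contains "u", then
1. Let patternSymbol be Pattern[+UnicodeMode, ~UnicodeSetsMode].
Else,
1. Let patternSymbol be Pattern[~UnicodeMode, ~UnicodeSetsMode].
Let S be a String in the form of a patternSymbol equivalent to P interpreted as UTF-16 encoded Unicode code points (6.1.4), in which certain code points are escaped as described below. S may or may not differ from P; however, the Abstract Closure that would result from evaluating S as a patternSymbol must behave identically to the Abstract Closure given by the constructed object's [[RegExpMatcher]] internal slot. Multiple calls to this abstract operation using the same values for P and F must produce identical results.
The code points / or any LineTerminator occurring in the pattern shall be escaped in S as necessary to ensure that the string-concatenation of "/", S, "/", and F can be parsed (in an appropriate lexical context) as a RegularExpressionLiteral that behaves identically to the constructed regular expression. For example, if P is "/", then S could be "\/" or "\u002F", among other possibilities, but not "/", because /// followed by F would be parsed as a SingleLineComment rather than a RegularExpressionLiteral. If P is the empty String, this specification can be met by letting S be "(?:)".
Return S.

Note

Despite having similar names, RegExp.escape and EscapeRegExpPattern do not perform similar actions. The former escapes a string for use inside a pattern, while the latter escapes a pattern for representation as a string.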
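
The exact escaped form is left to the implementation, so the sketch below checks only properties that follow from the requirements above: a lone "/" must not come back unescaped, the returned source must rebuild an equivalent pattern, and the empty pattern is commonly reported as "(?:)". That last value is what the specification suggests rather than mandates, so treating it as fixed is an assumption about the host engine.

```javascript
const slash = new RegExp("/");
const slashSource = slash.source; // for example "\/", never a bare "/"

// Feeding the reported source back into the constructor gives an equivalent pattern.
const rebuilt = new RegExp(slash.source, slash.flags);
const rebuiltMatches = rebuilt.test("a/b");

const emptySource = new RegExp("").source;       // "(?:)" in common engines
const prototypeSource = RegExp.prototype.source; // "(?:)" per the source getter
```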

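22.2.6.14 RegExp.prototype [ %Symbol.split% ] ( `string`, `limit` )

Note 1

This method returns an Array into which substrings of the result of converting string to a String have been stored. The substrings are determined by searching from left to right for matches of the this value regular expression; these occurrences are not part of any String in the returned array, but serve to divide up the String value.

The this value may be an empty regular expression or a regular expression that can match an empty String. In this case, the regular expression does not match the empty substring at the beginning or end of the input String, nor does it match the empty substring at the end of the previous separator match. (For example, if the regular expression matches the empty String, the String is split up into individual code unit elements; the length of the result array equals the length of the String, and each substring contains one code unit.) Only the first match at a given index of the String is considered, even if backtracking could yield a non-empty substring match at that index. (For example, /a*?/[Symbol.split]("ab") evaluates to the array ["a", "b"], while /a*/[Symbol.split]("ab") evaluates to the array ["","b"].)

If string is (or converts to) the empty String, the result depends on whether the regular expression can match the empty String. If it can, the result array contains no elements. Otherwise, the result array contains one element, which is the empty String.

If the regular expression contains capturing parentheses, then each time separator is matched the results (including any undefined results) of the capturing parentheses are spliced into the output array. For example,

/<(\/)?([^<>]+)>/[Symbol.split]("A<B>bold</B>and<CODE>coded</CODE>")

evaluates to the array

["A", undefined, "B", "bold", "/", "B", "and", undefined, "CODE", "coded", "/", "CODE", ""]

If limit is not undefined, then the output array is truncated so that it contains no more than limit elements.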

This method performs the following steps when called:

Let rx be the this value.
If rx is not an Object, throw a TypeError exception.
Let S be ? ToString(string).
Let C be ? SpeciesConstructor(rx, %RegExp%).
Let flags be ? ToString(? Get(rx, "flags")).
If flags contains "u" or flags contains "v", let unicodeMatching be true.
Else, let unicodeMatching be false.
If flags contains "y", let newFlags be flags.
Else, let newFlags be the string-concatenation of flags and "y".
Let splitter be ? Construct(C, « rx, newFlags »).
Let A be ! ArrayCreate(0).
Let lengthA be 0.
If limit is undefined, let lim be 2³² - 1; otherwise let lim be ℝ(? ToUint32(limit)).
If lim = 0, return A.
If S is the empty String, then
1. Let z be ? RegExpExec(splitter, S).
2. If z is not null, return A.
3. Perform ! CreateDataPropertyOrThrow(A, "0", S).
4. Return A.
Let size be the length of S.
Let p be 0.
Let q be p.
Repeat, while q < size,
1. Perform ? Set(splitter, "lastIndex", 𝔽(q), true).
2. Let z be ? RegExpExec(splitter, S).
3. If z is null, then
  1. Set q to AdvanceStringIndex(S, q, unicodeMatching).
4. Else,
  1. Let e be ℝ(? ToLength(? Get(splitter, "lastIndex"))).
  2. Set e to min(e, size).
  3. If e = p, then
    1. Set q to AdvanceStringIndex(S, q, unicodeMatching).
  4. Else,
    1. Let T be the substring of S from p to q.
    2. Perform ! CreateDataPropertyOrThrow(A, ! ToString(𝔽(lengthA)), T).
    3. Set lengthA to lengthA + 1.
    4. If lengthA = lim, return A.
    5. Set p to e.
    6. Let numberOfCaptures be ? LengthOfArrayLike(z).
    7. Set numberOfCaptures to max(numberOfCaptures - 1, 0).
    8. Let i be 1.
    9. Repeat, while i ≤ numberOfCaptures,
      1. Let nextCapture be ? Get(z, ! ToString(𝔽(i))).
      2. Perform ! CreateDataPropertyOrThrow(A, ! ToString(𝔽(lengthA)), nextCapture).
      3. Set i to i + 1.
      4. Set lengthA to lengthA + 1.
      5. If lengthA = lim, return A.
    10. Set q to p.
Let T be the substring of S from p to size.
Perform ! CreateDataPropertyOrThrow(A, ! ToString(𝔽(lengthA)), T).
Return A.

The value of the "name" property of this method is "[Symbol.split]".

Note 2

This method ignores the value of the "global" and "sticky" properties of this RegExp object.
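
The examples given in Note 1 can be reproduced directly. The sketch below collects them, together with the limit and empty-input cases, under illustrative variable names:

```javascript
// Only the first match at each index is considered (lazy vs. greedy).
const lazySplit = /a*?/[Symbol.split]("ab");
const greedySplit = /a*/[Symbol.split]("ab");

// Capture results, including undefined, are spliced into the output.
const tagSplit = /<(\/)?([^<>]+)>/[Symbol.split]("A<B>bold</B>and<CODE>coded</CODE>");

// limit truncates the output array.
const limitedSplit = /,/[Symbol.split]("a,b,c", 2);

// Empty input: the result depends on whether the pattern can match the empty String.
const emptyMatchable = /x*/[Symbol.split]("");
const emptyUnmatchable = /x/[Symbol.split]("");
```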

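22.2.6.15 get RegExp.prototype.sticky

RegExp.prototype.sticky is an accessor property whose set accessor function is undefined. Its get accessor function performs the following steps when called:

Let R be the this value.
Let cu be the code unit 0x0079 (LATIN SMALL LETTER Y).
Return ? RegExpHasFlag(R, cu).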

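22.2.6.16 RegExp.prototype.test ( `S` )

This method performs the following steps when called:

Let R be the this value.
If R is not an Object, throw a TypeError exception.
Let string be ? ToString(S).
Let match be ? RegExpExec(R, string).
If match is not null, return true; otherwise return false.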
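
Because test delegates to RegExpExec, a global or sticky RegExp carries "lastIndex" from one call to the next. A brief sketch of that statefulness, with illustrative names:

```javascript
const stateful = /a/g;

const firstCall = stateful.test("a");   // matches at index 0
const indexAfterFirst = stateful.lastIndex;

const secondCall = stateful.test("a");  // resumes at index 1 and finds nothing
const indexAfterSecond = stateful.lastIndex; // reset to 0 after the failed match
```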

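22.2.6.17 RegExp.prototype.toString ( )

Let R be the this value.
If R is not an Object, throw a TypeError exception.
Let pattern be ? ToString(? Get(R, "source")).
Let flags be ? ToString(? Get(R, "flags")).
Let result be the string-concatenation of "/", pattern, "/", and flags.
Return result.

Note

The returned String has the form of a RegularExpressionLiteral that evaluates to another RegExp object with the same behaviour as this object.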
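
The steps above only read the "source" and "flags" properties, so the method also works on ordinary objects. A minimal sketch (the plain object is a hypothetical stand-in, not a real RegExp):

```javascript
const literalText = /a\/b/gi.toString();

// toString is generic: any object with "source" and "flags" properties will do.
const genericText = RegExp.prototype.toString.call({ source: "x", flags: "y" });
```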

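22.2.6.18 get RegExp.prototype.unicode

RegExp.prototype.unicode is an accessor property whose set accessor function is undefined. Its get accessor function performs the following steps when called:

Let R be the this value.
Let cu be the code unit 0x0075 (LATIN SMALL LETTER U).
Return ? RegExpHasFlag(R, cu).

22.2.6.19 get RegExp.prototype.unicodeSets

RegExp.prototype.unicodeSets is an accessor property whose set accessor function is undefined. Its get accessor function performs the following steps when called:

Let R be the this value.
Let cu be the code unit 0x0076 (LATIN SMALL LETTER V).
Return ? RegExpHasFlag(R, cu).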
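
The flag accessors in this clause (multiline, sticky, unicode, unicodeSets) all share one shape. The sketch below exercises three of them; the result for RegExp.prototype itself and the TypeError for a non-object receiver come from RegExpHasFlag (22.2.6.4.1), which is defined outside this section. The "v" flag is left out only because older engines reject it.

```javascript
const flagged = new RegExp("^a", "muy");
const flagValues = {
  multiline: flagged.multiline,
  sticky: flagged.sticky,
  unicode: flagged.unicode,
};

// The accessors have no setter, and report undefined on RegExp.prototype itself.
const descriptor = Object.getOwnPropertyDescriptor(RegExp.prototype, "sticky");
const hasSetter = descriptor.set !== undefined;
const onPrototype = RegExp.prototype.sticky;

// A non-object receiver is rejected.
let getterThrows = false;
try {
  descriptor.get.call("not a RegExp");
} catch (err) {
  getterThrows = err instanceof TypeError;
}
```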

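22.2.7 Abstract Operations for RegExp Matching

22.2.7.1 RegExpExec ( `R`, `S` )

The abstract operation RegExpExec takes arguments R (an Object) and S (a String) and returns either a normal completion containing either an Object or null, or a throw completion. It performs the following steps when called:

Let exec be ? Get(R, "exec").
If IsCallable(exec) is true, then
1. Let result be ? Call(exec, R, « S »).
2. If result is not an Object and result is not null, throw a TypeError exception.
3. Return result.
Perform ? RequireInternalSlot(R, [[RegExpMatcher]]).
Return ? RegExpBuiltinExec(R, S).

Note

If a callable "exec" property is not found, this algorithm falls back to attempting to use the built-in RegExp matching algorithm. This provides compatible behaviour for code written for prior editions, in which most built-in algorithms that use regular expressions did not perform a dynamic property lookup of "exec".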
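
The three branches of RegExpExec can each be triggered from script through RegExp.prototype.test, which calls it. The following is a sketch; the class name CountingRegExp and the variable names are hypothetical.

```javascript
// 1. A callable "exec" is looked up dynamically and used.
let execCalls = 0;
class CountingRegExp extends RegExp {
  exec(str) {
    execCalls += 1;
    return super.exec(str);
  }
}
const viaSubclass = new CountingRegExp("a").test("a");

// 2. A result that is neither an Object nor null is rejected.
const badExec = /a/;
badExec.exec = () => 42;
let badResultThrows = false;
try {
  badExec.test("a");
} catch (err) {
  badResultThrows = err instanceof TypeError;
}

// 3. A non-callable "exec" falls back to the built-in matcher.
const fallback = /a/;
fallback.exec = "not callable";
const viaFallback = fallback.test("a");
```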

22.2.7.2 RegExpBuiltinExec ( `R`, `S` )

The abstract operation RegExpBuiltinExec takes arguments R (an initialized RegExp instance) and S (a String) and returns either a normal completion containing either an Array exotic object or null, or a throw completion. It performs the following steps when called:

Let length be the length of S.
Let lastIndex be ℝ(? ToLength(! Get(R, "lastIndex"))).
Let flags be R.[[OriginalFlags]].
If flags contains "g", let global be true; otherwise let global be false.
If flags contains "y", let sticky be true; otherwise let sticky be false.
If flags contains "d", let hasIndices be true; otherwise let hasIndices be false.
If global is false and sticky is false, set lastIndex to 0.
Let matcher be R.[[RegExpMatcher]].
If flags contains "u" or flags contains "v", let fullUnicode be true; otherwise let fullUnicode be false.
Let matchSucceeded be false.
If fullUnicode is true, let input be StringToCodePoints(S); otherwise let input be a List whose elements are the code units of S.
NOTE: Each element of input is considered to be a character.
Repeat, while matchSucceeded is false,
1. If lastIndex > length, then
  1. If global is true or sticky is true, then
    1. Perform ? Set(R, "lastIndex", +0_𝔽, true).
  2. Return null.
2. Let inputIndex be the index into input of the character that was obtained from element lastIndex of S.
3. Let r be matcher(input, inputIndex).
4. If r is failure, then
  1. If sticky is true, then
    1. Perform ? Set(R, "lastIndex", +0_𝔽, true).
    2. Return null.
  2. Set lastIndex to AdvanceStringIndex(S, lastIndex, fullUnicode).
5. Else,
  1. Assert: r is a MatchState.
  2. Set matchSucceeded to true.
Let e be r.[[EndIndex]].
If fullUnicode is true, set e to GetStringIndex(S, e).
If global is true or sticky is true, then
1. Perform ? Set(R, "lastIndex", 𝔽(e), true).
Let n be the number of elements in r.[[Captures]].
Assert: n = R.[[RegExpRecord]].[[CapturingGroupsCount]].
Assert: n < 2³² - 1.
Let A be ! ArrayCreate(n + 1).
Assert: The mathematical value of A's "length" property is n + 1.
Perform ! CreateDataPropertyOrThrow(A, "index", 𝔽(lastIndex)).
Perform ! CreateDataPropertyOrThrow(A, "input", S).
Let match be the Match Record { [[StartIndex]]: lastIndex, [[EndIndex]]: e }.
Let indices be a new empty List.
Let groupNames be a new empty List.
Append match to indices.
Let matchedSubstr be GetMatchString(S, match).
Perform ! CreateDataPropertyOrThrow(A, "0", matchedSubstr).
If R contains any GroupName, then
1. Let groups be OrdinaryObjectCreate(null).
2. Let hasGroups be true.
Else,
1. Let groups be undefined.
2. Let hasGroups be false.
Perform ! CreateDataPropertyOrThrow(A, "groups", groups).
Let matchedGroupNames be a new empty List.
For each integer i such that 1 ≤ i ≤ n, in ascending order, do
1. Let captureI be the ith element of r.[[Captures]].
2. If captureI is undefined, then
  1. Let capturedValue be undefined.
  2. Append undefined to indices.
3. Else,
  1. Let captureStart be captureI.[[StartIndex]].
  2. Let captureEnd be captureI.[[EndIndex]].
  3. If fullUnicode is true, then
    1. Set captureStart to GetStringIndex(S, captureStart).
    2. Set captureEnd to GetStringIndex(S, captureEnd).
  4. Let capture be the Match Record { [[StartIndex]]: captureStart, [[EndIndex]]: captureEnd }.
  5. Let capturedValue be GetMatchString(S, capture).
  6. Append capture to indices.
4. Perform ! CreateDataPropertyOrThrow(A, ! ToString(𝔽(i)), capturedValue).
5. If the ith capture of R was defined with a GroupName, then
  1. Let s be the CapturingGroupName of that GroupName.
  2. If matchedGroupNames contains s, then
    1. Assert: capturedValue is undefined.
    2. Append undefined to groupNames.
  3. Else,
    1. If capturedValue is not undefined, append s to matchedGroupNames.
    2. NOTE: If there are multiple groups named s, groups may already have an s property at this point. However, because groups is an ordinary object whose properties are all writable data properties, the call to CreateDataPropertyOrThrow is nevertheless guaranteed to succeed.
    3. Perform ! CreateDataPropertyOrThrow(groups, s, capturedValue).
    4. Append s to groupNames.
6. Else,
  1. Append undefined to groupNames.
If hasIndices is true, then
1. Let indicesArray be MakeMatchIndicesIndexPairArray(S, indices, groupNames, hasGroups).
2. Perform ! CreateDataPropertyOrThrow(A, "indices", indicesArray).
Return A.
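
The shape of the array built by the steps above, and the different treatment of "lastIndex" per flag, can be inspected through RegExp.prototype.exec. The sketch below is illustrative: the pattern and input are assumptions chosen for the example, and the "d" flag requires an engine that supports match indices.

```javascript
// Result shape: "index", "input", "groups", numbered captures, and "indices" (with "d").
const dated = /(?<year>\d{4})-(?<month>\d\d)?/d.exec("on 2024-");

// Sticky: a failure at lastIndex returns null and resets lastIndex to 0.
const stickyRe = /a/y;
stickyRe.lastIndex = 1;
const stickyResult = stickyRe.exec("ab");
const stickyIndexAfter = stickyRe.lastIndex;

// Neither global nor sticky: lastIndex is ignored for matching and never written.
const plainRe = /b/;
plainRe.lastIndex = 5;
const plainResult = plainRe.exec("ab");
const plainIndexAfter = plainRe.lastIndex;
```

In this example the optional month group does not participate, so both dated[2] and dated.indices[2] are undefined, while dated.indices.groups.year holds the pair [3, 7].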

22.2.7.3 AdvanceStringIndex ( `S`, `index`, `unicode` )

The abstract operation AdvanceStringIndex takes arguments S (a String), index (a non-negative integer), and unicode (a Boolean) and returns an integer. It performs the following steps when called:

Assert: index ≤ 2⁵³ - 1.
If unicode is false, return index + 1.
Let length be the length of S.
If index + 1 ≥ length, return index + 1.
Let cp be CodePointAt(S, index).
Return index + cp.[[CodeUnitCount]].
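
As a rough illustration, the operation could be sketched in script as follows. The function name advanceStringIndex is hypothetical, and String.prototype.codePointAt stands in for the CodePointAt abstract operation: a result above 0xFFFF means the code point occupied two code units.

```javascript
function advanceStringIndex(s, index, unicode) {
  if (!unicode) return index + 1;
  if (index + 1 >= s.length) return index + 1;
  // A code point above 0xFFFF is encoded as a surrogate pair (two code units).
  const codePoint = s.codePointAt(index);
  return index + (codePoint > 0xffff ? 2 : 1);
}

const astral = "\u{1F600}a"; // a surrogate pair followed by "a"
const stepUnicode = advanceStringIndex(astral, 0, true);
const stepCodeUnit = advanceStringIndex(astral, 0, false);
const stepLoneSurrogate = advanceStringIndex("\uD83Da", 0, true);
```

In Unicode mode the index skips the whole surrogate pair; otherwise, and for an unpaired surrogate, it advances by a single code unit.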

22.2.7.4 GetStringIndex ( `S`, `codePointIndex` )

The abstract operation GetStringIndex takes arguments S (a String) and codePointIndex (a non-negative integer) and returns a non-negative integer. It interprets S as a sequence of UTF-16 encoded code points, as described in 6.1.4, and returns the code unit index corresponding to code point index codePointIndex when such an index exists. Otherwise, it returns the length of S. It performs the following steps when called:

If S is the empty String, return 0.
Let len be the length of S.
Let codeUnitCount be 0.
Let codePointCount be 0.
Repeat, while codeUnitCount < len,
1. If codePointCount = codePointIndex, return codeUnitCount.
2. Let cp be CodePointAt(S, codeUnitCount).
3. Set codeUnitCount to codeUnitCount + cp.[[CodeUnitCount]].
4. Set codePointCount to codePointCount + 1.
Return len.
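
The loop above translates almost line for line into script. The following sketch uses the hypothetical name getStringIndex and again relies on String.prototype.codePointAt in place of CodePointAt:

```javascript
function getStringIndex(s, codePointIndex) {
  if (s === "") return 0;
  let codeUnitCount = 0;
  let codePointCount = 0;
  while (codeUnitCount < s.length) {
    if (codePointCount === codePointIndex) return codeUnitCount;
    const codePoint = s.codePointAt(codeUnitCount);
    codeUnitCount += codePoint > 0xffff ? 2 : 1; // surrogate pairs take two code units
    codePointCount += 1;
  }
  return s.length;
}

// "a", one astral code point (two code units), then "b": four code units in total.
const sample = "a\u{1F600}b";
const mappedIndices = [0, 1, 2, 3].map((i) => getStringIndex(sample, i));
```

For this sample, code point indices 0, 1, 2 and 3 map to code unit indices 0, 1, 3 and 4, the last being the length of the String.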

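22.2.7.5 Match Records

A Match Record is a Record value used to encapsulate the start and end indices of a regular expression match or capture.

Match Records have the fields listed in Table 70.

Table 70: Match Record Fields

Field Name	Value	Meaning
`[[StartIndex]]`	a non-negative integer	The number of code units from the start of a string at which the match begins (inclusive).
`[[EndIndex]]`	an integer ≥ `[[StartIndex]]`	The number of code units from the start of a string at which the match ends (exclusive).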

22.2.7.6 GetMatchString ( `S`, `match` )

The abstract operation GetMatchString takes arguments S (a String) and match (a Match Record) and returns a String. It performs the following steps when called:

Assert: match.[[StartIndex]] ≤ match.[[EndIndex]] ≤ the length of S.
Return the substring of S from match.[[StartIndex]] to match.[[EndIndex]].

22.2.7.7 GetMatchIndexPair ( `S`, `match` )

The abstract operation GetMatchIndexPair takes arguments S (a String) and match (a Match Record) and returns an Array. It performs the following steps when called:

Assert: match.[[StartIndex]] ≤ match.[[EndIndex]] ≤ the length of S.
Return CreateArrayFromList(« 𝔽(match.[[StartIndex]]), 𝔽(match.[[EndIndex]]) »).

22.2.7.8 MakeMatchIndicesIndexPairArray ( `S`, `indices`, `groupNames`, `hasGroups` )

The abstract operation MakeMatchIndicesIndexPairArray takes arguments S (a String), indices (a List of either Match Records or undefined), groupNames (a List of either Strings or undefined), and hasGroups (a Boolean) and returns an Array. It performs the following steps when called:

Let n be the number of elements in indices.
Assert: n < 2³² - 1.
Assert: groupNames has n - 1 elements.
NOTE: The groupNames List contains elements aligned with the indices List starting at indices[1].
Let A be ! ArrayCreate(n).
If hasGroups is true, then
1. Let groups be OrdinaryObjectCreate(null).
Else,
1. Let groups be undefined.
Perform ! CreateDataPropertyOrThrow(A, "groups", groups).
For each integer i such that 0 ≤ i < n, in ascending order, do
1. Let matchIndices be indices[i].
2. If matchIndices is not undefined, then
  1. Let matchIndexPair be GetMatchIndexPair(S, matchIndices).
3. Else,
  1. Let matchIndexPair be undefined.
4. Perform ! CreateDataPropertyOrThrow(A, ! ToString(𝔽(i)), matchIndexPair).
5. If i > 0, then
  1. Let s be groupNames[i - 1].
  2. If s is not undefined, then
    1. Assert: groups is not undefined.
    2. NOTE: If there are multiple groups named s, groups may already have an s property at this point. However, because groups is an ordinary object whose properties are all writable data properties, the call to CreateDataPropertyOrThrow is nevertheless guaranteed to succeed.
    3. Perform ! CreateDataPropertyOrThrow(groups, s, matchIndexPair).
Return A.

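22.2.8 Properties of RegExp Instances

RegExp instances are ordinary objects that inherit properties from the RegExp prototype object. RegExp instances have internal slots [[OriginalSource]], [[OriginalFlags]], [[RegExpRecord]], and [[RegExpMatcher]]. The value of the [[RegExpMatcher]] internal slot is an Abstract Closure representation of the Pattern of the RegExp object.

Note

Prior to ECMAScript 2015, RegExp instances were specified as having the own data properties "source", "global", "ignoreCase", and "multiline". Those properties are now specified as accessor properties of RegExp.prototype.

RegExp instances also have the following property:

22.2.8.1 lastIndex

The value of the "lastIndex" property specifies the String index at which to start the next match. It is coerced to an integral Number when used (see 22.2.7.2). This property has the attributes { [[Writable]]: true, [[Enumerable]]: false, [[Configurable]]: false }.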

22.2.9 RegExp String Iterator Objects

A RegExp String Iterator is an object that represents a specific iteration over some specific String instance object, matching against some specific RegExp instance object. There is no named constructor for RegExp String Iterator objects. Instead, RegExp String Iterator objects are created by calling certain methods of RegExp instance objects.

22.2.9.1 CreateRegExpStringIterator ( `R`, `S`, `global`, `fullUnicode` )

The abstract operation CreateRegExpStringIterator takes arguments R (an Object), S (a String), global (a Boolean), and fullUnicode (a Boolean) and returns an Object. It performs the following steps when called:

Let iterator be OrdinaryObjectCreate(%RegExpStringIteratorPrototype%, « [[IteratingRegExp]], [[IteratedString]], [[Global]], [[Unicode]], [[Done]] »).
Set iterator.[[IteratingRegExp]] to R.
Set iterator.[[IteratedString]] to S.
Set iterator.[[Global]] to global.
Set iterator.[[Unicode]] to fullUnicode.
Set iterator.[[Done]] to false.
Return iterator.

22.2.9.2 The %RegExpStringIteratorPrototype% Object

The %RegExpStringIteratorPrototype% object:

has properties that are inherited by all RegExp String Iterator objects.
is an ordinary object.
has a [[Prototype]] internal slot whose value is %Iterator.prototype%.
has the following properties:

22.2.9.2.1 %RegExpStringIteratorPrototype%.next ( )

Let O be the this value.
If O is not an Object, throw a TypeError exception.
If O does not have all of the internal slots of a RegExp String Iterator object instance (see 22.2.9.3), throw a TypeError exception.
If O.[[Done]] is true, then
1. Return CreateIteratorResultObject(undefined, true).
Let R be O.[[IteratingRegExp]].
Let S be O.[[IteratedString]].
Let global be O.[[Global]].
Let fullUnicode be O.[[Unicode]].
Let match be ? RegExpExec(R, S).
If match is null, then
1. Set O.[[Done]] to true.
2. Return CreateIteratorResultObject(undefined, true).
If global is false, then
1. Set O.[[Done]] to true.
2. Return CreateIteratorResultObject(match, false).
Let matchStr be ? ToString(? Get(match, "0")).
If matchStr is the empty String, then
1. Let thisIndex be ℝ(? ToLength(? Get(R, "lastIndex"))).
2. Let nextIndex be AdvanceStringIndex(S, thisIndex, fullUnicode).
3. Perform ? Set(R, "lastIndex", 𝔽(nextIndex), true).
Return CreateIteratorResultObject(match, false).

22.2.9.2.2 %RegExpStringIteratorPrototype% [ %Symbol.toStringTag% ]

The initial value of the %Symbol.toStringTag% property is the String value "RegExp String Iterator".

This property has the attributes { [[Writable]]: false, [[Enumerable]]: false, [[Configurable]]: true }.
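
The behaviour of next and of the %Symbol.toStringTag% property can be observed on iterators obtained from RegExp.prototype[Symbol.matchAll]. A small sketch with illustrative names:

```javascript
// Global iteration over a pattern that only matches the empty String:
// lastIndex is advanced after each empty match, so iteration terminates.
const emptyIterator = /(?:)/g[Symbol.matchAll]("abc");
const iteratorTag = Object.prototype.toString.call(emptyIterator);
const emptyMatchIndices = [...emptyIterator].map((m) => m.index);

// Non-global iteration: the first call yields the match and marks the iterator done.
const onceIterator = /b/[Symbol.matchAll]("abcb");
const firstResult = onceIterator.next();
const secondResult = onceIterator.next();
```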

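22.2.9.3 Properties of RegExp String Iterator Instances

RegExp String Iterator instances are ordinary objects that inherit properties from the %RegExpStringIteratorPrototype% intrinsic object. RegExp String Iterator instances are initially created with the internal slots listed in Table 71.

Table 71: Internal Slots of RegExp String Iterator Instances

Internal Slot	Type	Description
`[[IteratingRegExp]]`	an Object	The regular expression used for iteration. IsRegExp(`[[IteratingRegExp]]`) is initially true.
`[[IteratedString]]`	a String	The String value being iterated upon.
`[[Global]]`	a Boolean	Indicates whether the `[[IteratingRegExp]]` is global or not.
`[[Unicode]]`	a Boolean	Indicates whether the `[[IteratingRegExp]]` is in Unicode mode or not.
`[[Done]]`	a Boolean	Indicates whether the iteration is complete or not.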

