Author: David Walker ([email protected])
Proposed version: PHP 7.2+
Status: Under Discussion
First Published at: http://wiki.php.net/rfc/replace_parse_url

Introduction
This RFC came about from an attempt to resolve Bug #72811. During that attempt, discussion shifted from patching the current implementation to replacing it altogether. The current implementation of parse_url() makes a number of exceptions to RFC 3986. I do not know whether these exceptions are deliberate, or whether parse_url() was simply never written to follow the RFC.
So, this RFC proposes replacing the current implementation of parse_url() with a re2c-based parser that strictly follows the RFC when parsing URIs.

Reasoning
The bug described an issue where using parse_url() with an IPv4 address would correctly parse the host, but with an IPv6 address it would not.
<?php
var_dump(parse_url("127.0.0.1:80", PHP_URL_HOST));
var_dump(parse_url("[::1]:80", PHP_URL_HOST));
/* Outputs:
string(9) "127.0.0.1"
NULL
*/
While we may agree that the former line is sensible and maybe even expected, the behavior is contrary to how the RFC defines parsing a URI. To be compliant, the input should parse as a single path element, string(12) "127.0.0.1:80". Why? The RFC defines the host as a component of the authority. The authority is only parsed if it is preceded by a double slash. Since the above example lacks a double slash, the authority portion of the hier-part should not be processed, and the example would match the path-rootless rule instead.
The bug report does state that IPv4 and IPv6 addresses are handled differently (in the sense that the IPv4 parsing isn't standards compliant). However, according to the RFC, the result for the IPv6 case the user reported is accurate per the spec. None of the path rules permit a [ as the first character of the path, so the IPv6-formatted line should be NULL.
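The double-slash rule described above can be sketched in a few lines of PHP. This is only an illustration, not the proposed re2c parser: split_authority() and its array keys are hypothetical names, the sketch ignores schemes entirely, and it does not validate which characters are allowed in each component (so it will not reject a leading [ in a path).

```php
<?php
declare(strict_types=1);

/**
 * Hypothetical helper, not part of PHP or of the proposed patch.
 * Splits a scheme-less reference into an authority and a remainder,
 * recognising an authority only when the input starts with "//".
 */
function split_authority(string $ref): array
{
    if (strncmp($ref, '//', 2) !== 0) {
        // No leading double slash: nothing is treated as an authority.
        return ['authority' => null, 'remainder' => $ref];
    }

    $rest = substr($ref, 2);
    // The authority ends at the first "/", "?" or "#", or at the end of input.
    $len = strcspn($rest, '/?#');

    return [
        'authority' => substr($rest, 0, $len),
        // Path, query and fragment are left unsplit in this sketch.
        'remainder' => substr($rest, $len),
    ];
}
```

Under these assumptions, split_authority("127.0.0.1:80") reports no authority and leaves the whole string as the remainder, while split_authority("//[::1]:80/index.php") yields the authority "[::1]:80" and the remainder "/index.php".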
An accurate example of standards-compliant parsing:
<?php
var_dump(parse_url("127.0.0.1:80", PHP_URL_PATH));
var_dump(parse_url("[::1]:80", PHP_URL_PATH));
/* Outputs:
string(12) "127.0.0.1:80"
NULL
*/
With that in mind, a correct example of parsing URIs to acquire the host portion, per the bug's request, would look similar to the following:
<?php
var_dump(parse_url("127.0.0.1:80/index.php", PHP_URL_HOST));
var_dump(parse_url("[::1]:80/index.php", PHP_URL_HOST));
var_dump(parse_url("//127.0.0.1:80/index.php", PHP_URL_HOST));
var_dump(parse_url("//[::1]:80/index.php", PHP_URL_HOST));
/* Outputs:
NULL
NULL
string(9) "127.0.0.1"
string(5) "[::1]"
*/

Proposal
The proposal of this RFC is twofold. One, replace the current parser used for parse_url() with one built on re2c. Two, ensure parse_url() more closely follows the RFC. The function signature will not change; however, the return value will be more consistent.
The function can return:
An array consisting of each component of the URI found.
A string|int holding the component requested by the second argument.
NULL when the URI cannot be parsed, or when the requested component contains no value.

Backward Incompatible Changes
Many of the tests that were developed for the current implementation of parse_url() have been changed to reflect standards-compliant behavior. This change will break code that calls the function with a non-standards-compliant URI format, and that is the most problematic part of the BC break. By this point, many people who use parse_url() might expect it to work in a, let's say, forgiving manner. The example provided in the bug report is a perfect example of what I feel is a common use case of this function that will no longer work once parsing is standards compliant.
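For callers who rely on the forgiving behavior, one possible caller-side adjustment is to mark the authority explicitly before parsing. The sketch below is an assumption about how such code might be written, not part of the proposed patch: lenient_host() is a hypothetical name, and only parse_url() and PHP_URL_HOST are real PHP APIs. It assumes the input is either a full URI or a bare "host[:port][/path]" string; path-only input such as "/index.php" is outside its scope.

```php
<?php
declare(strict_types=1);

/**
 * Hypothetical caller-side shim, for illustration only.
 * Returns the host of a URI or of a bare "host[:port][/path]" string, or null.
 */
function lenient_host(string $input): ?string
{
    // Assumption: input without "//" is a bare host string, so the
    // authority marker is added before handing it to parse_url().
    if (strpos($input, '//') === false) {
        $input = '//' . $input;
    }

    $host = parse_url($input, PHP_URL_HOST);

    // Treat false (returned today for seriously malformed input) and
    // NULL (component absent) alike, so the caller sees one "no host" value.
    return is_string($host) ? $host : null;
}
```

With this shim, lenient_host("127.0.0.1:80") returns "127.0.0.1" on current PHP releases, and because the input handed to parse_url() then carries an explicit double slash, the same call should keep working under the stricter parsing described here.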
This function will no longer return false.

RFC Impact To Existing Extensions
Deprecate parse_url() and create a new function with new parsing
Allow for certain breaks from the RFC to provide more lenient parsing? (i.e. allow 'example.com:80' to parse as a host & port, not a path)

Proposed Voting Choices
Vote to replace parse_url() with a re2c parser, and require standards-compliant URI formats. Requires a 2/3 majority.

Implementation
After the project is implemented, this section should contain
the version(s) it was merged to
a link to the git commit(s)
a link to the PHP manual entry for the feature

References
PR with working Implementation: https://github.com/php/php-src/pull/2079