PHP: rfc:replace_parse_url

PHP RFC: Replace parse_url()

Version: 0.1

Date: 2016-10-04

Author: David Walker ([email protected])

Proposed version: PHP 7.2+

Status: Under Discussion

First Published at: http://wiki.php.net/rfc/replace_parse_url

Introduction

This RFC came about from an attempt to resolve Bug #72811. During that attempt, discussion shifted from patching the current implementation to replacing it altogether. The current implementation of parse_url() makes a number of exceptions to RFC 3986. I do not know whether these are conscious exceptions or whether parse_url() was simply never written to follow the RFC.

So, this RFC proposes replacing the current implementation of parse_url() with a re2c-based parser that adheres strictly to the RFC when parsing URIs.

Reasoning

The bug described an issue where using parse_url() with an IPv4 address would correctly parse the host, but with IPv6 it would not.

<?php
var_dump(parse_url("127.0.0.1:80", PHP_URL_HOST));
var_dump(parse_url("[::1]:80", PHP_URL_HOST));
/* Outputs:
string(9) "127.0.0.1"
NULL
*/

While we may agree that the former line is sensible and maybe expected, the behavior is contrary to how the RFC defines parsing a URI. To be compliant, it should parse as a single path element, string(12) "127.0.0.1:80". Why? The RFC defines the host as a component of the authority, and the authority is only parsed if it is preceded by a double slash. Since the above example lacks a double slash, the authority portion of the hier-part should not be processed, and the example would match the path-rootless portion instead.

The bug report does point out that IPv4 and IPv6 addresses are parsed differently, but the difference arises because the IPv4 parsing is not standards compliant. According to the RFC, the IPv6 result the user reported in the bug is correct per the spec: none of the path elements permit a [ as the first character of the path, so the IPv6 formatted line should be NULL.
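The double-slash rule described above can be illustrated with a small userland sketch. The function name sketch_host() is hypothetical; it is not part of PHP or of the patch for this RFC, and it deliberately skips most validation. It only shows that a host is reported when, and only when, an authority follows the optional scheme.

```php
<?php
/**
 * Hypothetical helper: return the host only if the input carries an
 * authority, i.e. "//" directly after the optional "scheme:" prefix.
 */
function sketch_host(string $uri): ?string
{
    // Strip an optional scheme: ALPHA followed by ALPHA / DIGIT / "+" / "-" / ".".
    $rest = preg_replace('/^[A-Za-z][A-Za-z0-9+.-]*:/', '', $uri, 1);

    // Without a leading double slash there is no authority, hence no host.
    if (strncmp($rest, '//', 2) !== 0) {
        return null;
    }

    // The authority runs up to the next "/", "?" or "#".
    $authority = substr($rest, 2, strcspn($rest, '/?#', 2));

    // Drop the userinfo part, if present.
    $at = strrpos($authority, '@');
    if ($at !== false) {
        $authority = substr($authority, $at + 1);
    }

    // The host is either a bracketed IP literal or everything before the port colon.
    if (preg_match('/^(\[[^\]]*\]|[^:]*)(?::\d*)?$/', $authority, $m) !== 1) {
        return null;
    }
    return $m[1] === '' ? null : $m[1];
}

var_dump(sketch_host("127.0.0.1:80"));             // no "//", so no host
var_dump(sketch_host("//127.0.0.1:80/index.php")); // authority present
var_dump(sketch_host("//[::1]:80/index.php"));     // bracketed IPv6 literal
```

Under these assumptions the first call yields NULL, while the two inputs that start with a double slash yield the host portion.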

An accurate example of standards compliant parsing:

<?php
var_dump(parse_url("127.0.0.1:80", PHP_URL_PATH));
var_dump(parse_url("[::1]:80", PHP_URL_PATH));
/* Outputs:
string(12) "127.0.0.1:80"
NULL
*/

With that in mind, a correct example of parsing URIs to acquire the host portion, per the bug's request, would look similar to the following:

<?php
var_dump(parse_url("127.0.0.1:80/index.php", PHP_URL_HOST));
var_dump(parse_url("[::1]:80/index.php", PHP_URL_HOST));
var_dump(parse_url("//127.0.0.1:80/index.php", PHP_URL_HOST));
var_dump(parse_url("//[::1]:80/index.php", PHP_URL_HOST));
/* Outputs:
NULL
NULL
string(9) "127.0.0.1"
string(5) "[::1]"
*/

Proposal

The proposal of this RFC is twofold. One, replace the current parser used for parse_url() with one generated by re2c. Two, ensure parse_url() follows the RFC more closely. The function signature will not change; however, the return value will be more consistent.

The function can return:

An array consisting of each component of the URI found.

A string|int holding the component requested by the second argument.

NULL when the URI cannot be parsed, or when the requested component contains no value.
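As a rough illustration of how calling code might cope with these return values, the sketch below wraps parse_url() in a hypothetical helper, url_component(), which is not part of this RFC. It assumes the caller wants a single "no value" signal, so it folds the false that the current implementation may return for seriously malformed input into NULL, matching the proposed behavior.

```php
<?php
/**
 * Hypothetical wrapper: return the requested component, or null when it is
 * absent or the URI could not be parsed.
 *
 * @return string|int|null
 */
function url_component(string $uri, int $component)
{
    $value = parse_url($uri, $component);

    // false (current failure signal) and NULL (proposed) are treated alike.
    return ($value === false || $value === null) ? null : $value;
}

$host = url_component('//127.0.0.1:80/index.php', PHP_URL_HOST); // a string
$port = url_component('//127.0.0.1:80/index.php', PHP_URL_PORT); // an int
$none = url_component('/index.php', PHP_URL_HOST);               // no host component

var_dump($host, $port, $none);
```

Code written against such a wrapper should not need to change if the false return value goes away.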

Backward Incompatible Changes

Many of the tests that were developed for the current implementation of parse_url() have been changed to reflect more standards-compliant behavior. This change will break anyone who is using the function with a URI format that is not standards compliant. This is the most problematic part in terms of a BC break. By this point, many people who use parse_url() might expect it to work in a, let's say, forgiving manner. The example provided in the bug report is a perfect example of what I feel is a common use case of this function, and it will no longer work as those users expect once parsing is standards compliant.

This function will no longer return false.
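One possible migration path for callers that pass bare "host:port" strings is to add the double slash themselves before parsing. The helper below, parse_host_port(), is a hypothetical sketch and not part of this RFC; it assumes that any input without "//" is meant as a network location and not as a path.

```php
<?php
/**
 * Hypothetical migration helper: give a bare "host:port" string an
 * authority marker, then let parse_url() split it.
 */
function parse_host_port(string $input): ?array
{
    // Assumption: input lacking "//" is a network location, not a path.
    $uri = strpos($input, '//') === false ? '//' . $input : $input;

    $parts = parse_url($uri);

    // Covers a false return (current) as well as NULL (proposed).
    if (!is_array($parts) || !isset($parts['host'])) {
        return null;
    }

    return ['host' => $parts['host'], 'port' => $parts['port'] ?? null];
}

var_dump(parse_host_port('127.0.0.1:80'));
```

Because the string handed to parse_url() always carries an authority, the result should not depend on how forgiving the parser is about inputs without one.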

RFC Impact To Existing Extensions

standard

Open Issues

Deprecate parse_url() and create a new function with new parsing

Allow for certain breaks from the RFC to provide more lenient parsing? (e.g. allow 'example.com:80' to parse as a host and port, not a path)

Proposed Voting Choices

Vote to replace parse_url() with an re2c parser and require standards-compliant URI formats. Requires a 2/3 majority.

Implementation

After the project is implemented, this section should contain

the version(s) it was merged to

a link to the git commit(s)

a link to the PHP manual entry for the feature

References

PR with working Implementation: https://github.com/php/php-src/pull/2079
