
Testing out snapshots in Apple’s next-generation APFS file system

Category: Systems (Linux) | Posted by: 店小二05 | Date: 2017 | Author: 红领巾


We’re not saying that APFS snapshots will be used in a future revision of Time Machine, but if you’re a betting person, now might be a good time to place your bets.

Aurich / Thinkstock

Back in June, Apple announced its new upcoming file system: APFS, or Apple File System. There was no mention of it in the WWDC keynote, but devotees needed no encouragement. They picked over every scintilla of data from the documentation on Apple’s developer site, extrapolating, interpolating, eager for whatever was about to come. In the WWDC session hall, the crowd buzzed with a nervous energy, eager for the grand unveiling of APFS. I myself badge-swapped my way into the conference just to get that first glimpse of Apple’s first original filesystem in the 30+ years since HFS.

Apple’s presentation didn’t disappoint the hungry crowd. We hoped for a modern filesystem, optimized for next generation hardware, rich with features that have become the norm for data centers and professionals. With APFS, Apple showed a path to meeting those expectations. Dominic Giampaolo and Eric Tamura, leaders of the APFS team, shared performance optimizations, data integrity design, volume management, efficient storage of copied data, and snapshots, arguably the feature of APFS most directly in the user’s control.

Far from vaporware, Apple made APFS available to registered developers that day. The company included it in macOS Sierra as a technology preview. You can play with APFS today and a lot of the features are there. You can use space sharing to carve up a single disk into multiple volumes. You can see the speed of its directory size calculation (nearly instantaneous) compared with the slow process on HFS+. You can use clones to make constant-time copies of files or directories. At WWDC, Apple demonstrated the feature folks were the most eager to play with: snapshots. Tamura used snapshotUtil to create, list, and mount snapshots. But early adopters quickly discovered that snapshotUtil wasn’t part of the APFS technology preview.

Apple promised delivery in 2017. We all double-checked our HFS backups and waited.

A brand new day

It’s 2017, and Apple already appears to be making good on its promise with the revelation that the forthcoming iOS 10.3 will use APFS. The number of APFS tinkerers using it for their personal data has instantly gone from a few hundred to a few million. Beta users of iOS 10.3 have already made the switch, apparently without incident. They have even ascribed unscientifically-significant performance improvements to APFS.

With APFS taking the next step, I decided to check back in on snapshots. There had been no news from Apple and nothing obviously new in macOS updates, but back in June I wrote about a clue Apple had left in macOS Sierra:

I used DTrace (technology I'm increasingly amazed that Apple ported from OpenSolaris) to find a tantalizingly named new system call fs_snapshot; I'll leave it to others to reverse engineer its proper use.

With its proper use still, apparently, a mystery, and APFS freshly of interest, I dove back in.

The game is afoot

First a little background. An operating system roughly divides the world into the kernel and user processes. The kernel can, for the most part, do anything. It can talk to hardware devices; it can access all memory; it can execute privileged instructions. In short, it has unfettered access.

The kernel provides abstractions and imposes security for regular user processes. Have you ever seen 'kernel_task' in Activity Monitor? That's the kernel using CPU, memory, or other resources. User programs are everything else: applications you run, the Finder, the windowing system, even the Dock or other pieces that modern parlance includes as part of the "operating system."

A system call is simply a way for a user process to communicate with the kernel. If a program wants to write data to disk or get a larger memory allocation, it needs the kernel to verify permissions and execute those tasks; the system call is the mechanism that the user process uses. Note that the root user (or "sudo") still relates to user processes, just ones that the kernel imbues with greater privileges.

I used DTrace to find the system call. DTrace is the dynamic tracing facility I co-authored at Sun with Bryan Cantrill and Mike Shapiro. It provides visibility into the whole system, from the kernel and device I/O to Java or Swift function calls. Naturally, DTrace includes visibility into system calls. Apple ported DTrace from Solaris in 2006. A typical Mac has hundreds of thousands of probes, which are discrete points of instrumentation; we can list them with dtrace -l:

$ sudo dtrace -l | wc -l
415636

(Note that System Integrity Protection, or SIP, restricts some parts of DTrace; you'll need to disable those protections before you can use them!)

I found the system call of interest by looking through DTrace system-call probes:

$ sudo dtrace -l -n syscall:::entry | grep snapshot
1129 syscall stack_snapshot_with_config entry
1183 syscall fs_snapshot entry

DTrace is an incredibly powerful tool for understanding how a system is behaving. Here, however, we're just taking advantage of how DTrace can show us a definitive list of system calls. We can also see the fs_snapshot system call in the file /usr/include/sys/syscall.h (you'll need the Xcode developer tools installed to do this):

$ grep fs_snapshot /usr/include/sys/syscall.h
#define SYS_fs_snapshot 518

It's a little more straightforward, but less definitive since there's no guarantee that code in a header file matches the running kernel.

A simple Google search for fs_snapshot immediately pointed me in the right direction, turning up a file in XNU on Apple's open source website. XNU is the macOS kernel that came over from NeXT. Run uname -v and you'll see the specific XNU version that your computer is running. For well over a decade, Apple has made XNU available as open source (and has done the same for many other macOS components). For a company known for its secrecy, it's commendable that Apple has built such a tradition of transparency with at least some subset of their software. Commendable and quite the boon for anyone trying to enable an unpublished feature!

The first snapshot

Learning from XNU and making some educated guesses, I wrote my first C program to create an APFS snapshot. This section has a bit of code, which you can find in this GitHub repo:

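#include <fcntl.h>
#include <stdio.h>
#include <stdlib.h>
#include <unistd.h>
#include <sys/syscall.h>

int
main(int argc, char **argv)
{
    int ret;
    int dirfd;

    /* Expect a mounted APFS volume and a snapshot name. */
    if (argc != 3) {
        fprintf(stderr, "usage: %s <volume> <snapshot>\n", argv[0]);
        exit(1);
    }

    dirfd = open(argv[1], O_RDONLY, 0);
    if (dirfd < 0) {
        perror("open");
        exit(1);
    }

    /* 0x01 is the operation code used here to create a snapshot. */
    ret = syscall(SYS_fs_snapshot, 0x01, dirfd, argv[2], NULL, NULL, 0);
    if (ret != 0) {
        perror("fs_snapshot");
        return (1);
    }

    return (0);
}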

Now to test it.

First, I created an APFS volume and mounted it:

$ hdiutil create -size 1g -fs APFS -volname "APFS" apfs.dmg
WARNING: You are using a pre-release version of the Apple File
System called APFS which is meant for evaluation and development
purposes only. Files stored on this volume may not be accessible
in future releases of OS X.
You should back up all of your data before using APFS and regularly
back up data while using APFS, including before upgrading to future
releases of OS X.
Continue? [y/N] y
...................................................................
created: /Users/ahl/src/apfs_snap/apfs.dmg
$ hdiutil mount apfs.dmg
/dev/disk2 GUID_partition_scheme
/dev/disk2s1 Apple_APFS
/dev/disk2s1s1 41504653-0000-11AA-AA11-0030654 /Volumes/APFS
$ mount | grep /Volumes/APFS
/dev/disk2s1s1 on /Volumes/APFS (apfs, local, nodev, nosuid, journaled,
noowners, mounted by ahl)

Then, I tried to take the first APFS snapshot outside of Apple (that we know of, at least):

$ ./firstSnap /Volumes/APFS first_snap
fs_snapshot: Operation not permitted

Anticlimactic. The "Operation not permitted" error message corresponds to the error code EPERM, whose value is 1. We need to find out where that error is coming from. Fortunately, DTrace can help us figure out what's going on.

DTrace uses its own language to describe probes and actions; here's a simple script with comments about what each clause does:

#!/usr/sbin/dtrace -s

#pragma D option flowindent

/*
 * When a thread calls the fs_snapshot system call set a
 * thread-local variable called 'follow' to 1.
 */
syscall::fs_snapshot:entry
{
    self->follow = 1;
}

/*
 * For every function entry and return in the kernel (of
 * which there are many!) if the thread has its 'follow'
 * value set, print out the first two arguments (or
 * the offset and return value for a return probe).
 */
fbt:::
/self->follow/
{
    printf("%x %x", arg0, arg1);
}

/*
 * When the thread returns from the fs_snapshot system
 * call, set follow to 0 and exit this DTrace invocation
 * (thus removing all instrumentation).
 */
syscall::fs_snapshot:return
/self->follow/
{
    self->follow = 0;
    exit(0);
}

Running this DTrace script in one terminal while running the snapshot program in another shows the code flow through the kernel as the program executes:

$ sudo ./fs_snapshot.d
dtrace: script './fs_snapshot.d' matched 137082 probes
CPU FUNCTION
6 -> fs_snapshot ffffff8034f8b5a0 ffffff805c685330
6 -> vfs_context_current ffffff8034f8b5a0 ffffff805c685330
6 priv_check_cred ffffff80575447f0 36b2
6 -> mac_priv_check ffffff80575447f0 36b2
6 -> mac_label_get ffffff805559ac40 2
6 lck_rw_unlock_shared ffffff7f8acdf750 2
6 -> lck_rw_done_gen ffffff7f8acdf750 21000001
6 <- lck_rw_done_gen c0 1
6 mac_policy_list_conditional_busy 0 13
6 <- mac_policy_list_conditional_busy 50 0
6 mac_priv_grant ffffff80575447f0 36b2
6 -> mac_label_get ffffff805559ac40 2
6 lck_rw_unlock_shared ffffff7f8acdf750 2
6 -> lck_rw_done_gen ffffff7f8acdf750 21000001
6 <- lck_rw_done_gen c0 1
6 mac_policy_list_conditional_busy 0 13
6 <- mac_policy_list_conditional_busy 50 0
6 <- mac_priv_grant cd 1
6 <- priv_check_cred 56 1
6 <- fs_snapshot def 1
6 <= fs_snapshot

Note first that DTrace enabled 137,082 discrete points of instrumentation for this experiment and then restored the system to its optimal state afterward. In the code flow, the priv_check_cred() function jumps out as a good place to continue because of its name, the fact that fs_snapshot calls it directly, and the fact that it returns 1, which corresponds with EPERM, the error we were getting.

Looking again at the XNU source code, we find this delightful comment:

/*
* Check a credential for privilege. Lots of good reasons to deny privilege;
* only a few to grant it.
*/
int
priv_check_cred(kauth_cred_t cred, int priv, __unused int flags)
{

Apple engineers aren't without their own particular brand of humor. Walking through the function it becomes clear that fs_snapshot expects to be run with sudo.

World's first non-Apple snapshot, take two!

$ sudo ./firstSnap /Volumes/APFS first_snap

No output. Did it work? Let’s try again:

$ sudo ./firstSnap /Volumes/APFS first_snap
fs_snapshot: File exists

By "File exists" let's assume that it means that the snapshot named first_snap already exists. Success!
