Message ID | 1415060918-19954-3-git-send-email-pawel.moll@arm.com |
---|---|
State | New |
Headers | show |
Hi Pawel, On Tue, 4 Nov 2014 00:28:37 +0000, Pawel Moll wrote: > + /* > + * Data in userspace event record is transparent for the kernel > + * > + * Userspace perf tool code maintains a list of known types with > + * reference implementations of parsers for the data field. > + * > + * Overall size of the record (including type and size fields) > + * is always aligned to 8 bytes by adding padding after the data. > + * > + * struct { > + * struct perf_event_header header; > + * u32 type; > + * u32 size; The struct perf_event_header also has 'size' field and it has the entire length of the record so it's redundant. Also there's 'misc' field in the perf_event_header and I guess it can be used as 'type' info as it's mostly for cpumode and we are in user mode by definition. Thanks, Namhyung > + * char data[size]; > + * char __padding[-size & 7]; > + * struct sample_id sample_id; > + * }; > + */ > + PERF_RECORD_UEVENT = 11, > + > PERF_RECORD_MAX, /* non-ABI */ > }; -- To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to majordomo@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
On Tue, 2014-11-04 at 06:33 +0000, Namhyung Kim wrote: > Hi Pawel, > > On Tue, 4 Nov 2014 00:28:37 +0000, Pawel Moll wrote: > > + /* > > + * Data in userspace event record is transparent for the kernel > > + * > > + * Userspace perf tool code maintains a list of known types with > > + * reference implementations of parsers for the data field. > > + * > > + * Overall size of the record (including type and size fields) > > + * is always aligned to 8 bytes by adding padding after the data. > > + * > > + * struct { > > + * struct perf_event_header header; > > + * u32 type; > > + * u32 size; > > The struct perf_event_header also has 'size' field and it has the entire > length of the record so it's redundant. Well, is it? Correct me if I'm wrong, but as far as I remember the record size must be always aligned to 8 bytes. Thus you can't reliably derive the data size from it - if I my data is 3 bytes long, I have to add 5 bytes of padding thus making the header.size = 24 (I'm ignoring sample_id here), right? So now, decoding the record, all I can do is: header.size - sizeof(header) - sizeof(type) - sizeof(size) = 24 - 8 - 8 = 8. So, basing on the header.size the data is 8 bytes long. But only 3 first bytes are valid... To summarize, there are three options: 1. I'm wrong and the record doesn't have to be padded to make it 8 bytes aligned. Then I can drop the additional size field. 2. I could impose a limitation on the prctl API that the data size must be 8 bytes aligned. Bad idea in my opinion, I'd rather not. 3. The additional size (for the data part) field stays. Notice that PERF_SAMPLE_RAW has it as well :-) > Also there's 'misc' field in the > perf_event_header and I guess it can be used as 'type' info as it's > mostly for cpumode and we are in user mode by definition. Hm. First of all, I don't really like the idea of "overloading" the misc meaning. It's a set of flags and I'd rather see it staying like this. Secondly, I'm not sure that it can be reused - we are in user mode, true, but it can be either PERF_RECORD_MISC_USER or PERF_RECORD_MISC_GUEST_USER. Thirdly, misc is "only" 16 bits wide, and someone even asked for the type to be 64 bit long! (I suspect he wanted to use it in some special, hacky way though :-) 32 bit length seems like a reasonable choice, though. Do you feel that the "unnecessary" type field is a big problem? Thanks for your time! Pawel -- To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to majordomo@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
On Tue, Nov 04, 2014 at 04:42:11PM +0000, Pawel Moll wrote: > > 1. I'm wrong and the record doesn't have to be padded to make it 8 bytes > aligned. Then I can drop the additional size field. No, you're right, we're supposed to stay 8 byte aligned. > 2. I could impose a limitation on the prctl API that the data size must > be 8 bytes aligned. Bad idea in my opinion, I'd rather not. Agreed. > 3. The additional size (for the data part) field stays. Notice that > PERF_SAMPLE_RAW has it as well :-) Right, with binary data there is no other day. With \0 terminated strings there won't be a problem, but I think we decided we wanted to allow any binary blow. -- To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to majordomo@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
diff --git a/include/linux/perf_event.h b/include/linux/perf_event.h index ba490d5..867415d 100644 --- a/include/linux/perf_event.h +++ b/include/linux/perf_event.h @@ -721,6 +721,8 @@ extern int perf_unregister_guest_info_callbacks(struct perf_guest_info_callbacks extern void perf_event_exec(void); extern void perf_event_comm(struct task_struct *tsk, bool exec); extern void perf_event_fork(struct task_struct *tsk); +extern int perf_uevent(struct task_struct *tsk, u32 type, u32 size, + const char __user *data); /* Callchains */ DECLARE_PER_CPU(struct perf_callchain_entry, perf_callchain_entry); @@ -830,6 +832,8 @@ static inline void perf_event_mmap(struct vm_area_struct *vma) { } static inline void perf_event_exec(void) { } static inline void perf_event_comm(struct task_struct *tsk, bool exec) { } static inline void perf_event_fork(struct task_struct *tsk) { } +static inline int perf_uevent(struct task_struct *tsk, u32 type, u32 size, + const char __user *data) { return -1; }; static inline void perf_event_init(void) { } static inline int perf_swevent_get_recursion_context(void) { return -1; } static inline void perf_swevent_put_recursion_context(int rctx) { } diff --git a/include/uapi/linux/perf_event.h b/include/uapi/linux/perf_event.h index 9d84540..9a64eb1 100644 --- a/include/uapi/linux/perf_event.h +++ b/include/uapi/linux/perf_event.h @@ -303,7 +303,8 @@ struct perf_event_attr { exclude_callchain_user : 1, /* exclude user callchains */ mmap2 : 1, /* include mmap with inode data */ comm_exec : 1, /* flag comm events that are due to an exec */ - __reserved_1 : 39; + uevents : 1, /* allow uevents into the buffer */ + __reserved_1 : 38; union { __u32 wakeup_events; /* wakeup every n events */ @@ -712,6 +713,26 @@ enum perf_event_type { */ PERF_RECORD_MMAP2 = 10, + /* + * Data in userspace event record is transparent for the kernel + * + * Userspace perf tool code maintains a list of known types with + * reference implementations of parsers for the data field. + * + * Overall size of the record (including type and size fields) + * is always aligned to 8 bytes by adding padding after the data. + * + * struct { + * struct perf_event_header header; + * u32 type; + * u32 size; + * char data[size]; + * char __padding[-size & 7]; + * struct sample_id sample_id; + * }; + */ + PERF_RECORD_UEVENT = 11, + PERF_RECORD_MAX, /* non-ABI */ }; diff --git a/include/uapi/linux/prctl.h b/include/uapi/linux/prctl.h index 513df75..2a6852f 100644 --- a/include/uapi/linux/prctl.h +++ b/include/uapi/linux/prctl.h @@ -179,4 +179,14 @@ struct prctl_mm_map { #define PR_SET_THP_DISABLE 41 #define PR_GET_THP_DISABLE 42 +/* + * Perf userspace event generation + * + * second argument: event type + * third argument: data size + * fourth argument: pointer to data + * fifth argument: flags (currently unused, pass 0) + */ +#define PR_TASK_PERF_UEVENT 43 + #endif /* _LINUX_PRCTL_H */ diff --git a/kernel/events/core.c b/kernel/events/core.c index ea3d6d3..3738e9c 100644 --- a/kernel/events/core.c +++ b/kernel/events/core.c @@ -5565,6 +5565,77 @@ static void perf_log_throttle(struct perf_event *event, int enable) } /* + * Userspace-generated event + */ + +struct perf_uevent { + struct perf_event_header header; + u32 type; + u32 size; + u8 data[0]; +}; + +static void perf_uevent_output(struct perf_event *event, void *data) +{ + struct perf_uevent *uevent = data; + struct perf_output_handle handle; + struct perf_sample_data sample; + int size = uevent->header.size; + + if (!event->attr.uevents) + return; + + perf_event_header__init_id(&uevent->header, &sample, event); + + if (perf_output_begin(&handle, event, uevent->header.size) != 0) + goto out; + perf_output_put(&handle, uevent->header); + perf_output_put(&handle, uevent->type); + perf_output_put(&handle, uevent->size); + __output_copy(&handle, uevent->data, uevent->size); + + /* Padding to align overall data size to 8 bytes */ + perf_output_skip(&handle, -uevent->size & (sizeof(u64) - 1)); + + perf_event__output_id_sample(event, &handle, &sample); + + perf_output_end(&handle); +out: + uevent->header.size = size; +} + +int perf_uevent(struct task_struct *tsk, u32 type, u32 size, + const char __user *data) +{ + struct perf_uevent *uevent; + + /* Need some reasonable limit */ + if (size > PAGE_SIZE) + return -E2BIG; + + uevent = kmalloc(sizeof(*uevent) + size, GFP_KERNEL); + if (!uevent) + return -ENOMEM; + + uevent->header.type = PERF_RECORD_UEVENT; + uevent->header.size = sizeof(*uevent) + ALIGN(size, sizeof(u64)); + + uevent->type = type; + uevent->size = size; + if (copy_from_user(uevent->data, data, size)) { + kfree(uevent); + return -EFAULT; + } + + perf_event_aux(perf_uevent_output, uevent, NULL); + + kfree(uevent); + + return 0; +} + + +/* * Generic event overflow handling, sampling. */ diff --git a/kernel/sys.c b/kernel/sys.c index 1eaa2f0..1c83677 100644 --- a/kernel/sys.c +++ b/kernel/sys.c @@ -2121,6 +2121,11 @@ SYSCALL_DEFINE5(prctl, int, option, unsigned long, arg2, unsigned long, arg3, case PR_TASK_PERF_EVENTS_ENABLE: error = perf_event_task_enable(); break; + case PR_TASK_PERF_UEVENT: + if (arg5 != 0) + return -EINVAL; + error = perf_uevent(me, arg2, arg3, (char __user *)arg4); + break; case PR_GET_TIMERSLACK: error = current->timer_slack_ns; break;