Message ID | 20200914183615.2038347-1-sdf@google.com |
---|---|
Series | Allow storage of flexible metadata information for eBPF programs |
On Mon, Sep 14, 2020 at 11:37 AM Stanislav Fomichev <sdf@google.com> wrote:
>
> From: YiFei Zhu <zhuyifei@google.com>
>
> The patch adds a simple wrapper bpf_prog_bind_map around the syscall.
> When the libbpf tries to load a program, it will probe the kernel for
> the support of this syscall and unconditionally bind .rodata section
> to the program.
>
> Cc: YiFei Zhu <zhuyifei1999@gmail.com>
> Signed-off-by: YiFei Zhu <zhuyifei@google.com>
> Signed-off-by: Stanislav Fomichev <sdf@google.com>
> ---

Acked-by: Andrii Nakryiko <andriin@fb.com>

>  tools/lib/bpf/bpf.c      | 16 +++++++++
>  tools/lib/bpf/bpf.h      |  8 +++++
>  tools/lib/bpf/libbpf.c   | 72 ++++++++++++++++++++++++++++++++++++++++
>  tools/lib/bpf/libbpf.map |  1 +
>  4 files changed, 97 insertions(+)
>

[...]
On Mon, Sep 14, 2020 at 11:37 AM Stanislav Fomichev <sdf@google.com> wrote:
>
> From: YiFei Zhu <zhuyifei@google.com>
>
> The patch adds a simple wrapper bpf_prog_bind_map around the syscall.
> When the libbpf tries to load a program, it will probe the kernel for
> the support of this syscall and unconditionally bind .rodata section
> to the program.
>
> Cc: YiFei Zhu <zhuyifei1999@gmail.com>
> Signed-off-by: YiFei Zhu <zhuyifei@google.com>
> Signed-off-by: Stanislav Fomichev <sdf@google.com>
> ---
>  tools/lib/bpf/bpf.c      | 16 +++++++++
>  tools/lib/bpf/bpf.h      |  8 +++++
>  tools/lib/bpf/libbpf.c   | 72 ++++++++++++++++++++++++++++++++++++++++
>  tools/lib/bpf/libbpf.map |  1 +
>  4 files changed, 97 insertions(+)
>
> diff --git a/tools/lib/bpf/bpf.c b/tools/lib/bpf/bpf.c
> index 82b983ff6569..2baa1308737c 100644
> --- a/tools/lib/bpf/bpf.c
> +++ b/tools/lib/bpf/bpf.c
> @@ -872,3 +872,19 @@ int bpf_enable_stats(enum bpf_stats_type type)
>
>  	return sys_bpf(BPF_ENABLE_STATS, &attr, sizeof(attr));
>  }
> +
> +int bpf_prog_bind_map(int prog_fd, int map_fd,
> +		      const struct bpf_prog_bind_opts *opts)
> +{
> +	union bpf_attr attr;
> +
> +	if (!OPTS_VALID(opts, bpf_prog_bind_opts))
> +		return -EINVAL;
> +
> +	memset(&attr, 0, sizeof(attr));
> +	attr.prog_bind_map.prog_fd = prog_fd;
> +	attr.prog_bind_map.map_fd = map_fd;
> +	attr.prog_bind_map.flags = OPTS_GET(opts, flags, 0);
> +
> +	return sys_bpf(BPF_PROG_BIND_MAP, &attr, sizeof(attr));
> +}
> diff --git a/tools/lib/bpf/bpf.h b/tools/lib/bpf/bpf.h
> index 015d13f25fcc..8c1ac4b42f90 100644
> --- a/tools/lib/bpf/bpf.h
> +++ b/tools/lib/bpf/bpf.h
> @@ -243,6 +243,14 @@ LIBBPF_API int bpf_task_fd_query(int pid, int fd, __u32 flags, char *buf,
>  enum bpf_stats_type; /* defined in up-to-date linux/bpf.h */
>  LIBBPF_API int bpf_enable_stats(enum bpf_stats_type type);
>
> +struct bpf_prog_bind_opts {
> +	size_t sz; /* size of this struct for forward/backward compatibility */
> +	__u32 flags;
> +};
> +#define bpf_prog_bind_opts__last_field flags
> +
> +LIBBPF_API int bpf_prog_bind_map(int prog_fd, int map_fd,
> +				 const struct bpf_prog_bind_opts *opts);
>  #ifdef __cplusplus
>  } /* extern "C" */
>  #endif
> diff --git a/tools/lib/bpf/libbpf.c b/tools/lib/bpf/libbpf.c
> index 550950eb1860..b68fa08e2fa9 100644
> --- a/tools/lib/bpf/libbpf.c
> +++ b/tools/lib/bpf/libbpf.c
> @@ -174,6 +174,8 @@ enum kern_feature_id {
>  	FEAT_EXP_ATTACH_TYPE,
>  	/* bpf_probe_read_{kernel,user}[_str] helpers */
>  	FEAT_PROBE_READ_KERN,
> +	/* BPF_PROG_BIND_MAP is supported */
> +	FEAT_PROG_BIND_MAP,
>  	__FEAT_CNT,
>  };
>
> @@ -409,6 +411,7 @@ struct bpf_object {
>  	struct extern_desc *externs;
>  	int nr_extern;
>  	int kconfig_map_idx;
> +	int rodata_map_idx;
>
>  	bool loaded;
>  	bool has_subcalls;
> @@ -1070,6 +1073,7 @@ static struct bpf_object *bpf_object__new(const char *path,
>  	obj->efile.bss_shndx = -1;
>  	obj->efile.st_ops_shndx = -1;
>  	obj->kconfig_map_idx = -1;
> +	obj->rodata_map_idx = -1;
>
>  	obj->kern_version = get_kernel_version();
>  	obj->loaded = false;
> @@ -1428,6 +1432,8 @@ static int bpf_object__init_global_data_maps(struct bpf_object *obj)
>  						    obj->efile.rodata->d_size);
>  		if (err)
>  			return err;
> +
> +		obj->rodata_map_idx = obj->nr_maps - 1;
>  	}
>  	if (obj->efile.bss_shndx >= 0) {
>  		err = bpf_object__init_internal_map(obj, LIBBPF_MAP_BSS,
> @@ -3894,6 +3900,55 @@ static int probe_kern_probe_read_kernel(void)
>  	return probe_fd(bpf_load_program_xattr(&attr, NULL, 0));
>  }
>
> +static int probe_prog_bind_map(void)
> +{
> +	struct bpf_load_program_attr prg_attr;
> +	struct bpf_create_map_attr map_attr;
> +	char *cp, errmsg[STRERR_BUFSIZE];
> +	struct bpf_insn insns[] = {
> +		BPF_MOV64_IMM(BPF_REG_0, 0),
> +		BPF_EXIT_INSN(),
> +	};
> +	int ret, map, prog;
> +
> +	if (!kernel_supports(FEAT_GLOBAL_DATA))
> +		return 0;

TBH, I don't think this check is needed, and it's actually coupling
two independent features together. probe_prog_bind_map() probes
PROG_BIND_MAP, it has nothing to do with global data itself. It's all
cached now, so there is no problem with that, it just feels unclean.
If someone is using .rodata and the kernel doesn't support global
data, we'll fail way sooner. On the other hand, if there will be
another use case where PROG_BIND_MAP is needed for something else, why
would we care about global data support? I know that in the real world
it will be hard to find a kernel with PROG_BIND_MAP and no global data
support, due to the latter being so much older, but still, unnecessary
coupling.

Would be nice to follow up and remove this, thanks.

> +
> +	memset(&map_attr, 0, sizeof(map_attr));
> +	map_attr.map_type = BPF_MAP_TYPE_ARRAY;
> +	map_attr.key_size = sizeof(int);
> +	map_attr.value_size = 32;
> +	map_attr.max_entries = 1;
> +

[...]
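The `sz` field of `bpf_prog_bind_opts` in the patch above is what lets callers and library builds of different ages interoperate. A minimal sketch of that idea follows, assuming a simplified validity rule (NULL means defaults, and any bytes past the fields the library knows must be zero). `demo_opts`, `DEMO_OPTS_KNOWN_SZ` and `demo_opts_valid` are hypothetical names for illustration; the real `OPTS_VALID` macro is internal to libbpf and may differ in detail.

```c
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical options struct, modelled on bpf_prog_bind_opts. */
struct demo_opts {
	size_t sz;		/* size of the struct as the caller compiled it */
	unsigned int flags;
};

/* Offset just past the last field this "library build" understands. */
#define DEMO_OPTS_KNOWN_SZ \
	(offsetof(struct demo_opts, flags) + sizeof(((struct demo_opts *)0)->flags))

/*
 * Accept NULL (use defaults), reject structs too small to hold sz itself,
 * and reject larger structs whose unknown tail bytes are not all zero.
 */
static bool demo_opts_valid(const void *opts, size_t known_sz)
{
	const unsigned char *p = opts;
	size_t user_sz, i;

	if (!opts)
		return true;

	memcpy(&user_sz, opts, sizeof(user_sz));
	if (user_sz < sizeof(size_t))
		return false;

	for (i = known_sz; i < user_sz; i++)
		if (p[i])
			return false;

	return true;
}
```

Because the tail check covers padding as well, a caller in this sketch would zero the whole struct with memset() before setting `sz` and `flags`.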
On Mon, Sep 14, 2020 at 11:37 AM Stanislav Fomichev <sdf@google.com> wrote:
>
> From: YiFei Zhu <zhuyifei@google.com>
>
> Dump metadata in the 'bpftool prog' list if it's present.
> For some formatting some BTF code is put directly in the
> metadata dumping. Sanity checks on the map and the kind of the btf_type
> to make sure we are actually dumping what we are expecting.
>
> A helper jsonw_reset is added to json writer so we can reuse the same
> json writer without having extraneous commas.
>
> Sample output:
>
>   $ bpftool prog
>   6: cgroup_skb  name prog  tag bcf7977d3b93787c  gpl
>   [...]
>           btf_id 4
>           metadata:
>                   a = "foo"
>                   b = 1
>
>   $ bpftool prog --json --pretty
>   [{
>           "id": 6,
>   [...]
>           "btf_id": 4,
>           "metadata": {
>               "a": "foo",
>               "b": 1
>           }
>       }
>   ]
>
> Cc: YiFei Zhu <zhuyifei1999@gmail.com>
> Signed-off-by: YiFei Zhu <zhuyifei@google.com>
> Signed-off-by: Stanislav Fomichev <sdf@google.com>
> ---
>  tools/bpf/bpftool/json_writer.c |   6 +
>  tools/bpf/bpftool/json_writer.h |   3 +
>  tools/bpf/bpftool/prog.c        | 232 ++++++++++++++++++++++++++++++++
>  3 files changed, 241 insertions(+)
>
> diff --git a/tools/bpf/bpftool/json_writer.c b/tools/bpf/bpftool/json_writer.c
> index 86501cd3c763..7fea83bedf48 100644
> --- a/tools/bpf/bpftool/json_writer.c
> +++ b/tools/bpf/bpftool/json_writer.c
> @@ -119,6 +119,12 @@ void jsonw_pretty(json_writer_t *self, bool on)
>  	self->pretty = on;
>  }
>
> +void jsonw_reset(json_writer_t *self)
> +{
> +	assert(self->depth == 0);
> +	self->sep = '\0';
> +}
> +
>  /* Basic blocks */
>  static void jsonw_begin(json_writer_t *self, int c)
>  {
> diff --git a/tools/bpf/bpftool/json_writer.h b/tools/bpf/bpftool/json_writer.h
> index 35cf1f00f96c..8ace65cdb92f 100644
> --- a/tools/bpf/bpftool/json_writer.h
> +++ b/tools/bpf/bpftool/json_writer.h
> @@ -27,6 +27,9 @@ void jsonw_destroy(json_writer_t **self_p);
>  /* Cause output to have pretty whitespace */
>  void jsonw_pretty(json_writer_t *self, bool on);
>
> +/* Reset separator to create new JSON */
> +void jsonw_reset(json_writer_t *self);
> +
>  /* Add property name */
>  void jsonw_name(json_writer_t *self, const char *name);
>
> diff --git a/tools/bpf/bpftool/prog.c b/tools/bpf/bpftool/prog.c
> index f7923414a052..f3eb4f53dd43 100644
> --- a/tools/bpf/bpftool/prog.c
> +++ b/tools/bpf/bpftool/prog.c
> @@ -29,6 +29,9 @@
>  #include "main.h"
>  #include "xlated_dumper.h"
>
> +#define BPF_METADATA_PREFIX "bpf_metadata_"
> +#define BPF_METADATA_PREFIX_LEN (sizeof(BPF_METADATA_PREFIX) - 1)
> +
>  const char * const prog_type_name[] = {
>  	[BPF_PROG_TYPE_UNSPEC] = "unspec",
>  	[BPF_PROG_TYPE_SOCKET_FILTER] = "socket_filter",
> @@ -151,6 +154,231 @@ static void show_prog_maps(int fd, __u32 num_maps)
>  	}
>  }
>
> +static int find_metadata_map_id(int prog_fd, int *map_id)
> +{
> +	struct bpf_prog_info prog_info = {};
> +	struct bpf_map_info map_info;
> +	__u32 prog_info_len;
> +	__u32 map_info_len;
> +	__u32 *map_ids;
> +	int nr_maps;
> +	int map_fd;
> +	int ret;
> +	__u32 i;
> +
> +	prog_info_len = sizeof(prog_info);
> +
> +	ret = bpf_obj_get_info_by_fd(prog_fd, &prog_info, &prog_info_len);
> +	if (ret)
> +		return -errno;
> +
> +	if (!prog_info.nr_map_ids)
> +		return -ENOENT;
> +
> +	map_ids = calloc(prog_info.nr_map_ids, sizeof(__u32));
> +	if (!map_ids)
> +		return -ENOMEM;
> +
> +	nr_maps = prog_info.nr_map_ids;
> +	memset(&prog_info, 0, sizeof(prog_info));
> +	prog_info.nr_map_ids = nr_maps;
> +	prog_info.map_ids = ptr_to_u64(map_ids);
> +	prog_info_len = sizeof(prog_info);
> +
> +	ret = bpf_obj_get_info_by_fd(prog_fd, &prog_info, &prog_info_len);
> +	if (ret) {
> +		ret = -errno;
> +		goto free_map_ids;
> +	}
> +
> +	for (i = 0; i < prog_info.nr_map_ids; i++) {
> +		map_fd = bpf_map_get_fd_by_id(map_ids[i]);
> +		if (map_fd < 0) {
> +			ret = -errno;
> +			goto free_map_ids;
> +		}
> +
> +		memset(&map_info, 0, sizeof(map_info));
> +		map_info_len = sizeof(map_info);
> +		ret = bpf_obj_get_info_by_fd(map_fd, &map_info, &map_info_len);
> +		if (ret < 0) {
> +			ret = -errno;
> +			close(map_fd);
> +			goto free_map_ids;
> +		}
> +		close(map_fd);
> +
> +		if (map_info.type != BPF_MAP_TYPE_ARRAY)
> +			continue;
> +		if (map_info.key_size != sizeof(int))
> +			continue;
> +		if (map_info.max_entries != 1)
> +			continue;
> +		if (!map_info.btf_value_type_id)
> +			continue;
> +		if (!strstr(map_info.name, ".rodata"))
> +			continue;
> +
> +		*map_id = map_ids[i];

return value_size here to avoid extra syscall below; or rather just
accept bpf_map_info pointer and read everything into it?

> +		goto free_map_ids;
> +	}
> +
> +	ret = -ENOENT;
> +
> +free_map_ids:
> +	free(map_ids);
> +	return ret;
> +}
> +
> +static void *find_metadata(int prog_fd, struct bpf_map_info *map_info)
> +{
> +	__u32 map_info_len;
> +	void *value = NULL;
> +	int map_id = 0;
> +	int key = 0;
> +	int map_fd;
> +	int err;
> +
> +	err = find_metadata_map_id(prog_fd, &map_id);
> +	if (err < 0)
> +		return NULL;
> +
> +	map_fd = bpf_map_get_fd_by_id(map_id);
> +	if (map_fd < 0)
> +		return NULL;
> +
> +	map_info_len = sizeof(*map_info);
> +	err = bpf_obj_get_info_by_fd(map_fd, map_info, &map_info_len);
> +	if (err)
> +		goto out_close;
> +

see above, you are doing bpf_obj_get_info_by_fd just to get
value_size, which you already know

> +	value = malloc(map_info->value_size);
> +	if (!value)
> +		goto out_close;
> +
> +	if (bpf_map_lookup_elem(map_fd, &key, value))
> +		goto out_free;
> +
> +	close(map_fd);
> +	return value;
> +
> +out_free:
> +	free(value);
> +out_close:
> +	close(map_fd);
> +	return NULL;
> +}
> +
> +static bool has_metadata_prefix(const char *s)
> +{
> +	return strstr(s, BPF_METADATA_PREFIX) == s;

this is a substring check, not a prefix check, use strncmp instead

> +}
> +
> +static void show_prog_metadata(int fd, __u32 num_maps)
> +{
> +	const struct btf_type *t_datasec, *t_var;
> +	struct bpf_map_info map_info = {};

it should be memset

> +	struct btf_var_secinfo *vsi;
> +	bool printed_header = false;
> +	struct btf *btf = NULL;
> +	unsigned int i, vlen;
> +	void *value = NULL;
> +	const char *name;
> +	int err;
> +
> +	if (!num_maps)
> +		return;
> +

[...]

> +	} else {
> +		json_writer_t *btf_wtr = jsonw_new(stdout);
> +		struct btf_dumper d = {
> +			.btf = btf,
> +			.jw = btf_wtr,
> +			.is_plain_text = true,
> +		};

empty line here?

> +		if (!btf_wtr) {
> +			p_err("jsonw alloc failed");
> +			goto out_free;
> +		}
> +

[...]
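The review asks for memset() in place of `= {}` without giving a reason. One plausible reading, and it is only an assumption here, is that brace initialization is not guaranteed to clear padding bytes, whereas memset() clears every byte of the object, which matters when the struct is handed to the kernel as a raw buffer. A small sketch with a hypothetical `demo_info` struct (not `bpf_map_info`) shows the explicit form:

```c
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical info struct; most ABIs leave a padding hole after 'type'. */
struct demo_info {
	char type;
	unsigned long long id;
};

/* Return true if every byte of the object, padding included, is zero. */
static bool all_bytes_zero(const void *obj, size_t len)
{
	const unsigned char *p = obj;
	size_t i;

	for (i = 0; i < len; i++)
		if (p[i])
			return false;
	return true;
}

/* Zero the whole object, members and padding alike. */
static void demo_info_init(struct demo_info *info)
{
	memset(info, 0, sizeof(*info));
}
```

After `demo_info_init()`, `all_bytes_zero()` holds regardless of what the memory contained before, which is the property a byte-wise consumer would rely on.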
On Mon, Sep 14, 2020 at 4:28 PM Andrii Nakryiko <andrii.nakryiko@gmail.com> wrote:
>
> On Mon, Sep 14, 2020 at 11:37 AM Stanislav Fomichev <sdf@google.com> wrote:
> >
> > +	if (!kernel_supports(FEAT_GLOBAL_DATA))
> > +		return 0;
>
> TBH, I don't think this check is needed, and it's actually coupling
> two independent features together. probe_prog_bind_map() probes
> PROG_BIND_MAP, it has nothing to do with global data itself. It's all
> cached now, so there is no problem with that, it just feels unclean.
> If someone is using .rodata and the kernel doesn't support global
> data, we'll fail way sooner. On the other hand, if there will be
> another use case where PROG_BIND_MAP is needed for something else, why
> would we care about global data support? I know that in the real world
> it will be hard to find a kernel with PROG_BIND_MAP and no global data
> support, due to the latter being so much older, but still, unnecessary
> coupling.
>
> Would be nice to follow up and remove this, thanks.

Agreed, will respin, thanks!
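The decoupling requested above can be pictured as a table of independent probes, each cached on first use, so that asking about one feature never runs or consults another. The sketch below is only an illustration under that assumption: every `demo_*` name is hypothetical, and the stand-in probes simply return 1 where real ones would issue bpf() syscalls.

```c
#include <assert.h>
#include <stdbool.h>

enum demo_feat_id {
	DEMO_FEAT_GLOBAL_DATA,
	DEMO_FEAT_PROG_BIND_MAP,
	DEMO_FEAT_CNT,
};

enum demo_feat_res {
	DEMO_FEAT_UNKNOWN = 0,
	DEMO_FEAT_SUPPORTED,
	DEMO_FEAT_MISSING,
};

typedef int (*demo_probe_fn)(void);

/* Counts how often each stand-in probe actually ran. */
static int demo_probe_calls[DEMO_FEAT_CNT];

/* Stand-in probes: real ones would talk to the kernel. */
static int demo_probe_global_data(void)
{
	demo_probe_calls[DEMO_FEAT_GLOBAL_DATA]++;
	return 1;
}

static int demo_probe_prog_bind_map(void)
{
	/* Deliberately does not consult DEMO_FEAT_GLOBAL_DATA. */
	demo_probe_calls[DEMO_FEAT_PROG_BIND_MAP]++;
	return 1;
}

static struct {
	demo_probe_fn probe;
	enum demo_feat_res res;
} demo_feats[DEMO_FEAT_CNT] = {
	[DEMO_FEAT_GLOBAL_DATA] = { demo_probe_global_data, DEMO_FEAT_UNKNOWN },
	[DEMO_FEAT_PROG_BIND_MAP] = { demo_probe_prog_bind_map, DEMO_FEAT_UNKNOWN },
};

/* Run the probe on first use, then answer from the cached result. */
static bool demo_kernel_supports(enum demo_feat_id id)
{
	assert(id < DEMO_FEAT_CNT);

	if (demo_feats[id].res == DEMO_FEAT_UNKNOWN)
		demo_feats[id].res = demo_feats[id].probe() > 0 ?
			DEMO_FEAT_SUPPORTED : DEMO_FEAT_MISSING;

	return demo_feats[id].res == DEMO_FEAT_SUPPORTED;
}
```

With this shape, a future user of the bind-map feature that has nothing to do with global data gets a correct answer without any extra condition to remove.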
On Mon, Sep 14, 2020 at 4:39 PM Andrii Nakryiko <andrii.nakryiko@gmail.com> wrote:
>
> On Mon, Sep 14, 2020 at 11:37 AM Stanislav Fomichev <sdf@google.com> wrote:
> > +		if (map_info.type != BPF_MAP_TYPE_ARRAY)
> > +			continue;
> > +		if (map_info.key_size != sizeof(int))
> > +			continue;
> > +		if (map_info.max_entries != 1)
> > +			continue;
> > +		if (!map_info.btf_value_type_id)
> > +			continue;
> > +		if (!strstr(map_info.name, ".rodata"))
> > +			continue;
> > +
> > +		*map_id = map_ids[i];
>
> return value_size here to avoid extra syscall below; or rather just
> accept bpf_map_info pointer and read everything into it?

Good idea, will just return bpf_map_info.

> > +	value = malloc(map_info->value_size);
> > +	if (!value)
> > +		goto out_close;
> > +
> > +	if (bpf_map_lookup_elem(map_fd, &key, value))
> > +		goto out_free;
> > +
> > +	close(map_fd);
> > +	return value;
> > +
> > +out_free:
> > +	free(value);
> > +out_close:
> > +	close(map_fd);
> > +	return NULL;
> > +}
> > +
> > +static bool has_metadata_prefix(const char *s)
> > +{
> > +	return strstr(s, BPF_METADATA_PREFIX) == s;
>
> this is a substring check, not a prefix check, use strncmp instead

Right, but I then compare the result to the original value (== s).
So if the substring starts with 0th index, we are good.

"strncmp(s, BPF_METADATA_PREFIX, BPF_METADATA_PREFIX_LEN) == 0;" felt
a bit clunky, but I can use it anyway if it helps the readability.

> > +}
> > +
> > +static void show_prog_metadata(int fd, __u32 num_maps)
> > +{
> > +	const struct btf_type *t_datasec, *t_var;
> > +	struct bpf_map_info map_info = {};
>
> it should be memset

Sounds good.

> > +	} else {
> > +		json_writer_t *btf_wtr = jsonw_new(stdout);
> > +		struct btf_dumper d = {
> > +			.btf = btf,
> > +			.jw = btf_wtr,
> > +			.is_plain_text = true,
> > +		};
>
> empty line here?

Sure.
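Both spellings of the prefix test discussed above give the same answer. The practical difference is that strstr() keeps scanning the rest of the string when the prefix is absent at index 0, while strncmp() looks at no more than the prefix length. A minimal side-by-side sketch, reusing the two macros from the patch; the function names `has_prefix_strstr` and `has_prefix_strncmp` are made up for the comparison:

```c
#include <stdbool.h>
#include <string.h>

#define BPF_METADATA_PREFIX "bpf_metadata_"
#define BPF_METADATA_PREFIX_LEN (sizeof(BPF_METADATA_PREFIX) - 1)

/* Form used in the patch: find the first occurrence, require it at s[0]. */
static bool has_prefix_strstr(const char *s)
{
	return strstr(s, BPF_METADATA_PREFIX) == s;
}

/* Form suggested in review: compare only the first PREFIX_LEN bytes. */
static bool has_prefix_strncmp(const char *s)
{
	return strncmp(s, BPF_METADATA_PREFIX, BPF_METADATA_PREFIX_LEN) == 0;
}
```

For a name such as "x_bpf_metadata_a" both return false: strstr() finds the match at offset 2, which is not `s`, and strncmp() fails on the first byte.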