From patchwork Tue Oct 23 07:02:47 2018
X-Patchwork-Submitter: Richard Henderson
X-Patchwork-Id: 149431
From: Richard Henderson
To: qemu-devel@nongnu.org
Date: Tue, 23 Oct 2018 08:02:47 +0100
Message-Id: <20181023070253.6407-6-richard.henderson@linaro.org>
X-Mailer: git-send-email 2.17.2
In-Reply-To: <20181023070253.6407-1-richard.henderson@linaro.org>
References: <20181023070253.6407-1-richard.henderson@linaro.org>
Subject: [Qemu-devel] [PATCH 04/10] cputlb: Split large page tracking per mmu_idx
Cc: cota@braap.org

The set of large pages in the kernel is probably not the same as the
set of large pages in the application.  Forcing one range to cover
both will flush more often than necessary.

This allows tlb_flush_page_async_work to flush just the one mmu_idx
implicated, which in turn allows us to remove
tlb_check_page_and_flush_by_mmuidx_async_work.

Signed-off-by: Richard Henderson
---
 include/exec/cpu-defs.h |  14 +++-
 accel/tcg/cputlb.c      | 139 ++++++++++++++++++----------------------
 2 files changed, 74 insertions(+), 79 deletions(-)

--
2.17.2

diff --git a/include/exec/cpu-defs.h b/include/exec/cpu-defs.h
index 659c73d2a1..df8ae18d9d 100644
--- a/include/exec/cpu-defs.h
+++ b/include/exec/cpu-defs.h
@@ -141,6 +141,17 @@ typedef struct CPUIOTLBEntry {
     MemTxAttrs attrs;
 } CPUIOTLBEntry;
 
+typedef struct CPUTLBDesc {
+    /*
+     * Describe a region covering all of the large pages allocated
+     * into the tlb.  When any page within this region is flushed,
+     * we must flush the entire tlb.  The region is matched if
+     * (addr & large_page_mask) == large_page_addr.
+     */
+    target_ulong large_page_addr;
+    target_ulong large_page_mask;
+} CPUTLBDesc;
+
 /*
  * Data elements that are shared between all MMU modes.
  */
@@ -162,13 +173,12 @@ typedef struct CPUTLBCommon {
  */
 #define CPU_COMMON_TLB                                                  \
     CPUTLBCommon tlb_c;                                                 \
+    CPUTLBDesc tlb_d[NB_MMU_MODES];                                     \
     CPUTLBEntry tlb_table[NB_MMU_MODES][CPU_TLB_SIZE];                  \
     CPUTLBEntry tlb_v_table[NB_MMU_MODES][CPU_VTLB_SIZE];               \
     CPUIOTLBEntry iotlb[NB_MMU_MODES][CPU_TLB_SIZE];                    \
     CPUIOTLBEntry iotlb_v[NB_MMU_MODES][CPU_VTLB_SIZE];                 \
     size_t tlb_flush_count;                                             \
-    target_ulong tlb_flush_addr;                                        \
-    target_ulong tlb_flush_mask;                                        \
     target_ulong vtlb_index;                                            \
 
 #else
diff --git a/accel/tcg/cputlb.c b/accel/tcg/cputlb.c
index abcd08a8a2..72b0567f70 100644
--- a/accel/tcg/cputlb.c
+++ b/accel/tcg/cputlb.c
@@ -113,6 +113,14 @@ size_t tlb_flush_count(void)
     return count;
 }
 
+static void tlb_flush_one_mmuidx_locked(CPUArchState *env, int mmu_idx)
+{
+    memset(env->tlb_table[mmu_idx], -1, sizeof(env->tlb_table[0]));
+    memset(env->tlb_v_table[mmu_idx], -1, sizeof(env->tlb_v_table[0]));
+    env->tlb_d[mmu_idx].large_page_addr = -1;
+    env->tlb_d[mmu_idx].large_page_mask = -1;
+}
+
 /* This is OK because CPU architectures generally permit an
  * implementation to drop entries from the TLB at any time, so
  * flushing more entries than required is only an efficiency issue,
@@ -121,6 +129,7 @@ size_t tlb_flush_count(void)
 static void tlb_flush_nocheck(CPUState *cpu)
 {
     CPUArchState *env = cpu->env_ptr;
+    int mmu_idx;
 
     assert_cpu_is_self(cpu);
     atomic_set(&env->tlb_flush_count, env->tlb_flush_count + 1);
@@ -134,15 +143,14 @@ static void tlb_flush_nocheck(CPUState *cpu)
      */
     qemu_spin_lock(&env->tlb_c.lock);
     env->tlb_c.pending_flush = 0;
-    memset(env->tlb_table, -1, sizeof(env->tlb_table));
-    memset(env->tlb_v_table, -1, sizeof(env->tlb_v_table));
+    for (mmu_idx = 0; mmu_idx < NB_MMU_MODES; mmu_idx++) {
+        tlb_flush_one_mmuidx_locked(env, mmu_idx);
+    }
     qemu_spin_unlock(&env->tlb_c.lock);
 
     cpu_tb_jmp_cache_clear(cpu);
 
     env->vtlb_index = 0;
-    env->tlb_flush_addr = -1;
-    env->tlb_flush_mask = 0;
 }
 
 static void tlb_flush_global_async_work(CPUState *cpu, run_on_cpu_data data)
@@ -192,25 +200,19 @@ static void tlb_flush_by_mmuidx_async_work(CPUState *cpu, run_on_cpu_data data)
 
     assert_cpu_is_self(cpu);
 
-    tlb_debug("start: mmu_idx:0x%04lx\n", mmu_idx_bitmask);
+    tlb_debug("mmu_idx:0x%04lx\n", mmu_idx_bitmask);
 
     qemu_spin_lock(&env->tlb_c.lock);
     env->tlb_c.pending_flush &= ~mmu_idx_bitmask;
 
     for (mmu_idx = 0; mmu_idx < NB_MMU_MODES; mmu_idx++) {
-
         if (test_bit(mmu_idx, &mmu_idx_bitmask)) {
-            tlb_debug("%d\n", mmu_idx);
-
-            memset(env->tlb_table[mmu_idx], -1, sizeof(env->tlb_table[0]));
-            memset(env->tlb_v_table[mmu_idx], -1, sizeof(env->tlb_v_table[0]));
+            tlb_flush_one_mmuidx_locked(env, mmu_idx);
         }
     }
     qemu_spin_unlock(&env->tlb_c.lock);
 
     cpu_tb_jmp_cache_clear(cpu);
-
-    tlb_debug("done\n");
 }
 
 void tlb_flush_by_mmuidx(CPUState *cpu, uint16_t idxmap)
@@ -287,6 +289,25 @@ static inline void tlb_flush_vtlb_page_locked(CPUArchState *env, int mmu_idx,
     }
 }
 
+static void tlb_flush_page_locked(CPUArchState *env, int midx,
+                                  target_ulong addr)
+{
+    target_ulong lp_addr = env->tlb_d[midx].large_page_addr;
+    target_ulong lp_mask = env->tlb_d[midx].large_page_mask;
+
+    /* Check if we need to flush due to large pages.  */
+    if ((addr & lp_mask) == lp_addr) {
+        tlb_debug("forcing full flush midx %d ("
+                  TARGET_FMT_lx "/" TARGET_FMT_lx ")\n",
+                  midx, lp_addr, lp_mask);
+        tlb_flush_one_mmuidx_locked(env, midx);
+    } else {
+        int pidx = (addr >> TARGET_PAGE_BITS) & (CPU_TLB_SIZE - 1);
+        tlb_flush_entry_locked(&env->tlb_table[midx][pidx], addr);
+        tlb_flush_vtlb_page_locked(env, midx, addr);
+    }
+}
+
 static void tlb_flush_page_async_work(CPUState *cpu, run_on_cpu_data data)
 {
     CPUArchState *env = cpu->env_ptr;
@@ -295,23 +316,12 @@ static void tlb_flush_page_async_work(CPUState *cpu, run_on_cpu_data data)
 
     assert_cpu_is_self(cpu);
 
-    tlb_debug("page :" TARGET_FMT_lx "\n", addr);
-
-    /* Check if we need to flush due to large pages.  */
-    if ((addr & env->tlb_flush_mask) == env->tlb_flush_addr) {
-        tlb_debug("forcing full flush ("
-                  TARGET_FMT_lx "/" TARGET_FMT_lx ")\n",
-                  env->tlb_flush_addr, env->tlb_flush_mask);
-
-        tlb_flush(cpu);
-        return;
-    }
+    tlb_debug("page addr:" TARGET_FMT_lx "\n", addr);
 
     addr &= TARGET_PAGE_MASK;
     qemu_spin_lock(&env->tlb_c.lock);
     for (mmu_idx = 0; mmu_idx < NB_MMU_MODES; mmu_idx++) {
-        tlb_flush_entry_locked(tlb_entry(env, mmu_idx, addr), addr);
-        tlb_flush_vtlb_page_locked(env, mmu_idx, addr);
+        tlb_flush_page_locked(env, mmu_idx, addr);
     }
     qemu_spin_unlock(&env->tlb_c.lock);
 
@@ -346,14 +356,13 @@ static void tlb_flush_page_by_mmuidx_async_work(CPUState *cpu,
 
     assert_cpu_is_self(cpu);
 
-    tlb_debug("flush page addr:"TARGET_FMT_lx" mmu_idx:0x%lx\n",
+    tlb_debug("page addr:" TARGET_FMT_lx " mmu_map:0x%lx\n",
               addr, mmu_idx_bitmap);
 
     qemu_spin_lock(&env->tlb_c.lock);
     for (mmu_idx = 0; mmu_idx < NB_MMU_MODES; mmu_idx++) {
         if (test_bit(mmu_idx, &mmu_idx_bitmap)) {
-            tlb_flush_entry_locked(tlb_entry(env, mmu_idx, addr), addr);
-            tlb_flush_vtlb_page_locked(env, mmu_idx, addr);
+            tlb_flush_page_locked(env, mmu_idx, addr);
         }
     }
     qemu_spin_unlock(&env->tlb_c.lock);
@@ -361,29 +370,6 @@ static void tlb_flush_page_by_mmuidx_async_work(CPUState *cpu,
     tb_flush_jmp_cache(cpu, addr);
 }
 
-static void tlb_check_page_and_flush_by_mmuidx_async_work(CPUState *cpu,
-                                                          run_on_cpu_data data)
-{
-    CPUArchState *env = cpu->env_ptr;
-    target_ulong addr_and_mmuidx = (target_ulong) data.target_ptr;
-    target_ulong addr = addr_and_mmuidx & TARGET_PAGE_MASK;
-    unsigned long mmu_idx_bitmap = addr_and_mmuidx & ALL_MMUIDX_BITS;
-
-    tlb_debug("addr:"TARGET_FMT_lx" mmu_idx: %04lx\n", addr, mmu_idx_bitmap);
-
-    /* Check if we need to flush due to large pages.  */
-    if ((addr & env->tlb_flush_mask) == env->tlb_flush_addr) {
-        tlb_debug("forced full flush ("
-                  TARGET_FMT_lx "/" TARGET_FMT_lx ")\n",
-                  env->tlb_flush_addr, env->tlb_flush_mask);
-
-        tlb_flush_by_mmuidx_async_work(cpu,
-                                       RUN_ON_CPU_HOST_INT(mmu_idx_bitmap));
-    } else {
-        tlb_flush_page_by_mmuidx_async_work(cpu, data);
-    }
-}
-
 void tlb_flush_page_by_mmuidx(CPUState *cpu, target_ulong addr, uint16_t idxmap)
 {
     target_ulong addr_and_mmu_idx;
@@ -395,10 +381,10 @@ void tlb_flush_page_by_mmuidx(CPUState *cpu, target_ulong addr, uint16_t idxmap)
     addr_and_mmu_idx |= idxmap;
 
     if (!qemu_cpu_is_self(cpu)) {
-        async_run_on_cpu(cpu, tlb_check_page_and_flush_by_mmuidx_async_work,
+        async_run_on_cpu(cpu, tlb_flush_page_by_mmuidx_async_work,
                          RUN_ON_CPU_TARGET_PTR(addr_and_mmu_idx));
    } else {
-        tlb_check_page_and_flush_by_mmuidx_async_work(
+        tlb_flush_page_by_mmuidx_async_work(
             cpu, RUN_ON_CPU_TARGET_PTR(addr_and_mmu_idx));
     }
 }
@@ -406,7 +392,7 @@ void tlb_flush_page_by_mmuidx(CPUState *cpu, target_ulong addr, uint16_t idxmap)
 void tlb_flush_page_by_mmuidx_all_cpus(CPUState *src_cpu, target_ulong addr,
                                        uint16_t idxmap)
 {
-    const run_on_cpu_func fn = tlb_check_page_and_flush_by_mmuidx_async_work;
+    const run_on_cpu_func fn = tlb_flush_page_by_mmuidx_async_work;
     target_ulong addr_and_mmu_idx;
 
     tlb_debug("addr: "TARGET_FMT_lx" mmu_idx:%"PRIx16"\n", addr, idxmap);
@@ -420,10 +406,10 @@ void tlb_flush_page_by_mmuidx_all_cpus(CPUState *src_cpu, target_ulong addr,
 }
 
 void tlb_flush_page_by_mmuidx_all_cpus_synced(CPUState *src_cpu,
-                                               target_ulong addr,
-                                               uint16_t idxmap)
+                                              target_ulong addr,
+                                              uint16_t idxmap)
 {
-    const run_on_cpu_func fn = tlb_check_page_and_flush_by_mmuidx_async_work;
+    const run_on_cpu_func fn = tlb_flush_page_by_mmuidx_async_work;
     target_ulong addr_and_mmu_idx;
 
     tlb_debug("addr: "TARGET_FMT_lx" mmu_idx:%"PRIx16"\n", addr, idxmap);
@@ -577,25 +563,26 @@ void tlb_set_dirty(CPUState *cpu, target_ulong vaddr)
 /* Our TLB does not support large pages, so remember the area covered by
    large pages and trigger a full TLB flush if these are invalidated.  */
-static void tlb_add_large_page(CPUArchState *env, target_ulong vaddr,
-                               target_ulong size)
+static void tlb_add_large_page(CPUArchState *env, int mmu_idx,
+                               target_ulong vaddr, target_ulong size)
 {
-    target_ulong mask = ~(size - 1);
+    target_ulong lp_addr = env->tlb_d[mmu_idx].large_page_addr;
+    target_ulong lp_mask = ~(size - 1);
 
-    if (env->tlb_flush_addr == (target_ulong)-1) {
-        env->tlb_flush_addr = vaddr & mask;
-        env->tlb_flush_mask = mask;
-        return;
+    if (lp_addr == (target_ulong)-1) {
+        /* No previous large page.  */
+        lp_addr = vaddr;
+    } else {
+        /* Extend the existing region to include the new page.
+           This is a compromise between unnecessary flushes and
+           the cost of maintaining a full variable size TLB.  */
+        lp_mask &= env->tlb_d[mmu_idx].large_page_mask;
+        while (((lp_addr ^ vaddr) & lp_mask) != 0) {
+            lp_mask <<= 1;
+        }
     }
-    /* Extend the existing region to include the new page.
-       This is a compromise between unnecessary flushes and the cost
-       of maintaining a full variable size TLB. */
-    mask &= env->tlb_flush_mask;
-    while (((env->tlb_flush_addr ^ vaddr) & mask) != 0) {
-        mask <<= 1;
-    }
-    env->tlb_flush_addr &= mask;
-    env->tlb_flush_mask = mask;
+    env->tlb_d[mmu_idx].large_page_addr = lp_addr & lp_mask;
+    env->tlb_d[mmu_idx].large_page_mask = lp_mask;
 }
 
 /* Add a new TLB entry.  At most one entry for a given virtual address
@@ -622,12 +609,10 @@ void tlb_set_page_with_attrs(CPUState *cpu, target_ulong vaddr,
 
     assert_cpu_is_self(cpu);
 
-    if (size < TARGET_PAGE_SIZE) {
+    if (size <= TARGET_PAGE_SIZE) {
         sz = TARGET_PAGE_SIZE;
    } else {
-        if (size > TARGET_PAGE_SIZE) {
-            tlb_add_large_page(env, vaddr, size);
-        }
+        tlb_add_large_page(env, mmu_idx, vaddr, size);
         sz = size;
     }
     vaddr_page = vaddr & TARGET_PAGE_MASK;