From patchwork Tue Oct 23 07:02:47 2018
X-Patchwork-Submitter: Richard Henderson
X-Patchwork-Id: 149431
From: Richard Henderson
To: qemu-devel@nongnu.org
Date: Tue, 23 Oct 2018 08:02:47 +0100
Message-Id: <20181023070253.6407-6-richard.henderson@linaro.org>
X-Mailer: git-send-email 2.17.2
In-Reply-To: <20181023070253.6407-1-richard.henderson@linaro.org>
References: <20181023070253.6407-1-richard.henderson@linaro.org>
Subject: [Qemu-devel] [PATCH 04/10] cputlb: Split large page tracking per mmu_idx
Cc: cota@braap.org

The set of large pages in the kernel is probably not the same as the
set of large pages in the application.  Forcing one range to cover
both will flush more often than necessary.

This allows tlb_flush_page_async_work to flush just the one mmu_idx
implicated, which in turn allows us to remove
tlb_check_page_and_flush_by_mmuidx_async_work.

Signed-off-by: Richard Henderson
---
 include/exec/cpu-defs.h |  14 +++-
 accel/tcg/cputlb.c      | 139 ++++++++++++++++++----------------------
 2 files changed, 74 insertions(+), 79 deletions(-)

--
2.17.2

diff --git a/include/exec/cpu-defs.h b/include/exec/cpu-defs.h
index 659c73d2a1..df8ae18d9d 100644
--- a/include/exec/cpu-defs.h
+++ b/include/exec/cpu-defs.h
@@ -141,6 +141,17 @@ typedef struct CPUIOTLBEntry {
     MemTxAttrs attrs;
 } CPUIOTLBEntry;
 
+typedef struct CPUTLBDesc {
+    /*
+     * Describe a region covering all of the large pages allocated
+     * into the tlb.  When any page within this region is flushed,
+     * we must flush the entire tlb.  The region is matched if
+     * (addr & large_page_mask) == large_page_addr.
+     */
+    target_ulong large_page_addr;
+    target_ulong large_page_mask;
+} CPUTLBDesc;
+
 /*
  * Data elements that are shared between all MMU modes.
  */
@@ -162,13 +173,12 @@ typedef struct CPUTLBCommon {
  */
 #define CPU_COMMON_TLB                                                  \
     CPUTLBCommon tlb_c;                                                 \
+    CPUTLBDesc tlb_d[NB_MMU_MODES];                                     \
     CPUTLBEntry tlb_table[NB_MMU_MODES][CPU_TLB_SIZE];                  \
     CPUTLBEntry tlb_v_table[NB_MMU_MODES][CPU_VTLB_SIZE];               \
     CPUIOTLBEntry iotlb[NB_MMU_MODES][CPU_TLB_SIZE];                    \
     CPUIOTLBEntry iotlb_v[NB_MMU_MODES][CPU_VTLB_SIZE];                 \
     size_t tlb_flush_count;                                             \
-    target_ulong tlb_flush_addr;                                        \
-    target_ulong tlb_flush_mask;                                        \
     target_ulong vtlb_index;                                            \
 
 #else
diff --git a/accel/tcg/cputlb.c b/accel/tcg/cputlb.c
index abcd08a8a2..72b0567f70 100644
--- a/accel/tcg/cputlb.c
+++ b/accel/tcg/cputlb.c
@@ -113,6 +113,14 @@ size_t tlb_flush_count(void)
     return count;
 }
 
+static void tlb_flush_one_mmuidx_locked(CPUArchState *env, int mmu_idx)
+{
+    memset(env->tlb_table[mmu_idx], -1, sizeof(env->tlb_table[0]));
+    memset(env->tlb_v_table[mmu_idx], -1, sizeof(env->tlb_v_table[0]));
+    env->tlb_d[mmu_idx].large_page_addr = -1;
+    env->tlb_d[mmu_idx].large_page_mask = -1;
+}
+
 /* This is OK because CPU architectures generally permit an
  * implementation to drop entries from the TLB at any time, so
  * flushing more entries than required is only an efficiency issue,
@@ -121,6 +129,7 @@ size_t tlb_flush_count(void)
 static void tlb_flush_nocheck(CPUState *cpu)
 {
     CPUArchState *env = cpu->env_ptr;
+    int mmu_idx;
 
     assert_cpu_is_self(cpu);
     atomic_set(&env->tlb_flush_count, env->tlb_flush_count + 1);
@@ -134,15 +143,14 @@ static void tlb_flush_nocheck(CPUState *cpu)
      */
     qemu_spin_lock(&env->tlb_c.lock);
     env->tlb_c.pending_flush = 0;
-    memset(env->tlb_table, -1, sizeof(env->tlb_table));
-    memset(env->tlb_v_table, -1, sizeof(env->tlb_v_table));
+    for (mmu_idx = 0; mmu_idx < NB_MMU_MODES; mmu_idx++) {
+        tlb_flush_one_mmuidx_locked(env, mmu_idx);
+    }
     qemu_spin_unlock(&env->tlb_c.lock);
 
     cpu_tb_jmp_cache_clear(cpu);
 
     env->vtlb_index = 0;
-    env->tlb_flush_addr = -1;
-    env->tlb_flush_mask = 0;
 }
 
 static void tlb_flush_global_async_work(CPUState *cpu, run_on_cpu_data data)
@@ -192,25 +200,19 @@ static void tlb_flush_by_mmuidx_async_work(CPUState *cpu, run_on_cpu_data data)
 
     assert_cpu_is_self(cpu);
 
-    tlb_debug("start: mmu_idx:0x%04lx\n", mmu_idx_bitmask);
+    tlb_debug("mmu_idx:0x%04lx\n", mmu_idx_bitmask);
 
     qemu_spin_lock(&env->tlb_c.lock);
     env->tlb_c.pending_flush &= ~mmu_idx_bitmask;
 
     for (mmu_idx = 0; mmu_idx < NB_MMU_MODES; mmu_idx++) {
-
         if (test_bit(mmu_idx, &mmu_idx_bitmask)) {
-            tlb_debug("%d\n", mmu_idx);
-
-            memset(env->tlb_table[mmu_idx], -1, sizeof(env->tlb_table[0]));
-            memset(env->tlb_v_table[mmu_idx], -1, sizeof(env->tlb_v_table[0]));
+            tlb_flush_one_mmuidx_locked(env, mmu_idx);
         }
     }
     qemu_spin_unlock(&env->tlb_c.lock);
 
     cpu_tb_jmp_cache_clear(cpu);
-
-    tlb_debug("done\n");
 }
 
 void tlb_flush_by_mmuidx(CPUState *cpu, uint16_t idxmap)
@@ -287,6 +289,25 @@ static inline void tlb_flush_vtlb_page_locked(CPUArchState *env, int mmu_idx,
     }
 }
 
+static void tlb_flush_page_locked(CPUArchState *env, int midx,
+                                  target_ulong addr)
+{
+    target_ulong lp_addr = env->tlb_d[midx].large_page_addr;
+    target_ulong lp_mask = env->tlb_d[midx].large_page_mask;
+
+    /* Check if we need to flush due to large pages.  */
+    if ((addr & lp_mask) == lp_addr) {
+        tlb_debug("forcing full flush midx %d ("
+                  TARGET_FMT_lx "/" TARGET_FMT_lx ")\n",
+                  midx, lp_addr, lp_mask);
+        tlb_flush_one_mmuidx_locked(env, midx);
+    } else {
+        int pidx = (addr >> TARGET_PAGE_BITS) & (CPU_TLB_SIZE - 1);
+        tlb_flush_entry_locked(&env->tlb_table[midx][pidx], addr);
+        tlb_flush_vtlb_page_locked(env, midx, addr);
+    }
+}
+
 static void tlb_flush_page_async_work(CPUState *cpu, run_on_cpu_data data)
 {
     CPUArchState *env = cpu->env_ptr;
@@ -295,23 +316,12 @@ static void tlb_flush_page_async_work(CPUState *cpu, run_on_cpu_data data)
 
     assert_cpu_is_self(cpu);
 
-    tlb_debug("page :" TARGET_FMT_lx "\n", addr);
-
-    /* Check if we need to flush due to large pages.  */
-    if ((addr & env->tlb_flush_mask) == env->tlb_flush_addr) {
-        tlb_debug("forcing full flush ("
-                  TARGET_FMT_lx "/" TARGET_FMT_lx ")\n",
-                  env->tlb_flush_addr, env->tlb_flush_mask);
-
-        tlb_flush(cpu);
-        return;
-    }
+    tlb_debug("page addr:" TARGET_FMT_lx "\n", addr);
 
     addr &= TARGET_PAGE_MASK;
     qemu_spin_lock(&env->tlb_c.lock);
     for (mmu_idx = 0; mmu_idx < NB_MMU_MODES; mmu_idx++) {
-        tlb_flush_entry_locked(tlb_entry(env, mmu_idx, addr), addr);
-        tlb_flush_vtlb_page_locked(env, mmu_idx, addr);
+        tlb_flush_page_locked(env, mmu_idx, addr);
     }
     qemu_spin_unlock(&env->tlb_c.lock);
 
@@ -346,14 +356,13 @@ static void tlb_flush_page_by_mmuidx_async_work(CPUState *cpu,
 
     assert_cpu_is_self(cpu);
 
-    tlb_debug("flush page addr:"TARGET_FMT_lx" mmu_idx:0x%lx\n",
+    tlb_debug("page addr:" TARGET_FMT_lx " mmu_map:0x%lx\n",
               addr, mmu_idx_bitmap);
 
     qemu_spin_lock(&env->tlb_c.lock);
     for (mmu_idx = 0; mmu_idx < NB_MMU_MODES; mmu_idx++) {
         if (test_bit(mmu_idx, &mmu_idx_bitmap)) {
-            tlb_flush_entry_locked(tlb_entry(env, mmu_idx, addr), addr);
-            tlb_flush_vtlb_page_locked(env, mmu_idx, addr);
+            tlb_flush_page_locked(env, mmu_idx, addr);
         }
     }
     qemu_spin_unlock(&env->tlb_c.lock);
@@ -361,29 +370,6 @@ static void tlb_flush_page_by_mmuidx_async_work(CPUState *cpu,
     tb_flush_jmp_cache(cpu, addr);
 }
 
-static void tlb_check_page_and_flush_by_mmuidx_async_work(CPUState *cpu,
-                                                          run_on_cpu_data data)
-{
-    CPUArchState *env = cpu->env_ptr;
-    target_ulong addr_and_mmuidx = (target_ulong) data.target_ptr;
-    target_ulong addr = addr_and_mmuidx & TARGET_PAGE_MASK;
-    unsigned long mmu_idx_bitmap = addr_and_mmuidx & ALL_MMUIDX_BITS;
-
-    tlb_debug("addr:"TARGET_FMT_lx" mmu_idx: %04lx\n", addr, mmu_idx_bitmap);
-
-    /* Check if we need to flush due to large pages.  */
-    if ((addr & env->tlb_flush_mask) == env->tlb_flush_addr) {
-        tlb_debug("forced full flush ("
-                  TARGET_FMT_lx "/" TARGET_FMT_lx ")\n",
-                  env->tlb_flush_addr, env->tlb_flush_mask);
-
-        tlb_flush_by_mmuidx_async_work(cpu,
-                                       RUN_ON_CPU_HOST_INT(mmu_idx_bitmap));
-    } else {
-        tlb_flush_page_by_mmuidx_async_work(cpu, data);
-    }
-}
-
 void tlb_flush_page_by_mmuidx(CPUState *cpu, target_ulong addr, uint16_t idxmap)
 {
     target_ulong addr_and_mmu_idx;
@@ -395,10 +381,10 @@ void tlb_flush_page_by_mmuidx(CPUState *cpu, target_ulong addr, uint16_t idxmap)
     addr_and_mmu_idx |= idxmap;
 
     if (!qemu_cpu_is_self(cpu)) {
-        async_run_on_cpu(cpu, tlb_check_page_and_flush_by_mmuidx_async_work,
+        async_run_on_cpu(cpu, tlb_flush_page_by_mmuidx_async_work,
                          RUN_ON_CPU_TARGET_PTR(addr_and_mmu_idx));
    } else {
-        tlb_check_page_and_flush_by_mmuidx_async_work(
+        tlb_flush_page_by_mmuidx_async_work(
             cpu, RUN_ON_CPU_TARGET_PTR(addr_and_mmu_idx));
     }
 }
@@ -406,7 +392,7 @@ void tlb_flush_page_by_mmuidx(CPUState *cpu, target_ulong addr, uint16_t idxmap)
 void tlb_flush_page_by_mmuidx_all_cpus(CPUState *src_cpu, target_ulong addr,
                                        uint16_t idxmap)
 {
-    const run_on_cpu_func fn = tlb_check_page_and_flush_by_mmuidx_async_work;
+    const run_on_cpu_func fn = tlb_flush_page_by_mmuidx_async_work;
     target_ulong addr_and_mmu_idx;
 
     tlb_debug("addr: "TARGET_FMT_lx" mmu_idx:%"PRIx16"\n", addr, idxmap);
@@ -420,10 +406,10 @@ void tlb_flush_page_by_mmuidx_all_cpus(CPUState *src_cpu, target_ulong addr,
 }
 
 void tlb_flush_page_by_mmuidx_all_cpus_synced(CPUState *src_cpu,
-                                               target_ulong addr,
-                                               uint16_t idxmap)
+                                              target_ulong addr,
+                                              uint16_t idxmap)
 {
-    const run_on_cpu_func fn = tlb_check_page_and_flush_by_mmuidx_async_work;
+    const run_on_cpu_func fn = tlb_flush_page_by_mmuidx_async_work;
     target_ulong addr_and_mmu_idx;
 
     tlb_debug("addr: "TARGET_FMT_lx" mmu_idx:%"PRIx16"\n", addr, idxmap);
@@ -577,25 +563,26 @@ void tlb_set_dirty(CPUState *cpu, target_ulong vaddr)
 /* Our TLB does not support large pages, so remember the area covered by
    large pages and trigger a full TLB flush if these are invalidated.  */
-static void tlb_add_large_page(CPUArchState *env, target_ulong vaddr,
-                               target_ulong size)
+static void tlb_add_large_page(CPUArchState *env, int mmu_idx,
+                               target_ulong vaddr, target_ulong size)
 {
-    target_ulong mask = ~(size - 1);
+    target_ulong lp_addr = env->tlb_d[mmu_idx].large_page_addr;
+    target_ulong lp_mask = ~(size - 1);
 
-    if (env->tlb_flush_addr == (target_ulong)-1) {
-        env->tlb_flush_addr = vaddr & mask;
-        env->tlb_flush_mask = mask;
-        return;
+    if (lp_addr == (target_ulong)-1) {
+        /* No previous large page.  */
+        lp_addr = vaddr;
+    } else {
+        /* Extend the existing region to include the new page.
+           This is a compromise between unnecessary flushes and
+           the cost of maintaining a full variable size TLB.  */
+        lp_mask &= env->tlb_d[mmu_idx].large_page_mask;
+        while (((lp_addr ^ vaddr) & lp_mask) != 0) {
+            lp_mask <<= 1;
+        }
     }
-    /* Extend the existing region to include the new page.
-       This is a compromise between unnecessary flushes and the cost
-       of maintaining a full variable size TLB. */
-    mask &= env->tlb_flush_mask;
-    while (((env->tlb_flush_addr ^ vaddr) & mask) != 0) {
-        mask <<= 1;
-    }
-    env->tlb_flush_addr &= mask;
-    env->tlb_flush_mask = mask;
+    env->tlb_d[mmu_idx].large_page_addr = lp_addr & lp_mask;
+    env->tlb_d[mmu_idx].large_page_mask = lp_mask;
 }
 
 /* Add a new TLB entry.  At most one entry for a given virtual address
@@ -622,12 +609,10 @@ void tlb_set_page_with_attrs(CPUState *cpu, target_ulong vaddr,
 
     assert_cpu_is_self(cpu);
 
-    if (size < TARGET_PAGE_SIZE) {
+    if (size <= TARGET_PAGE_SIZE) {
         sz = TARGET_PAGE_SIZE;
    } else {
-        if (size > TARGET_PAGE_SIZE) {
-            tlb_add_large_page(env, vaddr, size);
-        }
+        tlb_add_large_page(env, mmu_idx, vaddr, size);
         sz = size;
     }
     vaddr_page = vaddr & TARGET_PAGE_MASK;