SQL Performance - Indexed View VS Multi-Column Index
I have two setups that show different performance, and I want to understand why.
I have to write down a lot of information so that all of this makes sense in context.
TL;DR: Why am I losing the logarithmic scalability of my multi-column index?
The table:
CREATE TABLE Schema1.Item
(
    Id INT IDENTITY(1,1) PRIMARY KEY,
    UniqueName VARCHAR(20) NOT NULL UNIQUE,
    GroupId INT NOT NULL FOREIGN KEY REFERENCES Schema1.[Group](Id),
    Category VARCHAR(200),
    Properties VARCHAR(MAX)
);
The last column 'Properties' contains a JSON dictionary of property names and property values. Which properties are in there is specific to the GroupId.
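For illustration only, a row might be inserted like this (the UniqueName, property names, and values are made up; the real properties depend on the group):
INSERT INTO Schema1.Item (UniqueName, GroupId, Category, Properties)
VALUES ('Item000001', 2, 'Category4', '{"Color": "Red", "Weight": "12.5"}');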
The test data:
- consists of 1 million items
- distributed across 20 groups (so 50,000 items per group)
- each group contains 10 categories (so 5,000 items per category per group)
This is the index whose performance decreases as the table grows:
CREATE NONCLUSTERED INDEX IX_GroupId_Category
ON [Schema1].[Item] (GroupId, Category)
INCLUDE(Id, UniqueName, Properties)
So a query can look like this:
SELECT TOP (1000) *
FROM [Schema1].[Item]
WHERE GroupId = 2
AND Category = 'Category4'
AND JSON_VALUE(Properties, '$."PropertyName"') LIKE '%PropertyValue%'
But what I want to discuss is just this query, because everything that happens after it always operates on fewer than 5,000 items:
SELECT TOP (1000) *
FROM [Schema1].[Item]
WHERE GroupId = 2
AND Category = 'Category4'
The execution plan basically consists of a single Index Seek at 100% cost, with Estimated and Actual Number of Rows = 1000 (as expected). Everything looks fine here.
But with 1,000,000 items, this query still needs 2-3 seconds to finish (without query caching). With 100,000 items, it took less than 1 second.
This seems to go against the logarithmic scalability of indexes. Even with my very large index leaf pages (because they include the whole varchar(max) column, which is typically around 500 bytes), there should not be such a big difference between 100,000 and 1,000,000 items, should there?
So what I tried next was to create an indexed view which filters on GroupId (so it has at most 50,000 rows) and has an index on Category (including all columns, same as before), roughly like the sketch below.
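The question doesn't show the actual view definition, so the following is only a rough sketch of what such an indexed view might look like; the object and index names, and the exact index layout, are assumptions:
CREATE VIEW Schema1.Item_ViewGroupId1
WITH SCHEMABINDING
AS
SELECT Id, UniqueName, GroupId, Category, Properties
FROM Schema1.Item
WHERE GroupId = 1;
GO
-- An indexed view is materialized through a unique clustered index; keying it on
-- (Category, Id) lets the optimizer seek on Category, and every column of the
-- view is stored with it, so no lookups are needed.
CREATE UNIQUE CLUSTERED INDEX IX_Item_ViewGroupId1_Category
    ON Schema1.Item_ViewGroupId1 (Category, Id);
GO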
And against this view, queries like this:
SELECT TOP (1000) *
FROM [Schema1].[Item_ViewGroupId1]
WHERE Category = 'Category4'
take less than 1 second!
Can anyone explain to me why there is such a big difference between these 2 implementations?
Am I missing something?
EDIT:
The problem seems to be related to physical reads:
- Slow: Table 'Item'. Scan count 1, logical reads 362, physical reads 148, read-ahead reads 547, lob logical reads 0, lob physical reads 0, lob read-ahead reads 0.
- Fast: Table 'Item'. Scan count 1, logical reads 362, physical reads 0, read-ahead reads 264, lob logical reads 0, lob physical reads 0, lob read-ahead reads 0
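(Output of this form can be reproduced by turning session statistics on before running the query:)
SET STATISTICS IO ON;
SET STATISTICS TIME ON;

SELECT TOP (1000) *
FROM [Schema1].[Item]
WHERE GroupId = 2
  AND Category = 'Category4';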
And it just seems that, on average, queries against the view need physical reads less often?
Does this mean I am just dependent on what the server is caching? Is there any way I can improve this?
sql sql-server azure-sql-database
asked Nov 24 '18 at 21:00 by Kevin B., edited Nov 24 '18 at 21:55
Your WHERE clause filters on Category and ClassId but the index is on GroupId (not used in the query) and Category, so it's not useful. Try creating an index on Category and ClassId. I would expect that to improve performance but not be quite as fast as the indexed view. Be aware that you need ORDER BY with TOP. Otherwise, the rows returned are arbitrary.
– Dan Guzman
Nov 24 '18 at 21:11
When you are testing with 100,000 items, are there 1000 rows returned? As with 20 groups and 10 categories per group, it might be only 500 rows. And if you SET STATISTICS IO ON, do the queries show a difference in logical reads?
– David Browne - Microsoft
Nov 24 '18 at 21:13
@DanGuzman: This was just a typo, I fixed it. It is searching on the correct index. And I am not interested in any order.
– Kevin B.
Nov 24 '18 at 21:17
@DavidBrowne-Microsoft: The results of the 2 queries are identical, except the duration of the query. I will look into the logical reads.
– Kevin B.
Nov 24 '18 at 21:18
@DavidBrowne-Microsoft: I did some further testing, and it seems that physical reads is the cause of the problem? They seem to occur less on my views. SLOW Table 'Item'. Scan count 1, logical reads 362, physical reads 148, read-ahead reads 547, lob logical reads 0, lob physical reads 0, lob read-ahead reads 0. FAST Table 'Item'. Scan count 1, logical reads 362, physical reads 0, read-ahead reads 264, lob logical reads 0, lob physical reads 0, lob read-ahead reads 0.
– Kevin B.
Nov 24 '18 at 21:52
1 Answer
If the plan is the same and the logical IO is the same, then the elapsed time will be the same, unless there are waits, like IO waits, lock waits, etc. It may be simply that when testing with a larger table, your pages aren't all cached.
The Query Store tracks waits on a per-query and per-plan basis, so you can check with something like:
select qt.query_sql_text, p.plan_id, ws.wait_category, ws.wait_category_desc, ws.avg_query_wait_time_ms
from sys.query_store_query q
left join sys.query_store_query_text qt
    on q.query_text_id = qt.query_text_id
left join sys.query_store_plan p
    on q.query_id = p.query_id
left join sys.query_store_wait_stats ws
    on p.plan_id = ws.plan_id
order by q.query_id, p.plan_id, ws.wait_category, ws.wait_category_desc, ws.avg_query_wait_time_ms desc
Does this mean I am just dependent on what the server is caching?
Yes. Your query performance will always depend on whether your data is cached.
Is there any way I can improve this?
SQL Server will retain the most used pages in the page cache, and to cache more data you can either increase the amount of memory you have available (by increasing your DTUs or vCores), or increase the number of rows you fit on a page. One thing you might try here is to COMPRESS the JSON data and DECOMPRESS it when needed. That will cache more data, at the cost of additional CPU on read.
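A minimal sketch of that idea, assuming the compressed JSON is kept in an extra VARBINARY(MAX) column (the column name below is hypothetical; COMPRESS and DECOMPRESS are the built-in GZIP functions available in SQL Server 2016+ and Azure SQL Database):
-- Hypothetical column to hold the GZIP-compressed JSON
ALTER TABLE Schema1.Item ADD PropertiesCompressed VARBINARY(MAX);
GO

-- Compress the existing JSON payload once
UPDATE Schema1.Item
SET PropertiesCompressed = COMPRESS(Properties);
GO

-- DECOMPRESS returns VARBINARY(MAX), so cast back to the original type when reading
SELECT TOP (1000)
       Id, UniqueName,
       CAST(DECOMPRESS(PropertiesCompressed) AS VARCHAR(MAX)) AS Properties
FROM Schema1.Item
WHERE GroupId = 2
  AND Category = 'Category4';
Smaller rows mean more rows per page, so a larger share of the index fits in the buffer pool; the trade-off is the extra CPU spent decompressing on every read.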
answered Nov 24 '18 at 21:43 by David Browne - Microsoft, edited Nov 24 '18 at 23:34