When update statistics

Discussion:

(too old to reply)

unknown

2009-02-10 18:51:09 UTC

Hi,
I'm hoping to find someone who tell me when I can expect to
must run update statissics. We have system of orders where
many data comes to the tables and over millions of each day.
Now my stored procedures are slow performance and some say
update statistics will work. How do I know when I need to.

Sherlock, Kevin [TeamSybase]

2009-02-10 19:29:50 UTC

Permalink

What version of ASE?

IF "update statistics" does indeed solve your performance problems, you're
probably better off just scheduling an "update stats" script to run every
day/week/month or however often you find that it's necessary.

On the other hand, depending on your version of ASE, some might suggest
automating this process using the "datachange()" function and the built-in
job scheduler for ASE. I think there are details of how to do that in the
manuals:
http://infocenter.sybase.com/help/topic/com.sybase.dc00743_1500/html/qp_abstrpln/CIHJHHFD.htm

In my opinion, I'm a bit skeptical of the "automating update stats" using
datachange(), but that's just my opinion. You might find that it solves
your problem.

Post by unknown
Hi,
I'm hoping to find someone who tell me when I can expect to
must run update statissics. We have system of orders where
many data comes to the tables and over millions of each day.
Now my stored procedures are slow performance and some say
update statistics will work. How do I know when I need to.

Bret Halford [Sybase]

2009-02-10 20:06:14 UTC

Permalink

There is another approach you can use if you "know your data".
You really only need to update the statistics on columns whose
distribution of data changes.

So if you have an index on an "order_date" field, that should certainly
be updated frequently as new ever-greater values keep getting added to
the table. If your customer base is staying steady, then
updating statistics on "last_name" won't be necessary very often, but
if your customer base mostly consisted of residents of Denver CO, and
you suddenly started doing business in China, statistics would need to
be updated to reflect the shift from "Smith" towards "Chang" as the most
common surname.

-bret

Post by Sherlock, Kevin [TeamSybase]
What version of ASE?
IF "update statistics" does indeed solve your performance problems, you're
probably better off just scheduling an "update stats" script to run every
day/week/month or however often you find that it's necessary.
On the other hand, depending on your version of ASE, some might suggest
automating this process using the "datachange()" function and the built-in
job scheduler for ASE. I think there are details of how to do that in the
http://infocenter.sybase.com/help/topic/com.sybase.dc00743_1500/html/qp_abstrpln/CIHJHHFD.htm
In my opinion, I'm a bit skeptical of the "automating update stats" using
datachange(), but that's just my opinion. You might find that it solves
your problem.

Sherlock, Kevin [TeamSybase]

2009-02-10 20:37:56 UTC

Permalink

That's why I wish there were something better than datachange(). Something
that indicated what percentage of a column's distribution has "changed". I
can't think of any way to do it though without a performance penalty. So I
suppose datachange() is the best we can hope for...

Post by Bret Halford [Sybase]
There is another approach you can use if you "know your data".
You really only need to update the statistics on columns whose
distribution of data changes.
So if you have an index on an "order_date" field, that should certainly
be updated frequently as new ever-greater values keep getting added to the
table. If your customer base is staying steady, then
updating statistics on "last_name" won't be necessary very often, but
if your customer base mostly consisted of residents of Denver CO, and
you suddenly started doing business in China, statistics would need to
be updated to reflect the shift from "Smith" towards "Chang" as the most
common surname.
-bret

Post by Sherlock, Kevin [TeamSybase]
What version of ASE?
IF "update statistics" does indeed solve your performance problems,
you're probably better off just scheduling an "update stats" script to
run every day/week/month or however often you find that it's necessary.
On the other hand, depending on your version of ASE, some might suggest
automating this process using the "datachange()" function and the
built-in job scheduler for ASE. I think there are details of how to do
http://infocenter.sybase.com/help/topic/com.sybase.dc00743_1500/html/qp_abstrpln/CIHJHHFD.htm
In my opinion, I'm a bit skeptical of the "automating update stats" using
datachange(), but that's just my opinion. You might find that it solves
your problem.

unknown

2009-02-16 21:16:51 UTC

Permalink

I'm version ASE 15.0. But to much update statistics run
very long. Also when all statistics except only indexes? I
doubt think datachange works when I read on ISUG list. How
can I know if datachange will be right. Do other users it
often?

http://infocenter.sybase.com/help/topic/com.sybase.dc00743_1500/html/qp_abstrpln/CIHJHHFD.htm

Post by Sherlock, Kevin [TeamSybase]
In my opinion, I'm a bit skeptical of the "automating
update stats" using datachange(), but that's just my
opinion. You might find that it solves your problem.

Post by unknown
I'm hoping to find someone who tell me when I can expect
to must run update statissics. We have system of orders
where many data comes to the tables and over millions of
each day. Now my stored procedures are slow performance
and some say update statistics will work. How do I know

when I need to.

Derek Asirvadem

2009-02-17 01:57:33 UTC

Permalink

I'm version ASE 15.0. But to much update statistics run very long.
Also when all statistics except only indexes?

1 You should use "update index stats" in preference to "update stats".
The frequency depends on your table size and the amount of change to
each INDEXED column. Assuming that the db is reasonably normalised,
and the correct indices have been created, then yes, you want to run
update index stats on indices only, not on the non-indexed columns.
2 You need to run update table stats less frequently
3 In 15.0 the new function datachange() provides a method of
identifying the amount of change in each table [partition [column] ].
4 One of the simple ways to reduce the duration of update stats (any
flavour) is to partition large tables (that improves parallel
performance as well), and run it on single partitions
5 THEN, there may be the intent to constrain inserts to one or a few
partitions, but that requires table implementation/DDL changes. And
the app has to be good enough to be isolated from such changes.

I doubt think datachange works when I read on ISUG list.

Re ISUG list in general. The ISUG list is a small pond where a few
incompetents dominate the forum, the level of technical information
there is very superficial. It is often narrow (only applies to a small
set of circumstances) and should not be used as general rules or
guidelines. The list is full of misinformation and errors, if you have
been watching it for a while you will notice the quiet reversals of
technical advice (unfortunately undeclared). Good for newbies, to whom
any knowledge is impressive, but a waste of time otherwise. The second
problem is that the volumes of discussion (between the few) re low
level technical issues and settings, may give the reader the impression
that Sybase is difficult to master, that DBAs need to know these
details. Completely untrue, Sybase ASE P&T works best (large benefit)
from the top down; other than when debugging errors, the nuts and bolts
at the bottom are irrelevant. But such discussions serve to paint the
picture of "technical" "discussions" where technical discussion are
actually quite absent. And finally, if the information is disputed,
the incumbent few attack the new information, so there is a strong
protection of the incumbent incompetents, and thus the superficial and
incorrect information.

Re the ISUG thread on datachange(). It is not reasonable for a
technical person to want a function to work a certain way, and then
decide it is useless (in all circumstances) because it doesn't do what
they want. Datachange() works exactly as documented. Sure, it is a
first cut, and enhancements can be identified, but you are better off
understanding what the function does, what it was intended for, and how
to use (that's how technology works). The thread provides false and
incorrect information, Kevin alludes to that in the thread.

How can I know if datachange will be right. Do other users it often?

I have been using it on 85gb and 550gb databases, for daily execution
(rolling, so that each partition is updated once in seven days), e.g.
testing specifically the ability to constrain inserts into one
partition, and subsequently run:
- update index stats table [partition] and
- update index stats index [partittion]
without event. It is the [only] way to go if you want to reduce the
update stats duration, by running it only when there is a enough change
(which you decide/quantify yourself). I have upgraded my routine
update stats scripts to include inspection of datachange() and drive
the update stats (although I have not released them to all production
sites yet).

--
Cheers
Derek
Senior Sybase DBA / Information Architect
Copyright © 2008 Software Gems Pty Ltd
"Patient, normalise thyself"

Sherlock, Kevin [TeamSybase]

2009-02-17 17:43:08 UTC

Permalink

Post by unknown
I'm version ASE 15.0. But to much update statistics run
very long. Also when all statistics except only indexes? I
doubt think datachange works when I read on ISUG list. How
can I know if datachange will be right. Do other users it
often?

http://infocenter.sybase.com/help/topic/com.sybase.dc00743_1500/html/qp_abstrpln/CIHJHHFD.htm

Post by Sherlock, Kevin [TeamSybase]
In my opinion, I'm a bit skeptical of the "automating
update stats" using datachange(), but that's just my
opinion. You might find that it solves your problem.

when I need to.

unknown

2009-02-17 21:24:15 UTC

Permalink

Yes, however i can

Adaptive Server Enterprise/15.0.2/EBF 15683 ESD#5/P/NT
(IX86)/Windows 2000/ase1502/2528/32-bit/OPT/Tue Jun 17
09:13:11 2008

Do you have agree with Derek that ISUG group is bad about
datachage?

ASE 15.0. But to much update statistics run very long.
Also when all statistics except only indexes? I doubt
think datachange works when I read on ISUG list. How
can I know if datachange will be right. Do other users

it often? >

Post by Sherlock, Kevin [TeamSybase]
What version of ASE?
IF "update statistics" does indeed solve your

performance >> problems, you're probably better off just
scheduling an >> "update stats" script to run every
day/week/month or >> however often you find that it's
necessary. >>

Post by Sherlock, Kevin [TeamSybase]
On the other hand, depending on your version of ASE,

some >> might suggest automating this process using the

Post by Sherlock, Kevin [TeamSybase]
"datachange()" function and the built-in job scheduler
for ASE. I think there are details of how to do that

http://infocenter.sybase.com/help/topic/com.sybase.dc00743_1500/html/qp_abstrpln/CIHJHHFD.htm

Post by Sherlock, Kevin [TeamSybase]
In my opinion, I'm a bit skeptical of the "automating
update stats" using datachange(), but that's just my
opinion. You might find that it solves your problem.

Post by unknown
I'm hoping to find someone who tell me when I can

expect >> > to must run update statissics. We have system
of orders >> > where many data comes to the tables and
over millions of >> > each day. Now my stored procedures
are slow performance >> > and some say update statistics
will work. How do I know >> when I need to.

Sherlock, Kevin [TeamSybase]

2009-02-17 21:52:19 UTC

Permalink

I clearly can't and won't speak for Derek. He would say the same for me as
well I'm sure. All I can say is that from my experience on the topic, the
"datachange()" function works as documented.

Like any web/internet/public forum, there is both good and not so good
information given out. I find both these forums and the ISUG forums HIGHLY
useful and full of many great contributors, technical advice and sharing of
experiences. Both forums have a wide range of information. Your best
advice is to test any recommendations on your own environment and judge the
"usefullness" of them on your own. Most everything I've ever learned of ASE
and other products have come from the fine people past and present in these
newsgroups, ISUG, and even the old comp.database.sybase usenet forum (those
were the days...).

As for your version, you have Windows 15.0.2 ESD5. It would be best to post
this level of detail whenever someone asks what version you have.

The reason I asked for your version is that in this and recent ASE releases,
update statistics on columns which belong to an index, but are not the FIRST
column of the index, ASE will scan the smallest index which contain that
column, rather than scan the base table. This means that fewer pages are
needed to update statistics for "inner" columns of composite indexes.

So, let's say you have 3 indexes on a table like:

create index index_one on mytable( a, b, c )
create index index_two on mytable( d, c )
create index index_three on mytable ( d, b, c, a )

It would be a mistake to have a script like this:

update index statistics mytable index_one
update index statistics mytable index_two
update index statistics mytable index_three

A mistake because statistics are gathered for column "c" three times,
columns "a","b","d" twice each. Worse was that in previous versions of ASE,
non-leading columns for each "update index statistics" command were scanned
by the accessing the base table (data pages). For more recent versions like
yours, the smallest index is used to gather stats.

Some might argue that it is best to use

update index statistics mytable

That would be equivalent to a serial execution of:

update statistics mytable index_one
update statistics mytable index_two
update statistics mytable index_three
update statistics mytable(b)
update statistics mytable(c)

for the last two commands, the smallest index is used to scan all values of
"b", and "c" respectively.

If you have multiple engines (say 6 engines?), then you're might be better
off separating this into 5 concurrent executions of each of the above
statements so that ASE keeps 5 engines busy working maximizing your
throughput and making update stats job run MUCH faster.

My point is that you should experiment with various combinations of
concurrent "update statistics" jobs to see if you can accomplish more (and
faster) with multiple engines instead of serially executing "update
statistics" table by table until you get through a list of tables.

Post by unknown
Yes, however i can
Adaptive Server Enterprise/15.0.2/EBF 15683 ESD#5/P/NT
(IX86)/Windows 2000/ase1502/2528/32-bit/OPT/Tue Jun 17
09:13:11 2008
Do you have agree with Derek that ISUG group is bad about
datachage?

it often? >

Post by Sherlock, Kevin [TeamSybase]
What version of ASE?
IF "update statistics" does indeed solve your

performance >> problems, you're probably better off just
scheduling an >> "update stats" script to run every
day/week/month or >> however often you find that it's
necessary. >>

Post by Sherlock, Kevin [TeamSybase]
On the other hand, depending on your version of ASE,

some >> might suggest automating this process using the

Post by Sherlock, Kevin [TeamSybase]
"datachange()" function and the built-in job scheduler
for ASE. I think there are details of how to do that

http://infocenter.sybase.com/help/topic/com.sybase.dc00743_1500/html/qp_abstrpln/CIHJHHFD.htm

Post by Sherlock, Kevin [TeamSybase]
In my opinion, I'm a bit skeptical of the "automating
update stats" using datachange(), but that's just my
opinion. You might find that it solves your problem.

Post by unknown
I'm hoping to find someone who tell me when I can

Derek Asirvadem

2009-02-17 23:48:30 UTC

Permalink

On 2009-02-18 08:52:19 +1100, "Sherlock, Kevin [TeamSybase]"
Like any web/internet/public forum, there is both good and not so good
information given out. I find both these forums and the ISUG forums HIGHLY
useful and full of many great contributors, technical advice and sharing of
experiences.

So what did you mean, when you posted the following re the
misinformation on the ISUG list re the datachange() function (which BTW

I think it's irresponsible to spread rumors here about bugs that may or
may not exist, unless you can prove it with empirical data. My
inclination here is to assume that you don't really understand how
datachange works, and therefore, you conclude that it "does not work
properly". That may be misleading others who are trying to understand
how to use it here.

--
Regards
Derek Asirvadem
Director / Senior Sybase DBA / Information Architect
Sybase BSA [OEM/VAR] Partner
Copyright © 2008 Software Gems Pty Ltd

Derek Asirvadem

2009-02-18 00:43:42 UTC

Permalink

On 2009-02-18 08:52:19 +1100, "Sherlock, Kevin [TeamSybase]"
Some might argue that it is best to use
update index statistics mytable
update statistics mytable index_one
update statistics mytable index_two
update statistics mytable index_three
update statistics mytable(b)
update statistics mytable(c)

1 Agreed, on that specific technical point. In this particular
example (which is not "normal" since columns appear in more than one
index, but let's treat it as an example), what is your response to:
update statistics mytable(a)
update statistics mytable(b)
update statistics mytable(c)
update statistics mytable(d)

2 Given the many items of documentation for 15.0 re use of "update
index stats" over "update stats", and the fact that unless people
really knew what they were doing in pre-15.0 versions, they did not
update stats of non-leading columns, do you agree with the general rule
'use of "update index stats" over "update stats" ' ? Now please do not
go into "It Depends". I am fully aware of the relevance of bad
indices, poor indices, missing indices, duplicate indices; however this
is a newsgroup and people are hungry for information they can actually
use, rather than "It Depends" which keeps them in a state of confusion
about the issue.

I have stated the condition "Assuming that the db is reasonably
normalised, and the correct indices have been created" so that we are
not addressing the flotsam and jetsam of data heaps. Obviously I know
the answer, or more precisely, I have my own experience and I have
concluded my own general rules, notably because I have a suite of
scripts that run on several customer sites, on databases that I do not
control. To be clear, I am not looking for an argument or debate, but
it would be nice to give the readership a bit of closure on this
technical issue.

3 If we progress that line of inquiry, we might get to a point where
we can address the original question, re datachange() in more detail.

(this could be a good thread.)

--
Cheers
Derek
Senior Sybase DBA / Information Architect
Copyright © 2008 Software Gems Pty Ltd
--
Tired of databases that are more cost than benefit ? Wondering why you
cannot get Sybase performance from Sybase ? Find out

Sherlock, Kevin [TeamSybase]

2009-02-18 20:57:47 UTC

Permalink

1 Agreed, on that specific technical point. In this particular example
(which is not "normal" since columns appear in more than one index, but
update statistics mytable(a)
update statistics mytable(b)
update statistics mytable(c)
update statistics mytable(d)

My only response to that is that it's not congruous to "update index
statistics mytable".

Derek Asirvadem

2009-02-18 23:59:33 UTC

Permalink

On 2009-02-19 07:57:47 +1100, "Sherlock, Kevin [TeamSybase]"

Post by Sherlock, Kevin [TeamSybase]
update index statistics mytable index_one
update index statistics mytable index_two
update index statistics mytable index_three
A mistake because statistics are gathered for column "c" three times,
columns "a","b","d" twice each. Worse was that in previous versions of ASE,
non-leading columns for each "update index statistics" command were scanned
by the accessing the base table (data pages). For more recent versions like
yours, the smallest index is used to gather stats.
B Some might argue that it is best to use
update index statistics mytable
update statistics mytable index_one -- ( a, b, c ) --> (a) only
update statistics mytable index_two -- ( d, c ) --> (d) only
update statistics mytable index_three -- ( d, b, c, a ) --> (d) only, again
update statistics mytable(b)
update statistics mytable(c)

C Agreed, on that specific technical point. In this particular example
(which is not "normal" since columns appear in more than one index, but
update statistics mytable(a)
update statistics mytable(b)
update statistics mytable(c)
update statistics mytable(d)

My only response to that is that it's not congruous to "update index
statistics mytable".

Sorry, I do not understand what you mean by 'congruous' in this
context. AFAIK, the stats are column-level and normalised; the
Optimiser does not care about column_a + column_b + column_c stats.
The 'update index stats' results in 'update stats table (column) x N'
anyway. The difference in performance of the no and duration of
'updates stats' executions is obvious, but is anything lost by updating
the stats for the four columns separately (four executions, no
repetitions) vs update index stats mytable' (five executions, one
repetition) ? What is the relevance of the missing congruence ?

Is [C] not the best option (better than [B] and nothing is lost) ?

--
Cheers
Derek
Senior Sybase DBA / Information Architect
Copyright © 2008 Software Gems Pty Ltd
--
Feel free to drop me an email, if you which to discuss anything offline

Sherlock, Kevin [TeamSybase]

2009-02-19 17:53:40 UTC

Permalink

Sorry, I do not understand what you mean by 'congruous' in this context.

congruous = exhibiting a harmony of parts

That is, when you break down ALL of the parts of "update index statistics
mytable", they are not the same as "update statistics mytable(column) x N"

AFAIK, the stats are column-level and normalised; the Optimiser does not
care about column_a + column_b + column_c stats.

?? I think you are saying that "column group" statistics aren't ever used
by the optimizer. If so, I'd say that's not correct.

Post by Sherlock, Kevin [TeamSybase]
update statistics mytable index_one -- ( a, b, c ) --> (a) only
update statistics mytable index_two -- ( d, c ) --> (d) only
update statistics mytable index_three -- ( d, b, c, a ) --> (d) only, again
update statistics mytable(b)
update statistics mytable(c)

The 'update index stats' results in 'update stats table (column) x N'
anyway. The difference in performance of the no and duration of 'updates
stats' executions is obvious, but is anything lost by updating the stats
for the four columns separately (four executions, no repetitions) vs
update index stats mytable' (five executions, one repetition) ? What is
the relevance of the missing congruence ?
Is [C] not the best option (better than [B] and nothing is lost) ?

Um. The reference to the lettering can be a little confusing. Here is what
i think you mean by all the letters:

[A] = "update index statistics mytable"
[B] =
update statistics mytable index_one -- ( a, b, c )
update statistics mytable index_two -- ( d, c )
update statistics mytable index_three -- ( d, b, c, a )
update statistics mytable(b)
update statistics mytable(c)
[C] =
update statistics mytable(a)
update statistics mytable(b)
update statistics mytable(c)
update statistics mytable(d)

Here is how I understand the basic parts of operations; for example:

update statistics mytable(d) ->
scan smallest index or table for column "d" values
create worktable in tempdb
sort worktable
write histogram and column density stats for "d"

update statistics mytable index_two ->
scan leaf level of index_two -
create histogram and column density stats for "d"
create column group density stats for (d,c)
update index level stats for "avg row size", "index level", "cr counts",
etc

So there are two big differences between [B (my breakdown)] and [C (update
stats on each column)] -

1. column group densities, and index level statistics are not updated with
[C]
2. method [C] will ALWAYS require a separate worktable and sorting
operation regardless of the column being updated. As you noted, this is
just a performance difference, but no difference in terms of the end result
per column.

Even if you don't necessarily care about the column group densities and
index level stats being current, then this is still faster than [B] so as to
avoid the performance penalty of (2) above:

[D] (modified [B] disregarding a need for column group densities and
up-to-date index level stats)
---------------
update statistics mytable index_one -- ( a, b, c ) --> (a) only
update statistics mytable index_two -- ( d, c ) --> (d) only
update statistics mytable(b)
update statistics mytable(c)

And noting that even though you don't "want" the column group stats, you
still get them here (for free!) from the first two commands which are a
faster gathering of column stats (avoids tempdb io) for columns "a" and "d".
However, [D] is still not the same as [A] (update index statistics mytable).

There are of course other "equivalencies" to consider too, like:

update index statistics mytable index_one -- ( a, b, c ) --> (a),(b),(c)
histograms serially
update statistics mytable index_two -- ( d, c ) --> (d) only

The interesting part of the above is that the histogram for (c) comes from
index_two (assuming it's smaller than index_one).

Bottom line is that the only "optdiag output" difference between updating
stats on the columns separately as opposed to using the "update statistics
mytable <index>" syntax is the lack of the column group stats, and
up-to-date index level stats (those that are _not_ flushed to systabstats
periodically).

--
Cheers
Derek
Senior Sybase DBA / Information Architect
Copyright © 2008 Software Gems Pty Ltd
--
Feel free to drop me an email, if you which to discuss anything offline

Derek Asirvadem

2009-02-20 14:20:18 UTC

Permalink

On 2009-02-20 04:53:40 +1100, "Sherlock, Kevin [TeamSybase]"

AFAIK, the stats are column-level and normalised; the Optimiser does not
care about column_a + column_b + column_c stats.

?? I think you are saying that "column group" statistics aren't ever used
by the optimizer. If so, I'd say that's not correct.

No, the stats are normalised, the column stats not duplicated in the
column group stats, they are referred to.

To be more precise, with relevance to the thread (we are not discussing
update stats outside the context of this thread). Of course the
Optimiser cares about and uses column group stats ... but we as DBAs do
not have to worry about that, because the column stats are stored once
(and grouped as required); we only have to worry about keeping column
stats up-to-date

Um. The reference to the lettering can be a little confusing. Here is what
[A] = "update index statistics mytable"
[B] =
update statistics mytable index_one -- ( a, b, c )
update statistics mytable index_two -- ( d, c )
update statistics mytable index_three -- ( d, b, c, a )
update statistics mytable(b)
update statistics mytable(c)
[C] =
update statistics mytable(a)
update statistics mytable(b)
update statistics mytable(c)
update statistics mytable(d)
update statistics mytable(d) ->
scan smallest index or table for column "d" values
create worktable in tempdb
sort worktable
write histogram and column density stats for "d"
update statistics mytable index_two ->
scan leaf level of index_two -
create histogram and column density stats for "d"
create column group density stats for (d,c)
update index level stats for "avg row size", "index level", "cr counts",
etc

(exactly. now we are progressing the thread)

So there are two big differences between [B (my breakdown)] and [C (update
stats on each column)] -
1. column group densities, and index level statistics are not updated with
[C]

Almost there on this one. Not worrying about the column group stats
(as that is a late introduction), but staying with the thread which is
the three indices, and staying with the example give (rather than
introducing a new/more complex example): what is the resulting (on the
queries) performance effect of NOT updating the index level stats WHILE
having up-to-date column stats for all columns in the index ? These
are retained in memory anyway and flushed periodically.

2. method [C] will ALWAYS require a separate worktable and sorting
operation regardless of the column being updated. As you noted, this is
just a performance difference, but no difference in terms of the end result
per column.

Agreed.

Even if you don't necessarily care about the column group densities and
index level stats being current, then this is still faster than [B] so as to

I am concerned with the performance of queries, not concerned too much
re performance of the update stats itself (good caching and sort
buffers alleviates that anyway). In any case, I would rather not cloud
the issue of getting resolution on the thread by concerning ourselves
with whether [1] or [2] is faster.

[D] (modified [B] disregarding a need for column group densities and
up-to-date index level stats)
---------------
update statistics mytable index_one -- ( a, b, c ) --> (a) only
update statistics mytable index_two -- ( d, c ) --> (d) only
update statistics mytable(b)
update statistics mytable(c)
And noting that even though you don't "want" the column group stats, you
still get them here (for free!) from the first two commands which are a
faster gathering of column stats (avoids tempdb io) for columns "a" and "d".
However, [D] is still not the same as [A] (update index statistics mytable).
update index statistics mytable index_one -- ( a, b, c ) --> (a),(b),(c)
histograms serially
update statistics mytable index_two -- ( d, c ) --> (d) only
The interesting part of the above is that the histogram for (c) comes from
index_two (assuming it's smaller than index_one).
Bottom line is that the only "optdiag output" difference between updating
stats on the columns separately as opposed to using the "update statistics
mytable <index>" syntax is the lack of the column group stats, and
up-to-date index level stats (those that are _not_ flushed to systabstats
periodically).

See, now it seems, just in that last para, that you understand my
intent (as posted from the top, I was looking for categorical answers
to the posted questions, and not to other questions, because I am
looking at the value of upgrading my scripts [not-database-specific]
which affect many servers).

So B is marginally better than C, in terms of effect, for the reasons
explained above; and B may be faster than C in terms of the
performance/duration itself.

--
Regards
Derek Asirvadem
Director / Senior Sybase DBA / Information Architect
Sybase BSA [OEM/VAR] Partner
Copyright © 2008 Software Gems Pty Ltd

Sherlock, Kevin [TeamSybase]

2009-02-21 01:45:55 UTC

Permalink

So now we've successfully hijacked this thread :). Did you want to get into
datachange()? If so, I'll suggest that we start a new thread. I think
datachange is a VERY interesting topic, and probably worthy of some kind of
white paper, or TW submission when the call for papers comes out.

See, now it seems, just in that last para, that you understand my intent
(as posted from the top, I was looking for categorical answers to the
posted questions, and not to other questions, because I am looking at the
value of upgrading my scripts [not-database-specific] which affect many
servers).
So B is marginally better than C, in terms of effect, for the reasons
explained above; and B may be faster than C in terms of the
performance/duration itself.
--
Regards
Derek Asirvadem
Director / Senior Sybase DBA / Information Architect
Sybase BSA [OEM/VAR] Partner
Copyright © 2008 Software Gems Pty Ltd

Derek Asirvadem

2009-02-21 09:30:25 UTC

Permalink

On 2009-02-21 12:45:55 +1100, "Sherlock, Kevin [TeamSybase]"

Post by Sherlock, Kevin [TeamSybase]
So now we've successfully hijacked this thread :). Did you want to get into
datachange()? If so, I'll suggest that we start a new thread. I think
datachange is a VERY interesting topic, and probably worthy of some kind of
white paper, or TW submission when the call for papers comes out.

Right.

Due to my testing as described, I can vouch for datachange() working as
documented; it is certainly useful as a method of determining when
update stats should be run, for the community as a whole, which is why
I get upset with the misinformation on the other site, and why we have
to correct same, maintain technical accuracy, on this professional site.

Actually, I have a method of determining if and when update stats
should be run, for both databases and data heaps, which pre-dates
datachange() by many years, which is part of our proprietary scripts,
so I cannot post it. Obviously the intent there is high performance
and minimising the maintenance inwdow. It completely validates your
intent, though. At this stage I have no plan to replace our
determination method with datachange() because the latter produces info
in more primitive state and it has limitations (eg. when cleared/reset)
that ours does not have, and ours provides actual data change per
table/partition/column vs previously stored data. Plus it is
integrated with the stored data, with more than one dependency.

What I am saying is:
- excellent and relevant subject, go for it with WP or TW pres
- I do not want to start a new thread, but go right ahead
- happy to participate in a thread or to correct misinformation but I
do not intend to be a datachange() special-ist (for reasons above) ...
I may not be a good counterpart
- I would like that last small question on this thread closed

--
Cheers
Derek
Senior Sybase DBA / Information Architect
Copyright © 2008 Software Gems Pty Ltd
--
Quality Standards = Zero Maintenance + Zero Surprises
Performance Standards = Predictability + Scaleability

Sherlock, Kevin

2009-02-25 01:56:33 UTC

Permalink

Almost there on this one. Not worrying about the column group stats (as
that is a late introduction), but staying with the thread which is the
three indices, and staying with the example give (rather than introducing
a new/more complex example): what is the resulting (on the queries)
performance effect of NOT updating the index level stats WHILE having
up-to-date column stats for all columns in the index ?

If you start with the assumption that column group stats don't make any
difference, then there is no difference in "performance", but I don't like
the premise here. Column group stats _can_ make a difference so they belong
in this "thread" when describing the end effect of one series of "update
stats" commands versus another. Otherwise I don't know where you're headed
with this line of thought. If you find that you don't ever need statistics
other than single column level densities and histograms, then by all means,
you only need to use that flavor of update stats. I don't believe that it
is a universal thing though. Every site is different.

These are retained in memory anyway and flushed periodically.

Not all of the index/table level statistics are flushed periodically to
systabstats. There are some that are not updated until you run "update
statistics".

Derek Asirvadem

2009-02-25 05:05:17 UTC

Permalink

On 2009-02-25 12:56:33 +1100, "Sherlock, Kevin"
If you start with the assumption that column group stats don't make any
difference, then there is no difference in "performance", but I don't like
the premise here. Column group stats _can_ make a difference so they belong
in this "thread" when describing the end effect of one series of "update
stats" commands versus another. Otherwise I don't know where you're headed
with this line of thought.

I wasn't saying that column-group stats are not important, not at all.
I was separating them out, in order to clarify the progressing
discussion, and to get closure on this point without complications.
Which we have done now, thanks.

End goal is smallest maintenace window, which means shortest execution
of update stats, which means the smallest number to cover indexed
columns. (Every site is different, to a degree, so I will not make a
statement that we do or do not need column-groups stats.)

Therefore if column-group stats need to be included, the following (as

Post by Sherlock, Kevin [TeamSybase]
B Some might argue that it is best to use
update index statistics mytable
update statistics mytable index_one -- ( a, b, c )
update statistics mytable index_two -- ( d, c )
update statistics mytable index_three -- ( d, b, c, a )
update statistics mytable(b)
update statistics mytable(c)
C In this particular example (which is not "normal" since columns
update statistics mytable(a)
update statistics mytable(b)
update statistics mytable(c)
update statistics mytable(d)

And [A] remains a mistake because it misses ( b ) and ( c ).

Derek Asirvadem

2009-02-18 01:17:21 UTC

Permalink

On 2009-02-18 08:52:19 +1100, "Sherlock, Kevin [TeamSybase]"
I find both these forums and the ISUG forums HIGHLY
useful and full of many great contributors, technical advice and sharing of
experiences. Both forums have a wide range of information.

Of course, all of us have a right to our own opinions, but I have to
say, I strongly disagree with that. Could you please provide one
reference to "highly useful" technical information that you have found
on ISUG, say in the last year.

----------

I should have mentioned in my previous post, there is a great
difference between the ISUG list and this news group. The Sybase
public newsgroup is:
- supplied and maintained by a commercial organisation, Sybase
- who remain answerable
- who have a product
- for which it is in their interest to foster the exchange (note, that
is two-way) of technical information
- they maintain the service
- they have declared guidelines or rules
- they provide a formal moderation service, which constrains posts to
those guidelines
- this eliminates ad hominem attacks, defamation, and inappropriate posts
- people here are free to contribute without being attacked, the
technical informtion gets corrected and improved in an organic manner
- the newsgroup is therefore "honest"

The ISUG list is (as evidenced in their written posts/responses, and I
will post evidence of, if required):
- supplied and maintained by an "independent" organisation
- who openly state in writing, they are answerable to no one (they run
their own banana republic)
- they have no product
- they provide an information monologue (note one-way) and actively bar
responses to "technical" "information", thus the "discussion" is
between the incumbent few ONLY
- their declared guidelines are loose and less than the Sybase guidelines
- they do not maintain the list or enforce the declared guidelines (in
spite of complaints from many people, there is no enforcement or
moderation)
- instead they have an active and evidenced form of censorship (as
distinct from moderation), they caution and suspend people who do not
break the rules, and do nothing about people who break the rules
hundreds of times
- this state of affairs fosters ad hominem attacks, defamation, and
inappropriate posts, of which the list is rife; and does NOT foster
"exchange of technical information"
- the newsgroup is therefore dishonest (there is no "free exchange of
technical information")
- the newsgroup is the de facto presentation platform of the few (just
read the posts, the incumbent few openly demonstrate their ownership of
their blogsite, and hilariously speak for the entire membership)
- some people who are evidently totally ignorant of the technical
subject matter, openly use the blogsite to obtain new information for
the "books" they are writing; they learn as they go, which is an abuse
of the service
- evidenced theft and plagiary is rife
- in two years of reading the list, I have not noticed any
contributions of value (for $75 USD), or any closure of "technical"
"discussions" (this excludes the few, very occasional posts by
respectable people such as Kevin S, Mark K, Joe W, Rob V)

ISUG therefore does a great disservice to both Sybase and the ISUG
membership. Gratefully, they do not affect, and have no idea about,
the Sybase community, which is present on this newsgroup.

Most honest technical people want resolution, not argument; without
honesty and a fostered two-way discussion (as opposed to voluminous
monologue), there is no closure, just superficial info-mercials. Good
for selling books and getting new memberships (there are very few old
members).

--
Cheers
Derek
Senior Sybase DBA / Information Architect
Copyright © 2008 Software Gems Pty Ltd
--
With the financial meltdown, consolidating many databases into one ASE
server and managing mixed load is a demand. Ask people who have been
doing it for years.

unknown

2009-02-18 19:31:49 UTC

Permalink

Thanks to you and Derek for this kinds of help. I will
repeat this to my data teams for input. I like to know that
datachange will help me when to run statistics, but I can't
understand it yet. I like ISUG lists to.

Post by Sherlock, Kevin [TeamSybase]
I clearly can't and won't speak for Derek. He would say
the same for me as well I'm sure. All I can say is that
from my experience on the topic, the "datachange()"
function works as documented.
Like any web/internet/public forum, there is both good and
not so good information given out. I find both these
forums and the ISUG forums HIGHLY useful and full of many
great contributors, technical advice and sharing of
experiences. Both forums have a wide range of
information. Your best advice is to test any
recommendations on your own environment and judge the
"usefullness" of them on your own. Most everything I've
ever learned of ASE and other products have come from the
fine people past and present in these newsgroups, ISUG,
and even the old comp.database.sybase usenet forum (those
were the days...).
As for your version, you have Windows 15.0.2 ESD5. It
would be best to post this level of detail whenever
someone asks what version you have.
The reason I asked for your version is that in this and
recent ASE releases, update statistics on columns which
belong to an index, but are not the FIRST column of the
index, ASE will scan the smallest index which contain that
column, rather than scan the base table. This means that
fewer pages are needed to update statistics for "inner"
columns of composite indexes.
create index index_one on mytable( a, b, c )
create index index_two on mytable( d, c )
create index index_three on mytable ( d, b, c, a )
update index statistics mytable index_one
update index statistics mytable index_two
update index statistics mytable index_three
A mistake because statistics are gathered for column "c"
three times, columns "a","b","d" twice each. Worse was
that in previous versions of ASE, non-leading columns for
each "update index statistics" command were scanned by
the accessing the base table (data pages). For more
recent versions like yours, the smallest index is used to
gather stats.
Some might argue that it is best to use
update index statistics mytable
update statistics mytable index_one
update statistics mytable index_two
update statistics mytable index_three
update statistics mytable(b)
update statistics mytable(c)
for the last two commands, the smallest index is used to
scan all values of "b", and "c" respectively.
If you have multiple engines (say 6 engines?), then you're
might be better off separating this into 5 concurrent
executions of each of the above statements so that ASE
keeps 5 engines busy working maximizing your throughput
and making update stats job run MUCH faster.
My point is that you should experiment with various
combinations of concurrent "update statistics" jobs to
see if you can accomplish more (and faster) with multiple
engines instead of serially executing "update statistics"
table by table until you get through a list of tables.
i can >

Post by unknown
Adaptive Server Enterprise/15.0.2/EBF 15683 ESD#5/P/NT
(IX86)/Windows 2000/ase1502/2528/32-bit/OPT/Tue Jun 17
09:13:11 2008
Do you have agree with Derek that ISUG group is bad
about datachage?

version >> > ASE 15.0. But to much update statistics run
very long. >> > Also when all statistics except only
indexes? I doubt >> > think datachange works when I read
on ISUG list. How >> > can I know if datachange will be
right. Do other users >> it often? >

Post by unknown

Post by Sherlock, Kevin [TeamSybase]
What version of ASE?
IF "update statistics" does indeed solve your

performance >> problems, you're probably better off

just >> scheduling an >> "update stats" script to run
every >> day/week/month or >> however often you find that
it's >> necessary. >>

Post by unknown

Post by Sherlock, Kevin [TeamSybase]
On the other hand, depending on your version of ASE,

some >> might suggest automating this process using

the >> >> "datachange()" function and the built-in job
scheduler >> >> for ASE. I think there are details of how

http://infocenter.sybase.com/help/topic/com.sybase.dc00743_1500/html/qp_abstrpln/CIHJHHFD.htm

Post by Sherlock, Kevin [TeamSybase]

Post by unknown

Post by Sherlock, Kevin [TeamSybase]
In my opinion, I'm a bit skeptical of the

"automating >> >> update stats" using datachange(), but
that's just my >> >> opinion. You might find that it
solves your problem. >> >>

Post by unknown

Post by Sherlock, Kevin [TeamSybase]

Post by unknown
I'm hoping to find someone who tell me when I can

expect >> > to must run update statissics. We have

system >> of orders >> > where many data comes to the
tables and >> over millions of >> > each day. Now my
stored procedures >> are slow performance >> > and some
say update statistics >> will work. How do I know >> when
I need to. >> >>