[impala-daemon-host.com:21000] > SELECT COUNT(*) FROM sample_07; Query: SELECT COUNT(*) FROM sample_07 Query submitted at: 2018-09-14 15:57:35 (Coordinator: https://impala-daemon-host.com:25000) dQuery progress can be monitored at: https://impala-daemon-host.com:25000/query_plan?query_id=36433472787e1cab:29c30e7800000000 +----------+ | count(*) | +----------+ | 823 | +----------+ Fetched 1 row(s) in 6.68s [impala-daemon-host.com:21000] > PROFILE; <-- Simply run "PROFILE" as a query Query Runtime Profile: Query (id=36433472787e1cab:29c30e7800000000): Summary: Session ID: 443110cc7292c92:6e3ff4d76f0c5aaf Session Type: BEESWAX .....You can also collect from Cloudera Manager Web UI, by navigating to CM > Impala > Queries, locate the query you just ran and click on “Query Details”




Query (id=36433472787e1cab:29c30e7800000000): Summary: Session ID: 443110cc7292c92:6e3ff4d76f0c5aaf Session Type: BEESWAX Start Time: 2018-09-14 15:57:35.883111000 End Time: 2018-09-14 15:57:42.565042000 Query Type: QUERY Query State: FINISHED Query Status: OK Impala Version: impalad version 2.11.0-cdh5.14.x RELEASE (build 50eddf4550faa6200f51e98413de785bf1bf0de1) User: hive@XXXXXX.COM Connected User: hive@XXXXXX.COM Delegated User: Network Address: ::ffff:xxx.xx.xxx.xx:58834 Default Db: default Sql Statement: SELECT COUNT(*) FROM sample_07 Coordinator: impala-daemon-url.com:22000 Query Options (set by configuration): Query Options (set by configuration and planner): MT_DOP=0 Plan: ----------------Let’s break it into sections and walk through one by one. There are a few important information here that used more often: a. Query ID:
Query (id=36433472787e1cab:29c30e7800000000):This is useful to identify relevant Query related information from Impala Daemon logs. Simply search this query ID and you can find out what it was doing behind the scene, especially useful for finding out related error messages. b. Session Type:
Session Type: BEESWAXThis can tell us where the connection is from. BEESWAX means that the query ran from impala-shell client. If you run from Hue, the type will be “HIVESERVER2” since Hue connects via HiveServer2 thrift. c. Start and End time:
Start Time: 2018-09-14 15:57:35.883111000 End Time: 2018-09-14 15:57:42.565042000This is useful to tell how long the query ran for. Please keep it in mind that this time includes session idle time. So if you run a simple query that returns in a few seconds in Hue, since Hue keeps session open until session is closed or user runs another query, so the time here might show longer time than normal. The start and end time should match exactly the run time if run through impala-shell however, since impala-shell closes query handler straightaway after query finishes. d. Query status:
Query Status: OKThis tells if the query finished successfully or not. OK means good. If there are errors, normally will show here, for example, cancelled by user, session timeout, Exceptions etc. e. Impala version:
Impala Version: impalad version 2.11.0-cdh5.14.x RELEASE (build 50eddf4550faa6200f51e98413de785bf1bf0de1)This confirms the version that is used to run the query, if you see this is not matching with your installation, then something is not setup properly. f. User information:
User: hive@XXX.XXXXXX.COM Connected User: hive@XXX.XXXXXX.COM Delegated User:You can find out who ran the query from this session, so you know who to blame :). g. DB selected on connection:
Default Db: defaultNot used a lot, but good to know. h. The query that used to return this PROFILE:
Sql Statement: SELECT COUNT(*) FROM sample_07You will need this info if you are helping others to troubleshoot, as you need to know how query was constructed and what tables are involved. In lots of cases that a simple rewrite of the query will help to resolve issues or boost query performance. i. The impala daemon that is used to run the query, what we called the Coordinator:
Coordinator: impala-daemon-host.com:22000This is important piece of information, as you will determine which host to get the impala daemon log should you wish to check for INFO, WARNING and ERROR level logs. j. Query Options used for this query:
Query Options (set by configuration): Query Options (set by configuration and planner): MT_DOP=0This section tells you what kind of QUERY OPTIONS being applied to the current query, if there are any. This is useful to see if there is any user level, or pool level overrides that will affect this query. One example would be if Impala Daemon’s memory is set at, say 120GB, but a small query still fails with OutOfMemory error. This is the place you will check if user accidentally set MEM_LIMIT in their session to a lower value that could results in OutOfMemory error. This concludes the part 1 of the series to explain the Summary section of the query to understand the basic information. In the next part of the series, I will explain in detail on Query Plan as well as the Execution Summary of the PROFILE. Any comments or suggestions, please let me know from the comments section below. Thanks
Did you ever write Part 2 of this ?
Hi Navin,
Thanks for visiting my blog and post question about it. Unfortunately that I have not got time to do so for Part 2. Since you are asking for it, l will put my best effect to do it in the next couple of week.
Thanks for checking in and I will update here again once it is ready.
Cheers
Hi Navin,
Just to let you know that I have posted Part 2 of the series, you can find it below:
https://www.ericlin.me/2019/04/impala-query-profile-explained-part-2/
Hope that can be helpful. I am looking to continue writing it with part 3 and etc.
Cheers
Hi Navin,
I also wrote Part 3 a few months back, if you are interested, and you feedback will also be useful:
https://www.ericlin.me/2019/05/impala-query-profile-explained-part-3/
Cheers
Eric
Hello Eric. while researching some issue I stumbled upon this blog and learned a lot and loved the way you go in details. I noticed you have been using command profile on impala shell prompt just after executing the query. can I use the command by supplying query id as argument to fetch query profile of queries which ran in past? do you have similar command for downloading thrift encoded profile. any cloudera documentation where I can get more details on it. Thanks Neel
Hi Neel,
Apologies for the late reply. Since I have moved on from Cloudera, I am not actively monitoring this blog anymore. To answer your question, you can go to Impala Daemon’s web UI where you can locate the old queries and fetch query profile from there, or you can also get them from the Cloudera Manager UI. I am not sure if you can do that in the command line.
Cheers
Eric
Hi!
Thank you very much, it was interesting to read.