1. List the project IDs currently stored on the file system on the CDSW master host:

ls /var/lib/cdsw/current/projects/projects/0/
1  34  35  36  37  38  39  40  41  42  43  44  45  46  47  48

2. Run the 'cdsw status' command on the master host to capture the DB pod ID:
Sending detailed logs to [/tmp/cdsw_status_T8jRig.log] ...
CDSW Version: [1.3.0:9bb84f6]
OK: Application running as root check
OK: Sysctl params check
----------------------------------------------------------------------------------------------------------------------------------------------------------
| NAME                  | STATUS | CREATED-AT                | VERSION | EXTERNAL-IP | OS-IMAGE              | KERNEL-VERSION             | GPU | STATEFUL |
----------------------------------------------------------------------------------------------------------------------------------------------------------
| ericlin-xxx.xxx-1.com | True   | 2018-07-11 03:30:45+00:00 | v1.6.11 | None        | CentOS Linux 7 (Core) | 3.10.0-514.26.2.el7.x86_64 | 0   | False    |
| ericlin-xxx.xxx-2.com | True   | 2018-07-11 03:30:32+00:00 | v1.6.11 | None        | CentOS Linux 7 (Core) | 3.10.0-514.26.2.el7.x86_64 | 0   | True     |
----------------------------------------------------------------------------------------------------------------------------------------------------------
2/2 nodes are ready.
-------------------------------------------------------------------------------------------------------------------------------------------------
| NAME                                          | READY | STATUS  | RESTARTS | CREATED-AT                | POD-IP        | HOST-IP       | ROLE |
-------------------------------------------------------------------------------------------------------------------------------------------------
| etcd-ericlin-xxx.xxx-1.com                    | 1/1   | Running | 5        | 2018-07-11 03:31:44+00:00 | 172.26.12.157 | 172.26.12.157 | None |
| kube-apiserver-ericlin-xxx.xxx-1.com          | 1/1   | Running | 5        | 2018-07-11 03:30:31+00:00 | 172.26.12.157 | 172.26.12.157 | None |
| kube-controller-manager-ericlin-xxx.xxx-1.com | 1/1   | Running | 5        | 2018-07-11 03:31:54+00:00 | 172.26.12.157 | 172.26.12.157 | None |
| kube-dns-3911048160-30l05                     | 3/3   | Running | 15       | 2018-07-11 03:30:45+00:00 | 100.66.128.1  | 172.26.12.157 | None |
| kube-proxy-c4xk7                              | 1/1   | Running | 4        | 2018-07-11 03:30:45+00:00 | 172.26.14.58  | 172.26.14.58  | None |
| kube-proxy-k95s2                              | 1/1   | Running | 5        | 2018-07-11 03:30:45+00:00 | 172.26.12.157 | 172.26.12.157 | None |
| kube-scheduler-ericlin-xxx.xxx-1.com          | 1/1   | Running | 5        | 2018-07-11 03:31:57+00:00 | 172.26.12.157 | 172.26.12.157 | None |
| node-problem-detector-v0.1-0624z              | 1/1   | Running | 5        | 2018-07-11 03:32:15+00:00 | 172.26.12.157 | 172.26.12.157 | None |
| node-problem-detector-v0.1-b80tt               | 1/1   | Running | 4        | 2018-07-11 03:32:15+00:00 | 172.26.14.58  | 172.26.14.58  | None |
| weave-net-469fb                               | 2/2   | Running | 12       | 2018-07-11 03:30:45+00:00 | 172.26.12.157 | 172.26.12.157 | None |
| weave-net-8dzx6                               | 2/2   | Running | 10       | 2018-07-11 03:30:45+00:00 | 172.26.14.58  | 172.26.14.58  | None |
-------------------------------------------------------------------------------------------------------------------------------------------------
All required pods are ready in cluster kube-system.
---------------------------------------------------------------------------------------------------------------------------------------------------------
| NAME                                | READY | STATUS  | RESTARTS | CREATED-AT                | POD-IP        | HOST-IP       | ROLE                 |
---------------------------------------------------------------------------------------------------------------------------------------------------------
| cron-1906902965-wzkp5               | 1/1   | Running | 2        | 2018-08-10 08:13:55+00:00 | 100.66.128.9  | 172.26.12.157 | cron                 |
| db-1165222207-dg98q                 | 1/1   | Running | 5        | 2018-07-11 03:32:15+00:00 | 100.66.128.6  | 172.26.12.157 | db                   |
| engine-deps-1rvcl                   | 1/1   | Running | 5        | 2018-07-11 03:32:15+00:00 | 100.66.128.4  | 172.26.12.157 | engine-deps          |
| engine-deps-njwlc                   | 1/1   | Running | 4        | 2018-07-11 03:32:15+00:00 | 100.66.0.5    | 172.26.14.58  | engine-deps          |
| ingress-controller-684706958-6fzh3  | 1/1   | Running | 5        | 2018-07-11 03:32:14+00:00 | 172.26.12.157 | 172.26.12.157 | ingress-controller   |
| livelog-2502658797-kmq4l            | 1/1   | Running | 5        | 2018-07-11 03:32:15+00:00 | 100.66.128.3  | 172.26.12.157 | livelog              |
| reconciler-2738760185-1nnsp         | 1/1   | Running | 2        | 2018-08-10 08:13:55+00:00 | 100.66.128.2  | 172.26.12.157 | reconciler           |
| spark-port-forwarder-krtw6          | 1/1   | Running | 5        | 2018-07-11 03:32:15+00:00 | 172.26.12.157 | 172.26.12.157 | spark-port-forwarder |
| spark-port-forwarder-rbhc6          | 1/1   | Running | 4        | 2018-07-11 03:32:15+00:00 | 172.26.14.58  | 172.26.14.58  | spark-port-forwarder |
| web-3320989329-7php0                | 1/1   | Running | 2        | 2018-08-10 08:13:55+00:00 | 100.66.128.7  | 172.26.12.157 | web                  |
| web-3320989329-ms63k                | 1/1   | Running | 5        | 2018-07-11 03:32:15+00:00 | 100.66.128.5  | 172.26.12.157 | web                  |
| web-3320989329-zdpcj                | 1/1   | Running | 2        | 2018-08-10 08:13:55+00:00 | 100.66.128.8  | 172.26.12.157 | web                  |
---------------------------------------------------------------------------------------------------------------------------------------------------------
All required pods are ready in cluster default.
All required Application services are configured.
All required config maps are ready.
All required secrets are available.
Persistent volumes are ready.
Persistent volume claims are ready.
Ingresses are ready.
Checking web at url: http://ericlin-xxx.xxx-1.com
OK: HTTP port check
Cloudera Data Science Workbench is ready!

You can see from the above example that the DB pod ID is db-1165222207-dg98q.
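If you prefer not to pick the pod name out of the table by eye, the same value can be extracted with a small pipeline (the same one reused by the script at the end of this post). This assumes the 'cdsw status' table layout shown above, with the pod name in the NAME column:

cdsw status | grep 'db-' | cut -d '|' -f 2 | sed 's/ //g'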
3. Run the command below to connect to the CDSW database:

kubectl exec db-1165222207-dg98q -ti -- psql -U sense
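You can also run a single statement non-interactively with psql's -c option, which is how the script at the end of this post checks each project. For example, to check one project ID (42 here, taken from the directory listing in step 1):

kubectl exec db-1165222207-dg98q -ti -- psql -U sense -c "SELECT * FROM projects WHERE id = 42"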
4. At the psql prompt, run the following query to check which projects have been deleted (i.e. which IDs on disk are no longer in the database):

sense=# SELECT id, user_id, name, slug FROM projects;
 id | user_id |      name      |      slug
----+---------+----------------+----------------
 41 |       1 | Impala Project | impala-project
 34 |       3 | Test           | test
  1 |       1 | Test           | test
 47 |      10 | tensortest     | tensortest
 44 |       9 | TestEnvVar     | testenvvar
 40 |       4 | hbase          | hbase
 36 |       2 | Scala Test     | scala-test
 46 |       1 | R Project      | r-project
 45 |       4 | spackshell     | spackshell
 37 |       4 | tim            | tim
 48 |      10 | rtest          | rtest
 35 |       1 | Scala Project  | scala-project
 39 |       5 | salim          | salim
 38 |       4 | timtest        | timtest
(14 rows)
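If you only need the list of IDs still in the database (for example, to compare against the directory listing from step 1), a narrower query is easier to scan. This is just a convenience, not part of the original steps:

sense=# SELECT id FROM projects ORDER BY id;

In this example it would return every ID from the directory listing except 42 and 43, which are exactly the directories that can be archived.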
To put all of the above together, here is a quick shell script that does the job:

for project_id in `ls /var/lib/cdsw/current/projects/projects/0/`
do
  echo "Processing project $project_id"
  rows=`kubectl exec $(cdsw status | grep 'db-' | cut -d '|' -f 2 | sed 's/ //g') -ti -- psql -U sense -c "SELECT * FROM projects WHERE ID = $project_id" | grep '0 row' | wc -l`
  if [ $rows -gt 0 ]; then
    echo "Project $project_id has been deleted, you can archive directory /var/lib/cdsw/current/projects/projects/0/$project_id"
  fi
done

The output looks something like this:
Processing project 1
Processing project 34
Processing project 35
Processing project 36
Processing project 37
Processing project 38
Processing project 39
Processing project 40
Processing project 41
Processing project 42
Project 42 has been deleted, you can archive directory /var/lib/cdsw/current/projects/projects/0/42
Processing project 43
Project 43 has been deleted, you can archive directory /var/lib/cdsw/current/projects/projects/0/43
Processing project 44
Processing project 45
Processing project 46
Processing project 47
Processing project 48

So, until there is a fix for this issue, I hope the simple shell script above can help; one possible refinement, which avoids running a separate psql query for every project, is sketched right after this paragraph. If you have any suggestions or other ideas, please let me know in the comments section below. Thanks a lot in advance.
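Here is that refinement: rather than one psql call per project directory, fetch all project IDs from the database in a single query and compare them against the directory listing. The following is only a rough, untested sketch that reuses the same paths and the same pod-lookup pipeline as the script above, so adjust it to your environment before relying on it.

#!/bin/bash
# Sketch: find project directories on disk whose IDs are no longer in the CDSW database.
# Assumes the CDSW 1.3 directory layout and the 'cdsw status' output format shown earlier.

PROJECTS_DIR=/var/lib/cdsw/current/projects/projects/0
DB_POD=$(cdsw status | grep 'db-' | cut -d '|' -f 2 | sed 's/ //g')

# Fetch all project IDs still in the database in one query
# (-A -t prints unaligned rows with no headers; tr strips any carriage returns added by the TTY).
db_ids=$(kubectl exec "$DB_POD" -ti -- psql -U sense -A -t -c "SELECT id FROM projects ORDER BY id" | tr -d '\r')

for project_id in $(ls "$PROJECTS_DIR"); do
  # -w matches the whole ID, so 4 does not accidentally match 41, 44, etc.
  if ! echo "$db_ids" | grep -qw "$project_id"; then
    echo "Project $project_id has been deleted, you can archive directory $PROJECTS_DIR/$project_id"
  fi
done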
Can you please provide the CDSW installation steps with the Kubernetes/Docker setup?
Hi Rajesh,
The best way is to install CDSW through Cloudera Manager using parcels. Instructions can be found here:
https://www.cloudera.com/documentation/data-science-workbench/latest/topics/cdsw_install_parcel.html
Hope the above is helpful.
Cheers
Eric