Hi. I have some questions about cluster analysis that I need to resolve in order to continue with my current work. I hope someone can help me!
I want to study the knowledge level of 203 students on a specific mathematical topic. Knowledge was assessed with a 7-question questionnaire. The answers are coded as ordinal variables, according to the knowledge level shown by the student on each question.
What I have done so far, and my questions:
1) First, I produced a general description of the set of questions, analysing the frequencies and percentages obtained in each category. Then I ran a correlation analysis using Spearman's coefficient (in theory, some of the categories should be related).
Is this OK as a first step? Is Spearman the right choice?
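To make step 1 concrete, here is a minimal sketch of Spearman's coefficient for two questions, written in plain Python. The answer codes (0 to 3) and the eight example students are made up for illustration and are not my real data. Ties, which are very common with ordinal answers, are handled by giving tied values their average rank.

```python
from statistics import mean

def average_ranks(values):
    """Rank values from 1..n; tied values share the mean of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    start = 0
    while start < len(order):
        end = start
        # Extend the run while the next sorted value equals the current one.
        while end + 1 < len(order) and values[order[end + 1]] == values[order[start]]:
            end += 1
        tied_rank = (start + end) / 2 + 1
        for k in range(start, end + 1):
            ranks[order[k]] = tied_rank
        start = end + 1
    return ranks

def pearson(x, y):
    """Pearson correlation; undefined (division by zero) if x or y is constant."""
    mx, my = mean(x), mean(y)
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    var_x = sum((a - mx) ** 2 for a in x)
    var_y = sum((b - my) ** 2 for b in y)
    return cov / (var_x * var_y) ** 0.5

def spearman(x, y):
    """Spearman's rho is the Pearson correlation of the average ranks."""
    return pearson(average_ranks(x), average_ranks(y))

# Hypothetical answers of 8 students to two questions, coded 0 (lowest) to 3 (highest).
q1 = [0, 1, 1, 2, 2, 3, 3, 3]
q2 = [0, 0, 1, 1, 2, 2, 3, 3]
rho = spearman(q1, q2)
print("Spearman rho between q1 and q2:", round(rho, 3))
```

With 7 questions this would be repeated for each of the 21 pairs to fill the correlation matrix.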
2) My second objective is to establish knowledge profiles of the students based on these variables. I have been reading the literature and I am a bit lost at this step. I want to explore the data, but I don't want to fix the number of clusters beforehand. I have 203 students and 7 ordinal variables.
What kind of cluster analysis should I use? I think it should be hierarchical, but I don't know whether my sample is too big for it. If it is hierarchical, should I cluster cases or variables?
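To show what I mean by hierarchical clustering of cases (students) without fixing the number of clusters, here is a minimal sketch in plain Python. Everything in it is an assumption for illustration: the five students and their 0 to 3 answer codes are made up, and the Manhattan distance and average linkage are placeholders, not choices I have settled on. For the real 203 students a library routine such as SciPy's `scipy.cluster.hierarchy.linkage` would be much faster than this naive loop.

```python
def manhattan(a, b):
    """Sum of absolute differences between two answer vectors (placeholder metric)."""
    return sum(abs(x - y) for x, y in zip(a, b))

def average_linkage(c1, c2, data):
    """Mean pairwise distance between the members of two clusters."""
    total = sum(manhattan(data[i], data[j]) for i in c1 for j in c2)
    return total / (len(c1) * len(c2))

def agglomerate(data):
    """Merge the two closest clusters until one remains.

    Returns the merge history as a list of (cluster_a, cluster_b, distance).
    """
    clusters = [[i] for i in range(len(data))]
    history = []
    while len(clusters) > 1:
        best = None
        for a in range(len(clusters)):
            for b in range(a + 1, len(clusters)):
                d = average_linkage(clusters[a], clusters[b], data)
                if best is None or d < best[0]:
                    best = (d, a, b)
        d, a, b = best
        history.append((clusters[a], clusters[b], d))
        merged = clusters[a] + clusters[b]
        clusters = [c for k, c in enumerate(clusters) if k not in (a, b)] + [merged]
    return history

# Hypothetical data: 5 students (rows) x 7 questions (columns), coded 0 to 3.
students = [
    [0, 0, 1, 0, 0, 1, 0],
    [0, 1, 0, 0, 1, 0, 0],
    [3, 3, 2, 3, 3, 2, 3],
    [3, 2, 3, 3, 2, 3, 3],
    [1, 2, 1, 2, 1, 2, 1],
]
history = agglomerate(students)
for left, right, dist in history:
    print(left, "+", right, "merged at distance", round(dist, 2))
```

The merge distances are what a dendrogram plots. A large jump between consecutive merges is one informal hint about where to cut the tree, so the number of clusters is read off afterwards instead of being set in advance.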
3) I know that I have to select a distance metric. Since the variables are ordinal, should I choose the chi-squared or the Euclidean distance? I have read that with ordinal variables it is better to treat them as interval variables, and that chi-squared is therefore the best choice. Is that correct?
4) What is the best way to validate my clusters?
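One internal validation measure I have come across is the silhouette width, which compares each student's mean distance to their own cluster with the mean distance to the nearest other cluster. Below is a minimal sketch in plain Python, only to make the question concrete. The data, the cluster labels and the Manhattan distance are all made up for illustration, and I am not claiming this is the best method.

```python
def manhattan(a, b):
    """Sum of absolute differences between two answer vectors (placeholder metric)."""
    return sum(abs(x - y) for x, y in zip(a, b))

def mean_distance(point, data, members):
    """Mean distance from one point to the listed members."""
    return sum(manhattan(point, data[j]) for j in members) / len(members)

def silhouette(data, labels):
    """Mean silhouette width over all points; assumes at least two clusters."""
    scores = []
    for i, point in enumerate(data):
        own = [j for j in range(len(data)) if labels[j] == labels[i] and j != i]
        if not own:
            # By convention a point alone in its cluster gets a score of 0.
            scores.append(0.0)
            continue
        a = mean_distance(point, data, own)
        b = min(
            mean_distance(point, data, [j for j in range(len(data)) if labels[j] == other])
            for other in set(labels)
            if other != labels[i]
        )
        # Guard against identical points, where both a and b are 0.
        scores.append((b - a) / max(a, b) if max(a, b) > 0 else 0.0)
    return sum(scores) / len(scores)

# Hypothetical data: 5 students x 7 questions, coded 0 to 3.
students = [
    [0, 0, 1, 0, 0, 1, 0],
    [0, 1, 0, 0, 1, 0, 0],
    [3, 3, 2, 3, 3, 2, 3],
    [3, 2, 3, 3, 2, 3, 3],
    [1, 2, 1, 2, 1, 2, 1],
]
sensible = [0, 0, 1, 1, 0]   # low scorers together, high scorers together
arbitrary = [0, 1, 0, 1, 0]  # mixes low and high scorers

print("sensible partition:", round(silhouette(students, sensible), 3))
print("arbitrary partition:", round(silhouette(students, arbitrary), 3))
```

Values near 1 suggest compact, well separated clusters, and values near 0 or below suggest overlapping or misassigned cases. Computing it for several cuts of the tree is one way to compare candidate numbers of clusters.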
Sorry for asking so many questions, but I don't have anyone else to ask.
PS: Sorry for my English. It is clearly not my mother tongue, but I am improving day by day.