Hello everybody, I have a dataset that I want to analyse with SPSS. Unfortunately, 16.35 % of the cases have no gender information. The distribution is as follows:
- Female: 25.77 %
- Male: 57.88 %
- No gender: 16.35 %
Proportion of males among the cases with available gender information: 69.19 %
When I predict gender with a binary regression and save the predicted probabilities, their average is 0.702596488.
However, I need gender as a binary variable, so I want the dataset to contain only 0 (female) or 1 (male). To achieve this, I exported the data with the predicted gender probabilities to Excel. You can find the Excel worksheet here: https://1drv.ms/x/s!AtNwspiFU1pNhK9b3Vh8tHcDUA71SA
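The 69.19 % figure follows from renormalising the two known categories. A quick check, written in Python purely for illustration (the analysis itself is done in SPSS and Excel):

```python
# Shares of the full dataset in percent, as listed above.
female = 25.77
male = 57.88
missing = 16.35

# Proportion of males among the cases where gender is known.
known = female + male
male_share_known = male / known * 100

print(f"{male_share_known:.2f} %")  # 69.19 %
```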
• When I use the ROUND function in Excel, I get a column with the following average: 0.97375 (almost exclusively males, who are coded as 1).
• To get a higher (and more realistic) proportion of females, I used the following formula: =IF(AH2>0.7, 1, 0). This yields an average of 0.589.
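To illustrate why the two rules give such different averages, here is a minimal Python sketch. The probabilities are made up for the example and are not taken from the worksheet; `binarize` and `share_of_males` are hypothetical helper names.

```python
def binarize(probs, cutoff):
    """Return 1 (male) where the predicted probability exceeds cutoff, else 0 (female)."""
    return [1 if p > cutoff else 0 for p in probs]


def share_of_males(labels):
    """Average of a 0/1 column, i.e. the proportion coded as male."""
    return sum(labels) / len(labels)


# Hypothetical predicted probabilities, NOT the values from the actual data.
probs = [0.45, 0.55, 0.62, 0.68, 0.71, 0.74, 0.78, 0.81, 0.85, 0.90]

# Cutoff 0.5 behaves like Excel's ROUND, except for values of exactly 0.5.
rounded = binarize(probs, 0.5)
# Cutoff 0.7 is the same rule as =IF(AH2>0.7, 1, 0).
strict = binarize(probs, 0.7)

print(share_of_males(rounded))  # 0.9
print(share_of_males(strict))   # 0.6
```

When most predicted probabilities lie above 0.5, rounding assigns nearly every case to 1, whereas a cutoff near the mean probability splits the cases more evenly.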
Now I am wondering whether there is another way in Excel to obtain purely binary data whose distribution is similar to that of the cases with known gender (70 % males, 30 % females), while still using the results from the regression.
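One possible direction, offered only as a sketch and not as a confirmed answer, is to choose the cutoff from the data instead of fixing it in advance: rank the cases by predicted probability and code the top 70 % as male. The Python below illustrates the idea with made-up probabilities; `assign_by_rank` is a hypothetical helper, and the 0.70 target is an assumption taken from the known-gender proportion.

```python
def assign_by_rank(probs, target_male_share):
    """Code the cases with the highest predicted probabilities as 1 (male)
    until the target share is reached; all remaining cases become 0 (female)."""
    n = len(probs)
    n_male = round(target_male_share * n)
    # Indices of the cases, ordered from highest to lowest probability.
    order = sorted(range(n), key=lambda i: probs[i], reverse=True)
    labels = [0] * n
    for i in order[:n_male]:
        labels[i] = 1
    return labels


# Hypothetical predicted probabilities, NOT the values from the actual data.
probs = [0.45, 0.55, 0.62, 0.68, 0.71, 0.74, 0.78, 0.81, 0.85, 0.90]

labels = assign_by_rank(probs, 0.70)
print(labels)                     # [0, 0, 0, 1, 1, 1, 1, 1, 1, 1]
print(sum(labels) / len(labels))  # 0.7
```

In Excel, comparing each probability against the 30th percentile of the probability column (for example via the PERCENTILE function) would presumably have a similar effect. Whether the rule should apply to all cases or only to those with missing gender is a separate decision that this sketch does not address.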
Thank you very much for your answer.