8. APS Data Harmonization Process

Once the Data Team receives the individual country-level data, the data are cleaned, coded, and weighted to create a harmonized data set that ensures representativeness and consistency across all countries in the study.


Coding


After completing the data collection, each National Team submits the data in the pre-defined data input template provided by the GEM Data Team. A small number of questions require verbal or “open-ended” responses. These questions are translated by the survey firm and/or National Team and both native and English-language responses are submitted in the SPSS APS data file. 

The most important open-ended categories refer to the business activities of potential entrepreneurs. In preparing the data, the survey firms are responsible for providing the descriptions of the business activity reported for start-ups, new or established firms, and firms receiving funding from informal investors. Each year the Data Team develops and implements a coding protocol to ensure that a single procedure is used to classify business activities across all countries. The International Standard Industrial Classification (ISIC) provided by the United Nations (1990) is used for all sector coding.

Other coding includes re-categorizing text responses to several “other” options in the questionnaire. The GEM Data Team also recodes the education and income demographic categories into harmonized GEM variables.


Weighting


GEM aims to provide representative random samples for each country. Survey firms have the option of supplying sample case weights for all observations, developed such that the proportions of different subgroups (gender and age, for example) match the most recent official data describing the population of a country. The basic objective of the weighting approach is to ensure that the APS sample data match the adult population of the country as closely as possible along a range of key dimensions, which must include age and gender at a minimum, but may also include factors such as region, education level and urban/rural stratification.

If no weight is provided by the survey vendor, the weights will be computed by GEM based either on 1) age and gender, or, if the sample is stratified, on 2) age, gender and strata. No other weighting factors will be used. Therefore, if a team wishes to improve the precision of their weight variable by including other factors, the weight should be supplied by the team.

GEM calculates weights based on population statistics provided by the team and taken from official sources or, if these are not available, on US Census International Population Data. The Census Population Estimates are published HERE. The final weights are adjusted to ensure that the average value of the case weights for each country is exactly one.
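The weighting step described above can be illustrated with a minimal sketch. It assumes a simple cell-based (post-stratification) scheme in which each respondent belongs to one gender-by-age cell; the function name `poststratification_weights`, the cell labels, and the population figures are hypothetical and are not taken from GEM's own procedures, which may differ in detail.

```python
from collections import Counter


def poststratification_weights(sample_cells, population_counts):
    """Return one case weight per respondent.

    sample_cells: list with one cell label per respondent,
        e.g. ("female", "18-24"). Labels are illustrative only.
    population_counts: dict mapping each cell label to its population size.

    Each weight is (population share of the cell) / (sample share of the cell),
    then rescaled so that the mean weight over the sample is one.
    """
    n = len(sample_cells)
    if n == 0:
        return []

    sample_counts = Counter(sample_cells)

    # Every cell observed in the sample needs a population figure.
    missing = set(sample_counts) - set(population_counts)
    if missing:
        raise ValueError(f"no population count for cells: {sorted(missing)}")

    # Use only cells that occur in the sample, so the shares sum to one.
    pop_total = sum(population_counts[cell] for cell in sample_counts)

    raw = [
        (population_counts[cell] / pop_total) / (sample_counts[cell] / n)
        for cell in sample_cells
    ]

    # Rescale so the average case weight is one (guards against rounding drift).
    mean_raw = sum(raw) / n
    return [w / mean_raw for w in raw]
```

For example, a sample of three men and one woman in the same age band, drawn from a population with equal numbers of each, gives the men a weight below one and the woman a weight above one, while the average weight stays at one. A stratified sample could be handled the same way by adding the stratum to the cell label.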

Age is categorized into five groups covering 18 to 64 years. The age range of respondents varies substantially across national surveys, from as young as 14 to over 90 years of age. The weights are therefore developed from standardized national population structure estimates for respondents aged 18 to 64, the age range that qualifies as active in the labor force. Of the total sample, 99 percent of the weights are smaller than 3.4.
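As an illustration of the age categorization, the sketch below assigns respondents to one of five bands between 18 and 64 and flags everyone else as outside the weighted range. The text does not state the cut points, so the bands in `AGE_BANDS` are an assumption made for this example, and the function name `age_group` is hypothetical.

```python
# Assumed cut points: the source says only "five groups between 18-64 years".
AGE_BANDS = [(18, 24), (25, 34), (35, 44), (45, 54), (55, 64)]


def age_group(age):
    """Return the label of the band containing `age`, or None if the
    respondent falls outside the 18 to 64 range used for weighting."""
    for low, high in AGE_BANDS:
        if low <= age <= high:
            return f"{low}-{high}"
    return None
```

Respondents for whom `age_group` returns `None` (for instance a 14-year-old or a 90-year-old) would fall outside the 18 to 64 weighting range in this sketch.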
