How to import ETER data into SPSS

SPSS is a widely used statistical program that is particularly useful for descriptive statistics thanks to its easy-to-use interface. It also has a good scripting tool for automating analyses. The goal of this section is to help users import ETER data into SPSS as effectively as possible. You’ll find an SPSS script that runs all these steps here (you just have to change the URL of the file to import at the beginning).

How to access the ETER data

The ETER database is accessible at https://www.eter-project.com. We recommend always using the latest version downloaded from this URL, since we are constantly updating and completing the data. We also recommend registering to obtain credentials and then logging in to the ETER website. This gives you access to the most detailed version of the database, including the values behind several codes (such as “c” and “s”).

Please also be aware that there is an extended version of ETER maintained by the RISIS2 project, which additionally includes the 2008 EUMIDA data and data on scientific publications, EU-FP projects and patents. This version is for research purposes only, and data cannot be published individually. The extended version is available at: orgreg.joanneum.at.

You will need to register and accept the RISIS Charter of code of conduct to access the data.

How to download the ETER data

Once logged in, select “Search HEI Data” (at the top left of the main webpage).

On the next page (https://www.eter-project.com/#/search) you now have different options:

  1. Export all variables for all years and all countries. Click the yellow button “Export All Years and Countries” and, using the drop-down menu, choose “Export All Variables”. A display and export settings box will appear.

You are advised to use ‘variable names’ as headers, as the names are shorter and more standardised than the labels.

Then, on the right, use the drop-down menu to select the export format “Machine Ready (SPSS)”. Finally, click “Export” (in yellow) on the left. A .csv file will be downloaded.

  2. Select a sample of variables. To select a variable, tick the corresponding label; you may also select all the variables included in a given section (such as Basic Institutional Descriptors or Geographic information). Then:
    1. If you want to download the selected variables for all years and all countries, click the yellow button “Export All Years and Countries” again and, using the drop-down menu, select “Export Selected Variables”. The display settings box will appear. As in option 1, check that “variable name” is ticked on the left, then use the drop-down menu on the right to select the export format “Machine Ready (SPSS)”. Finally, click “Export” (in yellow) on the left. A .csv file will be downloaded.
    2. Otherwise, if you want to select specific countries or years, click the green button “Select and continue”. On the next page, choose the years (“Select years” at the top left) and the countries (“Select countries” at the top right), then click “Search HEIs”. The site will return all the matching data. Now go to “Settings”, select variable names as headers and the “Machine Ready (SPSS)” export option, and click Apply (yellow button). Finally, click “Export data”, select “Export visible data” from the drop-down menu and wait for the .csv download.

How to import the ETER file into SPSS

To import the file, select File > Import Data > CSV Data… from the menu and open the file you downloaded from ETER.

In the following dialogue box click “Advanced Options” to define the import settings.

In the following import screens:

  • In screen 2, select comma as the decimal symbol.
  • In screen 4, select double quote as the text delimiter.
  • In screen 5, set ‘Percentage of values that determine automatic data format’ to 100% (otherwise text fields with few entries, like remarks, might be interpreted as numeric).

All other settings should already be correct.

Proceed to the last screen, import the data and save the file.
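Saving can also be done from a syntax window, which is convenient if you script the rest of the preparation. The lines below are a minimal sketch rather than part of the official ETER script: the dataset name `eter` and the path `C:\data\eter.sav` are placeholders chosen for illustration and should be adapted to your own setup.

```spss
* Name the imported dataset and save it in SPSS format.
* The dataset name and the file path are hypothetical examples.
DATASET NAME eter WINDOW=FRONT.
SAVE OUTFILE='C:\data\eter.sav'.

* In a later session, reopen the saved file instead of repeating the import.
GET FILE='C:\data\eter.sav'.
```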

Prepare your ETER file for analysis

Once the data are imported, if you want to do calculations you may have to deal with a few additional issues related to the presence of non-numerical variables and special codes. They are summarized as follows.

  • Non-numerical variables: data for calculations must be numerical; a column stored in non-numerical (string) format cannot be used for computations. You can see the variable type in SPSS in the Variable View tab.

In ETER all variables are strings due to the presence of special codes, like ‘m’ for missing. When you select the SPSS export, special codes are automatically recoded into special numeric codes that do not otherwise occur in the data. This recoding is of course applied only to the numerical variables.

The conversion is as follows:

  • a > -1000
  • m > -1001
  • x > -1002
  • xc > -1003
  • xr > -1004
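To keep this mapping visible in tables and in the Variable View, the numeric codes can be given value labels. The following is only a sketch: it uses `STA.ACAFTETOTAL` (a variable that appears in the regression example at the end of this page) as an illustration, and the label texts are our own wording. Only the meanings of ‘a’ and ‘m’ are spelled out, since those are the ones described on this page.

```spss
* Attach labels to the special numeric codes of one variable.
* The variable choice and the label texts are illustrative assumptions.
ADD VALUE LABELS STA.ACAFTETOTAL
  -1000 'a (not applicable)'
  -1001 'm (missing)'
  -1002 'x'
  -1003 'xc'
  -1004 'xr'.
```

`ADD VALUE LABELS` leaves any labels already defined for other values of the variable untouched.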

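However, you need to tell SPSS that these are missing values. To do this, go to File > New > Syntax and type the following commands, which define missing values for all numerical variables (error messages are switched off temporarily because the command fails on string variables).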

SET ERRORS=off.
DO REPEAT v=ALL.
missing values v (-1004 thru -1000).
END REPEAT.
SET ERRORS=on.
EXECUTE.

Then select the menu command Run > All.

If you check the Variable View, you will see that all numeric variables now have these values defined as missing values.
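The same definition can be applied to a few named variables and checked from syntax instead of the Variable View. This is a sketch under the assumption that your export contains `STA.ACAFTETOTAL` and `STA.TOTACAHC` (the names used in the regression example at the end of this page); substitute the variables you actually need.

```spss
* Define the special codes as user-missing for two selected variables.
* The variable names are taken from the regression example on this page.
MISSING VALUES STA.ACAFTETOTAL STA.TOTACAHC (-1004 THRU -1000).

* Print the dictionary information, including the missing-value range.
DISPLAY DICTIONARY /VARIABLES=STA.ACAFTETOTAL STA.TOTACAHC.
```

Because only numeric variables are named, no error suppression is needed here.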

  • Not applicable code. The special code ‘a’ means ‘not applicable’, for example for the number of PhD students when the HEI does not award a PhD. For many purposes, codes ‘a’ within numerical variables, such as staff, students and graduates, can be turned into ‘0’.

The SPSS export already transformed the ‘a’ codes in numerical variables to ‘-1000’, so the replacement is easy. You can of course apply this procedure to selected variables only.

Go to File > New > Syntax

Type the following text:

SET ERRORS=off.
DO REPEAT v=ALL.
RECODE v (-1000=0).
END REPEAT.
SET ERRORS=on.
EXECUTE.

Then select the menu command Run > All.
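If you prefer to restrict the replacement to selected variables, the variable list can be written out instead of using ALL. The sketch below again assumes the two staff variables from the regression example at the end of this page; it is an illustration, not part of the official ETER script.

```spss
* Turn the not-applicable code (-1000) into 0 for two staff variables only.
RECODE STA.ACAFTETOTAL STA.TOTACAHC (-1000=0).
EXECUTE.

* Summarize the recoded variables without printing full frequency tables.
FREQUENCIES VARIABLES=STA.ACAFTETOTAL STA.TOTACAHC
  /FORMAT=NOTABLE
  /STATISTICS=MINIMUM MAXIMUM MEAN.
```

All other variables keep -1000 as a missing code, so ‘not applicable’ and a true zero remain distinguishable there.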

Useful commands

These commands can be executed directly in a syntax window as above.

*How to create a dummy variable for the legal status, where 0 includes both the original codes “0” and “2” (the public and the private government-dependent HEIs) and 1 the private ones. This is useful since ETER includes few government-dependent institutions, and these are in fact very similar to the public ones.

recode BAS.LEGALSTAT (2=0) (0=0) (1=1) INTO BAS.LEGALSTAT.BIN.
execute.

*How to generate a PhD/non-PhD dummy. This syntax creates a dummy variable that takes the value 0 if the highest degree delivered is ISCED 5, 6 or 7, and the value 1 if the HEI also awards ISCED 8 degrees.

recode STUD.HIGHDEG (1=0) (2=0) (3=1) INTO STUD.PHDAWARDING.
execute.
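After creating a dummy it is worth labelling it and checking it against its source variable. The lines below are a sketch; the label texts are our own wording. Note that values of STUD.HIGHDEG not listed in the recode (for example the special codes) end up as system-missing in the new variable.

```spss
* Label the new dummy variable; the label texts are illustrative.
VARIABLE LABELS STUD.PHDAWARDING 'HEI awards degrees at ISCED 8'.
VALUE LABELS STUD.PHDAWARDING
  0 'Highest degree ISCED 5-7'
  1 'Awards ISCED 8'.

* Cross-tabulate source and dummy to confirm the recoding.
CROSSTABS /TABLES=STUD.HIGHDEG BY STUD.PHDAWARDING.
```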

*How to fill in missing values for publication and EU-FP data. Missing data mean that these institutions have not been identified in the source databases (Web of Science and EUPRO), so they can safely be set to ‘0’ for the analysis. This script applies only to the RISIS-ETER version of the database.

recode publications meannormalizedcitationscore (missing = 0).
execute.

*How to reduce missing values in the academic staff data. The following script fills in missing values of academic staff FTE using a linear regression of FTE on HC and the PhD-awarding dummy.

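* Regress academic staff FTE on headcount and the PhD dummy (no intercept) and save the predicted values as PRE_1.
REGRESSION
  /MISSING LISTWISE
  /STATISTICS COEFF OUTS R ANOVA
  /CRITERIA=PIN(.05) POUT(.10)
  /ORIGIN
  /DEPENDENT STA.ACAFTETOTAL
  /METHOD=ENTER STA.TOTACAHC STUD.PHDAWARDING
  /SAVE PRED.
* Copy the original FTE values into a new variable.
RECODE STA.ACAFTETOTAL (ELSE=Copy) INTO STA.ACADEFTETOTALNEW.
EXECUTE.
* Where the copy holds the missing code -1001, replace it with the predicted value.
DO IF (STA.ACADEFTETOTALNEW=-1001).
RECODE PRE_1 (ELSE=Copy) INTO STA.ACADEFTETOTALNEW.
END IF.
EXECUTE.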
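A possible follow-up, sketched here under our own assumptions, is to flag which values were imputed and to compare them with the observed ones. The flag name `FTE_IMPUTED` is hypothetical. The sketch also defines the remaining special codes as missing for the new variable, because a variable created with RECODE ... INTO does not inherit the missing-value definitions of its source.

```spss
* The new variable has no missing-value definition yet, so add one.
MISSING VALUES STA.ACADEFTETOTALNEW (-1004 THRU -1000).

* Flag cases whose original FTE value was the missing code -1001.
* VALUE() reads the stored value even though it is defined as user-missing.
* FTE_IMPUTED is a hypothetical name chosen for this example.
COMPUTE FTE_IMPUTED = (VALUE(STA.ACAFTETOTAL) = -1001).
EXECUTE.

* Compare observed (0) and imputed (1) values of the filled variable.
MEANS TABLES=STA.ACADEFTETOTALNEW BY FTE_IMPUTED
  /CELLS=COUNT MEAN MIN MAX.
```

Cases for which no prediction could be computed (for instance because headcount is itself missing) remain missing in the new variable.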

You are now ready to use the full variety of the ETER dataset for your research. If you have questions about this post, please contact us at eter@eter-project.com. If you have additional questions on the ETER project, technical or not, do not hesitate to contact us.