getting your OpenClinica data into SPSS

Capturing data is not a goal in itself, but is done to analyze the data. This can be done with several programs, such as SAS, R, Stata or SPSS. In this page we describe the process of creating a dataset, exporting the data in SPSS format and then importing them into SPSS. We will also discuss some caveats when designing your CRF's so you will not run into problems when you start analyzing.

All this is based on OpenClinica and SPSS 21. If you have questions about other versions of OpenClinica, feel free to ask us.

If you already know enough about datasets to create, extract and download your own, go to the SPSS-part of this story

creating and exporting a dataset

Before we can import data, we must first extract it from OpenClinica. Extracting data is described in the OpenClinica Reference guide and we recommend reading this, but below are also instructions on how to create a dataset and export the data in SPSS-format.

Exporting data is a two-step-process: the first step is defining a collection of items you want to include: this collection is called a dataset.
The second step is exporting this dataset in a format of your choice.

Step 1: create a dataset
Go to Tasks, Extract Data and click Create Dataset. In the introduction screen, click on button Proceed to Create a Dataset. In this screen you can choose per Event which CRF's will be included. You do this by clicking on the Events in the left column. All CRF's used in that Event will be displayed and by clicking on the CRF you can choose which Items will be in the dataset. This is very flexible, but we go for the quick result and click on the link Select All Items in Study

fig. 1: select all items for the dataset

As you can see, you can include several attributes of the Subjects, the Subject-Groups, the Events and the CRF's. All this information will be included in your dataset (except the Subject Group Attributes: at version this information will not make it in your dataset). And as you never know when you need this info, just tick all the boxes and your screen will look something like this:

fig. 2: select all attributes

Now we click the button Continue to Define Scope and this sounds impressive, but what is meant is that you can make a selection about which Subjects you would like to include in your dataset, filtered by EnrollmentDate. For now we leave this as it is and click Continue. Give your dataset a name plus a description. You can choose which CRF's should be in the extract: all CRF's, or only marked complete CRF's or only not marked complete.
"And what about this 'MetaDataVersion ODM': what is it and shouldn't we be doing something with that?" you might wonder. Well, considering that we will be extracting our data in SPSS-format, these fields are of no concern to us. (And even if we were extracting in ODM, whichever flavour, it still would be of no concern, only on very rare occasions.)

fig. 3: naming the dataset

Tip: if you're running your Study in an environment with many other Studies, then start the name of your dataset with the short ID of your Study. If you at a later stage want to schedule your extract, this will be very convenient for the OpenClinica-administrator to recognize your dataset.

The next time you want to extract a dataset, you do not need to create it: you can just use this dataset. Click Tasks, Extract Data, View Datasets and find your dataset. Then click on the rightmost icon and the screen Download Data will open and from there you can choose SPSS, etc., all as described below. This is of course provided that you did not change any CRF's and/or Events. If this is the case, create a new set or edit the existing one to accommodate the changes.

You may notice that your extracted set will be overwritten with the new one. If you do not want this, go to the server. In /tomcat/webapps/OpenClinica/WEB-INF/classes locate the file and uncomment the line

can we now, at last, start SPSS?

Not so fast: we only defined which items we want to use. Before we can get these into SPSS we must first export the items and to do that, we must first choose the format in which we want to have the data. Click the Run Now

fig. 4: choosing SPPS

You will see the message Your extract is running. You will receive an email and message when the extract is complete. If you think or know this won't be a long time, then click Back to Dataset and refresh the page a couple of times, until you see your dataset appear in the list of Archive of Exported Dataset Files. Otherwise, look at your inbox for the mail saying your extract has completed.

fig. 5: the dataset ready to download

Now when you click the white down-arrow the download will start of a zip-file named something like Save this in a convenient location, for example C:\OC\SPSS and when you've done that, open the zip and take the two files, SPSS_DAT.dat and SPSS_SPS.sps, and extract them in the same directory. SPSS_SPS.sps is a syntax file which contains the definition and structure for SPSS and SPSS_DAT.dat contains the data

fig. 6: the SPSS-folder with the extracted files

working with SPSS

You may be tempted to start SPSS and try directly to open the SPSS_DAT.dat file, because this is tab-delimited file and SPSS is perfectly able to load that. But much cleaner and easier is to open the SPSS_SPS.sps file, because this contains the syntax to:
1. read the data: GET DATA
2. give labels to the variables: VARIABLE LABELS
3. give, if applicable, labels to valuesVALUE LABELS

In SPSS select File, Open, Syntax. Browse to the location of the sps-file and select it.

fig. 7: opening a syntax-file

At the first line adjust the location and name and make it the correct name of your .dat file for example: change /FILE = 'SPSS_DAT.dat' into /FILE = 'C:\OC\SPSS\SPSS_DAT.dat'.

fig. 8: modifying the location of the dat-file

With the cursor somewhere in the GET DATA part of the syntax, click the big green Start-button to run the syntax. This will open two additional windows, so now you have 3 SPSS windows:
1. The Viewer with a log of the execution of the syntax.
2. The Syntax Editor window and
3. The Data Editor: this window has 2 views: Data View and Variable View

If there were no errors during the execution of the Get-Data-comand, you can switch to the Data Editor window and have a first look at the data: congratulations! But it gets even better than that. Switch back to the Syntax Editor window, put the cursor somewhere in the VARIABLE LABELS part and hit Run again. All your variables in SPSS will now have a label, as is shown in the Data Editor window, with tab Variable View activated. The label is taken from the column DESCRIPTION in your XL-sheet and not from the column LEFT_ITEM_TEXT.
As a last step we swicth back to the Syntax Editor window, put the cursor somewhere in the VALUE LABELS part and hit Run for the third time. This is for all CRF-items of type Radio or Single-select that have RESPONSE_OPTIONS and RESPONSE_VALUES. Without this bit of syntax, only the actual values will be shown in SPSS, so if you for example defined Yes=1 and No=0, then only the ones and zeros are shown. After running the syntax, Yes and No will be shown.

fig. 9: after applying the VALUE LABELS syntax

about E's and C's

You will see something particular about the Variable-names in SPSS: they all have an extension. For example an Item of CRF Lab, asked at Event Baseline, which is called in your CRF VISITEDATE will appear in SPSS as VISITDATE_E1_C1. And if the CRF is used in another Event, then there will also be a VISITDATE_E2_C1.

You will probably know which Events are meant with E1, E2 etc, but others may not. It is a good idea to include in your dataset the attributes of the Events, especially StartDate and Status. When you do this, you will have two variables in SPSS, with Variable-Labels, such as StartDate_E1 "Start Date For Baseline(E1)" / and EventStatus_E1 "Event Status For Visit1(E1)" / and this way it will be perfectly clear which Event is meant with E1.

Things get even more complicated with Items in a RepeatingItemsGroup: every occurrence has a number. So if a Subjects has data in 3 rows for item TestScore, then these will end up in the SPSS data as TestScore_E1_C1_1, TestScore_E1_C1_2 and TestScore_E1_C1_3.

And you probably guessed it: if there are Repeating Events, then we get an extra ordinal. For example there were two occurrences of Event E2 and in the first occurrence two TestScores were entered and in the second one three. This will result in: TestScore_E2_1_C1_1, TestScore_E2_1_C1_2 and TestScore_E2_2_C1_1, TestScore_E2_2_C1_2, TestScore_E2_2_C1_3.

caveats and tips

There are some things you should keep in mind when working with datasets.

First of all: be aware of the fact that OpenClinica exports only variables that have data. Therefore it is a good idea, before you start exporting data, to make a test site and enter data for as many test subjects needed to ensure you have data in all potential data elements. Don't forget to cover different scenarios when you've used SimpleConditionalDisplay. And if you have RepeatingItemGroups, fill in a realistic number of rows for your test Subjects. Also don't forget to create several occurrences of RepeatingEvents.

When repeating item groups are used, it may be wise to export the data for these separate from the rest of the data.

OpenClinica allows case-sensitive names, SPSS does not. This may lead to a situation of two items having perfectly different ItemNames for OpenClinica, but are identical for SPSS. All ItemNames in capitals can be a solution, but if you like case differences in items names for readabilty be aware of the SPSS requirement.

Incorrect use of the WidthDecimal may result in definitions of SPSS-variables with an impossible format, such as F3.6 which is a floating point variable of length 3, with 6 position behind the decimal separator. Even SPSS can't do that. You get a general error message 4 GET DATA (2265) Unrecognized or invalid variable format. The format is invalid. For numeric formats, the width or decimals value may be invalid. That's not very helpfull and you will have to search and search and search.

Maximum Item length in OpenClinica is 255 caracterst for an item name, SPSS only allows 64.

If all of your variables that are defined as REAL's appear as blanks in SPPS, there's probably something wrong with the setting of the decimal separator. For the correct use of decimals add the folowing line: SET LOCALE = '' run this command.

In case you have VALUE LABELS for Items in CRF's used in Repeating Events, then only for the first occurrence of the Event will the Value Label syntax be generated and not for the others. You must manually change RADIO_ITEM_E2_1_C6 to RADIO_ITEM_E2_1_C6, RADIO_ITEM_E2_2_C6

fig. 10: VALUE LABELS syntax for Repeating Events

In Case the SPSS syntax shows the word BLANKS, do a search and replace and remove these.

Other how-to-pages can be found here.

this page was last reviewed November 2013