Hello,
I am currently facing three questions regarding the preserve and restore commands. I would be thankful for any clarifications on this.
(1) My first question is regarding example 1 in the Stata manual preserve.pdf. What exactly is the need for preserve in this example? If the user does not need the data back for further analysis, once she has the results, when she closes her dataset, the dataset will return to its original form unless she decides to save the new data.
(2) On page (1), under options, the manual states "preserve instructs restore to restore the data now, but not to cancel the restoration of the data again at program conclusion. If preserve is not specified, the scheduled restoration at program conclusion is canceled." What does this sentence imply? For every command, would we not always use the preserve command first and then the restore command?
(3) For my code, I am starting by cleaning the data. After I clean the data, I need to start doing different analyses that require the same clean data. I do not want to clean the data every time I work with different research questions, so I am using the preserve and restore commands in Stata. I am assuming the commands would go something like:
*****************************************************************************************
(clean the data)
*preserve the clean data
preserve
(research question 1 analysis)
*saving new research data from research question 1 analysis as a different file
*now I need to restore the clean data for my second research question
restore
*preserve again as I need to keep my original clean data
preserve
(research question 2 analysis)
*saving new research data from research question 2 analysis as a different file
*now restore the clean data
restore
*****************************************************************************************
Once I close my dataset after my final command "restore", I wanted to confirm that the data would go back to its original form (the one that was there before I cleaned the data).
Thank you for your time!
preserve/restore lets you do whatever you want to the data, but any changes you have not saved are lost upon restore, at which point the original data is brought back unaltered.
I use it the way you describe all the time, especially when constructing a dataset.
frames is another way to do what you want.
(1) My first question is regarding example 1 in the Stata manual preserve.pdf. What exactly is the need for preserve in this example? If the user does not need the data back for further analysis, once she has the results, when she closes her dataset, the dataset will return to its original form unless she decides to save the new data.
The -preserve- command is necessary because the code that follows it destroys the original data in order to make a calculation, and this is in the context of a program. In a program, as opposed to a do-file, one is typically writing code that will solve some particular problem, and is intended for general use. By general use, I mean that you do not know who the users of this program will be, and you cannot know what they plan to do with the results your program creates. The user may well need the original data back after the program terminates--and you can't know if this will be the case or not when you write the program. So the general framework for programs is that you avoid "side-effects." That is, when the user uses your program, they can assume that any results calculated will be returned in -r()- or -e()- (or occasionally elsewhere), or in new variables whose names have been specified in the program-call command, and the data set will not have changed from its pre-program state.
If this example were in a do-file, then the -preserve- would indeed be unneeded because you could simply choose in your do-file to exit without saving the modified version of the data. But in a -program- you are programming for some future user whose circumstances and needs you cannot anticipate.
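To make the program case concrete, here is a minimal sketch. It is not the manual's example: the program name ndistinct and the returned scalar r(ndistinct) are hypothetical, and auto.dta only serves as a stand-in dataset. The program destroys the data in memory to count distinct values, yet the caller gets the original data back, because a -preserve- issued inside a program schedules an automatic restore at program conclusion.

```stata
* Hypothetical program: counts the distinct values of one variable and
* returns the count in r(ndistinct) without altering the caller's data.
capture program drop ndistinct
program define ndistinct, rclass
    syntax varname
    preserve                       // snapshot the caller's data
    quietly keep `varlist'         // from here on the data in memory is destroyed
    quietly duplicates drop
    return scalar ndistinct = _N
    // no explicit -restore-: the data is restored when the program ends
end

sysuse auto, clear
ndistinct rep78
display "distinct values of rep78 (missing counts as a value): " r(ndistinct)
describe, short                    // the full dataset is back in memory
```

Without the -preserve- line, the caller would be left holding a one-variable dataset with the duplicates removed, which is exactly the kind of side effect described above.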
(2) On page (1), under options, the manual states "preserve instructs restore to restore the data now, but not to cancel the restoration of the data again at program conclusion. If preserve is not specified, the scheduled restoration at program conclusion is canceled." What does this sentence imply? For every command, would we not always use the preserve command first and then the restore command?
I think what the manual says here, although accurate as a description of what Stata does, fails to make clear the purpose of this. When you -restore- your data, the copy of the original data that Stata made with the -preserve- command is, by default, abandoned. Suppose you want to do something like this:
Code:
use my_data, clear
//  DO SOME CALCULATIONS THAT ADD SOME NEW VARIABLES
//  TO THE DATA
preserve
// PERFORM A CALCULATION THAT DESTROYS THE AUGMENTED DATA
restore
preserve
// PERFORM ANOTHER CALCULATION, STARTING FROM THE PREVIOUSLY AUGMENTED DATA,
// THAT, AGAIN, DESTROYS THE DATA
restore
preserve
// PERFORM YET ANOTHER CALCULATION, STARTING FROM THE PREVIOUSLY AUGMENTED DATA,
// THAT, AGAIN, DESTROYS THE DATA
Notice that you have had to -preserve- the data three times here. Each of those requires a disk write operation*, which, if your data set is large, can be very time-consuming. Imagine, in fact, that instead of a sequence of three calculations, it was a calculation in a loop being iterated maybe thousands of times. You will be thrashing the disk till the cows come home! Wouldn't it make more sense to write the data to the disk just once and, at each -restore-, just read in that same first copy? That's what -restore, preserve- allows you to do. Rather than losing the -preserve-d data, -restore, preserve- brings that data back into memory, but also retains the -preserve-d copy for future use, or for automatic restoration at the end.
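The loop scenario can be sketched as follows. This is an illustration under assumptions, not code from the thread: auto.dta stands in for the real data, and price_k is a made-up variable playing the role of the "augmented" data. The data is preserved once before the loop, and each pass ends with -restore, preserve-, so the same preserved copy is reused every time.

```stata
sysuse auto, clear
generate double price_k = price / 1000    // hypothetical augmentation step

preserve                                  // the only -preserve- in the whole run
forvalues r = 1/5 {
    quietly keep if rep78 == `r'          // destroys the data in memory
    quietly summarize price_k
    display "rep78 = `r': mean price in thousands = " %6.3f r(mean)
    restore, preserve                     // bring the copy back and keep it for the next pass
}
restore, not                              // data is already restored; just discard the copy
describe, short
```

The closing -restore, not- is a design choice: after the last -restore, preserve- the augmented data is already in memory, so a plain -restore- would only reload it once more before discarding the copy.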
(3) For my code, I am starting by cleaning the data. After I clean the data, I need to start doing different analyses that require the same clean data. I do not want to clean the data every time I work with different research questions, so I am using the preserve and restore commands in Stata. I am assuming the commands would go something like:...
That code will perform as you expect. But, in light of the explanation given for (2), a more efficient version would be:
Code:
(clean the data)
*preserve the clean data
preserve
(research question 1 analysis)
*saving new research data from research question 1 analysis as a different file
*now I need to restore the clean data for my second research question
restore, preserve
*preserve again as I need to keep my original clean data
// preserve THIS COMMAND DELETED
(research question 2 analysis)
*saving new research data from research question 2 analysis as a different file
*now restore the clean data
restore
*In the versions of Stata since -frame-s were introduced, the -preserve-d data may be written to a frame in memory rather than to disk. This will, in general, be faster than a disk write, although there may be considerable overhead negotiating with the operating system to get the required memory allocated. So the use of -restore, preserve- to reduce the number of -preserve- operations applied to the exact same data is still a good idea.
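For the frames route mentioned in #2, a rough sketch of the same workflow might look like this. It assumes Stata 16 or later; the frame names rq1 and rq2, the output file name, and the analyses themselves are hypothetical placeholders, and auto.dta stands in for the cleaned data. Each research question works on its own copy, so the clean data in the default frame is never touched and never needs restoring.

```stata
sysuse auto, clear                        // stand-in for "(clean the data)"

* Research question 1: analyse a copy held in a separate frame
frame copy default rq1, replace
frame rq1 {
    collapse (mean) price, by(foreign)
    save "rq1_results.dta", replace       // hypothetical output file name
}

* Research question 2: another independent copy
frame copy default rq2, replace
frame rq2 {
    quietly keep if !missing(rep78)
    regress price mpg
}

frame drop rq1                            // free the memory held by the copies
frame drop rq2
describe, short                           // the clean data is still in the default frame
```

The trade-off is memory: every copy lives in memory alongside the clean data until its frame is dropped.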
Edit: Crossed with #2.
Code:
restore, preserve
// PERFORM ANOTHER CALCULATION, STARTING FROM THE PREVIOUSLY AUGMENTED DATA,
// THAT, AGAIN, DESTROYS THE DATA
restore, preserve
// PERFORM YET ANOTHER CALCULATION, STARTING FROM THE PREVIOUSLY AUGMENTED DATA,
// THAT, AGAIN, DESTROYS THE DATA
With this code, the data is only written to disk once, with the -preserve- command, but is reused with each -restore, preserve- command.
Thank you so much for your time on this. I understand this now!