Adding the condition if missing(gnppc) restricts the list to cases where gnppc is missing. Note that Stata lists missing values using a dot. We'll learn more about missing values in a later section.

1.1.6 Drawing a Scatterplot

To see how life expectancy varies with GNP per capita we will draw a scatterplot using the graph command, which has a myriad of subcommands and options, some of which we describe in a later section.

    . graph twoway scatter lexp gnppc
    . graph export g.png, width(500) replace
    (file g.png written in PNG format)

The plot shows a curvilinear relationship between GNP per capita and life expectancy.
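As a minimal sketch of the options mentioned above, the same plot can be given explicit titles before it is exported. The title text and the file name lexp_gnppc.png are assumptions made for this example, not part of the original session.

```stata
* Load the sample data shipped with Stata
sysuse lifeexp, clear

* Scatterplot of life expectancy against GNP per capita, with explicit titles
* (the title strings are illustrative choices)
graph twoway scatter lexp gnppc, title("Life expectancy and GNP per capita, 1998") xtitle("GNP per capita") ytitle("Life expectancy at birth")

* Save the graph as a PNG file 500 pixels wide
* (lexp_gnppc.png is a hypothetical file name)
graph export lexp_gnppc.png, width(500) replace
```

The replace option overwrites an existing file with the same name, which is convenient when a script is run repeatedly.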

1.1.5 Descriptive Statistics

Let us run simple descriptive statistics for the two variables we are interested in, using the summarize command followed by the names of the variables (which can be omitted to summarize everything).

    . summarize lexp gnppc

    Variable    Obs        Mean      Std. Dev.     Min      Max
    lexp                [...].27941
    gnppc        63    8674.857    10634.[...]     370    39980

We see that life expectancy averages [...].3 years and GNP per capita ranges from 370 to 39,980 with an average of 8,675. We also see that Stata reports only 63 observations on GNP per capita, so we must have some missing values. Let us list the countries for which we are missing GNP per capita.

    . list country gnppc if missing(gnppc)

We see that we have indeed five missing values. This example illustrates a powerful feature of Stata: the action of any command can be restricted to a subset of the data. If we had typed list country gnppc we would have listed these variables for all 68 countries.
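The same if condition works with other commands. The lines below are a sketch rather than part of the original session; the 20,000 cutoff is an arbitrary value chosen for illustration.

```stata
* Load the sample data shipped with Stata
sysuse lifeexp, clear

* Count the observations where gnppc is missing (the text reports five)
count if missing(gnppc)

* Summarize life expectancy only for countries with GNP per capita recorded
summarize lexp if !missing(gnppc)

* List countries with GNP per capita above 20,000 (an arbitrary cutoff).
* Stata treats missing values as larger than any number, so they must be
* excluded explicitly when using the > operator.
list country gnppc if gnppc > 20000 & !missing(gnppc)
```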

To load the file we want type sysuse lifeexp (the file extension is optional). To see what's in the file type describe. (This command can be abbreviated to a single letter, but I prefer desc.)

    . sysuse lifeexp, clear
    (Life expectancy, 1998)

    . desc

    Contains data from C:Program Files
      obs:     68        Life expectancy, 1998
     vars:      6        09:40
     size:  2,652        (_dta has notes)

                   storage  display  value
    variable name  type     format   label    variable label
    region         byte     .0g      region   Region
    country        str28    28s               Country
    popgrowth      float    .0g             * Avg. annual growth
    lexp           byte     .0g             * Life expectancy at birth
    gnppc          float    .0g             * GNP per capita
    safewater      byte     .0g             *
                                            * indicated variables have notes
    Sorted by:

We see that we have six variables. The dataset has notes that you can see by typing notes. Four of the variables have annotations that you can see by typing notes varname. You'll learn how to add notes in a later section.
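As a short sketch, the commands for inspecting the notes look like this. The variable lexp is used only as an example of a variable that describe flags as having notes.

```stata
* Load the sample data shipped with Stata
sysuse lifeexp, clear

* Show all notes attached to the dataset and to its variables
notes

* Show only the notes attached to one variable (lexp is an example)
notes lexp
```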

Stata can also compute tail probabilities for the normal, chi-squared and F distributions, among others.

One of the nicest features of Stata is that, starting with version 11, all the documentation is available in PDF files. (In fact it looks as if starting with version 13 you can no longer get printed manuals.) Moreover, these files are linked from the online help, so you can jump directly to the relevant section of the manual. To learn more about the help system type help help.

1.1.4 Loading a Sample Data File

Stata comes with a few sample data files. You will learn how to read your own data into Stata in Section 2, but for now we will load one of the sample files, namely lifeexp.dta, which has data on life expectancy and gross national product (GNP) per capita in 1998 for 68 countries. To see a list of the files shipped with Stata type sysuse dir.

1.1.3 Getting Help

Stata has excellent online help. To obtain help on a command (or function) type help command_name, which displays the help on a separate window called the Viewer. (You can also type chelp command_name, which shows the help on the Results window, but this is not recommended.) Or just select Help > Command on the menu system. Each help file appears in a separate Viewer tab (a separate window before Stata 12) unless you use the option , nonew. If you don't know the name of the command you need, you can search for it.

Stata has a search command that will search the documentation and other resources; type help search to learn more. By default this command searches the net in Stata 13 and later. If you are using an earlier version, learn about the findit command. Also, the help command reverts to a search if the argument is not recognized as a command. Try help Student's t. This will list all Stata commands and functions related to the t distribution. Among the list of "Stat functions" you will see t for the distribution function and ttail for right-tail probabilities.

1.1.2 Typing Commands

Stata can work as a calculator using the display command. Try typing the following (excluding the dot at the start of a line, which is how Stata marks the lines you type):

    . display 2 * ttail(20, 2.1)
    .04861759

Stata commands are case-sensitive: display is not the same as Display, and the latter will not work. Commands can also be abbreviated; the documentation and online help underline the shortest legal abbreviation of each command and we will do the same here. The command above shows the use of a built-in function to compute a p-value, in this case twice the probability that a Student's t with 20 degrees of freedom exceeds 2.1.

This result would just make the 5% cutoff. To find the two-tailed 5% critical value try display invttail(20, 0.025). We list a few other functions you can use in a later section.

If you issue a command and discover that it doesn't work, press the Page Up key to recall it (you can cycle through your command history using the Page Up and Page Down keys) and then edit it using the arrow, insert and delete keys. For example, Arrows advance a character at a time and Ctrl-Arrows advance a word at a time. Shift-Arrows select a character at a time and Shift-Ctrl-Arrows select a word at a time, which you can then delete or replace. A command can be as long as needed (up to some 64k characters); in an interactive session you just keep on typing and the command window will wrap and scroll as needed.
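The calculator-style commands discussed in this section can be collected in a short sketch. The normal, chi-squared and F lines are additions for illustration, and their arguments (1.96, 3.84 and 4.35) are arbitrary values chosen for this example.

```stata
* Two-sided p-value for a Student's t of 2.1 with 20 degrees of freedom
display 2 * ttail(20, 2.1)

* Two-tailed 5% critical value for a t with 20 degrees of freedom
display invttail(20, 0.025)

* Analogous tail probabilities for other distributions
* (the arguments below are illustrative values)
* Two-sided tail probability for a standard normal
display 2 * (1 - normal(1.96))
* Right-tail probability for a chi-squared with 1 degree of freedom
display chi2tail(1, 3.84)
* Right-tail probability for an F with 1 and 20 degrees of freedom
display Ftail(1, 20, 4.35)
```

Lines starting with an asterisk are comments, so the block can be pasted into the command window or saved in a do file.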

Finally, it is possible to change the color scheme, selecting from seven preset or three customizable styles. One of the preset schemes is classic, the traditional black background used in earlier versions of Stata.

There are other windows that we will discuss as needed, namely the Graph, Viewer, Variables Manager, Data Editor, and Do-file Editor.

Starting with version 8 Stata's graphical user interface (GUI) allows selecting commands and options from a menu and dialog system. However, I strongly recommend using the command language as a way to ensure reproducibility of your results. In fact, I recommend that you type your commands on a separate file, called a do file, as explained below, but for now we will just type in the command window. The GUI can be helpful when you are starting to learn Stata, particularly because after you point and click on the menus and dialogs, Stata types the corresponding command for you.

Review on the left, so you can keep track of the commands you have used. The window labeled Variables, on the top right, lists the variables in your dataset. The Properties window immediately below that, introduced in version 12, displays properties of your variables and dataset.

You can resize or even close some of these windows. Stata remembers its settings the next time it runs. You can also save (and then load) named preference sets using the menu Edit > Preferences. I happen to like the Compact Window Layout. You can also choose the font used in each window: just right-click and select Font from the context menu; my own favorite on Windows is Lucida Console.

Local Note: At OPR you can access Stata/SE on Windows by running the network version on your own workstation; just create a shortcut to (If you have a 64-bit workstation change the program name to statase-64.exe.) For computationally intensive jobs you may want to log in to Coale via remote desktop and run Stata/SE there. If you prefer Unix systems, log on to our Unix server lotka via X-Windows and leave your job running there.

1.1.1 The Stata Interface

When Stata starts up you see five docked windows, initially arranged as shown in the figure below. The window labeled Command is where you type your commands. Stata then shows the results in the larger window immediately above, called appropriately enough Results. Your command is added to a list in the window labeled

Version 14 added Unicode support, which will come in handy when we discuss multilingual labels in a later section. Version 15 includes, among many new features, graph color transparency or opacity, which we'll also use later.

Stata is available for Windows, Unix, and Mac computers. This tutorial focuses on the Windows version, but most of the contents apply to the other platforms as well. The standard version is called Stata/IC (or Intercooled Stata) and can handle up to 2,047 variables. There is a special edition called Stata/SE that can handle up to 32,766 variables (and also allows longer string variables and larger matrices), and a version for multicore/multiprocessor computers called Stata/MP, which allows larger datasets and is substantially faster. The number of observations is limited by your computer's memory, as long as it doesn't exceed about two billion in Stata/SE and about a trillion in Stata/MP. There are versions of Stata for 32-bit and 64-bit computers; the latter can handle more memory (and hence more observations) and tend to be faster.

This tutorial is an introduction to Stata emphasizing data management and graphics. A PDF version is available here. The web pages and PDF file were all generated from a Stata/Markdown script using the markstat command described here. For a complementary discussion of statistical models see the Stata section of my GLM course.

Stata is a powerful statistical package with smart data-management facilities, a wide array of up-to-date statistical techniques, and an excellent system for producing publication-quality graphs. Stata is fast and easy to use. In this tutorial I start with a quick introduction and overview and then discuss data management, statistical graphs, and Stata programming. The tutorial has been updated for version 15, but most of the discussion applies to versions 8 and later.

