\documentclass{article}
\usepackage{times}%
\usepackage{multicol}%
\usepackage{noweb}%
\input nowebmargins%
\pagestyle{noweb}%
%\addtolength{\headwidth}{\marginparsep}
%\addtolength{\headwidth}{\marginparwidth}
\begin{document}
\title{Citation Statistics for Ingram Olkin}
\author{B. Narasimhan\\
Department of Statistics\\
Stanford University\\
Stanford, CA 94305}
\date{\today}
\maketitle

\section{Introduction}
\label{sec:intro}
Ingram Olkin wanted the following statistics for the period 1985--1994.
\begin{enumerate}
\item The number of single-author papers,
\item The number of multiple-author papers with names in alphabetic order, and
\item The number of multiple-author papers with names not in alphabetic order.
\end{enumerate}
We present a quick-and-dirty Perl hack to do this.

\section{The Perl Program}
\label{sec:prog}
The program is structured as follows.
<<*>>=
#! /usr/local/bin/perl
<>
<>
<>
<>
@
\subsection{Copyright}
\label{sec:copyright}
<>=
#
# $Revision: 1.1 $
#
# Copyright (C) 1997, B. Narasimhan (naras@stat.Stanford.EDU)
#
# This program is free software; you can redistribute it and/or modify
# it under the terms of the GNU General Public License as published by
# the Free Software Foundation; either version 2 of the License, or
# (at your option) any later version.
#
# This program is distributed in the hope that it will be useful,
# but WITHOUT ANY WARRANTY; without even the implied warranty of
# MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the
# GNU General Public License for more details.
#
# You should have received a copy of the GNU General Public License
# along with this program; if not, write to the Free Software
# Foundation, Inc., 675 Mass Ave, Cambridge, MA 02139, USA.
#
#
@
\subsection{Local Variables}
\label{sec:local-vars}
We need to know where the CIS files are located. We use
[[\$cis_directory]] for the directory and [[\$cis_ext]] for the
extension of the file names.
<>=
$cis_directory = '/usr/local/lib/cis/cis95/files';
$cis_ext = '.v95';
@ %def $cis_directory $cis_ext
@ The years in question.
<>=
@years = (85..94);
@ %def @years
@ The counters, all initialized to zero, are as follows.
\begin{description}
\item[\texttt{\$sj}] Number of single author journal articles.
\item[\texttt{\$sp}] Number of single author proceedings articles.
\item[\texttt{\$mj}] Number of multiple author journal articles.
\item[\texttt{\$mp}] Number of multiple author proceedings articles.
\item[\texttt{\$mja}] Number of multiple author journal articles with
  author names in alphabetic order.
\item[\texttt{\$mpa}] Number of multiple author proceedings articles
  with author names in alphabetic order.
\end{description}
<>=
$sj = 0;   # Number of single author journal articles.
$sp = 0;   # Number of single author proceedings articles.
$mj = 0;   # Number of multiple author journal articles.
$mja = 0;  # Number of multiple author alphabetic journal articles.
$mp = 0;   # Number of multiple author proceedings articles.
$mpa = 0;  # Number of multiple author alphabetic proceedings articles.
@ %def $sj $sp $mj $mp $mja $mpa
@
\subsection{Processing the Files}
\label{sec:process-files}
For each year, we check that the corresponding file exists and is a
text file, and then open it. Files that are missing or cannot be
opened are skipped.
<>=
YEAR:
foreach $year (@years) {
    $filename = $cis_directory . '/' . 'cis' . $year . $cis_ext;
    next YEAR unless -T $filename;
    if (!open(FH, $filename)) {
        print STDERR "Can't open $filename---continuing...\n";
        next YEAR;
    }
    while (<FH>) {
        <>
        <>
        <>
        <>
    }
}
@ Splitting the fields is trivial once we know the format of the
records: the fields are separated by [[#]] characters. Notice that we
ignore the trailing `garbage' in the records.
<>=
($null,$field1,$title,$authors) = split('#');
@ %def $null $field1 $title $authors
@ Next we skip irrelevant records: books, electronic publications and
other administrative records.
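For instance, the following standalone sketch, which is not part of
the program, splits one record and applies the filter. The record shown
is made up for illustration; the actual CIS record layout is not
reproduced in this document, so the contents of the first field and the
trailing field are assumptions.
\begin{verbatim}
use strict;
use warnings;

# Hypothetical record: fields separated by '#', leading field empty.
my $record = '#0123J#A Note on Inequalities#Olkin, I.;Marshall, A. W.#trailing';
my (undef, $field1, $title, $authors) = split /#/, $record;

# The citation indicator is the last character of the first field.
my $cite_ind = substr($field1, -1, 1);

# Records whose indicator is B, b, C or Z are skipped.
my $status = ($cite_ind =~ /[BbCZ]/) ? 'skipped' : 'kept';
print "$cite_ind: $status\n";   # prints "J: kept"
\end{verbatim}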
<>=
$cite_ind = substr($field1,-1,1);
# Skip books, electronic literature or administrative records
next if ($cite_ind =~ /[Bb]/) || ($cite_ind =~ /C/) || ($cite_ind =~ /Z/);
@ %def $cite_ind
@ There are also a number of entries for reviews of books, which we
should skip too.
<>=
# Skip reviews of books.
next if ($authors =~ /\(Rev\)/);
@ Multiple authors are separated by semicolons in the author field.
<>=
@authors = split(';', $authors);
$no_authors = $#authors + 1;
@ %def $no_authors
@ We are now down to the last part: updating counts. To detect
whether the authors are listed alphabetically, we sort the
[[@authors]] array and join it back into a string with semicolons,
just as the field would look if the authors were listed in alphabetic
order. If the newly constructed string equals the field obtained
from the file, the authors are indeed in alphabetic order. All that
remains is to update the relevant counters.
<>=
if ($no_authors > 1) {
    $sorted_authors = join(';', sort @authors);
}
if ($cite_ind =~ /[Jj]/) { # article in Journal
    if ($no_authors > 1) {
        $mj++;
        if ($sorted_authors eq $authors) {
            $mja++;
        }
    } else {
        $sj++;
    }
} else { # article in proceedings or edited book
    if ($no_authors > 1) {
        $mp++;
        if ($sorted_authors eq $authors) {
            $mpa++;
        }
    } else {
        $sp++;
    }
}
@
\section{Printing Results}
\label{sec:print-results}
This is straightforward. The non-alphabetic counts are obtained by
subtracting the alphabetic counts from the multiple-author totals.
<>=
print "Statistics from CIS for years @years \n";
printf "Single author papers in Journals: %d\n", $sj;
printf "Multiple author papers in Journals (alph): %d\n", $mja;
printf "Multiple author papers in Journals (Non-alph): %d\n", $mj - $mja;
printf "Single author papers in Proceedings: %d\n", $sp;
printf "Multiple author papers in Proceedings (alph): %d\n", $mpa;
printf "Multiple author papers in Proceedings (Non-alph): %d\n", $mp - $mpa;
@
\section{Index of Code Chunks}
\label{sec:code-chunks}
This list is generated automatically. The numeral is that of the
first definition of the chunk.
\nowebchunks

\section{Index of Identifiers}
\label{sec:identifiers}
Here is a list of the identifiers used, and where they appear.
Underlined entries indicate the place of definition. This index is
generated automatically.

\nowebindex
\end{document}

% Local Variables: %
% TeX-master: t %
% mode: LaTeX %
% cweb-prog-language: "PERL" %
% mode: cweb %
% End: %