I recently interviewed at Google for a system administrator position. A few friends have expressed interest in the questions. Here’s the quick review in case you’re curious…

First session, one hour, two interviewers:

I think I did pretty well on file systems, MySQL, and networking. File systems: What is a journalled filesystem? What are its advantages? Can you think of a situation where journalling might cause problems?

MySQL: Say you have a newspaper company (or companies), and each company owns one or more newspapers. Newspapers have issues, issues contain stories, and stories are attributed to reporters (say one reporter per story). Draw a schema on the board representing the tables you might use and their relationships.
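For the curious, here is one possible shape for that schema, sketched with Python's built-in sqlite3 module as a stand-in for MySQL. The table and column names are assumptions made up for illustration, not what was drawn on the board, and the "one reporter per story" rule is modelled as a plain foreign key on the story table.

```python
import sqlite3

# Hypothetical table and column names; each child row points at its parent.
SCHEMA = """
CREATE TABLE company   (id INTEGER PRIMARY KEY, name TEXT NOT NULL);
CREATE TABLE newspaper (id INTEGER PRIMARY KEY, name TEXT NOT NULL,
                        company_id INTEGER NOT NULL REFERENCES company(id));
CREATE TABLE issue     (id INTEGER PRIMARY KEY, published TEXT NOT NULL,
                        newspaper_id INTEGER NOT NULL REFERENCES newspaper(id));
CREATE TABLE reporter  (id INTEGER PRIMARY KEY, name TEXT NOT NULL);
CREATE TABLE story     (id INTEGER PRIMARY KEY, headline TEXT NOT NULL,
                        issue_id INTEGER NOT NULL REFERENCES issue(id),
                        reporter_id INTEGER NOT NULL REFERENCES reporter(id));
"""


def build_db():
    """Create an in-memory database holding the sketch schema."""
    conn = sqlite3.connect(":memory:")
    conn.execute("PRAGMA foreign_keys = ON")
    conn.executescript(SCHEMA)
    return conn


def stories_for_company(conn, company_name):
    """Walk company -> newspaper -> issue -> story and list (headline, reporter)."""
    rows = conn.execute(
        """SELECT story.headline, reporter.name
           FROM story
           JOIN reporter  ON reporter.id  = story.reporter_id
           JOIN issue     ON issue.id     = story.issue_id
           JOIN newspaper ON newspaper.id = issue.newspaper_id
           JOIN company   ON company.id   = newspaper.company_id
           WHERE company.name = ?
           ORDER BY story.id""",
        (company_name,),
    )
    return rows.fetchall()
```

If a story could have several reporters, the reporter_id column would give way to a separate story/reporter link table.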

Networking: Forgot what the question was here… I’ll check my notes and possibly update this entry if I find anything.

Did reasonably well on creating a sample script starting with L1=“words words words” L2=“words words words” and ending with L3 containing the words in L1 that are *not* also in L2. I got the basic concept but probably lost points on the “it has to run” part. My answer drew heavily on having solved this “subtract” type of problem before in a script, but that one was for files. Based on that, I proposed two for loops: one echoes the words from L1 to a temp file, one per line and each prefixed by “9”, and the other echoes the words from L2 to the same file prefixed by “1”. The file is then munged with sort, uniq, and grep so that only one copy of each duplicated word is kept (the copy prefixed by “1”); a final grep pulls out the lines still prefixed by “9”, which are the words from L1 that survived because they were unique. Kind of an “ugly hack”, but I readily admitted that I had cribbed it from a previous script I wrote to subtract files of arbitrary length, not to subtract sets of words given on one line.
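A much shorter route to the same result is a set lookup. The sketch below is in Python rather than shell, so it is not the temp-file approach described above; the function name and the sample words are made up for illustration.

```python
def subtract_words(l1, l2):
    """Return the words of l1 that do not appear in l2, keeping l1's order."""
    exclude = set(l2.split())
    return " ".join(word for word in l1.split() if word not in exclude)


L1 = "apple banana cherry banana"
L2 = "banana date"
L3 = subtract_words(L1, L2)
print(L3)  # apple cherry
```

Unlike a sort/uniq pipeline, this keeps the original word order and does not collapse repeated words in L1 that are absent from L2.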

Second session, one hour, two more interviewers:

Did OK on DNS/resolver. The question started with “If I type ping www.google.com into the shell, describe what happens in terms of resolving and DNS.” My answer involved talking through the types of queries and the root nameservers, though I missed nsswitch.conf and described a resolver that went straight to resolv.conf. I also gave one wrong answer, then corrected myself, when describing exactly what was in each query and response packet.

Did OK on the “write a script to identify users whose home dir is not /home/$user and move their directories” question… I went for rewriting /etc/passwd and forgot about NIS/LDAP users, but mostly I showed good understanding of Perl and regex.
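The "identify" half of that question can be sketched in a few lines. This is in Python, not the Perl used in the interview, and it only reports the mismatches; it does not move anything. The UID cutoff of 1000 for "real" users and the sample accounts are assumptions. Feeding it the output of getent passwd instead of the raw /etc/passwd file should also pick up NIS/LDAP accounts on systems where those are configured through nsswitch.conf.

```python
def misplaced_homes(passwd_text, base="/home", min_uid=1000):
    """List (user, current_home, expected_home) for accounts whose home differs.

    min_uid is an assumed cutoff that skips system accounts such as root.
    """
    result = []
    for line in passwd_text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue
        fields = line.split(":")
        if len(fields) < 7 or not fields[2].isdigit():
            continue  # malformed entry, skip it
        user, uid, home = fields[0], int(fields[2]), fields[5]
        expected = f"{base}/{user}"
        if uid >= min_uid and home != expected:
            result.append((user, home, expected))
    return result


# Made-up sample in passwd(5) format.
SAMPLE = """\
root:x:0:0:root:/root:/bin/bash
alice:x:1000:1000:Alice:/home/alice:/bin/bash
bob:x:1001:1001:Bob:/export/users/bob:/bin/bash
"""

print(misplaced_homes(SAMPLE))  # [('bob', '/export/users/bob', '/home/bob')]
```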

I believe I did well on describing what happens when you’re tailing a logfile and the file is moved, and what to do.
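The short version of the usual answer: a plain tail -f keeps following the open file (the inode), so after a rename it keeps reading the moved file and never sees a new file created under the old name; GNU tail -F follows the name and reopens it. A minimal sketch of the same check in Python, assuming a POSIX filesystem, with a made-up log file name:

```python
import os
import tempfile


def was_rotated(open_file, path):
    """True if `path` no longer refers to the file we have open."""
    try:
        current = os.stat(path)
    except FileNotFoundError:
        return True  # moved away and nothing has replaced it yet
    ours = os.fstat(open_file.fileno())
    return (ours.st_dev, ours.st_ino) != (current.st_dev, current.st_ino)


def demo():
    """Simulate a log being moved aside and recreated; return the three checks."""
    results = []
    with tempfile.TemporaryDirectory() as d:
        path = os.path.join(d, "app.log")  # hypothetical log name
        open(path, "w").close()
        with open(path) as follower:
            results.append(was_rotated(follower, path))  # still the same file
            os.rename(path, path + ".1")
            results.append(was_rotated(follower, path))  # name is gone
            open(path, "w").close()
            results.append(was_rotated(follower, path))  # new inode under old name
    return results


print(demo())  # [False, True, True]
```

A follower that sees True would finish reading the old handle, then reopen the path.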

I correctly described the difference between hard links and soft links. I messed up on “what’s in an inode vs. what’s in a dir entry” but corrected myself with a little prompting.
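Both questions come down to the same picture: a directory entry maps a name to an inode number, while the inode holds the metadata (owner, mode, size, timestamps, link count) and points at the data. A hard link is a second name for the same inode; a soft link is a separate file that stores a path. A small sketch, assuming a POSIX filesystem that allows hard links, with made-up file names:

```python
import os
import tempfile


def link_demo():
    """Create a hard link and a symlink to one file, then delete the original."""
    with tempfile.TemporaryDirectory() as d:
        target = os.path.join(d, "target.txt")
        hard = os.path.join(d, "hard.txt")
        soft = os.path.join(d, "soft.txt")
        with open(target, "w") as f:
            f.write("hello\n")
        os.link(target, hard)     # second directory entry, same inode
        os.symlink(target, soft)  # new inode whose content is a path
        facts = {
            "same_inode": os.stat(target).st_ino == os.stat(hard).st_ino,
            "link_count": os.stat(target).st_nlink,
            "soft_is_link": os.path.islink(soft),
        }
        os.remove(target)  # drops one name; the inode lives on via hard.txt
        with open(hard) as f:
            facts["hard_still_readable"] = f.read() == "hello\n"
        # os.path.exists follows the symlink, so a dangling link reports False.
        facts["soft_dangles"] = os.path.islink(soft) and not os.path.exists(soft)
        return facts


print(link_demo())
```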

Lunch break. Google cafe rocks. I had braised ox tail and dry-braised string beans.

Third session, one hour, hiring manager only:

Mgr asked me “why do you like being a sysadmin?” (A: I enjoy problem solving, among other things). Mgr also asked me to come up with a process for upgrading the kernel on 10,000 machines; I believe I did well on that. I started off describing a complete system upgrade via kickstart, but then with a little prompting also described installing kernel RPMs while the system is still running and then doing controlled reboots.

He then showed a network diagram and asked me to walk through troubleshooting steps in the case of a certain user complaining that access to his mailserver is slow. I asked mostly the right troubleshooting questions, though I assumed, perhaps incorrectly, that the problem wasn’t congestion on the link, because only high latency was observed and no packet loss. My reasoning was that if you max out a link, even if the router can buffer a second or two of traffic, eventually the buffer fills up and something has to be dropped. Eventually I walked through enough “virtual troubleshooting” to determine that the link had been running close to capacity since 2am according to MRTG, and that reporting on the flows showed most of the traffic on TCP port 3389. I didn’t immediately see the significance of that port, but I suggested tracking down the two machines involved in the heavy conversation and running netstat -ap to see what was listening on it.

I asked Mgr how big the team is, how it is structured, and how many levels of management there are between sysadmins and the CEO.

I kept some notes from when I had a phone interview (about a month before). This was one person, one hour.

I think I did well there too. He asked me about the difference in quoting styles in shell and/or Perl, and to describe how I would write a script to parse /etc/passwd to get a list of users. He asked for a simple command to transform comma-separated files into tab-separated; I said “sed -e 's/,/<TAB>/'” where “<TAB>” is a literal ^I or ^V^I depending on the shell.
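Two caveats on the sed one-liner: a substitution needs the g flag to replace every comma on a line rather than just the first, and a plain text replacement also converts commas that sit inside quoted fields. A minimal sketch using Python's csv module, which does understand quoting; the function name is made up:

```python
import csv
import io


def csv_to_tsv(text):
    """Convert comma-separated text to tab-separated, respecting quoted fields."""
    out = io.StringIO()
    writer = csv.writer(out, delimiter="\t", lineterminator="\n")
    for row in csv.reader(io.StringIO(text)):
        writer.writerow(row)
    return out.getvalue()


print(repr(csv_to_tsv('a,b,c\n1,"x,y",3\n')))  # 'a\tb\tc\n1\tx,y\t3\n'
```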

He asked how I would delete a file named “-f” and I gave a bunch of alternatives, eventually coming to the one he was looking for. He asked for a description of how to compile a Linux kernel (I think I did OK on that one) and also asked me to describe in detail what LILO actually does.
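Which alternative he was looking for isn't recorded here, but two that are known to work are ending option parsing with "--" and giving the name a path prefix so it no longer looks like an option. A sketch that drives rm from Python, assuming a Unix-like system with rm on the PATH; the function and mode names are made up:

```python
import os
import subprocess
import tempfile


def remove_dash_f(directory, how="double-dash"):
    """Delete a file literally named '-f' inside `directory` using rm."""
    if how == "double-dash":
        cmd = ["rm", "--", "-f"]  # "--" tells rm that options are over
    else:
        cmd = ["rm", "./-f"]      # with a path prefix it is plainly a file name
    subprocess.run(cmd, cwd=directory, check=True)


def demo(how):
    """Create '-f' in a scratch directory, remove it, and report success."""
    with tempfile.TemporaryDirectory() as d:
        victim = os.path.join(d, "-f")
        open(victim, "w").close()
        remove_dash_f(d, how)
        return not os.path.exists(victim)


print(demo("double-dash"), demo("dot-slash"))  # True True
```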

Finally we talked about CIDR, netmasks, and how to figure out the netmask for a /22 net in some detail, which took me longer to describe over the phone than it would have to draw and point at, but I eventually got through it. I also described my “netmask shortcut”, which is to figure out how many addresses are in the network based on how many bits it has, then subtract that number from 256 to get the last byte (for example, /28 is 4 bits smaller than /24, 2^4 is 16, 256-16=240, so 255.255.255.240)… or if the network is larger than /24, count how many bits removed from /24 it is, use that to figure out how many /24s it contains, and apply the same logic to the third octet instead (/19 is 5 bits from /24, so it is the size of 32 class C’s, so 256-32 is 224, and you get 255.255.224.0; not that you would ever build a single network with 8192 addresses, but if you did, that would be your netmask).
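The shortcut is easy to check mechanically. The sketch below implements it for /16 through /32 only (an arbitrary limit chosen for this example) and compares it against Python's ipaddress module; the function name is made up.

```python
import ipaddress


def netmask_shortcut(prefix):
    """Dotted netmask for /16 through /32 using the 256-minus-size trick."""
    if not 16 <= prefix <= 32:
        raise ValueError("this sketch only covers /16 through /32")
    if prefix >= 24:
        # Size of the network in addresses, subtracted from 256, is the last octet.
        return f"255.255.255.{256 - 2 ** (32 - prefix)}"
    # Larger than a /24: count the /24s it contains and apply that to the third octet.
    return f"255.255.{256 - 2 ** (24 - prefix)}.0"


for prefix in (28, 22, 19):
    reference = ipaddress.ip_network(f"0.0.0.0/{prefix}")
    print(prefix, netmask_shortcut(prefix), reference.netmask, reference.num_addresses)
# 28 255.255.255.240 255.255.255.240 16
# 22 255.255.252.0 255.255.252.0 1024
# 19 255.255.224.0 255.255.224.0 8192
```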