Archive

Posts Tagged ‘computing’

A small observation regarding usability and Gnome 3

September 17, 2011 1 comment

I’ve been running Nautilus 3.0.2 for the last few weeks, I noticed the delete key didn’t actually delete files so I’ve been painstakingly deleting files by right click then “Move to Trash” or “Delete” from the menu. I always thought this was a bug and it would get fixed soon, but after getting sick of this workaround I finally did a Google search and found that this is by design. Now you need to use Ctrl+Del to send to trash and Shift+Del to skip the trash.

I think this is a poor choice, my reasoning is because now you have to use modifier+Del for both send to trash and skip trash. It makes sense to have Del as safe delete (send to trash) and a modified+Del for hard delete (no trash) that is logical and it is easy to remember the difference between the two. But now in this new situation a large majority of users (me included) won’t be able to remember which modifier does what. Is it Ctrl or Shift and will just pick one at random, giving you a 0.5 chance of doing the actual type of delete you wanted.

Also I don’t see any reason to add more safety to the straight delete key, after all it only goes to the trash so you can recover it if you accidentally hit the key. If you want add a sound upon sending something to the trash as an audio cue.

The number one thing I learned in my Human Computer Interaction course at uni was get feedback from the users (this isn’t just asking for feedback, but also observing how they use the software). This is my feedback as a user, and if Gnome developers want to improve usability then they should take in the feedback, i.e. read this.

Advertisement
Categories: Uncategorized Tags: , ,

I’m halfway towards an online home…

July 31, 2011 Leave a comment

Just recently I signed up with Linode for a virtual private server. I should have done this a long time ago but payment methods and too much choice lead me to put it off. So far I’m quite happy with the price and the quality of the product.

I think I would have preferred just buying a new dedicated machine myself, however unless you’re in the CBD you can’t get upload speeds much more than 1.5Mbps around here (if you are lucky), which is not really fast enough, hence I’ve gone with the 3rd party hosting option. Perhaps if this NBN the government keeps taking about is turned on and not too expensive I’ll switch. In the mean time it seems silly that Australians need to host content that is mainly consumed by Australians overseas just because of the price difference.

For example if A wants to host content to Australians, but it is cheaper to do so from overseas, then A will mostly likely host their content overseas. This just means Australia has an even greater disturbance in the overseas traffic up/down ratio (i.e. we pull more than we push), which in turns means our local ISPs have to pay more compared to the overseas ISPs to lay the cables, this in turn leads to higher internet costs for Australians compared to overseas ISPs.

At any time our local ISPs could provide incentives for keeping traffic local inside the country by making overseas traffic more expensive than local traffic (for both people hosting and consuming), which would help Australians host locally and in turn even up the balance. It also means that the overseas pipes won’t need to be as thick (because for sites which mostly get Australian visitors, hosting locally means the total amount of cross country traffic for that site is less, hence more efficient).

There is not much I can do about this, so I’m giving in and hosting overseas with the hope that one day things will return to sanity, but in the meantime I haven’t been left off-line.

Anyway, back to Linode, I having a lot of fun at the moment setting it up and getting things working.

I’m running Debian 6.0, with unattended-upgrades, lighttpd, ufw, etckeeper (and soon either exim or postfix and possibly I’ll migrate this wordpress blog across). I have run through http://www.debian.org/doc/manuals/securing-debian-howto/ it’s a bit dated and I’m by no means following everything, but none the less it’s still a nice read.

I’ve set up ssh with protocol 1 disabled (securing debian howto says it has some design flaws, so why enable it if I don’t need it?), publickey authentication only, fail2ban, and only accepting traffic to the ssh port from IP ranges I know I’ll be connecting from (I know if I’ve got security via publickey auth, I don’t really need this, but it can’t hurt… at least my logs don’t fill up with as many break in attempts.).

However since one can always log into the machine via the console at linode.com, security here comes down to the weakest link of web based username/password and ssh publickey auth (ignoring all the other threats like compromised VM separation, compromised VM host, physical security, etc.. stuff I have no control over).

I have set up lighttpd as RAM is limited and I read that this (and nginx) are better than Apache httpd in this regard. I have also set up munin and munin-node (and adapted /etc/munin/apache.conf for lighttpd). I would prefer the munin stats not be appreciable to the world, so I just expose it to localhost only and access it via an SSH tunnel.

Additionally, while trying to debug lighttpd configuration I noticed that it wasn’t responding to nc. Apparently lighttpd will only respond to my requests when lines end with \r\n but just typing away in the terminal as input to nc and using the Enter key I only end lines with \n. Apparently this is per the HTTP spec and it only works with Apache because it isn’t as strict about this.

Next I have to see about getting a domain name. This is a huge problem in itself, but it since,

  • the IP can change at any time (and will change if I move away from Linode),
  • you can’t have sub-domains to separate parts of the site, and
  • you can’t really do email

I don’t really have a choice. Unfortunately the current ICANN DNS is the only real option at the moment, so I’m just going to have to pay up, and try to avoid having some details which I don’t want listed, listed in WHOIS. At the moment I’ll probably go for a .id.au domain.

I’ll probably move this blog across once I set up a domain name, so more news to come on this later.

Update: I’ve registered with gandi.net. My site is now available at http://tianjara.net/

Categories: Uncategorized Tags: ,

Operating Systems Notes

July 31, 2010 4 comments

Here are are some rough notes I put together as part of revision for a uni course. They are heavily based on the course lecture notes by Kevin Elphinstone and Leonid Ryzhyk. All diagrams are sourced from those lecture notes, some of which are in turn from the text book, A. Tannenbaum, Modern Operating Systems.

OS Overview

  • The OS hides the details of the hardware and provides an abstraction for user code.
  • The OS is responsible for allocating resources (cpu, disk, memory…) to users and processes. It must ensure no starvation, progress, and that the allocation is efficient.
  • The kernel is the portion of the OS running in privileged mode (hardware must have some support for privileged mode for the OS to be able to enforce the user and kernel mode distinction).
    Diagram showing how the OS interacts with the system.
  • When the hardware is in privileged mode all instructions and registers are available. When the hardware is in user mode only a limited sets of instructions, registers and memory locations are available. For instance when in user mode, it is not possible to disable interrupts because otherwise a user could ensure that the OS never gets control again..
  • A process is a instance of a program in execution.
  • A process in memory usually has at least three segments. The text segment for the program code (instructions), a data segment called the heap for global variables and dynamically allocated memory, and a stack segment used for function calls.
  • A process also has an execution context which includes registers in use, program counter, stack pointer, etc.
  • In a Monolithic OS different parts of the OS like processor allocation, memory management, devices and file systems are not strictly separated into separate layers but rather intertwined, although usually some degree of grouping and separation of components is achieved.

System Calls

  • The system call interface represents the abstract machine provided by the operating system.
  • man syscalls
  • Syscall numbers
  • hardware syscall instruction

Processes and Threads

  • Processes can contain one or more threads. These threads are units of execution, and always belong to a process.
  • A process can terminate by,
    • normal exit (voluntary), usually returning EXIT_SUCCESS
    • error exit (voluntary), usually returning EXIT_FAILURE
    • fatal error (involuntary), eg. segfault, divide by zero error
    • killed by another process (involuntary), eg. pkill (sending the SIGTERM signal)
  • Thread states
    Thread states diagram.
  • Running to ready could be from a voluntary yield, or end of time allocated by the CPU, so the scheduler moves the thread to the Ready queue to give another process a go.
  • Running to blocked could be from a calling a syscall and waiting for the response, waiting on a device, waiting for a timer, waiting for some resource to become available, etc.
  • The scheduler (also called the dispatcher) decides which thread to pick from the read queue to run.
    Table of per process and per thread resources.
  • The idea is that a process with multiple threads can still make process when a single thread blocks, as if one thread blocks, the others can still be working. In more recent times, it allows a process to have threads running on multiple CPU’s on modern multi-core machines.
  • The other model is finite-state machine where syscall’s don’t block but instead allow the program to continue running and then send the process an interrupt when the syscall has been dealt with. This is not as easy to program though.

During the system initialisation background processes (called daemon’s on linux, service’s on windows) and foreground processes can be started.

Threads Implementation

  • Kernel-threads vs. User-level threads.
  • User-level threads
    User level threads diagram.

    • Thread management is handled by the process (usually in a runtime support library)
    • The advantages are,
      • switching threads at user-level is much cheaper than the OS doing a context switch
      • you can tune your user-level thread scheduler, and not just use the OS provided one
      • no OS support for threads needed
    • The disadvantages are,
      • your threads must yield(), you can’t really on an interrupt to give your scheduler control again (this is known as co-operative multithreading)
      • cannot take advantage of multiple CPU’s
      • if one of your user-level threads gets blocks by the OS, your whole process gets blocked (because the OS only sees one thread running)
  • Kernel-level Threads
    Kernel level threads diagram.

    • Thread management is done by the kernel through system calls
    • The downside is thread management (creation, destruction, blocking and unblocking threads) requires kernel entry and exit, which is expensive
    • The upside is we now have preemptive multithreading (the OS just takes control when it wants to schedule another process, so no need to rely on the thread to yield), can take advantage of a multiprocessor, and individual threads can have isolated blocking.
  • A Thread switch can happen in between execution of any two instructions, at the same time it must be transparent to the threads involved in the switch. So the OS must save the state of the thread such as the program counter (instruction pointer), registers, etc. as the thread context. The switch between threads by the OS is called a context switch.

Concurrency

  • A race condition occurs when the computation depends on the relative speed of two or more processes.
  • Dealing with race conditions,
    • Mutual exclusion
      • Identify the shared variables,
      • identify the code sections which access these variables (critical region)
      • and use some kind of mutual exclusion (such as a lock) to ensure that at most only one process can enter a critical region at a time.
    • Lock-free data structures
      • Allow the concurrent access to shared variables, but design data structures to be designed for concurrent access
    • Message-based communication
      • Eliminate shared variables and instead use communication and synchronisation between processes.
  • Mutual Exclusion – enter_region() and leave_region().
    Critical regions diagram.
  • Hardware usually provides some mutual exclusion support by providing an atomic test and set instruction.
  • A uniprocessor system runs one thread at a time, so concurrency arises from preemptive scheduling. But with multiprocessor machines concurrency also arises from running code in parallel using shared memory.
  • The problem with the approach to mutual exclusion so far is that when process B is blocked, it just sits in a loop waiting for the critical region to become available. Can fix this by using sleep and wakeup. We use a mutex lock for this.
  • Blocking locks can only be implemented in the kernel, but can be accessed by user-level processes by system calls.
  • A mutex allows at most one process to use a resource, a semaphore allows at most N processes.
  • Producer-consumer problem
    • Producer can sleep when buffer is full, and wake up when space available.
    • Consumer can sleep when buffer is empty and wake up when some items available.
    • Can do using semaphores or condition variables.
  • Monitors are set up to only allow one process/thread to operate inside it at once, with extra requests put in a queue. Implemented by the compiler.
  • Condition variables. cv_create, cv_destroy, cv_wait, cv_signal, cv_broadcast
  • Dining philosophers problem. Need to prevent deadlock.

Deadlock

  • A set of processes is deadlocked if each process in the set is waiting for an event that only another process in the set can cause.
  • Four conditions for deadlock:
    • Mutual exclusion condition (device not shareable amongst threads)
    • Hold and wait condition (a resource can be held, but then block awaiting more resources)
      • Can attack this by requiring all process ask for all resources at the start, and only start if they are granted.
    • No preemption condition – previously granted resources cannot be forcibly taken away
    • Circular wait condition
      • Always request resources in the same order
  • Dealing with deadlock:
    • ignore the problem
    • detect and recover
    • dynamic avoidance (careful resource allocation) – we require some information in advance like which resources and how many will be needed before process starts.
      • Bankers algorithm
    • prevention (negate one of the four conditions for deadlock)
  • Starvation is slightly different to deadlock as the system can be making progress, but there are some processes which never make progress.

File Systems

  • File systems provide an abstraction of the physical disk. They allow you to store files in a structured manner on a storage device.
  • The file system abstraction,
    Filesystem abstraction table.
  • Architecture of the OS storage stack,
    OS storage stack.
  • The application interacts with the system call interface which on linux provides creat, open, read, write, etc.
  • In the case of modern hard drives,
    Hard disk diagram.
  • The disk controller (between device driver and physical disk) hides disk geometry and exposes a linear sequence of blocks.
  • The device driver hides the device specific protocol to communicate with the disk.
  • The disk scheduler takes in requests coming from different processes and schedules them to send one by one to the device driver.
  • The buffer cache keeps blocks in memory to improve read and write times for frequently accessed blocks.
  • The file system (FS) hides the block numbers and instead exposes the directory hierarchy, files, file permissions, etc. To do this it must know which blocks relate to which file, and which part of the file. It must also manage how to store this on the blocks of the linear address space of the disk, keeping track of which blocks are in use. See Allocation Strategies
  • The virtual file system (VFS) provides an interface so that different file systems which are suitable for VFS (ie. they have the concept of files, directories, etc..) can be accessed uniformly.
  • The open file table (OF table) and file descriptor table (FD table) keep track of files opened by user-level processes.

There are many popular file systems in use today. FAT16, FAT32 are good for embedded devices; Ext2, Ext3, Ext4 are designed for magnetic disks, ISO9660 is designed for CD-ROM’s, JFFS2 is optimised for flash memory as you don’t want to write to the same location too many times as it wears out. Others include NTFS, ReiserFS, XFS, HFS+, UFS2, ZFS, JFS, OCFS, Btrfs, ExFAT, UBIFS.

Performance issues with standard magnetic disks

  • To access a block on a disk the head must move to the required track (seek time), then we have to wait until the required block on the track reaches the head (rotational delay). We also have some mostly fixed overhead for the data to be passed between the device and the driver (transfer time).
  • Total average access time, T_a = T_s + \frac{1}{2r} + \frac{b}{rN}. Where r is the rotational speed, b is the number of bytes to transfer, N is the number of bytes per track and T_s is the seek time.
  • Seek time is the most expensive, so we want to ensure that our disk arm scheduler gives good performance.
    • First-in, First-out (no starvation)
    • Shortest seek time first (good performance, but can lead to starvation)
    • Elevator (SCAN) (head scans back and forth over the disk. no great starvation, sequential reads are poor in one of the directions)
    • Circular SCAN (head scans over the disk in one direction only)

Block Allocation Strategies

  • Contiguous Allocation
    • Gives good performance for sequential operations as all the blocks for a file are together in order
    • though you need to know the maximum file size at creation
    • as files are deleted, free space is fragmented (external fragmentation)
  • Dynamic Allocation
    • Blocks allocated as needed, hence the blocks for a file could be all over the place on the disk
    • need to keep track of where all the blocks are and the order for each file

External fragmentation space wasted external to the allocated regions, this space becomes unusable as its not contiguous (so you have lots of small spaces but you need a larger space to fit a whole file in)

Internal fragmentationspace wasted internal to the allocated memory regions, eg. you get a 4K block allocated and only use 1K, but the OS can’t give the leftover 3K to someone else.

Keeping track of file blocks

  • File allocation table (used for FAT16 and FAT32)
  • inode based FS; keep an index node for each file which stores all the blocks of the file.
    • inode
  • Directories are stored as normal files but the FS gives these file special meaning.
  • A directory file stores a list of directory entries, where each entry containing the file name (because Ext2/3/4 store these with the directory file, not the inode for the file), attributes and the file i-node number.
  • Disk blocks (sectors) have a hardware set size, and the file system has a filesystem set block size which is sector size * 2^N, for some N. A larger N means large files need less space for the inode, but smaller blocks waste less space for lots of small files.
  • Ext2 inodes
    ext2 inode
  • The top bunch of data is file attributes. Block count is the number of disk blocks the file uses.
  • The direct blocks area stores index’s for the first 12 blocks used by the file.
  • The single indirect is a block numbers to a block which contains more block numbers.w

System call interface

File syscall interface

  • At the system call level a file descriptor is used to keep track of open files. The file descriptor is associated with the FS inode, a file pointer of where to read or write next and the mode the file was opened as like read-only.
  • Most Unix OS’s maintain a per-process fd table with a global open file table.
    File descriptor table.

VFS

  • Provides a single system call interface for many file systems.
    • the application can write file access code that doesn’t depend on the low level file system
    • device can be hard disk, cd-rom, network drive, an interface to devices (/dev), an interface to kernel data structures (/proc)
  • vfs
  • vnode

Journaling file system keeps a journal (or log) of FS actions which allows for the FS to recover if it was interrupted when performing an action that is not atomic, but needs to be.

Memory Management and Virtual Memory

  • The OS needs to keep track of physical memory, what’s in use and which process is it allocated to. It must also provide applications with a view of virtual memory that they are free to use.
  • Swapping (sometimes called paging) is where memory is transferred between RAM and disk to allow for more data in memory than we have space for in physical RAM. On base-limit MMU’s swapping only allows who programs memory allocation to be swapped to disk, rather than just pages at a time as with virtual memory, hence you can’t use swapping here to allow programs larger than memory to run.
  • If we are only running one program we can give it all of memory (and either run the in part of it, or in some other ROM). However many systems need more than one process to be running at the same time. Multiprogramming also means that we can be utilising the CPU even when a process is waiting on I/O (ie. give the CPU to the other process). But to support multiprogramming we need to divide memory up.
  • We could divide memory up into fixed size partitions and allocate these to processes, but this creates internal fragmentation (wasted space inside the partition allocated).
  • Using dynamic partitioning we give each process exactly how much it needs, though this assumes we know how much memory we need before the process is started. This approach also leads to external fragmentation where we have lots of little unallocated gaps, but these are too small to be used individually.
  • Approaches to dynamic partitioning include first-fit, next-fit, best-fit and worst-fit.
  • Binding addresses in programs
    • Compile time
    • Load time
    • Run time
  • Protection – we only want each process to be able to access memory assigned to that process, not other process
    • Base and Limit Registers – set by the kernel on a context switch, the hardware handles the rest
    • Virtual Memory – two variants
      • Paging
        • Partition physical memory into small equal sized chunks called frames.
        • Divide each process’s virtual (logical) address space into same size chunks called pages. So a virtual memory address consists of a page number and an offset within that page.
        • OS maintains a page table which stores the frame number for each allocated virtual page.
          paging
        • No external fragmentation, small internal fragmentation.
        • Allows sharing of memory
        • Provides a nice abstraction for the programmer
        • Implemented with the aid of the MMU (memory management unit).
      • Segmentation
        • A program’s memory can be divided up into segments (stack, symbol table, main program…)
        • Segmentation provides a mapping for each of these segments using a base and limit.
        • segmentation

Virtual Memory

  • If a program attempts to access a memory address which is not mapped (ie. is an invalid page) a page fault is triggered by the MMU which the OS handles.
  • Two kinds of page faults,
    • Illegal Address (protection error) – signal or kill the process
    • Page not resident – get an empty frame or load the page from disk, update the page table, and restart the faulting instruction.
  • Each entry in the page table not only has the corresponding frame number but also a collection of bits like protection bits, caching disabled bit, modified bit, etc.
  • Page tables are implemented as a data structure in main memory. Most processes don’t use the full 4GB address space so we need a data structure that doesn’t waste space.
  • The two level page table is commonly used,
  • Two level page table
  • An alternative is the inverted page table,
  • ipt
  • The IPT grows with size of RAM, not virtual address space like the two level page table.
  • Unlike two level page table which is required for each process, you only need one IPT for the whole machine. The downside is sharing pages is hard.
  • Accessing the page table creates an extra overhead to memory access. A cache for page table entries is used called a translation look-aside buffer (TLB) which contains frequently used page table entries.
  • TLB can be hardware or software loaded.
  • TLB entries are process specific so we can either flush the TLB on a context switch or give entries an address-space ID so that we can have multiple processes entries in the TLB at the same time.
  • Principle of Locality (90/10 rule)
    • Temporal Locality is the principle that accesses close together in terms of time are likely to be to the same small set of resources (for instance memory locations).
    • Spatial locality is the principle that subsequent memory accesses are going to be close together (in terms of their address) rather than random. array loop example
  • The pages or segments required by an application in a time window (\Delta) is called its memory working set.
  • Thrashing,
    thrashing
  • To recover from thrashing, just suspend some processes until it eases.
  • Fetch policy – when should a page be brought into memory? on demand, or pre-fetch?
  • Replacement policy – defines which page to evict from physical memory when its full and you start swapping to disk.
    • Optimal
    • FIFO – problem is that age of a page isn’t necessarily related to its usage
    • Least Recently Used – works well but need to timestamp each page when referenced.
    • Clock (aka. Second Chance) – an approximation of LRU. Uses a usage or reference bit in the frame table
  • Resident Set Size – how many frames should each process have?
    • Fixed allocation
    • Variable allocation – give enough frames to result in an acceptable fault rate

Input Output

  • Programmed I/O (polling or busy waiting)
  • Interrupt Driven I/O
    • Processor is interrupted when I/O module (controller) is ready to exchange data
    • Consumes processor time because every read and write goes through the processor
  • Direct Memory Access (DMA)
    • Processor sent interrupt at start and end, but is free to work on other things while the device is transferring data directly to memory.

double-buffer

Scheduling

  • The scheduler decides which task to run next. This is used in multiple contexts, multi-programmed systems (threads or processes on a ready queue), batch system (deciding which job to run next), multi-user system (which user gets privilege?).
  • Application behaviour
    • CPU bound – completion time largely determined by received CPU time
    • I/O bound – completion time largely determined by I/O request time
  • Preemptive (requires timer interrupt, ensures rouge processes can’t monopolise the system) v non-preemptive scheduling.
  • Scheduling Algorithms can be categorised into three types of systems,
    • Batch systems (no users waiting for the mouse to move)
    • Interactive systems (users waiting for results)
      • Round Robin – each process is run and if it is still running after some timeslice t, the scheduler preempts it, putting it on the end of the ready queue, and scheduling the process on the head of the ready queue.
        If the timeslice is too large the system becomes sluggy and unresponsive, if it is too small too much time is wasted on doing the context switch.
      • The traditional UNIX scheduler uses a priority-based round robin scheduler to allow I/O bound jobs to be favoured over long-running CPU-bound jobs.
    • Realtime systems (jobs have deadlines that must be meet)
      • Realtime systems are not necessarily fast, they are predictable.
      • To schedule realtime tasks we must know, its arrival time a_i, maximum execution time (service time), deadline (d_i).
      • tasks could be periodic or sporadic.
      • slack time is time when the CPU is not being used, we are waiting for the next task to become available.
      • two scheduling algorithms,
        • Rate Monotonic – priority driven where priorities are based on the period of each task. ie. the shorter the period the higher the priority
        • Earliest Deadline First – guaranteed to work if set of tasks are schedulable. The earlier the deadline the higher the priority.

Multiprocessor Systems

  • For this section we look at shared-memory multiprocessors.
  • There are uniform memory accesses multiprocessors and non-uniform memory access multiprocessors which access some parts of memory slower than other parts. We focus on the UMA MP’s.
  • Spinlocking on a uniprocessor is useless, as another thread on the same processor needs to release it, so blocking asap is desirable. On a multiprocessor, the thread holding the lock may be presently active on another processor, and it could release the lock at any time.
    On a multiprocessor, spin-locking can be worthwhile if the average time spent spinning is less than the average overheads of context switch away from, and back to, the lock requester.”
  • Thread affinity refers to a thread having some preferred processor to run on an a multiprocessor machine. For instance if thread A started on cpu0 it may want to be scheduled again on cpu0 rather than cpu1 as parts of the cache may still be intact.
  • Multiprocessor systems have multiple ready queues, as just one ready queue would be a shared resource which introduces lock contention.

Security

Categories: unswcourse Tags:

Security Engineering Notes

July 31, 2010 Leave a comment

Here are are some rough notes I put together as part of revision for a uni course.

Security Engineering

  • CIA
    • Confidentiality
    • Integrity
    • Availability
  • Some other attributes include,
    • Authenticity
    • Privacy
    • Accountability,
    • Non-reputability
    • Receipt-freeness (for voting systems)
  • Methods of attack
    • Brute Force
    • Deception/Social Engineering
    • Calculated attack on a vulnerability
    • Insider (both intentional and unintentional/accidental)
      • Mitigation techniques: Keeping logs and conducting audits could. Also least privilege to prevent accidental insider attacks.
    • Chance (right place at the right time)
    • Accidental
  • Reliable systems (protect against random failures)
  • Secure systems (protect against targeted deliberate attacks)
  • Assets which need to be protected
    • Tangible (eg. documents)
    • Non-tangible (eg. reputation)
  • Security Engineering: Protecting assets against threats.
  • Kerckhoff’s Principle – “the security of the cryptosystem must not depend on keeping secret the crypto-algorithm. It must depend only on keeping secret the key.”
    • Because, its often easier to change the key after a compromise or possible compromise than changing the algorithm,
    • and it’s usually also harder to keep the algorithm secret (can be reverse-engineered or bought) compared to the key

Crypto

  • Code (hidden meaning, ie. A spy using a phase that seems normal, but actually has some other hidden meaning)
  • Plaintext -> E(encryption key) Ciphertext -> D(decryption key)
  • Symmetric-key cryptography – Encryption key can be calculated from the decryption key and vice versa (usually they are the same though). The sender and receiver must agree upon the keys first. Two types,
    • Stream Ciphers – operate one bit at a time (eg. RC4)
    • Block Ciphers – operate on a group of bits at a time (eg. DES and AES/Rijndael)
      • Block chaining is used to make each block of cipher text depend on all the previous blocks of plaintext.
  • Public-key Cryptography (asymmetric-key) – public and private keys where you cannot (or is really hard to) calculate one from the other. (eg. RSA)
    • One problem is that an attacker can use an adaptive chosen plaintext attack, where they encrypt a large number of messages using the target’s public key. A fix is to pad messages with a long random string.
    • Can be used to do digital signatures. (encrypt your message M (though in practice a one way hash is used instead) using your private key, verifiers just decrypt using your public key as only someone with the private key (ie. you) could have successfully encrypted M)
      • Birthday attack – find two messages that have the same hash but only differ by say spaces at the end of the line, and some other key part like dollar amount. Then when you get Alice to sign document A, the signature will be the same for document B (as Alice is signing the hashes and the hashes are the same)
    • Public Key Infrastructure – how to ensure that Bob’s public key really belongs to the Bob you want to communicate with (tries to solve the key distribution problem).
      • Public Key Certificates
      • Certificate distribution (X.509, GPG)
  • Most cryptosystems use public-key crypto just to exchange the symmetric key shared key, and then both parties this shared key to communicate with symmetric-key crypto. These systems are known as hybrid cryptosystems. This is done for mainly two reasons,
    • public-key algorithms are usually around 1000 times slower than symmetric algorithms.
    • public-key algorithms are vulnerable to chosen-plain text attacks.
  • Cryptanalysis is about determining the plaintext from the key.
  • Authentication

Extra notes from Schinder 2nd Ed.

  • Types of attacks,
    • cipher-text only
    • known plain text
    • chosen plain text
    • adaptive chosen plain text (you can modify your next choice based on the results of the previous choice)
    • chosen cipher text
    • known key relationship
    • rubberhose (get the person who knows the key to reveal it)
  • Substitution ciphers
    • Simple (monoalphabetic cipher)
      • A → 2
        B → 3
        C → 1
    • Homophonic (where multiple options available such as 5,2 one is chosen at random)
      • A → 5,2
        B → 1,3
        C → 4
    • Polygram
      • ABC → 819
        ABB → 234
        ABA → 567
        ACA → 587
    • Polyalphabetic (simple substitution cipher, but the mapping also depends on the position of the input character in the plain text), examples include,
      • Caesar cipher – shift each letter forward or back by K. eg. if K = 1, A → B, B → C …
      • rot13
      • Vigenere cipheralign the repeated key K over the plaintext
    • These can be attacked by statistical cryptanalysis of the letter frequency
  • Transposition Ciphers – the order of the characters is changed (rotor machines such as the Enigma did this)
    • eg. write across, but read down
      HELLO
      WORLD
      THISI
      SSOME

      encrypts to HWTSEOHSLRIOLLSMODIE

  • One time pads
  • One-way hash function (also known as message digest, fingerprint, cryptographic checksum)
    • pre-image → f(x) → hash value
    • A hash function is collision free if it is hard (ie. at best becomes brute force) to generate two pre-images with the same hash value.
    • Also a single bit change in the pre-image should on average change half of the bits in the hash value.
  • Types of random sequences,
    • Pseudo-random (looks random, good for applications like writing an random AI game agent)
    • Cryptographically secure pseudo-random (unpredictable)
    • True randomness (cannot be reproduced with the same inputs)
  • Zero-knowledge proofs – proving to someone that you know something without revealinfg them what that something is. Two types,
    • Interactive
    • Non-interactive

RSA

Choose two primes p and q and let n = pq. Choose e such that e and (p – 1)(q – 1) are relatively prime (ie. no common factor and both prime numbers). Let d be a solution of ed = 1 mod (p-1)(q-1). Public key K = (e,n), private key K^{-1} = (d,n).

E_K(M) = M^e \mod n

D(M) = E_{K^-1}(M) = M^d \mod n

Access Control

  • “An access control policy is a description of who may access what (and how).”
  • Principle of least privilege – “The powers that an agent in the system is given should be the minimum that they need to have to play their assigned role in the system.”
  • AC can be done at many levels (eg. hardware, os, database, network, application…)
  • Managing Subjects, Objects and Operations. (can use Groups or Roles to classify subjects)
    • Access Control Matrix (central database of all objects, subjects and operations permitted)
    • Access Control List (store permissions with object)
    • Capabilities (store permissions with subject)
  • Mandatory AC – central entity sets the AC.
    • Multi-level (eg. top secret, secret, unclassified labels…)
      • Bell-LaPadula – no read up, no write down.
  • Discretionary – each object has an owner, and the owner sets the AC.
  • Commercial Policies
    • Chinese Wall – eg set up rules like if a subject can access X they cannot access Y.
    • Dual Control – need two people together to modify X, they cannot do it on their own.
  • Reference Monitor – some entity that sits between the subjects and objects to implement some policy (relies on trusted entity)

Elections

You want your elections to be,

  • verifiable – so the voter can check that their vote was indeed counted (and if possible check that everyone else’s vote was counted correctly)
  • correct – votes are counted correctly and the results match calculated correctly
  • secret (ie. no one knows how you vote, so that you cannot be coerced)
  • coercion free (so you want to be recite free)

(additional notes, but don’t really need to know for exam)

  • True Democracy (like the Greek’s assembly)
  • Representative Democracy (where we vote in someone to represent us)
  • Participatory Democracy (where we represent ourself an all have equal say)
  • Australian Federal Elections are subject to the Mafia problem. Because we have preferential voting that is you can order your preferences that means that for n candidates there are n! ways of voting. So if there are enough candidates and few enough people you can coerce a person into using a specific voting pattern which you can then verify that they indeed did vote that way.
  • An alternative attack would be to obtain a blank voting paper, fill it in the way you want it to be give it to someone and tell them to submit it and return you a blank one. Although the attacker won’t know if the victim has scribbled it out to make it invalid, but at least the attacker has prevented someone from voting.

Security Architecture

Security Design Principles:

  • Economy of Mechanism – keep things simple
  • Fail-safe defaults
  • Complete Mediation – every access request checked
  • Open design, ie. Kerckhoffs’ Principle: Keep the key secret, don’t rely on keeping the algorithm secret.
  • Separation of Privilege
  • Least Privilege
  • Least common mechanism
  • Psychological acceptability

Defence in depth – use many layers of security (or many security perimeters in layers)

Human Factors

  • People are trusting
  • People are lazy (to read the manual or change factory defaults)
  • People are greedy (will exploit the system if given the chance)
  • People are forgetful (forget to revoke access when terminating employees)
  • People are selfish
  • People are stick-beaks (eg. Medicare employees accessing client data for personal interest)

Some strategies for reducing the risk,

  • logging of insider’s actions
  • honeypots for insiders (eg. fake web server running on the network, or fake data in the database)
  • security policy for staff to follow

Risk

Risk is not something I would have thought to be in a security course, but now that I think about it there are few if any bullet proof systems, so there is always some risk.

Whether it be secure communication (there is always some risk that an eavesdropper cracks your cryto and reads the message, so its important to weigh up those risks to decide if its worth sending the message in the first place), or be it running a web server (you cannot be sure that your web server doesn’t have bugs, or even if you have verified the code to be matching the vulnerability free specification other things can happen like, has your CPU been verified to be bug free, are you sure that a cosmic ray won’t flip a bit and subsequently create a vulnerability). So weighing up the risks involved is important to decide how much time and effort you devote to securing a system, or how the system is designed to work.

Business Risk Concepts

  • Exposure – what is the worst case scenario (or loss)
  • Volatility – how predictable is the loss
  • Probability – how likely is that loss

Degree of Risk

       <-- Probability -->
High Exposure/  |  High Exposure/         /\
Low Probability |  High Probability       ||
-----------------------------------    Exposure
Low Exposure/   |  Low Exposure/          ||
Low Probability |  High Probability       \/

Exposure – how widespread is the system?
Probability – how easy is it to exploit the system, and how great is the incentive to do so (which relates to how valuable the assets you are protecting are)?

Risk management

  • Example. The risk of viruses; the mitigation strategy may be to use anti-virus detection, but the residual risk is zero-day attacks.
  • Carry the risk – just accept it.
  • Balancing the risk – eg. data backups at multiple locations, distributed servers at multiple locations (perhaps even running different systems, so for instance if you want to keep your web site up and running you may load balance between both a Linux box running Apache, and a windows box running IIS, so that an attack found in one is not likely to be present in the other so the attacker can’t take both offline with the one attack.)
  • Mitigate it – reduce the risk
  • Transfer the risk – eg. in finance, get insurance
Categories: unswcourse Tags:

A Great Gnome Panel Applet + A UPnP Enabled Router

July 26, 2010 Leave a comment

I have just discovered that my ADSL router supports UPnP, and that this provides an interface to access 3 important bits of information from the router (external IP address, total bytes sent and total bytes received). I had previously been scraping the router’s web interface to grab the external IP. As for the bytes sent and received, I didn’t even know the router had a method of reporting these.

My first instinct was to look for a Perl library for UPnP, I found two. One in the Ubuntu repositories (and in CPAN) http://search.cpan.org/perldoc?Net::UPnP::GW::Gateway and another which appears to be in neither, http://perlupnp.sourceforge.net/.

I tried out the first one and after some fiddling get it working (though I haven’t yet been able to eliminate the 2 seconds it spends blocked, ie. not executing on the CPU but still not complete).

Next I found a great program that allows you to place an arbitrary command’s output in the Gnome Panel, http://code.google.com/p/compa/. Which resulted in,

Compa Gnome panel applet showing total bytes recieved and sent by the router.

The Perl script I use to provide the output to the Compa applet is at http://github.com/andrewharvey/perlmisc/blob/master/upnp_router_inoutbytes_to_compa.pl

Categories: computing Tags: ,

The URL shown in Firefox’s status bar isn’t always where you end up.

June 27, 2010 Leave a comment

Unless you are aware of the more technical details of web browsing its reasonable for the average web user to assume that if you hover your mouse over a link and Firefox tells you in the status bar that the link is to http://foobar.com/, then clicking on the link will actually take you to http://foorbar.com/. But sadly this is not the case for out of the box Firefox.

Take a look at a Google search results pages. Hovering your mouse over the links gives one URL in the status bar, yet clicking the link actually takes you somewhere else.

Here is a sample of the HTML for the link,

<a href="http://www.example.com/page1.html" onmousedown="return rwt(this,'','','res','1','$ID1','&amp;sig2=$ID2','$ID3')">Page Title</a>

Hovering over the link, you see in the status bar http://www.example.com/page1.html, but as soon as you mousedown javascript goes ahead and changes the href to something else (keep in mind that Firefox only goes to the new link on mouserelease), so that when you release the mouse your browser takes you to the replacement URL.

The problem I see with this is what if some unsuspecting user checks the link in the status bar, clicks the link thinking they are going one place then get taken somewhere else. This becomes even more of a problem if that site is susceptible to certain kinds of XSS attacks. So you can think your going to https://paypal.com/, and the URL bar after clicking the link goes to https://paypal.com/ but yet you’ve actually got some malicious js or html injected in the paypal.com/ page that you have loaded in your browser window.

I originally thought this was clickjacking, but the Wikipedia article describes that as when a transparent layer on top of the page provides the concealed URL.

Categories: computing Tags: ,

Australian States Map/Graph API

February 19, 2010 Leave a comment

I’ve managed to do a couple things all in one here. I’ve made use of some Geoscience Australia Creative Commons licensed material, in a nice little program with a web API, and I’ve aggregated some data from the myschool scraper and parser. Putting them all together gives some nice images like this.

The program for generating these images basically takes an SVG template file with placeholder markers and then fills these values based on the CGI parameters. The API is fairly simple so one should be able to work out how to use it from the example in the README file. Here are the files I used to make the graphs (and the svg versions as WordPress.com won’t let me upload them to here).

ps. This gets cut off when viewing it from the default web interface of this blog, use print preview or even better look at the RSS feed to see the cut off parts. Also I tried to ensure the accuracy of the data, but I cannot be 100% sure that there are no bugs, in fact there are discrepancies with the averages I get from my scrape of myschool and the averages provided in the report on the NPLAN website. The numbers I get seem to be consistent (ie. the state rankings seem mostly the same), but nonetheless not exactly the same as those reported in the report. Although I would be very surprised if all the numbers I got were exactly the same as in the report. I mainly did this to use map/graph code I wrote, so if you really care about how certain state averages compare in these tests look at the reports on the NPLAN website.

The lighter the colour the higher the number.

Primary

2008 2009
Literacy
Numeracy
All

Secondary

2008 2009
Literacy
Numeracy
All
Categories: computing Tags: , ,

A Look into the myschool.edu.au Data

February 7, 2010 8 comments

After overcoming a few problems I managed to write a scraper for the myschool.edu.au data. Unfortunately they choose to put data in HTML, so the scraping process may have led my data to have some unknown errors. I publish (see bottom) the scraped data as I believe that per the IceTV v Nine Network [2009] HCA 14 case, any data that my scraper produces as output from the HTML input is not subject to the copyright of the original HTML content (this also means that I cannot publish the HTML pages) and the Telstra Corporation Limited v Phone Directories Company Pty Ltd [2010] FCA 44 case, that the raw data that is scraped is not subject to copyright.

I wish I could bzip2 up all those HTML pages and give them to you just to save your download, because the myschool.edu.au site doesn’t compress their pages when I tell them I accept gzip over HTTP, so it took up almost 2GB of quota to download all the HTML pages, oh well.

Some preliminary statistics from the data.

  • There are a total of 9316 (or 9279 after I ran a newer scraper at a later data) schools. Of these,
    • 1538 are Secondary (of which 30% are non-government and 70% are government)
    • 1407 are Combined (of which 68% are non-government and 32% are government)
    • 6054 are Primary (of which 23% are non-government and 77% are government)
    • 317 are Special (of which 15% are non-government and 85% are government)
  • So,
    • 6451 are Government (69%),
    • 2865 are Non-government (31%)
  • These 9316 schools contain a total of 3 366 351 students of which,
    • 1 745 224 are male (51%)
    • 1 651 127 are female (49%)
  • The most schools in 1 postcode is 40, which are all in the postcode 2480.
  • The average student attendance rate is 92.007%
    • 91.870% for Government, 92.335% for Non-government
    • 89.205% for Secondary, 92.982% for Primary, 90.675% for Combined, 89.170% for Special.
  • There are a total of 265 960 teaching staff (full time equivalent of 241 408) and 124 117 non-teaching staff (full time equivalent of 86 511.9).

I could report a lot of stats like these above, all you need is a basic knowledge of SQL, but as much as I enjoy working out these stats I find graphs and graphics much more intuitive, so that is up next. Because of the vast dimensions to the data you can make all kinds of graphs so what would be best is a system to draw graphics dynamically which allows the user to decide what is graphed, but this takes more work so that is on the todo list.

I’ve also looked into doing some heatmaps using the geographical location of the schools, I could have used Google Maps, or I could use OpenStreetMap and libchamplain. Both have pros and cons… But for now I used Google Maps because their API is simple and I’ve always wanted to experiment with it, the downside is I’m not sure about the copyright of their maps and subsequently any derivative works. This image is just a test showing a dot for each school in the system, but its very easy to change the colour, size and opacity of the dots based on features of the school.

Schools in Sydney Map

Another test (some markers will be missing or in the wrong place, like the ones in NZ!),

Google Earth map showing markers for Australian schools (though not completely accurate).

Google Earth map showing markers for Australian schools (though not completely accurate). (Copyright notices in image)

Source code? http://github.com/andrewharvey/myschool

Don’t want to scrape and parse but want the raw data in a usable form? http://github.com/andrewharvey/myschool/tree/master/data_exports/

Extra thought: Currently the code uses Google’s API for geting the geolocation of the school, I could use OpenStreetMap for this also, however it would take more investiagtion to determine what tools exist. At the moment all I know is I have an .osm file of Australia, but schools aren’t just one dot, they are a polygon so unless I find some other tools which probably exist, I would need to (probably) just use one of the points in the polygon.

Or I could used the Geographic Names Register for NSW, but that is just for NSW… http://www.gnb.nsw.gov.au/__gnb/gnr.zip

anonymous FTP

January 22, 2010 Leave a comment

So I’ve started reading a book about networks, and to complement this I’ve been taking a closer look at my network traffic in Wireshark (really great tool, by the way.).

So I pick an ftp site that I know, ftp://download.nvidia.com/ and see what happens in Wireshark when I visit it in Firefox. At the FTP application level this is what happens,

ftpsite to me: 220 spftp/1.0.0000 Server [69.31.121.43]\r\n
me to ftpsite: USER anonymous\r\n
ftpsite to me: 331 Password required for USER.\r\n
me to ftpsite: PASS mozilla@example.com\r\n
ftpsite to me: 230- \r\n
               230- ---------------------------------------------------------------------------\r\n
               230- WARNING:  This is a restricted access system.  If you do not have explicit\r\n
               230-           permission to access this system, please disconnect immediately!\r\n
               230 ----------------------------------------------------------------------------\r\n

But Firefox does not disconnect. So I did some more research and I found no references to “anonymous” users in either RFC 959 (FTP) or RFC 3659 (extensions to FTP). (Though there are references in latter RFCs, see RFC 2228).

The thing I find disconcerting is that I don’t think I have “explicit permission” to access this system. I (or rather Firefox) just guessed a username and password and they happened to let me in (what if I guessed a different username and password that wasn’t anonymous and it let me in?). If the RFC specified that a user of anonymous requires no password, or any password, then I would assume that the FTP server is granting me permission, but I assume rather people just started using anonymous as the user and it caught on…

The real problem here is that there are laws which govern such areas, and it doesn’t help that that I don’t understand what PART 6 – COMPUTER OFFENCES of the CRIMES ACT 1900 (NSW) is saying.


								
Categories: computing, law Tags: ,

To fix broken audio, unplug faulty USB device.

January 19, 2010 Leave a comment

How weird is this, just recently when I started up my computer lots of stuff was broken, no audio (and /proc/asound/cards was empty, normally it has “0 [Intel          ]: HDA-Intel – HDA Intel\nHDA Intel at 0xfa100000 irq 22”), libsensors weren’t reporting any values (eg. no CPU temp reported), eth0 dissapeared from NetworkManager, and probably a host of other things that I didn’t notice. Restarting didn’t fix it.

Well long story short, it turns out that everything magically fixed when I unplugged a USB hard drive that was plugged in. I had seen a lot of concerning messages sent to /var/log/messages from the kernel about it,

Jan 19 09:45:00 host kernel: [  564.100026] usb 1-3: reset high speed USB device using ehci_hcd and address 2
Jan 19 09:45:00 host kernel: [  564.237716] sd 8:0:0:0: [sdd] Unhandled error code
Jan 19 09:45:00 host kernel: [  564.237719] sd 8:0:0:0: [sdd] Result: hostbyte=DID_ABORT driverbyte=DRIVER_OK

that repeated every so often, but I never thought that a dodgy USB device would break the kernel from doing some of its job.

Categories: computing Tags: